
Mid-Stream Memory: Injecting Context at Execution Time

Imagine an employee who gets reminded of relevant company policies exactly when making decisions, not just at the start of the day.

At 9am, you brief them: "Here are our quality standards, client preferences, and the project scope." Perfect. By 3pm, they're making a critical decision—but the morning briefing has faded. They remember the general direction but not the specific constraint that matters right now.

Now imagine a different employee. Every time they're about to make a decision, a helpful colleague appears: "Hey, remember the Johnson incident? Here's what you need to know for this exact situation." That's memory injection.

You start a complex task with a clear goal. Your AI assistant knows your preferences, your project conventions, your stakeholder requirements. Turn 1 goes perfectly. By turn 20, the AI has forgotten half of what made turn 1 successful.

What happened?

The memories you injected at the beginning are still in context. They haven't disappeared. But they're now buried under 19 turns of conversation, research outputs, and evolving requirements. The AI's attention budget (Lesson 2) means those turn 1 memories are receiving only a fraction of the attention they need. More importantly, those memories were selected for turn 1's intent. Turn 20 has a different intent. The memories that would help turn 20 are sitting unused in your memory store.

This is workflow drift. And fixing it requires a fundamental shift in when and how you inject context.

The Workflow Drift Problem

Semantic memory injection typically happens at prompt submission. You type a request. A hook queries your memory store for relevant context. The results are injected into the AI's context. Then the AI processes your request.

This works well for single-turn interactions. But multi-turn workflows create a problem:

Consider a legal professional reviewing a complex contract:

Turn 1: You ask the AI to review vendor agreement terms. The memory hook finds relevant memories about your firm's standard terms, this client's risk tolerance, and contract negotiation history. Perfect match.

Turn 5: The AI has identified a problematic indemnification clause. It's now focused on liability allocation. Your general contract memories are still there, but they're less relevant than memories about indemnification precedents and insurance requirements.

Turn 12: The AI is drafting alternative language for the dispute resolution section. It needs memories about this client's arbitration preferences and past dispute history. The indemnification memories from turn 5 are noise.

Turn 20: The AI is preparing a summary memo for the partner. The drafting memories from turn 12 are now irrelevant. Client communication preferences would help, but they were never injected.

Or consider a marketing strategist developing a campaign:

Turn 1: You ask the AI to develop Q4 campaign strategy. The memory hook finds brand voice guidelines, audience demographics, and budget parameters. Perfect match.

Turn 5: The AI has pivoted to channel strategy. It's evaluating media mix options. Your brand voice memories are still there, but they're less relevant than memories about channel performance history and media costs.

Turn 12: The AI is writing creative briefs for each channel. It needs memories about past creative that worked with this audience, design constraints, and production timelines. The channel strategy memories from turn 5 are noise.

Turn 20: The AI is forecasting campaign ROI for leadership approval. The creative brief memories from turn 12 are now irrelevant. CFO preferences for financial presentations would help, but they were never injected.

Each turn, the AI's actual needs drift further from the context you provided at the start. The memories you injected were correct for turn 1. They're wrong for turn 20.

Two Injection Timing Strategies

Claude Code's hook system offers two points where you can inject context:

| Hook | When It Fires | Best For |
|------|---------------|----------|
| UserPromptSubmit | When the user submits a prompt | Initial context, session setup, one-shot queries |
| PreToolUse | Before each tool execution | Ongoing relevance, multi-step workflows, execution-time context |

UserPromptSubmit happens once per user message. It's synchronous with your input. The memories it injects reflect what you asked for at that moment.

PreToolUse happens potentially many times per user message. Each time the AI is about to use a tool—reading a document, searching files, editing content—this hook fires. That means you get multiple opportunities to inject relevant context throughout the workflow.

The key insight: the AI's thinking evolves as the workflow does. By turn 20, its thinking block captures the intent and reasoning behind the action it is about to take. That thinking is the perfect query for semantic memory.

PreToolUse Memory Injection Architecture

Here's the flow:

1. User submits prompt
2. AI reasons about the request → creates thinking block
3. AI decides to take an action (read file, search, edit document)
4. PreToolUse hook fires (synchronous)
└── Hook receives transcript_path (file path, not thinking directly)
└── Hook reads the transcript file from disk
└── Hook parses JSONL to extract thinking blocks
└── Hook embeds the thinking text (~100-300ms for embeddings)
└── Hook queries vector database for similar memories (~5-50ms)
└── Hook returns JSON with hookSpecificOutput containing memories
5. AI receives the injected memories via additionalContext
6. AI continues reasoning with fresh, relevant context
7. Action executes
8. AI reasons about results → new thinking block
9. AI decides to take another action
10. PreToolUse fires again...
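
For reference, the JSON payload a PreToolUse hook reads from stdin looks roughly like this. The field set follows the Claude Code hooks documentation, but treat it as indicative rather than definitive; exact fields may vary by version.

{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "hook_event_name": "PreToolUse",
  "tool_name": "Read",
  "tool_input": {
    "file_path": "/path/to/vendor-agreement.md"
  }
}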

Important implementation detail: The PreToolUse hook does NOT receive thinking blocks directly. It receives a transcript_path input pointing to a JSONL file on disk. Your hook must:

  1. Read the file at transcript_path
  2. Parse each line as JSON
  3. Extract thinking content from the parsed objects

This file I/O adds latency, typically 10-50ms depending on transcript size. A minimal sketch of the extraction appears below.
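
The JSONL shape assumed in this sketch's comments is exactly that, an assumption; transcript formats can change between Claude Code versions, so verify against a real transcript before relying on it.

import json
from pathlib import Path

def extract_latest_thinking(transcript_path: str) -> str:
    """Return the most recent thinking text found in a transcript JSONL file."""
    latest = ""
    for line in Path(transcript_path).read_text().splitlines():
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than crash the hook
        # Assumed shape: assistant entries carry message.content as a list,
        # with thinking blocks as {"type": "thinking", "thinking": "..."}
        content = entry.get("message", {}).get("content", [])
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "thinking":
                    latest = block.get("thinking", latest)
    return latest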

Why this works: The AI's thinking block contains what it is about to do and why. When you embed that thinking and search for similar memories, you find memories that are relevant to the current action, not the original prompt.

Legal example: Turn 20's thinking might be: "I need to summarize the key risks in this agreement for the partner review memo. The main concerns are the indemnification carve-outs and the ambiguous termination provisions. I should highlight these in language appropriate for partner consumption."

Embedding that thinking and searching your memory store finds:

  • Partner memo formatting preferences
  • This partner's risk communication style
  • Previous memos on similar contract issues

These memories are exactly what turn 20 needs. They would never have been selected at turn 1 when the thinking was about reviewing contract terms.

Marketing example: Turn 20's thinking might be: "I need to forecast ROI for leadership approval. The campaign costs total $450K across channels. I should present this using the format the CFO prefers."

Embedding that thinking finds:

  • CFO presentation preferences
  • ROI calculation standards your company uses
  • Past campaign performance benchmarks

Again—exactly what turn 20 needs, not what turn 1 needed.
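
To make the retrieval step concrete, here is a hedged sketch using sentence-transformers and ChromaDB (the same pairing suggested under "Going Further"). The model name, collection name, and storage path are illustrative choices, not requirements; any embedding model and vector store will work.

import chromadb
from sentence_transformers import SentenceTransformer

# Illustrative names; swap in whatever model and store you actually use.
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./memory_store")
collection = client.get_or_create_collection("memories")

def retrieve_memories(thinking_text: str, k: int = 3) -> list[str]:
    """Embed the thinking block and return the k most similar memories."""
    embedding = model.encode(thinking_text).tolist()
    results = collection.query(query_embeddings=[embedding], n_results=k)
    return results["documents"][0] if results["documents"] else []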

Why Thinking Blocks Matter

The AI's thinking isn't just internal monologue. It's structured reasoning that reveals:

  • Current intent: What the AI is trying to accomplish right now
  • Decision context: Why the AI chose this approach
  • Constraints remembered: What limitations the AI is working within
  • Assumptions made: What the AI believes to be true

This makes thinking blocks the ideal query for semantic search. They contain the dense, specific context about what the AI needs to know.

Compare:

User prompt (turn 1): "Review the vendor agreement"

  • Broad, could match many memories
  • Matches general contract memories
  • Doesn't indicate current state

Thinking block (turn 20): "I'm preparing the partner memo summarizing key contract risks. The main issues are the indemnification carve-outs in Section 7.2 and the ambiguous termination language in Section 12.1. I should use the format the partner prefers for risk summaries."

  • Specific, narrow match space
  • Matches memo writing memories, risk communication preferences
  • Indicates exactly what the AI needs to know

The specificity of thinking blocks produces more relevant memory retrievals.

Building Your Memory Corpus

A memory injection system is only as good as its memories. What should you store?

High-value memories across domains:

| Domain | High-Value Memories |
|--------|---------------------|
| Legal | Precedent patterns, client risk preferences, judge tendencies, clause variations that have worked |
| Marketing | Brand voice rules, audience insights, successful campaign patterns, stakeholder presentation styles |
| Research | Citation standards, methodology conventions, reviewer preferences, field-specific writing patterns |
| Business | Process templates, stakeholder communication styles, industry conventions, decision-making patterns |
| Operations | Vendor preferences, compliance requirements, approval workflows, exception handling patterns |
| Software | Error handling patterns, API conventions, architectural decisions, testing strategies |

Lower-value memories (across all domains):

  • Standard practices the AI already knows
  • Obvious patterns the AI can infer from documents
  • One-time decisions that won't recur
  • Information that changes frequently

Memory structure example (Legal):

# Memory: Client Risk Communication for Acme Corp

## Summary

Acme Corp's General Counsel prefers risk summaries that lead with business impact,
not legal analysis. Always quantify exposure when possible.

## Context

After the Henderson dispute, their GC restructured how legal communicates risk
to the board. They now require dollar ranges, not just "material exposure."

## Pattern

GOOD: "This clause exposes us to $500K-$2M in potential liability if the vendor
fails to deliver. We recommend adding a cap at $750K."

NOT: "The indemnification provisions create significant exposure that should
be addressed through additional limitations."

## When to Apply

Any client communication about contract risk. Any memo going to their GC.

Memory structure example (Marketing):

# Memory: CFO Presentation Format

## Summary

CFO requires all campaign proposals to lead with ROI projection and
payback period before any creative discussion.

## Context

After Q2 2024 campaign exceeded budget without clear ROI tracking, CFO
mandated new financial-first presentation format.

## Pattern

GOOD: "This $450K campaign projects 3.2x ROI with 4-month payback, based on
comparable Q3 2023 campaign performance. Here's the channel breakdown..."

NOT: "We've developed an exciting creative concept that will resonate with
our target demographic. The budget is $450K."

## When to Apply

Any campaign proposal going to leadership. Any budget approval request.

Memory structure example (Research):

# Memory: Journal of Applied Psychology Submission Standards

## Summary

JAP requires APA 7th edition with DOI links. Reviewers in this journal
expect power analysis justification and effect size reporting.

## Context

Our lab's 2023 submission was desk-rejected for insufficient power analysis
documentation despite strong methodology otherwise.

## Pattern

GOOD: "Power analysis using G\*Power indicated a required sample of n=120
to detect medium effects (d=0.50) with 80% power at alpha=.05. Our sample
of n=147 exceeded this threshold."

NOT: "The sample size was adequate for the planned analyses."

## When to Apply

Any submission to JAP or similar APA journals in organizational psychology.

Start with 10-20 memories. You don't need a massive corpus. A focused collection of genuinely useful memories outperforms a large collection of noise.

Combining Both Injection Strategies

PreToolUse doesn't replace UserPromptSubmit. They serve different purposes:

| Strategy | Use When |
|----------|----------|
| UserPromptSubmit | Session initialization, user preferences, project-wide context |
| PreToolUse | Task-specific memories, execution-time relevance, long workflows |

A robust system uses both:

  1. UserPromptSubmit injects baseline context: project conventions, user preferences, active working context
  2. PreToolUse injects execution-time context: memories relevant to what the AI is doing right now

This layered approach provides both stability (consistent baseline) and adaptability (evolving relevance).
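
Here is a sketch of what registering both hooks could look like in .claude/settings.json. The script paths are placeholders, and the schema follows the Claude Code hooks documentation at the time of writing; check your version's docs for the exact format it expects.

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "python3 /path/to/baseline_memory_hook.py" }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "python3 /path/to/memory_hook.py" }
        ]
      }
    ]
  }
}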

Lab: Understanding Memory Injection

Objective: Understand how workflow drift affects your work and design a memory corpus that maintains relevance throughout multi-step tasks.

Duration: 90 minutes

Choose your path:

  • Conceptual Track (60 min): Design memory corpus and understand injection timing
  • Technical Track (additional 20 min): Implement a working memory injection hook

Part 1: Workflow Drift Analysis (All Participants, 30 min)

Step 1: Map Your Own Workflow Drift

Think about a recent complex task you worked on with AI assistance. This could be:

  • Legal: Reviewing a complex agreement and preparing a client memo
  • Marketing: Developing a campaign from strategy through creative briefs
  • Research: Moving from literature review through methodology design
  • Business: Analyzing a process and developing recommendations
  • Operations: Troubleshooting an issue across multiple systems

Write down the major turns in that workflow:

Turn 1: What you started with
Turn 5: How the focus shifted
Turn 10: What you were working on at this point
Turn 15: How the work evolved further
Turn 20: What you ended up doing

Step 2: Identify the Context Gaps

For each turn, answer:

  • What context would have been most helpful at this turn?
  • How different is that from what would have helped at Turn 1?
  • What memories were relevant earlier but are now noise?
  • What memories would you wish the AI had access to?

Step 3: Quantify the Drift

Create a simple drift score for your workflow:

| Turn | Relevance of Turn 1 Context | New Context Needed |
|------|-----------------------------|--------------------|
| 1 | 100% | (baseline) |
| 5 | ?% | (list new needs) |
| 10 | ?% | (list new needs) |
| 15 | ?% | (list new needs) |
| 20 | ?% | (list new needs) |

This exercise makes workflow drift concrete and visible.


Part 2: Memory Corpus Design (All Participants, 30 min)

Step 1: Brainstorm High-Value Memories

Based on your workflow analysis, list 15-20 memories that would have helped at different points. Don't worry about format yet—just capture the knowledge.

Step 2: Categorize by Relevance Pattern

Group your memories:

Always Relevant (inject at UserPromptSubmit):

  • General preferences that apply throughout
  • Project-wide conventions
  • Stakeholder communication styles

Situationally Relevant (inject at PreToolUse):

  • Task-specific patterns
  • Stage-specific knowledge
  • Context-dependent rules

Step 3: Structure 5-7 Key Memories

Pick your highest-value situationally relevant memories and structure them:

# Memory: [Clear, searchable title]

## Summary

[One paragraph: what this memory captures]

## Context

[Why this became important—the history]

## Pattern

[Concrete example of good practice]
[Contrast with what to avoid]

## When to Apply

[Trigger conditions—what thinking would retrieve this]

Deliverable (Conceptual Track): A memory corpus document with 5-7 well-structured memories for your domain.


Part 3: Simple Hook Implementation (Optional, 20 min)

For those comfortable with Python, here's a minimal memory injection hook. No vector databases or embeddings—just read a markdown file and inject it.

Step 1: Create Your Memory File

Create memories.md in your project:

# Project Memories

## Client Preferences

- Acme Corp's GC prefers risk summaries with dollar ranges, not vague language
- Always lead with business impact before legal analysis

## Writing Standards

- Use active voice in all client communications
- Keep executive summaries under 200 words

## Past Decisions

- We chose JWT over sessions for the auth system (see ADR-003)
- The billing module uses Stripe, not custom implementation

Step 2: Create the Hook Script

Create memory_hook.py:

#!/usr/bin/env python3
"""Simple memory injection hook - reads a markdown file and injects it."""
import json
import sys
from pathlib import Path

def main():
    # Read hook input from stdin
    hook_input = json.loads(sys.stdin.read())

    # Find memory file (adjust path as needed)
    memory_file = Path.home() / "project" / "memories.md"

    if not memory_file.exists():
        print("{}")  # No injection if file missing
        return

    memories = memory_file.read_text()

    # Return properly formatted hook response
    result = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "additionalContext": f"## Relevant Context\n\n{memories}"
        }
    }

    print(json.dumps(result))

if __name__ == "__main__":
    main()

Step 3: Register the Hook

Add to your .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /path/to/memory_hook.py"
          }
        ]
      }
    ]
  }
}

Step 4: Test It

Run any Claude Code session. Before each tool use, your memories get injected into context.

That's it. ~25 lines of Python. No dependencies beyond the standard library.

Going Further

Want smarter injection? Here are paths to explore:

| Enhancement | Approach |
|-------------|----------|
| Keyword matching | Check whether the tool name or recent conversation contains keywords before injecting |
| Multiple memory files | Different files for different project areas |
| Semantic search | Add ChromaDB + sentence-transformers for similarity matching |
| Tool-specific memories | Different memories for Read vs Edit vs Bash |

The simple version works. Start there. Add complexity only when you need it.
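
As a taste of the first enhancement, here is a hedged sketch of keyword-gated injection. The keyword-to-file mapping is purely illustrative; substitute triggers and paths that match your own memory corpus.

#!/usr/bin/env python3
"""Keyword-gated memory injection: only inject files whose keywords match."""
import json
import sys
from pathlib import Path

# Illustrative mapping from trigger keywords to memory files.
MEMORY_FILES = {
    ("indemnification", "liability", "insurance"): "memories/indemnification.md",
    ("memo", "summary", "partner"): "memories/memo_style.md",
}

def main():
    hook_input = json.loads(sys.stdin.read())
    # Use whatever text is available as the matching signal, e.g. the tool input.
    signal = json.dumps(hook_input.get("tool_input", {})).lower()

    matched = [
        Path(path).read_text()
        for keywords, path in MEMORY_FILES.items()
        if any(kw in signal for kw in keywords) and Path(path).exists()
    ]
    if not matched:
        print("{}")  # inject nothing when no keywords match
        return

    print(json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "additionalContext": "## Relevant Context\n\n" + "\n\n".join(matched),
        }
    }))

if __name__ == "__main__":
    main()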


Assessment Criteria

Conceptual Understanding (All Participants):

  • Can explain workflow drift with domain-specific examples
  • Can distinguish UserPromptSubmit from PreToolUse timing
  • Has designed a coherent memory corpus for their domain

Technical Implementation (Optional):

  • Hook script runs without errors
  • Hook returns proper JSON with hookSpecificOutput format
  • Memory file is structured and readable
  • Hook is registered in settings.json

Common Issues and Solutions

Problem: Memories too generic

Solution: Make memories specific. Instead of "communicate clearly with clients," write "Acme Corp's GC requires dollar ranges in risk summaries—never use vague terms like 'significant exposure.'"

Problem: Too much context injected

Solution: Keep your memory file focused. If it's over 500 words, you're probably injecting noise. Split into multiple files and inject selectively.

Problem: Hook not firing

Solution: Check that your hook is registered in .claude/settings.json and the script path is correct. Test the script standalone first: echo '{}' | python3 memory_hook.py

Try With AI

Prompt 1: Diagnose Your Workflow Drift

I'm going to describe a multi-step workflow from my field. For each step, tell me:
1. What context would be most helpful at this step
2. How different that is from step 1's context
3. What memories you wish you had access to

My field: [legal/marketing/research/business/operations/other]

The workflow:
Step 1: [Describe your starting point]
Step 5: [How the work evolved]
Step 10: [Current focus at this point]
Step 15: [Further evolution]
Step 20: [What you ended up doing]

Show me how context needs evolve across this workflow.

What you're learning: Workflow drift is invisible until you examine it explicitly. This prompt makes the drift concrete by tracking context needs across steps in YOUR domain. You're learning to anticipate where prompt-time injection will fail.

Prompt 2: Design Your Memory Corpus

I work in [describe your field and typical projects in 2-3 sentences].

Help me design a memory corpus of 10-15 memories that would be most valuable
for semantic injection during my workflows. For each memory, specify:
1. Title (searchable identifier)
2. Core content (what AI needs to know)
3. When it's relevant (what thinking would trigger retrieval)

Prioritize memories that:
- Contain non-obvious conventions or preferences
- Apply frequently during my work
- Would be hard for AI to infer from documents alone

Avoid memories that:
- State standard professional practices
- Would match too many different contexts
- Contain information that changes frequently

What you're learning: Memory corpus design is intentional, not exhaustive. This prompt trains you to identify high-value memories for YOUR domain. You're learning the difference between memories that genuinely help versus documentation that happens to exist.

Prompt 3: Evaluate Injection Timing

For each of these knowledge types from my field, recommend the injection strategy
(UserPromptSubmit, PreToolUse, both, or neither) and explain why:

1. My general writing style preferences
2. The specific project I'm working on today
3. Client/stakeholder communication preferences for specific recipients
4. Standard conventions in my field
5. Historical context about why decisions were made
6. Current deadlines and time constraints
7. My organization's approval workflows
8. Quality standards and review criteria

Format as a table with columns: Knowledge Type | Strategy | Reasoning

What you're learning: Not all context needs the same injection strategy. Some context is stable across the entire session. Some evolves with the workflow. This prompt builds intuition for matching context to injection timing in your professional domain.