Production Patterns: Hosting, Sandbox, and Compaction
Your agent works flawlessly in testing. You've built skills, integrated tools, managed sessions. Now: production. Where uptime matters. Where cost matters. Where security matters.
The architecture of production agents differs fundamentally from development prototypes. Testing runs one agent once. Production runs thousands of agents continuously, each with different security requirements, cost constraints, and memory needs.
This lesson teaches you to design production-grade agents by choosing hosting patterns that match your economics, configuring security boundaries that match your threat model, and managing memory constraints that keep costs sustainable.
The Production Architecture Decision
Before implementing, you face a foundational architectural choice: How long should your agent live?
Three fundamental patterns exist:
| Pattern | Container Lifetime | State Preservation | Cost Model | Best For |
|---|---|---|---|---|
| Ephemeral | Minutes to hours | Discarded after task | Per-request | Bug fixes, one-off processing, stateless tasks |
| Long-Running | Days to months | Persistent across sessions | Subscription/monthly | Email agents, chatbots, customer support |
| Hybrid | Hours to weeks | Checkpoint-based persistence | Tiered (base + overage) | Project managers, research assistants, domain experts |
Each pattern answers a different production problem. Choosing wrong wastes money. Choosing right makes your Digital FTE profitable.
Pattern 1: Ephemeral Agents
The question: "When would you want to discard agent state after each task?"
An ephemeral agent spins up for one task, completes it, shuts down. No memory persists. No session carries forward. The next request starts fresh.
Real-world example: Your customer reports a bug in their API integration. You spin up an ephemeral agent to:
- Reproduce the bug
- Analyze logs
- Suggest fix
- Terminate
The agent never needs to remember its analysis next week. The context is task-specific.
Ephemeral Architecture
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions
async def fix_customer_bug(customer_id, bug_report):
"""Ephemeral agent: One request, zero persistence"""
options = ClaudeAgentOptions(
allowed_tools=["Read", "Bash", "Browser"],
# No resume parameter—new session every time
max_turns=5, # Bounded execution
model="claude-opus-4-5-20251101" # Full power for one-off problem solving
)
prompt = f"""
Customer {customer_id} reported: {bug_report}
1. Reproduce the issue
2. Analyze logs to find root cause
3. Suggest a fix
4. Output: [Reproduction steps] [Root cause] [Fix]
"""
analysis = []
async for message in query(prompt=prompt, options=options):
if hasattr(message, 'content'):
analysis.append(message.content)
# Agent terminates here. No session saved. No memory persists.
return "\n".join(analysis)
# Usage: one call per bug report, a fresh agent each time.
# query() is async, so launch it from sync code with asyncio.run().
result = asyncio.run(fix_customer_bug(
    customer_id="acme-corp",
    bug_report="API returns 500 on concurrent requests"
))
print(result)
Output:
[Agent reproduces bug by sending concurrent requests...]
Root cause: Race condition in database transaction handling
Fix: Add transaction lock before updating inventory...
When Ephemeral Works
✓ Stateless workloads (translate text, generate reports)
✓ One-off problems (debug this error, analyze this file)
✓ Cost-conscious scenarios (spin up when needed, terminate)
✓ Security isolation (each task runs in a clean environment)
When Ephemeral Fails
✗ Workflows needing context (customer support remembering past tickets)
✗ Long-running projects (research agent needing weeks of exploration)
✗ Relational tasks (agent coordinating across multiple requests)
Pattern 2: Long-Running Agents
The question: "What advantages does session persistence give?"
A long-running agent runs continuously or semi-continuously. A session persists for days or months. The agent accumulates knowledge, learns customer preferences, maintains context.
Real-world example: Your customer has an email inbox of unresolved issues. They want an agent that:
- Checks email daily
- Remembers previous conversations
- Maintains running state of each ticket
- Never forgets a customer's constraints
That's a long-running agent.
Long-Running Architecture
import asyncio

import schedule  # third-party `schedule` package, used for the daily trigger
from claude_agent_sdk import query, ClaudeAgentOptions
class EmailDigestAgent:
def __init__(self, customer_id, session_id=None):
self.customer_id = customer_id
self.session_id = session_id # Resumed if provided
async def process_daily(self):
"""Run once daily. Accumulates context over time."""
options = ClaudeAgentOptions(
allowed_tools=["Email", "Slack", "Database"],
resume=self.session_id, # Resume existing context
max_turns=10 # More turns for complex decision-making
)
prompt = """
Review new emails since yesterday.
For each email:
1. Assess urgency (critical/high/normal)
2. Check if we've handled similar issues before
3. Draft response incorporating our past solutions
4. Flag any new patterns you notice
Remember: You've been monitoring this inbox for 3 months.
Use that context to recognize patterns.
"""
messages = []
async for message in query(prompt=prompt, options=options):
# Capture session ID on first run
if hasattr(message, 'subtype') and message.subtype == 'init':
self.session_id = message.session_id
if hasattr(message, 'content'):
messages.append(message.content)
return {
"session_id": self.session_id,
"analysis": "\n".join(messages)
}
# Usage: a daily trigger that persists state.
# `db` is your own persistence layer (a thin wrapper over your database).
async def run_email_digest():
    # Retrieve the agent's session ID from storage
    stored_session_id = db.get_session_id("acme-corp")
    agent = EmailDigestAgent(
        customer_id="acme-corp",
        session_id=stored_session_id
    )
    result = await agent.process_daily()
    # Save the session ID for tomorrow's run
    db.save_session_id("acme-corp", result["session_id"])
    return result["analysis"]
# Runs every day at 9am. The agent remembers everything from day 1.
# `schedule` expects a sync callable, so wrap the coroutine in asyncio.run().
schedule.every().day.at("09:00").do(lambda: asyncio.run(run_email_digest()))
Output (Day 1):
No history. Analyzing 5 new emails...
[Agent processes emails, learns patterns...]
Session created for future reference.
Output (Day 5):
Session resumed. I remember 4 previous customers and their preferences...
[Agent recognizes patterns, applies past solutions...]
I notice 3 customers are asking about the same feature—recommend to product team.
When Long-Running Works
✓ Customer-facing products (chatbots, support agents)
✓ Relationship workflows (remembering preferences, history)
✓ Learning-based tasks (agent improves with experience)
✓ Subscription models (persistence justifies recurring cost)
When Long-Running Fails
✗ High-security environments (persistent memory is a liability)
✗ Seasonal tasks (agent sits idle most of the year)
✗ Tight cost budgets (context grows indefinitely)
Pattern 3: Hybrid Agents
The question: "How can you get the benefits of both?"
A hybrid agent combines ephemeral and long-running. It runs ephemeral agents for independent tasks but checkpoints critical context back to a long-running orchestrator.
Real-world example: A research agent exploring a 100-page legal document.
- Main session (long-running): Orchestrates research, remembers conclusions
- Analysis sessions (ephemeral): Spin up for each section, analyze independently
- Checkpoint: Save key findings back to main session
Hybrid Architecture
from claude_agent_sdk import query, ClaudeAgentOptions
class ResearchOrchestrator:
def __init__(self, document_id, session_id=None):
self.document_id = document_id
self.session_id = session_id
self.findings = [] # Accumulated across ephemeral analyses
async def analyze_section(self, section_text):
"""Ephemeral agent: Analyze one section independently"""
options = ClaudeAgentOptions(
allowed_tools=["Read"],
model="claude-haiku-4-5-20251001" # Cheaper for isolated analysis
# No resume—new session for this section
)
prompt = f"Analyze this section. Extract key claims, evidence, conclusions:\n{section_text}"
findings = []
async for message in query(prompt=prompt, options=options):
if hasattr(message, 'content'):
findings.append(message.content)
return "\n".join(findings)
async def checkpoint_findings(self, section_findings):
"""Long-running session: Accumulate knowledge"""
options = ClaudeAgentOptions(
allowed_tools=["Database"],
resume=self.session_id, # Resume main context
model="claude-opus-4-5-20251101" # Full reasoning
)
prompt = f"""
New findings from section analysis:
{section_findings}
1. Integrate with previous findings
2. Identify patterns or contradictions
3. Update our overall assessment
"""
async for message in query(prompt=prompt, options=options):
if hasattr(message, 'subtype') and message.subtype == 'init':
self.session_id = message.session_id
if hasattr(message, 'content'):
self.findings.append(message.content)
async def research_document(self, sections):
"""Hybrid workflow: Ephemeral analysis + long-running synthesis"""
for i, section in enumerate(sections):
# Ephemeral: Analyze this section independently (cheap)
section_findings = await self.analyze_section(section)
# Long-running: Integrate findings (expensive but essential)
await self.checkpoint_findings(section_findings)
return self.findings
# Usage: Process 10-section document efficiently
async def research_report():
orchestrator = ResearchOrchestrator(document_id="report-123")
sections = [
"Section 1: Executive Summary...",
"Section 2: Background...",
"Section 3: Methodology...",
# ... 10 sections total
]
results = await orchestrator.research_document(sections)
return results
# Cost profile: Cheaper analysis (Haiku) + expensive synthesis (Opus)
# = balanced cost for complex projects
Output:
Analyzing Section 1 (ephemeral, Haiku)...
Found: 5 claims, 3 evidence pieces
Checkpointing to main session...
Analyzing Section 2 (ephemeral, Haiku)...
Found: 4 claims, 2 contradictions with Section 1
Checkpointing...
[...sections 3-6 analyzed and checkpointed...]
Pattern detected: Sections 2, 4, 6 contradict each other on methodology.
Recommend: Verify author consistency across document.
When Hybrid Works
✓ Large projects with parallel work (process multiple sections independently)
✓ Cost-sensitive complex work (cheap analysis + selective expensive synthesis)
✓ Learning-based systems (ephemeral explores, long-running synthesizes)
Security Isolation: Sandbox Configuration
You've learned patterns. Now: how do you prevent your agent from breaking production?
Agents execute code. Agents run Bash. Agents access files. Without security boundaries, a compromised agent could:
- Access production databases
- Modify source code
- Exfiltrate customer data
- Consume unbounded resources
Sandbox configuration is how you draw these boundaries.
Understanding Sandbox Options
The Claude Agent SDK provides three sandbox controls:
options = ClaudeAgentOptions(
sandbox={
"enabled": True, # Whether Bash execution is isolated
"autoAllowBashIfSandboxed": True, # Auto-approve safe Bash
"network": {
"allowLocalBinding": True, # Can agent listen on localhost?
}
}
)
enabled: Controls whether Bash runs in an isolated environment

- True (development): Bash runs in a container and cannot access the host system
- False (production): Bash runs with host permissions (dangerous without trust)
autoAllowBashIfSandboxed: Auto-approve Bash if the sandbox is enabled

- True: Sandboxed Bash commands execute automatically
- False: All Bash requires explicit human approval
allowLocalBinding: Can the agent listen on network ports?

- True: Agent can start web servers and APIs
- False: Agent cannot bind to network ports (safer)
Configuration for Development
async def development_agent():
"""Development: Maximize agent freedom, sandbox everything"""
options = ClaudeAgentOptions(
sandbox={
"enabled": True, # Isolate Bash execution
"autoAllowBashIfSandboxed": True, # Auto-approve safe commands
"network": {
"allowLocalBinding": True # Allow test servers
}
},
allowed_tools=["Bash", "Read", "Write", "Browser"]
)
prompt = "Debug why tests are failing. Run the test suite and analyze errors."
async for message in query(prompt=prompt, options=options):
if hasattr(message, 'content'):
print(message.content)
# Agent can safely run: rm -rf src/ (only affects sandbox, not host)
# Agent can safely run: npm install (only affects sandbox container)
# Agent cannot access: /home, /var/www (host filesystem protected)
Output:
Running: npm test
Error: test_database_connection failed
Root cause: Connection string missing in test config
Fix: Set TEST_DB_URL environment variable
Configuration for Production
async def production_agent():
"""Production: Minimal permissions, human review required"""
options = ClaudeAgentOptions(
sandbox={
"enabled": False, # Run on host (trust the agent)
"autoAllowBashIfSandboxed": False, # Explicit approval
"network": {
"allowLocalBinding": False # No network access
}
},
allowed_tools=["Read"], # Only read-only operations
max_turns=3 # Fewer turns = less opportunity for problems
)
prompt = "Analyze these logs and generate a report"
async for message in query(prompt=prompt, options=options):
if hasattr(message, 'content'):
print(message.content)
# Agent cannot: Run Bash
# Agent cannot: Write files
# Agent cannot: Access network
# Agent can: Read production logs, generate analysis
Decision Framework: When to Sandbox
| Requirement | Sandbox Setting | Rationale |
|---|---|---|
| Development testing | enabled=true, auto=true | Safe to experiment |
| Production read-only | enabled=false, auto=false, restricted tools | Minimal attack surface |
| Production with write access | enabled=true, auto=false, human approval | Dangerous operations logged |
| High-security environment (HIPAA, SOC2) | enabled=true, network=false, minimal tools | Multiple layers of isolation |
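These rows translate directly into reusable presets. A minimal sketch, assuming the sandbox keys shown earlier; the preset names are ours, not SDK API:

from claude_agent_sdk import ClaudeAgentOptions

SANDBOX_PRESETS = {
    # Development testing: safe to experiment
    "dev": {
        "enabled": True,
        "autoAllowBashIfSandboxed": True,
        "network": {"allowLocalBinding": True},
    },
    # Production read-only: minimal attack surface (pair with restricted tools)
    "prod_readonly": {
        "enabled": False,
        "autoAllowBashIfSandboxed": False,
        "network": {"allowLocalBinding": False},
    },
    # Production with write access: sandboxed, every Bash call human-approved
    "prod_write": {
        "enabled": True,
        "autoAllowBashIfSandboxed": False,
        "network": {"allowLocalBinding": False},
    },
}

options = ClaudeAgentOptions(sandbox=SANDBOX_PRESETS["prod_write"])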
Memory Management: Context Compaction
You know how to architect agents: ephemeral, long-running, hybrid. You know how to secure them: sandbox configuration.
Now: How do you keep a 24/7 agent from consuming infinite memory?
A long-running agent accumulates context. Days of conversation. Weeks of decisions. Months of learning. The conversation history grows. Each new message costs tokens. Context window eventually fills.
Context compaction is how you manage this growth.
The Compaction Problem
Imagine an email agent running for 6 months:
- Day 1-30: 500 messages (light load)
- Day 31-60: 1000 messages (growing)
- Day 61-180: 5000+ messages (conversation exploding)
By month 6:
Total tokens in session history: 450,000
Context window limit: 200,000 tokens
Result: Agent cannot process new messages—context is full
Compaction solves this by archiving old context before the window fills.
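To see why this matters numerically, here is a minimal sketch of the threshold check a runtime might apply before compacting; the numbers mirror the example above and are illustrative, not SDK internals:

def should_compact(history_tokens: int, window_limit: int = 200_000,
                   threshold: float = 0.80) -> bool:
    """Compact once history consumes 80% of the context window."""
    return history_tokens >= window_limit * threshold

# Month 6 of the email agent: 450K tokens of history against a 200K window
print(should_compact(450_000))  # True: compaction is long overdue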
The PreCompact Hook
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher
async def pre_compact_handler(context):
"""Called before context compaction. Archive key knowledge."""
return {
"archive": True, # Tell SDK to proceed with compaction
"summary": """
=== ARCHIVED CONTEXT (6 months of operation) ===
Key customers:
- acme-corp: Enterprise client, requires JSON responses
- startup-xyz: Startup, flexible requirements
Resolved patterns:
- Database connection issues → Use connection pooling
- Rate limiting → Implement exponential backoff
- Authentication failures → Check token expiration first
Known issues:
- Email API sometimes returns 429 errors (rate limit)
- PDF parsing fails on scanned documents (need OCR)
Current priority tickets:
- Feature request from acme-corp: CSV export
- Bug: Timezone handling in batch jobs
""",
"checkpoint_data": {
"processed_tickets": 1247,
"customer_preferences": {...},
"learned_patterns": {...}
}
}
options = ClaudeAgentOptions(
hooks={
'PreCompact': [
HookMatcher(
matcher='*', # Trigger on any compaction
hooks=[pre_compact_handler]
)
]
}
)
# When session grows too large, SDK automatically calls pre_compact_handler
# Agent loses detailed conversation history but retains critical knowledge
Output (Before Compaction):
Conversation history: 6 months, 450K tokens
Output (After Compaction):
Archived summary: Key customers, patterns, issues (15K tokens)
+ Recent conversation: Last 10 exchanges (5K tokens)
Total: 20K tokens (freed 430K for new work)
Compaction Strategies
Strategy 1: Archive Everything
For agents that need to start fresh monthly:
async def monthly_archive_handler(context):
# Keep last 7 days, archive everything else
return {
"archive": True,
"keep_recent_days": 7,
"summary": "Full monthly summary of all work"
}
Strategy 2: Smart Retention
For agents that need specific knowledge preserved:
async def smart_archive_handler(context):
# Archive everything except:
# - Customer preferences
# - Unresolved tickets
# - Learned patterns
return {
"archive": True,
"preserve_tags": ["customer_preference", "open_ticket", "pattern"],
"summary": "Archive with selective preservation"
}
Strategy 3: Hybrid Archival
For agents juggling multiple concerns:
async def hybrid_archive_handler(context):
    # Archive to an external system, keep only a summary in context.
    # `export_to_db` and `generate_summary` are your own helpers, not SDK API.
    # 1. Export full history to database
    full_history_id = await export_to_db(context.messages)
# 2. Keep memory-efficient summary
summary = await generate_summary(context)
# 3. Return minimal context
return {
"archive": True,
"external_archive_id": full_history_id,
"summary": summary,
"memory_target": "50K tokens" # Keep under this
}
When to Trigger Compaction
async def email_agent_with_compaction():
"""Automatic compaction at decision points"""
options = ClaudeAgentOptions(
hooks={
'PreCompact': [
HookMatcher(
matcher='*',
hooks=[pre_compact_handler]
)
]
},
# Trigger compaction when context exceeds 80% of window
compaction_threshold=0.80
)
# Agent runs normally until context fills to 80%
# Then: PreCompact hook fires
# Then: Context compressed automatically
# Then: Agent continues with archived summary
async for message in query(
prompt="Process today's emails",
options=options
):
print(message.content)
Decision Framework: Choosing Your Production Pattern
You've learned three patterns. Which do you choose?
The Decision Tree
Question 1: Does your agent need to remember things between requests?
- NO → Ephemeral pattern
- YES → Go to Question 2
Question 2: Does the agent run continuously (24/7 or near it)?
- NO → Long-running pattern (checkpoint at session boundaries)
- YES → Go to Question 3
Question 3: Does the agent do multiple parallel tasks or expensive synthesis?
- NO → Pure long-running
- YES → Hybrid pattern (ephemeral tasks + long-running synthesis)
Your TaskManager Digital FTE
Recall: TaskManager is your capstone from Chapter 6. It's a project management agent that:
- Tracks task status
- Remembers project context
- Coordinates between team members
- Learns project preferences
Which pattern fits?
Answer: Hybrid
- Orchestrator (long-running): Maintains project state, learns preferences
- Task analyzer (ephemeral): Analyzes individual tasks independently
- Report generator (ephemeral): Generates daily reports without storing them
Sandbox configuration?
- Development: enabled=true, auto=true (experiment freely)
- Production: enabled=true, auto=false (audit all file access)
Context compaction?
- Trigger monthly: Archive completed project history, keep the active task list
- PreCompact hook: Export detailed history to database before compacting
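Putting the three decisions together, here is a skeleton of TaskManager using this lesson's SDK patterns; the class, method, and hook names are illustrative:

from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def archive_project_history(context):
    # PreCompact: keep the active task list, archive everything else
    return {"archive": True, "summary": "Active tasks + learned project preferences"}

class TaskManagerFTE:
    """Hybrid skeleton: ephemeral task analysis + long-running orchestration."""
    def __init__(self, session_id=None):
        self.session_id = session_id  # orchestrator session, persisted externally

    async def analyze_task(self, task):
        # Ephemeral analyzer: fresh session, sandboxed, read-only
        options = ClaudeAgentOptions(
            allowed_tools=["Read"],
            sandbox={"enabled": True, "autoAllowBashIfSandboxed": False},
        )
        chunks = []
        async for message in query(prompt=f"Analyze this task:\n{task}", options=options):
            if hasattr(message, "content"):
                chunks.append(message.content)
        return "\n".join(chunks)

    async def checkpoint(self, findings):
        # Long-running orchestrator: resumes its session, compacts via PreCompact
        options = ClaudeAgentOptions(
            resume=self.session_id,
            hooks={"PreCompact": [HookMatcher(matcher="*", hooks=[archive_project_history])]},
        )
        async for message in query(prompt=f"Integrate findings:\n{findings}", options=options):
            if getattr(message, "subtype", None) == "init":
                self.session_id = message.session_id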
Try With AI
The three scenarios below test your ability to evaluate patterns and configure production agents. Each prompt builds on your accumulated knowledge about hosting, security, and memory.
Scenario 1: Bug Fix Agent
Setup: Your customer reports API failures in their production system. You need an agent to investigate, diagnose, and suggest fixes.
Your task: Determine the hosting pattern and sandbox configuration.
What you're learning: Evaluating when agents can be ephemeral vs. persistent. Understanding that stateless debugging doesn't require session persistence.
Prompt:
You're building a bug-fix agent for customer support.
Requirements:
- Investigate customer's reported issue
- Analyze logs and stack traces
- Suggest fix with code examples
- No need to remember this case for future debugging
What hosting pattern fits? Ephemeral or long-running?
What sandbox settings would you use for development vs production?
Explain your reasoning for both choices.
Expected approach:
- Recognize stateless nature (no past context needed)
- Choose ephemeral pattern (spin up, solve, terminate)
- Suggest development sandbox: enabled=true, auto=true (freedom to explore)
- Suggest production sandbox: enabled=true, auto=false (audit sensitive operations)
Scenario 2: Email Digest Agent
Setup: You're building a 24/7 email monitoring agent for enterprise customers. It checks inboxes daily, learns which emails matter, and prioritizes responses.
Your task: Design the hosting pattern, sandbox configuration, and a compaction strategy.
What you're learning: Long-running agents need sophisticated memory management. Context compaction isn't optional—it's essential for sustainability.
Prompt:
Design a production email agent with:
- 24/7 operation
- Daily inbox monitoring
- Memory of customer preferences and past solutions
- Operation for 6+ months without manual intervention
1. What hosting pattern? Why?
2. What sandbox settings for production?
3. When would you trigger context compaction? What would you preserve?
Show your PreCompact hook strategy. What goes into the archive summary?
Expected approach:
- Identify long-running pattern (continuous operation + persistent memory)
- Configure sandbox: enabled=true, auto=false (email access is sensitive)
- Design compaction trigger: Monthly (preserve customer preferences, archive old exchanges)
- PreCompact hook: Archive last 30 days of conversations, keep 6-month customer preference database
Scenario 3: Research Assistant Digital FTE
Setup: You're monetizing a research assistant that analyzes documents, extracts insights, and composes reports. Customers subscribe ($500/month) and use it for ongoing research projects.
Your task: Design a production architecture that balances cost, quality, and memory management. You target 60% gross margin.
What you're learning: Hybrid patterns enable cost optimization. Cheap analysis + expensive synthesis = profitable Digital FTE. Context compaction enables indefinite operation without losing value.
Prompt:
Design a production research agent Digital FTE:
- Customers subscribe at $500/month
- Process 10-20 document chunks daily
- Projects run for 3-6 months continuously
- Target: 60% gross margin (cost < $200/month per customer)
1. Which hosting pattern and why?
2. Show the architecture: how do ephemeral and long-running components interact?
3. Design context compaction: How do you archive project findings without losing learning?
4. How does this architecture achieve 60% margin?
Provide code showing:
- Ephemeral component (cheap analysis)
- Long-running component (expensive synthesis)
- PreCompact hook that archives findings
Expected approach:
- Identify hybrid pattern (parallel analysis + synthesis)
- Ephemeral tasks use Haiku (cheap: $0.80 per million input tokens)
- Long-running uses Opus (expensive: $20 per million—only for synthesis)
- Cost calculation (worked through in the sketch below): 20 documents × $0.05/analysis + $2 synthesis = $3/day = $90/month (vs $500 revenue = 82% margin)
- Compaction: Archive document analyses, preserve project insights and customer preferences
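A back-of-envelope version of that margin calculation, using the assumed per-task costs above (not official pricing):

DOCS_PER_DAY = 20
ANALYSIS_COST = 0.05   # per document, cheap-model analysis (assumed)
SYNTHESIS_COST = 2.00  # per day, expensive-model synthesis (assumed)
REVENUE = 500          # monthly subscription price

daily_cost = DOCS_PER_DAY * ANALYSIS_COST + SYNTHESIS_COST  # $3.00
monthly_cost = daily_cost * 30                              # $90.00
margin = (REVENUE - monthly_cost) / REVENUE                 # 0.82
print(f"${monthly_cost:.0f}/month cost, {margin:.0%} gross margin")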
Safety note: When configuring sandbox for production, start with maximum restrictions (enabled=true, auto=false) and relax only after testing with restricted permissions. Never assume unsafe is acceptable because "it's just this one agent."
Reflect on Your Skill
You built a claude-agent skill in Lesson 0. Test and improve it based on what you learned.
Test Your Skill
Using my claude-agent skill, design a production deployment architecture.
Does my skill cover hosting patterns, sandbox config, and context compaction?
Identify Gaps
Ask yourself:
- Did my skill explain the three hosting patterns (ephemeral, long-running, hybrid)?
- Did it show sandbox configuration and context compaction strategies?
Improve Your Skill
If you found gaps:
My claude-agent skill is missing production deployment patterns.
Update it to include:
- Hosting pattern comparison (ephemeral vs long-running vs hybrid)
- Sandbox security configuration
- PreCompact hooks for memory management