
Principle 7: Observability

You ask an AI system to refactor a database query. It says it's done. You run the application. It crashes. You check the query—it looks completely different from what you expected. When did it change? What steps did it take? What files did it modify? You have no idea. You're flying blind.

This is the observability problem: if you can't see what the AI is doing, you can't debug problems, build trust, or improve the collaboration.

Observability means seeing into the black box. It's understanding what actions the AI took, in what order, with what results. This principle is about making AI workflows transparent, traceable, and debuggable.

The Black Box Problem: What Happens When You Can't See

Without Observability

You: "Fix the authentication bug"
AI: [Works for 30 seconds] "Done!"
You: "Thanks!"

[Later - Authentication completely broken]
You: "What did you actually change?"
AI: "I don't remember—this is a new conversation."

[You spend 2 hours debugging, reading git diff, trying to understand what happened]

With Observability

You: "Fix the authentication bug"
AI:
1. Reading src/auth/login.js...
2. Found issue: token validation missing
3. Adding token validation...
4. Running tests...
5. Tests passed
6. Changes: Modified src/auth/login.js (added 5 lines)
"Done! Here's what I changed."

[Later - Authentication completely broken]
You: "What did you actually change?"
[You check the log]
You: "I see—you added token validation but the validation function doesn't exist yet. That's the real bug."

The difference: Observability lets you understand the full context of what happened, not just the final result.

The Three Pillars of Observability

Pillar 1: Action Visibility (What Did It Do?)

You need to see each action the AI took:

✓ Read package.json
✓ Read src/auth/login.js
✓ Modified src/auth/login.js
- Added validateToken() call
- Added error handling
✓ Ran npm test
✓ Tests passed
✓ Git diff shows 5 lines added

Without this, you can't debug. With this, you can trace exactly what happened.

Pillar 2: Rationale Visibility (Why Did It Do It?)

You need to understand the AI's reasoning:

Reading src/auth/login.js...
→ Identified issue: Missing token validation
→ Chose approach: Add validateToken() call
→ Why: This matches the pattern used in other auth functions

Without rationale, you see changes but not the intent. With rationale, you can evaluate whether the approach makes sense.

Pillar 3: Result Visibility (What Was the Outcome?)

You need to see the result of each action:

Ran npm test...
→ PASS: src/auth/login.test.js
→ 12 tests passed
→ 0 tests failed
→ Coverage: 85% (unchanged)

Modified files:
- src/auth/login.js (+5 lines, -1 line)

Without results, you can't verify success. With results, you can confirm the AI achieved what it intended.
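The three pillars can be captured in a single record per step. The following is a minimal sketch, not a format any particular tool uses; `makeStepRecord`, `missingPillars`, and the field names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical record shape: one entry per agent step, covering all three pillars.
function makeStepRecord(action, rationale, result) {
  return {
    timestamp: new Date().toISOString(),
    action,    // Pillar 1: what was done
    rationale, // Pillar 2: why it was done
    result     // Pillar 3: what came of it
  };
}

// Report which pillars a record is missing, so gaps are visible at review time.
function missingPillars(record) {
  return ['action', 'rationale', 'result'].filter(
    (key) => record[key] === undefined || record[key] === null || record[key] === ''
  );
}

const complete = makeStepRecord(
  { type: 'EDIT', target: 'src/auth/login.js' },
  'Matches the pattern used in other auth functions',
  { linesAdded: 5, linesRemoved: 1 }
);
const partial = makeStepRecord({ type: 'EDIT', target: 'src/auth/login.js' }, '', null);

console.log(missingPillars(complete)); // []
console.log(missingPillars(partial));  // [ 'rationale', 'result' ]
```

A record that reports missing pillars is a prompt to ask the AI for the rationale or the result before moving on.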

Reading Activity Logs: A Practical Guide

Most AI tools provide activity logs. Here's how to read them effectively.

Log Structure

Typical activity log structure:

[TIME] [ACTION] [DETAIL]
[2025-01-22 14:32:15] [READ] /Users/project/src/auth/login.js
[2025-01-22 14:32:16] [ANALYZE] Found missing token validation
[2025-01-22 14:32:17] [EDIT] /Users/project/src/auth/login.js
+ Added: validateToken() call
+ Added: try-catch for validation errors
[2025-01-22 14:32:18] [COMMAND] npm test
→ Exit code: 0
→ Output: 12 passing
[2025-01-22 14:32:19] [COMPLETE] Task finished successfully
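If your tool writes logs in a bracketed format like the one above, a small parser makes them easier to inspect. This is a sketch under that assumption; real tools differ in format, and `parseActivityLog` is a hypothetical helper, not part of any tool's API.

```javascript
// Matches lines shaped like: [TIME] [ACTION] DETAIL
const ENTRY_PATTERN = /^\[([^\]]+)\]\s+\[([A-Z]+)\]\s*(.*)$/;

function parseActivityLog(text) {
  const entries = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;
    const match = ENTRY_PATTERN.exec(line);
    if (match) {
      entries.push({ time: match[1], action: match[2], detail: match[3], notes: [] });
    } else if (entries.length > 0) {
      // Lines such as "+ Added: ..." or "→ Exit code: 0" belong to the previous entry.
      entries[entries.length - 1].notes.push(line);
    }
  }
  return entries;
}

const sample = [
  '[2025-01-22 14:32:17] [EDIT] /Users/project/src/auth/login.js',
  '+ Added: validateToken() call',
  '[2025-01-22 14:32:18] [COMMAND] npm test',
  '→ Exit code: 0'
].join('\n');

const entries = parseActivityLog(sample);
console.log(entries.map((e) => e.action)); // [ 'EDIT', 'COMMAND' ]
console.log(entries[1].notes);             // [ '→ Exit code: 0' ]
```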

What to Look For

Success Pattern:

READ → ANALYZE → EDIT → VERIFY → COMPLETE

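Each step logically follows the previous one. Verification happens after changes. Here VERIFY stands for any check that runs after an edit, such as the [COMMAND] npm test entry in the log above.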

Warning Pattern:

READ → EDIT → EDIT → EDIT → [NO VERIFICATION] → COMPLETE

Multiple edits without verification. No testing. High risk of problems.

Failure Pattern:

READ → EDIT → VERIFY → [TESTS FAIL] → EDIT → [TESTS FAIL AGAIN] → GAVE UP

AI tried but couldn't solve the problem. Needs human intervention.
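The three patterns above can be checked mechanically. The sketch below is one possible heuristic, assuming each step carries an `action` name and that VERIFY steps carry a `passed` flag; both field names and the `classifySession` function are hypothetical.

```javascript
// steps: ordered list such as [{ action: 'EDIT' }, { action: 'VERIFY', passed: true }]
function classifySession(steps) {
  let unverifiedEdits = 0;
  let lastVerifyPassed = null; // null means no verification ran at all
  for (const step of steps) {
    if (step.action === 'EDIT') {
      unverifiedEdits += 1;
    } else if (step.action === 'VERIFY') {
      lastVerifyPassed = step.passed === true;
      if (lastVerifyPassed) unverifiedEdits = 0; // a passing check covers earlier edits
    }
  }
  if (lastVerifyPassed === false) return 'failure'; // last check failed
  if (unverifiedEdits > 0) return 'warning';        // edits with no passing check after them
  return 'success'; // also returned for sessions that made no edits
}

const success = [
  { action: 'READ' }, { action: 'ANALYZE' }, { action: 'EDIT' },
  { action: 'VERIFY', passed: true }, { action: 'COMPLETE' }
];
const warning = [
  { action: 'READ' }, { action: 'EDIT' }, { action: 'EDIT' },
  { action: 'EDIT' }, { action: 'COMPLETE' }
];
const failure = [
  { action: 'READ' }, { action: 'EDIT' }, { action: 'VERIFY', passed: false },
  { action: 'EDIT' }, { action: 'VERIFY', passed: false }
];

console.log(classifySession(success)); // success
console.log(classifySession(warning)); // warning
console.log(classifySession(failure)); // failure
```

A heuristic like this only flags sessions for review. It cannot tell you whether the verification itself was meaningful.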

Debugging Through Logs

When something goes wrong, trace through the log:

# Problem: Tests failing after AI work
# Log shows:

[14:32:15] [EDIT] src/utils/validation.js
+ Added: stricter email validation
- Removed: regex-based validation

[14:32:16] [COMMAND] npm test
→ FAIL: 15 tests failing
→ All failures in email validation tests

# Diagnosis: AI replaced the validation approach and broke the existing tests
# Solution: Revert the change, then ask the AI to run the tests before and after each edit
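The trace above follows a simple rule: find the first failing command, then look at the edits made since the last passing one. A minimal sketch of that rule, assuming log entries have already been parsed into objects with hypothetical `action`, `target`, and `exitCode` fields:

```javascript
// entries: ordered list such as { action: 'EDIT', target: 'src/a.js' }
// or { action: 'COMMAND', target: 'npm test', exitCode: 1 }
function editsBeforeFirstFailure(entries) {
  const failureIndex = entries.findIndex(
    (e) => e.action === 'COMMAND' && e.exitCode !== 0
  );
  if (failureIndex === -1) return []; // nothing failed, so there are no suspects

  // Walk backwards to the previous passing command; edits after it are the suspects.
  const suspects = [];
  for (let i = failureIndex - 1; i >= 0; i -= 1) {
    const entry = entries[i];
    if (entry.action === 'COMMAND' && entry.exitCode === 0) break;
    if (entry.action === 'EDIT') suspects.unshift(entry.target);
  }
  return suspects;
}

const session = [
  { action: 'EDIT', target: 'src/auth/login.js' },
  { action: 'COMMAND', target: 'npm test', exitCode: 0 },
  { action: 'EDIT', target: 'src/utils/validation.js' },
  { action: 'COMMAND', target: 'npm test', exitCode: 1 }
];

console.log(editsBeforeFirstFailure(session)); // [ 'src/utils/validation.js' ]
```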

Designing for Observability: Building Transparent Workflows

When working with AI, design workflows that make actions visible.

Pattern 1: Explain Before Executing

You: "Refactor the authentication system"

AI: "Here's my plan:
1. Extract token logic to TokenService class
2. Update login.js to use TokenService
3. Update logout.js to use TokenService
4. Run tests to verify
5. If tests pass, commit changes

Shall I proceed?"

You: "Yes, proceed"

AI: [Executes with full logging]

The plan makes intentions visible. You can redirect before execution.
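If you script your own agent workflows, the same gate can be enforced in code. The sketch below is illustrative only: `executeWithApproval`, the `plan` shape, and the `approve` callback are hypothetical names, and a real workflow would likely prompt a person rather than call a function.

```javascript
// plan: [{ description, run }]; approve: function that decides whether to proceed.
function executeWithApproval(plan, approve, log = console.log) {
  log('Plan:');
  plan.forEach((step, i) => log(`${i + 1}. ${step.description}`));

  if (!approve(plan)) {
    log('Plan rejected; nothing was executed.');
    return { executed: 0 };
  }

  let executed = 0;
  for (const step of plan) {
    step.run();
    executed += 1;
    log(`Step ${executed} complete: ${step.description}`);
  }
  return { executed };
}

const ran = [];
const plan = [
  { description: 'Extract token logic to TokenService class', run: () => ran.push('extract') },
  { description: 'Run tests to verify', run: () => ran.push('test') }
];

const rejected = executeWithApproval(plan, () => false);
const approved = executeWithApproval(plan, () => true);
console.log(rejected.executed, approved.executed); // 0 2
```

Printing the plan before calling `approve` is the point of the design: the reviewer sees every intended step while redirecting is still cheap.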

Pattern 2: Checkpoint After Major Steps

AI: "Step 1 complete: Created TokenService class
Changes:
- src/services/TokenService.js (new file, 45 lines)

Status: Ready for step 2
Shall I continue with updating login.js?"

Checkpoints let you verify progress incrementally.

Pattern 3: Summary After Completion

AI: "Task complete. Summary:

Files modified: 3
- src/services/TokenService.js (created)
- src/auth/login.js (modified: -12 lines, +8 lines)
- src/auth/logout.js (modified: -8 lines, +6 lines)

Tests: 15 passing, 0 failing

Git status: 3 files staged

Next steps:
- Review changes with git diff
- Commit if satisfied
- Or ask me to adjust anything"

The summary provides complete context for review.
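A summary like this can also be generated from recorded actions rather than trusted from memory. The following sketch assumes each recorded change has hypothetical `action`, `file`, `added`, and `removed` fields; `summarizeChanges` is an illustrative helper, not a feature of any tool.

```javascript
// entries: [{ action: 'CREATE' | 'EDIT', file, added, removed }]
function summarizeChanges(entries) {
  const files = new Map(); // Map keeps insertion order, so files appear as first touched
  for (const { action, file, added = 0, removed = 0 } of entries) {
    if (action !== 'CREATE' && action !== 'EDIT') continue;
    const current = files.get(file) || { created: false, added: 0, removed: 0 };
    current.created = current.created || action === 'CREATE';
    current.added += added;
    current.removed += removed;
    files.set(file, current);
  }

  const lines = [`Files modified: ${files.size}`];
  for (const [file, stats] of files) {
    const label = stats.created
      ? 'created'
      : `modified: -${stats.removed} lines, +${stats.added} lines`;
    lines.push(`- ${file} (${label})`);
  }
  return lines.join('\n');
}

const summary = summarizeChanges([
  { action: 'CREATE', file: 'src/services/TokenService.js', added: 45 },
  { action: 'EDIT', file: 'src/auth/login.js', added: 8, removed: 12 },
  { action: 'READ', file: 'package.json' }
]);
console.log(summary);
```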

Tool-Specific Observability Features

Different AI tools provide different observability features.

Claude Code

Activity Logs: .claude/activity-logs/prompts.jsonl

  • Records all prompts and responses
  • Can review past sessions
  • Full conversation history

Subagent Logs: .claude/activity-logs/subagent-usage.jsonl

  • Tracks when Claude delegated to specialized agents
  • Shows which subagent handled what task

Cursor

History Panel: Shows all AI interactions in current session

  • Can review each suggestion
  • See diffs before accepting

Cmd+K Quick Actions: Contextual suggestions with preview

  • See what will change before accepting

GitHub Copilot

Copilot Workspace: Full AI project work with visible steps

  • Shows plan before executing
  • Displays file changes
  • Provides test results

Observability Anti-Patterns

Anti-Pattern 1: Silent Failures

AI: "Done!" [but something actually failed]

You only discover hours later when the system breaks.

Fix: Require the AI to report the outcome of every operation, including failures, not just successes.
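In a scripted workflow, one way to avoid silent failures is to record the exit code of every command instead of assuming success. This is a minimal sketch using Node's built-in `child_process.spawnSync`; the `runAndRecord` wrapper and its return shape are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Run a command and always return a record, whether it succeeded or not.
function runAndRecord(command, args) {
  const outcome = spawnSync(command, args, { encoding: 'utf8' });
  return {
    command: [command, ...args].join(' '),
    // status is null when the process could not start or was killed by a signal.
    exitCode: outcome.status,
    ok: outcome.status === 0,
    stderr: outcome.stderr || (outcome.error ? outcome.error.message : '')
  };
}

// process.execPath is the running Node binary, so these examples need no other tools.
const passing = runAndRecord(process.execPath, ['-e', 'process.exit(0)']);
const failing = runAndRecord(process.execPath, ['-e', 'process.exit(3)']);

console.log(passing.ok, failing.ok, failing.exitCode); // true false 3
```

Because `ok` is derived from the exit code, a "Done!" message can be checked against what the command actually returned.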

Anti-Pattern 2: Output Without Context

AI: [Shows diff] "I changed this file"

[You can't tell why, or if it's correct]

Fix: Require rationale with every change. "I changed X because Y."

Anti-Pattern 3: Missing Intermediate Steps

AI: [Works for 2 minutes] "Done!"

[You have no idea what happened in those 2 minutes]

Fix: Require progress updates for long-running tasks.

Building Your Observability Toolkit

Essential Observability Tools

1. Git History

# See what changed
git log --oneline -10

# See the exact changes
git diff HEAD~1 HEAD

# See who last changed each line (AI changes show up only if commits are attributed to it)
git blame file.js

2. Activity Log Review

# Claude Code logs (pretty-print each JSON line)
jq . .claude/activity-logs/prompts.jsonl

# Filter by time (assumes each entry has an ISO 8601 timestamp field)
jq 'select(.timestamp > "2025-01-22")' .claude/activity-logs/prompts.jsonl

3. Test Results

# Run tests and save output
npm test 2>&1 | tee test-results.log

# Compare before/after (assumes test-results.log is committed alongside each change)
git diff HEAD~1 HEAD -- test-results.log

Custom Logging Patterns

Add logging to your AI workflows:

// Log AI actions for later review
const fs = require('fs');

function logAIAction(action, details) {
  const logEntry = {
    timestamp: new Date().toISOString(),
    action: action,
    details: details,
    user: process.env.USER, // undefined if USER is not set (e.g. on Windows)
    workingDirectory: process.cwd()
  };

  // Append one JSON object per line so the log stays easy to parse
  fs.appendFileSync('.ai-activity.log', JSON.stringify(logEntry) + '\n');
}

// Use in workflow
logAIAction('READ', { file: 'src/auth/login.js' });
logAIAction('EDIT', { file: 'src/auth/login.js', changes: '+5 -1' });
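A log written one JSON object per line is straightforward to read back. The sketch below works on the log text directly so it runs without any files; `parseJsonLines` and `entriesSince` are hypothetical helper names, and the sample entries are made up for illustration.

```javascript
// Parse JSON Lines text (one JSON object per line), skipping blank lines.
function parseJsonLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// ISO 8601 timestamps in the same format compare correctly as plain strings.
function entriesSince(entries, isoTimestamp) {
  return entries.filter((entry) => entry.timestamp >= isoTimestamp);
}

// Illustrative sample data, not output from a real session.
const sampleLog = [
  '{"timestamp":"2025-01-21T09:00:00.000Z","action":"READ","details":{"file":"package.json"}}',
  '{"timestamp":"2025-01-22T14:32:17.000Z","action":"EDIT","details":{"file":"src/auth/login.js"}}'
].join('\n') + '\n';

const recent = entriesSince(parseJsonLines(sampleLog), '2025-01-22');
console.log(recent.map((e) => e.action)); // [ 'EDIT' ]
```

To run this against a real log, pass the result of `fs.readFileSync('.ai-activity.log', 'utf8')` to `parseJsonLines`.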

Why Observability Enables Trust

Trust isn't given—it's earned through transparency. When you can see what AI is doing:

  • You understand its decisions
  • You can correct mistakes early
  • You learn its patterns
  • You feel confident giving it more autonomy

Without observability, you're always second-guessing. With it, you can build genuine trust based on evidence.

This Principle in Both Interfaces

"If you can't see what the agent is doing, you can't fix it when it goes wrong."

Both interfaces provide observability—through different mechanisms.

Layer | Claude Code | Claude Cowork
Plan | Chat shows reasoning | Progress panel shows plan
Actions | Terminal shows commands executed | Progress panel shows steps taken
Outputs | Files visible in filesystem | Artifacts panel shows outputs
Errors | Terminal shows error output | Progress panel shows issues

Cowork's observability advantage: The three-panel layout (chat, progress, artifacts) was designed specifically for observability. You can see plan, execution, and outputs simultaneously without switching contexts.

Claude Code's observability advantage: Full terminal access means nothing is hidden. You see every command executed, every file read, and every output generated, in the order it happened.

The principle is the same: Regardless of interface, you need visibility into what the agent is doing. Without it, agents are black boxes. With it, they're debuggable systems you can trust and improve.

Try With AI

Prompt 1: Log Analysis Practice

I want to practice reading and understanding AI activity logs.

Here's an activity log from an AI session:
[Paste a real or hypothetical activity log showing a sequence of actions]

Help me analyze:
1. What actions did the AI take? (List them in order)
2. What was the AI trying to accomplish?
3. Did it succeed? How do you know?
4. Are there any warning signs or potential issues?
5. What would I check to verify the work is correct?

Then, help me understand: What patterns should I look for in logs to identify successful vs problematic AI sessions?

What you're learning: How to read and interpret AI activity logs. You're developing the skill of understanding agent behavior through observation—essential for debugging and building trust.

Prompt 2: Designing Observable Workflows

I want to design more observable AI workflows.

I'm going to have you help me with [describe a task]. But first, let's design how you'll make your work visible:

For this task, I want you to:
1. Show me your plan before executing
2. Check in with me after each major step
3. Provide a summary when complete
4. Explain the rationale for significant changes

Let's execute this task with full observability. After we're done, help me reflect:
- What was most useful to see?
- What was missing?
- How would I modify this approach for future tasks?

What you're learning: How to design workflows that are transparent and observable. You're learning to structure AI collaboration so that actions are visible, traceable, and understandable.

Prompt 3: Debugging Through Logs

I want to practice debugging AI work using logs.

Scenario: I had an AI help me with a task, but something isn't working right.

Here's what I know:
- [Describe the problem—tests failing, unexpected behavior, etc.]
- [Share the activity log if available, or describe what the AI did]

Help me debug this by:
1. Reconstructing what likely happened based on the information
2. Identifying the most likely cause of the problem
3. Suggesting what to check or verify
4. Proposing a fix

Then, help me understand: What observability would have made this easier to debug? What should I track next time?

What you're learning: How to use observability to debug problems effectively. You're learning to trace issues through logs, understand agent behavior, and identify what additional visibility would help.

Safety Note

Observability is your defense against unexpected behavior. Always review activity logs when something seems wrong. The more you understand what the AI is doing, the better you can direct it and catch problems early.