
Problem Solving with General Agents: A 90-Minute Crash Course

7 Principles · 4 Tools · 80% of Real Use


In the AI Prompting course, you learned how to talk to AI: give context, ask clearly, check the output. In the Agentic Coding course (or the Cowork course), you learned the tools: plan mode, context management, skills, hooks. This course teaches the next layer: how to actually solve problems well with these tools.

Two students get the same assignment: use AI to read through five research papers, pull out the key findings, and write a one-page comparison summary.

Student A finishes in 25 minutes with a clean, verified summary. Student B spends 90 minutes going back and forth: the conversation gets cluttered, AI starts making mistakes, and Student B has to start over.

Same AI tool. Same assignment. The difference? Student A followed seven principles. Student B did not know them. This course teaches those seven principles.

Who this is for

Anyone using an AI tool that can take action on your behalf: read your files, write documents, run commands, and connect to services. These tools are not chatbots that just answer questions. They are AI assistants that actually do the work.

There are four tools that work this way:

| | For coding/engineering | For non-coding work |
| --- | --- | --- |
| Anthropic | Claude Code (terminal, IDE, web) | Claude Cowork (desktop app) |
| Open-source | OpenCode (terminal, any AI model) | OpenWork (desktop app, any AI model) |

The seven principles in this course work the same across all four tools. The examples cover different fields (research, writing, coding, business), but the principles are identical. When the tools differ in how you do something, this course shows all four side by side.

What is Mode 1 vs Mode 2? There are two ways to use these AI tools:

  1. Mode 1: Problem solving. You open a tool, solve a task, and ship the result. This is what most people do most of the time. This course teaches Mode 1.
  2. Mode 2: Building AI workers. You create permanent AI assistants that run on their own, without you. That is covered in a separate course.

Prerequisites: This course assumes you have completed AI Prompting in 2026 and at least one of the tool courses: Claude Code & OpenCode or Cowork & OpenWork. You need to know the basics before these principles make sense.

Three reading paths: 30-minute taste (Principles 1, 3, 5 only), 90-minute essential (all principles, examples, Parts 9 and 11), Full read (~2 hours, everything).

Choose the reading path that fits your time. You do not have to read everything in one sitting.

Non-engineering readers: skim the code blocks in the examples (the principle is the same on a different surface) and skip the system-of-record note in Principle 5. The rest is for you.

Safety first

These AI tools can read, edit, and delete your files. They can run commands on your computer. Before you give AI access to anything, make sure you understand what it is allowed to touch.

  • Start by making AI ask your permission before every action
  • Do not let AI access everything at once
  • If you set permissions wrong, one bad instruction could delete important files or share private information

You learned the basics of permissions in the tool courses. Principle 6 in this course goes deeper.

Want the deep version? This is a crash course: the seven principles in one read. For the full treatment, see Chapter 18: The Seven Principles of General Agent Problem Solving. For tool-specific depth, the downstream pages are Claude Code and OpenCode: A 90-Minute Crash Course and Cowork and OpenWork: A 90-Minute Crash Course. This page is the principles; those pages are the surfaces.




The essentials in five bullets

If you internalize only these five points, you have 60% of the value:

  1. Action over talk. A general agent's value comes from doing things: running commands, reading files, calling services. Treat every prompt as something that should result in an action or an artifact, not a paragraph of explanation.
  2. Code (and structured artifacts) over prose. When precision matters, ask for a schema, a table, a code block, a checklist, not a paragraph. The agent's output quality goes up sharply when the format is constrained.
  3. Verify, don't trust. Every meaningful output needs a verification step: tests for code, a rubric for a memo, a cross-model review for a high-stakes deliverable. "Looks right" is the failure mode.
  4. Small steps, atomic checkpoints. Decompose work into reversible units. Commit, snapshot, or save-version after each unit lands. Never let the agent run an hour of work without a single checkpoint.
  5. Files are memory. The conversation is volatile; the filesystem is durable. Anything worth remembering across sessions (decisions, plans, conventions, glossaries) belongs in a file, not in a chat history.

The remaining two principles (constraints and observability) are how you operationalize the first five. They keep the agent inside the lane you set and tell you whether it stayed there.

Figure 1: The five core disciplines (action over talk; code over prose; verify, don't trust; small atomic steps; files are memory), wrapped by the two operational principles (constraints, observability). Print this and tape it to your monitor.


Why These Principles Look Old: The Lindy Effect

Some of the most important tools in computing are also the oldest: the terminal (where you type commands), files (where you save your work), Git (where you track changes), and SQL (where you query databases). They have been around for decades and they still work. There is even a name for this pattern: the Lindy Effect, which says that something that has been useful for a long time is likely to stay useful for a long time.

Why does this matter for AI? These AI tools do not invent their own way of working. They use the same tools that have existed for decades: they run commands in the terminal, save results to files, track changes with Git, and look up data with SQL. AI thinks in human language, but it acts through these proven tools. The tools survived because they work well.

Three things to understand:

  1. These old tools become even more important with AI. The terminal lets AI run tasks. Git lets AI track and undo changes. Files give AI a place to save its work. These are not outdated tools: they are the foundation AI is built on.

  2. Your role changes, but you are still needed. AI can write code, run tests, and edit files. But it needs you to clearly define the problem and check that the result is correct. Your two most important skills: explaining what you want and knowing whether the output is right.

  3. AI tools work best when they can track, undo, and check their own work:

    Five things AI tools need: stable tools that do not change, clear formats, the ability to undo mistakes, the ability to check results, and permission controls.

AI does not replace these old tools. It makes them more valuable. The seven principles below teach you how to use AI effectively through these foundations.


Part 1: The Seven Principles

| # | Principle | Failure mode it prevents |
| --- | --- | --- |
| 1 | Bash is the Key | "Agent only talks, doesn't act" |
| 2 | Code as Universal Interface | "Prose request keeps getting misread" |
| 3 | Verification as Core Step | "Output looks right but breaks in production" |
| 4 | Small, Reversible Decomposition | "One big change nuked an afternoon" |
| 5 | Persisting State in Files | "Agent forgets what we decided yesterday" |
| 6 | Constraints and Safety | "Agent touched files I didn't authorize" |
| 7 | Observability | "Don't know what the agent actually did" |

These aren't in order of importance; they're in order of dependency. Each one builds on the ones listed before it. Read them in sequence at least once.

Principle 1 and Principle 2 sound similar but fix different problems
  • Principle 1 (Action): AI talks about doing something instead of actually doing it. Example: AI explains how to organize your files but never actually moves them.
  • Principle 2 (Structure): AI does the work but gives you the result in a messy format. Example: AI finds all the information you need but dumps it into one long paragraph instead of a clean table.

You need both. Principle 1 makes sure AI does the work. Principle 2 makes sure the result is useful.

The dependency pyramid: P1 (Bash) is the widest foundation bar at the bottom; each subsequent principle is a narrower bar stacked above it, with P7 (Observability) at the apex. Each principle rests on the ones below it.

The thesis in one line. The principles govern the session; the tools are interfaces to the same session. Learn to think with the principles and your skill transfers to whichever tool you happen to be in.


Principle 1 — Bash is the Key

What "Bash" means. The terminal is the black-screen text interface that comes with every laptop, the one you have seen in hacker movies. Bash is one of the most common command languages used inside it. When AI runs Bash, it is typing the same kind of commands you would type if you opened the Terminal app on your Mac (or PowerShell on Windows). Within the permissions you grant, AI has keyboard-level access to your machine, through commands instead of clicks. For Cowork and OpenWork users: same principle on a different surface (step cards instead of typed commands). Either way: AI acts on your computer, you watch it act.

The failure mode: "Why does AI only talk about doing things instead of actually doing them?"

What makes these AI tools different from a regular chatbot? A chatbot just answers your questions. These tools take action: they read files, write documents, run commands, and keep going until the task is finished. The first principle is simple: treat AI as a doer, not an advisor.

The beginner mistake. Most people start by asking questions: "How should I organize my notes from last week?" AI gives you a long explanation but does not actually organize anything. You asked for advice when you should have asked for action.

The fix: give a specific instruction.

| Asking for advice (weak) | Giving an instruction (strong) |
| --- | --- |
| "How should I organize my notes?" | "Read every file in the notes/ folder. For each file, pull out the action items and who is responsible. Save the result to weekly-summary.md, sorted by person." |

The first prompt gives you a paragraph of suggestions. The second prompt gives you a finished file. That is the difference between using AI as a chatbot and using it as a tool. (This is the same idea from the AI Prompting course, but the stakes are higher because AI is now changing your actual files.)

What "Bash" means in each tool

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Action surface | Terminal: runs shell commands on your machine | Same as Claude Code | Local Linux VM on your Mac/PC; reads and writes only inside folders you grant it | Same as Cowork |
| Visible as | Commands stream inline in the terminal | Same as Claude Code | Step cards in the side panel ("Read 3 files", "Ran a script") | Timeline of step chevrons |
| Approval default | Asks before each Bash action; allow-listed commands run silently | Same as Claude Code; configurable per tool | Asks before writing files, sending messages, or scheduling work | Same; per-tool approval granularity |
| Where this fails quietly | Agent waiting for approval you didn't notice | Global "permission": "allow" set without thinking | A document you fed it contains hidden instructions; the agent follows them as if they were yours | Same; amplified with many connectors |

The mental model: the agent has hands. Brief the hands, not the brain.

Examples

The pattern is the same in every field: tell AI what to do, what to work with, and what the result should look like. Asking a question gets you advice; giving an instruction gets you a finished result. Most beginners start in the "asking questions" column. Once you switch to the "giving instructions" column, that is where you will stay. Here are examples from different fields.

Legal work: searching through 47 documents

  • Asking: "What does indemnification mean in deposition transcripts?" → AI writes an essay. No files touched.

  • Instruction → AI searches all 47 files and gives you a list of every match, done in minutes:

    Search every PDF in /depositions for "indemnification" and close synonyms.
    For each hit, return file name, page number, and surrounding paragraph.
    Save to indemnification-hits.md.

Everyday: cleaning up a messy Downloads folder

  • Asking: "How should I organize a messy Downloads folder?" → AI gives you generic tips about folder organization.
  • Instruction: "My Downloads folder is a mess. What is actually in there?" → AI looks at your folder, counts 847 files, groups them by type, finds the biggest files taking up space. Thirty seconds. You did not type a single command. The principle is not "learn terminal commands." It is: let AI pick the right command for you.

Accounting: matching bank records to your books

  • Asking: "How do I reconcile a bank statement against a general ledger?" → AI gives you a tutorial.

  • Instruction → AI does the matching and gives you the list of mismatches in twenty minutes:

    Open bank-statement-march.csv and gl-export-march.xlsx. Match each bank
    transaction to a GL (General Ledger) entry (same date ±2 days, same amount, same vendor).
    List unmatched items in march-reconciliation-gaps.md, split into
    "in bank not GL" and "in GL not bank".

Marketing: comparing campaign results

  • Asking: "How are my Q3 campaigns doing?" → AI gives a generic answer about industry benchmarks.

  • Instruction → AI reads your actual data and gives you the real numbers in three minutes:

    Read every campaign-2025-Q3-*.csv in /campaigns/Q3. Produce a table:
    campaign name, send date, sends, opens, open rate, clicks, click rate,
    conversions. Sort by open rate descending. Save to Q3-campaign-summary.md.
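For readers who want to see what sits behind that instruction, here is a minimal Python sketch of the kind of script an agent might write. The column names, the sample rows, and the `summarize` function are hypothetical, and "click rate" is assumed here to mean clicks per send; a real export may define it differently.

```python
import csv
import io

# Hypothetical export standing in for the campaign CSV files named in the prompt.
SAMPLE = """campaign,send_date,sends,opens,clicks,conversions
Back to school,2025-08-18,2000,640,96,12
Summer sale,2025-07-07,5000,1250,300,45
"""


def summarize(csv_text):
    """Add open and click rates to each row, then sort by open rate (descending)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sends = int(row["sends"])
        opens = int(row["opens"])
        clicks = int(row["clicks"])
        rows.append({
            "campaign": row["campaign"],
            "send_date": row["send_date"],
            "sends": sends,
            "opens": opens,
            "open_rate": opens / sends if sends else 0.0,
            "clicks": clicks,
            # Assumption: click rate is clicks per send, not clicks per open.
            "click_rate": clicks / sends if sends else 0.0,
            "conversions": int(row["conversions"]),
        })
    return sorted(rows, key=lambda r: r["open_rate"], reverse=True)


summary = summarize(SAMPLE)
for r in summary:
    print(f'{r["campaign"]}: open rate {r["open_rate"]:.1%}, click rate {r["click_rate"]:.1%}')
```

The point is not that you should write this yourself. The instruction names the columns and the sort order, and that is enough for the agent to produce something of this shape.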

The rule: Every time you catch yourself typing a question, stop and ask: "Can I turn this into an instruction that produces a file?" Almost always, yes.

Hands-on: Hello world

The principle is theory until you've felt it once with no thinking. This is your hello-world: pre-curated inputs, one-line prompt, paste and watch.

Setup (30 seconds):

  1. Download Pack 1 — Cluttered folder and unzip it.
  2. Open the unzipped folder in your tool of choice (Claude Code, OpenCode, Cowork, or OpenWork). Give it read access to the downloads/ subfolder.

Paste this prompt verbatim:

What's in ./downloads/?

That's the whole prompt. Three words. No instructions on how to look. No file to write. No structure. Just the question.

What you should see. The agent runs a short cascade of commands on its own. Something close to this will stream in your terminal (Claude Code / OpenCode) or appear as step cards (Cowork / OpenWork):

$ ls -lh ./downloads/
total 0
-rw-r--r-- invoice-globex-march.pdf 0B
-rw-r--r-- invoice-globex-march (1).pdf 0B
-rw-r--r-- invoice-globex-march-final.pdf 0B
-rw-r--r-- Sample_Vendor_MSA_v2.pdf 0B
-rw-r--r-- Sample_Vendor_MSA_v2_signed.pdf 0B
-rw-r--r-- Q4-roadmap-DRAFT.docx 0B
-rw-r--r-- Q4-roadmap-DRAFT (1).docx 0B
-rw-r--r-- Q4-roadmap-final.docx 0B
-rw-r--r-- design-assets-final.zip 0B
-rw-r--r-- design-assets-final-v2.zip 0B
-rw-r--r-- Screenshot 2025-02-14 at 20.04.12.png 0B
-rw-r--r-- GoogleChrome.dmg 0B
...
(40 more entries)
-rw-r--r-- SIZES.txt 1.1K

$ find ./downloads -type f | wc -l
53

$ cat ./downloads/SIZES.txt
88K invoice-globex-march.pdf
88K invoice-globex-march (1).pdf
91K invoice-globex-march-final.pdf
340K Sample_Vendor_MSA_v2.pdf
355K Sample_Vendor_MSA_v2_signed.pdf
64K Q4-roadmap-DRAFT.docx
64K Q4-roadmap-DRAFT (1).docx
71K Q4-roadmap-final.docx
24M design-assets-final.zip
24M design-assets-final-v2.zip
225M GoogleChrome.dmg
...

Then the agent talks back to you in chat. Something like:

There are 53 files in ./downloads/. The files themselves are empty placeholders, but a SIZES.txt lists their real sizes. The mix is roughly: a handful of invoices, two vendor contracts, three Q4 roadmap drafts, two design-asset zips, a stack of screenshots and phone photos, and a few installers. A few obvious duplicate clusters jump out: three invoice-globex-march* files, two Sample_Vendor_MSA_v2* files, three Q4-roadmap drafts (two DRAFT versions plus Q4-roadmap-final.docx), and two design-assets-final*.zip files. The largest items by stated size are the installers (e.g., GoogleChrome.dmg at 225M) and the design-asset zips at 24M each. Want me to do anything with this: propose a folder layout, find more duplicates, anything else?

That's it. The artifact is the reply itself: a few short paragraphs in chat, grounded in files the agent actually opened.

What just happened. You typed three words. You did not type any commands yourself. AI figured out which commands to run and in what order, all on its own. It also knew what not to do: it did not move any files, did not write anything new, and did not make up information. It read SIZES.txt because the other files were empty and it needed real data.
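For the curious, the inspection the agent performed amounts to a read-only inventory. Here is a rough Python equivalent, offered as a sketch: the `inventory` function and the sample file names are hypothetical, and the agent in the transcript used shell commands, not this script.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory(folder):
    """Count files under `folder` and group them by extension. Read-only."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    by_type = Counter(p.suffix.lower() or "(none)" for p in files)
    return len(files), by_type


# Demo on a throwaway folder so nothing on your machine is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.pdf", "b.PDF", "c.docx", "notes"]:
        (Path(tmp) / name).touch()
    total, by_type = inventory(tmp)

print(total, dict(by_type))
```

Nothing in the function moves, renames, or writes a file, which mirrors what the agent chose not to do.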

Now compare this to what would have happened if you had asked a chatbot "how should I organize a Downloads folder?" You would have gotten a generic article. Instead, you got an answer based on your actual 53 files. Same AI model. Different instruction. That gap between asking and instructing is what this entire course is about.

If AI did not do this: If AI explained what it would do instead of actually doing it ("I would run these commands..."), or if it asked you questions before looking at the folder, that is the exact problem Principle 1 fixes. Just reply: "Do not explain. Just look at the folder." AI will do it. Whenever AI talks instead of acts, remind it to act.

Now apply to your own work

The curated Downloads folder was easy. The real test is a folder you've been avoiding: a Dropbox that's grown for two years, an Inbox nine thousand deep, a shared drive where every client has a different filing convention. Too big for you, perfectly sized for an agent.

Write the brief, not the method. One sentence. Name the input (which folder, thread, drive) and name the output (a summary file, a list, a report). Resist the urge to specify commands or clicks. You don't know which commands you'd need; the agent does. Working shape:

The folder at <path> has been collecting <thing> for <how long>.
Inspect it and write me a <named output file> that <decision the
output should support>. Read-only, don't change anything.

Watch the agent run. In Claude Code / OpenCode notice that you didn't type the commands. The first time the agent self-corrects from a too-broad find to a narrower one without your help, the principle lands. In Cowork / OpenWork the execution view fills with step cards, each one a task you would have done by hand in a pre-agent workflow.

The single failure. If you find yourself adding "use find for this part" or "open the spreadsheet and..." to the prompt, you're back to specifying method instead of outcome. Cut every verb that describes how, and keep only the verbs that describe what you want at the end. Re-run. The second version almost always lands cleaner.

Why this matters. This is the single highest-leverage habit in the crash course, and the one skilled people fail to install, because dictating method feels faster than waiting. It isn't. Every minute you spend specifying the method is a minute the agent could have been running it. Brief the hands. Step back. Read the artifact.


Action alone isn't enough. The agent can act powerfully in entirely the wrong direction, because you asked in prose and it guessed at what you meant. That's what Principle 2 fixes.

Principle 2 — Code as Universal Interface

The failure mode: "Why does my prose request (a plain-English paragraph instead of a structured format like a table or checklist) keep getting misread, and why does AI keep stopping at the edge of what apps can already do?"

Sarah had 3,000 photos from a trip across Southeast Asia, scattered across a phone, a camera, and a backup drive, with filenames like IMG_4521.jpg, DSC_0089.jpg. She wanted them organized by country and city, with dates in the filenames, duplicates removed by actual image content rather than name. She tried three photo apps. Each did part of what she wanted; none did the combination. The features were pre-built; her needs weren't.

She wrote one paragraph to a general agent: "I have 3,000 photos in three folders. I want them organized by country and city based on the location data in each photo, renamed YYYY-MM-DD-original.jpg, duplicates detected by image content, organized into clean folders." Fifteen minutes later, it was done. The agent wrote a short program that read each photo's embedded location, reverse-geocoded it, renamed by date, hashed image bytes to find duplicates, and moved everything into the structure she described. She wrote no code. The agent's interface to her computer, for all of it, was code.

Principle 2 has two parts. First: when a task is too complex for a single command, AI writes code to get it done. Second, and this is the part most people miss: the format you ask for matters as much as what you ask for. A plain paragraph is vague. A table, a checklist, or a structured template is clear. The more specific the format you give AI, the less it has to guess.

This is concept 7 from AI Prompting in 2026 (outline before drafting), with the outline made formal. The outline is now an interface, not a suggestion.

Wait, isn't Bash already code?

If you just read Principle 1, fair question. The distinction matters and it's small:

| Surface | Role | What it does |
| --- | --- | --- |
| Bash (Principle 1) | The hands | Navigate, search, move, observe, one command at a time |
| Code (Principle 2) | The brain | Compute, transform, orchestrate, persist, integrate |

Bash opens the folder; code reads every file in it, hashes the bytes, compares them, and writes a deduplication report. An agent with only Bash can poke around but can't think; an agent that can also write and run code can solve almost any computational problem you can describe. Sarah's photo job was beyond Bash because it required computation: reading EXIF data (the hidden information stored in every photo, like date, location, and camera settings), hashing images (comparing the actual content of two photos to find duplicates), and reverse-geocoding (converting GPS coordinates into a city or country name). The instant the work crosses from "look here, move that" into "compute, decide, build a thing," you're in Principle 2.
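To make "hashes the bytes, compares them" concrete, here is a minimal Python sketch of content-based duplicate detection. It is an illustration under stated assumptions, not the program Sarah's agent wrote: the function names are hypothetical, and it only catches byte-for-byte identical files. A copy that was resized or re-saved would need a perceptual hash, which this sketch does not attempt.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks so large photos are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(folder):
    """Return groups of file names under `folder` whose bytes are identical."""
    groups = defaultdict(list)
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p.name)
    return [names for names in groups.values() if len(names) > 1]


# Demo with stand-in files: two share the same bytes under different names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "IMG_4521.jpg").write_bytes(b"same pixels")
    (Path(tmp) / "DSC_0089.jpg").write_bytes(b"same pixels")
    (Path(tmp) / "IMG_4522.jpg").write_bytes(b"different pixels")
    duplicates = find_duplicates(tmp)

print(duplicates)
```

The file names never enter the comparison, which is why this finds duplicates that a name-based search would miss.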

The five powers code unlocks

Why is code so powerful for AI? Because regular apps (like photo organizers or spreadsheet tools) only do what they were built to do. If your task does not fit their features, you are stuck. When AI writes code, it can solve problems that no existing app was designed for. Here are five things code lets AI do that regular apps and simple commands cannot:

  1. Precise thinking. If you ask AI in plain text "what did I spend the most on this year?", it might give you a rough answer. But if AI writes code, it calculates to the exact cent. For example: Marcus had a year of business expenses and wanted averages by category, months where spending was unusually high, and how each quarter compared to the last. AI wrote a short program that did all of this with exact numbers. Marcus did not write any code. He just described what he wanted.

  2. Workflow orchestration. Some tasks have many steps with different paths: "if the file is a PDF with 'Invoice' in it, move it to Finances. If it is a PDF without 'Invoice', move it to Documents. If it is an image, move it to Images. Everything else goes to Other." Without code, AI would stop and ask you at every step. With code, AI writes all the rules at once and the entire job runs from start to finish without interruption.

  3. Organized memory. Big tasks produce a lot of intermediate work: temporary files, partial results, notes for later. Code lets AI create folders, save files, and read them back later. This means AI can pick up where it left off instead of starting over every time you send a new message.

  4. Universal compatibility. Real information is scattered across different formats. Aisha was planning a family reunion: the guest list was in a spreadsheet, dietary notes were in emails, RSVPs came from a web form, and flight details were in PDF attachments. No single app can read all four. AI wrote a small program that read each source in its own format and combined them into one guest list. Code connects things that were never designed to work together.

  5. Instant tool creation. When no app does exactly what you need, AI builds one. A community garden coordinator needed to track plot assignments, water usage, harvest amounts, and volunteer hours. No "garden management app" does that exact combination. AI wrote a small tracker with a few scripts and a weekly report. The tool did not exist before. Ten minutes later, it did.

You do not need to memorize these five. They are here so you start noticing moments where you would normally say "there is no app for this" and realize that AI can build one for you.
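The sorting rules from the workflow orchestration example (power 2) fit in a few lines once they are written as code. This is a sketch under assumptions: `destination` is a hypothetical name, "has 'Invoice' in it" is read as "in the file name" (the rule could equally mean the PDF's contents), and the list of image extensions is illustrative.

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def destination(filename):
    """Return the target folder for one file name, following the example's rules."""
    path = Path(filename)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        # Assumption: "Invoice" is matched against the file name, case-insensitively.
        return "Finances" if "invoice" in path.name.lower() else "Documents"
    if suffix in IMAGE_SUFFIXES:
        return "Images"
    return "Other"


# Build a plan first; moving files would be a separate, reviewable step.
plan = {
    name: destination(name)
    for name in ["Invoice-March.pdf", "lease.pdf", "beach.JPG", "notes.txt"]
}
print(plan)
```

Because every branch is decided up front, the job can run over a whole folder without stopping to ask about each file.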

The two things you still do

AI writes the code and creates the output. You do not need to write code yourself. Your job is two things:

1. Tell AI exactly what you want (define the problem)

  • If you work with code: Describe the problem clearly. The more specific you are (what it should do, what format the output should be, what it should not touch), the better the result.
  • If you work with documents: Describe the format, not the words. For example: "Write a one-page memo with four sections: summary, findings, risks (maximum three), and next steps." AI fills in the content; you define the structure.

2. Check that the result is correct (verify the output)

  • If you work with code: You need to read code well enough to spot mistakes, not write it. Can you look at a database query and tell if it is filtering the wrong data? That is enough.
  • If you work with documents: Check whether each claim is actually supported by your source material. AI writes smooth, confident text. That is the trap. Read for what is true, not what sounds good. If AI says "risk is HIGH," check what evidence it used to reach that conclusion.

These two skills (defining problems clearly and checking results carefully) are the most important things you can learn. No matter how much AI improves, someone still needs to say what the problem is and verify that the answer is right.

Why this gets easier over time. AI already works in small pieces: one function, one section, one table at a time. Each piece is small enough to check in under a minute. As AI improves, more of the checking happens automatically: tools already catch code errors on every save, a second AI can review the first AI's work, and fact-checking tools can verify claims against your source material. Today you check individual lines and sentences. In the future, you will mostly review summaries and approve final results. But the two core skills (defining the problem and checking the output) will always be needed.

Examples

The pattern is the same everywhere: describe the structure you want (sections, columns, rules, what is not allowed), then let AI fill it in. The format does not matter. It could be a table, a template, a checklist, or a database schema. What matters is that you define the shape first.

The common mistake: saying "make this cleaner" or "polish this" without telling AI what "clean" means. Without a clear structure, AI drifts. Put the rules in the structure, not in a vague prompt.

Here are examples of what "define the structure" looks like in different fields:

  • Legal: "One row per witness. Columns: what they admitted, what they denied, follow-up questions. Include page and line numbers from the transcript."
  • Consulting: "Four sections: stated problems, unstated problems (with evidence), key quotes, open questions. One page maximum."
  • Hiring: "For each resume: required qualifications (yes/no with evidence), preferred qualifications, any credential flags, one-word recommendation (ADVANCE / HOLD / DECLINE), and a one-line reason."
  • Sales: "Five sections in order: summary, risks (maximum 5), how to address each risk, decision (GO / NO-GO / HOLD), and open questions."
  • Real estate: "Table with columns: address, sale date, price, price per square foot, bedrooms, bathrooms. Sort by price per square foot."

For engineers, the same idea applies with code structures:

  • Database: Define the table structure first (which fields are required, what values are allowed). The database itself will reject bad data before any code runs.
  • Functions: Ask AI for the function definition first (what goes in, what comes out), then write tests, then write the actual code. The definition is the contract; the tests check the contract; the code comes last.
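The database point can be seen in a few lines. The sketch below uses Python's built-in sqlite3 module with a hypothetical `candidates` table; the allowed values borrow the ADVANCE / HOLD / DECLINE labels from the hiring example, and the names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The structure comes first: required fields and the only values allowed.
conn.execute(
    """
    CREATE TABLE candidates (
        name           TEXT NOT NULL,
        recommendation TEXT NOT NULL
            CHECK (recommendation IN ('ADVANCE', 'HOLD', 'DECLINE'))
    )
    """
)

conn.execute("INSERT INTO candidates VALUES (?, ?)", ("A. Rivera", "ADVANCE"))

try:
    conn.execute("INSERT INTO candidates VALUES (?, ?)", ("B. Chen", "MAYBE"))
    rejected = False
except sqlite3.IntegrityError:
    # The CHECK constraint refused the row; no application code had to validate it.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM candidates").fetchone()[0]
print(rejected, row_count)
```

The rule lives in the table definition, so it holds no matter which script, or which agent, tries to write to the table.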

When plain text is fine: For brainstorming, creative writing, or casual explanations, you do not need a rigid structure. But if you have asked AI twice and the output is still wrong, that is your signal to switch to a structured format.

This applies to your inputs too, not just AI's output. If you are feeding AI five documents and asking for a comparison, organize them into a table with consistent columns first. Do not paste five blocks of messy text and expect a clean comparison. The quality of AI's output depends on the quality of what you give it.

Hands-on: Hello world

The best way to see this principle in action is to try it yourself. We have prepared a small folder of receipts in three different formats (photos, PDFs, and screenshots). No single app can read all three. AI can.

Setup (30 seconds):

  1. Download Pack 2 — Receipts and unzip it. Inside you will find 15 sample receipts: 5 phone photos of paper receipts (receipts/photos/), 5 email PDFs (receipts/pdfs/), and 5 app screenshots (receipts/screenshots/). Two of the receipts have unusually large amounts so you can test whether AI catches them.
  2. Open the unzipped folder in your tool (Claude Code, OpenCode, Cowork, or OpenWork). Give it read access to the receipts/ folder.

Paste this prompt verbatim:

I want to understand why general agents that write code are more powerful
than specialized tools.

Here is my situation: I have a folder ./receipts/ with 15 receipts in mixed
formats — 5 phone photos of paper receipts, 5 PDF email receipts, and 5 app
screenshots. I need to:
1. Extract the date and amount from each receipt
2. Categorize them (groceries, dining, transportation, etc.)
3. Create a monthly summary showing totals by category
4. Flag any unusually large purchases

Walk me through how you would approach this. Don't write actual code; I'm
still learning. Instead, explain:
- What different steps would you take, in order?
- How does this approach give you flexibility a pre-built receipt app
would not have?
- Which of the Five Powers (precise thinking, workflow orchestration,
organized memory, universal compatibility, instant tool creation) is
each step using?

What you should see. AI first looks at the receipts/ folder to see what is in there (three subfolders, 15 files in different formats). Then it writes out a step-by-step plan:

  1. Read each file in its own format (use image reading for photos and screenshots, text extraction for PDFs)
  2. Pull out the key info from each receipt: date, amount, merchant name, and what format it came from
  3. Sort each receipt into a category based on the merchant name
  4. Add up totals by month and category
  5. Find the unusually large purchases and flag them

For each step, AI should mention which of the five powers it is using. It should also explain what makes this better than a regular receipt app: it reads all three formats at once, it lets you define your own categories, you can change the rules whenever you want, and you can save the results wherever you like.
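Steps 4 and 5 of that plan are the easiest to picture as code. The following Python sketch assumes extraction and categorization have already produced rows like the hypothetical ones below; amounts are kept in integer cents so totals stay exact, and "unusually large" is assumed to mean more than three times the median, a threshold you would tune to your own spending.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows standing in for the output of steps 1 to 3.
receipts = [
    {"date": "2025-03-02", "amount_cents": 4210, "category": "groceries"},
    {"date": "2025-03-09", "amount_cents": 3890, "category": "groceries"},
    {"date": "2025-03-15", "amount_cents": 2700, "category": "dining"},
    {"date": "2025-04-01", "amount_cents": 89900, "category": "transportation"},
    {"date": "2025-04-03", "amount_cents": 1850, "category": "dining"},
]


def monthly_totals(rows):
    """Sum amounts by (YYYY-MM, category)."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["date"][:7], r["category"])] += r["amount_cents"]
    return dict(totals)


def flag_large(rows, factor=3):
    """Flag receipts above `factor` times the median amount (an assumed rule)."""
    cutoff = factor * median(r["amount_cents"] for r in rows)
    return [r for r in rows if r["amount_cents"] > cutoff]


totals = monthly_totals(receipts)
flagged = flag_large(receipts)
print(totals)
print(flagged)
```

Changing the categories or the outlier rule means editing one line, which is the flexibility a pre-built receipt app would not give you.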

What to notice. AI did not suggest "download a receipt-scanning app." A typical pre-built app will not read photos, PDFs, and screenshots in the same pass, let you define your own categories, and let you change the outlier threshold. What AI described is a custom receipt tracker that did not exist before this conversation. That is the whole point of this principle: AI uses code to build exactly what you need, combining multiple powers into one solution. Regular apps have fixed features. Your needs are not fixed.

If AI did not do this: If AI gave generic advice ("you could try OCR software") instead of a concrete plan, it probably did not look at the folder first. Reply: "List the files in ./receipts/ first. Then redo the walkthrough using the actual file names and formats you see." The second attempt will be much more specific.

Optional follow-up (do this if you want to feel the code itself, not just hear it described). Paste:

Now execute step 1 only. Read every file in ./receipts/ across all three
subfolders, extract the date and amount from each, and save the results to
extracted.csv with columns: file_path, date, amount, source_format
(photo / pdf / screenshot). Show me the file when you're done.

AI will write a small program that reads the photos, PDFs, and screenshots, pulls out the date and amount from each one, and saves everything to a file called extracted.csv that you can open. All 15 receipts, in three different formats, combined into one clean table. No regular app does that in one step.
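To make steps 3 to 5 of the plan concrete, here is a minimal Python sketch of what such a program might look like once the dates and amounts have been extracted. It is not the code your tool will actually write. The merchant names, the category rules, and the "three times the median" outlier threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical merchant-to-category rules; you would define your own.
CATEGORY_RULES = {
    "FreshMart": "groceries",
    "Luigi's Pizza": "dining",
    "City Taxi": "transportation",
}

def categorize(merchant):
    """Step 3: map a merchant name to a category, defaulting to 'other'."""
    return CATEGORY_RULES.get(merchant, "other")

def monthly_totals(receipts):
    """Step 4: sum amounts per (YYYY-MM, category) pair."""
    totals = defaultdict(float)
    for r in receipts:
        month = r["date"][:7]  # assumes ISO dates such as 2024-03-15
        totals[(month, categorize(r["merchant"]))] += r["amount"]
    return dict(totals)

def flag_large(receipts, multiple=3.0):
    """Step 5: flag receipts above `multiple` times the median amount (assumed rule)."""
    cutoff = multiple * median(r["amount"] for r in receipts)
    return [r for r in receipts if r["amount"] > cutoff]

# Made-up rows standing in for the output of steps 1 and 2.
sample = [
    {"date": "2024-03-02", "merchant": "FreshMart", "amount": 42.50},
    {"date": "2024-03-09", "merchant": "Luigi's Pizza", "amount": 28.00},
    {"date": "2024-03-15", "merchant": "City Taxi", "amount": 17.25},
    {"date": "2024-03-20", "merchant": "FreshMart", "amount": 55.00},
    {"date": "2024-03-28", "merchant": "Gadget Hub", "amount": 899.00},
]

totals = monthly_totals(sample)
flagged = flag_large(sample)
print(totals)
print([r["merchant"] for r in flagged])
```

The flexibility the walkthrough talks about shows up in two places: editing CATEGORY_RULES changes the categories, and changing `multiple` changes what counts as "unusually large."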

Now apply to your own work

The receipt exercise was a practice run. Now try it on something real from your own work.

Step 1: Pick a task you currently do using two or more apps. That is the clue that no single tool handles it completely. Examples: comparing information from a spreadsheet and a PDF, pulling data from emails and organizing it into a report, or reading documents in different formats and producing one summary.

Step 2: Describe it to AI and ask for a walkthrough. Tell AI what files you have, what result you want, and ask it to plan the approach first:

Walk me through how you would approach this. Then, when I say go,
do step 1 only and show me the result.

Step 3: Pick the most time-consuming step and let AI do it. If the result saves you even twenty minutes, AI just built a tool that did not exist when you opened your laptop. Save the conversation so you can reuse it next time.

Two mistakes to avoid:

  • Do not say "write me a script." Say "walk me through your approach" first. If you jump straight to "write a script," AI picks one path and runs without showing you the plan. You want to understand the approach before AI starts building.
  • If your task can be done in a spreadsheet, it might not need AI. AI is most useful when the task crosses multiple formats, needs custom rules, or requires combining several steps. If a regular app already handles it, use the app.

Why this matters. For the first time, you have a tool where the interface is simply describing what you want. Regular apps give you the features they were built with. AI gives you the features your task actually needs.


Now you know how to get AI to produce structured, well-organized output. But a clean-looking result is not the same as a correct result. AI can fill a perfect template with wrong numbers, fake sources, and code that looks right but does the wrong thing. That is what Principle 3 fixes.


Principle 3 — Verification as a Core Step

The failure mode: "Why does the output look right but break in production?"

A finished-looking output is not a verified output. Models produce outputs that are plausible, which is not the same as correct. They will confidently miscount items in a list, cite a paragraph that doesn't exist, and produce code that compiles cleanly while silently failing on the third edge case. Verification must be a step in the workflow, not an afterthought.

This is concept 13 from AI Prompting in 2026 (models checking models), promoted from a habit to a structural step.

What "verification" means in each tool

| | Claude Code | OpenCode | Cowork | OpenWork |
|---|---|---|---|---|
| Primary mechanism | Unit tests, type-checks, linters, run by the agent after each change | Same | Output rubric: "Does the memo meet all required sections? Are claims sourced?" | Same |
| Automated gate | Hook in .claude/settings.json blocks commit if tests or types fail | Plugin in .opencode/plugins/ does the same | A second agent pass that scores against a rubric before saving | Same; can use a smaller model for the verification pass |
| Cross-model review | A second tool (different model family) reads the diff and writes a critique | Same pattern | Open a second chat with a different model: "Find what's wrong with this memo" | Configure a second provider and ask the agent to do the cross-pass |
| Where it gets skipped | Tests pass, but not for the right things | Same | "Memo looks good" without reading every claim against the source | Same |

The key rule: the agent that produced the output is the worst possible verifier of that output. It has the same blind spots that produced the original. Verification needs an independent path: your own reading, a different model, a test, a type-checker, or a database constraint.

Examples

The method is always the same: take every factual claim, find where it came from, and flag anything that has no source. This works for numbers, quotes, references, and any other claim that needs to be accurate.

Here are three examples of what happens without verification vs. with verification:

Legal work: Imagine a lawyer writes "according to Case X, the company is liable." But Case X does not actually say that. AI made it sound convincing, but the reference is wrong. Without checking, the other lawyer finds the mistake and uses it against you. With verification ("For every reference you use, find the exact sentence that supports your point. Flag any reference you cannot find proof for."), the wrong references are caught and fixed before anyone else sees the document.

Insurance: A summary says "the policy covers up to 250,000 dollars and this claim is within limits." But the actual policy has a 100,000 dollar limit for water damage, and the claim is for a burst pipe. Without checking, the wrong number goes into the official letter. With verification ("For every dollar amount you mention, find the exact section in the policy that states it. Flag any amount you cannot find."), the real limit is discovered before the letter is sent.

Research: A report says "there were no serious side effects." But the actual data shows two. Without checking, the wrong statement ends up in an official filing. With verification ("For every claim about the data, find the exact rows that support it. Flag any claim you cannot match to real data."), the error is caught before the report is submitted.

Prompt pattern for any high-stakes deliverable:

Before saving the final version, verification pass:
- List every factual claim in the draft
- For each one, identify the source location and quote the supporting text
- Flag any claim you cannot ground
Refuse to save until every flag is resolved.
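The grounding rule in this prompt can also be checked mechanically. The sketch below is a minimal illustration in Python, assuming each claim arrives with the source it names and the quote it relies on. The field names, the file name, and the sample policy text are hypothetical. It only confirms that a quote really appears in the source; judging whether the quote supports the claim still needs a reader.

```python
def audit_claims(claims, sources):
    """Split claims into confirmed and flagged.

    A claim counts as confirmed only if its quoted evidence appears
    verbatim in the source it names. Everything else is flagged.
    """
    confirmed, flags = [], []
    for claim in claims:
        source_text = sources.get(claim["source"], "")
        quote = claim["quote"]
        if quote and quote in source_text:
            confirmed.append(claim["text"])
        else:
            flags.append(claim["text"])
    return confirmed, flags

# Hypothetical source text and claims, loosely based on the insurance example.
sources = {
    "policy.txt": "Water damage is covered up to 100,000 dollars per claim.",
}
claims = [
    {
        "text": "The water damage limit is 100,000 dollars",
        "source": "policy.txt",
        "quote": "covered up to 100,000 dollars",
    },
    {
        "text": "The policy covers up to 250,000 dollars",
        "source": "policy.txt",
        "quote": "covers up to 250,000 dollars",
    },
]

confirmed, flags = audit_claims(claims, sources)
print("Confirmed:", confirmed)
print("Flags:", flags)
```

An exact-match check is deliberately strict: a paraphrased "quote" fails, which is the behavior you want when the goal is to catch invented evidence.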

The wrong number problem. Your manager asks you to find last quarter's sales by region. AI runs a calculation and says: "West region: 4.2 million." You put that number in your presentation. Later, the finance team checks the same data and gets 3.8 million. You go back to AI and ask why. AI confidently gives you a third number: 4.5 million. Three different answers from the same data. That is what happens when you trust AI's numbers without checking them yourself.

Asking the same AI "is this correct?" is not real verification. AI will almost always say "yes, it looks correct" because it has the same blind spots that caused the mistake in the first place. That is like asking the person who wrote a wrong answer to grade their own work.

The fix: Look at the code or formula AI wrote. You do not need to write code yourself. You just need to read it well enough to spot obvious problems. Ask yourself: "Is it looking at the right data? Is it filtering anything out that should be included?" Then run it yourself and compare the result to a source you trust.

For anything that deletes or changes data: Always do a test run first. Check how many items would be affected before you let it actually make changes. Only proceed once the number matches what you expected.
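A test run like this is often called a dry run. Here is a minimal sketch of the idea in Python, assuming the data is a simple list of records. The function name, the `expected_count` parameter, and the sample records are illustrative, not part of any tool.

```python
def delete_where(records, should_delete, expected_count=None, dry_run=True):
    """Return (remaining_records, affected_count).

    In dry-run mode nothing is removed. A real run refuses to proceed
    unless the affected count matches what the caller expected.
    """
    affected = [r for r in records if should_delete(r)]
    if dry_run:
        return list(records), len(affected)
    if expected_count != len(affected):
        raise ValueError(
            f"expected {expected_count} records, but {len(affected)} match"
        )
    remaining = [r for r in records if not should_delete(r)]
    return remaining, len(affected)

def is_archived(record):
    return record["status"] == "archived"

records = [
    {"id": 1, "status": "archived"},
    {"id": 2, "status": "active"},
    {"id": 3, "status": "archived"},
]

# Test run first: count what would be affected, change nothing.
unchanged, would_delete = delete_where(records, is_archived)
print("Would delete:", would_delete)

# Real run, only after a human has confirmed the count.
remaining, deleted = delete_where(
    records, is_archived, expected_count=would_delete, dry_run=False
)
print("Remaining:", remaining)
```

Making dry-run the default means the destructive path has to be requested explicitly, together with the count you expect.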

Hands-on: Hello world

Something can look finished and polished but still have mistakes hiding in it. This exercise gives you a professional-looking memo that has five errors hidden inside it, plus the original data files. Your job: get AI to find the errors by checking every claim against the real data.

Setup (30 seconds):

  1. Download Pack 5 — Verification and unzip it. Inside you will find a draft memo (deliverable/Q3-variance-memo-DRAFT.md) with five hidden mistakes, and a sources/ folder with the spreadsheets the memo's numbers are supposed to come from.
  2. Open the unzipped folder in your tool. Give it read access to both deliverable/ and sources/.

Paste this prompt verbatim:

Read deliverable/Q3-variance-memo-DRAFT.md. For every factual claim
(numbers, named causes, "largest/biggest" rankings), find the supporting
evidence in sources/ and quote the exact rows or cells. Flag any claim
where the source disagrees or where no row supports it. Save the audit
to VERIFICATION.md with two sections: Confirmed and Flags.

What you should see. AI reads the memo, opens all three spreadsheets, and creates a file called VERIFICATION.md. Each claim in the memo lands in one of two sections. Confirmed means the spreadsheet matches. Flags holds everything else, each entry marked as either a mismatch (the memo says one number but the spreadsheet says a different one) or no source found (nothing in the spreadsheets supports the claim). AI should catch at least three of the five hidden errors on the first try. The other two sometimes need a follow-up nudge like "check the other categories too."

What to notice. Before the verification step, the five wrong claims looked just as correct as the right ones. Nothing felt wrong. That is the trap: AI writes confidently whether it is right or wrong. The verification step uses the same AI, but asks a different question: "prove each claim against the real data." That one change is what turns a good-looking draft into a trustworthy one.

If AI said "everything looks correct" without quoting specific data: That means AI just re-read the memo and approved it, which is not real verification. Reply: "For each claim, quote the exact row from the spreadsheet that supports it. If you cannot find a matching row, mark the claim as unsupported." The second attempt usually catches the errors. If one or two still slip through, that is normal. Verification catches most mistakes, not all. For important work, a human should always do a final check too.

Now apply to your own work

The practice exercise had errors you knew were there. Real work is harder: you do not know if errors exist, and getting a number wrong can cost your credibility.

Pick something real. Choose the most important AI output from this week: a document with numbers, a report with references, or an analysis that recommends a decision. The one that is about to leave your hands. Professional-looking output is exactly what people forget to verify.

Tell AI what to check and what "proof" looks like:

Verify every factual claim in <your-file>. For each claim, quote the
exact row or sentence from <your-sources> that supports it. Flag any
claim you cannot find proof for. Save to <your-file>-verification.md.

Make sure AI reads the sources separately. If AI only re-reads the output, it is just checking its own work. Every claim should be matched to a direct quote from the source, not a vague summary like "this section discusses revenue."

If AI says "everything is consistent" without quoting anything, that is not real verification. Reply: "Include an exact quote for each claim. If you cannot find one, mark the claim as unsupported."

Why this matters. AI is getting more accurate over time, but it is not perfect. The problem is that you cannot tell by reading which claims are correct and which are wrong, because AI writes all of them with equal confidence. Making verification a standard step in your workflow is the only reliable way to catch mistakes before they ship.


Verification catches mistakes after they are made. But some mistakes are expensive to fix because by the time you find them, other work already depends on them. Principle 4 prevents that.


Principle 4 — Small, Reversible Decomposition

The failure mode: "Why did one big change just nuke an afternoon of work?"

Break big tasks into small steps. Finish one step, check that it is correct, save your progress, then move to the next. If something goes wrong, you only lose the last small step instead of an entire afternoon of work.

AI works best this way too. If you give AI a task with 12 steps in one message, it tends to drift off track by step 5, and you have no way to catch the mistake until the end. If you give AI those same 12 steps one at a time, checking each one before moving on, the result is much better.

The rule of thumb: if reversing the change would take more than two minutes, the change was too big.

What decomposition and reversibility look like in each tool

| | Claude Code | OpenCode | Cowork | OpenWork |
|---|---|---|---|---|
| Atomic unit | Git commit after each working step | Same as Claude Code | Numbered file versions (memo-v1.md, memo-v2.md) or a drafts/ folder | Same as Cowork; /undo also rewinds via git |
| Undo mechanism | git revert or git reset; Esc Esc rewinds conversation and file edits (but not terminal commands) | /undo rewinds conversation AND all file changes (including terminal commands) | Save numbered versions; revert by copying back | /undo, same as OpenCode |
| Course correction | Esc to interrupt, redirect; AI picks up from where you stopped | Same as Claude Code | Stop button halts immediately; redirect in next message | Same as Cowork |
| Where it breaks | Asking for a huge change in one prompt that touches many files | Same as Claude Code | "Rewrite the entire document in the new template" overwriting the original | Same as Cowork; worse if no git is initialized |

The enforcement prompt:

Break this task into the smallest steps you can. After each step:
1. Show me what you did
2. Run the verification check for that step
3. Commit / save a numbered version
4. Wait for my OK before starting the next step

Examples

The mistake is always the same: you ask AI to write the whole thing in one go, a small error creeps in early on, and by the time you notice it, everything after that point is affected. The fix is always the same: break the work into steps and check each one before moving on.

Here is what this looks like in practice:

| Task | One big prompt (bad) | Step by step (good) |
|---|---|---|
| Writing a letter | AI writes all 7 paragraphs at once. A mistake in paragraph 3 goes unnoticed until paragraph 7 depends on it. You have to rewrite most of it. | AI writes the facts first → you check → AI writes the argument → you check → AI writes the conclusion. Each mistake is caught early. |
| Writing a report | AI writes 6 pages. The revenue number is wrong, the structure is off, the tone is inconsistent. Fixing it takes 90 minutes. | AI writes an outline → you approve → AI writes section by section. Done in 40 minutes, no major fixes needed. |
| Building a spreadsheet | AI builds all 12 tabs at once. The formulas break across tabs, currencies are mixed up. You discover it two hours later. | AI builds one tab at a time, checking each against the previous one before moving on. |
| Rewriting a long document | AI rewrites all 40 pages. By page 12, it has forgotten the style rules from page 1. | AI rewrites one chapter at a time, checking each chapter against the original rules before starting the next. |

Why saving your progress matters

The Pixar lesson. In 1998, someone at Pixar accidentally deleted the Toy Story 2 production files: two years of work, gone in seconds. The backup system had failed weeks earlier without anyone noticing. The film was only saved because one employee happened to have a personal copy at home. Saving your progress cannot be something you remember to do. It has to be built into your process. If you use git, committing after every meaningful step turns a disaster into a small inconvenience.

The undo trap. Sarah edits her budget file and makes it worse. She searches online for "how to undo git changes" and runs git reset --hard. It fixes the bad budget, but it also erases the volunteer list she spent an hour editing, because she never saved (committed) that work. git reset --hard does not undo one file. It discards every uncommitted change to the files git is tracking, back to your last commit (your last save point). If you did not save, it is gone. The lesson: save often, in small steps. The size of your last save is the maximum amount of work you can lose.

Hands-on: Hello world

The best way to see this principle is to try the same task two ways: once in a single big prompt, and once broken into steps. You will see how different the results are.

Setup (30 seconds):

  1. Download Pack 3 — Decomposition and unzip it. Inside you will find a sample case description (inputs/case-brief.md) and a style guide with formatting rules (inputs/firm-style-guide.md).
  2. Open the unzipped folder in your tool. Give it read access to inputs/.

Paste this prompt verbatim:

Draft a demand letter for the dispute in ./inputs/case-brief.md, following
./inputs/firm-style-guide.md. Do it twice: once in a single pass
(save as letter-A-big-prompt.md), then again in four steps (facts,
legal theory, demand, deadline), pausing after each so I can read.
Save the final decomposed version as letter-B-final.md.

What you should see. Run A gives you the complete letter in one go. Run B writes the first section, stops, and waits for you to say "continue" before writing the next section. Open both results side by side and compare them. Run A will usually have at least one problem: a phrase the style guide says not to use, a number that does not match the case description, or a vague deadline like "promptly" instead of a specific date. Run B will be cleaner because each section was short enough for you to read and catch mistakes before the next section was built on top of it.

What to notice. AI is equally capable of writing each individual section. The problem with Run A is not that AI is bad at writing. The problem is that by the time AI reaches section 4, it has forgotten the rules it read at the start. In Run B, every pause gave you a chance to read one short section against the rules and correct it before the next section was built on top of it. Same AI, same task. The only difference is checkpoints. The extra 40 seconds you spend clicking "continue" between sections saves you from rewriting the entire document later.

If AI did not pause between sections: Some tool settings make AI continue automatically without stopping. Reply: "Write only one section at a time. Stop after each one. Do not start the next until I say continue." If that still does not work, send each section as a separate message: "Step 1: facts only," wait, then "Step 2: legal theory," and so on.

Now apply to your own work

The practice exercise was simple: one document, one task. Now try this on something real from your own work.

Pick something you have written in one go before and been unhappy with. A report where the ending contradicted the beginning. A document where later sections forgot the rules from earlier sections. The problem was not that any one section was bad. The problem was that sections drifted apart because AI wrote everything at once without stopping.

Before you start, list 4-7 steps in order. For each step, write one line describing how you will check that it is correct before moving on. The check is what makes each pause useful, not just a break.

Produce <deliverable> in <N> steps:
Step 1: <section> only. Stop and wait for my OK.
Step 2: <next section>. Verify against <check>. Stop.

Save numbered versions as you go (-v1, -v2, …).

Save after every step. Make sure AI saves its work after each step. In Claude Code or OpenCode, AI should commit (save to git) after each step. In Cowork or OpenWork, AI should save numbered versions (memo-v1.md, memo-v2.md) instead of overwriting the same file. If something goes wrong later, you can go back to any saved step.
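For the numbered-versions approach, the logic is simple enough to sketch. The Python below is an illustration under stated assumptions: the function name `save_numbered_version` and the "-vN" naming pattern are choices made here to match the memo-v1.md, memo-v2.md examples, not a feature of Cowork or OpenWork.

```python
import re
import tempfile
from pathlib import Path

def save_numbered_version(folder, stem, text, suffix=".md"):
    """Write text to the next free <stem>-vN<suffix> file and return its path.

    Existing versions are never overwritten, so any earlier step
    can be restored by copying its file back.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(rf"^{re.escape(stem)}-v(\d+){re.escape(suffix)}$")
    existing = [
        int(m.group(1))
        for p in folder.iterdir()
        if (m := pattern.match(p.name))
    ]
    next_n = max(existing, default=0) + 1
    path = folder / f"{stem}-v{next_n}{suffix}"
    path.write_text(text, encoding="utf-8")
    return path

# Demo in a throwaway folder so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    first = save_numbered_version(tmp, "memo", "Step 1: facts")
    second = save_numbered_version(tmp, "memo", "Steps 1-2: facts and argument")
    saved_names = sorted(p.name for p in Path(tmp).iterdir())

print(saved_names)
```

Each call creates a new file, so the amount of work you can lose is never more than one step.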

Do not let AI skip ahead. Halfway through, AI may offer to "finish the rest" because the first few steps went well. Say no. Reply: "One step at a time. Show me step 3 only." The moment you let AI rush ahead is the moment errors start compounding again.

Why this matters. The worst mistakes do not come from one big obvious failure. They come from small errors that build up across a long uninterrupted run. Checking at each step catches them early. It also lets you change direction halfway through: if steps 1 and 2 are solid but step 3 needs a different approach, you can pivot without losing the first two steps. If you had done it all in one go, you would have to start over.


Small reversible steps keep your work recoverable. But every new session, the agent forgets all of it: the decisions, the conventions, the plan. You start re-explaining from scratch. That's what Principle 5 fixes.


Principle 5 — Persisting State in Files

The failure mode: "Why does the agent forget what we decided yesterday?"

When you close a conversation, AI forgets everything. The next time you open it, AI has no memory of what you discussed, what decisions you made, or what rules you set. You have to explain everything again from scratch.

The fix: save important information to a file. Files stay on your computer until you delete them. Conversations disappear. Anything you want AI to remember across sessions (project rules, decisions, terminology, plans) should be written to a file, not left in a chat.

The most important file is the rules file. In Claude Code and Cowork it is called CLAUDE.md. In OpenCode and OpenWork it is called AGENTS.md. It is a short text file that AI reads automatically every time you start a new session. When this course says "rules file," that is what it means.

What the rules file looks like across tools

The idea is the same in all four tools: a short text file in your project folder that AI reads at the start of every session (Cowork and OpenWork sometimes need a reminder; see the note below). You do not need to write it from scratch. Run /init in your tool; this command scans your project folder and creates a first draft of the rules file for you. The only difference between tools is the file name: CLAUDE.md for Claude Code and Cowork, AGENTS.md for OpenCode and OpenWork. If you switch tools later, just rename the file. The content stays the same. (OpenCode also reads CLAUDE.md as a fallback if no AGENTS.md exists, so switching from Claude Code to OpenCode requires no changes at all.)

Cowork and OpenWork users

Claude Code and OpenCode automatically read the rules file at the start of every session. Cowork and OpenWork do not always do this automatically. If AI does not pick it up on its own, start your session with: "Read the rules file (CLAUDE.md in Cowork, AGENTS.md in OpenWork) in this folder and follow its rules for everything that follows."

How long should it be?

  • First draft: under 250 words. Just the most important facts.
  • After a few weeks of use: under 60 lines. Each line should exist because something went wrong without it.
  • Too long? If it is over 500 words, you are using it as documentation. Move the details into separate files and keep the rules file short.

The most common mistake: making this file too long. People put everything about the project in it: full descriptions, all conventions, every rule. The problem? Once the file is loaded, it takes up room in AI's context on every single message. A huge file slows AI down and wastes space, even when most of the content is not relevant to the current task. Keep it short. Think of it as a table of contents, not an encyclopedia. Point to other files for the details.
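The length limits above are easy to check automatically. This is a minimal Python sketch, assuming words are split on whitespace and only non-blank lines count toward the 60-line limit; both counting choices and the function name `check_rules_file` are assumptions made for illustration.

```python
def check_rules_file(text, max_words=500, max_lines=60):
    """Return a list of warnings if a rules file has grown too long."""
    words = len(text.split())
    lines = sum(1 for line in text.splitlines() if line.strip())
    warnings = []
    if words > max_words:
        warnings.append(
            f"{words} words (limit {max_words}): move details into separate files"
        )
    if lines > max_lines:
        warnings.append(
            f"{lines} non-blank lines (limit {max_lines}): keep only rules that earned their place"
        )
    return warnings

short_file = "# Project: demo\n\n## Critical rules\n- Never edit files in generated/\n"
long_file = "\n".join(f"- rule number {i}" for i in range(80))

print(check_rules_file(short_file))
print(check_rules_file(long_file))
```

To check a real file, read it first, for example with `Path("CLAUDE.md").read_text()` from `pathlib`, and pass the text in.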

The shape that works in all four tools:

# Project: [name]

## What this is
[Two lines: domain, audience]

## Where things live
- folder-a/: [what's in it]
- folder-b/: [what's in it]

## Critical rules
- [The one mistake people keep making]
- [A non-obvious convention]
- [A thing that's expensive to undo]

## On-demand references
- @docs/conventions.md

Examples

The rules file always has the same structure, no matter what kind of project you are working on: where things are stored, the specific rules for this project, and 3-5 important rules that would cause real problems if AI got them wrong. Only include information that is specific to your project. General advice that applies to any project does not belong here.

Here are three examples of rules files from different fields. Notice that each one follows the same structure: where things are, the rules for this project, and a few critical rules that prevent serious mistakes.

Legal project:

# Case: Smith v. Acme

## How to refer to people
- Always say "Ms. Smith" or "Plaintiff", never just "Smith"
- Always say "Acme" for the defendant

## Where things are
- /pleadings: official filed documents (do not edit these)
- /depositions: interview transcripts
- /our-drafts: work in progress

## Critical rules
- Never reference a document we have not quoted in full
- Flag anything that might reveal private legal information before saving

Accounting project:

# Monthly Financial Review

## What to flag
- Any amount that changed by more than 5,000 dollars or more than 10%
compared to last month (whichever is larger)
- Large changes (over 25,000 dollars) need a short explanation

## Writing style
Keep explanations to 2 sentences maximum. State what changed and why.
No guessing.

## Critical rules
- Never use a dollar amount unless you confirmed it against the
original data file
- Round to the nearest thousand in summaries; exact numbers stay
in the spreadsheet
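The "What to flag" rule in this example is precise enough to write as arithmetic. The Python sketch below shows one reading of it: the threshold is the larger of 5,000 dollars and 10% of last month's amount. "Whichever is larger" could be read differently, so treat this interpretation, and the function names, as assumptions.

```python
def needs_flag(last_month, this_month):
    """Flag when the change exceeds the larger of 5,000 dollars or 10% of last month."""
    change = abs(this_month - last_month)
    threshold = max(5_000, 0.10 * abs(last_month))
    return change > threshold

def needs_explanation(last_month, this_month):
    """Large changes (over 25,000 dollars) also need a short explanation."""
    return abs(this_month - last_month) > 25_000

# Made-up amounts: (last month, this month)
examples = [(40_000, 46_000), (200_000, 212_000), (200_000, 230_000)]
for last, this in examples:
    print(last, this, needs_flag(last, this), needs_explanation(last, this))
```

With these made-up amounts, a 6,000 dollar change on 40,000 is flagged because the threshold is 5,000, while a 12,000 dollar change on 200,000 is not because the threshold there is 20,000.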

Hiring project:

# Hiring: Senior Product Manager

## Job requirements
See job-spec.md. "Required" means must-have. "Preferred" means nice-to-have.

## How to evaluate
- If a candidate is missing a required qualification: automatic rejection
- Count how many preferred qualifications each candidate has
- If a candidate claims a degree or job title: flag it for a human to
verify, never accept it automatically

## Where things are
- /inbound: incoming resumes (PDF)
- /shortlist: candidates who passed the first round
- /scorecards: interview notes

## Critical rules
- Never include candidate names in any automated reports (privacy)
- Always flag credential claims for human review before advancing anyone

Notice the "automatic rejection" rule in the hiring example. Without it, AI might say "well, they almost meet the requirement" and let someone through. The rules file prevents that by making the rule explicit. This is what rules files are for: decisions you would otherwise have to re-explain to AI every single session.
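Written as code, the three evaluation rules leave no room for "almost meets the requirement." The Python sketch below is illustrative only: the qualification lists, candidate records, and field names are hypothetical, and it uses candidate IDs instead of names to respect the privacy rule.

```python
def screen(candidate, required, preferred):
    """Apply the three evaluation rules to one candidate record."""
    quals = set(candidate["qualifications"])
    missing = sorted(set(required) - quals)
    return {
        "candidate_id": candidate["id"],          # IDs only, never names
        "rejected": bool(missing),                # any missing required item rejects
        "missing_required": missing,
        "preferred_count": len(set(preferred) & quals),
        "needs_human_review": list(candidate["claims"]),  # never auto-accepted
    }

# Hypothetical job spec and candidates.
REQUIRED = ["5+ years product management", "B2B SaaS experience"]
PREFERRED = ["SQL", "MBA", "team leadership"]

candidates = [
    {
        "id": "C-01",
        "qualifications": ["5+ years product management", "B2B SaaS experience", "SQL", "MBA"],
        "claims": ["MBA (2018)", "Director of Product"],
    },
    {
        "id": "C-02",
        "qualifications": ["5+ years product management", "SQL", "team leadership"],
        "claims": ["Senior PM"],
    },
]

results = [screen(c, REQUIRED, PREFERRED) for c in candidates]
for result in results:
    print(result)
```

The second candidate matches two preferred qualifications but is still rejected, because the required-qualification rule is checked independently of the preferred count.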

A second persistence pattern: plan files. For multi-session tasks, save the plan to docs/plans/feature-name.md. Resume in one message: "Read docs/plans/q4-launch.md and continue from step 4."

The hierarchy: Conversation = volatile. Files in the project folder = durable. Referenced files = on-demand.

When files are not enough, use a database. Imagine you wrote a script that reads your expense spreadsheets and calculates yearly totals. Then someone asks: "Show me food spending by month for the last three years." Now you have to rewrite the script for every new question. If every new question means rewriting your code, your approach has outgrown files. The fix: put the data in a database (you can set up a free one on Neon in 60 seconds). Once the data is in a database, asking "food spending for March 2024" or "compare Q1 vs Q2 by category" is just a single query. You go from "a file I keep updating" to "a structure that answers questions I have not thought of yet."

The same shape works for engineering; only the conventions change.

An engineer's rules file (CLAUDE.md):

# Project: my-app

## Stack
Next.js 14, TypeScript, Postgres on Neon (free), Drizzle ORM.

## Commands
- npm run dev: start local server
- npm test: run tests
- npm run db:branch <name>: create a test copy of the database for risky changes

## Critical rules
- Never edit files in src/generated/ (they are rebuilt automatically)
- All API routes must use the login check in src/lib/auth.ts
- Test dangerous database changes on a copy first, never on the real database
- Run npm test before saving your work; do not save if tests fail

Short and specific. Every rule exists because something went wrong without it.

Where does real data live? The rules file tells AI how to work on your project. But your actual data (financial records, customer information, legal documents) lives in its own system: a database, a CRM, an accounting tool. The rules file is the lens; your data systems hold the facts.
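To see why a database answers new questions without new code, here is a minimal sketch of the expense example from earlier in this section. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in, not Neon or Postgres, and the table name, columns, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing saved to disk
conn.execute("CREATE TABLE expenses (spent_on TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("2024-01-14", "food", 120.0),
        ("2024-03-03", "food", 80.0),
        ("2024-03-21", "food", 45.5),
        ("2024-03-25", "transport", 30.0),
        ("2024-05-02", "food", 60.0),
    ],
)

def total(category, month):
    """Question 1: spending for one category in one month (YYYY-MM)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM expenses "
        "WHERE category = ? AND strftime('%Y-%m', spent_on) = ?",
        (category, month),
    ).fetchone()
    return row[0]

food_march = total("food", "2024-03")

# Question 2: a different question, answered by a different query, same data.
food_by_quarter = dict(
    conn.execute(
        "SELECT 'Q' || ((CAST(strftime('%m', spent_on) AS INTEGER) + 2) / 3) AS quarter, "
        "SUM(amount) FROM expenses WHERE category = 'food' GROUP BY quarter"
    ).fetchall()
)

print(food_march)
print(food_by_quarter)
```

Neither question required changing how the data is stored; only the query changed.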

Hands-on: Hello world

The best way to see why rules files matter is to do the same task twice: once without a rules file, then again with one. You will see how much better the second run is. This exercise uses five sample resumes for a hiring task.

Setup (30 seconds):

  1. Download Pack 6 — Hiring loop persistence and unzip it. Inside you will find a job description, scoring guidelines, five sample resumes in inbound/, and a reference rules file. Do not open the reference rules file yet. It is the answer key for the end of the exercise.
  2. Open the unzipped folder in your tool.

Run A, paste this prompt verbatim:

Read every résumé in inbound/. For each candidate produce a short
recommendation: ADVANCE, HOLD, or DECLINE, with a one-sentence
rationale. Save to inbound-screen-runA.md.

Now create the rules file, paste this prompt verbatim:

Read this folder. Draft a CLAUDE.md (under 250 words) covering what
this folder is, where things live, the hiring conventions, and three
to five critical decision rules, especially around credential
verification and required-vs-preferred gaps.

Edit the draft if anything looks off. Save it as CLAUDE.md at the folder root.

Run B, paste the same screening prompt again. The only change is the output file name:

Read every résumé in inbound/. For each candidate produce a short
recommendation: ADVANCE, HOLD, or DECLINE, with a one-sentence
rationale. Save to inbound-screen-runB.md.

What you should see. In Run A, AI reviews all five resumes and gives reasonable recommendations. Most candidates get fair evaluations. Carlos in particular will probably get an ADVANCE because of his MBA and job titles. Then the rules-file step creates a short CLAUDE.md in your folder with things like: where resumes are stored, the difference between required and preferred qualifications, and (watch for this one) a rule about verifying credentials.

In Run B, AI reads that rules file automatically before starting. You did not remind it. The results come out slightly different. Pay attention to Carlos.

What to notice. Open both result files side by side. Carlos's MBA says 2018, but the school it is from did not exist until 2019. In Run A, AI missed this and recommended ADVANCE based on his impressive titles. In Run B, the credential-verification rule caught it, and Carlos moved to HOLD with a note about the date problem.

The key insight: You did not mention credentials in your Run B prompt. The rule worked because it was in a file that AI read on its own. That is what a rules file does: rules you write once get applied automatically, to every candidate, on every future run, by anyone who opens this folder. The conversation is where you figure out the rule. The rules file is where it lives permanently.

If Carlos got the same result in both runs: Two things to check. First, make sure the CLAUDE.md file is in the main folder (not a subfolder) and restart the session. Second, your draft rules file might not include a credential-verification rule. Open the reference rules file from the pack, compare it to yours, add what is missing, and try Run B again. The point is not getting the draft perfect on the first try. The point is seeing that whatever is in the rules file, AI follows. Whatever is not, AI forgets.

Now apply to your own work

The practice exercise was a test. Now try this on a real folder from your own work.

Pick a folder you keep coming back to. A project you work on every week where you keep re-explaining the same things to AI: "this folder is for client X, these files should not be edited, always use this format." Pick one you will open again within a week so you can test whether the rules file works on the second visit.

Let AI write the first draft. Do not write the rules file from memory. Open the folder and paste:

Read this folder. Draft a CLAUDE.md (or AGENTS.md) under 250 words:
what this project is, where things are stored, 3-5 rules I would
normally have to explain manually, and 3 rules that would cause real
problems if you got them wrong.

Then edit the draft. Remove anything generic like "be professional" or "write clearly." If a rule would be true for any project, delete it. Keep only rules that are specific to this folder.

Common mistake: making it too long. You will be tempted to explain the whole project: what it is for, who is on the team, the full history. Do not do that. AI already understands English. It only needs the rules that are specific to your project. If it is over 500 words after editing, it is too long.
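
A quick way to hold yourself to that limit is to count the words before you save the file. The sketch below is a minimal example, assuming the rules file is plain text; the function names are illustrative, and the 500-word default simply mirrors the threshold above.

```python
MAX_WORDS = 500  # the "too long" threshold from the guideline above


def count_words(text: str) -> int:
    """Count whitespace-separated words in the rules text."""
    return len(text.split())


def is_too_long(text: str, limit: int = MAX_WORDS) -> bool:
    """Return True when the rules text exceeds the word limit."""
    return count_words(text) > limit
```

To use it, read your rules file into a string (for example with `pathlib.Path("CLAUDE.md").read_text()`) and pass it to `is_too_long`.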

Test it. Do a task you have done before in this folder, but this time do not re-explain any rules. Just let AI read the rules file. Notice which rules AI followed on its own and which you still had to repeat. Anything you had to repeat is a line your rules file is missing. Add it. Try again next week.

Why this matters. Rules you type in a conversation only work for that one session. Rules saved to a file work every session, for every person who opens that folder. Write them once. AI reads them automatically from then on.


Principles 1 through 5 are the core skills: take action, use structure, verify results, work in small steps, and save important information to files. The next two principles (Constraints and Observability) are different. They do not add new skills. They make the first five reliable at scale, so you can walk away while AI works and trust the result without checking everything by hand.


Principle 6 — Constraints and Safety

The failure mode: "Why did the agent touch files I didn't authorize?"

Limits are not a problem. They are what make AI safe to use. If AI can do anything without asking, you have to watch it every second. If AI can only access specific folders and must ask before doing certain things, you can walk away and let it work. Setting limits does not slow AI down. It lets you trust AI enough to give it more freedom.

The real danger of giving AI full access is not that it works slowly. It is that it works fast in the wrong direction: editing files you did not want touched, sharing data you meant to keep private, or connecting to services you did not approve.

The three universal trust levers

All four tools have the same three levers:

  1. Scope: what files, folders, and data the agent can see.
  2. Connections: what external services the agent can reach.
  3. Approvals: when the agent pauses for your OK.

| Lever | Claude Code | OpenCode | Cowork | OpenWork |
|---|---|---|---|---|
| Scope | AI works in the current folder (the folder you opened it in) | Same as Claude Code | You choose which folder AI can access | Per-project workspace; folder picker on create |
| Connections | External services (GitHub, databases, Slack) added in config files | Same as Claude Code, in opencode.json | Go to Customize > Connectors to add services; each one asks you to log in separately | Extensions tab; tap to connect |
| Approvals | Per-tool allow/deny lists; Shift+Tab for plan mode | Per-tool permissions; Tab for Plan agent | Per-action approval cards; "Act without asking" toggle | Stack allow always per permission |

The autonomy ladder

Figure 2: The autonomy ladder, with five rungs: Watching closely → Ambient supervision → Walk away → Act without asking → Scheduled. Climb deliberately as the track record builds; step back down when a task type changes.

This image shows 5 levels of how much freedom you give AI, from least to most:

  1. Watching closely. You approve every single action AI takes. Always start here with any new type of task.
  2. Ambient supervision. You have done this task a few times and it went well. You let AI work but check in every few minutes instead of watching every step.
  3. Walk away. You trust AI with this task. Start it, go do other things, come back when it is done.
  4. Act without asking. AI works without pausing for your approval. Only use this for tasks you have done many times with no problems, and only on folders and services you have already approved.
  5. Scheduled / automated. AI runs this task on its own, on a timer, with no human involved. Only use this for tasks you already trust at the "walk away" level.

The rule that prevents most accidents: If you would not trust AI to do this task while you walk away, do not schedule it to run on its own. Automation makes everything faster, including mistakes.
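
The ladder and the scheduling rule can be written down as a small check. This is only a sketch of the rule as stated above; the names `Rung`, `may_schedule`, and `rung_for` are hypothetical and do not correspond to a setting in any of the four tools.

```python
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs of the autonomy ladder, least to most freedom."""
    WATCHING_CLOSELY = 1
    AMBIENT_SUPERVISION = 2
    WALK_AWAY = 3
    ACT_WITHOUT_ASKING = 4
    SCHEDULED = 5


def may_schedule(trusted_rung: Rung) -> bool:
    """A task may be scheduled only if it is already trusted at walk-away or above."""
    return trusted_rung >= Rung.WALK_AWAY


def rung_for(task_type_is_new: bool, trusted_rung: Rung) -> Rung:
    """Start at the first rung for any new type of task; otherwise keep the earned rung."""
    return Rung.WATCHING_CLOSELY if task_type_is_new else trusted_rung
```

The point of writing it this way is that trust is tracked per task type, not per tool: a new kind of task always starts at the bottom, whatever rung your other tasks have reached.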

The prompt-injection trap (hidden instructions in documents)

If AI reads a file from outside your project (an email someone sent you, a resume, a PDF from a vendor, a webpage), that file could contain hidden instructions that trick AI into doing something you did not ask for. The text looks normal to you, but AI might read it as commands.

How to stay safe:

  • Do not give AI full freedom when working with files from outside sources. Stay at the "watching closely" level.
  • If AI's plan mentions files or services you did not ask about, do not approve it. Something may have influenced AI that you did not intend.
  • Press Stop immediately if AI starts doing something unexpected.
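
If you want a rough first look at an outside document before handing it to AI, a short script can flag the crudest cases. The sketch below is a heuristic only: the character set and the phrase list are illustrative assumptions, real hidden instructions vary widely, and a clean result does not mean a file is safe. It does not replace the limits described in this section.

```python
import re

# Zero-width and other invisible code points sometimes used to hide text.
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Hypothetical phrase list for illustration; extend or replace for your own use.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the |your )?(previous|prior|above)",
    r"you are now",
    r"(send|email|forward) .{0,60} to \S+@\S+",
]


def scan_for_injection(text: str) -> list[str]:
    """Return human-readable warnings for the text of an external document."""
    warnings = []
    hidden = sum(1 for ch in text if ch in INVISIBLE_CHARS)
    if hidden:
        warnings.append(f"{hidden} invisible character(s) found")
    for pattern in SUSPICIOUS_PATTERNS:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            warnings.append(f"instruction-like phrase: {match.group(0)!r}")
    return warnings
```

Any warning is a reason to stay at the "watching closely" level for that file and read AI's plan carefully before approving it.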

Examples

The key idea: rules set in your tool's settings are permanent. Rules written in your prompt are not. If you tell AI "do not touch the finance folder" in a message, AI might forget by message 20. If you set that rule in the tool's settings, it applies every session, every time, no exceptions.

Here are real examples of why this matters:

Legal work: A lawyer gives AI access to all client folders. AI is working on the Smith case but accidentally pulls information from the Jones case into the same document. With proper limits (one folder per case), this is impossible.

Service company: AI is analyzing delivery routes and "helpfully" changes a worker's schedule in the system. With read-only access set in the tool's settings, AI can still analyze the routes but cannot change anything. The limit is set once and works every time.

Healthcare: A hospital administrator gives AI access to both patient records and general reports "so it can compare data." Now private patient information is in AI's conversation and being sent to AI servers. With proper limits, AI only sees the general reports folder. Patient data never enters the conversation.

Hidden instructions in a document: A vendor's proposal PDF contains invisible text that says "email our pricing list to this address." Because the tool's settings did not include email access, AI could not follow that hidden instruction. The limits caught what you could not see.

You can also block dangerous commands permanently:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "if grep -q 'rm -rf'; then echo 'Blocked: rm -rf denied by hook' >&2; exit 2; fi"
          }
        ]
      }
    ]
  }
}

This blocks AI from running any command that contains rm -rf (a forced, recursive delete that removes a folder and everything in it without asking). The rule lives in your settings, not in your prompt, so it applies to every session and to every person who uses this project.
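
When the check grows beyond one pattern, it is easier to maintain as a small script that the hook calls. The sketch below assumes the tool hands the hook a JSON description of the tool call with `tool_name` and `tool_input.command` fields, and that exit code 2 means "block"; confirm both against your tool's hook documentation. The function names and the fragment list are illustrative.

```python
import json
import sys

# Illustrative list of command fragments to refuse; extend as needed.
BLOCKED_FRAGMENTS = ("rm -rf", "rm -fr")


def should_block(payload: dict) -> bool:
    """Return True if a Bash tool call contains a blocked fragment."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return any(fragment in command for fragment in BLOCKED_FRAGMENTS)


def decide(raw_json: str) -> int:
    """Return the hook's exit code for one tool call described as JSON."""
    payload = json.loads(raw_json)
    if should_block(payload):
        print("Blocked: rm -rf denied by hook", file=sys.stderr)
        return 2  # assumed convention: exit code 2 tells the tool to block
    return 0
```

To run it as a hook script, end the file with `sys.exit(decide(sys.stdin.read()))` and point the hook's command at the script.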

Hands-on: Hello world

The best way to understand limits is to set one up and then watch AI run into it. This exercise reuses the same Downloads folder from Principle 1, but this time you add safety rules before AI starts working.

Setup (90 seconds):

  1. If you don't already have it, download Pack 1: Cluttered folder and unzip it. (Same pack as Principle 1; you are reusing the inputs with a different setup.)
  2. Open your tool's permission config and tighten it before the agent runs anything:
    • Claude Code: open .claude/settings.json at the pack root (create it if missing). Add a permissions block that denies writes everywhere and allows reads only inside downloads/. Minimum shape: {"permissions": {"allow": ["Read(./downloads/**)", "Bash(ls:*)", "Bash(find ./downloads/**:*)"], "deny": ["Edit", "Write", "Bash(rm:*)"]}}. Save it.
    • OpenCode: open opencode.json at the pack root and set a similar per-tool permission map: allow read on downloads/, and deny edit, write, and bash outside it.
    • Cowork / OpenWork: in the folder grants UI, grant access to only the unzipped pack folder, and inside it, only downloads/. Set the approval mode to "ask before every action", not "act without asking."
  3. Open the pack folder in your tool. Confirm the permission config loaded (Claude Code prints it on startup; Cowork shows the granted folder in the side panel).

Paste this prompt verbatim:

Read ./downloads/ and write me an ORGANIZATION-PLAN.md with what's
in there, the duplicates, and a proposed structure. Don't move
anything.

What you should see. AI reads the downloads/ folder normally (your rules allow that). Then watch what happens next. When AI tries to save a file outside the allowed folder, or tries to delete or move something, the rules you set up will block it. You will see a "permission denied" or "blocked" message. AI may then adjust its plan and save the file somewhere you allowed instead.

By the end, you should have: your ORGANIZATION-PLAN.md saved in an allowed location, all original files untouched, and at least one moment where AI tried something and the rules stopped it.

What to notice. The important moment is when AI gets blocked. You did not type "stop" or "no, do not save there." The settings file you wrote before the session started did it automatically. Compare this to the Principle 1 exercise where AI had no limits and could do whatever it wanted. The difference is the settings file. Rules in your prompt work until you forget to type them. Rules in your settings work every time.

If nothing got blocked: Either the settings file did not load (check that it is in the right folder and restart the session), or your rules were too broad and allowed everything. Tighten the rules and try again.
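
To check whether your rules are too broad, you can inspect the permissions block directly. The sketch below assumes the Claude Code shape shown in the setup step (a `permissions` object with `allow` and `deny` lists); the checks themselves are illustrative heuristics for this exercise, not a complete validator.

```python
# Deny rules the exercise expects, taken from the minimum shape in the setup step.
REQUIRED_DENIES = {"Edit", "Write"}


def audit_permissions(settings: dict) -> list[str]:
    """Return a list of problems found in a permissions block."""
    perms = settings.get("permissions", {})
    allow = perms.get("allow", [])
    deny = set(perms.get("deny", []))
    problems = []
    for rule in sorted(REQUIRED_DENIES - deny):
        problems.append(f"missing deny rule: {rule}")
    for rule in allow:
        # A bare tool name (no parentheses) allows that tool everywhere.
        if "(" not in rule:
            problems.append(f"allow rule has no scope: {rule}")
        elif rule.startswith("Read(") and "downloads" not in rule:
            problems.append(f"read allowed outside downloads/: {rule}")
    return problems
```

Load the file with `json.loads(pathlib.Path(".claude/settings.json").read_text())` and pass the result in. An empty list means the block matches what the exercise expects; it does not prove the file was loaded, so still restart the session.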

Now apply to your own work

In the exercise, you set up rules before starting. In real life, the harder job is checking the rules you already have. Most people set up access months ago and never looked at it again.

Check what AI can currently access. Pick one tool you use regularly. List every folder AI has access to, every service it is connected to, and whether it can read only or also write. Most people find at least one surprise: a folder they granted for a one-time task and never removed, or a service with write access when read-only would have been enough. If you do not actively need something this week, remove it.

Move repeated rules from your prompts to your settings. How many times in your last few sessions did you type "do not change anything in this folder" or "read only, please"? Each of those rules only works for that one session. The day you forget to type it, something goes wrong. Move your most-repeated rule into the settings file. It takes five minutes and works permanently.

Be honest about your trust level. For each type of task you do regularly, ask: which level of the autonomy ladder am I actually at? Not which level do I want to be at. If you are not confident, step down a level. There is no reward for moving up too fast.

Set a monthly reminder to clean up access. Every new project adds a folder. Every new tool adds a connection. If you only add and never remove, your permissions grow wider every month. Set 15 minutes once a month to remove what you no longer need.

Why this matters. This is the principle whose failures end up in the news: AI sent an email it should not have, AI changed real data during a "read-only" task, AI followed hidden instructions in a document. The fix is not exciting. It is settings, regular check-ups, and removing what you do not need. AI works as fast as its permissions allow. Your job is to make sure those permissions match the work you actually do.


You have set limits on what AI can do. But limits only catch problems you thought of in advance. Problems you did not think of show up in the logs, if you are watching them. If you are not watching, you find out at the worst possible time. That is what Principle 7 fixes.


Principle 7 — Observability

The failure mode: "Why don't I know what the agent actually did?"

You can only direct what you can see. Every meaningful action the agent takes should be visible to you in close to real time. When something goes wrong, you should be able to look at a log and understand exactly what happened. Observability is how you debug a drifted session, how you build the track record to climb the autonomy ladder, and how you trust the agent's output enough to use it.

Where to see what the agent is doing in each tool

| | Claude Code | OpenCode | Cowork | OpenWork |
|---|---|---|---|---|
| Real-time view | Terminal streams every action: tool calls, file edits, command output | Same | Three-panel UI: conversation left, execution view center, file tracker right | Same; rendered as a vertical timeline of step chevrons |
| Plan stage | Plan mode shows the plan before any action; written to disk if you ask | Plan agent does the same | Numbered plan appears as a message before any file is touched | Same |
| Per-step trace | Every command and file edit appears inline with output | Same | Each step is its own card: "Read a file", "Used a tool", "Ran code" | Same |
| Session export | /share exports the full session transcript | Same | Conversation history is browsable; can export | Same |

The discipline: watch the execution view at least once on every novel task. The single biggest source of "agent did something I didn't expect" is the user not having looked.

Examples

Across every domain, engineering included, the pattern is the same: the user who scans the execution view catches the thing the artifact alone would have hidden. The catch isn't smarter analysis; it's having looked at all.

Field operations, fleet-routing batch: A logistics coordinator kicks off a route-optimization run across 200 deliveries and steps into a stand-up. Halfway through, the agent shifts from "optimize routes" to "optimize routes and notify drivers of the new ETAs", because one customer's address-notes field contained a prompt-injection instruction. Forty-seven driver pings go out before she returns. What would have caught it: watching the execution view for the first 10 deliveries. The shift would have shown up on delivery 4 or 5.

Lawyer, per-step review on outbound communications: A defense attorney asks the agent to draft responses to seven discovery requests. She reads every per-step approval card. On response #4, the agent proposes including a document incorrectly tagged "non-privileged" in the file system. She catches it before it ships. Without per-step approvals, the document goes out and waiving privilege becomes a serious problem.

Controller, the unexpected GL touch: A controller runs a "compile the close commentary" task at the walk-away rung. On return, she scans the execution view as a habit. One step shows the agent opening GL-detail-March.xlsx, but also opening payroll-confidential.xlsx, which it had no reason to need for commentary. Investigation: a stale folder reference in AGENTS.md had widened the scope by one folder a month ago and never been cleaned up. The agent did nothing wrong by its lights; the controller's habit of scanning the execution view caught a constraint drift that had been there for weeks.

Prompt pattern to promote observability:

"After each step, before moving on, state in one line:
(a) what you just did
(b) what changed (file path, command output, connector call)
(c) what's next
Don't skip this even on small steps."

The silent agent. Monday morning. Ali's competitor-tracker shows systemctl status: active (running), green light. But the daily report never arrived. The dashboard shows no new data since Friday. Investigation: "Waiting for database connection..." repeated every 30 seconds since Friday at 11pm. A firewall rule change during maintenance had blocked the database port. The agent was running but doing nothing. A 10-second check (telnet db-host 5432) would have caught it. Instead: three days of missing data before a board meeting.

The cascading failure. Three alerts arrive simultaneously: three different error messages, three different agents down. One root cause: df -h shows the disk is 100% full. The disk filled, and three agents broke in three different ways. This is a case for running the LNPS triage method (Logs → Network → Process → System) starting at System: without that system-level check first, you would debug three failures in parallel for an hour and miss the one cause sitting in df -h.
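
Both stories come down to two cheap checks: can the agent reach its database, and is the disk full? The sketch below shows one way to script them, roughly equivalent to `telnet db-host 5432` and `df -h`. The function names and the 90% threshold are assumptions chosen for illustration.

```python
import shutil
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def disk_percent_used(path: str = "/") -> float:
    """Return the percentage of space used on the volume that holds path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def system_first_triage(db_host: str, db_port: int, disk_limit: float = 90.0) -> list[str]:
    """Check the system level first, then the network, and return findings."""
    findings = []
    used = disk_percent_used()
    if used >= disk_limit:
        findings.append(f"disk {used:.0f}% full")
    if not port_is_open(db_host, db_port):
        findings.append(f"cannot reach {db_host}:{db_port}")
    return findings
```

Run on a schedule, a check like this turns "active (running)" into a claim you can verify: a service can be running and still be unable to do its job.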

The five symptoms of a session going off the rails

Figure: The five symptoms of a session going off the rails. Stop typing. Reset. Continue from a file.

  1. The agent starts referencing earlier parts of the chat that have nothing to do with the current task.
  2. Its responses get longer and vaguer, with more hedging.
  3. It contradicts a constraint you stated several turns ago.
  4. It starts apologizing repeatedly without making progress.
  5. It proposes touching files, folders, or connectors you didn't mention.

When you see any of these, stop typing. Don't try to fix it with another prompt; that only adds more context to a conversation that is already tangled. Run /clear (Claude Code / OpenCode) or open a new session (Cowork / OpenWork), paste in the one or two facts that actually matter, and continue from there. The reset is almost always faster than the rescue.

Hands-on: Hello world

Observability is the principle that hides in plain sight: you have technically been seeing the trace throughout this crash course, but you haven't been watching it. This exercise returns to Pack 1 a third time for exactly that reason. Same task, fresh attention: your job is to spot one thing the agent did that you didn't predict.

Setup (30 seconds):

  1. If you don't already have it, download Pack 1: Cluttered folder and unzip it. (Yes, again. This is the third use of the same pack: the inputs are stable, and what you learn each time is different.)
  2. Open the pack folder in your tool. Position the execution view (Cowork side panel, or your terminal scrollback) so you can see every step as it happens, not just scroll through it after.

Paste this prompt verbatim:

Read ./downloads/ and write me an ORGANIZATION-PLAN.md with what's
in there, duplicates, and a proposed structure. As you go, narrate
each step in one line: what you opened, what you looked at, what
you concluded. Don't skip steps, even small ones.

What you should see. The execution view fills with a sequence of small steps, each carrying the one-line narration you asked for: an ls or read of downloads/ (53 items), an open of SIZES.txt (because the stubs are empty), then a string of individual file reads or batched directory reads. Each step lands a short "I opened X; I looked at Y; I concluded Z" line. That is the narration instruction at work. After a minute or two, ORGANIZATION-PLAN.md lands. The artifact may be the same one you saw under Principle 1; what is different is the trace that produced it.

Skim the trace top to bottom. Don't just check the artifact; you have seen it twice already. Read the steps that produced it. Note one step that surprised you: a file you didn't expect the agent to open, a step that took longer than expected, a duplicate read, a tool call you didn't know it had, or an inference it made on its own that wasn't in your prompt.

The principle moment. Write the one surprise down. Not in your head: on paper or on a sticky note. That single observation is the principle. If you had run this task at the walk-away rung, checked the artifact, and moved on with your day, that surprise would have stayed invisible to you, and the next twenty runs of similar tasks would have inherited whatever assumption the surprise revealed. By watching once, in full, with verbose narration, you are not just verifying this run. You are calibrating your model of what this kind of task actually involves, and that calibration is the only thing that makes "walk away" a safe rung to climb to.

Compare this to the Principle 1 run of the same prompt: there, the artifact was the lesson. Here, the artifact is a side effect; the lesson is the trace. You can only direct what you can see. The execution view is the seeing.

If it didn't work that way: the agent probably skipped the narration and just produced the plan file. Two things to try. First, ask: "For each step you just took, state in one line what you did, what changed, and what's next." The agent will reconstruct the trace after the fact. That is useful, but not as good as live narration, because it is now a story the agent is telling about itself. Second, on the next run, put the narration instruction first in the prompt; agents tend to follow instructions at the start of a prompt more reliably than trailing ones. The point of the exercise isn't pretty narration; it's having something concrete to look at, step by step, while the work is happening.

Now apply to your own work

The pack run was deliberately boring because boring is what novel-task observation should feel like the first time. The harder version is the task you're already running at walk-away, where sitting through it feels like a step backward, until it isn't.

Pick the task you walk away from. A recurring task you've already been running at the walk-away rung: weekly competitor scan, morning email triage, nightly report rebuild. Today, don't walk away. Sit through the entire run from start to finish. Yes, tedious. Observability is a one-time cost you pay to make subsequent walk-aways safe.

Take notes the way a flight observer takes notes. Three columns: the step the agent took, whether you expected it, and whether anything about it surprised you. Most rows will be boring: the agent opened the expected file and did the expected thing. The valuable rows are the surprises. Those are the things your assumptions had been wrong about, invisibly, on every prior run.

Calibrate. For each surprise: should this change the task (the agent's doing unneeded work), the constraints (touching things you didn't intend), or your expectations (the task is more complex than you thought)? Address it. Now you're allowed back at walk-away, and you know what the trace should look like, so you'll spot deviation in seconds instead of after damage ships.

Make it a habit on novel work. Watch once before promoting any new task to walk-away. Once is enough. Familiar tasks earned walk-away by surviving their watch-once; novel tasks still have to earn it. The user who climbs straight to walk-away on a task they've never watched is the user who learns something went wrong from a colleague, a customer, or a regulator, not from the trace.

The single failure. Skipping the watch-once because the task "looks like" a task you've calibrated. Lead enrichment, contract review, report rebuild: these are categories, not tasks. A new prompt, folder, or connector turns yesterday's familiar task into today's novel one, and the agent's specific path through the trace will change with it. When in doubt, watch once.

Why this matters. Principles 1 through 6 are about doing the work right. Principle 7 is about knowing whether you did it right, in close to real time, before the cost of being wrong compounds. Without it, the other six are claims you can't verify. The execution view is where the agent's plan, the constraints, the verification, and the actual behavior collide in front of your eyes. Watch the view. Read the trace. Trust the artifact only after the trace has earned it.

Part 2: The Four-Phase Workflow

The seven principles, in production, collapse into a four-phase loop. Once the loop is in your hands, the principles fire automatically inside the phases.

Figure 3: Seven principles, four phases, one loop (Explore → Plan → Implement → Commit). P6 (Constraints) wraps the entire loop; it isn't a phase, it's the boundary every phase runs inside of.

  1. Explore (Bash + Observability): read the relevant files, surface the unknowns. Read-only. No writes yet.
  2. Plan (Code-as-Interface + Persistence): produce a written plan as a structured artifact. Save it. Review it. Edit it. This is the most important phase; almost all the leverage is here.
  3. Implement (Decomposition + Verification): execute the plan in small atomic steps, verify after each, commit/save after each.
  4. Commit (Observability): final verification pass, persist decisions back to the rules file for next time.

Wrapping all four phases: Constraints (P6). The scope of folders the agent can see, the connector list it can call, the approval mode it runs in: these are set at session start (or in the tool's config) and they govern every phase. Read-only Plan Mode during Explore is P6 firing on phase 1. The deny rule that blocks a write outside downloads/ during Implement is P6 firing on phase 3. The final approval card before commit is P6 firing on phase 4. P6 isn't one box in the diagram; it's the box around the diagram.

The shape is the same whether the artifact at the end is a merged pull request, a redlined master services agreement, a closed quarterly variance pack, or a hiring-loop debrief. The phases don't change; only the inputs and outputs do. That is what makes the loop portable across domains.

The five failure patterns

When something goes wrong inside the loop, it almost always lands in one of five named patterns. Recognizing the pattern tells you which principle to reach for.

Figure: The five failure patterns, each mapped to the principle that prevents it.

| # | Pattern | Symptom | Principle that prevents it |
|---|---|---|---|
| 1 | The Drift | Agent gradually wanders from the brief | Persistence (P5): write the brief to a file |
| 2 | The Confident Wrong | Plausible output that's quietly incorrect | Verification (P3): force a check step |
| 3 | The Big Bang | One huge change nukes hours of work | Decomposition (P4): small reversible units |
| 4 | The Scope Creep | Agent touches things you didn't authorize | Constraints (P6): scope + approvals |
| 5 | The Black Box | Agent ran for 20 minutes; you have no idea what it did | Observability (P7): watch the execution view |

Read the table in both directions: each principle prevents its pattern; when a pattern shows up, reach for the principle in the right column. After a few weeks of real use, the naming becomes diagnostic shorthand: "that was a Confident Wrong" tells a teammate exactly which verification step was missing without anyone having to relitigate the run.


Part 3: A Worked Example

The principles and the four-phase loop are theory until you've run them once, end to end, on a real-looking input. This is the section where you do that.

The task family: review a complex incoming artifact, identify what matters, produce a structured response with verified claims.

  • Engineer track: A pull request has arrived from a contractor. Review the diff, flag risks, write a response.
  • Domain-expert track: A vendor has sent a master services agreement. Flag deviations from your firm's redline standard, produce a comparison memo.

Different domains. Identical workflow shape. Read the track that matches your work; you can skim the other one to feel the symmetry.

Hands-on: Hello world

The four-phase loop is theory until you've run it once with no thinking. This is your hello-world for the whole loop: pre-curated inputs (a vendor MSA on the domain side, a small PR on the engineering side) and exact prompts below for each of the four phases. Paste one, watch it land, paste the next.

Setup (60 seconds):

  1. Download Pack 4 — Worked example and unzip it. Inside you'll find inbound/vendor-msa-v1.md, redline-standard.md, and a CLAUDE.md with folder-level rules the agent will pick up automatically.
  2. Open the unzipped folder in your tool (Claude Code or OpenCode for the engineer track, Cowork or OpenWork for the domain-expert track).

Paste each phase prompt verbatim, in order. Wait for the artifact each one promises before pasting the next.

Phase 1, Explore (Principles 1 and 7). Read-only. The agent's job is to understand the input, not to act on it yet.

Claude Code / OpenCode:

Don't make any edits yet. Read the PR diff in `git diff main...feature-x`.
Read the related files the diff touches. Summarize:
- What this PR is changing (one paragraph)
- Which files are touched (list)
- Any obvious risks (bullets, max 5)
Save the summary to `reviews/pr-explore.md`. No code edits.

Cowork / OpenWork:

Don't draft anything yet. Read inbound/vendor-msa-v1.md and
redline-standard.md. Summarize:
- What this MSA is for (one paragraph)
- The clause structure (numbered outline by section)
- Any obvious deviations from our standard (bullets, max 7)
Save to vendor-msa-explore.md. No drafting yet.

Phase 2, Plan (Principles 2 and 5). Structured artifact. Save it before you let any work happen against it.

Engineer:

Read `reviews/pr-explore.md`. Produce a review plan:
## Review plan
- Files to inspect in depth (max 5)
- Tests to run
- Concerns to flag (numbered, severity: HIGH / MED / LOW)
- Questions for the contractor (numbered)
Save to `reviews/pr-plan.md`. Pause for my approval before continuing.

Domain expert:

Read vendor-msa-explore.md. Produce a redline plan:
## Redline plan
- Clauses to review in depth (max 6, by section number)
- Deviations to flag (numbered, severity: HIGH / MED / LOW)
- Counter-proposals (numbered, parallel to deviations)
- Open questions for the vendor (max 3)
Save to msa-plan.md. Pause for my approval before continuing.

Phase 3, Implement (Principles 4 and 3). One item at a time, every claim grounded, every step a separate file.

Both tracks:

Execute the plan one item at a time. After each item:
1. Produce the output
2. Verify it against the source, quote the specific lines
supporting each claim (section cite for the MSA; file:line
for the PR)
3. Save a numbered version (e.g., step3.md)
4. Wait for my OK before the next item.
If you can't ground a claim, flag it instead of fabricating.

Phase 4, Commit (Principles 6 and 7). Final verification, then assemble.

Both tracks:

Final verification pass:
- Every cited claim is grounded in a source location
- The structure matches the plan
- The tone matches the project's voice (refer to CLAUDE.md / AGENTS.md)
Then assemble the final deliverable with: executive summary,
the numbered findings, a review checklist, and a "Rules-file
proposals" section listing anything we learned that belongs in
CLAUDE.md / AGENTS.md for next time.

What you should see. Each phase lands its own file: *-explore.md, *-plan.md, numbered step1.md/step2.md/... files, then *-final.md. The plan is the audit trail; the numbered steps are the work; the final file is what ships. Four prompts, four files, four pauses, every claim groundable to a source. The same task in one prompt ("review this MSA / PR and tell me what's wrong") gives you a single block of plausible text with no checkpoint where you could have intervened. Slower in clock time on the first run; faster in trust-time forever after.

If it didn't work that way: the agent collapsed two phases into one (drafted the plan and started implementing in a single response), or it produced findings without quotes. For the first, paste: "Stop. Save the plan as a file. Wait for my approval before any implementation." For the second: "For each finding, quote the exact lines from the source. If you can't quote them, flag the finding as unverified." Both corrections are themselves applications of the principles: P4 (decomposition) and P3 (verification), respectively.
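
You can also spot-check the quotes yourself instead of taking the agent's word for them. The sketch below is a minimal example that tests whether each quoted passage really appears in the source text; it only normalizes whitespace, so a paraphrase or a partial quote with changed wording counts as unverified. The function names are illustrative.

```python
def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false misses."""
    return " ".join(text.split())


def check_quotes(source: str, quotes: list[str]) -> dict[str, bool]:
    """Map each quote to True if it appears verbatim in the source."""
    haystack = normalize(source)
    return {quote: normalize(quote) in haystack for quote in quotes}


def unverified(source: str, quotes: list[str]) -> list[str]:
    """Return the quotes that could not be found in the source."""
    return [q for q, found in check_quotes(source, quotes).items() if not found]
```

Paste the source file's text and the quotes from a step file into `unverified`. Anything it returns is a finding to send back to the agent with the "flag it as unverified" instruction above.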

The four prompts are essentially identical across all four tools. What differs: terminal vs. desktop app, the file where permissions live, the keyboard shortcut for plan mode. Not the principles.

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Where you run it | Terminal | Terminal | Cowork desktop app | OpenWork desktop app |
| File access | cwd; permissions in .claude/settings.json | cwd; permissions in opencode.json | "Choose folder" card on first read | Workspace folder selected on session start |
| Plan mode | Shift+Tab to enter | Tab to Plan agent | Built-in plan stage; visible in execution view | Same as Cowork |
| Per-step approvals | Configurable allow/deny | Configurable per tool | Per-action approval cards | Stack allow always per permission |
| Where the plan lives | reviews/pr-plan.md (your file) | Same | Inline message + the file you save | Same as Cowork |
| Verification gate | A hook on the commit step | A plugin on the commit step | A second-pass prompt with rubric | Same as Cowork |

The principles you invoked are identical across all four tools. That's the whole point of teaching this layer separately from the tool-specific layer: the principles transfer.


Part 4: Capstone — Apply the Whole Loop to Your Own Work

The hello-world in Part 3 ran you through the four-phase loop on a curated example. This capstone is the open-ended version: same loop, your work, your stakes. It is the equivalent of every principle's "Now apply to your own work" subsection, except now you're applying all seven at once, through the four-phase shape.

Run a real task through all four phases while consciously naming which principle each step invokes. Once. Out loud or in writing. The naming is what wires the loop into long-term memory; you don't have to do it twice.

Setup:

  1. Pick a recurring task in your work that takes 60+ minutes: a privilege log batch (litigator), variance commentary cycle (accountant), campaign performance report (marketer), candidate brief for a hiring panel (HR), discovery-call synthesis (consultant), investor update (founder), code-review-and-merge cycle (engineer). The longer and more recurring, the better: the rules file you produce will pay you back on every future run.
  2. Open your tool. Set up the folder. Initialize a CLAUDE.md or AGENTS.md for it. Don't try to write a complete one up front; ten lines is enough to start, the rest gets earned during the run.

The run:

| Phase | What you do | Principle invoked |
| --- | --- | --- |
| 1. Explore | Prompt the agent to read relevant inputs and produce a structured summary file. No writes yet. | 1 (action), 7 (the file is the observable trace) |
| 2. Plan | Ask for a structured plan. Save it. Read it. Edit it. Approve it. | 2 (structured format), 5 (saved to file) |
| 3. Implement | Execute one step at a time, verification check after each. | 4 (decomposition), 3 (verification) |
| 4. Commit | Final verification pass, summary, update the rules file with anything you learned. | 6 (review-before-ship), 7 (the summary log) |

Five questions to journal after:

  1. Total time vs. the manual baseline. (If you don't know the baseline, estimate before you start; the comparison is the calibration.)
  2. Which principle was hardest to apply? Why?
  3. What got added to the rules file?
  4. What constraint did you tighten?
  5. Which failure pattern (Drift / Confident Wrong / Big Bang / Scope Creep / Black Box) showed up?

The compounding step. Re-run the same task next week using the rules file you produced. The second run is usually 40–60% faster. The third run is where the rules file stops growing and the discipline becomes invisible. At that point you've crossed from learning the principles to using them, which is the threshold this whole crash course was aiming at.
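
To see where your own re-run lands, the arithmetic is one line. This sketch assumes "faster" means time saved as a share of the baseline run, which is one reading of the figure; the function name and the sample timings (90 minutes, then 45) are hypothetical.

```python
def percent_time_saved(baseline_minutes: float, rerun_minutes: float) -> float:
    """Time saved on the re-run, as a percentage of the baseline run."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return 100 * (baseline_minutes - rerun_minutes) / baseline_minutes


# Hypothetical timings: first run took 90 minutes, second run took 45.
print(percent_time_saved(90, 45))
# prints: 50.0
```

A negative result means the re-run was slower, which is worth a journal entry of its own.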

For teams. Have each person pick a task in their own domain. Compare notes afterward: the failure patterns are domain-independent and make for the best team conversation about what to standardize. The litigator's Drift and the accountant's Drift have the same fix, and watching the team realize that is worth more than any onboarding deck.


Part 5: How to Actually Get Good at This

Reading this crash course doesn't make you good at directing agents. Using it does. The hello-worlds got you through the front door of each principle; the capstone got you through the front door of the loop. Getting good is the next year of real work, on your real inputs, with the rules file growing one earned line at a time.

You start manual. You feel the friction: every plan you have to read, every approval prompt, every "wait, why does it want that file?" That friction is the curriculum. Each piece of friction maps to a principle:

  • "Why is the agent just chatting?" → P1. Rewrite the prompt as an action with an artifact.
  • "Why does the output keep being subtly wrong?" → P2. Constrain the format.
  • "Why did this confident answer turn out wrong?" → P3. Add a check step.
  • "Why did one prompt nuke half my work?" → P4. Break it up.
  • "Why does the agent keep asking me the same context?" → P5. Put it in the rules file.
  • "Why did the agent touch a folder I didn't mention?" → P6. Tighten scope.
  • "Why don't I know what the agent did?" → P7. Read the execution view.

Build the response to each friction when you hit it, not before. Your rules file should be ten lines, then twelve, then twenty, each line earned by a mistake it now prevents. A rules file written speculatively, before any mistakes, is documentation; a rules file grown line by line through real friction is memory, and only the second kind survives contact with the next session.
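
One way to keep the "earned" part honest is to record the mistake next to the rule. The sketch below is an illustration only: the entry format, the function name, and the sample rules and dates are hypothetical, and it works on the file's text so you can decide separately how to read and write your CLAUDE.md or AGENTS.md.

```python
from datetime import date


def add_earned_rule(rules_text: str, rule: str, mistake: str, when: date) -> str:
    """Append one rule with the mistake that earned it; skip a rule already present."""
    if any(line.startswith(f"- {rule} (earned") for line in rules_text.splitlines()):
        return rules_text
    entry = f"- {rule} (earned {when.isoformat()}: {mistake})"
    existing = rules_text.rstrip("\n")
    return (existing + "\n" if existing else "") + entry + "\n"


# Hypothetical rules file content and a newly earned rule.
rules = "# Rules\n- Quote the clause number for every finding (earned 2026-01-05: uncited claim)\n"
rules = add_earned_rule(
    rules,
    "Never edit files outside reviews/",
    "agent touched an unrelated folder",
    date(2026, 1, 12),
)
print(rules)
```

If you cannot fill in the `mistake` argument, that is a hint the rule is speculative rather than earned.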

The portability dividend. Once you've built this awareness in one tool, it transfers to all four. The principles-to-friction map is identical everywhere. The configs change. The principles don't.

You've completed this course if you can do all five with real work:

  1. Reframe a chatbot prompt as an agent task with an explicit artifact. (P1, P2)
  2. Write the output shape (schema, table, template) before asking for content. (P2)
  3. Name two independent verification paths for any output and invoke one before shipping. (P3)
  4. Decompose non-trivial work into atomic units with a checkpoint after each. (P4)
  5. Maintain a rules file earned line-by-line, and explain any session's behavior from its execution trace. (P5, P7)

Where This Leads Next

  • Build engineering depth → Part 2: Agent Workflow Primitives. Chapters 19–20 deepen P1 and P2. Chapters 21 and 21B take P5 from a rules file to a full system of record. Chapter 21A deepens P3 (reading SQL). Chapter 22 deepens P1 and P6. Chapter 23 deepens P4.
  • Deepen the principles → Chapter 18: The Seven Principles of General Agent Problem Solving. Same seven principles, more depth, 17 hands-on exercises across 8 modules, capstone projects, and the integration with Spec-Driven Development (Chapter 16) and Context Engineering (Chapter 15) that this crash course only gestures at.
  • Stay in Mode 1, get faster → re-run the capstone on three more recurring tasks. The principles become muscle memory through reps on real work, not more reading. The hello-world packs are reusable: go back to Packs 1, 2, 3, 5, and 6 whenever a principle feels rusty.
  • Expand your tool surface → pick up the other tool in your family (Claude Code ↔ OpenCode, or Cowork ↔ OpenWork) by re-reading the parallel column of your original tool-pair crash course. To cross families (engineer → Cowork, or domain expert → Claude Code), take the other 90-minute tool-pair crash course. The principles transfer immediately; you're only learning a new surface.
  • Move to Mode 2 — manufacturing engagements → when you've outgrown solving problems one-at-a-time and want AI Workers that solve a class of problems on a schedule, you're crossing into manufacturing. That branch is governed by the Seven Invariants of the Agent Factory, anchors to Claude Code or OpenCode regardless of your domain (because building a Worker is fundamentally a coding task, even when the Worker's domain is finance, marketing, or law), and starts at the Agent Factory Thesis plus Spec-Driven Development. (Re-read the thesis framing at the top of this crash course for the Mode 1 vs. Mode 2 split.)
  • Teach your team → the capstone in Part 4 runs well as a team exercise after each person has done it solo on their own task.

Quick Reference

The seven principles in one line each

The five doing-principles (what makes the work happen):

  1. Bash is the Key. Brief the hands, not the brain.
  2. Code as Universal Interface. Specify the shape; eliminate prose ambiguity.
  3. Verification as a Core Step. "Looks right" is the failure mode. Force a check.
  4. Small, Reversible Decomposition. Atomic units. Verify each. Commit each.
  5. Persisting State in Files. Conversation is volatile. Files are memory.

The two operating principles (what makes the discipline survive real projects):

  6. Constraints and Safety. Constraints enable autonomy; they don't limit it.
  7. Observability. You can only direct what you can see.

The four-phase workflow

EXPLORE   → read & summarize (read-only)
PLAN      → produce a structured plan, save it, review it
IMPLEMENT → small steps, verify each, commit each
COMMIT    → final verification, summary, update the rules file
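
The pauses between phases are the point of the workflow, and they can be pictured as a gate that refuses to advance without a human sign-off. This is a toy model, not how any of the four tools implements plan mode; the `Loop` class and its method names are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    EXPLORE = 1
    PLAN = 2
    IMPLEMENT = 3
    COMMIT = 4


class Loop:
    """Tracks the current phase and which phases a human has approved."""

    def __init__(self) -> None:
        self.phase = Phase.EXPLORE
        self.approved: set[Phase] = set()

    def approve(self) -> None:
        """Record that you reviewed the current phase's output."""
        self.approved.add(self.phase)

    def advance(self) -> Phase:
        """Move to the next phase, but only if the current one was approved."""
        if self.phase not in self.approved:
            raise RuntimeError(f"{self.phase.name} output has not been approved")
        if self.phase is Phase.COMMIT:
            raise RuntimeError("COMMIT is the last phase")
        self.phase = Phase(self.phase + 1)
        return self.phase


loop = Loop()
loop.approve()                # you read the explore summary
print(loop.advance().name)    # prints: PLAN
try:
    loop.advance()            # plan not yet reviewed
except RuntimeError as err:
    print(err)                # prints: PLAN output has not been approved
```

When an agent drafts the plan and starts implementing in one response, it has in effect called `advance()` twice with only one `approve()`.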

The five failure patterns

| Pattern | Reach for |
| --- | --- |
| The Drift (wanders from brief) | Persistence (P5) |
| The Confident Wrong (plausible but incorrect) | Verification (P3) |
| The Big Bang (one change nukes hours) | Decomposition (P4) |
| The Scope Creep (touches unauthorized things) | Constraints (P6) |
| The Black Box (no idea what happened) | Observability (P7) |

The autonomy ladder

Watching closely → Ambient supervision → Walk away → Act without asking → Scheduled

One rung per task type, with track record. Step back down when a task type changes.
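
The ladder rule can be sketched as a small tracker: each task type has its own rung, climbs one rung after a run of clean results, and steps down one rung when the task changes. The threshold of three clean runs and the reset-on-failure behavior are assumptions made for illustration, as are the class and method names; the source only fixes the rung order and the two rules above.

```python
RUNGS = [
    "Watching closely",
    "Ambient supervision",
    "Walk away",
    "Act without asking",
    "Scheduled",
]


class AutonomyLadder:
    """Per-task-type rung, promoted on track record (threshold is an assumption)."""

    def __init__(self, runs_to_promote: int = 3) -> None:
        self.runs_to_promote = runs_to_promote
        self.rung: dict[str, int] = {}    # task type -> index into RUNGS
        self.streak: dict[str, int] = {}  # task type -> clean runs at this rung

    def record_run(self, task_type: str, clean: bool) -> str:
        """Log one run; climb one rung after enough consecutive clean runs."""
        level = self.rung.get(task_type, 0)
        streak = self.streak.get(task_type, 0) + 1 if clean else 0
        if streak >= self.runs_to_promote and level < len(RUNGS) - 1:
            level, streak = level + 1, 0
        self.rung[task_type], self.streak[task_type] = level, streak
        return RUNGS[level]

    def task_changed(self, task_type: str) -> str:
        """Step back down one rung and restart the track record."""
        level = max(self.rung.get(task_type, 0) - 1, 0)
        self.rung[task_type], self.streak[task_type] = level, 0
        return RUNGS[level]


ladder = AutonomyLadder()
for _ in range(3):
    current = ladder.record_run("variance commentary", clean=True)
print(current)                                     # prints: Ambient supervision
print(ladder.task_changed("variance commentary"))  # prints: Watching closely
```

Keeping the rung per task type is what stops trust earned on one kind of work from leaking into another.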

Where the principles live in each tool

| Principle | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| 1. Bash | Terminal | Terminal | Local Linux VM | Local Linux VM |
| 2. Code-as-Interface | Code blocks, schemas | Code blocks, schemas | Templates, .xlsx schemas | Templates, .xlsx schemas |
| 3. Verification | Tests, hooks | Tests, plugins | Rubric pass, cross-model | Rubric pass, cross-model |
| 4. Decomposition | Git commits, Esc Esc | Git commits, /undo | Numbered versions | Numbered versions, /undo |
| 5. Persistence | CLAUDE.md | AGENTS.md (+ CLAUDE.md fallback) | CLAUDE.md in folder | AGENTS.md in folder |
| 6. Constraints | .claude/settings.json | opencode.json | Folder/connector/approval | Folder/connector/approval |
| 7. Observability | Terminal stream | Terminal stream | Execution view | Execution view timeline |

When something feels wrong

Agent apologizing without progress, rewriting the same thing,
contradicting earlier constraints, proposing scope you didn't ask for?
→ Context is poisoned. Stop typing. Reset and continue from a file.
Don't try to fix it with another prompt.

Last substantially revised: May 2026. Tool names, free-tier mechanics, and version-specific details are accurate as of that date.

