How to Think in the AI Era: Crash Course
6 Disciplines · 6 AI Failure Modes · One Rule
Two people open the same AI tool on Monday morning. Same task: should they spend their budget on hiring one experienced person, or use that same money to buy AI tools that help everyone on the team work faster? Both have access to Claude, ChatGPT, and Gemini. Both have one week to decide.
Person A finishes Friday with a clear recommendation she can explain. She wrote down which AI claims she agreed with, which ones she pushed back on, and what would make her change her mind. Person B finishes Friday with a polished document that mostly repeats what AI told her on Monday. When her boss asks "why did you recommend this?" she cannot explain her own reasoning. She just forwarded what sounded good.
Same tools. Same problem. Different outcomes.

The difference is thinking. Person A formed her own opinion before asking AI. Person B let AI's first answer become her opinion.
That gap is what this crash course closes. Six thinking habits, three short parts, no code. Each habit addresses a specific way AI misleads you when you let it do the thinking for you. Together they turn AI from a magic answer machine (you ask, it answers, you accept) into a thinking partner (you predict first, it answers, you compare, you decide).
Prerequisites. This page assumes you have finished AI Prompting in 2026. That course taught the mechanics: how to give AI context, how to use web search and deep research, how to work with images and audio, and how to use AI desktop apps. This course teaches the thinking discipline that makes those mechanics pay off. Open a free account with Claude, ChatGPT, or Gemini in another tab now. You will use it in the practice sections.
A note on AI models. The practice exercises include AI-graded feedback. These work best with a strong, current AI model (Claude, ChatGPT, or Gemini at their best reasoning level). Older or weaker models tend to give vague or overly positive feedback regardless of what you submit. Use the best model you have access to. The specific brand does not matter; what matters is that the model can reason carefully.
The rule in one line
The deliverable is never the answer. The deliverable is the documented evidence of thinking.
Read that as two claims. First, the deliverable — the thing you hand to your boss, your professor, or your client — is no longer just the answer. AI can produce a polished answer in seconds; producing one is no longer the hard part. Second, what makes a deliverable trustworthy now is the written record of how you thought to get there: the prediction you locked in before asking AI, the row where you marked one of AI's claims as REJECT and said why, and the cascade map you drew to trace the side effects (a one-page diagram with a short column for each group your decision affects — students, professors, parents, sponsors — and three arrows under each showing what happens first, what that causes next, and what that causes after; Discipline 4 explains it in full). If someone asks "why did you decide this?", you point at the evidence.
In practice, the evidence usually lives inside the deliverable — a footnote, a "considered and rejected" paragraph, a cascade map as a figure, a "what would change my mind" sentence near the end. Sometimes it lives in a working doc next to the deliverable. Either way, when someone asks why, you can point to it. If you cannot point at anything, you have an answer you cannot defend, which is not a deliverable in 2026.
Is the chat link enough? Sometimes. A chat session captures everything AI said and everything you asked, which is more complete than any reasoning receipt. For low-stakes work (debugging code, quick research, an exploratory brainstorm), the chat link by itself is often enough. But for serious deliverables, the chat link has three limits: it shows what AI said but not what you decided about each claim, it is too long for a busy reader to scan, and it does not show what AI got wrong (the catches live in your head, not in the transcript). Treat the chat link as raw material, the way an academic paper treats raw data. The reasoning receipt or memo is the deliverable you hand to the audience; the chat link goes in the appendix or the footnote for anyone who wants to verify.
Here is what this looks like in practice. Remember Person A and Person B from the opening — same problem, same AI, different outcomes. Friday morning, their boss asks both of them: "why did you recommend this?" Person B has nothing to point to. She forwards the document AI helped her produce and says it sounded right. The boss reads it, finds two claims he disagrees with, and now has no way to tell whether Person B examined those claims or just accepted them. Person A opens her working doc and says: "I predicted on Monday that the experienced hire would be the better choice. AI's analysis flipped that prediction, and here is why I changed my mind — the three claims I checked, the one I rejected, and the assumption that would change my recommendation back." Same problem. Two completely different conversations.
What does the evidence buy you, right now in this deliverable? Two things. One: the act of writing forces the thinking to happen. You cannot write down a specific prediction without first deciding what you actually believe, and you cannot mark a claim as REJECT without first explaining why. Without the writing, the thinking is too easy to skip — you read AI's polished answer, it sounds right, and you adopt it as your own without ever forming a position to compare it against. The writing is the part you cannot fake. Two: the written record is a working tool, not just an audit trail. The bank manager who wrote "my recommendation is to close the branches, because I think most of these customers are app-only" and saw AI come back with data showing only 45% were app-only did not just document a disagreement — the gap between her position and the data became the opening line of her report and the spine of her recommendation. The record is the surface on which the second pass of thinking happens, and the second pass is where the deliverable actually improves.
What changed is not the habit of writing things down. It is the cost of skipping it. When polished output was expensive, the hard part was making the thing. AI made polished output free. The bottleneck moved from producing work to evaluating it, and the written evidence is how you do the evaluation. Tools change every six months; this does not.
The essentials (five bullets)

You just learned the rule. Here are five of the six habits the rest of the page will teach — short version first, full version in the sections below. The bullets tell you what to do; the sections show you how. (The sixth habit, testing where common advice breaks down, needs more setup than a bullet can give and gets its own section.)
- Think before you ask AI. Write down what YOU think the answer is before you open any AI tool. Why? Because once you read AI's answer, it takes over your thinking. If AI says something that sounds reasonable, you will adopt it as your own without realizing it. Writing your own answer first protects your independent judgment.
- Keep a written record of what you accepted and what you rejected. When AI gives you claims or recommendations, go through each one and write down: do I agree with this? Do I disagree? Did AI miss something important? Write one sentence explaining why for each. If you agree with everything AI said without pushing back on anything, you probably did not think hard enough.
- Polished writing is not the same as correct writing. AI always sounds confident and professional, even when it is wrong. There are six specific types of errors that hide inside smooth-sounding AI output (you will learn all six below). Check for each type by name before you send, publish, or act on anything AI wrote.
- The obvious answer is never the complete answer. When AI analyzes a decision, it focuses on the thing you asked about and ignores the side effects. Before any important decision, trace what happens next across the different people and groups affected. Look for places where the side effects circle back and undo the original decision.
- The best results come from working WITH AI, not handing it the wheel. Working alone is slow. Letting AI do everything produces generic output. The winning approach: you do the thinking and deciding, AI does the research and drafting. If you flip that (AI thinks, you just edit), you become unnecessary. People who just pass along AI's answers will eventually be replaced by AI itself.
The full framework: six disciplines
The five bullets above are the working summary. Here is the full architecture: six disciplines, paired one-to-one with the AI failure modes they answer, grouped into three parts.
Figure 1: Six disciplines map to six AI failure modes, arranged in three parts.
The three parts run in order. Part 1, Foundations, is about thinking before you ask AI — taking your own position first, then tracking what you decide about each answer. Part 2, Detection, is about spotting what AI gets wrong — the mistakes buried in confident prose, and the side effects it never traces. Part 3, Origination, is about the thinking AI cannot do for you — finding where common advice breaks, and keeping your judgment in charge when AI tries to take over. Each part depends on the one before it.
Four terms recur on this page. A discipline is a thinking habit you practice — something you do. A failure mode is a specific way AI misleads you — something AI does. Each discipline is paired one-to-one with the failure mode it answers (shown as the italic line under each discipline name in the figure). A part of the course groups disciplines that share a job; there are three parts (Foundations, Detection, Origination), two disciplines each, and each part enables the next. A deliverable is what you hand to your boss, professor, or client — in 2026, that's the answer plus the documented evidence of thinking that produced it (the banner at the bottom of the figure).
Each numbered box in the figure is one discipline. The small caps line at the bottom is the action line — the one specific action that discipline asks you to take, written to fit on a sticky note. The discipline name tells you what the habit is called; the action line tells you what to actually do.
Start here. The two disciplines in Part 1 are the ones that keep AI from doing your thinking for you. Skip them and the other four cannot do their job.
How to read this page
| Time you have | What to read | What to skip |
|---|---|---|
| 45 minutes | Habits 1, 2, 3, and 6 (read only, no exercises) | Habits 4 and 5 (come back later) |
| 90 minutes | All six habits + worked examples, read-only | The AICheck submissions |
| A working day (recommended) | Everything, run each exercise on a real decision from your week | Nothing |
These habits stick when you try them on real problems from your week. Reading the page in 90 minutes shows you the moves. Doing the exercises on real decisions is how they become yours.
Part 1: Foundations (the posture, meaning the stance you take before you start)
If you skip everything else, do not skip these two habits. They fix the two biggest mistakes people make with AI:
- Mistake 1: AI thinks for you. You ask a question, AI gives a smooth answer, and you accept it before you have formed your own opinion. Habit 1 (Prediction Lock) fixes this: you write down what you think BEFORE opening AI.
- Mistake 2: AI's first draft looks finished. The writing is so polished that you send it without checking whether it is actually correct. Habit 2 (Reasoning Receipt) fixes this: you go through each claim and write down whether you agree, disagree, or need to verify.
Together, these two habits keep the thinking with you and the typing with AI. Everything in Parts 2 and 3 builds on them.
Discipline 1: The Prediction Lock
The goal is one thing: have a written position of your own before AI's answer arrives. Everything below — the four lines, the sticky note, the confidence percentage — exists to make that one thing actually happen. If you write all four lines and still cannot say what your position was before you opened AI, the discipline didn't work. If your position is clear and you got there in two lines instead of four, the discipline still worked. The four lines are a recipe, not the dish.
Here is what usually happens without the lock. You ask AI an important question. AI gives you a confident, well-written answer. You think "that sounds right" and go with it. Two days later someone asks "why did you decide that?" and you realize: that was AI's answer, not yours. You never formed your own opinion.
The fix takes three minutes. Write four lines on a piece of paper before you open AI. Let's try it together on someone else's decision first.
Maya is 13. Her school emailed: pick one summer activity. Option 1: debate camp (two weeks, all her friends are going). Option 2: coding bootcamp (one week, she is curious but nervous). Her dad says "just ask ChatGPT, it'll know."
Before Maya asks AI, she writes four lines:
Figure: the four lines of the Prediction Lock, with Maya's answers as a worked example.
Line 1: What is this decision really about?
Not "debate or coding." That is just the surface. The real question underneath might be: "Am I going to do what my friends do, or what I would pick if nobody was watching?" Or: "Would I regret missing coding more than missing debate?" Write the real question in one sentence.
Line 2: What is the ONE fact that would help the most?
Not "which is better?" That is too vague. Something specific you can check: "Does the coding bootcamp teach Python?" This matters because her school already teaches Python in 9th grade. If the bootcamp teaches the same thing, the one week of coding mostly repeats what she will learn anyway. If it teaches something her school does not cover, the bootcamp is offering a skill she could not get elsewhere.
Line 3: What is your decision, before AI weighs in?
Take a position. Not "it depends." Not "I'll see what AI says." Pick debate or pick coding, and write down why. Maya's reasoning: she knows her school covers Python in 9th grade, the bootcamp is most likely covering Python too, and the two weeks with her friends learning something she cannot get from a school course is worth more than a repeat of next year's curriculum. So her decision is debate.
This is the part everyone wants to skip. "How can I decide without asking AI first?" You can. You already know things — what your school teaches, what you would regret missing, what your friends are doing. Use what you know to form a position. AI's job in a minute is to confirm or overturn that position, not to form it for you.
Line 4: How confident are you, and what specific AI answer would flip your decision?
Pick a percentage: 60%, 75%, anything. The exact number does not matter. What matters is that you committed. Then write the one AI answer that would change your mind. Maya: "70% sure debate is the right call. If the bootcamp teaches something my school doesn't (Rust, embedded programming, game development), coding wins because that's a skill I couldn't get elsewhere."
If you cannot name the specific AI answer that would flip your decision, you have not committed to a real position yet. "It depends" is not a position. "I'll do X unless AI tells me Y" is a position.
How do you know the lock worked?
There is one test, and it does not involve counting lines:
Can you say, out loud, what your position was before you opened AI — and what would have made you change your mind?
If yes, the lock worked. The line count does not matter.
If no — if you find yourself saying "well, AI said X so I went with X" or "I thought about it and decided whatever AI suggested" — the lock did not work. The line count still does not matter.
The four lines are training wheels. They make the goal hard to skip. After a few weeks of practice, you may compress all four into a single paragraph or a few mental notes, and the lock will still work. But for the first ten times you do this, write the four lines explicitly. It is the only way to know you actually committed to a position rather than just thinking you did.
What the four lines are doing
The four lines work for Maya because her decision is simple: one binary choice, one fact that would settle it. Not every decision looks like that. So before you copy the four-line template, look at what each line is actually doing underneath. Maya's lines are one instance of a process that stays the same across decisions even when the form changes.
The Prediction Lock has four parts. They are the same four parts for any decision:
- Surface the real decision. Strip away the label. Maya's surface decision was "debate or coding." Her real decision was "follow my friends or pick on my own." The bank manager's surface decision was "close two branches." Her real decision was "what to do about a customer base that has moved to the app." The label always hides the actual question. Name the actual question.
- Identify what would settle it. What information, if you had it, would make the decision obvious? For Maya, one fact (does the bootcamp teach Python?). For a hiring decision with three candidates, it might be three facts (does each candidate have the specific skill we need most?). For a budget allocation across five categories, it might be a comparison (which category has the lowest return on the marginal dollar?). Name the facts specifically enough that you could verify each one. The number depends on the decision; the requirement that they be checkable does not.
- Commit to a position. Based on what you already know, before you check anything with AI, what would you do? Write it down with the reasoning that supports it. For Maya: debate, because school already covers Python. For a hiring decision: name a specific candidate, with the reason. For a budget cut: name the line items, with the reason. A position is a what plus a why, not just a what.
- Name the reversal condition. What specific finding would change the position? Maya: if the bootcamp teaches something the school doesn't cover, coding wins. For a hire: if the second candidate's reference check comes back significantly stronger than the top candidate's, switch. For a budget cut: if Category X's projected revenue is more than 30% off, cut a different category instead. If you cannot name what would flip you, you have not committed; you only have a preference.
Maya's sticky note happens to fit on four lines because her decision is small enough that each part fits on one line. A bigger decision — a hiring round, a strategic pivot, a major purchase — might take a paragraph per part and fill an A4 page. A smaller decision — what to order for lunch when you actually care — might fit on a single index card.
A worked example with a different shape: imagine you're hiring one of three software engineers and have a week to decide.
- Real decision: Not "who is best on paper" but "which of these three would still be productive in twelve months when the codebase has changed twice."
- What would settle it: Three things, not one. Each candidate's track record on long projects, their willingness to learn unfamiliar tools, and the quality of their reference from a previous manager who saw them through a tough quarter.
- Your position: Candidate B, because her two-year stint on the previous job suggests durability, and her side project shows she picks up new tools without being asked.
- What flips you: If Candidate A's reference says she shipped the hardest project of the past year, switch to A. If Candidate C's reference flags any communication issues, B stays.
That is the same Prediction Lock as Maya's. Different decision, different amount written under each part, same four parts.
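For readers who find structure easier to see in code, the four parts can be sketched as one small record that both decisions fill in. This is an optional illustration, and nothing in the course requires running it. The class name `PredictionLock` and its field names are assumptions made up for this sketch, not terms from the course.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PredictionLock:
    """The four parts of a Prediction Lock (names are illustrative)."""
    real_decision: str         # Part 1: the question under the label
    settling_facts: List[str]  # Part 2: checkable facts, one or many
    position: str              # Part 3: a what plus a why
    reversal_condition: str    # Part 4: the finding that would flip the position


maya = PredictionLock(
    real_decision="Do what my friends do, or what I would pick alone?",
    settling_facts=["Does the coding bootcamp teach Python?"],
    position="Debate, because school already covers Python in 9th grade.",
    reversal_condition="The bootcamp teaches something my school does not cover.",
)

hiring = PredictionLock(
    real_decision="Which candidate is still productive in twelve months?",
    settling_facts=[
        "Track record on long projects",
        "Willingness to learn unfamiliar tools",
        "Reference from a manager who saw them through a tough quarter",
    ],
    position="Candidate B, because her two-year stint suggests durability.",
    reversal_condition="Candidate A's reference says she shipped the hardest project.",
)

# Same four parts; only the amount written under each part differs.
for lock in (maya, hiring):
    print(len(lock.settling_facts), "fact(s) to check ->", lock.position)
```

The record has no field for "what AI said". That is deliberate in this sketch: all four parts are filled in before any AI answer exists.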
Why four lines? Why not just one?
This is the question every reader asks, usually at Line 3 ("can't I just write the decision?"). The answer is no, and it is worth understanding why.
Each line catches a failure mode the others cannot. Compress them into one line and you lose specific things:
- Skip Line 1, and you answer the wrong question. Maya's surface decision is "debate or coding." Her real decision is "follow my friends or pick on my own." These have different answers. The bank manager's surface decision was "close two branches." Her real decision was "what to do about a customer base that has moved to the app." The label always hides the actual question. Line 1 surfaces it.
- Skip Line 2, and your AI prompt collapses the lock. Without a specific question to ask, the reader defaults to "which should I pick?" — an open-ended invitation for AI to make the decision. Line 2 forces a closed, verifiable question that AI can either confirm or contradict. "Does the bootcamp teach Python?" is checkable. "Which camp is better?" is not.
- Skip Line 3, and there is nothing to compare AI's answer against. This is the lock itself. Lines 1 and 2 set it up; Line 4 makes it specific. But Line 3 is the line that gives you a position to defend when AI's confident answer arrives.
- Skip Line 4, and you have a hope, not a commitment. "I pick debate" sounds like a decision. But until you name the specific AI answer that would flip it, you cannot tell whether you actually committed or whether you will abandon the position the moment AI suggests otherwise. Line 4 forces the commitment to be specific. It is also the line that lets you check, months later, whether your gut was calibrated — "I said 70% and it turned out the other way" — which is the only way judgment improves over time.
Try the one-line version and see for yourself. "I think Maya should pick debate" is a casual preference, not a prediction lock. It doesn't say what is really at stake, it doesn't say what AI question would settle it, and it doesn't say what would change your mind. A reader who writes only that single line will read AI's two-paragraph response and adopt it without resistance — because there is nothing in the line to resist with.
The four lines look similar on the surface (they are all about "your position") but they catch different things. The discipline asks for four because experience shows that anyone who tries to skip one falls into the specific failure mode that line catches.
There is a pedagogical reason too. Four lines is short enough that a reader can actually do it (three minutes, fits on a sticky note), but long enough that the act of writing forces the thinking to happen. One line is too short — you can write it without thinking. Ten lines is too long — you will skip the exercise entirely. Four is the floor at which the thinking actually has to happen, and the ceiling at which a busy reader will still do it on a Tuesday morning before a meeting.
So: one line if you want the appearance of a prediction lock. Four lines if you want the thing itself.
Maya's sticky now reads:
What's going on: Whether she'll do what her friends are doing or what she'd pick alone.
The question that would help: Will the bootcamp use Python (which her school already teaches in 9th grade)?
Decision: Debate. Two weeks with friends, learning something the school doesn't offer, beats a one-week repeat of next year's curriculum.
Confidence + what flips me: 70%. If the bootcamp teaches Rust, embedded systems, or anything her school doesn't cover, coding wins.
Now she types her question into ChatGPT. Here's the actual prompt she pastes:
My school's summer program runs a one-week coding bootcamp. I'm trying
to figure out one thing: will it teach Python? My school already teaches
Python in 9th grade, so I want to know if there's overlap. Just answer
the question. Don't recommend which camp I should pick.
The lock changed the question. Without the four lines on the sticky note, Maya would have asked AI "should I pick debate or coding?" — an open question that hands the decision to AI. With the lock, she already has a decision; she only needs one fact to confirm or overturn it. So she asks a closed question instead. AI's role shifts from decision-maker to fact-checker. That shift is what the discipline produces. The four lines did not just clarify Maya's thinking — they reassigned who does what in this conversation.
ChatGPT comes back with: "Most one-week coding bootcamps for middle schoolers cover Python basics in the first two to three days." Maya holds that next to her sticky note. AI's answer (Python) matches the answer she was prepared for. Her decision (debate) holds — for the reason she wrote down, not because AI told her so.
At dinner her dad asks why, and Maya has a real answer: "The bootcamp covers Python and my school's already teaching that next year. I'd rather spend two weeks with my friends learning debate, which the school doesn't offer at all." That is her reasoning. AI confirmed one fact inside it.
Compare that to the version without the lock. Maya opens ChatGPT, asks "should I pick debate camp or a one-week coding bootcamp?" ChatGPT writes a balanced two-paragraph answer ending with "both are valuable; consider what energizes you most." Maya reads it, picks debate because that is where her friends are going, and at dinner says "ChatGPT said both are good, so I went with debate." The decision is the same. The reasoning is gone. Two days later she cannot explain why she chose what she chose.
Those four lines are the Prediction Lock. Three minutes of writing before AI's confident answer takes the spot in your head where your own answer would have gone.
Once you read AI's answer, you cannot un-read it. You can't even tell what you'd have thought without it. You just notice, two days later, that you can't quite explain why you decided what you decided. You absorbed AI's answer. You didn't earn your own.
Sealed before the answer, or it isn't a prediction.
The same discipline works on bigger decisions. A bank manager had to decide whether to close two branches that were losing money. Before asking AI, she wrote her four lines:
Line 1 (what is this really about): The branches lose money because most customers now use the app instead of visiting in person. The real question is whether enough customers still walk in to justify keeping the branches open.
Line 2 (the one fact that would settle it): What percentage of these branches' customers are app-only (never visit the branch)?
Line 3 (my decision before AI weighs in): Close the branches. My experience working with the customer-service team suggests most of these customers stopped walking in years ago. I would not have predicted this two years ago, but the pattern has been clear since the app launched.
Line 4 (confidence + what flips me): 60% sure. If less than half the customers are app-only, that means a real walk-in base still exists, and closing the branches would lose those customers entirely. Keep the branches open in that case.
Then she pulled out her bank's customer data and asked Claude:
I have transaction data for two branches we're considering closing.
Of the customers who used these branches in the last 12 months,
I need to know what percentage NEVER walked into a branch and
only used the mobile app. Just give me the percentage. Don't
recommend whether to close the branches.
Claude came back with 45%. That is below her 50% threshold, so the flip condition in her Line 4 was met: closing the branches was no longer the right call.
But the more interesting thing was the gap between what she expected (most customers app-only) and what the data showed (only 45%). That gap told her she had overestimated how far the customer base had moved. She used both findings in her report: the data flipped her recommendation from "close" to "keep open," and the gap became her opening line — "I expected most of these customers to be app-only; the data shows only 45% are, which changes the recommendation." She ended up proposing a middle path: keep the branches open with reduced staff hours, since 55% of customers were still walking in but not at full-day levels.
Without the Prediction Lock, she would have just accepted whatever AI said and never noticed her own assumption was off — and the middle path (reduced hours) wouldn't have surfaced, because she wouldn't have had a gap to notice.
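The arithmetic behind the bank manager's flip is a single comparison against a threshold she set in advance. A minimal sketch, assuming the 50% threshold from her Line 4 and the 45% figure from the example; the function name `line4_flips` and the variable names are hypothetical, chosen only for illustration.

```python
def line4_flips(app_only_share: float, threshold: float = 0.50) -> bool:
    """Return True when the pre-committed reversal condition is met.

    The manager's Line 4: if less than half the customers are app-only,
    keep the branches open instead of closing them.
    """
    return app_only_share < threshold


predicted_position = "close"   # her Line 3, written before asking AI
app_only_share = 0.45          # the figure reported in the example
walk_in_share = 1.0 - app_only_share

decision = "keep open" if line4_flips(app_only_share) else predicted_position

print(decision)                # keep open
print(f"{walk_in_share:.0%}")  # 55%
```

The threshold is fixed before the data arrives, so the comparison cannot be bent to fit the answer. The gap between what she expected (most customers) and 45% is not captured by the comparison at all; noticing it is still her job.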
Maya's four lines and the bank manager's four lines look different on the surface. They're the same Prediction Lock — same four parts, applied to decisions of different sizes.
Now your turn
You already wrote four lines for Maya. You can paste those same lines into the boxes below. Or, if you have a decision of your own, try the four lines on that instead. For example: something you want to buy, two plans you are choosing between, a conversation you keep avoiding, or a class you are not sure about.
Write your four lines first. Then ask AI your Line 2 question using this prompt:
I'm trying to decide [describe your situation in 1-2 sentences].
My question is: [paste your Line 2 question here].
Just answer that one question. Don't make the decision for me.
Here's Maya's version of the same prompt, filled in from her sticky note:
I'm trying to decide between two summer camps. One is a one-week
coding bootcamp; the other is a two-week debate camp where all my
friends are going.
My question is: does the bootcamp teach Python? My school already
teaches Python in 9th grade, so I want to know if there's overlap.
Just answer that one question. Don't make the decision for me.
ChatGPT's response:
Most one-week coding bootcamps for middle schoolers cover Python
basics in the first two to three days, then move on to a small
project using those basics. Some bootcamps add light JavaScript or
web concepts later in the week, but Python is almost always the
core language.
Maya holds that next to her Line 4. Her Line 4 said coding wins only if the bootcamp teaches something her school doesn't cover. AI confirmed Python is the core — exactly what her school already teaches in 9th grade. That's not her flipping condition. Her decision stays: debate.
Only Lines 1 and 2 go into the prompt. Keep Line 3 (your decision) and Line 4 (what would change your mind) off the page AI sees. If AI knows what you've committed to, it tends to agree with you — and you lose the comparison the lock was built for.
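One way to see the separation is to build the prompt from a function that accepts only the situation and the Line 2 question, so the decision and the flip condition have no way in. This is an optional sketch, not part of the course; `build_prompt` and the dictionary keys are hypothetical names, and the wording follows the prompt template shown earlier.

```python
def build_prompt(situation: str, line2_question: str) -> str:
    """Fill the prompt template from the situation and the Line 2 question only."""
    return (
        f"I'm trying to decide {situation}\n"
        f"My question is: {line2_question}\n"
        "Just answer that one question. Don't make the decision for me."
    )


# Hypothetical record of Maya's sticky note; the keys are illustrative.
sticky = {
    "situation": "between two summer camps. One is a one-week coding bootcamp; "
                 "the other is a two-week debate camp where all my friends are going.",
    "line2_question": "does the bootcamp teach Python? My school already teaches "
                      "Python in 9th grade, so I want to know if there's overlap.",
    "line3_decision": "Debate, because the bootcamp probably repeats next year's Python.",
    "line4_flip": "70% sure. Coding wins if the bootcamp teaches something my school doesn't cover.",
}

# Only the first two entries are passed in; Lines 3 and 4 stay off the page AI sees.
prompt = build_prompt(sticky["situation"], sticky["line2_question"])
print(prompt)
```

Because the function has no parameter for Line 3 or Line 4, leaving them out is a property of the design, not something to remember each time.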
Then compare AI's answer to your Line 4. You wrote down a specific finding that would change your mind. Did AI tell you that finding, or didn't it?
- If AI's answer is not what would flip you, your Line 3 decision holds. You can defend it for the reason you wrote down. Maya's case (what actually happened): her Line 4 said coding would win only if the bootcamp taught Rust or something else her school doesn't cover. AI said the bootcamp teaches Python, which is what her school already teaches. That's not the flipping condition. Her decision stays: debate.
- If AI's answer is exactly what would flip you, your decision changes, for the reason you set in advance and not because AI sounded confident. Maya's case if AI had said something different: suppose AI had come back with "the bootcamp teaches embedded systems, not Python." That would have hit Maya's Line 4 exactly (school doesn't teach embedded systems). She would switch to coding, for the reason she committed to before asking and not because AI sold her on it.
- If AI's answer is somewhere in between, go back to your Line 3 reasoning. Does the new information actually weaken it? If yes, change your decision and write down why. If no, your decision still holds. Maya's case if AI had been ambiguous: suppose AI had said "the bootcamp covers Python for the first three days and then introduces React." React is something her school doesn't teach, but it's only two days of the bootcamp. Maya rereads her Line 3: the case was "two weeks with friends learning debate beats one week mostly repeating Python." Two days of React doesn't change that; the bootcamp is still mostly repeat material. Her decision stays.
If AI hedges instead of answering, ask again with one more sentence: "Just give me the specific information; don't qualify it." If AI asks a clarifying question, answer it but add: "Then answer the original question." The goal is a concrete answer you can hold next to your Line 4 — not a paragraph of "it depends on several factors." If your second attempt still doesn't get you a usable answer, your Line 2 question may be too broad. Rewrite it to be more specific, then try again.
A note on revising the lock. If AI's answer makes you realize your Line 4 was wrong (you named the wrong flipping condition), that is a real signal worth honoring, but be careful about why you revise. Revising Line 4 because the answer exposed something you genuinely missed is fine; you are updating your thinking. Revising Line 4 so that the answer no longer counts as a flip defeats the lock. The test is whether you would have written the new Line 4 even without seeing AI's answer. If yes, revise. If no, your old Line 4 stands.
Check that the lock worked. Try to finish the sentence "I decided this because..." out loud. If you can do it without using the words "AI said," the lock worked. If you can't, find the line you skipped.
That sentence — the one you can finish out loud — is the smallest piece of documented evidence of thinking you can produce. It's the same thing the rule at the top of this page is about: not a polished answer AI handed you, but a reason you can point to. Every discipline below builds on this one piece of evidence. Get this one working and the rest get easier.
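For readers who like to see a habit as a structure, here is an optional sketch of the lock in Python. It is an illustration under stated assumptions, not part of the exercise: the class name `PredictionLock`, its field names, and the `prompt()` helper are all made up for this example. It shows the one mechanical rule in the lock: Lines 1 and 2 go to AI, Lines 3 and 4 stay with you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: once written, the lock is not edited in place
class PredictionLock:
    """The four lines, written before asking AI. All names here are illustrative."""
    situation: str       # Line 1: what's going on
    question: str        # Line 2: the question that would help
    decision: str        # Line 3: your committed decision
    confidence: int      # Line 4: how sure you are, in percent
    flip_condition: str  # Line 4: the specific finding that would change your mind

    def prompt(self) -> str:
        # Only Lines 1 and 2 are shown to AI; Lines 3 and 4 stay off the page.
        return f"{self.situation}\n{self.question}"


maya = PredictionLock(
    situation="Whether she'll do what her friends are doing or what she'd pick alone.",
    question="Will the bootcamp use Python?",
    decision="Debate camp.",
    confidence=70,
    flip_condition="The bootcamp teaches something her school doesn't cover.",
)

prompt = maya.prompt()
```

The comparison step stays manual on purpose: you hold AI's answer next to `flip_condition` yourself and decide whether it was hit.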
The exercise below does not check whether your decision is "right." It only checks whether your four lines are clear: Did you name the real decision? Is your question specific? Is your position committed (not "it depends")? Did you name the specific AI answer that would flip you? It is okay if your first try is messy.
You have two options for what to put in the boxes. Option 1: write four lines for Maya — use her decision (debate camp vs. coding bootcamp) and your own version of what each line should say. The grader will check whether your lines are clear. Option 2: write four lines for a real decision in your own week — something you actually need to figure out. The grader will check the same thing. Either option works; the discipline is the same.
If you're going with Option 1, here are Maya's lines as a reference:
Line 1 (what's going on): Whether she'll do what her friends are doing or what she'd pick alone.
Line 2 (the question that would help): Will the bootcamp use Python (which her school already teaches in 9th grade)?
Line 3 (decision): Debate. Two weeks with friends, learning something the school doesn't offer, beats a one-week repeat of next year's curriculum.
Line 4 (confidence + what flips me): 70%. If the bootcamp teaches Rust, embedded systems, or anything her school doesn't cover, coding wins.
Fill in the four boxes and click submit. The grader scores each line and tells you what to improve, like a teacher checking your homework instantly.

Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 8 minutes the first time. After you get your score, find one place where you think the AI grader is wrong. That is the most useful part of the exercise.
This covers half of Part 1. The other half (keeping track of what AI says and deciding which parts you agree with, disagree with, or want to change) is Discipline 2.
Why this works (the research behind it)
The Prediction Lock isn't a new idea. It's the AI-era version of three older techniques, each of which has been studied for decades.
The premortem (Gary Klein, 2007). Before a project starts, the team imagines it has already failed and writes down all the reasons why. The act of writing the failure reasons first, before the optimism of the project takes hold, surfaces risks that would otherwise stay buried. Research by Deborah J. Mitchell, Jay Russo, and Nancy Pennington found that "prospective hindsight" — imagining that an event has already occurred — increases the ability to correctly identify reasons for future outcomes by 30%. The discipline you just learned does the same thing in miniature: before AI weighs in, you write your decision and the specific finding that would change your mind. The "writing first" is the load-bearing part.
Read Klein's original article: Performing a Project Premortem, Harvard Business Review, September 2007.
Forecasting calibration (Philip Tetlock, the Good Judgment Project, 2011-2015). Tetlock and his colleagues ran a multi-year tournament where thousands of forecasters made probabilistic predictions about world events. The best forecasters — the ones Tetlock called "superforecasters" — shared a specific habit: they recorded their predictions with confidence percentages before the answer arrived, then compared the prediction against the outcome afterward. Without the written-down prediction, you cannot tell whether your gut was calibrated or off, because you reconstruct your "prior beliefs" to match whatever happened. Line 4 of the Prediction Lock (the confidence percentage) is the smallest possible version of this practice. Over months and years, comparing your locked-in confidence to actual outcomes is the only way judgment improves.
Read about the project: The Good Judgment Project (Wikipedia). For the book-length treatment: Tetlock and Gardner, Superforecasting: The Art and Science of Prediction (2015).
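If you want to check your own calibration over time, one standard way to score it is the Brier score, the scoring rule used in forecasting tournaments like Tetlock's. This page does not prescribe a scoring rule, so treat the sketch below as one reasonable option. The log entries are hypothetical, and treating "the decision held up" as the outcome is an assumption made for the example.

```python
def brier_score(forecasts):
    """Mean squared gap between stated confidence and what happened.

    forecasts: list of (confidence, outcome) pairs. confidence is a
    probability between 0 and 1; outcome is 1 if the decision held up, else 0.
    Lower is better: 0.0 is perfect, and always saying 50% scores 0.25.
    """
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)


# Hypothetical log: Line 4 confidence, then how things turned out later.
log = [(0.7, 1), (0.7, 1), (0.7, 0), (0.9, 1)]
score = brier_score(log)
```

A single score means little on its own. The value comes from watching it move as your log grows, which only works if the confidence was written down before the outcome arrived.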
Anchoring (Amos Tversky and Daniel Kahneman, 1974). When a confident answer occupies the spot in your head where your own answer would have gone, the confident answer becomes your reference point — and you can no longer tell what you would have thought without it. Tversky and Kahneman's original work used numerical examples (people asked to estimate a percentage after being shown an arbitrary number gave estimates anchored to that number), but the principle is general: any confident answer that lands in your head before you have formed your own becomes the anchor your subsequent thinking adjusts from. AI's answers are confident by default. That makes them powerful anchors. The Prediction Lock is the move that keeps the anchor from forming — you place your own anchor first, in writing, before AI's can land.
Read the original paper: Judgment under Uncertainty: Heuristics and Biases, Science, Vol. 185, No. 4157, September 27, 1974, pp. 1124-1131. (Available open-access at this mirror if you don't have Science journal access.)
The Prediction Lock combines all three. Write your decision and your flipping condition first (premortem). Record your confidence so you can check calibration later (Tetlock). And do both before reading AI's answer, so AI's confidence doesn't become the anchor you adjust from (Tversky and Kahneman). The four lines on a sticky note compress decades of research into a three-minute habit.
The full version of this exercise (10 ranked questions plus the Reasoning Receipt template; 45-60 minutes) lives in Part 0 Chapter 1, Lesson 1. This page teaches the discipline. That page makes it a system.
Discipline 2: The Reasoning Receipt
You spent the morning working with Claude on a report. The result looks good. You send it off and move on. Two weeks later someone asks: "Which parts of this did you actually check? Which parts did you change?" You have no answer. You read what AI wrote, it looked fine, so you used it. The work got done, but you never really thought about it.
This is the second-most common AI failure mode after letting AI think for you (Discipline 1). Even when you have your own position locked in, AI's drafts come out in big polished blocks — five suggestions, a six-paragraph memo, a ten-row plan — and you cannot defend any of it later because you never tracked what you decided about each piece.
Here is how to fix that. Every time AI gives you a claim, a recommendation, or a chunk of writing that goes into your final work, you make a one-line note saying what you did with it and why. Not the whole thing — just one note per piece. Together those notes are called a Reasoning Receipt.
Here's what one row looks like. Suppose you asked Claude to help you plan a group presentation, and it suggested: "Start the presentation with a short video clip to grab attention." You think about it. Your teacher said earlier this semester that visual openings get better grades, so the suggestion fits what you already know about what works in this class. You decide to keep it.
That decision becomes one row in your receipt:
| What AI said | What you did | Why |
|---|---|---|
| Start with a short video clip to grab attention. | ACCEPT | Our teacher said visual openings get better grades. This fits. |
Three columns. What AI said (so future you remembers what was being decided), what you did (a one-word label), and why (one sentence so the row is defensible later).
Now suppose Claude's next suggestion was "Give each person 5 minutes to speak." You have four group members and 15 minutes total. The math doesn't work. So you reject it:
| What AI said | What you did | Why |
|---|---|---|
| Give each person 5 minutes to speak. | REJECT | We have 15 minutes for 4 people. The math doesn't work. |
That's the discipline. One row per AI suggestion, three columns each.
The five labels. What you did always falls into one of five categories. Most of the time you will use ACCEPT, REJECT, or MODIFY. The other two (SURFACED and MISSED) catch cases that are easy to skip otherwise.
| Label | What you did | Write one sentence explaining why |
|---|---|---|
| ACCEPT | You kept what AI said, no changes. | Why you trusted it. |
| REJECT | You decided AI was wrong and removed it. | What made you disagree. |
| MODIFY | You kept the idea but changed part of it. | What you changed and why. |
| SURFACED | AI brought up something you had not thought of. You kept it. | Why it matters. |
| MISSED | You noticed something AI forgot to mention. You added it. | What was missing and why it matters. |
ACCEPT, REJECT, and MODIFY are the basic moves. SURFACED is for the moments AI taught you something — those are valuable to track because they are the cases where AI genuinely added thinking you would not have done alone. MISSED is for what AI did not say but should have — those are the cases where your own judgment caught something AI's drafting glossed over.
A good receipt has a mix of all five over time. If every row says ACCEPT, you are not really thinking — you are just signing off on what AI wrote.
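If you keep your receipt in a file rather than on paper, the two warning signs above are easy to check mechanically. The sketch below is optional and illustrative: the names `Row` and `review`, and the short list of vague reasons, are assumptions made for this example. It flags rows with no real reason and receipts where every row is ACCEPT.

```python
from collections import Counter
from dataclasses import dataclass

LABELS = {"ACCEPT", "REJECT", "MODIFY", "SURFACED", "MISSED"}
# Illustrative list of non-reasons; extend it with your own habits.
VAGUE_REASONS = {"", "sounds right", "sounds good", "looks fine"}


@dataclass
class Row:
    ai_said: str  # what AI said
    label: str    # what you did
    why: str      # one sentence explaining why


def review(receipt):
    """Return a list of warnings; an empty list means no red flags."""
    warnings = []
    for i, row in enumerate(receipt, start=1):
        if row.label not in LABELS:
            warnings.append(f"row {i}: unknown label {row.label!r}")
        if row.why.strip().lower().rstrip(".") in VAGUE_REASONS:
            warnings.append(f"row {i}: no real reason given")
    counts = Counter(row.label for row in receipt)
    if receipt and counts["ACCEPT"] == len(receipt):
        warnings.append("every row is ACCEPT: signing off, not deciding")
    return warnings


receipt = [
    Row("Start with a short video clip to grab attention.", "ACCEPT",
        "Our teacher said visual openings get better grades."),
    Row("Give each person 5 minutes to speak.", "REJECT",
        "We have 15 minutes for 4 people. The math doesn't work."),
]
rubber_stamp = [Row("Start with a short video clip.", "ACCEPT", "Sounds right.")]
```

Here `review(receipt)` returns no warnings and `review(rubber_stamp)` returns two. The check only catches missing reasons; whether a reason is a good one is still your call.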
"But nobody ever audits my work — why bother?"
This is the most reasonable objection to the discipline, and it deserves a real answer. Most readers, most of the time, will not be audited. Your boss does not ask. Your professor moves on. Your client signs off. If the only payoff of a Reasoning Receipt was "in case someone asks," the receipt would not be worth the trouble.
Keeping a receipt does three things even when nobody ever asks.
First, the act of writing changes what you decide. When you accept an AI suggestion silently, your brain processes it as "sounds right, moving on." When you have to write a one-word label and a one-sentence reason, your brain has to actually examine the suggestion. Most readers, when they try this for the first time, discover at least one row per session where they cannot finish the "why" sentence. That row was something they were about to use without thinking. The receipt catches it before it ships.
Second, the receipt becomes part of your work, not just a record of it. The bank manager from Discipline 1 turned the gap between her position and the data into the opening line of her report. The student in the example below used her receipt as a working document with her group, not as an audit trail. A row labeled REJECT often becomes a "considered alternatives" paragraph in the final document. A row labeled SURFACED often becomes the most interesting insight you bring to the meeting. The receipt is a working tool, not a filing cabinet.
Third, future you is the most common auditor. Three months from now, you will look at this work and not remember which parts were yours, which were AI's, or why you decided what you decided. The receipt is a note to future you. Most times the receipt pays off, the question comes from you, not from a boss.
The audit scenario is the most visible payoff, but it is the rarest. The three payoffs above happen every time you keep a receipt, even when no one ever reads it. This is what the page's central rule means in practice: the deliverable is the documented evidence of thinking. The receipt is not separate from your work. It shapes the work as you produce it, and over time it is the thing you keep when memory of the project has faded.
A receipt is one decision per row. The label tells you what you did. The "why" tells future you (or anyone else reading) why the row can be trusted.
Here's what this looks like in real life.
A student asked Claude to help plan a group presentation for class. Claude gave a full plan. Instead of just using it, the student went through each suggestion and wrote down what she thought:
| What AI said | Label | Why |
|---|---|---|
| "Start the presentation with a short video clip to grab attention." | ACCEPT | Good idea. Our teacher said visual openings get better grades. |
| "Give each person 5 minutes to speak." | REJECT | We only have 15 minutes total and there are 4 of us. The math does not work. |
| "End with a Q&A session." | MODIFY | Q&A yes, but we will prepare 3 backup questions in case nobody asks anything. |
| "Add a live demo of the app you built." | SURFACED | I had not thought of this. A live demo would make our presentation stand out. |
| (AI did not mention who brings the laptop and adapter for the projector.) | MISSED | I added this. Last time our group forgot the adapter and wasted 5 minutes. |
She shared the receipt with her group. After the presentation, the teacher asked why they did not give each person 5 minutes. She pointed at row 2: "We only had 15 minutes for 4 people. The math did not work." That one sentence was enough. Without the receipt, she would have had to remember and explain everything from scratch.
What happens without a receipt:
| What AI said | Label | Why |
|---|---|---|
| "Start with a short video clip." | ACCEPT | Sounds right. |
| "Give each person 5 minutes." | ACCEPT | Sounds right. |
| "End with a Q&A session." | ACCEPT | Sounds right. |
| "Add a live demo." | ACCEPT | Sounds right. |
| (Nothing written down.) | | |
If every row says ACCEPT with "sounds right" as the reason, you did not really think about it. You just copied what AI said. A good receipt has a mix of labels. If you cannot explain why you accepted something, you did not actually decide to keep it. You just went along with it.
Try it yourself
You are organizing your university's annual tech fest. Your team has 10 members. The event is in 3 weeks. You have not started marketing yet. Another university announced a similar event on the same weekend. You asked AI: "Should we move the event one week earlier, or keep the original date?" AI gave you five suggestions. For each one, pick a label (ACCEPT, REJECT, MODIFY, SURFACED, or MISSED) and write one sentence explaining why.
- "Move it earlier. Being first matters when two events compete for the same audience."
- "If you keep the original date, students will compare the two events and may pick the other one."
- "Your social media posts get the most engagement on Thursdays, so start marketing this Thursday."
- "Moving one week earlier means your team has only 2 weeks to prepare instead of 3."
- "Most students decide which events to attend based on what their friends are going to."
The AI grader will check two things:
- Did you explain your reasoning, or did you just write "sounds right"? Rate 1-10. Quote my weakest explanation.
- Did you use more than one label? If every row says ACCEPT, you did not really think about it. Rate 1-10.
Do not rewrite my work. If a box is empty or vague, just say so.
Claim 1: "Move it earlier. Being first matters."
Claim 2: "Students will compare the two events and may pick the other one."
Claim 3: "Start marketing this Thursday because that is when posts get the most engagement."
Claim 4: "Moving earlier means only 2 weeks to prepare instead of 3."
Claim 5: "Students decide based on what their friends are going to."
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 10-15 minutes the first time. After you get your score, look for any row where you wrote "sounds right" without a real reason. That is the row where you accepted AI's thinking without doing your own. Go back and write a real explanation for that one row.
What you just did helps you check each suggestion one at a time. But it does not catch mistakes inside each suggestion, like made-up facts, outdated information, or AI sounding confident about something it got wrong. That is what Discipline 3 is for.
Want to see a good example? (Open this after you submit your own.)
Another student did the same tech fest exercise. This is not the only right answer, but it shows what a good receipt looks like.
| Claim | Label | Why |
|---|---|---|
| 1 | REJECT | Being first does not matter here. Students pick events based on what sounds fun, not which was announced first. |
| 2 | MODIFY | Students might compare, but only if they hear about both. If we market better, the other event does not matter. |
| 3 | ACCEPT | Our Instagram data from last semester shows Thursday posts get 2x more likes. This checks out. |
| 4 | SURFACED | I had not thought about this. Losing a week of prep time is a real problem because we have not booked the venue yet. |
| 5 | ACCEPT | This is true. Last year we saw a big jump in sign-ups after we added a "bring your friend" option to the registration form. |
| 6 | MISSED | AI did not mention that our biggest sponsor needs 3 weeks notice. Moving earlier means we might lose the sponsorship. |
Why this is good: Only two ACCEPTs, and both have real reasons behind them (actual data from last semester and last year, not just "sounds right"). The MISSED row (row 6) catches something AI could not have known (the sponsor's 3-week notice rule). The student ended up deciding to keep the original date, but for a reason AI never mentioned: the sponsorship.
What this does not try to do: be clever. Most rows are one sentence. The point is writing real reasons, not long ones.
Why this works (the research behind it)
The Reasoning Receipt isn't a new idea either. Writing down what you decided and why is one of the most studied habits in how experts actually think. Three bodies of work explain why it works.
Reflection-in-action (Donald Schön, 1983). Studying how doctors, architects, engineers, and teachers actually work, Schön found that skilled professionals don't just act and move on — they keep a running internal commentary, noticing surprises and deciding what to do about them as the work unfolds, not in a review afterward. The professionals who improved fastest were the ones who made that commentary explicit instead of leaving it tacit. The Reasoning Receipt is that commentary written down: instead of silently thinking "this AI suggestion seems off," you write the label and the reason while you are still in the work, where it can change what you do next.
Read more: Reflective practice (Wikipedia), which summarizes Schön's The Reflective Practitioner (Basic Books, 1983).
Single-loop vs. double-loop learning (Chris Argyris, 1977). Argyris drew a line between two kinds of correction. Single-loop learning fixes the immediate mistake — the answer was wrong, so you change the answer. Double-loop learning steps back and asks whether the whole approach or assumption was wrong in the first place. His finding was that smart, capable people get stuck in single-loop mode by default; they tune the output and never question the frame. A receipt where every row says ACCEPT is single-loop thinking made visible — you're approving outputs without ever asking whether the approach is right. Forcing a real "why" on each row, and noticing when you can't write one, is what pushes you into the double loop.
Read Argyris's original article: Double Loop Learning in Organizations, Harvard Business Review, September 1977.
Elaboration and the generation effect (Brown, Roediger & McDaniel, 2014). Decades of memory research converge on a simple finding: you remember something far better when you put it into your own words and connect it to what you already know than when you simply re-read it. The act of generating the explanation — even a single sentence — is what builds the durable memory. Each "why" in your receipt is exactly this move. Three months later, the row you wrote a real reason for is the one you'll still understand; the row you rubber-stamped with "sounds right" will be a blank.
Read more: Make It Stick: The Science of Successful Learning (Belknap Press of Harvard University Press, 2014) — a summary of the book's central findings.
The Reasoning Receipt combines all three. You write your decision about each AI claim while you're still in the work (Schön), the forced "why" pushes you from rubber-stamping outputs to questioning the approach (Argyris), and putting the reason in your own words is what makes you remember it later (Brown, Roediger & McDaniel). Nobody has tested the Reasoning Receipt against AI specifically — but the habit underneath it, writing down your choices and explaining them, is one of the most established results in how people think and learn. Using it on AI output is the natural next step.
Go deeper: Part 0 Chapter 1: Asking Better Questions. The full version (a 10-row receipt against a real AI conversation, plus the Contradiction Challenge where you have a different AI attack your reasoning, 45-60 min) lives there as part of the foundational sequence. This page teaches the discipline. That chapter makes it a habit you can run on every high-stakes AI conversation you have.
Part 2: Detection (catching what AI misses)
Part 1 taught you how to think before using AI. Part 2 teaches you how to spot mistakes in what AI gives back.
Here is the problem: AI sounds equally confident whether it is right or wrong. Its worst mistakes often hide in the sentences that sound the most polished. AI also tends to focus on the one thing you asked about and ignore the side effects.
Discipline 3 (Error Taxonomy) gives you a checklist of six common AI mistakes so you can scan for them before you trust the output. Discipline 4 (Thinking in Systems) teaches you to ask "if I do this, what else changes?" so you catch the side effects AI missed.
Discipline 3: The Error Taxonomy
You have probably experienced this. You ask AI a question, the answer comes back sounding smooth and professional, you read through it, everything seems fine, you use it. Three days later you find out one of the numbers was wrong, or a source AI mentioned does not actually exist. The mistake was sitting right there, but you missed it because the writing sounded so good.
Here is the part that matters: the person who pays for that missed mistake is usually you, not some auditor who catches you later. If AI told you a used car had 32,000 miles when it really had 58,000, you do not get embarrassed in a meeting — you buy the wrong car. If AI invented a statistic for your report, you do not just look bad when someone checks; you made a decision based on a number that was never real. AI's mistakes hurt the person who acts on them first. That person is you.
Why "taxonomy"? A taxonomy is just a naming system — a fixed set of labeled categories you sort things into, the way biologists sort living things into species. The power is in the naming. "Check whether this is any good" is too vague to act on; your eyes slide over the page and nothing stops them. But "check whether there's a fabricated source" is a specific hunt with a specific target, so you actually stop at every citation and look. The Error Taxonomy is six named categories of AI mistake. Naming them is what turns a vague worry ("something might be wrong") into six concrete searches you can actually run.
Here is how to catch them. Instead of reading AI's output and asking yourself "does this feel right?", go through it looking for one specific type of mistake at a time. There are six types:
| Mistake type | What it looks like | Where to look first |
|---|---|---|
| Factual error | A wrong fact: a wrong number, wrong date, wrong name. | Any sentence with a specific number. Exact-looking numbers make things sound researched. Example: "73.6% of people fail to check AI's numbers." That sounds real. I just made it up. |
| Logical gap | The conclusion does not actually follow from the evidence. | Look for words like "therefore" or "so." Then ask: does the evidence actually prove this, or is there a step missing? |
| False confidence | AI states something uncertain as if it were a fact. | The smoothest-sounding paragraphs. If AI uses "may" or "could," it knows it is unsure. If AI states something debatable without any "may" or "could," that is the warning sign. |
| Missing context | AI left out an important detail that would change the answer. | Think about what an expert would ask first. If you would ask "but what about X?", AI probably did not think about it. |
| Fabricated source | AI mentions a book, article, study, or tool that does not actually exist. | Check every source AI names. Google the title. If you cannot find it, AI probably invented it. |
| Stale fact | Something that used to be true but is not true anymore. | Anything that changes over time: prices, rules, laws, software versions, who runs a company. |
Here's what one scan feels like. Take just the first type, Factual error. The instruction says: look at any sentence with a specific number. So you read AI's output and stop at every number, ignoring everything else. Suppose AI wrote "this car has 32,000 miles on the odometer." That is a number, so you stop. You do not ask "does this sound right?" — a wrong mileage sounds exactly as reasonable as a right one. Instead you check it against the source: you look at the photo of the dashboard. It says 58,000. Caught. You did not catch it by reading carefully; you caught it because you were specifically hunting for one type of mistake — a wrong number — and a number is where you stopped.
That is the whole technique, repeated six times. Each pass hunts for one type. You are not rereading every word six times; each pass jumps straight to the places its question points to (every number, every named source, every "therefore") and skips the rest. In the exercise below you will start with two passes (Factual error and Fabricated source) to learn the rhythm. The worked example next shows all six.
The six error types do not announce themselves. They hide inside the paragraphs that read as most professional, which is exactly why scanning by name beats reading by feel.
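The "stop at every number" rule is mechanical enough to sketch in a few lines of Python. This is an optional illustration, not a tool the course provides: the names `PASSES`, `sentences`, and `stops` are made up here, the sentence splitter is deliberately naive, and the third sentence of the sample text is filler invented for the example. The code only finds the places to stop. Checking each one against the source is still your job.

```python
import re

# Hypothetical trigger patterns, one per pass. Only two passes are sketched.
PASSES = {
    "Factual error": re.compile(r"\d"),  # any sentence with a number
    "Logical gap": re.compile(r"\b(therefore|so)\b", re.IGNORECASE),
}


def sentences(text):
    # Naive split: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def stops(text, mistake_type):
    """Sentences where this pass tells you to stop and check the source."""
    pattern = PASSES[mistake_type]
    return [s for s in sentences(text) if pattern.search(s)]


write_up = (
    "This car has 32,000 miles on the odometer. "
    "It has a clean accident history, therefore it has no mechanical problems. "
    "The seller seems friendly."
)
number_stops = stops(write_up, "Factual error")
logic_stops = stops(write_up, "Logical gap")
```

Each pass returns a short list instead of the whole write-up, which is the point of scanning by name: the mileage sentence is the only stop on the number pass, and the "therefore" sentence is the only stop on the logic pass.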
Here's what all six look like in real life.
A parent was shopping for a reliable used car. They found one listing they liked: a 2021 Honda CR-V. Before driving an hour to see it, they asked Claude to look it over. They pasted in the listing, the photos, and a note from their own mechanic. Claude wrote back a clean, confident summary: low miles, clean history, strong engine, a rebate to grab. It read well. They almost forwarded it to their partner with "let's buy this one." Instead, they ran the six-row scan.
| Error type | What they found in the write-up | Verdict |
|---|---|---|
| Factual error | Write-up said: "32,000 miles on the odometer." The listing photo of the dashboard clearly showed 58,000. Off by 26,000 miles. | Caught. Corrected from the photo. |
| Logical gap | Write-up said: "It has a clean accident history, therefore it has no mechanical problems." A clean accident record says nothing about the engine. The "therefore" did not hold. | Caught. A clean history is not a clean engine. |
| False confidence | Write-up said: "You will get at least 200,000 trouble-free miles out of this engine." No "should," no "likely," no basis. The flat promise was doing all the work. | Caught. Rewrote as "many CR-Vs last a long time, if serviced." |
| Missing context | Write-up never mentioned the timing belt, which is due for replacement around 60,000 miles. The parent's own mechanic had flagged it in the note they pasted in, but the write-up left it out. | Caught. Added the belt as the first thing to check. |
| Fabricated source | Write-up said: "As Consumer Reports wrote in their March 2026 reliability issue, this is the most dependable small SUV on the market." The parent checked Consumer Reports. No such note. | Caught. Removed the quote. |
| Stale fact | Write-up said: "It still qualifies for the dealer's $1,000 loyalty rebate." The parent called the dealer. That rebate ended last month. | Caught. Dropped the rebate from the math. |
All six mistake types showed up in one short summary. The hardest one to catch was the fake Consumer Reports quote, because it sounded exactly like something a real magazine would write. Because the parent checked each mistake type by name, they went to see the car knowing the real mileage, the repair it needed, and the actual price. Notice who this protected: not the parent's reputation with some auditor, but the parent's own wallet. If they had trusted the summary, they would have driven an hour to buy a car they thought had 32,000 miles, paid a price that assumed a rebate that no longer existed, and skipped a repair they did not know was coming. The scan did not save them from looking bad. It saved them from being wrong about their own decision.
What happens if you just read without checking by type:
| How you read | What you miss | Why |
|---|---|---|
| You read the whole thing asking "does this sound good?" | Wrong numbers. Your eyes skip past numbers when everything sounds smooth. | The wrong mileage (32,000 instead of 58,000) is easy to miss. Checking for "Factual Error" forces you to stop at every number. |
| You trust a quote because it names a brand you know | The fake Consumer Reports quote. Real magazine name, but the quote was invented. | It sounds real, and that is exactly the trap. Checking for "Fabricated Source" forces you to look up every quote. |
| You read "therefore" as just a connecting word | The logical gap. "Clean history, therefore no problems" sounds right but skips a step. | When you check for "Logical Gap," you stop at every "therefore" and ask: does this actually prove what it claims? |
| You only notice missing info if something feels off | The timing belt replacement due at 60,000 miles. AI never mentioned it, so nothing on the page warns you. | Missing information never jumps out on its own. You have to ask yourself: what would someone who knows about cars ask that AI did not? |
The parent who checked by type and the parent who just read casually could be the same person. The only difference is how they read AI's output: one checked each mistake type by name, the other just read and hoped nothing was wrong.
The filled-in scan grid is also worth keeping, not just running. It is the same kind of evidence the page's rule is about — the deliverable is the documented evidence of thinking. When you hand someone a report and they ask "did you check AI's numbers?", the grid is your answer. More often, it is a note to future you: six months from now, when you wonder whether you verified that statistic or just trusted it, the grid tells you which.
Try it yourself
You are buying a used car this weekend. The seller says another buyer is already interested, so you need to decide quickly. You asked AI to compare two cars and tell you which one to buy. Here is what AI wrote back. Read it, then check it for each of the six mistake types listed above. Start with Factual Error and Fabricated Source (those two cost you the most money if you miss them). Fill in the boxes below.
Which car should you buy?
Go with the 2020 Toyota Corolla. The Corolla gets 47 mpg combined, so you will spend far less at the pump than with most cars its size. According to the CarReliability Index 2026 rankings, the Corolla scores 9.4 out of 10, the top spot in its class. The 2019 Honda Civic is also a fine car. The Civic has lower mileage, therefore it is the more reliable choice if you want fewer surprises down the road.
Either car will run for another decade without a major repair, so you can pick on price and color and feel good about it. Both still qualify for the $2,000 state clean-vehicle rebate, which brings your real cost down nicely. Either way, you are getting a dependable car.
(If you prefer, you can skip the car example and use any real AI output from your own life instead: a homework answer, a college application draft, a research summary. The six mistake types work on any topic.)
The AI grader will check two things:
- Did I actually check each type, or did I just read and guess? Rate 1-10. A good answer has something written for every row. A type that was checked with nothing found should say "checked, nothing found" rather than being left blank.
- Did I catch the important mistakes, or only the easy ones? Rate 1-10. If I missed a bigger mistake in the same write-up, tell me which sentence I should have caught.
Do not rewrite my work. If a row is blank without explanation, just say so.
For each of the six mistake types, copy the exact sentence from AI's write-up that has the mistake, and explain what is wrong. If you checked a type and found no mistake, write "checked, nothing found."
How sure are you about each one? (Rate 1-10 and say why in one sentence.)
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 8-15 minutes the first time. It gets faster with practice. After you get your score, find one place where the AI grader disagrees with you. That disagreement is where you learn the most.
What you just did helps you find mistakes inside AI's answer. But there is another kind of problem it does not catch: what happens after you act on AI's advice? If you buy the wrong car, you lose money on repairs. If a company follows bad AI advice, customers leave. One decision causes another problem, which causes another. Discipline 4 teaches you to trace those chain reactions before they happen.
Want a strong sample to compare against? (Open after you submit your own.)
Another reader did the same used-car exercise. This is not the only right answer, but it shows what a good one looks like.
| Mistake type | Sentence from AI's write-up | What is wrong |
|---|---|---|
| Factual error | "The Corolla gets 47 mpg combined." | Wrong number. The real rating is about 33 mpg. This changes how much you would spend on gas. |
| Logical gap | "The Civic has lower mileage, therefore it is the more reliable choice." | Lower mileage helps, but it does not prove a car is reliable. The word "therefore" makes it sound proven when it is not. |
| False confidence | "Either car will run for another decade without a major repair." | Nobody can promise that about a used car. AI stated it as a fact with no "probably" or "likely." That is a guess pretending to be a fact. |
| Missing context | (Not in the write-up.) The 2019 Civic has an open airbag safety recall. | AI never mentioned this. A safety recall is exactly the kind of thing you need to know before buying, and AI left it out. |
| Fabricated source | "According to the CarReliability Index 2026 rankings, the Corolla scores 9.4 out of 10." | This index does not exist. AI invented a source that sounds real. If you search for "CarReliability Index," you will find nothing. |
| Stale fact | "Both still qualify for the $2,000 state clean-vehicle rebate." | That rebate ended in 2025. It was true once but is not true now. This would change the price you actually pay. |
Why this is good: Every row has an answer, and each quoted sentence is one that would actually change which car you buy. The Missing Context row names a specific safety recall. The Fabricated Source row points to an index that does not exist.
What this does not try to do: catch everything. You are not writing a full report. Six rows in fifteen minutes is the goal. Three real catches are better than thirty weak ones.
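Optional, for readers who like to see an idea in code (the course itself needs none): the "something written for every row" rule can be sketched as a tiny check. The six type names come from the sample table above; the `blank_rows` helper and the half-filled `grid` are hypothetical, made up only for illustration.

```python
# The six mistake types, as named in the sample table.
MISTAKE_TYPES = [
    "Factual error", "Fabricated source", "Logical gap",
    "False confidence", "Missing context", "Stale fact",
]

def blank_rows(grid: dict) -> list:
    """Return the mistake types that have no entry at all."""
    return [t for t in MISTAKE_TYPES if not grid.get(t, "").strip()]

# A hypothetical, half-finished grid (illustration only, not a model answer).
grid = {
    "Factual error": '"The Corolla gets 47 mpg combined." Real rating is about 33 mpg.',
    "Fabricated source": "CarReliability Index 2026: could not find this index.",
    "Logical gap": "checked, nothing found",
    "False confidence": "",
}

print(blank_rows(grid))
```

A row marked "checked, nothing found" counts as filled in. An empty row or a missing row does not, which is the same distinction the grader is asked to make.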
Why this works (the research behind it)
The Error Taxonomy works because of a quirk in how human judgment handles smooth writing. When text is easy to read, we trust it more — independent of whether it's true. AI writes very smoothly, which makes it a near-perfect trigger for that bias. Four findings explain why scanning by named type beats reading for feel.
Processing fluency (Adam Alter & Daniel Oppenheimer, 2009). Reviewing decades of experiments, Alter and Oppenheimer showed that the ease with which we process something — clear type, simple words, smooth phrasing — gets misread by the brain as a signal that the content is true. The feeling of "this reads well" leaks into the judgment "this is correct," even though the two have nothing to do with each other. AI output is engineered to be maximally fluent, so it pulls this lever hard. Scanning for a specific error type breaks the spell: you stop evaluating how the text feels and start checking whether a particular kind of claim holds up.
Read the paper (open access): Uniting the Tribes of Fluency to Form a Metacognitive Nation, Personality and Social Psychology Review, 13(3), 2009.
Cognitive ease (Daniel Kahneman, 2011). Kahneman's framework gives the mechanism a name: when information arrives effortlessly, the fast, automatic part of the mind (System 1) accepts it and the slow, checking part (System 2) never wakes up. Smooth AI prose keeps System 2 asleep. The six-type scan is a deliberate way to switch System 2 back on — each named check is a task the automatic mind cannot do on autopilot, which forces the effortful look the fluent text was lulling you out of.
Read more: Thinking, Fast and Slow (Wikipedia); the relevant material is the chapter on cognitive ease.
Confidence is not accuracy (Nate Silver, 2012). Studying forecasters across politics, finance, and sports, Silver documented a consistent gap: the people who sound the most certain are frequently the least accurate, because confidence and calibration are separate skills. AI inherits the worst of this — it states almost everything in the same assured tone whether it's right or inventing. The "False confidence" row in the scan exists precisely to separate the tone from the truth: you flag the flat, unhedged claim as a warning sign rather than reading its certainty as evidence.
Read more: The Signal and the Noise (Wikipedia).
Why six separate checks instead of one overall judgment. Gerd Gigerenzer's work on risk shows that how a problem is represented determines whether people reason about it well — break a murky judgment into clear, concrete pieces and accuracy jumps, even though the underlying facts haven't changed. "Is this AI output any good?" is exactly the kind of murky, all-at-once judgment people are bad at. The scan decomposes it into six concrete questions you can answer one at a time, which is why it catches mistakes that a single holistic read sails past.
Read more: Gerd Gigerenzer (Wikipedia), summarizing the argument in Calculated Risks (2002).
The Error Taxonomy combines all four. Fluent text feels true (Alter & Oppenheimer) and keeps the checking mind asleep (Kahneman), AI's uniform confidence hides which claims are actually shaky (Silver), and a vague "does this seem right?" is the wrong representation for catching errors (Gigerenzer). Naming six error types and checking each one addresses all four at once. Nobody has tested this exact checklist against AI output specifically, but each piece of the mechanism is well established, so applying it to AI output is a reasonable next step.
Go deeper: Part 0 Chapter 2: Detecting Broken Reasoning. The full version (8 mistake types, cross-checking with a second AI, and tracking your accuracy over time; 60-75 min) turns this into a complete system.
The six-type scan works best when you know the topic. But what about topics you are new to? Three tricks help:
- Ask AI for the exact source. Do not accept "studies show." Ask: "Give me the author name, the title, the year, and where it was published." If AI cannot give you a real source, do not trust the claim.
- Be suspicious of exact-looking numbers with no source. "Sales went up 47.3%" sounds very precise. But if AI does not say where that number came from, the precision is a warning sign, not proof.
- When you are not sure, label it MODIFY. If you cannot check a claim in two minutes, do not ACCEPT it. Write MODIFY and add "not yet checked." You can look it up later before you use it.
Discipline 4: Thinking in Systems
A university decided to save money by replacing some in-person tutoring with an AI chatbot. They asked AI, and AI said: "This saves 30% on tutoring costs." That sounded great, so they went ahead.
Six months later: the students who struggled the most stopped coming for help, because the chatbot could not understand their questions. Their grades dropped. Parents complained. The university had to hire more tutors to fix the damage, and it ended up costing more than the original budget. The answer "saves 30%" was correct on paper. But the chain reaction wiped out the savings.
This is the failure mode of Discipline 4. When you ask AI about a decision, it answers the question you asked — "how much will this save?" — and stops there. It almost never traces the chain reactions: Effect A causes Effect B, which causes Effect C, and sometimes Effect C circles back and undoes your original decision. A Cascade Map is how you trace those chain reactions yourself, before you commit, so the surprise happens on paper instead of six months into a budget you cannot take back.
Notice who this protects: not your reputation with an auditor, but your own decision. Nobody audited the university for a bad chatbot rollout. The university just spent the money, lived with the consequences, and spent more money fixing it. The cascade map is not a defense you show later — it is the thinking that stops you from making the expensive move in the first place.
Why "thinking in systems"? A system is any set of parts that affect each other — students, tutors, budgets, and grades are not separate facts, they push on one another. Most of us reason in straight lines: this causes that, end of story. But the parts of a system are connected in loops, so an effect can travel around and come back to change the thing that started it. "Thinking in systems" just means refusing to stop at the first effect — you keep asking "and then what?" until you find where the line bends back into a circle. The Cascade Map is the paper version of that habit: it lays out the parts, traces the lines between them, and looks for the place where a line loops back.
Here is how to build one. This takes about 20 minutes the first time, and 10 minutes once you are used to it.
- Write your decision in one clear sentence. Be specific. Not "maybe change tutoring" but "replace half of in-person tutoring hours with an AI chatbot starting next semester."
- List five groups of people this decision affects. Every big decision touches different people. A good starting list: the people doing the work (e.g., tutors), the people using the service (e.g., students), people who compete with you (e.g., other universities), the rules that apply (e.g., university policies), and what your team knows or does not know (e.g., how good is the chatbot, really?).
- For each group, ask "and then what?" three times. Start with the first thing that happens. Then ask what that leads to. Then ask what comes after that. Three layers deep.
- Find at least one loop. Look for a place where a later effect circles back and makes your original decision worse (or better). Be specific about how it happens.
- If your map looks clean and simple, you stopped too early. The real risks hide in the second and third layers. Push deeper until it looks messy.
Here's what building one chain feels like. Take the tutoring decision and just the group "students who struggle most." Start with the first thing that happens, then ask "and then what?" twice more.
- First layer: The struggling students try the chatbot. It cannot understand their half-formed questions, so they give up and stop asking for help.
- And then what? (Second layer.) With no help, their grades drop. They are the students who most needed support, and they got the least.
- And then what? (Third layer.) Some of them transfer to a university that still has human tutors. The university loses their tuition.
That last link is where the surprise lives. The decision was "save 30% on tutoring." But three layers down, it turns into "lose tuition revenue from the students who needed us most." You would never see that by asking AI "how much will this save?" — you only see it by asking "and then what?" three times in a row.
Now look for the loop: lost tuition means an even tighter budget, which means even less money for tutoring, which means the chatbot has to cover even more, which means even more struggling students give up. The original decision feeds itself. That is the loop, and it is the thing that turns a one-time 30% saving into an ongoing decline.

This drawing is called a Cascade Map. The goal is not to predict the future perfectly. The goal is to find the loops before you commit, while changing the decision is still free.
If your map looks neat and tidy, you probably only wrote down the obvious effects. The real risks are in the deeper layers. Keep going.
You and AI have opposite blind spots here, which is why this discipline is a partnership. AI is good at answering the specific question you asked and bad at noticing the side effects your decision creates. You are better at thinking of the people AI forgot and the chain reactions that take months to play out. So you draw the map first — that is the part only you can do — and then you can ask AI to help you stress-test each branch you have drawn.
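Optional, for readers who like to see an idea in code (you do not need it to draw a map): the tutoring loop traced above is what programmers call a cycle in a graph. The sketch below writes the chain as a small dictionary and follows "and then what?" until an effect repeats. The effect names are paraphrased from the tutoring example; the dictionary layout and the `find_loop` helper are assumptions made for illustration, not part of the method.

```python
# Each key is an effect; each value lists what it leads to ("and then what?").
cascade = {
    "replace tutoring with chatbot": ["struggling students give up"],
    "struggling students give up": ["grades drop"],
    "grades drop": ["students transfer"],
    "students transfer": ["tuition revenue falls"],
    "tuition revenue falls": ["tutoring budget shrinks"],
    "tutoring budget shrinks": ["chatbot covers more tutoring"],
    "chatbot covers more tutoring": ["struggling students give up"],
}

def find_loop(graph, start):
    """Return the first chain of effects that circles back on itself, or None."""
    def walk(node, path):
        if node in path:
            # We have been here before: everything from that point on is the loop.
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            loop = walk(nxt, path + [node])
            if loop:
                return loop
        return None
    return walk(start, [])

loop = find_loop(cascade, "replace tutoring with chatbot")
print(" -> ".join(loop))

# A tidy, straight-line map has no loop to find.
print(find_loop({"move event": ["save money"], "save money": []}, "move event"))
```

The code only finds loops you have already written down. Thinking of the effects in the first place is still the part you do on paper.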
For a real decision, the map can take 20-30 minutes. The exercise below uses a shorter example so you can practice the technique.
The map shows where to look; the loop shows what undermines the decision. The mess is the feature, not a bug.
Read the diagram in two passes. The top half is the breadth pass: one decision in the middle ("replace loan officers with AI"), five domains around it, and the first thing that happens to each. Most of these are the effects anyone would list — employees lose jobs, customers get worse service, competitors feel pressure to copy you. The one that's easy to miss is Internal knowledge: tacit local lore lost. Loan officers carry knowledge that was never written into any system — which local businesses are reliable despite a thin credit file, whose income is seasonal so a late payment in March is normal, when an applicant isn't being straight. Replace the officers and that knowledge walks out the door, because it was never in the software the AI learned from.
The bottom half is the depth pass, and it's why the decision backfires. Follow the Customers domain forward: the cost cut removes the officers, so service drops (the AI misses the cues the humans used to catch — exactly the lore that was lost), so customers leave, so revenue drops below what was saved, so the savings vanish. The dashed arrow is the whole point: the chain loops back to the start, meaning the cost-cutting move ends up erasing its own justification. That circling-back is the thing you draw a cascade map to find — and it's invisible if you only ask AI "how much will this save?"
Here is the same discipline on a different decision.
A student council president wanted to save money by moving the annual sports day from a rented stadium to the university's own ground. AI said: "This saves 40% of the event budget." The benefits were obvious: no rental fee, closer to campus, easier to set up. AI listed all the positives and recommended going ahead.
Before presenting the idea, she drew a cascade map. Her decision: move sports day from the rented stadium to the university ground to save 40% of the budget. She listed five groups and traced three layers for each. The obvious effects were expected (saves money, less travel for students, smaller venue). But the third layer revealed a problem she had not thought of: the university ground holds far fewer spectators, so fewer families would attend, so the event would feel smaller, so sponsors who paid for visibility would pay less next year, so the budget would shrink, so the event would have to get smaller again. A loop — the cost-saving move quietly shrinking the event year after year.

The image above shows her full cascade map: what happens to each group (students, sports teams, food vendors, admin, sponsors), the loop that would have shrunk the event year after year, and the protections she added to prevent it — a guaranteed minimum sponsor package and a spectator-capacity check before committing. AI's original answer ("just move it, you save money") had none of these protections. She still saved money, but she saved it without triggering the loop.
The cascade map she drew is itself a piece of documented evidence of thinking — the same thing the rule at the top of this page is about. When she presented to the council and someone asked "won't this shrink the event?", she did not have to think on her feet. She pointed at the loop she had already mapped and the protection she had already built. The map was both the thinking and the evidence of it.
Try it yourself
Your exercise: Your university just announced that all exams next semester will use AI proctoring (an AI watches you through your webcam during the exam) and will be online-only. No more in-person exams.
Draw a cascade map with five groups: students, professors, IT staff, parents, and the administration. Go three layers deep for each group. Find one loop where a later effect circles back and makes the original decision worse.
(Or use any real decision from your own life this week. That is what makes it stick.)
The AI grader will check two things:
- Did I cover all five groups with three layers each, and did I explain how each effect happens (not just name it)? Rate 1-10. Tell me which group is the weakest and what I missed.
- Is my loop a real chain of cause and effect, or just a label? Rate 1-10. "Students react" is a label. "Students with bad internet fail exams, which lowers the university's pass rate, which makes administration rethink the policy" is a real chain. If mine is just a label, show me how to turn it into a chain.
Do not redraw my map. If a box is empty or vague, just say so.
Your cascade map (write the decision, then list each group with three layers of effects. It does not have to be neat):
Your loop (write it as one chain of cause and effect):
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 15-20 minutes the first time. The first few "and then what?" questions feel awkward. That is normal. The real insights usually show up at the third layer, not the first. With practice, you can do a full map in 8-12 minutes.
After you get your score, look for a group the AI grader mentioned that you forgot. That is where your blind spot is. If the AI found a loop you missed, pay extra attention to it. Loops are the most important thing because they show you when a decision will backfire over time.
What you just did helps you trace what happens after a decision. But it does not check whether the decision is based on the right information in the first place.
A perfectly mapped plan built on a wrong assumption still fails. It just fails later, with better notes. That is what Discipline 5 is for.
Want to see a good example? (Open this after you submit your own.)
Another student did the same AI-proctored exam exercise. This is not the only right answer, but it shows what a good cascade map looks like.
Decision: All exams next semester will be AI-proctored and online-only.
| Group | What happens first | What that causes | What THAT causes |
|---|---|---|---|
| Students | Students with slow internet or old laptops struggle | Some get wrongly flagged for "cheating" by the AI proctor | Those students file appeals; trust in the exam system drops |
| Professors | Professors cannot see students during the exam | They cannot tell if a student is confused or stuck | Professors redesign exams to be shorter and simpler, which lowers the standard |
| IT staff | IT has to set up and support the proctoring software | Students call IT constantly during exam week with tech problems | IT is overwhelmed; response times get worse for everyone on campus |
| Parents | Parents worry about privacy (webcam recording) | Some parents contact the administration to complain | The university has to write new privacy policies, which takes months |
| Administration | Administration saves money on exam halls | But they spend money on proctoring software licenses | The cost savings turn out to be smaller than expected |
The loop: Students with bad internet get flagged for cheating → they file complaints → the administration has to hire people to review each complaint manually → this costs more than booking exam halls did → the administration considers going back to in-person exams → the original decision gets reversed.
Why this is good: All five groups are covered with three layers each. Each effect explains how it happens, not just what happens. The loop is a real chain: it starts with a student problem and ends by undoing the original decision.
What this does not try to do: list every possible effect. There are more loops in this scenario (professors quitting, students transferring). The point is to find one real loop with a clear chain of cause and effect, not to find all of them on your first try.
If your map looks tidier than this one, that's the signal: go one more "and then what?" deeper in your two weakest domains and look again for a loop.
Why this works (the research behind it)
The Cascade Map isn't a new idea — it's a stripped-down version of system dynamics, a field that has spent seventy years documenting one stubborn fact: people reason in straight lines, but the world runs on loops. Three bodies of work explain why drawing the map beats thinking it through in your head.
Demand amplification (Jay Forrester, 1958). Forrester, who founded system dynamics at MIT, showed that a decision made at one point in a chain ripples outward and comes back distorted. His most famous demonstration is what's now called the bullwhip effect: a small, steady change in customer demand at the retail end produces wild swings in factory orders upstream, because each link reacts to the link next to it without seeing the whole loop. The lesson generalizes far beyond supply chains — when you decide in straight-line terms ("this saves 30%"), you miss the way the effect travels through the system and returns changed. The Cascade Map is the tool that makes the return trip visible before you commit.
Read more: Bullwhip effect (Wikipedia), which traces the idea to Forrester's "Industrial Dynamics: A Major Breakthrough for Decision Makers," Harvard Business Review, 36(4), 1958.
Misperception of feedback (John Sterman, the Beer Game). Sterman ran a now-classic experiment, the beer distribution game, in which players manage one link of a simple supply chain. Even smart, motivated participants (MBA students, executives) reliably create large, costly oscillations, because they respond to what's in front of them and ignore the delays and feedback loops in the system they cannot see. The failure isn't lack of effort or intelligence; it's that the loops are invisible unless something forces you to lay them out. That "something" is exactly what the Cascade Map provides: a short, low-stakes version of the forced drawing that makes the loop visible before it costs you anything.
Read more: Beer distribution game (Wikipedia). The full treatment is in Sterman's Business Dynamics: Systems Thinking and Modeling for a Complex World (McGraw-Hill, 2000).
Leverage points (Donella Meadows, 2008). Meadows, who worked in the same MIT tradition, spent her career arguing that the most powerful places to change a system are almost never the obvious ones. The biggest leverage usually sits in feedback loops — the very structures that straight-line analysis never names. Her blunt corollary: you cannot adjust, weaken, or protect against a loop you have not drawn. The Cascade Map's whole job is to surface at least one loop, because that loop is both the hidden risk and the highest-leverage place to intervene.
Read more: Meadows's essay Leverage Points: Places to Intervene in a System (1999), later adapted as a chapter of her book Thinking in Systems (Chelsea Green, 2008).
The Cascade Map combines all three. Decisions ripple through a system and return distorted (Forrester), people reliably miss those return trips unless forced to draw them (Sterman), and the loops they miss are exactly the highest-leverage places to act (Meadows). The map is the forced drawing that catches the loop while changing the decision is still free. No one has tested the Cascade Map against AI specifically, but the underlying finding, that humans miss feedback loops and that drawing them out helps, is one of the most replicated results in the field. The AI-era twist is that you now have a partner with the opposite blind spot: AI answers the question you asked and misses the side effects, while you can think of the people it forgot and the chain reactions that take months to play out. Draw the map yourself first, then ask AI to stress-test each branch, and you cover both gaps.
Go deeper: Part 0 Chapter 3: Thinking in Systems. The full version (peer review plus AI counter-analysis plus the assessment rubric; 60 minutes) makes this a system.
Part 3: Origination (doing what AI cannot)
Part 1 taught you to think before asking AI. Part 2 taught you to spot mistakes in AI's answers. Part 3 is about something different: doing the thinking that AI cannot do for you.
AI has two big blind spots here. First, it gives you the most common answer, not the best answer for your situation. If a thousand people asked the same question, AI gives you the average of what worked for them. But your situation might be different. Second, the more you use AI, the easier it is to stop thinking for yourself and just accept whatever it says.
Disciplines 5 and 6 fix both problems.
Before you start, learn one important phrase: named threshold. A named threshold is a specific condition that tells you when a piece of advice stops working. For example: "This advice works when your class has fewer than 30 students" is a named threshold. "This works sometimes" is not, because "sometimes" does not tell you when. You will use this phrase in a minute.
Discipline 5: First Principles
You are the president of your university's coding club. Every other club on campus just started charging membership fees. Your vice president, your faculty advisor, and two senior members all say the same thing: "We should charge fees too, everyone else is doing it." You ask AI. AI agrees. Everyone is pointing the same direction.
That agreement is the danger. When everyone lines up behind the same answer — including AI — it feels settled, and it is easy to stop thinking and go along. But the common answer is built on what works for most clubs. Yours might be the exception, and nobody in the room is checking whether it is. This discipline is how you check.
The check has a specific shape: you take the common advice and find the exact condition where it stops working. Most people, when they doubt advice, produce a vague complaint: "charging fees is not always a good idea." That is useless, because "not always" never says when. The skill is turning the vague complaint into a named threshold: a specific condition, with a number where you can find one, at which the advice breaks.
Why "first principles"? Reasoning from first principles means refusing to accept an answer just because everyone repeats it, and instead working out what is actually true for your situation. Usually people picture this as building an answer from scratch. This discipline does the lighter, faster version: instead of rebuilding the advice, you test it — you find the exact conditions where the common answer stops being true for you. Same root move (don't take the consensus on authority; check it against your own case), aimed at the boundary rather than the blank page.
Watch the move once. Same situation, two ways of doubting the advice:
- Vague complaint: "Charging fees is not always a good idea."
- Named threshold: "When your club's main goal is to attract first-year students who have never coded before, and most of them cannot afford a fee, charging money will scare away exactly the people you are trying to reach."
The first one is a shrug. The second one tells you exactly when the advice fails (first-years who cannot afford it) and why (the fee blocks the people the club exists to reach). The first changes nothing. The second changes your decision. That gap — between a shrug and a named condition — is the whole discipline.
Here is how to practice it. Pick a piece of common advice that everyone around you (and AI) is telling you to follow. Then write three rows. In each row, describe a specific situation where that advice would not work. Use a real number or a real condition, not just "sometimes."
| The common advice | When does it stop working? (Use a specific number or condition.) |
|---|---|
If you cannot fill three rows with specific conditions, you have been following the advice without really understanding it.
How to tell if your row is good: A row that says "when your club has more than 80% first-year members who have no income, charging fees will cut your membership in half" is useful. It tells you exactly when the advice breaks. A row that says "charging fees does not always work" is too vague to help you make a decision.
Notice who this protects. Nobody was going to audit the club president for charging fees — the whole room, plus AI, agreed it was the right move. If she had followed the consensus, she would simply have made a worse decision, watched membership drop, and never known the named threshold was the reason. The threshold is not something you produce to defend yourself later. It is the thing that catches a bad decision while everyone around you is still nodding along.

Here is what a good result looks like.
The coding club president above did not write three perfect rows on her first try. After thinking it through, she had this:
| Common advice: "Every club should charge membership fees." |
|---|
| Boundary 1. When more than 80% of your members are first-year students with no income, charging fees will scare away exactly the people you are trying to reach. Threshold: 80% first-year, no-income members. |
| Boundary 2. When your club's main value is free workshops that anyone can join, adding a fee creates a barrier that kills walk-in attendance. This matters most when your campus has 3 or more competing clubs that are still free. Threshold: 3+ free competing clubs on the same campus. |
| Boundary 3. When your club gets most of its budget from a university grant that requires you to be open to all students, charging fees could make you lose the grant. Threshold: a grant with an "open access" requirement that covers more than half your budget. |
She presented the three boundaries to her faculty advisor. They decided to keep the club free and raise money through sponsored hackathons instead. By the end of the semester, membership had grown 40% while other clubs that started charging saw attendance drop. None of the three boundaries appeared in the common advice. None appeared in AI's first answer either.
Those three boundaries are also a piece of documented evidence of thinking — the same thing the rule at the top of this page is about. When the president sat down with her advisor, she did not say "I have a bad feeling about fees." She put three named conditions on the table. The difference between a feeling and three named thresholds is the difference between being overruled and being listened to. The rows were both her thinking and the proof she had done it.
Without named thresholds, she would have written something like this:
| Common advice: "Every club should charge fees." | Why this does not help |
|---|---|
| Sometimes charging fees is not a good idea. | Too vague. "Sometimes" does not say when. This could mean when 5% of members leave or when 90% leave. It does not help you decide. |
| Other clubs do not always know what they are doing. | This is a complaint about other clubs, not a reason for your decision. It does not change anything. |
| It depends on the situation. | Saying "it depends" without saying on what does not help. Everyone already knows it depends on the situation. |
Try it yourself
Your exercise: Pick any common advice that people keep telling you. Examples: "follow your passion," "always study in a group," "save 20% of every paycheck," "do not skip lectures." Write three rows. In each row, name a specific situation (with a number or condition) where that advice stops working.
(This works the same no matter which advice you pick.)
Before you start, remember: A threshold uses a specific number or condition ("when your class has more than 200 students"). Words like "sometimes," "often," and "it depends" are not thresholds.
If you cannot come up with a third row, you have probably been following the advice without really understanding it. That is a useful discovery in itself. Rather than forcing a weak third row, try a different piece of advice.
The AI grader will check two things:
- Does each row have a specific threshold (a number, a condition, a clear situation)? Rate 1-10. Quote the weakest row.
- Does each row explain why the advice fails in that situation, or does it just say "it does not work"? Rate 1-10. Point out any row that is a vague complaint instead of a real explanation.
Do not rewrite my rows. If a row is empty or vague, just say so.
The common advice I am examining:
Row 1: When does this advice stop working? (Name a specific condition and explain why.)
Row 2:
Row 3:
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 15-25 minutes the first time. Thresholds are harder to write than you expect. After you get your score, look for any row where you wrote "sometimes" or "it depends" and rewrite it with a real number or condition. If you cannot rewrite it, that row is probably not a real boundary. Drop it and try a different one.
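The "look for any row where you wrote sometimes or it depends" step can be done by eye, but a minimal sketch of it in Python looks like this. The phrase list and the function name `check_row` are assumptions made for illustration, and a digit is only a rough hint: a row can hold a real condition without a number, so treat the result as a prompt to reread the row, not as a grade.

```python
import re

# Vague phrases the page says are not thresholds. This list is an
# illustrative assumption; extend it for your own writing habits.
VAGUE_PHRASES = ("sometimes", "often", "it depends")


def check_row(row: str) -> dict:
    """Flag vague phrases and note whether the row contains any digit."""
    lowered = row.lower()
    # Word boundaries keep "often" from matching inside "soften".
    vague = [p for p in VAGUE_PHRASES
             if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
    has_number = re.search(r"\d", row) is not None
    return {"vague": vague, "has_number": has_number}


rows = [
    "Sometimes charging fees is not a good idea.",
    "When more than 80% of members are first-year students with no income, "
    "fees scare away the people you are trying to reach.",
]
results = [check_row(r) for r in rows]
for row, result in zip(rows, results):
    print(result, "<-", row)
```

A row that is flagged as vague and has no number is the first candidate to rewrite or drop.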
What you just did helps you find where one piece of advice stops working. But it does not help you work with AI on problems where there is no obvious advice to challenge. That is what Discipline 6 is for.
Want to see a good example? (Open this after you submit your own.)
Another student picked the advice "always study in a group." Here are her three rows:
| Common advice: "Always study in a group." |
|---|
| Row 1. Groups bigger than 5 people do not work well. Most people just sit and listen while 2-3 people do the real work. When it breaks: more than 5 people. |
| Row 2. Some subjects need quiet focus (like solving math problems or writing essays). In a group, someone interrupts you every few minutes. When it breaks: tasks that need more than 30 minutes of quiet thinking. |
| Row 3. When one person knows a lot more than everyone else, they spend the whole time explaining instead of studying. They fall behind on their own work. When it breaks: when the best and weakest student are more than 2 grade levels apart. |
Why this is good: Every row uses a specific number (5 people, 30 minutes, 2 grade levels). Every row explains why the advice fails, not just that it fails.
Three clear rows are enough. You do not need to list every possible situation.
Why this works (the research behind it)
The First Principles move — find the exact condition where common advice stops being true for you — sits on top of three older ideas about why advice fails and how to test it.
Ecological rationality (Gerd Gigerenzer, Peter Todd & the ABC Research Group, 1999). Their core finding is captured in one short equation: heuristic + environment = outcome. A rule of thumb is never good or bad on its own — it is good in the environments it fits and bad in the ones it doesn't, and the whole skill is knowing which environment you're in. "Charge membership fees" is a heuristic tuned for clubs with members who can pay; drop it into a club of first-years with no income and the same rule backfires. A named threshold is just you stating, precisely, the environment where the advice stops fitting — which is exactly the judgment this body of work says separates good decisions from bad ones.
Read more: Ecological rationality (Wikipedia), summarizing Gigerenzer, Todd & the ABC Research Group, Simple Heuristics That Make Us Smart (Oxford University Press, 1999).
Recognition-primed decisions (Gary Klein, 1998). Studying firefighters, nurses, and other experts under pressure, Klein found that they rarely weigh options — they recognize a situation as familiar and run the first pattern that fits, usually without noticing they've done it. That's fast and often right, but it's also exactly how consensus advice slips past unexamined: it feels like a recognized, settled answer. Forcing yourself to write down when the pattern would fail is the deliberate pause that interrupts the automatic match — you stop pattern-matching long enough to check whether your case is the exception.
Read more: Recognition-primed decision (Wikipedia); the full account is in Klein's Sources of Power: How People Make Decisions (MIT Press, 1998).
Falsifiability (Karl Popper, 1959). Popper argued that a claim only tells you something about the world if you can state what would prove it wrong. A belief that survives every possible outcome explains nothing. A named threshold is the falsifiability test applied to advice: "this works unless more than 80% of members can't afford the fee" names the exact condition under which you'd abandon the advice. A vague complaint — "it doesn't always work" — names no condition, can never be checked, and so changes nothing. That is the difference between a threshold and a shrug.
Read more: Falsifiability (Wikipedia), the idea Popper introduced in The Logic of Scientific Discovery (1959).
First Principles combines all three. Advice is only ever right for the environment it fits (Gigerenzer & Todd), consensus slips past you because recognizing a "settled" answer is automatic (Klein), and the cure is to name the exact condition that would prove the advice wrong for you (Popper). A named threshold does all three at once: it states the environment, interrupts the automatic match, and is specific enough to be checked. Nobody has tested this exact exercise against AI, but the underlying ideas have held up for decades. Using them to pressure-test AI's confident consensus answers is the natural next step.
Go deeper: Part 0 Chapter 4: Reasoning from First Principles. The full version (the Blank Page Sprint: write 500 words against a practice you have been following, then run a structured AI counter-analysis and a peer review, 60 min) lives in Part 0. This page teaches the row shape. That page teaches the longform argument.
Discipline 6: Working WITH AI
You spent the morning working with AI on an important essay. The result looks great. The arguments are clear and the writing is polished. Then your professor asks: "Which parts of this are your ideas and which parts came from AI?" You open your mouth and realize you cannot tell. Some sentences are yours. Some are AI's. Most are a mix. The essay is good. You just do not know which parts you can actually explain and defend.
Here is how to fix that. Do the same task three different ways, then compare the results side by side.
- Solo. 15 minutes, no AI. Just you and the problem.
- AI-only. 5 minutes. Ask AI, accept the first answer, do not change anything.
- Collaborative. 10 minutes. Ask AI, read critically, disagree where needed, ask follow-up questions, rewrite parts yourself.
Then compare all three versions. Ask yourself: Which version is the best? Which parts of the Collaborative version are better because of something you pushed back on? The Collaborative version usually wins, but the real lesson is seeing exactly where your thinking made it better. That is what this discipline is about.
Here is what the comparison feels like. Suppose your task is the closing line of an email to a professor. Lay the three versions next to each other and read just that one line:
- Solo: "Thanks, and sorry again for the trouble." (Apologetic, a little weak.)
- AI-only: "Thank you for your time and consideration." (Polished, but it could be any email from anyone.)
- Collaborative: "I can show you what I have so far if that helps." (Yours — it proves you have already started the work.)
Reading them side by side is the whole move. The Solo line shows what you would have written alone. The AI-only line shows what AI defaults to. And the Collaborative line is the one you can defend, because you know why it beats the other two: it does a job neither of the others does. You did not just feel that the collaborative version was better — you can point at the line and say what it does. That pointing is the skill.
For a real project, the full comparison takes about 30 minutes. The exercise later in this section is a shorter version, about 15 minutes, so you can feel the difference today.

Here is what this looks like across a whole task.
A student had to write an email to her professor asking for a deadline extension on a major assignment. She had a real reason (she was dealing with a family emergency), but she needed the email to be honest without sounding like an excuse. She decided to try all three paths.
Solo, 15 minutes. She wrote the email herself with no AI help. It was honest and personal. She explained her situation clearly. But she rambled, and the actual request ("can I have 5 more days?") was buried at the bottom. The email was too long and the professor might not read to the end.
AI-only, 5 minutes. She gave AI the situation and accepted the first draft without changing anything. The email was polished and well-structured. But it sounded generic, like a template anyone could have sent. It did not mention any specific details about her situation. It did not sound like her. The professor would probably think she just copied an AI email.
Collaborative, 10 minutes. She wrote the opening herself (explaining her specific situation in her own words), then asked AI to help her restructure the email so the request came first. AI suggested softening the tone; she disagreed and kept her direct wording because she knew this professor prefers honesty over politeness. She also asked AI for a closing line; AI's version was too formal, so she rewrote it to match how she actually talks. The final email was clear, personal, and well-structured. The professor replied within an hour and gave her the extension.
The collaborative version won because of two specific things she did: she kept her own direct wording (which AI tried to soften), and she put the request at the top (which she would not have thought to do on her own). She can point to exactly where her judgment made the email better.
That last sentence is the connection to the rule at the top of this page. The three versions side by side are the documented evidence of thinking. The win is not "the email was good." Plenty of AI-only emails are good. The win is that she can show, line by line, where her judgment changed the result — and that is the thing the professor's question ("which parts are yours?") was really asking for. Notice, too, that the payoff did not depend on the professor ever asking. Even if no one had questioned it, the comparison made her email genuinely better than either of the other two paths would have produced. The audit just makes visible a value that was there either way.
Why you need all three versions, not just the Collaborative one:
- Without the Solo version, you do not know what you would have written on your own. So you cannot tell which ideas in the final email are yours and which came from AI.
- Without comparing all three, you cannot prove the Collaborative version is actually better. "It feels better" is not a real answer if someone asks why you chose this version.
- Without the AI-only version, you cannot tell if you just accepted everything AI said. If your Collaborative and AI-only versions look almost the same, you did not really collaborate. You just copied.
Use this for work where your personal experience matters: emails that need to sound like you, decisions where AI does not know your situation, creative work that needs your ideas. For simple tasks where AI does fine on its own (like formatting a table or summarizing notes), just let AI do it. Do not waste this exercise on tasks that do not need your judgment.
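The "you just copied" check above is a judgment call, but a rough version of it can be sketched with Python's standard `difflib` module. The function name `similarity` and the 0.8 cut-off are assumptions chosen for illustration, not tested values. The two sample strings are the closing lines from the email example on this page.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 ratio of how similar two drafts are, word by word."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Closing lines from the email example on this page.
ai_only = "Thank you for your time and consideration."
collaborative = "I can show you what I have so far if that helps."

# Arbitrary cut-off for illustration only (an assumption, not a tested value).
OVER_ACCEPT_CUTOFF = 0.8

score = similarity(ai_only, collaborative)
over_accepted = score >= OVER_ACCEPT_CUTOFF
print(f"similarity={score:.2f}, over_accepted={over_accepted}")
```

A high score does not prove you copied, and a low one does not prove your changes were good. It only tells you where to look before you answer "which parts are yours?"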
Try it yourself
Start here: Write a message to your landlord asking for a rent reduction, or a message to your professor requesting a deadline extension. Something where you have context the AI doesn't (your payment history, your relationship with the person, the specific situation).
Workplace version: Your boss asks you to write a one-page memo recommending whether your company should buy a smaller competitor. The competitor has 90 people and was growing fast until last quarter, when it lost its biggest customer (which accounted for 22% of its revenue). Its owners are open to selling for $40-55M. Your recommendation will be quoted back to you for the next three years.
For either option, do all three versions: Solo (5 min), AI-only (3 min), Collaborative (5 min). Lay all three side by side. The point is not the memo. The point is the felt difference between the three paths.
(Or pick any real decision on your desk this week. The closer to something real, the sharper the comparison.)
Do not skip the AI-only draft. It is the most tempting one to drop ("I already know what AI would say") and the most diagnostic one to keep. If your Collaborative ends up uncomfortably close to your AI-only, you over-accepted. You only learn that by writing both.
The AI grader will check two things:
- Are your three versions actually different, or do they all say the same thing? Rate 1-10. If the Solo and Collaborative versions look almost identical, say so.
- Are your three overrides specific? Rate 1-10. Each override should be something you can point to and say "without this, the email would have been worse." If any override is vague (like "I made it better"), say so.
Do not rewrite my work. If a box is empty or vague, just say so.
Describe each of your three versions (what you wrote, what surprised you, where it fell short):
Name three specific things you changed or added in the Collaborative version that made it better:
Which version would you actually send, and why?
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
This takes about 15 minutes total including thinking time. After you get your score, look for a place where the AI grader says your Solo version was better at something. That tells you where your Collaborative version relied too much on AI instead of on your own thinking.
What you just did is the whole crash course in one exercise. You formed your own opinion before asking AI (Discipline 1). You tracked what you agreed and disagreed with (Discipline 2). You checked for mistakes (Discipline 3). You thought about what happens next (Discipline 4). You tested where common advice stops working (Discipline 5). And you kept your own judgment when AI tried to take over (Discipline 6). The point was never the answer itself. The point is being able to show how you thought to get there.
Want to see a good example? (Open this after you submit your own.)
Another student wrote an email to her professor asking for a deadline extension. Here is what each version looked like:
| Version | What she wrote |
|---|---|
| Solo (15 min) | Honest and personal. Explained her family situation clearly. But it was too long, and the actual request ("can I have 5 more days?") was buried at the bottom. She knew it needed restructuring but ran out of time. |
| AI-only (5 min) | Short and well-organized. But it sounded like a template. It used phrases like "I would greatly appreciate your consideration" that she would never say in real life. It did not mention any specific details about her course or her professor. |
| Collaborative (10 min) | She wrote the opening in her own words, then asked AI to help her put the request at the top. AI suggested making the tone softer; she kept her direct wording because she knows this professor likes honesty. She used AI's suggested structure but replaced the closing with her own sentence. |
Three things she changed in the Collaborative version:
- Kept her direct tone. AI tried to make it more formal ("I would be grateful for your understanding"). She kept her original wording ("I need 5 more days") because her professor has said he prefers students who get to the point. Without this, the email would have sounded like every other AI-written extension request.
- Moved the request to the first line. She would not have thought of this on her own. AI suggested it. This was the single biggest improvement over her Solo version.
- Replaced AI's closing line. AI wrote "Thank you for your time and consideration." She replaced it with "I can show you what I have so far if that helps." This showed she had already started the work. Without this, the email would have ended with a generic line that added nothing.
Why this is good: Each override points to something she knew that AI did not (her professor's preference for directness, the fact that she had already started the work). She can say exactly where her judgment made the email better. That is the test.
Why this works (the research behind it)
The pattern this exercise is built on — human plus AI beats either alone, but only when the human keeps the decisions — is one of the most consistent findings in AI productivity research. Three pieces of work explain it from three angles.
Human + machine teaming (Garry Kasparov, on "centaur" chess). After losing to IBM's Deep Blue in 1997, Kasparov didn't conclude that machines had simply won. He helped popularize advanced chess, where a human plays alongside a computer. In the freestyle tournaments that followed, the strongest competitors were often not grandmasters and not the best engines, but ordinary players who were skilled at managing the machine — knowing when to trust its calculation and when to override it with human judgment about strategy. The durable lesson isn't about chess specifically (today's engines are strong enough that a human rarely improves pure play); it's that teaming wins when the human contributes something the machine lacks. In writing and decision-making, that something is your private context — your situation, your reader, what you're really trying to say — which is exactly what the Collaborative path forces you to add.
Read more: Advanced chess (Wikipedia); Kasparov develops the argument in Deep Thinking (PublicAffairs, 2017).
AI lifts the people who know least (Brynjolfsson, Li & Raymond, 2023). In the first large field study of generative AI at work, researchers tracked 5,179 customer-support agents given an AI assistant. Productivity rose 14% on average — but the gain was concentrated almost entirely among novices (about a 34% jump), with little effect on the most experienced agents. The reason is revealing: the AI was essentially handing newer workers the knowledge the expert workers already had. The implication for this exercise is direct — collaboration adds value only when you bring something the AI doesn't already contain. Where AI already knows the answer, there's no judgment for you to add; where you hold the context, your overrides are the whole point.
Read more: Generative AI at Work (NBER), published in the Quarterly Journal of Economics (2025).
AI compresses everyone toward the same middle (Noy & Zhang, 2023). In a controlled experiment, 453 professionals did writing tasks, half with ChatGPT. The tool cut time and raised average quality, but it did so by compressing the distribution: weaker writers improved a lot, stronger writers barely changed, and the outputs grew more similar to one another. That compression is the warning baked into this exercise. If you take AI's draft as-is, you land on the same competent, generic middle everyone else lands on. The Collaborative path is how you climb back off that middle: your overrides are what make the result yours instead of the shared default.
Read the paper (open access): Experimental evidence on the productivity effects of generative artificial intelligence, Science, 381, 2023.
The three-path comparison combines all three. Teaming beats solo work only when the human manages the machine rather than deferring to it (Kasparov); AI adds the most where you know the least, which means your value is whatever the AI doesn't already hold (Brynjolfsson, Li & Raymond); and left unmanaged, AI pulls every output toward the same generic middle (Noy & Zhang). Writing the task three ways — Solo, AI-only, Collaborative — makes all three visible at once: the AI-only draft shows you the generic middle, the Solo draft shows you what's uniquely yours, and the Collaborative draft is where your judgment turns the first into the second. Nobody has tested this exact exercise, but the finding underneath it is about as well established as results in this field get.
The full version of this exercise (the 95-minute three-path comparison, with peer review, XP tracking, and full collaboration-style diagnosis) lives in Part 0 Chapter 6: Working WITH AI, Not For AI. This page teaches the discipline. That page builds the working week around it.
The Capstone: One Decision, Six Disciplines
You are the president of your university's student council. The university just gave your council a surprise budget of $10,000 that must be spent before the semester ends. You see two options. Option A: Hire a professional event planner to organize one big end-of-year farewell party. Option B: Use the money to buy AI tools and equipment that help every council member plan better events all year long. Half the council wants the farewell party. Half wants the AI tools. You need to present your recommendation at Friday's council meeting. Here is how each discipline helps you decide.
Discipline 1, Prediction Lock. Before asking AI anything, you write your four lines. Real decision: not "farewell party vs. tools" but "one big event vs. making every future event better." Question that would settle it: will the AI tools actually get used by enough council members to justify the investment? Your position: pick Option B, the AI tools, because you have watched this council struggle with event planning all year and you know the right tools would change the next four events, not just one. Confidence + what flips you: 55% sure. If fewer than 6 out of 8 council members would actually use the tools, switch to Option A.
Then you ask. Notice your Line 2 question was about the council members, so the answer comes from them, not from AI: you poll the eight directly. Six say they would use the tools and finish the training; two are unsure. That clears your bar of six. Your position holds — Option B — for the reason you wrote down, and now with a number behind it instead of a hunch. (Not every settling question is an AI question. The Prediction Lock just tells you which question to answer; sometimes you answer it by asking people, not a model.)
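The flip condition in Line 4 is concrete enough to write down as a rule. Here is a minimal sketch, assuming a hypothetical `PredictionLock` class whose field names are invented for this illustration. The numbers (55% confidence, a bar of 6, a poll result of 6) come from the capstone above.

```python
from dataclasses import dataclass


@dataclass
class PredictionLock:
    position: str      # what you committed to before asking anyone
    fallback: str      # what you switch to if the flip condition is met
    confidence: float  # your stated confidence, 0.0 to 1.0
    min_users: int     # flip if fewer than this many members would use the tools

    def decide(self, members_who_would_use: int) -> str:
        """Return the fallback only when the pre-written flip condition is met."""
        if members_who_would_use < self.min_users:
            return self.fallback
        return self.position


lock = PredictionLock(position="Option B", fallback="Option A",
                      confidence=0.55, min_users=6)

# Poll result from the capstone: six of eight say they would use the tools.
decision = lock.decide(members_who_would_use=6)
print(decision)
```

The useful property is that the rule is written before the poll, so the result cannot quietly move the bar.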
Discipline 2, Reasoning Receipt. Now you ask AI for advice on how to spend the money. AI says the farewell party will "create lasting memories for 500+ students." You label it MODIFY: the venue only holds 300. AI says AI tools increase event quality by 35%. You label it REJECT: no source given, and your council has never used these tools before. AI mentions that other universities saved money by using AI for event planning. You label it SURFACED: you had not thought about what other universities are doing. You also notice AI never mentioned that a $10,000 farewell party would need event insurance and security — you label that MISSED and add it yourself. After going through AI's suggestions, you have 8 labeled rows. You know exactly which claims you trust and which you do not.
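A receipt is just a small table, so it can also be kept as structured data. The sketch below is one possible shape, assuming hypothetical names `Label` and `ReceiptRow`. The four rows are the ones described in the paragraph above, and the other four of the eight rows are not shown in the text, so they are left out here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    MODIFY = "MODIFY"
    SURFACED = "SURFACED"
    MISSED = "MISSED"


@dataclass
class ReceiptRow:
    claim: str    # what AI said (or failed to say)
    label: Label  # your verdict
    why: str      # one sentence of reasoning


receipt = [
    ReceiptRow('Party will "create lasting memories for 500+ students"',
               Label.MODIFY, "The venue only holds 300."),
    ReceiptRow("AI tools increase event quality by 35%",
               Label.REJECT, "No source given, and the council has never used these tools."),
    ReceiptRow("Other universities saved money using AI for event planning",
               Label.SURFACED, "You had not thought about what other universities do."),
    ReceiptRow("Event insurance and security for a $10,000 party",
               Label.MISSED, "AI never mentioned it, so you added it yourself."),
]

counts = Counter(row.label for row in receipt)
rows_without_why = [row for row in receipt if not row.why.strip()]
print(counts, len(rows_without_why))
```

A row with a label but an empty `why` is the thing to watch for: the one-sentence reason is what makes the receipt evidence of thinking.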
Discipline 3, Error Taxonomy. The receipt handled the claims you stopped on; the error scan is for the ones you might have let through. You run AI's output past the six mistake types and catch three the receipt missed: a fabricated source (AI cited a "2025 National Student Events Report" you cannot find anywhere), a stale fact (the AI-tools price AI quoted is last year's; the current price is about 15% higher), and false confidence (AI flatly claimed the tools "will pay for themselves in one semester" with nothing behind it). The stale price changes your budget math; the other two are reminders of how much of AI's confident tone was unearned.
Discipline 4, Cascade Map. You trace what happens with each option across five groups:
- Council members: Option A means one big event and then nothing. Option B means new skills for everyone.
- Students: Option A gives 300 students one great night. Option B improves every event for all students.
- University admin: Option A is safe and familiar. Option B shows the council is forward-thinking.
- Next year's council: Option A leaves nothing behind. Option B leaves tools and training the next team can use.
- Sponsors: Option A attracts sponsors who want visibility at one event. Option B is harder to pitch to sponsors.
You find one loop: if you pick Option B but only 4 out of 8 members actually use the tools, the events do not improve, next year's council sees no benefit, and they cancel the AI tools. The investment is wasted. This is exactly the reversal condition you named in Line 4 of your Prediction Lock — and it is real enough that you build a safeguard against it.
Discipline 5, First Principles. Everyone says "a big event builds school spirit." You test where that advice breaks. Boundary: when fewer than 20% of students can attend (here 300 out of 2,000, or 15%), the farewell party builds spirit for a small group and the rest feel left out. That boundary changes the picture.
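The boundary above rests on one line of arithmetic, sketched here in Python. The function name `attendance_share` is an assumption for illustration. The venue capacity (300), student body (2,000), and 20% boundary are the figures from the paragraph.

```python
def attendance_share(capacity: int, student_body: int) -> float:
    """Fraction of the student body that can physically attend."""
    return capacity / student_body


BOUNDARY = 0.20  # the named threshold from the capstone

share = attendance_share(capacity=300, student_body=2000)
boundary_crossed = share < BOUNDARY
print(f"{share:.0%} can attend; boundary crossed: {boundary_crossed}")
```

Running the number matters because "300 students" sounds large until it is set against the whole student body.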
Discipline 6, Working WITH AI. You write your recommendation three ways. Solo: a solid case for Option B, but you forgot to address the council members who want the farewell party. AI-only: a polished recommendation that splits the difference ("do both!") but does not explain how to fit both in the budget. Collaborative: you write the core argument yourself, ask AI to help you address the farewell party supporters' concerns, and add a specific rule: if fewer than 6 out of 8 members complete the AI training within 3 months, the remaining money goes to the farewell party instead.
The council votes for Option B with your safeguard rule. You can explain every part of your recommendation because you built it yourself, with AI's help.
What the six disciplines did: They did not give you the answer. They gave you the trail: a position you committed to on Monday with the specific finding that would flip it, a receipt that shows which AI claims you trusted and which you did not, an error scan that fixed your numbers, a cascade map that found the risk, a boundary that challenged the obvious choice, and a three-version comparison that found the safeguard. Without the six disciplines, you walk into the meeting with "I think Option B is better." With them, you walk in with evidence and a backup plan.
You do not need a Cascade Map to decide where to eat lunch. You do not need a Reasoning Receipt for every text message. Use the six disciplines for decisions that actually matter. For everything else, just decide and move on.
Which disciplines for which decision?
| How important is the decision? | Example | Which disciplines to use | Time |
|---|---|---|---|
| Not important at all | Choosing where to eat, replying to a routine message | None | 0-1 min |
| Somewhat important | Picking a course next semester, buying a laptop | Prediction Lock + Error Taxonomy on the top AI recommendation | 10-15 min |
| Important, with a deadline | Career choice, big purchase, group project proposal | Prediction Lock + Reasoning Receipt + Error Taxonomy + one or two others that fit | 30-60 min |
| Very important, people will judge your reasoning | Thesis defense, job interview presentation, council recommendation | All six disciplines | 90+ min |
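The table above works as a lookup, which can be sketched as a small Python mapping. The key strings and the function name `disciplines_for` are assumptions made for this illustration. The third tier lists only its three fixed disciplines, because the "one or two others that fit" is left to your judgment.

```python
ALL_SIX = [
    "Prediction Lock",
    "Reasoning Receipt",
    "Error Taxonomy",
    "Cascade Map",
    "First Principles",
    "Working WITH AI",
]

# Keys are shorthand for the four rows of the table (assumed labels).
DISCIPLINES_BY_STAKES = {
    "not important": [],
    "somewhat important": ["Prediction Lock", "Error Taxonomy"],
    # Plus one or two others that fit, chosen by you.
    "important, with a deadline": ["Prediction Lock", "Reasoning Receipt", "Error Taxonomy"],
    "very important": ALL_SIX,
}


def disciplines_for(stakes: str) -> list[str]:
    """Return a copy of the disciplines suggested for a given stakes level."""
    return list(DISCIPLINES_BY_STAKES[stakes])


print(disciplines_for("somewhat important"))
```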
Try it yourself: the mini-capstone
You just watched the six disciplines applied to one decision. Now do the same on a decision from your own week, at reduced scope so you can finish in 30 minutes. Pick something real: a purchase you are weighing, a difficult conversation you need to have, a career choice, a project direction.
- Prediction Lock (2 minutes): Write a 2-line lock. One sentence naming the real decision (not the label). One sentence committing to a position with the specific finding that would flip it.
- Reasoning Receipt (5 minutes): Ask AI for its recommendation. Label 3 of its claims ACCEPT, REJECT, or MODIFY, with one sentence of why for each.
- Error Taxonomy (3 minutes): Scan the AI output for one named error from the six types. Quote the sentence.
- Cascade Map (5 minutes): Pick 3 groups affected. One layer of "and then what?" per group. Name one loop.
- First Principles (3 minutes): Write one boundary row. Name the threshold where the consensus stops working.
- Three-Path Comparison (5 minutes): A two-path version to save time. Write one paragraph solo, one with AI. Compare. Which has something the other lacks?
Total: 25-30 minutes. The result will not be polished. It will be yours. That is the point.
Where to go from here
For deeper practice in any of the six disciplines, Part 0 of this book is the long-form treatment:
- Part 0 Ch 1: Asking Better Questions. Prediction Lock and Position Lock expanded across four exercises and a Question Quality Portfolio.
- Part 0 Ch 2: Detecting Broken Reasoning. The Error Taxonomy extended with confidence calibration and a domain-expertise stress test.
- Part 0 Ch 3: Thinking in Systems. Cascade maps applied to four real decisions, plus a human-vs-AI systems-analysis exercise.
- Part 0 Ch 4: Reasoning from First Principles. The Blank Page Sprint, the Assumption Autopsy, and a constraint-rebuild exercise.
- Part 0 Ch 6: Working WITH AI, Not For AI. Three-Path Comparison, collaboration logs, and override tests across a working week.
For the five thinking skills this crash course did not cover, Part 0 has the full treatment:
- Ch 5: Communicating What Matters. Audience prediction, live adaptation, hard conversations.
- Ch 7: Reasoning Through Dilemmas. Ethical position locks, adversarial defences, stakeholder swaps.
- Ch 8: Building Something From Nothing. Blank page sprints, creation logs, three-draft evolutions.
- Ch 9: Deciding Under Uncertainty. Sealed decisions, reversal triggers, decision audits.
- Ch 10: Learning How to Learn. Meta-learning, 72-hour sprints, a Personal Learning Framework.
For your next step in this book, pick a mode:
- If you write code, continue to Claude Code & OpenCode. The engineering surface for Mode 1 (using AI to improve work you already do).
- If you do knowledge work (legal, finance, marketing, operations, healthcare, education, leadership), continue to Cowork. The domain-expert surface for Mode 1.
- If you are ready to build AI Workers that run on their own, continue to Build AI Agents. This is Mode 2 (building AI systems that work independently).
The disciplines transfer across every tool, every mode, every domain. They are the thing you carry from here to anywhere.
Glossary
If you got partway through the page and forgot what a word meant, here are the load-bearing terms in one place.
The four key ideas (from the rule section and the diagram).
- Discipline — a thinking habit you practice. Something you do.
- Failure mode — a specific way AI tends to mislead you. Something AI does. Each discipline is paired with the failure mode it answers, one to one.
- Part — a group of disciplines that share a common job. The course has three parts (Foundations, Detection, Origination) with two disciplines each.
- Deliverable — the thing you hand to your boss, professor, or client. In 2026 the deliverable is not just the answer; it is the answer plus the documented evidence of thinking that produced it (the prediction you wrote down, the receipt rows where you accepted or rejected AI's claims, the cascade map, the named threshold). If you cannot point at the evidence, you do not have a deliverable.
The six disciplines.
| # | Discipline | The action line | What it does |
|---|---|---|---|
| 1 | Prediction Lock (Part 1: Foundations) | PREDICT BEFORE YOU PROMPT | Write down your committed position before you ask AI — including the specific AI answer that would flip it. |
| 2 | Reasoning Receipt (Part 1: Foundations) | DOCUMENT EVERY DECISION | For each important claim AI makes, mark ACCEPT / REJECT / MODIFY / SURFACED / MISSED with a one-sentence why. |
| 3 | Error Taxonomy (Part 2: Detection) | PREDICT WHERE ERRORS HIDE | Scan AI's output for six specific mistake types: Factual error, Logical gap, False confidence, Missing context, Fabricated source, Stale fact. |
| 4 | Thinking in Systems (Part 2: Detection) | CASCADE MAPS & LOOPS | Trace what happens after a decision across the groups it affects, three layers deep, and find the loops where effects circle back. |
| 5 | First Principles (Part 3: Origination) | FIND THE BOUNDARY | State the named threshold: the specific number or condition where common advice stops working. |
| 6 | Working WITH AI (Part 3: Origination) | OVERRIDE & ITERATE | Compare what you write Solo, what AI writes alone, and what you write Collaboratively. The Collaborative version wins only if you can point to specific overrides where your judgment made it better. |
A few other terms used on the page.
- Named threshold — a specific number or condition that tells you when a piece of advice stops working. "This works when your class has fewer than 30 students" is a named threshold. "This works sometimes" is not.
- Cascade map — a one-page diagram with a short column for each group your decision affects (students, professors, parents, sponsors, etc.) and three arrows under each showing what happens first, what that causes next, and what that causes after.
- Reasoning receipt — a list of rows, one per important AI claim. Each row has three parts: what AI said, what you did about it (ACCEPT, REJECT, MODIFY, SURFACED, or MISSED), and a one-sentence why.
- Loop — a chain of cause and effect where a later effect circles back and changes the original decision, usually making it worse.
The disciplines are not the deliverable. The evidence they produce is the deliverable. The disciplines are how you produce the evidence.
Does this make AI a more powerful tool in your hands, or does it make you a slower version of the tool?