
Build It, Then Break It

Why This Matters: James and the Expertise Blind Spot

James was feeling confident. Two exercises in, and he had a system: the Error Taxonomy, the prediction-before-detection approach, the contradiction analysis. He knew what to look for now.

"I've been thinking about something," he said. "The errors I've been catching, anyone with the taxonomy could catch them. You don't need special knowledge to spot a fabricated citation or a logical gap. You just need to know the categories and read carefully."

"Is that what you believe?"

"That's what the evidence shows. I caught five errors in Exercise 1. I built a better analysis than two AI tools in Exercise 2. The taxonomy works."

"It works for the scenarios you've been given. Policy questions. Productivity debates. General topics." Emma pulled up a new screen. "Now try your own field."

"My own field?"

"Whatever you know best. Your old industry. Your profession. The thing you spent years learning. Ask AI to write an analysis in that domain, and annotate it the same way you've been doing."

James shrugged. "That should be easier. I'll catch more errors because I know the subject."

"That's the hypothesis. Test it."

"Okay, but why does this matter? I already know the taxonomy works."

"You know the taxonomy works on topics where you have no expertise. The question is what happens when you do have expertise. And more importantly, what it tells you about every topic where you don't."

James opened his laptop. He wasn't sure why Emma was making a distinction. Error detection was error detection. He'd find out whether she had a point.


Exercise 3: Build It, Then Break It

Layers Used: Layer 5 (Divergence Test), Layer 3 (Live Defence)

James is about to test his error detection skills on his own turf. So are you.

Choose Your Expert Domain

Step 1. Choose your expert domain. Pick a topic you genuinely know well: your profession, your academic field, your city, a hobby you've spent years on. The key is that you can spot errors a non-expert would miss. Examples: accounting standards, local transit systems, a specific programming language, your country's political history.

Step 2. Generate an AI analysis. Ask AI to write a detailed analysis of a specific question in your domain. Be specific enough that the AI will need to make claims you can verify; e.g., "Analyze the public transit challenges in Karachi," not "Tell me about cities."

Annotate With Your Expertise

Step 3. Annotate the most confident-sounding claims. Pick the 10 claims in the AI response that sound most authoritative and certain. Label each using the Error Taxonomy from Exercise 1. Pay special attention to claims that sound correct but that your expertise tells you are wrong.

Step 4. Separate your findings. Create two lists:

  • Expert-visible errors: Errors you caught because of your domain knowledge that a non-expert would accept as true
  • Suspected errors: Claims that feel wrong but you cannot confirm without further research

Test the Limits of Your Detection

Step 5. Cross-domain exchange. Pair with a student from a different domain. Exchange your annotated outputs. Try to verify your partner's error annotations. Can you confirm their catches are real, or do you lack the expertise to judge? Discuss in a live 10-minute session.

Step 6. Write your reflection (200 words). Compare your error detection experience in your domain vs. your partner's domain. What was different? What does this tell you about using AI outside your expertise?

Solo Learner Alternative

Instead of Step 5, run your own cross-domain test: choose a second topic you know nothing about, generate an AI analysis for it, and try to annotate errors using the same approach. Compare your detection rate between the two domains. The gap reveals exactly how much domain expertise matters.

Your Deliverable
  1. The AI-generated analysis of your domain with line-by-line Error Taxonomy annotations
  2. Your two lists: expert-visible errors + suspected errors
  3. Your partner's annotated output with your verification notes (or your second-domain annotations for solo learners)
  4. Your 200-word reflection on expert vs. non-expert error detection
1. Your Work

I am a student testing my error detection skills. I asked AI to analyze a topic I am an expert in. I then annotated the response with every error I found using this taxonomy: factual error, logical gap, false confidence, missing context, correlation-causation confusion, outdated information, fabricated citation, cultural blind spot. Please:

(1) For each error I identified, confirm whether it is a genuine error or a false positive, and explain your reasoning. (2) Are there errors in the original AI analysis that I missed? List them with categories. (3) Rate my overall error detection accuracy. (4) Which error categories am I strongest and weakest at detecting in my own domain? (5) Rate the depth of my annotations: am I just flagging errors, or am I explaining WHY they are errors?

AI analysis:

My annotations:

Finally, complete the Thinking Score Card for this exercise: Independent Thinking (1-10), Critical Evaluation (1-10), Reasoning Depth (1-10), Originality (1-10), Self-Awareness (1-10). For each score, give a one-sentence justification.

2. Get Your Score

Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.


What Happened With James

James stared at two annotation sheets. His expert-domain sheet had fourteen errors flagged, eight of them marked "expert-visible," meaning a non-expert would have read right past them. His cross-domain sheet, where he'd tried to annotate his partner's field, had three tentative flags, two of which his partner confirmed were false positives.

"Fourteen errors in my domain. Three guesses in theirs, and two of those were false positives." He set the sheets down. "The taxonomy didn't help me in their field. I was using the same categories, the same process. But I couldn't tell whether a claim was wrong because I didn't have the background to recognize the mistake."

"What does that tell you?"

"That every time I use AI for something outside my expertise, I'm in the same position as my partner reading my annotations. I can check for logical gaps and false confidence, those are structural. But the factual errors, the missing context, the outdated information, those require knowing the field." He paused. "It's like when we hired a new vendor at my old company. I could evaluate their professionalism and their communication, but I couldn't evaluate whether their technical specs were sound. That's why we brought in a specialist reviewer."

"Now you know when you need a specialist reviewer for AI output."

The Lesson Learned

Domain expertise is your most powerful error detection tool, and its absence is your biggest vulnerability. In your own field, you catch errors that non-experts accept without question. Outside your field, you lose that advantage entirely. The practical takeaway: when AI generates content about a domain you do not deeply understand, treat it the way you would treat an unverified vendor proposal. Bring in someone who knows.
