Confidence Calibration
Layers Used: Layer 1 (Predict Before You Prompt), Layer 6 (Iterative Drafts)
What You Do
This exercise uses a different format: rapid-fire timed rounds. You receive 20 AI-generated claims across a range of topics: science, history, current events, technology, geography, and law. You have 90 seconds per claim. For each one: read the claim, rate your confidence (0-100%) that it is accurate, write a one-sentence justification for your rating, and note any red flags. After all 20, verify each claim using AI and web research.
The time pressure simulates real-world decision-making where you must quickly assess AI output without unlimited time to verify.
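If you want to enforce the time limit and capture your answers without breaking the rapid-fire rhythm, a small script can run the rounds for you. This is a minimal sketch, not part of the exercise itself: it assumes your 20 claims are saved one per line in a hypothetical claims.txt, and it writes your ratings to ratings.csv; both file names and the column layout are illustrative choices.

```python
# Minimal round runner (sketch). Assumes claims.txt exists with one claim per
# line; writes confidence, justification, red flags, and elapsed time to
# ratings.csv. File names and columns are illustrative, not prescribed.
import csv
import time

LIMIT_SECONDS = 90  # the exercise's per-claim time limit

with open("claims.txt", encoding="utf-8") as f:
    claims = [line.strip() for line in f if line.strip()]

rows = []
for i, claim in enumerate(claims, start=1):
    print(f"\nClaim {i}/{len(claims)}: {claim}")
    start = time.monotonic()
    confidence = int(input("Confidence that this is accurate (0-100): "))
    justification = input("One-sentence justification: ")
    red_flags = input("Red flags (leave blank if none): ")
    elapsed = time.monotonic() - start
    if elapsed > LIMIT_SECONDS:
        print(f"Over time: {elapsed:.0f}s (limit {LIMIT_SECONDS}s)")
    rows.append([i, claim, confidence, justification, red_flags, round(elapsed)])

with open("ratings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["claim_id", "claim", "confidence", "justification", "red_flags", "seconds"])
    writer.writerows(rows)
```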
You produce a table with 20 rows, one per claim, recording: the AI claim, your confidence rating (0-100%), the verified truth status (accurate / inaccurate / partially accurate), the source you used to verify it, and whether your rating was calibrated (confidence matched reality), overconfident (high confidence on a wrong claim), or underconfident (low confidence on a correct claim). You also produce a Confidence Calibration Chart plotting your ratings against reality, and a 200-word reflection analyzing your calibration patterns.
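If you keep the table in a spreadsheet or CSV, a short script can produce both the calibration labels and the chart. This is a minimal sketch under stated assumptions: it expects a hypothetical calibration.csv with columns "confidence" (0-100) and "verdict" (accurate / inaccurate / partially accurate), and the 70/30 cutoffs for "high" and "low" confidence are illustrative choices, not part of the exercise. Treating "partially accurate" as not fully correct is also a simplifying assumption.

```python
# Label each claim (calibrated / overconfident / underconfident) and draw the
# Confidence Calibration Chart. calibration.csv and the 70/30 cutoffs are
# assumptions for this sketch only.
import csv
import matplotlib.pyplot as plt

HIGH, LOW = 70, 30  # illustrative thresholds for "high" and "low" confidence

rows = []
with open("calibration.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        confidence = float(row["confidence"])
        correct = row["verdict"].strip().lower() == "accurate"
        if confidence >= HIGH and not correct:
            label = "overconfident"
        elif confidence <= LOW and correct:
            label = "underconfident"
        else:
            label = "calibrated"
        rows.append((confidence, correct, label))

# Chart: your confidence vs. the verified outcome; the dashed diagonal marks
# perfect calibration.
plt.scatter([c for c, _, _ in rows], [100 if ok else 0 for _, ok, _ in rows])
plt.plot([0, 100], [0, 100], linestyle="--")
plt.xlabel("Your confidence (%)")
plt.ylabel("Verified outcome (0 = inaccurate, 100 = accurate)")
plt.title("Confidence Calibration Chart")
plt.savefig("calibration_chart.png")

for label in ("calibrated", "overconfident", "underconfident"):
    print(label, sum(1 for _, _, lab in rows if lab == label))
```

On the chart, points in the lower right (high confidence, inaccurate claim) are your overconfidence cases; points in the upper left (low confidence, accurate claim) are your underconfidence cases.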
Then get AI feedback. Paste your completed table into an AI chat with this prompt:
I am a student calibrating my ability to judge AI accuracy. I rated my confidence on 20 AI-generated claims, then verified each one. Below is my complete calibration table. Please:
(1) Review my verification of each claim -- did I correctly determine which claims were accurate and which were not? Flag any claims I may have verified incorrectly. (2) Calculate my calibration score: for claims I rated 80%+ confidence, what percentage were actually correct? For claims I rated below 40%, what percentage were actually incorrect? (3) Identify my specific calibration weaknesses -- which topics or claim types am I most overconfident about? Underconfident about? (4) Give me 3 specific strategies to improve my calibration based on my patterns. (5) Rate my overall calibration from Poor / Fair / Good / Excellent.
My calibration table:
Finally, complete the Thinking Score Card for this exercise: Independent Thinking (1-10), Critical Evaluation (1-10), Reasoning Depth (1-10), Originality (1-10), Self-Awareness (1-10). For each score, give a one-sentence justification.
Discuss with an AI. Question your scores.
Come back when you have your BEST evaluation.
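If you want to double-check the arithmetic the AI returns for step (2) of the prompt, the two bucket percentages are easy to compute yourself. This sketch reuses the same assumed calibration.csv as above; again, counting "partially accurate" as not fully correct is a simplifying assumption.

```python
# Calibration score from step (2): of claims rated 80%+ confidence, what share
# were actually correct; of claims rated below 40%, what share were actually
# incorrect. calibration.csv is the same assumed file as in the earlier sketch.
import csv

high, low = [], []
with open("calibration.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        confidence = float(row["confidence"])
        correct = row["verdict"].strip().lower() == "accurate"
        if confidence >= 80:
            high.append(correct)
        elif confidence < 40:
            low.append(correct)

if high:
    print(f"Rated 80%+: {100 * sum(high) / len(high):.0f}% were actually correct")
if low:
    print(f"Rated below 40%: {100 * sum(not c for c in low) / len(low):.0f}% were actually incorrect")
```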
What This Teaches You
You learn that most people — especially smart people — are systematically overconfident about AI accuracy. By quantifying your calibration, you get a precise map of where your trust in AI is well-placed and where it is dangerous. This exercise is repeated at the end of the book to measure how much your calibration improves after completing all 10 chapters.
The final deliverable is an Error Detection Portfolio containing: (1) the sealed error prediction document, (2) two annotated AI responses with full Error Taxonomy markup, (3) the three-draft contradiction analysis with evolution notes, (4) the domain expertise annotation with partner verification, (5) the 20-claim Confidence Calibration Chart with analysis, and (6) all AI feedback responses with your reflections on each.
Grading Criteria
| What Is Evaluated | Weight | Exercise |
|---|---|---|
| Error prediction accuracy (did you anticipate AI failure modes?) | 15% | Exercise 1 |
| Error detection precision (false positive and false negative rates from AI feedback) | 25% | Exercise 1 |
| Contradiction analysis quality (three-draft evolution showing improvement) | 20% | Exercise 2 |
| Domain expertise annotation depth | 15% | Exercise 3 |
| Confidence calibration accuracy | 15% | Exercise 4 |
| Reflection quality across all exercises | 10% | All exercises |