Updated Mar 09, 2026

The PRIMM-AI+ Toolkit

In the previous lesson, you learned how to work with AI at each PRIMM stage -- the roles, the boundaries, the checkpoints, the gates, and the five rules. You walked through a complete cycle with a real Python program and saw every safeguard in action.

Now comes a different question. James asks it perfectly:

"I know the rules. But how do I know if I'm actually getting better? And does any of this matter once I'm past the exercises?"

Both questions have concrete answers. PRIMM-AI+ includes tools that measure your growth, diagnose your mistakes, and connect your learning habits directly to professional software development. This lesson covers five of those tools from the nine-enhancement table in the previous lesson: the Verification Ladder, Confidence Scoring, the Error Taxonomy, the professional practice mapping, and the Chapter-End Rubric. One more enhancement -- classroom and solo modes -- is covered in Lesson 4.

This lesson introduces each one. By the end, you will have the complete PRIMM-AI+ picture -- not just how to use it, but how to know whether it is working and where it leads.


The Verification Ladder

"How long do I have to keep predicting before running?" James asks. "It feels like training wheels."

Emma smiles. "You'll never stop. Senior engineers do the same thing — they just call it 'code review' and 'testing.' The habit you're building right now is the same habit that powers professional software. It just grows with you."

Every time you predict what code will do and then run it to check, you are building a habit: guess first, verify second. That habit does not stop being useful when you leave this chapter. The Verification Ladder shows how the same habit scales from a learning exercise all the way to professional software -- one step at a time.

| Step | Question | What You Do |
|---|---|---|
| 1. Prediction | What do I think this code does? | Read the code and commit to an answer before running it |
| 2. Types | Is the structure valid? | Check that every variable and function has the right type labels |
| 3. Tests | Is the behavior correct? | Write a test that states what the code should do, then run it |
| 4. Pipeline | Do all checks pass together? | Run all your checks (types, tests, formatting) in one command |
| 5. Observability | Is it still correct in production? | Monitor the running program to catch problems after deployment |

You are currently at Step 1 -- predicting and running. Steps 2 through 5 will be introduced gradually as you progress through the book. The key insight is that the predict-then-verify habit you are building now is not a beginner exercise you will outgrow. It is the same habit that powers every step above it.
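The first three steps of the ladder fit in one short sketch. This is an illustrative example, not one of the book's exercises -- the function name and values are invented, and the type hints and test are a preview of Steps 2 and 3, which later chapters cover properly:

```python
# Step 1 -- Prediction: before running, write down what you think this prints.
# (Example prediction: "Hello, Ada! Your score is 92")

def greeting(name: str, score: int) -> str:  # Step 2 -- Types: labels on inputs and output
    """Build a one-line greeting for a student."""
    return f"Hello, {name}! Your score is {score}"

# Step 3 -- Tests: state what the code SHOULD do, then check it.
def test_greeting() -> None:
    assert greeting("Ada", 92) == "Hello, Ada! Your score is 92"

test_greeting()
print(greeting("Ada", 92))  # Run, then compare against your Step 1 prediction
```

Notice that each step is the same habit at a different scale: the prediction comment, the type labels, and the assertion all state an expectation before the code proves it.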


Confidence Scoring

James predicts the output of a program and rates himself a 5 — certain. He runs the code. The output is completely different from his prediction.

"That's the most dangerous state," Emma says. "Being wrong isn't the problem. Being confident and wrong is the problem. In the real world, that's how bugs get shipped — someone looks at AI-generated code, thinks 'looks right,' and moves on without checking."

In Lesson 1, you saw the 1-5 confidence scale during the Predict stage. Here it is again for reference — and here is how to use it systematically throughout Parts 4 and 5.

| Score | Meaning |
|---|---|
| 1 | No idea -- complete guess |
| 2 | Vague guess -- something about a greeting? |
| 3 | Think I know but could be wrong |
| 4 | Fairly confident -- I see how the pieces fit |
| 5 | Certain -- I can describe the exact output |

After each prediction, record four things:

  1. Your prediction -- what you think the code will do
  2. Your confidence score -- how certain you are (1-5)
  3. The actual result -- what the code actually did
  4. Your revised explanation -- what you now understand that you did not before
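If you prefer working in code, the four-part record can be kept as a small Python log. This is a sketch under assumptions -- the PredictionRecord class and its field names are invented here for illustration, not part of PRIMM-AI+:

```python
from dataclasses import dataclass

@dataclass
class PredictionRecord:
    prediction: str   # 1. what you think the code will do
    confidence: int   # 2. how certain you are (1-5)
    actual: str       # 3. what the code actually did
    revision: str     # 4. what you now understand that you did not before

    @property
    def correct(self) -> bool:
        return self.prediction == self.actual

log = [
    PredictionRecord("prints 'Hi, Ada'", 5, "prints 'Hi, Ada'", "as expected"),
    PredictionRecord("prints 'AdaAda'", 5, "TypeError", "str * int repeats; str + int fails"),
]

# Calibration check: how often were high-confidence (4-5) predictions right?
confident = [r for r in log if r.confidence >= 4]
hits = sum(r.correct for r in confident)
print(f"High-confidence accuracy: {hits}/{len(confident)}")
```

A notebook or paper works just as well -- the point is that all four fields get written down, so false confidence leaves a visible trace.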

Why this matters: False confidence -- rating yourself a 5 and getting it wrong -- is the most dangerous state for an AI-era developer. It means you think you understand code but you do not. When AI generates code you falsely believe you understand, you ship bugs you cannot find. Confidence scoring trains you to know when you know and when you do not. That calibration is a professional skill.

Over the course of Parts 4 and 5, your calibration should improve. Early on, you will probably rate yourself too high (overconfident) or too low (underconfident). Both are normal. The goal is not to always score 5. The goal is for your confidence scores to accurately predict your actual accuracy.


Five Kinds of Bugs — A Preview

James runs his modified program and gets an error. "It's broken. I'll ask AI to fix it."

"Wait," Emma says. "Before you ask for a fix, can you tell me what kind of problem this is? Did you give the wrong type of data? Did you get the logic backward? Did you build the wrong thing entirely?"

James stares at the error. "I... don't know how to tell."

"That's what this table is for."

When your code does not work, the first question is: what kind of problem is this? Not all bugs are the same. Knowing which kind you are looking at tells you where to search. You do not need to memorize this table now -- just know it exists. You will practice finding each kind starting in Chapter 33.

| Kind of Bug | What Went Wrong | Simple Example |
|---|---|---|
| Type Error | You gave the wrong kind of data | You wrote name + score but score is a number, not text |
| Logic Error | The code runs but gives the wrong answer | You printed the name where the subject should be |
| Specification Error | The code does what you asked, but you asked for the wrong thing | You built a greeting program when the task was a profile card |
| Data Error | The code breaks with unusual inputs | The name is empty, so the output starts with a space |
| Orchestration Error | The pieces run in the wrong order | You tried to print result before you created it |

When something goes wrong during Investigate or Modify, glance at this table. Even just asking "Is this a type problem or a logic problem?" narrows your search and saves time.
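To make the taxonomy concrete, here is one tiny Python example of each kind. The variable names and values are invented for illustration; each broken version appears as a comment so the file still runs:

```python
# 1. Type error -- wrong kind of data
name, score = "James", 95
# print(name + score)        # TypeError: can only concatenate str (not "int") to str
print(name + str(score))     # fix: convert the number to text first

# 2. Logic error -- runs, but gives the wrong answer
subject = "math"
# print(f"Best at {name}")   # runs fine, but prints the name where the subject belongs
print(f"Best at {subject}")

# 3. Specification error -- right code, wrong goal
# print(f"Hello, {name}!")   # works, but the task asked for a profile card
print(f"Name: {name} | Subject: {subject} | Score: {score}")

# 4. Data error -- misbehaves on unusual input
empty_name = ""
message = f"Hello, {empty_name}!"  # with an empty name this would print "Hello, !"
if not empty_name:
    message = "Hello, stranger!"   # fix: handle the unusual input explicitly
print(message)

# 5. Orchestration error -- pieces in the wrong order
# print(result)              # NameError: result was used before it was created
result = score + 5
print(result)
```

Reading the commented-out lines and predicting their errors before uncommenting them is itself a Predict-stage exercise.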


From Learning to Professional Practice

"Will I still be doing this at work?" James asks. "Predicting output, tracing variables, writing things down before running them?"

"Every day," Emma says. "You'll just call it something different. Predict becomes code review. Investigate becomes debugging. Make becomes shipping a feature. The habit is the same — the stakes just get bigger."

The habits you build with PRIMM-AI+ do not disappear when you finish this book. They are the same habits professional developers use every day -- just applied at a larger scale. Here is what each habit looks like when you are learning versus when you are working professionally:

| PRIMM-AI+ Stage | What You Do While Learning | What Professionals Do With the Same Habit |
|---|---|---|
| Predict | Read existing code and predict what it will output before running it | Read AI-generated code and predict whether it is correct before trusting it |
| Run | Run the code and compare the output to your prediction | Run automated tests and compare the results to expected behavior |
| Investigate | Trace variables and ask questions about how the code works | Review code for bugs, security issues, and maintainability |
| Modify | Change an existing program to add a feature or fix a problem | Refine code iteratively based on what the review revealed |
| Make | Write a new program from a specification you created | Build and deliver a complete, tested solution from requirements |

Notice the Make row. While learning, you write the code yourself — that is how you build the skill. In professional practice, AI often generates code from YOUR specification — which is why learning to write clear specifications matters more than learning to type code fast. The spec is the product. The code is the output.

The key insight: Predict always means the same thing -- read the code, form an expectation, then check whether you are right. While learning, you predict what a program will print. As a professional, you predict whether AI-generated code is correct. The habit is identical. The stakes grow.

By the time you finish Parts 4 and 5, these habits will be automatic. You will read AI-generated code with the same critical eye you developed tracing through the walkthrough programs in this chapter. You will refine iteratively, improving code in stages rather than expecting perfection on the first attempt.


Chapter-End Rubric Preview

"How will I know if I actually learned something?" James asks. "Not just finished the lesson — actually learned it."

"Every programming chapter ends with a self-check," Emma says. "Five questions. Honest answers. It takes two minutes and tells you exactly where you stand."

Starting in Chapter 33, every programming chapter ends with a self-assessment against five dimensions. Here is the rubric you will use:

| Dimension | What It Measures |
|---|---|
| Prediction Accuracy | How often were your predictions correct? |
| Trace Quality | Were your trace artifacts accurate without AI help? |
| Explanation Quality | Can you explain the concept in your own words? |
| Modification Quality | Were your modifications correct on the first or second attempt? |
| Independent Make | Were the spec and first attempt produced without AI? |

Each dimension has three levels: Developing (need more practice), Competent (solid understanding), and Fluent (ready to move on). You do not need to be Fluent in every dimension to proceed -- Competent is the target. But if you score Developing on Prediction Accuracy or Explanation Quality, revisit those concepts before moving forward.


PRIMM-AI+ at a Glance

This table consolidates everything from this lesson and the previous one. Use it as a reference card whenever you start a PRIMM-AI+ lesson. Here is what each column means:

  • AI Permission — What your AI assistant is allowed to do at this stage
  • What You Do First — The action you must take before moving on
  • Mastery Gate — The requirement you must meet before proceeding to the next stage
  • What You Produce — The visible artifact you create as evidence of learning

| Stage | AI Permission | What You Do First | Mastery Gate | What You Produce |
|---|---|---|---|---|
| Predict | AI-free | Write prediction + confidence score | Written prediction exists | Prediction with confidence score |
| Run | AI allowed | Compare prediction to output | Comparison recorded | Prediction vs. actual record |
| Investigate | AI after your explanation | Write your own explanation first | Can explain how, not just what | Trace table, explanation, or failure note |
| Modify | AI for hints only | Attempt the change yourself first | Can explain the target behavior | Modified code with prediction |
| Make | AI for review only | Write spec and first attempt AI-free | Written spec exists | Specification + solution + rubric |

If You Are New to Programming

Focus on confidence scoring and the professional practice table. Confidence scoring teaches you to be honest about what you know. The professional table shows you that every exercise has a real-world payoff — you are not doing busywork. The error taxonomy and verification ladder are reference material you will revisit as you encounter them naturally in later chapters.

If You Have Coded Before

The verification ladder will feel familiar — you already use types, tests, and CI pipelines. What PRIMM-AI+ adds is the explicit prediction step at the base of the ladder. Even experienced developers skip mental predictions when AI generates the code, and that skip is where bugs enter production. The confidence scoring system catches that pattern.


Key Takeaways

  • The Verification Ladder connects your learning predictions to production observability across five steps: Prediction, Types, Tests, Pipeline, Observability. The predict-then-verify habit you build now powers every step above it.
  • Confidence scoring trains you to know when you know and when you do not. False confidence — rating 5 and getting it wrong — is the most dangerous state for an AI-era developer.
  • Five kinds of bugs (Type, Logic, Specification, Data, Orchestration) give you diagnostic vocabulary. Asking "what kind of bug is this?" before asking AI to fix it saves time and builds understanding.
  • Every PRIMM-AI+ stage maps directly to a professional practice: Predict to code review, Run to testing, Investigate to debugging, Modify to iterative refinement, Make to shipping from requirements.
  • PRIMM-AI+ is not a training wheel you will outgrow. Your learning habits map directly to professional practice — the stakes grow, the habits stay.

Try With AI

Prompt 1: Map Your Own Verification Ladder

I am learning about the PRIMM-AI+ Verification Ladder. The
five steps are: Prediction, Types, Tests, Pipeline, and
Observability. I am currently at Step 1 (Prediction).

Show me a simple Python program (4-6 lines, using only
variables and print) and walk me through Steps 1 and 2.
For Step 1, ask me to predict the output. For Step 2,
show me how adding type hints (like name: str) helps
catch errors before the code even runs.

What you are learning: How the first two steps of the Verification Ladder work together. Prediction builds your mental model; type checking validates the structure. You will experience both steps with a concrete example.

Prompt 2: Practice Confidence Calibration

I want to practice confidence scoring. Show me three short
Python programs (4-6 lines each, using only variables and
print, no loops). For each one, ask me to:
1. Predict the output
2. Rate my confidence 1-5
3. Explain my reasoning

After I give all three predictions, show me the actual
outputs and help me analyze my calibration: was I
overconfident, underconfident, or well-calibrated?

What you are learning: The confidence calibration skill. By predicting three programs in a row and comparing your confidence scores to your actual accuracy, you train yourself to know when you know and when you are guessing. This is the foundation of professional judgment about AI-generated code.

Prompt 3: Classify Bug Types

I am learning the five kinds of bugs: Type Error, Logic
Error, Specification Error, Data Error, and Orchestration
Error. Show me five short broken Python programs (one for
each bug type). For each one, ask me to identify what kind
of bug it is before showing me the fix.

What you are learning: Diagnostic vocabulary. When you can name the kind of bug you are looking at, you know where to search for the fix. This skill becomes critical when debugging AI-generated code — knowing whether the problem is in the types, the logic, or the specification tells you whether to fix the code or rewrite the prompt.


Looking Ahead

James sits back. "I know the stages, the rules, the checkpoints, and now the tools to measure myself. But what does a real lesson actually feel like when all of this is running at once?"

"That's the next lesson," Emma says. "You'll see how four teaching methods are woven into the stages, and you'll see the exact lesson architecture that every programming chapter in this book follows. After that, you'll recognize the pattern the moment you hit Chapter 33."

The next lesson brings everything together: the teaching methods embedded within PRIMM-AI+, classroom and solo modes, and the practical lesson architecture that governs every chapter from here forward.