Reading Diffs and the Judgment Call

James finishes Round 2 of his search_notes iteration. He knows a regression appeared, but he is not sure what the AI actually changed. The function is ten lines long. He could read both versions side by side, but he has seen professionals use a faster method.

"Run git diff," Emma says. "It shows you exactly which lines changed. Green lines were added, red lines were removed. Everything else stayed the same."

James runs the command. The diff appears: four lines with a + prefix, one line with a - prefix, and a few lines of unchanged context. In five seconds, he spots the and that should be or.

"That would have taken me a minute to find by reading both versions," he says.

"Now imagine a function that is fifty lines long," Emma replies. "Diffs scale. Reading two full versions side by side does not."

If you're new to programming

A diff is a summary of differences between two versions of a file. Instead of showing you the entire file twice, it shows only the lines that changed. Lines with a + prefix were added. Lines with a - prefix were removed. Everything else is context to help you find your place. You do not need to memorize every detail of the format; the + and - prefixes are the most important part.
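You do not need git to experiment with the format. Python's standard-library difflib module produces the same +/- output, so you can generate a diff from any two text versions. A minimal sketch; the before/after snippets here are a made-up example, not code from this lesson:

```python
import difflib

# Hypothetical "before" and "after" versions of a one-line condition change
old = [
    "def is_match(note, term):\n",
    "    return term in note.title or term in note.body\n",
]
new = [
    "def is_match(note, term):\n",
    "    return term in note.title and term in note.body\n",
]

# unified_diff emits the same +/- format that git diff uses
diff = list(difflib.unified_diff(old, new, fromfile="round1.py", tofile="round2.py"))
print("".join(diff))
```

Reading the output, the - line shows the removed condition and the + line shows its replacement, exactly as described above.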

If you have used version control before

You already know git diff. This lesson focuses on reading diffs in the context of AI iteration: comparing Round N to Round N+1, deciding whether the AI's changes are acceptable, and using diff size to make a judgment call about re-prompting vs. manual fixing.


The git diff Format

In Chapter 44 Lesson 7, you set up Git for your SmartNotes project. Now you use it to compare versions. After the AI generates new code, save the file and run:

git diff smartnotes/search.py

Here is what the diff between Round 1 and Round 2 of search_notes looks like:

@@ -1,8 +1,11 @@
 def search_notes(notes: list[Note], term: str) -> list[Note]:
     """Return notes whose title or body contains the search term."""
+    if not term:
+        return list(notes)
     results: list[Note] = []
+    lower_term: str = term.lower()
     for note in notes:
-        if term in note.title or term in note.body:
+        if lower_term in note.title.lower() and lower_term in note.body.lower():
             results.append(note)
     return results

Let us break this down piece by piece.


Reading the Three Parts

1. The Location Header

@@ -1,8 +1,11 @@

This tells you where the changes are. -1,8 means the old version started at line 1 and was 8 lines long. +1,11 means the new version starts at line 1 and is 11 lines long. The function grew by 3 lines.

You do not need to memorize this format. The important takeaway: the @@ line tells you the neighborhood of the change.
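If you ever want to pull those numbers out of a header programmatically, a few lines of Python will do it. This is an illustrative sketch, not part of git or any library, and it ignores the shorthand real headers use when a length is 1:

```python
import re

def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    """Pull (old_start, old_len, new_start, new_len) out of an @@ line."""
    match = re.match(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@", header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = (int(g) for g in match.groups())
    return old_start, old_len, new_start, new_len

# The header from the search_notes diff: old hunk 8 lines, new hunk 11 lines
print(parse_hunk_header("@@ -1,8 +1,11 @@"))  # (1, 8, 1, 11)
```

Subtracting the two lengths (11 - 8) confirms the function grew by 3 lines.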

2. Context Lines (No Prefix)

 def search_notes(notes: list[Note], term: str) -> list[Note]:
     """Return notes whose title or body contains the search term."""

Lines with no + or - prefix are unchanged. They provide context so you can orient yourself in the file.

3. Changed Lines

+    if not term:
+        return list(notes)

Lines with + were added. These two lines handle the empty-term case.

-        if term in note.title or term in note.body:
+        if lower_term in note.title.lower() and lower_term in note.body.lower():

A - line followed by a + line means that line was replaced. The old matching condition was removed and a new one was added. This is where the regression lives: or became and.


What to Look For in a Diff

When reviewing an AI-generated diff, ask three questions:

  1. What was the intent? You asked for case-insensitive matching and empty-term handling. Do the + lines accomplish that?
  2. What else changed? Look for - lines that removed code you did not ask the AI to change. The or to and swap is an unintended change.
  3. How much changed? Count the + and - lines relative to the total function length. This feeds into the 30% heuristic.

The 30% Heuristic

Not every problem is best solved by re-prompting. Sometimes typing the fix yourself is faster. The 30% heuristic helps you decide:

| Scope of Change Needed | Strategy | Why |
| --- | --- | --- |
| Less than 30% of the code needs to change | Fix manually | Faster than writing a re-prompt and waiting for output. You can see the fix; just type it. |
| 30% to 70% of the code needs to change | Re-prompt with specifics | Too much to fix by hand comfortably, but the structure is sound. Give the AI targeted instructions. |
| More than 70% of the code is wrong | Start over | The AI misunderstood the core requirement. A new prompt with better constraints will be faster than patching. |
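The heuristic can be sketched as a tiny decision helper. The function names and the way changed lines are counted are ours, invented for illustration; the thresholds come from this lesson:

```python
def change_ratio(diff_lines: list[str], function_length: int) -> float:
    """Rough fraction of the function touched by a diff."""
    changed = sum(
        1
        for line in diff_lines
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file-name headers
    )
    return changed / function_length

def strategy(ratio: float) -> str:
    """Map a change ratio onto the 30% heuristic."""
    if ratio < 0.30:
        return "fix manually"
    if ratio <= 0.70:
        return "re-prompt with specifics"
    return "start over"

# 3 changed lines in a 15-line function is 20%: under the 30% threshold
print(strategy(change_ratio(["+ fix a", "+ fix b", "- old line"], 15)))  # fix manually
```

Treat the output as a starting point for judgment, not a rule: a one-word change in a critical condition still deserves a careful look.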

In the search_notes Round 2 regression, one word needs to change: and to or. That is far less than 30%. Fix it manually. Open the file, change the word, save, run pytest. Done in ten seconds.

Compare that to writing a re-prompt, waiting for the AI to regenerate, and hoping it does not change something else. For a one-word fix, the manual approach wins.
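Here is what the manual fix looks like applied: the Round 2 function with the single and changed back to or. The Note class below is a minimal stand-in so the sketch runs on its own; your SmartNotes class may have more fields:

```python
from dataclasses import dataclass, field

# Minimal stand-in for the SmartNotes Note class (assumed shape)
@dataclass
class Note:
    title: str
    body: str
    tags: list[str] = field(default_factory=list)

def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        # Manual fix: the AI's Round 2 used `and` here; the spec requires `or`
        if lower_term in note.title.lower() or lower_term in note.body.lower():
            results.append(note)
    return results

notes = [Note("Python Tips", "use uv"), Note("Groceries", "milk, eggs")]
print([n.title for n in search_notes(notes, "python")])  # ['Python Tips']
```

One word changed, all of Round 2's improvements kept.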


Applying the Heuristic: Three Scenarios

Scenario A: One-Line Typo

The AI wrote lower_term in note.titl.lower() (missing the e in title). This is a single character fix. Fix manually. Do not re-prompt for a typo.

Scenario B: Missing Feature

The AI wrote search_notes but never implemented tag searching. You need a new loop that iterates over note.tags and checks for matches. That is roughly 40% of the function body. Re-prompt with specifics: describe the tag-matching behavior, include the failing test output, and let the AI rewrite the relevant section.

Scenario C: Wrong Data Structure

The AI returned a dict[str, Note] instead of list[Note]. The function signature is wrong, the loop is wrong, and every return statement is wrong. More than 70% of the code misses the mark. Start over with a clearer prompt that specifies the return type explicitly.


Beyond "Tests Pass": A Quality Checklist

Tests verify behavior. They do not verify everything. Here is a lightweight checklist to apply after all tests pass, borrowed from the evaluation rubric in Chapter 46 Lesson 4:

| Check | Question | Example Issue |
| --- | --- | --- |
| Type annotations | Are all variables and parameters annotated? | results = [] should be results: list[Note] = [] |
| Docstring | Does the function have a clear docstring? | Missing or generic ("Does stuff") |
| No dead code | Are there commented-out lines or unused variables? | # old_results = [] left behind |
| Naming | Do variable names describe their purpose? | x instead of lower_term |
| Consistency | Does the style match the rest of your codebase? | Mixing single and double quotes |

If any check fails but all tests pass, you have a choice: fix it manually (usually fastest) or mention it in your next re-prompt if you are already iterating.
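To make the checklist concrete, here are two versions of the same hypothetical function. Both pass the same test; only the second passes the checklist (the function and its names are invented for this example):

```python
# Before: passes tests, fails the checklist
# (no annotations, no docstring, meaningless names)
def f(ns, t):
    x = []
    for n in ns:
        if t in n:
            x.append(n)
    return x

# After: identical behavior, professional craft
def filter_containing(names: list[str], term: str) -> list[str]:
    """Return the names that contain the search term."""
    matches: list[str] = []
    for name in names:
        if term in name:
            matches.append(name)
    return matches

print(filter_containing(["alpha", "beta"], "a"))  # ['alpha', 'beta']
```

A diff between the two would touch every line, but the behavior is unchanged; that is why tests alone cannot catch craft issues.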


PRIMM-AI+ Practice: Diff Reading

Predict [AI-FREE]

Look at this diff without running anything. Answer these questions with a confidence score (1-5):

@@ -3,6 +3,8 @@
 def count_tags(notes: list[Note]) -> dict[str, int]:
     """Count how many notes contain each tag."""
     counts: dict[str, int] = {}
+    if not notes:
+        return counts
     for note in notes:
         for tag in note.tags:
-            counts[tag] = counts.get(tag, 0) + 1
+            lower_tag: str = tag.lower()
+            counts[lower_tag] = counts.get(lower_tag, 0) + 1
     return counts

  1. How many lines were added? How many were removed?
  2. What two features were added by this change?
  3. Could this change introduce a regression? If so, which behavior might break?
Check your predictions

Answer 1: Four lines were added (+ prefix), one line was removed (- prefix). Net change: three new lines.

Answer 2: Two features: (a) early return for empty note lists, and (b) case-insensitive tag counting via .lower().

Answer 3: Yes. If existing tests expect tags to preserve their original case in the output dictionary (e.g., {"Python": 2}), the .lower() change will produce {"python": 2} instead. Any test asserting an uppercase key will fail. This is the same class of regression as the and/or issue: a fix for one requirement (case-insensitive counting) can break tests that assumed case-sensitive keys.
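To see Answer 3's regression risk concretely, here is count_tags with the diff applied, using a minimal stand-in Note class (the real SmartNotes class may differ):

```python
from dataclasses import dataclass, field

# Minimal stand-in for the SmartNotes Note class (assumed shape)
@dataclass
class Note:
    title: str
    body: str
    tags: list[str] = field(default_factory=list)

def count_tags(notes: list[Note]) -> dict[str, int]:
    """Count how many notes contain each tag."""
    counts: dict[str, int] = {}
    if not notes:
        return counts
    for note in notes:
        for tag in note.tags:
            lower_tag: str = tag.lower()  # keys are now always lowercase
            counts[lower_tag] = counts.get(lower_tag, 0) + 1
    return counts

notes = [Note("a", "", ["Python"]), Note("b", "", ["python", "git"])]
print(count_tags(notes))  # {'python': 2, 'git': 1}
```

Note the key in the output: any existing test asserting counts["Python"] == 1 now fails, because the key is "python".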

Run

If you have a count_tags function in your SmartNotes project, apply the changes shown in the diff manually. Run uv run pytest. Record which tests pass and which fail.

Investigate

For any failures, determine: is this a regression from the .lower() change, or was the test already failing? Update your tracking table.

Modify

Apply the 30% heuristic to any failures. Is the fix less than 30% of the code? If so, fix manually. If not, write a targeted re-prompt.

Make [Mastery Gate]

Without guidance, read this diff and answer the three questions (lines added/removed, features changed, regression risk):

@@ -1,7 +1,9 @@
 def filter_notes_by_tag(notes: list[Note], tag: str) -> list[Note]:
     """Return notes that contain the given tag."""
+    if not tag:
+        return []
     matches: list[Note] = []
     for note in notes:
-        if tag in note.tags:
+        if tag.lower() in [t.lower() for t in note.tags]:
             matches.append(note)
     return matches

Write your analysis, then check it against a classmate or the AI.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Diff Explanation

Explain this git diff to me line by line. For each + or - line,
tell me what changed and why it might have been changed:

@@ -3,6 +3,8 @@
 def count_tags(notes: list[Note]) -> dict[str, int]:
     counts: dict[str, int] = {}
+    if not notes:
+        return counts
     for note in notes:
         for tag in note.tags:
-            counts[tag] = counts.get(tag, 0) + 1
+            lower_tag: str = tag.lower()
+            counts[lower_tag] = counts.get(lower_tag, 0) + 1
     return counts

Compare the AI's explanation to your own analysis from the PRIMM-AI+ Predict section. Did the AI identify the same regression risk you found?

Prompt 2: Heuristic Application

I have a 15-line Python function where 3 lines need to change
to fix a bug. Should I fix it manually, re-prompt the AI,
or start over? Explain your reasoning.

What you're learning: You are calibrating the 30% heuristic. 3 out of 15 lines is 20%, which falls in the "fix manually" range. The AI should agree, but check whether its reasoning matches the framework from this lesson.


Key Takeaways

  1. Diffs show you exactly what changed. Lines with + were added, lines with - were removed, and lines with no prefix are context. Focus on the + and - lines first.

  2. The @@ header tells you where changes are. You do not need to memorize the format; just know it shows the line numbers and size of each changed section.

  3. The 30% heuristic guides your response. Under 30% changed: fix manually. 30% to 70%: re-prompt with specifics. Over 70%: start over. The heuristic prevents wasting time on re-prompts when a quick edit is faster.

  4. Tests verify behavior; the quality checklist verifies craft. After all tests pass, check for type annotations, docstrings, dead code, naming, and consistency. These are the details that separate working code from professional code.


Looking Ahead

You can now read diffs, track errors across rounds, and decide when to fix manually. Lesson 4 puts everything together: a full three-round capstone on SmartNotes where you run the complete feedback loop from first prompt to all tests green.