Reading Diffs and the Judgment Call

James finishes Round 2 of his search_notes iteration. He knows a regression appeared, but he is not sure what the AI actually changed. The function is ten lines long. He could read both versions side by side, but he has seen professionals use a faster method.

"Run git diff," Emma says. "It shows you exactly which lines changed. Green lines were added, red lines were removed. Everything else stayed the same."

James runs the command. The diff appears: four lines with a + prefix, one line with a - prefix, and a few lines of unchanged context. In five seconds, he spots the and that should be or.

"That would have taken me a minute to find by reading both versions," he says.

"Now imagine a function that is fifty lines long," Emma replies. "Diffs scale. Reading two full versions side by side does not."

If you're new to programming

A diff is a summary of differences between two versions of a file. Instead of showing you the entire file twice, it shows only the lines that changed. Lines with a + prefix were added. Lines with a - prefix were removed. Everything else is context to help you find your place. You do not need to memorize every detail of the format; the + and - prefixes are the most important part.
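You do not need git to experiment with the format. Python's standard-library difflib module produces the same +/- output, so you can generate a diff from any two text versions. A minimal sketch; the before/after snippets here are a made-up example, not code from this lesson:

```python
import difflib

# Hypothetical "before" and "after" versions of a one-line condition change
old = [
    "def is_match(note, term):\n",
    "    return term in note.title or term in note.body\n",
]
new = [
    "def is_match(note, term):\n",
    "    return term in note.title and term in note.body\n",
]

# unified_diff emits the same +/- format that git diff uses
diff = list(difflib.unified_diff(old, new, fromfile="round1.py", tofile="round2.py"))
print("".join(diff))
```

Reading the output, the - line shows the removed condition and the + line shows its replacement, exactly as described above.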

If you have used version control before

You already know git diff. This lesson focuses on reading diffs in the context of AI iteration: comparing Round N to Round N+1, deciding whether the AI's changes are acceptable, and using diff size to make a judgment call about re-prompting vs. manual fixing.


The git diff Format

In Chapter 44 Lesson 7, you set up Git for your SmartNotes project. Now you use it to compare versions. After the AI generates new code, save the file and run:

git diff smartnotes/search.py

Here is what the diff between Round 1 and Round 2 of search_notes looks like:

@@ -1,8 +1,11 @@
 def search_notes(notes: list[Note], term: str) -> list[Note]:
     """Return notes whose title or body contains the search term."""
+    if not term:
+        return list(notes)
     results: list[Note] = []
+    lower_term: str = term.lower()
     for note in notes:
-        if term in note.title or term in note.body:
+        if lower_term in note.title.lower() and lower_term in note.body.lower():
             results.append(note)
     return results

Let us break this down piece by piece.


Reading the Three Parts

1. The Location Header

@@ -1,8 +1,11 @@

This tells you where the changes are. -1,8 means the old version started at line 1 and was 8 lines long. +1,11 means the new version starts at line 1 and is 11 lines long. The function grew by 3 lines.

You do not need to memorize this format. The important takeaway: the @@ line tells you the neighborhood of the change.
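If you ever want to pull those numbers out of a header programmatically, a few lines of Python will do it. This is an illustrative sketch, not part of git or any library, and it ignores the shorthand real headers use when a length is 1:

```python
import re

def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    """Pull (old_start, old_len, new_start, new_len) out of an @@ line."""
    match = re.match(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@", header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = (int(g) for g in match.groups())
    return old_start, old_len, new_start, new_len

# The header from the search_notes diff: old hunk 8 lines, new hunk 11 lines
print(parse_hunk_header("@@ -1,8 +1,11 @@"))  # (1, 8, 1, 11)
```

Subtracting the two lengths (11 - 8) confirms the function grew by 3 lines.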

2. Context Lines (No Prefix)

 def search_notes(notes: list[Note], term: str) -> list[Note]:
     """Return notes whose title or body contains the search term."""

Lines with no + or - prefix are unchanged. They provide context so you can orient yourself in the file.

3. Changed Lines

+    if not term:
+        return list(notes)

Lines with + were added. These two lines handle the empty-term case.

-        if term in note.title or term in note.body:
+        if lower_term in note.title.lower() and lower_term in note.body.lower():

A - line followed by a + line means that line was replaced. The old matching condition was removed and a new one was added. This is where the regression lives: or became and.


What to Look For in a Diff

When reviewing an AI-generated diff, ask three questions:

  1. What was the intent? You asked for case-insensitive matching and empty-term handling. Do the + lines accomplish that?
  2. What else changed? Look for - lines that removed code you did not ask the AI to change. The or to and swap is an unintended change.
  3. How much changed? Count the + and - lines relative to the total function length. This feeds into the 30% heuristic.

The 30% Heuristic

Not every problem is best solved by re-prompting. Sometimes typing the fix yourself is faster. The 30% heuristic helps you decide:

| Scope of Change Needed | Strategy | Why |
| --- | --- | --- |
| Less than 30% of the code needs to change | Fix manually | Faster than writing a re-prompt and waiting for output. You can see the fix; just type it. |
| 30% to 70% of the code needs to change | Re-prompt with specifics | Too much to fix by hand comfortably, but the structure is sound. Give the AI targeted instructions. |
| More than 70% of the code is wrong | Start over | The AI misunderstood the core requirement. A new prompt with better constraints will be faster than patching. |
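The heuristic can be sketched as a tiny decision helper. The function names and the way changed lines are counted are ours, invented for illustration; the thresholds come from this lesson:

```python
def change_ratio(diff_lines: list[str], function_length: int) -> float:
    """Rough fraction of the function touched by a diff."""
    changed = sum(
        1
        for line in diff_lines
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file-name headers
    )
    return changed / function_length

def strategy(ratio: float) -> str:
    """Map a change ratio onto the 30% heuristic."""
    if ratio < 0.30:
        return "fix manually"
    if ratio <= 0.70:
        return "re-prompt with specifics"
    return "start over"

# 3 changed lines in a 15-line function is 20%: under the 30% threshold
print(strategy(change_ratio(["+ fix a", "+ fix b", "- old line"], 15)))  # fix manually
```

Treat the output as a starting point for judgment, not a rule: a one-word change in a critical condition still deserves a careful look.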

In the search_notes Round 2 regression, one word needs to change: and to or. That is far less than 30%. Fix it manually. Open the file, change the word, save, run pytest. Done in ten seconds.

Compare that to writing a re-prompt, waiting for the AI to regenerate, and hoping it does not change something else. For a one-word fix, the manual approach wins.
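Here is what the manual fix looks like applied: the Round 2 function with the single and changed back to or. The Note class below is a minimal stand-in so the sketch runs on its own; your SmartNotes class may have more fields:

```python
from dataclasses import dataclass, field

# Minimal stand-in for the SmartNotes Note class (assumed shape)
@dataclass
class Note:
    title: str
    body: str
    tags: list[str] = field(default_factory=list)

def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        # Manual fix: the AI's Round 2 used `and` here; the spec requires `or`
        if lower_term in note.title.lower() or lower_term in note.body.lower():
            results.append(note)
    return results

notes = [Note("Python Tips", "use uv"), Note("Groceries", "milk, eggs")]
print([n.title for n in search_notes(notes, "python")])  # ['Python Tips']
```

One word changed, all of Round 2's improvements kept.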


Applying the Heuristic: Three Scenarios

Scenario A: One-Line Typo

The AI wrote lower_term in note.titl.lower() (missing the e in title). This is a single character fix. Fix manually. Do not re-prompt for a typo.

Scenario B: Missing Feature

The AI wrote search_notes but never implemented tag searching. You need a new loop that iterates over note.tags and checks for matches. That is roughly 40% of the function body. Re-prompt with specifics: describe the tag-matching behavior, include the failing test output, and let the AI rewrite the relevant section.

Scenario C: Wrong Data Structure

The AI returned a dict[str, Note] instead of list[Note]. The function signature is wrong, the loop is wrong, and every return statement is wrong. More than 70% of the code misses the mark. Start over with a clearer prompt that specifies the return type explicitly.


Beyond "Tests Pass": A Quality Checklist

Tests verify behavior. They do not verify everything. Here is a lightweight checklist to apply after all tests pass, borrowed from the evaluation rubric in Chapter 46 Lesson 4:

| Check | Question | Example Issue |
| --- | --- | --- |
| Type annotations | Are all variables and parameters annotated? | results = [] should be results: list[Note] = [] |
| Docstring | Does the function have a clear docstring? | Missing or generic ("Does stuff") |
| No dead code | Are there commented-out lines or unused variables? | # old_results = [] left behind |
| Naming | Do variable names describe their purpose? | x instead of lower_term |
| Consistency | Does the style match the rest of your codebase? | Mixing single and double quotes |

If any check fails but all tests pass, you have a choice: fix it manually (usually fastest) or mention it in your next re-prompt if you are already iterating.
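To make the checklist concrete, here are two versions of the same hypothetical function. Both pass the same test; only the second passes the checklist (the function and its names are invented for this example):

```python
# Before: passes tests, fails the checklist
# (no annotations, no docstring, meaningless names)
def f(ns, t):
    x = []
    for n in ns:
        if t in n:
            x.append(n)
    return x

# After: identical behavior, professional craft
def filter_containing(names: list[str], term: str) -> list[str]:
    """Return the names that contain the search term."""
    matches: list[str] = []
    for name in names:
        if term in name:
            matches.append(name)
    return matches

print(filter_containing(["alpha", "beta"], "a"))  # ['alpha', 'beta']
```

A diff between the two would touch every line, but the behavior is unchanged; that is why tests alone cannot catch craft issues.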


PRIMM-AI+ Practice: Diff Reading

Predict [AI-FREE]

Look at this diff without running anything. Answer these questions with a confidence score (1-5):

@@ -3,6 +3,8 @@
 def count_tags(notes: list[Note]) -> dict[str, int]:
     """Count how many notes contain each tag."""
     counts: dict[str, int] = {}
+    if not notes:
+        return counts
     for note in notes:
         for tag in note.tags:
-            counts[tag] = counts.get(tag, 0) + 1
+            lower_tag: str = tag.lower()
+            counts[lower_tag] = counts.get(lower_tag, 0) + 1
     return counts

  1. How many lines were added? How many were removed?
  2. What two features were added by this change?
  3. Could this change introduce a regression? If so, which behavior might break?
Check your predictions

Answer 1: Four lines were added (+ prefix), one line was removed (- prefix). Net change: three new lines.

Answer 2: Two features: (a) early return for empty note lists, and (b) case-insensitive tag counting via .lower().

Answer 3: Yes. If existing tests expect tags to preserve their original case in the output dictionary (e.g., {"Python": 2}), the .lower() change will produce {"python": 2} instead. Any test asserting an uppercase key will fail. This is the same class of regression as the and/or issue: a fix for one requirement (case-insensitive counting) can break tests that assumed case-sensitive keys.
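To see Answer 3's regression risk concretely, here is count_tags with the diff applied, using a minimal stand-in Note class (the real SmartNotes class may differ):

```python
from dataclasses import dataclass, field

# Minimal stand-in for the SmartNotes Note class (assumed shape)
@dataclass
class Note:
    title: str
    body: str
    tags: list[str] = field(default_factory=list)

def count_tags(notes: list[Note]) -> dict[str, int]:
    """Count how many notes contain each tag."""
    counts: dict[str, int] = {}
    if not notes:
        return counts
    for note in notes:
        for tag in note.tags:
            lower_tag: str = tag.lower()  # keys are now always lowercase
            counts[lower_tag] = counts.get(lower_tag, 0) + 1
    return counts

notes = [Note("a", "", ["Python"]), Note("b", "", ["python", "git"])]
print(count_tags(notes))  # {'python': 2, 'git': 1}
```

Note the key in the output: any existing test asserting counts["Python"] == 1 now fails, because the key is "python".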

Run

If you have a count_tags function in your SmartNotes project, apply the changes shown in the diff manually. Run uv run pytest. Record which tests pass and which fail.

Investigate

For any failures, determine: is this a regression from the .lower() change, or was the test already failing? Update your tracking table.

Modify

Apply the 30% heuristic to any failures. Is the fix less than 30% of the code? If so, fix manually. If not, write a targeted re-prompt.

Make [Mastery Gate]

Without guidance, read this diff and answer the three questions (lines added/removed, features changed, regression risk):

@@ -1,7 +1,9 @@
 def filter_notes_by_tag(notes: list[Note], tag: str) -> list[Note]:
     """Return notes that contain the given tag."""
+    if not tag:
+        return []
     matches: list[Note] = []
     for note in notes:
-        if tag in note.tags:
+        if tag.lower() in [t.lower() for t in note.tags]:
             matches.append(note)
     return matches

Write your analysis, then check it against a classmate or the AI.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Diff Explanation

Explain this git diff to me line by line. For each + or - line,
tell me what changed and why it might have been changed:

@@ -3,6 +3,8 @@
 def count_tags(notes: list[Note]) -> dict[str, int]:
     counts: dict[str, int] = {}
+    if not notes:
+        return counts
     for note in notes:
         for tag in note.tags:
-            counts[tag] = counts.get(tag, 0) + 1
+            lower_tag: str = tag.lower()
+            counts[lower_tag] = counts.get(lower_tag, 0) + 1
     return counts

Compare the AI's explanation to your own analysis from the PRIMM-AI+ Predict section. Did the AI identify the same regression risk you found?

Prompt 2: Heuristic Application

I have a 15-line Python function where 3 lines need to change
to fix a bug. Should I fix it manually, re-prompt the AI,
or start over? Explain your reasoning.

What you're learning: You are calibrating the 30% heuristic. 3 out of 15 lines is 20%, which falls in the "fix manually" range. The AI should agree, but check whether its reasoning matches the framework from this lesson.


Key Takeaways

  1. Diffs show you exactly what changed. Lines with + were added, lines with - were removed, and lines with no prefix are context. Focus on the + and - lines first.

  2. The @@ header tells you where changes are. You do not need to memorize the format; just know it shows the line numbers and size of each changed section.

  3. The 30% heuristic guides your response. Under 30% changed: fix manually. 30% to 70%: re-prompt with specifics. Over 70%: start over. The heuristic prevents wasting time on re-prompts when a quick edit is faster.

  4. Tests verify behavior; the quality checklist verifies craft. After all tests pass, check for type annotations, docstrings, dead code, naming, and consistency. These are the details that separate working code from professional code.


Looking Ahead

You can now read diffs, track errors across rounds, and decide when to fix manually. Lesson 4 puts everything together: a full three-round capstone on SmartNotes where you run the complete feedback loop from first prompt to all tests green.