
Multi-Round Iteration

James finishes his Level 2 re-prompt from the previous lesson. He runs uv run pytest and watches the output scroll by. The two tests that were failing now pass. He grins.

Then he reads the rest of the output. A test that passed before, test_search_in_body, is now failing.

"It broke something," he says, staring at the screen.

Emma nods. "That happens. The AI changed the matching logic to handle case insensitivity, but along the way it also changed how the title and body checks are combined. One fix, one new bug." She pulls up a chair. "This is why iteration is usually two or three rounds, not one. The skill is not getting it right on the first re-prompt. The skill is tracking what changed and converging."

If you're new to programming

When you fix one problem and accidentally create another, that is called a regression. It is not a sign that something is broken beyond repair. It means the fix was too broad or touched code it should not have. Regressions happen to professional developers every day. The difference between a beginner and a professional is that the professional expects them and has a process for catching them: tests.

If you have experience with code reviews

You already know that patches can introduce regressions. In AI-assisted development, the pattern is the same, but the "developer" making the change is the AI, and your tests are the code review. This lesson formalizes the tracking process so you can iterate efficiently rather than ping-ponging between fixes.


Why Fixes Introduce New Bugs

In Chapter 46 Lesson 4, a note mentioned that fixes can introduce new failures: "Sometimes the AI's fix breaks something that was already working." That chapter deferred the topic because you were learning single-round re-prompting. Now you are ready.

Here is what typically happens. The AI receives your re-prompt asking it to make the search case-insensitive. To do this, it rewrites the matching logic. The original code might have been:

```python
def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    results: list[Note] = []
    for note in notes:
        if term in note.title or term in note.body:
            results.append(note)
    return results
```

The AI's fix for case insensitivity might produce:

```python
def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        if lower_term in note.title.lower() and lower_term in note.body.lower():
            results.append(note)
    return results
```

Spot the bug? The AI changed or to and. Now a note must contain the term in both title AND body to match, instead of either one. The case-insensitivity fix is correct, but the logical operator changed as a side effect.

This is a classic regression: the AI rewrote more than it needed to, and the extra change introduced a new error.
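You can observe the behavioral difference directly. The sketch below assumes a minimal Note dataclass (the lesson's real Note type may have more fields) and compares the buggy and-based matching against the correct or-based matching on the same note:

```python
from dataclasses import dataclass

# Minimal stand-in for the lesson's Note type (assumed shape: title and body).
@dataclass
class Note:
    title: str
    body: str

def search_and(notes: list[Note], term: str) -> list[Note]:
    """Buggy Round 2 logic: requires the term in BOTH title and body."""
    t = term.lower()
    return [n for n in notes if t in n.title.lower() and t in n.body.lower()]

def search_or(notes: list[Note], term: str) -> list[Note]:
    """Correct logic: term in EITHER title or body."""
    t = term.lower()
    return [n for n in notes if t in n.title.lower() or t in n.body.lower()]

note = Note(title="Cooking Tips", body="Use fresh herbs.")
print(len(search_and([note], "herbs")))  # 0 -- the regression drops body-only matches
print(len(search_or([note], "herbs")))   # 1 -- the correct behavior keeps them
```

One character of difference in the condition, two different result sets: that is why a regression like this slips past a casual read of the diff.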


Tracking Errors Across Rounds

The key to multi-round iteration is tracking. Without a record, you lose context about what was passing, what broke, and what you already tried. Here is a simple table format:

| Test Name | Round 1 | Round 2 | Round 3 |
| --- | --- | --- | --- |
| test_search_exact_title_match | PASS | PASS | PASS |
| test_search_case_insensitive | FAIL (Omission) | PASS | PASS |
| test_search_in_body | PASS | FAIL (Logic) | PASS |
| test_search_no_matches | PASS | PASS | PASS |
| test_search_empty_list | PASS | PASS | PASS |
| test_search_empty_term | FAIL (Misinterpretation) | PASS | PASS |

Each cell records PASS or FAIL plus the Error Taxonomy category from Chapter 43. The table shows you the trajectory: Round 1 had two failures. Round 2 fixed both but introduced a regression on test_search_in_body. Round 3 fixed the regression without breaking anything else.
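If you keep each round's results in code rather than on paper, spotting a regression is a one-line comparison. This is a hedged sketch, not a required tool; it assumes you record each round as a dict mapping test name to pass/fail:

```python
def find_regressions(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return the names of tests that passed in the previous round but fail now."""
    return [name for name, passed in previous.items()
            if passed and not current.get(name, False)]

round1 = {"test_search_in_body": True, "test_search_case_insensitive": False}
round2 = {"test_search_in_body": False, "test_search_case_insensitive": True}
print(find_regressions(round1, round2))  # ['test_search_in_body']
```

The same comparison run in the other direction (current vs. previous) lists the tests your re-prompt actually fixed.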


The Three-Round Walkthrough

Let us trace the full search_notes iteration.

Round 1: The Initial Prompt

You wrote a Level 1 prompt (from Lesson 1). The AI produced a case-sensitive search that did not handle empty terms. Result: 4 of 6 tests pass.

Failing tests:

  • test_search_case_insensitive: Omission (the AI never lowercased anything)
  • test_search_empty_term: Misinterpretation (the AI returned [] for empty terms)

Round 2: The Fix That Breaks Something

Your re-prompt from Lesson 1 asked for case-insensitive matching and empty-term handling. The AI rewrote the function. It added .lower() calls and an early return for empty terms. But in the rewrite, it changed or to and in the matching condition.

New result: 5 of 6 pass.

Failing test:

  • test_search_in_body: Logic error (the and vs or change means notes that match only in the body are excluded)

Previously failing tests: Both now pass.

This is progress. You went from 4/6 to 5/6. But the new failure is a regression: something that worked before broke.

Round 3: The Targeted Fix

Your re-prompt for Round 3 should be surgical:

The search_notes function has a logic error on the matching condition.
The current code uses `and` to combine the title and body checks,
but it should use `or`. A note should match if the term appears
in the title OR the body, not both.

Failing test:
FAILED test_search_in_body - assert 0 == 1

The note has title="Cooking Tips" and body="Use fresh herbs."
Searching for "herbs" should return this note because "herbs"
appears in the body. The current code requires it to appear
in both title and body.

Change `and` to `or` on the matching line. Do not change
anything else.

Notice the last line: "Do not change anything else." When you identify a one-word fix, telling the AI to limit its changes reduces the risk of another regression.

Result: 6 of 6 pass. Iteration complete.


SmartNotes search_notes Across Three Rounds

Here is the function at each round, so you can see how it evolved:

Round 1 output (case-sensitive, no empty-term handling):

```python
def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    results: list[Note] = []
    for note in notes:
        if term in note.title or term in note.body:
            results.append(note)
    return results
```

Round 2 output (case-insensitive, but and regression):

```python
def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        if lower_term in note.title.lower() and lower_term in note.body.lower():
            results.append(note)
    return results
```

Round 3 output (all tests pass):

```python
def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Return notes whose title or body contains the search term."""
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        if lower_term in note.title.lower() or lower_term in note.body.lower():
            results.append(note)
    return results
```

The only difference between Round 2 and Round 3 is one word: and became or. But finding that one word required reading the code, understanding the logic, and writing a precise re-prompt.
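To reproduce the Round 2 failure yourself, you need the test that catches it. The lesson does not show the actual test file, so the sketch below is an assumption about what test_search_in_body might look like; the Note dataclass and the Round 3 function are included so it runs on its own:

```python
from dataclasses import dataclass

# Minimal Note stand-in (assumed shape) so this sketch is self-contained.
@dataclass
class Note:
    title: str
    body: str

def search_notes(notes: list[Note], term: str) -> list[Note]:
    """Round 3 implementation, copied from above."""
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        if lower_term in note.title.lower() or lower_term in note.body.lower():
            results.append(note)
    return results

def test_search_in_body() -> None:
    # Passes against Round 3 (or); fails against Round 2 (and),
    # because "herbs" appears only in the body.
    notes = [Note(title="Cooking Tips", body="Use fresh herbs.")]
    assert len(search_notes(notes, "herbs")) == 1
```

Swap in the Round 2 body and the assertion fails with the `assert 0 == 1` message quoted in the Round 3 re-prompt.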


When to Stop Iterating

Three rounds is typical for a well-specified function. If you are on round 4 or 5 and still failing tests, one of these is happening:

  • Your tests conflict. Two tests expect contradictory behavior. Review your test logic.
  • Your prompt is missing critical context. The AI cannot converge because it does not understand a constraint you have not stated.
  • The function is too complex for one prompt. Break it into smaller functions and iterate on each one separately.

If you are not making progress (the number of passing tests is not increasing), stop re-prompting. Read the code, understand the problem, and either fix it manually or start over with a better prompt. Lesson 3 covers this judgment call.
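The "not making progress" signal is mechanical enough to express in code. A rough sketch of the heuristic, under the simplifying assumption that progress means a strictly increasing pass count:

```python
def is_converging(pass_counts: list[int]) -> bool:
    """True if each round passed strictly more tests than the round before."""
    return all(later > earlier
               for earlier, later in zip(pass_counts, pass_counts[1:]))

print(is_converging([4, 5, 6]))  # True: steady progress toward 6/6
print(is_converging([4, 5, 5]))  # False: stalled -- read the code yourself
```

This is deliberately strict: a flat round is treated as a stall. In practice you might tolerate one flat round before switching from re-prompting to manual debugging.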


PRIMM-AI+ Practice: Regression Tracking

Predict [AI-FREE]

Press Shift+Tab to enter Plan Mode before predicting.

Look at the Round 1 output of search_notes above. Suppose you re-prompt the AI to add case-insensitive matching. Before running anything, predict:

  1. Which tests will now pass that were failing before? (Confidence: 1-5)
  2. Will any currently passing tests break? If so, which ones? (Confidence: 1-5)
  3. How many total rounds will you need to get to 6/6? (Confidence: 1-5)

Write your predictions down.

Check your predictions

Prediction 1: test_search_case_insensitive should now pass, since you explicitly asked for case-insensitive matching.

Prediction 2: There is a real risk of regression. The AI must rewrite the matching line to add .lower() calls. If it rewrites too aggressively (changing the logical operator, altering the loop structure, or modifying the return type), a previously passing test could break. The most likely regressions are on test_search_in_body or test_search_exact_title_match, since those depend on the matching condition.

Prediction 3: Two or three rounds is typical. One round for the initial generation, one for the case-sensitivity fix, and possibly one more if a regression appears.

Run

Press Shift+Tab to exit Plan Mode.

Using the Round 1 output in your own SmartNotes project, write a re-prompt asking for case-insensitive matching and empty-term handling. Run uv run pytest. Record the results in a tracking table like the one shown earlier.

Investigate

In Claude Code, type /investigate and ask about any new failures. For tests that passed in Round 1 but fail in Round 2, classify them:

  • Regression from rewrite: the AI changed code it should not have touched
  • Incomplete fix: the AI fixed part of the problem but missed an edge case
  • New misinterpretation: the AI understood your re-prompt differently than you intended

Write one sentence explaining which category applies.

Modify

Write a Round 3 re-prompt that targets only the regression. Include the instruction "Do not change anything else" if the fix is a single-line change. Run pytest. Did you converge to 6/6?

Make [Mastery Gate]

Without guidance, run a complete multi-round iteration on this function:

```python
def count_notes_by_tag(notes: list[Note]) -> dict[str, int]:
    """Count how many notes contain each tag. Case-insensitive tag matching."""
    ...
```

Write four tests first (at least one for case-insensitive tags, one for duplicate tags across notes, one for an empty list, one for a note with no tags). Then prompt, iterate, and track your rounds in a table. You should converge in two or three rounds.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Diagnose a Regression

I asked you to make search_notes case-insensitive, and now
test_search_in_body is failing. Here is the current code:

def search_notes(notes: list[Note], term: str) -> list[Note]:
    if not term:
        return list(notes)
    results: list[Note] = []
    lower_term: str = term.lower()
    for note in notes:
        if lower_term in note.title.lower() and lower_term in note.body.lower():
            results.append(note)
    return results

The test expects that searching for "herbs" returns a note with
body="Use fresh herbs." but the function returns an empty list.
What changed that caused this regression?

Read the AI's diagnosis. Did it identify the and vs or issue? Compare its explanation to the one in this lesson.

Prompt 2: Evaluate a Re-prompt

Which of these two re-prompts is more likely to fix the regression
without breaking other tests?

Re-prompt A: "Fix the search function, test_search_in_body is failing."

Re-prompt B: "On line 7 of search_notes, change 'and' to 'or' in the
matching condition. The note should match if the term appears in the
title OR the body, not both. Do not change any other lines."

Explain your reasoning.

What you're learning: You are building judgment about re-prompt specificity. A surgical re-prompt that names the exact line and exact change is safer than a vague one that gives the AI freedom to rewrite more than necessary.

Prompt 3: Track Test Results Across Iterations

I am iterating on a function and want to track my progress.
Here are my test results across two rounds:

Round 1: test_a PASS, test_b FAIL, test_c PASS, test_d FAIL
Round 2: test_a PASS, test_b PASS, test_c FAIL, test_d PASS

test_c passed in Round 1 but failed in Round 2. Is this a
regression? What is the most likely cause, and what should
my Round 3 re-prompt say to fix test_c without breaking
test_b and test_d?

What you're learning: You are practicing the skill of reading a tracking table and diagnosing regressions, which is the core competency of multi-round iteration.



James tapped his tracking table on the screen. "This is just a discrepancy log. At the distribution center, every inbound shipment got checked against the PO. If something was wrong, you logged it, fixed it, and checked again. You never just said 'fix the shipment' and hoped for the best."

"Surgical re-prompts," Emma said. "Name the exact line, name the exact change. Same as writing a specific correction on the discrepancy report instead of 'please fix.'"

"And if you're past round three and still not converging, the problem isn't the warehouse. It's the purchase order." James paused. "The tests or the prompt need rethinking."

Emma was quiet for a moment. "I'm honestly not sure where the cutoff is. Two rounds feels too generous sometimes, four rounds feels like too many. I keep landing on three as the 'something is wrong' signal, but I don't have a principled reason for it."

"Three bad shipments from the same vendor and you call a meeting," James said. "It's not a rule, it's a pattern."

"Fair enough." Emma pulled up a new screen. "So you can track regressions and write surgical re-prompts. But right now you're reading the entire function to figure out what changed. There's a faster way: diffs show you exactly which lines moved, and there's a heuristic for when to stop re-prompting and just fix the code yourself."