Skip to main content

Generate, Verify, Debug

If you're new to programming

This is the first time you run the full TDG cycle without step-by-step instructions. You have the stub (L1) and tests (L2). Now you prompt AI, run the verification stack (ruffpyrightpytest), and handle whatever happens. If tests fail, use the debugging loop from Chapter 56. If you need to re-prompt, use the iteration approach from Chapter 53.

If you've coded before

The verification stack runs in order: ruff check (fast lint), pyright (type check), pytest -v (behavioral verification). If AI modifies your tests despite the prompt saying "do not modify the tests," use git diff to detect it and restore from your commit.

James has the stub file open in one pane and the test file in the other. Fourteen tests, all RED. The function signature is complete, the docstring captures five design decisions, and uv run pyright passes cleanly on both files.

"You have written the purchase order and the acceptance criteria," Emma says. "Now send it to the supplier."

James opens Claude Code and starts typing a prompt. Then he stops. "Should I just say 'implement this function'?"

"Almost. But first, protect your specification. Commit both files to git before you prompt. If the AI modifies your tests, you can revert. The tests are the contract. You do not let the supplier rewrite the acceptance criteria."

Reading New Code? Use PRIMM-AI+

When you encounter new Python syntax in this chapter, use the PRIMM-AI+ method from Chapter 42: Predict what the code does before running it [AI-FREE]. Rate your confidence (1-5). Run it to check your prediction. Investigate any surprises.


Protect the Specification

You are doing exactly what James is doing. You have a stub and a test suite. Before prompting AI, commit both files to git:

git add smartnotes_search.py test_smartnotes_search.py
git commit -m "spec: search_notes stub and test suite (all RED)"

Output:

[main abc1234] spec: search_notes stub and test suite (all RED)
2 files changed, 95 insertions(+)

This commit does two things. First, it creates a restore point. If the AI modifies your tests (it sometimes does, despite being told not to), you run git diff test_smartnotes_search.py and see exactly what changed. Second, it separates your work from the AI's work. The commit history tells a story: "I specified, then AI implemented."


The Prompt

Open Claude Code and write this prompt:

Implement the search_notes function in smartnotes_search.py
so that all tests in test_smartnotes_search.py pass.

Do not modify the tests.
Do not modify the function signature or docstring.

Four sentences. The first says what to do. The second and third protect your specification. The fourth protects your design decisions. This is the prompt pattern for TDG: implement what I specified, do not change my specification.

Send it.


The Verification Stack

The AI writes code. Now you verify. Run three tools in sequence, each catching different problems:

ToolWhat it catchesCommand
ruffStyle violations, unused imports, syntax issuesuv run ruff check smartnotes_search.py
pyrightType errors, wrong return types, missing annotationsuv run pyright smartnotes_search.py
pytestBehavioral failures, wrong outputs, broken logicuv run pytest test_smartnotes_search.py -v

Run them in order. Each one gates the next: there is no point running type checks on code with syntax errors, and no point running tests on code with type errors.

uv run ruff check smartnotes_search.py

Output:

All checks passed!
uv run pyright smartnotes_search.py

Output:

0 errors, 0 warnings, 0 informations
uv run pytest test_smartnotes_search.py -v

Here is where it gets interesting. On James's first attempt, the output looked like this:

Output:

test_smartnotes_search.py::test_keyword_in_title_matches PASSED
test_smartnotes_search.py::test_keyword_in_body_matches PASSED
test_smartnotes_search.py::test_case_insensitive_matching PASSED
test_smartnotes_search.py::test_tag_filter_narrows_results PASSED
test_smartnotes_search.py::test_keyword_and_tag_combined PASSED
test_smartnotes_search.py::test_title_matches_ranked_before_body FAILED
test_smartnotes_search.py::test_empty_keyword_returns_all PASSED
test_smartnotes_search.py::test_empty_notes_list_returns_empty PASSED
test_smartnotes_search.py::test_no_matches_returns_empty_list PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[Python-Python Tips] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[debugging-Debugging Guide] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[migration-Meeting Notes] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[comprehensions-Python Tips] PASSED

13 passed, 1 failed in 0.04s

Thirteen passed. One failed. The AI got the basic matching right, the case insensitivity right, the edge cases right. But it missed the ordering rule: title matches should come before body-only matches.


When Tests Fail: Diagnose First

Do not re-prompt immediately. Read the failure message first:

FAILED test_smartnotes_search.py::test_title_matches_ranked_before_body
AssertionError: assert 2 < 1

The assertion says the last title-match index (2) was NOT less than the first body-only-match index (1). Translation: a body-only match appeared before a title match in the results. The AI's implementation found all matching notes but returned them in the wrong order.

This is a diagnosis, not a guess. The test told you exactly what went wrong. The debugging chapter (Chapter 56) taught you to read the assertion message before looking at the code. Apply that here.


When to Re-Prompt vs. Fix Manually

You have a choice: fix the code yourself or ask the AI to fix it. Use the 30% heuristic from Chapter 53:

SituationAction
You understand the fix and it is fewer than 5 linesFix it yourself
The fix requires restructuring logic you did not writeRe-prompt the AI
More than 30% of tests failedRe-prompt -- the implementation has a fundamental flaw

One test failed out of fourteen. The fix is a sorting step. But James did not write this implementation, and the sorting logic involves separating title matches from body matches. Re-prompting is the right call:

One test fails: test_title_matches_ranked_before_body.
The results are not sorted correctly. Title matches
must appear before body-only matches.

Fix the ordering in search_notes without modifying the tests.

This prompt tells the AI which test fails, what the expected behavior is, and what constraint still applies (do not modify the tests). Run the verification stack again:

uv run ruff check smartnotes_search.py && uv run pyright smartnotes_search.py && uv run pytest test_smartnotes_search.py -v

Output:

All checks passed!
0 errors, 0 warnings, 0 informations

test_smartnotes_search.py::test_keyword_in_title_matches PASSED
test_smartnotes_search.py::test_keyword_in_body_matches PASSED
test_smartnotes_search.py::test_case_insensitive_matching PASSED
test_smartnotes_search.py::test_tag_filter_narrows_results PASSED
test_smartnotes_search.py::test_keyword_and_tag_combined PASSED
test_smartnotes_search.py::test_title_matches_ranked_before_body PASSED
test_smartnotes_search.py::test_empty_keyword_returns_all PASSED
test_smartnotes_search.py::test_empty_notes_list_returns_empty PASSED
test_smartnotes_search.py::test_no_matches_returns_empty_list PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[Python-Python Tips] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[debugging-Debugging Guide] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[migration-Meeting Notes] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[comprehensions-Python Tips] PASSED

14 passed in 0.04s

GREEN. All fourteen tests pass. The implementation matches the specification.


Read the Code You Did Not Write

GREEN does not mean "done." It means the implementation satisfies the tests you wrote. Before you trust it, read it. Use the PRIMM approach from Chapter 42: predict what the code does for an input that is NOT in your test suite.

Open smartnotes_search.py and read the generated implementation. Then pick an input not covered by any test:

Your prediction task: What does search_notes(sample_notes, keyword="the") return? The word "the" appears in the body of "Recipe Ideas" ("Try the new pasta recipe from the cookbook"). Does it appear anywhere else?

Check your prediction by adding a quick test or running it in a Python shell:

from smartnotes_search import Note, search_notes

notes = [
Note(title="Python Tips", body="Learn about list comprehensions.", word_count=42, tags=["python"]),
Note(title="Recipe Ideas", body="Try the new pasta recipe from the cookbook.", word_count=35, tags=["cooking"]),
]
result = search_notes(notes, keyword="the")
print([n.title for n in result])

Output:

['Recipe Ideas']

The implementation found "the" in the body of "Recipe Ideas" and returned it. This matches the specification: keyword matching is case-insensitive and searches both title and body. If the function had returned an empty list, you would have found a bug that none of your nine tests caught.


The Trust Gap

A green bar means the code passes YOUR tests. It does not mean the code is correct for all inputs. This gap between "passes tests" and "correct for all inputs" is the trust gap.

Three things to check after GREEN:

CheckWhat to look forWhy it matters
Hardcoded valuesDoes the code check for specific strings like "Python" instead of using the keyword parameter?A hardcoded implementation passes existing tests but fails on any new input
Missing algorithmDoes the code use in for matching or something more complex?The docstring says "case-insensitive," so the code should use .lower() or casefold()
Untested pathsWhat happens with special characters in the keyword? Unicode?Your tests used ASCII words; the function might break on "cafe\u0301"

Skim the generated code with these questions in mind. If you spot a hardcoded value, write a test that exposes it. If you spot a missing .lower(), that is a bug your tests did not catch. Add the test, watch it fail, then fix it.


Document the Cycle

The TDG cycle is not just about reaching GREEN. It is about building a habit you can repeat. Write a brief log of what happened:

## TDG Cycle Log: search_notes

Prompt 1: "Implement search_notes so all tests pass. Do not modify tests."
Result: 13/14 passed. Ordering test failed.
Diagnosis: Body-only matches appeared before title matches.

Prompt 2: "Fix ordering. Title matches must appear before body-only matches."
Result: 14/14 passed. GREEN.

Post-GREEN check: Predicted output for keyword="the". Correct.
Reviewed code for hardcoded values: none found.
Trust gap: no test for special characters. Added to backlog.

This log takes two minutes to write. It captures what you prompted, what failed, how you diagnosed it, and what you still need to check. When you repeat this cycle on a different function, you will get faster because you have a record of what worked.


PRIMM-AI+ Practice: Predict the First Failure

Predict [AI-FREE]

Press Shift+Tab to enter Plan Mode.

You have the search_notes stub and the nine tests from Lesson 2. Before prompting AI, predict:

  1. Will AI get a correct implementation on the first try?
  2. Which test is most likely to fail? Why?
  3. What kind of error will the failure message show?

Write your predictions on paper or in Plan Mode. Rate your confidence from 1 to 5.

Consider: matching and filtering are straightforward for an AI model. Ordering (title matches before body-only matches) requires a two-pass or sorting strategy. Edge cases like empty keyword returning all notes require explicit handling. Which of these is the AI most likely to miss?

Run

Press Shift+Tab to exit Plan Mode.

Prompt Claude Code with the implementation request. Run the verification stack. Compare the results to your predictions.

If the AI passed all tests on the first try, check whether your predictions were wrong or whether this particular AI session happened to produce a clean implementation. Either outcome teaches you something: prediction calibration improves with practice.

Investigate

Pick the test that failed (or the one you predicted would fail). In Claude Code, ask:

/investigate @smartnotes_search.py
This test failed: test_title_matches_ranked_before_body.
Show me the part of search_notes that handles ordering.
Explain why it produced the wrong order.

Compare the AI's explanation to your own diagnosis from reading the assertion message. Did the AI identify the same root cause you did?

Modify

Add a 10th test to the suite. Test this edge case: a keyword that appears in both the title AND the body of the same note. Should that note appear once or twice in the results?

def test_keyword_in_title_and_body_appears_once(sample_notes: list[Note]) -> None:
# "Python" appears in title "Python Tips" AND body "Learn about..."
# for the Debugging Guide, "Python" is in the title and "Python" in body
results = search_notes(sample_notes, keyword="Python")
titles = [note.title for note in results]
# Each note should appear at most once
assert len(titles) == len(set(titles))

Predict: does the existing implementation pass this test without changes? Run it. If it fails, you found a bug the original nine tests missed.

Make [Mastery Gate]

Document your complete TDG cycle for search_notes in a file called tdg_cycle_log.md:

  • What you prompted (exact text)
  • What the AI generated (summary, not full code)
  • Which tests failed and why
  • How you re-prompted (if needed)
  • Your post-GREEN review: any hardcoded values? Any missing edge cases?
  • The 10th test you added and whether it passed

This documentation is the mastery gate. The cycle log proves you drove the process, not the AI.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: What Did You Learn From the Failure?

After completing the TDG cycle (whether the AI passed on the first try or required re-prompting), describe your experience to Claude Code:

I ran the TDG cycle on search_notes. Here is what happened:
[describe: which tests passed, which failed, how you fixed it]

What pattern should I watch for next time I run this cycle
on a different function?

Read the AI's response. It may identify a pattern you had not noticed, such as "ordering tests tend to fail first because sorting logic is harder to infer from a docstring than filtering logic."

What you're learning: Reflection turns experience into transferable knowledge. The AI acts as a sounding board that helps you extract a general lesson from a specific experience. You are building a personal playbook for the TDG cycle.

Prompt 2: Predict for an Untested Input

Ask Claude Code to predict the result for an input not in your test suite:

Here is my search_notes implementation.

[paste the generated code]

What does search_notes(sample_notes, keyword=" ") return?
(That is a keyword of three spaces, not empty string.)

Predict the result before I run it.

Run the function with that input and compare to the AI's prediction. If they disagree, one of them is wrong. Figure out which.

What you're learning: You are practicing the trust gap. The AI's prediction is a hypothesis, not a fact. Testing it against actual execution builds your instinct for when to trust generated code and when to verify further.

Prompt 3: Is This Implementation Hardcoded?

Paste the generated implementation and ask:

Review this implementation of search_notes. Is it using a
real algorithm (case-insensitive substring matching, sorting
by match location), or is it hardcoded to pass the specific
test values? How can I tell?

Read the AI's analysis. A real algorithm uses the keyword parameter in comparisons (e.g., keyword.lower() in note.title.lower()). A hardcoded implementation would check for specific strings like "Python" or "pasta".

What you're learning: You are building the skill of code review for AI-generated code. The question is not "does it pass tests?" but "will it work for inputs I have not tested?" This is the core skill of working with AI-generated implementations: trust, but verify.


James leans back and looks at the terminal. Fourteen green dots.

"In the warehouse," he says, "when a shipment arrived, we did not just count the boxes. We opened them. Checked the part numbers against the purchase order. Weighed the critical items. The green bar is the box count. Reading the code and testing untested inputs, that is opening the boxes."

Emma pauses. "Your 'inspect the delivery' framing is cleaner than my usual explanation. I usually say 'review the code' but that sounds passive. 'Inspect' implies you are looking for specific defects. I am going to use that."

She pulls up a blank file. "You drove the cycle once with guidance nearby. Lesson 4 is the capstone: a fresh SmartNotes feature, start to finish, no guidance at all. Just you, the loop, and the tests."