Generate, Verify, Debug
This is the first time you run the full TDG cycle without step-by-step instructions. You have the stub (L1) and tests (L2). Now you prompt AI, run the verification stack (ruff → pyright → pytest), and handle whatever happens. If tests fail, use the debugging loop from Chapter 56. If you need to re-prompt, use the iteration approach from Chapter 53.
The verification stack runs in order: ruff check (fast lint), pyright (type check), pytest -v (behavioral verification). If AI modifies your tests despite the prompt saying "do not modify the tests," use git diff to detect it and restore from your commit.
James has the stub file open in one pane and the test file in the other. Fourteen tests, all RED. The function signature is complete, the docstring captures five design decisions, and uv run pyright passes cleanly on both files.
"You have written the purchase order and the acceptance criteria," Emma says. "Now send it to the supplier."
James opens Claude Code and starts typing a prompt. Then he stops. "Should I just say 'implement this function'?"
"Almost. But first, protect your specification. Commit both files to git before you prompt. If the AI modifies your tests, you can revert. The tests are the contract. You do not let the supplier rewrite the acceptance criteria."
When you encounter new Python syntax in this chapter, use the PRIMM-AI+ method from Chapter 42: Predict what the code does before running it [AI-FREE]. Rate your confidence (1-5). Run it to check your prediction. Investigate any surprises.
Protect the Specification
You are doing exactly what James is doing. You have a stub and a test suite. Before prompting AI, commit both files to git:
git add smartnotes_search.py test_smartnotes_search.py
git commit -m "spec: search_notes stub and test suite (all RED)"
Output:
[main abc1234] spec: search_notes stub and test suite (all RED)
2 files changed, 95 insertions(+)
This commit does two things. First, it creates a restore point. If the AI modifies your tests (it sometimes does, despite being told not to), you run git diff test_smartnotes_search.py and see exactly what changed. Second, it separates your work from the AI's work. The commit history tells a story: "I specified, then AI implemented."
The Prompt
Open Claude Code and write this prompt:
Implement the search_notes function in smartnotes_search.py
so that all tests in test_smartnotes_search.py pass.
Do not modify the tests.
Do not modify the function signature or docstring.
Four sentences. The first says what to do. The second and third protect your specification. The fourth protects your design decisions. This is the prompt pattern for TDG: implement what I specified, do not change my specification.
Send it.
The Verification Stack
The AI writes code. Now you verify. Run three tools in sequence, each catching different problems:
| Tool | What it catches | Command |
|---|---|---|
ruff | Style violations, unused imports, syntax issues | uv run ruff check smartnotes_search.py |
pyright | Type errors, wrong return types, missing annotations | uv run pyright smartnotes_search.py |
pytest | Behavioral failures, wrong outputs, broken logic | uv run pytest test_smartnotes_search.py -v |
Run them in order. Each one gates the next: there is no point running type checks on code with syntax errors, and no point running tests on code with type errors.
uv run ruff check smartnotes_search.py
Output:
All checks passed!
uv run pyright smartnotes_search.py
Output:
0 errors, 0 warnings, 0 informations
uv run pytest test_smartnotes_search.py -v
Here is where it gets interesting. On James's first attempt, the output looked like this:
Output:
test_smartnotes_search.py::test_keyword_in_title_matches PASSED
test_smartnotes_search.py::test_keyword_in_body_matches PASSED
test_smartnotes_search.py::test_case_insensitive_matching PASSED
test_smartnotes_search.py::test_tag_filter_narrows_results PASSED
test_smartnotes_search.py::test_keyword_and_tag_combined PASSED
test_smartnotes_search.py::test_title_matches_ranked_before_body FAILED
test_smartnotes_search.py::test_empty_keyword_returns_all PASSED
test_smartnotes_search.py::test_empty_notes_list_returns_empty PASSED
test_smartnotes_search.py::test_no_matches_returns_empty_list PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[Python-Python Tips] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[debugging-Debugging Guide] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[migration-Meeting Notes] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[comprehensions-Python Tips] PASSED
13 passed, 1 failed in 0.04s
Thirteen passed. One failed. The AI got the basic matching right, the case insensitivity right, the edge cases right. But it missed the ordering rule: title matches should come before body-only matches.
When Tests Fail: Diagnose First
Do not re-prompt immediately. Read the failure message first:
FAILED test_smartnotes_search.py::test_title_matches_ranked_before_body
AssertionError: assert 2 < 1
The assertion says the last title-match index (2) was NOT less than the first body-only-match index (1). Translation: a body-only match appeared before a title match in the results. The AI's implementation found all matching notes but returned them in the wrong order.
This is a diagnosis, not a guess. The test told you exactly what went wrong. The debugging chapter (Chapter 56) taught you to read the assertion message before looking at the code. Apply that here.
When to Re-Prompt vs. Fix Manually
You have a choice: fix the code yourself or ask the AI to fix it. Use the 30% heuristic from Chapter 53:
| Situation | Action |
|---|---|
| You understand the fix and it is fewer than 5 lines | Fix it yourself |
| The fix requires restructuring logic you did not write | Re-prompt the AI |
| More than 30% of tests failed | Re-prompt -- the implementation has a fundamental flaw |
One test failed out of fourteen. The fix is a sorting step. But James did not write this implementation, and the sorting logic involves separating title matches from body matches. Re-prompting is the right call:
One test fails: test_title_matches_ranked_before_body.
The results are not sorted correctly. Title matches
must appear before body-only matches.
Fix the ordering in search_notes without modifying the tests.
This prompt tells the AI which test fails, what the expected behavior is, and what constraint still applies (do not modify the tests). Run the verification stack again:
uv run ruff check smartnotes_search.py && uv run pyright smartnotes_search.py && uv run pytest test_smartnotes_search.py -v
Output:
All checks passed!
0 errors, 0 warnings, 0 informations
test_smartnotes_search.py::test_keyword_in_title_matches PASSED
test_smartnotes_search.py::test_keyword_in_body_matches PASSED
test_smartnotes_search.py::test_case_insensitive_matching PASSED
test_smartnotes_search.py::test_tag_filter_narrows_results PASSED
test_smartnotes_search.py::test_keyword_and_tag_combined PASSED
test_smartnotes_search.py::test_title_matches_ranked_before_body PASSED
test_smartnotes_search.py::test_empty_keyword_returns_all PASSED
test_smartnotes_search.py::test_empty_notes_list_returns_empty PASSED
test_smartnotes_search.py::test_no_matches_returns_empty_list PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[Python-Python Tips] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[debugging-Debugging Guide] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[migration-Meeting Notes] PASSED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[comprehensions-Python Tips] PASSED
14 passed in 0.04s
GREEN. All fourteen tests pass. The implementation matches the specification.
Read the Code You Did Not Write
GREEN does not mean "done." It means the implementation satisfies the tests you wrote. Before you trust it, read it. Use the PRIMM approach from Chapter 42: predict what the code does for an input that is NOT in your test suite.
Open smartnotes_search.py and read the generated implementation. Then pick an input not covered by any test:
Your prediction task: What does search_notes(sample_notes, keyword="the") return? The word "the" appears in the body of "Recipe Ideas" ("Try the new pasta recipe from the cookbook"). Does it appear anywhere else?
Check your prediction by adding a quick test or running it in a Python shell:
from smartnotes_search import Note, search_notes
notes = [
Note(title="Python Tips", body="Learn about list comprehensions.", word_count=42, tags=["python"]),
Note(title="Recipe Ideas", body="Try the new pasta recipe from the cookbook.", word_count=35, tags=["cooking"]),
]
result = search_notes(notes, keyword="the")
print([n.title for n in result])
Output:
['Recipe Ideas']
The implementation found "the" in the body of "Recipe Ideas" and returned it. This matches the specification: keyword matching is case-insensitive and searches both title and body. If the function had returned an empty list, you would have found a bug that none of your nine tests caught.
The Trust Gap
A green bar means the code passes YOUR tests. It does not mean the code is correct for all inputs. This gap between "passes tests" and "correct for all inputs" is the trust gap.
Three things to check after GREEN:
| Check | What to look for | Why it matters |
|---|---|---|
| Hardcoded values | Does the code check for specific strings like "Python" instead of using the keyword parameter? | A hardcoded implementation passes existing tests but fails on any new input |
| Missing algorithm | Does the code use in for matching or something more complex? | The docstring says "case-insensitive," so the code should use .lower() or casefold() |
| Untested paths | What happens with special characters in the keyword? Unicode? | Your tests used ASCII words; the function might break on "cafe\u0301" |
Skim the generated code with these questions in mind. If you spot a hardcoded value, write a test that exposes it. If you spot a missing .lower(), that is a bug your tests did not catch. Add the test, watch it fail, then fix it.
Document the Cycle
The TDG cycle is not just about reaching GREEN. It is about building a habit you can repeat. Write a brief log of what happened:
## TDG Cycle Log: search_notes
Prompt 1: "Implement search_notes so all tests pass. Do not modify tests."
Result: 13/14 passed. Ordering test failed.
Diagnosis: Body-only matches appeared before title matches.
Prompt 2: "Fix ordering. Title matches must appear before body-only matches."
Result: 14/14 passed. GREEN.
Post-GREEN check: Predicted output for keyword="the". Correct.
Reviewed code for hardcoded values: none found.
Trust gap: no test for special characters. Added to backlog.
This log takes two minutes to write. It captures what you prompted, what failed, how you diagnosed it, and what you still need to check. When you repeat this cycle on a different function, you will get faster because you have a record of what worked.
PRIMM-AI+ Practice: Predict the First Failure
Predict [AI-FREE]
Press Shift+Tab to enter Plan Mode.
You have the search_notes stub and the nine tests from Lesson 2. Before prompting AI, predict:
- Will AI get a correct implementation on the first try?
- Which test is most likely to fail? Why?
- What kind of error will the failure message show?
Write your predictions on paper or in Plan Mode. Rate your confidence from 1 to 5.
Consider: matching and filtering are straightforward for an AI model. Ordering (title matches before body-only matches) requires a two-pass or sorting strategy. Edge cases like empty keyword returning all notes require explicit handling. Which of these is the AI most likely to miss?
Run
Press Shift+Tab to exit Plan Mode.
Prompt Claude Code with the implementation request. Run the verification stack. Compare the results to your predictions.
If the AI passed all tests on the first try, check whether your predictions were wrong or whether this particular AI session happened to produce a clean implementation. Either outcome teaches you something: prediction calibration improves with practice.
Investigate
Pick the test that failed (or the one you predicted would fail). In Claude Code, ask:
/investigate @smartnotes_search.py
This test failed: test_title_matches_ranked_before_body.
Show me the part of search_notes that handles ordering.
Explain why it produced the wrong order.
Compare the AI's explanation to your own diagnosis from reading the assertion message. Did the AI identify the same root cause you did?
Modify
Add a 10th test to the suite. Test this edge case: a keyword that appears in both the title AND the body of the same note. Should that note appear once or twice in the results?
def test_keyword_in_title_and_body_appears_once(sample_notes: list[Note]) -> None:
# "Python" appears in title "Python Tips" AND body "Learn about..."
# for the Debugging Guide, "Python" is in the title and "Python" in body
results = search_notes(sample_notes, keyword="Python")
titles = [note.title for note in results]
# Each note should appear at most once
assert len(titles) == len(set(titles))
Predict: does the existing implementation pass this test without changes? Run it. If it fails, you found a bug the original nine tests missed.
Make [Mastery Gate]
Document your complete TDG cycle for search_notes in a file called tdg_cycle_log.md:
- What you prompted (exact text)
- What the AI generated (summary, not full code)
- Which tests failed and why
- How you re-prompted (if needed)
- Your post-GREEN review: any hardcoded values? Any missing edge cases?
- The 10th test you added and whether it passed
This documentation is the mastery gate. The cycle log proves you drove the process, not the AI.
Try With AI
If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.
Prompt 1: What Did You Learn From the Failure?
After completing the TDG cycle (whether the AI passed on the first try or required re-prompting), describe your experience to Claude Code:
I ran the TDG cycle on search_notes. Here is what happened:
[describe: which tests passed, which failed, how you fixed it]
What pattern should I watch for next time I run this cycle
on a different function?
Read the AI's response. It may identify a pattern you had not noticed, such as "ordering tests tend to fail first because sorting logic is harder to infer from a docstring than filtering logic."
What you're learning: Reflection turns experience into transferable knowledge. The AI acts as a sounding board that helps you extract a general lesson from a specific experience. You are building a personal playbook for the TDG cycle.
Prompt 2: Predict for an Untested Input
Ask Claude Code to predict the result for an input not in your test suite:
Here is my search_notes implementation.
[paste the generated code]
What does search_notes(sample_notes, keyword=" ") return?
(That is a keyword of three spaces, not empty string.)
Predict the result before I run it.
Run the function with that input and compare to the AI's prediction. If they disagree, one of them is wrong. Figure out which.
What you're learning: You are practicing the trust gap. The AI's prediction is a hypothesis, not a fact. Testing it against actual execution builds your instinct for when to trust generated code and when to verify further.
Prompt 3: Is This Implementation Hardcoded?
Paste the generated implementation and ask:
Review this implementation of search_notes. Is it using a
real algorithm (case-insensitive substring matching, sorting
by match location), or is it hardcoded to pass the specific
test values? How can I tell?
Read the AI's analysis. A real algorithm uses the keyword parameter in comparisons (e.g., keyword.lower() in note.title.lower()). A hardcoded implementation would check for specific strings like "Python" or "pasta".
What you're learning: You are building the skill of code review for AI-generated code. The question is not "does it pass tests?" but "will it work for inputs I have not tested?" This is the core skill of working with AI-generated implementations: trust, but verify.
James leans back and looks at the terminal. Fourteen green dots.
"In the warehouse," he says, "when a shipment arrived, we did not just count the boxes. We opened them. Checked the part numbers against the purchase order. Weighed the critical items. The green bar is the box count. Reading the code and testing untested inputs, that is opening the boxes."
Emma pauses. "Your 'inspect the delivery' framing is cleaner than my usual explanation. I usually say 'review the code' but that sounds passive. 'Inspect' implies you are looking for specific defects. I am going to use that."
She pulls up a blank file. "You drove the cycle once with guidance nearby. Lesson 4 is the capstone: a fresh SmartNotes feature, start to finish, no guidance at all. Just you, the loop, and the tests."