Writing the Complete Test Suite

If you're new to programming

Writing tests BEFORE the code exists feels backwards, but it is the core of TDG. The tests define "correct." If someone asks "what does this feature do?", your test names answer that question. You learned test basics in Ch 46 and Ch 52. This lesson applies those skills to a real feature.

If you've coded before

This is test-first development (TDD) with AI generation. The key difference: you are designing the test suite as a specification document, not just a safety net. Coverage planning (happy path, edge cases, error paths) matters more than individual assertions.

James has his stub file open. The function signature is written, the docstring captures five design decisions, and uv run pyright passes cleanly. He reaches for Claude Code.

Emma stops him. "What does 'correct' mean?"

"The docstring says--"

"The docstring says what the function should do. But how will you know it did it? If you prompt Claude Code right now and it generates an implementation, what will you check it against?"

James pauses. "I'd... read the code and see if it looks right."

"That is how bugs survive," Emma says. "You need a test for every behavior in that docstring. Write the tests first. Every test that fails is a behavior you specified but haven't built yet. When they all pass, the feature is done."

Reading New Code? Use PRIMM-AI+

When you encounter new Python syntax in this chapter, use the PRIMM-AI+ method from Chapter 42: Predict what the code does before running it [AI-FREE]. Rate your confidence (1-5). Run it to check your prediction. Investigate any surprises.


Planning Before Coding

In Lesson 1, you wrote the specification. Now you are doing what James is doing: defining "correct" before any implementation exists. Start with a plan on paper, not in code.

Every function has three categories of behavior:

Category | What it covers | SmartNotes examples
Happy path | Normal inputs that produce normal results | Keyword in title matches, keyword in body matches
Edge cases | Valid but unusual inputs at the boundaries | Empty keyword, empty notes list, keyword + tag combined
Error paths | Inputs that should produce specific "no result" behavior | No matches returns empty list, tag that no note has

Here is the test plan for search_notes:

# | Test name | Category | What it checks
1 | test_keyword_in_title_matches | Happy path | A note with the keyword in its title is returned
2 | test_keyword_in_body_matches | Happy path | A note with the keyword only in its body is returned
3 | test_case_insensitive_matching | Happy path | "python" matches "Python" in a title
4 | test_tag_filter_narrows_results | Happy path | Only notes with the specified tag are returned
5 | test_keyword_and_tag_combined | Happy path | Both filters apply together
6 | test_title_matches_ranked_before_body | Boundary | Title matches appear before body-only matches
7 | test_empty_keyword_returns_all | Edge case | "" returns every note (or filtered by tag)
8 | test_empty_notes_list_returns_empty | Edge case | [] input produces [] output
9 | test_no_matches_returns_empty_list | Error path | A keyword that exists nowhere produces []

Nine tests. Each test name is a sentence that describes one behavior. Read the "Test name" column top to bottom: you now know exactly what search_notes does without reading any implementation.

That is the point. The test names ARE the specification.
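
Once the tests exist, you can see this specification view directly. pytest's --collect-only flag lists test names without running anything:

uv run pytest test_smartnotes_search.py --collect-only -q

The output is the "Test name" column of your plan, generated from the file itself.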


Setting Up Test Data with Fixtures

Every test needs notes to search through. Instead of creating notes inside each test, write a fixture that builds sample data once. You learned fixtures in Chapter 52; now you use them for a real feature.

Create a file called test_smartnotes_search.py:

"""Tests for search_notes -- written BEFORE implementation."""

import pytest
from smartnotes_search import Note, search_notes


@pytest.fixture
def sample_notes() -> list[Note]:
"""A small collection of notes covering different tags and content."""
return [
Note(
title="Python Tips",
body="Learn about list comprehensions and generators.",
word_count=42,
tags=["python", "beginner"],
),
Note(
title="Meeting Notes",
body="Discussed the python migration timeline.",
word_count=78,
tags=["work"],
),
Note(
title="Recipe Ideas",
body="Try the new pasta recipe from the cookbook.",
word_count=35,
tags=["personal", "cooking"],
),
Note(
title="Debugging Guide",
body="Step-by-step debugging for Python beginners.",
word_count=120,
tags=["python", "beginner"],
),
]

Output:

# No output -- this is a fixture definition. It runs when a test
# requests it as a parameter.

Four notes. "Python" appears in one title and two bodies. Multiple tags overlap. This gives enough variety to test every behavior in the plan.

Notice the fixture returns typed data (list[Note]), not dictionaries. The tests use the same Note dataclass from Lesson 1. If the data model changes, you update one fixture, not nine tests.
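
If you do not have Lesson 1's file handy, here is a minimal sketch of what smartnotes_search.py might contain at this point, inferred from the fixture's fields and the tests below. Your Lesson 1 file is the source of truth; adjust names and defaults to match it.

# Sketch of smartnotes_search.py -- fields inferred from the fixture,
# signature inferred from how the tests call search_notes.
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    tags: list[str] = field(default_factory=list)


def search_notes(
    notes: list[Note], keyword: str, tag: str | None = None
) -> list[Note]:
    """Stub -- see Lesson 1 for the full docstring."""
    ...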


Writing Each Test

Each test follows a pattern: arrange (fixture provides the data), act (call search_notes), assert (check the result). Write them one category at a time.

Happy Path Tests

def test_keyword_in_title_matches(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="Python")
    assert any(note.title == "Python Tips" for note in results)


def test_keyword_in_body_matches(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="pasta")
    assert any(note.title == "Recipe Ideas" for note in results)


def test_case_insensitive_matching(sample_notes: list[Note]) -> None:
    upper = search_notes(sample_notes, keyword="PYTHON")
    lower = search_notes(sample_notes, keyword="python")
    assert len(upper) == len(lower)
    assert len(upper) > 0


def test_tag_filter_narrows_results(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="", tag="cooking")
    assert len(results) == 1
    assert results[0].title == "Recipe Ideas"


def test_keyword_and_tag_combined(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="Python", tag="beginner")
    titles = [note.title for note in results]
    assert "Python Tips" in titles
    assert "Meeting Notes" not in titles  # has "python" in body but wrong tag

Output:

# No output yet -- we have not run pytest. These are test definitions.

Five happy path tests. Each one checks a single behavior from the docstring. The test names tell you what they verify without reading the body.

Boundary Test

def test_title_matches_ranked_before_body(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="python")
    # "Python Tips" has "Python" in its title.
    # "Meeting Notes" and "Debugging Guide" have "python"/"Python" only in the body.
    # Title matches must come first.
    title_match_titles = {"Python Tips"}
    body_only_titles = {"Meeting Notes", "Debugging Guide"}

    # Both kinds of matches must be present -- otherwise the ordering
    # check below would pass vacuously.
    assert any(r.title in title_match_titles for r in results)
    assert any(r.title in body_only_titles for r in results)

    last_title_match_idx = max(
        i for i, r in enumerate(results) if r.title in title_match_titles
    )
    first_body_match_idx = min(
        i for i, r in enumerate(results) if r.title in body_only_titles
    )
    # All title matches appear before any body-only match
    assert last_title_match_idx < first_body_match_idx

Output:

# No output -- test definition only.

This test is longer because ordering is harder to assert than presence. It first checks that both kinds of matches are present (so the ordering check cannot pass vacuously), then finds the last title-match position and the first body-only-match position and checks that title matches come first. The complexity is appropriate: ordering is the trickiest part of the specification.

Edge Case and Error Path Tests

def test_empty_keyword_returns_all(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="")
    assert len(results) == len(sample_notes)


def test_empty_notes_list_returns_empty() -> None:
    results = search_notes([], keyword="anything")
    assert results == []


def test_no_matches_returns_empty_list(sample_notes: list[Note]) -> None:
    results = search_notes(sample_notes, keyword="xylophone")
    assert results == []

Output:

# No output -- test definition only.

Three short tests. Edge cases and error paths are often the simplest to write but the easiest to forget. The plan on paper caught them before you started coding.


Using Parametrize for Multiple Keywords

Chapter 52 introduced @pytest.mark.parametrize. Here it saves you from writing five separate tests for different keyword matches:

@pytest.mark.parametrize(
    "keyword, expected_title",
    [
        ("Python", "Python Tips"),
        ("pasta", "Recipe Ideas"),
        ("debugging", "Debugging Guide"),
        ("migration", "Meeting Notes"),
        ("comprehensions", "Python Tips"),
    ],
)
def test_various_keywords_find_correct_notes(
    sample_notes: list[Note], keyword: str, expected_title: str
) -> None:
    results = search_notes(sample_notes, keyword=keyword)
    assert any(note.title == expected_title for note in results)

Output:

# No output -- parametrize expands this into 5 separate test cases
# at runtime. Each runs independently and reports pass/fail separately.

One function, five test cases. If you add a sixth keyword later, you add one tuple to the list instead of writing another function. Parametrize is the right tool when you are testing the same behavior with different inputs.
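
A side benefit: each generated case gets its own ID, built from the parameter values, so you can re-run a single failing case by name (the IDs appear in the RED output below). For example:

uv run pytest "test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas]" -v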


Seeing RED

Save the file and run:

uv run pytest test_smartnotes_search.py -v

Output:

test_smartnotes_search.py::test_keyword_in_title_matches FAILED
test_smartnotes_search.py::test_keyword_in_body_matches FAILED
test_smartnotes_search.py::test_case_insensitive_matching FAILED
test_smartnotes_search.py::test_tag_filter_narrows_results FAILED
test_smartnotes_search.py::test_keyword_and_tag_combined FAILED
test_smartnotes_search.py::test_title_matches_ranked_before_body FAILED
test_smartnotes_search.py::test_empty_keyword_returns_all FAILED
test_smartnotes_search.py::test_empty_notes_list_returns_empty FAILED
test_smartnotes_search.py::test_no_matches_returns_empty_list FAILED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[Python-Python Tips] FAILED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[pasta-Recipe Ideas] FAILED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[debugging-Debugging Guide] FAILED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[migration-Meeting Notes] FAILED
test_smartnotes_search.py::test_various_keywords_find_correct_notes[comprehensions-Python Tips] FAILED

14 failed in 0.03s

Every test fails. Every single one. The failures all share the same root cause: the stub's body is just ... (the ellipsis placeholder), so search_notes returns None.
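
Here is roughly what one failure looks like (exact formatting varies by pytest version). Iterating over the stub's None return raises a TypeError:

    def test_keyword_in_title_matches(sample_notes: list[Note]) -> None:
        results = search_notes(sample_notes, keyword="Python")
>       assert any(note.title == "Python Tips" for note in results)
E       TypeError: 'NoneType' object is not iterable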

This is the correct state. RED means the specification is complete but the implementation does not exist yet. Each failing test is a behavior you defined that has no code behind it. When you eventually generate the implementation, you will run this same command and watch the failures turn to passes, one by one.

If any test had passed, something would be wrong: either the test is not actually testing the stub, or the stub accidentally does something it should not.


PRIMM-AI+ Practice: Plan a Test Suite

Predict [AI-FREE]

Press Shift+Tab to enter Plan Mode.

Here is a different function stub:

def count_words_by_author(notes: list[Note]) -> dict[str, int]:
    """Return total word count per author, excluding drafts.

    - Skips notes where is_draft is True
    - Authors with zero non-draft notes do not appear in the result
    - Returns an empty dict if notes is empty or all notes are drafts
    """
    ...

Without writing code, predict:

  1. How many tests does this function need?
  2. List each test by name and category (happy path, edge case, error path)
  3. What fixture data would you need? (How many notes, which authors, which are drafts?)

Write your predictions on paper or in Plan Mode. Rate your confidence from 1 to 5.

Reference test plan
# | Test name | Category
1 | test_counts_words_for_single_author | Happy path
2 | test_counts_words_across_multiple_authors | Happy path
3 | test_excludes_draft_notes | Happy path
4 | test_author_with_only_drafts_not_in_result | Edge case
5 | test_empty_notes_returns_empty_dict | Edge case
6 | test_all_drafts_returns_empty_dict | Edge case
7 | test_single_note_single_author | Boundary

Seven tests. The fixture needs at least 4 notes: two by the same author (one draft, one published), one by a different author, and one draft by a third author. This covers counting, filtering, and the "author disappears" edge case.
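
To make that concrete, here is one possible fixture. It assumes Note gains author and is_draft fields for this exercise -- the stub's docstring implies both, but the field names here are our assumption:

@pytest.fixture
def author_notes() -> list[Note]:
    return [
        # Alice: one published note (counted) and one draft (excluded)
        Note(title="A1", body="text", word_count=100, tags=[],
             author="alice", is_draft=False),
        Note(title="A2", body="text", word_count=50, tags=[],
             author="alice", is_draft=True),
        # Bob: one published note
        Note(title="B1", body="text", word_count=70, tags=[],
             author="bob", is_draft=False),
        # Carol: drafts only -- must not appear in the result
        Note(title="C1", body="text", word_count=30, tags=[],
             author="carol", is_draft=True),
    ]

With this data, the expected result is {"alice": 100, "bob": 70}, and carol is absent.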

Run

Press Shift+Tab to exit Plan Mode.

Compare your predictions to the reference plan. Did you catch the "author with only drafts" edge case? That one is easy to miss because it requires thinking about what should not appear in the output, not just what should.

Investigate

In Claude Code, ask:

/investigate @test_count_words.py
I have this function stub for count_words_by_author. I planned
7 tests. Are there edge cases I might be missing? Here is my
test plan:

[paste your test names]

Compare the AI's suggestions to your plan. If it identifies a case you missed, decide whether it represents a real behavior or an unlikely scenario not worth testing.

Modify

Add one more test to the search_notes suite from this lesson. Test this edge case: a keyword with leading and trailing spaces (e.g., " python "). Should the function strip whitespace before matching, or treat it literally?

This is a design decision the docstring did not address. Write the test first, then update the docstring to match your decision.
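
One possible version, if you decide the function should strip whitespace before matching (the opposite decision is equally valid, as long as the docstring says so):

def test_keyword_with_surrounding_spaces(sample_notes: list[Note]) -> None:
    # Design decision (ours): whitespace is stripped before matching,
    # so "  python  " behaves exactly like "python".
    padded = search_notes(sample_notes, keyword="  python  ")
    bare = search_notes(sample_notes, keyword="python")
    assert [n.title for n in padded] == [n.title for n in bare]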

Make [Mastery Gate]

Write a complete test suite for this new function stub:

def get_most_tagged_notes(
    notes: list[Note], min_tags: int = 2
) -> list[Note]:
    """Return notes that have at least min_tags tags, sorted by tag count descending.

    - Notes with fewer than min_tags tags are excluded
    - Ties in tag count are broken by title (alphabetical, ascending)
    - Returns an empty list if no notes meet the threshold
    - Returns an empty list if notes is empty
    """
    ...

Requirements:

  • Write at least 6 tests covering happy path, edge cases, and boundaries
  • Use a fixture for sample data
  • Use @pytest.mark.parametrize for at least one test
  • Run uv run pytest -v and confirm all tests are RED (all fail)

Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Review My Test Plan

Write your test plan for the Mastery Gate function (get_most_tagged_notes), then paste it into Claude Code:

Review this test plan for get_most_tagged_notes. Does it cover
all the behaviors in the docstring? Are there gaps?

[paste your test names and categories]

Read the AI's feedback. If it identifies a missing edge case, add it to your plan before writing the test code.

What you're learning: Test planning is a design skill. The AI acts as a reviewer who spots gaps in your coverage. You are practicing the same review process that teams use in code review, but applied to the test plan before any code is written.

Prompt 2: Generate Edge Cases

Here is a function stub. Suggest 5 edge cases I should test
that are NOT in the docstring. For each one, explain why it
matters.

def get_most_tagged_notes(
    notes: list[Note], min_tags: int = 2
) -> list[Note]:
    """Return notes that have at least min_tags tags,
    sorted by tag count descending.

    - Notes with fewer than min_tags tags are excluded
    - Ties broken by title (alphabetical, ascending)
    - Returns empty list if no notes meet threshold
    - Returns empty list if notes is empty
    """
    ...

The AI will likely suggest cases like min_tags=0, duplicate tags in a note, or min_tags larger than any note's tag count. For each suggestion, decide: is this a real behavior that needs a test, or is it outside the function's responsibility?

What you're learning: You are building judgment about test boundaries. Not every edge case needs a test. The skill is distinguishing "this could actually happen and break things" from "this is theoretically possible but not worth the maintenance cost." The AI generates candidates; you make the decision.

Prompt 3: Test Names as Documentation

After writing your test suite, ask Claude Code:

Read only the test function names in this file. Based on
those names alone, describe what the get_most_tagged_notes
function does. Do not read the test bodies or the function
stub.

[paste only def test_... lines]

If the AI's description matches the docstring, your test names are good documentation. If it misunderstands a behavior, rename the test to be clearer.

What you're learning: Test names are documentation that executes. If a reader (human or AI) can understand the feature from the test names alone, your specification is complete. This is the test-as-specification principle: the tests define "correct" so precisely that no additional explanation is needed.


James looks at the terminal. Fourteen failures, all in red.

"In the warehouse," he says, "we had acceptance criteria on every purchase order. 'Delivered correctly' meant: right quantity, right part number, right packaging, no damage. The supplier did not ship until we agreed on those criteria. These tests are the same thing. I am writing what 'delivered correctly' means before the AI ships anything."

Emma nods. "How many tests did you write?"

"Nine, plus five parametrized variations. Fourteen total."

"I am not sure that is always the right number," Emma says. "For a function like this, with optional parameters and ordering rules, I usually write 8 to 10 tests. Some developers write 20. The right number depends on how many distinct behaviors the function has. I have gotten it wrong both ways: too few tests and a bug slipped through, too many tests and refactoring took twice as long because every change broke a dozen assertions."

"So how do you decide?"

"One test per behavior in the docstring. If a bullet point describes a behavior, it gets a test. If a parameter has a default value, test both with and without it. After that, stop. You can always add more later when a bug reveals a gap."

She closes her laptop. "The specification is complete: stubs with types, docstrings with edge cases, tests that define 'correct.' Now you prompt AI. Generate, verify, debug. No step-by-step instructions. Just the loop."