
Test Structure: Arrange, Act, Assert

James scrolls through his SmartNotes test file. Fifteen test functions, and every single one looks different. Some cram everything into one line. Others sprawl across eight lines with no clear separation between setup, action, and checking. He can run them all and they pass, but when one fails, he stares at the function trying to figure out which part broke.

"You've been doing something right without knowing it," Emma says. She highlights three tests James wrote in Chapter 51. "Look at this one. You create a Note. You call a function. You check the result. Three phases, every time. That pattern has a name."

If you're new to programming

A test checks that your code does what you expect. You have been writing tests since Chapter 46. This lesson does not introduce tests; it names a structure you have already been using. Giving the structure a name makes it easier to spot when a test is missing a piece.

If you have testing experience from another language

The Arrange-Act-Assert (AAA) pattern is sometimes called Given-When-Then in BDD frameworks. Python's pytest uses plain functions rather than class hierarchies. If you are coming from JUnit or NUnit, the pattern is identical; the ceremony is lighter.


The Three Phases

Every well-structured test has three phases:

  1. Arrange: Set up the data and objects the test needs.
  2. Act: Call the function or method you are testing.
  3. Assert: Check that the result matches your expectation.

Here is a test you might have written in Chapter 51, with the phases labeled:

from smartnotes.models import Note


def test_note_stores_title() -> None:
    # Arrange
    note: Note = Note(title="Meeting Notes", body="Discuss roadmap", word_count=2)

    # Act
    result: str = note.title

    # Assert
    assert result == "Meeting Notes"

The comments are optional. What matters is that you can point at any line and say which phase it belongs to. If you cannot, the test needs restructuring.


Why Separation Matters

Consider this flat test with no separation:

def test_note_stuff() -> None:
    assert Note(title="A", body="B", word_count=1).title == "A"

It works. It passes. But when it fails, pytest tells you the assertion on line 2 failed. You cannot tell whether the problem is in the Note constructor (Arrange), the attribute access (Act), or the comparison (Assert). Splitting the phases gives you a clear diagnostic target.

Compare:

def test_note_stores_title() -> None:
    # Arrange
    note: Note = Note(title="A", body="B", word_count=1)

    # Act
    result: str = note.title

    # Assert
    assert result == "A"

If this fails on the assert, you know the Note was constructed without error (Arrange succeeded), the attribute was accessed without error (Act succeeded), and the comparison found a mismatch (Assert reports the difference). Three phases, three diagnostic checkpoints.


Test Naming Convention

A good test name tells you what failed without reading the code. The convention:

test_<what>_<condition>_<expected>

Some examples of the convention, and what each name tells you:

  1. test_categorize_note_high_word_count_returns_long: tests categorize_note when the word count is high; expects "long".
  2. test_note_empty_body_stores_empty_string: tests Note with an empty body; expects an empty string stored.
  3. test_count_words_single_word_returns_one: tests count_words with one word; expects 1.

When pytest reports a failure, you see the function name in the output. A name like test_stuff_works tells you nothing. A name like test_categorize_note_zero_words_returns_short tells you exactly what broke and under what condition.
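Here is the first of those names written out as a complete AAA test. The categorize_note function is a hypothetical stand-in here; the real SmartNotes function may use different thresholds:

```python
def categorize_note(word_count: int) -> str:
    # Hypothetical stand-in: thresholds are assumptions for this sketch.
    if word_count > 1000:
        return "long"
    if word_count > 100:
        return "medium"
    return "short"


def test_categorize_note_high_word_count_returns_long() -> None:
    # Arrange
    word_count: int = 1500

    # Act
    result: str = categorize_note(word_count)

    # Assert
    assert result == "long"
```

Read the test name back as a sentence: testing categorize_note, when the word count is high, it returns "long". The name alone restates the specification.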


Test Discovery: How pytest Finds Your Tests

pytest does not need a list of test functions. It discovers them automatically using two rules:

  1. File names: pytest looks for files named test_*.py or *_test.py.
  2. Function names: Inside those files, pytest runs every function whose name starts with test_.

smartnotes/
├── models.py
└── categorize.py
tests/
├── test_models.py       # pytest finds this file
├── test_categorize.py   # pytest finds this file
└── helpers.py           # pytest ignores this (no test_ prefix)

Inside test_models.py:

def test_note_stores_title() -> None:   # pytest runs this
    ...


def test_note_stores_body() -> None:    # pytest runs this
    ...


def helper_create_note() -> Note:       # pytest ignores this (no test_ prefix)
    ...

You do not register tests anywhere. You do not import them into a runner. Name the file and the function correctly, and pytest handles the rest.
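The two default rules can be sketched as plain predicate functions. This is a rough model of pytest's defaults, not its actual implementation; the real patterns are configurable through the python_files and python_functions settings:

```python
import fnmatch


def is_test_file(filename: str) -> bool:
    # Default pytest file patterns: test_*.py or *_test.py.
    return fnmatch.fnmatch(filename, "test_*.py") or fnmatch.fnmatch(
        filename, "*_test.py"
    )


def is_test_function(name: str) -> bool:
    # Inside a collected file, functions whose names start with test_ are run.
    return name.startswith("test_")
```

Applied to the tree above: is_test_file("test_models.py") is True, is_test_file("helpers.py") is False, and is_test_function("helper_create_note") is False.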


Assertion Introspection: Smart Diffs

When a plain assert fails, pytest does not just say "assertion failed." It shows you the actual values on both sides:

def test_categorize_note_medium() -> None:
    # Arrange
    word_count: int = 500

    # Act
    result: str = categorize_by_count(word_count)

    # Assert
    assert result == "medium"

If categorize_by_count(500) returns "long" instead of "medium", pytest shows:

AssertionError: assert 'long' == 'medium'

For longer strings or lists, pytest highlights exactly which characters or elements differ. This is called assertion introspection: pytest inspects the assert statement at runtime and extracts the left and right values to show you a meaningful diff. You do not need special assertion methods like assertEqual. A plain assert gives you full diagnostic information.
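To see introspection on a sequence, try a test like this sketch (categorize_many is a hypothetical helper, not part of SmartNotes). If you change the expected list to a wrong value and run pytest, the output marks exactly which element differs:

```python
def categorize_many(word_counts: list[int]) -> list[str]:
    # Hypothetical helper: "long" above 1000 words, otherwise "short".
    return ["long" if count > 1000 else "short" for count in word_counts]


def test_categorize_many_mixed_counts_returns_labels() -> None:
    # Arrange
    word_counts: list[int] = [50, 1500, 200]

    # Act
    result: list[str] = categorize_many(word_counts)

    # Assert -- on failure, pytest points at the first differing element.
    assert result == ["short", "long", "short"]
```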


Test Isolation: Each Test Runs Independently

Every test function runs in its own world. If test_a creates a Note and modifies it, test_b starts fresh. No test should depend on another test running first, and as long as each test builds its own data, no test can corrupt another's.

def test_first() -> None:
    note: Note = Note(title="Alpha", body="First", word_count=1)
    assert note.title == "Alpha"


def test_second() -> None:
    # This test does NOT see the note from test_first.
    # It starts with a clean slate.
    note: Note = Note(title="Beta", body="Second", word_count=1)
    assert note.title == "Beta"

This means you can run tests in any order and get the same results. You can run a single test with uv run pytest test_models.py::test_second and it will pass regardless of whether test_first ran before it. Isolation is what makes large test suites reliable.


Refactoring a Flat Test into AAA

Here is a test from a Chapter 51 TDG cycle, written quickly:

def test_cat() -> None:
    assert categorize_by_count(1500) == "long"

This works, but it violates two principles: the name is vague, and the three phases are compressed into one line. Refactored:

def test_categorize_by_count_above_thousand_returns_long() -> None:
    # Arrange
    word_count: int = 1500

    # Act
    result: str = categorize_by_count(word_count)

    # Assert
    assert result == "long"

The refactored version is longer but tells you exactly what is being tested, under what conditions, and what the expected outcome is. When this test fails six months from now, the name alone tells you where to look.


PRIMM-AI+ Practice: Structuring Tests

Predict [AI-FREE]

Look at this test without running it. Identify which lines belong to Arrange, Act, and Assert. Write your answers and a confidence score from 1 to 5 before checking.

def test_count_tags_multiple_tags_returns_correct_count() -> None:
    note: Note = Note(title="Ideas", body="Some text", word_count=2)
    tags: list[str] = ["python", "testing", "pytest"]
    result: int = len(tags)
    assert result == 3
    assert note.title == "Ideas"

Questions:

  1. Which lines are Arrange?
  2. Which line is Act?
  3. Which lines are Assert?
  4. Is there a problem with this test's structure?
Check your predictions

Lines 2-3 are Arrange: creating the Note and the tags list.

Line 4 is Act: calling len(tags).

Lines 5-6 are Assert: checking the count and the title.

The structural problem: The second assert (note.title == "Ideas") tests something unrelated to the function being tested (len). This test is checking two different things. A well-structured test checks one behavior. Split this into two tests: one for tag counting, one for title storage.
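The split might look like this sketch, with a minimal dataclass standing in for smartnotes.models.Note:

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Minimal stand-in for smartnotes.models.Note (an assumption).
    title: str
    body: str
    word_count: int


def test_count_tags_multiple_tags_returns_correct_count() -> None:
    # Arrange
    tags: list[str] = ["python", "testing", "pytest"]

    # Act
    result: int = len(tags)

    # Assert
    assert result == 3


def test_note_stores_title() -> None:
    # Arrange
    note: Note = Note(title="Ideas", body="Some text", word_count=2)

    # Act
    result: str = note.title

    # Assert
    assert result == "Ideas"
```

Each test now checks exactly one behavior, so a failure in either one points at a single cause.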

Run

Create a file called test_aaa_practice.py. Write these three tests using the AAA pattern:

  1. test_note_stores_body: Create a Note, access .body, assert it matches.
  2. test_categorize_by_count_low_returns_short: Arrange a word count of 50, call categorize_by_count, assert "short".
  3. test_note_word_count_matches_input: Create a Note with word_count=42, access .word_count, assert it equals 42.

Run uv run pytest test_aaa_practice.py -v and confirm all three pass. The -v flag shows each test name, so you can verify your naming convention.

Investigate

Look at the verbose pytest output. For each test, confirm you can identify which phase failed if you changed the expected value. Try changing one assert to a wrong value and observe the assertion introspection output.

Modify

Rename one of your tests to test_thing(). Run pytest again. Notice how the output becomes less useful when the name is vague. Rename it back to the descriptive version.

Make [Mastery Gate]

Without looking at any examples, write a test file called test_aaa_mastery.py with four tests:

  1. A test for Note title storage (AAA structure, descriptive name)
  2. A test for Note body storage (AAA structure, descriptive name)
  3. A test for categorize_by_count returning "medium" (AAA structure, descriptive name)
  4. A test for categorize_by_count returning "long" (AAA structure, descriptive name)

All four must have clear Arrange, Act, Assert separation. All four must follow the test_<what>_<condition>_<expected> naming convention. Run uv run pytest test_aaa_mastery.py -v and confirm all pass.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Identify the Pattern

Here is one of my test functions:

def test_cat() -> None:
    assert categorize_by_count(1500) == "long"

Refactor it into the Arrange-Act-Assert pattern with a
descriptive test name following the convention
test_<what>_<condition>_<expected>. Keep type annotations
on all variables.

Compare the AI's refactored version to the pattern taught in this lesson. Does it separate the three phases? Does the name tell you what the test checks?

What you're learning: You are evaluating whether AI-generated test code follows a structural pattern you now understand.

Prompt 2: Generate and Review

Write 3 test functions for a function called count_words(text: str) -> int
that counts words in a string. Use the AAA pattern. Name each test
using the test_<what>_<condition>_<expected> convention. Type-annotate
all variables.

Read each test the AI produces. Check: are the three phases clearly separated? Are the names descriptive? Is every variable type-annotated? If anything is off, tell the AI what to fix.

What you're learning: You are applying the AAA pattern as a review checklist for AI-generated tests.


Key Takeaways

  1. Arrange-Act-Assert gives every test a clear structure. Arrange sets up data, Act calls the function, Assert checks the result. When a test fails, you know which phase to investigate.

  2. Descriptive names tell you what failed without reading code. The convention test_<what>_<condition>_<expected> turns the test name into a sentence that describes the specification.

  3. pytest discovers tests automatically. Name your files test_*.py and your functions test_*, and pytest finds them. No registration, no configuration.

  4. Assertion introspection shows you the actual vs. expected values. A plain assert in pytest gives you a meaningful diff, not just "assertion failed."

  5. Test isolation means every test runs independently. No test depends on another test running first. You can run any test alone and get the same result.


Looking Ahead

You can now structure individual tests clearly. But you are still duplicating setup code across every test. In Lesson 2, you will learn @pytest.fixture, which lets you write setup code once and share it across every test that needs it.