
Testing Dataclass-Based Code

James looks at his SmartNotes project. Every function passes notes around as dict[str, str]. Every function is one typo away from a KeyError. He knows the fix now: replace the dicts with a Note dataclass.

"Where do I start?" he asks.

Emma pulls up the test file. "Start with the data model. Define the Note class. Then change one function at a time. After each change, run the tests. If they pass, move on. If they fail, you know exactly which function broke."

James nods. "Tests as my safety net."

"Tests as your specification," Emma corrects. "The tests describe what each function should do. The dataclass describes what a note looks like. Together, they leave almost nothing to guesswork."

If you're new to programming

This lesson ties together everything from Chapter 51. You will define a Note dataclass, rewrite SmartNotes functions to use it, and write tests that create Note instances. By the end, your SmartNotes project will be structured, type-checked, and tested.


The Note Dataclass

Here is the complete Note dataclass for SmartNotes. It replaces every dict[str, str] in your project:

from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)

Compare this to the dict approach from Lesson 1:

| Dict approach | Dataclass approach |
|---|---|
| note["title"] | note.title |
| note["body"] | note.body |
| No defined shape | Shape defined by class fields |
| Pyright: 0 warnings on typos | Pyright: error on every typo |
| No autocomplete on keys | Full autocomplete on fields |

Every field has a name and a type. Three fields have defaults. The tags field uses field(default_factory=list) from Lesson 4. This class is the single source of truth for what a note contains.
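To see the defaults in action, here is a small sketch that builds two notes with only the required fields. The note contents are made up for illustration. It shows that each instance gets its own tags list, which is the reason for default_factory.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


# Only the three required fields are passed; the rest use their defaults.
first = Note(title="Groceries", body="Milk and eggs", word_count=3)
second = Note(title="Ideas", body="Learn pytest", word_count=2)

# default_factory=list calls list() once per instance,
# so each note owns a separate tags list.
first.tags.append("shopping")

print(first.tags)   # ['shopping']
print(second.tags)  # []
print(first.author, first.is_draft)  # Anonymous True
```

If both notes shared one list, the tag added to the first note would also show up on the second.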


Transforming SmartNotes Functions

create_note

Before (dict-based):

def create_note(title: str, body: str, author: str = "Anonymous") -> dict[str, str]:
    return {
        "title": title,
        "body": body,
        "author": author,
        "word_count": str(len(body.split())),
    }

After (dataclass-based):

def create_note(title: str, body: str, author: str = "Anonymous") -> Note:
    """Create a new Note with an automatically calculated word count."""
    word_count: int = len(body.split())
    return Note(
        title=title,
        body=body,
        word_count=word_count,
        author=author,
    )

Notice two improvements. The return type changed from dict[str, str] to Note. The word_count is now an int instead of a string. With dicts, everything had to be str because the type was dict[str, str]. With a dataclass, each field has its own type.

note_word_count

Before:

def note_word_count(note: dict[str, str]) -> int:
    return int(note["word_count"])

After:

def note_word_count(note: Note) -> int:
    """Return the word count of a note."""
    return note.word_count

The function is simpler because note.word_count is already an int. No string conversion needed.
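Having word_count stored as an int pays off as soon as you do arithmetic with it. The sketch below uses a hypothetical helper, total_word_count, which is not part of SmartNotes, to add up the counts across several notes without any int() calls.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


def total_word_count(notes: list[Note]) -> int:
    """Hypothetical helper: add up the word counts of several notes."""
    # Each word_count is already an int, so sum() works directly.
    return sum(note.word_count for note in notes)


notes = [
    Note(title="A", body="one two", word_count=2),
    Note(title="B", body="three four five", word_count=3),
]
print(total_word_count(notes))  # 5
```

With the dict version, the same helper would need int(note["word_count"]) on every item.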

format_note_header

Before:

def format_note_header(note: dict[str, str]) -> str:
    return f"{note['title']} by {note['author']}"

After:

def format_note_header(note: Note) -> str:
    """Format a note's title and author as a header string."""
    return f"{note.title} by {note.author}"

Same logic, but note.title and note.author are checked by pyright and autocompleted by your editor.

merge_notes

Before:

def merge_notes(base: dict[str, str], override: dict[str, str]) -> dict[str, str]:
    merged: dict[str, str] = {}
    for key in base:
        merged[key] = base[key]
    for key in override:
        merged[key] = override[key]
    return merged

After:

def merge_notes(base: Note, override: Note) -> Note:
    """Merge two notes, using override's title and body with combined tags."""
    combined_tags: list[str] = base.tags + override.tags
    merged_word_count: int = len(override.body.split())
    return Note(
        title=override.title,
        body=override.body,
        word_count=merged_word_count,
        author=base.author,
        tags=combined_tags,
    )

The dataclass version is explicit about what "merge" means: take the override's title and body, keep the base's author, and combine both tag lists. The dict version blindly copied keys, which could introduce unexpected fields.
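A short usage sketch makes the merge rules concrete. The two notes below are invented for illustration. Note one detail that follows from the code as written: merge_notes does not pass is_draft, so the merged note falls back to the default of True, whatever the inputs say.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


def merge_notes(base: Note, override: Note) -> Note:
    """Merge two notes, using override's title and body with combined tags."""
    combined_tags: list[str] = base.tags + override.tags
    merged_word_count: int = len(override.body.split())
    return Note(
        title=override.title,
        body=override.body,
        word_count=merged_word_count,
        author=base.author,
        tags=combined_tags,
    )


base = Note(title="Draft plan", body="First pass", word_count=2,
            author="James", is_draft=False, tags=["work"])
override = Note(title="Final plan", body="Second pass with details",
                word_count=4, author="Emma", tags=["review"])

merged = merge_notes(base, override)

print(merged.title)       # Final plan  (from override)
print(merged.author)      # James       (from base)
print(merged.tags)        # ['work', 'review']
print(merged.word_count)  # 4           (recomputed from override.body)
print(merged.is_draft)    # True        (default, not copied from base)
print(base.tags)          # ['work']    (+ built a new list; base is unchanged)
```

Whether a merged note should always start as a draft is a design decision. If you want to keep the base note's status, you could pass is_draft=base.is_draft explicitly.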


Writing Tests with Dataclass Instances

Tests become clearer when they create and compare dataclass instances. Here is the full test file:

import pytest
from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


def create_note(title: str, body: str, author: str = "Anonymous") -> Note:
    """Create a new Note with an automatically calculated word count."""
    word_count: int = len(body.split())
    return Note(
        title=title,
        body=body,
        word_count=word_count,
        author=author,
    )


def note_word_count(note: Note) -> int:
    """Return the word count of a note."""
    return note.word_count


def format_note_header(note: Note) -> str:
    """Format a note's title and author as a header string."""
    return f"{note.title} by {note.author}"


def merge_notes(base: Note, override: Note) -> Note:
    """Merge two notes, using override's title and body with combined tags."""
    combined_tags: list[str] = base.tags + override.tags
    merged_word_count: int = len(override.body.split())
    return Note(
        title=override.title,
        body=override.body,
        word_count=merged_word_count,
        author=base.author,
        tags=combined_tags,
    )


# --- Tests ---

def test_create_note_defaults() -> None:
    note: Note = create_note(title="Test", body="Hello world")
    assert note.title == "Test"
    assert note.body == "Hello world"
    assert note.word_count == 2
    assert note.author == "Anonymous"
    assert note.is_draft is True
    assert note.tags == []


def test_create_note_custom_author() -> None:
    note: Note = create_note(title="Meeting", body="Discussed items", author="James")
    assert note.author == "James"


def test_note_word_count() -> None:
    note: Note = create_note(title="Long Note", body="one two three four five")
    assert note_word_count(note) == 5


def test_format_note_header() -> None:
    note: Note = create_note(title="Weekly Review", body="Summary of the week")
    result: str = format_note_header(note)
    assert result == "Weekly Review by Anonymous"


def test_format_note_header_custom_author() -> None:
    note: Note = create_note(title="Plan", body="Ship it", author="Emma")
    result: str = format_note_header(note)
    assert result == "Plan by Emma"


def test_merge_notes_uses_override_content() -> None:
    base: Note = create_note(title="Old", body="Original content", author="James")
    override: Note = create_note(title="New", body="Updated content", author="Emma")
    result: Note = merge_notes(base, override)
    assert result.title == "New"
    assert result.body == "Updated content"


def test_merge_notes_keeps_base_author() -> None:
    base: Note = create_note(title="Old", body="Original", author="James")
    override: Note = create_note(title="New", body="Updated")
    result: Note = merge_notes(base, override)
    assert result.author == "James"


def test_merge_notes_combines_tags() -> None:
    base: Note = Note(
        title="A", body="First", word_count=1, tags=["python"]
    )
    override: Note = Note(
        title="B", body="Second", word_count=1, tags=["ai"]
    )
    result: Note = merge_notes(base, override)
    assert result.tags == ["python", "ai"]


def test_note_equality() -> None:
    note_a: Note = Note(title="Same", body="Content", word_count=1)
    note_b: Note = Note(title="Same", body="Content", word_count=1)
    assert note_a == note_b


def test_note_inequality() -> None:
    note_a: Note = Note(title="First", body="Content", word_count=1)
    note_b: Note = Note(title="Second", body="Content", word_count=1)
    assert note_a != note_b

Save this as test_smartnotes_dataclass.py and run:

uv run pytest test_smartnotes_dataclass.py -v

All 10 tests should pass. Each test builds Note instances, either through create_note or by calling Note directly, then calls a function and asserts the result. No dictionary keys to misspell. No KeyError surprises.
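Because == on a dataclass compares every field, a test can also check a whole instance in one assertion instead of one field at a time. The sketch below is an optional variation, and the test name is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


def create_note(title: str, body: str, author: str = "Anonymous") -> Note:
    """Create a new Note with an automatically calculated word count."""
    word_count: int = len(body.split())
    return Note(title=title, body=body, word_count=word_count, author=author)


def test_create_note_matches_expected_instance() -> None:
    # Spell out the full expected note, defaults included.
    expected: Note = Note(
        title="Test",
        body="Hello world",
        word_count=2,
        author="Anonymous",
        is_draft=True,
        tags=[],
    )
    # One assertion covers all six fields.
    assert create_note(title="Test", body="Hello world") == expected
```

The trade-off: a single comparison is compact, while separate field assertions say more directly which expectation a test is about. Both styles are reasonable.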


What Dataclasses Do Not Catch

A gap you should know about

Try creating a note with the wrong types:

bad_note: Note = Note(title=999, body=True, word_count="five")

This runs without error. Python creates the instance and stores 999 as the title, True as the body, and "five" as the word count. The type annotations in a dataclass are hints for pyright and your editor, but Python does not enforce them at runtime.
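The sketch below shows where the problem actually surfaces. The instance is created without complaint, and the failure only appears later, when some code treats word_count as a number.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


# Pyright reports type errors on this line, but Python runs it anyway.
bad_note = Note(title=999, body=True, word_count="five")

print(type(bad_note.title).__name__)       # int
print(type(bad_note.word_count).__name__)  # str

# The mistake only shows up when the value is used as an int.
failed_later = False
try:
    bad_note.word_count + 1
except TypeError:
    failed_later = True

print(failed_later)  # True
```

The error appears far from the line that caused it, which is what makes this kind of bug slow to track down.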

Pyright will flag this code with type errors, which is good. But if the data comes from an external source (a JSON file, an API response, user input), pyright cannot check it because the actual values are only known at runtime.
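Here is a minimal sketch of that situation, using an invented JSON string in place of a real file or API response. json.loads returns a value typed as Any, so pyright has nothing to check, and the wrong type slips into the Note unnoticed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)


# Made-up external data: word_count arrives as a string, not an int.
raw = '{"title": "Imported", "body": "From a file", "word_count": "three"}'

data = json.loads(raw)  # typed as Any, so pyright stays silent
note = Note(**data)     # no error at runtime either

print(repr(note.word_count))  # 'three'
```

Neither the type checker nor Python objects here, which is the gap that runtime validation is meant to close.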

Phase 3b introduces Pydantic, which validates types at runtime. Pydantic raises an error if you pass 999 where a str is expected. For now, dataclasses give you editor support and static checking. Pydantic adds the runtime safety layer on top.


PRIMM-AI+ Practice: Full Transformation

Predict [AI-FREE]

Read the test_merge_notes_combines_tags test above. Without running it, predict what result.tags will contain and what result.author will be. Rate your confidence from 1 to 5.

Check your prediction
  • result.tags is ["python", "ai"] because merge_notes concatenates base.tags + override.tags.
  • result.author is "Anonymous" because the base note was created with Note(title="A", body="First", word_count=1, tags=["python"]), which uses the default author.

Run

Save the complete test file above as test_smartnotes_dataclass.py. Run uv run pytest test_smartnotes_dataclass.py -v and confirm all 10 tests pass.

Investigate

Introduce a typo in one of the functions. Change note.title to note.titel in format_note_header. Run uv run pyright test_smartnotes_dataclass.py first. Does pyright catch it? Then run uv run pytest. What error do you get? Compare this to Lesson 1, where pyright was silent about the same kind of typo in dict-based code.

Modify

Add a new function called is_long_note(note: Note) -> bool that returns True if the word count is greater than 100 and False otherwise. Write two tests: one for a long note and one for a short note. Run pytest to verify both pass.

Make [Mastery Gate]

Without looking at any examples, write a complete file that:

  1. Defines the Note dataclass with all six fields
  2. Writes a function summarize_note(note: Note) -> str that returns "{title} ({word_count} words)"
  3. Writes two tests for summarize_note: one with the default author and one with a custom author (the author does not appear in the summary, but the function should still work correctly regardless of author)

Run uv run pytest to verify both tests pass. Run uv run pyright to verify zero type errors.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Review the Transformation

Here is my Note dataclass and create_note function:

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)

def create_note(title: str, body: str, author: str = "Anonymous") -> Note:
    word_count: int = len(body.split())
    return Note(title=title, body=body, word_count=word_count, author=author)

Is there anything I should improve? Are there any edge cases
I am not handling?

Read the AI's suggestions. Does it mention empty strings, very long bodies, or other edge cases? Evaluate each suggestion: is it worth adding now, or is it premature complexity?

What you're learning: You are using the AI as a code reviewer, then applying your own judgment about which suggestions to accept. This is the core skill of working with AI assistants.

Prompt 2: Generate Additional Tests

Given the Note dataclass and create_note function above,
generate three additional test functions that cover edge
cases I might have missed. Each test should use type
annotations and clear assert statements.

Review the AI's tests. Do they test meaningful scenarios? Do the assertions check the right values? Run them alongside your existing tests to verify they pass.

What you're learning: You are evaluating AI-generated tests for quality and completeness, a skill you will use in every project.


Key Takeaways

  1. Dataclasses replace dicts as the data model for SmartNotes. Every function that accepted dict[str, str] now accepts Note, gaining type safety, autocomplete, and pyright checking.

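  2. The transformation is mostly mechanical. Change the parameter type, replace bracket notation with dot notation, and update the return type. For create_note, note_word_count, and format_note_header the logic stays the same. merge_notes is the exception: you have to decide, field by field, what merging means.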

  3. Tests with dataclass instances are clearer. Creating Note(title="...", body="...") in a test is more readable than building a dictionary, and equality comparison with == checks every field automatically.

  4. Dataclasses do not validate types at runtime. Note(title=999, body=True, word_count="five") creates an instance without error. Pyright catches this statically, but runtime validation requires Pydantic (Phase 3b).

  5. The dict-to-dataclass transformation is a pattern you will repeat. Most Python projects start with dicts and graduate to dataclasses (or Pydantic models) as the code grows. Recognizing when to make this shift is a professional skill.


Looking Ahead

You have transformed SmartNotes from fragile dictionaries to structured dataclasses. Your code is now type-checked, autocompleted, and tested. Chapter 52 takes your testing to the next level: fixtures for reusable setup, parametrize for testing many cases at once, and coverage measurement to prove your test suite is complete.