Chapter 51: Data Models with Dataclasses

James is wiring categorize_note() from Chapter 50 into his SmartNotes project. The function works: it takes a word count and returns "short", "medium", or "long". Now he wants to call it on an actual note. He types note["titel"] to grab the title and count its words. The code looks fine. Pyright says nothing. He runs it.

KeyError: 'titel'

Runtime crash. A single mistyped dictionary key, invisible to every tool in his stack until the program actually executed that line. He stares at the traceback and fixes the typo: note["title"]. But the unease lingers. How many other dictionary keys across his codebase are one letter away from a crash? How many will only surface when a user happens to trigger that code path three weeks from now?

Emma lets him sit with the frustration for a moment. Then she asks: "How many times has a typo in a key name crashed your code?" James thinks back. The KeyError in Chapter 48 when he misspelled a tag. The silent bug in Chapter 50 where a missing key returned None instead of failing. Every time, the same pattern: dictionaries let you put anything in and pull anything out, and pyright cannot check what does not exist in the type system.

"What if," Emma says, "you could define a Note type where pyright knows every field name, every field type, and catches note.titel the moment you type it?" She opens a new file and writes six lines:

from dataclasses import dataclass

@dataclass
class Note:
    title: str
    body: str
    tags: list[str]

James tries note.titel in the editor. A red underline appears instantly. Pyright catches the typo before the code ever runs. No KeyError. No runtime crash. No three-week-old bug hiding in production.

This chapter replaces dict[str, str] with structured @dataclass types. You will learn what the @ symbol means (it is a decorator, and you have already seen one: @pytest.fixture). You will learn what the class keyword does (it creates a new type). You will learn the difference between a class (the blueprint) and an instance (one specific note built from that blueprint). And you will convert your Phase 2 SmartNotes functions so that pyright can verify every field access at the moment you type it, not the moment a user triggers a crash.
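As a preview of the decorator idea, here is a minimal sketch showing that the @ line is shorthand for passing the class to a function and using what comes back. The class names PlainNote, ManualNote, and DecoratedNote are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


class PlainNote:
    title: str
    body: str


# Applying the decorator by hand: pass the class in, get the enhanced class back.
ManualNote = dataclass(PlainNote)


# The @ syntax does the same thing in one step.
@dataclass
class DecoratedNote:
    title: str
    body: str


print(ManualNote(title="a", body="b"))     # PlainNote(title='a', body='b')
print(DecoratedNote(title="a", body="b"))  # DecoratedNote(title='a', body='b')
```

Both classes end up with a generated constructor and a readable printout; the @ form simply keeps the decoration next to the class definition.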

Why Dictionaries Break Down

In Chapter 48, you learned to type dictionaries as dict[str, str]. That annotation tells pyright the keys are strings and the values are strings. But it cannot tell pyright WHICH keys exist. The type dict[str, str] is equally happy with {"title": "My Note"} and {"txxle": "garbage"}. Both are valid dict[str, str] values. Pyright has no way to distinguish them.
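A minimal sketch of the problem, using a hypothetical helper named get_title: both dictionaries below satisfy dict[str, str], so a type checker accepts either one, and the wrong key only surfaces when the lookup runs.

```python
good_note: dict[str, str] = {"title": "My Note"}
bad_note: dict[str, str] = {"txxle": "garbage"}  # same type, wrong key


def get_title(note: dict[str, str]) -> str:
    # The annotation only promises str keys and str values,
    # not that a "title" key is present.
    return note["title"]


print(get_title(good_note))  # My Note

try:
    get_title(bad_note)
except KeyError as error:
    print(f"KeyError: {error}")  # KeyError: 'title'
```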

This limitation compounds as your codebase grows. Every function that accepts a note must document which keys it expects. Every function that returns a note must document which keys it provides. The documentation lives in docstrings and comments that pyright cannot read. When someone renames a key in one function and forgets to update another, the only thing that catches the mismatch is a KeyError at runtime.

Dataclasses solve this by moving the field definitions into the type system itself. A Note with title, body, and tags is a different type from a Note with heading and content. Pyright knows the difference and enforces it everywhere the type is used.
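To make "field definitions in the type system" concrete, the sketch below uses the standard-library fields() helper to list what a Note declares. The sample values are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Note:
    title: str
    body: str
    tags: list[str]


note = Note(title="My Note", body="Some text here", tags=["demo"])

# The field names are part of the class definition, so tools can enumerate them.
print([f.name for f in fields(Note)])  # ['title', 'body', 'tags']

print(note.title)              # My Note
print(hasattr(note, "titel"))  # False: no such field was declared

# print(note.titel)  # a type checker flags this line before the code runs
```

A dictionary has no equivalent list of declared keys, which is why pyright has nothing to check a key name against.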

What You Will Learn

By the end of this chapter, you will be able to:

  • Explain why dict[str, str] cannot catch key-name typos and why structured types can
  • Recognize the @ decorator syntax and connect it to @pytest.fixture from Chapter 46
  • Use the class keyword with @dataclass to define new types with typed fields
  • Distinguish between a class (the blueprint) and an instance (a specific object)
  • Set default values, use field(default_factory=list) for mutable defaults, and create frozen dataclasses
  • Test functions that accept and return dataclass instances using TDG

Chapter Lessons

| Lesson | Title | What You Do | Duration |
| --- | --- | --- | --- |
| 1 | Why Dicts Break: The Pain Chapter | Experience dict fragility firsthand, see KeyError and silent bugs | 20 min |
| 2 | Decorators: What the @ Symbol Means | Understand @, revisit @pytest.fixture, see how decorators modify behavior | 20 min |
| 3 | The class Keyword and @dataclass | Create types, instantiate objects, access fields, understand class vs instance and self | 25 min |
| 4 | Fields, Defaults, and Immutability | Use default values, field(default_factory=list), frozen dataclasses | 20 min |
| 5 | Testing Dataclass-Based Code | Test with Note objects, SmartNotes transformation TDG | 20 min |
| 6 | Chapter 51 Quiz | 50 scenario-based questions covering all dataclass concepts | 25 min |

PRIMM-AI+ in This Chapter

Every lesson includes a PRIMM-AI+ Practice section following the five-stage cycle from Chapter 42. This is Phase 3: you are now DEFINING structured types, building on the control flow (Chapter 50) and function signatures (Chapter 49) you already own.

| Stage | What You Do | What It Builds |
| --- | --- | --- |
| Predict [AI-FREE] | Predict what pyright flags or what a dataclass instantiation produces, with a confidence score (1-5) | Calibrates your intuition for structured types |
| Run | Execute the code or run pytest, compare to your prediction | Creates the feedback loop |
| Investigate | Write a trace artifact explaining why pyright caught (or missed) an error, or why a field has a certain value | Makes your type-system reasoning visible |
| Modify | Change a field type, add a default, or freeze the dataclass and predict the new behavior | Tests whether your understanding transfers |
| Make [Mastery Gate] | Define a @dataclass from scratch with typed fields, defaults, and tests for every field | Proves you can model data independently |

Syntax Card: Chapter 51

Reference this card while working through the lessons. Every construct shown here appears in at least one lesson.

# -- Import ------------------------------------------------
from dataclasses import dataclass, field

# -- Basic @dataclass Definition ---------------------------
@dataclass
class Note:
    title: str
    body: str
    tags: list[str]

# -- Fields with Defaults ---------------------------------
@dataclass
class Config:
    verbose: bool = False
    max_retries: int = 3

# -- Mutable Default with field(default_factory) -----------
@dataclass
class Notebook:
    name: str
    notes: list[Note] = field(default_factory=list)

# -- Frozen (Immutable) Dataclass --------------------------
@dataclass(frozen=True)
class Label:
    name: str
    color: str

# -- Instantiation -----------------------------------------
note: Note = Note(title="Hello", body="World", tags=["greeting"])

# -- Attribute Access --------------------------------------
print(note.title) # "Hello"
print(note.tags[0]) # "greeting"

# note.titel # pyright error: no attribute "titel"

# -- Dunder Methods (auto-generated) ----------------------
# @dataclass auto-generates these for you:
# __init__() - constructor from field definitions
# __repr__() - readable string representation
# __eq__() - equality by field values
#
# With frozen=True, also generates:
# __hash__() - allows use as dict key or in sets
#
# With order=True, also generates:
# __lt__(), __le__(), __gt__(), __ge__()

# -- Class vs Instance -------------------------------------
# Note -> the class (blueprint)
# note -> an instance (one specific Note)
# Note.title -> pyright error (title belongs to instances)
# note.title -> "Hello" (attribute of this instance)

# -- The self Parameter ------------------------------------
# Inside a method, self refers to the current instance.
# @dataclass writes __init__(self, title, body, tags) for you.
# You rarely write self by hand with dataclasses,
# but you will see it in AI-generated code.
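
The card's claims about generated methods, default_factory, and frozen=True can be checked with a short sketch. It reuses the Note, Notebook, and Label definitions from the card; the variable names and sample values are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass
class Note:
    title: str
    body: str
    tags: list[str]


@dataclass
class Notebook:
    name: str
    notes: list[Note] = field(default_factory=list)


@dataclass(frozen=True)
class Label:
    name: str
    color: str


first = Note(title="Hello", body="World", tags=["greeting"])
second = Note(title="Hello", body="World", tags=["greeting"])

print(first)            # Note(title='Hello', body='World', tags=['greeting'])
print(first == second)  # True: __eq__ compares field values
print(first is second)  # False: two separate instances

# default_factory builds a fresh list per instance, so notebooks do not share notes.
work = Notebook(name="work")
home = Notebook(name="home")
work.notes.append(first)
print(len(work.notes), len(home.notes))  # 1 0

# Frozen instances are hashable and reject assignment.
urgent = Label(name="urgent", color="red")
colors = {urgent: "#ff0000"}
try:
    urgent.color = "blue"  # type: ignore  # a type checker also flags this line
except FrozenInstanceError:
    print("Label is frozen")
```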

Prerequisites

Before starting this chapter, you should be able to:

  • Write if/elif/else, for, and while loops (Chapter 50)
  • Test branches and boundary conditions (Chapter 50 Lesson 6)
  • Write function stubs and tests for TDG cycles (Chapters 46, 49, 50)
  • Use SmartNotes functions that take dict[str, str] parameters (Phase 2)

The SmartNotes Connection

At the end of this chapter, you will tackle a TDG challenge that transforms your SmartNotes codebase from dictionaries to dataclasses. The core change:

Before (dict-based):

def categorize_note(note: dict[str, str]) -> str:
    word_count: int = len(note["body"].split())
    ...

After (dataclass-based):

def categorize_note(note: Note) -> str:
    word_count: int = len(note.body.split())
    ...

The function logic stays the same. What changes is the contract. With dict[str, str], pyright cannot verify that note["body"] exists. With Note, pyright verifies that note.body exists, is a str, and catches any typo the instant you type it. You will convert four SmartNotes functions (create_note, note_word_count, format_note_header, merge_notes) to use the Note dataclass, write tests that construct Note instances as test fixtures, and experience the difference between "hope the key exists" and "pyright guarantees the field exists."
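
One possible shape for the converted function and its tests is sketched below. The thresholds SHORT_LIMIT and MEDIUM_LIMIT and the test names are assumptions made for illustration; the actual Chapter 50 values and your own test layout may differ.

```python
from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    tags: list[str]


# Hypothetical thresholds; the Chapter 50 values may differ.
SHORT_LIMIT = 50
MEDIUM_LIMIT = 200


def categorize_note(note: Note) -> str:
    # Attribute access replaces the dictionary lookup note["body"].
    word_count: int = len(note.body.split())
    if word_count < SHORT_LIMIT:
        return "short"
    if word_count < MEDIUM_LIMIT:
        return "medium"
    return "long"


def test_categorize_note_short() -> None:
    note = Note(title="Groceries", body="milk eggs bread", tags=["home"])
    assert categorize_note(note) == "short"


def test_categorize_note_long() -> None:
    note = Note(title="Essay", body="word " * MEDIUM_LIMIT, tags=[])
    assert categorize_note(note) == "long"
```

The tests build Note instances directly, so a misspelled field in a test is flagged by the type checker just as it would be in production code.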