
Manual Validation Pain

James loads a JSON file of SmartNotes data. Each note is a dictionary with six fields: title, body, word_count, author, tags, and is_draft. The JSON loaded fine (Lesson 4 handled that). But the data inside is a mess. One note has "word_count": "many". Another has "title": "". A third has "tags": "python" instead of ["python"].

"The JSON parser does not know what a valid note looks like," Emma says. "It parses the JSON format correctly, but it cannot tell you that word_count should be an integer or that title should not be empty. That is your job."

James sighs. He knows what is coming: an isinstance check for every field, a bounds check for every number, a length check for every string. Six fields, each with its own validation rules, each needing its own test. It is going to be verbose.

"Yes," Emma says. "It will be painful. That is the point."

If you're new to programming

Validation means checking that data meets your expectations before you use it. When you load data from a file or receive it from a user, you cannot trust that it has the right types, the right lengths, or the right values. A validation function inspects every field and rejects anything that does not meet the rules.
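A minimal illustration of what "inspect and reject" means in practice. The check_age helper below is a made-up example for this box, not part of SmartNotes:

```python
def check_age(value: object) -> int:
    """Reject anything that is not a non-negative integer."""
    if not isinstance(value, int):
        raise TypeError(f"age must be an integer, got {type(value).__name__}")
    if value < 0:
        raise ValueError(f"age cannot be negative: {value}")
    return value


print(check_age(30))  # 30
```

Calling `check_age("thirty")` raises TypeError, and `check_age(-1)` raises ValueError. That is the entire idea of validation: the function either returns data it has verified or refuses to return at all.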

If you know validation from another language

Python's isinstance() serves the same role as Java's instanceof or C#'s is keyword. Manual validation in Python is particularly verbose because there is no built-in schema validation. Libraries like Pydantic (Chapter 55) fill this gap, similar to how Java has Bean Validation or C# has Data Annotations.


The Note Dataclass

You built this dataclass in Chapter 51. Here it is for reference:

from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str]
    is_draft: bool

Six fields, six types. When you create a Note in code, Python trusts that you pass the right types. But when data comes from JSON, everything arrives as raw Python objects (strings, ints, lists, bools). Nothing guarantees the types match.
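The dataclass itself will not complain, either. A quick demonstration, repeating the Note dataclass so the snippet runs on its own:

```python
from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str]
    is_draft: bool


# Python happily builds this Note even though word_count is a string
# and tags is a plain string, not a list. Annotations are hints for
# readers and type checkers; they are not enforced at runtime.
bad = Note(
    title="Oops",
    body="...",
    word_count="many",
    author="James",
    tags="python",
    is_draft=False,
)
print(bad.word_count)  # prints: many
```

No exception, no warning. If you want type guarantees at runtime, you have to write them yourself.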


Writing validate_note_data

Here is the full validation function. Read it carefully. Notice how many lines it takes to validate just six fields:

def validate_note_data(data: dict[str, object]) -> Note:
    """Validate raw dictionary data and return a Note.

    Checks every field for correct type, non-empty strings,
    valid bounds, and correct list element types.

    Raises:
        TypeError: If a field has the wrong type.
        ValueError: If a field has an invalid value.
        KeyError: If a required field is missing.
    """
    # --- title: str, non-empty, max 200 chars ---
    title: object = data["title"]
    if not isinstance(title, str):
        raise TypeError(f"title must be a string, got {type(title).__name__}")
    if len(title) == 0:
        raise ValueError("title cannot be empty")
    if len(title) > 200:
        raise ValueError(f"title too long: {len(title)} chars (max 200)")

    # --- body: str, non-empty ---
    body: object = data["body"]
    if not isinstance(body, str):
        raise TypeError(f"body must be a string, got {type(body).__name__}")
    if len(body) == 0:
        raise ValueError("body cannot be empty")

    # --- word_count: int, non-negative ---
    word_count: object = data["word_count"]
    if not isinstance(word_count, int):
        raise TypeError(
            f"word_count must be an integer, got {type(word_count).__name__}"
        )
    if word_count < 0:
        raise ValueError(f"word_count cannot be negative: {word_count}")

    # --- author: str ---
    author: object = data["author"]
    if not isinstance(author, str):
        raise TypeError(f"author must be a string, got {type(author).__name__}")

    # --- tags: list ---
    tags: object = data["tags"]
    if not isinstance(tags, list):
        raise TypeError(f"tags must be a list, got {type(tags).__name__}")

    # --- is_draft: bool ---
    is_draft: object = data["is_draft"]
    if not isinstance(is_draft, bool):
        raise TypeError(f"is_draft must be a bool, got {type(is_draft).__name__}")

    return Note(
        title=title,
        body=body,
        word_count=word_count,
        author=author,
        tags=tags,
        is_draft=is_draft,
    )

Count the lines. Over 30 lines of validation for six fields. Every field follows the same pattern: extract, check type, check value, raise if wrong. The repetition is obvious. The fragility is real: add a seventh field and you need another block of checks. Rename a field and you need to update the check, the error message, and the constructor call.
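A common first instinct is to factor the repetition into a helper. It helps, but only somewhat. Here is a sketch using a hypothetical require helper (not part of the SmartNotes code) that extracts a field and checks its type in one step:

```python
def require(data: dict[str, object], field: str, expected: type) -> object:
    """Hypothetical helper: extract a field and type-check it in one call."""
    value = data[field]
    if not isinstance(value, expected):
        raise TypeError(
            f"{field} must be {expected.__name__}, got {type(value).__name__}"
        )
    return value


# Each field still needs its own call, plus any value checks on top:
note: dict[str, object] = {"title": "Hi", "word_count": 3}
title = require(note, "title", str)
word_count = require(note, "word_count", int)
print(title, word_count)  # Hi 3
```

The helper shortens the type checks but does nothing about the value checks (empty strings, negative counts, length limits), and every field still needs an explicit line. The boilerplate shrinks; it does not disappear.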


Testing Every Validation Path

Each check needs a test. Here is the full test suite:

import pytest


def make_valid_data() -> dict[str, object]:
    """Return a valid note dictionary for testing."""
    return {
        "title": "Test Note",
        "body": "This is a test note body.",
        "word_count": 42,
        "author": "James",
        "tags": ["python", "testing"],
        "is_draft": False,
    }


def test_valid_note() -> None:
    data: dict[str, object] = make_valid_data()
    note: Note = validate_note_data(data)
    assert note.title == "Test Note"
    assert note.word_count == 42


def test_title_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = 123
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_title_empty() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = ""
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_title_too_long() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = "A" * 201
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_body_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["body"] = 99
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_body_empty() -> None:
    data: dict[str, object] = make_valid_data()
    data["body"] = ""
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_word_count_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["word_count"] = "many"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_word_count_negative() -> None:
    data: dict[str, object] = make_valid_data()
    data["word_count"] = -5
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_author_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["author"] = 42
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_tags_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["tags"] = "python"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_is_draft_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["is_draft"] = "yes"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_missing_field() -> None:
    data: dict[str, object] = make_valid_data()
    del data["title"]
    with pytest.raises(KeyError):
        validate_note_data(data)

Twelve tests for one function. The make_valid_data() helper creates a baseline valid dictionary, and each test modifies one field to trigger one specific validation failure. This is the pattern from Lesson 2 (every raise gets a test) applied at scale.
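The six wrong-type tests are so similar that pytest.mark.parametrize can collapse them into one parametrized test. Here is a sketch of the idea using a simplified two-field validator standing in for validate_note_data, so the snippet runs on its own:

```python
import pytest


def make_valid_data() -> dict[str, object]:
    """Simplified baseline data for the sketch."""
    return {"title": "Test Note", "word_count": 42}


def validate(data: dict[str, object]) -> dict[str, object]:
    """Simplified two-field validator, standing in for validate_note_data."""
    if not isinstance(data["title"], str):
        raise TypeError("title must be a string")
    if not isinstance(data["word_count"], int):
        raise TypeError("word_count must be an integer")
    return data


# One parametrized test replaces a separate test function per field.
@pytest.mark.parametrize(
    ("field", "bad_value"),
    [("title", 123), ("word_count", "many")],
)
def test_wrong_type(field: str, bad_value: object) -> None:
    data = make_valid_data()
    data[field] = bad_value
    with pytest.raises(TypeError):
        validate(data)
```

Parametrization trims the test file, but notice what it does not trim: the validation function itself is untouched, and the value-check tests (empty, negative, too long) still need their own cases.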


Feeling the Pain

Look at what you just wrote:

  • 30+ lines of validation for 6 fields
  • 12 tests to cover every path
  • Identical patterns repeated for each field: extract, check isinstance, check value, raise

Now imagine a real application with 20 fields. Or 50. Every field needs the same pattern. Every new field means more lines of isinstance, more error messages, more tests. If you rename a field in the dataclass, you need to update the validation function, the error messages, and the tests.

This is not a sign that you are doing something wrong. This is how validation works without a framework. You are experiencing the exact problem that Pydantic was designed to solve.

A preview of Chapter 55

In Chapter 55, the entire validate_note_data function collapses into a Pydantic model:

from pydantic import BaseModel, Field


class Note(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str]
    is_draft: bool

Six field declarations replace 30+ lines of isinstance checks. Pydantic validates types, enforces constraints, and produces clear error messages automatically. But that solution only makes sense after you have felt the problem. You have now felt it.


PRIMM-AI+ Practice: Validation Functions

Predict [AI-FREE]

Look at this validation call without running it. Predict whether it returns a Note or raises an exception. If it raises, predict the exception type. Write your prediction and a confidence score from 1 to 5 before checking.

data: dict[str, object] = {
    "title": "My Note",
    "body": "Some content here.",
    "word_count": 0,
    "author": "Emma",
    "tags": ["python"],
    "is_draft": True,
}

result = validate_note_data(data)
Check your prediction

Returns a valid Note. All checks pass:

  • title is a non-empty string under 200 characters
  • body is a non-empty string
  • word_count is an integer and 0 is not negative (the check is < 0, not <= 0)
  • author is a string
  • tags is a list
  • is_draft is a bool

The tricky part: word_count: 0 is valid because the check rejects negative values, not zero. If you predicted this would fail, revisit the if word_count < 0 check.
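A related subtlety worth knowing: in Python, bool is a subclass of int, so isinstance(True, int) is True. That means the word_count check would quietly accept a boolean:

```python
# bool is a subclass of int, so isinstance accepts it.
print(isinstance(True, int))  # True

word_count: object = True
if not isinstance(word_count, int):
    raise TypeError("word_count must be an integer")
# No error raised: True slips through and behaves as the integer 1.
print(word_count + 1)  # 2
```

Manual validation forces you to know quirks like this; the strict checks in the is_draft block (isinstance with bool) work only because the subclass relationship runs in that one direction.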

Run

Create a file with the Note dataclass, the validate_note_data function, and the test suite. Run uv run pytest to verify all 12 tests pass.

Investigate

Add a print statement at the top of each field check block (e.g., print("Checking title...")). Call validate_note_data with data["word_count"] = "many". Observe which fields are checked before the function raises. The function stops at the first error. Is that a good design? Why might you want to collect all errors instead?
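One alternative design gathers every problem before reporting, instead of raising at the first one. Here is a sketch as a hypothetical collect_errors helper covering two of the six fields:

```python
def collect_errors(data: dict[str, object]) -> list[str]:
    """Check fields and return all problems instead of stopping at the first."""
    errors: list[str] = []

    title = data.get("title")
    if not isinstance(title, str):
        errors.append(f"title must be a string, got {type(title).__name__}")
    elif len(title) == 0:
        errors.append("title cannot be empty")

    word_count = data.get("word_count")
    if not isinstance(word_count, int):
        errors.append(
            f"word_count must be an integer, got {type(word_count).__name__}"
        )
    elif word_count < 0:
        errors.append(f"word_count cannot be negative: {word_count}")

    return errors


problems = collect_errors({"title": "", "word_count": "many"})
print(problems)
# ['title cannot be empty', 'word_count must be an integer, got str']
```

Collecting all errors is friendlier when a human has to fix the data, at the cost of more code per field. Pydantic, which you will meet in Chapter 55, reports all failures at once by default.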

Modify

Add validation for the tags list contents: every element must be a non-empty string. Write the isinstance and length checks for each tag element. Then write two new tests: one for a tag that is not a string, and one for an empty string tag.

Hint

Use a for loop inside the tags validation block:

for i, tag in enumerate(tags):
    if not isinstance(tag, str):
        raise TypeError(f"tags[{i}] must be a string, got {type(tag).__name__}")
    if len(tag) == 0:
        raise ValueError(f"tags[{i}] cannot be empty")

This adds another five lines of validation and two more tests. The pain grows.

Make [Mastery Gate]

Without looking at any examples, write a function called validate_config(data: dict[str, object]) -> dict[str, object] that validates three fields:

  • "name": must be a non-empty string
  • "version": must be an integer, at least 1
  • "debug": must be a bool

Raise TypeError for wrong types, ValueError for invalid values, and KeyError for missing fields. Return the validated dict if everything passes.

Write six tests:

  1. Valid data passes
  2. name wrong type raises TypeError
  3. name empty raises ValueError
  4. version wrong type raises TypeError
  5. version zero raises ValueError
  6. debug wrong type raises TypeError

Run uv run pytest to verify all tests pass.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Generate Validation Code

Write a Python function called validate_product that
takes a dict with keys "name" (str, non-empty),
"price" (float, positive), and "quantity" (int,
non-negative). Use isinstance checks for type
validation. Raise TypeError for wrong types and
ValueError for invalid values. Return a dict if valid.
Use type annotations on all variables.

Review the AI's output. Count the lines of validation code. Does it follow the same pattern as validate_note_data? Is it similarly repetitive? This is the same pain, applied to a different domain.

What you're learning: You are seeing that the validation pain is not specific to SmartNotes. It is a universal problem.

Prompt 2: Ask About Alternatives

The validate_product function I just wrote has 20+ lines
of isinstance checks. Is there a better way to do this
in Python? What libraries exist for data validation?
Do not show me the implementation yet, just tell me
what options exist.

Read the AI's response. It will likely mention Pydantic, attrs, marshmallow, or similar libraries. You will learn Pydantic in Chapter 55. For now, just knowing these libraries exist confirms that the pain you felt is a known problem with known solutions.

What you're learning: You are discovering that the manual validation pattern you wrote is a well-known problem in the Python ecosystem, with purpose-built solutions.

Prompt 3: Compare Line Counts

Show me the validate_product function using manual
isinstance checks side by side with the same
validation using Pydantic. Count the lines for each
approach. Do not explain Pydantic in detail; just
show the comparison.

Compare the two versions. The Pydantic version should be dramatically shorter. You do not need to understand Pydantic syntax yet; the line count comparison alone makes the case for why you will learn it next.

What you're learning: You are previewing the payoff of Chapter 55 without needing to understand the details yet.


Key Takeaways

  1. Manual validation is verbose and repetitive. Six fields required 30+ lines of isinstance checks, type comparisons, and bounds validation. Every field follows the same extract-check-raise pattern.

  2. Every validation path needs a test. Twelve tests for one function is not excessive; it is thorough. Each test modifies one field to trigger one specific failure. The make_valid_data() helper keeps tests readable.

  3. The pattern scales poorly. More fields mean more validation lines and more tests. Renaming a field requires changes in multiple places. This fragility is a real problem in production code.

  4. Use dict[str, object] for unvalidated data. Raw data from JSON has unknown types. Using object as the value type is honest: you do not know what each value is until you check with isinstance.

  5. This pain is intentional. You need to feel the problem before the solution makes sense. Chapter 55 introduces Pydantic, which replaces 30+ lines of manual checks with 6 field declarations.


Looking Ahead

You have built a complete error-handling toolkit: catching exceptions, raising exceptions, navigating the hierarchy, handling files safely, and validating data manually. The validation pain you felt in this lesson is the setup for Chapter 55, where Pydantic models collapse all of that boilerplate into concise, declarative field definitions. Thirty lines become six.