
Manual Validation Pain

James loads a JSON file of SmartNotes data. Each note is a dictionary with six fields: title, body, word_count, author, tags, and is_draft. The JSON loaded fine (Lesson 4 handled that). But the data inside is a mess. One note has "word_count": "many". Another has "title": "". A third has "tags": "python" instead of ["python"].

"The JSON parser does not know what a valid note looks like," Emma says. "It parses the JSON format correctly, but it cannot tell you that word_count should be an integer or that title should not be empty. That is your job."

James sighs. He knows what is coming: an isinstance check for every field, a bounds check for every number, a length check for every string. Six fields, each with its own validation rules, each needing its own test. It is going to be verbose.

"Yes," Emma says. "It will be painful. That is the point."

If you're new to programming

Validation means checking that data meets your expectations before you use it. When you load data from a file or receive it from a user, you cannot trust that it has the right types, the right lengths, or the right values. A validation function inspects every field and rejects anything that does not meet the rules.
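A minimal illustration of what "inspect and reject" means in practice. The check_age helper below is a made-up example for this box, not part of SmartNotes:

```python
def check_age(value: object) -> int:
    """Reject anything that is not a non-negative integer."""
    if not isinstance(value, int):
        raise TypeError(f"age must be an integer, got {type(value).__name__}")
    if value < 0:
        raise ValueError(f"age cannot be negative: {value}")
    return value


print(check_age(30))  # 30
```

Calling `check_age("thirty")` raises TypeError, and `check_age(-1)` raises ValueError. That is the entire idea of validation: the function either returns data it has verified or refuses to return at all.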

If you know validation from another language

Python's isinstance() serves the same role as Java's instanceof or C#'s is keyword. Manual validation in Python is particularly verbose because there is no built-in schema validation. Libraries like Pydantic (Chapter 55) fill this gap, similar to how Java has Bean Validation or C# has Data Annotations.


The Note Dataclass

You built this dataclass in Chapter 51. Here it is for reference:

from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str]
    is_draft: bool

Six fields, six types. When you create a Note in code, Python trusts that you pass the right types. But when data comes from JSON, everything arrives as raw Python objects (strings, ints, lists, bools). Nothing guarantees the types match.
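The dataclass itself will not complain, either. A quick demonstration, repeating the Note dataclass so the snippet runs on its own:

```python
from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str]
    is_draft: bool


# Python happily builds this Note even though word_count is a string
# and tags is a plain string, not a list. Annotations are hints for
# readers and type checkers; they are not enforced at runtime.
bad = Note(
    title="Oops",
    body="...",
    word_count="many",
    author="James",
    tags="python",
    is_draft=False,
)
print(bad.word_count)  # prints: many
```

No exception, no warning. If you want type guarantees at runtime, you have to write them yourself.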


Writing validate_note_data

Here is the full validation function. Read it carefully. Notice how many lines it takes to validate just six fields:

def validate_note_data(data: dict[str, object]) -> Note:
    """Validate raw dictionary data and return a Note.

    Checks every field for correct type, non-empty strings,
    valid bounds, and correct list element types.

    Raises:
        TypeError: If a field has the wrong type.
        ValueError: If a field has an invalid value.
        KeyError: If a required field is missing.
    """
    # --- title: str, non-empty, max 200 chars ---
    title: object = data["title"]
    if not isinstance(title, str):
        raise TypeError(f"title must be a string, got {type(title).__name__}")
    if len(title) == 0:
        raise ValueError("title cannot be empty")
    if len(title) > 200:
        raise ValueError(f"title too long: {len(title)} chars (max 200)")

    # --- body: str, non-empty ---
    body: object = data["body"]
    if not isinstance(body, str):
        raise TypeError(f"body must be a string, got {type(body).__name__}")
    if len(body) == 0:
        raise ValueError("body cannot be empty")

    # --- word_count: int, non-negative ---
    word_count: object = data["word_count"]
    if not isinstance(word_count, int):
        raise TypeError(
            f"word_count must be an integer, got {type(word_count).__name__}"
        )
    if word_count < 0:
        raise ValueError(f"word_count cannot be negative: {word_count}")

    # --- author: str ---
    author: object = data["author"]
    if not isinstance(author, str):
        raise TypeError(f"author must be a string, got {type(author).__name__}")

    # --- tags: list ---
    tags: object = data["tags"]
    if not isinstance(tags, list):
        raise TypeError(f"tags must be a list, got {type(tags).__name__}")

    # --- is_draft: bool ---
    is_draft: object = data["is_draft"]
    if not isinstance(is_draft, bool):
        raise TypeError(f"is_draft must be a bool, got {type(is_draft).__name__}")

    return Note(
        title=title,
        body=body,
        word_count=word_count,
        author=author,
        tags=tags,
        is_draft=is_draft,
    )

Count the lines. Over 30 lines of validation for six fields. Every field follows the same pattern: extract, check type, check value, raise if wrong. The repetition is obvious. The fragility is real: add a seventh field and you need another block of checks. Rename a field and you need to update the check, the error message, and the constructor call.
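A common first instinct is to factor the repetition into a helper. It helps, but only somewhat. Here is a sketch using a hypothetical require helper (not part of the SmartNotes code) that extracts a field and checks its type in one step:

```python
def require(data: dict[str, object], field: str, expected: type) -> object:
    """Hypothetical helper: extract a field and type-check it in one call."""
    value = data[field]
    if not isinstance(value, expected):
        raise TypeError(
            f"{field} must be {expected.__name__}, got {type(value).__name__}"
        )
    return value


# Each field still needs its own call, plus any value checks on top:
note: dict[str, object] = {"title": "Hi", "word_count": 3}
title = require(note, "title", str)
word_count = require(note, "word_count", int)
print(title, word_count)  # Hi 3
```

The helper shortens the type checks but does nothing about the value checks (empty strings, negative counts, length limits), and every field still needs an explicit line. The boilerplate shrinks; it does not disappear.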


Testing Every Validation Path

Each check needs a test. Here is the full test suite:

import pytest


def make_valid_data() -> dict[str, object]:
    """Return a valid note dictionary for testing."""
    return {
        "title": "Test Note",
        "body": "This is a test note body.",
        "word_count": 42,
        "author": "James",
        "tags": ["python", "testing"],
        "is_draft": False,
    }


def test_valid_note() -> None:
    data: dict[str, object] = make_valid_data()
    note: Note = validate_note_data(data)
    assert note.title == "Test Note"
    assert note.word_count == 42


def test_title_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = 123
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_title_empty() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = ""
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_title_too_long() -> None:
    data: dict[str, object] = make_valid_data()
    data["title"] = "A" * 201
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_body_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["body"] = 99
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_body_empty() -> None:
    data: dict[str, object] = make_valid_data()
    data["body"] = ""
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_word_count_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["word_count"] = "many"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_word_count_negative() -> None:
    data: dict[str, object] = make_valid_data()
    data["word_count"] = -5
    with pytest.raises(ValueError):
        validate_note_data(data)


def test_author_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["author"] = 42
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_tags_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["tags"] = "python"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_is_draft_wrong_type() -> None:
    data: dict[str, object] = make_valid_data()
    data["is_draft"] = "yes"
    with pytest.raises(TypeError):
        validate_note_data(data)


def test_missing_field() -> None:
    data: dict[str, object] = make_valid_data()
    del data["title"]
    with pytest.raises(KeyError):
        validate_note_data(data)

Twelve tests for one function. The make_valid_data() helper creates a baseline valid dictionary, and each test modifies one field to trigger one specific validation failure. This is the pattern from Lesson 2 (every raise gets a test) applied at scale.
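The six wrong-type tests are so similar that pytest.mark.parametrize can collapse them into one parametrized test. Here is a sketch of the idea using a simplified two-field validator standing in for validate_note_data, so the snippet runs on its own:

```python
import pytest


def make_valid_data() -> dict[str, object]:
    """Simplified baseline data for the sketch."""
    return {"title": "Test Note", "word_count": 42}


def validate(data: dict[str, object]) -> dict[str, object]:
    """Simplified two-field validator, standing in for validate_note_data."""
    if not isinstance(data["title"], str):
        raise TypeError("title must be a string")
    if not isinstance(data["word_count"], int):
        raise TypeError("word_count must be an integer")
    return data


# One parametrized test replaces a separate test function per field.
@pytest.mark.parametrize(
    ("field", "bad_value"),
    [("title", 123), ("word_count", "many")],
)
def test_wrong_type(field: str, bad_value: object) -> None:
    data = make_valid_data()
    data[field] = bad_value
    with pytest.raises(TypeError):
        validate(data)
```

Parametrization trims the test file, but notice what it does not trim: the validation function itself is untouched, and the value-check tests (empty, negative, too long) still need their own cases.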


Feeling the Pain

Look at what you just wrote:

  • 30+ lines of validation for 6 fields
  • 12 tests to cover every path
  • Identical patterns repeated for each field: extract, check isinstance, check value, raise

Now imagine a real application with 20 fields. Or 50. Every field needs the same pattern. Every new field means more lines of isinstance, more error messages, more tests. If you rename a field in the dataclass, you need to update the validation function, the error messages, and the tests.

This is not a sign that you are doing something wrong. This is how validation works without a framework. You are experiencing the exact problem that Pydantic was designed to solve.

A preview of Chapter 55

In Chapter 55, the entire validate_note_data function collapses into a Pydantic model:

from pydantic import BaseModel, Field


class Note(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str]
    is_draft: bool

Six field declarations replace 30+ lines of isinstance checks. Pydantic validates types, enforces constraints, and produces clear error messages automatically. But that solution only makes sense after you have felt the problem. You have now felt it.


PRIMM-AI+ Practice: Validation Functions

Predict [AI-FREE]

Look at this validation call without running it. Predict whether it returns a Note or raises an exception. If it raises, predict the exception type. Write your prediction and a confidence score from 1 to 5 before checking.

data: dict[str, object] = {
    "title": "My Note",
    "body": "Some content here.",
    "word_count": 0,
    "author": "Emma",
    "tags": ["python"],
    "is_draft": True,
}

result = validate_note_data(data)
Check your prediction

Returns a valid Note. All checks pass:

  • title is a non-empty string under 200 characters
  • body is a non-empty string
  • word_count is an integer and 0 is not negative (the check is < 0, not <= 0)
  • author is a string
  • tags is a list
  • is_draft is a bool

The tricky part: word_count: 0 is valid because the check rejects negative values, not zero. If you predicted this would fail, revisit the if word_count < 0 check.
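A related subtlety worth knowing: in Python, bool is a subclass of int, so isinstance(True, int) is True. That means the word_count check would quietly accept a boolean:

```python
# bool is a subclass of int, so isinstance accepts it.
print(isinstance(True, int))  # True

word_count: object = True
if not isinstance(word_count, int):
    raise TypeError("word_count must be an integer")
# No error raised: True slips through and behaves as the integer 1.
print(word_count + 1)  # 2
```

Manual validation forces you to know quirks like this; the strict checks in the is_draft block (isinstance with bool) work only because the subclass relationship runs in that one direction.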

Run

Create a file with the Note dataclass, the validate_note_data function, and the test suite. Run uv run pytest to verify all 12 tests pass.

Investigate

Add a print statement at the top of each field check block (e.g., print("Checking title...")). Call validate_note_data with data["word_count"] = "many". Observe which fields are checked before the function raises. The function stops at the first error. Is that a good design? Why might you want to collect all errors instead?
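One alternative design gathers every problem before reporting, instead of raising at the first one. Here is a sketch as a hypothetical collect_errors helper covering two of the six fields:

```python
def collect_errors(data: dict[str, object]) -> list[str]:
    """Check fields and return all problems instead of stopping at the first."""
    errors: list[str] = []

    title = data.get("title")
    if not isinstance(title, str):
        errors.append(f"title must be a string, got {type(title).__name__}")
    elif len(title) == 0:
        errors.append("title cannot be empty")

    word_count = data.get("word_count")
    if not isinstance(word_count, int):
        errors.append(
            f"word_count must be an integer, got {type(word_count).__name__}"
        )
    elif word_count < 0:
        errors.append(f"word_count cannot be negative: {word_count}")

    return errors


problems = collect_errors({"title": "", "word_count": "many"})
print(problems)
# ['title cannot be empty', 'word_count must be an integer, got str']
```

Collecting all errors is friendlier when a human has to fix the data, at the cost of more code per field. Pydantic, which you will meet in Chapter 55, reports all failures at once by default.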

Modify

Add validation for the tags list contents: every element must be a non-empty string. Write the isinstance and length checks for each tag element. Then write two new tests: one for a tag that is not a string, and one for an empty string tag.

Hint

Use a for loop inside the tags validation block:

for i, tag in enumerate(tags):
    if not isinstance(tag, str):
        raise TypeError(f"tags[{i}] must be a string, got {type(tag).__name__}")
    if len(tag) == 0:
        raise ValueError(f"tags[{i}] cannot be empty")

This adds another five lines of validation and two more tests. The pain grows.

Make [Mastery Gate]

Without looking at any examples, write a function called validate_config(data: dict[str, object]) -> dict[str, object] that validates three fields:

  • "name": must be a non-empty string
  • "version": must be an integer, at least 1
  • "debug": must be a bool

Raise TypeError for wrong types, ValueError for invalid values, and KeyError for missing fields. Return the validated dict if everything passes.

Write six tests:

  1. Valid data passes
  2. name wrong type raises TypeError
  3. name empty raises ValueError
  4. version wrong type raises TypeError
  5. version zero raises ValueError
  6. debug wrong type raises TypeError

Run uv run pytest to verify all tests pass.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Generate Validation Code

Write a Python function called validate_product that
takes a dict with keys "name" (str, non-empty),
"price" (float, positive), and "quantity" (int,
non-negative). Use isinstance checks for type
validation. Raise TypeError for wrong types and
ValueError for invalid values. Return a dict if valid.
Use type annotations on all variables.

Review the AI's output. Count the lines of validation code. Does it follow the same pattern as validate_note_data? Is it similarly repetitive? This is the same pain, applied to a different domain.

What you're learning: You are seeing that the validation pain is not specific to SmartNotes. It is a universal problem.

Prompt 2: Ask About Alternatives

The validate_product function I just wrote has 20+ lines
of isinstance checks. Is there a better way to do this
in Python? What libraries exist for data validation?
Do not show me the implementation yet, just tell me
what options exist.

Read the AI's response. It will likely mention Pydantic, attrs, marshmallow, or similar libraries. You will learn Pydantic in Chapter 55. For now, just knowing these libraries exist confirms that the pain you felt is a known problem with known solutions.

What you're learning: You are discovering that the manual validation pattern you wrote is a well-known problem in the Python ecosystem, with purpose-built solutions.

Prompt 3: Compare Line Counts

Show me the validate_product function using manual
isinstance checks side by side with the same
validation using Pydantic. Count the lines for each
approach. Do not explain Pydantic in detail; just
show the comparison.

Compare the two versions. The Pydantic version should be dramatically shorter. You do not need to understand Pydantic syntax yet; the line count comparison alone makes the case for why you will learn it next.

What you're learning: You are previewing the payoff of Chapter 55 without needing to understand the details yet.


Key Takeaways

  1. Manual validation is verbose and repetitive. Six fields required 30+ lines of isinstance checks, type comparisons, and bounds validation. Every field follows the same extract-check-raise pattern.

  2. Every validation path needs a test. Twelve tests for one function is not excessive; it is thorough. Each test modifies one field to trigger one specific failure. The make_valid_data() helper keeps tests readable.

  3. The pattern scales poorly. More fields mean more validation lines and more tests. Renaming a field requires changes in multiple places. This fragility is a real problem in production code.

  4. Use dict[str, object] for unvalidated data. Raw data from JSON has unknown types. Using object as the value type is honest: you do not know what each value is until you check with isinstance.

  5. This pain is intentional. You need to feel the problem before the solution makes sense. Chapter 55 introduces Pydantic, which replaces 30+ lines of manual checks with 6 field declarations.


Looking Ahead

You have built a complete error-handling toolkit: catching exceptions, raising exceptions, navigating the hierarchy, handling files safely, and validating data manually. The validation pain you felt in this lesson is the setup for Chapter 55, where Pydantic models collapse all of that boilerplate into concise, declarative field definitions. Thirty lines become six.