
Serialization and the Boundary Pattern

James has a NoteCreate model that validates types and enforces constraints. But the data he needs to validate does not arrive as Python keyword arguments. It arrives as JSON: files on disk, API responses, configuration data. How does he get JSON data into a Pydantic model?

Emma opens the SmartNotes notes.json file. "Two steps. First, you need to get data OUT of a model (serialization). Then you need to get data IN from JSON (parsing). Pydantic has methods for both."

She pauses and pulls up the Note dataclass from Chapter 51. "And then there is the bigger question: once the data is validated, what do you do with it? You do not keep it as a BaseModel forever. You convert it to your internal dataclass. That is the boundary pattern."

If you're new to programming

Serialization means converting a Python object into a format that can be stored or sent (like a dictionary or a JSON string). Parsing (or deserialization) is the reverse: converting stored data back into a Python object. These operations happen whenever your program reads from or writes to files, databases, or APIs.
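A minimal sketch using only the standard library json module (the settings dictionary here is a made-up example, not part of SmartNotes):

```python
import json

# Serialization: Python object -> JSON string
settings = {"theme": "dark", "font_size": 14}
as_text: str = json.dumps(settings)  # '{"theme": "dark", "font_size": 14}'

# Parsing (deserialization): JSON string -> Python object
restored = json.loads(as_text)
```

Pydantic builds on exactly this idea, but adds validation on the parsing side.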

If you know serialization from another language

Pydantic's model_dump_json() is analogous to Jackson's writeValueAsString() in Java or JsonSerializer.Serialize() in C#, while model_dump() stops at a plain dictionary. The model_validate_json() method is the reverse: parsing JSON directly into a validated object, similar to objectMapper.readValue() or JsonSerializer.Deserialize().


model_dump(): Model to Dictionary

The model_dump() method converts a Pydantic model instance to a plain Python dictionary:

from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


note = NoteCreate(
    title="My Note",
    body="Learning serialization.",
    word_count=25,
    author="James",
)

data = note.model_dump()  # returns a plain dict
print(data)
print(type(data))

Output:

{'title': 'My Note', 'body': 'Learning serialization.', 'word_count': 25, 'author': 'James', 'tags': [], 'is_draft': False}
<class 'dict'>

The result is a regular dictionary. Every field becomes a key-value pair. Default values (tags=[], is_draft=False) are included.
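If you do not want every field, model_dump() accepts an exclude argument naming fields to drop. A short sketch (the model here repeats only the fields needed for the example):

```python
from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    tags: list[str] = []


note = NoteCreate(title="My Note", body="Learning serialization.")

# exclude drops the named fields from the output dict
partial = note.model_dump(exclude={"tags"})
```

There is a matching include argument for the opposite selection.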


model_dump_json(): Model to JSON String

The model_dump_json() method converts directly to a JSON string:

json_output: str = note.model_dump_json()
print(json_output)
print(type(json_output))

Output:

{"title":"My Note","body":"Learning serialization.","word_count":25,"author":"James","tags":[],"is_draft":false}
<class 'str'>

This is a JSON string you can write to a file or send over a network. Notice that Python's False becomes JSON's false, and the list [] stays as []. Pydantic handles the Python-to-JSON conversion automatically.

You could also use json.dumps(note.model_dump()) to achieve the same result, but model_dump_json() is faster because it bypasses the intermediate dictionary step.
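For human-readable output (a config file, a debug dump), model_dump_json() accepts an indent argument, just like json.dumps. A minimal sketch with a trimmed-down model:

```python
from pydantic import BaseModel


class NoteCreate(BaseModel):
    title: str
    is_draft: bool = False


note = NoteCreate(title="My Note")

# indent=2 produces multi-line, human-readable JSON
pretty: str = note.model_dump_json(indent=2)
print(pretty)
```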


model_validate_json(): JSON String to Model

This is where Pydantic shines for external data. Instead of manually loading JSON, extracting fields, and passing them to a constructor, you feed the JSON string directly to model_validate_json():

from pydantic import BaseModel, Field, ValidationError


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


json_string: str = '{"title": "From JSON", "body": "Parsed directly.", "word_count": 10, "author": "Emma"}'

note = NoteCreate.model_validate_json(json_string)
print(note.title)
print(note.word_count)
print(note.tags)

Output:

From JSON
10
[]

One method call: parse the JSON, validate every field, enforce constraints, return a model instance. If anything is wrong, Pydantic raises ValidationError:

bad_json: str = '{"title": "", "body": "x", "word_count": -1, "author": "James"}'

try:
    note = NoteCreate.model_validate_json(bad_json)
except ValidationError as e:
    print(e)

Output:

2 validation errors for NoteCreate
title
  String should have at least 1 character [type=string_too_short, input_value='', input_type=str]
word_count
  Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]

No json.loads(). No isinstance checks. No manual field extraction. The JSON goes straight from a string into a validated model.
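If the data is already a Python dictionary (for example, from a config parser or an earlier json.load call), the sibling method model_validate applies the same validation without reparsing a string. A short sketch with a trimmed-down model:

```python
from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1)
    word_count: int = Field(ge=0)


# model_validate takes a dict instead of a JSON string;
# the same constraints are enforced either way
raw = {"title": "From a dict", "word_count": 7}
note = NoteCreate.model_validate(raw)
```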


Loading from JSON Files

In Chapter 54, Lesson 4, you loaded JSON files with with open and json.load. Here is the same operation using Pydantic's model_validate_json:

from pathlib import Path


def load_note_from_file(path: str) -> NoteCreate:
    """Load and validate a note from a JSON file."""
    json_text: str = Path(path).read_text()
    return NoteCreate.model_validate_json(json_text)

The function reads the file as a string and passes it directly to model_validate_json(). If the file contains invalid data, Pydantic raises ValidationError. If the file does not exist, Python raises FileNotFoundError (which you handle with try/except from Chapter 54).

Compare this to the manual approach:

# Manual approach (Chapter 54 style)
import json


def load_note_from_file_manual(path: str) -> Note:
    with open(path) as f:
        data = json.load(f)  # returns parsed JSON
    return validate_note_data(data)  # 30+ line function

Both approaches load a file and validate its contents. The Pydantic version replaces json.load plus validate_note_data (30+ lines) with a single model_validate_json call.


The Boundary Pattern

You now have two models for notes: NoteCreate(BaseModel) for validation, and Note (the @dataclass from Chapter 51) for application logic. This is not an accident. It is a design pattern called the boundary pattern.

The idea: Pydantic models guard the edges of your program. Dataclasses live inside. External data enters through a Pydantic model, gets validated, and then converts to an internal dataclass for processing.

External Data (JSON, API, user input)
|
v
[NoteCreate (BaseModel)] <-- validates at the boundary
|
v
[Note (@dataclass)] <-- clean data inside the app
|
v
Application Logic (functions, tests)

Here is the conversion function:

from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str] = field(default_factory=list)
    is_draft: bool = False


def to_note(validated: NoteCreate) -> Note:
    """Convert a validated NoteCreate to an internal Note."""
    return Note(
        title=validated.title,
        body=validated.body,
        word_count=validated.word_count,
        author=validated.author,
        tags=list(validated.tags),
        is_draft=validated.is_draft,
    )

The to_note function takes a validated NoteCreate and creates a Note dataclass. At this point, the data is guaranteed to be valid. The dataclass does not need to check anything. It trusts the caller because the caller (the Pydantic model) already did the checking.
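The whole boundary, end to end, fits in one expression: raw JSON goes in, an internal dataclass comes out. A self-contained sketch with trimmed-down versions of both classes:

```python
from dataclasses import dataclass, field

from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1)
    body: str = Field(min_length=1)
    tags: list[str] = []


@dataclass
class Note:
    title: str
    body: str
    tags: list[str] = field(default_factory=list)


def to_note(validated: NoteCreate) -> Note:
    return Note(title=validated.title, body=validated.body, tags=list(validated.tags))


# Boundary in action: raw JSON -> validated model -> internal dataclass
raw = '{"title": "Round trip", "body": "Edge to core.", "tags": ["demo"]}'
note = to_note(NoteCreate.model_validate_json(raw))
```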


Why Two Models?

Emma anticipates the question. "I'm still not sure when to use dataclasses vs Pydantic for internal data," she admits. "But the boundary pattern makes the split clear."

Here is the reasoning:

| Concern | NoteCreate (BaseModel) | Note (@dataclass) |
| --- | --- | --- |
| Purpose | Validate external input | Represent validated data |
| Where it lives | At the boundary (file loading, API input) | Inside the application |
| Overhead | Validation on every instantiation | No validation overhead |
| Mutability | Mutable by default (frozen via model_config) | Mutable (or frozen) by choice |
| Dependencies | Requires pydantic package | Standard library only |

Internal functions work with Note because it is lightweight and has no validation overhead. The validation happens once, at the boundary, and then the clean data flows through the rest of the program.
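What "no validation overhead" looks like in practice: an internal function takes a Note and simply uses it. A sketch (summary is a hypothetical helper, and the Note here is trimmed down for the example):

```python
from dataclasses import dataclass


@dataclass
class Note:
    title: str
    body: str
    word_count: int


def summary(note: Note) -> str:
    """Internal logic: no re-validation, the boundary already checked."""
    return f"{note.title} ({note.word_count} words)"


line = summary(Note(title="Trusted", body="Already validated.", word_count=2))
```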


PRIMM-AI+ Practice: Serialization

Predict [AI-FREE]

Press Shift+Tab to enter Plan Mode before predicting.

Look at this code without running it. Predict the output of each print statement. Write your predictions and a confidence score from 1 to 5 before checking.

from pydantic import BaseModel, Field


class Config(BaseModel):
    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    debug: bool = False


config = Config(name="myapp", version=2)

d = config.model_dump()
print(type(d))
print(d["debug"])

json_str = config.model_dump_json()
print(type(json_str))

Check your prediction

Output:

<class 'dict'>
False
<class 'str'>

model_dump() returns a plain dict. The debug field has its default value False, which is included in the dictionary. model_dump_json() returns a str containing the JSON representation.

If you predicted debug would be missing from the dict (because it was not explicitly passed), that is a reasonable guess. Pydantic includes default values in the output by default.
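If you do want defaults left out, model_dump() accepts an exclude_defaults flag. A small sketch reusing the Config model from above:

```python
from pydantic import BaseModel, Field


class Config(BaseModel):
    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    debug: bool = False


config = Config(name="myapp", version=2)

# exclude_defaults=True drops fields still at their default value
explicit_only = config.model_dump(exclude_defaults=True)
```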

Run

Press Shift+Tab to exit Plan Mode.

Create a file with the Config model above. Run uv run python config_serial.py. Then add:

json_input: str = '{"name": "otherapp", "version": 5, "debug": true}'
config2 = Config.model_validate_json(json_input)
print(config2.name)
print(config2.debug)

Predict the output, then run and compare.

Investigate

In Claude Code, type /investigate @config_serial.py and ask about the difference between model_dump() and model_dump_json(). Then try parsing invalid JSON:

bad_json: str = '{"name": "", "version": 0}'
config3 = Config.model_validate_json(bad_json)

How many errors does Pydantic report? Which constraints are violated? Now try malformed JSON (not valid JSON syntax):

broken: str = '{name: "myapp"}'
config4 = Config.model_validate_json(broken)

What exception does Pydantic raise for malformed JSON? Is it ValidationError or something else?

Modify

Write a to_config_dict function that takes a Config model and returns a dictionary with only name and version (excluding debug). Use model_dump() and then delete the debug key. Test it with an assertion.

Make [Mastery Gate]

Without looking at any examples, use /tdg in Claude Code to scaffold a TDG cycle. Write:

  1. A BaseModel called TaskItem with description: str = Field(min_length=1) and priority: int = Field(ge=1, le=5)
  2. A @dataclass called Task with the same two fields (no constraints)
  3. A function to_task(validated: TaskItem) -> Task that converts between them
  4. A test that creates a TaskItem from a JSON string using model_validate_json, converts it to a Task, and asserts the field values match

Run uv run pytest to verify.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Explore model_validate_json

I have a Pydantic BaseModel called NoteCreate. I want
to load data from a JSON file and validate it in one
step. Show me how to use model_validate_json with
Path.read_text(). Do not use json.load or json.loads.
Include type annotations.

Review the AI's output. It should read the file as a string and pass it directly to model_validate_json(). Check that it does not use json.load() as an intermediate step.

What you're learning: You are verifying that the AI uses Pydantic's direct JSON parsing instead of falling back to the manual json-then-validate pattern.

Prompt 2: Understand the Boundary Pattern

In my SmartNotes project, I have a Pydantic BaseModel
called NoteCreate for validation and a @dataclass
called Note for internal logic. Why would I use two
separate classes instead of just using the BaseModel
everywhere? What are the tradeoffs?

Read the AI's response. It should mention performance (no validation overhead for internal operations), dependency isolation (dataclass needs no external package), and separation of concerns (validation at edges, clean data inside).

What you're learning: You are understanding the design reasoning behind the boundary pattern, not just the mechanics.

Prompt 3: Apply the Boundary Pattern to a Different Source

I want to load data from a CSV file where each row
has columns: product_name, price, quantity. Show me
how to read each row, validate it with a Pydantic
BaseModel, and convert it to a @dataclass. Use the
csv module and include type annotations.

Review the AI's response. Does it apply the same boundary pattern (Pydantic at the edge, dataclass inside)? Does it handle the fact that CSV values arrive as strings and need type conversion? Compare this to the JSON loading pattern from this lesson.

What you're learning: You are verifying that the boundary pattern works with any external data source, not just JSON.



James drew a line on his notebook. "Pydantic on the left, dataclass on the right. The line in the middle is the boundary. It's receiving inspection at the warehouse. Everything that comes off the truck gets checked by the inspection team. Once it passes, it goes on the shelf as trusted inventory. Nobody re-inspects it when it moves to the packing station."

"Validate once at the boundary, trust everywhere inside," Emma said. "That's the pattern."

"And model_validate_json is the inspection machine that takes a raw shipping label, reads it, validates the contents, and either accepts the item or rejects the whole delivery. One step instead of three."

"I learned the boundary pattern the hard way," Emma said. "On my first real project, I put validation in every function. The user service validated the email, the notification service validated the email again, the database layer validated it a third time. Same check, three places. When we changed the email rules, we updated two of the three and spent a week debugging why some emails worked in one service but not another."

James winced. "Validate once at the edge. Don't repeat yourself downstream."

"Exactly. And now you have all the pieces: BaseModel for types, Field for constraints, serialization for format conversion, the boundary pattern for architecture. The next lesson puts it all together. You'll build a complete SmartNotes boundary layer, end to end, with tests for every category of failure."