
Serialization and the Boundary Pattern

James has a NoteCreate model that validates types and enforces constraints. But the data he needs to validate does not arrive as Python keyword arguments. It arrives as JSON: files on disk, API responses, configuration data. How does he get JSON data into a Pydantic model?

Emma opens the SmartNotes notes.json file. "Two steps. First, you need to get data OUT of a model (serialization). Then you need to get data IN from JSON (parsing). Pydantic has methods for both."

She pauses and pulls up the Note dataclass from Chapter 51. "And then there is the bigger question: once the data is validated, what do you do with it? You do not keep it as a BaseModel forever. You convert it to your internal dataclass. That is the boundary pattern."

If you're new to programming

Serialization means converting a Python object into a format that can be stored or sent (like a dictionary or a JSON string). Parsing (or deserialization) is the reverse: converting stored data back into a Python object. These operations happen whenever your program reads from or writes to files, databases, or APIs.
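Using only the standard library, the two directions look like this (a minimal sketch with a made-up dictionary):

```python
import json

# Serialization: Python object -> JSON string
data = {"title": "My Note", "word_count": 25}
text = json.dumps(data)

# Parsing (deserialization): JSON string -> Python object
restored = json.loads(text)

print(text)
print(restored == data)  # True: the round trip preserves the data
```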

If you know serialization from another language

Pydantic's model_dump_json() is analogous to Jackson's writeValueAsString() in Java or JsonSerializer.Serialize() in C#, while model_dump() stops one step earlier and produces a plain dictionary. The model_validate_json() method is the reverse: parsing JSON directly into a validated object, similar to objectMapper.readValue() or JsonSerializer.Deserialize().


model_dump(): Model to Dictionary

The model_dump() method converts a Pydantic model instance to a plain Python dictionary:

from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


note = NoteCreate(
    title="My Note",
    body="Learning serialization.",
    word_count=25,
    author="James",
)

data = note.model_dump() # returns a plain dict
print(data)
print(type(data))

Output:

{'title': 'My Note', 'body': 'Learning serialization.', 'word_count': 25, 'author': 'James', 'tags': [], 'is_draft': False}
<class 'dict'>

The result is a regular dictionary. Every field becomes a key-value pair. Default values (tags=[], is_draft=False) are included.
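When you need only part of a model, model_dump() accepts filtering parameters. A sketch using the same NoteCreate model (the exclude and exclude_unset parameters are standard Pydantic v2 options):

```python
from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


note = NoteCreate(
    title="My Note",
    body="Learning serialization.",
    word_count=25,
    author="James",
)

# Drop a specific field from the dump
public = note.model_dump(exclude={"author"})
print("author" in public)  # False

# Keep only fields that were explicitly passed (defaults omitted)
sparse = note.model_dump(exclude_unset=True)
print(sorted(sparse))  # ['author', 'body', 'title', 'word_count']
```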


model_dump_json(): Model to JSON String

The model_dump_json() method converts directly to a JSON string:

json_output: str = note.model_dump_json()
print(json_output)
print(type(json_output))

Output:

{"title":"My Note","body":"Learning serialization.","word_count":25,"author":"James","tags":[],"is_draft":false}
<class 'str'>

This is a JSON string you can write to a file or send over a network. Notice that Python's False becomes JSON's false, and the list [] stays as []. Pydantic handles the Python-to-JSON conversion automatically.

You could also use json.dumps(note.model_dump()) to achieve the same result, but model_dump_json() is faster because it bypasses the intermediate dictionary step.
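To see the full round trip, here is a sketch that writes the JSON string to disk and reads it back into a model (the temp-file path is just for illustration):

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


note = NoteCreate(
    title="My Note",
    body="Learning serialization.",
    word_count=25,
    author="James",
)

# Write the JSON string straight to disk, then parse it back
path = Path(tempfile.gettempdir()) / "note.json"
path.write_text(note.model_dump_json())

restored = NoteCreate.model_validate_json(path.read_text())
print(restored == note)  # True: models compare by field values
```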


model_validate_json(): JSON String to Model

This is where Pydantic shines for external data. Instead of manually loading JSON, extracting fields, and passing them to a constructor, you feed the JSON string directly to model_validate_json():

from pydantic import BaseModel, Field, ValidationError


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


json_string: str = '{"title": "From JSON", "body": "Parsed directly.", "word_count": 10, "author": "Emma"}'

note = NoteCreate.model_validate_json(json_string)
print(note.title)
print(note.word_count)
print(note.tags)

Output:

From JSON
10
[]

One method call: parse the JSON, validate every field, enforce constraints, return a model instance. If anything is wrong, Pydantic raises ValidationError:

bad_json: str = '{"title": "", "body": "x", "word_count": -1, "author": "James"}'

try:
    note = NoteCreate.model_validate_json(bad_json)
except ValidationError as e:
    print(e)

Output:

2 validation errors for NoteCreate
title
  String should have at least 1 character [type=string_too_short, input_value='', input_type=str]
word_count
  Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]

No json.loads(). No isinstance checks. No manual field extraction. The JSON goes straight from a string into a validated model.
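If your data is already a Python dictionary rather than a JSON string (for example, from a library that has done the parsing for you), the sibling method model_validate() does the same validation. A sketch with a made-up payload:

```python
from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


# model_validate() is the dict counterpart of model_validate_json()
payload = {
    "title": "From a dict",
    "body": "Already parsed.",
    "word_count": 3,
    "author": "Emma",
}
note = NoteCreate.model_validate(payload)
print(note.title)  # From a dict
print(note.tags)   # []
```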


Loading from JSON Files

In Chapter 54, Lesson 4, you loaded JSON files with with open and json.load. Here is the same operation using Pydantic's model_validate_json:

from pathlib import Path


def load_note_from_file(path: str) -> NoteCreate:
    """Load and validate a note from a JSON file."""
    json_text: str = Path(path).read_text()
    return NoteCreate.model_validate_json(json_text)

The function reads the file as a string and passes it directly to model_validate_json(). If the file contains invalid data, Pydantic raises ValidationError. If the file does not exist, Python raises FileNotFoundError (which you handle with try/except from Chapter 54).

Compare this to the manual approach:

# Manual approach (Chapter 54 style)
import json

def load_note_from_file_manual(path: str) -> Note:
    with open(path) as f:
        data = json.load(f)  # returns parsed JSON
    return validate_note_data(data)  # 30+ line function

Both approaches load a file and validate its contents. The Pydantic version replaces json.load plus validate_note_data (30+ lines) with a single model_validate_json call.


The Boundary Pattern

You now have two models for notes: NoteCreate(BaseModel) for validation, and Note (the @dataclass from Chapter 51) for application logic. This is not an accident. It is a design pattern called the boundary pattern.

The idea: Pydantic models guard the edges of your program. Dataclasses live inside. External data enters through a Pydantic model, gets validated, and then converts to an internal dataclass for processing.

External Data (JSON, API, user input)
              |
              v
  [NoteCreate (BaseModel)]   <-- validates at the boundary
              |
              v
  [Note (@dataclass)]        <-- clean data inside the app
              |
              v
Application Logic (functions, tests)

Here is the conversion function:

from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str] = field(default_factory=list)
    is_draft: bool = False


def to_note(validated: NoteCreate) -> Note:
    """Convert a validated NoteCreate to an internal Note."""
    return Note(
        title=validated.title,
        body=validated.body,
        word_count=validated.word_count,
        author=validated.author,
        tags=list(validated.tags),
        is_draft=validated.is_draft,
    )

The to_note function takes a validated NoteCreate and creates a Note dataclass. At this point, the data is guaranteed to be valid. The dataclass does not need to check anything. It trusts the caller because the caller (the Pydantic model) already did the checking.
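The whole boundary, end to end, fits in a few lines. In this sketch, to_note unpacks model_dump() with ** instead of naming each field; that shorthand works only because the model and dataclass fields match exactly, and the explicit version above is safer when they diverge:

```python
from dataclasses import dataclass, field

from pydantic import BaseModel, Field


class NoteCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    body: str = Field(min_length=1)
    word_count: int = Field(ge=0)
    author: str
    tags: list[str] = []
    is_draft: bool = False


@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str
    tags: list[str] = field(default_factory=list)
    is_draft: bool = False


def to_note(validated: NoteCreate) -> Note:
    """Cross the boundary: validated model -> internal dataclass."""
    return Note(**validated.model_dump())


raw = '{"title": "Boundary", "body": "Validated once.", "word_count": 2, "author": "Emma"}'
note = to_note(NoteCreate.model_validate_json(raw))
print(type(note).__name__)  # Note
print(note.tags)            # []
```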


Why Two Models?

Emma anticipates the question. "You're probably wondering when to use dataclasses versus Pydantic for internal data," she says. "The boundary pattern makes the split clear."

Here is the reasoning:

Concern        | NoteCreate (BaseModel)                    | Note (@dataclass)
Purpose        | Validate external input                   | Represent validated data
Where it lives | At the boundary (file loading, API input) | Inside the application
Overhead       | Validation on every instantiation         | No validation overhead
Mutability     | Mutable by default (frozen via config)    | Mutable (or frozen) by choice
Dependencies   | Requires the pydantic package             | Standard library only

Internal functions work with Note because it is lightweight and has no validation overhead. The validation happens once, at the boundary, and then the clean data flows through the rest of the program.


PRIMM-AI+ Practice: Serialization

Predict [AI-FREE]

Look at this code without running it. Predict the output of each print statement. Write your predictions and a confidence score from 1 to 5 before checking.

from pydantic import BaseModel, Field


class Config(BaseModel):
    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    debug: bool = False


config = Config(name="myapp", version=2)

d = config.model_dump()
print(type(d))
print(d["debug"])

json_str = config.model_dump_json()
print(type(json_str))
Check your prediction

Output:

<class 'dict'>
False
<class 'str'>

model_dump() returns a plain dict. The debug field has its default value False, which is included in the dictionary. model_dump_json() returns a str containing the JSON representation.

If you predicted debug would be missing from the dict (because it was not explicitly passed), that is a reasonable guess. Pydantic includes default values in the output by default.
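If you want the behavior you predicted, model_dump() can drop fields still at their default value. A sketch with the same Config model (exclude_defaults is a standard Pydantic v2 parameter):

```python
from pydantic import BaseModel, Field


class Config(BaseModel):
    name: str = Field(min_length=1)
    version: int = Field(ge=1)
    debug: bool = False


config = Config(name="myapp", version=2)

# Drop fields whose value equals their default
trimmed = config.model_dump(exclude_defaults=True)
print(trimmed)  # {'name': 'myapp', 'version': 2}
```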

Run

Create a file named config_serial.py containing the Config model above, then run uv run python config_serial.py. Then add:

json_input: str = '{"name": "otherapp", "version": 5, "debug": true}'
config2 = Config.model_validate_json(json_input)
print(config2.name)
print(config2.debug)

Predict the output, then run and compare.

Investigate

Try parsing invalid JSON:

bad_json: str = '{"name": "", "version": 0}'
config3 = Config.model_validate_json(bad_json)

How many errors does Pydantic report? Which constraints are violated? Now try malformed JSON (not valid JSON syntax):

broken: str = '{name: "myapp"}'
config4 = Config.model_validate_json(broken)

What exception does Pydantic raise for malformed JSON? Is it ValidationError or something else?

Modify

Write a to_config_dict function that takes a Config model and returns a dictionary with only name and version (excluding debug). Use model_dump() and then delete the debug key. Test it with an assertion.

Make [Mastery Gate]

Without looking at any examples, write:

  1. A BaseModel called TaskItem with description: str = Field(min_length=1) and priority: int = Field(ge=1, le=5)
  2. A @dataclass called Task with the same two fields (no constraints)
  3. A function to_task(validated: TaskItem) -> Task that converts between them
  4. A test that creates a TaskItem from a JSON string using model_validate_json, converts it to a Task, and asserts the field values match

Run uv run pytest to verify.


Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Explore model_validate_json

I have a Pydantic BaseModel called NoteCreate. I want
to load data from a JSON file and validate it in one
step. Show me how to use model_validate_json with
Path.read_text(). Do not use json.load or json.loads.
Include type annotations.

Review the AI's output. It should read the file as a string and pass it directly to model_validate_json(). Check that it does not use json.load() as an intermediate step.

What you're learning: You are verifying that the AI uses Pydantic's direct JSON parsing instead of falling back to the manual json-then-validate pattern.

Prompt 2: Understand the Boundary Pattern

In my SmartNotes project, I have a Pydantic BaseModel
called NoteCreate for validation and a @dataclass
called Note for internal logic. Why would I use two
separate classes instead of just using the BaseModel
everywhere? What are the tradeoffs?

Read the AI's response. It should mention performance (no validation overhead for internal operations), dependency isolation (dataclass needs no external package), and separation of concerns (validation at edges, clean data inside).

What you're learning: You are understanding the design reasoning behind the boundary pattern, not just the mechanics.


Key Takeaways

  1. model_dump() converts a model to a dictionary. All fields, including defaults, become key-value pairs in a plain Python dict.

  2. model_dump_json() converts a model directly to a JSON string. Faster than json.dumps(model.model_dump()) because it skips the intermediate dictionary.

  3. model_validate_json() parses JSON directly into a validated model. One method call replaces json.loads() plus manual validation. Invalid data raises ValidationError.

  4. The boundary pattern separates validation from application logic. Pydantic models guard the edges (file loading, API input). Dataclasses live inside (functions, tests, business logic). Data converts from BaseModel to dataclass after validation.

  5. Validate once at the boundary, then trust the data inside. Internal functions receive dataclass instances that are guaranteed valid. No redundant checking needed.


Looking Ahead

You have all the pieces: BaseModel for validation, Field for constraints, serialization for converting between formats, and the boundary pattern for separating concerns. In Lesson 5, you will assemble everything into a complete SmartNotes TDG: NoteCreate at the boundary, Note inside the app, full test suite covering valid data, invalid types, constraint violations, missing fields, and malformed JSON.