
Axiom VI: Data is Relational

Axiom V gave James typed dataclasses for orders, customers, and products: Pyright catching structural errors before runtime. But types describe individual objects. A CustomerOrder knows its own shape. It knows nothing about the customer who placed it, the products inside it, or the fifty other orders that customer has made. When the team asked James to build a dashboard showing order history by customer, with filters for date range, status, and product category, he hit a wall that types alone could not solve. His data had relationships, and nothing in his system understood them.

His data lived in a JSON file: orders.json. Each order was a dictionary with a customer_name string, a product_name string, and a status field. To find all orders for "Acme Corp," he loaded the entire file into memory and looped through every record. To find overdue orders across all customers, he looped again. To count how many orders each customer had placed, he looped a third time, building a dictionary by hand. The file was 2,000 records. The dashboard took eleven seconds to load.
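His counting loop looked something like this — a reconstruction for illustration, not James's actual code (the `customer_name` field matches the description above, the rest is assumed):

```python
import json

def orders_per_customer(path: str) -> dict[str, int]:
    """Count orders per customer the hard way: load everything, loop, build a dict by hand."""
    with open(path) as f:
        orders = json.load(f)  # the WHOLE file comes into memory to answer one question
    counts: dict[str, int] = {}
    for order in orders:
        name = order.get("customer_name", "unknown")
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Every question about the data required another function like this one, and every function re-read the entire file.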

Then the product team changed a customer's name from "Acme Corp" to "Acme Corporation." James updated the customer record. He forgot to update the 47 orders that referenced the old name. Now the dashboard showed two customers: "Acme Corp" with 47 historical orders and "Acme Corporation" with zero. The data was inconsistent, and the JSON file had no way to tell him.

"It's a data format problem," James said before Emma could finish examining the dashboard. "I'll switch to YAML, or maybe use separate JSON files for customers and orders. Cleaner structure, same flexibility."

"Build it," Emma said. "Separate files. Show me how you would fix the Acme Corp problem."

James sketched it out. A customers.yaml with IDs and names. An orders.yaml with customer ID references instead of names. "Change the name in the customers file, orders still point to the right ID. Problem solved."

"Delete customer number 7," Emma said.

James paused. The orders referencing customer 7 would still exist: pointing to a customer that was gone. Nothing in the YAML file would stop him. Nothing would warn him that twenty orders now referenced a ghost. "I would have to write code to enforce that myself."

"And the original problem: what if someone types 'Acme Corp' in one order file and 'Acme Corporation' in another?"

"The YAML file would accept both," James said slowly. He stared at the dashboard showing two "Acme" customers. "The format was never the issue. YAML would have the same problem. CSV would have the same problem. I need something that enforces the connections, that rejects bad data before it gets in. Like a spreadsheet where instead of typing a customer name in every order row, you have a separate customers sheet and each order just points to a row number. Change the name in the customers sheet, and every order still points to the right place."

Emma paused. "That is actually a better analogy than the one I usually use. I have been explaining normalization with entity-relationship diagrams for years. The spreadsheet version is clearer."

Emma opened a terminal and typed twelve lines of SQL. The same dashboard query that took eleven seconds and forty lines of Python returned in three milliseconds. The customer name lived in one place. The relationships were enforced by the database. The data could not become inconsistent because the system would not allow it.

The difference between JSON-as-database and a relational database is Axiom VI.

The Problem Without This Axiom

James's orders.json was not a beginner's mistake. It was the path every developer follows when data starts simple and grows relational. The trajectory is predictable:

| Stage | What Happens | Consequence |
| --- | --- | --- |
| Week 1 | JSON file stores 20 records | Fast, simple, readable |
| Month 2 | File grows to 500 records | Queries require loading everything into memory |
| Month 4 | Second entity added (customers separate from orders) | Relationships expressed by duplicating strings |
| Month 6 | Name change breaks data consistency | No constraints, no validation, no way to detect the problem |
| Month 9 | Dashboard needs cross-entity queries | 40 lines of Python to do what SQL does in 3 |
| Month 12 | AI agent asked to query the data | Agent writes custom loops because JSON has no query language |

Without recognizing that structured data is inherently relational, developers fall into three traps. The JSON Graveyard: projects accumulate JSON files (orders.json, customers.json, products.json) with no way to express relationships between them. James was building one. The Flat File Spiral: as complexity grows, developers invent ad-hoc query languages, build custom indexing, implement their own transaction logic (slowly reinventing a database, badly). The NoSQL Trap: developers reach for document stores because the API feels familiar, but when the data is relational, fighting its nature creates complexity that a relational database handles natively.

Each path leads to the same destination: a system that cannot answer basic questions about its own data without heroic effort from the developer.

The Axiom Defined

Axiom VI: Structured data follows relational patterns. SQL is the default for persistent structured data. SQLite for single-user, PostgreSQL for multi-user. Use an ORM only when it doesn't obscure the SQL.

This axiom makes three claims, each of which James learned the hard way:

  1. Structured data is relational by nature. When you have entities with attributes and connections between them, you have relational data (whether or not you store it relationally). James's JSON file contained relational data. It just could not enforce the relationships.
  2. SQL is the default choice. Not the only choice, but the one you should deviate from consciously with good reason.
  3. The ORM serves you, not the reverse. If your ORM hides the SQL so completely that you cannot reason about what queries execute, it has become an obstacle.

From Principle to Axiom

In Chapter 17, you learned Principle 5 (Persisting State in Files): the general durability rule that work products must survive beyond a single session. James was already following this principle. His team's markdown knowledge base (Axiom II) and his typed Python modules (Axiom III) all persisted in files.

But Axiom VI refines this principle for a specific category of state: structured data with relationships. Not all persistent data belongs in the same format. The distinction matters:

| State Type | Storage | Why |
| --- | --- | --- |
| Knowledge, documentation, specs | Markdown files | Human-readable, version-controlled, AI-parseable |
| Configuration | YAML/TOML files | Declarative, mergeable, environment-specific |
| Structured entities with relationships | SQL database | Queryable, constrained, normalized, concurrent-safe |
| Binary assets | File system | Git LFS or object storage for large files |

Principle 5 tells you to persist state. Axiom VI tells you HOW to persist structured data: relationally, with SQL, using the right engine for the job.

The Paper That Gave Data a Theory

In 1970, an English mathematician named Edgar F. Codd published a paper at IBM's San Jose Research Laboratory: "A Relational Model of Data for Large Shared Data Banks." At the time, databases were navigational: programs traversed pointers from record to record, like walking through a maze. If the structure of the maze changed, every program that navigated it broke. Codd proposed something radical: separate the logical structure of data from its physical storage. Define data as tables with rows and columns. Express queries as mathematical operations on those tables. Let the database, not the programmer, figure out how to retrieve the data efficiently.

IBM's own database team resisted. They had built IMS, a hierarchical database that powered most of the company's revenue. Codd's relational model threatened that product. IBM delayed implementation for years. But a young programmer named Larry Ellison read Codd's paper, saw its implications, and in 1977 founded a company to build the first commercial relational database. He called it Oracle.

The relational model won because it solved James's exact problem at industrial scale: when data has relationships, a system that understands relationships will always outperform one that does not. Codd's tables, foreign keys, and constraints are the reason Emma's twelve-line SQL query returned in three milliseconds what James's forty-line Python loop took eleven seconds to produce. The database optimizer (the component Codd's model made possible) chose the execution path. James did not have to.

More than half a century later, SQL remains the dominant language for structured data. It has survived the rise and fall of object databases (1990s), the XML movement (2000s), the NoSQL revolution (2010s), and the graph database wave (2020s). Each found legitimate niches. None displaced SQL for general-purpose structured data, because Codd's insight addresses a property of data itself: when entities have relationships, a relational system is the natural fit.

Why SQL Works

| Property | What It Means | Why It Matters |
| --- | --- | --- |
| Declarative | You say WHAT you want, not HOW to get it | The database optimizer chooses the execution strategy |
| Relational | Data is organized into related tables | Reflects how real-world entities connect |
| Constrained | Schema enforces structure, types, and relationships | Invalid data is rejected before it enters the system |
| Optimized | Decades of query planner research | Complex queries execute efficiently without manual tuning |
| Transactional | ACID guarantees (Atomicity, Consistency, Isolation, Durability) | Data is never left in a half-updated state |
| Universal | One language across SQLite, PostgreSQL, MySQL, SQL Server | Skills transfer between databases |

The declarative nature deserves emphasis. This is why Emma's query was so much shorter than James's Python loop. When you write:

SQL is a language for working with databases. You will use it in Part 6. For now, read the structure, not the syntax: tables have columns, rows have values, and relationships connect tables through shared keys.

The pattern:

  • data lives in tables (rows and columns)
  • each row is one record, each column is one property
  • tables connect to each other through shared keys
  • you ask questions by describing WHAT you want, not HOW to find it

SELECT tasks.title, projects.name
FROM tasks
JOIN projects ON tasks.project_id = projects.id
WHERE tasks.status = 'overdue'
ORDER BY tasks.due_date;

You have not specified HOW to find this data. You have not said "scan the tasks array, for each task look up the project, filter by status, then sort." You described the RESULT you want, and the database figures out the fastest path to deliver it. This is the same declarative philosophy behind CSS, HTML, and configuration files, and it is why AI agents work so effectively with SQL.

Relational Thinking: Entities and Relationships

Emma started James's education by drawing three boxes on a whiteboard: the same three entities that his JSON file had tangled together.

Entity-Relationship Diagram showing customers, orders, order_items, and products tables with their relationships

1. Entities (Tables)

An entity is a distinct "thing" in your domain. In James's order system:

  • Customer: a company or person who places orders
  • Order: a transaction with a status, date, and total
  • Product: an item that can be ordered

Each entity becomes a table. Each row is one instance. The key insight: in James's JSON file, these three entities were mashed into a single list of dictionaries. In a relational database, each lives in its own table with its own structure.

Focus on what the code is doing, not the syntax

This lesson introduces SQL and Python database code before you have learned either language. Read these code blocks for the concept: what data is being defined, what constraints are being enforced, how tables connect to each other. The specific syntax will make sense when you reach the hands-on chapters.

2. Attributes (Columns)

Each entity has typed properties, and this is where Axiom V meets Axiom VI. The schema is a type definition for your data:

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'shipped', 'delivered')),
    total_amount REAL NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);

Notice the constraints: NOT NULL means required, UNIQUE prevents duplicates, CHECK restricts values to a valid set, REFERENCES declares relationships. When Emma showed this to James, he recognized the pattern from Axiom V: these constraints are guardrails, enforced by the database instead of the type checker.
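A minimal sketch of those guardrails in action, using Python's built-in sqlite3 module against an in-memory database (the rejected status value is invented for the demo):

```python
import sqlite3

# In-memory SQLite database: the schema above, with constraints doing the rejecting.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL
)""")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'shipped', 'delivered')),
    total_amount REAL NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
)""")
conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)",
             ("Acme Corporation", "orders@acme.com"))

# A made-up status violates the CHECK constraint: the row never gets in.
try:
    conn.execute("INSERT INTO orders (status, total_amount, customer_id) VALUES (?, ?, ?)",
                 ("teleported", 10.0, 1))
except sqlite3.IntegrityError as e:
    print(f"rejected: {e}")

# A duplicate customer name violates UNIQUE: also rejected.
try:
    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)",
                 ("Acme Corporation", "dupe@acme.com"))
except sqlite3.IntegrityError as e:
    print(f"rejected: {e}")
```

Both inserts fail at the door. Nothing downstream ever has to cope with a "teleported" order or two Acmes.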

3. Relationships (Foreign Keys)

Relationships connect entities (the part James's JSON could not express):

  • A Customer has many Orders (one-to-many)
  • An Order belongs to a Customer (many-to-one)
  • An Order contains Products (many-to-many, via a junction table)

Foreign keys enforce referential integrity: you cannot create an order for a customer that does not exist. This is what prevented James's "Acme Corp" vs "Acme Corporation" disaster: the customer name lives in one row of the customers table, and every order references it by id, not by duplicated string.

-- This FAILS if customer_id 999 doesn't exist (the database protects you)
INSERT INTO orders (status, total_amount, customer_id)
VALUES ('pending', 149.99, 999);
-- Error: FOREIGN KEY constraint failed

Compare this to JSON, where nothing prevents "customer_id": 999 even if no such customer exists. The relational database catches the error. The JSON file silently accepts it.
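Here is that failure reproduced end to end with Python's sqlite3 module — a sketch against an in-memory database. One SQLite quirk worth knowing: foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite quirk: FK enforcement is off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    total_amount REAL NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
)""")

# Referencing a customer that does not exist: the database refuses the row.
try:
    conn.execute("INSERT INTO orders (status, total_amount, customer_id) VALUES (?, ?, ?)",
                 ("pending", 149.99, 999))
except sqlite3.IntegrityError as e:
    print(f"rejected: {e}")  # FOREIGN KEY constraint failed
```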

The SQLite / PostgreSQL Decision

"Which database should I use?" James asked. Emma's answer was a decision framework, not a preference. The axiom specifies two databases. Here is when to use each:

| Factor | SQLite | PostgreSQL |
| --- | --- | --- |
| Writers | Single process | Many concurrent users |
| Deployment | Embedded in your application | Separate server process |
| Setup | Zero configuration (just a file) | Requires installation and configuration |
| Size | Up to ~1 TB practical | Petabytes with proper architecture |
| Concurrency | Single-writer, multiple readers | Full MVCC (Multi-Version Concurrency Control): concurrent reads and writes without blocking |
| Use case | CLI tools, mobile apps, prototypes, embedded | Web apps, APIs, multi-user systems |
| Backup | Copy the file | pg_dump or streaming replication |
| AI agent work | Local projects, personal tools | Production deployments |

The Decision Framework

Ask these three questions:

  1. How many processes write to this database simultaneously?

    • One process: SQLite
    • Multiple processes: PostgreSQL
  2. Does this need to run as a network service?

    • No (CLI tool, desktop app, local agent): SQLite
    • Yes (web API, shared service): PostgreSQL
  3. Is this a prototype or production?

    • Prototype: SQLite (migrate to PostgreSQL later if needed)
    • Production multi-user: PostgreSQL from the start

SQLite in Practice

SQLite is not a toy database. It is the most widely deployed database engine in the world: present in every smartphone, every web browser, and most operating systems. For James's order dashboard (a single-user internal tool), it was exactly the right choice. No server to maintain, no connection strings to manage, no Docker containers. Just a file:

import sqlite3

conn = sqlite3.connect("orders.db")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce REFERENCES without this
cursor = conn.cursor()

# The schema Emma wrote (James's orders.json replaced in 12 lines)
cursor.execute("""CREATE TABLE IF NOT EXISTS customers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL
)""")

cursor.execute("""CREATE TABLE IF NOT EXISTS orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    status TEXT NOT NULL DEFAULT 'pending',
    total_amount REAL NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
)""")

# Insert with parameterized queries (SAFE, see anti-patterns below)
cursor.execute("INSERT INTO customers (name, email) VALUES (?, ?)",
               ("Acme Corporation", "orders@acme.com"))
customer_id = cursor.lastrowid

cursor.execute("INSERT INTO orders (status, total_amount, customer_id) VALUES (?, ?, ?)",
               ("shipped", 149.99, customer_id))
conn.commit()

# The dashboard query (3 milliseconds instead of 11 seconds)
cursor.execute("""
    SELECT customers.name, COUNT(orders.id), SUM(orders.total_amount)
    FROM customers
    JOIN orders ON orders.customer_id = customers.id
    GROUP BY customers.id
    ORDER BY SUM(orders.total_amount) DESC
""")

for name, count, total in cursor.fetchall():
    print(f"{name}: {count} orders, ${total:.2f} total")

conn.close()

This replaced James's entire forty-line Python loop. The database is a single file (orders.db) that you can copy, back up, or inspect with any SQLite tool. When the team later moved the dashboard to a multi-user web app, the SQL transferred directly to PostgreSQL: same queries, different connection string. This is the universality of SQL: learn it once, apply it everywhere.

SQL and AI: Why Agents Love Relational Data

This is where Axiom VI connects to everything this book teaches, and where the lesson becomes urgent rather than merely architectural. When James asked an AI agent to "show me all overdue orders for Acme Corp" against his JSON file, the agent generated this:

import json
from datetime import datetime, timedelta

with open("orders.json") as f:
    data = json.load(f)

cutoff = datetime.now() - timedelta(days=30)
results = []
for order in data["orders"]:
    # Hope that "customer" is spelled consistently
    if "Acme" in order.get("customer_name", ""):
        if order.get("status") == "pending":
            created = datetime.fromisoformat(order["created_at"])
            if created < cutoff:
                results.append(order)

The code works until someone stores the customer name as "customer" instead of "customer_name", or formats dates differently, or nests orders inside customer objects. The AI had no schema to consult, so it guessed at field names, assumed a flat structure, and produced brittle string matching. Every assumption is a silent failure waiting to happen.

When he asked the same question against his SQL schema, the agent generated:

SELECT orders.id, orders.total_amount, orders.created_at
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE customers.name = 'Acme Corporation'
AND orders.status = 'pending'
AND orders.created_at < datetime('now', '-30 days');

The difference is structural. SQL has a constrained vocabulary: approximately 30 keywords that matter. Compare this to Python's thousands of library functions. Fewer choices mean fewer hallucination opportunities. SQL is declarative: it describes WHAT you want, not HOW to get it. Natural language intent maps almost word-for-word to SQL. The schema is the specification: when you give an AI agent your CREATE TABLE statements, it knows exactly what data exists, what types each column holds, and how tables relate. The schema is to data what type annotations are to code: a machine-readable contract.

SQL is Verifiable

Unlike generated Python, SQL queries can be verified mechanically before touching real data:

  1. Syntax check: Does the query parse?
  2. Schema check: Do the referenced tables and columns exist?
  3. Type check: Are comparisons between compatible types?
  4. Result check: Does EXPLAIN show a reasonable query plan?

This makes SQL ideal for AI-generated code. James could verify every AI-generated query against the schema without running it against production data: the same principle as Pyright catching type errors before runtime.

Practical tip: Keep a schema.sql file in your project's docs/ directory (the same knowledge base from Axiom II). When an AI agent needs to work with your data, it reads one file and has the complete map: every table, every column, every relationship, every constraint. Emma called it "the system prompt for your database." James added his on the same day and never removed it. Every AI prompt that touched the order system started with @docs/schema.sql.

Reading Checkpoint

This is a natural stopping point. If you need a break, bookmark this spot and return when you are ready. Everything above covers the core concept; everything below applies it through exercises and practice.

Still reading for concepts, not memorization

The ORM code below shows how Python classes can mirror database tables. You will use these tools (SQLModel, Alembic) when you reach the hands-on chapters. For now, notice the pattern: one definition serves both your code and your database.

ORMs: When to Use, When to Avoid

James noticed that his Python code and his SQL schema were expressing the same structure in two languages: his CustomerOrder dataclass mirrored his orders table. An ORM (Object-Relational Mapper) bridges that gap, letting you define the structure once. In Python, SQLModel (built on SQLAlchemy) is the recommended choice for agentic development because it unifies Pydantic validation with SQLAlchemy's database layer:

from sqlmodel import SQLModel, Field, Session, create_engine, select
from typing import Optional
from datetime import UTC, datetime

class Customer(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(index=True, unique=True)
    email: str

class Order(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    status: str = Field(default="pending")
    total_amount: float
    customer_id: int = Field(foreign_key="customer.id")
    created_at: datetime = Field(default_factory=lambda: datetime.now(UTC))

# Create database and tables
engine = create_engine("sqlite:///orders.db")
SQLModel.metadata.create_all(engine)

# Use the ORM (James recognized his dataclasses, now backed by a database)
with Session(engine) as session:
    customer = Customer(name="Acme Corporation", email="orders@acme.com")
    session.add(customer)
    session.commit()
    session.refresh(customer)

    order = Order(status="shipped", total_amount=149.99, customer_id=customer.id)
    session.add(order)
    session.commit()

    # Query (still readable, maps directly to SQL concepts)
    statement = select(Order).where(Order.status != "delivered")
    for order in session.exec(statement):
        print(f"[{order.status}] ${order.total_amount:.2f}")

James noticed something: the SQLModel classes looked almost identical to his Axiom V dataclasses. That was the point. The ORM unified his type definitions with his database schema, one structure serving both purposes.

The ORM Rule

The axiom says: "Use an ORM only when it doesn't obscure the SQL."

This means:

| Use the ORM When | Avoid the ORM When |
| --- | --- |
| CRUD operations (Create, Read, Update, Delete) | Complex analytical queries with multiple JOINs |
| Type safety matters (Python type hints on models) | Performance-critical paths where you need query plan control |
| Schema definition (models as documentation) | You cannot explain what SQL the ORM generates |
| Migrations (Alembic integrates with SQLAlchemy) | The ORM syntax is more complex than raw SQL |

The test is simple: Can you explain the SQL that your ORM code generates? If yes, the ORM is adding value. If no, write the SQL directly. A useful heuristic: if you spend more than two minutes reading ORM documentation to express a JOIN or GROUP BY, the ORM has become the obstacle. Write the SQL.

There is an agentic angle here too. AI agents are significantly better at generating raw SQL than complex ORM-specific syntax. SQL is a constrained, universal language with decades of training data. ORM DSLs (like SQLAlchemy's legacy session.query(Order).filter(Order.status == 'pending').join(Customer)) are framework-specific dialects that AI is more likely to hallucinate. When you need AI to write a query, raw SQL is the more agent-native language.

Emma showed James the dividing line with two examples from his own codebase:

# Good: ORM for simple CRUD (the SQL is obvious)
order = session.get(Order, order_id)
order.status = "delivered"
session.commit()

# Better as raw SQL: James's dashboard query (complex aggregation)
cursor.execute("""
    SELECT customers.name, COUNT(orders.id) AS total,
           SUM(orders.total_amount) AS revenue
    FROM customers
    JOIN orders ON orders.customer_id = customers.id
    GROUP BY customers.id
    ORDER BY revenue DESC
""")

Preview: Migrations, Schema Evolution Over Time

Three months after migrating to SQL, James needed to add a priority column to the orders table. He could not just edit the CREATE TABLE statement (the database already existed with real data). Emma showed him migrations: versioned scripts that transform your schema from one state to the next, like version control for your database structure.

Without migrations, schema changes are manual commands run against production databases with no record, no rollback, and no reproducibility. With migrations, every schema change is:

  • Versioned: Each migration has a sequence number
  • Reversible: Each migration defines both "upgrade" and "downgrade"
  • Reproducible: Run all migrations to recreate the database from scratch
  • Auditable: Git tracks who changed the schema and when
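The mechanism is simple enough to sketch by hand before reaching for a tool. This toy runner (not Alembic; the migration list is invented for the demo) records the schema version in SQLite's user_version pragma and applies only the migrations the database has not yet seen:

```python
import sqlite3

# Each entry: (version, upgrade SQL). Invented examples for the sketch.
MIGRATIONS = [
    (1, "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"),
    (2, "ALTER TABLE tasks ADD COLUMN priority INTEGER"),
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order; return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in MIGRATIONS:
        if version > current:
            conn.execute(sql)
            conn.execute(f"PRAGMA user_version = {version}")  # record what has run
            current = version
    conn.commit()
    return current

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # fresh database: applies both migrations
print(migrate(conn))  # second run is a no-op: nothing left to apply
```

Real migration tools do the same bookkeeping (Alembic uses a version table rather than a pragma) and add downgrades and autogeneration on top.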

In the Python ecosystem, Alembic (built on SQLAlchemy) handles migrations:

# alembic/versions/001_add_priority_to_tasks.py
"""Add priority column to tasks table"""

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('tasks', sa.Column('priority', sa.Integer(), nullable=True))
    op.create_index('ix_tasks_priority', 'tasks', ['priority'])

def downgrade():
    op.drop_index('ix_tasks_priority', 'tasks')
    op.drop_column('tasks', 'priority')

This migration adds a priority column and an index. If something goes wrong, downgrade() reverses it cleanly. The migration file lives in version control alongside your code: schema and application evolve together. You will use Alembic in hands-on chapters when you build your own database-backed applications.

Anti-Patterns

You have seen the JSON graveyard. Every team has one. It is the project folder with data.json, users.json, config.json, and backup_data_old_FINAL_v2.json: the one where every new feature means another JSON file, every query means another Python loop, every relationship means another duplicated string.

It is the system where a developer once changed a customer name and broke six months of reports because the name was copied into 2,000 order records instead of referenced by ID. It is the project where the AI agent was asked to "find all orders from last quarter" and generated forty lines of json.load(), nested loops, and datetime parsing, code that a single SQL query would replace.

The JSON graveyard was not built by bad developers. It was built by developers who started with twenty records and did not recognize the moment when their data became relational.

| Anti-Pattern | What Goes Wrong | The Fix |
| --- | --- | --- |
| JSON files as database | No queries, no relations, no constraints, loads everything into memory | Use SQLite (same simplicity, relational power) |
| NoSQL as default | Fighting relational data with document model, denormalization headaches | Start relational. Move to NoSQL only for genuinely non-relational data (logs, events, documents) |
| Raw string SQL | SQL injection vulnerabilities, crashes on special characters | Always use parameterized queries (? placeholders) |
| No migrations | Manual schema changes, inconsistent environments, no rollback | Use Alembic or equivalent migration tool |
| Ignoring indexes | Queries slow to a crawl as data grows (full table scans) | Index columns used in WHERE, JOIN, and ORDER BY |
| Over-normalization | Dozens of tables for simple domains, JOIN-heavy queries for basic reads | Normalize to 3NF, denormalize consciously with measured justification |
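The index row deserves a demonstration. A sketch with sqlite3: the same query inspected with EXPLAIN QUERY PLAN before and after CREATE INDEX (table, column, and index names are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")

def plan(query: str) -> str:
    # EXPLAIN QUERY PLAN shows HOW SQLite intends to execute the query
    rows = conn.execute(f"EXPLAIN QUERY PLAN {query}").fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT id FROM orders WHERE status = 'pending'"
print(plan(query))  # full table scan (exact wording varies by SQLite version)

conn.execute("CREATE INDEX ix_orders_status ON orders (status)")
print(plan(query))  # now the plan mentions ix_orders_status
```

On a table of twenty rows both plans are instant; on a table of twenty million, the difference is the dashboard loading or not.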

The String Concatenation Trap

James wrote his first search feature using f-strings: f"SELECT * FROM orders WHERE customer_id = '{user_input}'". Emma stopped him before the code left his machine. "Type this into the search box," she said, and dictated: '; DROP TABLE orders; --

James stared at the resulting SQL: SELECT * FROM orders WHERE customer_id = ''; DROP TABLE orders; --'. His entire orders table would have been deleted by a user typing two dozen characters into a search box.

"This is SQL injection," Emma told him. "The OWASP Top 10 has listed it as a critical risk for over two decades, and it is still one of the most common vulnerabilities in production software. The rule is absolute: never interpolate user-provided values into SQL strings. Always use parameterized queries."

# DANGEROUS (SQL injection vulnerability)
cursor.execute(f"SELECT * FROM orders WHERE customer_id = '{user_input}'")

# SAFE (parameterized query; the database treats user_input as DATA, never as SQL)
cursor.execute("SELECT * FROM orders WHERE customer_id = ?", (user_input,))

Parameterized queries are not optional. They are a non-negotiable safety requirement. This applies to AI-generated code as well. When asking an AI to generate database queries, include in your prompt: "All queries must use parameterized statements. No string interpolation for user input." James added this line to every AI prompt that touched the database.
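A sketch that makes the difference concrete: the same malicious input, bound as a parameter against an in-memory database, is treated as an ordinary (non-matching) value and the table survives. (As a second line of defense, sqlite3's execute refuses to run more than one statement at a time.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT NOT NULL)")
conn.execute("INSERT INTO orders (customer_id) VALUES ('42')")

malicious = "'; DROP TABLE orders; --"

# Parameterized: the input is bound as DATA, never parsed as SQL.
rows = conn.execute("SELECT * FROM orders WHERE customer_id = ?", (malicious,)).fetchall()
print(rows)  # [] — no customer_id equals that string

# The table is still there, row intact.
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```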


Try With AI

Use these prompts to build practical understanding of relational data modeling and SQL for agent development.

Prompt 1: The Duplicated Contact List

I want to understand why duplicating information causes problems.

Imagine you manage a club with 50 members. You keep a spreadsheet where
every row is an event attendance record: member name, phone number, email,
event name, date, attended (yes/no).

The same member's name, phone, and email appear on dozens of rows, one
for every event.

Help me explore:
1. A member changes their phone number. How many rows need updating?
What happens if you miss one?
2. You need a list of all members (just names and emails, no duplicates).
How hard is this to extract from the spreadsheet?
3. You want to know: "Which members attended BOTH the March meeting AND
the April meeting?" Walk me through how painful this is with duplicated
data.
4. Now redesign: what if you had TWO lists, a "Members" list (name,
phone, email, member ID) and an "Attendance" list (member ID, event
name, date)? How do the same three problems change?

Connect this to James's orders.json story from this lesson: how is his
duplicated customer name problem the same as the duplicated phone number
problem?

What you're learning: The core problem that relational thinking solves: data duplication creates inconsistency. When the same information exists in multiple places, updates become error-prone and queries become painful. By splitting data into separate lists connected by IDs, you store each fact once and link to it everywhere else. This is exactly what Emma did when she replaced James's orders.json with SQL tables.

Prompt 2: Spot the Relationships

Here are five real-world systems. For each one, identify the "things"
(entities) and how they connect (relationships):

1. A library: books, members, loans
2. A school: students, teachers, classes, grades
3. A hospital: patients, doctors, appointments, prescriptions
4. A restaurant: customers, orders, menu items, tables
5. A music streaming service: users, playlists, songs, artists

For each system:
- List the 3-4 main entities
- For each pair of connected entities, say whether the relationship is
one-to-many (one teacher teaches many classes) or many-to-many (many
students take many classes)
- Identify ONE piece of information that should be stored in only ONE
place (like a student's name) and explain what goes wrong if it is
duplicated instead

Then pick the system closest to a project you might build, and explain:
if you stored all this data in a single flat list (like James's JSON
file), what questions would be hard to answer?

What you're learning: Recognizing relational structure in everyday systems. Every domain has entities with connections, and once you see them, you understand why a flat list (or a JSON file) cannot represent them faithfully. The "stored in one place" exercise builds the instinct that prevented James's "Acme Corp" vs "Acme Corporation" disaster.

Prompt 3: Design a Data Organization Plan

Pick a domain you know well (your school, your workplace, a hobby, a
side project) or one of these examples:
- A small online store (products, customers, orders)
- A sports league (teams, players, matches, scores)
- A study group (members, subjects, sessions, notes)

Help me design how to organize this data:

1. What are the 3-5 main "things" (entities) in this domain?
2. What information does each entity have (name, date, status, etc.)?
3. Which pieces of information should NEVER be duplicated, and why?
4. Draw the connections: which entity links to which, and how?
(e.g., "one customer places many orders")
5. Now imagine someone searches for "all orders from last month by
customer X." Compare how easy this is when data is organized
relationally (separate linked lists) vs stored in one big flat list.

Finally: if you gave this data organization plan to an AI assistant and
asked it to answer questions about the data, would the AI do better with
a flat list or with your organized structure? Why?

What you're learning: How to apply Axiom VI to your own domain. You are practicing the same skill Emma taught James: decomposing a messy data problem into clean, connected entities where each fact lives in one place. When you eventually learn SQL (in the hands-on chapters), you will already understand WHY tables, relationships, and constraints exist.


PRIMM-AI+ Practice: Data is Relational

Predict [AI-FREE]

Close your AI assistant. A teacher keeps student grades in a notebook. Each page has: the student's full name, the subject, and the grade. The student "Amara Johnson" appears on 12 different pages (one for each assignment).

One day, the teacher learns her legal name is actually "Amara K. Johnson."

Predict:

  • How many pages need updating in the notebook system?
  • What happens if the teacher updates 11 pages but forgets page 7?
  • Now imagine a different system: each student has ONE index card with their name, and each grade page uses the student's card number instead of their name. How many places need updating when the name changes?

Write your answers. Rate your confidence from 1 to 5.

Run

Ask your AI assistant: "Compare two ways of tracking student grades: (1) writing the student's full name on every grade page, versus (2) giving each student a unique ID number and writing only the ID on grade pages, with the name stored once on an index card. What happens in each system when a student's name changes? Which prevents data inconsistency?"

Compare. Did you correctly predict the number of updates needed in each system?

Answer Key: What actually happens

In the notebook system, all 12 pages need updating: every page where "Amara Johnson" was written by hand. If the teacher updates 11 but forgets page 7, the system now contains two versions of the same student's name. Any search for "Amara K. Johnson" will miss the assignments on page 7. Any report grouping grades by student will show two separate students. The data is silently inconsistent, and nothing in the notebook warns the teacher.

In the card system, exactly ONE place needs updating: Amara's index card. Every grade page references her card number (say, #42), not her name. When the name on card #42 changes, every grade page automatically reflects the correct name. It is impossible to have an inconsistent state because the name exists in only one location.

The key insight: duplication creates the opportunity for inconsistency. Linking (referencing by ID instead of copying) eliminates that opportunity structurally, not by being careful, but by making the error impossible.
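The two designs can be sketched in a few lines of Python (a minimal illustration; the names, subjects, and card number are invented for this example):

```python
# Notebook system: the name is copied onto every grade page.
notebook = [
    {"student": "Amara Johnson", "subject": "Math", "grade": 92},
    {"student": "Amara Johnson", "subject": "Science", "grade": 88},
]

# Renaming means editing every copy. Miss one page and the data splits
# into two "students" that no search will ever reunite.
for page in notebook:
    if page["student"] == "Amara Johnson":
        page["student"] = "Amara K. Johnson"

# Card system: the name lives in ONE place; grade pages store only the
# card number and look the name up when needed.
cards = {42: {"name": "Amara Johnson"}}
grades = [
    {"card": 42, "subject": "Math", "grade": 92},
    {"card": 42, "subject": "Science", "grade": 88},
]

# Renaming is a single update; every grade reflects it automatically.
cards[42]["name"] = "Amara K. Johnson"
for g in grades:
    print(cards[g["card"]]["name"])  # Amara K. Johnson, twice
```

The loop in the notebook system is the vulnerability: it must find every copy, and nothing checks that it did. In the card system there is no loop to get wrong.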

Investigate

Write in your own words why storing information in ONE place and LINKING to it (instead of copying it everywhere) prevents errors. What is the specific failure that duplication causes?

Now connect this back to the lesson's story. The teacher's notebook is James's orders.json: customer names copied into every order record, with no way to keep them synchronized. When "Acme Corp" became "Acme Corporation," James updated one place but forgot the 47 order records that duplicated the old name. The card system is Emma's SQL schema: the customer name stored once in the customers table, referenced by customer_id everywhere else. Change the name once, and every order automatically reflects it.
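The same contrast in SQL form, using Python's built-in sqlite3 module (a sketch of the kind of schema the story describes; the table and column names here are assumptions, not the book's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The customer's name lives in exactly one row of customers;
# orders reference it by customer_id instead of copying it.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), status TEXT)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme Corp')")
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(1, "shipped"), (1, "overdue"), (1, "open")],
)

# One UPDATE in one place; the 47-orders problem cannot happen.
conn.execute("UPDATE customers SET name = 'Acme Corporation' WHERE id = 1")

rows = conn.execute(
    "SELECT c.name, COUNT(*) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id GROUP BY c.name"
).fetchall()
print(rows)  # [('Acme Corporation', 3)]
```

Every order now reports the new name, because no order ever stored the name in the first place.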

Apply the Error Taxonomy: duplicate data becoming inconsistent (11 pages say "Amara K. Johnson" but page 7 still says "Amara Johnson") is a data/edge-case error. The system has no mechanism to ensure all copies stay synchronized. James's dashboard showing two separate customers ("Acme Corp" and "Acme Corporation") was the same error at production scale.

Modify

The teacher adds a new subject to the curriculum. In the notebook system, they write the student's full name again on a new page. In the card system, they just add a new grade line with the card number.

Now a student transfers to another school. How many changes are needed in each system to remove all their records? What mistakes could happen in the notebook system that are impossible in the card system?

Make [Mastery Gate]

Think of 3-5 things in your life that are connected to each other (for example: students and classes, friends and events, books and authors, recipes and ingredients, playlists and songs).

Draw or describe the relationships:

  • Which thing connects to which?
  • Is the relationship one-to-many (one author writes many books) or many-to-many (many students take many classes)?
  • Where would you store each piece of information so it exists in only ONE place?

This relationship map is your mastery gate. You should be able to explain why duplicating information (like writing a student's name on every grade page) causes problems that linking (using an ID number) prevents.



James leaned back and stared at his refactored schema. "It's like warehouse inventory," he said. "We used to keep one giant spreadsheet. Customer name on every row. Every time someone changed a phone number, you had to find it in forty places. Then we moved to a proper system where the customer existed once and everything else just pointed to it."

"That is the relational model," Emma said. "Entities, attributes, relationships. Codd figured this out in 1970."

"And SQL is how you talk to it." James pulled up the twelve-line query that had replaced his forty lines of Python loops. "This I can actually verify. I can read it, the AI can read it, and the database tells us if we're wrong."
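A query of the kind James pulled up might look like this (a sketch only; the schema, column names, and sample data are invented here, run through Python's built-in sqlite3 rather than taken from the book's code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        status TEXT,
        placed_on TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme Corporation'), (2, 'Globex');
    INSERT INTO orders VALUES
        (1, 1, 'overdue', '2024-03-02'),
        (2, 1, 'overdue', '2024-03-10'),
        (3, 2, 'overdue', '2024-03-15'),
        (4, 2, 'shipped', '2024-03-20');
""")

# Order history by customer, filtered by status and date range:
# one declarative query instead of three hand-written loops.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS overdue_orders
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.status = ?
      AND o.placed_on BETWEEN ? AND ?
    GROUP BY c.name
    ORDER BY overdue_orders DESC
""", ("overdue", "2024-03-01", "2024-03-31")).fetchall()

print(rows)  # [('Acme Corporation', 2), ('Globex', 1)]
```

The point of James's remark holds regardless of the exact schema: the query states what is wanted, and the database either answers or rejects it.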

Emma paused. "I'll be honest, I'm still not entirely sure when to reach for SQLite versus PostgreSQL on a new project. The line is blurrier than people think. But the SQL itself transfers between both, so the choice is less permanent than it feels."

"What about ORMs? Those seemed convenient."

"Convenient until you can't explain what they're doing. Use them for the routine stuff. But if you can't read the SQL underneath, write it yourself." She glanced at his parameterized queries. "And those placeholders are non-negotiable. Thirteen characters in a search box can delete a table."
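Emma's warning can be made concrete with a short sketch (the table and the hostile input are invented for illustration; the `?` placeholder is sqlite3's parameter syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Acme Corporation')")

user_input = "'; DROP TABLE customers; --"  # hostile search-box input

# UNSAFE pattern: string formatting splices the input into the SQL text,
# so the input can change what the statement MEANS.
# query = f"SELECT * FROM customers WHERE name = '{user_input}'"

# SAFE pattern: the ? placeholder keeps the input as data, never as SQL.
rows = conn.execute(
    "SELECT * FROM customers WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- no match, and the table is untouched
```

With a placeholder there is no string in which the attack can take effect: the driver sends the statement and the value separately, so the database never parses the input as SQL.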

James nodded slowly. "So I've got structure, types, composition, and now relational data. But how do I know all of it actually works together?"

"That," Emma said, "is exactly what the next axiom answers. You need a way to define what 'correct' means before you build anything. Tests as specification, not afterthought."