Skip to main content

Chapter 21B: Postgres as System of Record - The Invariant 5 Implementation

A database holds today's rows. A system of record holds today's rows, last week's mistakes, and a credible answer to "what happened on Tuesday at 11pm?"

You already know how to put rows in Postgres. That is useful. It is not enough for an AI-native company.

Then you hire your first Worker. The Expense Reconciler Worker wakes up nightly, reads pending bank-statement rows, categorises each one, writes a clean expenses row for the right user, and dispatches a confirmation email. It runs across many users at once. It can crash mid-run. The retry policy will sometimes call it twice on the same input. Side effects (the email) cannot be un-sent.

Naive Postgres was enough when humans were the only writers. With a Worker on the table, naive Postgres becomes a liability. This chapter teaches the smallest implementation of "Postgres as system of record" that still satisfies Invariant 5.

Why This Chapter Exists

Invariant 5, in plain English: every Worker runs against a system of record, not its own context window. The context window is transient. The model can hallucinate. The retry can replay. The system of record is the durable, governed truth that survives all of that.

Two words to keep separate:

  • Engine is what the Worker runs ON: the model, the runtime, the agent loop.
  • System of Record (SoR) is what the Worker runs AGAINST: the durable place where work, state, and side-effect intent live.

A system of record has six properties:

  1. Durable: writes survive process death and restarts.
  2. Addressable: every fact has a stable id you can name.
  3. Governed: who can read or write what is enforced by the database, not by the application.
  4. Replayable: you can reconstruct the work that happened, in order.
  5. Inspectable: a human or auditor can answer "what happened?" by reading rows.
  6. Policy-bound: business rules (idempotency, tenant boundaries, append-only history) are encoded so the database refuses to break them.

CRUD on Postgres gives you (1) and (2) for free. The other four are what this chapter builds.

How This Chapter Fits

This chapter is deliberately self-contained. It assumes you can read basic SQL and that you have access to Postgres. If this chapter moves elsewhere in the book, the story still holds: one Worker writes financial rows, queues a side effect, and must leave behind evidence an auditor can read.

The distinction is simple:

Basic database workSystem-of-record work
Store rowsPreserve operational truth across retries, crashes, and audits
Verify one query outputVerify runs, turns, tool calls, outbox messages, and events
One writer at a timeMany Workers writing concurrently to the same shared database
Application code checks policyPostgres enforces policy through constraints, roles, and RLS

What CRUD Misses

When a Worker, not a human, is the writer, six things break that CRUD does not even notice:

  1. Retries: the Worker is invoked twice with the same input and inserts the row twice.
  2. Crashes: the Worker dies mid-run and nobody knows what was finished and what was not.
  3. Audits: finance asks "what happened on Tuesday at 11pm?" and the database has no answer beyond current state.
  4. Races: two Workers pick up the same pending row at the same time and both process it.
  5. Side effects: the database commit succeeded, then the email step failed, and now the books say "sent" when the user got nothing.
  6. Second tenant: a second customer is added; one bad join silently mixes their rows into the first customer's report.

CRUD storage handles none of these. SoR discipline handles all of them.

Running Story: The Expense Reconciler Worker

Across this chapter you build out one Worker. The Expense Reconciler runs nightly. It reads pending bank-statement rows from a pending_transactions table, categorises each row, writes a clean expenses row for the right user (Alice, Bob, or Carol), and queues a confirmation email. Each lesson fixes one failure mode of this Worker.

By the end of the chapter, a simple Postgres database has grown into a multi-tenant, multi-Worker system of record. The shape of the data did not get cleverer. The shape of the discipline did.

Chapter Contract

By chapter end, you should be able to answer these five questions:

  1. What is the difference between a database and a system of record, and why does Invariant 5 force the second one?
  2. What does a safe migration look like, and what does the agent need to write before you let a destructive change run?
  3. What three tables let you reconstruct what a Worker did, even after a crash?
  4. Why is "the database commit succeeded" not the same as "the email was sent," and how does the outbox pattern fix the gap?
  5. When a multi-tenant report mixes Alice's and Bob's rows, where in the database should the policy live, and why is the application layer the wrong answer?

Seven Principles (Compact)

PrincipleChapter 21B Application
P1 Bash is the Keypsql for migration drills, run-state inspection, forensic queries
P2 Code as Universal InterfaceThe migration file IS the schema contract; the journal IS the run protocol
P3 Verification as Core StepRead the migration, replay the journal, inspect the outbox, check RLS before approval
P4 Small Reversible DecompositionOne lesson fixes one failure mode; each migration is reversible on a Neon branch
P5 Persisting State in FilesRun state lives in tables, not in the agent's context window
P6 Constraints and SafetyDrop guards, idempotency keys, RLS policies enforce the rules at the database
P7 ObservabilityRuns, turns, tool_calls, events, and outbox rows give you forensics, not just logs

How This Chapter Teaches

Lesson 0 frames the Invariant 5 problem with a short diagnostic, not a full PRIMM cycle. Lessons 1 through 9 use the full visible PRIMM-AI+ cycle adapted for SoR work: predict what durable state should exist, run the demo against Postgres, investigate by reading rows first then asking the agent, modify one rule and predict the effect, then make a business-English brief that the agent implements and you approve. Lesson 10 is a capstone where you prove the database is a real system of record.

All examples appear in two tabs where the runtimes differ: Claude Code and OpenCode (with DeepSeek V4 Flash as the cost-friendly default).

Lesson Flow

LessonOutcomeFast Visible Win
L0 From Database to System of RecordName what CRUD storage misses once a Worker is the writerMatch six failure scenarios to the six SoR properties
L1 Migrations: Schema History, Not Schema MemoryRead and approve a versioned migration, refuse a destructive oneApply a safe column-add migration on a Neon branch and roll it back
L2 The Worker Execution JournalRecord runs, turns, and tool_calls so a crashed Worker can be reconstructedTrace a single Reconciler run end to end by reading three tables
L3 Idempotency: Retried Work Writes OnceUse natural keys and ON CONFLICT DO NOTHING so retries cannot duplicate rowsRun the same Reconciler input twice, see one row, not two
L4 Concurrency: Two Workers, One ClaimUse SELECT ... FOR UPDATE SKIP LOCKED so two Workers cannot grab the same pending rowSpin two Workers and watch each claim a different row
L5 The Outbox Pattern: Atomic Side-Effect IntentCommit the database write and the email intent in one transactionInspect the outbox row before the email actually sends
L6 Append-Only Events: Audit Trail That Survives the SessionRecord every state transition as an immutable event rowReconstruct yesterday's Reconciler run from events alone
L7 RLS and Roles: Policy in the DatabaseEnforce tenant boundaries at the database, not in application codeConnect as Alice's role, run a join, see only Alice's rows
L8 MCP Boundary: Workers Call Tools, Not the DatabasePush raw SQL access behind an MCP tool surface with allow-listed operationsHave a Worker insert via the MCP tool and watch raw SQL be refused
L9 Observability and Forensics: Answer "What Happened?"Produce a forensic report from runs, events, and outbox without re-running anythingHand the boss a paragraph that names the run id, the failure point, the fix
L10 Capstone: Prove This Postgres Database Is a System of RecordWalk an auditor through the six SoR properties using your own ReconcilerApprove or reject your own database against the six-property checklist

Prerequisites

  • A Postgres database you can safely practice against
  • Basic SQL reading comfort: SELECT, JOIN, INSERT, UPDATE, DELETE, transactions
  • A free Neon account with a project where you can spin a branch
  • Terminal access with psql, or your Neon SQL Editor open in the browser
  • Your AI coding agent (Claude Code, or OpenCode with DeepSeek V4 Flash)
  • No prior backend engineering required: the agent writes the migrations, the journal code, and the policies; you read and approve

No-Regression Rules

No simplification is allowed to remove:

  • Migrations reviewed by a human before they touch production data.
  • Idempotent writes for any Worker action that can be retried.
  • No DROP without an override: destructive migrations need a Neon branch, a backup, and explicit approval.
  • Transactional outbox for any side effect that must match a database commit.
  • Append-only events for any state transition that an auditor might ask about.
  • RLS for multi-tenant data: tenant scoping lives in the database, not in the application.

If a rewrite makes content shorter but drops any of these, it is a regression.

What This Chapter Is Not

This chapter is not a full DBA course. It is not full event sourcing. It is not a complete enterprise role matrix. It does not try to cover every performance, backup, or disaster-recovery practice. 21B teaches the smallest set of Postgres habits that turn a database into a system of record a fleet of Workers can safely point at.

After Chapter 21B

When you finish this chapter, three things change:

  1. You can read a migration file and decide go or no-go, with a real reason for either answer.
  2. You know the minimum SoR shape (journal, outbox, events, RLS) and can name what is missing in a database that does not have it.
  3. You can produce a forensic report from a failed Worker run by reading rows, not by guessing or re-running anything.

Start with Lesson 0: From Database to System of Record.