From Database to System of Record
It is Friday at 4:55pm. The Expense Reconciler Worker has been running nightly for a week. Tonight you are about to leave when Slack lights up.
"Our auditor is asking what happened on Tuesday at 11pm. Can you pull it?"
You open psql. On Tuesday the Reconciler crashed halfway through its run. On another night it ran twice by accident and double-charged a category. On a third night it sent two confirmation emails to the same user. On a fourth night it quietly read another customer's data while computing Bob's totals.
The database technically still works. Every row is there. Every constraint passed. The agent will happily write you a query against the current state.
The problem is that the auditor did not ask for the current state. The auditor asked what happened. The database does not know.
That gap is the difference between a database and a system of record.
The CRUD Wall
CRUD (Create, Read, Update, Delete) is the first useful shape of database work. It is correct for a single user typing into an app. It is not correct enough for a Worker.
The moment a Worker is the writer, six failures appear that CRUD cannot see:
| Failure | What Happens in CRUD |
|---|---|
| Retry | The Worker is invoked twice with the same input. The same row is inserted twice. |
| Crash | The Worker dies mid-run. Nobody knows what was finished and what was not. |
| Audit | The auditor asks "what happened on Tuesday?" The database has only the current rows, no history. |
| Race | Two Workers pick up the same pending row at the same time. Both process it. |
| Side effect | The database commit succeeded. The email step failed. Nobody notices. |
| Second tenant | A new customer is added. One bad join silently mixes their rows into the first customer's report. |
CRUD stores current state. A system of record preserves operational truth: the work that happened, the order it happened in, the side effects that were intended, and the tenant boundary every read was scoped to.
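The difference between current state and operational truth can be sketched in a few lines. This is a minimal illustration, not the chapter's actual schema: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and the table names (`expenses`, `expense_events`), the columns, and the timestamps are all invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres. All names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- CRUD shape: one row per expense, overwritten in place.
    CREATE TABLE expenses (
        id           INTEGER PRIMARY KEY,
        category     TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    -- History shape: one row per thing that happened, only ever inserted.
    CREATE TABLE expense_events (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        expense_id  INTEGER NOT NULL,
        occurred_at TEXT NOT NULL,
        action      TEXT NOT NULL,
        category    TEXT NOT NULL
    );
""")

def record_event(expense_id, occurred_at, action, category):
    conn.execute(
        "INSERT INTO expense_events (expense_id, occurred_at, action, category) "
        "VALUES (?, ?, ?, ?)",
        (expense_id, occurred_at, action, category),
    )

def create_expense(expense_id, category, amount_cents, occurred_at):
    with conn:  # one transaction: the row and its event commit together
        conn.execute(
            "INSERT INTO expenses (id, category, amount_cents) VALUES (?, ?, ?)",
            (expense_id, category, amount_cents),
        )
        record_event(expense_id, occurred_at, "created", category)

def recategorise(expense_id, new_category, occurred_at):
    with conn:  # the overwrite and the record of it succeed or fail together
        conn.execute(
            "UPDATE expenses SET category = ? WHERE id = ?",
            (new_category, expense_id),
        )
        record_event(expense_id, occurred_at, "recategorised", new_category)

# Made-up timestamps for three consecutive nightly runs.
create_expense(1, "uncategorised", 1250, "2024-01-01T23:00:05Z")
recategorise(1, "travel", "2024-01-02T23:00:07Z")
recategorise(1, "meals", "2024-01-03T23:00:04Z")

current_state = conn.execute(
    "SELECT category FROM expenses WHERE id = 1"
).fetchone()
history = conn.execute(
    "SELECT occurred_at, action, category FROM expense_events "
    "WHERE expense_id = 1 ORDER BY id"
).fetchall()

print("current:", current_state)
for row in history:
    print("event:  ", row)
```

The `expenses` table can only say the category is "meals" now. The question about the second night can only be answered from `expense_events`, because the UPDATE erased the earlier values from the current-state row.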
The good news is that the building you already have is the right building. Postgres is plenty. The next nine lessons add discipline on top of it.
Engine vs System of Record
Two words to keep separate, because the rest of the chapter depends on them.
- Engine is what the Worker runs ON: the model, the runtime, the agent loop, the prompt, the tools.
- System of Record (SoR) is what the Worker runs AGAINST: the durable, governed, replayable, inspectable place where the work and its side-effect intent live.
A model can be swapped. A runtime can be upgraded. The system of record stays. If you change your engine every six months, the SoR is the thing that holds your business steady through the change.
Invariant 5, in plain English: every Worker runs against a system of record, not its own context window. The context window is transient. The SoR is the truth.
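Here is one way to picture Invariant 5 in code. It is a sketch under stated assumptions: a SQLite file stands in for Postgres, and the `run_steps` table, the `run_worker` function, and the `SimulatedCrash` exception are hypothetical names made up for this illustration. The point is that the Worker's progress lives in rows, so a fresh process can pick up where a dead one stopped.

```python
import os
import sqlite3
import tempfile

# Assumption: a SQLite file stands in for Postgres so that state outlives a connection.
DB_PATH = os.path.join(tempfile.mkdtemp(), "sor.db")

class SimulatedCrash(Exception):
    """Stands in for the process dying mid-run."""

def open_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS run_steps (
            run_id TEXT NOT NULL,
            item   TEXT NOT NULL,
            PRIMARY KEY (run_id, item)
        )
    """)
    return conn

def run_worker(run_id, items, crash_after=None):
    """Process items and journal each finished one.

    What is already done is read from rows, never from local memory,
    so a restarted Worker skips finished items.
    """
    conn = open_db()
    try:
        done = {
            row[0]
            for row in conn.execute(
                "SELECT item FROM run_steps WHERE run_id = ?", (run_id,)
            )
        }
        processed_now = []
        for item in items:
            if item in done:
                continue  # finished by an earlier attempt
            if crash_after is not None and len(processed_now) >= crash_after:
                raise SimulatedCrash(f"died before {item}")
            with conn:  # commit each finished step before moving on
                conn.execute(
                    "INSERT INTO run_steps (run_id, item) VALUES (?, ?)",
                    (run_id, item),
                )
            processed_now.append(item)
        return processed_now
    finally:
        conn.close()

items = ["row-1", "row-2", "row-3", "row-4"]

try:
    run_worker("tuesday", items, crash_after=2)  # dies after two items
except SimulatedCrash:
    pass

resumed = run_worker("tuesday", items)  # a fresh attempt, new connection
print(resumed)
```

The second call has no memory of the first. It still skips the first two items, because the journal rows survived the crash and the local variables did not.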
The Six SoR Properties
A system of record is not just any durable store. It has six properties:
| Property | One-Sentence Meaning |
|---|---|
| Durable | Writes survive process death, restarts, and retries. |
| Addressable | Every fact has a stable id you can name, link to, and refer to later. |
| Governed | Who can read or write what is enforced by the database, not by application code. |
| Replayable | You can reconstruct the work that happened, in order. |
| Inspectable | A human or auditor can answer "what happened?" by reading rows. |
| Policy-bound | Business rules (idempotency, tenant scope, append-only) are encoded so the database refuses to break them. |
CRUD on Postgres gives you the basics of durable and addressable for free. The remaining four are what this chapter builds, and the first two lessons harden the two you start with.
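To make "the database refuses to break them" concrete, here is a small sketch of an append-only rule enforced by the database itself. It assumes SQLite as a stand-in, using SQLite's trigger syntax. Postgres has its own trigger and privilege mechanisms, which this sketch does not try to reproduce. The `ledger_events` table, the trigger names, and the `attempt` helper are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres. All names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger_events (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        tenant_id   TEXT NOT NULL,
        description TEXT NOT NULL
    );

    -- The rule lives in the database: rows may be added, never changed or removed.
    CREATE TRIGGER ledger_events_no_update
    BEFORE UPDATE ON ledger_events
    BEGIN
        SELECT RAISE(ABORT, 'ledger_events is append-only');
    END;

    CREATE TRIGGER ledger_events_no_delete
    BEFORE DELETE ON ledger_events
    BEGIN
        SELECT RAISE(ABORT, 'ledger_events is append-only');
    END;
""")

def attempt(sql, params=()):
    """Run one statement and report whether the database accepted or refused it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.DatabaseError as exc:
        return f"refused: {exc}"

insert_result = attempt(
    "INSERT INTO ledger_events (tenant_id, description) VALUES (?, ?)",
    ("acme", "expense recorded"),
)
update_result = attempt(
    "UPDATE ledger_events SET description = 'edited' WHERE id = 1"
)
delete_result = attempt("DELETE FROM ledger_events WHERE id = 1")

row_count = conn.execute("SELECT COUNT(*) FROM ledger_events").fetchone()[0]
description = conn.execute(
    "SELECT description FROM ledger_events WHERE id = 1"
).fetchone()[0]

print(insert_result)
print(update_result)
print(delete_result)
```

The insert goes through. The update and the delete are refused by the database, whatever the application code intended, and the original row is left untouched.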
What This Chapter Adds on Top of Postgres
Each property maps to one or more of the upcoming lessons. This is the map of where you are going.
| Property | Lesson That Builds It |
|---|---|
| Durable | L1 Migrations: schema history that survives every change |
| Addressable | L2 The Worker Execution Journal: every run, turn, and tool call has an id |
| Replayable | L3 Idempotency and L6 Append-Only Events: retried work is safe; history is recoverable |
| Policy-bound | L4 Concurrency, L5 Outbox, L8 MCP Boundary: races, side effects, and tool access enforced by the database |
| Governed | L7 RLS and Roles: tenant scope enforced by the database |
| Inspectable | L9 Observability and Forensics: a forensic report read out of rows |
By Lesson 10 you will walk an auditor through your own Reconciler database against this six-property checklist and approve or reject it.
A Quick Diagnostic
No full PRIMM cycle this lesson. Just three short questions you answer in your head before you move on. Pick a Worker you can imagine, real or from a small app you already understand.
- Retry: if your Worker is invoked twice with the same input by accident, will the database have one row or two? How would you know?
- Crash: if your Worker dies in the middle of a run, can you reconstruct what it had finished and what it had not? Where is that information stored?
- Audit: if someone asks "what happened on Tuesday at 11pm?" tomorrow morning, can you answer using rows that already exist, or only by re-running the Worker?
If any answer is "I would have to guess," that is the gap this chapter closes.
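If you want to test the Retry question rather than guess, a few lines are enough. This is a minimal sketch, assuming SQLite in place of Postgres. The two tables and the `bank_row_id` key are hypothetical, and `INSERT OR IGNORE` is SQLite's spelling (in Postgres the comparable clause is `ON CONFLICT DO NOTHING`).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres. All names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain CRUD: nothing stops the same bank row being written twice.
    CREATE TABLE expenses_plain (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        bank_row_id  TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    -- Keyed: the database allows at most one expense per bank row.
    CREATE TABLE expenses_keyed (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        bank_row_id  TEXT NOT NULL UNIQUE,
        amount_cents INTEGER NOT NULL
    );
""")

def write_plain(bank_row_id, amount_cents):
    with conn:
        conn.execute(
            "INSERT INTO expenses_plain (bank_row_id, amount_cents) VALUES (?, ?)",
            (bank_row_id, amount_cents),
        )

def write_keyed(bank_row_id, amount_cents):
    with conn:
        # A second write with the same key is silently skipped.
        conn.execute(
            "INSERT OR IGNORE INTO expenses_keyed (bank_row_id, amount_cents) "
            "VALUES (?, ?)",
            (bank_row_id, amount_cents),
        )

# The Worker is invoked twice with the same input.
for _ in range(2):
    write_plain("bank-row-42", 1250)
    write_keyed("bank-row-42", 1250)

plain_count = conn.execute("SELECT COUNT(*) FROM expenses_plain").fetchone()[0]
keyed_count = conn.execute("SELECT COUNT(*) FROM expenses_keyed").fetchone()[0]

print("plain:", plain_count)
print("keyed:", keyed_count)
```

The plain table ends up with two rows and the keyed table with one. The "How would you know?" part of the question is answered by the counts themselves: you check the rows, not your memory of what the Worker did.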
Try With AI
Use these prompts in Cowork or your preferred AI assistant. The exact runtime does not matter for this lesson.
Prompt 1: Translate Invariant 5
Explain Invariant 5 ("every Worker runs against a system of record,
not its own context window") in two short sentences. Use one example
from a Worker that processes invoices. Do not use the words "context
window" or "system of record" in your explanation. Then I will judge
whether you got it right.
What you're learning: Forcing the agent to translate the rule out of its own jargon is the fastest way to confirm you understand the rule yourself. If the explanation does not match your own answers to the diagnostic, one of you is wrong. The agent should be the one who has to re-explain.
Prompt 2: Match Failures to Properties
Here are six failures and six SoR properties. Match each failure to the
property that would prevent it. Do not explain. Just give a six-line
table.
Failures: retry, crash, audit, race, side effect, second tenant
Properties: durable, addressable, governed, replayable, inspectable,
policy-bound
What you're learning: This is the recognition drill that pays off across the rest of the chapter. By the time you finish, you should be able to read a Worker bug and name the SoR property the database is missing.
Prompt 3: Audit the Reconciler
The Expense Reconciler Worker reads pending bank rows nightly,
categorises each, writes an expense row, and queues a confirmation
email. List, in priority order, the three failure modes that would
hurt the business most if I shipped the Worker today against a plain
CRUD Postgres database. For each one, name the SoR property the
database is missing.
What you're learning: Prioritisation. Not every failure mode is equally urgent for your domain. By forcing the agent to rank, you start to develop a sense for which lessons in this chapter are urgent and which can wait for your particular Worker.
Checkpoint
- I can explain in one sentence why CRUD storage is not enough once a Worker is the writer.
- I can name the six SoR properties from memory.
- I can pair each of the six failure modes (retry, crash, audit, race, side effect, second tenant) with the property that prevents it.
- I can name one Worker action in my own work or in the Reconciler story that needs SoR discipline.
- I am ready to read a real migration file in the next lesson.