RLS and Roles: Policy in the Database
The Reconciler ran clean for two weeks. Tuesday night a bug shipped.
Somewhere in the agent's code, a variable holding user_id got reused. While processing Alice's bank statement, the Worker read Bob's user_id from the previous iteration. It categorised Alice's transactions correctly. Then it wrote the expense rows. Under Bob's name.
By morning Bob had forty new expenses he did not recognise. Alice's report was blank. The boss asked the obvious question: "the database knew the difference between Alice and Bob, right? Why did it let this happen?"
The honest answer was uncomfortable. "The database did not know who the Worker thought it was. The application told it. We trusted the application." That is the failure RLS exists to prevent.
The application layer is a layer. It is not a wall. A buggy Worker, a swapped variable, a leaked session: any of these can send the wrong user_id down the wire. The database accepts the value. The application was supposed to be the only line of defence. There is no second line.
This lesson teaches the smallest fix: put the wall inside the database itself.
- Row-Level Security (RLS): A Postgres feature that filters which rows a query can see and which rows it can change, based on a policy attached to the table.
- USING: The clause of an RLS policy that filters rows on read. A row that fails the USING check does not appear in the result set.
- WITH CHECK: The clause of an RLS policy that rejects writes. A row that fails the WITH CHECK is refused at INSERT or UPDATE.
- Principal: The identity the database thinks is making the request. For a Worker, this is the role plus the tenant context.
- Tenant context: A session-local variable (usually
app.current_user_id) that says "this transaction is acting on behalf of this user". - Role: A named database identity (
worker_read,worker_write,worker_admin) with a defined set of permissions. - Boundary Error: A silent authorization failure. The query does not crash; it returns rows the principal should not see, or zero rows the principal should see.
Where Policy Belongs
The Reconciler bug above is not a database bug. It is a layering bug. The application was the only thing checking that Alice's transactions wrote to Alice's user_id. The database accepted whatever value came down the wire.
Two layers should know the rule. The application enforces UX (Alice cannot click the button to edit Bob's data). The database enforces the wall (even if the application asks, the database refuses).
| Layer | What it enforces | What it catches |
|---|---|---|
| Application | The UX path. A user only sees their own data in the dashboard. | Honest users using the product the right way. |
| Database (RLS) | The contract. Even a buggy Worker or a leaked session cannot read or write outside its scope. | Bugs, leaks, attacks, the wrong variable, the swapped session. |
Without the database layer, a single bug above the database becomes a multi-tenant data leak. With it, the same bug becomes an empty result set or a rejected write. Visible. Recoverable. Not a press release.
Roles, the Minimal Set
Workers should never connect as the database owner. They should connect as a role with the smallest permission set that lets them do their job.
| Role | Purpose | What it can do |
|---|---|---|
worker_read | Workers that only read SoR data | SELECT on permitted tables, filtered by RLS USING |
worker_write | Workers that perform writes | INSERT and UPDATE on permitted tables, limited by RLS WITH CHECK |
worker_admin | Migrations, role grants, ops fixes | DDL, GRANT, can bypass RLS when explicitly needed |
The agent writes the grants. You read them.
CREATE ROLE worker_write;
GRANT SELECT, INSERT, UPDATE ON expenses TO worker_write;
GRANT SELECT, INSERT ON outbox_messages TO worker_write;
GRANT SELECT, INSERT ON agent_events TO worker_write;
-- callback to Lesson 6: REVOKE UPDATE, DELETE ON agent_events FROM worker_write
Two things to read. The role gets only the verbs it needs (no DROP, no TRUNCATE). The grants are explicit per table; nothing is granted at the database or schema level by mistake.
A Worker that connects as worker_admin to "make things easier" is one bug away from a DROP TABLE expenses. Connect as the smallest role that works.
USING vs WITH CHECK
A policy has two clauses. Each one does a different job.
ALTER TABLE expenses ENABLE ROW LEVEL SECURITY;
CREATE POLICY expenses_tenant_isolation ON expenses
FOR ALL
TO worker_write
USING (user_id = current_setting('app.current_user_id', true)::uuid)
WITH CHECK (user_id = current_setting('app.current_user_id', true)::uuid);
Read it left to right.
| Clause | What it does | What it catches |
|---|---|---|
USING | Filters rows on read. A row whose user_id does not match the session's tenant context disappears from the result set. | A Worker trying to read another user's data. |
WITH CHECK | Validates rows on write. An INSERT or UPDATE that would land a row whose user_id does not match the tenant context is rejected. | A Worker trying to write a row for another user. |
The two clauses usually carry the same condition. The Tuesday night bug above would have been caught by WITH CHECK: the Worker tried to INSERT an expense with Bob's user_id while the tenant context said Alice. The database would have refused the write.
If the policy only has USING, writes pass through. If the policy only has WITH CHECK, reads are unrestricted. Both are partial walls. You want both clauses.
The Silent Zero-Row Failure Mode
This is the most important warning in the lesson.
When tenant context is missing or wrong, RLS does not throw. The query does not error. The application gets back a result set, just an empty one. The INSERT does not crash; it silently affects zero rows.
-- Session forgot to call SET LOCAL app.current_user_id.
SELECT * FROM expenses;
-- Returns zero rows. No error.
This is a real failure mode, not a theoretical one. The shape:
- A Worker starts a new transaction.
- The Worker forgets to set
app.current_user_id(a bug, a refactor, a missing config). - The Worker runs a SELECT.
- RLS evaluates
USING (user_id = current_setting('app.current_user_id', true)::uuid). The setting is missing, so the comparison is not true for any row. - The result set is empty.
- The Worker logs "no work to do" and exits cleanly.
The Worker did not error. The Worker did not do anything either. Alice's expenses sat unread. The next morning, finance notices the dashboard is empty.
The verification habit that catches this is simple: do not just check that reads ran without error. Check that they returned the expected rows. A zero-row result for a tenant you know has data is the visible signal of a missing or wrong tenant context.
Setting Tenant Context per Worker Session
The Worker tells the database who it is acting for at the start of every transaction.
-- At the start of the Worker's per-user transaction:
SELECT set_config('app.current_user_id', $alice_uuid, true);
-- The third argument `true` means SET LOCAL: scope is the transaction only.
-- Then the Worker runs its work:
INSERT INTO expenses (worker_id, user_id, category, amount, spent_at, idempotency_key)
VALUES ($worker_id, $alice_uuid, 'Food', 12.50, '2026-05-09', $key)
ON CONFLICT (worker_id, idempotency_key) DO NOTHING;
Three things to read.
- The
set_configcall comes first, before any SELECT or INSERT. If it is missing, every subsequent statement runs without a tenant context. - The third argument is
true. This makes the setting transaction-scoped. When the transaction ends, the context goes with it. A subsequent transaction in the same session starts with a clean slate. - The
user_idin the INSERT must match the context. OtherwiseWITH CHECKrefuses the write.
The pattern is mechanical. Every Worker transaction starts with set_config. Every Worker review checks that the call is present and that the context matches the user the Worker is acting for.
PRIMM-AI+ Practice: The Identity Flip Drill
Predict [AI-FREE]
Three rows exist in expenses. Alice owns two, Bob owns one. RLS is enabled on the table with the policy above. You will run SELECT * FROM expenses in three separate transactions: one with Alice's context set, one with Bob's, one with no context set at all.
Before running anything, write down:
- How many rows does Alice's session see?
- How many rows does Bob's session see?
- How many rows does the no-context session see?
- Does any of the three sessions error?
- Your confidence score from 1 to 5.
You should be able to answer all four before reading further.
Run
Ask Claude Code to run three separate transactions against your Neon branch.
Transaction A:
BEGIN;
SELECT set_config('app.current_user_id', '<alice-uuid>', true);
SELECT id, user_id, category, amount FROM expenses ORDER BY id;
COMMIT;
Transaction B: same but with <bob-uuid>.
Transaction C: same but skip the set_config line entirely.
What you should see:
| Session | Tenant context | Rows returned | Errored? |
|---|---|---|---|
| Alice | Alice's UUID | 2 | No |
| Bob | Bob's UUID | 1 | No |
| None | unset | 0 | No |
The third row is the lesson. Zero rows. No error. A bug that looks like "nothing to do".
Investigate
Write your own one-paragraph explanation:
- Alice's session saw only Alice's two rows because the USING clause filtered out Bob's row.
- Bob's session saw only Bob's one row for the same reason.
- The no-context session saw zero rows because
current_setting('app.current_user_id', true)returned NULL, so the equality check was not true for any row. - None of the three sessions errored. RLS filtering is silent by design.
Then ask the agent:
- "If a Worker forgets to call
set_configand then runs a SELECT, how would the on-call engineer notice? Walk through the logs." - "An attacker controls the value passed to
set_config. What can they do? What can they NOT do? Be specific." - "I see only USING in this policy, no WITH CHECK. What writes would still be possible that USING does not prevent?"
The third question is where the WITH CHECK clause earns its keep. A USING-only policy filters reads but allows any write. A Worker with a swapped user_id would happily INSERT a row for the wrong tenant. The row would then disappear from that Worker's own reads (because USING filters it back out), but it would still be there.
This is the /boundary-probe habit in plain language: confirm the database role, confirm the tenant context, confirm the policy exists, read the USING clause, then read the WITH CHECK clause. Do that before editing the business query. Many "no rows" bugs are boundary errors, not query bugs.
Modify
Change one rule. Add a separate admin read policy:
CREATE POLICY expenses_admin_read ON expenses
FOR SELECT
TO worker_admin
USING (true);
Predict the difference. Then run a fourth transaction as the database role worker_admin with no app.current_user_id.
What you should see: the admin session reads all three rows from expenses. The bypass is tied to the database role, not to a session variable the Worker can casually set. That is the point. An admin path may exist, but it must be role-scoped, narrow, and auditable through append-only events. If the agent proposes OR current_setting('app.role') = 'admin' inside a normal Worker policy, reject it: that turns a tenant wall into a string flag.
Make [Mastery Gate]
The brief in business English:
"The Reconciler must only write expenses for the user it is currently scoped to. If it tries to INSERT a row whose
user_iddoes not match the tenant context set at the start of the transaction, the database must reject the write. The Reconciler should also only read its own user's expenses; another user's rows must not appear in any SELECT."
Hand this brief to the agent. Ask for:
- The
CREATE POLICYstatement. - The
set_configcall the Worker must run as the first statement in each per-user transaction. - A short test plan: three transactions (Alice, Bob, unset) plus one write attempt with a mismatched
user_id.
You read each piece. The gate passes when you can:
- Point at the USING clause and explain which read it filters.
- Point at the WITH CHECK clause and explain which write it rejects.
- Name what happens in the unset-context case and why the team must verify reads return the expected rows, not just that they did not error.
If the agent ships USING without WITH CHECK, you reject. If the agent forgets the set_config call, you reject. If the agent says "the application will handle it", you reject.
Try With AI
Prompt 1: The Identity-Flip Drill
I have an `expenses` table with RLS enabled and this policy:
CREATE POLICY expenses_tenant ON expenses
FOR ALL
TO worker_write
USING (user_id = current_setting('app.current_user_id', true)::uuid)
WITH CHECK (user_id = current_setting('app.current_user_id', true)::uuid);
Walk me through what each of these three transactions sees and does:
A) SET LOCAL app.current_user_id = '<alice-uuid>';
SELECT * FROM expenses;
B) SET LOCAL app.current_user_id = '<bob-uuid>';
INSERT INTO expenses (user_id, ...) VALUES ('<alice-uuid>', ...);
C) -- no SET LOCAL
SELECT * FROM expenses;
For each one, name the rows returned, the rows written, and whether
any statement errors.
What you're learning: Recognising the three classic RLS outcomes. Filtered read. Refused write. Silent empty read. The third one is the trap; the drill builds the reflex to catch it.
Prompt 2: USING vs WITH CHECK Recognition
I will paste three RLS policies. For each one, tell me whether it
fully isolates tenants, partially isolates them, or does almost nothing.
Name the specific failure each gap allows.
A) USING (user_id = current_setting('app.current_user_id', true)::uuid)
-- no WITH CHECK
B) WITH CHECK (user_id = current_setting('app.current_user_id', true)::uuid)
-- no USING
C) USING (user_id = current_setting('app.current_user_id', true)::uuid)
WITH CHECK (user_id = current_setting('app.current_user_id', true)::uuid)
What you're learning: USING and WITH CHECK do different jobs. A USING-only policy lets bad writes through. A WITH CHECK-only policy lets bad reads through. Both clauses are usually needed and they usually carry the same condition.
Prompt 3: Silent Zero-Row Trap
A Worker calls SELECT id, amount FROM expenses and gets back zero rows.
List five different reasons this could happen, ordered from least
alarming to most alarming. For each one, describe how I would tell
the difference by reading another table or running another query.
What you're learning: Zero rows is a signal, not an answer. It can mean "the table is empty for this tenant" (fine), "the tenant context is missing" (a Worker bug), "the role lacks SELECT" (a config bug), or "RLS is filtering for a wrong reason" (a policy bug). Differentiating between these is the verification habit that turns RLS from "looks like it works" to "I can prove it works".
Checkpoint
- I can explain why the application layer alone is not a wall for multi-tenant data.
- I can read a CREATE POLICY statement and name what USING filters and what WITH CHECK rejects.
- I can recognise the silent zero-row failure mode and name the verification habit that catches it.
- I can read the minimum role grants for
worker_writeand confirm the role only has the verbs it needs. - I have run the three-session identity flip drill and watched Alice, Bob, and the unset session each return the expected rows.
- I rejected at least one agent-proposed policy during practice for missing WITH CHECK or for adding an unaudited admin bypass.