Give Your AI Agent a Nervous System
15 concepts, ~80% of real use: senses (triggers), reflexes (durable execution), and balance (flow control).
You have built an agent that works. It also only works while you are watching it. You open Claude Code or OpenCode, you type, it replies. And the moment you step away, it stops. That gap, between an agent you operate and a worker that operates on its own, is the whole subject of this course.
The surprise is what closes the gap, and it is not a smarter agent. Your agent already has what it needs to do the work: an LLM to think, tools and MCP servers to act, skills for the workflows it knows. What it does not have is a nervous system. Think of your own body: your brain thinks and your muscles act, but a second system runs underneath without you, your heartbeat and your reflexes, the signals that keep you alive while you sleep. Stop paying attention and your heart keeps beating; an agent has no version of that, so the moment you stop driving it, it stops. A nervous system is the connective tissue that closes the loop on its own, with no human driving each turn: it senses the world and wakes the agent when something happens, it reacts by reflex when a step fails (and holds its place for hours while it waits on a person or a slow API), and it keeps the agent in balance when five hundred requests land at once. That is the line between an agent you operate and an FTE that operates on its own. You give your agent this nervous system; you do not rewrite the agent. That is the one idea this course is built around.
The tool that gives your agent a nervous system has a technical name, a durable execution engine, and we use one called Inngest. The patterns carry over to Temporal, Restate, and Dapr Agents. This is not just a teaching picture. Day AI, a CRM built for AI-native companies, calls Inngest "the nervous system" of their product, and runs on every part this course teaches. Inngest's free Hobby tier is the easiest place to start: no credit card, a one-command dev server, and a dashboard you can watch while you build.
The example is deliberately thin: a customer-support agent that looks up a few sample customers, drafts a reply, and issues a refund only after a human approves. The agent is not where the difficulty lives, so we keep it small and spend the effort on the nervous system around it. You build it here from scratch. It shares ideas with the earlier Digital FTE course but assumes none of it. Set the environment up once in the Quick Win below, and Part 4 builds the worker in seven paste-and-watch prompts. It is Python-first on inngest-py: you direct your coding agent in plain English and it writes the code. If you learn by doing, skim Parts 1-3 and jump to Part 4.
A single agent crash mid-task is annoying. A workforce of fifty agents handling customer-facing work without a nervous system underneath is impossible: you adopt a platform that gives it to you, or spend six months building a worse version yourself. Four properties make this nervous system uniquely important for agents:
Day AI, the CRM for AI-native companies, runs its product on every primitive this course teaches: durable LLM workflows, wait-for-event coordination, replay on failure, debounce plus throttle plus concurrency, and multi-tenant fairness. Two of their founding engineers reached for the same nervous-system picture on their own. It is production language, not curriculum branding.
The Agent Factory thesis describes Seven Invariants any production agent system must satisfy. The worker you build here satisfies Invariant 4 (an engine) and Invariant 5 (a system of record, here a small audit trail). This course adds two more, plus a piece of Invariant 1:
Why an AI agent needs a nervous system (four properties)
Without step.wait_for_event (Concept 15), you build an approval queue yourself: database table, polling, timeout handling, audit trail. That is a project, not a feature.
Where this course sits in the Agent Factory thesis
step.wait_for_event is the cleanest expression on any platform: the agent suspends, a human emits the awaited event, the agent resumes.
The 15 concepts, at a glance. They map onto the three jobs a nervous system does: the senses (triggers wake the worker), the reflexes (durable execution keeps it correct when something breaks), and balance (flow control keeps it healthy under load). This is the first-pass version, concept plus a one-line gist. When something breaks during a build, the Quick reference at the end has a symptom-to-concept diagnostic that points you back to the concept the failure belongs to.
The 15 concepts in one line each
| # | Concept | One-line gist |
|---|---|---|
| | Senses (Triggers) | how the world reaches the worker |
| 1 | Events vs requests | A request is sync and someone waits; an event is async and the world has moved on. |
| 2 | Cron triggers | A schedule wakes the function. One line: TriggerCron(cron="0 9 * * *"). |
| 3 | Webhook triggers | An inbound HTTP payload becomes a named event; your function reacts to the name. |
| 4 | Idempotency and event semantics | Event IDs and step names make a duplicate event (or retry) a no-op. |
| 5 | Fan-out and sub-agent delegation | One event, N subscribing functions; or one parent firing N child events. |
| | Reflexes (Durable execution) | keeping the worker correct when something breaks |
| 6 | step.run and the durable function model | Each step.run is a checkpoint; the function can crash between steps and resume. |
| 7 | Memoization, the mechanic underneath | Completed steps return stored output instead of re-executing. |
| 8 | step.sleep and step.wait_for_event | Both suspend the function durably, for a duration or for an event. |
| 9 | Retries, error handling, dead-letter | Automatic backoff retries; after N tries the failed run persists for replay. |
| 10 | step.run for AI calls in Python | Wrap OpenAI calls in step.run; step.ai.infer offloads inference (step.ai.wrap is TypeScript-only). |
| | Balance (Flow control) | keeping the worker healthy under load |
| 11 | Concurrency and throttling | concurrency caps active runs; throttle caps starts-per-second. |
| 12 | Priority and fairness | Priority orders the queue; per-key concurrency gives each tenant a fair share. |
| 13 | Batching | Accumulate events into one batched function call for cheap bulk work. |
| 14 | Replay and bulk cancellation | Replay failed runs with new code; bulk-cancel runs you no longer want. |
| 15 | HITL gates with step.wait_for_event | The function suspends until a human approves, then resumes with the decision. |
Prerequisites. Four things, and the course stands alone otherwise (Part 4 builds its own worker from scratch).
- You can drive a coding agent. Claude Code or OpenCode, installed and authenticated. Plan mode, rules files, the read-first-then-write workflow: if that rhythm is familiar, you are calibrated. The Agentic Coding Crash Course covers it if not.
- You have an OPENAI_API_KEY (or another model key your coding agent can use) and a Neon account for the worker's Postgres system of record. The worker runs a real model and reads and writes its customers and audit trail in Neon. Neon is free (no card), and you authorize it with one browser click during setup; sign up at neon.com in about a minute if you don't have an account. The Inngest dev server itself needs no account.
- You have Node.js 20+ available, even though the worker is Python. The Inngest dev server is distributed as a Node CLI (npx inngest-cli@latest dev).
- You have a working mental model of "event-driven" vs "request/response." If "the world fires an event and zero, one, or many functions react to it" reads as familiar, you are calibrated. If not, Concept 1 gives you the shape.
Done From Agent to Digital FTE? You have a richer worker to wrap; a callout at the end of Part 4 points the nervous system at it. A bonus, not a gate.
How to read this page on first pass, plus a glossary of the terms you'll meet
First pass. Expand anything labeled "Done when" or "What to watch": runnable behavior to check predictions against. In Part 4 you can skim the load-bearing snippets on a first read; the prose around each one tells you what the layer does, and your agent writes the code when you build. The "Try with AI" blocks are optional extension prompts. The goal of pass one is the nervous-system model, its three layers, in your head; pass two, hands on the keyboard, is where you build. Each concept closes with a Predict (commit to an answer before you read on) or a Quick check (test the rule you just read); both exist to make you pause, not to grade you.
Glossary (each term is also explained in context where it first appears):
- Production Worker: An AI agent with a nervous system around it: senses that wake it (triggers), reflexes that survive failures (durable execution), and balance that scales it under load (flow control).
- Event: A named, immutable message describing that something happened. Example: {"name": "customer/email.received", "data": {"customer_id": "..."}}. The trigger surface.
- Inngest function: A Python function decorated with @inngest_client.create_function, declaring triggers and steps. The unit of durable work.
- Step: A unit of work inside an Inngest function wrapped in ctx.step.run(), ctx.step.sleep(), ctx.step.wait_for_event(), or ctx.step.ai.infer(). Each step is independently retried and memoized.
- Memoization: When a function crashes and restarts, Inngest re-runs the function code from the top but returns stored outputs for any step.run whose result is already cached. The function catches up to where it broke without redoing work.
- Flow control: Per-function policies: concurrency (max active runs), throttle (max starts per second), priority (queue order), batch_events (accumulate before invoking).
- HITL (Human In The Loop): A function pauses to wait for human approval or input before continuing. step.wait_for_event is the primitive.
- Replay: Re-running failed runs as fresh runs from the top, on the current code after a bug fix (distinct from the automatic retry inside a run, which resumes from memo). The dashboard Rerun button.
- Dev server: Inngest's local dev environment via npx inngest-cli@latest dev. Dashboard at http://127.0.0.1:8288; MCP endpoint at /mcp.
Current as of May 2026. The whole Part 4 build was run end-to-end against a live Inngest dev server and a real model on inngest 0.5.18, openai-agents 0.17.3, fastapi 0.136.3, Python 3.12, and the Inngest CLI. Every snippet in Part 4 is from that working build, not written from memory. The architecture this course teaches does not change when the SDK does; the SDK is this year's interface to it. If a live docs page and this page ever disagree on a syntax detail, the docs win: pin your versions, and check the Inngest Python quick start and the OpenAI Agents SDK docs when you build.
Sections that diverge between Claude Code and OpenCode have a switcher; pick one and the page syncs across visits.
The fifteen-minute quick win: set up the base, and see the reflex
Before you read the 15 concepts that explain why this architecture works, set up the environment the whole course runs in and watch one durable function survive a crash. This is the setup you do once; Part 4 builds the customer-support worker on the exact same base. By the end you will have:
- the base open in your coding agent, the Skills installed, and three MCP servers wired (Neon, Context7, and the dev-server inngest-dev),
- a fresh Neon database with two tables, customers and audit_log, that you created over MCP and saw in the console, with its DATABASE_URL written into .env for the worker to use later,
- one tiny durable function (a step.run, a step.sleep, a FastAPI host) running against the Inngest dev server,
- a run you triggered and watched suspend at the sleep with zero compute consumed,
- and a run you broke on purpose, then watched Inngest retry, returning the already-completed step from memo while only the broken step re-executed.
That last beat is the whole point of the course, in miniature: the reflex you can watch with your own eyes, a step fails and the system recovers without redoing the work it already finished. This is not the Part 4 worked example (the full Worker, seven prompts); this is one sitting. Do it, then come back for the concepts.
A Production Worker is two processes side by side, and keeping them straight is the mental model: a Python function host (your code, serving the function to Inngest) and the Inngest dev server (the nervous system that triggers runs, memoizes steps, and shows you the dashboard). Your coding agent wires both, installs the Skills that teach it the Inngest patterns, and talks to the dev server through the inngest-dev MCP.
One more boundary matters, and it is the same one the Digital FTE course drew. Your worker keeps its customers and its audit trail in a Neon Postgres database, and there are two distinct ways that database is touched. Your coding agent uses the Neon MCP to build and inspect it: create the tables, read rows, pull the connection string, all in plain English at development time. Your worker uses its own Postgres connection (DATABASE_URL) to read and write it at runtime. The worker never calls the Neon MCP, and Neon's own docs are blunt about why: the MCP server is for development and inspection, never wired into a running app. Neon is free with one OAuth click; the Inngest dev server needs no account at all.
Get the base and open it
Download the base and open the folder in your coding agent. The agent does the setup itself, from the prompts just below. You set this up once: ai-agent-nervous-system/ is your folder for the whole course, the Quick Win and Part 4 alike. You never re-download or re-unzip.
Download ai-agent-nervous-system-base.zip
cd ai-agent-nervous-system
claude
This base assumes a capable general agent (Claude Code, or OpenCode running Claude Sonnet or Opus, GPT-5, or similar). A smaller model will drift on the build prompt; if its first plan looks vague instead of specific, switch to a stronger one before you go further.
Prep the base (~3 min)
The base ships its rules in AGENTS.md and its MCP wiring; the Skills, your key, and the Neon authorization come next. Have your agent set itself up. Paste this:
Read AGENTS.md, then get this base ready: install the Skills it lists for whichever agent you are, copy .env.example to .env for me, and tell me exactly what you need from me to bring the Neon and Context7 MCP servers online.
Watch for: the agent installing the four Inngest Skills and the neon-postgres Skill (you see the install runs and Installed confirmations), creating .env, then asking you for two things: your OPENAI_API_KEY to paste into .env, and one browser click to authorize Neon over OAuth. Neon is free; if you don't have an account yet, sign up at neon.com in about a minute, or create one right at the authorization screen. INNGEST_DEV=1 is already in .env, so the SDK runs in local dev mode with no signing key. When the install and wiring are done, the agent tells you to start the dev server (next step) and then restart the agent, because the new Skills and the inngest-dev MCP do not load mid-session.
Done when: the Skills are installed, .env holds your key, Context7 is reachable, and Neon is authorized. The inngest-dev MCP comes online once the dev server is running, which is the next step.
Start the dev server, and confirm the agent can reach it (~2 min)
This course adds two boundaries your agent reaches over MCP: a Neon database it builds and inspects, and the running dev server it sends events to and watches. So before you build anything, bring both up and confirm they are live.
Start the Inngest dev server in its own terminal (it is a Node CLI; leave it running):
npx inngest-cli@latest dev
The dashboard comes up at http://127.0.0.1:8288, and the dev server exposes its MCP endpoint at /mcp. Now restart your coding agent (exit and relaunch in the ai-agent-nervous-system folder) so the freshly installed Skills and the inngest-dev MCP both load. Then paste this:
List the Neon tools and the inngest-dev tools you can see.
Watch for: two real lists. The Neon tools (creating a project, running SQL, describing tables, fetching a connection string, and the like) are your agent's hand on the database. The inngest-dev tools (list_functions, send_event, invoke_function, get_run_status, and the rest) are its hand on the running dev server. Everything below rides on both.
Gate open: the reply lists real Neon tool names and real inngest-dev tool names. If the Neon tools are missing: the OAuth didn't finish; redo the Neon authorization from the prep step. If the inngest-dev tools are missing: the dev server is not running (start it), or you skipped the restart (exit, relaunch in this folder, ask again).
Build the store, and grab its connection string (~3 min)
Now create the worker's system of record over the Neon MCP, then hand the worker the one thing it will need to reach it later: a connection string. The worker you build in Part 4 reads its customers and writes its audit trail here. Paste this:
Paste this to your coding agent. Plan first; execute on approval.
On a fresh Neon project, create two tables: customers (id, email, tier) and audit_log (a record of every action the worker takes). Then call the Neon tool that returns the connection string and write that URL into my .env as DATABASE_URL. Use the Neon tools for all of it; don't write SQL for me to run.
Watch for: the agent calling the Neon MCP tools to create the project and the two tables (you see those tool calls, not SQL you typed), then writing DATABASE_URL into .env. That string is the handoff: Neon MCP provisioned the store, and your worker will use the string, not the MCP server.
Done when: a fresh Neon project exists with a customers table and an audit_log table, and .env holds a DATABASE_URL. Open console.neon.tech, pick the project the agent just made, and open Tables: there sit customers and audit_log, empty for now. You will see rows appear in D0 when the worker runs. (A table is just a spreadsheet: each row one thing, each column one detail.)
Build the first durable function, and drive it from the dashboard (~3 min)
Now build the smallest durable function, using the Skills you just installed. The Inngest Skills are TypeScript-first in their examples, so your agent takes the patterns from them (what a step is, how a durable function is shaped) and confirms the exact Python signatures from the docs (the dev-server MCP's grep_docs/read_doc, or Context7), not from memory. Paste this:
Using the Inngest Skills, write one tiny Inngest durable function (call it greet-customer, triggered by a demo/greet event) that composes a greeting in one step.run, sleeps fifteen seconds with step.sleep, then composes a farewell in a second step.run and returns both. Serve it from a FastAPI host in local dev mode, and start the host on port 8000 with auto-reload on, so edits I make later are picked up without a manual restart.
The shape it writes, so you know it when you see it: the function is plain async def, the two step.run calls wrap work that should be memoized, and the step.sleep between them suspends the run durably (the process can crash, restart, or redeploy during the sleep; the run resumes at the next line when the timer fires). One detail to confirm in the agent's code: the Inngest client is constructed with is_production=False, or it reads the INNGEST_DEV=1 already in your .env. Without one of those, the SDK quietly defaults to Cloud and your function never registers locally.
Done when: the function host is running on port 8000, and the dev server (already running from the last step) auto-discovered it. Open http://127.0.0.1:8288, click Functions, and greet-customer is listed. You drive the rest from the browser.
Trigger it, and watch a step sleep at zero compute (you drive)
Send the trigger event. The simplest path is the dashboard: in http://127.0.0.1:8288, click Events, then Send event, paste this, and click Send:
{
"name": "demo/greet",
"data": { "name": "Sara" }
}
(Prefer to stay in the agent? Ask it to send the event over the MCP: "Send a demo/greet event with name Sara using the inngest-dev send_event tool." Either way the same run starts.)
Click Runs and open the new run. The first step completes; the sleep step shows Sleeping with a resume time. Nothing in your code is running, the host terminal is idle, and that is the point: a durable wait costs zero compute. After fifteen seconds the run resumes on its own, the farewell step completes, and the status flips to Completed. The Output panel shows the returned dict.
Break a step, and watch the retry skip the work it already did (the payoff)
Now make a step fail on purpose, so you can watch memoization carry the completed work across the retry. Paste this to your agent:
Make the farewell step raise an error on purpose, so I can watch a run fail. Keep everything else the same.
Send the same demo/greet event again, then open the run and read its trace. Here is the payoff, and it is in this one failing run: the greeting step shows one completed attempt, and the farewell step shows several Attempts, each retried with backoff (by default Inngest retries a failing step four times after the initial attempt) before the run lands in Failed. Sit with what that attempt count means: the completed greeting step is paid for once, not once per retry. That is durable execution you can see with your own eyes. Why the completed step returns instantly instead of re-running is the mechanic you will meet in Concept 7; for now, just watch it happen.
(This dev-server build shows no separate "memoized" badge. The memo is the attempt count: the completed step sitting at one attempt while the broken step climbs is exactly what "returned from memo, not re-run" looks like here.)
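The attempt counts in that trace can be reproduced with a toy model. This is a minimal sketch in plain Python, not Inngest's implementation: ToyRun, its memo dictionary, and the choice of five attempts are all assumptions made for illustration. It shows the same shape, though: the function body re-runs from the top on every retry, a completed step returns its stored output, and only the failing step accumulates attempts.

```python
class StepFailed(Exception):
    """Raised by a step that fails on this attempt."""


class ToyRun:
    """Toy model of one run: stored step outputs plus per-step attempt counts."""

    def __init__(self) -> None:
        self.memo: dict[str, object] = {}
        self.attempts: dict[str, int] = {}

    def step(self, step_id: str, handler):
        # A completed step returns its stored output without re-executing.
        if step_id in self.memo:
            return self.memo[step_id]
        self.attempts[step_id] = self.attempts.get(step_id, 0) + 1
        result = handler()  # may raise; nothing is stored in that case
        self.memo[step_id] = result
        return result


def broken_farewell() -> str:
    raise StepFailed("farewell step fails on purpose")


def greet_customer(run: ToyRun, name: str) -> dict[str, object]:
    greeting = run.step("compose-greeting", lambda: f"Hello, {name}!")
    farewell = run.step("compose-farewell", broken_farewell)
    return {"greeting": greeting, "farewell": farewell}


def execute_with_retries(run: ToyRun, max_attempts: int) -> str:
    """Re-run the function body from the top until it succeeds or attempts run out."""
    for _ in range(max_attempts):
        try:
            greet_customer(run, "Sara")
            return "Completed"
        except StepFailed:
            continue  # a real engine would back off before retrying
    return "Failed"


run = ToyRun()
status = execute_with_retries(run, max_attempts=5)  # 5 is an illustrative choice
print(status, run.attempts)
```

The greeting step ends at one attempt however many times the body re-runs, which is the same signal the dashboard gives you in place of a "memoized" badge.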
Now fix it:
Now revert the farewell step to the working version.
The host auto-reloads (that is what --reload bought you; if you skipped it, restart the host by hand). Send a fresh demo/greet event and the whole function now runs clean to Completed on the fixed code. One honest note about recovery, because it bites people: the dashboard's Rerun button starts a brand-new run from the top with your current code, every step re-executing from scratch. That is the right tool for incident recovery (a bad deploy broke a batch of runs; you ship a fix and rerun them), but it is not the memo-preserving resume. The memo-preserving resume is the automatic retry you just watched inside the failing run, where the completed step stayed put.
You just set up the whole course environment and saw the nervous system work with your own eyes: the Skills are installed, your Neon store is provisioned with DATABASE_URL in .env, the dev-server MCP is live, and you ran a durable function, watched a step sleep without consuming compute, then broke a step and watched the automatic retry return the completed step from memo while only the broken one re-ran. That is the architecture this course is about. The rest of the course scales it up: real senses (cron, webhook, fan-out), stronger reflexes (the agent invocation inside step.run), real balance under load, and the human-approval gate that turns "the agent might mess this up" into "the agent drafts, a human approves, the action issues."
If something did not work, four problems cover almost all of it:
- The dev server cannot reach the function host: confirm the host is running on port 8000.
- The client is in Cloud mode: the agent dropped is_production=False and .env is missing INNGEST_DEV=1, so functions never register locally. Ask it to set one (an explicit is_production value wins over the env var).
- The function is missing from the dashboard: the host did not reload; restart it.
- A run hangs with no error and no progress: a de-synced host stalls silently; restart both the host and the dev server together, and run one host against one dev server. (One subtle cause: if :8288 was taken and the dev server came up on 8289+, re-pointing the inngest-dev MCP URL is not enough; the host still talks to :8288. Set INNGEST_BASE_URL=http://127.0.0.1:<port> on the host so it follows the dev server to the new port.)
If you hit any of these, the universal recovery move works here too: "Something didn't work. Read the error, tell me in plain language what you see, and propose one fix I can approve."
What you built, and where it grows
The environment is set up: the base is open, the Skills are installed, all three MCP servers are wired (Neon, Context7, inngest-dev), your Neon store has its customers and audit_log tables with DATABASE_URL in .env, and the dev server is running. You also saw the one idea the whole course rests on, the reflex of durable execution, with your own eyes. Part 4 builds the customer-support worker on this same base, in this same folder: it reads those customers and writes those audit rows, then wraps the whole thing in the full nervous system, a real event trigger, a daily cron that fans out, flow control, and the durable human-approval gate on refunds. Part 4 scales this step.run-and-step.sleep skeleton into a worker that does real work on your Neon store. If this Quick Win worked, the concepts ahead explain why each piece is shaped this way.
Part 1: The senses, how the world reaches the worker
An AI agent you call by hand runs when you call it. A real Production Worker has senses: it runs when the world reaches it. A customer emails, a webhook arrives, a cron fires at 09:00 daily, another worker hands off work. Each of these is a signal coming in, and a trigger is how the agent feels it. Part 1's five concepts are those senses: the event-driven mental model, the three ways the world reaches in (cron, webhook, event), the semantics that prevent double-processing, and the fan-out patterns that let one signal wake many workers.
Concept 1: Events vs requests, the durable mental model shift
Everything that follows in this course rests on one mental shift: from requests to events.
A request is a synchronous conversation. Someone calls; you handle; you return; they continue. A connection stays open; a human or service is waiting. If you crash, the caller gets an error. An agent you chat with at the prompt is a request: you typed, it streamed back, the conversation belonged to your terminal session.
An event is an asynchronous message. Something happened in the world (a customer signed up, an email arrived, a payment cleared), and the originator emits a named record of that fact. Zero, one, or many functions react to the event independently. No connection stays open. The originator does not know who is listening, does not wait for results, and is not blocked. The world has moved on.
# A request: I'm here, waiting, blocking
result = await agent.handle_customer_message(text=user_input)
print(result)  # I unblock when the agent finishes

# An event: I fire-and-forget
await inngest_client.send(events=[
    inngest.Event(
        name="customer/email.received",
        data={"customer_id": "c-4429", "body": email_body, "subject": subject},
    ),
])
# I return immediately. Somewhere else, one or more Inngest
# functions react to this event on their own schedule.
The shift sounds small. It is not. Once you think in events, durability and scale fall out almost for free, because:
- The producer cannot be slowed by the consumer (the email-receiver does not wait for the agent to finish drafting a reply).
- The consumer can crash and restart without losing the work (the event is durably stored; Inngest re-delivers it).
- New consumers can be added without changing producers (a second function, say an analytics counter, can subscribe to customer/email.received without the email-receiver knowing).
- Backpressure becomes a flow-control policy, not a code change (Inngest caps concurrency; the producer keeps firing; events queue).
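The third point, adding consumers without touching the producer, can be sketched with a tiny in-memory event bus. This is an illustration only: ToyEventBus and the handler names are hypothetical, and delivery here is synchronous and in-process, whereas a real engine stores the event durably and runs each consumer independently.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class ToyEventBus:
    """In-memory stand-in for an event stream: names map to subscriber lists."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._subscribers[event_name].append(handler)

    def send(self, event_name: str, data: dict) -> int:
        """Deliver to every subscriber; return how many handlers reacted."""
        handlers = self._subscribers.get(event_name, [])
        for handler in handlers:
            handler(data)
        return len(handlers)


bus = ToyEventBus()
drafts: list[str] = []
email_count = {"total": 0}


def draft_reply(data: dict) -> None:
    drafts.append(f"draft for {data['customer_id']}")


def count_email(data: dict) -> None:
    email_count["total"] += 1


def receive_email(customer_id: str) -> int:
    # The producer only knows the event name, never the consumers.
    return bus.send("customer/email.received", {"customer_id": customer_id})


reacted_before = receive_email("c-4429")  # zero subscribers: the event is still valid
bus.subscribe("customer/email.received", draft_reply)
bus.subscribe("customer/email.received", count_email)  # added later; producer unchanged
reacted_after = receive_email("c-4429")
print(reacted_before, reacted_after, drafts, email_count)
```

receive_email is identical before and after the analytics counter is added; only the subscription list changed.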
Predict. Your customer-support Worker takes 8 seconds to respond to an email: three seconds for the agent's reasoning, four seconds for two MCP tool calls, one second for the database write. At peak load you receive 50 emails per minute. If you use the request model (the email parser blocks until the agent finishes), how many parallel HTTP connections to your email parser does that imply? If you use the event model (the email parser fires an event and returns immediately), how many? Confidence 1-5.
The answer: the request model needs about 7 concurrent parsers (50/min × 8 seconds = ~6.7 parallel handlers, plus headroom). The event model needs one parser (it fires the event and returns in ~10ms; the event queue absorbs the 50/min spike; Inngest functions consume the queue at whatever concurrency you allow). The event model decouples production rate from consumption rate. That is not just a scaling fact; it is an architectural one. The event becomes a durable boundary between "what happened in the world" and "what the Worker does about it." Crash the consumer mid-processing and the event is still there to retry. Add three more consumer types and the producer does not notice. Events are how you stop owning the timing of work.
Try with AI
Walk me through three scenarios. For each, classify it as REQUEST-MODEL
or EVENT-MODEL, and explain which one fits better:
A) A user clicks "Submit refund request" in the support portal and
expects to see "Refund issued: $30" within 2 seconds.
B) A nightly cron job at 02:00 runs a customer-health-check across
all 5,000 customers and writes a report to Slack.
C) A customer sends an email to support@; we want a draft response
ready within 60 seconds for the on-call agent to review and send.
For each, name (a) what the human's expectation of timing is and
(b) what failure looks like if the model crashes mid-execution.
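Back to the Predict answer for a moment: its arithmetic is an instance of Little's law (average work in flight equals arrival rate times time per item). The sketch below just reruns those numbers; the function name in_flight is hypothetical, and the ~10 ms figure is the one quoted in the answer.

```python
import math


def in_flight(arrivals_per_minute: float, seconds_per_item: float) -> float:
    """Average number of items being handled at once (arrival rate x time per item)."""
    return arrivals_per_minute / 60.0 * seconds_per_item


request_model = in_flight(50, 8.0)   # the parser blocks for the full 8 seconds
event_model = in_flight(50, 0.010)   # the parser returns after about 10 ms

print(round(request_model, 2), math.ceil(request_model))
print(round(event_model, 4))
```

The request model rounds up to seven handlers busy at once, while the event model keeps a single parser occupied for well under one percent of the time. The 8 seconds of work still happens, but on the consumer side of the queue.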
Concept 2: Cron triggers, work that runs because time passed
The simplest trigger is the clock. Many things a Production Worker does are not reactions to outside events; they are scheduled work: daily health reports, weekly cleanups, hourly recalculations. Inngest's cron trigger is one line of code.
import inngest

# Assumes inngest_client (an inngest.Inngest client) and the two helpers,
# fetch_pro_customer_ids and fan_out_per_customer_events, are defined elsewhere.
@inngest_client.create_function(
    fn_id="daily-customer-health-check",
    trigger=inngest.TriggerCron(cron="0 9 * * *"),  # 09:00 every day, UTC
)
async def daily_health_check(ctx: inngest.Context) -> dict[str, int]:
    """Run a customer-health pass for every Pro/Enterprise customer."""
    customers = await ctx.step.run("fetch-pro-customers", fetch_pro_customer_ids)
    # fan out: one event per customer, one Worker run per event
    await ctx.step.run("fan-out", fan_out_per_customer_events, customers)
    return {"customers_scheduled": len(customers)}
Three things to notice:
- The schedule is just standard cron syntax. 0 9 * * * is 09:00 UTC every day; */15 * * * * is every 15 minutes; 0 9 * * 1 is Mondays at 09:00. Inngest evaluates the cron in UTC; if you need a different timezone, that is a TZ= prefix on the cron string (for example TZ=Europe/Paris 0 9 * * *), not a different concept.
- The function still uses ctx.step.run. Cron-triggered or event-triggered, the function shape is identical. Steps work the same. Durability works the same. Flow control works the same. The trigger is just how the function starts.
- The cron output is a regular Inngest function run. It shows up in the dashboard, has a run ID, has a trace, supports replay. If your Monday-morning cron run fails at step 3, Tuesday's cron will run normally and Monday's failure stays available for replay after you fix the bug.
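To make the three example schedules concrete, here is a minimal sketch of a five-field cron matcher using only the standard library. It is not how Inngest evaluates schedules, and it is deliberately incomplete: it handles only `*`, `*/n`, and comma-separated integers, with no ranges, names, or timezone prefixes. The function names are hypothetical.

```python
from datetime import datetime, timezone


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', '*/n', and comma-separated integers."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Treated as "divisible by n", which is right for minute and hour fields.
        return value % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}


def cron_matches(expression: str, moment: datetime) -> bool:
    """True when `moment` (assumed to be UTC) falls on the five-field schedule."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = moment.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


monday_0900 = datetime(2026, 5, 4, 9, 0, tzinfo=timezone.utc)    # a Monday
tuesday_0915 = datetime(2026, 5, 5, 9, 15, tzinfo=timezone.utc)  # a Tuesday

for expression in ("0 9 * * *", "*/15 * * * *", "0 9 * * 1"):
    print(expression, cron_matches(expression, monday_0900),
          cron_matches(expression, tuesday_0915))
```

Monday 09:00 matches all three schedules; Tuesday 09:15 matches only the every-15-minutes one.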
What happens if your service is down when the cron fires? This is the question that separates a durable scheduler from a fragile one. Inngest's cron runs are durably recorded the moment the schedule fires; if your function endpoint is unreachable, Inngest retries with backoff until it succeeds or hits the retry ceiling. The cron fired at 09:00 does not "miss" because your deploy was rolling at 09:00; the run waits, you finish your deploy, the run completes. Cron triggers in development have one quirk worth knowing about: the local dev server only fires crons while it is running. Production runs them on Inngest's infrastructure, which is always running.
Quick check. Three claims. Mark each True or False. (a) If a cron function takes 45 minutes to run and is scheduled every 15 minutes, three concurrent instances will be running at any given time. (b) You can use step.sleep inside a cron-triggered function to spread work across the day. (c) A cron-triggered function can also be invoked manually from the dashboard for testing.
Answers: (a) Depends on concurrency policy: by default Inngest will queue the overlapping runs; if you set concurrency=1 they serialize; if you set concurrency=10 they parallelize. The default is sane. (b) True, and it is a common pattern for "spread daily work across hours to smooth load." (c) True: the Inngest dashboard lets you invoke any function on demand for testing, regardless of its trigger.
Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
write a cron-triggered Inngest function in Python that:
1. Runs every Monday at 09:00 UTC.
2. Queries the audit_log table for all conversations resolved in the
prior week (status='resolved' in that window).
3. Computes per-agent metrics: total conversations resolved, average
resolution time, count of escalations, count of refunds issued.
4. Returns the metrics as a JSON object.
After you write the function, use the MCP's `invoke_function` tool to
test it manually (instead of waiting for Monday). Confirm the audit
SQL is correct by using `grep_docs` to search Inngest's docs for
"step.run" examples.
Concept 3: Webhook triggers, when the outside world calls in
The second trigger surface is HTTP. An external system (Stripe, your email provider, a customer-portal form, a GitHub webhook) wants to call your Worker. Without Inngest, you would have to: stand up an HTTPS endpoint, parse the payload, validate the source, write to a queue, write a worker consuming from the queue, handle retries, handle idempotency, ship telemetry. Each one is a week of infrastructure work.
With Inngest, the endpoint is provided. You configure a webhook in the Inngest dashboard with a URL like https://inn.gs/e/<your-key>, point Stripe (or whatever) at that URL, and the webhook payload becomes an event in your event stream. Any function with a matching event-name trigger now fires.
@inngest_client.create_function(
    fn_id="handle-stripe-refund-failed",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def on_refund_failed(ctx: inngest.Context) -> dict[str, str]:
    """Triggered by Stripe webhook → Inngest event → this function."""
    charge_id = ctx.event.data["charge_id"]
    customer_id = ctx.event.data["customer_id"]
    # Look up which support ticket originated this refund
    ticket = await ctx.step.run(
        "find-ticket-for-refund", lookup_ticket_by_charge, charge_id,
    )
    # Wake the customer-support Worker with the full context
    await ctx.step.run(
        "notify-support-agent",
        notify_support_agent_of_refund_failure,
        ticket_id=ticket["id"], charge_id=charge_id,
    )
    return {"ticket": ticket["id"], "action": "notified"}
The flow: Stripe fails to refund a charge → Stripe POSTs to the Inngest webhook URL → Inngest creates an event named stripe/charge.refund.failed → the function above (matching that event name) fires → the function uses steps to look up the ticket and notify the support agent. None of the HTTP plumbing is yours to write. No endpoint, no parser, no queue, no consumer.
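The routing step in that flow, "the function matching that event name fires," can be pictured as a lookup table from event name to subscribed handlers. The sketch below is a toy in-memory router written for illustration; on_event and dispatch are hypothetical names, not Inngest APIs. It shows that matching is exact: a name nobody subscribed to, including a near-miss typo, runs nothing and raises nothing.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], str]

# Registry: event name -> handlers subscribed to exactly that name.
_subscribers: dict[str, list[Handler]] = defaultdict(list)


def on_event(name: str) -> Callable[[Handler], Handler]:
    """Decorator that subscribes a handler to one exact event name."""
    def register(handler: Handler) -> Handler:
        _subscribers[name].append(handler)
        return handler
    return register


def dispatch(name: str, data: dict) -> list[str]:
    """Run every handler registered for `name`; unknown names match nothing."""
    return [handler(data) for handler in _subscribers.get(name, [])]


@on_event("stripe/charge.refund.failed")
def on_refund_failed(data: dict) -> str:
    return f"notified support about {data['charge_id']}"


matched = dispatch("stripe/charge.refund.failed", {"charge_id": "ch_1"})
typo = dispatch("stripe/charge.refund_failed", {"charge_id": "ch_1"})  # '_' instead of '.'
```

Here matched holds one result and typo is an empty list, which is the silent failure mode that makes event-name discipline matter.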
Two adjacent patterns worth naming:
- Generic JSON webhooks. If the source is not a known vendor, you point any JSON-emitting service at the same kind of endpoint and pick the event name. Slash-namespaced names (vendor/event.subtype) are the convention; nothing enforces it, but the dashboard sorts cleanly when you follow it.
- Webhook transforms. If the incoming payload does not match the shape you want, Inngest lets you define a "transform" function that runs server-side at receipt time and reshapes the event before it enters your event stream. This keeps your function code clean of provider-specific fields.
Predict. A Stripe webhook fires stripe/charge.refund.failed at the exact same millisecond as your customer-support Worker is also calling inngest_client.send to emit a different event named customer/refund.investigation_needed. Both events arrive in the system simultaneously; the function above triggers on the Stripe event only. Will the function run once or twice? Confidence 1-5.
The answer: once. The function is registered to trigger on stripe/charge.refund.failed only; the customer/refund.investigation_needed event has a different name and matches a different function (or no function, if you have not written one). An event's name is its routing key. Two events with different names never accidentally trigger the same function, even if they arrive at the same instant. This is one reason naming discipline matters: a typo in an event name (customer/email_received vs customer/email.received) means the function never fires, and the symptom is silent. Inngest's dashboard helps catch this: unmatched events appear in a separate stream you can audit.
Try with AI
I need to handle three webhook sources for my customer-support Worker:
A) Stripe: refund failed, charge disputed
B) Postmark (email service): bounced email, complaint
C) My internal admin UI: manual "investigate this ticket" button
For each, decide:
1. What event names you'd use (vendor/event.subtype format).
2. Whether the function reacting to it should run synchronously (the
caller is waiting) or asynchronously (fire and continue).
3. Whether you'd write a webhook transform to reshape the payload, or
consume it raw.
Then write the Inngest function for the Stripe refund-failed case in
Python, using the MCP's grep_docs to find the current syntax for
TriggerEvent and the dev-server MCP's send_event tool to test it.
Concept 4: Idempotency and event semantics, the same event firing twice
Webhooks are not exactly-once. They are at-least-once: the sender retries if it does not get an acknowledgment. Networks drop packets, services restart, your endpoint times out and the sender retries even though you actually succeeded. Without idempotency, every webhook system eventually double-bills, double-emails, or double-refunds someone. This is not a theoretical concern; it is the most common production bug in event systems.
Two layers of defense, both built into Inngest.
Layer 1: Event ID seeds at the source. When you send an event yourself (rather than receiving it from a webhook), you can attach an idempotency key:
await inngest_client.send(events=[
    inngest.Event(
        name="customer/refund.requested",
        data={"order_id": "o-4429", "amount_cents": 5000},
        id=f"refund-request-{order_id}-{request_timestamp}",  # idempotency key
    ),
])
If a second event with the same id is sent within the dedup window (24 hours by default), Inngest drops the duplicate. Same logical event, same id, only one function run.
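The dedup rule can be modeled in a few lines. This is a toy sketch of ID-based deduplication under the 24-hour window stated above, not Inngest's implementation; DedupWindow and accept are hypothetical names, and a real engine would persist this state rather than keep it in a dict.

```python
from datetime import datetime, timedelta, timezone


class DedupWindow:
    """Toy model of ID-based event deduplication (not Inngest's implementation)."""

    def __init__(self, window: timedelta = timedelta(hours=24)) -> None:
        self.window = window
        self._first_seen: dict[str, datetime] = {}

    def accept(self, event_id: str, now: datetime) -> bool:
        """Return True if the event should trigger work, False if it is a duplicate."""
        first = self._first_seen.get(event_id)
        if first is not None and now - first < self.window:
            return False  # same ID inside the window: drop it
        self._first_seen[event_id] = now  # first sighting, or the window has expired
        return True


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
dedup = DedupWindow()
first = dedup.accept("refund-request-o-4429", t0)
resend = dedup.accept("refund-request-o-4429", t0 + timedelta(hours=1))
next_day = dedup.accept("refund-request-o-4429", t0 + timedelta(hours=25))
```

The resend one hour later is dropped; the same ID 25 hours later is treated as a new event, which is why the ID should encode everything that makes the request logically unique.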
Layer 2: Step-level idempotency. Inside a function, each step.run is identified by its name. If a function crashes between step 3 and step 4, the retry re-runs the function code from the top, but for steps 1, 2, and 3, Inngest returns the stored outputs without re-executing the step body. Step 4 runs normally for the first time. This is what makes a function "durable": the side effects of completed steps do not re-happen on retry.
@inngest_client.create_function(
    fn_id="issue-customer-refund",
    trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def issue_refund(ctx: inngest.Context) -> dict[str, str]:
    # Step 1: look up the order. If the function retries, this returns
    # the SAME order data it computed the first time, from Inngest's memo.
    order = await ctx.step.run(
        "lookup-order", lookup_order_by_id, ctx.event.data["order_id"],
    )
    # Step 2: call Stripe. If the function retries AFTER this step
    # succeeded, the Stripe call does NOT happen again. The refund is
    # issued exactly once even if the function runs three times.
    refund = await ctx.step.run(
        "issue-stripe-refund", call_stripe_refund_api,
        charge_id=order["stripe_charge_id"],
        amount=ctx.event.data["amount_cents"],
    )
    # Step 3: write the audit row. Same property: runs at most once.
    await ctx.step.run(
        "audit-refund", write_audit_refund_issued,
        order_id=order["id"], refund=refund,
    )
    return {"refund_id": refund["id"]}
If this function crashes during step 3, the retry re-enters step 1 (gets cached order data, no DB call), re-enters step 2 (gets cached refund data, no Stripe call), runs step 3 for real, returns. The customer is refunded once, even if the function ran three times. This is the killer feature. It is what makes Inngest qualitatively different from a queue with a retry loop.
Inngest's memoization gives you exactly-once step completion from the function's perspective: once step.run records a step as successful, it will not re-execute. But there is a narrow window. If your step's body calls Stripe (the side effect happens on Stripe's servers), then crashes before Inngest records the result, the retry will re-call Stripe. From Inngest's perspective the step "did not complete." From Stripe's perspective the charge already happened. The production-grade pattern is Inngest step memoization plus provider-level idempotency keys: Stripe's Idempotency-Key header, Postmark's MessageID reuse, your own MCP server's idempotency contract. Treat step.run and provider idempotency keys as complementary, not substitutes: step.run keeps your function's internal logic exactly-once; the provider's idempotency key keeps the external side effect exactly-once.
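The narrow window is easier to see in a simulation. The sketch below uses a fake provider that honors idempotency keys; FakePaymentProvider and refund_step are hypothetical stand-ins, not Stripe's or Inngest's API. The first attempt performs the side effect and then crashes before its result could be recorded; the retry sends the same key, so the provider returns the original refund instead of issuing a second one.

```python
class FakePaymentProvider:
    """Stand-in for a provider that honors idempotency keys (hypothetical API)."""

    def __init__(self) -> None:
        self.refunds_issued = 0
        self._by_key: dict[str, dict] = {}

    def refund(self, charge_id: str, amount_cents: int, idempotency_key: str) -> dict:
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]  # replayed key: no second refund
        self.refunds_issued += 1
        result = {
            "id": f"re_{self.refunds_issued}",
            "charge_id": charge_id,
            "amount_cents": amount_cents,
        }
        self._by_key[idempotency_key] = result
        return result


def refund_step(provider: FakePaymentProvider, order_id: str, charge_id: str,
                amount_cents: int, crash_after_call: bool = False) -> dict:
    # The key is derived from stable inputs, so every retry sends the same key.
    key = f"refund-{order_id}"
    result = provider.refund(charge_id, amount_cents, idempotency_key=key)
    if crash_after_call:
        # Simulates dying after the side effect but before the result is recorded.
        raise RuntimeError("crashed before the step result was stored")
    return result


provider = FakePaymentProvider()
try:
    refund_step(provider, "o-4429", "ch_1", 5000, crash_after_call=True)  # attempt 1
except RuntimeError:
    pass
retry_result = refund_step(provider, "o-4429", "ch_1", 5000)  # attempt 2, same key
```

After both attempts, provider.refunds_issued is still 1. Without the key, the same two attempts would have refunded the customer twice.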
Quick check. True or false. (a) step.run makes the step idempotent only if the function inside is also idempotent. (b) An event with a duplicate ID outside the dedup window will be treated as a new event. (c) If step.run fails mid-execution (the step's code throws an exception), Inngest stores the failure and retries the step on the next attempt without re-running prior steps.
Answers: (a) False: step.run memoizes the step once it succeeds, so a non-idempotent body (like a Stripe call) is not re-executed on later attempts, which is exactly what you want. The one exception is the narrow crash-before-record window described above, which is why you still pass the provider's idempotency key. (b) True: Inngest's dedup window is 24 hours by default; events with the same ID after that window are treated as new. (c) True: the automatic retry is memoized; Inngest knows step 3 failed at attempt 1 and retries just step 3 on attempt 2. Prior successful steps do not re-execute. (This is the within-run retry, not the dashboard Replay button, which is a fresh run, Concept 14.)
Try with AI
Here are three scenarios. For each, decide: idempotency PROBLEM or
NO PROBLEM, and if it's a problem, what's the fix:
A) Stripe sends the same charge.refund.failed webhook three times
in 90 seconds (because their first two attempts timed out at
your endpoint). Your function emails the customer.
B) A customer clicks "Issue refund" three times because the page
was slow. Your function calls Stripe and writes audit_log.
C) Your daily cron at 09:00 sends a customer-health-check event
to each Pro customer. If two crons fire at the same time (a deploy
bug), what happens?
For each problem case, propose ONE specific fix: event ID seed
inside the function, idempotency key in inngest_client.send, or
function-level deduplication on the trigger.
Concept 5: Fan-out and sub-agent delegation, one event many Workers
Often a single event needs to trigger work in many places. The Stripe charge.refund.failed event might need to: notify the support agent, write to audit, update the customer's risk score, alert finance ops, post to Slack. Five reactions, all independent, all from one event.
The Inngest pattern: subscribe many functions to the same event. No fan-out code; just multiple @inngest_client.create_function decorators with the same TriggerEvent. Each function runs independently, has its own retries, has its own step trace, fails independently of the others.
@inngest_client.create_function(
    fn_id="refund-failed-notify-support",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def notify_support(ctx: inngest.Context) -> dict[str, str]:
    # ... runs the customer-support Worker to draft a response ...
    return {"status": "drafted"}


@inngest_client.create_function(
    fn_id="refund-failed-update-risk-score",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def update_risk_score(ctx: inngest.Context) -> dict[str, float]:
    # ... runs the risk-scoring Worker ...
    return {"new_risk_score": 0.42}


@inngest_client.create_function(
    fn_id="refund-failed-post-slack",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def post_to_slack(ctx: inngest.Context) -> None:
    # ... posts a Slack notification ...
    return None
One Stripe webhook arrives. Inngest creates one event. Three functions fire, each in its own run. If post_to_slack fails because Slack is down, the other two are unaffected and complete normally. The failed run sits in the dashboard for replay once Slack recovers. This is the core of multi-Worker coordination, and it is the architectural pattern your future manager layer (a later course) will compose at scale.
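The isolation property, one subscriber failing while its siblings complete, can be sketched with plain asyncio. This is an in-process analogy, not how Inngest schedules runs: the fan_out helper is hypothetical, and the three subscribers are stubs named after the functions above.

```python
import asyncio
from typing import Awaitable, Callable

Subscriber = Callable[[dict], Awaitable[str]]


async def notify_support(event: dict) -> str:
    return "drafted"


async def update_risk_score(event: dict) -> str:
    return "scored"


async def post_to_slack(event: dict) -> str:
    raise ConnectionError("Slack is down")


async def fan_out(event: dict, subscribers: list[Subscriber]) -> dict[str, str]:
    """Run every subscriber independently; one failure never cancels the others."""
    results = await asyncio.gather(
        *(sub(event) for sub in subscribers), return_exceptions=True
    )
    return {
        sub.__name__: f"failed: {res}" if isinstance(res, Exception) else res
        for sub, res in zip(subscribers, results)
    }


outcome = asyncio.run(
    fan_out({"charge_id": "ch_1"}, [notify_support, update_risk_score, post_to_slack])
)
```

The outcome records two successes and one failure. In Inngest the failed run would additionally keep its own retries and trace, which this sketch does not model.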
The other fan-out pattern: parent-fires-N-children. Sometimes the fan-out is dynamic. Your daily cron needs to fire a customer-health event for each Pro customer, which might be 500 or 5,000 depending on the week. The parent function sends N events:
from datetime import date


async def fan_out_per_customer_events(
    customers: list[str],
) -> int:
    events = [
        inngest.Event(
            name="customer/health_check.requested",
            data={"customer_id": cid},
            id=f"daily-health-{cid}-{date.today().isoformat()}",  # idempotency
        )
        for cid in customers
    ]
    await inngest_client.send(events=events)
    return len(events)
5,000 events get sent in a single send call. 5,000 function runs fire, each with its own customer_id, each isolated, each independently retryable. Flow control (Concept 11) caps how many run concurrently so you do not melt your downstream APIs. The cron function returns in seconds; the fan-out runs at whatever rate Inngest's flow-control policies allow.
Sub-agent delegation is a special case of fan-out. Inside a Worker run, you can call await inngest_client.send(...) to delegate sub-tasks to other Worker types. The parent does not wait for the children unless it explicitly uses step.invoke to run them synchronously and collect their results.
Predict. You have three functions all triggered by customer/email.received: the customer-support agent that drafts a reply (15 seconds), an analytics counter (50ms), and a "VIP detector" that checks if the customer is high-value (200ms). When an email arrives, what does the user-visible latency look like for each? Three options: (a) all three add up to ~15 seconds; (b) all three run in parallel, total latency is ~15 seconds (the slowest); (c) each runs independently with no shared latency at all. Confidence 1-5.
The answer: (c). Each function is its own run, in its own process slot. The customer-support agent does not block the analytics counter; the VIP detector does not block the agent. From the outside, the latency for any particular function is just that function's own time. No function ever waits on a sibling function. This is why fan-out scales: the consumers are isolated. If the agent crashes, the analytics counter is unaffected.
Try with AI
Design the fan-out architecture for these three scenarios. For each,
sketch the event names and the functions that subscribe:
A) New customer signs up. Need to: send welcome email, create
Stripe customer, post to Slack #new-customers, write to
audit_log, schedule a 7-day follow-up.
B) Customer support email arrives. Need to: draft a reply (agent),
detect sentiment, check if VIP, update customer's "last contact"
timestamp, attach to the right ticket thread.
C) Daily cron at 09:00 needs to run customer-health-check on
~5,000 Pro customers. Each check takes ~30 seconds. We want
the whole batch to complete by 11:00 (a 2-hour window).
For each, decide: how many event types, how many subscriber
functions, what the idempotency story is, and one specific failure
mode this design protects against.
Part 2: The reflexes, what happens when something breaks
The senses wake the worker. The reflexes make the worker survive what comes next. A worker calls an agent, the agent calls a few tools, the tools call a database and a payment API and a model: several external calls in a single turn, any of which can fail. Without durability, a single transient failure mid-turn restarts the whole flow from the top. A reflex is automatic: it acts fast, without the agent's mind having to decide. That is what durable execution gives you. Durability is the property that says: when something fails mid-execution, the work already completed stays completed, and execution resumes from where it broke. Inngest delivers this with one primitive (step.run) and a memoization mechanic underneath. Part 2 explains both, plus the time-based variants (step.sleep, step.wait_for_event), the retry semantics, and the step.ai primitives.
First-pass compression note. If you are scanning, the load-bearing concepts are 6 (step.run) and 7 (memoization). Concepts 8-10 build on them. Read 6 and 7 carefully; the rest will read fast once you have those two in your head.
Concept 6: step.run and the durable function model
A normal Python function runs once, top to bottom. If it crashes halfway through, you start over from the top. If it makes three API calls before crashing, the next attempt makes those three calls again, and pays for them, and possibly double-charges someone, again.
An Inngest function is durable. Each operation you want to be checkpointed gets wrapped in step.run(name, fn, ...). The function still runs top to bottom on each attempt, but steps that have already completed return their stored outputs instead of re-executing. The function "catches up" to where it broke, then continues forward.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str | bool]:
    customer_id = ctx.event.data["customer_id"]
    # Step 1: load the customer record (one DB call)
    customer = await ctx.step.run(
        "load-customer", load_customer_by_id, customer_id,
    )
    # Step 2: load the conversation thread (one DB call)
    thread = await ctx.step.run(
        "load-thread", load_thread_for_customer, customer_id,
    )
    # Step 3: run the OpenAI Agents SDK agent (your worker)
    response = await ctx.step.run(
        "run-agent",
        run_customer_support_agent,
        customer=customer,
        thread=thread,
        email_body=ctx.event.data["body"],
    )
    # Step 4: write the draft reply to the database
    await ctx.step.run(
        "save-draft-reply", save_reply,
        customer_id=customer_id, text=response.draft,
    )
    # Step 5: notify the on-call human reviewer via Slack
    await ctx.step.run(
        "notify-reviewer", post_slack_for_review, response=response,
    )
    return {"status": "drafted", "reviewer_notified": True}
Five steps. Each one is independently checkpointed.
What durability buys you here, in three failure scenarios:
- Scenario A: the agent step throws a timeout. Without step.run wrapping the agent call, the next retry of this function reloads the customer, reloads the thread, and reruns the agent from scratch, paying for OpenAI tokens twice for work the agent already partially did. With step.run, the customer and thread loads are memoized (steps 1-2 do not re-execute); only step 3 retries. Inngest's automatic retries handle transient OpenAI errors without your code knowing.
- Scenario B: the function process gets killed between step 3 and step 4 (a deploy rolled out, a node restarted, the container OOMed). Without durability, the agent's response is lost and the customer's email goes unanswered until someone notices. With durability, the function resumes after the restart: steps 1, 2, 3 return their stored outputs in milliseconds, step 4 runs for real, step 5 runs for real, the customer gets the drafted reply.
- Scenario C: Slack returns a 503 on step 5. Without step.run, you would either lose the work or write retry-and-backoff logic by hand for the Slack call specifically. With step.run, Inngest retries step 5 with exponential backoff until Slack recovers; meanwhile steps 1-4 stay completed and will not re-execute. The draft reply is already in the database; the notification is the only thing pending.
You do not write any retry loops, any "did I already do this" checks, any state machines. The state machine is the sequence of step.run calls. Each step is a node; each transition is durable.
The one rule of step.run. The function passed to step.run should be deterministic given its inputs: calling it twice with the same arguments should produce the same result.
- That is automatic for pure functions.
- It is automatic for idempotent API calls (Stripe's idempotency_key, your own MCP server tools).
- It requires care for things like "generate a random ID" or "call an LLM with default temperature" (a retry could produce different output than the original attempt, which sometimes matters).
When the operation is not deterministic, you make it deterministic: pass a seed, pre-generate the random value outside the step, or accept that the retry may differ from the original (often fine for an agent response).
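One way to make "generate a random ID" deterministic is to derive the ID from inputs that are the same on every attempt, such as the triggering event's ID. The sketch below assumes you have such a stable event ID available; stable_id is a hypothetical helper, and UUIDv5 is used only because it is a deterministic hash of its inputs.

```python
import uuid


def stable_id(event_id: str, purpose: str) -> str:
    """Derive the same ID on every attempt from stable inputs (UUIDv5 is a pure hash)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{event_id}/{purpose}"))


first_attempt = stable_id("evt-123", "draft-reply")
retry_attempt = stable_id("evt-123", "draft-reply")   # same inputs, same ID
other_purpose = stable_id("evt-123", "audit-row")     # different purpose, different ID
```

A retry that recomputes the ID gets the value the first attempt used, so downstream writes keyed on it land on the same row instead of creating a second one.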
Quick check. True or false. (a) The function body re-executes from the top on every retry, including all the imports and variable assignments outside step.run calls. (b) If a step takes 30 seconds to complete, and the function crashes 25 seconds in, the retry continues that step from second 25. (c) step.run outputs are stored in Inngest's infrastructure, not in your application.
Answers: (a) True, and this is why you keep the work inside step.run. Code outside step.run re-runs on every retry; code inside runs once per attempt and is memoized on success. (b) False: step.run is the atomic unit; if a step is interrupted, the retry re-runs the entire step. If your step is so long it cannot be allowed to restart, you break it into smaller steps. (c) True: the step output store is part of Inngest, not your DB. This is why you can replay runs even after your database schema has changed.
Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
shape a customer-support worker into an Inngest durable function.
Take a Runner.run call that processes a customer email and wrap each
of these inside its own step.run:
1. Load the customer record
2. Load the related conversation thread
3. Run the agent (the OpenAI Agents SDK Runner)
4. Persist the draft reply
5. Notify the on-call reviewer
Use grep_docs to find the current Python SDK syntax. Use
invoke_function to test it with a synthetic email payload. Then
deliberately raise an exception in step 4 and use get_run_status
to confirm steps 1-3 don't re-execute on retry.
Concept 7: Memoization, the mechanic underneath resumability
Concept 6 said "steps that have already completed return their stored outputs instead of re-executing." That mechanism is memoization, and it is worth understanding in detail, because every other Inngest primitive uses it.
When you call await ctx.step.run("load-customer", load_customer_by_id, "c-4429"), three things happen on the first attempt:
- Inngest checks its memo store: "is there a stored result for step load-customer in this run?" There is not.
- The function load_customer_by_id("c-4429") runs. It returns {"id": "c-4429", "tier": "pro", ...}.
- Inngest writes that result into the memo store, keyed by (run_id, step_name="load-customer"). Then it returns the result to your code.
If the function crashes later in the run, after this result has been stored, and Inngest retries, the function body re-runs from the top on the second attempt. When execution reaches the same line, three different things happen:
- Inngest checks its memo store: "is there a stored result for step load-customer in this run?" Yes, it was stored on attempt 1.
- The function load_customer_by_id("c-4429") does not run. The DB call does not happen.
- Inngest returns the stored result to your code in milliseconds.
This is why retries are cheap: the expensive work is already cached. It is why durability is correct: the expensive work does not happen twice. And it is why the "function body re-runs top to bottom" is fine despite sounding wasteful: the work inside steps does not actually re-run; only the orchestration code between steps does.
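The mechanic fits in about a dozen lines. The sketch below is a toy, synchronous, in-memory model of a per-run memo store, not Inngest's implementation; ToyRun, handler, and the two step bodies are hypothetical names. The handler runs top to bottom twice, but the first step's body executes only once.

```python
from typing import Any, Callable


class ToyRun:
    """Toy per-run memo store keyed by step name (not Inngest's implementation)."""

    def __init__(self) -> None:
        self.memo: dict[str, Any] = {}

    def step(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        if name in self.memo:       # completed on an earlier attempt
            return self.memo[name]  # return the stored output, skip fn
        result = fn(*args)          # first execution of this step
        self.memo[name] = result    # recorded only after fn succeeds
        return result


calls = {"load": 0, "save": 0}


def load_customer(customer_id: str) -> dict:
    calls["load"] += 1
    return {"id": customer_id, "tier": "pro"}


def save_draft(customer: dict, fail: bool) -> str:
    calls["save"] += 1
    if fail:
        raise TimeoutError("transient failure")
    return f"saved draft for {customer['id']}"


def handler(run: ToyRun, fail_save: bool) -> str:
    # The body always runs top to bottom; memoized steps return immediately.
    customer = run.step("load-customer", load_customer, "c-4429")
    return run.step("save-draft", save_draft, customer, fail_save)


run = ToyRun()
try:
    handler(run, fail_save=True)         # attempt 1: the second step fails
except TimeoutError:
    pass
result = handler(run, fail_save=False)   # attempt 2: same run, same memo
```

After both attempts, load_customer has been called once and save_draft twice. A failed step leaves no memo entry, so only the step that failed is paid for again.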
The implication that surprises new users. Code outside step.run runs on every attempt. If you do this:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    # ANTI-PATTERN: this runs on every retry. Don't do this.
    expensive_thing: dict = await fetch_expensive_data(ctx.event.data["id"])
    await ctx.step.run("do-something", do_something_with, expensive_thing)
    return {"status": "done"}
fetch_expensive_data runs on every retry. If it costs $0.10 a call and the function retries 5 times, you just spent $0.50 fetching the same data five times. The fix is to wrap the expensive thing in its own step:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    expensive_thing: dict = await ctx.step.run(
        "fetch-expensive-data", fetch_expensive_data, ctx.event.data["id"],
    )
    await ctx.step.run("do-something", do_something_with, expensive_thing)
    return {"status": "done"}
Now fetch_expensive_data is memoized; retries do not pay for it again.
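The arithmetic behind that claim, as a small worked example. It assumes the fetch itself succeeds on the first attempt and the retries are caused by a later step; fetch_cost_cents is a hypothetical helper, and amounts are in integer cents to avoid floating-point noise.

```python
def fetch_cost_cents(cost_per_call_cents: int, retries: int, wrapped: bool) -> int:
    """Total spend on one fetch across the first attempt plus `retries` retries."""
    attempts = 1 + retries
    # A memoized step pays once; unwrapped code pays again on every attempt.
    return cost_per_call_cents if wrapped else cost_per_call_cents * attempts


unwrapped = fetch_cost_cents(10, retries=5, wrapped=False)  # 6 calls in total
wrapped = fetch_cost_cents(10, retries=5, wrapped=True)     # 1 call in total
wasted = unwrapped - wrapped                                # the 5 redundant calls
```

With a $0.10 fetch and five retries, the unwrapped version spends 60 cents and the wrapped version 10 cents: the 50-cent difference is the five redundant calls.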
The step name is the memo key. This is why step names must be unique within a function. If you have two step.run("load-customer", ...) calls in the same function, Inngest will return the first one's stored output for both calls. That is almost never what you want. If you have a loop that calls a step N times, name them uniquely (step.run(f"load-customer-{i}", ...)) so each iteration has its own memo slot.
Predict. Your function has three steps. Step 1 (load-customer) costs $0.01 in DB calls and takes 100ms. Step 2 (run-agent) costs $0.20 in OpenAI tokens and takes 12 seconds. Step 3 (save-draft) costs $0.005 in DB calls and takes 50ms. Step 2 fails 30% of the time due to OpenAI rate limits; Inngest retries with backoff. What is the cost difference between (a) wrapping all three in step.run and (b) wrapping only step 2 in step.run? Confidence 1-5.
The answer: with (a), a single retry costs you step 2 only ($0.20). The customer load and the save-draft are memoized; they do not re-execute. With (b), every retry costs you steps 1 and 3 plus step 2: $0.215 per retry. Over a thousand emails with a 30% retry rate, that is a difference of about $4.50 in pure waste (300 retries × $0.015), plus the operational complexity of figuring out what got partially written when step 3 ran twice. Wrap everything you do not want re-executed in step.run. It is not optional once you understand the mechanic.
Try with AI
With my AI coding assistant: review the Inngest function we built
in Concept 6's Try-with-AI and identify any code BETWEEN step.run
calls that should be wrapped in its own step but isn't. Common
candidates:
- Computed values (timestamps, IDs, formatting) that we want to be
stable across retries
- Calls to logging or metrics services
- Reads from Redis, environment variables, secret managers
Then propose a refactor that moves each of these into its own step
with a meaningful name. For each, explain whether the side effect
is one you want to happen once (use step.run) or every retry
(leave it outside).
Concept 8: step.sleep and step.wait_for_event, durability through time
Some work has to wait. A welcome-email pipeline sends an email immediately, then waits three days, then sends a follow-up. A refund-investigation needs to wait for a human to approve. A trial-conversion flow watches for "user upgraded to paid" within 7 days and sends a different email depending on what it sees.
In a normal Python function, "wait three days" means hold a process open for three days. That is untenable: your process restarts, your hosting bills you for 72 hours of idle compute, your timer gets lost. In Inngest, "wait three days" is one line:
from datetime import timedelta


@inngest_client.create_function(
    fn_id="trial-welcome-series",
    trigger=inngest.TriggerEvent(event="user/trial.started"),
)
async def welcome_series(ctx: inngest.Context) -> dict[str, str]:
    user_id = ctx.event.data["user_id"]
    await ctx.step.run("send-welcome-email", send_welcome_email, user_id)
    # Wait three days. The function gets paged out of memory. Nothing
    # is consuming compute. Three days later, Inngest pages it back in
    # and resumes execution at the next line.
    await ctx.step.sleep("wait-three-days", timedelta(days=3))
    await ctx.step.run("send-followup", send_followup_email, user_id)
    return {"status": "completed"}
step.sleep is durable, the nervous system at rest. The function suspends; Inngest stores the resume time; nothing consumes compute while you wait; the function resumes at the right time, with all prior step outputs still memoized. step.sleep (and step.sleep_until) can wait up to one year on paid plans, up to seven days on the free Hobby plan (Inngest usage limits). The seven-day Hobby ceiling is wide enough for every sleep this course uses.
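Why a durable sleep costs nothing while it waits: the engine only has to store a wake-up timestamp and check it later. The sketch below is a toy model of that idea, not Inngest's implementation; SleepStore and its methods are hypothetical, and the dict stands in for persistent storage that would survive restarts.

```python
from datetime import datetime, timedelta, timezone


class SleepStore:
    """Toy table of run_id -> wake-up time (stands in for the engine's storage)."""

    def __init__(self) -> None:
        self.wake_at: dict[str, datetime] = {}

    def sleep(self, run_id: str, now: datetime, duration: timedelta) -> None:
        # Store a timestamp and return; no process is held open while waiting.
        self.wake_at[run_id] = now + duration

    def due(self, now: datetime) -> list[str]:
        """Runs whose wake-up time has passed and should be resumed."""
        return sorted(run for run, when in self.wake_at.items() if when <= now)


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
store = SleepStore()
store.sleep("run-1", start, timedelta(days=3))

after_two_days = store.due(start + timedelta(days=2))
after_three_days = store.due(start + timedelta(days=3))
```

Nothing is due after two days; after three, run-1 is returned for resumption. Your service can restart any number of times in between, because the only state that matters is the stored timestamp.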
The more powerful sibling is step.wait_for_event. Instead of waiting for time, wait for another event. The function suspends until a matching event arrives, or until a timeout you set expires. This is what makes Inngest the cleanest expression of HITL (Concept 15) and inter-agent coordination patterns:
@inngest_client.create_function(
    fn_id="refund-with-approval",
    trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def refund_with_approval(ctx: inngest.Context) -> dict[str, str]:
    request = ctx.event.data
    request_id = request["request_id"]
    # If the amount is $100 or more, require approval before issuing
    if request["amount_cents"] >= 10_000:
        # Notify a human via Slack/email/whatever
        await ctx.step.run("notify-approver", notify_human_approver, request)
        # Wait for an approval event. Up to 24 hours; expires otherwise.
        approval = await ctx.step.wait_for_event(
            "wait-for-approval",
            event="refund/approval.decided",
            timeout=timedelta(hours=24),
            if_exp=f"async.data.request_id == '{request_id}'",
        )
        if approval is None or not approval.data.get("approved"):
            return {"status": "rejected_or_timeout"}
    # Either it was under $100, or it was approved
    refund = await ctx.step.run(
        "issue-stripe-refund", call_stripe_refund_api, request,
    )
    return {"status": "issued", "refund_id": refund["id"]}
What is happening:
- The function reaches wait_for_event. It suspends. Zero compute consumed.
- A human looks at the Slack notification, clicks "Approve" in your admin UI, your UI calls inngest_client.send(events=[Event(name="refund/approval.decided", data={"request_id": "...", "approved": True})]).
- Inngest matches the event to the waiting function (the if_exp ensures only events for this request_id match) and resumes the function with the event as the approval return value.
- The function continues to the refund step. The Stripe refund happens after the human approved.
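The matching rule in that sequence (right event name, right request_id, before the deadline) can be sketched as a small predicate check. This is a toy model of a single suspended wait, not Inngest's implementation; Waiter and offer are hypothetical names, and the Python lambda stands in for the if_exp filter expression.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class Waiter:
    """Toy model of one suspended wait (not Inngest's implementation)."""

    def __init__(self, event_name: str, match: Callable[[dict], bool],
                 deadline: datetime) -> None:
        self.event_name = event_name
        self.match = match
        self.deadline = deadline

    def offer(self, name: str, data: dict, now: datetime) -> Optional[dict]:
        """Return the event data if it resumes this wait, else None."""
        if now > self.deadline:
            return None  # timed out: the waiting code sees None
        if name == self.event_name and self.match(data):
            return data
        return None      # wrong name or wrong request: keep waiting


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
waiter = Waiter(
    "refund/approval.decided",
    match=lambda d: d.get("request_id") == "req-7",
    deadline=start + timedelta(hours=24),
)

one_hour = start + timedelta(hours=1)
wrong_request = waiter.offer(
    "refund/approval.decided", {"request_id": "req-8", "approved": True}, one_hour)
approved = waiter.offer(
    "refund/approval.decided", {"request_id": "req-7", "approved": True}, one_hour)
too_late = waiter.offer(
    "refund/approval.decided", {"request_id": "req-7", "approved": True},
    start + timedelta(hours=25))
```

An approval for a different request is ignored, the matching approval resumes the wait with its data, and the same approval arriving after the deadline yields None, which the calling code has to handle explicitly.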
step.sleep and step.wait_for_event are timeouts you do not pay for. The function looks synchronous in your code ("wait three days, then send the email"), but the runtime semantics are async and durable. This is one of the two things Inngest is famous for (durable retries being the other). Without it, the alternative is a queue plus a state machine plus a database plus a poller, and you would write a thousand lines instead of three.
Quick check. Three claims. Mark each True or False. (a) If step.sleep is set for 30 days and your service is redeployed five times in those 30 days, the sleep continues uninterrupted on a paid plan. (b) If step.wait_for_event times out, the function raises an exception. (c) Two step.wait_for_event calls in the same function can wait for the same event simultaneously.
Answers: (a) True on a paid plan: sleeps are stored in Inngest's infrastructure, not in your service's memory, so redeploys do not lose them. Note the tier ceiling: a 30-day sleep is fine on a paid plan but exceeds the free Hobby plan's seven-day sleep cap. (b) False: on timeout, wait_for_event returns None. Your code checks for it and decides what to do (rejection, escalation, default-approval, whatever the policy is). (c) True, but suspicious: both will fire when a matching event arrives. If the two wait_for_event calls have different if_exp filters, this is fine. If they are identical, you are probably looking at a refactor opportunity.
Try with AI
Build a delayed-investigation flow with my AI coding assistant.
Specification:
1. Triggered by event 'customer/refund.failed'.
2. Immediately notify the on-call human via Slack with the refund
details and an "Investigate" button.
3. Wait for the human to click the button (which fires
'customer/refund.investigation_started') for up to 4 hours.
4. If the click arrives in time: run the agent to draft an
investigation summary.
5. If 4 hours pass without a click: escalate to a senior reviewer
by firing 'customer/refund.escalated'.
Use the dev-server MCP's send_event tool to simulate the
human-click event during testing. Use get_run_status to inspect
how the suspended function shows up in the dashboard. Before
writing, use list_docs to scan the Inngest documentation tree
for the right page on wait_for_event semantics, then
read_doc on the page you find to get the exact syntax for
the if_exp filter expression.
Concept 9: Retries, error handling, dead-letter
This is the reflex in close-up. By default, Inngest retries failed steps. The defaults are sensible: ~4 retries with exponential backoff, ranging from a few seconds to a few minutes between attempts. After the final retry fails, the run enters a failed state and stays there for inspection and (optionally) replay. You can tune this per function: retries=10, retries=0 (do not retry at all), specific exception types that should not be retried.
@inngest_client.create_function(
    fn_id="charge-customer",
    trigger=inngest.TriggerEvent(event="order/checkout.completed"),
    retries=2,  # only retry twice; this involves Stripe; don't keep hammering
)
async def charge_customer(ctx: inngest.Context) -> dict[str, str]:
    try:
        charge = await ctx.step.run(
            "call-stripe", call_stripe_charge, ctx.event.data,
        )
        return {"status": "charged", "charge_id": charge["id"]}
    except StripeCardDeclinedError as e:
        # A declined card is not a transient failure. Don't retry.
        # Mark the order as failed in our database and emit an event
        # for the dunning flow.
        await ctx.step.run(
            "mark-failed", mark_order_failed,
            ctx.event.data["order_id"], reason=str(e),
        )
        await ctx.step.run(
            "emit-dunning-event", emit_dunning, ctx.event.data["order_id"],
        )
        return {"status": "card_declined"}
Three patterns matter.
Pattern 1: Transient vs permanent failures. Inngest retries everything by default, but some errors are not transient. A card-declined error from Stripe will be declined again on retry. A 401-unauthorized from your downstream API will not become a 200 just because you wait. Your function should catch these specifically and handle them: write to your DB, emit a downstream event, return cleanly, so they do not waste retry budget on hopeless attempts. Inngest's NonRetriableError explicitly tells Inngest to skip retries for a thrown exception.
Pattern 2: Step-level vs function-level errors. A step that throws is retried. After step-level retries are exhausted, the function fails. Sometimes you want a function to survive a failing step: log the failure, mark the work as "partial," continue. Wrap the step.run in try/except. The step still gets its retries; if all retries fail, the exception propagates to your catch block, where you can decide what to do.
Pattern 3: Dead-letter and replay. When a function fully fails, it does not disappear. It enters the Inngest dashboard's "failed runs" view, with the full trace, all step outputs, the exception, and a Replay button. After you ship a bug fix, you can replay the failed runs: each re-runs from the top on the fixed code (a fresh run, not a memo-preserving resume; Concept 14 covers that distinction). This is the "dead-letter queue" pattern from traditional queues, except you do not write the dead-letter handler. You just fix the bug and replay, keeping side-effecting steps idempotent so a re-run does not double-act.
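Patterns 1 and 2 can be sketched without any runtime at all. The block below is a plain-Python model, not the Inngest SDK: `PermanentError` is a hypothetical stand-in for the role NonRetriableError plays, and the backoff base and factor are assumptions chosen for readability, not Inngest's actual schedule.

```python
class PermanentError(Exception):
    """Hypothetical marker for failures retrying cannot fix (e.g. card declined)."""


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor**i for i in range(retries)]


def run_step(fn, retries: int = 4, sleep=lambda seconds: None):
    """Call fn; retry transient errors up to `retries` times; never retry PermanentError.

    Returns (result, attempts). `sleep` is injected so the sketch runs instantly;
    pass time.sleep to wait for real.
    """
    attempts = 0
    for delay in [0.0] + backoff_delays(retries):
        sleep(delay)
        attempts += 1
        try:
            return fn(), attempts
        except PermanentError:
            raise                      # Pattern 1: permanent, so give up immediately
        except Exception as exc:
            last_error = exc           # transient: fall through to the next attempt
    raise last_error                   # retry budget exhausted


calls = {"flaky": 0, "declined": 0}


def flaky():
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TimeoutError("transient")
    return "ok"


def declined():
    calls["declined"] += 1
    raise PermanentError("card_declined")


result, attempts = run_step(flaky)

# Pattern 2: wrap the step so the caller survives its failure and decides what to do.
try:
    run_step(declined)
    status = "charged"
except PermanentError:
    status = "card_declined"
```

The flaky call succeeds on its third attempt; the declined call is tried exactly once and the caller returns cleanly instead of burning retry budget.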
Predict. Your function calls Stripe in step 2 and your customer data service in step 4. Stripe returns 503 (service unavailable, transient) on the first attempt of step 2. Step 2 retries 4 times with exponential backoff (~1s, ~2s, ~5s, ~12s); on the 4th retry, Stripe is back, the charge succeeds. Now step 4 runs, and the data service is down with a 500. Does Inngest retry the whole function, or just step 4? How many times? Confidence 1-5.
The answer: just step 4, and it gets its own retry budget. Steps do not share retries. Step 2's four retries are independent of step 4's. Inngest will retry step 4 (default ~4 times) and if the data service comes back, step 4 completes, and the function succeeds. The Stripe charge from step 2 is not re-issued, because step 2's output was memoized after its successful retry. The customer is charged exactly once even though the function spent 20 seconds across retries.
Try with AI
With my AI coding assistant: extend the customer-support Worker
function from Concept 6 with explicit retry and failure handling.
Specification:
1. The OpenAI Agents SDK call should retry 3 times on transient
failures (rate limit, timeout), but NOT retry on a content-policy
refusal from the model.
2. The Slack notification should retry up to 10 times (Slack is
often flaky; don't lose the notification).
3. The Postgres write should retry once; if it fails again, log the
failure and continue (don't fail the whole function over a
transient DB blip).
For each step, decide what's transient vs permanent and structure
the try/except accordingly. Use grep_docs to find the Python SDK's
NonRetriableError equivalent.
Concept 10: step.run for AI calls in Python (step.ai.wrap is TypeScript-only)
Concepts 6-9 work for any side-effecting code: DB writes, API calls, file writes, agent invocations. Inngest also ships AI-specific step primitives that handle the patterns LLM calls are prone to: rate-limit retries, observability into prompts and responses, and (optionally) inference proxying that reduces serverless compute costs.
Important Python-vs-TypeScript note up front. Inngest's step.ai module has two methods, and they have different language support. step.ai.infer() is available in both TypeScript and Python (Python SDK v0.5+): it offloads inference to Inngest's infrastructure and traces the call. step.ai.wrap() is TypeScript only: there is no Python equivalent today. For Python projects (like this course's Worker), the correct pattern for wrapping an OpenAI Agents SDK call is ctx.step.run(...), which already gives you full durability, retries, and observability of the wrapped step's inputs and outputs. You just do not get the LLM-specific prompt/response telemetry that the TypeScript step.ai.wrap adds. (Verified against the AI Inference docs as of May 2026.)
step.run for OpenAI calls in Python (the recommended pattern). Your function makes the OpenAI call inside ctx.step.run("name", fn, ...). Inngest traces the inputs and outputs of the step (the arguments you passed and what was returned), retries on transient failures, and memoizes the result so retries of later steps do not re-pay the OpenAI cost. The prompt and response are recorded as the step's input/output in the dashboard:
from openai import AsyncOpenAI

oai = AsyncOpenAI()

async def call_openai_summary(thread_text: str) -> str:
    """A normal async function. Inngest doesn't care that this is an LLM call."""
    response = await oai.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Summarize this support thread in 3 sentences."},
            {"role": "user", "content": thread_text},
        ],
    )
    return response.choices[0].message.content

@inngest_client.create_function(
    fn_id="summarize-customer-thread",
    trigger=inngest.TriggerEvent(event="customer/thread.summary_requested"),
)
async def summarize_thread(ctx: inngest.Context) -> dict[str, str]:
    thread: list = await ctx.step.run(
        "load-thread", load_thread, ctx.event.data["thread_id"],
    )
    # The OpenAI call is wrapped in step.run. Inngest sees this as a step:
    # the inputs (formatted thread text) are recorded, the output (summary
    # string) is recorded, the call is memoized on success, and retries are
    # automatic on transient failures.
    summary: str = await ctx.step.run(
        "openai-summary", call_openai_summary, format_thread(thread),
    )
    return {"summary": summary}
In the dashboard, this run shows the function's step trace (load-thread followed by openai-summary) with the inputs and outputs of each step. If OpenAI returned a 429 (rate limited), Inngest retries openai-summary with backoff automatically: same memoization semantics as Concept 7, so retries do not double-bill the prior load-thread step. What you do not get (compared to TypeScript's step.ai.wrap): automatic LLM-specific telemetry like token counts, model name, and provider-specific traces broken out in the dashboard's AI view. For most Python production workloads, the standard step trace plus your own OpenAI client telemetry (for example, the OpenAI Agents SDK's tracing) covers this gap.
Because step.run records each step's inputs and outputs to Inngest's observability store, the content you pass through a step is stored and visible in the dashboard. If your prompt includes PII (names, emails, addresses), secrets (API keys, internal tokens), contractual or financial data, or regulated content (HIPAA, GDPR-scoped data, PCI), do not pass the raw content into the step body. Redact, hash, summarize, or pass a reference (a customer_id and ticket_id, not the full ticket text) and reload the sensitive content inside the step body from your authoritative store, where retention and access controls are yours to configure. The same discipline applies to the OpenAI Agents SDK's own tracing if you enable it. Treat step traces as you would treat any production log: useful by default, regulated by policy.
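The pass-a-reference discipline is easy to see in a toy model. This sketch uses a hypothetical `run_step` helper that records inputs and outputs the way a step trace would; `TICKETS`, `trace`, and `classify_by_reference` are illustration-only names, and the keyword check is a placeholder for the real LLM call.

```python
# Authoritative store: the only place the raw, sensitive text lives.
TICKETS = {
    ("c-42", "t-7"): "Hi, I'm Dana Smith (dana@example.com); my card was charged twice.",
}

trace = []   # stands in for the step inputs/outputs a dashboard would display


def run_step(name, fn, *args):
    """Toy stand-in for a durable step: run fn, record its inputs and output."""
    output = fn(*args)
    trace.append({"step": name, "input": list(args), "output": output})
    return output


def classify_by_reference(customer_id: str, ticket_id: str) -> dict:
    """Reload the sensitive text inside the step body; return only safe fields."""
    text = TICKETS[(customer_id, ticket_id)]
    category = "billing" if "charged" in text else "other"   # placeholder for the LLM
    return {"ticket_id": ticket_id, "category": category}


result = run_step("classify-ticket", classify_by_reference, "c-42", "t-7")
```

The recorded trace holds only the two IDs and the category. Note that the step's return value is recorded too, so the output needs the same care as the input.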
step.ai.infer: a niche tool for serverless cost reduction (Python-supported). You will rarely reach for this; step.run is the default for every AI call in this course. step.ai.infer exists for one specific situation: instead of calling OpenAI from your function process, you ask Inngest's infrastructure to make the call, so while the request is in flight your function process can deallocate. On serverless platforms (Vercel, Cloudflare Workers, AWS Lambda) that bill for in-flight time, this saves compute cost during the wait. For long-running inferences (Deep Research, large embedding batches) the savings are real. For sub-second calls, it adds latency without much benefit.
If you ever do reach for it, pull the exact signature from the AI Inference docs for your installed version: it lives in the experimental inngest.experimental.ai namespace and was not exercised in this course's build.
Quick check. True or false. (a) In Python, ctx.step.run("name", call_openai, ...) makes the OpenAI call durable, retried on transient failures, and memoized on success. (b) step.ai.infer is a hard requirement for using Inngest with the OpenAI Agents SDK in Python. (c) Replacing step.run with step.ai.infer for a single OpenAI call would always make the function cheaper to run.
Answers: (a) True: this is the recommended Python pattern. The OpenAI call goes inside the step body; Inngest treats the whole step as the unit of work. (b) False: step.run is enough for most cases. step.ai.infer is an optimization for serverless compute cost, not a requirement. The OpenAI Agents SDK integration in the worked example uses plain step.run. (c) False: step.ai.infer saves money only when (i) you are on a serverless platform that bills for in-flight time AND (ii) the call is long enough that the request-offload savings dominate the added orchestration overhead. For sub-second calls on always-on servers, plain step.run wins.
Try with AI
With my AI coding assistant: take a customer-support agent
invocation and produce TWO versions of the Inngest function that
calls it:
Version A: Wrap the Runner.run call in step.run (the recommended
Python pattern: durable, retried on transient failures, memoized;
you get the standard step trace).
Version B: For comparison, write a SEPARATE small Inngest function
that calls a single OpenAI completion via step.ai.infer (the
Python-supported step.ai primitive that offloads inference to
Inngest's infrastructure to save serverless compute cost).
For each version, explain (a) what the dashboard trace shows for a
successful run, (b) what happens when the OpenAI call hits a 429
rate limit, and (c) on which kind of deployment (always-on server
vs serverless) Version B's offload saves real money.
Part 3: Balance and recovery, production scale
Balance is the third layer: it keeps the worker healthy under load, the way your body keeps itself steady when you push it hard. Concurrency stops the worker from melting downstream systems. Throttling keeps you off rate-limit walls. Priority and fairness prevent one chatty customer from starving everyone. Batching turns "10,000 events at midnight" into "100 manageable function runs." Replay turns "yesterday's bug cost us 200 failed interactions" into "we fixed it; 200 conversations resumed." Human-approval gates suspend the agent until a human approves. Part 3's five concepts give you the production policies that turn a working worker into one you can put in front of paying customers.
Concept 11: Concurrency and throttling
Concurrency is the maximum number of runs of a function that can execute simultaneously. Throttling is the maximum number of runs that can start per unit of time. Both are configured per function with one line each. Leaving them unset is the most common production gap when teams move from prototype to scale.
from datetime import timedelta

@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[inngest.Concurrency(limit=10)],
    throttle=inngest.Throttle(limit=100, period=timedelta(minutes=1)),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
concurrency=10 says: at most 10 of these functions are running at any moment. The 11th event waits in queue until one of the 10 finishes. throttle=100/minute says: at most 100 new runs start per minute. The 101st event waits even if there is concurrency headroom.
Why both matter in practice. Concurrency protects downstream systems: if your customer-support Worker talks to OpenAI and Postgres, having 1,000 concurrent runs means 1,000 simultaneous OpenAI calls and 1,000 simultaneous Postgres connections. You will exhaust your OpenAI rate limit, exhaust your connection pool, or both. Throttle protects against bursts: if 500 customer emails arrive at 9:00am sharp, you do not want 500 functions starting in the same second; throttle smooths the start rate.
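The two limits shape different things, which a small simulation makes visible. This is a sketch under stated assumptions: the asyncio semaphore models the concurrency cap, and `throttle_start_minute` uses a simplified fixed-window model of throttling, which is not necessarily how Inngest spaces starts internally. Both function names are hypothetical.

```python
import asyncio


async def simulate(total_runs: int, limit: int) -> int:
    """Run total_runs fake jobs under a concurrency cap; return the peak in-flight count."""
    slots = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def job() -> None:
        nonlocal in_flight, peak
        async with slots:              # the (limit+1)th job waits here for a free slot
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)     # yield control, standing in for real work
            in_flight -= 1

    await asyncio.gather(*(job() for _ in range(total_runs)))
    return peak


def throttle_start_minute(position: int, limit_per_minute: int) -> int:
    """Fixed-window model: the minute (0-based) in which the event at `position` may start."""
    return position // limit_per_minute


peak = asyncio.run(simulate(total_runs=25, limit=10))
last_start = throttle_start_minute(position=499, limit_per_minute=100)
```

All 25 jobs complete, but never more than 10 are in flight at once. Under the fixed-window model, the last of 500 simultaneous emails starts in minute 4, so the burst is spread over five minutes rather than one second.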
Per-key concurrency. A single concurrency limit applies to the function globally. A more interesting pattern is per-key concurrency: limit by some property of the event.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[
        inngest.Concurrency(limit=10),  # global cap
        inngest.Concurrency(limit=2, key="event.data.customer_id"),  # per-customer cap
    ],
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
This says: at most 10 functions running globally, AND at most 2 per customer at a time. If a single customer sends 100 emails in a minute, only 2 of their emails are processed simultaneously; the other 98 queue behind. Meanwhile, other customers' emails flow normally; they are not blocked by the chatty customer. This is multi-tenant fairness in two lines of code. Concept 12 develops the pattern further.
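As a rough illustration of how the two caps interact, the sketch below decides which queued events may start at t=0. The `admit` function is a hypothetical FIFO model written for this example, not Inngest's scheduler.

```python
from collections import Counter


def admit(queue: list, global_limit: int, per_key_limit: int) -> list:
    """Return the queue positions admitted at t=0 under a global and a per-key cap."""
    running = Counter()
    admitted = []
    for position, customer_id in enumerate(queue):
        if len(admitted) >= global_limit:
            break                      # global cap reached: everything else waits
        if running[customer_id] >= per_key_limit:
            continue                   # this customer is at their cap; others may pass
        running[customer_id] += 1
        admitted.append(position)
    return admitted


# 100 emails from one chatty customer, then one email each from three others.
queue = ["chatty"] * 100 + ["acme", "globex", "initech"]
admitted = admit(queue, global_limit=10, per_key_limit=2)
```

Only the first two of the chatty customer's emails start; the three other customers are admitted immediately even though they arrived behind 98 queued emails.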
Quick check. Three claims, True or False. (a) If you set concurrency=10 and 1,000 events arrive at once, 990 of them are dropped. (b) Throttling and concurrency limits both reduce total throughput. (c) Per-key concurrency requires a key that is deterministic from the event data.
Answers: (a) False: events are not dropped; they queue. Inngest's queue is durable; the 990 events wait until concurrency slots open up. (b) False. Throttling caps start-rate; concurrency caps in-flight runs. Neither drops work; both shape when work executes. Throughput over a long window is unchanged if your average load is below the limits. Throughput over a peak is shaped: bursts are absorbed by the queue. (c) True: the key expression is evaluated on the event data; it has to produce a stable string for the same logical scope (customer_id is fine; current_timestamp is not).
Try with AI
With my AI coding assistant: design the concurrency and throttling
policy for the customer-support Worker. Constraints:
- OpenAI rate limit: 30 requests per minute, hard cap.
- Postgres connection pool: 20 max connections (the Worker takes 1 per run).
- Some customers send bursts of 30+ emails in a minute (an angry
customer); these shouldn't starve other customers.
- We expect ~1,000 emails per day, with peaks around 9am and 2pm.
Propose:
1. A global concurrency value
2. A per-customer concurrency value
3. A throttle (limit and period)
For each, explain what production failure it protects against and
what the cost is (in queue latency at peak).
Concept 12: Priority and fairness, multi-tenant scaling
Concurrency limits work. Per-key concurrency adds basic fairness. Production-grade multi-tenant systems need more: priorities (Enterprise customers should not wait behind hobbyists for the same compute) and fair-share scheduling (no single tenant can monopolize the system even within their concurrency cap).
Priority. Inngest evaluates a priority expression on each event; runs with higher priority jump the queue ahead of runs with lower priority.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[inngest.Concurrency(limit=10)],
    priority=inngest.Priority(
        # Enterprise tier = high priority; Pro = 0; Free = low priority
        run="100 - (event.data.customer_tier_priority * 100)",
    ),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
When the concurrency queue has 50 runs waiting, the Enterprise customers' runs go first, then Pro, then Free. Inside the same tier, FIFO order applies. Priority does not override concurrency or throttle limits; it just decides which of the waiting runs gets the next free slot. An Enterprise customer still waits for a slot to open; they just get the next one.
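The ordering rule can be checked by hand with a few lines of Python. This sketch mirrors the priority expression above and assumes an encoding the source does not spell out: customer_tier_priority is 0 for Enterprise, 1 for Pro, and 2 for Free, which yields scores of 100, 0, and -100. `priority_score` and `next_up` are hypothetical helpers.

```python
# Assumed encoding (not stated in the text): 0 = Enterprise, 1 = Pro, 2 = Free.
TIER = {"enterprise": 0, "pro": 1, "free": 2}


def priority_score(customer_tier_priority: int) -> int:
    """Mirror of the expression: 100 - (customer_tier_priority * 100)."""
    return 100 - customer_tier_priority * 100


def next_up(waiting: list) -> list:
    """Order waiting runs: highest score first, FIFO (arrival order) inside a tier."""
    ranked = sorted(waiting, key=lambda run: (-priority_score(TIER[run[2]]), run[0]))
    return [name for _arrival, name, _tier in ranked]


waiting = [  # (arrival order, customer, tier)
    (0, "free-1", "free"),
    (1, "ent-1", "enterprise"),
    (2, "pro-1", "pro"),
    (3, "ent-2", "enterprise"),
    (4, "free-2", "free"),
]
order = next_up(waiting)
scores = {tier: priority_score(value) for tier, value in TIER.items()}
```

The sort only decides who gets the next free slot; it says nothing about how many slots exist, which remains the job of the concurrency and throttle limits.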
Fair-share scheduling. When you have hundreds of tenants competing for the same global concurrency pool, FIFO plus priority is not enough. A single tenant sending a burst can still occupy most of the slots for minutes. Fair-share scheduling, implemented via the key parameter on concurrency with thoughtful sizing, gives each tenant a bounded slice:
concurrency=[
    inngest.Concurrency(limit=50),  # global pool
    inngest.Concurrency(limit=3, key="event.data.tenant_id"),  # max 3 per tenant
],
With this: 50 total slots, and no tenant takes more than 3. If 20 tenants are active, they can request at most 20 × 3 = 60 slots, but only 50 are available. Fair-share rotates them through: every tenant gets some share, and no one gets shut out.
Predict. You have a customer-support function with concurrency=10 and per-customer concurrency=2. You also have priority configured: Enterprise = high, Free = low. At 9:00am, the queue has: 5 events from Customer A (Free), 5 events from Customer B (Enterprise), and 10 events from a single new Customer C (Free, just bought their first plan). In what order do they execute? Confidence 1-5.
The answer: it is a multi-level decision. First, the per-customer cap of 2 means at most 2 of each customer's events are eligible to run at once. So the pool of candidates is: 2 from A, 2 from B, 2 from C: six runs eligible immediately. Second, priority decides which of those six fill the first slots: B's two run first (Enterprise), then A's two and C's two (Free, FIFO). So at t=0: 2 of B run, then 2 of A start, then 2 of C start. Total: 6 active. As each finishes, its customer's next queued event becomes eligible and the next slot fills by priority. This is the kind of policy that is a feature in Inngest and a thousand-line scheduler in your own code.
Try with AI
With my AI coding assistant: extend the customer-support Worker
configuration with a priority and fair-share scheme. Requirements:
1. Three customer tiers: Enterprise, Pro, Free.
2. Enterprise customers should never wait more than 5 seconds at
peak load.
3. Free tier customers should get fair access: no Free customer
should be starved for more than 60 seconds, even when the
global queue is full.
4. A single noisy customer (regardless of tier) should not occupy
more than 3 slots.
Write the concurrency + priority configuration. For each line of
config, explain which requirement it satisfies.
Concept 13: Batching, cost-effective bulk processing
Some work is naturally batched. You do not summarize each of 10,000 customer conversations independently; you call the LLM with a batch of 50 at a time. You do not write 10,000 audit rows one at a time; you COPY them. Inngest's batch trigger lets you accumulate events and invoke a single function with the batch as input.
@inngest_client.create_function(
    fn_id="batch-embed-tickets",
    trigger=inngest.TriggerEvent(event="ticket/resolved"),
    batch_events=inngest.Batch(
        max_size=50,  # invoke when 50 events accumulated, OR
        timeout=timedelta(seconds=30),  # invoke when 30 seconds pass, whichever first
    ),
)
async def batch_embed_resolved_tickets(ctx: inngest.Context) -> dict[str, int]:
    # ctx.events (plural) instead of ctx.event
    ticket_ids = [e.data["ticket_id"] for e in ctx.events]
    tickets = await ctx.step.run(
        "load-tickets", load_tickets_by_ids, ticket_ids,
    )
    # One embedding call for 50 tickets, not 50 calls for 1 ticket each
    embeddings = await ctx.step.run(
        "embed-batch", embed_texts_batch,
        [t["text"] for t in tickets],
    )
    await ctx.step.run(
        "store-embeddings", store_embeddings_batch,
        ticket_ids, embeddings,
    )
    return {"batched": len(ctx.events)}
What changes: ctx.events is a list, not a single event. The function runs once per batch instead of once per event. The OpenAI embedding API is called with a 50-text batch instead of 50 single-text calls, which is cheaper to run (you still pay per token, but the per-request overhead is gone) and faster (one API round-trip instead of 50).
Batching is the right tool when the work is naturally bulkable (embeddings, bulk DB writes, bulk emails) and you can tolerate up to your timeout's worth of latency before the work happens. It is the wrong tool when each event requires interactive response or when ordering matters across events in unpredictable ways.
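The size-or-timeout rule is simple enough to model directly. The function below is a simplified sketch, not Inngest's batching code: it assumes the timeout clock starts at the first event of each batch, and `group_into_batches` is a hypothetical name.

```python
def group_into_batches(arrivals: list, max_size: int, timeout: float) -> list:
    """Group sorted arrival times (seconds) into batches.

    A batch closes when it holds max_size events, or when `timeout` seconds have
    passed since its first event, whichever comes first.
    """
    batches = []
    current = []
    for t in arrivals:
        if current and t - current[0] >= timeout:
            batches.append(current)    # timeout fired before this event arrived
            current = []
        current.append(t)
        if len(current) == max_size:
            batches.append(current)    # size limit reached
            current = []
    if current:
        batches.append(current)        # the timeout eventually flushes the remainder
    return batches


burst = [i * 0.1 for i in range(120)]      # 120 events within 12 seconds
trickle = [0.0, 5.0, 10.0, 45.0]           # slow traffic
burst_sizes = [len(b) for b in group_into_batches(burst, max_size=50, timeout=30.0)]
trickle_sizes = [len(b) for b in group_into_batches(trickle, max_size=50, timeout=30.0)]
```

A burst fills batches by size; a trickle is flushed by the timeout with only a few events per batch. The timeout is therefore the latency you accept in exchange for larger batches.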
Quick check. True or false. (a) Batched functions still get retries and memoization; the batch as a whole is durably memoized. (b) If the batch timeout expires with only 3 events accumulated, the function will not run until the next 47 arrive. (c) You can combine batch_events with concurrency to cap how many batches run in parallel.
Answers: (a) True: the batch is the unit of work; retries replay the whole batch with all its events still in scope. (b) False: that is the whole point of the timeout. After 30 seconds the function runs with whatever is accumulated, even if it is 1 event. (c) True: this is the production pattern. Batch plus concurrency together cap your downstream load nicely.
Try with AI
With my AI coding assistant: write a batched Inngest function that
embeds resolved support tickets, converting a per-ticket event
handler into one batched call.
Triggers: 'ticket/resolved' event, batched at 50 events or 30 seconds.
The function should:
1. Load the ticket bodies in one query
2. Call OpenAI embeddings API with a 50-text batch (faster + cheaper)
3. Store the embeddings
4. Emit a 'ticket/embedded' event per ticket for downstream consumers
Use grep_docs to find the OpenAI batch-embedding pattern.
Concept 14: Replay and bulk cancellation, production recovery
Sometimes everything goes wrong at once. You shipped a bug; a thousand runs failed in the last six hours. Or your downstream API was down for 30 minutes; everything that tried to call it during that window died. Or you discovered a logic error and want to redo a day's work after fixing it.
First, the distinction that trips everyone up. Inngest gives you two ways a failed step can re-run, and they behave differently:
- Automatic retry (within the same run). When a step throws, Inngest retries the function with backoff, re-entering from the top. Completed steps return from memo and do not re-execute; only the failing step runs again. This is the memo-preserving resume, the one you watched in the Quick Win, and the one that makes the "$0.20 spent at step 3 is not re-spent" property true. It is automatic and happens inside the original run.
- Replay / Rerun (the dashboard button, across many runs). This starts a brand-new run from the top with your current deployed code, every step re-executing from scratch (a rerun gets a new run id and re-runs the first step, not a resume of the old one). So in practice the old run's memo does not save you here. It is for incident recovery, not for skipping completed work.
Keeping these straight is the whole concept. The memo payoff lives in the automatic retry; Replay is a fresh start.
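The difference fits in a few lines of plain Python. This is a toy model, not the Inngest runtime: the hypothetical `Run` class memoizes completed steps, so retrying the same `Run` skips them, while a replay is modeled as a brand-new `Run` with an empty memo.

```python
class Run:
    """Toy run: remembers completed steps so a retry skips them (hypothetical model)."""

    def __init__(self) -> None:
        self.memo = {}

    def step(self, name: str, fn):
        if name in self.memo:
            return self.memo[name]     # completed earlier in this run: do not re-execute
        self.memo[name] = fn()         # a step that raises is never memoized
        return self.memo[name]


executions = {"llm": 0, "write": 0}
state = {"db_up": False}


def call_llm() -> str:
    executions["llm"] += 1             # the expensive step
    return "summary"


def write_db() -> str:
    executions["write"] += 1
    if not state["db_up"]:
        raise ConnectionError("db down")
    return "stored"


def workflow(run: Run) -> str:
    run.step("call-llm", call_llm)
    return run.step("write-db", write_db)


# Automatic retry: the same Run is re-entered from the top, and its memo survives.
original = Run()
try:
    workflow(original)
except ConnectionError:
    pass
state["db_up"] = True
workflow(original)
llm_calls_after_retry = executions["llm"]

# Replay: a new Run with an empty memo, so every step executes again.
workflow(Run())
llm_calls_after_replay = executions["llm"]
```

After the retry the expensive step has still run only once; after the replay it has run twice. That second execution is the cost to plan for when replaying a population of runs.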
Two opposite recovery primitives. Replay says "this work failed, I want it to run again on the fixed code." Bulk cancellation says "this work was queued but I no longer want it to happen." Same dashboard surface, opposite intent. Most teams need both within their first three months of running real traffic.
Replay is the recovery primitive. Failed runs persist with their full step history, the input event, and the exception from the failed step. From the dashboard you open the Functions view, filter to a function that has failed runs, select a time window and a failure pattern (any specific error message or just "all failures"), and click Replay. Inngest schedules each as a fresh run from the top on whatever code is deployed now.
Three things to understand about replay.
- Replay uses your current deployed code. If you deployed a fix between when the runs failed and when you replay them, the replayed runs use the new code. This is the whole point: take a population of runs that died on a bug, ship the fix, and re-run them all hands-off.
- Replay re-executes every step; it does not reuse the old run's memo. A replayed run is a new run, so each step runs again from scratch on the fixed code. Cost-wise, plan for the cost of the whole function per replayed run, not just the failed step. The thing that keeps a replay from issuing a second real-world side effect (a duplicate refund, a duplicate email) is not memo; it is an idempotency key on that side effect (Concept 4): you derive a stable key from the request (for a refund, something like (order_id, request_id)) and the provider treats a repeat as a no-op. The minimal worker in this course omits that key for brevity: its refund matches on the customer and writes unconditionally, so a production version would add one before any real money moves. Memo protects within a run; the idempotency key protects across reruns.
- Replay is opt-in. Failed runs sit in the dashboard until you act on them. They do not retry forever; they do not disappear. They wait for you.
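A minimal sketch of the idempotency-key idea, assuming a provider that ignores repeated keys. `FakeRefundProvider` and `refund_key` are hypothetical names invented for this example; real payment APIs expose the same idea through their own parameters, so check your provider's documentation for the exact mechanism.

```python
import hashlib


class FakeRefundProvider:
    """Stand-in for a payment API that honors idempotency keys (hypothetical)."""

    def __init__(self) -> None:
        self.refunds = {}

    def refund(self, amount_cents: int, idempotency_key: str) -> str:
        if idempotency_key in self.refunds:
            return "duplicate-ignored"     # repeat of an earlier request: no-op
        self.refunds[idempotency_key] = amount_cents
        return "refunded"


def refund_key(order_id: str, request_id: str) -> str:
    """Derive a stable key from the request, so every rerun produces the same key."""
    return hashlib.sha256(f"{order_id}:{request_id}".encode()).hexdigest()


provider = FakeRefundProvider()
first = provider.refund(2500, refund_key("order-9", "req-1"))    # original run
second = provider.refund(2500, refund_key("order-9", "req-1"))   # replayed run
total_refunded = sum(provider.refunds.values())
```

The key must come from the request itself, not from the run: a run id or a timestamp changes on every replay and would defeat the deduplication.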
Bulk cancellation is the inverse. Sometimes you have thousands of queued or sleeping runs that you no longer want: a campaign got canceled, a customer churned and you no longer want to send them follow-up emails, a feature got rolled back. From the dashboard you select a function and a time window or event filter, and click Cancel. The matching runs terminate cleanly: their step.sleep and step.wait_for_event calls do not resume, queued runs do not start, in-flight runs check for cancellation and exit at the next step boundary. Cancellation respects the step boundary; an in-flight step.run finishes the step it is in before terminating, so you do not get half-completed Stripe charges or torn DB writes.
Replay vs cancellation as a decision. When something has gone wrong with a population of runs, ask one question: do I want this work to succeed or do I want it to not happen? If the work should succeed (bug-fix recovery), replay. If the work should not happen (cancelled campaign, churned customer, rolled-back feature), cancel. If you are unsure (for example, the failed runs include some you want to recover and some that should not have fired in the first place), filter your dashboard query more narrowly so each subset gets the right treatment.
Three patterns this enables in practice:
- The "we shipped a bug" recovery. Find the failed runs in the time window of the bad deploy, fix the bug, ship the fix, replay the failures. The customer experience: their email did not get a reply for an hour but did eventually get one, without you writing any recovery code.
- The "campaign canceled" rollback. A welcome series that fires three follow-up emails over 14 days; the customer churns on day 4. You do not want to send the day-7 and day-14 follow-ups. Bulk-cancel matching wait-for-event and sleep runs.
- The "schema migration" replay. You changed how the agent formats summaries; you want yesterday's tickets re-summarized with the new format. Find those runs (successful or not) and replay them; because a replay is a fresh run from the top, the agent re-runs every step on the new code, which is exactly what you want here. Keep your side-effecting steps idempotent so re-running them does not double-charge or double-send.
The dev-server MCP makes recovery accessible without leaving your coding agent. During development you can ask the AI to use get_run_status to inspect a failed run, then recover the work by re-firing the event on the fixed code (give it a new event id, since re-firing with the same id is deduplicated to a no-op by Concept 4's idempotency semantics). The dashboard Rerun button is the equivalent one-click path. Either way you get a fresh run on the current code, not a memo-preserving resume.
Quick check. True or false. (a) A dashboard Replay re-runs the work on the new deployed code. (b) A dashboard Replay returns the original run's successful steps from memo and only re-runs the failed one. (c) The automatic retry inside a failing run returns completed steps from memo and re-runs only the failing step. (d) Bulk-canceling a function that is in flight will mid-step abort the currently-executing step.run to terminate faster.
Answers: (a) True: a replay is a fresh run from the top on whatever is deployed now, which is why it is the tool for bug-fix recovery. (b) False: this is the trap. A replay is a new run that re-executes every step from the top, so the old run's memo does not carry over. What stops a replayed side effect from firing twice is the idempotency key, not memo. (c) True: this is the memo-preserving path, and it is the one you watched in the Quick Win. The completed step sits at one attempt while the failing step retries. (d) False: cancellation respects the step boundary; the current step.run finishes (or fails) before the run terminates. This prevents torn writes.
Try with AI
Walk through a recovery scenario with my AI coding assistant:
Yesterday at 14:00 we deployed a change to the worker's agent step.
A bug in the new code made the agent step throw on every run.
From 14:00 to 18:00, 47 customer-support runs failed at that step.
At 18:30 we noticed, fixed the bug, and re-deployed.
Use the dev-server MCP's grep_docs to find Inngest's replay docs,
then:
1. Outline the exact dashboard steps to identify the 47 failed runs.
2. Explain what a dashboard Replay does for one of those runs: is it
a fresh run from the top on the fixed code, or a resume that
reuses the old run's memo? What does that mean for the cost of
replaying all 47?
3. Confirm whether the customers will see one reply or several if a
replayed run re-sends the email, and name the mechanism that
keeps it to one (hint: it is not memo).
4. Identify ONE scenario in this story where you'd prefer to
bulk-cancel instead of replay, and explain why.
Concept 15: HITL gates with step.wait_for_event, Invariant 1 in the runtime
The Agent Factory's Invariant 1 says the human is the principal: authored intent, not the agent's autonomous judgment, is what the runtime must honor on high-stakes decisions. This is the one place a human mind steps back into the loop. Everywhere else the nervous system runs on its own, by reflex; here it pauses and waits for a person. This shows up in production as approval gates: the agent does the analysis, drafts the action, but does not execute the action until a human approves.
Inngest's step.wait_for_event (Concept 8) is the cleanest expression of this on any platform today. The agent runs to the point of decision, suspends, and waits for an approval event. The human reviews (in Slack, in an admin UI, in email) and clicks approve or reject. The event fires. The function resumes with the human's verdict and acts accordingly. This is what spec-driven means at runtime: the nervous system enforces the plan, which action needs a human, in what order, with what timeout. It does not police the agent's reasoning; it controls what the agent is allowed to do.
import functools
from datetime import timedelta

import inngest

# inngest_client and the helper callables used below (run_refund_investigation_agent,
# send_slack_approval_request, and so on) are assumed to be defined elsewhere.

@inngest_client.create_function(
    fn_id="refund-with-hitl-gate",
    trigger=inngest.TriggerEvent(event="customer/refund.investigated"),
    concurrency=[inngest.Concurrency(limit=5)],
)
async def refund_with_gate(ctx: inngest.Context) -> dict[str, str]:
    request_id = ctx.event.data["request_id"]
    amount_cents = ctx.event.data["amount_cents"]

    # Step 1: the agent's analysis (your worker, run durably).
    # functools.partial binds the arguments, so the step handler itself takes none.
    analysis = await ctx.step.run(
        "agent-investigates",
        functools.partial(run_refund_investigation_agent, request_id=request_id),
    )

    # Step 2: if the agent thinks a refund is warranted AND the amount is
    # at least $100 (10_000 cents), gate it behind human approval.
    needs_approval = analysis.recommends_refund and amount_cents >= 10_000
    if needs_approval:
        await ctx.step.run(
            "notify-approver",
            functools.partial(
                send_slack_approval_request,
                request_id=request_id,
                analysis=analysis,
                amount_cents=amount_cents,
            ),
        )

        # === THE HITL GATE ===
        approval = await ctx.step.wait_for_event(
            "wait-for-human-approval",
            event="refund/approval.decided",
            timeout=timedelta(hours=24),
            if_exp=f"async.data.request_id == '{request_id}'",
        )

        if approval is None:
            # Timeout: no human responded in 24h. Escalate.
            await ctx.step.run(
                "escalate-timeout",
                functools.partial(escalate_to_senior_reviewer, request_id=request_id),
            )
            return {"status": "escalated_timeout"}

        if not approval.data["approved"]:
            await ctx.step.run(
                "notify-rejected",
                functools.partial(notify_customer_rejected, request_id=request_id),
            )
            return {"status": "rejected_by_human"}

    # Either it was approved, or it didn't need approval.
    refund = await ctx.step.run(
        "issue-refund",
        functools.partial(call_stripe_refund, request_id=request_id, amount_cents=amount_cents),
    )
    await ctx.step.run(
        "audit-approved-refund",
        functools.partial(
            audit_refund,
            request_id=request_id,
            refund=refund,
            approved_by="human" if needs_approval else "auto",
        ),
    )
    return {"status": "issued", "refund_id": refund["id"]}
What you see in code: a sequence of steps, with one wait_for_event in the middle. What is happening at runtime:
- The agent runs (step 1, durably).
- The function decides whether the gate applies (in-code logic, free of side effects).
- If gated: a Slack notification fires (step 2, durable). The function suspends. Zero compute consumed for up to 24 hours.
- A human in Slack clicks Approve or Reject. The admin backend calls inngest_client.send with refund/approval.decided and the request_id.
- Inngest matches the event to the suspended function (the if_exp filter ensures only an event carrying the same request ID resumes it). The function resumes at the next line.
- The function uses the human's decision to either issue the refund or notify rejection. The refund path audits the decision and who approved it.
This is what makes Inngest qualitatively different from a queue-plus-state-machine. The HITL pattern is one primitive. The function's code reads top to bottom, with the gate inline. There is no callback, no state restoration, no if state == waiting_for_approval: ... dispatching. The runtime handles the suspend/resume mechanic; your code expresses the policy.
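To make the suspend, match, and resume mechanic concrete, here is a minimal in-memory sketch using only asyncio. It is not how Inngest implements the gate (nothing here is durable, and a process restart loses every waiter); the `EventWaiter` class and its method names are hypothetical stand-ins. It shows only the contract the gate exposes to your code: a waiter resumes on an event with the same request ID, ignores other events, and gets `None` on timeout.

```python
import asyncio
from typing import Optional


class EventWaiter:
    """Toy, in-memory stand-in for a wait-for-event gate (hypothetical, not durable)."""

    def __init__(self) -> None:
        # request_id -> future that resolves with the decision payload
        self._pending = {}

    async def wait_for_event(self, request_id: str, timeout: float) -> Optional[dict]:
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None  # mirrors the "approval is None" timeout branch
        finally:
            self._pending.pop(request_id, None)

    def send(self, data: dict) -> bool:
        """Deliver a decision; only a waiter with the same request_id resumes."""
        future = self._pending.get(data["request_id"])
        if future is None or future.done():
            return False
        future.set_result(data)
        return True


async def demo() -> list:
    gate = EventWaiter()
    waiting = asyncio.create_task(gate.wait_for_event("req-1", timeout=1.0))
    await asyncio.sleep(0)  # let the waiter register before any event is sent
    ignored = gate.send({"request_id": "req-2", "approved": True})  # wrong request
    matched = gate.send({"request_id": "req-1", "approved": True})  # right request
    decision = await waiting
    timed_out = await gate.wait_for_event("req-3", timeout=0.01)
    return [ignored, matched, decision["approved"], timed_out]


results = asyncio.run(demo())
```

The difference that matters in production is where the waiter lives: here it is a future in process memory, while a durable engine stores it outside the process so the wait survives crashes and deploys.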
A later course develops Invariant 1 architecturally: authored intent, spec-driven workflows, the manager-of-workers layer that decides which gates apply to which actions. This course gives you the runtime primitive. When that manager layer arrives, the gate it implements will be exactly this wait_for_event pattern, just composed at fleet scale. Knowing the primitive now means the architectural pattern later reads as "a sensible composition" rather than "magic."
This is the keystone you build in Part 4's Decision 5: the refund approval, made durable. The concept here is the shape; the worked example wires it to a real needs_approval tool and proves the refund fires exactly once.
Predict. You have an HITL gate set with timeout=timedelta(hours=24). A customer's refund request comes in at 17:00 on a Friday. No human is online over the weekend. The gate's timeout fires at 17:00 Saturday. Your timeout handler records a blocked refund. The reviewer reads the request Monday at 9:00am. Walk through the timeline: how many function runs were active during the weekend? How much compute did Inngest charge for? Confidence 1-5.
The answer: zero active function runs during the weekend. The function was suspended: Inngest stored its state, paged the function out of memory, and waited for either the event or the timeout. Inngest does not bill for suspended time. When Saturday 17:00 came and the timeout fired, the function resumed for the few hundred milliseconds it took to write the blocked-refund audit row, then completed. The fact that the reviewer does not look until Monday costs nothing from the worker's side. The economics of HITL workflows on Inngest are dramatically different from polling-based queues that bill you for every second of "is this approved yet?" polling.
Try with AI
With my AI coding assistant: design a durable refund-approval gate.
Specification:
1. The agent investigates and decides a refund is warranted, but the
refund tool needs human approval before it runs.
2. The gate should:
- Notify the on-call reviewer with the agent's recommendation
- Wait up to 4 hours for the reviewer to approve or reject
- On approve: issue the refund.
- On reject: do not issue; record a blocked refund.
- On 4-hour timeout: do not issue; record a blocked refund.
3. Every branch (approve/reject/timeout) writes an audit row from a
small fixed set of action names, capturing what was decided.
Use the dev-server MCP's send_event to simulate each branch of
the reviewer's decision during testing.
Part 4: The worked example, a customer-support Production Worker
This is where you build. First the worker (one prompt), then the nervous system around it, one layer per prompt. You direct your coding agent in short plain-English prompts and it writes the code; the snippets shown below are the few load-bearing lines of each layer, not the files. The full implementation was run end-to-end against a live dev server and a real model, so the shapes you see are what runs. If a signature looks unfamiliar, your agent checks the current docs.
The shape: seven prompts, on the base you already set up.
- D0 builds the worker itself, standalone.
- D1 makes the agent run durable.
- D2 lets an event wake it.
- D3 adds a daily cron that fans out.
- D4 adds flow control.
- D5 is the keystone: a durable human-approval gate on refunds.
- D6 proves the worker survives a broken step: retry without redoing completed work, then recover.
Before you start. Your environment is already set up from the Quick Win: open the same ai-agent-nervous-system folder, with the Inngest and neon-postgres Skills installed, your OPENAI_API_KEY and your Neon DATABASE_URL in .env, your customers and audit_log tables provisioned, and all three MCP servers (Neon, Context7, inngest-dev) wired. Two reminders only:
- The dev server is running. Start it again if you closed it: npx inngest-cli@latest dev in its own terminal. The dashboard is at http://127.0.0.1:8288. (When you later deploy to Inngest Cloud, the free Hobby tier is $0 with no credit card; its ceilings are in Part 5.)
- One casing note for the MCP calls below. The dev-server tool names are snake_case (send_event, get_run_status, invoke_function), but their parameters are camelCase (get_run_status takes runId, invoke_function takes functionId). The Python SDK is snake_case throughout; only the MCP call parameters are camelCase.
The brief
You build a small customer-support worker and give it a Production Worker nervous system. The worker reads its sample customers from the Neon customers table (id, email, tier), drafts a warm reply to an incoming email, can issue a refund only with human approval, and writes an audit row into the Neon audit_log table for every action from a small fixed set: message_received, message_sent, refund_issued, refund_blocked. The six prompts after D0 then add Inngest around it: an event wakes it, the agent call runs durably, a daily cron fans out a health check per eligible customer, flow control caps concurrency and throttle, the refund pauses on a durable human gate, and a replay path recovers failed runs.
A note on the prompts that follow. Each one is written the way you would actually say it to a coding agent: short, plain, trusting it to handle the detail. They work pasted cold, and better still if you first ask the agent to orient ("read the project and tell me what you see, then ask me anything unclear before you start") as the files pile up. The prompts are the destination; orienting first is the on-ramp.
D0: Build the worker, standalone
Where you are: the base is open, the dev server is running, and your Neon store is provisioned, but no worker exists yet. This Decision builds the standalone worker; by the end it runs on a sample email and writes an audit row to Neon.
The base already ships an AGENTS.md your agent read on open, so it knows the project; that is why these prompts stay short. The one rule in it worth keeping in your own head is the architectural invariant of the whole course: the worker's own code never imports from inngest. The agent and its tools stay plain Python; the nervous system wraps them from outside. That separation, the agent and the nervous system kept apart, is what lets you swap Inngest for Temporal or Restate later and leave the worker untouched.
Your Neon system of record is already provisioned from the Quick Win: the customers and audit_log tables exist, and DATABASE_URL is in your .env. So the worker reads and writes that database from the start. Now build the worker. Paste this:
Build me a minimal customer-support agent with the OpenAI Agents SDK, running in a local sandbox. It reads the sample customers from my Neon customers table (each row has an id, email, and tier), drafts a warm reply to an incoming customer email, and can issue a refund, but the refund tool needs human approval before it runs. Write an audit row into my Neon audit_log table for every action, using a small fixed set of action names and the DATABASE_URL in .env. Seed the customers table with five sample rows first if it is empty. Keep it small; it exists to be wrapped, not shipped. Then run it on a sample email and show me the reply.
Creates: worker.py and db.py (a flat project, no src/ nesting). D1 adds the Inngest host as the third file. The agent reaches Postgres through DATABASE_URL, never through the Neon MCP server, which is your build-time tool only.
The seed data is tiny enough to keep on the page, five sample customers across three tiers, which the agent inserts into the customers table:
[
{ "id": "cust_001", "email": "ada@example.com", "tier": "enterprise" },
{ "id": "cust_002", "email": "grace@example.com", "tier": "pro" },
{ "id": "cust_003", "email": "linus@example.com", "tier": "pro" },
{ "id": "cust_004", "email": "edsger@example.com", "tier": "standard" },
{ "id": "cust_005", "email": "alan@example.com", "tier": "standard" }
]
Your agent writes two short Python files. db.py holds the Postgres access: a small pooled asyncpg connection over DATABASE_URL, a load_customers() read, and a record() audit-write helper with a closed vocabulary. Any action name outside the four-item set raises, which turns a typo into a loud error instead of a silent bad row. worker.py is a SandboxAgent with two tools that call into db.py. Only one line of it is load-bearing for the rest of the course, the refund tool's decorator:
@function_tool(needs_approval=True)
def issue_refund(order_id: str, amount_cents: int, reason: str) -> str:
    ...
That needs_approval=True makes the agent pause instead of issuing the refund: the run comes back with the refund pending and a human decides. It is the hook the whole HITL keystone (D5) hangs off. (This floor gates every refund, which keeps the keystone simple; a production worker would usually gate only above a threshold, the over-$100 pattern from Concept 15. The wiring is identical either way.)
One structural note to confirm in what the agent writes, because D5 depends on it: keep build_agent() and the sandbox run_config() as separate functions. When D5 resumes a paused run it rebuilds the agent to the same tool shape and re-passes the same run_config(); the saved state does not carry the sandbox session, so the resume must supply it again. Factor them apart now and the keystone is a small step later.
Done when: the agent runs on a sample email and prints a short reply, and a new row is in the Neon audit_log table (check it in the console, or ask your agent to read it back over the Neon tools). If the email describes a refund, the run pauses at the refund tool rather than issuing it; that pause is the whole point, and D5 makes it durable.
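The closed vocabulary that db.py's record() helper enforces can be sketched without a database. The four action names come from the brief; the function names `validate_action` and `build_audit_row` are hypothetical, and the real helper your agent writes would go on to insert the row over asyncpg rather than return a dict.

```python
# The four action names from the brief; anything else is rejected.
ALLOWED_ACTIONS = frozenset(
    {"message_received", "message_sent", "refund_issued", "refund_blocked"}
)


def validate_action(action: str) -> str:
    """Raise loudly on an unknown action instead of writing a bad row."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(
            f"unknown audit action {action!r}; allowed: {sorted(ALLOWED_ACTIONS)}"
        )
    return action


def build_audit_row(action: str, customer_id=None, detail: str = "") -> dict:
    """Build the row a record() helper would insert (the insert itself is omitted)."""
    return {
        "action": validate_action(action),
        "customer_id": customer_id,
        "detail": detail,
    }


row = build_audit_row("message_sent", customer_id="cust_001", detail="Hi Ada")
```

Calling `build_audit_row("mesage_sent")` with the typo raises ValueError before anything reaches the table, which is the loud failure the closed vocabulary is there to provide.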
The prompts in this Part assume a frontier-class coding agent (Claude Sonnet or Opus, a GPT-5-class model, or Gemini 2.5 Pro). The Inngest architecture you are learning (events, steps, memoization, flow control) is SDK-level and holds whatever model drives your agent. But the build experience leans on strong instruction-following, especially the D5 keystone. On a weaker model, expect to iterate on a prompt more than once and to spell out the file names. The architecture is not broken; the prompting just needs more scaffolding.
D1: Make the agent run durable
Where you are: a worker that runs only when you call it, losing everything on a crash mid-run. This Decision wraps the agent call in step.run; by the end a completed run shows the agent step memoized in the dashboard.
The nervous system begins here: wrap the whole agent call in a single step.run so it is durable and memoized. Paste this:
Wrap the agent run in an Inngest durable function so it survives crashes and retries transient failures. The whole agent call goes inside a single step.run so it is memoized. Run it in local dev mode against the Inngest dev server, with a FastAPI host. Confirm a completed run shows the agent step memoized in the dashboard.
Creates: inngest_app.py (an Inngest client in dev mode, the agent call in one helper, and a FastAPI host the dev server discovers).
The shape that matters is one step.run wrapping the agent call:
async def handle_customer_email(ctx: inngest.Context) -> dict:
    email_text = ctx.event.data["email_text"]
    outcome = await ctx.step.run("run-agent", functools.partial(_run_agent, email_text))
    return {"replied": outcome["status"] == "done"}
Two idioms to confirm in what the agent writes. The step handler takes no arguments of its own, so functools.partial binds email_text ahead of time; that is how you pass data into any step, and you will see it on every step from here on. And the agent helper uses plain Runner.run, not a streamed runner: it is the path the human-approval keystone (D5) is built on, so using it from the start makes D5 a small step rather than a rewrite. The client is constructed with is_production=False (the dev-mode flag from the Quick Win).
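The functools.partial idiom is plain standard-library Python and can be tried on its own. In this sketch `run_step` is a hypothetical stand-in for a step runner that calls a zero-argument handler (it is not the Inngest API), and `draft_reply` is an invented helper; the point is only that partial turns a function with parameters into a callable that needs none.

```python
import asyncio
import functools
import inspect


async def run_step(step_id: str, handler):
    """Hypothetical step runner: calls a zero-argument handler, sync or async."""
    result = handler()
    if inspect.isawaitable(result):
        result = await result
    return {"step": step_id, "output": result}


async def draft_reply(email_text: str, tone: str = "warm") -> str:
    # Stand-in for the real agent call.
    return f"[{tone}] Thanks for writing: {email_text[:20]}"


async def main():
    # Bind the arguments now; the runner later invokes bound() with none.
    bound = functools.partial(draft_reply, "Where is my order?", tone="warm")
    return await run_step("run-agent", bound)


outcome = asyncio.run(main())
```

Binding ahead of time keeps the runner's contract simple: it never needs to know what arguments a given step wants.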
Run it as two processes, the function host and the dev server that finds it:
uv run uvicorn inngest_app:app --port 8000 --reload --log-level info # terminal 1: function host (your model key is sourced here; --reload picks up the D6 break/fix edits)
npx inngest-cli@latest dev -u http://127.0.0.1:8000/api/inngest # terminal 2: dev server, auto-discovers the host
Done when: the dashboard lists handle-customer-email and a completed run shows the run-agent step. (You wake it properly with an event in D2; for now, the function being discoverable is enough.)
Why this is the load-bearing move. The agent call is the expensive part: model tokens, several seconds. Inside step.run its result is memoized, so when a later step fails and the function retries, the agent does not run again. That single wrapping is the difference between a worker that double-pays and double-acts on every retry and one that does each expensive thing exactly once.
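The memoization claim can be modeled in a few lines. This is a toy model, not Inngest's implementation: `MemoRun`, `execute_with_retries`, and the step functions are hypothetical, and state lives in a dict rather than in a durable store. It re-executes the function body from the top on each retry and shows the expensive step being called once while a flaky later step is attempted three times.

```python
class MemoRun:
    """Toy model of one run: completed step results are cached by step name."""

    def __init__(self) -> None:
        self.memo = {}
        self.calls = {}

    def step(self, name, fn):
        if name in self.memo:
            return self.memo[name]  # memo hit: the handler is not called again
        self.calls[name] = self.calls.get(name, 0) + 1
        result = fn()  # may raise; a failed attempt memoizes nothing
        self.memo[name] = result
        return result


def execute_with_retries(run, function, max_attempts):
    """Re-run the whole function body from the top until it succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            return function(run)
        except RuntimeError:
            if attempt == max_attempts:
                raise


flaky_state = {"failures_left": 2}


def expensive_agent_call():
    return "drafted reply"


def flaky_send():
    if flaky_state["failures_left"] > 0:
        flaky_state["failures_left"] -= 1
        raise RuntimeError("transient send failure")
    return "sent"


def handler(run):
    reply = run.step("run-agent", expensive_agent_call)
    status = run.step("send-reply", flaky_send)
    return {"reply": reply, "status": status}


run = MemoRun()
result = execute_with_retries(run, handler, max_attempts=4)
```

After this runs, `run.calls` records one call for run-agent and three for send-reply: the retries replayed the body, but the agent's result came from the memo each time.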
D2: Trigger it on an event
Where you are: a durable function already triggered by customer/email.received (D1's decorator), but with no audit trail. This Decision adds an audit row on each side of the agent; by the end a real event drives a run with both rows written.
Add an audit step before the agent and one after, then wake the worker with a real event instead of running it by hand. Paste this:
Make the worker wake on a customer/email.received event instead of being run by hand. Add an ingress audit step before the agent and a reply audit step after it. Send a test event and show me the run completing with both audit rows.
Edits: inngest_app.py (the function gains an audit step on each side of the agent).
The shape is two more step.run calls around the agent step:
customer_id = ctx.event.data.get("customer_id")  # bound from the event, alongside D1's email_text
await ctx.step.run("audit-received", functools.partial(
    db.record, "message_received", customer_id=customer_id, detail=email_text[:80]))
outcome = await ctx.step.run("run-agent", functools.partial(_run_agent, email_text))
await ctx.step.run("audit-sent", functools.partial(
    db.record, "message_sent", customer_id=customer_id, detail=(outcome["reply"] or "")[:80]))
Each row uses an action name from the closed set: message_received in, message_sent out, and db.record writes it to the Neon audit_log table over DATABASE_URL. Send the test event from the agent with the dev-server MCP's send_event tool (name: "customer/email.received", a data object with email_text and customer_id). The dev server accepts any event, so you configure no webhook to test locally; in production you would point your email provider at an Inngest webhook URL that reshapes its payload into this event, which is a dashboard setting, not code.
Done when: the run completes, the trace shows three steps in order (audit-received, run-agent, audit-sent), and the Neon audit_log table has one message_received and one message_sent row for that customer.
Why two audit steps, not one. Each is its own step.run, so each is independently memoized. If the reply step fails and the function retries, the ingress row is not written twice (memo hit) and the agent does not run twice (also memoized). The audit trail stays exactly-once across retries, the property D6 will prove.
D3: A daily cron that fans out
Where you are: a worker the world wakes one email at a time. This Decision adds a daily cron that fans out one event per eligible customer; by the end each gets its own durable child run.
Add scheduled work: a daily cron that fires one health-check event per Pro and Enterprise customer, each event triggering its own durable run. Paste this:
Add a daily cron that fans out one customer/health_check.requested event per Pro and Enterprise customer, each one idempotency-keyed so a re-delivered cron run never double-fires. Each child event triggers its own durable run that writes one audit row. Invoke the cron manually and show me one child run per eligible customer.
Creates: a cron parent that fans out and an event consumer that handles each child, both registered with the host.
Two shapes carry this Decision. The trigger is a one-line cron decorator, and the fan-out is N events each carrying an idempotency key:
@inngest_client.create_function(fn_id="daily-health-check", trigger=inngest.TriggerCron(cron="0 9 * * *"))
async def daily_health_check(ctx: inngest.Context) -> dict:
    # ... select Pro/Enterprise customers, then:
    events = [
        inngest.Event(
            name="customer/health_check.requested",
            data={"customer_id": c["id"]},
            id=f"health-{c['id']}-{ctx.event.id}",  # idempotency key per (customer, cron run)
        )
        for c in eligible
    ]
    await ctx.step.send_event("fan-out-health-checks", events)
The idempotency key is the load-bearing detail: id=f"health-{customer}-{cron_run}" means that if the same cron run is delivered twice (a redeploy, a retry), the duplicate event drops, so each customer gets exactly one check per day. The consumer is an ordinary event-triggered function that writes one audit row. Invoke the cron from the agent with the MCP's invoke_function tool (don't wait for 09:00 tomorrow). One dev quirk: the dev server only fires crons while it is running; production runs them on Inngest's always-on infrastructure.
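The effect of the idempotency key can be seen in a small model. The customer rows mirror the seed data from D0; `DedupingInbox` is a hypothetical stand-in for the event intake (Inngest's real deduplication happens server-side, and its retention window is not modeled here), and the cron run IDs are made up for illustration.

```python
ELIGIBLE_TIERS = {"pro", "enterprise"}

CUSTOMERS = [
    {"id": "cust_001", "tier": "enterprise"},
    {"id": "cust_002", "tier": "pro"},
    {"id": "cust_003", "tier": "pro"},
    {"id": "cust_004", "tier": "standard"},
    {"id": "cust_005", "tier": "standard"},
]


def build_health_events(customers, cron_run_id):
    """One event per eligible customer, keyed by (customer, cron run)."""
    return [
        {
            "name": "customer/health_check.requested",
            "id": f"health-{c['id']}-{cron_run_id}",
            "data": {"customer_id": c["id"]},
        }
        for c in customers
        if c["tier"] in ELIGIBLE_TIERS
    ]


class DedupingInbox:
    """Hypothetical event intake that drops any event ID it has already seen."""

    def __init__(self) -> None:
        self.seen_ids = set()
        self.accepted = []

    def send(self, events) -> int:
        accepted_now = 0
        for event in events:
            if event["id"] in self.seen_ids:
                continue  # duplicate delivery: dropped
            self.seen_ids.add(event["id"])
            self.accepted.append(event)
            accepted_now += 1
        return accepted_now


inbox = DedupingInbox()
monday = build_health_events(CUSTOMERS, cron_run_id="run-mon")
first_delivery = inbox.send(monday)
redelivery = inbox.send(monday)
next_day = inbox.send(build_health_events(CUSTOMERS, cron_run_id="run-tue"))
```

The same cron run delivered twice adds nothing the second time, while the next day's run carries a different key and goes through, which is why the cron run belongs in the key alongside the customer.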
Done when: the parent completes in seconds and the dashboard shows one customer-health-check child run per eligible customer, with the standard-tier customers correctly skipped.
Why fan-out, not a loop. The parent does not process the customers itself; it sends N events and returns. Each child is its own run, isolated, independently retryable, capped by its own concurrency. A loop inside one function would couple them: one slow customer holds up the rest, and a crash loses the whole batch. Fan-out is how one scheduled wake-up becomes N independent durable runs.
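The coupling argument can be illustrated in-process. This sketch uses asyncio.gather as a rough analogy for independent child runs; it is not durable and is not how Inngest schedules children, and the `check` function with its hard-coded failure for cust_002 is invented for the demonstration.

```python
import asyncio


async def check(customer_id: str) -> str:
    # Invented failure: one customer's check always breaks.
    if customer_id == "cust_002":
        raise RuntimeError("health check failed")
    return f"{customer_id}: ok"


async def loop_style(ids):
    """One function processes everyone: the first failure ends the batch."""
    done = []
    try:
        for customer_id in ids:
            done.append(await check(customer_id))
    except RuntimeError:
        pass  # later customers are never checked
    return done


async def fan_out_style(ids):
    """Each customer is handled independently: a failure stays local."""
    results = await asyncio.gather(
        *(check(customer_id) for customer_id in ids), return_exceptions=True
    )
    return {
        customer_id: "failed" if isinstance(result, Exception) else "ok"
        for customer_id, result in zip(ids, results)
    }


ids = ["cust_001", "cust_002", "cust_003"]
looped = asyncio.run(loop_style(ids))
fanned = asyncio.run(fan_out_style(ids))
```

In the loop version cust_003 is never reached; in the fan-out version it completes, and only cust_002 would need a retry.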
D4: Flow control
Where you are: a worker that handles each email but would fire all of them at once under a burst. This Decision adds three flow-control policies; by the end a twenty-event burst queues under the cap with no dropped or duplicated rows.
When five hundred emails land at 9am, the worker should not fire five hundred model calls at once: that blows the rate limit and starves everyone behind the noisy customer. Add a global concurrency cap, a per-customer cap, and a throttle. Paste this:
Add flow control to the email handler: a global concurrency cap, a per-customer concurrency key so one noisy customer can't starve the rest, and a throttle to protect the OpenAI rate limit. Fire a burst of twenty events across five customers and show me they queue under the cap and all complete with no dropped or duplicated audit rows.
Edits: inngest_app.py (three decorator arguments on the email function).
These three arguments are the lesson; all of D4 lives in them:
concurrency=[
    inngest.Concurrency(limit=10),  # global cap
    inngest.Concurrency(limit=2, key="event.data.customer_id"),  # per-customer cap
],
throttle=inngest.Throttle(limit=100, period=datetime.timedelta(minutes=1)),
Three knobs, three jobs. The global limit=10 caps how many runs execute at once, protecting two real ceilings: the model's rate limit, and your Neon connection budget. Two things bound your connections, and they work at different scales. Within a single worker replica, all runs share one asyncpg pool, so the pool's max_size is what holds connections flat no matter how many runs are active (a twenty-run burst on one host still rides a handful of pooled connections). Across replicas, that local pool no longer helps, because replica two has its own pool, so the concurrency cap is what bounds total runs, and therefore connections in use, fleet-wide. Inngest enforces the limit in its own queue rather than per replica, so limit=10 means at most ten runs execute at once however many replicas serve them. What still grows with replica count is the set of pooled connections each replica holds open, so size replicas times the pool's max_size against Neon's budget (the free tier allows a few hundred pooled). Pool and cap together are the protection: the pool bounds one replica, the cap bounds the fleet. Without either, a five-hundred-email burst across unpooled, uncapped replicas opens far more connections than Neon will accept. The per-customer limit=2 keyed on event.data.customer_id means one customer's burst occupies at most two slots, so a flood from one account never starves the others. The throttle caps how many runs start per minute, smoothing a spike into a steady rate. A function carries at most two concurrency policies; the global-plus-per-key pair is the common shape. Fire the burst from the agent: twenty customer/email.received events across five customers via send_event.
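The two concurrency caps can be simulated in-process to see what "queues under the cap" means. This is a sketch under stated assumptions: asyncio semaphores stand in for Inngest's queue, the throttle is not modeled, the 10 ms sleep stands in for the model call, and `simulate_burst` is a hypothetical name. The burst matches the one in this Decision: twenty events across five customers.

```python
import asyncio
from collections import defaultdict

GLOBAL_LIMIT = 10        # mirrors inngest.Concurrency(limit=10)
PER_CUSTOMER_LIMIT = 2   # mirrors the keyed limit=2


async def simulate_burst(events):
    global_slots = asyncio.Semaphore(GLOBAL_LIMIT)
    customer_slots = defaultdict(lambda: asyncio.Semaphore(PER_CUSTOMER_LIMIT))
    running_by_customer = defaultdict(int)
    state = {"running": 0, "peak_global": 0, "peak_per_customer": 0}
    completed = []

    async def handle(event):
        customer_id = event["customer_id"]
        # Take the per-customer slot first, so a queued noisy customer
        # never holds a global slot while it waits its turn.
        async with customer_slots[customer_id]:
            async with global_slots:
                state["running"] += 1
                running_by_customer[customer_id] += 1
                state["peak_global"] = max(state["peak_global"], state["running"])
                state["peak_per_customer"] = max(
                    state["peak_per_customer"], running_by_customer[customer_id]
                )
                await asyncio.sleep(0.01)  # stand-in for the model call
                state["running"] -= 1
                running_by_customer[customer_id] -= 1
                completed.append(event["id"])

    await asyncio.gather(*(handle(event) for event in events))
    return state, completed


events = [
    {"id": f"evt-{i}", "customer_id": f"cust_{i % 5 + 1:03d}"} for i in range(20)
]
state, completed = asyncio.run(simulate_burst(events))
```

Every event completes exactly once, and the recorded peaks never exceed the two limits; the excess simply waits, which is the queueing behavior the Done-when check looks for in the dashboard.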
Done when: the burst queues under the cap (the running count stays at or below 10, and at or below 2 per customer), every run completes, and the Neon audit_log table has exactly twenty message_received and twenty message_sent rows. No dropped runs, no duplicates, and no Neon connection-limit errors under the burst. On this single host the asyncpg pool holds the connections flat (you will see only a handful in use even with the burst running), and the cap is what would hold them flat across replicas once you scale out.
Why these are policy, not code. None of this lives in your function body; it is configuration the runtime enforces. Without the caps, a burst either melts a downstream system or lets one tenant monopolize the worker. Writing the same fairness by hand is a queue plus a scheduler plus a rate limiter, hundreds of lines. Here it is three decorator arguments.
D5: A durable human-approval gate on refunds (the keystone)
Where you are: a worker whose refund pause (D0's needs_approval=True) is ephemeral, living in the running process. This Decision makes that pause durable; by the end the run suspends at zero compute, waits for a real approval event, and resumes to issue the refund exactly once.
That ephemeral pause is the gap: a crash, a deploy, or a reviewer who takes the afternoon, and the pending refund is gone. This is the keystone of the whole course: make the pause durable, so the function suspends at zero compute, waits for a real approval event for as long as it takes, then resumes the exact same agent run. Paste this:
The refund approval is currently an in-process pause that a crash or a slow reviewer would lose. Make it durable: when the agent pauses on the refund, persist its serialized run state as the step's output, then suspend the whole function on step.wait_for_event waiting for a refund/approval.decided event (give it a four-hour timeout and match it to this customer). When the decision arrives, rehydrate the state, apply approve or reject, and resume the agent so the refund fires exactly once. Drive a refund, show me the run suspended and waiting, send an approval, and show me exactly one refund audit row. Then do it again with a rejection and show me a blocked row and no refund.
Edits: inngest_app.py (the agent helpers learn to pause and resume; the email function gains the gate).
This Decision earns more code than the others, because the suspend-and-resume dance is the lesson. When the agent pauses, it serializes its run state; when the decision arrives, you rehydrate that state, apply approve or reject, and resume:
async def _run_agent(email_text: str) -> dict:
    agent = worker.build_agent()
    result = await Runner.run(agent, email_text, run_config=worker.run_config())
    if result.interruptions:  # the refund tool paused for approval
        return {"status": "needs_approval", "state": result.to_state().to_string()}
    return {"status": "done", "reply": result.final_output}

async def _resume_agent(state_str: str, approved: bool, rejection_message: str | None) -> dict:
    agent = worker.build_agent()
    state = await RunState.from_string(agent, state_str)
    for item in state.get_interruptions():
        if approved:
            state.approve(item)
        else:
            state.reject(item, rejection_message=rejection_message or "Refund denied.")
            # db.record runs on the async asyncpg pool, so it must be awaited
            await db.record("refund_blocked", detail=f"args={item.arguments}")
    result = await Runner.run(agent, state, run_config=worker.run_config())
    return {"status": "resumed", "reply": result.final_output}
Inside the email function, the gate is one inline wait_for_event where the agent paused; the decision drives a resume step:
decision = await ctx.step.wait_for_event(
    "await-refund-approval",
    event="refund/approval.decided",
    timeout=datetime.timedelta(hours=4),
    if_exp=f"async.data.customer_id == '{customer_id}'",
)
# (decision is None on timeout -> write a refund_blocked row and return)
resumed = await ctx.step.run("resume-agent", functools.partial(
    _resume_agent, outcome["state"], bool(decision.data.get("approved")), decision.data.get("rejection_message")))
Read it top to bottom: the gate is one inline call in an otherwise ordinary function. No callback, no state-machine dispatch, no if status == waiting: branching across invocations. The runtime handles suspend and resume; your code expresses the policy. Four details earn their place:
- result.to_state().to_string() serializes the paused run, and it becomes the run-agent step's output, so it is durably stored. to_state() is synchronous; to_string() returns the string you persist.
- RunState.from_string(agent, s) is awaited (it is a coroutine) and takes that stored string directly. You then approve or reject over state.get_interruptions() and call Runner.run(agent, state, ...) to resume. (One resume can leave approvals pending, so the real helper loops until none remain.)
- The same run_config() is re-passed on resume, and the agent is rebuilt to the same tool shape. The serialized state does not carry the sandbox session, so the resume must supply it again. This is the one detail that, if missed, makes the resumed run fail. (D0 factored build_agent and run_config apart for exactly this.)
- if_exp matches the decision to this customer (async.data.customer_id == '...'), so an approval for one customer never resumes a different customer's run.
To drive it from the agent: send a customer/email.received event whose email describes a refund, watch the run suspend at await-refund-approval (the dashboard shows it WAITING, with run status RUNNING but zero compute), then send refund/approval.decided with {"approved": true, "customer_id": "cust_001"} via send_event. Do it again with {"approved": false, "customer_id": "cust_001"}; the customer_id must be present in both payloads, because the if_exp filter matches on it.
Done when: on approval, the suspended run resumes and the Neon audit_log table has exactly one refund_issued row. On rejection, the run resumes, the audit has a refund_blocked row and no refund_issued, and the agent's reply explains the denial.
Why this is the keystone. Every other layer (the senses, the reflexes, balance) keeps the worker correct or healthy on its own. This one is where the human mind re-enters the loop on a high-stakes action, durably, for as long as it takes, at zero cost while waiting. A queue-plus-database-plus-poller version of this is a small project. Here it is one wait_for_event and a resume.
D6: Prove durability survives a broken step
Where you are: a full worker with every layer wrapped. This Decision proves the property that justified all of it; by the end you have watched a broken run retry its failing step many times while its completed audit step runs exactly once, then recovered the work on a fresh run.
The last property to prove is the one that justified all of this, the memoization mechanic from Concept 7. You understood it there; now prove it in your own worker. Paste this:
Deliberately break the agent step so it fails, fire an event, and show me Inngest retrying it while the earlier audit step stays memoized, so the failing run writes its ingress audit row exactly once across all the agent retries. Then fix the step and recover the work, and show me the recovery completing.
Break the agent step on purpose (raise a ValueError inside _run_agent), fire a few customer/email.received events for different customers, and read each run's trace. This is the proof, and it is inside each failing run: audit-received shows one completed attempt and writes its row once; run-agent shows several Attempts as it retries with backoff (Inngest defaults to several attempts) and then fails; audit-sent never runs. The audit step sitting at one attempt while the agent step climbs is the memoization from Concept 7, now visible in your own worker: the failing run writes only one message_received row no matter how many times the agent step retries.
Then revert the break (the host auto-reloads if you ran it with --reload; otherwise restart it) and recover the work by re-firing the event on the fixed code (or, for a real bad-deploy batch, the dashboard Rerun button; both start a fresh run from the top, covered in Concept 14). Here is the part that surprises people, and it is correct behavior, not a bug: the recovery is a brand-new run, so it runs audit-received again and writes its own message_received row. After a break-then-recover, that customer legitimately has two message_received rows, one from the failed run, one from the recovery. Memoization is a within-run guarantee; it never spans two separate runs.
Done when: in the failed run's trace, audit-received sat at one attempt and wrote one row while run-agent accrued several attempts and failed. That one-attempt-despite-N-retries is the memoization, proven. Then the recovery run completes run-agent and audit-sent on the fixed code. Query the Neon audit_log in the console (or ask your agent to read it back over the Neon tools): a customer you broke-and-recovered will have two message_received rows (failed run plus recovery) and one message_sent (only the recovery got that far), which is exactly right. The real diagnostic is per-run, not per-customer: open a single run's trace and confirm audit-received shows one attempt. If one run's trace shows the ingress step running twice, that is a memoization bug (usually a non-unique step name); two rows spread across two separate runs is not.
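If you prefer to check the per-run rule against exported rows rather than by eye, a sketch like the following groups by run instead of by customer. It assumes each audit row carries a run identifier; the (customer_id, run_id, event_type) tuple shape and the sample data are hypothetical, so adapt the field names to whatever your audit_log actually stores.

```python
from collections import Counter

# Hypothetical audit rows as (customer_id, run_id, event_type) tuples.
rows = [
    ("cust_1", "run_a", "message_received"),  # failed run
    ("cust_1", "run_b", "message_received"),  # recovery run: a second row is expected
    ("cust_1", "run_b", "message_sent"),
    ("cust_2", "run_c", "message_received"),
    ("cust_2", "run_c", "message_received"),  # same run twice: worth investigating
]


def runs_with_duplicate_ingress(audit_rows):
    """Return run ids that wrote more than one message_received row."""
    per_run = Counter(
        run_id for _, run_id, kind in audit_rows if kind == "message_received"
    )
    return sorted(run_id for run_id, count in per_run.items() if count > 1)


suspects = runs_with_duplicate_ingress(rows)
```

Here cust_1 has two message_received rows but is not flagged, because they belong to different runs; only run_c is flagged, since both of its rows come from one run.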
Why this is the bright line. A worker that loses customer work on a bad deploy is just an agent you call. A worker that takes the same bad deploy, fails loudly, retries the broken step without redoing the work it already finished (the agent step's many attempts, but the ingress audit written once), and recovers cleanly on a fresh run after the fix, is a Production Worker. The proof is the failed run's own trace, one ingress attempt against many agent attempts, not a row count across runs.
Point this same nervous system at your own SandboxAgent worker instead of the minimal floor; the wrapping is identical. And this step.wait_for_event approval replaces the hand-rolled run-state table from that course's optional Decision 10: the durable gate you just built is the persistence layer, so you can delete the table.
What just happened
You built a small customer-support worker and gave it a nervous system, one layer at a time. The worker's internals never changed after D0: same SandboxAgent, same two tools, same Neon Postgres audit trail. What changed is everything around it. It now wakes on a customer/email.received event and on a daily cron that fans out per eligible customer, runs durably (the agent call inside step.run), respects flow control (global and per-customer concurrency, a throttle), gates refunds on a durable human approval (step.wait_for_event), and recovers from a bad deploy by replaying failed runs, with the audit trail showing that within any single run each step fired exactly once, no matter how many times that run retried.
The agent code is the same; its reach is not. You started with an agent you operate, prompt it, watch it, prompt it again. You now have a worker that operates on its own: the world wakes it, its reflexes carry it through failures, it holds its balance under load, and a human steps in only where the stakes demand one. That is the line the opening drew, between an agent you operate and an FTE that operates on its own, and you just built across it.
The remaining concerns are observability at scale, multi-worker coordination, and the manager layer that decides which workers handle which traffic. That is the next course in the track. This course covers the unit of production-ready execution; the next composes those units into a workforce.
Part 5: Where this course leaves off
The cost shape of a Production Worker
Two cost surfaces matter: infrastructure cost (Inngest, and whatever store and compute you run the worker on) and inference cost (model tokens). Infrastructure stays roughly flat as load increases; inference scales linearly. The method below is what to learn; any dollar figure goes stale the week it ships, so treat the numbers as illustrative and check current pricing pages before you put a number in a budget.
Inngest pricing. Inngest charges per execution: each function run, plus each step-level retry, counts as one execution.
| Tier | Price | Executions / month | Concurrent steps | Notable |
|---|---|---|---|---|
| Hobby | $0 | 50,000 | 5 | 3 users, 50 realtime connections, no credit card |
| Pro | from $75 / month | 1,000,000 | 100+ | 1000+ realtime connections, 15+ users, 7-day trace retention |
| Enterprise | custom | custom | 500-50,000 | SAML / RBAC, 90-day trace retention, dedicated support |
Events pricing layers on top: the first 1-5M events per day are included; above that, overage starts around $0.000050 per event and declines at higher volume. Pro adds $50 per additional 1M executions when you blow past the 1M cap.
Hobby-tier ceilings that matter here. The 5-concurrent-step cap means that even if you declare concurrency=Concurrency(limit=10) in code, the platform's account-level cap holds you at 5. Your code is correct for production; observed concurrency on the free tier is 5. step.sleep and step.sleep_until are also tier-bounded: up to seven days on the free Hobby plan, up to one year on paid plans (Inngest usage limits).
Inference cost dominates. A typical customer-support run uses a few thousand to ten thousand model tokens per conversation. Multiply your per-token price by your tokens-per-email by your emails-per-day and you have the line that matters; for most workers it dwarfs everything else. This is what you optimize. Everything else is a rounding error. The two highest-value levers: keep a stable cached prompt prefix (so the model bills the repeated part at the cheaper cached rate, not full price on every call), and route easy turns to a cheaper model.
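The multiplication above, plus the cached-prefix lever, fits in one small function. This is a sketch with placeholder numbers: the $2.00 per million tokens, the 60% cached share, and the 50% cached discount are illustrative assumptions, not any provider's real pricing.

```python
def daily_inference_cost(
    emails_per_day: int,
    tokens_per_email: int,
    usd_per_million_tokens: float,
    cached_fraction: float = 0.0,
    cached_discount: float = 0.0,
) -> float:
    """Per-token price x tokens per email x emails per day, with an optional cached share."""
    tokens = emails_per_day * tokens_per_email
    full_price_tokens = tokens * (1.0 - cached_fraction)
    cached_tokens = tokens * cached_fraction
    price = usd_per_million_tokens / 1_000_000
    return full_price_tokens * price + cached_tokens * price * (1.0 - cached_discount)


# Illustrative placeholders only; substitute your provider's current prices.
baseline = daily_inference_cost(1_000, 5_000, 2.00)
with_cache = daily_inference_cost(
    1_000, 5_000, 2.00, cached_fraction=0.6, cached_discount=0.5
)
```

With these placeholder inputs the baseline works out to about $10 per day and the cached variant to about $7, which shows why a stable prompt prefix is worth protecting before you tune anything on the infrastructure side.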
Three Inngest-specific cost levers once you are in the optimization zone:
- Do not wrap pure functions in step.run. If a function has no side effects, it does not need durability; wrapping it adds a step-run charge for no benefit. Save step.run for I/O and side effects.
- Use batch_events for bulk paths. A 50-event batch is one function run, not 50.
- Suspend cheaply with step.sleep and step.wait_for_event. Suspended functions do not bill for the suspension time. A 3-day delayed-followup costs the same as a 3-second one.
The shape at scale: inference is the bill that grows with traffic; Inngest, your data store, and compute stay comparatively flat. Run the same multiplication at your real volume rather than trusting a figure printed here.
Swap guide: the nervous system is invariant, the platform is not
This course names Inngest at every layer. That is because a teaching example needs concrete answers, not "use any orchestrator you like." But the architecture works with any compliant alternative. Five swaps the course's design explicitly anticipates:
- Trigger surface: Inngest events → Temporal signals, Restate handlers, AWS EventBridge + Lambda. Each platform has a way to express "this code runs when this named thing happens." The event names, the payload shapes, and the idempotency discipline all transfer. What changes: the SDK's decorator syntax and the dashboard.
- Durable execution: Inngest step.run → Temporal activities, Restate handlers, custom Postgres-backed state machines. Each gives you the "memoize this side-effecting call, retry on transient failure, resume after crash" semantics. Temporal is the closest analog and the older, more enterprise-tested option. Restate is the newest and has a more functional-programming flavor. Custom state machines are what teams write when they cannot adopt a managed platform; usually 1,000-10,000 lines of code that recreate ~70% of what Inngest gives you for free.
- HITL primitive: step.wait_for_event → Temporal signals (the workflow waits on a condition that an approval signal sets), Restate's awakeables, custom Redis/Postgres approval queues. The pattern is the same: function suspends, external signal resumes it, audit captures the decision. Inngest's expression is the most concise to write; Temporal's is more verbose but battle-tested at large scale.
- Cron scheduling: Inngest cron triggers → Kubernetes CronJobs + queue, GitHub Actions schedules, AWS EventBridge schedules. Cron triggers are commodity. The Inngest advantage is not having cron; it is that cron-triggered functions get the same durability/replay/flow-control as event-triggered ones, automatically. Other platforms make you wire that yourself.
- Flow control: Inngest concurrency + throttle → Temporal task queues with worker concurrency, Redis-backed rate limiters, AWS SQS message visibility timeouts. Other platforms can do this; Inngest does it with the configuration density we have seen (one decorator argument).
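One way to picture "the nervous system is invariant, the platform is not" is to write the workflow against a narrow interface and let the engine sit behind it. The sketch below is purely illustrative: DurableEngine, InMemoryEngine, and refund_workflow are hypothetical names, no real SDK exposes exactly this contract, and the in-memory engine is a test double with no durability at all.

```python
from typing import Any, Callable, Optional, Protocol


class DurableEngine(Protocol):
    """Hypothetical minimal contract; real SDKs differ in names and signatures."""

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any: ...

    def wait_for_signal(self, name: str) -> Optional[dict]: ...


class InMemoryEngine:
    """Test double standing in for a real engine; nothing here is durable."""

    def __init__(self, signals: dict[str, dict]) -> None:
        self.signals = signals
        self.memo: dict[str, Any] = {}

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name not in self.memo:  # memoize by step name
            self.memo[name] = fn()
        return self.memo[name]

    def wait_for_signal(self, name: str) -> Optional[dict]:
        return self.signals.get(name)  # None models a timeout


def refund_workflow(engine: DurableEngine, audit: list[str], amount: int) -> str:
    """The workflow names steps and signals; it never names a platform."""
    engine.run_step("audit-requested", lambda: audit.append("refund_requested"))
    approval = engine.wait_for_signal("refund/approved")
    if approval is None:
        engine.run_step("audit-timeout", lambda: audit.append("refund_timed_out"))
        return "timed_out"
    engine.run_step("audit-decision", lambda: audit.append("refund_approved"))
    return f"refunded {amount}"


approved_audit: list[str] = []
approved = refund_workflow(
    InMemoryEngine({"refund/approved": {"by": "ops"}}), approved_audit, 40
)

timeout_audit: list[str] = []
timed_out = refund_workflow(InMemoryEngine({}), timeout_audit, 40)
```

The point of the design is that the step names, the signal name, and the audit discipline live in refund_workflow; swapping the platform would mean writing a different adapter for the engine, not rewriting the workflow.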
Dapr as the open companion at production scale. A more ambitious replacement worth naming: Dapr Agents as the structural companion to Inngest at production scale, in the way OpenCode is to Claude Code. Dapr Agents reached v1.0 GA on March 23, 2026 under CNCF governance (CNCF announcement, Dapr Agents core concepts). DurableAgent is the production-ready class; the older Agent class is deprecated. Choose Dapr when Kubernetes-native deployment and multi-language SDKs matter more than Inngest's local dev experience. Inngest is the better learning tool (the dashboard makes the mental model visible); Dapr is the better scale tool when you have hit Inngest's tier ceilings or need K8s-native multi-language deployment.
Inngest is also open source (github.com/inngest/inngest; the 1.0 release added self-hosting support in September 2024) and self-hostable via Helm + KEDA. The axes that matter at scale are governance, support, and maturity: Inngest is governed by a single vendor with a young self-hosting story; Dapr is CNCF-governed with a longer production track record.
| This course's concept | Inngest primitive | Dapr production analogue | Teaching note |
|---|---|---|---|
| Scheduled work | TriggerCron | Cron input binding / Dapr Scheduler | Same idea: time wakes the Worker. Dapr usually requires component configuration. |
| Webhook/event ingress | Inngest webhook endpoint → event | HTTP endpoint, input bindings, or pub/sub ingress | Inngest hides more plumbing; Dapr gives infrastructure control. |
| Internal events | inngest_client.send() | Dapr pub/sub | Same event-driven mental model; broker is pluggable in Dapr. |
| Fan-out | One event triggers many functions | One topic/event consumed by many services | Same architecture; Dapr uses broker/topic/subscriber composition. |
| Durable steps | step.run() + memoization | Dapr Workflows + activities | Similar production purpose, different developer model. |
| Waiting without compute | step.sleep() | Durable workflow timers | Both avoid holding a process open while waiting. |
| Human approval gate | step.wait_for_event() | Workflow external events/signals, pub/sub, actors | Inngest expression is simpler; Dapr is more composable. |
| Retries | Function/step retries | Workflow/activity retries + resiliency policies | Dapr makes resiliency a runtime policy as well as workflow behavior. |
| Dead-letter / failed runs | Inngest dashboard failed runs + replay | Broker DLQ + workflow status/restart/manual tooling | Inngest is more turnkey here; Dapr is more infrastructure-native. |
| Flow control | Concurrency, throttling, priority, batching | Kubernetes scaling, app concurrency, broker controls, resiliency policies, bulk pub/sub | Dapr can do it, but it is not one decorator argument. Inngest is denser. |
| Stateful coordination | wait_for_event, event keys, step state | Actors + state store + workflows | Dapr Actors are stronger for long-lived identity/stateful coordination. |
| Agent runtime | Your agent inside Inngest function | DurableAgent / Dapr Agents v1.0 GA | Dapr Agents explicitly makes the agent workflow-backed and resumable. |
This table is a translation guide, not a claim of identical APIs. Inngest teaches the production pattern with a compact developer experience: triggers, steps, waits, replay, and flow control in one product surface. Dapr implements the same production architecture through distributed-systems building blocks: bindings, pub/sub, workflows, actors, state, resiliency, and Kubernetes-native operations. The concepts transfer directly; the implementation style changes. Verified against Dapr's bindings overview and Dapr Agents core concepts as of May 2026.
Three reasons to reach for Dapr at production scale:
- CNCF-governed, vendor-neutral by charter: no single vendor controls the platform or your dependence on it.
- Polyglot with first-class Python. Dapr Agents is Python-first; the same agent code can run alongside services written in JavaScript, Go, .NET, Java, or PHP without anyone learning a second framework.
- Horizontally scalable on Kubernetes by design. Run in your own cluster, in a managed offering (Diagrid Catalyst), or locally via dapr init. The scaling story is the same architecture in every environment.
The honest caveat: Dapr is not a getting-started platform. Running it in production means Kubernetes, state store, pub/sub broker, placement service, observability, YAML components, sidecars. That is a lot of operational surface when your goal is still to learn the patterns, which is why this course starts on Inngest: one command, and the dashboard appears. Reach for Dapr once the patterns have landed and the question shifts to running at organizational scale on infrastructure you control.
Learn the concepts on Inngest and the OpenAI Agents SDK first: fast feedback loop, minimal infrastructure, focus on the patterns. When you reach the scale where Kubernetes governance, polyglot teams, or vendor-neutrality become non-negotiable, the same architectural patterns lift onto Dapr with the translation table above as your key. The patterns transfer; the substrate changes; what you learned in this course remains the load-bearing knowledge.
What this course doesn't cover (yet)
The worker you built satisfies four of the Seven Invariants the thesis sets out. Specifically: it runs on an engine (Invariant 4, the SandboxAgent), against a system of record (Invariant 5, the audit trail), with the world able to call it (Invariant 7, the triggers you added), and with the human as principal at a gated decision (Invariant 1, partial: the runtime mechanism is here, the broader architectural pattern is later). The remaining three Invariants, and the broader architecture that makes a workforce out of workers, are later courses. One bullet each:
- Invariant 2: Every human needs a delegate. A personal agent at the edge that holds your context, represents your judgment, and brokers work to the workforce. The thesis names OpenClaw as the current realization.
- Invariant 3: The workforce needs a manager. An orchestrator that assigns work, enforces budgets, audits execution, exposes hiring as a callable capability. The thesis names Paperclip.
- Invariant 6: The workforce is expandable under policy. A meta-layer where an authorized agent generates a prompt, provisions a runtime, and registers a new Worker, without waking a human. Claude Managed Agents is one realization.
A single worker waking on events, running durably, and gating on humans is the smallest unit of the architecture this course teaches. The next course extends that worker into a workforce: multiple workers coordinated by a manager, expandable on demand, woken by triggers, governed by spec. Same OpenAI Agents SDK foundation, same audit habit, same Inngest nervous system. The architecture is invariant.
How to actually get good at this
Reading this crash course does not make you good at building Production Workers. Using it does. You start by building the worker, feel the friction as you wrap it, and let each piece of friction teach you which concept it belongs to.
The mapping for this course:
- "Why does my function not fire when the event arrives?" → event name typo or namespace mismatch (Concept 3). Compare the event name string in your TriggerEvent to the one in inngest_client.send byte-for-byte.
- "Why did my function fire twice for the same logical event?" → missing idempotency key (Concept 4). Add an id= to the event with a deterministic seed.
- "Why did my function 'lose work' after a deploy?" → code outside step.run doing the work (Concept 7). Wrap the I/O and side effects in named steps.
- "Why did the customer get charged twice?" → the Stripe call was outside step.run, or the step name was not unique (Concepts 6 and 7). Move the call into a named step.run; make the step name unique within the function.
- "Why does OpenAI return 429 errors at the 9am peak?" → missing throttle (Concept 11). Add throttle=Throttle(limit=N, period=timedelta(minutes=1)).
- "Why do one customer's bursts starve other customers?" → missing per-key concurrency (Concept 12). Add a second Concurrency(limit=2, key="event.data.customer_id").
- "Why did my HITL gate time out silently over the weekend?" → missing timeout handler that writes to audit (Concept 15). Branch on approval is None and write the audit row explicitly.
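For the "fired twice" entry, "a deterministic seed" means the id is derived from fields that identify the logical event, so a re-send produces the same id. A minimal sketch, assuming your payload has a stable per-message identifier; message_id and the helper name are hypothetical, and the resulting string is what you would pass as the event's id=.

```python
import hashlib


def deterministic_event_id(event_name: str, customer_id: str, message_id: str) -> str:
    """Same logical event in, same id out, so a re-send can be recognized as a duplicate."""
    seed = f"{event_name}|{customer_id}|{message_id}"
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]


first = deterministic_event_id("customer/email.received", "cust_42", "msg_001")
resend = deterministic_event_id("customer/email.received", "cust_42", "msg_001")
other = deterministic_event_id("customer/email.received", "cust_42", "msg_002")
```

Avoid seeding the id with a timestamp or a random value generated at send time: both change on every re-send, which defeats the purpose.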
Build the architecture one piece at a time. That is why Part 4 is seven prompts, not one. Build the worker (D0). Wrap the agent in step.run (D1) and watch what changes when you deliberately crash mid-run. Wake it on an event (D2). Add the cron fan-out (D3), then flow control (D4) once you have actually hit a rate limit, then the durable approval gate (D5) when a high-stakes action actually needs a human. Each layer is its own learning. Combined into one big rewrite, they are a wall.
The discipline this course teaches (wake on events, run durably, gate on humans, replay on bugs) is the architectural invariant. Whatever platform implements it, that four-property contract is what you are really committing to. This is the Lindy bet: you build on the parts that have lasted, plain functions, SQL, a typed language, an event bus, not this season's wrapper. The product is replaceable; the discipline is not.
Quick reference
This section separates the narrative course from the during-build reference. The sections below are meant to be searched, not read top to bottom. The one-line gist of each concept is in the intro's collapsed cheat sheet; this section is the during-build diagnostic, the two decision trees, and the file layout.
Decision tree: pick the trigger surface
When a new thing happens in the world, where does the wake-up come from?
- An external system sent us an HTTP request. → Webhook trigger. Configure the source in the Inngest dashboard; reshape the payload via the transform; consume the resulting event.
- A schedule says it is time. → Cron trigger. TriggerCron(cron="..."). Use UTC; production crons fire even when your service is mid-deploy.
- Another Inngest function emitted an event during its run. → Event trigger. TriggerEvent(event="ns/name.subtype"). Subscribe one or many functions to the same name.
- An interactive user is waiting for an immediate response. → Not an Inngest trigger. Keep the request/response in your normal web endpoint; if the response involves heavy work, fire an event from inside the request and return immediately, letting Inngest handle the work asynchronously.
Decision tree: pick the step primitive
Given a function is running and you need to do something, which step.* call do you reach for?
- A side-effecting call (API, DB, file write, agent invocation). → ctx.step.run("name", fn, ...). The default. Memoized on success, retried on transient failure.
- A long-running OpenAI call on a serverless platform that bills for in-flight time. → ctx.step.ai.infer(...). Offloads the inference to Inngest's infrastructure so your function process can deallocate.
- Wait for a fixed duration before continuing. → ctx.step.sleep("name", timedelta(...)). Durable; zero compute while waiting (up to seven days on the free plan, one year on paid).
- Wait for an external event (human approval, sibling-function completion). → ctx.step.wait_for_event("name", event="...", timeout=..., if_exp=...). Durable; resumes when the event arrives or returns None on timeout.
- Pure deterministic computation (formatting a string, computing a date). → Just write the code. No step.run needed; no charge.
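The "returns None on timeout" behavior of the wait is the branch that is easiest to forget. The sketch below imitates only that control flow with asyncio; it is an in-process analogy, not the Inngest API, and it is not durable. The queue standing in for the event stream, the audit list, and the function names are all hypothetical.

```python
import asyncio
from typing import Optional


async def wait_for_approval(inbox: asyncio.Queue, timeout_s: float) -> Optional[dict]:
    """Return the approval payload, or None if nothing arrives before the timeout."""
    try:
        return await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def refund_gate(inbox: asyncio.Queue, audit: list, timeout_s: float) -> str:
    approval = await wait_for_approval(inbox, timeout_s)
    if approval is None:
        audit.append("refund_timed_out")  # the timeout is recorded, never silent
        return "escalated"
    audit.append(f"refund_{approval['decision']}")
    return approval["decision"]


async def demo():
    audit: list = []
    empty: asyncio.Queue = asyncio.Queue()  # nobody answers
    timed_out = await refund_gate(empty, audit, timeout_s=0.01)

    answered: asyncio.Queue = asyncio.Queue()
    await answered.put({"decision": "approved"})
    approved = await refund_gate(answered, audit, timeout_s=0.01)
    return timed_out, approved, audit


timed_out, approved, audit = asyncio.run(demo())
```

Both outcomes write an audit row. The durable version differs in where the waiting happens (the engine holds the state instead of a live process), but the None check and the explicit audit write carry over unchanged.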
File-location quick-ref
A flat project, five top-level files, no src/ nesting:
ai-agent-nervous-system/
├── .claude/
│ └── skills/ # the four Inngest skills (installed in the Quick Win)
│ ├── inngest-setup/SKILL.md
│ ├── inngest-events/SKILL.md
│ ├── inngest-steps/SKILL.md
│ └── inngest-durable-functions/SKILL.md
├── db.py # Neon Postgres access: pooled asyncpg, load_customers, record (closed-vocabulary audit) (D0)
├── worker.py # the worker: SandboxAgent + 2 tools (D0)
├── inngest_app.py # the nervous system: Inngest functions + FastAPI host (D1-D5)
├── .env # OPENAI_API_KEY, DATABASE_URL, INNGEST_DEV=1
└── AGENTS.md # the base's rules file (read on open)
The customers and audit trail live in your Neon database (provisioned in the Quick Win, seeded in D0), not in local files. The worker (db.py, worker.py) never changes after D0. Every nervous-system layer (D1 through D5) edits inngest_app.py.
Diagnostic table, symptom → root cause → concept
| Symptom | First suspect | Concept to re-read |
|---|---|---|
| Function never fires when expected event arrives | Event name typo, namespace mismatch | C3 (webhooks), C5 (fan-out) |
| Function fires twice for the same logical event | Missing idempotency key | C4 (idempotency) |
| Function "lost work" after deploy | Code outside step.run doing the work | C7 (memoization) |
| Cron schedule did not fire over a deploy | Schedule was running on the local dev server only; in production, crons run on Inngest's infrastructure | C2 (cron) |
| Customer charged twice for one refund | Stripe call outside step.run, or step name not unique | C6 (step.run), C7 (memoization) |
| OpenAI rate-limit errors during 9am peak | Missing throttle | C11 (concurrency + throttle) |
| One customer's bursts starve other customers | Missing per-key concurrency | C12 (priority + fairness) |
| Function suspended forever, never resumed | Event name in wait_for_event does not match the event being sent | C8 (wait_for_event), C15 (HITL) |
| HITL timeout fired silently over the weekend | Missing timeout handler that writes to audit | D5 (durable refund gate), C15 (HITL) |
| Yesterday's failed runs disappeared from dashboard | Failed runs persist only until they are manually replayed or the trace retention window expires | C14 (replay) |
| Replay re-charged customers | Replay is a fresh run that re-executes every step; the charge had no idempotency key | C4 (idempotency), C14 (replay is a fresh run) |
| Function trace does not show OpenAI prompt | Step trace shows function inputs/outputs but no LLM-specific prompt/token telemetry | C10 (Python uses step.run; LLM-specific telemetry needs your own OpenAI client tracing; step.ai.wrap's prompt-level traces are TypeScript-only) |
Appendix: optional lineage and an Inngest cheat sheet
This course stands alone: Part 4 builds the worker from scratch, so nothing below is a prerequisite. Three short notes for context.
A.1: If you are coming from the Digital FTE course
The From Agent to Digital FTE course builds a richer customer-support worker: portable Skills, a Postgres system of record, and a custom MCP server. If you did it, you already have a SandboxAgent worker sitting on disk, and you can skip D0's minimal floor: point the nervous system (D1 onward) at your own worker instead. The wrapping is identical. One bonus: the durable refund gate you build in D5 (step.wait_for_event) replaces the hand-rolled run-state table from that course's optional Decision 10, so you can delete it. If you have not done that course, ignore all of this; D0 gives you everything you need.
A.2: Inngest-specific essentials this course uses
If anything below feels unfamiliar, skim the corresponding doc page before diving into Part 4.
- Inngest client instantiation. A single inngest.Inngest(app_id=...) instance per Python project, exported from one module and imported wherever you decorate functions. See the Python quick start.
- Function decoration. @inngest_client.create_function(fn_id=..., trigger=...). The trigger can be TriggerEvent, TriggerCron, or a list of both for multi-trigger functions.
- ctx.step.run, ctx.step.sleep, ctx.step.wait_for_event, ctx.step.ai.infer. The four step primitives that make up 90% of what you will write in Python. (TypeScript has a fifth, step.ai.wrap, for LLM-specific tracing; Python projects use step.run for AI calls.)
- inngest_client.send(events=[...]). Emit events from anywhere in your code (inside functions, inside agent tools, from CLI scripts). Use an id= for idempotency.
- Dev server startup. npx inngest-cli@latest dev. Runs on :8288. Dashboard at http://127.0.0.1:8288. MCP at http://127.0.0.1:8288/mcp. If :8288 is taken it uses 8289+; then set INNGEST_BASE_URL=http://127.0.0.1:<port> on the host so it follows, not just the MCP URL.
A.3: The two shifts that are actually hard
The hardest thing about this course is not Inngest's syntax. It is the mental shift from request to event (Concept 1) and from in-process execution to durable execution (Concept 6). The syntax is mechanical once those two land. Re-read Concepts 1 and 6 first if anything else feels harder than it should.