Give Your AI Agent a Nervous System: A 90-Minute Crash Course
15 concepts, ~80% of real use: senses (triggers), reflexes (durable execution), and balance (flow control).
You have built an agent that works. The problem: it only works while you watch it. You open Claude Code or OpenCode, you type, it replies. Step away and it stops. Closing that gap, between an agent you operate and a worker that runs on its own, is what this course is about.
What closes the gap is not a smarter agent. Your agent already has what it needs to do the work: an LLM to think, tools and MCP servers to act, skills for the jobs it knows. What it is missing is a nervous system.
Think about your own body. Your brain thinks and your muscles act. But a second system runs underneath, without you: your heartbeat, your reflexes, the signals that keep you alive while you sleep. Stop paying attention and your heart keeps beating. An agent has nothing like that. So the moment you stop driving it, it stops.
A nervous system closes the loop on its own, with no human in each turn. It senses the world and wakes the agent when something happens. It reacts by reflex when a step fails, and holds its place for hours while it waits on a person or a slow API. It keeps the agent steady when five hundred requests arrive at once. That is the difference between an agent you operate and an FTE that runs on its own. You add this nervous system to your agent. You do not rewrite the agent. That is the one idea this course is built on.
📚 Teaching Aid
View Full Presentation — AI Agent Nervous System
This tool has a technical name: a durable execution engine. We use one called Inngest. The same patterns work in Temporal, Restate, and Dapr Agents. And this is not just a teaching picture: Day AI, a CRM for AI-native companies, calls Inngest "the nervous system" of their product. Inngest's free Hobby tier is the easiest place to start: no credit card, a one-command dev server, and a dashboard you can watch while you build.
Before anything else, here is the whole setup as one picture:
1. an EVENT happens (e.g. a customer emails)
|
v
2. the INNGEST ENGINE catches it
(you do NOT build this. it runs your agent for you:
retries, waits, remembers every step, shows a dashboard)
|
| it reaches your code over a thin web wire (FastAPI)
v
3. YOUR AGENT runs
(the only part you write. it thinks and acts.)
That is the entire model: two programs. The engine (you do not write it) catches events and runs your agent (you do write it), reaching it over a thin web wire, which is the only reason a web server (FastAPI) ever appears in this course. You start both in the Quick Win and watch the engine drive your agent.
The example is small on purpose: a customer-support agent that looks up a few sample customers, drafts a reply, and issues a refund only after a human approves. The agent is not the hard part, so we keep it small and spend our effort on the nervous system around it. You build it here from scratch. It picks up where the earlier Digital FTE course leaves off, though D0 sets up a minimal worker from scratch if you skipped it. It is Python-first on inngest-py: you direct your general agent in plain English, and it writes the code.
Here is how the course is built, so you read it the right way. The build is the spine. You set up the environment once in the Quick Win, then Part 4 builds the whole worker in seven short prompts, one nervous-system layer at a time. That is the path, and doing it is how the model lands. The fifteen concepts in Parts 1-3 are the reference the build draws on: one idea each, the "why" under a layer you are about to add. There are two good ways through. Read Parts 1-3 first if you like the idea before the keyboard. Or go straight to the Quick Win and Part 4, and dip back into a concept the moment a layer makes you ask "but why does it work like that?" Either way, Part 4 is where you build.
The agent never imports the nervous system, so you can swap Inngest for Temporal or Restate and leave the agent untouched. A single agent crash mid-task is annoying. A workforce of fifty agents handling customer-facing work without a nervous system underneath is impossible: you adopt a platform that gives it to you, or spend six months building a worse version yourself. Four properties make this nervous system uniquely important for agents: Day AI, the CRM for AI-native companies, runs its product on every primitive this course teaches: durable LLM workflows, wait-for-event coordination, replay on failure, debounce plus throttle plus concurrency, and multi-tenant fairness. Two of their founding engineers reached for the same nervous-system picture on their own. It is production language, not curriculum branding. The Agent Factory thesis describes Seven Invariants any production agent system must satisfy. The worker you build here satisfies Invariant 4 (an engine) and Invariant 5 (a system of record, here a small audit trail). This course adds two more, plus a piece of Invariant 1:Why an AI agent needs a nervous system (four properties)
step.wait_for_event (Concept 15), you build an approval queue yourself: database table, polling, timeout handling, audit trail. That is a project, not a feature.Where this course sits in the Agent Factory thesis
step.wait_for_event is the cleanest expression on any platform: the agent suspends, a human emits the awaited event, the agent resumes.
The 15 concepts, at a glance. They map onto the three jobs a nervous system does: the senses (triggers wake the worker), the reflexes (durable execution keeps it correct when something breaks), and balance (flow control keeps it healthy under load). This is the first-pass version, concept plus a one-line gist. When something breaks during a build, the Quick reference at the end has a symptom-to-concept diagnostic that points you back to the concept the failure belongs to.The 15 concepts in one line each (expand for the full map)
# Concept One-line gist Senses (Triggers) how the world reaches the worker 1 Events vs requests A request is sync and someone waits; an event is async and the world has moved on. 2 Cron triggers A schedule wakes the function. One line: TriggerCron(cron="0 9 * * *").3 Webhook triggers An inbound HTTP payload becomes a named event; your function reacts to the name. 4 Idempotency and event semantics Event IDs and step names make a duplicate event (or retry) a no-op. 5 Fan-out and sub-agent delegation One event, N subscribing functions; or one parent firing N child events. Reflexes (Durable execution) keeping the worker correct when something breaks 6 step.run and the durable function modelEach step.run is a checkpoint; the function can crash between steps and resume.7 Memoization, the mechanic underneath Completed steps return stored output instead of re-executing. 8 step.sleep and step.wait_for_eventBoth suspend the function durably, for a duration or for an event. 9 Retries, error handling, dead-letter Automatic backoff retries; after N tries the failed run persists for replay. 10 step.run for AI calls in PythonWrap OpenAI calls in step.run; step.ai.infer offloads inference (step.ai.wrap is TypeScript-only).Balance and recovery flow control under load, recovery, and the human gate 11 Concurrency and throttling concurrency caps active runs; throttle caps starts-per-second.12 Priority and fairness Priority orders the queue; per-key concurrency gives each tenant a fair share. 13 Batching Accumulate events into one batched function call for cheap bulk work. 14 Replay and bulk cancellation Replay failed runs with new code; bulk-cancel runs you no longer want. 15 HITL gates with step.wait_for_eventThe function suspends until a human approves, then resumes with the decision.
Prerequisites. This course assumes you have done From Agent to Digital FTE. If you have, you already meet everything below and you have a worker worth wrapping: Part 4's nervous system points straight at it, and you skip the from-scratch setup in D0. If you have not, do that course first, or read on regardless: D0 builds a minimal worker from scratch so the rest of the course still stands on its own. Either way, you need four things.
- You can drive a general agent. Claude Code or OpenCode, installed and authenticated. Plan mode, rules files, the read-first-then-write workflow: if that rhythm is familiar, you are calibrated. The Agentic Coding Crash Course covers it if not.
- An
OPENAI_API_KEY(or another model key your general agent can use) and a Neon account for the worker's Postgres system of record. The worker runs a real model and reads and writes its customers and audit trail in Neon. Neon is free (no card), and you authorize it with one browser click during setup; sign up at neon.com in about a minute if you don't have an account. The Inngest dev server itself needs no account.- Node.js 20+ available, even though the worker is Python. The Inngest dev server is distributed as a Node CLI (
npx inngest-cli@latest dev).- A working mental model of "event-driven" vs "request/response." If "the world fires an event and zero, one, or many functions react to it" reads as familiar, you are calibrated. If not, Concept 1 gives you the shape.
How to read this page on first pass
The two passes. Pass one puts the nervous-system model, its three layers, in your head; pass two, hands on the keyboard in Part 4, is where you build. If you prefer to build first and have the model form as you go, that works too: start at the Quick Win, run Part 4, and treat each concept as the reference you open when a layer raises a "why." Expand anything labeled "Done when" or "What to watch": runnable behavior to check predictions against. In Part 4 you can skim the load-bearing snippets on a first read; the prose around each one tells you what the layer does, and your agent writes the code when you build. The "Try with AI" blocks are optional extension prompts. Each concept closes with a Predict (commit to an answer before you read on) or a Quick check (test the rule you just read); both exist to make you pause, not to grade you. Every term is defined in context where it first appears.
Current as of May 2026. The whole Part 4 build was run end-to-end against a live Inngest dev server and a real model on inngest 0.5.18, openai-agents 0.17.x (built and re-verified on both 0.17.3 and 0.17.4), fastapi 0.136.3, Python 3.12, and the Inngest CLI. Every snippet in Part 4 is from that working build, not written from memory. The architecture this course teaches does not change when the SDK does; the SDK is this year's interface to it. The one place a minor openai-agents bump can bite is the D5 resume detail (how run-state serialization handles a custom context), so that Decision links the live docs directly. If a live docs page and this page ever disagree on a syntax detail, the docs win: pin your versions, and check the Inngest Python quick start and the OpenAI Agents SDK docs when you build.
Sections that diverge between Claude Code and OpenCode have a switcher; pick one and the page syncs across visits.
The fifteen-minute quick win: set up the base, and see the reflex
Before the 15 concepts that explain why this works, set up the environment the course runs in and watch one task survive a crash. You do this setup once; Part 4 builds the real worker on the same base. By the end you will have:
- the base open in your general agent, with its skills and tools set up for you,
- a fresh Neon database with two tables (
customersandaudit_log) that your agent created, - one tiny worker running, with a dashboard where you can watch it,
- a run you watched go to sleep while it waited, burning zero compute the whole time,
- a run you broke on purpose, then watched the system retry: it kept the work that had already finished and re-ran only the part that broke,
- and that same function with a real agent writing the greeting inside the durable step, so you finish having watched an AI worker run, not just a timer.
Those last two bullets are the point: the retry is the reflex this whole course is about, and the agent running inside it is the promise that reflex serves. It is one sitting, not the full Part 4 build, so do it, then come back for the concepts.
Now you start the two programs from the opening: your worker (your code) and the Inngest dev server (the engine running next to it, with its dashboard at http://127.0.0.1:8288, where /runs lists every run). They connect through a small always-on web layer, FastAPI, the doorway the dev server knocks on to start a run. The whole loop in one line: an event arrives, the dev server reaches your worker through that doorway, your durable function runs a step at a time, and each step is recorded in the dashboard. Your general agent writes and starts both for you; your job is to watch.
One more boundary matters, the same one the Digital FTE course drew. Your worker keeps its customers and its record of what it did in a Neon database, and that database gets touched two different ways. While you build, your general agent reaches into Neon for you, in plain English, to create the tables and check the rows. While the worker runs, it talks to the same database through an ordinary connection of its own. The build-time tool is never wired into the running worker; Neon's own docs are blunt that it is for building and inspecting, not for production. Neon is free with one click; the Inngest dev server needs no account at all.
Get the base and open it
Download the base and open the folder in your general agent. The agent does the setup itself, from the prompts just below. You set this up once: ai-agent-nervous-system/ is your folder for the whole course, the Quick Win and Part 4 alike. You never re-download or re-unzip.
Download ai-agent-nervous-system-base.zip
cd ai-agent-nervous-system
claude
This base assumes a capable general agent (Claude Code, or OpenCode running Claude Sonnet or Opus, GPT-5, or similar). A smaller model will drift on the build prompt; if its first plan looks vague instead of specific, switch to a stronger one before you go further.
Prep the base (~3 min)
The base ships its rules in AGENTS.md and its MCP wiring; the Skills, your key, and the Neon authorization come next. Have your agent set itself up. Paste this:
Read AGENTS.md, then get this base ready: install the Skills it lists for whichever agent you are, copy
.env.exampleto.envfor me, and tell me exactly what you need from me to bring the Neon and Context7 MCP servers online.
Watch for: the agent installing the four Inngest Skills and the neon-postgres Skill (you see the install runs and Installed confirmations), creating .env, then asking you for two things: your OPENAI_API_KEY to paste into .env, and one browser click to authorize Neon over OAuth. Neon is free; if you don't have an account yet, sign up at neon.com in about a minute, or create one right at the authorization screen. INNGEST_DEV=1 is already in .env, so the SDK runs in local dev mode with no signing key. When the install and wiring are done, the agent tells you to start the dev server (next step) and then restart it, because the new Skills and the inngest-dev MCP do not load mid-session.
Done when: the Skills are installed, .env holds your key, Context7 is reachable, and Neon is authorized. The inngest-dev MCP comes online once the dev server is running, which is the next step.
Start the dev server, and confirm the agent can reach it (~2 min)
This course adds two boundaries your agent reaches over MCP: a Neon database it builds and inspects, and the running dev server it sends events to and watches. So before you build anything, bring both up and confirm they are live.
Start the Inngest dev server in its own terminal (it is a Node CLI; leave it running):
npx inngest-cli@latest dev
The dashboard comes up at http://127.0.0.1:8288, and the dev server exposes its MCP endpoint at /mcp. Now restart your general agent (exit and relaunch in the ai-agent-nervous-system folder) so the freshly installed Skills and the inngest-dev MCP both load. Then paste this:
List the Neon tools and the inngest-dev tools you can see.
Watch for: two real lists. The Neon tools (creating a project, running SQL, describing tables, fetching a connection string, and the like) are your agent's hand on the database. The inngest-dev tools (list_functions, send_event, invoke_function, get_run_status, and the rest) are its hand on the running dev server. Everything below rides on both.
Gate open: the reply lists real Neon tool names and real inngest-dev tool names. If the Neon tools are missing: the OAuth didn't finish; redo the Neon authorization from the prep step. If the inngest-dev tools are missing: the dev server is not running (start it), or you skipped the restart (exit, relaunch in this folder, ask again).
Build the store, and grab its connection string (~3 min)
Now create the worker's system of record over the Neon MCP, then hand the worker the one thing it will need to reach it later: a connection string. The worker you build in Part 4 reads its customers and writes its audit trail here. Paste this:
Paste this to your general agent. Plan first; execute on approval.
On a fresh Neon project, create two tables:
customers(id, email, tier) andaudit_log(a record of every action the worker takes). Then call the Neon tool that returns the connection string and write that URL into my.envasDATABASE_URL. Use the Neon tools for all of it; don't write SQL for me to run.
Watch for: the agent calling the Neon MCP tools to create the project and the two tables (you see those tool calls, not SQL you typed), then writing DATABASE_URL into .env. That string is the handoff: Neon MCP provisioned the store, and your worker will use the string, not the MCP server.
Done when: a fresh Neon project exists with a customers table and an audit_log table, and .env holds a DATABASE_URL. Open console.neon.tech, pick the project the agent just made, and open Tables: there sit customers and audit_log, empty for now. You will see rows appear in D0 when the worker runs. (A table is just a spreadsheet: each row one thing, each column one detail.)
Build the first durable function, and drive it from your agent (~3 min)
Now build the smallest durable function, using the Skills you just installed. The Inngest Skills are TypeScript-first in their examples, so your agent takes the patterns from them (what a step is, how a durable function is shaped) and confirms the exact Python signatures from the docs (the dev-server MCP's grep_docs/read_doc, or Context7), not from memory. Paste this:
Using the Inngest Skills, write one tiny Inngest durable function (call it
greet-customer, triggered by ademo/greetevent) that composes a greeting in onestep.run, sleeps fifteen seconds withstep.sleep, then composes a farewell in a secondstep.runand returns both. Serve it from a FastAPI host in local dev mode, and start the host on port 8000 with auto-reload on, so edits I make later are picked up without a manual restart.
The shape it writes, so you know it when you see it: the function is plain async def, the two step.run calls wrap work that should be memoized, and the step.sleep between them suspends the run durably. The process can crash, restart, or redeploy during that sleep, and the run still resumes at the next line when the timer fires. One detail to confirm in the agent's code: the Inngest client is constructed with is_production=False, or it reads the INNGEST_DEV=1 already in your .env. Without one of those, the SDK quietly defaults to Cloud and your function never registers locally.
Done when: the FastAPI host (the doorway from earlier) is running on port 8000, and the dev server (already running from the last step) auto-discovered it. Ask your agent to confirm with the inngest-dev list_functions tool (or open http://127.0.0.1:8288, click Functions, and see greet-customer listed). From here you send events from your agent and watch the runs in the dashboard.
Trigger it, and watch a step sleep at zero compute (you drive)
Send the trigger event from your agent. Paste this:
Send a
demo/greetevent withnameSara using the inngest-devsend_eventtool.
(Prefer the dashboard? In http://127.0.0.1:8288, click Events, then Send event, paste the payload below, and click Send. Either way the same run starts.)
{
"name": "demo/greet",
"data": { "name": "Sara" }
}
Now watch the durable sleep, and you have about fifteen seconds to catch it live. Two ways, pick one:
- Let the agent poll (the agent-native way): "Poll
get_run_statuson that run until it finishes." Mid-sleep the agent reports the run as Running with no end time yet, your host terminal idle the whole while; then it flips to Completed with the output dict and a roughly fifteen-second start-to-end gap. That gap is the sleep. - Watch the dashboard: open
http://127.0.0.1:8288→ Runs → the newest run, right away. The first step is done and the sleep step shows Sleeping with a resume time; after fifteen seconds it resumes on its own and flips to Completed, the returned dict in the Output panel.
Either way, nothing in your code runs during those fifteen seconds: the dev server holds the resume time and the host sits idle. That is the point, a durable wait costs zero compute. (Open the run after it finished and you just see Completed with the output, the live sleep already gone; re-send and look faster, or let the agent poll.)
Break a step, and watch the retry skip the work it already did (the payoff)
Now make a step fail on purpose, so you can watch memoization carry the completed work across the retry. Paste this to your agent:
Make the farewell step raise an error on purpose, so I can watch a run fail. Keep everything else the same.
Send the same demo/greet event again, then read the failing run's per-step trace in the dashboard (Runs → newest). Here is the payoff, and it is in this one failing run: the greeting step shows one completed attempt, and the farewell step shows several Attempts, each retried with backoff (Inngest defaults to several attempts) before the run lands in Failed. Sit with what that attempt count means: the completed greeting step is paid for once, not once per retry. That is durable execution you can see with your own eyes. Why the completed step returns instantly instead of re-running is the mechanic you will meet in Concept 7; for now, just watch it happen.
Two things to expect when you run this:
- The per-step proof is in the dashboard, not the agent. Your agent fires the event and can report run-level status, but the dev-server MCP's
get_run_statusreturns the run summary withsteps: null; it does not expand per-step attempts. The attempt counts that are the memo proof (greeting at one, farewell climbing) live in the dashboard Runs view. This is the one spot in the Quick Win you reach for the browser, not the agent. - Reaching Failed takes a few minutes. With the default retries and exponential backoff, the run keeps retrying the farewell step for several minutes (a real run took about four and a half) before it flips to Failed. You do not have to wait it out: the memo proof shows from the first retry on, the greeting holding at one attempt while the farewell accrues more. Watch a couple of attempts, then move on.
(This dev-server build also shows no separate "memoized" badge. The memo is the attempt count: the completed step sitting at one attempt while the broken step climbs is exactly what "returned from memo, not re-run" looks like here.)
Now fix it:
Now revert the farewell step to the working version.
The host auto-reloads (that is what --reload bought you; if you skipped it, restart the host by hand). Send a fresh demo/greet event and the whole function now runs clean to Completed on the fixed code. One thing about recovery trips people up. The dashboard's Rerun button starts a brand-new run from the top with your current code, every step re-executing from scratch. That is the right tool for incident recovery: a bad deploy broke a batch of runs, so you ship a fix and rerun them. But it is not the memo-preserving resume. The memo-preserving resume is the automatic retry you just watched inside the failing run, where the completed step stayed put.
Make it a real AI worker (the bridge to Part 4)
So far the function only juggles strings, and that was on purpose: durability is easier to see with nothing else in the way. Now make the greeting come from an actual agent, so you watch the same nervous system carry a real AI call. One prompt swaps the hardcoded greeting for a tiny agent; the sleep, the durability, and the dashboard all stay exactly as they are. Paste this:
Replace the hardcoded greeting with a one-line call to a minimal hello-world agent built on the OpenAI Agents SDK (it just writes the greeting), still inside the same
step.run. Keep thestep.sleepand the farewell unchanged. Then fire ademo/greetevent and show me the run.
The only thing that changed is what fills the greeting step: instead of an f-string, a model writes it. And because that call sits inside the same step.run you already proved durable, it is memoized and crash-safe for free, with no new wiring. Watch the run the way you did before (poll from the agent, or open it in the dashboard): the same three-step trace and the same zero-compute sleep, except the first step's output now came from an agent. Your OPENAI_API_KEY is already in .env from the prep step, so there is nothing new to set up.
Done when: a demo/greet run completes and the greeting in the output came from the agent, not a hardcoded string. Sit with what you are looking at, because it is the whole course in one sentence: an AI agent, woken by an event, running durably inside a nervous system, surviving a crash. Part 4 swaps this hello-world agent for a real customer-support worker and wraps it in the full nervous system (a real event trigger, a cron that fans out, flow control, a human-approval gate), but the shape on your screen right now is the shape.
You just set up the whole course environment and saw the nervous system work with your own eyes: the Skills are installed, your Neon store is provisioned with DATABASE_URL in .env, the dev-server MCP is live, and you ran a durable function, watched a step sleep without consuming compute, broke a step and watched the automatic retry return the completed step from memo while only the broken one re-ran, then watched a real agent generate the greeting inside that same durable step. That is the architecture this course is about. The rest of the course scales it up: real senses (cron, webhook, fan-out), stronger reflexes (the agent invocation inside step.run), real balance under load, and the human-approval gate that turns "the agent might mess this up" into "the agent drafts, a human approves, the action issues."
If something did not work, four problems cover almost all of it:
- The dev server cannot reach the function host: confirm the host is running on port 8000.
- The client is in Cloud mode: the agent dropped
is_production=Falseand.envis missingINNGEST_DEV=1, so functions never register locally. Ask it to set one (an explicitis_productionvalue wins over the env var). - The function is missing from the dashboard: the host did not reload; restart it.
- A run hangs with no error and no progress: a de-synced host stalls silently; restart both the host and the dev server together, and run one host against one dev server. (One subtle cause: if
:8288was taken and the dev server came up on8289+, re-pointing theinngest-devMCP URL is not enough; the host still talks to:8288. SetINNGEST_BASE_URL=http://127.0.0.1:<port>on the host so it follows the dev server to the new port.)
If you hit any of these, the universal recovery move works here too: "Something didn't work. Read the error, tell me in plain language what you see, and propose one fix I can approve."
What you built, and where it grows
The environment is set up: the base is open, the Skills are installed, all three MCP servers are wired (Neon, Context7, inngest-dev), your Neon store has its customers and audit_log tables with DATABASE_URL in .env, and the dev server is running. You also saw the one idea the whole course rests on, the reflex of durable execution, with your own eyes, and watched a real agent run inside it. Part 4 swaps that hello-world agent for the customer-support worker, on this same base, in this same folder: it reads those customers and writes those audit rows, then wraps the whole thing in the full nervous system, a real event trigger, a daily cron that fans out, flow control, and the durable human-approval gate on refunds. Part 4 scales this step.run-and-step.sleep skeleton into a worker that does real work on your Neon store. If this Quick Win worked, the concepts ahead explain why each piece is shaped this way.
Part 1: The senses, how the world reaches the worker
From here, Parts 1-3 are the reference shelf behind the build: fifteen concepts, one idea each, grouped by the three jobs a nervous system does. You can read them straight through, or reach into one when a Part 4 layer makes you ask why it works. This first group is the senses.
An AI agent you call by hand runs when you call it. A real AI Worker has senses: it runs when the world reaches it. A customer emails, a webhook arrives, a cron fires at 09:00 daily, another worker hands off work. Each of these is a signal coming in, and a trigger is how the agent feels it. Part 1's five concepts are those senses: the event-driven mental model, the three ways the world reaches in (cron, webhook, event), the semantics that prevent double-processing, and the fan-out patterns that let one signal wake many workers.
Concept 1: Events vs requests, the durable mental model shift
Everything that follows in this course rests on one mental shift: from requests to events.
A request is a synchronous conversation. Someone calls; you handle; you return; they continue. A connection stays open; a human or service is waiting. If you crash, the caller gets an error. An agent you chat with at the prompt is a request: you typed, it streamed back, the conversation belonged to your terminal session.
An event is an asynchronous message. Something happened in the world (a customer signed up, an email arrived, a payment cleared), and the originator emits a named record of that fact. Zero, one, or many functions react to the event independently. No connection stays open. The originator does not know who is listening, does not wait for results, and is not blocked. The world has moved on.
# A request: I'm here, waiting, blocking
result = await agent.handle_customer_message(text=user_input)
print(result) # I unblock when the agent finishes
# An event: I fire-and-forget
await inngest_client.send(events=[
inngest.Event(
name="customer/email.received",
data={"customer_id": "c-4429", "body": email_body, "subject": subject},
),
])
# I return immediately. Somewhere else, one or more Inngest
# functions react to this event on their own schedule.
A request makes the producer wait; an event frees it, and the stored event survives a crash.
The shift sounds small. It is not. Once you think in events, durability and scale fall out almost for free, because:
- The producer cannot be slowed by the consumer (the email-receiver does not wait for the agent to finish drafting a reply).
- The consumer can crash and restart without losing the work (the event is durably stored; Inngest re-delivers it).
- New consumers can be added without changing producers (a second function, say an analytics counter, can subscribe to
customer/email.receivedwithout the email-receiver knowing). - Backpressure becomes a flow-control policy, not a code change (Inngest caps concurrency; the producer keeps firing; events queue).
Predict. Your customer-support worker takes 8 seconds to respond to an email: three seconds for the agent's reasoning, four seconds for two MCP tool calls, one second for the database write. At peak load you receive 50 emails per minute. If you use the request model (the email parser blocks until the agent finishes), how many parallel HTTP connections to your email parser does that imply? If you use the event model (the email parser fires an event and returns immediately), how many? Confidence 1-5.
The answer: the request model needs about 7 concurrent parsers (50/min × 8 seconds is ~6.7 parallel handlers, plus headroom). The event model needs one parser. It fires the event and returns in ~10ms, the event queue absorbs the 50/min spike, and Inngest functions consume the queue at whatever concurrency you allow.
That gap is the whole point. The event becomes a durable boundary between "what happened in the world" and "what the worker does about it," and everything good follows from that one move: the producer never waits, a crashed consumer retries off the stored event, and new consumers attach without touching the producer. Events are how you stop owning the timing of work.Try with AI
Walk me through three scenarios. For each, classify it as REQUEST-MODEL
or EVENT-MODEL, and explain which one fits better:
A) A user clicks "Submit refund request" in the support portal and
expects to see "Refund issued: $30" within 2 seconds.
B) A nightly cron job at 02:00 runs a customer-health-check across
all 5,000 customers and writes a report to Slack.
C) A customer sends an email to support@; we want a draft response
ready within 60 seconds for the on-call agent to review and send.
For each, name (a) what the human's expectation of timing is and
(b) what failure looks like if the model crashes mid-execution.
Concept 2: Cron triggers, work that runs because time passed
The simplest trigger is the clock. Many things an AI Worker does are not reactions to outside events; they are scheduled work: daily health reports, weekly cleanups, hourly recalculations. Inngest's cron trigger is one line of code.
import inngest
@inngest_client.create_function(
fn_id="daily-customer-health-check",
trigger=inngest.TriggerCron(cron="0 9 * * *"), # 09:00 every day, UTC
)
async def daily_health_check(ctx: inngest.Context) -> dict[str, int]:
"""Run a customer-health pass for every Pro/Enterprise customer."""
customers = await ctx.step.run("fetch-pro-customers", fetch_pro_customer_ids)
# fan out: one event per customer, one worker run per event
events = [
inngest.Event(name="customer/health_check.requested", data={"customer_id": cid})
for cid in customers
]
await ctx.step.send_event("fan-out", events)
return {"customers_scheduled": len(customers)}
Three things to notice:
-
The schedule is just standard cron syntax.
0 9 * * *is 09:00 UTC every day;*/15 * * * *is every 15 minutes;0 9 * * 1is Mondays at 09:00. Inngest evaluates the cron in UTC by default; if you need a different timezone, you prefix the cron string itself (for exampleTZ=Europe/Paris 0 12 * * 5), not pass a separate argument. -
The function still uses the same durable steps. Cron-triggered or event-triggered, the function shape is identical:
ctx.step.runfor side effects,ctx.step.send_eventto fan out. Durability works the same. Flow control works the same. The trigger is just how the function starts. -
The cron output is a regular Inngest function run. It shows up in the dashboard, has a run ID, has a trace, supports replay. If your Monday-morning cron run fails at step 3, Tuesday's cron will run normally and Monday's failure stays available for replay after you fix the bug.
What happens if your service is down when the cron fires? This is the question that separates a durable scheduler from a fragile one. Inngest's cron runs are durably recorded the moment the schedule fires. If your function endpoint is unreachable, Inngest retries with backoff until it succeeds or hits the retry ceiling. The cron fired at 09:00 does not "miss" because your deploy was rolling at 09:00; the run waits, you finish your deploy, the run completes. Cron triggers in development have one quirk worth knowing about: the local dev server only fires crons while it is running. Production runs them on Inngest's infrastructure, which is always running.
Quick check. Three claims. Mark each True or False. (a) If a cron function takes 45 minutes to run and is scheduled every 15 minutes, three concurrent instances will be running at any given time. (b) You can use
step.sleepinside a cron-triggered function to spread work across the day. (c) A cron-triggered function can also be invoked manually from the dashboard for testing.
Answers: (a) Depends on concurrency policy: by default Inngest will queue the overlapping runs; if you set concurrency=1 they serialize; if you set concurrency=10 they parallelize. The default is sane. (b) True, and it is a common pattern for "spread daily work across hours to smooth load." (c) True: the Inngest dashboard lets you invoke any function on demand for testing, regardless of its trigger.Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
write a cron-triggered Inngest function in Python that:
1. Runs every Monday at 09:00 UTC.
2. Queries the audit_log table for all conversations resolved in the
prior week (status='resolved' in that window).
3. Computes per-agent metrics: total conversations resolved, average
resolution time, count of escalations, count of refunds issued.
4. Returns the metrics as a JSON object.
After you write the function, test it now instead of waiting for
Monday: trigger it on demand from the Inngest dev dashboard (the
Invoke button), since the dev server only fires crons while it is
running. Confirm the audit query is correct by running the SQL
directly against the database and checking the rows it returns;
grep_docs can confirm your step.run pattern matches Inngest's
examples, but only running the query proves the SQL itself.
Concept 3: Webhook triggers, when the outside world calls in
The first trigger was the clock. The second is HTTP: something outside your system (Stripe, your email provider, a form on your site, a GitHub event) wants to reach your worker.
Be precise about which part is hard, because it is not the part you would guess. Receiving the POST is easy: a web framework like FastAPI gives you @app.post(...) in three lines. The hard part is everything after the POST lands: queue the call, retry it on failure, survive a crash mid-work, refuse to double-process a redelivery, run the agent, hold a four-hour approval, replay any run from a dashboard. The door is cheap; the kitchen behind it is the work, and that kitchen is Inngest.
So the route stays tiny. Its whole job is to receive the POST, hand the event to Inngest, and reply 200 fast. The durable work runs in the Inngest function behind it. If you instead did that work inside the request handler, you would hit the classic webhook bugs: the sender times out and resends while you are still working, a restart loses the job, a redelivery refunds the customer twice. (Inngest's hosted option can even mint a public inn.gs/e/... URL so you skip writing the route at all.)
Now the part that confuses everyone. Your app ends up with two doors, and they face opposite directions:
DOOR 1: the webhook door (you write it, or use the hosted URL)
Stripe knocks here with DATA -> it just calls send() and is done
DOOR 2: /api/inngest (auto-made by inngest.fast_api.serve)
the ENGINE knocks here to RUN YOUR CODE, one step at a time
it speaks Inngest's own protocol, so a raw Stripe POST here is rejected
These two never talk to each other directly. They connect only through the event: Door 1 drops an event in, the engine picks it up and comes back through Door 2 to run your function. Auto-creating Door 2 (which the Quick Win already did) does nothing for Door 1; that is the one you still write.
So what does the webhook door actually call? Just send(). The whole route is this small:
@app.post("/webhooks/stripe")
async def stripe_webhook(request: fastapi.Request):
payload = await request.json()
# verify the signature, reshape Stripe's envelope, then hand it off:
await inngest_client.send(
inngest.Event(name="stripe/charge.refund.failed", data=reshape(payload)),
)
return {"ok": True} # ack Stripe in milliseconds
That send() drops the event into Inngest's stream and the route is finished. It does not call your function, and it does not call /api/inngest. Inngest handles that half: it matches the event name to on_refund_failed and comes back through Door 2 to run the function's steps. End to end:
Stripe → Door 1 (webhook) → send() → Inngest → Door 2 (/api/inngest) → your function
@inngest_client.create_function(
fn_id="handle-stripe-refund-failed",
trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def on_refund_failed(ctx: inngest.Context) -> dict[str, str]:
"""Triggered by Stripe webhook → Inngest event → this function."""
charge_id = ctx.event.data["charge_id"]
# Find the support ticket this refund belongs to
ticket = await ctx.step.run(
"find-ticket-for-refund", lookup_ticket_by_charge, charge_id,
)
# Hand the support worker the full context.
# step.run takes (step_id, handler, *args): pass args positionally, not as kwargs.
await ctx.step.run(
"notify-support-agent",
notify_support_agent_of_refund_failure,
ticket["id"], charge_id,
)
return {"ticket": ticket["id"], "action": "notified"}
That is the function behind the door: Inngest matched the event to it and ran it, looking up the ticket and notifying the support worker, with the queue, retries, and idempotency all handled for you. Webhook work is almost always asynchronous like this: the function runs in the background after the fast ack, never during the request.
Two patterns worth a name:
- Generic JSON webhooks. The sender does not have to be a famous vendor. Point any service that can POST JSON at the same kind of URL and choose the event name yourself. The
vendor/event.subtypestyle is just convention, but the dashboard groups events cleanly when you follow it. - Webhook transforms. Vendor payloads are big and nested, and one vendor often sends many event types to a single URL. A transform is a small reshaping function that runs on Inngest's servers the moment the payload arrives, before it becomes an event. (It is written in JavaScript even when your worker is Python, because it runs on Inngest's side, not in your app.) It does two jobs: pick your event name, and flatten the payload to the few fields you actually use. Your function code stays free of vendor-specific JSON.
Predict. A Stripe webhook fires
stripe/charge.refund.failedat the exact same millisecond as your customer-support worker is also callinginngest_client.sendto emit a different event namedcustomer/refund.investigation_needed. Both events arrive in the system simultaneously; the function above triggers on the Stripe event only. Will the function run once or twice? Confidence 1-5.
The answer: once. A function only fires for the event name it is registered to. stripe/charge.refund.failed and customer/refund.investigation_needed are different names, so they wake different functions (or none), no matter that they landed at the same instant. The event name is the routing key.
That is also why naming is not cosmetic. One typo, customer/email_received where the function listens for customer/email.received, and the function silently never runs. Nothing errors; the work just does not happen. The dashboard is your safety net: events that match no function show up in a separate unmatched stream you can watch.
Locally, there is no URL to paste. Everything above is the production path. On your laptop you have no public URL, and Stripe cannot reach localhost. So while you build, you play the part of the webhook yourself: send_event (or the dev dashboard's "Send to Dev Server" button) injects the exact event a real webhook would have produced. That is why the hands-on below tests with send_event and never touches Stripe.
The split is worth holding onto:
| How the event gets in | |
|---|---|
| Production | Stripe POSTs to your live webhook URL; it becomes an event in your stream |
| Local dev (you) | you inject the already-shaped event with send_event |
Your function code is identical either way; it only reacts to the event name and never knows whether the event came from a real webhook or your send_event.Try with AI
I need to handle three webhook sources for my customer-support worker:
A) Stripe: refund failed, charge disputed
B) Postmark (email service): bounced email, complaint
C) My internal admin UI: manual "investigate this ticket" button
For each, decide:
1. What event names you'd use (vendor/event.subtype format).
2. Whether the function reacting to it should run synchronously (the
caller is waiting) or asynchronously (fire and continue).
3. Whether you'd write a webhook transform to reshape the payload, or
consume it raw.
Then write the Inngest function for the Stripe refund-failed case in
Python, using the MCP's grep_docs to find the current syntax for
TriggerEvent and the dev-server MCP's send_event tool to test it.
Concept 4: Idempotency, when the same event arrives twice
The same event will sometimes reach you twice. A customer clicks "Issue refund," the page is slow, and the click fires twice; or the request goes through but the acknowledgment back to the caller is lost, so the caller retries. Either way your worker now sees two customer/refund.requested events for one real refund. If it issues the refund on each, the customer is refunded twice.
This is the most common bug in event systems, not a rare edge case. Senders keep retrying until they get an acknowledgment (networks drop packets, servers restart, endpoints time out), so what you are promised is delivery at least once, never exactly once. The cure is to make the second copy harmless: act on the first, notice the duplicate, skip it. That property has a name. Something is idempotent when running it twice leaves the same result as running it once.
Inngest builds in two layers of this.
Layer 1: Event ID seeds at the source. When you send an event yourself (rather than receiving it from a webhook), you can attach an idempotency key:
await inngest_client.send(events=[
inngest.Event(
name="customer/refund.requested",
data={"order_id": "o-4429", "amount_cents": 5000},
id=f"refund-request-{order_id}", # idempotency key: identical on every retry
),
])
If a second event with the same id is sent within the dedup window (24 hours by default), Inngest drops the duplicate. Same logical event, same id, only one function run. The key must be identical on every duplicate, that is the whole point. Build it from something stable about the request (here, the order id), never from a timestamp or a random value, which changes on each send and silently defeats the dedup.
This is also how you tame the retried webhook from the start of this section. You do not set the id on a webhook event directly, but whoever turns the POST into an event (the hosted transform, or your own receiving route) sets it from the provider's own event id. Stripe stamps a unique id on every event and resends it unchanged on a retry, so the redelivered webhook carries the same id and dedups exactly like a self-sent event.
Layer 2: Step-level idempotency. Inside a function, each step.run is identified by its name. If a function crashes between step 3 and step 4, the retry re-runs the function code from the top, but for steps 1, 2, and 3, Inngest returns the stored outputs without re-executing the step body. Step 4 runs normally for the first time. This is what makes a function "durable": the side effects of completed steps do not re-happen on retry.
@inngest_client.create_function(
fn_id="issue-customer-refund",
trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def issue_refund(ctx: inngest.Context) -> dict[str, str]:
# Step 1: look up the order. If the function retries, this returns
# the SAME order data it computed the first time, from Inngest's memo.
order = await ctx.step.run(
"lookup-order", lookup_order_by_id, ctx.event.data["order_id"],
)
# Step 2: call Stripe. If the function retries AFTER this step
# succeeded, the Stripe call does NOT happen again. The refund is
# issued exactly once even if the function runs three times.
refund = await ctx.step.run(
"issue-stripe-refund",
lambda: call_stripe_refund_api(
charge_id=order["stripe_charge_id"],
amount=ctx.event.data["amount_cents"],
),
)
# Step 3: write the audit row. Same property: runs at most once.
await ctx.step.run(
"audit-refund",
lambda: write_audit_refund_issued(order_id=order["id"], refund=refund),
)
return {"refund_id": refund["id"]}
If this function crashes during step 3, the retry re-enters step 1 (gets cached order data, no DB call), re-enters step 2 (gets cached refund data, no Stripe call), runs step 3 for real, returns. The customer's card is charged once, even if the function ran three times. This is the property that matters most. It is what makes Inngest qualitatively different from a queue with a retry loop.
(Step 1 passes its one argument positionally. Steps 2 and 3 wrap their call in a lambda instead, because step.run forwards only positional arguments, so a lambda is how you hand a step a call that uses keyword arguments. Either form works, and the lambda also makes the step body a self-contained unit Inngest can memoize.)
Memoization gives exactly-once step completion from the function's point of view: once a step is recorded successful, it never re-runs. But there is a narrow window. If a step calls Stripe and the process dies after Stripe charges but before Inngest records the result, the retry calls Stripe again, because to Inngest the step never finished. The fix is to pair step memoization with the provider's own idempotency key (Stripe's Idempotency-Key header, or whatever dedup id your other providers expose). The two are complementary, not substitutes: step.run keeps your function's internal logic exactly-once; the provider's key keeps the external side effect exactly-once.
Quick check. True or false. (a)
step.runmakes the step idempotent only if the function inside is also idempotent. (b) An event with a duplicate ID outside the dedup window will be treated as a new event. (c) Ifstep.runfails mid-execution (the step's code throws an exception), Inngest stores the failure and retries the step on the next attempt without re-running prior steps.
Answers: (a) False: step.run gives the step at-most-once-on-success on its own; it does not need the code inside to be idempotent. Once a step is recorded successful, its body never re-runs on retry. The one exception is the window in the note above: if the process dies after Stripe charged but before Inngest recorded the step, the retry re-calls Stripe, which is exactly why a provider idempotency key backs it up. Your function's internal logic, though, you never have to make idempotent by hand. (b) True: Inngest's dedup window is 24 hours by default; events with the same ID after that window are treated as new. (c) True: the automatic retry is memoized; Inngest knows step 3 failed at attempt 1 and retries just step 3 on attempt 2. Prior successful steps do not re-execute. (This is the within-run retry, not the dashboard Replay button, which is a fresh run, Concept 14.)Try with AI
Here are three scenarios. For each, decide: idempotency PROBLEM or
NO PROBLEM, and if it's a problem, what's the fix:
A) Stripe sends the same charge.refund.failed webhook three times
in 90 seconds (because their first two attempts timed out at
your endpoint). Your function emails the customer.
B) A customer clicks "Issue refund" three times because the page
was slow. Your function calls Stripe and writes audit_log.
C) Your nightly cron at 09:00 sends a customer-health-check event
to each Pro customer. If two crons fire at the same time (a deploy
bug), what happens?
For each problem case, propose ONE specific fix: event ID seed
inside the function, idempotency key in inngest_client.send, or
function-level deduplication on the trigger.
Concept 5: Fan-out and sub-agent delegation, one event many workers
Often a single event needs to trigger work in many places. The Stripe charge.refund.failed event might need to: notify the support agent, write to audit, update the customer's risk score, alert finance ops, post to Slack. Five reactions, all independent, all from one event.
The Inngest pattern: subscribe many functions to the same event. No fan-out code; just multiple @inngest_client.create_function decorators with the same TriggerEvent. Each function runs independently, has its own retries, has its own step trace, fails independently of the others.
@inngest_client.create_function(
fn_id="refund-failed-notify-support",
trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def notify_support(ctx: inngest.Context) -> dict[str, str]:
# ... runs the customer-support worker to draft a response ...
return {"status": "drafted"}
@inngest_client.create_function(
fn_id="refund-failed-update-risk-score",
trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def update_risk_score(ctx: inngest.Context) -> dict[str, float]:
# ... runs the risk-scoring worker ...
return {"new_risk_score": 0.42}
@inngest_client.create_function(
fn_id="refund-failed-post-slack",
trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def post_to_slack(ctx: inngest.Context) -> None:
# ... posts a Slack notification ...
return None
One Stripe webhook arrives. Inngest creates one event. Three functions fire, each in its own run. If post_to_slack fails because Slack is down, the other two are unaffected and complete normally. The failed run sits in the dashboard for replay once Slack recovers. This is the core of multi-worker coordination, and it is the architectural pattern your future manager layer (a later course) will compose at scale.
The other fan-out pattern: parent-fires-N-children. Sometimes the fan-out is dynamic. Your daily cron needs to fire a customer-health event for each Pro customer, which might be 500 or 5,000 depending on the week. The parent function sends N events:
async def fan_out_per_customer_events(
ctx: inngest.Context,
customers: list[str],
run_day: str, # pinned by the caller (the cron's scheduled date), never date.today()
) -> int:
events = [
inngest.Event(
name="customer/health_check.requested",
data={"customer_id": cid},
id=f"daily-health-{cid}-{run_day}", # stable id: identical on every retry
)
for cid in customers
]
# ctx.step.send_event memoizes the send, so a retry of this function
# does not re-fire the fan-out (and even if it did, the stable ids dedup).
await ctx.step.send_event("fan-out", events)
return len(events)
Those 5,000 events go out in one send_event step (a large list is chunked into a few batched calls under the hood, not literally one HTTP request). 5,000 function runs fire, each with its own customer_id, each isolated, each independently retryable. Flow control (Concept 11) caps how many run concurrently so you do not melt your downstream APIs. The cron function returns in seconds; the fan-out runs at whatever rate Inngest's flow-control policies allow.
Sub-agent delegation is a special case of fan-out. Inside a worker run, you delegate sub-tasks to other worker types by sending more events (await ctx.step.send_event(...), so the delegation is memoized like any other step). The parent does not wait for the children unless it explicitly uses step.invoke (which runs a child function and waits for its result) to collect their results.
Predict. You have three functions all triggered by
customer/email.received: the customer-support agent that drafts a reply (15 seconds), an analytics counter (50ms), and a "VIP detector" that checks if the customer is high-value (200ms). When an email arrives, what does the user-visible latency look like for each? Three options: (a) all three add up to ~15 seconds; (b) all three run in parallel, total latency is ~15 seconds (the slowest); (c) each runs independently with no shared latency at all. Confidence 1-5.
The answer: (c). Each function is its own run, in its own process slot. The customer-support agent does not block the analytics counter; the VIP detector does not block the agent. From the outside, the latency for any particular function is just that function's own time. This is why fan-out scales: the consumers are isolated, and if the agent crashes the analytics counter is unaffected. The one caveat, which Concept 11 develops: this isolation is between different functions. When a single function fans out to thousands of runs of itself, a concurrency cap deliberately makes the later runs queue, so those same-function siblings do wait their turn. Different functions never block each other; many runs of the same function can.Try with AI
Design the fan-out architecture for these three scenarios. For each,
sketch the event names and the functions that subscribe:
A) New customer signs up. Need to: send welcome email, create
Stripe customer, post to Slack #new-customers, write to
audit_log, schedule a 7-day follow-up.
B) Customer support email arrives. Need to: draft a reply (agent),
detect sentiment, check if VIP, update customer's "last contact"
timestamp, attach to the right ticket thread.
C) Daily cron at 09:00 needs to run customer-health-check on
~5,000 Pro customers. Each check takes ~30 seconds. We want
the whole batch to complete by 11:00 (a 2-hour window).
For each, decide: how many event types, how many subscriber
functions, what the idempotency story is, and one specific failure
mode this design protects against.
Part 2: The reflexes, what happens when something breaks
Part 1 was about how work reaches the worker. Part 2 is about what happens when that work breaks halfway through.
Picture one turn of a real worker. It calls an agent, the agent calls a few tools, and those tools hit a database, a payment API, and a model. That is several network calls in a row, and any one of them can fail: a timeout, a dropped connection, a service that is down for a few seconds. With no protection, a single small failure throws away everything the worker just did and starts the whole turn over from the top.
Durability is the fix, and it is simple to say plainly: when something fails partway through, the steps that already finished stay finished, and the worker picks up from the point that broke instead of starting over. In the nervous-system picture, this is the reflex: it just happens, fast, without the agent having to think about it.
Inngest gives you this with one tool, step.run, and a mechanism called memoization working underneath it. Part 2 covers both, then the time-based versions (step.sleep and step.wait_for_event), how retries behave, and the step.ai helpers.
If you are skimming: the two that matter most are Concept 6 (
step.run) and Concept 7 (memoization). Everything else in Part 2 builds on them, so read those two slowly. Once they click, Concepts 8 through 10 go quickly.
Concept 6: step.run and the durable function model
A normal Python function runs once, top to bottom. If it crashes halfway through, you start over from the top. If it makes three API calls before crashing, the next attempt makes those three calls again, and pays for them, and possibly double-charges someone, again.
An Inngest function is durable. Each operation you want to be checkpointed gets wrapped in step.run(name, fn, ...). Inngest then drives the function one step at a time. It runs your handler from the top, and when it reaches a step it has not done yet, it runs that step, saves the result, and re-enters the handler from the top again, this time returning every completed step's stored output instead of re-executing it. The function "catches up" to where it left off, takes the next step, and repeats. (So the handler body runs many times for one function, once per step, not just when something fails.)
Why re-enter the handler at all, instead of just continuing where it left off? Because of the two programs from the opening. The engine and your function are two separate programs. One program cannot pause in the middle of another's code and hold its place. So the engine drives your function the only way it can: it calls your function over the web, runs it until the next unfinished step, lets that step run, and gets the result back. Then it stores that result on its own side and calls your function again for the next step, handing back everything it has already stored.
ENGINE YOUR FUNCTION (host)
| call: run from the top -----------> runs to step 1, does it
| <---------------------------------- returns step 1's result
stores result 1
| call again -----------> step 1 from memo, runs step 2
| <---------------------------------- returns step 2's result
stores result 2
| call again -----------> steps 1-2 from memo, runs step 3
| ...and so on, one call per step
That is the whole mechanic. "Re-runs from the top, completed steps from memo" is just the engine calling your function once per step and keeping the results on its side. And because the results live on the engine's side, a finished step survives even if your host crashes and restarts mid-run.
@inngest_client.create_function(
fn_id="customer-support-conversation",
trigger=inngest.TriggerEvent(event="customer/email.received"),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
customer_id = ctx.event.data["customer_id"]
# Step 1: load the customer record (one DB call)
customer = await ctx.step.run(
"load-customer", load_customer_by_id, customer_id,
)
# Step 2: load the conversation thread (one DB call)
thread = await ctx.step.run(
"load-thread", load_thread_for_customer, customer_id,
)
# Step 3: run the OpenAI Agents SDK agent (your worker).
# step.run forwards only positional args, so a call that needs keyword
# args is wrapped in a lambda (the step body becomes a no-arg callable).
response = await ctx.step.run(
"run-agent",
lambda: run_customer_support_agent(
customer=customer,
thread=thread,
email_body=ctx.event.data["body"],
),
)
# Step 4: write the draft reply to the database
await ctx.step.run(
"save-draft-reply",
lambda: save_reply(customer_id=customer_id, text=response.draft),
)
# Step 5: notify the on-call human reviewer via Slack
await ctx.step.run(
"notify-reviewer",
lambda: post_slack_for_review(response=response),
)
return {"status": "drafted", "reviewer_notified": True}
Five steps. Each one is independently checkpointed.
What durability buys you, in three failures that can hit this exact function:
| If this fails | Without step.run | With step.run |
|---|---|---|
| The agent times out (step 3) | the retry reloads the customer and thread and reruns the agent from scratch, paying for the OpenAI tokens twice | steps 1-2 come back from memo; only step 3 retries, and Inngest handles the transient error for you |
| The process is killed between steps 3 and 4 (deploy, restart, OOM) | the agent's reply is lost; the email goes unanswered until someone notices | the function resumes after the restart: steps 1-3 return from memo in milliseconds, steps 4-5 run, the customer gets the reply |
| Slack returns a 503 (step 5) | you lose the work, or you hand-write retry-and-backoff just for Slack | Inngest retries step 5 with backoff until Slack recovers or the retry budget runs out; steps 1-4 stay done, the draft is already saved |
You do not write any retry loops, any "did I already do this" checks, or any state machine of your own. The state machine is the sequence of step.run calls.
The one rule of step.run. A step must be safe to re-run: if it fails and Inngest runs it again, the second run must not corrupt anything.
- Pure functions are automatically safe.
- Idempotent API calls are safe (Stripe's
idempotency_key, your own MCP server tools): a repeat is a no-op. - Non-deterministic work is still safe to re-run; you just may get a different result on the retry. A fresh random ID, or an LLM call at default temperature, will differ on a second attempt. That is fine for an agent's reply (any valid draft will do). When the exact value must be stable across retries, pin it: pass a seed, or generate it once in its own earlier step and read it back.
Quick check. True or false. (a) The function body re-executes from the top every time Inngest advances to the next step, not only on retries, re-running the plain code (variable assignments, branching) between your
step.runcalls. (b) If a step takes 30 seconds to complete, and the function crashes 25 seconds in, the retry continues that step from second 25. (c)step.runoutputs are stored in Inngest's infrastructure, not in your application.
Answers: (a) True, and it surprises people: Inngest re-enters your handler from the top on every step, skipping completed steps from memo. So code outside step.run runs many times on a clean run, not just on retries. Code inside a step runs once, then returns from memo. (Module-level imports load once regardless; only the handler body re-runs.) This is the real reason to keep work inside step.run. (b) False: step.run is the atomic unit; if a step is interrupted, the retry re-runs the entire step. If your step is so long it cannot be allowed to restart, you break it into smaller steps. (c) True: the step output store is part of Inngest, not your DB. This is why you can replay runs even after your database schema has changed.Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
shape a customer-support worker into an Inngest durable function.
Take a Runner.run call that processes a customer email and wrap each
of these inside its own step.run:
1. Load the customer record
2. Load the related conversation thread
3. Run the agent (the OpenAI Agents SDK Runner)
4. Persist the draft reply
5. Notify the on-call reviewer
Use grep_docs to find the current Python SDK syntax. Use
invoke_function to test it with a synthetic email payload. Then
deliberately raise an exception in step 4 and use get_run_status
to confirm steps 1-3 don't re-execute on retry.
Concept 7: Memoization, the mechanic underneath resumability
Concept 6 said "steps that have already completed return their stored outputs instead of re-executing." That mechanism is memoization, and it is worth a close look because every other Inngest primitive is built on it.
When you call await ctx.step.run("load-customer", load_customer_by_id, "c-4429"), Inngest keeps a memo store keyed by (run_id, step_name). The same line behaves differently depending on whether that key is already filled:
- First attempt: the memo is empty, so
load_customer_by_idactually runs, and Inngest saves what it returns before handing the result back to you. - Every later replay (Inngest re-enters the handler when it moves on to the next step, and again on any retry): the memo already holds
load-customer, soload_customer_by_iddoes not run, the DB call never happens, and the saved value comes back in milliseconds.
This is why retries are cheap (the expensive work is already cached), why durability is correct (the expensive work never happens twice), and why "the body re-runs top to bottom" is fine despite sounding wasteful: the work inside steps does not actually re-run; only the orchestration code between steps does.
The completed step is paid once, not once per retry.
The implication that surprises new users. Code outside step.run runs every time Inngest re-enters the handler, which is once per step, not just on retries. If you do this:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
# ANTI-PATTERN: this re-runs every time Inngest advances a step. Don't do this.
expensive_thing: dict = await fetch_expensive_data(ctx.event.data["id"])
await ctx.step.run("do-something", do_something_with, expensive_thing)
return {"status": "done"}
fetch_expensive_data runs again on every step the function takes, even with no failures. This one-step example already calls it twice on a clean run (once per handler re-entry), and every step you add is one more call. So at $0.10 a call it is already wasting money before anything breaks, and a retry pays for it all over again. The fix is to wrap the expensive thing in its own step:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
expensive_thing: dict = await ctx.step.run(
"fetch-expensive-data", fetch_expensive_data, ctx.event.data["id"],
)
await ctx.step.run("do-something", do_something_with, expensive_thing)
return {"status": "done"}
Now fetch_expensive_data is memoized; retries do not pay for it again.
The step name is the memo key. The Python SDK does not collide on a repeated name; it auto-numbers them by call order (load-customer, then load-customer:1, then load-customer:2), so each gets its own memo slot. But do not lean on that: the auto-numbers carry no meaning, so a dashboard trace showing load-customer:7 tells you nothing about which customer, and inserting or removing a step shifts every later number. Give each call a stable, data-derived name instead, step.run(f"load-customer-{customer_id}", ...) in a loop, so the memo key is tied to the data, not to call order.
Predict. Your function has three steps. Step 1 (
load-customer) costs $0.01 in DB calls and takes 100ms. Step 2 (run-agent) costs $0.20 in OpenAI tokens and takes 12 seconds. Step 3 (save-draft) costs $0.005 in DB calls and takes 50ms. Step 2 fails 30% of the time due to OpenAI rate limits; Inngest retries with backoff. What is the cost difference between (a) wrapping all three instep.runand (b) wrapping only step 2 instep.run? Confidence 1-5.
The answer: with (a), a single retry of step 2 costs the cost of step 2 only ($0.20); step 1 is memoized and skipped, and step 3 has not run yet. With (b), step 1 is outside step.run, so it re-executes on every retry of step 2: about $0.21 per retry ($0.01 for step 1 plus $0.20 for step 2). Step 3 is not the cost here, it runs once, after step 2 finally succeeds; the point is that any work before a failing step re-runs unless you wrap it. Over a thousand emails with a 30% retry rate, that is roughly $3 of wasted step-1 DB calls, and the real danger is bigger than money: if step 1 had a side effect (a write, a charge), leaving it outside step.run makes that side effect happen again on every retry. Wrap everything you do not want re-executed in step.run. It is not optional once you understand the mechanic.Try with AI
With my AI coding assistant: review the Inngest function we built
in Concept 6's Try-with-AI and identify any code BETWEEN step.run
calls that should be wrapped in its own step but isn't. Common
candidates:
- Computed values (timestamps, IDs, formatting) that we want to be
stable across retries
- Calls to logging or metrics services
- Reads from Redis, environment variables, secret managers
Then propose a refactor that moves each of these into its own step
with a meaningful name. For each, explain whether the side effect
is one you want to happen once (use step.run) or every retry
(leave it outside).
Concept 8: step.sleep and step.wait_for_event, durability through time
Some work has to wait. A welcome-email pipeline sends an email immediately, then waits three days, then sends a follow-up. A refund-investigation needs to wait for a human to approve. A trial-conversion flow watches for "user upgraded to paid" within 7 days and sends a different email depending on what it sees.
In a normal Python function, "wait three days" means hold a process open for three days. That is untenable: your process restarts, your hosting bills you for 72 hours of idle compute, your timer gets lost. In Inngest, "wait three days" is one line:
from datetime import timedelta
@inngest_client.create_function(
fn_id="trial-welcome-series",
trigger=inngest.TriggerEvent(event="user/trial.started"),
)
async def welcome_series(ctx: inngest.Context) -> dict[str, str]:
user_id = ctx.event.data["user_id"]
await ctx.step.run("send-welcome-email", send_welcome_email, user_id)
# Wait three days. The function gets paged out of memory. Nothing
# is consuming compute. Three days later, Inngest pages it back in
# and resumes execution at the next line.
await ctx.step.sleep("wait-three-days", timedelta(days=3))
await ctx.step.run("send-followup", send_followup_email, user_id)
return {"status": "completed"}
step.sleep is durable, the nervous system at rest. The function suspends; Inngest stores the resume time; nothing consumes compute while you wait; the function resumes at the right time, with all prior step outputs still memoized. step.sleep (and step.sleep_until) can wait up to one year on paid plans, up to seven days on the free Hobby plan (Inngest usage limits). The seven-day Hobby ceiling is wide enough for every sleep this course uses.
The more powerful sibling is step.wait_for_event. Instead of waiting for time, wait for another event. The function suspends until a matching event arrives, or until a timeout you set expires. This is what makes Inngest the cleanest expression of HITL (Concept 15) and inter-agent coordination patterns:
@inngest_client.create_function(
fn_id="refund-with-approval",
trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def refund_with_approval(ctx: inngest.Context) -> dict[str, str]:
request = ctx.event.data
request_id = request["request_id"]
# If amount is over $100, require approval before issuing
if request["amount_cents"] >= 10_000:
# Notify a human via Slack/email/whatever
await ctx.step.run("notify-approver", notify_human_approver, request)
# Wait for an approval event. Up to 24 hours; expires otherwise.
approval = await ctx.step.wait_for_event(
"wait-for-approval",
event="refund/approval.decided",
timeout=timedelta(hours=24),
if_exp=f"async.data.request_id == '{request_id}'",
)
if approval is None or not approval.data.get("approved"):
return {"status": "rejected_or_timeout"}
# Either it was under $100, or it was approved
refund = await ctx.step.run(
"issue-stripe-refund", call_stripe_refund_api, request,
)
return {"status": "issued", "refund_id": refund["id"]}
What is happening, top to bottom:
the function reaches wait_for_event -> it SUSPENDS (zero compute)
|
| a human sees the Slack note, clicks Approve in your admin UI
| the UI sends a refund/approval.decided event
v
Inngest matches that event to THIS waiting run (if_exp picks the right one)
|
v
the function RESUMES, with the event as the `approval` value
|
v
the refund step runs -> Stripe refund happens, after the human approved
The one subtle part is the match in the middle: if_exp is what makes the approval event wake this request's run and not someone else's.
step.sleep and step.wait_for_event are timeouts you do not pay for. The function looks synchronous in your code ("wait three days, then send the email"), but the runtime semantics are async and durable. This is one of the two things Inngest is famous for (durable retries being the other). Without it, the alternative is a queue plus a state machine plus a database plus a poller, and you would write a thousand lines instead of three.
Quick check. Three claims. Mark each True or False. (a) If
step.sleepis set for 30 days and your service is redeployed five times in those 30 days, the sleep continues uninterrupted on a paid plan. (b) Ifstep.wait_for_eventtimes out, the function raises an exception. (c) Twostep.wait_for_eventcalls in the same function can wait for the same event simultaneously.
Answers: (a) True on a paid plan: sleeps are stored in Inngest's infrastructure, not in your service's memory, so redeploys do not lose them. Note the tier ceiling: a 30-day sleep is fine on a paid plan but exceeds the free Hobby plan's seven-day sleep cap. (b) False: on timeout, wait_for_event returns None. Your code checks for it and decides what to do (rejection, escalation, default-approval, whatever the policy is). (c) False in normal sequential code: a function hits one wait_for_event, suspends, and reaches the next one only after the first resumes, so the two waits run in sequence, and a single matching event resumes whichever wait is currently suspended. They would overlap only if you launched them as parallel steps, a pattern beyond this course. The everyday rule: one event resumes one waiting point.Try with AI
Build a delayed-investigation flow with my AI coding assistant.
Specification:
1. Triggered by event 'customer/refund.failed'.
2. Immediately notify the on-call human via Slack with the refund
details and a "Investigate" button.
3. Wait for the human to click the button (which fires
'customer/refund.investigation_started') for up to 4 hours.
4. If the click arrives in time: run the agent to draft an
investigation summary.
5. If 4 hours pass without a click: escalate to a senior reviewer
by firing 'customer/refund.escalated'.
Use the dev-server MCP's send_event tool to simulate the
human-click event during testing. Use get_run_status to inspect
how the suspended function shows up in the dashboard. Before
writing, use list_docs to scan the Inngest documentation tree
for the right page on wait_for_event semantics, then
read_doc on the page you find to get the exact syntax for
the if_exp filter expression.
Concept 9: Retries, error handling, dead-letter
This is the reflex in close-up. By default, Inngest retries failed steps. The defaults are sensible: ~4 retries with exponential backoff, ranging from a few seconds to a few minutes between attempts. After the final retry fails, the run enters a failed state and stays there for inspection and (optionally) replay. You can tune this per function: retries=10, or retries=0 to never retry. To skip retries for a specific failure (a declined card, a 401), raise inngest.NonRetriableError from inside the step, as the example below does.
@inngest_client.create_function(
fn_id="charge-customer",
trigger=inngest.TriggerEvent(event="order/checkout.completed"),
retries=2, # transient Stripe errors (503, timeout) retry twice
)
async def charge_customer(ctx: inngest.Context) -> dict[str, str]:
try:
charge = await ctx.step.run(
"call-stripe", call_stripe_charge, ctx.event.data,
)
return {"status": "charged", "charge_id": charge["id"]}
except inngest.NonRetriableError as e:
# call_stripe_charge raises NonRetriableError on a declined card, which
# tells Inngest NOT to retry the step (a decline will not become an
# approval on attempt 2). So we land here on the FIRST failure, with no
# wasted retries, mark the order, and kick off the dunning flow.
await ctx.step.run(
"mark-failed",
lambda: mark_order_failed(ctx.event.data["order_id"], reason=str(e)),
)
await ctx.step.run(
"emit-dunning-event", emit_dunning, ctx.event.data["order_id"],
)
return {"status": "card_declined"}
Three patterns matter.
Pattern 1: Transient vs permanent failures. Inngest retries everything by default, but some errors are not transient. A card-declined error from Stripe will be declined again on retry. A 401-unauthorized from your downstream API will not become a 200 just because you wait. Your function should catch these specifically and handle them: write to your DB, emit a downstream event, return cleanly, so they do not waste retry budget on hopeless attempts. Inngest's NonRetriableError explicitly tells Inngest to skip retries for a thrown exception.
Pattern 2: Step-level vs function-level errors. A step that throws is retried. After step-level retries are exhausted, the function fails. Sometimes you want a function to survive a failing step: log the failure, mark the work as "partial," continue. Wrap the step.run in try/except. The step still gets its retries; if all retries fail, the exception propagates to your catch block, where you can decide what to do.
Pattern 3: Dead-letter and replay. A fully failed function does not vanish; it lands in the dashboard's "failed runs" view with its full trace, step outputs, and exception, beside a Replay button. Fix the bug, ship it, replay, with no dead-letter handler to write. (Replay is a fresh run from the top, not a memo-preserving resume, so keep side-effecting steps idempotent; Concept 14 covers it in full.)
Predict. Your function calls Stripe in step 2 and your customer data service in step 4. Stripe returns 503 (service unavailable, transient) on the first attempt of step 2. Step 2 retries 4 times with exponential backoff (~1s, ~2s, ~5s, ~12s); on the 4th retry, Stripe is back, the charge succeeds. Now step 4 runs, and the data service is down with a 500. Does Inngest retry the whole function, or just step 4? How many times? Confidence 1-5.
The answer: just step 4, and it gets its own retry budget. Steps do not share retries. Step 2's four retries are independent of step 4's. Inngest will retry step 4 (default ~4 times) and if the data service comes back, step 4 completes, and the function succeeds. The Stripe charge from step 2 is not re-issued, because step 2's output was memoized after its successful retry. The customer is charged exactly once even though the function spent 20 seconds across retries.Try with AI
With my AI coding assistant: extend the customer-support worker
function from Concept 6 with explicit retry and failure handling.
Specification:
1. The OpenAI Agents SDK call should retry 3 times on transient
failures (rate limit, timeout), but NOT retry on a content-policy
refusal from the model.
2. The Slack notification should retry up to 10 times (Slack is
often flaky; don't lose the notification).
3. The Postgres write should retry once; if it fails again, log the
failure and continue (don't fail the whole function over a
transient DB blip).
For each step, decide what's transient vs permanent and structure
the try/except accordingly. Use grep_docs to find the Python SDK's
NonRetriableError equivalent.
Concept 10: step.run for AI calls in Python (step.ai.wrap is TypeScript-only)
Concepts 6-9 work for any side-effecting code: DB writes, API calls, file writes, agent invocations, and that includes your LLM calls. So here is the headline for AI calls in Python, up front: you keep using ctx.step.run. Inngest does ship AI-specific step.ai primitives, but in Python they are either unavailable or niche, and reaching for them is the common wrong turn this concept exists to prevent.
Important Python-vs-TypeScript note up front. Inngest's
step.aimodule has two methods, and they have different language support.step.ai.infer()is available in both TypeScript and Python (Python SDK v0.5+): it offloads inference to Inngest's infrastructure and traces the call.step.ai.wrap()is TypeScript only: there is no Python equivalent today. For Python projects (like this course's worker), the correct pattern for wrapping an OpenAI Agents SDK call isctx.step.run(...), which already gives you full durability, retries, and observability of the wrapped step's inputs and outputs. You just do not get the LLM-specific prompt/response telemetry that the TypeScriptstep.ai.wrapadds. (Verified against the AI Inference docs as of May 2026.)
step.run wraps the agent run, not a bare model call. In this course your worker is an OpenAI Agents SDK agent, so the agent makes the LLM and tool calls, not you. You wrap the whole agent run in ctx.step.run(...). Inngest does not care what is inside the step; your agent is just the function you hand it. It records the step's input and the agent's result, retries the step on a transient failure, and memoizes it on success so later steps never re-pay the agent's cost.
@inngest_client.create_function(
fn_id="summarize-customer-thread",
trigger=inngest.TriggerEvent(event="customer/thread.summary_requested"),
)
async def summarize_thread(ctx: inngest.Context) -> dict[str, str]:
thread = await ctx.step.run(
"load-thread", load_thread, ctx.event.data["thread_id"],
)
# The agent makes the model and tool calls internally. You wrap the whole
# AGENT RUN in step.run, so Inngest sees it as one step: it records the
# input and the agent's result, retries on a transient failure, and
# memoizes on success so later steps do not re-pay the agent's cost.
result = await ctx.step.run(
"run-agent",
lambda: run_support_agent(thread=thread),
)
return {"summary": result.summary}
The dashboard shows this run as load-thread then run-agent, each with its input and output. The one thing you do not get, versus TypeScript's step.ai.wrap, is LLM-specific telemetry (token counts, model name) broken out in the dashboard's AI view; the Agents SDK's own tracing covers that.
The agent run is one step. Because you wrapped the whole agent, the model and tool calls inside it are not separate Inngest steps. If the agent run fails partway and Inngest retries run-agent, the entire agent re-runs from the start, re-paying the tokens it already spent on that attempt. That is usually fine: an agent draft is cheap to redo, and any valid draft will do. When one agent run is costly enough that you do not want to redo it wholesale, break the work into smaller pieces, each its own step.run (load and retrieve in their own steps, then a shorter agent call), so a retry redoes only the piece that failed.
Because step.run records each step's inputs and outputs to Inngest's observability store, the content you pass through a step is stored and visible in the dashboard. If your prompt includes PII (names, emails, addresses), secrets (API keys, internal tokens), contractual or financial data, or regulated content (HIPAA, GDPR-scoped data, PCI), do not pass the raw content into the step body. Redact, hash, summarize, or pass a reference (a customer_id and ticket_id, not the full ticket text) and reload the sensitive content inside the step body from your authoritative store, where retention and access controls are yours to configure. The same discipline applies to the OpenAI Agents SDK's own tracing if you enable it. Treat step traces as you would treat any production log: useful by default, regulated by policy.
step.ai.infer (Python-supported, but niche). You will rarely reach for it; step.run is the default for every AI call in this course. Its one purpose: instead of calling OpenAI from your process, you ask Inngest's infrastructure to make the call so your process can deallocate while the request is in flight. On serverless platforms that bill for in-flight time, and for long inferences (Deep Research, large embedding batches), that saves real money; for sub-second calls on an always-on server it just adds latency. If you do use it, pull the exact signature from the AI Inference docs for your version; it lives in the experimental inngest.experimental.ai namespace and was not exercised in this course's build.
Quick check. True or false. (a) In Python, wrapping your agent run in
ctx.step.run("run-agent", run_support_agent, ...)makes it durable, retried on transient failures, and memoized on success. (b)step.ai.inferis a hard requirement for using Inngest with the OpenAI Agents SDK in Python. (c) Replacingstep.runwithstep.ai.inferfor a single OpenAI call would always make the function cheaper to run.
Answers: (a) True: this is the recommended Python pattern. The agent run goes inside the step body; Inngest treats the whole step as the unit of work. (b) False: step.run is enough for most cases. step.ai.infer is an optimization for serverless compute cost, not a requirement. The OpenAI Agents SDK integration in the worked example uses plain step.run. (c) False: step.ai.infer saves money only when (i) you are on a serverless platform that bills for in-flight time AND (ii) the call is long enough that the request-offload savings dominate the added orchestration overhead. For sub-second calls on always-on servers, plain step.run wins.Try with AI
With my AI coding assistant: take a customer-support agent
invocation and produce TWO versions of the Inngest function that
calls it:
Version A: The normal pattern. Wrap the Runner.run call (the whole
agent run) in step.run: durable, retried on transient failures,
memoized, with the standard step trace.
Version B: The niche exception, for comparison. step.ai.infer can
only offload ONE model call, not a whole agent, so write a SEPARATE
small function that makes a single direct OpenAI completion via
step.ai.infer (the Python-supported primitive that hands that one
call to Inngest's infrastructure to save serverless compute cost).
This is the one place you call the model directly instead of letting
the agent do it.
For each version, explain (a) what the dashboard trace shows for a
successful run, (b) what happens when the OpenAI call hits a 429
rate limit, and (c) on which kind of deployment (always-on server
vs serverless) Version B's offload saves real money.
Part 3: Balance and recovery, production scale
Parts 1 and 2 got your worker running and surviving crashes. Part 3 is about running it at real scale: keeping one busy worker from overwhelming everything around it, and recovering fast when something goes wrong in bulk. The five concepts, in plain terms:
- Concurrency and throttling (Concept 11): cap how many runs happen at once, and how fast new ones start, so a flood of events does not open a thousand database connections or blow past your OpenAI rate limit in a single second.
- Priority and fairness (Concept 12): make sure one customer sending 500 emails does not push everyone else to the back of the line.
- Batching (Concept 13): handle 10,000 events as about 100 grouped runs instead of 10,000 separate ones.
- Replay and cancellation (Concept 14): after a bad deploy, re-run the runs that failed, on the fixed code; or cancel work you no longer want to happen.
- Human-approval gates (Concept 15): pause the agent and wait for a person before a high-stakes action, like a large refund.
Together they turn a worker that runs into one you can safely put in front of paying customers.
Concept 11: Concurrency and throttling
Your prototype handles a few emails a minute and is fine. Then a busy morning sends 1,000 at once, your worker tries to run all 1,000 at the same time, and it opens 1,000 OpenAI calls and 1,000 database connections in the same instant, exhausting both. This is the most common gap between a prototype and production, and the fix is two small limits, one line each:
- Concurrency is how many runs may execute at the same time.
- Throttling is how fast new runs are allowed to start.
from datetime import timedelta
@inngest_client.create_function(
fn_id="customer-support-conversation",
trigger=inngest.TriggerEvent(event="customer/email.received"),
concurrency=[inngest.Concurrency(limit=10)],
throttle=inngest.Throttle(limit=100, period=timedelta(minutes=1)),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
...
concurrency=10 says: at most 10 of these functions are running at any moment. The 11th event waits in queue until one of the 10 finishes. throttle=100/minute says: at most 100 new runs start per minute. The 101st event waits even if there is concurrency headroom.
Why you usually want both. Concurrency protects your downstream systems from too many calls at once (the 1,000-connections problem above). Throttle protects them from a burst: if 500 emails land at 9:00 sharp, you do not want 500 runs starting in the same second, even if you have concurrency headroom; throttle spreads the starts out.
The subtle part, and the reason a concurrency cap alone is not always enough: concurrency limits how many runs are in flight, not how fast new ones start. If your runs are fast, a freed slot fills the instant one finishes. So concurrency=10 can still launch hundreds of starts a second, more than enough to blow past a "30 requests per minute" limit even though only 10 ever run at once. So match the knob to the limit you are protecting: a count limit (a 20-connection database pool) wants concurrency; a rate limit (OpenAI's 30 per minute) wants throttle. When runs are slow, concurrency bounds the rate too as a side effect and you may not need throttle; when runs are fast, only throttle holds the rate.
Per-key concurrency. A single concurrency limit applies to the function globally. A more interesting pattern is per-key concurrency: limit by some property of the event. You pass a list of caps instead of one:
concurrency=[
inngest.Concurrency(limit=10), # global cap
inngest.Concurrency(limit=2, key="event.data.customer_id"), # per-customer cap
],
This says: at most 10 functions running globally, AND at most 2 per customer at a time. If a single customer sends 100 emails in a minute, only 2 of their emails are processed simultaneously; the other 98 queue behind. Meanwhile, other customers' emails flow normally; they are not blocked by the chatty customer. This is multi-tenant fairness in two lines of code. Concept 12 develops the pattern further.
Picture the whole policy under a 9am burst: the throttle slows how fast runs start, the concurrency cap holds how many run at once, and the per-customer key keeps one flood from taking every slot, while everything else waits in a durable queue.
Nothing is dropped; work queues. Three knobs decide what runs, how fast it starts, and who waits.
Quick check. Three claims, True or False. (a) If you set
concurrency=10and 1,000 events arrive at once, 990 of them are dropped. (b) Throttling and concurrency limits both reduce total throughput. (c) Per-key concurrency requires a key that is deterministic from the event data.
Answers: (a) False: events are not dropped; they queue. Inngest's queue is durable; the 990 events wait until concurrency slots open up. (b) False. Throttling caps start-rate; concurrency caps in-flight runs. Neither drops work; both shape when work executes. Throughput over a long window is unchanged if your average load is below the limits. Throughput over a peak is shaped: bursts are absorbed by the queue. (c) True: the key expression is evaluated on the event data; it has to produce a stable string for the same logical scope (customer_id is fine; current_timestamp is not).Try with AI
With my AI coding assistant: design the concurrency and throttling
policy for the customer-support worker. Constraints:
- OpenAI rate limit: 30 requests per minute, hard cap.
- Postgres connection pool: 20 max connections (the worker takes 1 per run).
- Some customers send bursts of 30+ emails in a minute (an angry
customer); these shouldn't starve other customers.
- We expect ~1,000 emails per day, with peaks around 9am and 2pm.
Propose:
1. A global concurrency value
2. A per-customer concurrency value
3. A throttle (limit and period)
For each, explain what production failure it protects against and
what the cost is (in queue latency at peak).
Concept 12: Priority and fairness, multi-tenant scaling
Concurrency limits work. Per-key concurrency adds basic fairness. Production-grade multi-tenant systems need more: priorities (Enterprise customers should not wait behind hobbyists for the same compute) and fair-share scheduling (no single tenant can monopolize the system even within their concurrency cap).
Priority. Inngest evaluates a priority expression on each event; runs with higher priority jump the queue ahead of runs with lower priority. It is one more argument on the same create_function from Concept 11:
priority=inngest.Priority(
# Higher number wins (range -600..600). The producer puts the tier's
# priority on the event directly: Enterprise = 100, Pro = 0, Free = -100.
run="event.data.tier_priority",
),
When the concurrency queue has 50 runs waiting, the Enterprise customers' runs go first, then Pro, then Free. Inside the same tier, FIFO order applies. Priority does not override concurrency or throttle limits; it just decides which of the waiting runs gets the next free slot. An Enterprise customer still waits for a slot to open; they just get the next one.
Fair-share scheduling. When you have hundreds of tenants competing for the same global concurrency pool, FIFO plus priority is not enough. A single tenant sending a burst can still occupy most of the slots for minutes. Fair-share scheduling, implemented via the key parameter on concurrency with a thoughtful sizing, gives each tenant a guaranteed slice:
concurrency=[
inngest.Concurrency(limit=50), # global pool
inngest.Concurrency(limit=3, key="event.data.tenant_id"), # max 3 per tenant
],
With this: 50 total slots, no tenant takes more than 3. If 20 tenants are active, that is at most 60 slots requested but only 50 available. Fair-share rotates them through, every tenant gets some share, no one gets shut out.
Predict. You have a customer-support function with
concurrency=10and per-customerconcurrency=2. You also have priority configured: Enterprise = high, Free = low. At 9:00am, the queue has: 5 events from Customer A (Free), 5 events from Customer B (Enterprise), and 10 events from a single new Customer C (Free, just bought their first plan). In what order do they execute? Confidence 1-5.
The answer: it resolves in three passes, in this order.
1. per-customer cap (2 each) -> eligible pool = 2 from A, 2 from B, 2 from C (6 runs)
2. priority sorts the pool -> B's 2 first (Enterprise), then A's 2 and C's 2 (Free, FIFO)
3. fill the 10 global slots -> all 6 fit, so 6 run now; the rest wait
As each run finishes, that customer's next queued event becomes eligible (pass 1), and the next free slot goes to the highest-priority waiter (pass 2). The per-customer cap is what stops Customer C's ten events from taking the whole queue.
Flow control is the one place in this course where "run it and watch" does not fully hold. Of the four knobs in Concepts 11 and 12, only concurrency is observable on the local dev server: send a burst and you will see only N run at once. The other three you configure and reason about locally, then confirm the effect in Inngest Cloud (or a branch deploy):
- Throttle is a rate limit the dev server does not enforce, so locally your runs start as fast as they can, regardless of the limit. The config is correct; the rate only bites in Cloud.
- Priority and fair-share only appear under sustained multi-tenant contention, a full queue with many tenants competing. A handful of test events never creates that, so they stay invisible locally even when configured correctly.
So for these three, "verified" means the config is accepted and the function runs, and you can reason about the behavior. Do not conclude "nothing is enforced" from a quiet dev server; confirm the real effect under load in Cloud.
Try with AI
With my AI coding assistant: extend the customer-support worker
configuration with a priority and fair-share scheme. Requirements:
1. Three customer tiers: Enterprise, Pro, Free.
2. Enterprise customers should never wait more than 5 seconds at
peak load.
3. Free tier customers should get fair access: no Free customer
should be starved for more than 60 seconds, even when the
global queue is full.
4. A single noisy customer (regardless of tier) should not occupy
more than 3 slots.
Write the concurrency + priority configuration. For each line of
config, explain which requirement it satisfies.
Concept 13: Batching, cost-effective bulk processing
Some work is naturally batched. You do not summarize each of 10,000 customer conversations independently; you call the LLM with a batch of 50 at a time. You do not write 10,000 audit rows one at a time; you write them in one bulk insert. Inngest's batch trigger lets you accumulate events and invoke a single function with the batch as input.
@inngest_client.create_function(
fn_id="batch-embed-tickets",
trigger=inngest.TriggerEvent(event="ticket/resolved"),
batch_events=inngest.Batch(
max_size=50, # invoke when 50 events accumulated, OR
timeout=timedelta(seconds=30), # invoke when 30 seconds pass, whichever first
),
)
async def batch_embed_resolved_tickets(ctx: inngest.Context) -> dict[str, int]:
# ctx.events (plural) instead of ctx.event
ticket_ids = [e.data["ticket_id"] for e in ctx.events]
tickets = await ctx.step.run(
"load-tickets", load_tickets_by_ids, ticket_ids,
)
# One embedding call for 50 tickets, not 50 calls for 1 ticket each
embeddings = await ctx.step.run(
"embed-batch", embed_texts_batch,
[t["text"] for t in tickets],
)
await ctx.step.run(
"store-embeddings", store_embeddings_batch,
ticket_ids, embeddings,
)
return {"batched": len(ctx.events)}
What changes: ctx.events is a list, not a single event. The function runs once per batch instead of once per event. The OpenAI embedding API is called with a 50-text batch instead of 50 single-text calls, which is dramatically cheaper (you pay per token, but the per-request overhead is gone) and faster (one API round-trip instead of 50).
Batching is the right tool when the work is naturally bulkable (embeddings, bulk DB writes, bulk emails) and you can tolerate up to your timeout's worth of latency before the work happens. It is the wrong tool when each event requires interactive response or when ordering matters across events in unpredictable ways.
Quick check. True or false. (a) Batched functions still get retries and memoization; the batch as a whole is durably memoized. (b) If the batch timeout expires with only 3 events accumulated, the function will not run until the next 47 arrive. (c) You can combine
batch_eventswithconcurrencyto cap how many batches run in parallel.
Answers: (a) True: the batch is the unit of work; retries replay the whole batch with all its events still in scope. (b) False: that is the whole point of the timeout. After 30 seconds the function runs with whatever is accumulated, even if it is 1 event. (c) True: this is the production pattern. Batch plus concurrency together cap your downstream load nicely.Try with AI
With my AI coding assistant: write a batched Inngest function that
embeds resolved support tickets, converting a per-ticket event
handler into one batched call.
Triggers: 'ticket/resolved' event, batched at 50 events or 30 seconds.
The function should:
1. Load the ticket bodies in one query
2. Call OpenAI embeddings API with a 50-text batch (faster + cheaper)
3. Store the embeddings
4. Emit a 'ticket/embedded' event per ticket for downstream consumers
Use grep_docs to find the OpenAI batch-embedding pattern.
Concept 14: Replay and bulk cancellation, production recovery
Sometimes everything goes wrong at once. You shipped a bug; a thousand runs failed in the last six hours. Or your downstream API was down for 30 minutes; everything that tried to call it during that window died. Or you discovered a logic error and want to redo a day's work after fixing it.
First, the distinction that trips everyone up. Inngest gives you two ways a failed step can re-run, and they behave differently:
- Automatic retry (within the same run). When a step throws, Inngest retries the function with backoff, re-entering from the top. Completed steps return from memo and do not re-execute; only the failing step runs again. This is the memo-preserving resume, the one you watched in the Quick Win, and the one that makes the "$0.20 spent at step 3 is not re-spent" property true. It is automatic and happens inside the original run.
- Replay / Rerun (the dashboard button, across many runs). This starts a brand-new run from the top with your current deployed code, every step re-executing from scratch (a rerun gets a new run id and re-runs the first step, not a resume of the old one). So in practice the old run's memo does not save you here. It is for incident recovery, not for skipping completed work.
Keeping these straight is the whole concept. The memo payoff lives in the automatic retry; Replay is a fresh start. The two rows below are the same five steps under each path:
Memo protects you within a run; an idempotency key, not memo, protects you across reruns.
Two opposite recovery primitives. Replay says "this work failed, I want it to run again on the fixed code." Bulk cancellation says "this work was queued but I no longer want it to happen." Same dashboard surface, opposite intent. Most teams need both within their first three months of running real traffic.
Replay is the recovery primitive. Failed runs persist with their full step history, the input event, and the exception from the failed step. From the dashboard you open the Functions view, filter to a function that has failed runs, select a time window and a failure pattern (any specific error message or just "all failures"), and click Replay. Inngest schedules each as a fresh run from the top on whatever code is deployed now.
Three things to understand about replay.
- Replay uses your current deployed code. If you deployed a fix between when the runs failed and when you replay them, the replayed runs use the new code. This is the whole point: take a population of runs that died on a bug, ship the fix, and re-run them all hands-off.
- Replay re-executes every step; it does not reuse the old run's memo. A replayed run is a new run, so each step runs again from scratch on the fixed code. Cost-wise, plan for the cost of the whole function per replayed run, not just the failed step. The thing that keeps a replay from issuing a second real-world side effect (a duplicate refund, a duplicate email) is not memo, it is an idempotency key on that side effect (Concept 4): you derive a stable key from the request (for a refund, something like
(order_id, request_id)) and the provider treats a repeat as a no-op. The minimal worker in this course omits that key for brevity, its refund matches on the customer and writes unconditionally, so a production version would add one before any real money moves. - Replay is opt-in. Failed runs sit in the dashboard until you act on them. They do not retry forever; they do not disappear. They wait for you.
Bulk cancellation is the inverse. Sometimes you have thousands of queued or sleeping runs that you no longer want: a campaign got canceled, a customer churned and you no longer want to send them follow-up emails, a feature got rolled back. From the dashboard you select a function and a time window or event filter, and click Cancel. The matching runs terminate cleanly: their step.sleep and step.wait_for_event calls do not resume, queued runs do not start, in-flight runs check for cancellation and exit at the next step boundary. Cancellation respects the step boundary; an in-flight step.run finishes the step it is in before terminating, so you do not get half-completed Stripe charges or torn DB writes.
Replay vs cancellation as a decision. When something has gone wrong with a population of runs, ask one question: do I want this work to succeed or do I want it to not happen? If the work should succeed (bug-fix recovery), replay. If the work should not happen (cancelled campaign, churned customer, rolled-back feature), cancel. If you are unsure (for example, the failed runs include some you want to recover and some that should not have fired in the first place), filter your dashboard query more narrowly so each subset gets the right treatment.
Three patterns this enables in practice:
- The "we shipped a bug" recovery. Find the failed runs in the time window of the bad deploy, fix the bug, ship the fix, replay the failures. The customer experience: their email did not get a reply for an hour but did eventually get one, without you writing any recovery code.
- The "campaign canceled" rollback. A welcome series that fires three follow-up emails over 14 days; the customer churns on day 4. You do not want to send the day-7 and day-14 follow-ups. Bulk-cancel matching
wait-for-eventandsleepruns. - The "schema migration" replay. You changed how the agent formats summaries; you want yesterday's tickets re-summarized with the new format. Find those runs (successful or not) and replay them; because a replay is a fresh run from the top, the agent re-runs every step on the new code, which is exactly what you want here. Keep your side-effecting steps idempotent so re-running them does not double-charge or double-send.
The dev-server MCP makes recovery accessible without leaving your general agent. During development you can ask the AI to use get_run_status to inspect a failed run, then recover the work by re-firing the event on the fixed code (give it a new event id, since re-firing with the same id is deduplicated to a no-op by Concept 4's idempotency semantics). The dashboard Rerun button is the equivalent one-click path. Either way you get a fresh run on the current code, not a memo-preserving resume.
Quick check. True or false. (a) A dashboard Replay re-runs the work on the new deployed code. (b) A dashboard Replay returns the original run's successful steps from memo and only re-runs the failed one. (c) The automatic retry inside a failing run returns completed steps from memo and re-runs only the failing step. (d) Bulk-canceling a function that is in flight will mid-step abort the currently-executing
step.runto terminate faster.
Answers: (a) True: a replay is a fresh run from the top on whatever is deployed now, which is why it is the tool for bug-fix recovery. (b) False: this is the trap. A replay is a new run that re-executes every step from the top, so the old run's memo does not carry over. What stops a replayed side effect from firing twice is the idempotency key, not memo. (c) True: this is the memo-preserving path, and it is the one you watched in the Quick Win. The completed step sits at one attempt while the failing step retries. (d) False: cancellation respects the step boundary; the current step.run finishes (or fails) before the run terminates. This prevents torn writes.Try with AI
Walk through a recovery scenario with my AI coding assistant:
Yesterday at 14:00 we deployed a change to the worker's agent step.
A bug in the new code made the agent step throw on every run.
From 14:00 to 18:00, 47 customer-support runs failed at that step.
At 18:30 we noticed, fixed the bug, and re-deployed.
Use the dev-server MCP's grep_docs to find Inngest's replay docs,
then:
1. Outline the exact dashboard steps to identify the 47 failed runs.
2. Explain what a dashboard Replay does for one of those runs: is it
a fresh run from the top on the fixed code, or a resume that
reuses the old run's memo? What does that mean for the cost of
replaying all 47?
3. Confirm whether the customers will see one reply or several if a
replayed run re-sends the email, and name the mechanism that
keeps it to one (hint: it is not memo).
4. Identify ONE scenario in this story where you'd prefer to
bulk-cancel instead of replay, and explain why.
Concept 15: HITL gates with step.wait_for_event, Invariant 1 in the runtime
Some actions are too important to let the agent take on its own. Issuing a $500 refund, sending a legal notice, closing an account: you want the agent to investigate and propose the action, but a person to approve it before it actually happens. That pause for a human is an approval gate, and it is the one place in this whole system where the worker stops and waits for someone. (In the Agent Factory's terms this is Invariant 1, the human is the principal: on a high-stakes decision, the person's call is what runs, not the agent's.)
Inngest's step.wait_for_event (Concept 8) makes this clean. The agent runs up to the decision point, then suspends and waits for an approval event. A human reviews it (in Slack, an admin UI, or email) and clicks approve or reject; that click fires the event, the function wakes up with the verdict, and it acts. Your code controls what the agent is allowed to do, not how it reasons.
@inngest_client.create_function(
fn_id="refund-with-hitl-gate",
trigger=inngest.TriggerEvent(event="customer/refund.investigated"),
concurrency=[inngest.Concurrency(limit=5)],
)
async def refund_with_gate(ctx: inngest.Context) -> dict[str, str]:
request_id = ctx.event.data["request_id"]
amount_cents = ctx.event.data["amount_cents"]
# Step 1: the agent's analysis (your worker, run durably).
# Keyword-arg calls are wrapped in a lambda; step.run forwards only positional args.
analysis = await ctx.step.run(
"agent-investigates",
lambda: run_refund_investigation_agent(request_id=request_id),
)
# Step 2: if the agent thinks refund is warranted AND amount > $100,
# gate behind human approval
needs_approval = analysis.recommends_refund and amount_cents >= 10_000
if needs_approval:
await ctx.step.run(
"notify-approver",
lambda: send_slack_approval_request(
request_id=request_id,
analysis=analysis,
amount_cents=amount_cents,
),
)
# === THE HITL GATE ===
approval = await ctx.step.wait_for_event(
"wait-for-human-approval",
event="refund/approval.decided",
timeout=timedelta(hours=24),
if_exp=f"async.data.request_id == '{request_id}'",
)
if approval is None:
# Timeout: no human responded in 24h. Escalate.
await ctx.step.run(
"escalate-timeout",
lambda: escalate_to_senior_reviewer(request_id=request_id),
)
return {"status": "escalated_timeout"}
if not approval.data["approved"]:
await ctx.step.run(
"notify-rejected",
lambda: notify_customer_rejected(request_id=request_id),
)
return {"status": "rejected_by_human"}
# Either it was approved, or it didn't need approval
refund = await ctx.step.run(
"issue-refund",
lambda: call_stripe_refund(request_id=request_id, amount_cents=amount_cents),
)
await ctx.step.run(
"audit-approved-refund",
lambda: audit_refund(
request_id=request_id,
refund=refund,
approved_by="human" if needs_approval else "auto",
),
)
return {"status": "issued", "refund_id": refund["id"]}
What you see in code: a sequence of steps, with one wait_for_event in the middle. What is happening at runtime:
- The agent runs (step 1, durably).
- The function decides whether the gate applies (in-code logic, free of side effects).
- If gated: a Slack notification fires (step 2, durable). The function suspends for up to 24 hours.
- A human in Slack clicks Approve or Reject. The admin backend calls
inngest_client.sendwithrefund/approval.decidedand therequest_id. - Inngest matches the event to the suspended function (the
if_expfilter ensures only matching request IDs match). The function resumes at the next line. - The function uses the human's decision to either issue the refund or notify rejection. Both paths audit the decision and the approver.
This is what makes Inngest qualitatively different from a queue-plus-state-machine. The HITL pattern is one primitive. The function's code reads top to bottom, with the gate inline. There is no callback, no state restoration, no if state == waiting_for_approval: ... dispatching. The runtime handles the suspend/resume mechanic; your code expresses the policy.
The agent proposes, a person decides, and the wait costs nothing.
A later course develops Invariant 1 architecturally: authored intent, spec-driven workflows, the manager-of-workers layer that decides which gates apply to which actions. This course gives you the runtime primitive. When that manager layer arrives, the gate it implements will be exactly this wait_for_event pattern, just composed at fleet scale. Knowing the primitive now means the architectural pattern later reads as "a sensible composition" rather than "magic."
This is the keystone you build in Part 4's Decision 5: the refund approval, made durable. The concept here is the shape; the worked example wires it to a real needs_approval tool and proves the refund fires exactly once.
Predict. You have an HITL gate set with
timeout=timedelta(hours=24). A customer's refund request comes in at 17:00 on a Friday. No human is online over the weekend. The gate's timeout fires at 17:00 Saturday. Your timeout handler records a blocked refund. The reviewer reads the request Monday at 9:00am. Walk through the timeline: how many function runs were active during the weekend? How much compute did Inngest charge for? Confidence 1-5.
The answer: zero active function runs during the weekend. The function was suspended: Inngest stored its state, paged the function out of memory, and waited for either the event or the timeout. Inngest does not bill for suspended time. When Saturday 17:00 came and the timeout fired, the function resumed for the few hundred milliseconds it took to write the blocked-refund audit row, then completed. The fact that the reviewer does not look until Monday costs nothing from the worker's side. The economics of HITL workflows on Inngest are dramatically different from polling-based queues that bill you for every second of "is this approved yet?" polling.Try with AI
With my AI coding assistant: design a durable refund-approval gate.
Specification:
1. The agent investigates and decides a refund is warranted, but the
refund tool needs human approval before it runs.
2. The gate should:
- Notify the on-call reviewer with the agent's recommendation
- Wait up to 4 hours for the reviewer to approve or reject
- On approve: issue the refund.
- On reject: do not issue; record a blocked refund.
- On 4-hour timeout: do not issue; record a blocked refund.
3. Every branch (approve/reject/timeout) writes an audit row from a
small fixed set of action names, capturing what was decided.
Use the dev-server MCP's send_event to simulate each branch of
the reviewer's decision during testing.
Part 4: The worked example, a customer-support AI Worker
This is the spine of the course: where you actually build. Everything before this was the model and the reference. From here you assemble the real worker. First the worker (one prompt), then the nervous system around it, one layer per prompt. Each layer names the concept it draws on, so if a layer raises a "why," that concept in Parts 1-3 is the page to open. You direct your general agent in short plain-English prompts and it writes the code. The snippets shown below are the few load-bearing lines of each layer, not the files. The full implementation was run end-to-end against a live dev server and a real model, so the shapes you see are what runs. If a signature looks unfamiliar, your agent checks the current docs.
The full flow you are about to build, one email end to end:
a customer emails
|
v
the INNGEST ENGINE catches the event and drives your worker,
one step at a time, storing each result as it goes:
1. audit: "message received"
2. load the customer from Neon
3. YOUR AGENT drafts a reply (the thinking part; D1 makes it durable)
4. is it a refund? PAUSE for a human (waits hours, survives crashes; D5)
5. on approve: issue the refund; on reject: record it
6. audit: "reply sent"
if a step crashes, the engine re-runs only that step, never the
finished ones (D6). the same worker also wakes on a daily cron
and runs under flow-control caps (D3, D4).
Same two-program picture from the opening, the engine driving your agent, now the real worker. You build it one layer at a time:
The shape: seven prompts, on the base you already set up.
- D0 builds the worker itself, standalone.
- D1 makes the agent run durable.
- D2 lets an event wake it.
- D3 adds a daily cron that fans out.
- D4 adds flow control.
- D5 is the keystone: a durable human-approval gate on refunds.
- D6 proves the worker survives a broken step: retry without redoing completed work, then recover.
The agent never changes after D0; every layer is the nervous system, added from the outside.
Before you start. Your environment is already set up from the Quick Win: open the same
ai-agent-nervous-systemfolder, with the Inngest andneon-postgresSkills installed, yourOPENAI_API_KEYand your NeonDATABASE_URLin.env, yourcustomersandaudit_logtables provisioned, and all three MCP servers (Neon, Context7,inngest-dev) wired. Two reminders only:
- The dev server is running. Start it again if you closed it:
npx inngest-cli@latest devin its own terminal. The dashboard is athttp://127.0.0.1:8288. (When you later deploy to Inngest Cloud, the free Hobby tier is $0 with no credit card; its ceilings are in Part 5.)- One casing note for the MCP calls below. The dev-server tool names are
snake_case(send_event,get_run_status,invoke_function), but their parameters arecamelCase(get_run_statustakesrunId,invoke_functiontakesfunctionId). The Python SDK issnake_casethroughout; only the MCP call parameters arecamelCase.
The brief
You build a small customer-support worker and give it a nervous system. The worker reads its sample customers from the Neon customers table (id, email, tier), drafts a warm reply to an incoming email, can issue a refund only with human approval, and writes an audit row into the Neon audit_log table for every action, from a small fixed set of action names it chooses (a closed set, so a typo becomes a loud error instead of a silent bad row). The seven prompts then add Inngest around it: an event wakes it, the agent call runs durably, a daily cron fans out a health check per eligible customer, flow control caps concurrency and throttle, the refund pauses on a durable human gate, and a replay path recovers failed runs.
A note on the prompts that follow. Each one is written the way you would actually say it to a general agent: short, plain, trusting it to handle the detail. They work pasted cold, and better still if you first ask the agent to orient ("read the project and tell me what you see, then ask me anything unclear before you start") as the files pile up. The prompts are the destination; orienting first is the on-ramp.
D0: Build the worker, standalone
Where you are: the base is open, the dev server is running, and your Neon store is provisioned, but no worker exists yet. This Decision builds the standalone worker; by the end it runs on a sample email and writes an audit row to Neon.
The base already ships an AGENTS.md your agent read on open, so it knows the project. That is why these prompts stay short. The one rule in it worth keeping in your own head is the architectural invariant of the whole course: the worker's own code never imports from inngest. The agent and its tools stay plain Python; the nervous system wraps them from outside. That separation, the agent and the nervous system kept apart, is what lets you swap Inngest for Temporal or Restate later and leave the worker untouched.
Your Neon system of record is already provisioned from the Quick Win: the customers and audit_log tables exist, and DATABASE_URL is in your .env. So the worker reads and writes that database from the start. Now build the worker. Paste this:
Build me a minimal customer-support agent with the OpenAI Agents SDK, running in a local sandbox. It reads the sample customers from my Neon
customerstable (each row has an id, email, and tier), drafts a warm reply to an incoming customer email, and can issue a refund, but the refund tool needs human approval before it runs. When an email reports a duplicate charge, an overcharge, or a failed order, the agent must actually call the refund tool, not just promise a refund in prose. Write an audit row into my Neonaudit_logtable for every action, using a small fixed set of action names and theDATABASE_URLin.env. Seed thecustomerstable with five sample rows first if it is empty. Keep it small; it exists to be wrapped, not shipped. Then run it on a sample email and show me the reply.
The worker reaches Postgres through DATABASE_URL, never the Neon MCP (that is your build-time tool only). One line of what the agent writes is load-bearing for the rest of the course, the refund tool's decorator:
@function_tool(needs_approval=True)
def issue_refund(order_id: str, amount_cents: int, reason: str) -> str:
...
needs_approval=True makes the agent pause instead of issuing the refund: the run comes back with the refund pending for a human to decide. It is the hook the D5 keystone hangs off. (This floor gates every refund to keep the keystone simple; in production you would gate only above a threshold, the over-$100 pattern from Concept 15. Same wiring.) One thing to keep factored, because D5 leans on it: build the agent and its sandbox run-config as separate pieces, so D5 can rebuild the agent and re-supply the sandbox on resume.
Done when: the agent runs on a sample email and prints a short reply, and a new row is in the Neon audit_log table (check it in the console, or ask your agent to read it back over the Neon tools). If the email describes a refund, the run pauses at the refund tool rather than issuing it; that pause is the whole point, and D5 makes it durable.
The prompts in this Part assume a frontier-class general agent (Claude Sonnet or Opus, a GPT-5-class model, or Gemini 2.5 Pro). The Inngest architecture you are learning (events, steps, memoization, flow control) is SDK-level and holds whatever model drives your agent. But the build experience leans on strong instruction-following, especially the D5 keystone. On a weaker model, expect to iterate on a prompt more than once and to spell out the file names. The architecture is not broken; the prompting just needs more scaffolding.
D1: Make the agent run durable
Where you are: a worker that runs only when you call it, losing everything on a crash mid-run. This Decision wraps the agent call in step.run; by the end a completed run shows the agent step memoized in the dashboard.
The nervous system begins here: wrap the whole agent call in a single step.run so it is durable and memoized. Paste this:
Wrap the agent run in an Inngest durable function so it survives crashes and retries transient failures. The whole agent call goes inside a single
step.runso it is memoized. Run it in local dev mode against the Inngest dev server, with a FastAPI host. Confirm a completed run shows the agent step memoized in the dashboard.
The agent call is the expensive part (model tokens, several seconds). Inside step.run its result is memoized, so when a later step fails and the run retries, the agent does not run again. That is the difference between a worker that re-pays and re-acts on every retry and one that does each expensive thing once. Keep the agent invoked with a plain (non-streamed) run; D5's durable resume builds on it.
It runs as two processes: the FastAPI host, and the Inngest dev server pointed at it. Your agent starts both.
Done when: the dashboard lists the function and a completed run shows the agent step. (You wake it with a real event in D2; for now, discoverable is enough.)
D2: Trigger it on an event
Where you are: the durable function exists, but you still trigger it by hand and nothing is recorded. This Decision wakes it on a real event and writes an audit row on each side of the agent.
This is the first time the opening's picture runs for real. Instead of you calling the worker, a customer/email.received event arrives, the engine catches it, and the engine calls your worker to run. You also start recording what happened: one audit row just before the agent, one just after. Paste this:
Make the worker wake on a
customer/email.receivedevent instead of being run by hand. Add an ingress audit step before the agent and a reply audit step after it. Send a test event and show me the run completing with both audit rows.
To test it locally, send the event yourself with the dev-server MCP's send_event (a customer/email.received event carrying the email text and customer id), no webhook needed. In production you would instead point your email provider at an Inngest webhook URL, which is a dashboard setting, not code.
Done when: a test event drives a run that completes with three steps in order (audit, agent, audit) and two new rows in the Neon audit_log table, one before the agent and one after.
Why two steps, not one. Each audit write is its own step.run, so each is memoized on its own. If the reply step fails and the run retries, the ingress row is not written twice and the agent does not run twice, so the audit trail stays exactly-once across retries (the property D6 proves).
D3: A daily cron that fans out
Where you are: a worker the world wakes one email at a time. This Decision adds a daily cron that fans out one event per eligible customer; by the end each gets its own durable child run.
Add scheduled work: a daily cron that fires one health-check event per Pro and Enterprise customer, each event triggering its own durable run. Paste this:
Add a daily cron that fans out one
customer/health_check.requestedevent per Pro and Enterprise customer, each one idempotency-keyed so a re-delivered cron run never double-fires. Each child event triggers its own durable run that writes one audit row. Invoke the cron manually and show me one child run per eligible customer.
Two things carry this Decision. The fan-out goes inside a step (step.send_event, not a bare client send), so a retry of the cron does not re-emit duplicates. And each event gets an idempotency id derived from the customer and the cron tick (something like health-{customer}-{cron_run}): if the same tick is delivered twice (a redeploy, a retry), the duplicate drops, so each customer gets exactly one check that day. Invoke the cron from your agent with the MCP's invoke_function (don't wait for 09:00). One dev quirk: the dev server only fires crons while it is running; production runs them on Inngest's always-on infrastructure.
Done when: the parent completes in seconds and the dashboard shows one child run per eligible customer, with standard-tier customers correctly skipped.
Why fan-out, not a loop. The parent does not process the customers itself; it sends N events and returns. Each child is its own run, isolated, independently retryable, capped by its own concurrency. A loop inside one function would couple them: one slow customer holds up the rest, and a crash loses the whole batch. Fan-out is how one scheduled wake-up becomes N independent durable runs.
D4: Flow control
Step back first: by now you have assembled one worker, reached three ways, all sharing one Neon store. This is the thing D4 puts caps on.
INNGEST ENGINE (routes events, runs functions, stores steps)
|
┌──────────────┼────────────────┐
v v v
an email a daily cron one run per customer
arrives fans out a the cron emitted
(D2: the check per (D3: each isolated,
email worker) customer (D3) retryable on its own)
└────────── all run in YOUR host ───────────┘
|
Neon Postgres (customers + audit_log)
Same agent inside each path; only how the world reaches it differs. Now you keep all of it healthy under load.
Where you are: a worker that handles each email but would fire all of them at once under a burst. This Decision adds three flow-control policies; by the end a twenty-event burst queues under the cap with no dropped or duplicated rows.
When five hundred emails land at 9am, the worker should not fire five hundred model calls at once: that blows the rate limit and starves everyone behind the noisy customer. Add a global concurrency cap, a per-customer cap, and a throttle. Paste this:
Add flow control to the email handler: a global concurrency cap, a per-customer concurrency key so one noisy customer can't starve the rest, and a throttle to protect the OpenAI rate limit. Fire a burst of twenty events across five customers and show me they queue under the cap and all complete with no dropped or duplicated audit rows.
Three knobs do three jobs: a global concurrency cap (how many runs execute at once), a per-customer concurrency key (so one noisy account takes at most a slot or two and never starves the rest), and a throttle (how many runs start per minute). Match the throttle to your real downstream limit: the brief's OpenAI cap is about 30 per minute, so 30, not a generic 100. (A function carries at most two concurrency policies; the global-plus-per-key pair is the common shape.)
The concurrency cap protects two ceilings: the model's rate limit and your Neon connection budget. A single running copy of your worker already keeps its own database connections capped, because every run in it shares one connection pool. The concurrency cap is what keeps the total sane once you run several copies at once: ten copies at a limit of 10 each is about 100 connections, which you size against Neon's budget. The pool bounds one copy; the cap bounds the fleet.
Fire the burst from your agent: twenty customer/email.received events across five customers via send_event.
Done when: the burst queues under the cap (running count stays at or below the global limit, and at or below the per-customer limit), every run completes, and the audit trail has exactly one row in and one out per event, with no dropped runs, no duplicates, and no Neon connection errors.
Why these are policy, not code. None of this lives in your function body; it is configuration the runtime enforces. Without the caps, a burst either melts a downstream system or lets one tenant monopolize the worker. Writing the same fairness by hand is a queue plus a scheduler plus a rate limiter, hundreds of lines. Here it is three decorator arguments.
D5: A durable human-approval gate on refunds (the keystone)
Where you are: back in D0 your agent already pauses before a refund, but that pause only lives in memory. This Decision makes it survive a crash, a deploy, or a reviewer who takes hours, so the refund still fires exactly once when they finally approve.
Here is the whole idea before any code. Your agent decides a refund is warranted, but it must not issue it until a human says yes. D0's pause holds that decision only in the running process, so a crash or a slow reviewer loses it. D5 turns that pause into a durable wait: the function goes to sleep (costing nothing) and only wakes when the decision arrives.
the agent decides a refund is warranted
|
v
it PAUSES and asks a human (it does NOT issue the refund yet)
|
v
the function SLEEPS, waiting for the decision
(minutes or hours; free while it waits; survives a crash,
a deploy, a reviewer who goes to lunch)
|
v
a human clicks Approve or Reject -> sends the decision event
|
v
the function WAKES and finishes:
approved -> issue the refund (exactly once)
rejected -> no refund; record it
no answer in 4h -> no refund; record a timeout
Paste this:
Right now the agent pauses before a refund, but that pause is lost if the worker crashes or the reviewer takes hours. Make the pause survive that: when the agent stops for approval, save where it stopped, then wait up to four hours for a human's approve-or-reject for this customer. When the decision comes in, pick up exactly where the agent left off and finish, so the refund happens at most once per run. On a rejection, the reply to the customer must say the refund was declined, never that it was issued. Then prove it for me: drive a refund, show the run waiting, send an approval, and show exactly one refund row. Do it again with a rejection and show a blocked row and no refund.
That whole picture is one line of code. The function stops at wait_for_event and starts again when the decision event shows up:
decision = await ctx.step.wait_for_event(
"await-refund-approval",
event="refund/approval.decided", # what we are waiting for
timeout=datetime.timedelta(hours=4), # give up after 4 hours
if_exp=f"async.data.customer_id == '{customer_id}'", # only THIS customer's decision
)
# no decision came in 4 hours -> write a blocked-refund row and stop
# approved or rejected -> pick the agent back up and finish
That one call is the entire gate. You write no queue, no polling loop, and no "is it approved yet?" flags to check by hand. The runtime holds the pause for you. Your code just says what to wait for and what to do with the answer. Three things are easy to get wrong, though, and each one quietly breaks the gate:
if_expcorrelates the decision to this customer, so an approval for one customer never resumes another's run.customer_idworks here because the demo has at most one refund pending per customer; if a customer could ever have two refunds in flight at once, correlate on a uniquerequest_id(the key Concepts 8 and 15 use) or the run id instead, or one approval could resume the wrong run.- When the agent resumes, hand it back the state you saved, not a brand-new conversation. Here is what goes wrong if you forget: a fresh conversation does not remember that it already asked for approval, so the resumed agent hits the refund again, asks for approval again, and loops forever. Rebuild the agent and re-supply its run-config, then feed it only the saved state. (This is why D0 kept the agent build and its run-config factored apart; it is the one detail that, missed, makes resume fail.)
- Saving the state silently drops your custom context, so put it back by hand. This is the trap that fails without an error. When the Agents SDK serializes the paused run, it does not carry over a custom run context (the object your refund tool reads the customer id and idempotency key from); it saves an empty one and only warns. So on resume you must re-supply that context yourself, with
RunState.from_string(agent, saved_state, context_override=your_context). Skip it and the approved refund tool runs with no context: it quietly writes no refund row, while the run still reports success. You see "approved, but norefund_issuedrow" and nothing to explain it. (Verified onopenai-agents0.17.x; the exact serialization rules are the kind of beta detail that shifts between minor versions, so confirm against the current Agents SDK run-state docs when you build.)
Drive it from your agent: send a refund-describing customer/email.received event, watch the run suspend at the gate (the dashboard shows it WAITING at zero compute), then send_event a refund/approval.decided carrying {"approved": true, ...} for that customer. Do it again with {"approved": false}.
Done when: on approval, the suspended run resumes and the Neon audit_log table has exactly one refund_issued row. On rejection, the run resumes, the audit has a refund_blocked row and no refund_issued, and the agent's reply explains the denial.
The gate gives you exactly-once within a single run, and the boundary is worth stating. If the same refund is driven through two runs (a re-sent event, a manual replay), nothing here stops a second refund on its own; that is the job of the stable idempotency key from Concept 4 (or the provider's own key), keyed off the request, exactly as the refund example there showed. The minimal worker leaves that key out to stay small, so prove "exactly once" against one run, and reach for the Concept 4 key the moment a real refund could be driven twice.
Why this is the keystone. Every other layer (the senses, the reflexes, balance) keeps the worker correct or healthy on its own. This one is where the human mind re-enters the loop on a high-stakes action, durably, for as long as it takes.
D6: Prove durability survives a broken step
Where you are: a full worker with every layer wrapped. This Decision proves the property that justified all of it; by the end you have watched a broken run retry its failing step many times while its completed audit step runs exactly once, then recovered the work on a fresh run.
The last property to prove is the one that justified all of this, the memoization mechanic from Concept 7. You understood it there; now prove it in your own worker. Paste this:
Deliberately break the agent step so it fails, fire an event, and show me Inngest retrying it while the earlier audit step stays memoized, so the failing run writes its ingress audit row exactly once across all the agent retries. Then fix the step and recover the work, and show me the recovery completing.
Break the agent step on purpose, fire a few customer/email.received events, and read each run's trace. The proof is inside each failing run: the ingress audit step shows one completed attempt (its row written once) while the agent step shows several attempts as it retries with backoff and then fails, and the reply step never runs. The audit step at one attempt while the agent step climbs is Concept 7's memoization, now in your own worker: the failing run writes its ingress row once, no matter how many times the agent retries.
Then revert the break and recover the work by re-firing the event on the fixed code (or, for a real bad-deploy batch, the dashboard Rerun button; both start a fresh run from the top, Concept 14). Here is the part that surprises people, and it is correct, not a bug: the recovery is a brand-new run, so it writes its own ingress row. After a break-then-recover, that customer legitimately has two ingress rows, one from the failed run, one from the recovery. Memoization is a within-run guarantee; it never spans two separate runs.
Done when: in the failed run's trace, the ingress step sat at one attempt and wrote one row while the agent step accrued several attempts and failed (that one-attempt-despite-N-retries is the memoization), and the recovery run then completes on the fixed code. The diagnostic is per-run, not per-customer: open a single run's trace and confirm the ingress step shows one attempt. Two ingress rows across two separate runs is correct; the ingress step running twice within one run would be the bug (usually a non-unique step name).
Why this is the bright line. A worker that loses customer work on a bad deploy is just an agent you call. A worker that takes the same bad deploy, fails loudly, retries the broken step without redoing the work it already finished, and recovers cleanly on a fresh run after the fix, is an AI Worker.
Point this same nervous system at your own SandboxAgent worker instead of the minimal floor; the wrapping is identical. And this step.wait_for_event approval replaces the hand-rolled run-state table from that course's optional Decision 10: the durable gate you just built is the persistence layer, so you can delete the table.
What just happened
You built a small customer-support worker and gave it a nervous system, one layer at a time. The worker's internals never changed after D0: same SandboxAgent, same two tools, same Neon Postgres audit trail. What changed is everything around it. It now wakes on a customer/email.received event and on a daily cron that fans out per eligible customer, runs durably (the agent call inside step.run), respects flow control (global and per-customer concurrency, a throttle), gates refunds on a durable human approval (step.wait_for_event), and recovers from a bad deploy by replaying failed runs, with the audit trail showing that within any single run each step fired exactly once, no matter how many times that run retried.
The agent code is the same; its reach is not. You started with an agent you operate, prompt it, watch it, prompt it again. You now have a worker that operates on its own: the world wakes it, its reflexes carry it through failures, it holds its balance under load, and a human steps in only where the stakes demand one. That is the line the opening drew, between an agent you operate and an FTE that operates on its own, and you just built across it.
The remaining concerns are observability at scale, multi-worker coordination, and the manager layer that decides which workers handle which traffic. That is the next course in the track. This course covers the unit of production-ready execution; the next composes those units into a workforce.
Part 5: Where this course leaves off
The cost shape of an AI Worker
Two cost surfaces matter: infrastructure cost (Inngest, and whatever store and compute you run the worker on) and inference cost (model tokens). Infrastructure stays roughly flat as load increases; inference scales linearly. The method below is what to learn; any dollar figure goes stale the week it ships, so treat the numbers as illustrative and check current pricing pages before you put a number in a budget.
Inngest pricing. Inngest charges per execution: each function run, plus each step-level retry, counts as one execution.
| Tier | Price | Executions / month | Concurrent steps | Notable |
|---|---|---|---|---|
| Hobby | $0 | 50,000 | 5 | 3 users, 50 realtime connections, no credit card |
| Pro | from $75 / month | 1,000,000 | 100+ | 1000+ realtime connections, 15+ users, 7-day trace retention |
| Enterprise | custom | custom | 500-50,000 | SAML / RBAC, 90-day trace retention, dedicated support |
Note that Inngest meters two different things. One is executions (the table above): a function run plus each step retry. The other is events (what you send in): the first 1-5M events per day are included, and above that overage starts around $0.000050 per event and declines at higher volume. On Pro, going past the 1M-execution cap adds $50 per additional 1M executions.
Hobby-tier ceilings that matter here. The 5-concurrent-step cap means that even if you declare concurrency=Concurrency(limit=10) in code, the platform's account-level cap holds you at 5. Your code is correct for production; observed concurrency on the free tier is 5. step.sleep and step.sleep_until are also tier-bounded: up to seven days on the free Hobby plan, up to one year on paid plans (Inngest usage limits).
Inference cost dominates. A typical customer-support run uses a few thousand to ten thousand model tokens per conversation. Multiply your per-token price by your tokens-per-email by your emails-per-day and you have the line that matters; for most workers it dwarfs everything else. This is what you optimize. Everything else is a rounding error. The two highest-value levers: keep a stable cached prompt prefix (so the model bills the repeated part at the cheaper cached rate, not full price on every call), and route easy turns to a cheaper model.
Three Inngest-specific cost levers once you are in the optimization zone:
- Do not wrap pure functions in
step.run. If a function has no side effects, it does not need durability; wrapping it adds a step-run charge for no benefit. Savestep.runfor I/O and side effects. - Use
batch_eventsfor bulk paths. A 50-event batch is one function run, not 50. - Suspend cheaply with
step.sleepandstep.wait_for_event. Suspended functions do not bill for the suspension time. A 3-day delayed-followup costs the same as a 3-second one.
The shape at scale: inference is the bill that grows with traffic; Inngest, your data store, and compute stay comparatively flat. Run the same multiplication at your real volume rather than trusting a figure printed here.
Swap guide: the nervous system is invariant, the platform is not
This course names Inngest at every layer. That is because a teaching example needs concrete answers, not "use any orchestrator you like." But the architecture works with any compliant alternative. Five swaps the course's design explicitly anticipates:
-
Trigger surface: Inngest events → Temporal signals, Restate handlers, AWS EventBridge + Lambda. Each platform has a way to express "this code runs when this named thing happens." The event names, the payload shapes, and the idempotency discipline all transfer. What changes: the SDK's decorator syntax and the dashboard.
-
Durable execution: Inngest
step.run→ Temporal activities, Restate handlers, custom Postgres-backed state machines. Each gives you the "memoize this side-effecting call, retry on transient failure, resume after crash" semantics. Temporal is the closest analog and the older, more enterprise-tested option. Restate is the newest and has a more functional-programming flavor. Custom state machines are what teams write when they cannot adopt a managed platform; usually 1,000-10,000 lines of code that recreate ~70% of what Inngest gives you for free. -
HITL primitive:
step.wait_for_event→ Temporal'sawait Workflow.execute_activity(approval_signal), Restate's awakeables, custom Redis/Postgres approval queues. The pattern is the same: function suspends, external signal resumes it, audit captures the decision. Inngest's expression is the cleanest at writing; Temporal's is more verbose but battle-tested at large scale. -
Cron scheduling: Inngest cron triggers → Kubernetes CronJobs + queue, GitHub Actions schedules, AWS EventBridge schedules. Cron triggers are commodity. The Inngest advantage is not having cron; it is that cron-triggered functions get the same durability/replay/flow-control as event-triggered ones, automatically. Other platforms make you wire that yourself.
-
Flow control: Inngest concurrency + throttle → Temporal task queues with worker concurrency, Redis-backed rate limiters, AWS SQS message visibility timeouts. Other platforms can do this; Inngest does it with the configuration density we have seen (one decorator argument).
Dapr as the open companion at production scale. A more ambitious replacement worth naming: Dapr Agents as the structural companion to Inngest at production scale, in the way OpenCode is to Claude Code. Dapr Agents reached v1.0 GA on March 23, 2026 under CNCF governance (CNCF announcement, Dapr Agents core concepts). DurableAgent is the production-ready class; the older Agent class is deprecated. Choose Dapr when Kubernetes-native deployment and multi-language SDKs matter more than Inngest's local dev experience. Inngest is the better learning tool (the dashboard makes the mental model visible); Dapr is the better scale tool when you have hit Inngest's tier ceilings or need K8s-native multi-language deployment.
Inngest is also open source (github.com/inngest/inngest; the 1.0 release added self-hosting support in September 2024) and self-hostable via Helm + KEDA. The axes that matter at scale are governance, support, and maturity: Inngest is governed by a single vendor with a young self-hosting story; Dapr is CNCF-governed with a longer production track record.
| This course's concept | Inngest primitive | Dapr production analogue | Teaching note |
|---|---|---|---|
| Scheduled work | TriggerCron | Cron input binding / Dapr Scheduler | Same idea: time wakes the worker. Dapr usually requires component configuration. |
| Webhook/event ingress | Inngest webhook endpoint → event | HTTP endpoint, input bindings, or pub/sub ingress | Inngest hides more plumbing; Dapr gives infrastructure control. |
| Internal events | inngest_client.send() | Dapr pub/sub | Same event-driven mental model; broker is pluggable in Dapr. |
| Fan-out | One event triggers many functions | One topic/event consumed by many services | Same architecture; Dapr uses broker/topic/subscriber composition. |
| Durable steps | step.run() + memoization | Dapr Workflows + activities | Similar production purpose, different developer model. |
| Waiting without compute | step.sleep() | Durable workflow timers | Both avoid holding a process open while waiting. |
| Human approval gate | step.wait_for_event() | Workflow external events/signals, pub/sub, actors | Inngest expression is simpler; Dapr is more composable. |
| Retries | Function/step retries | Workflow/activity retries + resiliency policies | Dapr makes resiliency a runtime policy as well as workflow behavior. |
| Dead-letter / failed runs | Inngest dashboard failed runs + replay | Broker DLQ + workflow status/restart/manual tooling | Inngest is more turnkey here; Dapr is more infrastructure-native. |
| Flow control | Concurrency, throttling, priority, batching | Kubernetes scaling, app concurrency, broker controls, resiliency policies, bulk pub/sub | Dapr can do it, but it is not one decorator argument. Inngest is denser. |
| Stateful coordination | wait_for_event, event keys, step state | Actors + state store + workflows | Dapr Actors are stronger for long-lived identity/stateful coordination. |
| Agent runtime | Your agent inside Inngest function | DurableAgent / Dapr Agents v1.0 GA | Dapr Agents explicitly makes the agent workflow-backed and resumable. |
This table is a translation guide, not a claim of identical APIs. Inngest teaches the production pattern with a compact developer experience: triggers, steps, waits, replay, and flow control in one product surface. Dapr implements the same production architecture through distributed-systems building blocks: bindings, pub/sub, workflows, actors, state, resiliency, and Kubernetes-native operations. The concepts transfer directly; the implementation style changes. Verified against Dapr's bindings overview and Dapr Agents core concepts as of May 2026.
Three reasons to reach for Dapr at production scale:
- CNCF-governed, vendor-neutral by charter: no single vendor controls the platform or your dependence on it.
- Polyglot with first-class Python. Dapr Agents is Python-first; the same agent code can run alongside services written in JavaScript, Go, .NET, Java, or PHP without anyone learning a second framework.
- Horizontally scalable on Kubernetes by design. Run in your own cluster, in a managed offering (Diagrid Catalyst), or locally via
dapr init. The scaling story is the same architecture in every environment.
The honest caveat: Dapr is not a getting-started platform. Running it in production means Kubernetes, state store, pub/sub broker, placement service, observability, YAML components, sidecars. That is a lot of operational surface when your goal is still to learn the patterns, which is why this course starts on Inngest: one command, and the dashboard appears. Reach for Dapr once the patterns have landed and the question shifts to running at organizational scale on infrastructure you control.
Learn the concepts on Inngest and the OpenAI Agents SDK first: fast feedback loop, minimal infrastructure, focus on the patterns. When you reach the scale where Kubernetes governance, polyglot teams, or vendor-neutrality become non-negotiable, the same architectural patterns lift onto Dapr with the translation table above as your key. The patterns transfer; the substrate changes; what you learned in this course remains the load-bearing knowledge.
What this course doesn't cover (yet)
The worker you built satisfies four of the Seven Invariants the thesis sets out. Specifically: it runs on an engine (Invariant 4, the SandboxAgent), against a system of record (Invariant 5, the audit trail), with the world able to call it (Invariant 7, the triggers you added), and with the human as principal at a gated decision (Invariant 1, partial: the runtime mechanism is here, the broader architectural pattern is later). The remaining three Invariants, and the broader architecture that makes a workforce out of workers, are later courses. One bullet each:
- Invariant 2: Every human needs a delegate. A personal agent at the edge that holds your context, represents your judgment, and brokers work to the workforce. The thesis names OpenClaw as the current realization.
- Invariant 3: The workforce needs a manager. An orchestrator that assigns work, enforces budgets, audits execution, exposes hiring as a callable capability. The thesis names Paperclip.
- Invariant 6: The workforce is expandable under policy. A meta-layer where an authorized agent generates a prompt, provisions a runtime, and registers a new worker, without waking a human. Claude Managed Agents is one realization.
A single worker waking on events, running durably, and gating on humans is the smallest unit of the architecture this course teaches. The next course extends that worker into a workforce: multiple workers coordinated by a manager, expandable on demand, woken by triggers, governed by spec. Same OpenAI Agents SDK foundation, same audit habit, same Inngest nervous system. The architecture is invariant.
How to actually get good at this
Reading this crash course does not make you good at building AI Workers. Using it does. You start by building the worker, feel the friction as you wrap it, and let each piece of friction teach you which concept it belongs to.
The mapping for this course:
- "Why does my function not fire when the event arrives?" → event name typo or namespace mismatch (Concept 3). Compare the event name string in your
TriggerEventto the one ininngest_client.sendbyte-for-byte. - "Why did my function fire twice for the same logical event?" → missing idempotency key (Concept 4). Add an
id=to the event with a deterministic seed. - "Why did my function 'lose work' after a deploy?" → code outside
step.rundoing the work (Concept 7). Wrap the I/O and side effects in named steps. - "Why did the customer get charged twice?" → the Stripe call was outside
step.run, or the step name was not unique (Concepts 6 and 7). Move the call into a namedstep.run; make the step name globally unique within the function. - "Why does OpenAI return 429 errors at the 9am peak?" → missing throttle (Concept 11). Add
throttle=Throttle(limit=N, period=timedelta(minutes=1)). - "Why does one customer's bursts starve other customers?" → missing per-key concurrency (Concept 12). Add a second
Concurrency(limit=2, key="event.data.customer_id"). - "Why did my HITL gate fire silently over the weekend?" → missing timeout handler that writes to audit (Concept 15). Branch on
approval is Noneand write the audit row explicitly.
Build the architecture one piece at a time. That is why Part 4 is seven prompts, not one. Build the worker (D0). Wrap the agent in step.run (D1) and watch what changes when you deliberately crash mid-run. Wake it on an event (D2). Add the cron fan-out (D3), then flow control (D4) once you have actually hit a rate limit, then the durable approval gate (D5) when a high-stakes action actually needs a human. Each layer is its own learning. Combined into one big rewrite, they are a wall.
The discipline this course teaches (wake on events, run durably, gate on humans, replay on bugs) is the architectural invariant. Whatever platform implements it, that four-property contract is what you are really committing to. This is the Lindy bet: you build on the parts that have lasted, plain functions, SQL, a typed language, an event bus, not this season's wrapper. The product is replaceable; the discipline is not.
Quick reference
A separator between the narrative course and the during-build reference. The sections below are meant to be searched, not read top to bottom. The one-line gist of each concept is in the intro's collapsed cheat sheet; this section is the during-build diagnostic, the two decision trees, and the file layout.
Decision tree: pick the trigger surface
When a new thing happens in the world, where does the wake-up come from?
- An external system sent us an HTTP request. → Webhook trigger. Configure the source in the Inngest dashboard; reshape the payload via the transform; consume the resulting event.
- A schedule says it is time. → Cron trigger.
TriggerCron(cron="..."). Use UTC; production crons fire even when your service is mid-deploy. - Another Inngest function emitted an event during its run. → Event trigger.
TriggerEvent(event="ns/name.subtype"). Subscribe one or many functions to the same name. - An interactive user is waiting for an immediate response. → Not an Inngest trigger. Keep the request/response in your normal web endpoint; if the response involves heavy work, fire an event from inside the request and return immediately, letting Inngest handle the work asynchronously.
Decision tree: pick the step primitive
Given a function is running and you need to do something, which step.* call do you reach for?
- A side-effecting call (API, DB, file write, agent invocation). →
ctx.step.run("name", fn, ...). The default. Memoized on success, retried on transient failure. - A long-running OpenAI call on a serverless platform that bills for in-flight time. →
ctx.step.ai.infer(...). Offloads the inference to Inngest's infrastructure so your function process can deallocate. - Wait for a fixed duration before continuing. →
ctx.step.sleep("name", timedelta(...)). Durable; zero compute while waiting (up to seven days on the free plan, one year on paid). - Wait for an external event (human approval, sibling-function completion). →
ctx.step.wait_for_event("name", event="...", timeout=..., if_exp=...). Durable; resumes when the event arrives or returnsNoneon timeout. - Pure deterministic computation (formatting a string, computing a date). → Just write the code. No
step.runneeded; no charge.
File-location quick-ref
A flat project, four files, no src/ nesting:
ai-agent-nervous-system/
├── .claude/
│ └── skills/ # the four Inngest skills (installed in the Quick Win)
│ ├── inngest-setup/SKILL.md
│ ├── inngest-events/SKILL.md
│ ├── inngest-steps/SKILL.md
│ └── inngest-durable-functions/SKILL.md
├── db.py # Neon Postgres access: pooled asyncpg, load_customers, record (closed-vocabulary audit) (D0)
├── worker.py # the worker: SandboxAgent + 2 tools (D0)
├── inngest_app.py # the nervous system: Inngest functions + FastAPI host (D1-D5)
├── .env # OPENAI_API_KEY, DATABASE_URL, INNGEST_DEV=1
└── AGENTS.md # the base's rules file (read on open)
These filenames are one sensible layout, not a requirement; your agent may land on agent.py and main.py instead, and that is fine. What matters is the boundary, not the names: the worker code never imports inngest, and exactly one file wires the nervous system on top. With that layout, the customers and audit trail live in your Neon database (provisioned in the Quick Win, seeded in D0), not in local files; the worker files never change after D0, and every nervous-system layer (D1 through D5) edits the one Inngest file.
Diagnostic table, symptom → root cause → concept
| Symptom | First suspect | Concept to re-read |
|---|---|---|
| Function never fires when expected event arrives | Event name typo, namespace mismatch | C3 (webhooks), C5 (fan-out) |
| Function fires twice for the same logical event | Missing idempotency key | C4 (idempotency) |
| Function "lost work" after deploy | Code outside step.run doing the work | C7 (memoization) |
| Cron schedule did not fire over a deploy | Local dev server only, production runs on Inngest infra | C2 (cron) |
| Customer charged twice for one refund | Stripe call outside step.run, or step name not unique | C6 (step.run), C7 (memoization) |
| OpenAI rate-limit errors during 9am peak | Missing throttle | C11 (concurrency + throttle) |
| One customer's bursts starve other customers | Missing per-key concurrency | C12 (priority + fairness) |
| Function suspended forever, never resumed | Event name in wait_for_event does not match the event being sent | C8 (wait_for_event), C15 (HITL) |
| HITL timeout fired silently over the weekend | Missing timeout handler that writes to audit | D5 (durable refund gate), C15 (HITL) |
| Yesterday's failed runs disappeared from dashboard | Runs persist until manually replayed or after retention window | C14 (replay) |
| Replay re-charged customers | Replay is a fresh run that re-executes every step; the charge had no idempotency key | C4 (idempotency), C14 (replay is a fresh run) |
| Function trace does not show OpenAI prompt | Step trace shows function inputs/outputs but no LLM-specific prompt/token telemetry | C10 (Python uses step.run; LLM-specific telemetry needs your own OpenAI client tracing; step.ai.wrap's prompt-level traces are TypeScript-only) |
Appendix: optional lineage and an Inngest cheat sheet
You do not need the Digital FTE course to do Part 4: D0 builds the worker from scratch. Two short notes for context.
A.1: If you are coming from the Digital FTE course
The From Agent to Digital FTE course builds a richer customer-support worker: portable Skills, a Postgres system of record, and a custom MCP server. If you did it, you already have a SandboxAgent worker sitting on disk, and you can skip D0's minimal floor: point the nervous system (D1 onward) at your own worker instead. The wrapping is identical. One bonus: the durable refund gate you build in D5 (step.wait_for_event) replaces the hand-rolled run-state table from that course's optional Decision 10, so you can delete it. If you have not done that course, ignore all of this; D0 gives you everything you need.
A.2: Inngest-specific essentials this course uses
If anything below feels unfamiliar, skim the corresponding doc page before diving into Part 4.
- Inngest client instantiation. A single
inngest.Inngest(app_id=...)instance per Python project, exported from one module and imported wherever you decorate functions. Python quick start. - Function decoration.
@inngest_client.create_function(fn_id=..., trigger=...). The trigger can beTriggerEvent,TriggerCron, or a list of both for multi-trigger functions. ctx.step.run,ctx.step.sleep,ctx.step.wait_for_event,ctx.step.ai.infer. The four step primitives that make up 90% of what you will write in Python. (TypeScript has a fifth,step.ai.wrap, for LLM-specific tracing; Python projects usestep.runfor AI calls.)inngest_client.send(events=[...]). Emit events from anywhere in your code (inside functions, inside agent tools, from CLI scripts). Use anid=for idempotency.- Dev server startup.
npx inngest-cli@latest dev. Runs on:8288. Dashboard athttp://127.0.0.1:8288. MCP athttp://127.0.0.1:8288/mcp. If:8288is taken it uses8289+; then setINNGEST_BASE_URL=http://127.0.0.1:<port>on the host so it follows, not just the MCP URL.
A.3: The two shifts that are actually hard
The hardest thing about this course is not Inngest's syntax. It is the mental shift from request to event (Concept 1) and from in-process execution to durable execution (Concept 6). The syntax is mechanical once those two land. Re-read Concepts 1 and 6 first if anything else feels harder than it should.
Flashcards Study Aid
Knowledge Check
A quick gated self-check on the ideas you just ran through.