Give Your AI Agent a Nervous System
15 concepts, roughly 80% of real-world use: senses (triggers), reflexes (durable execution), and balance (flow control).
You have built an agent that works. But it only works while you are watching it. You open Claude Code or OpenCode, you type, it answers. The moment you step away, it stops. That gap, between an agent you operate and a worker that operates itself, is the subject of this whole course.
The surprise is what closes this gap, and it is not a smarter agent. Your agent already has what it needs to do the work: an LLM to think, tools and MCP servers to act, and skills for the workflows it knows. What it lacks is a nervous system. Think about your own body: your brain thinks and your muscles act, but a second system runs underneath without you, your heartbeat and your reflexes, the signals that keep you alive while you sleep. Your heart keeps beating when you stop paying attention; an agent has no version of this, so the moment you stop driving it, it stops. The nervous system is the connective tissue that closes the loop on its own, without a human driving every turn: it senses the world and wakes the agent when something happens, it reacts by reflex when a step fails (and holds its place for hours while it waits on a person or a slow API), and it keeps the agent in balance when five hundred requests land at once. This is the line that separates an agent you operate from an FTE that operates itself. You give your agent this nervous system; you do not rewrite the agent. That is the one idea this whole course is built around.
The tool that gives your agent a nervous system has a technical name, a durable execution engine, and we use one called Inngest. The patterns carry over to Temporal, Restate, and Dapr Agents. This is not just a teaching picture: Day AI, a CRM built for AI-native companies, calls Inngest the "nervous system" of its product and runs on every piece this course teaches. Inngest's free Hobby tier is the easiest place to start: no credit card, a one-command dev server, and a dashboard you can watch while you build.
The example is deliberately thin: a customer-support agent that looks at a handful of sample customers, drafts a reply, and issues a refund only when a human approves. The hard part is not in the agent, so we keep it small and spend the effort on the nervous system around it. You build it from scratch right here. It shares ideas with the earlier Digital FTE course but assumes nothing from it. Set up the environment once in the Quick Win below, and Part 4 builds the worker in seven paste-and-watch prompts. The course is Python-first on inngest-py: you direct your coding agent in plain English and it writes the code. If you learn by doing, skim Parts 1-3 and jump to Part 4.
A single agent crashing mid-task is annoying. But a workforce of fifty agents handling customer-facing work with no nervous system underneath is impossible: either you adopt a platform that gives you one, or you spend six months building a weaker version yourself.
Day AI, a CRM for AI-native companies, runs its product on every primitive this course teaches: durable LLM workflows, wait-for-event coordination, replay on failure, debounce plus throttle plus concurrency, and multi-tenant fairness. Its two founding engineers arrived at this same nervous-system picture independently. This is production language, not curriculum branding.
Why an AI agent needs a nervous system (four properties)
Four properties make this nervous system uniquely important for agents:
Without step.wait_for_event (Concept 15), you build the approval queue yourself: a database table, polling, timeout handling, an audit trail. That is a project, not a feature.
Where this course sits in the Agent Factory thesis
The Agent Factory thesis describes Seven Invariants that any production agent system must satisfy. The worker you build here satisfies Invariant 4 (engine) and Invariant 5 (system of record, here a small audit trail). This course adds two more, plus part of Invariant 1:
step.wait_for_event is the cleanest expression of this on any platform: the agent suspends, a human emits the awaited event, and the agent resumes.
The 15 concepts at a glance. They map onto the three jobs a nervous system does: senses (triggers wake the worker), reflexes (durable execution keeps it correct when something breaks), and balance (flow control keeps it healthy under load). This is the first-pass version, concept plus one-line gist. When something breaks during the build, the Quick reference at the end has a symptom-to-concept diagnostic that takes you back to the concept the failure belongs to.
The 15 concepts, one line each
| # | Concept | One-line gist |
|---|---------|---------------|
| | Senses (Triggers) | How the world reaches the worker |
| 1 | Events vs requests | A request is sync and someone is waiting; an event is async and the world has moved on. |
| 2 | Cron triggers | A schedule wakes the function. One line: TriggerCron(cron="0 9 * * *"). |
| 3 | Webhook triggers | An inbound HTTP payload becomes a named event; your function reacts to that name. |
| 4 | Idempotency and event semantics | Event IDs and step names turn a duplicate event (or a retry) into a no-op. |
| 5 | Fan-out and sub-agent delegation | One event, N subscribing functions; or one parent that fires N child events. |
| | Reflexes (Durable execution) | Keeping the worker correct when something breaks |
| 6 | step.run and the durable function model | Every step.run is a checkpoint; the function can crash between steps and resume. |
| 7 | Memoization, the mechanic underneath | Completed steps return their stored output instead of executing again. |
| 8 | step.sleep and step.wait_for_event | Both durably suspend the function, for a duration or for an event. |
| 9 | Retries, error handling, dead-letter | Automatic retries with backoff; after N tries the failed run stays saved for replay. |
| 10 | step.run for AI calls in Python | Wrap OpenAI calls in step.run; step.ai.infer offloads inference (step.ai.wrap is TypeScript-only). |
| | Balance (Flow control) | Keeping the worker healthy under load |
| 11 | Concurrency and throttling | concurrency caps active runs; throttle caps starts per second. |
| 12 | Priority and fairness | Priority orders the queue; per-key concurrency gives every tenant a fair share. |
| 13 | Batching | Collect events into one batched function call for cheap bulk work. |
| 14 | Replay and bulk cancellation | Replay failed runs with new code; bulk-cancel the runs you no longer need. |
| 15 | HITL gates with step.wait_for_event | The function stays suspended until a human approves, then resumes with the decision. |
What you need first. Four things; the rest of the course is self-contained (Part 4 builds its own worker from scratch).
- You can drive a coding agent. Claude Code or OpenCode, installed and authenticated. Plan mode, rules files, a read-first-then-write workflow: if this rhythm is familiar, you are calibrated. If not, the Agentic Coding Crash Course covers it.
- You have an OPENAI_API_KEY (or another model key your coding agent can use) and a Neon account for the worker's Postgres system of record. The worker runs a real model and reads and writes its customers and audit trail in Neon. Neon is free (no card), and you authorize it with one browser click during setup; if you don't have an account, sign up at neon.com in about a minute. The Inngest dev server itself needs no account.
- You have Node.js 20+ available, even though the worker is Python. The Inngest dev server is distributed as a Node CLI (npx inngest-cli@latest dev).
- You have a working mental model of "event-driven" vs "request/response". If "the world fires an event and zero, one, or many functions react to it" sounds familiar, you are calibrated. If not, Concept 1 gives you the shape.
Already done From Agent to Digital FTE? You have a richer worker to wrap; a callout at the end of Part 4 points the nervous system at it. It is a bonus, not a gate.
How to read this page on a first pass, plus a glossary of the terms you will meet
First pass. Expand anything labeled "Done when" or "What to watch": that is runnable behavior you can check your predictions against. In Part 4 you can skim the load-bearing snippets on a first pass; the narrative around each one says what that layer does, and when you build, your agent writes the code. The "Try with AI" blocks are optional extension prompts. The goal of pass one is to get the nervous-system model and its three layers into your head; pass two, hands on the keyboard, is where you build. Every concept ends with a Predict (commit to an answer before you read on) or a Quick check (test the rule you just read); both exist to make you pause, not to grade you.
Glossary (every term is also explained in context where it first appears):
- Production Worker: An AI agent with a nervous system around it: senses that wake it (triggers), reflexes that survive failures (durable execution), and balance that scales it under load (flow control).
- Event: A named, immutable message saying that something happened. Example: {"name": "customer/email.received", "data": {"customer_id": "..."}}. This is the trigger surface.
- Inngest function: A Python function decorated with @inngest_client.create_function that declares its triggers and steps. The unit of durable work.
- Step: A unit of work inside an Inngest function, wrapped in ctx.step.run(), ctx.step.sleep(), ctx.step.wait_for_event(), or ctx.step.ai.infer(). Each step is retried and memoized independently.
- Memoization: When a function crashes and restarts, Inngest runs the function code again from the top but returns the stored output for any step.run whose result is already cached. The function catches up to where it broke without redoing the work.
- Flow control: Per-function policies: concurrency (max active runs), throttle (max starts per second), priority (queue order), batch_events (accumulate before invoking).
- HITL (Human In The Loop): A function pauses to wait for human approval or input before continuing. step.wait_for_event is the primitive.
- Replay: Running failed runs again from the top as fresh runs, on current code after a bug fix (distinct from the automatic retry inside a run, which resumes from the memo). The Rerun button in the dashboard.
- Dev server: Inngest's local dev environment, started with npx inngest-cli@latest dev. Dashboard at http://127.0.0.1:8288; MCP endpoint at /mcp.
Current as of May 2026. The entire Part 4 build was run end-to-end against a live Inngest dev server and a real model, on inngest 0.5.18, openai-agents 0.17.3, fastapi 0.136.3, Python 3.12, and the Inngest CLI. Every snippet in Part 4 comes from that working build, not from memory. The architecture this course teaches does not change when the SDK does; the SDK is this year's interface to it. If a live docs page and this page ever disagree on a syntax detail, the docs win: pin your versions, and check the Inngest Python quick start and the OpenAI Agents SDK docs as you build.
Sections where Claude Code and OpenCode differ have a switcher; pick one and it stays in sync across page visits.
Fifteen-minute quick win: set up the base, and watch the reflex
Before you read the 15 concepts that explain why this architecture works, set up the environment the whole course runs in and watch a durable function survive a crash. You do this setup once; Part 4 builds the customer-support worker on this same base. By the end you will have:
- the base open in your coding agent, Skills installed, and three MCP servers wired (Neon, Context7, and the dev-server inngest-dev),
- a fresh Neon database with two tables, customers and audit_log, which you created over MCP and saw in the console, with its DATABASE_URL written to .env so the worker can use it later,
- a tiny durable function (one step.run, one step.sleep, one FastAPI host) running against the Inngest dev server,
- a run you triggered and watched suspend on the sleep, consuming zero compute,
- and a run you broke on purpose, then watched Inngest retry, returning the already-completed step from the memo while only the broken step executed again.
That last beat is the real point of this whole course, in miniature: a reflex you can see with your own eyes, where a step fails and the system recovers without redoing the work it already finished. This is not the Part 4 worked example (the full Worker, seven prompts); it is a single sitting. Do it, then come back for the concepts.
A Production Worker is two processes side by side, and keeping them apart is the mental model: a Python function host (your code, which serves the function to Inngest) and the Inngest dev server (the nervous system that triggers runs, memoizes steps, and shows you the dashboard). Your coding agent wires both, installs the Skills that teach it Inngest patterns, and talks to the dev server through the inngest-dev MCP.
One more boundary matters, and it is the same one the Digital FTE course drew. Your worker keeps its customers and its audit trail in a Neon Postgres database, and that database is touched in two different ways. Your coding agent uses the Neon MCP to build and inspect it: creating tables, reading rows, fetching the connection string, all in plain English at development time. Your worker uses its own Postgres connection (DATABASE_URL) to read and write it at runtime. The worker never calls the Neon MCP, and Neon's own docs state the reason plainly: the MCP server is for development and inspection, never wired into a running app. Neon is free with one OAuth click; the Inngest dev server needs no account at all.
Get the base and open it
Download the base and open the folder in your coding agent. The agent does the setup itself, from the prompts below. You set this up once: ai-agent-nervous-system/ is your folder for the whole course, for both the Quick Win and Part 4. You never download or unzip it again.
Download ai-agent-nervous-system-base.zip
cd ai-agent-nervous-system
claude
This base assumes a capable general agent (Claude Code, or OpenCode running Claude Sonnet or Opus, GPT-5, or similar). A smaller model will drift on the build prompt; if its first plan looks vague rather than specific, switch to a stronger one before going further.
Prep the base (~3 min)
The base ships its rules in AGENTS.md along with its MCP wiring; the Skills, your key, and the Neon authorization come next. Have your agent set itself up. Paste this:
Read AGENTS.md, then get this base ready: install the Skills it lists for whichever agent you are, copy .env.example to .env for me, and tell me exactly what you need from me to bring the Neon and Context7 MCP servers online.
What to watch: the agent installs four Inngest Skills and the neon-postgres Skill (you see the install runs and the Installed confirmations), creates .env, then asks you for two things: your OPENAI_API_KEY to paste into .env, and one browser click to authorize Neon over OAuth. Neon is free; if you don't have an account yet, sign up at neon.com in about a minute, or create one right on the authorization screen. INNGEST_DEV=1 is already in .env, so the SDK runs in local dev mode without a signing key. Once the install and wiring are done, the agent tells you to start the dev server (the next step) and then restart the agent, because new Skills and the inngest-dev MCP do not load mid-session.
Done when: the Skills are installed, .env holds your key, Context7 is reachable, and Neon is authorized. The inngest-dev MCP comes online once the dev server is running, which is the next step.
Start the dev server, and confirm the agent can reach it (~2 min)
This course adds two boundaries your agent reaches over MCP: a Neon database it builds and inspects, and a running dev server it sends events to and observes. So before you build anything, bring both up and confirm they are live.
Start the Inngest dev server in your terminal (it is a Node CLI; leave it running):
npx inngest-cli@latest dev
The dashboard comes up at http://127.0.0.1:8288, and the dev server exposes its MCP endpoint at /mcp. Now restart your coding agent (exit, then relaunch it in the ai-agent-nervous-system folder) so that both the freshly installed Skills and the inngest-dev MCP load. Then paste this:
List the Neon tools and the inngest-dev tools you can see.
What to watch: two real lists. The Neon tools (create a project, run SQL, describe tables, get a connection string, and so on) are your agent's hands on the database. The inngest-dev tools (list_functions, send_event, invoke_function, get_run_status, and the rest) are its hands on the running dev server. Everything below rides on both.
Gate open: the reply lists real Neon tool names and real inngest-dev tool names. If the Neon tools are missing, the OAuth did not finish; redo the Neon authorization from the prep step. If the inngest-dev tools are missing, either the dev server is not running (start it) or you skipped the restart (exit, relaunch in this folder, ask again).
Create the store, and get its connection string (~3 min)
Now create the worker's system of record over the Neon MCP, then hand the worker the one thing it will need to reach it later: a connection string. The worker you build in Part 4 reads its customers from here and writes its audit trail here. Paste this:
Paste this to your coding agent. Plan first; execute on approval.
On a fresh Neon project, create two tables: customers (id, email, tier) and audit_log (a record of every action the worker takes). Then call the Neon tool that returns the connection string and write that URL into my .env as DATABASE_URL. Use the Neon tools for all of it; don't write SQL for me to run.
What to watch: the agent calls Neon MCP tools to create the project and both tables (you see those tool calls, not SQL you typed), then writes DATABASE_URL into .env. That string is the handoff: the Neon MCP provisioned the store, and your worker will use the string, not the MCP server.
Done when: a fresh Neon project exists with a customers table and an audit_log table, and .env contains a DATABASE_URL. Open console.neon.tech, pick the project the agent just created, and open Tables: customers and audit_log are there, empty for now. When the worker runs, you will see rows arrive in D0. (A table is just a spreadsheet: each row is one thing, each column is one detail.)
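For a concrete picture of that store, here is a minimal sketch that uses Python's built-in sqlite3 as a local stand-in for Neon Postgres. The customers columns (id, email, tier) come from the prompt above; the audit_log columns are assumptions made up for illustration, since the agent chooses the real schema when it runs the Neon tools.

```python
import sqlite3

# In-memory SQLite stands in for the Neon Postgres store. In the course the
# agent provisions the real database over the Neon MCP, not by hand.
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    CREATE TABLE customers (
        id    TEXT PRIMARY KEY,
        email TEXT NOT NULL,
        tier  TEXT NOT NULL
    );
    -- Hypothetical audit_log columns: the source only says the table is
    -- "a record of every action the worker takes".
    CREATE TABLE audit_log (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT REFERENCES customers(id),
        action      TEXT NOT NULL,
        detail      TEXT,
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """
)


def table_names(connection):
    """Return user tables, skipping SQLite's internal bookkeeping tables."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_count(connection, table):
    """Count rows; table names here come from table_names(), not user input."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


tables = table_names(conn)
counts = {name: row_count(conn, name) for name in tables}
print(tables, counts)
```

Both tables exist and are empty, which mirrors what the Neon console shows at this point in the setup.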
Build the first durable function, and drive it from the dashboard (~3 min)
Now build the smallest possible durable function, using the Skills you just installed. The Inngest Skills are TypeScript-first in their examples, so your agent takes the patterns from them (what a step is, what shape a durable function has) and confirms the exact Python signatures from the docs (the dev-server MCP's grep_docs/read_doc, or Context7), not from memory. Paste this:
Using the Inngest Skills, write one tiny Inngest durable function (call it greet-customer, triggered by a demo/greet event) that composes a greeting in one step.run, sleeps fifteen seconds with step.sleep, then composes a farewell in a second step.run and returns both. Serve it from a FastAPI host in local dev mode, and start the host on port 8000 with auto-reload on, so edits I make later are picked up without a manual restart.
The shape it writes, so you recognize it when you see it: the function is a plain async def, two step.run calls wrap the work that should be memoized, and the step.sleep between them durably suspends the run (the process can crash, restart, or be redeployed during the sleep; when the timer fires, the run resumes at the next line). Confirm one detail in the agent's code: the Inngest client is constructed with is_production=False, or it reads the INNGEST_DEV=1 already present in .env. Without one of these, the SDK silently defaults to Cloud and your function never registers locally.
Done when: the function host is running on port 8000, and the dev server (already running from the last step) has auto-discovered it. Open http://127.0.0.1:8288, click Functions, and greet-customer is listed. You drive the rest from the browser.
Trigger it, and watch a step sleep at zero compute (you drive)
Send the trigger event. The easiest route is the dashboard: at http://127.0.0.1:8288, click Events, then Send event, paste this, and click Send:
{
  "name": "demo/greet",
  "data": { "name": "Sara" }
}
(Prefer to stay in the agent? Ask it to send the event over MCP: "Send a demo/greet event with name Sara using the inngest-dev send_event tool." Either way the same run starts.)
Click Runs and open the new run. The first step completes; the sleep step shows Sleeping with a resume time. Nothing in your code is running, the host terminal is idle, and that is the point: a durable wait costs zero compute. After fifteen seconds the run resumes on its own, the farewell step completes, and the status flips to Completed. The Output panel shows the returned dict.
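Why a durable wait can cost zero compute is easier to see in a toy model. The sketch below is not the Inngest API; DurableTimers and its methods are hypothetical names. It only illustrates the idea that a sleep persists a wake-up time instead of keeping a process alive, and that a scheduler later picks up whichever runs are due.

```python
from dataclasses import dataclass, field


@dataclass
class DurableTimers:
    """Toy stand-in for an engine's timer store (hypothetical, not Inngest)."""

    wake_at: dict = field(default_factory=dict)  # run id -> wake-up time

    def sleep(self, run_id, now, seconds):
        # Persist only the wake-up time; no process has to stay alive.
        self.wake_at[run_id] = now + seconds

    def due(self, now):
        # A scheduler tick asks: which suspended runs should resume now?
        ready = sorted(r for r, t in self.wake_at.items() if t <= now)
        for run_id in ready:
            del self.wake_at[run_id]
        return ready


timers = DurableTimers()
timers.sleep("run-1", now=0.0, seconds=15.0)

# Ten seconds in, the host could be idle, crashed, or mid-redeploy.
still_sleeping = timers.due(now=10.0)
# At fifteen seconds the timer is due and the run is handed back to the host.
resumed = timers.due(now=15.0)
print(still_sleeping, resumed)
```

The design point is that the waiting state lives in the store, so the function host does nothing at all between the two steps.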
Break a step, and watch the retry skip the work it already did (the payoff)
Now make one step fail on purpose, so you can watch memoization carry completed work across a retry. Paste this to your agent:
Make the farewell step raise an error on purpose, so I can watch a run fail. Keep everything else the same.
Send the same demo/greet event again, then open the run and read its trace. Here is the payoff, and it is all in this one failing run: the greeting step shows one completed attempt, and the farewell step shows several Attempts, each retried with backoff (Inngest makes several attempts by default) before the run lands in Failed. Pause on what that attempt count means: the completed greeting step is paid for exactly once, not once per retry. This is durable execution you can see with your own eyes. The mechanic behind why a completed step returns immediately instead of running again is in Concept 7; for now, just watch it happen.
(This dev-server build does not show a separate "memoized" badge. The attempt count is the memo: the completed step sitting at one attempt while the broken step keeps climbing is exactly what "returned from the memo, did not run again" looks like here.)
Now fix it:
Now revert the farewell step to the working version.
The host auto-reloads (that is what --reload bought you; if you skipped it, restart the host yourself). Send a fresh demo/greet event and the whole function now runs cleanly to Completed on the fixed code. One honest note about recovery, because it bites people: the dashboard's Rerun button starts a brand-new run from the top with your current code, executing every step from scratch. It is the right tool for incident recovery (a bad deploy broke a batch of runs; you ship the fix and rerun them), but it is not a memo-preserving resume. The memo-preserving resume is the automatic retry you just watched inside the failing run, where the completed step stayed put.
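The difference between the automatic retry and Rerun can be sketched with a toy engine. This is an illustrative model, not the Inngest SDK: ToyEngine, step_run, and run_with_retries are hypothetical names, and the cap of four attempts is an assumption chosen for the demo. It shows the two behaviors described above: within one run the memo survives across attempts, while a fresh run starts with an empty memo.

```python
class ToyEngine:
    """Toy model of step memoization (illustrative, not the Inngest SDK)."""

    def __init__(self):
        self.memo = {}        # step name -> stored output
        self.executions = {}  # step name -> how many times the body ran

    def step_run(self, name, fn):
        if name in self.memo:
            return self.memo[name]  # completed step: return stored output
        self.executions[name] = self.executions.get(name, 0) + 1
        result = fn()               # may raise; nothing is stored on failure
        self.memo[name] = result
        return result

    def run_with_retries(self, workflow, max_attempts):
        for attempt in range(1, max_attempts + 1):
            try:
                return workflow(self), attempt
            except RuntimeError:
                continue            # re-enter the workflow from the top
        return None, max_attempts


def make_workflow(farewell_broken):
    def farewell():
        if farewell_broken:
            raise RuntimeError("farewell step is broken on purpose")
        return "Goodbye, Sara"

    def workflow(engine):
        greeting = engine.step_run("greeting", lambda: "Hello, Sara")
        goodbye = engine.step_run("farewell", farewell)
        return {"greeting": greeting, "farewell": goodbye}

    return workflow


# Automatic retry inside one run: the memo survives across attempts.
failing = ToyEngine()
failed_result, attempts = failing.run_with_retries(make_workflow(True), 4)

# Rerun after the fix: a brand-new run starts with an empty memo.
rerun = ToyEngine()
fixed_result, fixed_attempts = rerun.run_with_retries(make_workflow(False), 4)

print(failing.executions, rerun.executions)
```

In the failing run the greeting body executes once while the farewell body executes on every attempt; in the rerun every step executes again, once, from scratch.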
You have just set up the environment for the whole course and watched the nervous system work with your own eyes: the Skills are installed, your Neon store is provisioned with DATABASE_URL in .env, the dev-server MCP is live, and you ran a durable function, watched a step sleep without consuming compute, then broke a step and watched the automatic retry return the completed step from the memo while only the broken step ran again. This is the architecture the course is about. The rest of the course scales it up: real senses (cron, webhook, fan-out), stronger reflexes (agent invocation inside step.run), real balance under load, and the human-approval gate that turns "the agent might get this wrong" into "the agent drafts, a human approves, the action is issued".
If something did not work, four problems cover almost every case:
- The dev server cannot reach the function host: confirm the host is running on port 8000.
- The client is in Cloud mode: the agent dropped is_production=False and INNGEST_DEV=1 is missing from .env, so functions never register locally. Ask it to set one of them (an explicit is_production value wins over the env var).
- The function is missing from the dashboard: the host did not reload; restart it.
- A run hangs with no error and no progress: a de-synced host stalls silently; restart the host and the dev server together, and run one host against one dev server. (One subtle cause: if :8288 was taken and the dev server came up on 8289+, re-pointing the inngest-dev MCP URL is not enough; the host still talks to :8288. Set INNGEST_BASE_URL=http://127.0.0.1:<port> on the host so it follows the dev server to the new port.)
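The Cloud-versus-dev rule in the second item can be written down as a small decision function. This is a model of the rule as this page states it, not the SDK's actual code: resolve_mode is a hypothetical helper, and it only recognizes the INNGEST_DEV=1 form shown here.

```python
def resolve_mode(is_production=None, env=None):
    """Model of the mode rule described above (hypothetical helper)."""
    env = env or {}
    if is_production is not None:
        # An explicit constructor argument wins over the environment.
        return "cloud" if is_production else "dev"
    if env.get("INNGEST_DEV") == "1":
        return "dev"
    # Neither is set: the client silently falls back to Cloud mode.
    return "cloud"


cases = {
    "explicit dev": resolve_mode(is_production=False),
    "env dev": resolve_mode(env={"INNGEST_DEV": "1"}),
    "nothing set": resolve_mode(),
    "explicit beats env": resolve_mode(is_production=True, env={"INNGEST_DEV": "1"}),
}
print(cases)
```

The "nothing set" case is the one that causes the symptom: no error is raised, the client simply targets Cloud and the local dev server never sees the function.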
If you hit any of these, the universal recovery move works here too: "Something didn't work. Read the error, tell me in plain language what you see, and propose one fix I can approve."
What you built, and where it goes next
The environment is set: the base is open, the Skills are installed, all three MCP servers are wired (Neon, Context7, inngest-dev), your Neon store has its customers and audit_log tables with DATABASE_URL in .env, and the dev server is running. You also saw the one idea the whole course rests on, the reflex of durable execution, with your own eyes. Part 4 builds the customer-support worker on this base, in this same folder: it reads these customers and writes these audit rows, then wraps the whole thing in a full nervous system, with a real event trigger, a daily cron that fans out, flow control, and a durable human-approval gate on refunds. Part 4 scales this step.run-and-step.sleep skeleton into a worker that does real work on your Neon store. If this Quick Win worked, the concepts ahead explain why each piece has the shape it does.
Part 1: Senses, how the world reaches the worker
An AI agent you call by hand runs when you call it. A real Production Worker has senses: it runs when the world reaches it. A customer sends an email, a webhook arrives, a cron fires daily at 09:00, another worker hands off work. Each of these is a signal coming in, and a trigger is how the agent feels it. The five concepts in Part 1 are those senses: the event-driven mental model, the three ways the world gets in (cron, webhook, event), the semantics that prevent double-processing, and the fan-out patterns that let one signal wake many workers.
Concept 1: Events vs requests, the durable mental shift
Everything that follows in this course rests on one mental shift: from requests to events.
A request is a synchronous conversation. Someone calls; you handle it; you return; they continue. A connection stays open; a human or a service is waiting. If you crash, the caller gets an error. An agent you chat with at a prompt is a request: you typed, it streamed back, and the conversation belonged to your terminal session.
An event is an asynchronous message. Something happened in the world (a customer signed up, an email arrived, a payment cleared), and the originator emits a named record of that fact. Zero, one, or many functions react to the event independently. No connection stays open. The originator does not know who is listening, does not wait for results, and does not block. The world has moved on.
# A request: I'm here, waiting, blocking
result = await agent.handle_customer_message(text=user_input)
print(result)  # I unblock when the agent finishes

# An event: I fire-and-forget
await inngest_client.send(events=[
    inngest.Event(
        name="customer/email.received",
        data={"customer_id": "c-4429", "body": email_body, "subject": subject},
    ),
])
# I return immediately. Somewhere else, one or more Inngest
# functions react to this event on their own schedule.
This shift looks small. It is not. Once you think in events, durability and scale come almost for free, because:
- The consumer cannot slow down the producer (the email receiver does not wait for the agent to draft a reply).
- The consumer can crash and restart without losing work (the event is durably stored; Inngest re-delivers it).
- New consumers can be added without changing producers (a second function, say an analytics counter, can subscribe to customer/email.received without the email receiver knowing).
- Backpressure becomes a flow-control policy instead of a code change (Inngest caps concurrency; the producer keeps firing; events queue).
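The decoupling in the list above can be shown with a toy in-process event bus. This is a sketch for intuition only, not how Inngest is implemented: ToyEventBus, draft_reply, and count_for_analytics are hypothetical names. The producer only appends to a queue, and a second consumer is added without touching the producer.

```python
from collections import defaultdict, deque


class ToyEventBus:
    """Toy in-memory bus (illustrative; a real engine stores events durably)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # event name -> handlers
        self.queue = deque()                  # events waiting for consumers

    def subscribe(self, event_name, handler):
        self.subscribers[event_name].append(handler)

    def send(self, name, data):
        # The producer returns immediately; no handler runs here.
        self.queue.append({"name": name, "data": data})

    def drain(self):
        # Consumers work through the queue on their own schedule.
        results = []
        while self.queue:
            event = self.queue.popleft()
            for handler in self.subscribers[event["name"]]:
                results.append(handler(event))
        return results


def draft_reply(event):
    return f"draft for {event['data']['customer_id']}"


def count_for_analytics(event):
    return f"counted {event['data']['customer_id']}"


bus = ToyEventBus()
bus.subscribe("customer/email.received", draft_reply)
bus.send("customer/email.received", {"customer_id": "c-4429"})
pending_before = len(bus.queue)

# A new consumer appears later; the producer code above is unchanged.
bus.subscribe("customer/email.received", count_for_analytics)
results = bus.drain()
print(pending_before, results)
```

The event sat in the queue until a consumer drained it, and both subscribers reacted to the same event independently.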
Predict. Your customer-support Worker takes 8 seconds to answer one email: three seconds for the agent's reasoning, four seconds for two MCP tool calls, one second for the database write. At peak load you receive 50 emails per minute. If you use the request model (the email parser blocks until the agent finishes), how many parallel HTTP connections to your email parser does that imply? If you use the event model (the email parser fires an event and returns immediately), how many? Confidence 1-5.
Answer: the request model needs about 7 concurrent parsers (50/min × 8 seconds = ~6.7 parallel handlers, plus headroom). The event model needs one parser (it fires the event and returns in ~10ms; the event queue absorbs the 50/min spike; Inngest functions consume the queue at whatever concurrency you allow). The event model decouples the production rate from the consumption rate. That is not only a scaling fact; it is an architectural fact. The event becomes a durable boundary between "what happened in the world" and "what the Worker does about it". Crash the consumer mid-processing and the event is still there to retry. Add three more consumer types and the producer does not notice. Events are how you stop owning the timing of the work.
Try with AI
Walk me through three scenarios. For each, classify it as REQUEST-MODEL
or EVENT-MODEL, and explain which one fits better:
A) A user clicks "Submit refund request" in the support portal and
expects to see "Refund issued: $30" within 2 seconds.
B) A nightly cron job at 02:00 runs a customer-health-check across
all 5,000 customers and writes a report to Slack.
C) A customer sends an email to support@; we want a draft response
ready within 60 seconds for the on-call agent to review and send.
For each, name (a) what the human's expectation of timing is and
(b) what failure looks like if the model crashes mid-execution.
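The Predict arithmetic for the request model versus the event model can be checked with Little's law (average items in flight = arrival rate × time each item holds a handler). This is a minimal sketch; the function name `required_concurrency` is made up for illustration, and the 10 ms figure is the one quoted in the answer above.

```python
import math


def required_concurrency(arrivals_per_minute: float, seconds_held: float) -> float:
    """Little's law: average number of handlers busy at once."""
    return (arrivals_per_minute / 60.0) * seconds_held


# Request model: the parser is held for the full 8 s of agent work.
request_model = required_concurrency(50, 8.0)

# Event model: the parser fires an event and returns in roughly 10 ms.
event_model = required_concurrency(50, 0.010)

print(f"request model: {request_model:.2f} busy on average, provision {math.ceil(request_model)}")
print(f"event model:   {event_model:.4f} busy on average, provision {max(1, math.ceil(event_model))}")
```

The same formula also sizes the consumer side: the concurrency you allow your Inngest functions determines how quickly the queue drains, independently of how fast the parser accepts mail.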
Concept 2: Cron triggers, work that runs because time passed
The simplest trigger is the clock. A production Worker does a lot of work that is not a reaction to outside events; it is scheduled work: daily health reports, weekly cleanups, hourly recalculations. Inngest's cron trigger is one line of code.
import inngest

@inngest_client.create_function(
    fn_id="daily-customer-health-check",
    trigger=inngest.TriggerCron(cron="0 9 * * *"),  # 09:00 every day, UTC
)
async def daily_health_check(ctx: inngest.Context) -> dict[str, int]:
    """Run a customer-health pass for every Pro/Enterprise customer."""
    customers = await ctx.step.run("fetch-pro-customers", fetch_pro_customer_ids)
    # fan out: one event per customer, one Worker run per event
    await ctx.step.run("fan-out", fan_out_per_customer_events, customers)
    return {"customers_scheduled": len(customers)}
Note three things:
- The schedule is just standard cron syntax. 0 9 * * * is 09:00 UTC every day; */15 * * * * is every 15 minutes; 0 9 * * 1 is Mondays at 09:00. Inngest evaluates cron in UTC by default; if you need a different timezone, that is configured on the trigger itself (Inngest documents a TZ= prefix on the cron expression), not a separate concept.
- The function still uses ctx.step.run. Cron-triggered or event-triggered, the function has the identical shape. Steps work the same way. Durability works the same way. Flow control works the same way. The trigger is only how the function starts.
- The output of a cron is a regular Inngest function run. It shows up in the dashboard, has a run ID, has a trace, and supports replay. If your Monday-morning cron run fails at step 3, Tuesday's cron will run normally and Monday's failure remains available for replay after the bug fix.
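To make the three schedules in the list above concrete, here is a small sketch of how a five-field cron expression can be checked against a timestamp. It is an illustration only, not how Inngest evaluates schedules: `field_matches` and `cron_matches` are hypothetical helpers that support just `*`, `*/n`, and a single integer per field, and they ignore the special day-of-month/day-of-week rule that full cron implementations apply.

```python
from datetime import datetime, timezone


def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', or a single integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on the schedule described by `expr`."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts 0 = Sunday ... 6 = Saturday
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


# 2024-01-01 was a Monday.
monday_9am = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(cron_matches("0 9 * * *", monday_9am))     # daily at 09:00 UTC
print(cron_matches("0 9 * * 1", monday_9am))     # Mondays at 09:00 UTC
print(cron_matches("*/15 * * * *", monday_9am))  # every 15 minutes
```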
What happens if your service is down when the cron fires? This question separates a durable scheduler from a fragile one. Inngest's cron runs are durably recorded the moment the schedule fires; if your function endpoint is unreachable, Inngest retries with backoff until it succeeds or hits the retry ceiling. A cron that fired at 09:00 is not "missed" because your deploy was rolling at 09:00; the run waits, you finish your deploy, the run completes. One quirk of cron triggers in development is worth knowing: the local dev server only fires crons while it is running. Production runs them on Inngest's infrastructure, which is always running.
Quick check. Three claims. Mark each True or False. (a) If a cron function takes 45 minutes to run and is scheduled every 15 minutes, three concurrent instances will be running at any given time. (b) You can use step.sleep inside a cron-triggered function to spread work across the day. (c) A cron-triggered function can also be invoked manually from the dashboard for testing.
Answers: (a) It depends on the concurrency policy: by default Inngest will queue overlapping runs; if you set concurrency=1 they serialize; if you set concurrency=10 they parallelize. The default is sane. (b) True, and "spreading daily work across hours to smooth load" is a common pattern. (c) True: the Inngest dashboard lets you invoke any function on demand for testing, whatever its trigger.
Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
write a cron-triggered Inngest function in Python that:
1. Runs every Monday at 09:00 UTC.
2. Queries the audit_log table for all conversations resolved in the
prior week (status='resolved' in that window).
3. Computes per-agent metrics: total conversations resolved, average
resolution time, count of escalations, count of refunds issued.
4. Returns the metrics as a JSON object.
After you write the function, use the MCP's `invoke_function` tool to
test it manually (instead of waiting for Monday). Confirm the audit
SQL is correct by using `grep_docs` to search Inngest's docs for
"step.run" examples.
Concept 3: Webhook triggers, when the outside world calls
The second trigger surface is HTTP. Some external system (Stripe, your email provider, a customer-portal form, a GitHub webhook) wants to call your Worker. Without Inngest you would have to: stand up an HTTPS endpoint, parse the payload, validate the source, write to a queue, write a worker that consumes from the queue, handle retries, handle idempotency, and ship telemetry. Each one is a week of infrastructure work.
With Inngest, the endpoint is provided. You configure a webhook in the Inngest dashboard with a URL like https://inn.gs/e/<your-key>, point Stripe (or whatever the source is) at that URL, and the webhook payload becomes an event in your event stream. Any function with a matching event-name trigger now fires.
@inngest_client.create_function(
    fn_id="handle-stripe-refund-failed",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def on_refund_failed(ctx: inngest.Context) -> dict[str, str]:
    """Triggered by Stripe webhook → Inngest event → this function."""
    charge_id = ctx.event.data["charge_id"]
    customer_id = ctx.event.data["customer_id"]
    # Look up which support ticket originated this refund
    ticket = await ctx.step.run(
        "find-ticket-for-refund", lookup_ticket_by_charge, charge_id,
    )
    # Wake the customer-support Worker with the full context
    await ctx.step.run(
        "notify-support-agent",
        notify_support_agent_of_refund_failure,
        ticket_id=ticket["id"], charge_id=charge_id,
    )
    return {"ticket": ticket["id"], "action": "notified"}
The flow: Stripe fails to refund a charge → Stripe POSTs to the Inngest webhook URL → Inngest creates an event named stripe/charge.refund.failed → the function above (which matches that event name) fires → the function uses steps to look up the ticket and notify the support agent. You write none of the HTTP plumbing. No endpoint, no parser, no queue, no consumer.
Two related patterns are worth remembering by name:
- Generic JSON webhooks. If the source is not a known vendor, you point any JSON-emitting service at the same kind of endpoint and choose the event name. Slash-namespaced names (vendor/event.subtype) are the convention; nothing enforces it, but following it keeps the dashboard sorting cleanly.
- Webhook transforms. If the incoming payload does not match the shape you need, Inngest lets you define a "transform" function that runs server-side at receipt time and reshapes the event before it enters your event stream. This keeps provider-specific fields out of your function code.
Predict. A Stripe webhook fires stripe/charge.refund.failed at the exact millisecond your customer-support Worker is also calling inngest_client.send to emit a different event named customer/refund.investigation_needed. Both events arrive in the system simultaneously; the function above triggers only on the Stripe event. Will the function run once or twice? Confidence 1-5.
Answer: once. The function is registered to trigger only on stripe/charge.refund.failed; the customer/refund.investigation_needed event has a different name and matches a different function (or none, if you never wrote one). An event's name is its routing key. Two events with different names never accidentally trigger the same function, even if they arrive at the same instant. This is one reason naming discipline matters: a typo in an event name (customer/email_received vs customer/email.received) means the function never fires, and the symptom is silent. The Inngest dashboard helps catch this: unmatched events appear in a separate stream that you can audit.
Try with AI
I need to handle three webhook sources for my customer-support Worker:
A) Stripe: refund failed, charge disputed
B) Postmark (email service): bounced email, complaint
C) My internal admin UI: manual "investigate this ticket" button
For each, decide:
1. What event names you'd use (vendor/event.subtype format).
2. Whether the function reacting to it should run synchronously (the
caller is waiting) or asynchronously (fire and continue).
3. Whether you'd write a webhook transform to reshape the payload, or
consume it raw.
Then write the Inngest function for the Stripe refund-failed case in
Python, using the MCP's grep_docs to find the current syntax for
TriggerEvent and the dev-server MCP's send_event tool to test it.
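The "event name is the routing key" point from the Concept 3 answer can be illustrated with a toy in-memory router. This is a sketch under stated assumptions: `subscribers`, `on`, and `dispatch` are hypothetical names, and Inngest's real matching happens on its servers, not in a dictionary in your process.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical registry: event name -> handlers subscribed to exactly that name.
subscribers: dict[str, list[Callable[[dict], str]]] = defaultdict(list)


def on(event_name: str):
    """Decorator that subscribes a handler to one exact event name."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        subscribers[event_name].append(fn)
        return fn
    return register


@on("stripe/charge.refund.failed")
def on_refund_failed(data: dict) -> str:
    return f"notified support about {data['charge_id']}"


def dispatch(event_name: str, data: dict) -> list[str]:
    """Run every handler whose subscription matches the name exactly."""
    # .get avoids creating an empty entry for names nobody subscribed to.
    return [fn(data) for fn in subscribers.get(event_name, [])]


payload = {"charge_id": "ch_123"}
print(dispatch("stripe/charge.refund.failed", payload))           # one handler runs
print(dispatch("customer/refund.investigation_needed", payload))  # different name: nothing runs
print(dispatch("stripe/charge.refund_failed", payload))           # typo: silently nothing runs
```

The last call is the silent-typo failure mode: no error is raised, the handler simply never runs.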
Concept 4: Idempotency and event semantics, the same event firing twice
Webhooks are not exactly-once. They are at-least-once: the sender retries if it does not get an acknowledgment. Networks drop packets, services restart, your endpoint times out and the sender retries even though you actually succeeded. Without idempotency, every webhook system eventually double-bills, double-emails, or double-refunds someone. This is not a theoretical concern; it is the most common production bug in event systems.
Two layers of defense, both built into Inngest.
Layer 1: Event ID seeds at the source. When you send the event yourself (rather than receiving it from a webhook), you can attach an idempotency key:
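order_id = "o-4429"
# Illustrative value: the time the user made the request, not the time this code runs
request_timestamp = "2025-01-15T10:30:00Z"

await inngest_client.send(events=[
    inngest.Event(
        name="customer/refund.requested",
        data={"order_id": order_id, "amount_cents": 5000},
        id=f"refund-request-{order_id}-{request_timestamp}",  # idempotency key
    ),
])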
If a second event with the same id is sent inside the dedup window (24 hours by default), Inngest drops the duplicate. Same logical event, same id, only one function run.
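The dedup-window behavior can be pictured with a small in-memory model. This is a hedged sketch, not Inngest's implementation: `EventDeduper` is a hypothetical class, and it assumes the window is measured from the first time an id was accepted.

```python
from datetime import datetime, timedelta, timezone


class EventDeduper:
    """Toy model: drop events whose id was already accepted inside the window."""

    def __init__(self, window: timedelta = timedelta(hours=24)) -> None:
        self.window = window
        self._accepted_at: dict[str, datetime] = {}

    def accept(self, event_id: str, now: datetime) -> bool:
        first = self._accepted_at.get(event_id)
        if first is not None and now - first < self.window:
            return False  # duplicate inside the window: dropped
        self._accepted_at[event_id] = now  # new id, or window expired: treated as new
        return True


deduper = EventDeduper()
t0 = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
event_id = "refund-request-o-4429-2025-01-15T10:30:00Z"

print(deduper.accept(event_id, t0))                        # first delivery
print(deduper.accept(event_id, t0 + timedelta(hours=1)))   # retry inside the window
print(deduper.accept(event_id, t0 + timedelta(hours=25)))  # same id after the window
```

The design point is that the id must be derived from the logical request (order and request time), so that a retry of the same request produces the same id.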
Layer 2: Step-level idempotency. Inside a function, each step.run is identified by its name. If the function crashes between step 3 and step 4, the retry re-runs the function code from the top, but for steps 1, 2, and 3 Inngest returns the stored outputs without re-executing the step body. Step 4 runs normally for the first time. This is what makes a function "durable": the side effects of completed steps do not happen again on retry.
@inngest_client.create_function(
    fn_id="issue-customer-refund",
    trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def issue_refund(ctx: inngest.Context) -> dict[str, str]:
    # Step 1: look up the order. If the function retries, this returns
    # the SAME order data it computed the first time, from Inngest's memo.
    order = await ctx.step.run(
        "lookup-order", lookup_order_by_id, ctx.event.data["order_id"],
    )
    # Step 2: call Stripe. If the function retries AFTER this step
    # succeeded, the Stripe call does NOT happen again. The refund is
    # issued exactly once even if the function runs three times.
    refund = await ctx.step.run(
        "issue-stripe-refund", call_stripe_refund_api,
        charge_id=order["stripe_charge_id"],
        amount=ctx.event.data["amount_cents"],
    )
    # Step 3: write the audit row. Same property: runs at most once.
    await ctx.step.run(
        "audit-refund", write_audit_refund_issued,
        order_id=order["id"], refund=refund,
    )
    return {"refund_id": refund["id"]}
If this function crashes during step 3, the retry re-enters step 1 (gets the cached order data, no DB call), re-enters step 2 (gets the cached refund data, no Stripe call), runs step 3 for real, and returns. The customer is refunded exactly once, even if the function ran three times. This is the killer feature. It is what makes Inngest qualitatively different from a queue with a retry loop.
Inngest's memoization gives you exactly-once step completion from the function's perspective: once step.run has recorded a step as successful, it will not re-execute. But there is a narrow window. If your step body calls Stripe (the side effect happens on Stripe's servers) and the process then crashes before Inngest records the result, the retry will call Stripe again. From Inngest's perspective the step "did not complete". From Stripe's perspective the refund has already been issued. The production-grade pattern is Inngest step memoization plus provider-level idempotency keys: Stripe's Idempotency-Key header, Postmark's MessageID reuse, your own MCP server's idempotency contract. Treat step.run and provider idempotency keys as complementary, not substitutes: step.run keeps your function's internal logic exactly-once; the provider's idempotency key keeps the external side effect exactly-once.
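The narrow window described above, and how a provider-level idempotency key closes it, can be simulated without any network calls. This is a sketch with made-up names: `FakePaymentProvider` is a hypothetical stand-in for a provider that honors idempotency keys, not Stripe's client library, and `refund_step` stands for the body you would pass to step.run.

```python
class FakePaymentProvider:
    """Hypothetical provider: replays the stored result for a repeated key."""

    def __init__(self) -> None:
        self._results_by_key: dict[str, dict] = {}
        self.refunds_issued = 0

    def refund(self, charge_id: str, amount_cents: int, idempotency_key: str) -> dict:
        if idempotency_key in self._results_by_key:
            # Same key seen before: return the original result, move no money.
            return self._results_by_key[idempotency_key]
        self.refunds_issued += 1
        result = {
            "id": f"re_{self.refunds_issued}",
            "charge_id": charge_id,
            "amount_cents": amount_cents,
        }
        self._results_by_key[idempotency_key] = result
        return result


provider = FakePaymentProvider()


def refund_step(order_id: str, charge_id: str, amount_cents: int) -> dict:
    # The key is derived from stable inputs, so a re-run of the step sends the same key.
    return provider.refund(charge_id, amount_cents, idempotency_key=f"refund-{order_id}")


first = refund_step("o-4429", "ch_123", 5000)
# Simulated crash: the process died before the step result was recorded,
# so the engine runs the step body again on retry.
second = refund_step("o-4429", "ch_123", 5000)

print(first == second, provider.refunds_issued)
```

The key is built from the order id rather than from a random value or the current time, because a key that changes between attempts would defeat the provider's deduplication.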
Quick check. True or false. (a) step.run makes a step idempotent only if the function inside is also idempotent. (b) An event with a duplicate ID outside the dedup window will be treated as a new event. (c) If step.run fails mid-execution (the step's code throws an exception), Inngest stores the failure and retries the step on the next attempt without re-running prior steps.
Answers: (a) False: step.run makes the step's invocation idempotent (once it has succeeded, it will not run again), so even when the function inside is non-idempotent (like a Stripe call), that at-most-once guarantee is exactly what you want. The point is that you do not have to make the Stripe-calling code idempotent yourself to survive retries after a recorded success; the narrow crash-before-record window described above is what provider idempotency keys cover. (b) True: Inngest's dedup window is 24 hours by default; after that window, events with the same ID are treated as new. (c) True: the automatic retry is memoized; Inngest knows step 3 failed on attempt 1 and retries only step 3 on attempt 2. Earlier successful steps do not re-execute. (This is a within-run retry, not the dashboard Replay button, which is a fresh run; see Concept 14.)
Try with AI
Here are three scenarios. For each, decide: idempotency PROBLEM or
NO PROBLEM, and if it's a problem, what's the fix:
A) Stripe sends the same charge.refund.failed webhook three times
in 90 seconds (because their first two attempts timed out at
your endpoint). Your function emails the customer.
B) A customer clicks "Issue refund" three times because the page
was slow. Your function calls Stripe and writes audit_log.
C) Your nightly cron at 09:00 sends a customer-health-check event
to each Pro customer. If two crons fire at the same time (a deploy
bug), what happens?
For each problem case, propose ONE specific fix: event ID seed
inside the function, idempotency key in inngest_client.send, or
function-level deduplication on the trigger.
Concept 5: Fan-out and sub-agent delegation, one event, many Workers
Often a single event has to trigger work in many places. A Stripe charge.refund.failed event may need to: notify the support agent, write an audit record, update the customer's risk score, alert finance ops, and post to Slack. Five reactions, all independent, all from one event.
The Inngest pattern: many functions subscribe to the same event. There is no fan-out code; just multiple @inngest_client.create_function decorators with the same TriggerEvent. Each function runs independently, has its own retries, has its own step trace, and fails independently of the others.
@inngest_client.create_function(
    fn_id="refund-failed-notify-support",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def notify_support(ctx: inngest.Context) -> dict[str, str]:
    # ... runs the customer-support Worker to draft a response ...
    return {"status": "drafted"}

@inngest_client.create_function(
    fn_id="refund-failed-update-risk-score",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def update_risk_score(ctx: inngest.Context) -> dict[str, float]:
    # ... runs the risk-scoring Worker ...
    return {"new_risk_score": 0.42}

@inngest_client.create_function(
    fn_id="refund-failed-post-slack",
    trigger=inngest.TriggerEvent(event="stripe/charge.refund.failed"),
)
async def post_to_slack(ctx: inngest.Context) -> None:
    # ... posts a Slack notification ...
    return None
One Stripe webhook arrives. Inngest creates one event. Three functions fire, each in its own run. If post_to_slack fails because Slack is down, the other two are unaffected and complete normally. The failed run sits in the dashboard, available for replay once Slack recovers. This is the core of multi-Worker coordination, and it is the architectural pattern your future manager layer (a later course) will compose at scale.
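The isolation property (one subscriber failing does not affect its siblings) can be imitated locally with asyncio. This is only an analogy under stated assumptions: the three handlers are simplified stand-ins that take a plain dict, `fan_out` is a hypothetical helper, and in Inngest each subscriber is a separate durable run rather than a coroutine in one process.

```python
import asyncio


async def notify_support(event: dict) -> str:
    return "drafted"


async def update_risk_score(event: dict) -> float:
    return 0.42


async def post_to_slack(event: dict) -> None:
    raise ConnectionError("Slack is down")  # simulated outage


async def fan_out(event: dict) -> dict[str, object]:
    """Deliver one event to every subscriber; capture failures per subscriber."""
    handlers = [notify_support, update_risk_score, post_to_slack]
    # return_exceptions=True keeps one failure from cancelling the siblings.
    results = await asyncio.gather(
        *(handler(event) for handler in handlers), return_exceptions=True
    )
    return {handler.__name__: result for handler, result in zip(handlers, results)}


event = {"name": "stripe/charge.refund.failed", "data": {"charge_id": "ch_123"}}
outcome = asyncio.run(fan_out(event))
for name, result in outcome.items():
    print(name, "->", repr(result))
```

What this sketch cannot show is the durable part: Inngest keeps the failed run and its inputs so it can be retried or replayed later, whereas here the exception is simply returned.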
The second fan-out pattern: parent-fires-N-children. Sometimes the fan-out is dynamic. Your daily cron has to fire one customer-health event per Pro customer, which could be 500 or 5,000 depending on the week. The parent function sends N events:
from datetime import date

async def fan_out_per_customer_events(
    customers: list[str],
) -> int:
    events = [
        inngest.Event(
            name="customer/health_check.requested",
            data={"customer_id": cid},
            id=f"daily-health-{cid}-{date.today().isoformat()}",  # idempotency
        )
        for cid in customers
    ]
    await inngest_client.send(events=events)
    return len(events)
The 5,000 events go out in a single send call. 5,000 function runs fire, each with its own customer_id, each isolated, each independently retryable. Flow control (Concept 11) caps how many run concurrently so that you do not melt your downstream APIs. The cron function returns in seconds; the fan-out runs at the rate Inngest's flow-control policies allow.
Sub-agent delegation is a special case of fan-out. Inside a Worker run, you can call await inngest_client.send(...) to delegate sub-tasks to other Worker types. The parent does not wait for the children unless it explicitly uses step.invoke to run them synchronously and collect their results.
Predict. You have three functions all triggered by customer/email.received: the customer-support agent that drafts a reply (15 seconds), an analytics counter (50 ms), and a "VIP detector" that checks whether the customer is high-value (200 ms). When an email arrives, what does the user-visible latency look like for each? Three options: (a) the three add up to ~15 seconds; (b) the three run in parallel, total latency ~15 seconds (the slowest); (c) each runs independently with no shared latency. Confidence 1-5.
Answer: (c). Each function is its own run, in its own process slot. The customer-support agent does not block the analytics counter; the VIP detector does not block the agent. From the outside, the latency of any particular function is just that function's own time. No function waits for a sibling function. That is why fan-out scales: the consumers are isolated. If the agent crashes, the analytics counter is unaffected.
Try with AI
Design the fan-out architecture for these three scenarios. For each,
sketch the event names and the functions that subscribe:
A) New customer signs up. Need to: send welcome email, create
Stripe customer, post to Slack #new-customers, write to
audit_log, schedule a 7-day follow-up.
B) Customer support email arrives. Need to: draft a reply (agent),
detect sentiment, check if VIP, update customer's "last contact"
timestamp, attach to the right ticket thread.
C) Daily cron at 09:00 needs to run customer-health-check on
~5,000 Pro customers. Each check takes ~30 seconds. We want
the whole batch to complete by 11:00 (a 2-hour window).
For each, decide: how many event types, how many subscriber
functions, what the idempotency story is, and one specific failure
mode this design protects against.
Part 2: Reflexes, what happens when something breaks
Senses wake the worker. Reflexes protect the worker from what comes next. A worker calls an agent, the agent calls a few tools, the tools call a database and a payment API and a model: several external calls in a single turn, any of which can fail. Without durability, a single transient failure mid-turn restarts the whole flow from the top. A reflex is automatic: it acts fast, without the agent's mind having to decide. That is what durable execution gives you. Durability is the property that says: when something fails mid-execution, the work already completed stays completed, and execution resumes from where it broke. Inngest delivers this with one primitive (step.run) and a memoization mechanic underneath. Part 2 explains both, plus the time-based variants (step.sleep, step.wait_for_event), retry semantics, and the step.ai primitives.
First-pass compression note. If you are scanning, the load-bearing concepts are 6 (step.run) and 7 (memoization). Concepts 8-10 build on them. Read 6 and 7 carefully; once those two are in your head, the rest is a fast read.
Concept 6: step.run and the durable function model
A normal Python function runs once, top to bottom. If it crashes halfway, you start again from the top. If it made three API calls before the crash, the next attempt makes those three calls again, pays for them again, and may even charge someone twice.
An Inngest function is durable. Each operation you want checkpointed is wrapped in step.run(name, fn, ...). The function still runs top to bottom on every attempt, but steps that have already completed return their stored outputs instead of re-executing. The function "catches up" to where it broke, then continues.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
)
async def handle_email(ctx: inngest.Context) -> dict[str, object]:
    customer_id = ctx.event.data["customer_id"]
    # Step 1: load the customer record (one DB call)
    customer = await ctx.step.run(
        "load-customer", load_customer_by_id, customer_id,
    )
    # Step 2: load the conversation thread (one DB call)
    thread = await ctx.step.run(
        "load-thread", load_thread_for_customer, customer_id,
    )
    # Step 3: run the OpenAI Agents SDK agent (your worker)
    response = await ctx.step.run(
        "run-agent",
        run_customer_support_agent,
        customer=customer,
        thread=thread,
        email_body=ctx.event.data["body"],
    )
    # Step 4: write the draft reply to the database
    await ctx.step.run(
        "save-draft-reply", save_reply,
        customer_id=customer_id, text=response.draft,
    )
    # Step 5: notify the on-call human reviewer via Slack
    await ctx.step.run(
        "notify-reviewer", post_slack_for_review, response=response,
    )
    # The return value mixes a string and a bool, hence dict[str, object]
    return {"status": "drafted", "reviewer_notified": True}
Five steps. Each one is checkpointed independently.
What durability buys you here, in three failure scenarios:
- Scenario A: the agent step throws a timeout. Without step.run around the agent call, the next retry of this function reloads the customer, reloads the thread, and runs the agent again from scratch, paying for OpenAI tokens twice for work the agent had already done. With step.run, the customer and thread loads are memoized (steps 1-2 do not re-execute); only step 3 retries. Inngest's automatic retries handle transient OpenAI errors without your code knowing.
- Scenario B: the function process is killed between step 3 and step 4 (a deploy rolled out, a node restarted, the container went OOM). Without durability, the agent's response is lost and the customer's email stays unanswered until someone notices. With durability, the function resumes after the restart: steps 1, 2, and 3 return their stored outputs in milliseconds, step 4 runs for real, step 5 runs for real, and the customer gets the drafted reply.
- Scenario C: Slack returns a 503 at step 5. Without step.run, you would either lose the work or write retry-and-backoff logic yourself specifically for the Slack call. With step.run, Inngest retries step 5 with exponential backoff until Slack recovers; meanwhile steps 1-4 stay completed and will not re-execute. The draft reply is already in the database; the notification is the only thing pending.
You write no retry loops, no "did I already do this" checks, no state machines. The sequence of step.run calls is the state machine. Each step is a node; each transition is durable.
One rule for step.run. The function passed to step.run should be deterministic given its inputs: calling it twice with the same arguments should produce the same result.
- This is automatic for pure functions.
- This is automatic for idempotent API calls (Stripe's idempotency_key, your own MCP server tools).
- It needs care for things like "generate a random ID" or "call an LLM with the default temperature" (a retry can produce different output from the original attempt, which sometimes matters).
When an operation is not deterministic, you make it deterministic: pass a seed, pre-generate the random value in its own earlier step so that it is memoized (code outside any step re-runs on every attempt), or accept that the retry may differ from the original (often fine for an agent response).
Quick check. True or false. (a) The function body re-executes from the top on every retry, including all imports and variable assignments outside step.run calls. (b) If a step takes 30 seconds to complete and the function crashes at 25 seconds, the retry continues that step from second 25. (c) step.run outputs are stored in Inngest's infrastructure, not in your application.
Answers: (a) True, and that is why you put the work inside step.run. Code outside step.run re-runs on every retry; code inside runs once per attempt and is memoized on success. (b) False: step.run is the atomic unit; if a step is interrupted, the retry runs the whole step again. If your step is too long to be allowed to restart, you break it into smaller steps. (c) True: the step output store is part of Inngest, not your DB. That is why you can replay runs even after changing your database schema.
Try with AI
With my AI coding assistant connected to the Inngest dev server MCP,
shape a customer-support worker into an Inngest durable function.
Take a Runner.run call that processes a customer email and wrap each
of these inside its own step.run:
1. Load the customer record
2. Load the related conversation thread
3. Run the agent (the OpenAI Agents SDK Runner)
4. Persist the draft reply
5. Notify the on-call reviewer
Use grep_docs to find the current Python SDK syntax. Use
invoke_function to test it with a synthetic email payload. Then
deliberately raise an exception in step 4 and use get_run_status
to confirm steps 1-3 don't re-execute on retry.
Concept 7: Memoization, the mechanic underneath resumability
Concept 6 said "steps that have already completed return their stored outputs instead of re-executing." That mechanism is memoization, and it is worth understanding because every other Inngest primitive uses it.
When you call await ctx.step.run("load-customer", load_customer_by_id, "c-4429"), three things happen on the first attempt:
- Inngest checks its memo store: "is there a stored result for step load-customer in this run?" There is not.
- The function load_customer_by_id("c-4429") runs. It returns {"id": "c-4429", "tier": "pro", ...}.
- Inngest writes that result to the memo store, keyed by (run_id, step_name="load-customer"). It then returns the result to your code.
If the function crashes after step 3 and Inngest retries, the function body re-runs from the top on the second attempt. When execution reaches the same line, three different things happen:
- Inngest checks its memo store: "is there a stored result for step load-customer in this run?" Yes, it was stored on attempt 1.
- The function load_customer_by_id("c-4429") does not run. No DB call happens.
- Inngest returns the stored result to your code in milliseconds.
That is why retries are cheap: the expensive work is already cached. That is why durability is correct: the expensive work does not happen twice. And that is why "the function body runs top to bottom again" is fine even though it looks wasteful: the work inside steps does not actually run again; only the orchestration code between steps does.
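The two attempts described above can be traced with a toy memo store. This is a minimal sketch, not Inngest's engine: `ToyRun` is a hypothetical class that keys results by step name within one run, and the handler is synchronous to keep the mechanics visible.

```python
from typing import Any, Callable


class ToyRun:
    """Hypothetical stand-in for one run's memo store, keyed by step name."""

    def __init__(self) -> None:
        self.memo: dict[str, Any] = {}
        self.executions: list[str] = []  # which step bodies actually ran

    def step(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        if name in self.memo:
            return self.memo[name]  # memo hit: the body is skipped
        self.executions.append(name)
        result = fn(*args)  # if this raises, nothing is recorded
        self.memo[name] = result
        return result


def load_customer(customer_id: str) -> dict:
    return {"id": customer_id, "tier": "pro"}


agent_calls = {"count": 0}


def run_agent(customer: dict) -> str:
    agent_calls["count"] += 1
    if agent_calls["count"] == 1:
        raise TimeoutError("simulated transient failure")
    return f"draft for {customer['id']}"


def handler(run: ToyRun) -> str:
    # The body re-runs from the top on every attempt; steps consult the memo.
    customer = run.step("load-customer", load_customer, "c-4429")
    return run.step("run-agent", run_agent, customer)


run = ToyRun()
try:
    handler(run)  # attempt 1: load-customer succeeds, run-agent fails
except TimeoutError:
    pass
result = handler(run)  # attempt 2: load-customer is served from the memo

print(result)
print(run.executions)
```

In the trace, load-customer appears once while run-agent appears twice: the completed step was served from the memo and only the failed step ran again.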
The implication that surprises new users. Code outside step.run runs on every attempt. If you do this:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    # ANTI-PATTERN: this runs on every retry. Don't do this.
    expensive_thing: dict = await fetch_expensive_data(ctx.event.data["id"])
    await ctx.step.run("do-something", do_something_with, expensive_thing)
    return {"status": "done"}
fetch_expensive_data runs on every retry. If it costs $0.10 per call and the function retries 5 times, the retries alone cost $0.50 re-fetching the same data. The fix is to wrap the expensive thing in its own step:
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    expensive_thing: dict = await ctx.step.run(
        "fetch-expensive-data", fetch_expensive_data, ctx.event.data["id"],
    )
    await ctx.step.run("do-something", do_something_with, expensive_thing)
    return {"status": "done"}
Now fetch_expensive_data is memoized; retries do not pay for it again.
The step name is the memo key. That is why step names should be unique within a function. If you have two step.run("load-customer", ...) calls in the same function, Inngest will return the first one's stored output for both calls. That is almost never what you want. If you have a loop that calls a step N times, name them uniquely (step.run(f"load-customer-{i}", ...)) so that each iteration gets its own memo slot.
Predict. Your function has three steps. Step 1 (load-customer) costs $0.01 in DB calls and takes 100 ms. Step 2 (run-agent) costs $0.20 in OpenAI tokens and takes 12 seconds. Step 3 (save-draft) costs $0.005 in DB calls and takes 50 ms. Step 2 fails 30% of the time because of OpenAI rate limits; Inngest retries with backoff. What is the cost difference between (a) wrapping all three in step.run and (b) wrapping only step 2 in step.run? Confidence 1-5.
Answer: with (a), a single retry costs you only step 2 ($0.20). load-customer and save-draft are memoized; they do not re-execute. With (b), each retry costs you steps 1 and 3 plus step 2: $0.215 per retry. Across a thousand emails at a 30% retry rate (300 retries × $0.015 extra each), that is a difference of about $4.50 in pure waste, plus the operational complexity of working out what was partially written when step 3 ran twice. Wrap anything you do not want re-executed in step.run. Once you understand the mechanic, this is not optional.
Try with AI
With my AI coding assistant: review the Inngest function we built
in Concept 6's Try-with-AI and identify any code BETWEEN step.run
calls that should be wrapped in its own step but isn't. Common
candidates:
- Computed values (timestamps, IDs, formatting) that we want to be
stable across retries
- Calls to logging or metrics services
- Reads from Redis, environment variables, secret managers
Then propose a refactor that moves each of these into its own step
with a meaningful name. For each, explain whether the side effect
is one you want to happen once (use step.run) or every retry
(leave it outside).
Concept 8: step.sleep and step.wait_for_event, durability through time
Some work has to wait. A welcome-email pipeline sends one email immediately, waits three days, then sends a follow-up. A refund investigation has to wait for a human to approve. A trial-conversion flow watches for "user upgraded to paid" for 7 days and sends a different email depending on what it sees.
In a normal Python function, "wait three days" means keeping a process open for three days. That is untenable: your process restarts, your host bills you for 72 hours of idle compute, and your timer is lost. In Inngest, "wait three days" is one line:
from datetime import timedelta

@inngest_client.create_function(
    fn_id="trial-welcome-series",
    trigger=inngest.TriggerEvent(event="user/trial.started"),
)
async def welcome_series(ctx: inngest.Context) -> dict[str, str]:
    user_id = ctx.event.data["user_id"]
    await ctx.step.run("send-welcome-email", send_welcome_email, user_id)
    # Wait three days. The function gets paged out of memory. Nothing
    # is consuming compute. Three days later, Inngest pages it back in
    # and resumes execution at the next line.
    await ctx.step.sleep("wait-three-days", timedelta(days=3))
    await ctx.step.run("send-followup", send_followup_email, user_id)
    return {"status": "completed"}
step.sleep is durable: the nervous system at rest. The function suspends; Inngest stores the resume time; no compute is consumed while you wait; the function resumes at the right time with all prior step outputs still memoized. step.sleep (and step.sleep_until) can wait up to a year on paid plans and up to seven days on the free Hobby plan (Inngest usage limits). The seven-day Hobby ceiling is wide enough for every sleep in this course.
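To make "suspend, store the resume time, resume later" concrete, here is a minimal toy model in plain Python. It is a sketch under stated assumptions, not how Inngest is built: ToyRun, Suspend, and the explicit now argument are all hypothetical, and the clock is passed in by hand so nothing actually waits.

```python
from datetime import datetime, timedelta


class Suspend(Exception):
    """Raised by the toy engine to end an invocation until wake_at."""

    def __init__(self, wake_at: datetime) -> None:
        super().__init__(f"suspended until {wake_at.isoformat()}")
        self.wake_at = wake_at


class ToyRun:
    """Persisted state for one run: step outputs plus sleep deadlines."""

    def __init__(self) -> None:
        self.outputs: dict[str, object] = {}
        self.wake_times: dict[str, datetime] = {}
        self.log: list[str] = []

    def run(self, name: str, fn, *args):
        if name not in self.outputs:
            self.outputs[name] = fn(*args)
            self.log.append(name)
        return self.outputs[name]

    def sleep(self, name: str, duration: timedelta, now: datetime) -> None:
        # First visit: record the deadline. Later visits: compare against it.
        wake_at = self.wake_times.setdefault(name, now + duration)
        if now < wake_at:
            raise Suspend(wake_at)


def welcome_series(run: ToyRun, now: datetime) -> str:
    run.run("send-welcome-email", lambda: "welcome sent")
    run.sleep("wait-three-days", timedelta(days=3), now)
    run.run("send-followup", lambda: "followup sent")
    return "completed"


state = ToyRun()
start = datetime(2026, 1, 1, 9, 0)

try:
    status = welcome_series(state, start)  # first invocation stops at the sleep
except Suspend as pause:
    status = "suspended"
    resume_at = pause.wake_at

# Nothing runs in between. A scheduler re-invokes the function at resume_at,
# and the memoized welcome email is not sent a second time.
final = welcome_series(state, resume_at)
```

The point of the sketch is that the only thing kept during the wait is data (step outputs and a deadline), never a live process.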
The more powerful sibling is step.wait_for_event. Instead of waiting for time to pass, wait for another event. The function stays suspended until a matching event arrives or the timeout you set expires. This is what makes Inngest the cleanest expression of HITL (Concept 15) and inter-agent coordination patterns:
@inngest_client.create_function(
    fn_id="refund-with-approval",
    trigger=inngest.TriggerEvent(event="customer/refund.requested"),
)
async def refund_with_approval(ctx: inngest.Context) -> dict[str, str]:
    request = ctx.event.data
    request_id = request["request_id"]
    # If the amount is $100 or more, require approval before issuing
    if request["amount_cents"] >= 10_000:
        # Notify a human via Slack/email/whatever
        await ctx.step.run("notify-approver", notify_human_approver, request)
        # Wait for an approval event. Up to 24 hours; expires otherwise.
        approval = await ctx.step.wait_for_event(
            "wait-for-approval",
            event="refund/approval.decided",
            timeout=timedelta(hours=24),
            if_exp=f"async.data.request_id == '{request_id}'",
        )
        # None means the wait timed out; approved=False means a human said no.
        if approval is None or not approval.data.get("approved"):
            return {"status": "rejected_or_timeout"}
    # Either it was under $100, or it was approved
    refund = await ctx.step.run(
        "issue-stripe-refund", call_stripe_refund_api, request,
    )
    return {"status": "issued", "refund_id": refund["id"]}
What is happening:
- The function reaches wait_for_event. It suspends. Zero compute consumed.
- A human sees the Slack notification and clicks "Approve" in your admin UI; your UI calls inngest_client.send(events=[inngest.Event(name="refund/approval.decided", data={"request_id": "...", "approved": True})]).
- Inngest matches the event to the waiting function (if_exp ensures that only events for this request_id match) and resumes the function with the event as the approval return value.
- The function continues to the refund step. The Stripe refund happens after the human approves.
step.sleep and step.wait_for_event are timeouts you do not pay for. The function looks synchronous in your code ("wait three days, then send the email"), but the runtime semantics are async and durable. This is one of the two things Inngest is known for (the other is durable retries). Without it, the alternative is a queue plus a state machine plus a database plus a poller, and you would write a thousand lines instead of three.
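The matching rule in the steps above (same event name, same request_id, before the deadline, otherwise None) can be sketched as a small pure-Python model. ToyEvent, ToyWaiter, and decide are hypothetical helpers for illustration; they mimic the semantics described in the text and are not part of the Inngest SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ToyEvent:
    name: str
    data: dict
    ts: datetime


@dataclass
class ToyWaiter:
    """A suspended run waiting for one event with a matching request_id."""

    event_name: str
    request_id: str
    deadline: datetime

    def resolve(self, events: list[ToyEvent]) -> Optional[ToyEvent]:
        # Return the earliest matching event that arrived by the deadline,
        # or None to model the timeout path.
        for event in sorted(events, key=lambda e: e.ts):
            if (
                event.name == self.event_name
                and event.data.get("request_id") == self.request_id
                and event.ts <= self.deadline
            ):
                return event
        return None


def decide(approval: Optional[ToyEvent]) -> str:
    # Same branch as the refund function: None or approved=False stops the refund.
    if approval is None or not approval.data.get("approved"):
        return "rejected_or_timeout"
    return "issued"


started = datetime(2026, 1, 1, 9, 0)
waiter = ToyWaiter(
    event_name="refund/approval.decided",
    request_id="req-1",
    deadline=started + timedelta(hours=24),
)
name = "refund/approval.decided"
other_request = ToyEvent(name, {"request_id": "req-2", "approved": True}, started + timedelta(hours=1))
on_time = ToyEvent(name, {"request_id": "req-1", "approved": True}, started + timedelta(hours=2))
too_late = ToyEvent(name, {"request_id": "req-1", "approved": True}, started + timedelta(hours=30))

approved = decide(waiter.resolve([other_request, on_time]))
timed_out = decide(waiter.resolve([other_request, too_late]))
```

An approval for a different request never wakes this run, and an approval that arrives after the deadline is treated exactly like no approval at all.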
Quick check. Three claims. Mark each True or False. (a) If step.sleep is set for 30 days and your service is redeployed five times during those 30 days, the sleep continues uninterrupted on a paid plan. (b) If step.wait_for_event times out, the function throws an exception. (c) Two step.wait_for_event calls in the same function can wait for the same event simultaneously.
Answers: (a) True on a paid plan: sleeps are stored in Inngest's infrastructure, not in your service's memory, so redeploys do not lose them. Note the tier ceiling: a 30-day sleep is fine on a paid plan but exceeds the free Hobby plan's seven-day sleep cap. (b) False: on timeout, wait_for_event returns None. Your code checks for that and decides what to do (rejection, escalation, default approval, whatever the policy is). (c) True, but suspicious: both fire when a matching event arrives. If the two wait_for_event calls have different if_exp filters, that is fine. If they are identical, you are probably looking at a refactor opportunity.
Try with AI
Build a delayed-investigation flow with my AI coding assistant.
Specification:
1. Triggered by event 'customer/refund.failed'.
2. Immediately notify the on-call human via Slack with the refund
details and an "Investigate" button.
3. Wait for the human to click the button (which fires
'customer/refund.investigation_started') for up to 4 hours.
4. If the click arrives in time: run the agent to draft an
investigation summary.
5. If 4 hours pass without a click: escalate to a senior reviewer
by firing 'customer/refund.escalated'.
Use the dev-server MCP's send_event tool to simulate the
human-click event during testing. Use get_run_status to inspect
how the suspended function shows up in the dashboard. Before
writing, use list_docs to scan the Inngest documentation tree
for the right page on wait_for_event semantics, then
read_doc on the page you find to get the exact syntax for
the if_exp filter expression.
Concept 9: Retries, error handling, dead-letter
This is the reflex in close-up. By default, Inngest retries failed steps. The defaults are sensible: ~4 retries with exponential backoff, from a few seconds to a few minutes between attempts. After the final retry fails, the run enters a failed state and stays there for inspection and (optionally) replay. You can tune this per function: retries=10, retries=0 (never retry), or specific exception types that should not be retried.
@inngest_client.create_function(
    fn_id="charge-customer",
    trigger=inngest.TriggerEvent(event="order/checkout.completed"),
    retries=2,  # only retry twice; this involves Stripe; don't keep hammering
)
async def charge_customer(ctx: inngest.Context) -> dict[str, str]:
    try:
        charge = await ctx.step.run(
            "call-stripe", call_stripe_charge, ctx.event.data,
        )
        return {"status": "charged", "charge_id": charge["id"]}
    except StripeCardDeclinedError as e:
        # StripeCardDeclinedError is an application-defined exception.
        # A declined card is not a transient failure. Don't retry.
        # Mark the order as failed in our database and emit an event
        # for the dunning flow.
        await ctx.step.run(
            "mark-failed", mark_order_failed,
            ctx.event.data["order_id"], str(e),  # reason, passed positionally
        )
        await ctx.step.run(
            "emit-dunning-event", emit_dunning, ctx.event.data["order_id"],
        )
        return {"status": "card_declined"}
Three patterns matter.
Pattern 1: Transient vs permanent failures. By default Inngest retries everything, but some errors are not transient. A card-declined error from Stripe will decline again on retry. A 401-unauthorized from your downstream API will not turn into a 200 just because you waited. Your function should catch and handle these specifically (write to your DB, emit a downstream event, return cleanly) so it does not waste retry budget on hopeless attempts. Inngest's NonRetriableError explicitly tells Inngest to skip retries for a thrown exception.
Pattern 2: Step-level vs function-level errors. A step that throws is retried. Once step-level retries are exhausted, the function fails. Sometimes you want a function to survive a failing step: log the failure, mark the work "partial", continue. Wrap step.run in try/except. The step still gets its retries; if all of them fail, the exception propagates to your except block, where you decide what to do.
Pattern 3: Dead-letter and replay. When a function fails completely, it does not disappear. It enters the "failed runs" view of the Inngest dashboard with the full trace, all step outputs, the exception, and a Replay button. After shipping a bug fix, you can replay the failed runs: each one runs again from the top on the fixed code (a fresh run, not a memo-preserving resume; that distinction is Concept 14). This is the "dead-letter queue" pattern from traditional queues, except that you do not write a dead-letter handler. You just fix the bug and replay, keeping side-effecting steps idempotent so a re-run does not act twice.
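The difference between a retriable and a non-retriable failure can be sketched without the SDK. The loop below is a minimal model: NonRetriable is a hypothetical stand-in for the idea behind NonRetriableError (it is not the SDK class), and the doubling backoff schedule is an assumption chosen for readability, not Inngest's actual timing.

```python
class NonRetriable(Exception):
    """Toy marker for permanent failures (hypothetical, not the SDK class)."""


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list[float]:
    # Assumed schedule: 1s, 2s, 4s, 8s for four retries.
    return [base * factor**attempt for attempt in range(retries)]


def run_step_with_retries(fn, retries: int = 4):
    """Call fn once, then up to `retries` more times on retriable errors."""
    delays = backoff_delays(retries)
    attempts = 0
    waited: list[float] = []
    while True:
        attempts += 1
        try:
            return fn(), attempts, waited
        except NonRetriable:
            raise  # permanent failure: skip the remaining retries
        except Exception:
            if attempts > retries:
                raise  # budget exhausted: the step fails
            waited.append(delays[attempts - 1])  # a real engine would wait here


calls = {"flaky": 0, "declined": 0}


def flaky() -> str:
    # Transient failure: times out twice, then succeeds.
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TimeoutError("transient")
    return "ok"


def declined() -> str:
    # Permanent failure: retrying cannot change the answer.
    calls["declined"] += 1
    raise NonRetriable("card declined")


result, attempts, waited = run_step_with_retries(flaky)

try:
    run_step_with_retries(declined)
    outcome = "charged"
except NonRetriable:
    outcome = "card_declined"
```

The flaky call consumes two retries and succeeds on the third attempt; the declined call is attempted exactly once, which is the retry budget Pattern 1 is trying to protect.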
Predict. Your function calls Stripe in step 2 and your customer data service in step 4. On the first attempt of step 2, Stripe returns a 503 (service unavailable, transient). Step 2 retries 4 times with exponential backoff (~1s, ~2s, ~5s, ~12s); on the 4th retry Stripe is back and the charge succeeds. Now step 4 runs, and the data service is down with a 500. Does Inngest retry the whole function, or only step 4? How many times? Confidence 1-5.
Answer: only step 4, and it gets its own retry budget. Steps do not share retries. Step 2's four retries are independent of step 4's. Inngest will retry step 4 (~4 times by default), and if the data service comes back, step 4 completes and the function succeeds. Step 2's Stripe charge is not issued again, because step 2's output was memoized after its successful retry. The customer is charged exactly once even though the function spent 20 seconds on retries.
Try with AI
With my AI coding assistant: extend the customer-support Worker
function from Concept 6 with explicit retry and failure handling.
Specification:
1. The OpenAI Agents SDK call should retry 3 times on transient
failures (rate limit, timeout), but NOT retry on a content-policy
refusal from the model.
2. The Slack notification should retry up to 10 times (Slack is
often flaky; don't lose the notification).
3. The Postgres write should retry once; if it fails again, log the
failure and continue (don't fail the whole function over a
transient DB blip).
For each step, decide what's transient vs permanent and structure
the try/except accordingly. Use grep_docs to find the Python SDK's
NonRetriableError equivalent.
Concept 10: step.run for AI calls in Python (step.ai.wrap is TypeScript-only)
Concepts 6-9 work for any side-effecting code: DB writes, API calls, file writes, agent invocations. Inngest also ships AI-specific step primitives that handle the patterns LLM calls are prone to: rate-limit retries, observability into prompts and responses, and (optionally) inference proxying that reduces serverless compute costs.
First, an important Python-vs-TypeScript note. Inngest's step.ai module has two methods, and their language support differs. step.ai.infer() is available in both TypeScript and Python (Python SDK v0.5+): it offloads inference to Inngest's infrastructure and traces the call. step.ai.wrap() is TypeScript-only: there is no Python equivalent today. For Python projects (like this course's Worker), the correct pattern for wrapping an OpenAI Agents SDK call is ctx.step.run(...), which already gives you full durability, retries, and observability of the wrapped step's inputs and outputs. You just do not get the LLM-specific prompt/response telemetry that TypeScript's step.ai.wrap adds. (Verified against the AI Inference docs as of May 2026.)
step.run for OpenAI calls in Python (recommended pattern). Your function makes the OpenAI call inside ctx.step.run("name", fn, ...). Inngest traces the step's inputs and outputs (the arguments you passed and what was returned), retries on transient failures, and memoizes the result so that retries of later steps do not pay the OpenAI cost again. The prompt and response are recorded in the dashboard as the step's input/output:
from openai import AsyncOpenAI

oai = AsyncOpenAI()

async def call_openai_summary(thread_text: str) -> str:
    """A normal async function. Inngest doesn't care that this is an LLM call."""
    response = await oai.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Summarize this support thread in 3 sentences."},
            {"role": "user", "content": thread_text},
        ],
    )
    return response.choices[0].message.content

@inngest_client.create_function(
    fn_id="summarize-customer-thread",
    trigger=inngest.TriggerEvent(event="customer/thread.summary_requested"),
)
async def summarize_thread(ctx: inngest.Context) -> dict[str, str]:
    thread: list = await ctx.step.run(
        "load-thread", load_thread, ctx.event.data["thread_id"],
    )
    # The OpenAI call is wrapped in step.run. Inngest sees this as a step:
    # the inputs (formatted thread text) are recorded, the output (summary
    # string) is recorded, the call is memoized on success, and retries are
    # automatic on transient failures.
    summary: str = await ctx.step.run(
        "openai-summary", call_openai_summary, format_thread(thread),
    )
    return {"summary": summary}
In the dashboard, this run shows the function's step trace (load-thread followed by openai-summary) with each step's inputs and outputs. If OpenAI returns a 429 (rate limited), Inngest automatically retries openai-summary with backoff: the same memoization semantics as Concept 7, so retries do not double-bill the earlier load-thread step. What you do not get (compared with TypeScript's step.ai.wrap): automatic LLM-specific telemetry such as token counts, model name, and provider-specific traces broken out separately in the dashboard's AI view. For most Python production workloads, the standard step trace plus your own OpenAI client telemetry (for example, the OpenAI Agents SDK's tracing) covers this gap.
Because step.run records every step's inputs and outputs in Inngest's observability store, the content you pass through a step is stored and visible in the dashboard. If your prompt contains PII (names, emails, addresses), secrets (API keys, internal tokens), contractual or financial data, or regulated content (HIPAA, GDPR-scoped data, PCI), do not pass the raw content into the step. Redact, hash, summarize, or pass a reference (a customer_id and a ticket_id, not the full ticket text) and reload the sensitive content inside the step body from your own authoritative store, where retention and access controls are yours to configure. The same discipline applies to the OpenAI Agents SDK's own tracing if you enable it. Treat step traces the way you treat any production log: useful by default, regulated by policy.
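A minimal sketch of the "redact, hash, or pass a reference" advice, using only the standard library. The helper names (redact_emails, stable_ref, step_input_for), the ticket shape, and the salt are all hypothetical; the email pattern is deliberately simple and would not catch every form of PII.

```python
import hashlib
import re

# Deliberately simple pattern for illustration; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    return EMAIL_RE.sub("[email]", text)


def stable_ref(value: str, salt: str) -> str:
    """Short salted hash so traces can correlate records without the raw value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def step_input_for(ticket: dict, salt: str) -> dict:
    # Pass identifiers and a redacted preview. The step body is expected to
    # reload the full text from the authoritative store using ticket_id.
    return {
        "ticket_id": ticket["ticket_id"],
        "customer_ref": stable_ref(ticket["customer_email"], salt),
        "preview": redact_emails(ticket["text"])[:80],
    }


ticket = {
    "ticket_id": "T-1",
    "customer_email": "jane.doe@example.com",
    "text": "Please reply to jane.doe@example.com about my refund.",
}
payload = step_input_for(ticket, salt="demo-salt")
```

Only payload would be handed to the step, so the recorded step input carries a ticket id, an opaque customer reference, and a preview with the address masked.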
step.ai.infer: a niche tool for serverless cost reduction (Python-supported). You will rarely reach for it; step.run is the default for every AI call in this course. step.ai.infer exists for one specific situation: instead of calling OpenAI from your function process, you have Inngest's infrastructure make the call, so your function process can deallocate while the request is in flight. On serverless platforms (Vercel, Cloudflare Workers, AWS Lambda) that bill for in-flight time, this saves compute cost during the wait. For long-running inferences (Deep Research, large embedding batches) the savings are real. For sub-second calls, it adds latency without much benefit.
If you ever do reach for it, pull the exact signature for your installed version from the AI Inference docs: it lives in the experimental inngest.experimental.ai namespace and was not exercised in this course's build.
Quick check. True or false. (a) In Python, ctx.step.run("name", call_openai, ...) makes the OpenAI call durable, retried on transient failures, and memoized on success. (b) step.ai.infer is a hard requirement for using Inngest with the OpenAI Agents SDK in Python. (c) Replacing step.run with step.ai.infer for a single OpenAI call will always make the function cheaper to run.
Answers: (a) True: this is the recommended Python pattern. The OpenAI call goes inside the step body; Inngest treats the whole step as the unit of work. (b) False: step.run is enough for most cases. step.ai.infer is an optimization for serverless compute cost, not a requirement. The OpenAI Agents SDK integration in the worked example uses plain step.run. (c) False: step.ai.infer only saves money when (i) you are on a serverless platform that bills for in-flight time AND (ii) the call is long enough that the request-offload savings outweigh the added orchestration overhead. For sub-second calls on always-on servers, plain step.run wins.
Try with AI
With my AI coding assistant: take a customer-support agent
invocation and produce TWO versions of the Inngest function that
calls it:
Version A: Wrap the Runner.run call in step.run (the recommended
Python pattern: durable, retried on transient failures, memoized;
you get the standard step trace).
Version B: For comparison, write a SEPARATE small Inngest function
that calls a single OpenAI completion via step.ai.infer (the
Python-supported step.ai primitive that offloads inference to
Inngest's infrastructure to save serverless compute cost).
For each version, explain (a) what the dashboard trace shows for a
successful run, (b) what happens when the OpenAI call hits a 429
rate limit, and (c) on which kind of deployment (always-on server
vs serverless) Version B's offload saves real money.
Part 3: Balance and recovery, production scale
Balance is the third layer: it keeps the worker healthy under load, the way your body keeps itself steady when you push it hard. Concurrency stops the worker from melting downstream systems. Throttling keeps you away from rate-limit walls. Priority and fairness stop one chatty customer from starving everyone else. Batching turns "10,000 events at midnight" into "100 manageable function runs". Replay turns "yesterday's bug cost us 200 failed interactions" into "we fixed it; the 200 conversations have been replayed". Human-approval gates suspend the agent until a human approves. The five concepts in Part 3 give you the production policies that turn a working worker into one you can put in front of paying customers.
Concept 11: Concurrency and throttling
Concurrency is the maximum number of runs of a function that can execute at the same time. Throttling is the maximum number of runs that can start per unit of time. Each is configured with one line per function. Both are the most common production gap when teams move from prototype to scale.
from datetime import timedelta

@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[inngest.Concurrency(limit=10)],
    throttle=inngest.Throttle(limit=100, period=timedelta(minutes=1)),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
concurrency=10 says: at any moment, at most 10 runs of this function are executing. The 11th event waits in the queue until one of the 10 finishes. throttle=100/minute says: at most 100 new runs start per minute. The 101st event waits even if there is concurrency headroom.
Why both matter in practice. Concurrency protects downstream systems: if your customer-support Worker talks to OpenAI and Postgres, 1,000 concurrent runs mean 1,000 simultaneous OpenAI calls and 1,000 simultaneous Postgres connections. You will exhaust your OpenAI rate limit, your connection pool, or both. Throttle protects against bursts: if 500 customer emails arrive at 9:00am sharp, you do not want 500 functions starting in the same second; throttle smooths the start rate.
Per-key concurrency. A single concurrency limit applies to the function globally. A more interesting pattern is per-key concurrency: limit by a property of the event.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[
        inngest.Concurrency(limit=10),  # global cap
        inngest.Concurrency(limit=2, key="event.data.customer_id"),  # per-customer cap
    ],
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
This says: at most 10 functions running globally, AND at most 2 at a time per customer. If a single customer sends 100 emails in a minute, only 2 of them are processed simultaneously; the other 98 queue behind them. Meanwhile other customers' emails run normally; they are not blocked by the chatty customer. This is multi-tenant fairness in two lines of code. Concept 12 takes this pattern further.
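To see the two caps interact, the following asyncio sketch models a global limit and a per-customer limit with semaphores and records the highest number of runs in flight. It is a local simulation under stated assumptions (the simulate helper, the 10 ms of fake work, and the event mix are all hypothetical), not a description of how Inngest schedules runs.

```python
import asyncio
from collections import defaultdict


async def simulate(events: list[str], global_limit: int, per_key_limit: int) -> dict:
    global_slots = asyncio.Semaphore(global_limit)
    key_slots: dict[str, asyncio.Semaphore] = defaultdict(
        lambda: asyncio.Semaphore(per_key_limit)
    )
    running_total = 0
    running_by_key: dict[str, int] = defaultdict(int)
    peak = {"total": 0, "per_key": 0}

    async def handle(customer_id: str) -> None:
        nonlocal running_total
        # Take the per-customer slot first so a noisy customer's backlog
        # does not hold global slots while it waits.
        async with key_slots[customer_id], global_slots:
            running_total += 1
            running_by_key[customer_id] += 1
            peak["total"] = max(peak["total"], running_total)
            peak["per_key"] = max(peak["per_key"], running_by_key[customer_id])
            await asyncio.sleep(0.01)  # stand-in for the real work
            running_total -= 1
            running_by_key[customer_id] -= 1

    await asyncio.gather(*(handle(customer_id) for customer_id in events))
    return peak


# One noisy customer sends 100 events; 20 other customers send one each.
events = ["noisy"] * 100 + [f"cust-{i}" for i in range(20)]
peak = asyncio.run(simulate(events, global_limit=10, per_key_limit=2))
```

Every event is eventually processed; the limits only shape when. The recorded peaks stay within the global cap of 10 and the per-customer cap of 2, even though the noisy customer submitted 100 events up front.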
Quick check. Three claims, True or False. (a) If you set concurrency=10 and 1,000 events arrive at once, 990 of them are dropped. (b) Throttling and concurrency limits both reduce total throughput. (c) Per-key concurrency needs a key that is deterministic from the event data.
Answers: (a) False: events are not dropped; they are queued. Inngest's queue is durable; the 990 events wait until concurrency slots free up. (b) False. Throttling caps the start rate; concurrency caps in-flight runs. Neither drops work; both shape when work executes. Over a long window, throughput is unchanged as long as your average load stays below the limits. At a peak, throughput is shaped: bursts are absorbed by the queue. (c) True: the key expression is evaluated against the event data; it must produce a stable string for the same logical scope (customer_id is fine; current_timestamp is not).
Try with AI
With my AI coding assistant: design the concurrency and throttling
policy for the customer-support Worker. Constraints:
- OpenAI rate limit: 30 requests per minute, hard cap.
- Postgres connection pool: 20 max connections (the Worker takes 1 per run).
- Some customers send bursts of 30+ emails in a minute (an angry
customer); these shouldn't starve other customers.
- We expect ~1,000 emails per day, with peaks around 9am and 2pm.
Propose:
1. A global concurrency value
2. A per-customer concurrency value
3. A throttle (limit and period)
For each, explain what production failure it protects against and
what the cost is (in queue latency at peak).
Concept 12: Priority and fairness, multi-tenant scaling
Concurrency limits work. Per-key concurrency adds basic fairness. Production-grade multi-tenant systems need more: priorities (Enterprise customers should not wait behind hobbyists for the same compute) and fair-share scheduling (no single tenant should be able to monopolize the system even within its own concurrency cap).
Priority. Inngest evaluates a priority expression on every event; runs with higher priority jump ahead of lower-priority runs in the queue.
@inngest_client.create_function(
    fn_id="customer-support-conversation",
    trigger=inngest.TriggerEvent(event="customer/email.received"),
    concurrency=[inngest.Concurrency(limit=10)],
    priority=inngest.Priority(
        # Enterprise tier = high priority; Pro = 0; Free = low priority.
        # This only works out if customer_tier_priority is encoded as
        # 0 (Enterprise), 1 (Pro), 2 (Free), giving 100, 0, and -100.
        # That encoding is an assumption; it is not stated elsewhere.
        run="100 - (event.data.customer_tier_priority * 100)",
    ),
)
async def handle_email(ctx: inngest.Context) -> dict[str, str]:
    ...
When 50 runs are waiting in the concurrency queue, Enterprise customers' runs go first, then Pro, then Free. Within the same tier, FIFO order applies. Priority does not override concurrency or throttle limits; it only decides which of the waiting runs gets the next free slot. An Enterprise customer still waits for a slot to open; they just get the next one.
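The "highest priority first, FIFO within a tier" rule can be sketched with a heap. This is a toy ordering model, not Inngest's scheduler: drain_order is a hypothetical helper, and the priority values 100, 0, and -100 assume a tier encoding of 0, 1, 2 for Enterprise, Pro, and Free.

```python
import heapq
from itertools import count


def drain_order(waiting: list[tuple[str, int]]) -> list[str]:
    """Return run ids in the order free slots would be handed out.

    `waiting` holds (run_id, priority) pairs in arrival order. Higher priority
    goes first; equal priority keeps arrival (FIFO) order.
    """
    arrival = count()
    # heapq is a min-heap, so negate the priority; the arrival counter breaks ties.
    heap = [(-priority, next(arrival), run_id) for run_id, priority in waiting]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, run_id = heapq.heappop(heap)
        order.append(run_id)
    return order


waiting = [
    ("free-1", -100),
    ("pro-1", 0),
    ("ent-1", 100),
    ("free-2", -100),
    ("ent-2", 100),
]
order = drain_order(waiting)
```

Here order comes out as ent-1, ent-2, pro-1, free-1, free-2: the Enterprise runs jump the queue, but nothing in the sketch creates extra slots, which mirrors the point that priority never overrides the concurrency limit.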
Fair-share scheduling. When hundreds of tenants compete for the same global concurrency pool, FIFO plus priority is not enough. A single tenant can still send a burst and occupy most of the slots for minutes. Fair-share scheduling, implemented through the key parameter on concurrency with thoughtful sizing, gives every tenant a guaranteed slice:
concurrency=[
    inngest.Concurrency(limit=50),  # global pool
    inngest.Concurrency(limit=3, key="event.data.tenant_id"),  # max 3 per tenant
],
With this: 50 total slots, and no tenant takes more than 3. If 20 tenants are active, that is up to 60 slots requested with only 50 available. Fair-share rotates them: every tenant gets some share and nobody is shut out.
Predict. You have a customer-support function with concurrency=10 and per-customer concurrency=2. You also have priority configured: Enterprise = high, Free = low. At 9:00am the queue holds: 5 events from Customer A (Free), 5 events from Customer B (Enterprise), and 10 events from a single new Customer C (Free, who has just bought their first plan). In what order do they execute? Confidence 1-5.
Answer: this is a multi-level decision. First, the per-customer cap of 2 means at most 2 events per customer are eligible to run at a time. So the pool of candidates is 2 from A, 2 from B, 2 from C: six runs eligible right away, all of which fit under the global cap of 10. Second, priority decides which of those six fill slots first: B's two run first (Enterprise), then A's two and C's two (Free, FIFO). So at t=0: B's 2 run, then A's 2 start, then C's 2 start. Total: 6 active. As each one finishes, that customer's next queued event becomes eligible and the next slot is filled by priority. This is the kind of policy that is a feature in Inngest and a thousand-line scheduler in your own code.
Try with AI
With my AI coding assistant: extend the customer-support Worker
configuration with a priority and fair-share scheme. Requirements:
1. Three customer tiers: Enterprise, Pro, Free.
2. Enterprise customers should never wait more than 5 seconds at
peak load.
3. Free tier customers should get fair access: no Free customer
should be starved for more than 60 seconds, even when the
global queue is full.
4. A single noisy customer (regardless of tier) should not occupy
more than 3 slots.
Write the concurrency + priority configuration. For each line of
config, explain which requirement it satisfies.
Concept 13: Batching, cost-effective bulk processing
Some work is naturally batched. You do not summarize each of 10,000 customer conversations independently; you call the LLM with a batch of 50 at a time. You do not write 10,000 audit rows one by one; you COPY them. Inngest's batch trigger lets you accumulate events and invoke a single function with the batch as input.
@inngest_client.create_function(
    fn_id="batch-embed-tickets",
    trigger=inngest.TriggerEvent(event="ticket/resolved"),
    batch_events=inngest.Batch(
        max_size=50,                    # invoke when 50 events accumulated, OR
        timeout=timedelta(seconds=30),  # invoke when 30 seconds pass, whichever first
    ),
)
async def batch_embed_resolved_tickets(ctx: inngest.Context) -> dict[str, int]:
    # ctx.events (plural) instead of ctx.event
    ticket_ids = [e.data["ticket_id"] for e in ctx.events]
    tickets = await ctx.step.run(
        "load-tickets", load_tickets_by_ids, ticket_ids,
    )
    # One embedding call for 50 tickets, not 50 calls for 1 ticket each
    embeddings = await ctx.step.run(
        "embed-batch", embed_texts_batch,
        [t["text"] for t in tickets],
    )
    await ctx.step.run(
        "store-embeddings", store_embeddings_batch,
        ticket_ids, embeddings,
    )
    return {"batched": len(ctx.events)}
What changes: ctx.events is a list, not a single event. The function runs once per batch instead of once per event. The OpenAI embedding API is called with one 50-text batch instead of 50 single-text calls, which is cheaper to operate (you still pay per token, but the per-request overhead disappears) and faster (one API round-trip instead of 50).
Batching is the right tool when the work is naturally bulkable (embeddings, bulk DB writes, bulk emails) and you can tolerate latency up to your timeout before the work happens. It is the wrong tool when every event needs an interactive response or when ordering across events matters in unpredictable ways.
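The "max_size or timeout, whichever comes first" rule can be sketched offline by grouping timestamped events. This is a simplified model under stated assumptions: group_into_batches is a hypothetical helper, it measures the timeout from the first event of each batch, and it works on a pre-sorted list rather than a live stream.

```python
from datetime import datetime, timedelta


def group_into_batches(
    arrivals: list[tuple[datetime, str]],
    max_size: int,
    timeout: timedelta,
) -> list[list[str]]:
    """Group (timestamp, event_id) pairs, sorted by time, into batches.

    A batch closes when it reaches max_size, or when the next event arrives
    at or after the batch's first event plus the timeout.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    opened_at = None
    for ts, event_id in arrivals:
        if current and ts - opened_at >= timeout:
            batches.append(current)  # the timeout fired before this event arrived
            current = []
        if not current:
            opened_at = ts
        current.append(event_id)
        if len(current) == max_size:
            batches.append(current)  # size limit reached
            current = []
    if current:
        batches.append(current)  # the timeout eventually flushes the tail
    return batches


t0 = datetime(2026, 1, 1, 0, 0, 0)
offsets = [0, 1, 2, 3, 40, 41]  # seconds after t0
arrivals = [(t0 + timedelta(seconds=s), f"e{i}") for i, s in enumerate(offsets)]
batches = group_into_batches(arrivals, max_size=3, timeout=timedelta(seconds=30))
```

With these inputs the first batch closes on size (e0, e1, e2), the lone e3 is flushed by the timeout, and e4 and e5 form the final batch, which shows that a partially filled batch still runs.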
Quick check. True or false. (a) Batched functions still get retries and memoization; the batch is durably memoized as a unit. (b) If the batch timeout expires with only 3 events accumulated, the function will not run until the next 47 arrive. (c) You can combine batch_events with concurrency to cap how many batches run in parallel.
Answers: (a) True: the batch is the unit of work; retries replay the whole batch with all its events still in scope. (b) False: that is the whole point of the timeout. After 30 seconds the function runs with whatever has accumulated, even if that is 1 event. (c) True: this is the production pattern. Batch plus concurrency together cap your downstream load well.
Try with AI
With my AI coding assistant: write a batched Inngest function that
embeds resolved support tickets, converting a per-ticket event
handler into one batched call.
Triggers: 'ticket/resolved' event, batched at 50 events or 30 seconds.
The function should:
1. Load the ticket bodies in one query
2. Call OpenAI embeddings API with a 50-text batch (faster + cheaper)
3. Store the embeddings
4. Emit a 'ticket/embedded' event per ticket for downstream consumers
Use grep_docs to find the OpenAI batch-embedding pattern.
Concept 14: Replay and bulk cancellation, production recovery
Sometimes everything goes wrong at once. You shipped a bug; a thousand runs failed over the past six hours. Or your downstream API was down for 30 minutes; every run that tried to call it during that window died. Or you discovered a logic error and, after fixing it, want to redo a day's worth of work.
First, the distinction that trips everyone up. Inngest gives you two ways to run a failed step again, and they behave differently:
- Automatic retry (inside the same run). When a step throws, Inngest retries the function with backoff, re-entering it from the top. Completed steps return from the memo and do not re-execute; only the failing step runs again. This is the memo-preserving resume, the same one you saw in the Quick Win, and the one that makes the "the $0.20 spent on step 3 is not spent again" property true. It is automatic and happens inside the original run.
- Replay / Rerun (dashboard button, across many runs). This starts a brand-new run from the top on your currently deployed code, re-executing every step from scratch (a rerun gets a new run ID and runs the first step again; it does not resume the old one). So in practice the old run's memo does not save you here. This is for incident recovery, not for skipping completed work.
Keeping these apart is the whole concept. The memo payoff lives in automatic retry; Replay is a fresh start.
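The difference can be made concrete with a toy model. The sketch below is not Inngest code: `run_function`, the step names, and the counters are hypothetical, and the "memo" is just a dict. It shows a retry reusing the same memo while a replay starts with an empty one.

```python
from typing import Callable, Dict, List, Tuple

calls = {"run-agent": 0, "send-reply": 0}


def run_agent() -> str:
    calls["run-agent"] += 1
    return "draft reply"


def send_reply() -> str:
    calls["send-reply"] += 1
    if calls["send-reply"] == 1:
        raise RuntimeError("transient failure on the first attempt")
    return "sent"


def run_function(steps: List[Tuple[str, Callable[[], str]]], memo: Dict[str, str]) -> Dict[str, str]:
    """Enter from the top; skip any step whose result is already memoized."""
    for step_id, handler in steps:
        if step_id in memo:
            continue  # memo hit: the completed step does not re-execute
        memo[step_id] = handler()
    return memo


steps = [("run-agent", run_agent), ("send-reply", send_reply)]

# Automatic retry: the SAME run, so the same memo is carried into the retry.
memo: Dict[str, str] = {}
try:
    run_function(steps, memo)
except RuntimeError:
    run_function(steps, memo)
after_retry = dict(calls)

# Replay: a NEW run with an empty memo, so every step executes again.
run_function(steps, {})
after_replay = dict(calls)
```

After the retry the agent step has run once; after the replay it has run twice, which is why the cost of a replay is the cost of the whole function.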
Two opposite recovery primitives. Replay says "this work failed, and I want it to run again on fixed code." Bulk cancellation says "this work was queued, but I no longer want it to happen." Same dashboard surface, opposite intent. Most teams need both within their first three months of running real traffic.
Replay is the recovery primitive. Failed runs persist with their full step history, input event, and the failed step's exception. From the dashboard you open the Functions view, filter to a function that has failed runs, select a time window and a failure pattern (a specific error message or just "all failures"), and click Replay. Inngest schedules a fresh run from the top for each one, on whatever code is deployed now.
Three things to understand about Replay.
- Replay uses your currently deployed code. If you deployed a fix between the runs failing and replaying them, the replayed runs use the new code. That is the whole point: take a population of runs that died on a bug, ship the fix, and re-run them hands-off.
- Replay re-executes every step; it does not reuse the old run's memo. A replayed run is a new run, so every step runs again from scratch on the fixed code. For cost, budget the whole function per replayed run, not just the failed step. What stops a replay from issuing a second real-world side effect (a duplicate refund, a duplicate email) is not the memo; it is an idempotency key on that side effect (Concept 4): you derive a stable key from the request (for a refund, something like (order_id, request_id)) and the provider treats a repeat as a no-op. This course's minimal worker omits that key for brevity: its refund matches on the customer and writes unconditionally, so a production version would add one before moving any real money. The memo protects within a run; the idempotency key protects across reruns.
- Replay is opt-in. Failed runs sit in the dashboard until you act on them. They are not retried forever; they do not disappear. They wait for you.
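A minimal sketch of the idempotency-key idea, assuming a provider that deduplicates on a caller-supplied key. `FakeRefundProvider` is a stand-in invented for this illustration, not a real payments API, and the key format is one reasonable choice among many.

```python
import hashlib
from typing import Dict


def refund_idempotency_key(order_id: str, request_id: str) -> str:
    """Derive a stable key from the request, so a replayed run produces the same key."""
    raw = f"refund:{order_id}:{request_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class FakeRefundProvider:
    """Hypothetical provider: a repeated key returns the first result instead of refunding again."""

    def __init__(self) -> None:
        self._by_key: Dict[str, dict] = {}

    def refund(self, key: str, amount_cents: int) -> dict:
        if key in self._by_key:
            return self._by_key[key]  # repeat: treated as a no-op
        result = {"id": f"re_{len(self._by_key) + 1}", "amount_cents": amount_cents}
        self._by_key[key] = result
        return result

    @property
    def refunds_issued(self) -> int:
        return len(self._by_key)


provider = FakeRefundProvider()
key = refund_idempotency_key("order_42", "req_7")

original_run = provider.refund(key, 2500)   # the run that later failed downstream
replayed_run = provider.refund(key, 2500)   # the dashboard replay, same request
```

Because the key depends only on the request, not on the run ID, the replay maps to the same refund.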
Bulk cancellation is the inverse. Sometimes you have thousands of queued or sleeping runs you no longer want: a campaign was cancelled, a customer churned and you no longer want to send them follow-up emails, a feature was rolled back. From the dashboard you select a function and a time window or event filter, and click Cancel. Matching runs terminate cleanly: their step.sleep and step.wait_for_event calls never resume, queued runs never start, and in-flight runs check for cancellation and exit at the next step boundary. Cancellation respects the step boundary; an in-flight run finishes the step.run it is in before terminating, so you do not get half-completed Stripe charges or torn DB writes.
Replay vs cancellation as a decision. When something goes wrong with a population of runs, ask one question: do I want this work to succeed, or do I want it not to happen at all? If the work should succeed (bug-fix recovery), replay. If the work should not happen (cancelled campaign, churned customer, rolled-back feature), cancel. If you are unsure (for example, the failed runs include some you want to recover and some that should never have fired in the first place), filter your dashboard query more narrowly so each subset gets the right treatment.
Three patterns this enables in practice:
- "We shipped a bug" recovery. Find the failed runs in the bad deploy's time window, fix the bug, ship the fix, replay the failures. Customer experience: their email went unanswered for an hour but was answered in the end, without you writing any recovery code.
- "Campaign canceled" rollback. A welcome series fires three follow-up emails over 14 days; the customer churns on day 4. You do not want to send the day-7 and day-14 follow-ups. Bulk-cancel the matching wait-for-event and sleep runs.
- "Schema migration" replay. You changed how the agent formats summaries; you want yesterday's tickets re-summarized in the new format. Find those runs (successful or not) and replay them; because a replay is a fresh run from the top, the agent re-runs every step on the new code, which is exactly what you want here. Keep your side-effecting steps idempotent so that re-running them does not double-charge or double-send.
The dev-server MCP makes recovery accessible without leaving your coding agent. During development you can ask the AI to use get_run_status to inspect a failed run, then recover the work by firing the event again on the fixed code (give it a new event ID, because re-firing with the same ID deduplicates to a no-op under Concept 4's idempotency semantics). The dashboard's Rerun button is the equivalent one-click path. Either way you get a fresh run on current code, not a memo-preserving resume.
Quick check. True or false. (a) A dashboard Replay re-runs the work on newly deployed code. (b) A dashboard Replay returns the original run's successful steps from the memo and re-runs only the failed one. (c) An automatic retry inside a failing run returns completed steps from the memo and re-runs only the failing step. (d) Bulk-cancelling an in-flight function aborts the currently executing step.run mid-step so it terminates sooner.
Answers: (a) True: a replay is a fresh run from the top on whatever is deployed now, which is why it is the tool for bug-fix recovery. (b) False: this is the trap. A replay is a new run that re-executes every step from the top, so the old run's memo does not carry over. What stops a replayed side effect from firing twice is the idempotency key, not the memo. (c) True: this is the memo-preserving path, and the one you saw in the Quick Win. The completed step stays at one attempt while the failing step retries. (d) False: cancellation respects the step boundary; the current step.run finishes (or fails) before the run terminates. This prevents torn writes.

Try with AI
Walk through a recovery scenario with my AI coding assistant:
Yesterday at 14:00 we deployed a change to the worker's agent step.
A bug in the new code made the agent step throw on every run.
From 14:00 to 18:00, 47 customer-support runs failed at that step.
At 18:30 we noticed, fixed the bug, and re-deployed.
Use the dev-server MCP's grep_docs to find Inngest's replay docs,
then:
1. Outline the exact dashboard steps to identify the 47 failed runs.
2. Explain what a dashboard Replay does for one of those runs: is it
a fresh run from the top on the fixed code, or a resume that
reuses the old run's memo? What does that mean for the cost of
replaying all 47?
3. Confirm whether the customers will see one reply or several if a
replayed run re-sends the email, and name the mechanism that
keeps it to one (hint: it is not memo).
4. Identify ONE scenario in this story where you'd prefer to
bulk-cancel instead of replay, and explain why.
Concept 15: HITL gates with step.wait_for_event, Invariant 1 at runtime
Invariant 1 of the Agent Factory says the human is the principal: authored intent, not the agent's autonomous judgment, is what the runtime must honor on high-stakes decisions. This is the one place where a human mind steps back into the loop. Everywhere else the nervous system runs on its own, by reflex; here it stops and waits for a person. In production this shows up as approval gates: the agent does the analysis and drafts the action, but does not execute the action until a human approves.
Inngest's step.wait_for_event (Concept 8) is the cleanest expression of this on any platform today. The agent runs to the point of decision, suspends, and waits for an approval event. The human reviews (in Slack, in an admin UI, in email) and clicks approve or reject. The event fires. The function resumes with the human's verdict and acts accordingly. This is what spec-driven means at runtime: the nervous system enforces the plan, meaning which action needs a human, in what order, with what timeout. It does not police the agent's reasoning; it controls what the agent is allowed to do.
    import functools
    from datetime import timedelta

    import inngest

    # inngest_client and the helpers called below (run_refund_investigation_agent,
    # send_slack_approval_request, ...) are assumed to be defined elsewhere.


    @inngest_client.create_function(
        fn_id="refund-with-hitl-gate",
        trigger=inngest.TriggerEvent(event="customer/refund.investigated"),
        concurrency=[inngest.Concurrency(limit=5)],
    )
    async def refund_with_gate(ctx: inngest.Context) -> dict[str, str]:
        request_id = ctx.event.data["request_id"]
        amount_cents = ctx.event.data["amount_cents"]

        # Step 1: the agent's analysis (your worker, run durably).
        # functools.partial binds the arguments, the same idiom D1 uses.
        analysis = await ctx.step.run(
            "agent-investigates",
            functools.partial(run_refund_investigation_agent, request_id=request_id),
        )

        # Step 2: if the agent recommends a refund AND the amount is at least
        # $100 (10_000 cents), gate behind human approval. The step output is
        # assumed to come back as a JSON dict.
        needs_approval = analysis["recommends_refund"] and amount_cents >= 10_000

        if needs_approval:
            await ctx.step.run(
                "notify-approver",
                functools.partial(
                    send_slack_approval_request,
                    request_id=request_id,
                    analysis=analysis,
                    amount_cents=amount_cents,
                ),
            )

            # === THE HITL GATE ===
            approval = await ctx.step.wait_for_event(
                "wait-for-human-approval",
                event="refund/approval.decided",
                timeout=timedelta(hours=24),
                if_exp=f"async.data.request_id == '{request_id}'",
            )

            if approval is None:
                # Timeout: no human responded in 24h. Escalate.
                await ctx.step.run(
                    "escalate-timeout",
                    functools.partial(escalate_to_senior_reviewer, request_id=request_id),
                )
                return {"status": "escalated_timeout"}

            if not approval.data["approved"]:
                await ctx.step.run(
                    "notify-rejected",
                    functools.partial(notify_customer_rejected, request_id=request_id),
                )
                return {"status": "rejected_by_human"}

        # Either it was approved, or it didn't need approval
        refund = await ctx.step.run(
            "issue-refund",
            functools.partial(call_stripe_refund, request_id=request_id, amount_cents=amount_cents),
        )
        await ctx.step.run(
            "audit-approved-refund",
            functools.partial(
                audit_refund,
                request_id=request_id,
                refund=refund,
                approved_by="human" if needs_approval else "auto",
            ),
        )
        return {"status": "issued", "refund_id": refund["id"]}
What you see in the code: a sequence of steps with one wait_for_event in the middle. What happens at runtime:
- The agent runs (step 1, durably).
- The function decides whether the gate applies (in-code logic, free of side effects).
- If gated: a Slack notification fires (step 2, durable). The function suspends. Zero compute consumed, for up to 24 hours.
- A human clicks Approve or Reject in Slack. The admin backend calls inngest_client.send with refund/approval.decided and the request_id.
- Inngest matches the event to the suspended function (the if_exp filter ensures only matching request IDs match). The function resumes on the next line.
- The function uses the human's decision to either issue the refund or notify the rejection. Both paths audit the decision and the approver.
This is what makes Inngest qualitatively different from a queue plus a state machine. The HITL pattern is a primitive. The function's code reads top to bottom, with the gate inline. No callbacks, no state restoration, no if state == waiting_for_approval: ... dispatching. The runtime handles the suspend/resume mechanics; your code expresses the policy.
A later course develops Invariant 1 architecturally: authored intent, spec-driven workflows, and the manager-of-workers layer that decides which gates apply to which actions. This course gives you the runtime primitive. When that manager layer arrives, the gate it implements will be exactly this wait_for_event pattern, just composed at fleet scale. Knowing the primitive now means the architectural pattern will later read as "a sensible composition" rather than "magic".
This is the keystone you build in Decision 5 of Part 4: refund approval, made durable. Here the concept is the shape; the worked example wires it to a real needs_approval tool and proves the refund fires exactly once.
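The other half of the gate is the event the admin backend sends when the reviewer clicks. The sketch below builds that payload as a plain dict and mimics the request_id match locally; `build_approval_event`, `matches_gate`, and the `reviewer` field are hypothetical names for this illustration. With the SDK you would wrap the same name and data in an inngest.Event and pass it to inngest_client.send.

```python
from typing import Any, Dict


def build_approval_event(request_id: str, approved: bool, reviewer: str) -> Dict[str, Any]:
    """Payload the admin backend would send when the human decides.

    The name and the request_id/approved fields match what the gate reads;
    "reviewer" is an extra field assumed here for the audit trail.
    """
    return {
        "name": "refund/approval.decided",
        "data": {"request_id": request_id, "approved": approved, "reviewer": reviewer},
    }


def matches_gate(waiting_request_id: str, event: Dict[str, Any]) -> bool:
    """Local stand-in for the gate's filter: async.data.request_id == '<request_id>'."""
    return (
        event["name"] == "refund/approval.decided"
        and event["data"].get("request_id") == waiting_request_id
    )


approve = build_approval_event("req_123", approved=True, reviewer="dana")
other = build_approval_event("req_999", approved=True, reviewer="dana")

resumes_req_123 = matches_gate("req_123", approve)
ignored_by_req_123 = not matches_gate("req_123", other)
```

An approval for a different request leaves the suspended run asleep, which is what keeps one reviewer click from releasing someone else's refund.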
Predict. You have a HITL gate set with timeout=timedelta(hours=24). A customer's refund request arrives Friday at 17:00. No human is online over the weekend. The gate's timeout fires Saturday at 17:00. Your timeout handler records a blocked refund. The reviewer reads the request Monday at 9:00am. Walk the timeline: how many function runs were active over the weekend? How much compute did Inngest bill? Confidence 1-5.
Answer: zero active function runs over the weekend. The function was suspended: Inngest stored its state, paged the function out of memory, and waited for either the event or the timeout. Inngest does not bill for suspended time. When Saturday 17:00 came and the timeout fired, the function resumed for the few hundred milliseconds it took to write the blocked-refund audit row, then completed. The fact that the reviewer does not look until Monday costs the worker nothing. The economics of HITL workflows on Inngest are dramatically different from polling-based queues, which bill you for every second of "is this approved yet?" polling.

Try with AI
With my AI coding assistant: design a durable refund-approval gate.
Specification:
1. The agent investigates and decides a refund is warranted, but the
refund tool needs human approval before it runs.
2. The gate should:
- Notify the on-call reviewer with the agent's recommendation
- Wait up to 4 hours for the reviewer to approve or reject
- On approve: issue the refund.
- On reject: do not issue; record a blocked refund.
- On 4-hour timeout: do not issue; record a blocked refund.
3. Every branch (approve/reject/timeout) writes an audit row from a
small fixed set of action names, capturing what was decided.
Use the dev-server MCP's send_event to simulate each branch of
the reviewer's decision during testing.
Part 4: Worked example, a customer-support Production Worker
This is where you build. First the worker (one prompt), then the nervous system around it, one layer per prompt. You direct your coding agent in short plain-English prompts and it writes the code; the snippets shown below are the few load-bearing lines of each layer, not whole files. The full implementation was run end-to-end against a live dev server and a real model, so the shapes you see are the ones that run. If a signature looks unfamiliar, have your agent check the current docs.
Shape: seven prompts, on the base you have already set up.
- D0 builds the worker itself, standalone.
- D1 makes the agent run durable.
- D2 lets an event wake it.
- D3 adds a daily cron that fans out.
- D4 adds flow control.
- D5 is the keystone: a durable human-approval gate on refunds.
- D6 proves the worker survives a broken step: retry without redoing completed work, then recover.
Before you start. Your environment is already set up from the Quick Win: open the same ai-agent-nervous-system folder, with the Inngest and neon-postgres Skills installed, your OPENAI_API_KEY and your Neon DATABASE_URL in .env, your customers and audit_log tables provisioned, and all three MCP servers (Neon, Context7, inngest-dev) wired. Just two reminders:
- The dev server is running. If you stopped it, start it again: npx inngest-cli@latest dev in your terminal. The dashboard is at http://127.0.0.1:8288. (When you later deploy to Inngest Cloud, the free Hobby tier is $0 with no credit card; its ceilings are in Part 5.)
- A casing note for the MCP calls below. Dev-server tool names are snake_case (send_event, get_run_status, invoke_function), but their parameters are camelCase (get_run_status takes runId, invoke_function takes functionId). The Python SDK is snake_case throughout; only the MCP call parameters are camelCase.
The brief
You build a small customer-support worker and give it a Production Worker nervous system. The worker reads its sample customers from the Neon customers table (id, email, tier), drafts a warm reply to an incoming email, can issue a refund only with human approval, and writes an audit row to the Neon audit_log table for every action, drawn from a small fixed set: message_received, message_sent, refund_issued, refund_blocked. Then seven prompts add Inngest around it: an event wakes it, the agent call runs durably, a daily cron fans out one health check per eligible customer, flow control caps concurrency and throttle, the refund pauses at a durable human gate, and a replay path recovers failed runs.
A note on the prompts that follow. Each is written the way you would actually say it to your coding agent: short, plain, trusting it to handle the detail. They work pasted cold, and even better if you first ask the agent to orient itself ("read the project and tell me what you see, then ask me anything unclear before you start") as the files accumulate. The prompts are the destination; orienting first is the on-ramp.
D0: Build the worker, standalone
Where you are: the base is open, the dev server is running, and your Neon store is provisioned, but there is no worker yet. This Decision builds the standalone worker; by the end it runs on a sample email and writes an audit row to Neon.
The base already ships an AGENTS.md that your agent read on open, so it knows the project; that is why these prompts stay short. The one rule in it worth keeping in your head is the architectural invariant of the whole course: the worker's own code never imports from inngest. The agent and its tools stay plain Python; the nervous system wraps them from the outside. That separation, keeping the agent and the nervous system apart, is what lets you later swap Inngest for Temporal or Restate and leave the worker untouched.
Your Neon system of record is already provisioned from the Quick Win: the customers and audit_log tables exist, and DATABASE_URL is in your .env. So the worker reads and writes that database from the start. Now build the worker. Paste this:
Build me a minimal customer-support agent with the OpenAI Agents SDK, running in a local sandbox. It reads the sample customers from my Neon customers table (each row has an id, email, and tier), drafts a warm reply to an incoming customer email, and can issue a refund, but the refund tool needs human approval before it runs. Write an audit row into my Neon audit_log table for every action, using a small fixed set of action names and the DATABASE_URL in .env. Seed the customers table with five sample rows first if it is empty. Keep it small; it exists to be wrapped, not shipped. Then run it on a sample email and show me the reply.
Creates: worker.py and db.py (a flat project, no src/ nesting). D1 adds the Inngest host as a third file. The agent reaches Postgres through DATABASE_URL, never through the Neon MCP server, which is only your build-time tool.
The seed data is small enough to fit on the page: five sample customers across three tiers, which the agent inserts into the customers table:
[
{ "id": "cust_001", "email": "ada@example.com", "tier": "enterprise" },
{ "id": "cust_002", "email": "grace@example.com", "tier": "pro" },
{ "id": "cust_003", "email": "linus@example.com", "tier": "pro" },
{ "id": "cust_004", "email": "edsger@example.com", "tier": "standard" },
{ "id": "cust_005", "email": "alan@example.com", "tier": "standard" }
]
Your agent writes two small Python files. db.py holds the Postgres access: a small pooled asyncpg connection on DATABASE_URL, a load_customers() read, and a record() audit-write helper with a closed vocabulary. Any action name outside the four-item set raises, which turns a typo into a loud error instead of a silent bad row. worker.py is a SandboxAgent with two tools that call into db.py. Only one line of it is load-bearing for the rest of the course, the refund tool's decorator:
    from agents import function_tool

    @function_tool(needs_approval=True)  # the run pauses here until a human decides
    def issue_refund(order_id: str, amount_cents: int, reason: str) -> str:
        ...
That needs_approval=True makes the agent pause instead of issuing the refund: the run comes back with the refund pending and a human decides. This is the hook the whole HITL keystone (D5) rests on. (This floor gates every refund, which keeps the keystone simple; a production worker would typically gate only above a threshold, the over-$100 pattern from Concept 15. The wiring is identical either way.)
One structural note to confirm in what the agent writes, because D5 depends on it: keep build_agent() and the sandbox run_config() as separate functions. When D5 resumes a paused run, it rebuilds the agent with the same tool shape and passes the same run_config() again; the saved state does not carry the sandbox session, so the resume has to supply it again. Factor them apart now and the keystone becomes a small step later.
Done when: the agent runs on a sample email and prints a short reply, and the Neon audit_log table has a new row (check in the console, or ask your agent to read it back through the Neon tools). If the email describes a refund, the run pauses at the refund tool instead of issuing it; that pause is the whole point, and D5 makes it durable.
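The closed audit vocabulary described for db.py can be sketched without a database. This is an illustration of the validation only, assuming the four action names from the brief; `validate_action` is a hypothetical helper, and the real record() your agent writes will also perform the Postgres insert.

```python
ALLOWED_ACTIONS = frozenset(
    {"message_received", "message_sent", "refund_issued", "refund_blocked"}
)


def validate_action(action: str) -> str:
    """Reject any audit action outside the closed set before it reaches the table."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(
            f"unknown audit action {action!r}; expected one of {sorted(ALLOWED_ACTIONS)}"
        )
    return action


ok = validate_action("message_sent")

try:
    validate_action("mesage_sent")  # typo: fails loudly instead of writing a bad row
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

Calling a check like this at the top of record() is what keeps the audit_log queryable by exact action name later.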
The prompts in this Part assume a frontier-class coding agent (Claude Sonnet or Opus, a GPT-5-class model, or Gemini 2.5 Pro). The Inngest architecture you are learning (events, steps, memoization, flow control) is SDK-level and holds for whichever model drives your agent. But the build experience leans on strong instruction-following, especially the D5 keystone. On a weaker model, expect to iterate on a prompt more than once and to spell out file names. The architecture is not broken; the prompting just needs more scaffolding.
D1: Make the agent run durable
Where you are: a worker that runs only when you call it, and loses everything on a mid-run crash. This Decision wraps the agent call in step.run; by the end a completed run shows the agent step memoized in the dashboard.
The nervous system starts here: wrap the whole agent call in a single step.run so it is durable and memoized. Paste this:
Wrap the agent run in an Inngest durable function so it survives crashes and retries transient failures. The whole agent call goes inside a single step.run so it is memoized. Run it in local dev mode against the Inngest dev server, with a FastAPI host. Confirm a completed run shows the agent step memoized in the dashboard.
Creates: inngest_app.py (an Inngest client in dev mode, the agent call in a helper, and a FastAPI host that the dev server discovers).
The shape that matters is one step.run wrapping the agent call:
    async def handle_customer_email(ctx: inngest.Context) -> dict:
        email_text = ctx.event.data["email_text"]
        # functools.partial pre-binds email_text, so the step handler takes no arguments.
        outcome = await ctx.step.run("run-agent", functools.partial(_run_agent, email_text))
        return {"replied": outcome["status"] == "done"}
Two idioms to confirm in what the agent writes. The step handler takes no arguments of its own, so functools.partial binds email_text ahead of time; this is how you pass data into any step, and you will see it on every step from here on. And the agent helper uses plain Runner.run, not the streamed runner: this is the path the human-approval keystone (D5) is built on, so using it from the start makes D5 a small step instead of a rewrite. The client is constructed with is_production=False (the dev-mode flag from the Quick Win).
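The functools.partial idiom can be tried in isolation. In the sketch below, `fake_step_run` is a hypothetical stand-in for a step runner that calls its handler with no arguments, and `_run_agent` is a stub that returns a canned reply rather than calling a model.

```python
import asyncio
import functools
from typing import Awaitable, Callable


async def _run_agent(email_text: str) -> dict:
    """Stub for the real agent helper: echoes instead of calling a model."""
    return {"status": "done", "reply": f"Re: {email_text}"}


async def fake_step_run(step_id: str, handler: Callable[[], Awaitable[dict]]) -> dict:
    """Stand-in for a step runner: it invokes the handler with no arguments."""
    return await handler()


async def main() -> dict:
    # partial(...) turns a one-argument coroutine function into a zero-argument callable.
    bound = functools.partial(_run_agent, "Where is my order?")
    return await fake_step_run("run-agent", bound)


outcome = asyncio.run(main())
```

The same shape works for keyword arguments, for example functools.partial(db.record, "message_sent", customer_id=customer_id).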
Run it as two processes, the function host and the dev server that finds it:
uv run uvicorn inngest_app:app --port 8000 --reload --log-level info # terminal 1: function host (your model key is sourced here; --reload picks up the D6 break/fix edits)
npx inngest-cli@latest dev -u http://127.0.0.1:8000/api/inngest # terminal 2: dev server, auto-discovers the host
Done when: the dashboard lists handle-customer-email and a completed run shows the run-agent step. (You wake it properly from an event in D2; for now, the function being discoverable is enough.)
Why this is the load-bearing move. The agent call is the expensive part: model tokens, several seconds. Inside step.run its result is memoized, so when a later step fails and the function retries, the agent does not run again. That single wrapping is the difference between a worker that double-pays and double-acts on every retry and one that does each expensive thing exactly once.
D2: Trigger it on an event
Where you are: a durable function already triggered by customer/email.received (D1's decorator), but with no audit trail. This Decision adds an audit row on either side of the agent; by the end a real event drives a run that writes both rows.
Add an audit step before the agent and one after it, then wake the worker with a real event instead of running it by hand. Paste this:
Make the worker wake on a customer/email.received event instead of being run by hand. Add an ingress audit step before the agent and a reply audit step after it. Send a test event and show me the run completing with both audit rows.
Edits: inngest_app.py (the function gets an audit step on either side of the agent).
The shape is two more step.run calls around the agent step:
    customer_id = ctx.event.data.get("customer_id")  # bound from the event, alongside D1's email_text
    await ctx.step.run("audit-received", functools.partial(
        db.record, "message_received", customer_id=customer_id, detail=email_text[:80]))
    outcome = await ctx.step.run("run-agent", functools.partial(_run_agent, email_text))
    await ctx.step.run("audit-sent", functools.partial(
        db.record, "message_sent", customer_id=customer_id, detail=(outcome["reply"] or "")[:80]))
Each row uses an action name from the closed set: message_received on the way in, message_sent on the way out, and db.record writes it to the Neon audit_log table over DATABASE_URL. Send the test event from the agent with the dev-server MCP's send_event tool (name: "customer/email.received", with a data object containing email_text and customer_id). The dev server accepts any event, so you configure no webhook to test locally; in production you would point your email provider at an Inngest webhook URL that reshapes its payload into this event, which is a dashboard setting, not code.
Done when: the run completes, the trace shows three steps in order (audit-received, run-agent, audit-sent), and the Neon audit_log table has one message_received and one message_sent row for that customer.
Why two audit steps, not one. Each is its own step.run, so each is independently memoized. If the reply step fails and the function retries, the ingress row is not written twice (memo hit) and the agent does not run twice (also memoized). The audit trail stays exactly-once across retries, the property D6 will prove.
D3: A daily cron that fans out
Where you are: a worker that the world wakes one email at a time. This Decision adds a daily cron that fans out one event per eligible customer; by the end each one gets its own durable child run.
Add scheduled work: a daily cron that fires a health-check event for every Pro and Enterprise customer, each event triggering its own durable run. Paste this:
Add a daily cron that fans out one customer/health_check.requested event per Pro and Enterprise customer, each one idempotency-keyed so a re-delivered cron run never double-fires. Each child event triggers its own durable run that writes one audit row. Invoke the cron manually and show me one child run per eligible customer.
Creates: a cron parent that fans out and an event consumer that handles each child, both registered with the host.
Two shapes carry this Decision. The trigger is a one-line cron decorator, and the fan-out is N events, each carrying an idempotency key:
    @inngest_client.create_function(fn_id="daily-health-check", trigger=inngest.TriggerCron(cron="0 9 * * *"))
    async def daily_health_check(ctx: inngest.Context) -> dict:
        # ... select Pro/Enterprise customers, then:
        events = [
            inngest.Event(
                name="customer/health_check.requested",
                data={"customer_id": c["id"]},
                id=f"health-{c['id']}-{ctx.event.id}",  # idempotency key per (customer, cron run)
            )
            for c in eligible
        ]
        await ctx.step.send_event("fan-out-health-checks", events)
        return {"fanned_out": len(events)}
The idempotency key is the load-bearing detail: id=f"health-{customer}-{cron_run}" means that if the same cron run is delivered twice (a redeploy, a retry), the duplicate event is dropped, so each customer gets exactly one check per day. The consumer is an ordinary event-triggered function that writes one audit row. Invoke the cron from the agent with the MCP's invoke_function tool (do not wait for 09:00 tomorrow). One dev quirk: the dev server only fires crons while it is running; production runs them on Inngest's always-on infrastructure.
Done when: the parent completes in seconds and the dashboard shows one customer-health-check child run per eligible customer, with standard-tier customers correctly skipped.
Why fan-out, not a loop. The parent does not process customers itself; it sends N events and returns. Each child is its own run: isolated, independently retryable, capped by its own concurrency. A loop inside one function would couple them: one slow customer would hold up the rest, and one crash would lose the whole batch. Fan-out is how one scheduled wake-up becomes N independent durable runs.
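The effect of the per-(customer, cron run) key can be checked with a small simulation. `DedupingInbox` below is a hypothetical stand-in for event ingestion that drops repeated IDs; the customers are the five seed rows from D0, and the cron run IDs are made up.

```python
from typing import Dict, List, Set

CUSTOMERS = [
    {"id": "cust_001", "tier": "enterprise"},
    {"id": "cust_002", "tier": "pro"},
    {"id": "cust_003", "tier": "pro"},
    {"id": "cust_004", "tier": "standard"},
    {"id": "cust_005", "tier": "standard"},
]


def fan_out(customers: List[Dict[str, str]], cron_run_id: str) -> List[dict]:
    """One health-check event per Pro/Enterprise customer, keyed by (customer, cron run)."""
    return [
        {
            "name": "customer/health_check.requested",
            "data": {"customer_id": c["id"]},
            "id": f"health-{c['id']}-{cron_run_id}",
        }
        for c in customers
        if c["tier"] in {"pro", "enterprise"}
    ]


class DedupingInbox:
    """Hypothetical ingestion layer: an event ID seen before is dropped."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.accepted: List[dict] = []

    def receive(self, events: List[dict]) -> None:
        for event in events:
            if event["id"] in self._seen:
                continue
            self._seen.add(event["id"])
            self.accepted.append(event)


inbox = DedupingInbox()
inbox.receive(fan_out(CUSTOMERS, "run_monday"))
inbox.receive(fan_out(CUSTOMERS, "run_monday"))   # same cron run delivered twice
after_redelivery = len(inbox.accepted)

inbox.receive(fan_out(CUSTOMERS, "run_tuesday"))  # next day's run: new keys
after_next_day = len(inbox.accepted)
```

The redelivered run adds nothing, while the next day's run produces a fresh set of three checks.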
D4: Flow control
Where you are: a worker that handles every email but, under a burst, would fire them all at once. This Decision adds three flow-control policies; by the end, a twenty-event burst queues under the cap with no dropped or duplicated rows.
When five hundred emails land at 9am, the worker should not fire five hundred model calls at once: that blows the rate limit and starves everyone behind the noisy customer. Add a global concurrency cap, a per-customer cap, and a throttle. Paste this:
Add flow control to the email handler: a global concurrency cap, a per-customer concurrency key so one noisy customer can't starve the rest, and a throttle to protect the OpenAI rate limit. Fire a burst of twenty events across five customers and show me they queue under the cap and all complete with no dropped or duplicated audit rows.
Edits: inngest_app.py (three decorator arguments on the email function).
These three arguments are the lesson; all of D4 lives in them:
concurrency=[
    inngest.Concurrency(limit=10),  # global cap
    inngest.Concurrency(limit=2, key="event.data.customer_id"),  # per-customer cap
],
throttle=inngest.Throttle(limit=100, period=datetime.timedelta(minutes=1)),
Three knobs, three jobs. The global limit=10 caps how many runs execute at once, protecting two real ceilings: the model's rate limit and your Neon connection budget.
Two things bound your connections, and they work at different scales. Inside a single worker replica, all runs share one asyncpg pool, so the pool's max_size is what keeps connections flat no matter how many runs are active (a twenty-run burst on one host still rides on a handful of pooled connections). Across replicas, that local pool no longer helps, because replica two has its own pool, so the concurrency cap is what bounds total runs, and therefore total connections, fleet-wide: ten replicas each at limit=10 is a hundred runs and a hundred-ish connections, which you size against Neon's budget (the free tier allows a few hundred pooled). The pool and the cap are the protection together: the pool bounds one replica, the cap bounds the fleet. Without either one, a five-hundred-email burst across un-pooled, un-capped replicas opens far more connections than Neon will accept.
The per-customer limit=2, keyed on event.data.customer_id, means one customer's burst occupies at most two slots, so one account's flood never starves the others. throttle caps how many runs start per minute, smoothing a spike into a steady rate. A function carries at most two concurrency policies; the global-plus-per-key pair is the common shape.
Fire the burst from the agent: twenty customer/email.received events across five customers via send_event.
Done when: the burst queues under the cap (the running count stays at or below 10, and at or below 2 per customer), every run completes, and the Neon audit_log table holds exactly twenty message_received and twenty message_sent rows. No dropped runs, no duplicates, and no Neon connection-limit errors under the burst: on this single host the asyncpg pool keeps connections flat (you will see only a handful in use even while the burst is running), and the cap is what will keep them flat across replicas when you scale out.
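To build intuition for what "at or below 10, and at or below 2 per customer" means, here is a toy asyncio simulation of the two caps. It is only a sketch: simulate_burst is a hypothetical helper, the semaphores are an illustration rather than how Inngest enforces its limits, and the short sleep stands in for the model call.

```python
import asyncio
from collections import defaultdict

GLOBAL_LIMIT = 10        # mirrors Concurrency(limit=10)
PER_CUSTOMER_LIMIT = 2   # mirrors Concurrency(limit=2, key=customer_id)


async def simulate_burst(customer_ids: list) -> dict:
    """Run one fake handler per event and record the peak concurrency seen."""
    global_slots = asyncio.Semaphore(GLOBAL_LIMIT)
    customer_slots = defaultdict(lambda: asyncio.Semaphore(PER_CUSTOMER_LIMIT))
    running = 0
    running_per_customer = defaultdict(int)
    stats = {"peak_total": 0, "peak_per_customer": 0, "completed": 0}

    async def handle(customer_id: str) -> None:
        nonlocal running
        # Take the per-customer slot first so a queued run never holds a global slot.
        async with customer_slots[customer_id], global_slots:
            running += 1
            running_per_customer[customer_id] += 1
            stats["peak_total"] = max(stats["peak_total"], running)
            stats["peak_per_customer"] = max(
                stats["peak_per_customer"], running_per_customer[customer_id]
            )
            await asyncio.sleep(0.01)  # stand-in for the model call
            running -= 1
            running_per_customer[customer_id] -= 1
            stats["completed"] += 1

    await asyncio.gather(*(handle(c) for c in customer_ids))
    return stats


# Twenty events across five customers, the same shape as the D4 burst.
burst = [f"cust_{i % 5:03d}" for i in range(20)]
result = asyncio.run(simulate_burst(burst))
```

Every event completes, but never more than two per customer or ten overall are in flight at once; the rest simply wait their turn, which is the queueing behavior the dashboard shows.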
Why this is policy, not code. None of this lives in your function body; it is configuration that the runtime enforces. Without caps, a burst either melts a downstream system or lets one tenant monopolize the worker. Writing the same fairness by hand means a queue plus a scheduler plus a rate limiter, hundreds of lines. Here it is three decorator arguments.
D5: A durable human-approval gate on refunds (keystone)
Where you are: a worker whose refund pause (D0's needs_approval=True) is ephemeral, living in the running process. This Decision makes that pause durable; by the end, the run suspends at zero compute, waits for a real approval event, and resumes to issue the refund exactly once.
That ephemeral pause is the gap: a crash, a deploy, or a reviewer who takes the afternoon, and the pending refund is gone. This is the keystone of the whole course: make the pause durable, so the function suspends at zero compute, waits as long as it takes for a real approval event, then resumes exactly the same agent run. Paste this:
The refund approval is currently an in-process pause that a crash or a slow reviewer would lose. Make it durable: when the agent pauses on the refund, persist its serialized run state as the step's output, then suspend the whole function on step.wait_for_event, waiting for a refund/approval.decided event (give it a four-hour timeout and match it to this customer). When the decision arrives, rehydrate the state, apply approve or reject, and resume the agent so the refund fires exactly once. Drive a refund, show me the run suspended and waiting, send an approval, and show me exactly one refund audit row. Then do it again with a rejection and show me a blocked row and no refund.
Edits: inngest_app.py (the agent helpers learn to pause and resume; the email function gets the gate).
This Decision earns more code than the others, because the suspend-and-resume dance is the lesson. When the agent pauses, it serializes its run state; when the decision arrives, you rehydrate that state, apply approve or reject, and resume:
async def _run_agent(email_text: str) -> dict:
    agent = worker.build_agent()
    result = await Runner.run(agent, email_text, run_config=worker.run_config())
    if result.interruptions:  # the refund tool paused for approval
        return {"status": "needs_approval", "state": result.to_state().to_string()}
    return {"status": "done", "reply": result.final_output}

async def _resume_agent(state_str: str, approved: bool, rejection_message: str | None) -> dict:
    agent = worker.build_agent()
    state = await RunState.from_string(agent, state_str)
    for item in state.get_interruptions():
        if approved:
            state.approve(item)
        else:
            state.reject(item, rejection_message=rejection_message or "Refund denied.")
            db.record("refund_blocked", detail=f"args={item.arguments}")
    result = await Runner.run(agent, state, run_config=worker.run_config())
    return {"status": "resumed", "reply": result.final_output}
Inside the email function, the gate is an inline wait_for_event at the point where the agent paused; the decision drives a resume step:
decision = await ctx.step.wait_for_event(
    "await-refund-approval",
    event="refund/approval.decided",
    timeout=datetime.timedelta(hours=4),
    if_exp=f"async.data.customer_id == '{customer_id}'",
)
# (decision is None on timeout -> write a refund_blocked row and return)
resumed = await ctx.step.run(
    "resume-agent",
    functools.partial(
        _resume_agent,
        outcome["state"],
        bool(decision.data.get("approved")),
        decision.data.get("rejection_message"),
    ),
)
Read it top to bottom: the gate is one inline call in an otherwise ordinary function. No callback, no state-machine dispatch, no if status == waiting: branching across invocations. The runtime handles suspend and resume; your code expresses the policy. Four details earn their place:
- result.to_state().to_string() serializes the paused run, and it becomes the output of the run-agent step, so it is durably stored. to_state() is synchronous; to_string() returns the string you persist.
- RunState.from_string(agent, s) is awaited (it is a coroutine) and takes that stored string directly. You then approve or reject on state.get_interruptions() and call Runner.run(agent, state, ...) to resume. (A resume can leave approvals pending, so the real helper loops until none remain.)
- On resume the same run_config() is passed again, and the agent is rebuilt with the same tool shape. The serialized state does not carry the sandbox session, so the resume has to supply it again. This is the one detail that, if missed, fails the resumed run. (D0 factored build_agent and run_config apart for exactly this.)
- if_exp matches the decision to this customer (async.data.customer_id == '...'), so an approval for one customer never resumes another customer's run.
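The snippet above notes in a comment that the decision is None on timeout. One way to keep the three outcomes explicit is a small pure function that runs before the resume step. This is a sketch only: route_decision and the action labels are hypothetical, and the dict argument stands in for the decision event's data payload.

```python
from typing import Optional


def route_decision(decision_data: Optional[dict]) -> dict:
    """Map the result of the wait to an action. None models a timeout."""
    if decision_data is None:
        # Nobody answered within the timeout: block the refund and say why.
        return {
            "action": "block",
            "audit_event": "refund_blocked",
            "reason": "approval timed out",
        }
    if bool(decision_data.get("approved")):
        # Approved: resume the agent so the refund tool fires once.
        return {"action": "resume_approved", "audit_event": None, "reason": None}
    # Rejected: resume with the rejection so the agent can explain the denial.
    return {
        "action": "resume_rejected",
        "audit_event": "refund_blocked",
        "reason": decision_data.get("rejection_message") or "Refund denied.",
    }


timed_out = route_decision(None)
approved = route_decision({"approved": True, "customer_id": "cust_001"})
rejected = route_decision({"approved": False, "customer_id": "cust_001"})
```

Keeping the timeout branch separate matters because the resume call reads fields from the decision; without the early return, a timed-out gate would fail on a missing decision instead of writing its audit row.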
To drive it from the agent: send a customer/email.received event whose email describes a refund, watch the run suspend at await-refund-approval (the dashboard shows it WAITING: the run status is RUNNING, but at zero compute), then send refund/approval.decided with {"approved": true, "customer_id": "cust_001"} via send_event. Do it again with "approved": false, keeping the same customer_id so the if_exp match still holds.
Done when: on approval, the suspended run resumes and the Neon audit_log table holds exactly one refund_issued row. On rejection, the run resumes, the audit holds one refund_blocked row and no refund_issued, and the agent's reply explains the denial.
Why this is the keystone. Every other layer (senses, reflexes, balance) keeps the worker correct or healthy on its own. This is where a human mind re-enters the loop on a high-stakes action, durably, for as long as it takes, at zero cost while waiting. A queue-plus-database-plus-poller version of this is a small project. Here it is one wait_for_event and one resume.
D6: Prove that durability survives a broken step
Where you are: a complete worker with every layer wrapped. This Decision proves the property that justified all of it; by the end you have watched a broken run retry its failing step several times while its completed audit step runs exactly once, then recovered the work on a fresh run.
The last property to prove is the one that justified all of this: the memoization mechanic from Concept 7. You understood it there; now prove it in your own worker. Paste this:
Deliberately break the agent step so it fails, fire an event, and show me Inngest retrying it while the earlier audit step stays memoized, so the failing run writes its ingress audit row exactly once across all the agent retries. Then fix the step and recover the work, and show me the recovery completing.
Deliberately break the agent step (raise a ValueError inside _run_agent), fire a few customer/email.received events for different customers, and read each run's trace. This is the proof, and it is inside every failing run: audit-received shows one completed attempt and writes its row once; run-agent shows several Attempts as it retries with backoff (Inngest makes several attempts by default) and then fails; audit-sent never runs. The audit step sitting at one attempt while the agent step keeps climbing is Concept 7's memoization, now visible in your own worker: the failing run writes only one message_received row no matter how many times the agent step retries.
Then revert the break (the host auto-reloads if you ran it with --reload; otherwise restart) and recover the work by re-firing the event on the fixed code (or, for a real bad-deploy batch, use the dashboard's Rerun button; both start a fresh run from the top, as covered in Concept 14). Here is the part that surprises people, and it is correct behavior, not a bug: the recovery is a brand-new run, so it runs audit-received again and writes its own message_received row. After one break-then-recover, that customer legitimately has two message_received rows, one from the failed run and one from the recovery. Memoization is a within-run guarantee; it never spans two separate runs.
Done when: in the failed run's trace, audit-received sat at one attempt and wrote one row while run-agent accumulated several attempts and failed. That one-attempt-despite-N-retries is memoization, proven. Then the recovery run completes run-agent and audit-sent on the fixed code. Query the Neon audit_log in the console (or ask your agent to read it through the Neon tools and report back): a customer you broke and recovered will have two message_received rows (failed run plus recovery) and one message_sent (only the recovery got that far), which is exactly right. The real diagnostic is per-run, not per-customer: open a single run's trace and confirm that audit-received shows one attempt. If one run's trace shows the ingress step running twice, that is a memoization bug (usually a non-unique step name); two rows spread across two separate runs is not.
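The row counts above follow from one rule: step outputs are memoized per run, by step name. A toy model makes the arithmetic checkable. FakeRun is a hypothetical class, not the Inngest runtime, and it assumes the runtime re-invokes the whole handler on each retry while skipping steps that already completed in that run.

```python
class FakeRun:
    """Toy model of one durable run: completed step outputs are memoized by name."""

    def __init__(self) -> None:
        self.memo = {}      # step name -> stored output
        self.attempts = {}  # step name -> how many times the body executed

    def step(self, name, fn):
        if name in self.memo:
            return self.memo[name]  # completed earlier in this run: do not re-execute
        self.attempts[name] = self.attempts.get(name, 0) + 1
        result = fn()               # may raise; nothing is memoized on failure
        self.memo[name] = result
        return result


audit_rows = []


def write_audit() -> str:
    audit_rows.append("message_received")
    return "ok"


def broken_agent() -> str:
    raise ValueError("deliberately broken")


def fixed_agent() -> str:
    return "reply"


def handler(run: FakeRun, agent_fn) -> str:
    run.step("audit-received", write_audit)
    return run.step("run-agent", agent_fn)


# One failing run, retried four times: the handler is re-entered from the top each time.
failed_run = FakeRun()
for _ in range(4):
    try:
        handler(failed_run, broken_agent)
    except ValueError:
        pass
rows_after_failure = len(audit_rows)

# Recovery is a brand-new run, so it has an empty memo and writes its own audit row.
recovery_run = FakeRun()
reply = handler(recovery_run, fixed_agent)
```

The failed run ends with one audit attempt against four agent attempts, and the customer ends with two message_received rows in total, one per run, which matches the counts described above.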
Why this is the bright line. A worker that loses customer work on a bad deploy is just an agent you call. A worker that takes the same bad deploy, fails loudly, retries the broken step without redoing the work it already finished (several attempts of the agent step, but the ingress audit written once), and recovers cleanly on a fresh run after the fix is a Production Worker. The proof is the failed run's own trace, one ingress attempt against several agent attempts, not a row count across runs.
Point this same nervous system at your SandboxAgent worker instead of the minimal floor; the wrapping is identical. And this step.wait_for_event approval replaces the hand-rolled run-state table from that course's optional Decision 10: the durable gate you just built is the persistence layer, so you can delete the table.
What just happened
You built a small customer-support worker and gave it a nervous system, one layer at a time. The worker's internals never changed after D0: the same SandboxAgent, the same two tools, the same Neon Postgres audit trail. What changed is everything around it. It now wakes on a customer/email.received event and on a daily cron that fans out per eligible customer, runs durably (the agent call inside step.run), respects flow control (global and per-customer concurrency, a throttle), gates refunds on a durable human approval (step.wait_for_event), and recovers failed runs from a bad deploy by replaying them, with the audit trail showing that inside any single run each step fired exactly once, no matter how many times that run retried.
The agent code is the same; its reach is not. You started with an agent you operate: prompt it, watch it, prompt it again. You now have a worker that operates itself: the world wakes it, its reflexes carry it across failures, it keeps its balance under load, and a human steps in only where the stakes demand one. This is the line the opening drew, between an agent you operate and an FTE that operates itself, and you have just built across it.
The remaining concerns are observability at scale, multi-worker coordination, and the manager layer that decides which workers handle which traffic. That is the next course in the track. This course covers the unit of production-ready execution; the next one composes those units into a workforce.
Part 5: Where this course ends
The cost shape of a Production Worker
Two cost surfaces matter: infrastructure cost (Inngest, plus whatever store and compute you run the worker on) and inference cost (model tokens). Infrastructure stays roughly flat as load grows; inference scales linearly. The method below is the thing to learn; any dollar figure goes stale the week it ships, so treat the numbers as illustrative and check the current pricing pages before putting any number in a budget.
Inngest pricing. Inngest charges per execution: every function run, plus every step-level retry, counts as one execution.
| Tier | Price | Executions / month | Concurrent steps | Notable |
|---|---|---|---|---|
| Hobby | $0 | 50,000 | 5 | 3 users, 50 realtime connections, no credit card |
| Pro | from $75 / month | 1,000,000 | 100+ | 1000+ realtime connections, 15+ users, 7-day trace retention |
| Enterprise | custom | custom | 500-50,000 | SAML / RBAC, 90-day trace retention, dedicated support |
Events pricing layers on top: the first 1-5M events per day are included; above that, overage starts at about $0.000050 per event and drops at higher volume. Pro adds $50 per additional 1M executions once you pass the 1M cap.
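The Pro-tier arithmetic can be sketched as a small function. The defaults are the illustrative figures from the table above, not current prices, and two things are assumptions made here for simplicity: overage is prorated linearly rather than billed in whole 1M blocks, and retries are expressed as an average per run. pro_monthly_cost is a hypothetical helper name.

```python
def pro_monthly_cost(
    runs_per_month: int,
    avg_retries_per_run: float = 0.0,
    base_price: float = 75.0,          # illustrative "from $75 / month"
    included: int = 1_000_000,         # executions included in the plan
    per_extra_million: float = 50.0,   # illustrative overage price
) -> float:
    """Estimate the monthly Inngest bill, counting each retry as an execution."""
    executions = runs_per_month * (1 + avg_retries_per_run)
    overage = max(0.0, executions - included)
    # Assumption: overage is prorated linearly per execution.
    return base_price + overage / 1_000_000 * per_extra_million


under_cap = pro_monthly_cost(500_000)
over_cap = pro_monthly_cost(3_000_000)
with_retries = pro_monthly_cost(1_000_000, avg_retries_per_run=1.0)
```

The third call shows why retries matter for the bill: a million runs that each retry once count as two million executions, so a flaky step roughly doubles the execution count even though traffic did not change.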
Hobby-tier ceilings that matter here. The 5-concurrent-step cap means that even if you declare concurrency=Concurrency(limit=10) in code, the platform's account-level cap holds you at 5. Your code is correct for production; on the free tier the observed concurrency is 5. step.sleep and step.sleep_until are tier-bounded too: up to seven days on the free Hobby plan, up to one year on paid plans (Inngest usage limits).
Inference cost dominates. A typical customer-support run uses a few thousand to ten thousand model tokens per conversation. Multiply your per-token price by your tokens-per-email and your emails-per-day and you have the line that matters; for most workers it dwarfs everything else. This is what you optimize. Everything else is a rounding error. The two highest-value levers: keep a stable cached prompt prefix (so the model bills the repeated portion at the cheaper cached rate, not full price on every call), and route easy turns to a cheaper model.
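The multiplication described above, with the cached-prefix lever folded in, looks roughly like this. Every number in the example calls is a placeholder rather than a real price, and daily_inference_cost, cached_fraction, and cached_discount are hypothetical names; substitute your provider's current rates.

```python
import math


def daily_inference_cost(
    price_per_million_tokens: float,
    tokens_per_email: float,
    emails_per_day: float,
    cached_fraction: float = 0.0,   # share of tokens served from a cached prefix
    cached_discount: float = 0.0,   # 0.5 means cached tokens cost half price
) -> float:
    """Per-token price x tokens per email x emails per day, with optional caching."""
    full_price_tokens = tokens_per_email * (1 - cached_fraction)
    cached_tokens = tokens_per_email * cached_fraction
    billable = full_price_tokens + cached_tokens * (1 - cached_discount)
    return billable * price_per_million_tokens / 1_000_000 * emails_per_day


# Placeholder inputs: $2 per million tokens, 5,000 tokens per email, 500 emails a day.
baseline = daily_inference_cost(2.0, 5_000, 500)
with_cache = daily_inference_cost(2.0, 5_000, 500, cached_fraction=0.5, cached_discount=0.5)
```

With these placeholder inputs, caching half the prompt at half price removes a quarter of the bill, which is why a stable prefix is usually the first lever to pull.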
Three Inngest-specific cost levers once you are in the optimization zone:
- Don't wrap pure functions in step.run. If a function has no side effects, it does not need durability; wrapping it adds a step-run charge with no benefit. Save step.run for I/O and side effects.
- Use batch_events for bulk paths. A 50-event batch is one function run, not 50.
- Suspend cheaply with step.sleep and step.wait_for_event. Suspended functions do not bill for suspension time. A 3-day delayed follow-up costs the same as a 3-second one.
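The batching lever is easy to quantify. The helper below is a rough sketch: function_runs is a hypothetical name, and it assumes every batch fills completely, which real traffic with batch timeouts will not always do.

```python
import math


def function_runs(events: int, batch_size: int = 1) -> int:
    """Function runs needed for a number of events, assuming full batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(events / batch_size)


unbatched = function_runs(5_000)          # one run per event
batched = function_runs(5_000, 50)        # one run per 50-event batch
with_remainder = function_runs(5_001, 50) # the leftover event still needs a run
```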
The shape at scale: inference is the bill that grows with traffic; Inngest, your data store, and compute stay comparatively flat. Run the same multiplication on your real volume rather than trusting any figure printed here.
Swap guide: the nervous system is the invariant, not the platform
This course names Inngest at every layer. That is because a teaching example needs concrete answers, not "use whichever orchestrator you like". But the architecture works with any compliant alternative. Five swaps the course's design explicitly anticipates:
- Trigger surface: Inngest events → Temporal signals, Restate handlers, AWS EventBridge + Lambda. Every platform has a way to express "this code runs when this named thing happens". Event names, payload shapes, and idempotency discipline all transfer. What changes: the SDK's decorator syntax and the dashboard.
- Durable execution: Inngest step.run → Temporal activities, Restate handlers, custom Postgres-backed state machines. Each gives you "memoize this side-effecting call, retry on transient failure, resume after a crash" semantics. Temporal is the closest analog and the older, more enterprise-tested option. Restate is the newest and has a more functional-programming flavor. Custom state machines are what teams write when they cannot adopt a managed platform; typically 1,000-10,000 lines of code that recreate ~70% of what Inngest gives you for free.
- HITL primitive: step.wait_for_event → Temporal's await Workflow.execute_activity(approval_signal), Restate's awakeables, custom Redis/Postgres approval queues. The pattern is the same: the function suspends, an external signal resumes it, the audit captures the decision. Inngest's expression is the cleanest to write; Temporal's is more verbose but battle-tested at large scale.
- Cron scheduling: Inngest cron triggers → Kubernetes CronJobs + queue, GitHub Actions schedules, AWS EventBridge schedules. Cron triggers are a commodity. Inngest's advantage is not having cron; it is that cron-triggered functions get the same durability, replay, and flow control as event-triggered ones, automatically. Other platforms make you wire that up yourself.
- Flow control: Inngest concurrency + throttle → Temporal task queues with worker concurrency, Redis-backed rate limiters, AWS SQS message visibility timeouts. Other platforms can do this; Inngest does it with the configuration density we saw (a decorator argument).
Dapr as the open companion at production scale. One more ambitious replacement is worth naming: Dapr Agents is Inngest's structural companion at production scale, the way OpenCode is to Claude Code. Dapr Agents reached v1.0 GA under CNCF governance on 23 March 2026 (CNCF announcement, Dapr Agents core concepts). DurableAgent is the production-ready class; the older Agent class is deprecated. Choose Dapr when Kubernetes-native deployment and multi-language SDKs matter more than Inngest's local dev experience. Inngest is the better learning tool (the dashboard makes the mental model visible); Dapr is the better scale tool once you have hit Inngest's tier ceilings or need K8s-native multi-language deployment.
Inngest is also open source (github.com/inngest/inngest; the 1.0 release added self-hosting support in September 2024) and self-hostable via Helm + KEDA. The axes that matter at scale are governance, support, and maturity: Inngest is governed by a single vendor with a young self-hosting story; Dapr is CNCF-governed with a longer production track record.
| This course's concept | Inngest primitive | Dapr production analogue | Teaching note |
|---|---|---|---|
| Scheduled work | TriggerCron | Cron input binding / Dapr Scheduler | Same idea: time wakes the Worker. Dapr usually requires component configuration. |
| Webhook/event ingress | Inngest webhook endpoint → event | HTTP endpoint, input bindings, or pub/sub ingress | Inngest hides more plumbing; Dapr gives infrastructure control. |
| Internal events | inngest_client.send() | Dapr pub/sub | Same event-driven mental model; broker is pluggable in Dapr. |
| Fan-out | One event triggers many functions | One topic/event consumed by many services | Same architecture; Dapr uses broker/topic/subscriber composition. |
| Durable steps | step.run() + memoization | Dapr Workflows + activities | Similar production purpose, different developer model. |
| Waiting without compute | step.sleep() | Durable workflow timers | Both avoid holding a process open while waiting. |
| Human approval gate | step.wait_for_event() | Workflow external events/signals, pub/sub, actors | Inngest expression is simpler; Dapr is more composable. |
| Retries | Function/step retries | Workflow/activity retries + resiliency policies | Dapr makes resiliency a runtime policy as well as workflow behavior. |
| Dead-letter / failed runs | Inngest dashboard failed runs + replay | Broker DLQ + workflow status/restart/manual tooling | Inngest is more turnkey here; Dapr is more infrastructure-native. |
| Flow control | Concurrency, throttling, priority, batching | Kubernetes scaling, app concurrency, broker controls, resiliency policies, bulk pub/sub | Dapr can do it, but it is not one decorator argument. Inngest is denser. |
| Stateful coordination | wait_for_event, event keys, step state | Actors + state store + workflows | Dapr Actors are stronger for long-lived identity/stateful coordination. |
| Agent runtime | Your agent inside Inngest function | DurableAgent / Dapr Agents v1.0 GA | Dapr Agents explicitly makes the agent workflow-backed and resumable. |
This table is a translation guide, not a claim of identical APIs. Inngest teaches the production pattern with a compact developer experience: triggers, steps, waits, replay, and flow control in one product surface. Dapr implements the same production architecture through distributed-systems building blocks: bindings, pub/sub, workflows, actors, state, resiliency, and Kubernetes-native operations. The concepts transfer directly; the implementation style changes. Verified against Dapr's bindings overview and Dapr Agents core concepts as of May 2026.
Three reasons to reach for Dapr at production scale:
- CNCF-governed, vendor-neutral by charter: no single vendor controls the platform or your dependence on it.
- Polyglot, with first-class Python. Dapr Agents is Python-first; the same agent code can run alongside services written in JavaScript, Go, .NET, Java, or PHP without anyone learning another framework.
- Horizontally scalable by design on Kubernetes. Run it in your own cluster, in a managed offering (Diagrid Catalyst), or locally via dapr init. The scaling story is the same architecture in every environment.
The honest caveat: Dapr is not a getting-started platform. Running it in production means Kubernetes, a state store, a pub/sub broker, the placement service, observability, YAML components, and sidecars. That is a lot of operational surface when your goal is still to learn the patterns, which is why this course starts on Inngest: one command, and the dashboard is up. Reach for Dapr once the patterns have landed and the question shifts to running at organizational scale on infrastructure you control.
Learn the concepts on Inngest and the OpenAI Agents SDK first: fast feedback loop, minimal infrastructure, focus on the patterns. When you reach the scale where Kubernetes governance, polyglot teams, or vendor-neutrality become non-negotiable, the same architectural patterns lift onto Dapr with the translation table above as your key. The patterns transfer; the substrate changes; what you learned in this course stays load-bearing knowledge.
What this course does not cover (yet)
The worker you built satisfies four of the Seven Invariants the thesis sets out. Specifically: it runs on an engine (Invariant 4, SandboxAgent), against a system of record (Invariant 5, the audit trail), with the world able to call it (Invariant 7, the triggers you added), and with a human as principal on one gated decision (Invariant 1, partially: the runtime mechanism is here, the broader architectural pattern comes later). The remaining three Invariants, and the broader architecture that turns workers into a workforce, belong to later courses. Bullet by bullet:
- Invariant 2: Every human needs a delegate. A personal agent at the edge that holds your context, represents your judgment, and brokers work to the workforce. The thesis names OpenClaw as the current realization.
- Invariant 3: The workforce needs a manager. An orchestrator that assigns work, enforces budgets, audits execution, and exposes hiring as a callable capability. The thesis names Paperclip.
- Invariant 6: The workforce is expandable under policy. A meta-layer where an authorized agent generates a prompt, provisions a runtime, and registers a new Worker, without waking a human. Claude Managed Agents is one realization.
A single worker that wakes on events, runs durably, and gates on humans is the smallest unit of the architecture this course teaches. The next course extends that worker into a workforce: multiple workers coordinated by a manager, expandable on demand, woken by triggers, governed by spec. The same OpenAI Agents SDK foundation, the same audit habit, the same Inngest nervous system. The architecture is invariant.
How to actually get good at this
Reading this crash course does not make you good at building Production Workers. Using it does. You start by building a worker, feel the friction as you wrap it, and let each piece of friction teach you which concept it belongs to.
This course's mapping:
- "Why doesn't my function fire when the event arrives?" → event name typo or namespace mismatch (Concept 3). Compare the event name string in your TriggerEvent byte-for-byte against the one in inngest_client.send.
- "Why did my function fire twice for the same logical event?" → missing idempotency key (Concept 4). Add an id= with a deterministic seed to the event.
- "Why did my function 'lose work' after a deploy?" → code doing work outside step.run (Concept 7). Wrap I/O and side effects in named steps.
- "Why was the customer charged twice?" → the Stripe call was outside step.run, or the step name was not unique (Concepts 6 and 7). Move the call into a named step.run; make the step name unique across the whole function.
- "Why does OpenAI return 429 errors at the 9am peak?" → missing throttle (Concept 11). Add throttle=Throttle(limit=N, period=timedelta(minutes=1)).
- "Why do one customer's bursts starve other customers?" → missing per-key concurrency (Concept 12). Add a second Concurrency(limit=2, key="event.data.customer_id").
- "Why did my HITL gate fire silently over the weekend?" → missing timeout handler that writes to the audit (Concept 15). Branch on approval is None and write the audit row explicitly.
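The first diagnostic in the list, comparing event names byte for byte, is worth automating because the usual culprits (a transposed letter, a trailing space) are hard to see by eye. A minimal sketch, with first_mismatch as a hypothetical helper name:

```python
from typing import Optional


def first_mismatch(trigger_name: str, sent_name: str) -> Optional[int]:
    """Return the byte offset where two event names diverge, or None if identical."""
    a = trigger_name.encode("utf-8")
    b = sent_name.encode("utf-8")
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One name is a strict prefix of the other, e.g. a trailing space.
        return min(len(a), len(b))
    return None


same = first_mismatch("customer/email.received", "customer/email.received")
typo = first_mismatch("customer/email.received", "customer/email.recieved")
trailing_space = first_mismatch("customer/email.received", "customer/email.received ")
```

A None result rules out the name as the cause; any integer points at the exact character to inspect in the trigger or the send call.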
Build the architecture one piece at a time. That is why Part 4 is seven prompts, not one. Build the worker (D0). Wrap the agent in step.run (D1) and see what changes when you deliberately crash mid-run. Wake it on an event (D2). Add the cron fan-out (D3), then flow control (D4) once you have actually hit a rate limit, then the durable approval gate (D5) once a high-stakes action actually needs a human. Each layer is its own lesson. Combined into one big rewrite, it is a wall.
The discipline this course teaches (wake on events, run durably, gate on humans, replay on bugs) is the architectural invariant. Whichever platform implements it, that four-property contract is what you are actually committing to. This is the Lindy bet: you build on the parts that last (plain functions, SQL, a typed language, an event bus), not on this season's wrapper. The product is replaceable; the discipline is not.
Quick reference
A separator between the narrative course and the during-build reference. The sections below are for searching, not for reading top to bottom. Each concept's one-line gist is in the intro's collapsed cheat sheet; this section is the during-build diagnostic, two decision trees, and the file layout.
Decision tree: choose a trigger surface
When something new happens in the world, where does the wake-up come from?
- Ek external system ne hamein ek HTTP request bheji. → Webhook trigger. Source ko Inngest dashboard mein configure karein; payload ko transform ke zariye reshape karein; result wale event ko consume karein.
- Ek schedule kehta hai ke waqt aa gaya. → Cron trigger.
TriggerCron(cron="..."). UTC use karein; production crons tab bhi fire hote hain jab aap ki service mid-deploy ho. - Ek doosre Inngest function ne apne run ke dauran ek event emit kiya. → Event trigger.
TriggerEvent(event="ns/name.subtype"). Ek ya many functions ko same name par subscribe karein. - Ek interactive user ek immediate response ka wait kar raha hai. → Ek Inngest trigger nahin. Request/response ko apne normal web endpoint mein rakhein; agar response mein heavy work shamil ho, to request ke andar se ek event fire karein aur foran return karein, Inngest ko kaam asynchronously handle karne dein.
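The last branch (answer the user now, do the heavy work later) can be illustrated with nothing but `asyncio`. This is a toy sketch under stated assumptions: the queue stands in for the event bus, and `handle_request`, `worker`, `demo`, and the `report/requested` event name are hypothetical names invented for the example, not part of Inngest.

```python
import asyncio
import contextlib


async def handle_request(queue: asyncio.Queue, payload: dict) -> dict:
    """Web handler: hand the heavy work off as an event and answer right away."""
    await queue.put({"name": "report/requested", "data": payload})
    return {"status": 202, "body": "accepted"}


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Background consumer: does the slow work outside the request."""
    while True:
        event = await queue.get()
        await asyncio.sleep(0.01)  # pretend this is the heavy work
        results.append(f"done:{event['data']['report_id']}")
        queue.task_done()


async def demo() -> tuple[dict, list, bool]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    consumer = asyncio.create_task(worker(queue, results))

    response = await handle_request(queue, {"report_id": "r-1"})
    responded_before_work = not results  # the reply exists before the work is done

    await queue.join()  # wait for the background work to finish
    consumer.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await consumer
    return response, results, responded_before_work
```

Calling `asyncio.run(demo())` returns the 202 response, the finished work, and a flag confirming the response was produced first. In the real system the queue and the worker are replaced by an emitted event and a durable function, which also survive a process restart; this in-memory version does not.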
Decision tree: choose a step primitive
Say a function is running and you need to do something: which `step.*` call do you reach for?
- A side-effecting call (API, DB, file write, agent invocation). → `ctx.step.run("name", fn, ...)`. The default. Memoized on success, retried on transient failure.
- A long-running OpenAI call on a serverless platform that bills for in-flight time. → `ctx.step.ai.infer(...)`. Offloads inference to Inngest's infrastructure so your function process can deallocate.
- Wait a fixed duration before continuing. → `ctx.step.sleep("name", timedelta(...))`. Durable; zero compute while waiting (up to seven days on the free plan, a year on paid).
- Wait for an external event (human approval, sibling-function completion). → `ctx.step.wait_for_event("name", event="...", timeout=..., if_exp=...)`. Durable; resumes when the event arrives, or returns `None` on timeout.
- Pure deterministic computation (formatting a string, computing a date). → Just write code. No `step.run` needed; no charge.
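"Memoized on success, retried on transient failure" is easier to trust once you have seen it in miniature. The following is a toy model in plain Python, not Inngest's implementation: `ToySteps`, `refund_flow`, `run_with_retries`, and the step names are hypothetical, and the "persisted state" is just a dict that outlives each attempt.

```python
from typing import Any, Callable


class ToySteps:
    """Toy model of step memoization: results are stored by step name and reused on re-run."""

    def __init__(self, store: dict[str, Any]) -> None:
        self.store = store  # survives across attempts, like persisted step state

    def run(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.store:
            return self.store[name]  # already succeeded: skip the side effect
        result = fn()  # may raise; nothing is stored on failure
        self.store[name] = result
        return result


calls = {"charge": 0, "email": 0}


def charge() -> str:
    calls["charge"] += 1
    return "ch_123"


def send_email() -> str:
    calls["email"] += 1
    if calls["email"] == 1:
        raise ConnectionError("transient failure")  # fails on the first attempt only
    return "sent"


def refund_flow(steps: ToySteps) -> tuple[str, str]:
    charge_id = steps.run("charge-card", charge)
    status = steps.run("send-receipt", send_email)
    return charge_id, status


def run_with_retries(store: dict[str, Any], attempts: int = 3) -> tuple[str, str]:
    """Re-run the whole function from the top after each transient failure."""
    for _ in range(attempts):
        try:
            return refund_flow(ToySteps(store))
        except ConnectionError:
            continue
    raise RuntimeError("out of retries")
```

Running `run_with_retries({})` executes the function body twice, yet `charge` runs once and `send_email` runs twice: the second attempt reads the stored charge result instead of charging again. The same reasoning explains why work done outside a step is repeated on every re-run.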
File-location quick-ref
One flat project, four files, no `src/` nesting:
ai-agent-nervous-system/
├── .claude/
│ └── skills/ # the four Inngest skills (installed in the Quick Win)
│ ├── inngest-setup/SKILL.md
│ ├── inngest-events/SKILL.md
│ ├── inngest-steps/SKILL.md
│ └── inngest-durable-functions/SKILL.md
├── db.py # Neon Postgres access: pooled asyncpg, load_customers, record (closed-vocabulary audit) (D0)
├── worker.py # the worker: SandboxAgent + 2 tools (D0)
├── inngest_app.py # the nervous system: Inngest functions + FastAPI host (D1-D5)
├── .env # OPENAI_API_KEY, DATABASE_URL, INNGEST_DEV=1
└── AGENTS.md # the base's rules file (read on open)
Customers and the audit trail live in your Neon database (provisioned in the Quick Win, seeded in D0), not in local files. The worker (`db.py`, `worker.py`) never changes after D0. Every nervous-system layer (D1 through D5) edits `inngest_app.py`.
Diagnostic table, symptom → root cause → concept
| Symptom | First suspect | Concept to re-read |
|---|---|---|
| Function never fires when expected event arrives | Event name typo, namespace mismatch | C3 (webhooks), C5 (fan-out) |
| Function fires twice for the same logical event | Missing idempotency key | C4 (idempotency) |
| Function "lost work" after deploy | Code outside step.run doing the work | C7 (memoization) |
| Cron schedule did not fire over a deploy | Local dev server only, production runs on Inngest infra | C2 (cron) |
| Customer charged twice for one refund | Stripe call outside step.run, or step name not unique | C6 (step.run), C7 (memoization) |
| OpenAI rate-limit errors during 9am peak | Missing throttle | C11 (concurrency + throttle) |
| One customer's bursts starve other customers | Missing per-key concurrency | C12 (priority + fairness) |
| Function suspended forever, never resumed | Event name in wait_for_event does not match the event being sent | C8 (wait_for_event), C15 (HITL) |
| HITL timeout fired silently over the weekend | Missing timeout handler that writes to audit | D5 (durable refund gate), C15 (HITL) |
| Yesterday's failed runs disappeared from dashboard | Runs persist until manually replayed or after retention window | C14 (replay) |
| Replay re-charged customers | Replay is a fresh run that re-executes every step; the charge had no idempotency key | C4 (idempotency), C14 (replay is a fresh run) |
| Function trace does not show OpenAI prompt | Step trace shows function inputs/outputs but no LLM-specific prompt/token telemetry | C10 (Python uses step.run; LLM-specific telemetry needs your own OpenAI client tracing; step.ai.wrap's prompt-level traces are TypeScript-only) |
Appendix: optional lineage and an Inngest cheat sheet
This course is complete on its own: Part 4 builds the worker from scratch, so nothing below is a prerequisite. Two short notes for context.
A.1: If you are coming from the Digital FTE course
The From Agent to Digital FTE course builds a richer customer-support worker: portable Skills, a Postgres system of record, and a custom MCP server. If you took it, you already have a SandboxAgent worker sitting on disk, and you can skip D0's minimal floor: point the nervous system (D1 onward) at your own worker. The wrapping is identical. A bonus: the durable refund gate you build in D5 (`step.wait_for_event`) replaces the hand-rolled run-state table from that course's optional Decision 10, so you can delete it. If you did not take that course, ignore all of this; D0 gives you everything you need.
A.2: Inngest-specific essentials this course uses
If anything below looks unfamiliar, skim the corresponding doc page before diving into Part 4.
- Inngest client instantiation. A single `inngest.Inngest(app_id=...)` instance per Python project, exported from one module and imported wherever you decorate functions. Python quick start.
- Function decoration. `@inngest_client.create_function(fn_id=..., trigger=...)`. The trigger can be `TriggerEvent`, `TriggerCron`, or a list of both for multi-trigger functions.
- `ctx.step.run`, `ctx.step.sleep`, `ctx.step.wait_for_event`, `ctx.step.ai.infer`. The four step primitives that make up 90% of what you will write in Python. (TypeScript has a fifth, `step.ai.wrap`, for LLM-specific tracing; Python projects use `step.run` for AI calls.)
- `inngest_client.send(events=[...])`. Emit events from anywhere in your code (inside functions, inside agent tools, from CLI scripts). Use an `id=` for idempotency.
- Dev server startup. `npx inngest-cli@latest dev`. Runs on `:8288`. Dashboard at `http://127.0.0.1:8288`. MCP at `http://127.0.0.1:8288/mcp`. If `:8288` is taken it uses `8289+`; then set `INNGEST_BASE_URL=http://127.0.0.1:<port>` on the host so the host follows the new port too, not just the MCP URL.
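Why an event `id=` gives you idempotency can be shown with a toy bus. This is a sketch under a stated assumption: that two sends carrying the same id are treated as one event. `ToyEventBus` and its `event_id` parameter (mirroring `id=`) are hypothetical; the real deduplication rules and time window are defined by Inngest, not by this example.

```python
from typing import Optional


class ToyEventBus:
    """Toy model of id-based event deduplication (not the Inngest client)."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.delivered: list[dict] = []

    def send(self, name: str, data: dict, event_id: Optional[str] = None) -> bool:
        """Deliver the event unless an event with the same id was already accepted."""
        if event_id is not None:
            if event_id in self.seen_ids:
                return False  # duplicate: dropped, no function would fire
            self.seen_ids.add(event_id)
        self.delivered.append({"name": name, "data": data, "id": event_id})
        return True


bus = ToyEventBus()
# Derive the id from the business key, so a retried sender produces the same id.
first = bus.send("refund/requested", {"refund_id": "r1"}, event_id="refund-r1")
second = bus.send("refund/requested", {"refund_id": "r1"}, event_id="refund-r1")
```

Here `first` is `True` and `second` is `False`, so only one event reaches subscribers. The design choice worth copying is the deterministic id: a random id per send would defeat the deduplication entirely.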
A.3: The two shifts that are actually hard
The hardest thing in this course is not Inngest's syntax. It is the mental shift from request to event (Concept 1) and from in-process execution to durable execution (Concept 6). The syntax is mechanical once those two land. If anything feels harder than it should, re-read Concepts 1 and 6 first.