Skip to main content

Digital FTE banana: 4 ghante ka crash course

15 Concepts, real use ka 80% - Skills, System of Record, aur MCP

Yeh Mode 2 — Manufacturing track ka second course hai. Pichla course, Build AI Agents, aap ko sessions, guardrails, aur sandboxed deployment ke saath streaming chat agent se guzarta tha. End par aap ke paas agent tha. Woh kaam karta tha. Lekin terminal close karte hi sab kuch bhool jata tha.

Yeh course us agent ko Worker banata hai: aisa agent jo company ke andar waqai reh sake. Same SDK. Same SandboxAgent runtime, jis compute ko aap choose karein: starter UnixLocalSandboxClient par chalta hai (zero infrastructure, sirf API key), aur Docker, Cloudflare, E2B, ya Modal sab one-line client change ke saath identically kaam karte hain (Part 5 ka Swap guide dekhein). Teen naye pieces add hote hain:

  1. Skills — portable folders jinhein agent demand par load karta hai, bajaye is ke ke har capability Python mein hard-code ho.
  2. Neon Postgres database — real, durable memory jise agent read aur write karta hai.
  3. MCP (Model Context Protocol) — woh wire jo in dono ko connect karta hai.

Yaad rakhne wali one sentence. Agent ko Worker mein badalna do sawal hain: is ki capabilities kahan rehti hain? aur is ki memory kahan rehti hai? Pehle ka jawab Skills hain. Doosre ka jawab MCP ke zariye reached database hai. Is course ki har cheez ya in dono jawabon mein se aik hai, ya in ke darmiyan wiring.

Plain English mein yeh kyun matter karta hai

Chat agent aik single Python process mein rehta hai. Us ki capabilities code mein baked hoti hain. Us ki memory process restart hote hi disappear ho jati hai. Demo ke liye yeh fine hai. Lekin aise agent ke liye fine nahin jo jawab de: "maine is customer ko six weeks pehle kya bataya tha?" ya "kya hum yeh sawal pehle dekh chuke hain?"

Worker (woh term jo hamari thesis company ke andar operate karne wale agent ke liye use karti hai) dono maslon ko fix karta hai:

  • Is ki capabilities Skill folders mein rehti hain jinhein agent zaroorat par find aur load karta hai. Nayi capability add karne ke liye redeploy nahin chahiye.
  • Is ki memory aur state Postgres database mein rehti hai jise agent audit ke under read/write karta hai, aur learned material par vector search hoti hai.

Pichla course engine banata tha. Yeh course woh cheez banata hai jis ke against engine run karta hai.

State-and-trust extended: pichle course ne agent ko in-process state aur code-defined trust diya tha; yeh course state ko Postgres system of record tak extend karta hai aur trust ko discoverable Skills ki library tak, jise MCP ke zariye wire kiya gaya hai.

Agent Factory thesis mein yeh course kahan fit hota hai

Thesis Seven Invariants name karti hai jo har production agent system ko satisfy karne chahiye. Pichle course ne engine layer (Invariant 4) cover ki: OpenAI Agents SDK as the harness, sandbox as the compute. Yeh course do aur invariants plus engine ke teen pillars mein se aik cover karta hai:

  • Invariant 5 — har Worker system of record ke against run karta hai. Engine woh hai jis par Worker chalta hai; system of record woh hai jis ke against Worker chalta hai. Neon Postgres + pgvector is ki aik realization hai. Koi bhi durable, governed store kaam karta hai — hum Neon is liye use karte hain kyun ke start free hai, zero tak scale hota hai, aur official MCP server ship karta hai. Architecture invariant hai; product replaceable hai.
  • Skills capability layer. Skills open standard hain (originally Anthropic, ab ecosystem-wide agentskills.io par) jo Worker ki capabilities ko folders mein rehne deti hain, jinhein agent demand par discover, load, aur execute karta hai.
  • MCP as the connector. Pichle course ne MCP ko lightly use kiya tha. Yeh course MCP ko agent aur database ke darmiyan key pattern banata hai.

15-concept cheat sheet

Production failures aksar teen cheezon mein se kisi aik par trace hoti hain: Skill jo agent ko kabhi mili hi nahin, database jo waqai source of truth nahin, ya MCP wire jo data drop kar deta hai. Yeh table diagnostic hai.

#ConceptLayerYeh kis sawal ka jawab deta hai
1What an Agent Skill isSkillsReusable capability kahan rehti hai? Folder mein, SKILL.md plus optional scripts/references ke saath.
2Progressive disclosureSkillsSkills ko saath rakhna cheap kyun hai? Discovery -> activation -> execution sirf zaroorat ki cheez load karta hai.
3Writing a SKILL.mdSkillsSkill file mein asal mein kya hota hai? Metadata, trigger description, operational instructions.
4Skill packaging conventionsSkillsSkills tools ke darmiyan kaise travel karti hain? Same folder Claude Code, OpenCode, aur kisi bhi compliant client mein kaam karta hai.
5Composing skillsSkillsSmall skills ko filesystem handoff ke zariye kab chain karna hai vs. aik badi skill kab likhni hai.
6Why managed PostgresSystem of recordKaunsa store "system of record" kehlane ke qabil hai? Persistence, branching, governance, aur agent-required vector primitives wala store.
7The Worker's schemaSystem of recordAgent ko asal mein kaun se tables chahiye? Conversations, messages, documents, embeddings, audit log, capability invocations.
8pgvector basicsSystem of recordSemantic search Postgres mein kaise kaam karti hai? Embedding column, distance operators, index types.
9The embedding pipelineSystem of recordText queryable vector kaise banta hai? Chunking, embedding model, re-embed kab karna hai.
10Audit trail as disciplineSystem of recordWorker ke liye "reads and writes" ka kya matlab hai? Har action trace chhorta hai jise company replay kar sakti hai.
11What MCP is and isn'tMCPTools, resources, aur prompts ke liye protocol: framework nahin, service nahin.
12The Neon MCP serverMCPAgent ka database interface: kya expose karta hai, authenticate kaise karta hai.
13Connecting MCP to the Agents SDKMCPSDK ki MCP integration: server register kaise hota hai, model kya dekhta hai, trust boundary kahan rehti hai.
14Custom MCP serversMCPApna server kab likhna hai vs. sirf @function_tool use karna. Decision tree.
15MCP under loadMCPTransport choices, connection pooling, queue kab karna hai.

Jab yeh mapping samajh aa jaye, baqi mostly mechanics hai. Production failure in mein se kisi aik par trace hoti hai: Skill discover nahin hui (description vague), system of record par do Workers agree nahin kar rahe (schema race), ya MCP wire events drop kar raha hai (workload ke liye wrong transport). Diagnostic batata hai masla kahan hai.

Yeh course kis ke liye hai

Intermediate. Aap ke paas yeh hona chahiye:

  • Ideally Build AI Agents complete kiya ho. Recommended hai, required nahin: companion zip runnable starter ship karta hai jo us course ka end state reproduce karta hai, is liye aap us ke baghair bhi begin kar sakte hain. Bas agar kiya ho to aap better samjhein ge pieces is shape mein kyun hain.
  • Agentic Coding Crash Course wali Plan-mode aur rules-file habits.
  • Aik PRIMM-AI+ cycle ka experience.
  • Postgres ka working mental model (tables, rows, SQL).

Yahan aap Python haath se type nahin karte. Aap ka coding agent (Claude Code ya OpenCode) har code block likhta hai; aap ka kaam hai usay direct karna, jo woh produce karta hai use parhna, aur drift catch karna. Yeh Mode 2 discipline hai, pichle course jaisi. Parts 2 aur 3 dense ho jate hain (Pydantic models, asyncpg pools, chhota custom MCP server), is liye agent ke saath zyada back-and-forth expect karein. Full prerequisites aur glossary Appendix A.1 aur A.5 mein hain.

Currency

May 2026 tak current. openai-agents 0.17.x, current mcp SDK, Neon ke MCP server docs, aur pgvector 0.8+ ke against verified. Build karne ke baad versions apne pyproject.toml mein pin karein. Is course ki state-and-trust architecture products badalne se nahin badalti; products is saal ka best fit hain. Jab docs aur yeh page disagree karein, docs win karte hain: Cloudflare Sandbox tutorial aur Neon docs canonical hain. Part 5 ka Swap guide per-product alternatives list karta hai (Supabase, Pinecone, Cohere embeddings, LangGraph, Docker/E2B/Modal sandboxes, aur others).

Pichle course se kya same rehta hai, kya naya hai

Jo aap ab bhi use karte hain: Agent, Runner, @function_tool, sessions, streaming events, guardrails, aur Shell() / Filesystem() capabilities ke saath SandboxAgent. Sandbox khud (UnixLocalSandboxClient learning ke liye, phir Docker, Cloudflare, E2B, ya Modal) ab bhi har us cheez ke liye trust boundary hai jo agent execute karta hai, including Skill scripts jo shell commands chalate hain.

Naya kya hai: ab yeh primitives Skills library aur Neon system of record ke upar baithte hain jo MCP se connected hai. Pichle course ke stub tools aur SQLite session gone hain.

Architecture aik open SDK (OpenAI Agents SDK, MIT-licensed), aik open connector protocol (MCP), aur replaceable managed infrastructure (execution ke liye aap ka sandbox, storage ke liye Neon) combine karti hai. Har piece doosron ko change kiye baghair swap hota hai.

Yeh Python-first course hai, predecessor ki tarah. Skills format language-agnostic hai (SKILL.md Markdown file hai chahe aap ka agent Python ho ya TypeScript) aur MCP transport-agnostic hai, lekin Part 4 mein hum wohi Python chat-agent extend karte hain jo aap ne pichle course mein banaya tha.

Apna tool choose karein, page usi ke mutabiq chalega

Pichle course ka dual-tool pattern yahan continue hota hai. Jo sections Claude Code aur OpenCode ke darmiyan differ karte hain un mein switcher hai; ek choose karein aur page visits ke across sync rehta hai.

Non-tech leaders ke liye reading path (aap build nahin karein ge, aap ko team ka scope samajhna hai)

Agar CTO ya product partner ne aap ko yahan bheja hai taake aap next planning meeting mein achhe sawal pooch sakein, to poora chapter zaroori nahin. 25-minute path:

  1. Upar callout wali "one sentence to remember" (capabilities + memory).
  2. Skill asal mein kya hai: Concept 1. Do paragraphs.
  3. System of record kyun matter karta hai: Concept 6 opening (schema details skip karein).
  4. Audit-trail discipline: Concept 10 opening.
  5. Cost shape: Part 5 ka "Cost shape of a Worker". Real numbers: aik Worker ke liye $60-$240/month, 50 ki workforce ke liye $3K-$12K.
  6. Swap guide: Part 5 ka "Swap guide". Kaun se vendor choices strategic hain (architecture stays) vs. tactical (engineers pick).

In chhe cheezon ke baad planning meeting mein le jane layak teen sawal: (1) Hamare pehle Worker ko kaun si Skills chahiye, aur har aik kis existing playbook se aati hai? (2) Sab se chhota audit trail kya hai jo ship ho kar review pass kar sake? (3) Kaun se vendor commitments 12 months ke liye lock karna theek hai, aur kaun se swappable rehne chahiye?

Yeh chapter aap ka coding agent kaise use karta hai

digital-fte-crash-course.zip download karein aur unzip karein. Zip ab complete, runnable starter project contain karta hai, aisa patch nahin jo kisi existing project mein drop karna ho. Yeh pichle course ka end state reproduce karta hai: uv sync, .env mein OPENAI_API_KEY set karein, aur aap ke paas working sandboxed chat agent hai. Aap ka coding agent session start par AGENTS.md brief pick karta hai, aur woh brief all eight decisions ke across durable rules carry karta hai: MCP boundaries, audit discipline, embedding-model invariants, canonical action vocabulary.

Part 4 ke per-decision briefs phir surgical hain — sirf THIS step mein kya change hota hai (nayi file, naya tool, nayi wiring) — jabke AGENTS.md architectural rules ko turn after turn context mein rakhta hai.

Agar aap read ke bajaye build karna chahte hain, Parts 1-3 skim karein aur Part 4 par jump karein. Pichle course ka chat agent eight surgical decisions ke through customer-support Worker mein evolve hota hai, us course ke worked example jaisi same shape mein.

Puri architecture aik line mein. Engine = OpenAI Agents SDK + aap ka sandbox (UnixLocalSandboxClient learning ke liye, Docker real local isolation ke liye, Cloudflare production ke liye; agent code teeno ke across identical hai). Capability = Skills (Part 1). Truth = Neon Postgres + pgvector (Part 2). Connector = MCP (Part 3). Part 4 ke eight decisions in chaaron ko wire karte hain. Agar is document ka sirf aik sentence stick kare, woh yahi ho.


Fifteen-minute Quick Win: pehle ek dafa successful run karein, phir samjhein ke kyun kaam kiya

Un 15 concepts ko parhne se pehle jo explain karte hain ke yeh architecture kyun kaam karti hai, is ka sab se chhota working version build karein. Is section ke aakhir tak aap ke paas yeh cheezein hongi:

  • one Postgres table holding agent ki first durable write
  • one audit row recording what agent did, in the same transaction
  • one MCP boundary agent crossed to do it
  • ek Skill folder, jo baad mein activate hone ke liye ready ho jab aap same agent ko Claude Code ke andar se run karein
  • is sawal ka working jawab: "kya system of record + MCP boundary ne mere liye waqai kuch kiya?"

Yeh Part 4 ka worked example nahin hai; woh full Worker hai, eight Decisions aur hundreds of lines. Yeh sirf ek screen hai. Agar aap ke paas sirf aik sitting hai, yeh karein, phir concepts ke liye wapas aayen jab aap samajhna chahen ke har piece is shape mein kyun hai.

Shuru karne se pehle: starter project zip mein ship hota hai. digital-fte-crash-course.zip download karein, unzip karein, uv sync run karein, aur .env mein OPENAI_API_KEY set karein. Ab aap ke paas fully runnable sandboxed chat agent hai: pichle course ka same end state, scratch se reproduced. Pichla course required nahin, lekin Build AI Agents complete karna recommended, required nahin hai taake aap samajh sakein pieces is shape mein kyun hain. Quick Win phir Skill folder, MCP server, aur project ke cli.py mein ~30 lines add karta hai. Starter unzip hone ke baad yeh do uv commands hain (uv add, phir uv run).

One precondition. Skill discovery (.claude/skills/ scan karna, frontmatter parhna, descriptions model ko present karna) Claude Code, OpenCode, aur Codex mein client capability hai. OpenAI Agents SDK ka bare Agent(...) apne aap .claude/skills/SKILL.md nahin parhta. Quick Win standalone Python chat agent build karta hai (pichle course jaisi same cli.py shape) aur "Remember this: ..." ko MCP tool ki apni docstring ke zariye route karta hai. Step 1 ka Skill folder baad mein tab activate hota hai jab aap same agent Claude Code ke andar se run karte hain; Quick Win ke standalone Python path mein woh fire nahin hota, aur point yahi hai: MCP boundary akeli architecture prove karne ke liye enough hai.

Step 1. Starter chat-agent/ project mein aik Skill folder create karein, aik file ke saath:

mkdir -p .claude/skills/log-a-note
cat > .claude/skills/log-a-note/SKILL.md << 'EOF'
---
name: log-a-note
description: Saves a short text note to the durable notes table in Postgres, along with a timestamp. Use when the user says "remember this", "save a note", "log that", or otherwise asks for something to be persisted between sessions. Returns the row_id of the saved note.
---

# Log a note

When this skill activates:

1. Extract the note text from the user's message — everything after "remember:" or "save this:" or similar trigger phrases, or the whole message if no trigger phrase is used.
2. Call the `save_note` tool with the extracted text.
3. Reply with one short sentence confirming the save and citing the row_id.
EOF

Step 2. Ek Postgres table aur ek choti audit table fresh database ya Neon branch par banayein, aisi jagah nahin jahan pehle se tables hon. (Neon ka web console use karein, ya apne control wale kisi bhi Postgres par CREATE DATABASE; abhi pgvector ki zaroorat nahin.) notes aur audit_log naam itne generic hain ke existing schema se collide ho sakte hain, aur neeche wala CREATE TABLE empty target assume karta hai.

CREATE TABLE notes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
note_text TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
action TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Step 3. Ek tool expose karne wala 30-line MCP server likhein. Full pattern Concept 14 aur Decision 6 mein dobara aata hai; yeh smallest possible version hai:

# notes_mcp.py
import json
import os
import asyncpg
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")
_pool = None

async def pool():
global _pool
if _pool is None:
_pool = await asyncpg.create_pool(os.environ["DATABASE_URL"])
return _pool

@mcp.tool()
async def save_note(text: str) -> dict[str, str]:
"""Save a note to the durable notes table. Returns the row_id."""
p = await pool()
async with p.acquire() as c:
async with c.transaction():
row_id = await c.fetchval(
"INSERT INTO notes (note_text) VALUES ($1) RETURNING id::text", text)
await c.execute(
"INSERT INTO audit_log (action, payload) VALUES ($1, $2::jsonb)",
"note_saved",
json.dumps({"row_id": row_id, "len": len(text)}),
)
return {"row_id": row_id, "status": "saved"}

if __name__ == "__main__":
mcp.run(transport="stdio")

Audit payload json.dumps() se guzarta hai, f-string se nahin. Yahan Quick Win values accident se injection-free hain (UUID aur int), lekin worked example user-influenced strings accept karega; safe pattern dono jagah same hai. row_id key load-bearing hai: Step 5 ki verification query payload->>'row_id' par join karti hai, is liye yeh key aap ke written payload mein honi chahiye.

Step 4. Agent build karein. Sab se pehle SDK, DB driver, aur MCP package install karein (starter SDK already ship karta hai, is liye practical taur par aap asyncpg aur mcp add kar rahe hain):

uv add openai-agents asyncpg mcp

Phir starter ke cli.py mein imports aur main() wiring aise dikhte hain (baqi cli.py starter jaisa ship hua hai waisa hi rehta hai):

# cli.py — Quick Win shape
import asyncio
import os

from agents import Agent, OpenAIChatCompletionsModel, Runner, set_tracing_disabled
from agents.mcp import MCPServerStdio
from openai import AsyncOpenAI

# Uncomment if your backend is not OpenAI (DeepSeek, OpenRouter, local proxy):
# set_tracing_disabled(True)


async def main():
# The OpenAI Agents SDK accepts any OpenAI-API-compatible backend.
# Set OPENAI_BASE_URL only when pointing at a non-OpenAI host.
client = AsyncOpenAI(
api_key=os.environ["OPENAI_API_KEY"],
base_url=os.environ.get("OPENAI_BASE_URL"),
)
model = OpenAIChatCompletionsModel(
model=os.environ["MODEL_NAME"],
openai_client=client,
)

async with MCPServerStdio(
name="notes",
params={
"command": "python",
"args": ["notes_mcp.py"],
"env": {**os.environ}, # inherit PATH + DATABASE_URL into the subprocess
},
client_session_timeout_seconds=30, # default 5s is too short for first call to a remote Postgres
) as notes:
agent = Agent(
name="ChatAgent",
model=model,
mcp_servers=[notes],
instructions="You are a helpful assistant. Use the tools available to you.",
)
result = await Runner.run(
agent,
"Remember this: the production deploy needs a new env var before Friday.",
)
print(result.final_output)


if __name__ == "__main__":
asyncio.run(main())

Teen details naam lena zaroori hain. env={**os.environ} sab se common Quick Win failure ka practical fix hai: agar sirf DATABASE_URL pass karein to spawned subprocess PATH lose kar deta hai, is liye kuch shells par python notes_mcp.py bhi find nahi hota. Parent env inherit karein aur subprocess cleanly start hota hai. client_session_timeout_seconds=30 SDK ke 5-second default ko override karta hai - remote Postgres par first save_note call ke liye yeh default chhota hai, kyunke call ko TLS connection open karna, fresh pool se connection lena, aur transaction ke andar do inserts chalane hote hain. 5s par client give up kar ke retry karta hai; pehli call server-side commit ho chuki hoti hai aur duplicate rows ban jati hain. set_tracing_disabled(True) (upar example mein commented) SDK ki platform.openai.com trace-upload attempt ko silence karta hai. Real OpenAI run par isay commented rehne dein; backend kuch aur ho to uncomment karein.

Step 5. Run it.

export DATABASE_URL='postgresql://…/your_db?sslmode=require'
export OPENAI_API_KEY='sk-…' # OpenAI, DeepSeek, OpenRouter, whatever you have

# Pick any model your provider serves. Common choices:
# 'gpt-4o' or 'gpt-5.5' (OpenAI), 'deepseek-chat' (DeepSeek),
# 'meta-llama-3.1-…' (OpenRouter), etc.
export MODEL_NAME='gpt-4o'

# UNCOMMENT THIS UNLESS your OPENAI_API_KEY is from OpenAI itself.
# DeepSeek, OpenRouter, and other OpenAI-API-compatible providers need
# the base URL set, otherwise the SDK will hit api.openai.com with your
# non-OpenAI key and you'll see a 401.
# export OPENAI_BASE_URL='https://api.deepseek.com/v1'

# Smoke-test the key before the full run. This should print the start of a
# JSON model list; a 401 here means the key or OPENAI_BASE_URL is wrong, and
# you learn that in one second instead of four files deep.
curl -sS "${OPENAI_BASE_URL:-https://api.openai.com/v1}/models" \
-H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200; echo

uv run python cli.py

MCP tool ka docstring hi akela routing signal hai: model "Remember this: ..." ko save_note se match karta hai, usay call karta hai, aur server dono rows aik transaction mein write karta hai. One MCP boundary, one trigger word, one transaction (neeche figure full path trace karta hai).

Quick Win flow as a single horizontal pipeline: user prompt enters SDK, which calls list_tools() on notes_mcp.py and gets back save_note(text); model matches the prompt to that tool from its docstring; MCPServerStdio routes the call over stdio to notes_mcp.py, which writes one row to notes and one to audit_log in one transaction; a single SQL JOIN at the bottom verifies the two-row commit landed.

MCP tool ka docstring poora routing kaam kar raha hai. Aik transaction dono rows likhti hai. Aik JOIN commit verify karta hai.

Verification, aik SQL query mein:

SELECT n.id, n.note_text, n.created_at, al.action, al.payload
FROM notes n
JOIN audit_log al ON al.payload->>'row_id' = n.id::text
ORDER BY n.created_at DESC LIMIT 1;

Jab ek row mein aap ka note text nazar aaye aur joined audit row mein matching row_id ke saath "action": "note_saved" ho, to architecture kaam kar rahi hai. System of record truth hold kar raha hai. Agent ne likhne ke liye MCP boundary cross ki. Audit trail replay ho sakta hai.

Agar single prompt ke baad SELECT COUNT(*) FROM notes; 1 ke bajaye 2 return kare, to pehla save_note call client-side timeout hua aur SDK ne server-side retry kiya - fix Step 4 mein client_session_timeout_seconds=30 hai (edit karne ke baad dobara run karein). Upar wala verification JOIN LIMIT 1 ki wajah se yeh hide kar deta hai, isi liye count check karna useful hai.

Yeh ek Skill, one tool, two tables, and ~50 lines of new code. Part 4 worked example scales the same pattern to three Skills, ten tables, three MCP tools, and an embedding pipeline, but the shape is identical. If this quick win worked, the rest of the course is explaining why each piece is shaped the way it is and what changes when you scale it up.

Agar kuch kaam na kare, document ke end ke qareeb "When something feels wrong" table scan karein; yeh har common failure ko us concept se map karta hai jo usay explain karta hai. Phir yahan wapas aa jayein.


Part 1: Skills, capability as portable folders

Previous course ki agent had its capabilities baked into Python. Every tool was a @function_tool-decorated function in tools.py; every behavior was an instruction string in agents.py. That worked for one demo. It does not work for a workforce. Skills extract capability out of code and into folders agent discover karta hai, loads, and executes on demand: version-controlled, sharable across agents, addable without redeploying.

Concept 1: Agent Skill asal mein kya hai

Ek Agent Skill aisa folder hota hai jisme SKILL.md file plus optional scripts, references, aur assets hote hain. Folder hi skill hai; SKILL.md us ka entry point hai. Yeh format originally Anthropic ne release kiya tha aur ab agentskills.io par published open standard hai, jahan aaj Claude Code aur OpenCode do reference clients hain aur in ke ilawa compliant clients ki list grow ho rahi hai. Compatibility claims ke liye agentskills.io specification authoritative source hai.

Minimum viable skill:

hello-skill/
└── SKILL.md

With contents:

---
name: hello-skill
description: A skill that responds with a friendly greeting tailored to the user's name and the time of day. Use when the user asks to be greeted, says hello, or starts a conversation.
---

# Hello skill

When the user greets you or asks to be greeted:

1. Determine the current local time of day.
2. Compose a friendly, time-appropriate greeting that uses the user's name if available.
3. Keep the greeting under 25 words.

Examples:

- Morning + name known: "Good morning, Sara — hope your week is starting well."
- Evening + no name: "Good evening — happy to be of help."

Yeh complete, valid skill hai. Startup par client name aur description read karta hai; body sirf tab load hoti hai jab model kisi task ke liye skill select kare. No code. No deployment. No SDK call. A folder with a markdown file.

Yahi file-on-disk property skills ko tools ke across portable, teams ke across shareable, aur kisi bhi text artifact ki tarah version-controllable banati hai. Skills na Python objects hain jinhein aap import karte hain, na API endpoints jinhein call karte hain, aur na framework primitives jinhein instantiate karte hain.

PRIMM, Predict. Given the hello-skill folder above, what does agent load into context at startup, before any user message arrives? Three options: (a) the entire SKILL.md file, including the instructions; (b) just the frontmatter name and description, nothing else; (c) nothing, since Skills load only when invoked. Confidence 1 - 5.

Answer is (b). At startup, agent sirf read karta hai the metadata for every skill in its skill library. The full body (the instructions, the examples, the references) loads on demand. Yeh progressive disclosure, and it's the next concept.

Try with AI

I want to write my first Agent Skill. Give me three skill ideas
suitable for a customer-support agent. For each, write the frontmatter
(name + description) only. The description must be specific enough
that the agent knows EXACTLY when to invoke it — vague descriptions
like "Helps with tickets" don't count. After I review your three,
ask me which I'd refine first.

Concept 2: Progressive disclosure, skill loading ka three-stage model

Aik skill wala chat agent trivial context cost rakhta hai. Fifty Skills wala Worker nahin, jab tak loading model clever na ho. Progressive disclosure wahi clever part hai. Skills teen stages mein load hoti hain; har stage pichli se zyada expensive hoti hai, aur next stage sirf tab load hoti hai jab previous stage signal de ke skill relevant hai.

Stage 1, Discovery. At startup, agent load karta hai the name and description of every available skill. The spec recommends keeping this footprint tiny: roughly 100 tokens per skill. A Worker with fifty Skills pays roughly 5,000 tokens for discovery, every turn, every conversation. That's the cost of knowing what's in the library.

Stage 2, Activation. When model decide karta hai ke task matches a skill ki description, it loads the full SKILL.md body into context. The spec recommends keeping this under 5,000 tokens per skill. Most well-written Skills sit at 500 - 2,000 tokens. The activation cost is paid only on turns that actually use the skill.

Stage 3, Execution. If the skill ki instructions reference other files (a script under scripts/, a reference under references/, a template under assets/), those load only when agent explicitly reaches for them. Some Skills never load any references; deep technical Skills might load three or four.

Progressive disclosure timeline: at startup, only the names and descriptions of all Skills are loaded (cheap, paid every turn). On activation, the full SKILL.md body loads (medium, paid only on matching turns). At execution, referenced files load on demand (variable, paid only when reached for).

A short walkthrough of what this looks like in practice. Imagine aap ka customer-support Worker has 30 Skills in .claude/skills/, each with a SKILL.md. agent starts; the runtime reads each SKILL.md and extracts the YAML frontmatter only (name and description) for all 30 files. Roughly 3,000 tokens land in model's context as a table of "available Skills." The body of each SKILL.md is not loaded; the references under references/ are not loaded; the scripts under scripts/ are not even read. model now knows what tools it has, but not how to use any of them.

user sends a message: "Bring me up to speed on ticket TKT-1042." model scans its 30-description list, matches the summarize-ticket description (which mentions "ticket ID," "bring me up to speed"), and decides this skill applies. Now the runtime loads the full summarize-ticket/SKILL.md body, about 1,500 tokens, into context. model read karta hai the operational instructions: "extract the ticket ID, call lookup_customer, call find_similar_resolved_tickets, compose a five-section summary." It follows the instructions.

Step 3 of those instructions references references/style.md, the formatting guide for summaries. model us file tak jata hai for that file; the runtime loads it on demand. The other 29 Skills' bodies, the other reference files, the scripts under scripts/: none of it ever entered model's context. Only the metadata for all 30, plus the body and one reference for the one skill that fired, was paid for. The Predict below puts numbers on that gap.

PRIMM, Predict. A Worker has 30 Skills in its library. The average SKILL.md frontmatter is 80 tokens; the average body is 1,500 tokens; each skill has on average two reference files totalling 4,000 tokens of references. On a typical turn that activates one skill and reads one reference, what is the rough context cost of the skill system? Three options: (a) ~2,400 tokens (startup metadata + one activation + one reference); (b) ~6,900 tokens; (c) ~135,000 tokens (everything loaded all the time). Confidence 1 - 5.

Answer is (b), roughly 6,900 tokens: 30 x 80 tokens for discovery (2,400), plus 1,500 tokens for the one activated skill, plus ~2,000 tokens for one referenced file. The cost scales linearly with library size for discovery, and constant per-turn for activation and execution. Yeh what makes a skill library affordable. Without progressive disclosure, 30 Skills x 1,500 tokens of body x 4,000 tokens of references = 165,000 tokens per turn just to know what agent can do. Nobody runs that.

Two consequences for skill design:

  • Descriptions are the load-bearing field. They're what fires in Stage 1 and determines whether Stage 2 happens at all. A vague description ("helps with PDFs") fires too often and burns activation tokens for nothing; a sharp description ("extracts text and tables from PDFs, fills PDF forms, merges PDFs; use when working with PDF files") fires when it should and stays quiet when it shouldn't. Most skill failures are description failures.

  • Long SKILL.md bodies are expensive. Keep SKILL.md under ~500 lines, ideally 100 - 300. Move depth into reference files. The 500-line limit isn't arbitrary; it's the threshold where activation cost starts crowding out actual reasoning context.

Try with AI

I have a SKILL.md that's grown to 1,200 lines because I kept adding
edge cases. Walk me through how to split it. The skill is a
"customer-refund-issuance" skill: it has the core refund process,
five edge cases (international, partial, post-30-days, with-restocking-fee,
fraud-flagged), three reference policies (US, EU, UK), and a Python script
that calls the payment gateway. Help me decide what stays in SKILL.md,
what moves to references/, what moves to assets/, and what stays as scripts/.

Concept 3: SKILL.md likhna, skill ka model ke saath contract

SKILL.md file ke do parts hain: YAML frontmatter (contract) aur Markdown body (operational instructions). Frontmatter agent-facing API hai; body actual doing hai.

Frontmatter, by field.

FieldRequiredConstraintPurpose
nameYes1 - 64 chars, lowercase alphanumeric + hyphens, no leading/trailing/consecutive hyphens, must match folder nameThe skill ki identifier.
descriptionYes1 - 1024 chars, non-emptyThe trigger surface. What agent read karta hai at discovery to decide whether to invoke this skill.
licenseNoLicense name or path to bundled license fileWhat terms the skill ships under.
compatibilityNo<=500 charsEnvironment requirements (intended product, system packages, network access). Most Skills don't need this.
metadataNoArbitrary key-value mappingClient-specific extensions (author, version, etc.).
allowed-toolsNoSpace-separated tool listPre-approved tools the skill may use. Experimental; support varies.

Minimum viable frontmatter is two fields:

---
name: my-skill
description: One sentence explaining what this skill does and when to use it.
---

Zyada production-shaped frontmatter:

---
name: extract-meeting-notes
description: Extracts structured action items, decisions, and follow-ups from raw meeting transcripts. Use when the user provides a transcript, meeting recording text, or asks to summarize a meeting. Produces a markdown summary with explicit sections for Decisions, Action Items (with owner and deadline), and Follow-ups.
license: MIT
compatibility: Requires Python 3.12+ and access to the OpenAI API.
metadata:
author: panaversity
version: "1.2"
---

Description is the load-bearing field. Reread the spec's good vs. poor examples and internalize the difference:

Good: "Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. Use when working with PDF documents or when user mentions PDFs, forms, or document extraction."

Poor: "Helps with PDFs."

Achhi description what batati hai ("extracts text and tables, fills forms, merges"), when batati hai ("when working with PDF documents or when user mentions PDFs, forms, or document extraction"), aur specific keywords surface karti hai jin se model match kar sake ("PDFs," "forms," "extraction"). Poor description in mein se kuch nahi kehti. Description quality single biggest determinant hai ke aap ki skill jab fire honi chahiye tab fire hoti hai aur jab nahi honi chahiye tab quiet rehti hai.

The body, by convention. No format requirements; the spec says "Write whatever helps agents perform task effectively." But three sections show up in nearly every well-written skill:

  1. Step-by-step instructions. What agent does, in numbered steps, in operational language. Not narrative ("This skill is designed to help with refunds") but imperative ("Look up the order. Check the policy. Issue the refund."). Imperative Skills run better than narrative ones.

  2. Misaals of inputs and outputs. One or two short examples that show the expected shape of input and the expected shape of output. Misaals are worth roughly five times more than descriptions for steering model behavior. Don't skip them.

  3. Common edge cases. Do ya teen cases jo happy path se obvious nahin hote; aam tor par woh cases jo production mein waqai toot chuke hon. Edge cases apni jagah real failures se earn karte hain, imagined failures se nahin.

PRIMM, Predict. Two Skills have the same name field (summarize-document) but live in different folders: one in ~/.claude/skills/ (user-level) and one in .claude/skills/ (project-level). The current task matches both descriptions. What happens? Three options: (a) agent picks one at random; (b) the project-level skill wins because of folder precedence; (c) agent load karta hai both and lets model choose. Confidence 1 - 5.

Answer depends on the client, but the prevailing convention across Claude Code and OpenCode is (b), project-level wins over user-level. Same precedence pattern as CLAUDE.md resolution, agentic-coding crash course ki rules-file behavior, and most tool configuration. The principle: more specific context overrides more general context. Two Skills with the same name from the same folder is a different problem; that one is usually an error, since folder names must match name values.

Try with AI

You have two skills in the same project:
- name: lookup-customer
description: Looks up a customer record.
- name: get-customer-profile
description: Retrieves customer profile information.

A user asks: "What's John Smith's email?"

Walk through three steps with the AI:

1. For each skill, predict whether it WILL fire on this prompt.
Confidence 1-5.
2. If both fire, the model has to pick one — predict which it picks
and why.
3. Rewrite BOTH descriptions so exactly one fires on this prompt,
and the other fires only on a clearly different prompt. State the
prompt that SHOULD fire the other skill.

Concept 4: Skill packaging, where Skills live and how they travel

Skills filesystem artifacts hain. Isi wajah se woh portable hoti hain; isi wajah se un ki packaging conventions matter karti hain. Folder structure sahi ho to aap ki skill har compliant client mein kaam karti hai; ghalat ho to kisi mein nahi.

Har tool Skills kahan dhoondta hai.

ToolProject-levelUser-level (global)
Claude Code.claude/skills/<skill-name>/SKILL.md~/.claude/skills/<skill-name>/SKILL.md
OpenCode.opencode/skills/<skill-name>/SKILL.md~/.config/opencode/skills/<skill-name>/SKILL.md
OpenCode (fallback).claude/skills/<skill-name>/SKILL.md~/.claude/skills/<skill-name>/SKILL.md

The OpenCode fallback is the load-bearing piece for portability. OpenCode reads from its own .opencode/skills/ first, but falls back to .claude/skills/ if a skill isn't found there. Practical consequence: write aap ka skill once in .claude/skills/, and it works in both tools without modification. Yeh most concrete payoff of the open Agent Skills format. The same SKILL.md you ship to a Claude Code user works for an OpenCode user, byte-for-byte.

Directory ke hisaab se full folder structure.

my-skill/
├── SKILL.md # Required: frontmatter + body. The entry point.
├── scripts/ # Optional: executable code the skill can run.
│ ├── extract.py
│ └── normalize.sh
├── references/ # Optional: deep documentation, loaded on demand.
│ ├── REFERENCE.md
│ └── policies/
│ └── us-refund-policy.md
└── assets/ # Optional: templates, schemas, lookup tables.
├── report-template.md
└── status-codes.json

Each directory has a specific job:

  • scripts/ holds executable code agent can run. Python, Bash, JavaScript: language support depends on the client and the sandbox. agent invokes scripts by relative path (e.g., scripts/extract.py). Scripts should be self-contained, document their dependencies in a comment header, and handle edge cases gracefully because agent not.

  • references/ holds deep documentation agent load karta hai on demand. The convention: keep each reference file focused on one topic (finance.md, legal.md, error-codes.md) so agent load karta hai only what the current task needs. A 5,000-token reference file agent pohanchta hai once per session is cheap; a 50,000-token reference file is not. Keep references one level deep from SKILL.md. Avoid references/category/subcategory/file.md-style nesting; the spec is explicit on this.

  • assets/ holds static resources: document templates, configuration templates, lookup tables, schemas, images. These are consumed by agent (or by scripts agent runs) but not read as instructions.

file reference syntax inside SKILL.md. Use relative paths from the skill root:

For policy details, see [the US refund policy](references/policies/us-refund-policy.md).

To extract the structured data, run `scripts/extract.py`.

The output should follow the template in `assets/report-template.md`.

agent read karta hai the SKILL.md body, sees the path, and loads the referenced file on demand. No special syntax, no link resolution gymnastics, just relative paths.

Quick check. You have a skill at .claude/skills/refund-issuance/SKILL.md that references references/policies/us.md. The skill is being invoked while agent ki working directory is /home/user/projects/customer-support. Kahan agent look for the policy file? Answer: /home/user/projects/customer-support/.claude/skills/refund-issuance/references/policies/us.md; paths are relative to the skill ki directory, not agent ki working directory. Get this wrong and aap ka skill breaks in subtle ways under deployment.

Try with AI

I'm porting a skill from Claude Code to OpenCode and want to verify
my folder layout is correct. The skill is at .claude/skills/extract-meeting-notes/
and contains SKILL.md, scripts/parse.py, and references/glossary.md.

For each of these claims, tell me TRUE or FALSE and explain why:

1. Without any changes, this skill will be discovered by OpenCode.
2. If I move the folder to .opencode/skills/extract-meeting-notes/,
Claude Code will stop discovering it.
3. The script reference inside SKILL.md should use the absolute path
~/.claude/skills/extract-meeting-notes/scripts/parse.py.
4. If two skills have the same name and one is in .claude/skills/
and the other in ~/.claude/skills/, the user-level one wins.

Concept 5: Composing Skills, when to chain small Skills vs. write one big one

Skills compose hoti hain. "weekly customer-health report" capability aik giant skill ho sakti hai jo research, drafting, formatting, aur review aik shot mein kare. Ya yeh chaar Skills (research-customer-health, draft-customer-health-report, format-customer-health-report, review-customer-health-report) ho sakti hain jo filesystem ke through aik doosre ko hand off karti hain.

Dono kaam karte hain. In ki properties bohat different hain.

One big skill. Easier to discover (one description, one match). Lower discovery cost (one entry in the discovery stage). Tighter coupling: when the activation triggers, all the steps run, in order, in one context. Hard to test in isolation. Hard to reuse a piece elsewhere. When something fails halfway through, model has to recover with all of the now-irrelevant earlier work still in context.

Bohat si small Skills. Discovery cost zyada (aik ke bajaye chaar entries). Activation orchestration cost zyada (kisi cheez ko unhein chain karna hota hai). Coupling loose: har step independently test, replace, ya reuse ho sakta hai. Jab aik fail ho, failure localized rehta hai; previous steps ke artifacts already disk par hote hain. Har step ko fresh activation milti hai, yani accumulated context pollution nahin hoti.

Composing Skills: a monolithic &#39;customer-health-report&#39; Skill (top) runs four steps in one context with one activation, vs four small Skills (bottom) handing off through tmp/ files. The monolithic version is simpler to discover but accumulates context across steps; the chained version pays four activations but each one starts fresh, can be reused alone, and leaves intermediate artifacts on disk for debugging.

Practice mein kaam karne wala decision rule:

Aik skill likhein agar steps tightly coupled hain aur isolation mein rarely reuse hote hain. Bohat si Skills likhein agar koi bhi step kisi different workflow se apne aap invoke ho sakta hai, ya steps ke darmiyan context-isolation orchestration simplicity se zyada valuable hai.

Customer support karne wale Worker ke liye summarize-ticket shayad apni skill honi chahiye (insan use karte hain, escalation flow use karta hai, weekly digest use karta hai, audit replay use karta hai). escalate-to-tier-2 bhi shayad apni skill honi chahiye (escalation criteria aur tone requirements reuse hoti hain). Decomposition value-of-isolation versus cost-of-orchestration hai; do ya teen composed steps ke baad isolation usually jeet jata hai.

Filesystem handoff pattern. Jab Skills compose hoti hain, sab se clean handoff filesystem hota hai, conversation nahi. Skill A apna output tmp/research-customer-{id}.md mein likhti hai. Skill B invoke hoti hai, tmp/research-customer-{id}.md se parhti hai, aur tmp/draft-customer-{id}.md mein likhti hai. Skill C draft parhti hai aur report likhti hai. Conversation sirf final report dekhti hai; intermediate artifacts disk par rehte hain jahan agent unhein dobara parh sakta hai, human inspect kar sakta hai, aur audit trail baad mein unhein dhoond sakta hai. Yeh wohi pattern hai jo previous course ne subagent output isolation ke liye use kiya tha: same insight, skill granularity par apply hua.

customer-health-pipeline/
├── tmp/ # ephemeral handoff dir
│ ├── research-customer-{id}.md # skill A output
│ ├── draft-customer-{id}.md # skill B output
│ └── reviewed-draft-customer-{id}.md # skill C output
└── output/
└── customer-health-{id}.pdf # skill D final

Yeh also where the next two parts of the course start to matter. Some skill handoffs don't belong on filesystem: they belong in system of record. The "customer health" snapshot from Skill A doesn't just feed Skill B; it's also a record the company needs to keep, to query later, to find similar cases against, to audit if the customer complains six weeks later. A skill that writes to a temp file is a draft. A skill that writes to system of record hai an action. That distinction is what Part 2 builds.

Try with AI

Compare two design approaches for a customer-refund-issuance workflow:

Design A: One big skill called "issue-refund" that handles eligibility check,
policy lookup, amount calculation, gateway call, ticket update, and customer
notification.

Design B: Five small skills (check-refund-eligibility, lookup-refund-policy,
calculate-refund-amount, call-payment-gateway, update-support-ticket,
notify-customer) chained via filesystem handoffs in tmp/.

For each design, name (1) one situation where it's the right choice and
(2) one specific failure mode it's vulnerable to. Then tell me which
design you'd ship and why.
Writing Skills for less-disciplined models

concepts 1-5 describe how Skills should work when model is a strong instruction-follower (Claude Sonnet/Opus, GPT-5-class, Gemini 2.5 Pro). With a smaller or cheaper model (deepseek-chat, Haiku-class, Llama-70B, Mistral, most local models), three things drift:

  1. Multi-skill sequencing. SKILL.md imperatives like "ALWAYS run this BEFORE drafting" or "call X, then Y, then Z" land reliably on the strong models and unreliably on the weaker ones. The fix is to add a short GENERAL-FLOW preamble in agent ki system prompt spelling out the order. SKILL bodies stay declarative; system prompt provides the orchestration scaffold.
  2. Format drift. A weaker model will silently add emojis, markdown tables, "Action Taken" headers, or paraphrase aap ka inputs even when the SKILL body says "output as five paragraphs, no tables, preserve user's text verbatim." Be more explicit on a weaker model: list what NOT to do, not just what to do.
  3. Trigger blindness. Descriptions that fire on "summarize ticket TKT-1042" may miss "what's the story on #1042" on a smaller model. Write descriptions with more concrete trigger phrasings (Concept 3's discipline matters more, not less, when model is weaker).

Rule of thumb: budget the strong model's effort into the SKILL.md, budget the weak model's effort into system prompt. The fact that the architecture survives this drift with extra scaffolding is itself a vote for the architecture: the SKILL/MCP/SoR boundaries are stable; only the prompting around them changes.


Part 2: Neon Postgres + pgvector as system of record

Part 1 gave agent capabilities it can load on demand. Those capabilities are useless without state to work against: the customer history, the prior conversations, the policy library, the audit log. Previous course ki agent kept state in a local SQLite file (the session DB) and stub data in Python lists. That works for a demo. It doesn't work for a Worker that has to answer "what did we tell this customer six weeks ago?" or "have we seen this question before, and what did the resolution look like?"

Yeh Part dono ko system of record se replace karta hai: thesis ki architectural term us "authoritative store of state" ke liye jahan se workforce read karti hai aur jahan write karti hai. Hum pgvector extension ke saath Neon Postgres use karte hain. Architecture invariant hai; Neon product hai. Koi bhi durable, addressable, governed Postgres requirement satisfy karta hai.

First pass ke liye skim path. Part 2 dense hai (six tables, SQL, embedding pipelines, vector index math). Agar aap ka Postgres rusty hai, to Concepts 6 aur 7 shape samajhne ke liye parhein aur 8-10 skim karein. Minimum-viable Worker ko day one par tamam six tables ki zaroorat nahi hoti: messages + embeddings architecture kaam karti mehsoos karne ke liye enough hain. Jab real library embed karni ho to documents add karein; jab pehli dafa "agent ne last Tuesday 3am par kya kiya?" ka jawab chahiye ho to audit_log + capability_invocations; aur jab ek user ki multiple conversations hon to conversations. Neeche six-table schema woh jagah hai jahan aap end up karte hain, woh jagah nahi jahan shuru karte hain.

Concept 6: Why managed Postgres, and why Neon specifically

Thesis is product-agnostic about systems of record: "the AI-Native company's existing databases, workflows, and operational platforms (CRMs, ERPs, ticketing systems, data warehouses, ledgers) serve as system of record." For agent you're building from scratch, though, aap ko chahiye to pick something. The question is not "Postgres vs. MongoDB vs. a vector DB"; it's "which Postgres."

Dedicated vector database ke bajaye Postgres kyun. Teen reasons jo 2026 mein bhi hold karte hain.

  1. One database, one transaction, one auth boundary. A vector DB plus a relational DB means two stores to keep consistent, two auth systems to scope, two backup pipelines to maintain. The pgvector extension puts vectors next to user records, ticket records, and audit logs they're related to: a JOIN is a JOIN, not a network hop between two eventually-consistent services. Every major managed Postgres provider (AWS RDS, Google Cloud SQL, Azure, Supabase, Neon) ships pgvector by default; Neon alone reports 30,000+ databases running it. For most workloads, it's the empirical answer to "is this enough." It almost always is.

  2. Postgres hard parts already karta hai. Transactions, indexes, foreign keys, row-level security, point-in-time recovery, query planning. Dedicated vector DB ko yeh sab scratch se invent karna padta hai aur aksar kuch cheezein worse karta hai. Default boring choice ke compounding advantages hote hain.

  3. MCP servers exist for Postgres at every layer. Neon ships one (for management). General Postgres MCP servers exist (for SQL execution). aap likh sakte hain apna (for scoped runtime access). The MCP ecosystem around Postgres is the most mature.

Neon specifically kyun: teen differentiators.

  • Scale-to-zero ke saath serverless. Idle hone par free tier kuch cost nahin karta. Jo Worker roz 50 conversations handle karta hai, woh apna zyada waqt $0 cost karte hue guzarta hai, provisioned instance ke $50/month par nahin. Multi-Worker AI-native company ki economics ke liye yeh critical hai, kyun ke bohat se Workers bursty hote hain.

  • Branching. A Neon database branches in seconds: a full copy-on-write clone of aap ka production data, ready to query. The intended use is dev/test environments and migration safety; agent-relevant use is letting agent experiment on a branch without touching production. A migration that goes wrong on a branch is reverted by deleting the branch. The same operation on a non-branchable database is a restore from backup.

  • First-class MCP integration. Neon ships an official MCP server (remote at https://mcp.neon.tech/mcp, OAuth-authed) that exposes project management, branch lifecycle, SQL execution, and migration tooling to any MCP client. We'll use this for development. Critically (and we'll return to this) Neon explicitly does not recommend the Neon MCP server for production runtime use. It's a powerful natural-language interface; production agents talk to Postgres through narrower, scoped MCP servers you write khud, or through direct connections. The distinction matters; Part 3 makes it operational.

Quick check. Agla paragraph parhne se pehle teen claims ko True ya False mark karein: (a) Neon MCP server production AI Workers ke runtime database connection ke liye intended hai. (b) Production se branched Neon database apne andar already saara production data le kar start hota hai. (c) Jis Neon database ne aik ghante se query receive nahin ki, woh free tier par ab bhi aap ko paisa cost kar raha hai.

Jawabat: (a) False: Neon ki apni documentation kehti hai MCP server sirf development/testing ke liye hai. (b) True: branches copy-on-write hoti hain, is liye branch data movement ke baghair production ke logical clone ke taur par start hoti hai. (c) False: scale-to-zero ka matlab idle databases free tier par kuch cost nahin karte. Yeh teen jawabat batate hain ke Neon SoR ke taur par operationally kyun fit hota hai.

Try with AI

Your customer-support Worker handles EU customers. Your legal team
says: "All customer data must stay in Frankfurt." Neon's free tier
offers US-East regions only; their paid tier offers EU regions.

For each of these three approaches, decide GO or NO-GO and give ONE
sentence of justification:

A) Use Neon's paid tier in an EU region (pay-as-you-go, typically $25–75/month for a single Worker per [Neon's pricing](https://neon.com/pricing)) and ship.
B) Use Postgres on a Frankfurt VPS you manage yourself (no pgvector
MCP server, no branching, no scale-to-zero) and ship.
C) Use Neon's free US tier; embed everything client-side before
sending; tell legal it's "encrypted at rest."

Then say which one you'd actually pick and what you'd ask legal
before committing.

Concept 7: The Worker's schema, what tables agent actually needs

Worker ka system of record sirf "conversations kahin store kar do" nahi. Yeh structured schema hai jo thesis ke mutabiq har Worker ki chaar cheezen support karta hai: read truth, write outcomes, leave traces, find similar prior work. Six tables 90% case cover karti hain. Domain specifics ke liye aap aur add kar sakte hain; load-bearing six yahi hain.

-- 1. CONVERSATIONS: the top-level unit of work
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ended_at TIMESTAMPTZ,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
-- searchable summary; populated by the agent at conversation end
summary TEXT
);
CREATE INDEX idx_conversations_user ON conversations(user_id, started_at DESC);

-- 2. MESSAGES: the turns of a conversation
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'tool', 'system')),
content TEXT NOT NULL,
-- token counts let you reason about cost without re-counting
input_tokens INT,
output_tokens INT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_messages_conv ON messages(conversation_id, created_at);

-- 3. DOCUMENTS: the agent's reference library
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source TEXT NOT NULL, -- 'policy_library', 'kb_article', 'past_case', etc.
title TEXT NOT NULL,
body TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_documents_source ON documents(source);

-- 4. EMBEDDINGS: vector representations of documents AND past conversations
CREATE TABLE embeddings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- one of these is populated; the other is NULL
document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
chunk_text TEXT NOT NULL,
chunk_index INT NOT NULL,
embedding VECTOR(1536) NOT NULL,
model TEXT NOT NULL, -- 'text-embedding-3-small', etc.
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CHECK (
(document_id IS NOT NULL)::int + (conversation_id IS NOT NULL)::int = 1
)
);
-- the load-bearing index for semantic search; see Concept 8
CREATE INDEX idx_embeddings_hnsw
ON embeddings USING hnsw (embedding vector_cosine_ops);

-- 5. AUDIT_LOG: every action the Worker takes, replayable forever
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
conversation_id UUID REFERENCES conversations(id) ON DELETE SET NULL,
actor TEXT NOT NULL, -- 'worker:customer-support', 'system', etc.
action TEXT NOT NULL, -- see "canonical action vocabulary" below
target TEXT, -- table name, skill name, etc.
payload JSONB NOT NULL, -- the data of the action
result JSONB, -- what happened
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_conv ON audit_log(conversation_id, created_at);
CREATE INDEX idx_audit_action ON audit_log(action, created_at);

-- 6. CAPABILITY_INVOCATIONS: every skill or tool call, for replay and metrics
CREATE TABLE capability_invocations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
capability TEXT NOT NULL, -- 'skill:summarize-ticket', 'tool:search_docs', etc.
arguments JSONB NOT NULL,
result JSONB,
status TEXT NOT NULL CHECK (status IN ('ok', 'error', 'timeout')),
latency_ms INT,
cost_cents INT, -- approximate cost in 1/100 cents
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_cap_conv ON capability_invocations(conversation_id, created_at);

Worker schema: six tables. Conversations and messages hold the dialogue. Documents and embeddings hold the reference library and its vectors. Audit log and capability invocations hold the trace. Foreign keys link them into one consistent system of record.

Design choices par kuch notes, kyun ke har aik load-bearing hai:

  • Documents aur conversations dono ke liye aik embeddings table. A CHECK constraint ensure karta hai ke document_id ya conversation_id mein se exactly aik set ho. Is se aap aik query mein "policies AND past conversations" ke across semantic search kar sakte hain; sawal "kya hum is ka jawab pehle de chuke hain?" ko do nahin, aik index milta hai.

  • audit_log uses BIGSERIAL, not UUID. Audit rows are written constantly; the sadar integer key keeps inserts tez and ordering trivial. The other tables use UUID because rows leave database (in API responses, in URLs) and UUIDs avoid leaking row counts.

  • capability_invocations separates Skills from tools. A skill invocation and a @function_tool invocation are conceptually similar but operationally different (different code paths, different cost profiles, different failure modes). Storing them in one table with a capability discriminator lets you query both together when asking "what did agent do?" while still splitting them when asking "what did Skills cost vs. tools?"

  • metadata JSONB columns are escape hatches. The schema can't predict every domain-specific field aap ko chahiye; JSONB lets you add fields without migrations. Use sparingly: fields that appear in many queries should be promoted to columns.

Yeh minimum operational schema for a Worker: three tables for the work (conversations, messages, documents), one for the embeddings, two for the trace (audit_log, capability_invocations). aap add more for domain specifics: a customers table, a tickets table, a policies table with versioning. Those are the same shape: relational data agent read karta hai and writes via MCP.

PRIMM, Predict. A Worker handles 200 conversations/day, each averaging 10 messages, with 30% triggering one skill invocation and 50% writing two audit rows beyond the skill row. After one month (30 days), roughly how many rows are in each of the six tables? Three options: (a) similar volumes across all six (~6,000 - 60,000 each); (b) audit_log dwarfs the others by 10x - 50x; (c) embeddings dwarfs everything because each message gets embedded. Confidence 1 - 5.

Answer is (b): audit_log will be the largest by a wide margin because every interaction produces multiple audit rows: message sent, skill invoked, db write, db write again, conversation closed, etc. A rough estimate: conversations ~6,000; messages ~60,000; capability_invocations ~1,800; audit_log probably 150,000 - 300,000. Plan aap ka retention and indexing accordingly; audit_log is the table aap partition first as you grow.

Try with AI

I want to extend this schema for a customer-support Worker that
handles software bug reports specifically. What three additional
tables would you add, and what columns would they have? For each,
say what the agent will use it for (read access? write access?
both?) and what foreign keys connect it to the six core tables.

Concept 8: pgvector basics, types, distance operators, indexes

Upar wali embeddings table agent ki memory ko semantically searchable banati hai. Is ke peeche technology pgvector hai: Postgres extension jo vector data types aur similarity-search operators add karta hai. Yeh har major managed Postgres provider par default ship hota hai aur zyada tar workloads ke liye "kya separate vector DB use karun?" ka default-choice jawab hai: nahi.

The vector type. VECTOR(n) is a fixed-length floating-point column. n is the dimensionality of aap ka embeddings: 1536 for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large, varies for other models. The dimension must match model that produced the embedding. Mixing dimensions is the most common pgvector bug: insert 1536-dim vectors from model A, query with 1536-dim vectors from model B, and the results are nonsense even though every query "works."

2000 se upar dimensions ke liye HALFVEC type half-precision floats use karta hai (32-bit ke bajaye 16-bit, yani har value mein kam precision) aur minor recall cost ke saath storage lagbhag aadhi kar deta hai. Hamare 1536-dim case ke liye plain VECTOR(1536) fine hai.

Teen distance operators. pgvector exposes three ways to measure how similar two vectors are. Pick one for aap ka use case and stick with it; mixing is a recipe for confusion.

OperatorNameWhat it measuresKab use
<->L2 distance (Euclidean)Straight-line distance in n-dimensional spaceImage embeddings, geometric similarity
<#>Negative inner productDot product (negated)When aap ka embedding model produces un-normalized vectors and you care about magnitude
<=>Cosine distanceAngle between vectors, regardless of magnitudeText embeddings: the default for our case

OpenAI ke text-embedding-3-small aur text-embedding-3-large normalized vectors produce karte hain, jis ka matlab cosine distance aur Euclidean distance equivalent rankings dete hain. Text embeddings ke liye convention ke taur par cosine distance (<=>) use karein; yeh sab se common, sab se documented hai, aur is ke index ops vector_cosine_ops named hain.

A semantic-search query, in full:

-- Find the 5 documents most similar to the user's question
SELECT d.id, d.title, d.body,
e.embedding <=> $1 AS distance
FROM embeddings e
JOIN documents d ON d.id = e.document_id
WHERE e.document_id IS NOT NULL -- excludes conversation embeddings
ORDER BY e.embedding <=> $1 -- smaller distance = more similar
LIMIT 5;

$1 here is the embedding of user's question, generated at query time. The <=> operator is what makes this O(n) into O(log n) given the right index.

Do index types: HNSW aur IVFFlat. pgvector ke paas do practical index choices hain. 2026 tak recommendation converge ho chuki hai:

  • Start with HNSW unless you have a reason not to. Yeh graph-based hota hai (database query time par in-memory neighbor graph walk karta hai), build slowly hota hai lekin queries fast hoti hain, operator support broad hoti hai, aur performance predictable rehti hai. New projects ke liye default.
  • Use IVFFlat if build time matters more than query time. Yeh partition-based hota hai (vectors buckets mein cluster hote hain aur phir turn by turn search hote hain), HNSW se 5-6x faster build hota hai, lekin queries slower hoti hain. Jab index frequently re-build karna ho ya insert-heavy workload ho to useful.
  • DiskANN exists (Timescale ki pgvectorscale extension ke zariye, separate add-on) un bohat large indexes ke liye jo RAM mein fit nahi hote. Aap ko almost certainly is ki zaroorat nahi.

Upar wale schema ka HNSW index:

CREATE INDEX idx_embeddings_hnsw
ON embeddings USING hnsw (embedding vector_cosine_ops);

Two tuning knobs aap set if you tune at all: m (the number of connections per node, default 16) and ef_construction (the build-time effort, default 64). The defaults are sensible for most workloads; only touch them when you've benchmarked.

Quick check. Three claims to mark True or False. (a) aap ke paas ho sakte hain multiple HNSW indexes on the same embedding column, one per distance operator. (b) Inserting a vector into a table with an HNSW index incurs more cost than inserting into a table with no vector index. (c) An HNSW index can be created before any data is loaded into the table. Answers: all three are True. Pgvector lets you index for multiple distance operators (rare but possible), the index maintenance cost is real (which is why some workloads prefer batch-insert-then-index), and HNSW (unlike IVFFlat) doesn't need a training step.

Try with AI

Two scenarios. For each, pick HNSW or IVFFlat and justify with one
specific property of the index:

Scenario A: A research index of 10M scientific papers. Built once,
queried millions of times. Build time is "whatever it takes —
overnight is fine." Query latency directly affects user experience.

Scenario B: A live index of customer support tickets that's
re-indexed every 4 hours because thousands of new tickets stream in.
Query patterns are simple (top-5 nearest neighbors). The current
HNSW build takes 20 minutes — a third of the re-index cycle.

After you answer: name ONE thing that would change your answer for
each scenario. Be specific about what you'd need to see in
production metrics before switching.

Concept 9: The embedding pipeline, text in, queryable vector out

Embeddings magic se appear nahin hoti; aap embedding model call kar ke unhein generate karte hain. Pipeline straightforward hai, magar har step par decision point hai jo ghalat ho to kaat leta hai.

The four-step pipeline.

  1. Document ko itne chhote pieces mein chunk karein ke coherently embed ho sake.
  2. Embedding model call kar ke har chunk ko embed karein.
  3. Chunk text, embedding, aur metadata ko embeddings table mein store karein.
  4. Query by embedding user's question and finding nearest neighbors.

The embedding pipeline: raw policy document (top-left) is chunked at semantic boundaries into ~400-token pieces with overlap; each chunk is batch-embedded into a 1536-dim vector via OpenAI&#39;s text-embedding-3-small; vectors are stored as VECTOR(1536) rows in the embeddings table with an HNSW index. At query time (bottom), user&#39;s question goes through the same embed step, then a single SQL query with the cosine-distance operator finds the top-5 nearest chunks. Three traps to avoid are called out at the bottom: model mismatch, wrong chunk size, mixing sources without filter.

Chunking. Embedding models have token limits: OpenAI's text-embedding-3-small accepts up to 8,191 tokens per chunk, far more than you typically want. The question isn't "what's the maximum" but "what's the chunk size that preserves the unit of meaning user will search for." Three rules that work in practice:

  • Chunk at semantic boundaries when aap kar sakte hain. Section headings, paragraph breaks, structural markers in aap ka source. A chunk that ends mid-sentence retrieves badly.
  • Target 200 - 500 tokens per chunk for most text. Long enough to carry meaning, short enough to be specific. documents with longer chunks tend to "match everything weakly" rather than "match the right thing strongly."
  • Include 10 - 20% overlap between chunks for chunks that fall across natural boundaries. The overlap costs storage but improves recall when user's question lies near a chunk boundary.
  • Don't chunk what's already short. documents already under ~300 tokens (single resolved tickets, short policy paragraphs, FAQ entries) don't need chunking; embed them as one piece. The chunker below is shaped for longer source documents (policy PDFs, knowledge-base articles, conversation transcripts). For the worked example's seed corpus of short resolved tickets, aap skip kar sakte hain the chunker and embed each ticket as a single chunk (Decision 5 stores each ticket as a documents row whose body is summary || '\n' || resolution, and embeds that).

Worked example ke liye typed chunking function:

# src/chat_agent/embedding/chunker.py
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
text: str
index: int
source_offset: int # character offset in the original document

def chunk_text(
text: str,
target_tokens: int = 400,
overlap_tokens: int = 60,
) -> list[Chunk]:
"""Split text into overlapping chunks at paragraph boundaries.

Approximates tokens as words * 0.75 — fine for chunking; not for
actual token counting. Use tiktoken for precise counts.
"""
paragraphs: list[str] = [p.strip() for p in text.split("\n\n") if p.strip()]
target_words: int = int(target_tokens / 0.75)
overlap_words: int = int(overlap_tokens / 0.75)

chunks: list[Chunk] = []
current: list[str] = []
current_word_count: int = 0
source_offset: int = 0

for para in paragraphs:
para_words: int = len(para.split())
if current_word_count + para_words > target_words and current:
chunks.append(Chunk(
text="\n\n".join(current),
index=len(chunks),
source_offset=source_offset,
))
# carry overlap forward
overlap_chunk: list[str] = []
overlap_count: int = 0
for prev in reversed(current):
if overlap_count + len(prev.split()) > overlap_words:
break
overlap_chunk.insert(0, prev)
overlap_count += len(prev.split())
current = overlap_chunk
current_word_count = overlap_count
source_offset += sum(len(p) for p in current) + 2 * len(current)
current.append(para)
current_word_count += para_words

if current:
chunks.append(Chunk(
text="\n\n".join(current),
index=len(chunks),
source_offset=source_offset,
))
return chunks

Embedding. Call model with batched chunks (the OpenAI API accepts arrays of inputs in one request, much more efficient than one call per chunk):

# src/chat_agent/embedding/embedder.py
from openai import AsyncOpenAI

EMBEDDING_MODEL: str = "text-embedding-3-small"
EMBEDDING_DIM: int = 1536 # the table column must match

async def embed_chunks(
client: AsyncOpenAI, chunks: list[str],
) -> list[list[float]]:
"""Embed a batch of chunks. Returns one vector per chunk, in order."""
response = await client.embeddings.create(
model=EMBEDDING_MODEL, input=chunks,
)
return [item.embedding for item in response.data]
pgvector ko asyncpg ke saath register karein, warna INSERTs silently fail ho sakti hain

embed_chunks jo vectors return karta hai woh Python list[float] hote hain. Out of the box, asyncpg ko nahi pata hota ke pgvector ka VECTOR(1536) column inhein vector literal ke taur par serialise chahta hai, is liye woh inhein Postgres real[] array ke taur par bhejta hai aur INSERT ya to expected vector, got list[float] error deta hai ya, worse, aisi value silently store kar deta hai jo query time par fail hoti hai. Embeddings likhne se pehle har connection par type aik dafa register karein:

from pgvector.asyncpg import register_vector

# Option A: register on every connect
conn = await asyncpg.connect(NEON_DATABASE_URL)
await register_vector(conn)

# Option B (recommended for production): pool init callback so every pooled
# connection registers automatically, no chance of forgetting.
pool = await asyncpg.create_pool(
NEON_DATABASE_URL, init=register_vector,
)

Yeh sirf un connections par chahiye jo embedding column read ya write karte hain. Decision 5 neeche pool form use karta hai; Decision 6 ka MCP server bhi same pattern use karta hai.

What if your inference provider isn't OpenAI? OpenAI hi woh major provider hai jo first-class embeddings API bhi ship karta hai. Agar aap inference DeepSeek, Anthropic, Gemini, ya local model ke through route karte hain, to aap ke paas chaar options hain. VECTOR(n) column create karne se pehle aik choose karein: column dimension embedding model se match karni chahiye.

ModelDimensionsCost (input, per 1M tokens)Where it livesKab use
text-embedding-3-small1536$0.02OpenAI (and OpenAI-compatible aggregators like OpenRouter)The default if you have an OpenAI key. Cheap, tez, good for most retrieval.
text-embedding-3-large3072$0.13 (model card) / $0.065 (pricing page)OpenAIWhen you've measured -small underperforming.
embed-english-v3 / embed-multilingual-v31024$0.10CohereWhen you're already on Cohere for inference, or when multilingual recall matters.
voyage-3 / voyage-3-lite1024$0.06 / $0.02VoyageEmbeddings-as-a-service; integrates cleanly with Anthropic-shaped stacks.
all-MiniLM-L6-v2 / bge-small-en-v1.5 (local)384 / 384"free" (aap ka compute)sentence-transformers package; runs CPU-only with no API callWhen aap ka inference provider has no embeddings API (DeepSeek, most local LLMs), or when data residency forbids sending text to a third party.

Headline cost number: 50,000 chunks ko ~300 tokens each par embed karna = 15M tokens x $0.02/M = OpenAI par $0.30. -large ke saath wohi $1.95 hai. Local 384-dim model ke saath $0 plus 30 seconds CPU. Embeddings bill ki sab se cheap line hoti hain; choice rarely dollar dial move karti hai, lekin architecture zaroor move karti hai.

Dimension hi contract hai. VECTOR(1536) sirf 1536-dim vectors accept karta hai. Agar aap text-embedding-3-small (1536) se all-MiniLM-L6-v2 (384) par switch karte hain, to column ko VECTOR(384) ke taur par dobara create karte hain aur har row re-embed karte hain. pgvector mein "close enough ho to fit ho jaye" nahin hota; dimensions absolute hain.

Re-embedding. Aap re-embed kab karte hain? Teen triggers:

  1. Source document badal gaya. Jis document_id ki embeddings match karti hain, un sab ko delete aur re-insert karein.
  2. Embedding model badal gaya. Yeh lifetime wali migration hai; agar aap -small se -large par switch karte hain, har existing embedding new embeddings se incompatible hai. Sab kuch re-embed karein, ya transition ke dauran do embedding columns chalayein.
  3. The chunking strategy changed. If you decide 400 tokens was wrong and 250 is right, re-chunk and re-embed. Versioning aap ka chunks (storing chunk_strategy_version in the metadata JSONB) lets you do this safely.

PRIMM, Predict. You've embedded 100,000 chunks with text-embedding-3-small. You then decide to also embed all messages (not just documents) so agent can do "have we discussed this before?" lookups. You write the message embeddings into the same embeddings table with the same column. A semantic search query (find the 5 nearest neighbors to user question, no filter) comes back with mixed document and message results. Is this what you wanted? What's the right query shape? Confidence 1 - 5.

Answer: almost certainly not what you wanted. Mixing documents and messages in retrieval results without distinguishing them produces incoherent answers: model dekhta hai a top result that's a customer's prior message and treats it as authoritative documentation. The right pattern is to filter by source type in the query: either by joining and filtering on WHERE document_id IS NOT NULL for document search, or by running two separate searches and ranking them differently. The schema's CHECK constraint that exactly one of document_id/conversation_id is set makes this filter cheap and the intent explicit.

Debugging poor retrieval. Kabp-k results don't match expectations, four checks in order, each ruling out one common cause:

SymptomLikely causeFix
All top-5 distances are > 0.7 (cosine distance, so anything > 0.5 is "far")Query embedding model differs from corpus embedding modelConfirm both went through the same EMBEDDING_MODEL. Mixing text-embedding-3-small with -large, or local with OpenAI, produces nonsense ranks across both pools.
Top-5 results are all from one source type when you expected varietySource-type filter missingAdd WHERE document_id IS NOT NULL (or the right side of the CHECK constraint) to the query, or split into two ranked queries and merge them with explicit weights.
Top-k changes wildly between near-identical queriesChunk size too small (each chunk lacks enough context)Re-chunk with larger chunks (200, then 400, then 600 tokens) and re-embed. Re-running the same query should now produce stable top-k.
Query returns nothingHNSW index missing, or vector dimension mismatch\d+ embeddings in psql to confirm the column is VECTOR(1536) and the HNSW index exists with vector_cosine_ops. If the column is the wrong dimensionality, re-create it.

Teen diagnostic queries jo scratch.sql mein rakhne layak hain:

-- 1. Distance band on the top-5 for a known-good query
SELECT chunk_text, embedding <=> :query_vec AS distance
FROM embeddings ORDER BY distance LIMIT 5;

-- 2. Source-type breakdown of top-20 (catch the "wrong source dominates" bug)
SELECT
CASE WHEN document_id IS NOT NULL THEN 'document' ELSE 'message' END AS src,
COUNT(*)
FROM (SELECT * FROM embeddings ORDER BY embedding <=> :query_vec LIMIT 20) t
GROUP BY src;

-- 3. Confirm the index is actually being used
EXPLAIN (FORMAT TEXT) SELECT id FROM embeddings ORDER BY embedding <=> :query_vec LIMIT 10;

Retrieval quality Worker accuracy ka silent killer hai. Final answer bilkul reasonable sunai de sakta hai jab ke wrong evidence cite kar raha ho; isay pakarne ka sirf aik tareeqa hai: final answer se pehle retrieval evaluate karein.

Try with AI

I'm chunking a corpus of legal contracts (each averaging 8,000 words)
for semantic search. The user will query things like "what's the
termination clause in this contract" — phrases that map cleanly to
specific sections. Walk me through three chunking strategies:

A) Fixed 400-token chunks with 60-token overlap (the default)
B) Chunk at section headings only, with no overlap
C) A two-level approach: store both 400-token chunks AND
whole-section chunks, search both, combine results

For each, name (1) when it wins and (2) when it loses.

Concept 10: Audit trail as discipline, what "reads and writes" means for a Worker

Thesis is unusually direct about this: "every action a Worker takes leaves a trace in a store that survives agent ki session and can be inspected, replayed, and trusted. system of record hai what separates execution from plausible-sounding fiction. It is also what makes the workforce legible after the fact."

Translated to operational terms: every meaningful action agent takes writes a row to audit_log, plus a more structured row to capability_invocations if it's a skill or tool call. The data being acted on lives in its appropriate table (a message goes in messages, a document update goes in documents); the fact that the action happened lives in audit_log. Do are joined by foreign key.

What to log.

  • Har model call: input tokens, output tokens, model name, cost estimate
  • Har skill invocation: skill name, arguments, result summary, latency, success/error/timeout
  • Har database write: kaunsi table, kya change hua (JSONB payload), kis conversation context ke under
  • Har external tool call: tool name, input, output summary, latency
  • Har guardrail event: kaunsi guardrail trip hui, input kya tha, action kya tha (blocked/allowed/modified)

What not to log.

  • Har turn par full conversation history; woh already messages mein hai, aap usay do dafa store kar rahe honge
  • Sensitive payload data verbatim if the row is queryable to insan; store a hash or summary, keep the full data in a restricted table
  • Internal model reasoning that user shouldn't see; that's a context-management decision, not an audit decision

The replay test. A good audit trail passes one specific test: given a conversation_id and a timestamp, aap reconstruct kar sakte hain what agent did and why, without re-running model. If aap ka audit log doesn't pass this test, it's logging, not auditing.

Concrete write-on-action helper, har jagah use hota hai jahan capability run hoti hai:

# src/chat_agent/audit.py
import time
import json
from typing import Any
from uuid import UUID

import asyncpg

async def log_capability(
pool: asyncpg.Pool,
conversation_id: UUID,
capability: str,
arguments: dict[str, Any],
result: Any,
status: str,
started_at: float,
cost_cents: int | None = None,
) -> None:
"""Write a capability_invocations row plus a denormalized audit_log row.

Both writes happen in one transaction so they're either both present
or both absent — the audit trail never gets partial.
"""
latency_ms: int = int((time.monotonic() - started_at) * 1000)
async with pool.acquire() as conn:
async with conn.transaction():
await conn.execute(
"""INSERT INTO capability_invocations
(conversation_id, capability, arguments, result, status,
latency_ms, cost_cents)
VALUES ($1, $2, $3::jsonb, $4::jsonb, $5, $6, $7)""",
conversation_id, capability,
json.dumps(arguments), json.dumps(result),
status, latency_ms, cost_cents,
)
await conn.execute(
"""INSERT INTO audit_log
(conversation_id, actor, action, target, payload, result)
VALUES ($1, $2, $3, $4, $5::jsonb, $6::jsonb)""",
conversation_id, "worker:customer-support",
"capability_invoked", capability,
json.dumps({"arguments": arguments, "latency_ms": latency_ms}),
json.dumps({"status": status, "result": result}),
)

Do writes in one transaction is the load-bearing detail. Half-written audit trails are worse than missing audit trails: they suggest completeness without delivering it. Either both rows go in or neither does.

Canonical action vocabulary. Every site that writes to audit_log picks an action value from a small, agreed list. Drift here (the same event gets called three different names across codebase) is what makes replay queries fragile six months later. The six values the worked example uses, and which code path writes each:

actiontargetWhere written
message_received(null)cli.py, when user sends a message and a row lands in messages
message_sent(null)cli.py, when agent ki reply lands in messages
skill_activatedskill namecli.py, on the RunItemStreamEvent for a skill load
capability_invokedcapability idlog_capability helper above, on every skill or MCP-tool call
mcp_calledtool namecli.py, after each MCP tool returns (lower-level than capability_invoked; use one or the other consistently)
refund_issued / refund_blockedorder idThe customer-data MCP server's issue_refund tool, on success and policy denial

Pick one of capability_invoked and mcp_called and stick with it; writing both per call double-counts. The worked example uses capability_invoked for agent-side helper (Decision 7) and named domain actions like refund_issued for state-changing writes inside MCP tools (Decision 6). Domain-state writes always get their own action name so the audit row is the receipt for the business event, not just for the tool call that triggered it.

Yeh sirf logging kyun nahin. Teen properties audit data ko log data se separate karti hain:

  1. Replayable. The schema lets you reconstruct agent ki reasoning trace from audit_log joined with messages joined with capability_invocations. A log line in a JSONL file doesn't.
  2. Queryable. "What did agent tell customer X last month, and which policy did it cite?" is a SQL query. "Which skill triggered the most timeouts in the last 7 days?" is a SQL query. Logs require grep and luck.
  3. Trustworthy. Audit rows are in the same database as the business data. They're backed up together, point-in-time-recoverable together, access-controlled together. They survive agent process, the deployment region, and model version.

Yeh what the thesis means when it says "Workers only become governable as a workforce when a ledger makes them legible, as units of capability, cost, latency, and outcome." aap ka audit_log and capability_invocations tables are that ledger. Treat them like one.

Try with AI

Here's a customer support scenario: a customer claims the Worker told
them they would receive a $50 refund, but the actual refund issued was
$30. The Worker handled the conversation 19 days ago.

Walk me through the audit-trail query path to resolve this:

1. Find the conversation. (Which columns of which tables?)
2. Find the message where the refund amount was promised. (How do you
distinguish "discussed" from "promised"?)
3. Find the capability invocation that issued the refund.
4. Find the database write that recorded the $30 amount.

For each step, name the table you'd query and the WHERE clauses.
Then say what's MISSING from the six-table schema that would make
this query easier.

Part 3: MCP, wiring agent to system of record

Part 1 gave agent a library of Skills. Part 2 gave it a Postgres system of record. Part 3 wires the two together with Model Context Protocol: the open standard for how agents reach external state and external capability. Thesis is direct about MCP's place: "MCP is how the workforce reaches [its systems of record]: every authoritative store becomes addressable to any Worker through an MCP server, under policy." This Part makes that operational.

Concept 11: What MCP is and isn't

Model Context Protocol (modelcontextprotocol.io) aik open client/server protocol hai (originally Anthropic se, ab open standard ke taur par governed) jo define karta hai ke AI agent external tools, data, aur prompts se kaise connect hota hai. Jo framing repeat hoti hai woh "AI tools ke liye USB-C" hai: aik protocol, many implementations, kisi bhi side ko swap karein without breaking the other. Framing accurate hai; har metaphor ki tarah is ki limits bhi name karni zaruri hain.

MCP kya hai. Aik protocol. Aik specification. Teen primitives jo server client ko expose kar sakta hai.

  1. Tools: functions jinhein model invoke kar sakta hai. Client unki list deta hai, model aik choose karta hai, aur server usay chalata hai. Conceptually yeh previous course ke @function_tool decorator jaisa hai, lekin implementation MCP server process mein rehti hai, agent ke process mein nahi. Yeh by far sab se zyada use hone wali primitive hai.
  2. Resources: read-only data agent can fetch. files, database query results, API responses. Think of them as the GET-only side of MCP. Less common than tools in practice, but useful for "let agent read this document on demand."
  3. Prompts: reusable prompt templates jo server provide karta hai. Team standardised prompts ("summarize-incident-report") publish kar sakti hai jinhein server se connect hone wala koi bhi agent invoke kar sake. Tools aur resources ke muqablay mein rarely used.

Teen transports, 2026 tak ki current recommendations ke saath:

TransportKab useStatus
stdioLocal subprocess; agent and server on the same machineMature. Default for local tools.
streamable HTTPRemote server; production deploymentsRecommended for new remote work. Bidirectional, resumable, single endpoint.
SSERemote server; older deploymentsLegacy. Many servers still expose it; new ones increasingly default to streamable HTTP.

What MCP is not.

  • Not a framework. It's a protocol. aap ka agent doesn't "use MCP" the way it uses Agents SDK; aap ka agent ki MCP client speaks MCP to an MCP server. Agents SDK includes an MCP client; that's the integration point.
  • Not a service. There is no "MCP cloud." MCP servers are programs you run (or that vendors run for you). The Neon MCP server is hosted at mcp.neon.tech; filesystem MCP server runs as a local subprocess; a custom MCP server you write runs wherever you deploy it.
  • Not a security boundary. MCP defines transport and protocol; what tools an MCP server exposes and what they can do is the server's responsibility. A malicious MCP server can do anything its server-side code does. The trust boundary is still agent loop deciding which tools to call, and the sandbox the tools execute in.
  • @function_tool ka replacement nahi. Dono ki jagah hai. Decision tree Concept 14 mein hai.

Quick check. True or false: (a) An MCP client talks to exactly one MCP server at a time. (b) The same @function_tool-style function, if you wanted, could be exposed as an MCP tool or kept as a function tool, and model wouldn't know the difference. (c) MCP servers and OpenAI Agents SDK are tightly coupled, so to use MCP you must use SDK. Answers: (a) False: agent can connect to multiple MCP servers and see the union of their tools. (b) True: to model, both look like callable tools with schemas. The difference is where the implementation lives. (c) False: MCP is model-agnostic. Claude, Gemini, and others have their own MCP clients. OpenAI Agents SDK is one client among many.

Try with AI

You're building three capabilities for an internal "weekly report"
Worker. For each, decide: @function_tool, custom MCP server, or
vendor MCP server (use the existing one). Justify with ONE specific
property from Concept 11:

A) generate_summary(text: str) -> str — wraps OpenAI's completion
API to summarize the input. Used only by this Worker.

B) read_jira_issues(project: str, since: date) -> list[Issue] — your
company already runs Atlassian's official Jira MCP server in
production.

C) post_to_slack(channel: str, message: str) -> None — used by this
Worker today. The Engineering team's incident-response Worker
needs the same capability next quarter.

Then answer: which of the three is the EASIEST design decision, and
why? Which is the HARDEST, and why?

Concept 12: The Neon MCP server, development plane, not runtime

Is concept ki specifics age ho jayen gi. Pattern nahi. Neon ka MCP server tooling, auth flow, aur exact tool surface har kuch mahino mein change hota hai. Jo true rehta hai: managed-database vendor natural-language operations ke liye apni management API MCP ke through expose karta hai, jab ke runtime production traffic direct connections ya scoped custom servers use karta hai. Specifics pin karne se pehle Neon docs ke against verify karein.

Neon MCP server vendor-shipped MCP server ki sab se polished misaal hai. Yeh Neon's management API (projects, branches, databases, migrations, ad-hoc SQL) ko MCP tools ke taur par expose karta hai jinhein aap ka agent natural language mein call kar sakta hai. Development ke dauran yeh profoundly useful hai. Production runtime ke liye explicitly nahi. Neon ki apni warning unambiguous hai: "We recommend MCP for development and testing only, not production environments." Aur Neon docs ka stronger version: "Never connect MCP agents to production databases."

Yeh distinction kyun matter karti hai. Neon MCP server ka run_sql tool model ki produced any SQL execute kar sakta hai. Development workflow jahan aap kehte hain "show me users who signed up last week and haven't logged in" -> model SELECT * FROM users WHERE ... generate karta hai -> server run karta hai -> results wapas aate hain, delightful hai. Production mein yehi path security nightmare ho ga: attacker jo aap ke customer-support Worker ko prompt-inject kar sake, sahi question pooch kar aap ka entire database exfiltrate kar sakta hai.

Neon MCP server kis kaam ke liye hai. Teen roles jahan yeh right tool hai:

  • Schema design and migrations during development. "Add a priority column to the tickets table." Branch-based migrations let model test the migration on a branch before committing; Neon's own MCP design pattern. Yeh right way to use it.
  • Exploratory data work in aap ka agent-building loop. "How many embeddings are in database, broken down by source?" Working through model is often tez than typing SQL by hand, especially for one-off questions.
  • Connection-string retrieval and project introspection. The MCP server can hand you a connection string for the right branch without aap ka having to log into the Neon console.

Quick setup for development use. The Neon MCP server is available three ways. Use the OAuth-based remote server for the sadast path:

# In your project, run:
npx -y mcp-remote@latest https://mcp.neon.tech/mcp

Or add to .claude/settings.json:

{
"mcpServers": {
"neon": {
"command": "npx",
"args": ["-y", "mcp-remote@latest", "https://mcp.neon.tech/mcp"]
}
}
}

Claude Code first use par OAuth ke liye prompt karta hai.

A development session with the Neon MCP server, no SQL typed. (Approximate transcript; tool names are accurate as of Neon's current MCP documentation; exact phrasing and output format will vary. The shape of the interaction is what matters.)

You: Set up the chat-agent SoR. Create a project called "chat-agent",
enable pgvector, run the schema from migrations/001_initial.sql,
and tell me the connection string for the main branch.

Agent: [calls create_project] Created project chat-agent (ID: abc123).
[calls run_sql] Ran CREATE EXTENSION vector; — succeeded.
[calls prepare_database_migration with migrations/001_initial.sql]
Created temporary branch migration-test/abc123.
Migration ran on the branch; 6 tables created, 7 indexes built.
Want me to commit to main?

You: Yes, commit.

Agent: [calls complete_database_migration] Committed. Main branch
updated. Connection string:
postgresql://...@ep-cool-name.neon.tech/chat-agent?sslmode=require

Yeh poora interaction Claude Code ya OpenCode session hai: koi SQL hand se type nahi, migration main par jane se pehle branch par tested. Development ke dauran MCP isi ke liye hai. Runtime par single audit row write karne ke liye agent ko yeh use nahi karna chahiye.

PRIMM, Predict. You're writing a customer-support Worker. The Worker needs to: (a) look up the customer's order history; (b) check the refund policy for the customer's tier; (c) issue a refund; (d) write an audit row recording what was issued and why. Should it talk to Neon through the official Neon MCP server, or some other way? Confidence 1 - 5.

Answer: some other way. For all four operations. The Neon MCP server's run_sql is too broad an interface for a production agent: it grants agent permission to run arbitrary SQL, which is far more authority than the four narrow operations above require. The production patterns are direct Postgres connections (a connection_pool from asyncpg) wrapping scoped business operations, or a custom MCP server that exposes only those four operations. Concept 14 covers the latter; Part 4's worked example uses both: a custom customer-data MCP server for business operations (lookups, vector search, refunds), and direct asyncpg only for the audit subsystem (Decision 7 explains the reason: audit is the meta-layer and must not share the MCP boundary it's auditing).

Invariant 5 ke saath thesis jo distinction bana rahi hai (ke workforce governed stores se read/write karti hai) woh isi distinction par depend karti hai. Broad run_sql MCP tool governance nahi; yeh friendly interface ke saath governance ki absence hai.

The Neon MCP tools aap ka Worker will touch. The Neon MCP server exposes roughly twenty tools across project management, branching, schema, SQL execution, and query tuning. Decision 3 of the worked example uses a small subset; the rest are useful for follow-on work (extending the schema later, sandboxing a what-if migration on a branch). For reference:

ToolWhat it doesWhere used in Part 4
list_projectsList Neon projects on aap ka accountDecision 3, to confirm the project does not already exist
create_projectProvision a new Neon project (Postgres database + default main branch)Decision 3, to create the chat-agent project
describe_projectGet project metadata (default branch, region, compute settings)Decision 3, sanity-check after provisioning
get_connection_stringReturn the Postgres connection string for a named branchDecision 3, to populate NEON_DATABASE_URL for asyncpg
prepare_database_migrationStart a migration: opens a temporary branch and lets agent run DDL thereDecision 3, to test the six-table schema before merging to main
run_sqlExecute SQL on a named branch (within an MCP-mediated session)Decision 3, to run the schema creation on the temporary branch
complete_database_migrationCommit a migration: merges the temporary branch back to mainDecision 3, after the schema verifies clean on the temporary branch
describe_table_schemaDump full DDL of a named table (useful for agent to ground future plan-mode work)Optional, when you ask Plan Mode to extend the schema later
create_branch / delete_branchManage isolated copy-on-write branchesOptional, when you want a sandbox branch for a what-if migration

Production runtime Workers is server ko touch nahi karte. Woh Decision 6 mein aap ke written custom MCP server ke through Postgres tak reach karte hain, ya audit subsystem ke liye direct asyncpg use karte hain (Decision 7).

Try with AI

Read Neon's MCP server documentation page and answer three questions:

1. List THREE management operations the Neon MCP server exposes that
would be useful during development of a customer-support Worker.
2. List THREE operations a runtime Worker would NEED that the Neon MCP
server should NOT be used for, and why.
3. For each of the three in (2), propose what the Worker should use
instead (direct Postgres connection? custom MCP server? function_tool?).

Concept 13: Connecting MCP to the OpenAI Agents SDK

Agents SDK ships with a first-class MCP client. Three classes do the work, one per transport:

from agents.mcp import (
MCPServerStdio, # local subprocess
MCPServerStreamableHttp, # remote, modern transport
MCPServerSse, # remote, legacy transport (avoid for new work)
)

All three are async context managers (async with server as ...:). Once connected, you pass them to agent via the mcp_servers argument. agent automatically discovers each server's tools at startup, presents them to model alongside any @function_tools you've defined, and routes calls back to the right server based on which tool model picks.

Bump the init timeout if your MCP server imports anything heavy

MCPServerStdio's default client_session_timeout_seconds=5 is enough for a pure-Python server that only imports mcp and asyncpg. It is not enough when the server imports torch, sentence-transformers, or anything that loads model into memory at import time, which can take 10 - 60 seconds on first launch. The symptom is a confusing init failure ("MCP server did not respond in time"); the fix is one parameter:

MCPServerStdio(
name="customer-data",
params={...},
client_session_timeout_seconds=60,
)

Jis server ka process startup par real work karta ho, us ke liye isay aik dafa set karein aur bhool jayein.

MCP architecture: model decide karta hai which tool to call; the MCP client routes the call across the trust boundary via streamable HTTP (or stdio, or legacy SSE); the MCP server exposes a narrow, scoped set of tools and is the only thing that touches Postgres. Three properties the boundary buys you: scope, isolation, reusability. Development ke dauran Neon MCP server use karte hue minimal example:

# tools/scratch_with_neon.py — development utility, not production
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp


async def main() -> None:
async with MCPServerStreamableHttp(
name="Neon (development)",
params={"url": "https://mcp.neon.tech/mcp"},
cache_tools_list=True,
) as neon:
agent: Agent = Agent(
name="DBAssistant",
instructions=(
"You help with Neon database operations during development. "
"Always use prepare_database_migration / complete_database_migration "
"for schema changes — never run DDL directly on main."
),
mcp_servers=[neon],
model="gpt-5.5",
)
result = await Runner.run(
agent,
"List all projects, then show me the schema of the largest one.",
)
print(result.final_output)


if __name__ == "__main__":
asyncio.run(main())

Three details worth naming:

  • async with optional nahi. MCP connection open transport hold karta hai (stdio ke liye subprocess, streamable HTTP ke liye HTTPS session). Context manager ke baghair connection leak hota hai aur server-side state stale ho jati hai. Hamesha async with use karein.

  • cache_tools_list=True is a substantial speedup for production. By default SDK calls list_tools() on every Runner.run, which is a network round-trip. Caching makes it once-per-process. Cache invalidation is manual: call server.invalidate_tools_cache() when you've added or removed tools. For development with a server whose tools change, leave it False.

  • Multiple MCP servers stack naturally. Pass mcp_servers=[neon, custom_server, hosted_server]; model dekhta hai the union of all tools. Use MCPServerManager if aap ko chahiye lifecycle management across many; it's a thin helper around the same pattern.

The require_approval parameter is the production switch worth knowing about. By default, MCP tool calls run without confirmation. For sensitive servers, aap kar sakte hain require human approval per-tool:

async with MCPServerStreamableHttp(
name="Neon (development, guarded)",
params={"url": "https://mcp.neon.tech/mcp"},
require_approval={
"always": {"tool_names": ["delete_project", "delete_branch", "run_sql"]},
"never": {"tool_names": ["list_projects", "describe_project"]},
},
) as neon:
...

Destructive operations ko human-in-the-loop check milta hai; read-only operations silently run hoti hain. Yeh Concept 12 ke development-vs-production gap ka practical knob hai: jab aap hands-on work ke liye Neon MCP server use kar rahe hon tab bhi us ke destructive tools ko approval ke through gate karna real safety improvement hai.

A look at how model dekhta hai MCP tools. When agent runs, model gets tool list that looks structurally similar to a @function_tool list:

[
{"type": "function", "function": {"name": "search_docs", "description": "...", "parameters": {...}}},
{"type": "function", "function": {"name": "get_billing_invoice", "description": "...", "parameters": {...}}},

// These three came from the Neon MCP server:
{"type": "function", "function": {"name": "list_projects", "description": "Lists the first 10 Neon projects ...", "parameters": {...}}},
{"type": "function", "function": {"name": "run_sql", "description": "Executes a single SQL query ...", "parameters": {...}}},
{"type": "function", "function": {"name": "prepare_database_migration", "description": "Initiates a database migration by creating a temporary branch ...", "parameters": {...}}}
]

model can't tell the difference between an MCP tool and a @function_tool, and shouldn't need to. The SDK's routing layer dispatches each call to the right backend.

Try with AI

I want to connect my OpenAI Agents SDK agent to TWO MCP servers
simultaneously: (a) the Neon MCP server for database operations, and
(b) a local filesystem MCP server (npx @modelcontextprotocol/server-filesystem)
for reading project files.

Write the async setup code that connects to both servers and uses
them in a single Agent. Include:

1. The right imports.
2. Both connections as async context managers (one streamable HTTP,
one stdio).
3. An Agent that uses both, with require_approval set to require
approval for any tool that writes to the filesystem.
4. A sample Runner.run that exercises tools from both servers in
one turn.

Concept 14: Custom MCP servers, when to write apna vs. when not to

Neon MCP server generic hai: woh woh sab kuch kar sakta hai jo Neon's API kar sakti hai. Development ke liye yeh us ki strength hai aur runtime ke liye weakness. Custom MCP server trade-off ko invert karta hai: narrow surface, general-purpose run_sql nahi, sirf woh specific operations jo aap ke Worker ko actually chahiye.

Decision tree, priority order mein.

Decision tree for capability placement: starting at the root question, answer five filters in order (single-use? vendor has one? multi-agent reuse? sensitive scope? process-isolation?). Three leaves are green (use what you have: @function_tool or vendor MCP server); three are amber (build something new: custom MCP server). Stop at the first YES.

Wahi logic quick-scan table mein:

You want to expose...Use thisWhy
One function with one input, used by one agent@function_toolNo need for protocol overhead. Local function call is fine.
Several functions tightly coupled to aap ka agent ki code@function_toolIf they share state with agent and live in the same repo, they're part of agent.
A capability that multiple agents (or multiple deployments) will useCustom MCP serverThe protocol is what makes it reusable.
A capability that needs to outlive agent ki processCustom MCP serverLong-running connections, background jobs, queue consumers.
Vendor-provided functionality (Neon, GitHub, Linear)Vendor's MCP serverDon't rebuild what they ship.
Sensitive operations that need narrow scopeCustom MCP serverDefine exactly the tools aap ko chahiye; nothing else.

Minimum-viable custom MCP server in Python, using the official MCP SDK:

# servers/customer_data_mcp/server.py
# Run with: python -m customer_data_mcp.server
import os
from typing import Annotated
import asyncpg
from mcp.server.fastmcp import FastMCP
from pydantic import Field

mcp: FastMCP = FastMCP("customer-data")

# Connection pool: created once, reused across tool calls.
_pool: asyncpg.Pool | None = None


async def get_pool() -> asyncpg.Pool:
global _pool
if _pool is None:
_pool = await asyncpg.create_pool(
os.environ["DATABASE_URL"],
min_size=1, max_size=10,
)
return _pool


@mcp.tool()
async def lookup_customer(
customer_id: Annotated[
str, Field(description="The customer's UUID, as provided by the user."),
],
) -> dict[str, str | int]:
"""Look up a customer by ID. Returns id, email, tier, and active_tickets count.

Use when the user provides a customer ID and you need their basic profile.
Does NOT return billing details — use lookup_billing for that.
"""
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"""SELECT id::text, email, tier,
(SELECT COUNT(*) FROM tickets t
WHERE t.customer_id = c.id AND t.status='open') AS active_tickets
FROM customers c WHERE id = $1::uuid""",
customer_id,
)
if row is None:
return {"error": f"customer {customer_id} not found"}
return dict(row)


@mcp.tool()
async def find_similar_resolved_tickets(
description: Annotated[
str, Field(description="The user's question or description of the issue."),
],
limit: Annotated[int, Field(description="Max results.", ge=1, le=10)] = 5,
) -> list[dict[str, str | float]]:
"""Find resolved tickets similar to a description, via pgvector.

Use when the user describes an issue and you want to check if a similar
issue has been resolved before. Returns ticket_id, summary, resolution,
and similarity score (lower is more similar).
"""
# Embed the description with the SAME model used at insert time
# (text-embedding-3-small, 1536 dim per Concept 9). Mixed models
# silently return nonsense rankings. embed_chunks is batch-oriented;
# pass a one-element list and unpack the result.
from openai import AsyncOpenAI

from chat_agent.embedding.embedder import embed_chunks

[query_embedding] = await embed_chunks(AsyncOpenAI(), [description])

pool = await get_pool()
# Note the JOIN path: embeddings.document_id references documents(id), not
# tickets(id). See Concept 7's schema. A resolved ticket is embedded by
# storing it as a `documents` row (source='past_case') whose metadata
# holds the ticket id, so the join goes embeddings -> documents -> tickets.
async with pool.acquire() as conn:
rows = await conn.fetch(
"""SELECT t.id::text AS ticket_id, t.summary, t.resolution,
(e.embedding <=> $1::vector) AS distance
FROM embeddings e
JOIN documents d ON d.id = e.document_id
JOIN tickets t ON t.id = (d.metadata->>'ticket_id')::uuid
WHERE d.source = 'past_case' AND t.status = 'resolved'
ORDER BY e.embedding <=> $1::vector
LIMIT $2""",
query_embedding, limit,
)
return [dict(r) for r in rows]


if __name__ == "__main__":
mcp.run(transport="stdio")

Then in aap ka agent:

async with MCPServerStdio(
name="customer-data",
params={
"command": "python",
"args": ["-m", "customer_data_mcp.server"],
"env": {"DATABASE_URL": os.environ["DATABASE_URL"]},
},
cache_tools_list=True,
) as customer_data:
agent: Agent = Agent(
name="CustomerSupport",
mcp_servers=[customer_data],
...,
)

Teen cheezen jo yeh server deta hai aur @function_tool nahi deta.

  1. Process isolation. The MCP server runs in its own process (subprocess for stdio, separate service for streamable HTTP). A crash in the server doesn't crash agent; a memory leak in the server doesn't leak in agent.

  2. Scope. The server has exactly two tools. No run_sql. No "execute arbitrary code." model can't escape this scope because the protocol doesn't expose anything else. Yeh ek real defense in depth: even if model decided to do something stupid, the surface area to do it through is two functions.

  3. Agents ke across reusability. Doosra agent (Sales Worker, Reporting Worker) same customer-data MCP server se baat kar sakta hai. Same scope, same protocol, same trust boundary. Capability agents ke darmiyan copy-paste ke bajaye shared infrastructure ka piece ban jati hai.

Trade-off real hai. Custom MCP servers operational complexity add karte hain: deploy karne ke liye aik aur process, logs ka aik aur set, aik aur network hop (agar remote ho), manage karne ke liye aik aur version. Single agent ke single function ke liye server na likhein. Server tab likhein jab capability reuse hone wali ho, scoping matter karti ho, ya isolation safety buy karta ho.

PRIMM, Predict. You're designing the customer-support Worker. aap ko chahiye: (1) semantic search over past resolved tickets; (2) writing a refund audit row; (3) reading the current weather (used in one greeting skill that says "good morning from sunny Karachi"); (4) calling the payment gateway to issue a refund. For each, predict: @function_tool, custom MCP server, or vendor MCP server (e.g., Stripe's, if such exists)?

Answers tease out the framework:

  1. Custom MCP server (customer-data). Agents ke across reused; sensitive data; scoped tools broad run_sql se better.
  2. Custom MCP server (customer-data) ya @function_tool. Dono work karte hain; agar Worker only writer hai to function tool fine hai. Agar multiple Workers audit rows likhein ge to MCP server.
  3. @function_tool. Aik agent, aik tiny function, defend karne ke liye security surface nahi. Is ke liye server na build karein.
  4. Vendor MCP server (Stripe MCP) if it exists, else @function_tool calling Stripe's API. Don't wrap third-party APIs in apna MCP server unless aap ko chahiye to add policy on top.

Framework trace karne ke baad clear hai: MCP ki value us boundary ki value ke saath barhti hai jo woh create karta hai. Jis boundary ki zarurat nahi, woh overhead hai.

Try with AI

You're designing the architecture for a research-Worker that does these
five things:

A. Reads PDFs from a Google Drive folder (uses Google Drive's API)
B. Queries a private Snowflake warehouse the company owns
C. Calls a third-party data-enrichment API (Clearbit-like)
D. Logs every action to the company's central audit log database
E. Generates a Markdown report by string-formatting the results

For each, decide: @function_tool, custom MCP server, or vendor MCP
server (if a credible vendor MCP server exists). For each choice,
justify with ONE of the three properties of MCP from Concept 14
(isolation, scope, reusability) OR justify why the boundary isn't
worth building.

Concept 15: MCP under load: transports, pooling, and what happens at scale

Laptop par aik MCP server aur aik agent wali demo fine chalti hai. Questions tab hard hote hain jab Worker production mein 10 conversations per minute handle kar raha ho. Yeh Concept operational checklist hai.

Transport choice, at scale.

  • stdio is for local development and single-process deployments. The MCP server runs as a subprocess of agent. Restart agent -> server restarts -> fresh state. Cheap. Works. Doesn't scale across machines.

  • streamable HTTP is for production remote servers. Multiple agents can connect to one MCP server. The server can run on different hardware thagent. Auto-reconnect, resumable streams, single endpoint: designed for production from the ground up.

  • SSE exists, mostly for legacy reasons. New work should default to streamable HTTP. Existing SSE-only servers are still common (SDK supports them); don't rewrite a server just to switch transport.

Connection management. MCP connections free nahi. Har connection aik session hai: authentication handshake, list_tools round-trip, persistent connection. Teen patterns matter karte hain:

  1. Cache the tools list. cache_tools_list=True saves a round-trip per Runner.run. Yeh single biggest latency win for the typical agent loop. Invalidate the cache when you deploy a new version of the server.

  2. Reuse server instances across runs. Don't create a new MCPServerStreamableHttp(...) per request. Open once at agent startup, hold the async with open for the lifetime of agent process, close at shutdown. The connection cost is amortized across thousands of agent runs.

  3. Connection pooling lives inside the MCP server, not the client. If aap ka MCP server talks to Postgres (as the customer-data example does), the server should have an asyncpg.create_pool(min_size=1, max_size=10) and share the pool across tool calls. The client side talks to one MCP server connection; the server side fans out to a pool of database connections.

Concurrency limits. Inhein set karne ki teen jagah, cheapest se most expensive to get wrong order mein:

  • max_turns on agent. Previous course se familiar hai. Tools kuch bhi hon, yeh loop length cap karta hai.
  • max_retry_attempts on the MCP server connection. Caps retry storms when an MCP server is flaky. Default 0 (no retries); set to 2 - 3 for production resilience.
  • Server-side rate limits. Inside aap ka custom MCP server, count concurrent calls per agent and reject above some threshold. The protocol allows tool calls to return errors; clients see them as tool failures and model can decide to retry, give up, or pick a different tool.

Tracing across the MCP boundary. Previous course ka tracing setup yahan bhi chalta hai, lekin aik aham detail ke saath: MCP tool calls agent ki trace mein ordinary tool calls ki tarah nazar aati hain: same span shape, same span name (MCP tool ka naam), same timing. Agent side se jo cheez visible nahi hoti woh yeh hai ke MCP server ke process ke andar kya hua. Agar aap ka custom MCP server apni database calls ya downstream API calls karta hai, to unke liye server ke andar apni tracing chahiye.

Aik practical pattern: propagate trace context across the MCP boundary using _meta. The SDK supports a tool_meta_resolver that adds metadata to each tool call (tenant IDs, trace IDs, request IDs); aap ka server can extract and use them. Yeh how a single agent run's tracing chain stays unbroken from agent -> MCP client -> MCP server -> Postgres.

Quick check. True ya false: (a) MCPServerSse se MCPServerStreamableHttp par switch karne ke liye MCP server khud change karna padta hai. (b) cache_tools_list=True production mein safe hai jab tak aap new server version deploy karne ke baad invalidate_tools_cache() call karte hain. (c) Paanch tools wala MCP server hamesha paanch functions wale @function_tool se zyada agent context budget use karta hai. Answers: (a) Depends: server ko streamable HTTP transport support karna hota hai; most modern servers karte hain, older servers shayad sirf SSE bolte hon. (b) True: intended pattern yahi hai. (c) False: schema level par yeh equivalent hain. Paanch tool definitions dono forms mein roughly same cost karte hain.

Try with AI

My customer-support Worker is in production. It runs 80 conversations/minute
at peak. Each conversation makes 2-4 MCP tool calls on average. I'm seeing
intermittent latency spikes — most calls return in 200ms, but a small
percentage take 5-15 seconds.

Walk me through five places I'd investigate, in order of priority:

1. The agent-side MCP client connection management.
2. The transport choice between agent and MCP server.
3. The MCP server's internal connection pool to Postgres.
4. Postgres-side query performance (slow queries blocking the pool).
5. Network or DNS issues between agent and MCP server.

For each, name the specific signal I'd look for and the rough fix.

Part 4: The worked example, customer-support Worker

Aik realistic evolution, upar ke har concept ke saath, dono tools. Hum Build AI Agents se chat-agent project lete hain aur teen Skills, Neon system of record, aur MCP wiring layer add kar ke usay customer-support Worker banate hain. Eight build decisions, us course ke Part 5 jaisi same shape.

Part 4 in one picture: previous course ke chat agent (left) se shuru hota hai; eight Decisions teen phases mein grouped hain. Phase 1 (blue) Foundation hai: D1 CLAUDE.md update karta hai, D2 Plan Mode mein plan karta hai, D3 Neon provision karta hai. Phase 2 (amber) Capability hai: D4 pehli Skill likhta hai (woh aik deliberate hand-write), D5 embedding pipeline banata hai. Phase 3 (green) Runtime + Verify hai: D6 custom MCP server banata hai, D7 audit logging wire karta hai, D8 end-to-end verify karta hai. Jab bhi samajhna ho ke koi Decision arc mein kahan fit hota hai, yahan wapas dekhein.

Before you start: prereqs ke ilawa setup. Is Part mein teen cheezen already done assume ki gayi hain. (1) Aap ke paas starter chat-agent project hai (yeh is course ke zip mein aata hai aur previous course ki end state reproduce karta hai): cli.py, agents.py, tools.py, models.py, guardrails.py, plus sandboxed.py. Hum in files ko modify karte hain; replace nahi karte. Agar aap ne previous course complete kiya hai to us project ko use karein; shape same hai. (2) Aap ke paas free Neon account hai aur aap authentication ke liye aik dafa npx neonctl@latest init chala chuke hain. (3) Aap ke paas Claude Code ya OpenCode installed aur authenticated hai. In mein se koi cheez missing ho to Decision 1 se pehle fix karein.

The brief

Previous course ke chat-agent ko evolve kar ke aisa customer-support Worker banayein jo:

  • Teen Skills on demand load karta hai: summarize-ticket, find-similar-cases, aur escalate-with-context.
  • Concept 7 ki six tables ke saath Neon Postgres system of record se read/write karta hai.
  • Past resolved cases ki chhoti library par semantic search ke liye pgvector use karta hai.
  • Runtime par scoped, custom MCP server (customer-data) ke through Postgres se baat karta hai, Neon MCP server ke through nahi, aur agent code mein direct asyncpg calls ke through nahi.
  • Har meaningful action ke liye audit row likhta hai: har skill invocation, har database write, har refund considered.

"End par verification" test: customer message karta hai "I haven't received my refund from order #4429, it's been two weeks." Worker MCP ke zariye order look up karta hai, vector search se teen similar past cases find karta hai, aisa response draft karta hai jo most similar case ke resolution ko cite karta hai, jo kiya usay record karte hue audit row likhta hai, aur (real deploy mein) agar customer Pro tier user ho to escalate karta hai.

A note on the prompts that follow. Each Decision shows a structured ask as a block-quoted prompt. The pattern that works best in practice is to precede each ask with one orient move ("Read CLAUDE.md and the relevant files, tell me what you see, and ask 1-2 questions before we start") and then send the structured ask once agent has loaded context and clarified ambiguities. The structured asks below are the destination, not the first move. Pasting them cold works; pasting them after orientation works better, especially as the project grows.


Decision 1: Update the rules file with the new architecture

What you do (Claude Code). Apne existing chat-agent/ project mein Claude Code open karein. Agent ko is course ki architectural changes brief karein, aur usay CLAUDE.md update karne ko kahein:

We're extending this project with three additions on top of the Course
#3 stack: Skills loaded from .claude/skills/, a Neon Postgres system of
record (six-table schema in migrations/001_initial.sql), and MCP wiring
(Neon MCP server for development; a custom customer-data MCP server at
servers/customer_data_mcp/ for runtime).

Update CLAUDE.md to add:

1. New stack lines (Neon Postgres + pgvector, MCP).
2. A new "Architecture (NEW)" section describing where each piece lives.
3. Critical rules preventing three failure modes: business data access
must go through the customer-data MCP server (agent decision logic
never bypasses MCP for business reads or writes); the only permitted
direct asyncpg usage is in the audit subsystem via a separate audit
pool; never use the Neon MCP server's run_sql in production paths
(it's a dev tool); every skill invocation, refund, or database write
produces an audit_log row; skills live in .claude/skills/ (OpenCode
reads as fallback) — do not duplicate in .opencode/skills/.
4. New uv-run commands for the MCP server and migrations.

Keep the file under 2,500 tokens. Show me the diff before writing.

Claude Code update draft karta hai. Diff carefully parhein. "Critical rules" section load-bearing part hai; wahan koi weak cheez un failure modes ko prevent nahi kare gi jinhein woh prevent karne ke liye hai. Result roughly is tarah dikhna chahiye:

# chat-agent

## Stack

Python 3.12+, uv, openai-agents >=0.14.0 (latest at time of writing: 0.17.2),
Cloudflare Sandbox, Neon Postgres + pgvector, MCP (custom server for runtime).

## Architecture (NEW)

- Capabilities: three Skills in .claude/skills/
(summarize-ticket, find-similar-cases, escalate-with-context).
See [agentskills.io](https://agentskills.io/specification) for the format.
- System of record: Neon Postgres, schema in migrations/001_initial.sql.
Six tables: conversations, messages, documents, embeddings, audit_log,
capability_invocations.
- MCP: Neon MCP server for development ONLY (schema, migrations).
Runtime traffic uses a custom MCP server at servers/customer_data_mcp/,
exposing scoped tools (lookup_customer, find_similar_resolved_tickets,
issue_refund). No run_sql in production paths.
- Embeddings: text-embedding-3-small (1536 dims), cosine distance,
HNSW index.

## Critical rules (NEW)

- Business data access must go through the `customer-data` MCP server.
Agent decision logic must never bypass MCP to query or mutate business
data. The only permitted direct asyncpg usage is inside the audit
subsystem, through a separate audit connection pool (Decision 7
explains why audit can't share the MCP boundary it's auditing).
- Never use the Neon MCP server's run_sql tool in production paths.
It's a development tool.
- Every skill invocation, every refund consideration, every database
write produces an audit_log row. No silent actions.
- Skills are in .claude/skills/ (OpenCode reads this as a fallback);
do not duplicate in .opencode/skills/.

## Commands

- `uv run python -m chat_agent.cli`: local dev (uses customer-data MCP via stdio)
- `uv run python -m chat_agent.sandboxed`: sandbox deployment
- `uv run python -m customer_data_mcp.server`: run MCP server standalone
- `uv run alembic upgrade head`: apply pending migrations

File ~30 lines se ~50 lines tak grow karti hai, phir bhi previous course ke 2,500-token budget ke kaafi neeche rehti hai. Agar Claude Code ka dikhaya hua diff critical rules mein se kuch miss karta hai, ya brief mein na hone wali koi constraint hallucinate karta hai, to push back karein aur re-prompt karein. Rules file aik aisi jagah hai jahan chhoti inaccuracies ka downstream effect bohat bara hota hai.

Why. Previous course ki rules file ne engine aur sandbox pin kiye thay. Yeh update us architecture ko pin karta hai jo upar layer hoti hai: Skills, system of record, MCP boundaries. "Business data through MCP, audit through scoped direct asyncpg" rule key rule hai: yeh agent ko baad mein hurry ke waqt MCP boundary short-circuit karne se rokta hai, aur audit subsystem ko us boundary se independent rakhta hai jise woh audit karta hai. Decision 7 batata hai ke yeh independence kyun matter karti hai.

Har rule apni jagah kyun deserve karta hai. Previous course ki audit-prompts table wali same shape. Har rule aik specific failure se map hota hai jo model warna kar deta, plus aik paste-able audit prompt jo aap ke coding agent ko current codebase mein verify karne deta hai ke rule ab bhi follow ho raha hai:

RuleYeh kis failure ko rokti haiAudit prompt (apne coding agent mein paste karein)
Business data customer-data MCP server se guzarta haiAgent decision logic MCP ko short-circuit kar ke scoped tools aur audit logging bypass karte hue Postgres directly query kar deti hai"List every asyncpg, psycopg, or raw SQL call site in src/chat_agent/. For each, name the file and confirm whether it's in the audit subsystem (allowed) or anywhere else (violation). Flag every violation."
Audit separate asyncpg pool use karta hai, MCP kabhi nahiAudit subsystem us boundary par depend karne lagta hai jise woh audit karta hai, is liye MCP outage audit ko bhi silently disable kar deti hai"Find the audit subsystem's connection setup. Confirm it builds its own asyncpg.create_pool(...) and does NOT import anything from the customer-data MCP layer."
Neon MCP run_sql dev-only hai, production kabhi nahiThrowaway dev tool production path mein call ho jata hai, agent ke liye unscoped SQL execution open kar deta hai"Search the repo for run_sql calls. List every match. Confirm each is inside tools/, scripts/, or a similar dev-only directory; flag any in src/chat_agent/ runtime paths."
Har skill, refund, write ek audit_log row produce karta haiSkill invocation ya destructive write silently ho jati hai, aur next compliance review ke paas trail nahi hota"List every @function_tool, skill invocation, and database write in the agent code. For each, confirm there is a corresponding audit_log insert in the same transaction or immediately after."
Skills sirf .claude/skills/ mein rehti hainSkills .opencode/skills/ mein duplicate ho jati hain, dono copies drift karti hain, aur subtle behavior change ship ho jata hai"Run ls .claude/skills/ and ls .opencode/skills/ (if it exists). They should not both contain skill folders. If .opencode/skills/ exists, list any folder that is also in .claude/skills/ as a violation."

Worker mein kisi bhi significant change ke baad audit prompts run karein; agar coding-agent reply violations list kare to team ke saath next conversation wahi honi chahiye. Yeh table Worker ki governance own karne wale person ke liye quarterly review checklist bhi ban jati hai.

What changes in OpenCode. Same flow: agent ko brief karein, diff review karein. File ko AGENTS.md rename karein (agar previous course se pehle hi nahi kiya) ya CLAUDE.md rehne dein (OpenCode isay fallback ke taur par parhta hai). Same content.


Decision 2: Plan the schema and the Skill set

What you do (Claude Code). Press Shift+Tab twice to enter Plan Mode. model can read aap ka existing project but cannot edit anything. Brief it:

Plan the customer-support Worker evolution of this project. The
foundation (OpenAI Agents SDK, Cloudflare Sandbox, sessions, streaming,
guardrails) stays. We're adding:

1. Three Skills: summarize-ticket, find-similar-cases, escalate-with-context.
For each, propose: the description, the operational shape (script-driven
or instruction-driven), and what reference files it needs.

2. The six-table schema from Part 2 Concept 7, plus any tables specific
to a customer-support domain (probably: customers, orders, tickets, refunds).

3. The custom MCP server (customer-data), with exactly the runtime tools
our agent will need. Propose the tool list and signatures. No run_sql.

4. The audit-logging plan — what writes an audit row, what doesn't.

Output the plan as a markdown file at plans/customer-support-worker-plan.md.
Do not write code yet.

model produces a plan. Read it carefully. Two places where the first draft is usually wrong:

  • Skills' descriptions will be vague ("Summarizes tickets"). Push back: descriptions must be specific enough to fire correctly (Concept 3 of this course was about exactly this).
  • MCP tools ke input schemas zarurat se broad hon ge ("query: string"). Push back karein: har tool ke paas sirf minimum input hona chahiye jo use chahiye. lookup_customer tool ko customer_id chahiye, query nahi jisse aap SQL build karein.

Kyun. Plan Mode do failure modes catch karta hai jo baad mein hours cost karte hain: vague description wali Skill kabhi fire nahi hoti, aur broad input schema wala MCP tool bas extra steps ke saath run_sql hai. Dono built hone ke baad ke bajaye markdown plan mein fix karna aasaan hai.

What changes in OpenCode. Press Tab to switch to the Plagent. Same prompt, same plan output. Save the plan file with the same name.


Decision 3: Provision Neon and run the schema migration

Cost impact (Decision 3)

Neon ka free tier Part 5 ke assumed volume (~200 conversations/day) par single Worker cover karta hai. Yahan $0/month plan karein. Free plan limits 0.5 GB storage aur 100 compute-hours per project hain (Neon pricing); us se upar Launch tier pay-as-you-go hai (roughly $0.11/CU-hour + $0.35/GB-month), aur worked-example Worker usually $25/month ke under rehta hai. Full breakdown ke liye Part 5 ki cost shape table dekhein.

Aap kya karte hain (Claude Code). Plan Mode se exit karne ke liye Shift+Tab press karein. Ensure karein ke Neon MCP server connected hai (Concept 12 se). Phir:

Using the Neon MCP server, provision a project called "chat-agent",
enable the vector extension, and run the schema from
migrations/001_initial.sql (which you'll generate from the plan).
Use prepare_database_migration to test on a branch before committing
to main. Then add NEON_DATABASE_URL to my .env (use the main-branch
connection string).

agent uses the Neon MCP server's tools: create_project, run_sql (for CREATE EXTENSION vector), prepare_database_migration (to test on a branch), and complete_database_migration (to apply to main). The Worker-side schema lands. At the end of this Decision, you have a working Neon database with the six tables, pgvector enabled, indexes built, and the connection string in .env.

Yeh exactly the development-use case from Concept 12. Schema management via natural language on a branch, with explicit human approval (you said "go ahead") before main is touched. The Neon MCP server has earned its place here.

OpenCode mein kya change hota hai. Same flow. OpenCode ka MCP integration Neon server ko identically use karta hai; visible UI different hai (approval prompt format) lekin operations same hain.

What migrations/001_initial.sql looks like

Full migration Concept 7 ka six-table schema hai, plus customer-support example ke liye four domain-specific tables. Yahan do pieces dikhane laayiq hain:

-- migrations/001_initial.sql — abbreviated; full file in the repo

CREATE EXTENSION IF NOT EXISTS vector;

-- [Six tables from Concept 7 omitted: conversations, messages, documents,
-- embeddings, audit_log, capability_invocations. See Concept 7.]

-- Domain-specific tables for customer-support
CREATE TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
tier TEXT NOT NULL CHECK (tier IN ('free', 'pro', 'enterprise')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
public_id TEXT NOT NULL UNIQUE, -- the #4429-style display ID
placed_at TIMESTAMPTZ NOT NULL,
amount_cents INT NOT NULL,
status TEXT NOT NULL CHECK (status IN ('placed','shipped','delivered','refunded','cancelled'))
);

CREATE TABLE tickets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
order_id UUID REFERENCES orders(id) ON DELETE SET NULL,
summary TEXT NOT NULL,
resolution TEXT,
status TEXT NOT NULL CHECK (status IN ('open','in_progress','resolved','escalated')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolved_at TIMESTAMPTZ
);

CREATE TABLE refunds (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
ticket_id UUID REFERENCES tickets(id) ON DELETE SET NULL,
amount_cents INT NOT NULL,
reason TEXT NOT NULL,
issued_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

verify Decision 3 worked. Open psql with aap ka NEON_DATABASE_URL and run:

SELECT extname FROM pg_extension WHERE extname = 'vector';   -- one row: 'vector'
SELECT COUNT(*) AS table_count FROM information_schema.tables
WHERE table_schema='public'; -- expect 10
SELECT indexname FROM pg_indexes
WHERE indexname = 'idx_embeddings_hnsw'; -- one row

Teen rows wapas, no errors: Decision 3 done hai. Agar vector missing hai, temporary branch par CREATE EXTENSION step main mein merge nahi hua; complete_database_migration dobara run karein. Agar table count off hai, migration cleanly apply nahi hui.


Decision 4: Write the first Skill, summarize-ticket

What you do: skill ke saath scaffold karein, phir woh field own karein jo matter karti hai.

Folder ko blank file se hand-create na karein. Skill se generate karein, phir steer karein. Ecosystem se skill-creator install karein aur usay summarize-ticket scaffold karne ka prompt dein. Woh folder, SKILL.md, body, examples sab produce karta hai. Phir aap woh aik kaam karte hain jo skill aap ke liye judge nahi kar sakti: description field own karna. Yeh wohi install-a-skill-then-prompt pattern hai jo Chapter 57 MCP servers ke liye use karta hai, Skills par apply hua: aap describe karte hain, agent scaffold karta hai, aap steer karte hain.

Skill install karein (Chapter 57 wali same syntax):

npx skills add https://github.com/anthropics/skills --skill skill-creator

Phir Claude Code ko skill scaffold karne ko kahein:

Use skill-creator to scaffold a skill named summarize-ticket in
.claude/skills/. It summarizes a customer support ticket into five
sections: Customer Context, Issue Description, Resolution Steps Taken,
Current Status, Recommended Next Action. The body should be imperative
step-by-step instructions, two examples (one short ticket, one
complex), and three edge cases (escalated ticket, ticket with no
resolution yet, irate customer). Keep the body under 200 lines.

skill-creator poora folder produce karta hai. Ab woh hissa karein jo delegate nahi hota: description khud rewrite karein. Reason Concept 3 hai. description hi decide karta hai ke skill kabhi fire ho gi ya nahi; yeh ghalat ho to skill ya silently kabhi activate nahi hoti, ya har prompt par fire hoti hai (worse). Description quality yeh judgment call hai ke user ki kaun si phrasings is skill ko trigger karni chahiye, aur yeh judgment skill ki poori life mein compound hoti hai. Scaffolded description spec satisfy kar deti hai lekin generic hoti hai; hand-owned description woh specific triggers capture karti hai jin ki aap ko parwah hai. Body, examples, edge cases: sab skill-generated under human review.

Aap kuch drafts ke baad jis description par settle karte hain:

---
name: summarize-ticket
description: Summarizes a customer support ticket into a structured format with sections for Customer Context, Issue Description, Resolution Steps Taken, Current Status, and Recommended Next Action. Use when the user provides a ticket ID, asks for a ticket summary, asks "what's the status of ticket X", or asks to be brought up to speed on a ticket. Produces a concise summary suitable for handoff to another agent.
---

Concept 3 ki discipline: what name karein ("summarizes a customer support ticket into a structured format"), when name karein ("when the user provides a ticket ID, asks for a ticket summary, asks 'what's the status of ticket X', or asks to be brought up to speed on a ticket"), specific keywords surface karein ("ticket ID," "ticket summary," "status of ticket," "brought up to speed").

Phir skill-creator ki likhi hui body parhein. Narrative cheez par push back karein. Skills imperative instructions ("Look up the ticket. Extract X. Format as Y.") ke saath narrative descriptions ("This skill is designed to help with...") se behtar chalti hain. Isay imperative ki taraf steer karein; Concept 3 ki discipline yahan poori apply hoti hai.

Why. Yeh first time agent gets capability that wasn't in tools.py. The next conversation that mentions a ticket ID won't go through the search_docs stub; it'll activate this skill and follow its instructions. model decide karta hai the skill applies based on the description; the body controls what happens next.

What changes in OpenCode. .claude/skills/summarize-ticket/SKILL.md wali skill OpenCode ke fallback path ke zariye auto-discover hoti hai (Concept 4). Duplication ki zarurat nahi.

What .claude/skills/summarize-ticket/SKILL.md looks like
---
name: summarize-ticket
description: Summarizes a customer support ticket into a structured format with sections for Customer Context, Issue Description, Resolution Steps Taken, Current Status, and Recommended Next Action. Use when the user provides a ticket ID, asks for a ticket summary, asks "what's the status of ticket X", or asks to be brought up to speed on a ticket. Produces a concise summary suitable for handoff to another agent.
---

# Summarize ticket

When this skill activates:

1. Extract the ticket ID from the user's message. Format: a UUID, or a
public ID like "TKT-1234". If both forms are present, prefer the UUID.
2. Call the `lookup_customer` tool with the ticket's customer_id (which
you'll get in step 4).
3. Call `find_similar_resolved_tickets` with the ticket's summary, to
surface 2-3 prior similar resolutions.
4. Fetch the ticket itself by querying for ticket_id = <id>. If the ticket
has order_id set, fetch the order too.
5. Compose a five-section summary:

**Customer Context** — name, tier, account age, any recent prior tickets
**Issue Description** — the ticket summary, the affected order if any
**Resolution Steps Taken** — actions logged in audit_log scoped to this ticket
**Current Status** — status field, who is assigned, time since last update
**Recommended Next Action** — based on similar resolved tickets, propose ONE
specific action: respond with template, escalate, request more info, close

6. Keep the summary under 200 words total.

## Example: short ticket

Ticket TKT-1042 from a free-tier customer who reported a missing order. Status:
open, 18 minutes old, no actions yet. Three similar resolved tickets show
this is usually a delivery-tracking issue resolved in 2 hours.

Output:
**Customer Context**: Sara K. (free tier, 7-month account, 0 prior tickets)
**Issue Description**: Order #4912 not received, 5 days post-shipping date
**Resolution Steps Taken**: None — ticket 18 minutes old
**Current Status**: Open, unassigned
**Recommended Next Action**: Reply with delivery-tracking template; auto-close
in 24h if no follow-up.

## Example: complex ticket

[A second worked example with an escalated ticket — omitted for brevity.]

## Edge cases

**Escalated ticket** — include the escalation reason and the assignee. Mark the
"Recommended Next Action" as "Continue with assignee; do not duplicate."

**No resolution yet** — explicit "no actions logged" in Resolution Steps Taken.
Don't infer.

**Irate customer (from message sentiment)** — flag in Customer Context.
Recommend tier-2 review even if the issue is small.

Doosri do Skills, brief mein. Same shape apply hoti hai. Do aur frontmatter blocks, har aik ke liye forty minutes ki careful description-writing, aur Worker ke paas teenon required Skills hoti hain.

# .claude/skills/find-similar-cases/SKILL.md (frontmatter only)
---
name: find-similar-cases
description: Searches the resolved-tickets library for tickets semantically similar to a customer's described issue, returning the top 3-5 with their resolutions and a similarity score. Use when the user describes a problem, complaint, or symptom and you need to check whether the team has handled something similar before. Calls the find_similar_resolved_tickets MCP tool. Always run this BEFORE drafting a response, so the response can reference proven prior resolutions rather than inventing a new approach.
---

Body teen steps walk through karti hai: context se issue description extract karein, limit=5 ke saath find_similar_resolved_tickets call karein, top three ko similarity scores ke saath markdown table mein present karein, aur low-confidence matches (similarity > 0.3) ko explicitly "no strong prior precedent found" flag karein. Instruction "always run this BEFORE drafting" real kaam kar rahi hai; is ke baghair model kabhi kabhi priors se reply draft karta hai aur library dekhta hi nahi.

# .claude/skills/escalate-with-context/SKILL.md (frontmatter only)
---
name: escalate-with-context
description: Packages a customer conversation for handoff to a tier-2 support agent. Produces a structured escalation note with customer profile, issue summary, what was already tried, why escalation is recommended, and the suggested specialist team. Use when (a) the customer is on the Pro or Enterprise tier AND the issue is unresolved after one round of investigation, (b) the customer's sentiment is clearly negative, (c) the issue involves billing >$500 or a refund decision, or (d) the user explicitly asks for a human.
---

Body structured context lene ke liye pehle summarize-ticket invoke karti hai, phir six-section escalation note likhti hai (customer context, issue, attempted resolutions, sentiment signals, recommended team, suggested SLA). Description mein chaar explicit trigger conditions hi is skill ko over-fire hone se rokti hain; vague escalation logic wala Worker har cheez escalate karta hai, jo purpose defeat karta hai.

Decision 7 tak teenon Skills .claude/skills/ mein live hon gi. Structure transfer hota hai; sirf descriptions aur operational specifics change hoti hain.


Decision 5: build the embedding pipeline and seed the document library

Cost impact (Decision 5)

Kuch dozen resolved tickets ka seed corpus, har aik ~300 tokens, text-embedding-3-small ke $0.02 per 1M input tokens par cent ke fraction mein embed hota hai. New tickets aur messages ki ongoing embedding worked-example volume par usually $3/month ke under rehti hai. Embeddings bill ki sab se cheap line hain; cost lever inference budget hai, embedding budget nahi.

What you do. Two pieces here: the embedding pipeline code (from Concept 9) and a small corpus to embed. The Worker's "library" for this example is a curated set of past resolved tickets: small enough to run tez, big enough that semantic search shows its value. A dozen rows is enough to feel the architecture work; a few dozen makes the retrieval results more interesting. The course doesn't ship the corpus, because writing it is part of the exercise: give aap ka agent the column shape and a few example rows, and have it generate the rest.

mkdir -p data/seed
# data/seed/resolved-tickets.csv (columns: id, customer_email, summary, resolution)
# A few starter rows; ask your agent to expand this to a dozen or more,
# varied enough that semantic search has something to discriminate between.
id,customer_email,summary,resolution
1,sara@example.com,Refund not showing up two weeks after return approved,Refund was stuck in pending due to a payment-gateway batch delay; manually re-triggered and funds posted within 24h.
2,raj@example.com,Cannot log in after email change,Account email was updated but the session cache held the old address; cleared the session and the customer logged in normally.
3,mei@example.com,Duplicate charge on a single order,Gateway retry created a second authorization; voided the duplicate and confirmed only one capture settled.

customer_email column seeder ko ticket insert karne se pehle customers row find-or-create karne deta hai (tickets.customer_id foreign key NOT NULL hai, is liye seed yeh step skip nahi kar sakta). Phir agent se poochein:

Implement src/chat_agent/embedding/{chunker,embedder,seeder}.py based on
the patterns from Concepts 9 and 10 of the course. Add a CLI command
`python -m chat_agent.embedding.seeder data/seed/resolved-tickets.csv`
that:

1. Reads the CSV.
2. For each row:
a. Find or create a customers row by email (default tier='free').
b. Insert a tickets row with status='resolved' and customer_id set.
c. Insert a documents row with source='past_case', title=the ticket
summary, body=summary+resolution combined, and metadata containing
the ticket id ({"ticket_id": "<uuid>"}). This is the row the
embedding will link to: embeddings.document_id references
documents(id), not tickets(id). See Concept 7's schema and the
Concept 14 query.
d. Chunk the documents row's body.
e. Embed each chunk with text-embedding-3-small.
f. Insert each chunk into embeddings with document_id set to the
documents row from step (c) (not conversation_id).
3. Write a single audit_log row recording "seed_run" with the row count.

Use asyncpg via a connection pool. Do NOT use the Neon MCP server for
this; direct connection only. We're populating data, not managing
the database.

model produces the seeder. Run it. Watch the embedding count grow. Confirm the index works:

-- A sanity-check query you should run after seeding
SELECT COUNT(*) FROM embeddings WHERE document_id IS NOT NULL;
SELECT model, COUNT(*) FROM embeddings GROUP BY model;
EXPLAIN ANALYZE
SELECT chunk_text, embedding <=> (SELECT embedding FROM embeddings LIMIT 1) AS d
FROM embeddings ORDER BY d LIMIT 5;

EXPLAIN ANALYZE ko "Index Scan using idx_embeddings_hnsw" dikhana chahiye; yeh HNSW index ka kaam karna hai. Agar "Seq Scan" dikhaye, index use nahi ho raha (usually table bohat chhoti hone ki wajah se; pgvector ka planner certain row count se neeche sequential scans par fall back karta hai, jo small seed corpus ke liye fine hai lekin jaana worth hai).

Why this is direct asyncpg, not MCP. Seed scripts are infrastructure. They run once, by hand, by an authorized operator. They are not a Worker's runtime path. The MCP boundary is for things agent does autonomously; the seed script is for things you do. Don't put unnecessary boundaries between khud and apna database.

OpenCode mein kya change hota hai. Same approach. Same files generate karne ke liye OpenCode ka /build slash command equivalent use karein (ya direct prompting).

verify Decision 5 worked. The sanity-check block above is aap ka verification surface. specifically:

  • SELECT COUNT(*) FROM documents WHERE source = 'past_case' should equal the row count in aap ka CSV. Each resolved ticket becomes one documents row; this is the row the embedding links to.
  • SELECT COUNT(*) FROM embeddings WHERE document_id IS NOT NULL bhi CSV row count ke equal hona chahiye (har short ticket ke liye aik embedding, kyun ke seed corpus chunking threshold se neeche hai).
  • SELECT model, COUNT(*) FROM embeddings GROUP BY model aik row dikhaye, jis mein woh embedding model named ho jo aap ne actually use kiya. Do rows ka matlab hai aap ne mid-seed models mix kar diye; clean state se dobara run karein.
  • Kuch hundred rows se zyada corpora ke liye EXPLAIN ANALYZE line Index Scan using idx_embeddings_hnsw dikhaye. Small seed corpus ke saath pgvector ka planner Seq Scan par fall back karta hai aur yeh fine hai.

Agar counts zero hain, to aap ke seeder ne exception silently swallow ki hai; seed run ke liye aap ke seeder ki likhi hui audit_log row check karein.


Decision 6: Write the custom MCP server for runtime access

Cost impact (Decision 6)

Custom MCP server aap ke sandbox process mein run karta hai, is liye separate hosting cost nahi. Bill inference mein show hota hai: har lookup_customer ya find_similar_resolved_tickets call next model turn mein round-trip ke tokens add karta hai. Concept 15 MCP-under-load ki latency aur pool-size side cover karta hai.

Aap kya karte hain: server mcp-builder se build karein, phir is course ki discipline se steer karein. FastMCP boilerplate haath se na likhein. Yeh wahi install-a-skill-then-prompt pattern hai jo Chapter 57 ne MCP servers ke liye use kiya tha: aap tools describe karte hain, agent server build karta hai, agent verify karta hai. Aap ka kaam build ko Digital-FTE rules se steer karna hai jo generic boilerplate generation apne aap infer nahin karegi.

mcp-builder install karein (Decision 4 ke skill-creator jaisi same syntax):

npx skills add https://github.com/anthropics/skills --skill mcp-builder

Phir server describe karein. Start ke liye teen tools: lookup_customer, find_similar_resolved_tickets, issue_refund. run_sql nahin. Har tool narrowly scoped. Neeche ka prompt discipline ko explicit steering ke taur par carry karta hai, aisi cheez nahin jo agent ke guess par chhori gayi ho:

Use mcp-builder to build a custom MCP server at
src/customer_data_mcp/server.py on the mcp.server.fastmcp.FastMCP
framework, with three @mcp.tool() functions and NO general-purpose
run_sql tool (scope is the point; see Concept 14):

1. lookup_customer(customer_id) → returns id, email, tier, active_tickets.
2. find_similar_resolved_tickets(description, limit=5) → embed the
description with text-embedding-3-small (the SAME model used at insert
time; mixed models return nonsense rankings), THEN run the vector
query. Join embeddings to documents (embeddings.document_id →
documents.id) and on to tickets via documents.metadata->>'ticket_id',
filtered to source='past_case' and status='resolved'. Returns
ticket_id, summary, resolution, similarity_score.
3. issue_refund(order_id, amount_cents, reason) → inside ONE transaction:
insert a refunds row, mark the order status='refunded', and write the
audit_log row. The audit write and the action it logs must commit in
the same transaction, never as an afterthought.

Use asyncpg with a connection pool. Register pgvector on connections that
touch the embedding column. Type every function. Run via stdio as __main__.

mcp-builder server produce aur verify karta hai. Har tool ki docstring parhein. Har docstring woh text hai jo model parhta hai decide karne ke liye ke tool kab call karna hai: SKILL.md description jaisa same role. Vague docstrings tools ko wrong times par fire karwati hain. Aap ne jahan steer kiya woh chaar disciplines (broad run_sql ke bajaye scoped tools, matched model ke saath embed-before-query, pgvector registration, same transaction mein audit write) exactly woh rules hain jo generic MCP scaffold miss karega; prompt mein inhein own karna isay database boilerplate ke bajaye Digital-FTE server banata hai. Why this is a custom MCP server, not just asyncpg calls in agent code. Concept 14's three reasons, in the order they matter for this Worker: scope (agent can do exactly three things to database, not anything SQL allows), isolation (the MCP server runs in its own process, with its own connection pool that agent can't accidentally exhaust), and reusability (when we build the second Worker that also needs lookup_customer, it talks to the same server).

What changes in OpenCode. The MCP server itself is tool-agnostic. agent ki MCP server registration moves to opencode.json:

{
"mcp": {
"customer-data": {
"type": "local",
"command": ["python", "-m", "customer_data_mcp.server"],
"env": { "DATABASE_URL": "${env:DATABASE_URL}" }
}
}
}
Skeleton of customer_data_mcp/server.py
# src/customer_data_mcp/server.py
import json
import os
from typing import Annotated

import asyncpg
from mcp.server.fastmcp import FastMCP
from openai import AsyncOpenAI
from pydantic import Field

mcp: FastMCP = FastMCP("customer-data")
_pool: asyncpg.Pool | None = None
_oai: AsyncOpenAI | None = None


async def get_pool() -> asyncpg.Pool:
global _pool
if _pool is None:
_pool = await asyncpg.create_pool(
os.environ["DATABASE_URL"], min_size=1, max_size=10,
)
return _pool


# The SQL in this server uses unqualified table names (embeddings, tickets, ...),
# which assumes the tables live in the `public` schema. That holds if you
# followed the course's migration as-is. If you put the schema somewhere else
# (a Neon branch with a non-default schema), schema-qualify the table names:
# Neon's pooled endpoint silently drops `server_settings` startup params and
# resets `SET search_path` on connection release, so search_path isolation does
# not survive a pooled connection.


def get_oai() -> AsyncOpenAI:
global _oai
if _oai is None:
_oai = AsyncOpenAI()
return _oai


@mcp.tool()
async def lookup_customer(
customer_id: Annotated[str, Field(description="Customer UUID.")],
) -> dict[str, str | int]:
"""Look up customer by UUID. Returns id, email, tier, active_tickets count.
Use when the user provides a customer ID and you need their profile.
"""
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"""SELECT id::text, email, tier,
(SELECT COUNT(*) FROM tickets t
WHERE t.customer_id = c.id AND t.status='open') AS active_tickets
FROM customers c WHERE id = $1::uuid""",
customer_id,
)
return dict(row) if row else {"error": "customer not found"}


@mcp.tool()
async def find_similar_resolved_tickets(
description: Annotated[str, Field(description="Customer's issue description.")],
limit: Annotated[int, Field(ge=1, le=10)] = 5,
) -> list[dict[str, str | float]]:
"""Find resolved tickets similar to a description, via pgvector.
Use when the user describes an issue and you want to check prior resolutions.
"""
oai = get_oai()
emb = await oai.embeddings.create(
model="text-embedding-3-small", input=[description],
)
query_vec = emb.data[0].embedding

pool = await get_pool()
async with pool.acquire() as conn:
# embeddings.document_id references documents(id), not tickets(id).
# Decision 5's seeder stores each resolved ticket as a documents row
# (source='past_case') with the ticket id in metadata, so the join
# path is embeddings -> documents -> tickets.
rows = await conn.fetch(
"""SELECT t.id::text AS ticket_id, t.summary, t.resolution,
(e.embedding <=> $1::vector) AS distance
FROM embeddings e
JOIN documents d ON d.id = e.document_id
JOIN tickets t ON t.id = (d.metadata->>'ticket_id')::uuid
WHERE d.source = 'past_case' AND t.status = 'resolved'
ORDER BY e.embedding <=> $1::vector
LIMIT $2""",
query_vec, limit,
)
return [dict(r) for r in rows]


@mcp.tool()
async def issue_refund(
order_id: Annotated[str, Field(description="Order UUID.")],
amount_cents: Annotated[int, Field(ge=1, description="Refund amount in cents.")],
reason: Annotated[str, Field(description="Reason recorded with the refund.")],
) -> dict[str, str]:
"""Issue a refund for an order. Writes refunds row, updates order.status,
and writes an audit_log row. Use only when refund is authorized.
"""
pool = await get_pool()
async with pool.acquire() as conn:
async with conn.transaction():
refund_id: str = await conn.fetchval(
"""INSERT INTO refunds (order_id, amount_cents, reason)
VALUES ($1::uuid, $2, $3) RETURNING id::text""",
order_id, amount_cents, reason,
)
await conn.execute(
"UPDATE orders SET status='refunded' WHERE id = $1::uuid",
order_id,
)
await conn.execute(
"""INSERT INTO audit_log
(actor, action, target, payload, result)
VALUES ($1, $2, $3, $4::jsonb, $5::jsonb)""",
"worker:customer-support", "refund_issued", order_id,
json.dumps({"amount_cents": amount_cents, "reason": reason}),
json.dumps({"refund_id": refund_id}),
)
return {"refund_id": refund_id, "order_id": order_id, "status": "ok"}


if __name__ == "__main__":
mcp.run(transport="stdio")

Verify karein Decision 6 worked. Server standalone start karein aur confirm karein ke yeh apne teen custom tools list karta hai:

# In one terminal: start the MCP server as if it were being launched by the agent
uv run python -m customer_data_mcp.server &
# In another terminal, or after the server prints "Server listening on stdio":
# Try the agent
uv run python -m chat_agent.cli "test: list your available tools"

agent ki first response should name the three custom tools: lookup_customer, find_similar_resolved_tickets, issue_refund. (Some models also surface their own meta-tools, like OpenAI's multi_tool_use.parallel, a parallel-call wrapper; that is normal and not from aap ka MCP server.) If you see a generic database tool (run_sql or similar), agent hai also connected to the Neon MCP server at runtime; remove that from aap ka runtime mcp_servers= list (it's development-only per Concept 12). If the three custom tools are missing or you get an init-timeout error, the MCP server crashed at startup; check the subprocess stderr for the actual exception. The Concept 13 timeout admonition covers the ML-import case.

An SSL/Event loop is closed traceback on shutdown is harmless

Yeh server AsyncOpenAI client hold karta hai (find_similar_resolved_tickets ke andar embedding ke liye). Jab MCP subprocess shut down hota hai, HTTP library ki connection cleanup stdio transport ke event-loop teardown se race kar sakti hai aur stderr par multi-line Fatal error on SSL transport / RuntimeError: Event loop is closed traceback print kar sakti hai. Yeh teardown noise hai: run already complete ho chuka hota hai. Success ko exit code aur audit trail se judge karein, clean-looking shutdown se nahi. (Agar noise bother kare, deeper fix yeh hai ke embedding call ko agent process mein move karein aur vector ko tool argument ke taur par pass karein; is se server ka scope bhi narrow hota hai, jo Concept 14 ka whole point hai.)


Decision 7: Wire audit logging everywhere

What you do. Two pieces of wiring: agent ki own audit writes (skill invocations, model calls), and the audit writes inside the MCP server (already shown in issue_refund above). Both happen alongside actions, never as an afterthought.

agent-side helper from Concept 10 (log_capability) gets used at three sites in cli.py and sandboxed.py: at the start and end of each Skill invocation, after each MCP tool call, and around guardrail trips. The key discipline: the audit write and the action it logs happen in the same code path, ideally in the same transaction.

Ask agent:

Modify src/chat_agent/cli.py to log_capability around each
Runner.run_streamed call. Capture: which skills activated (visible
from RunItemStreamEvent.tool_called for skill events), which MCP
tools were called, total tokens, latency, status. Write to audit_log
via a direct asyncpg connection (NOT through the customer-data MCP
server — audit is the meta-layer; it shouldn't share the same MCP
boundary as the data it audits).

Audit connection pool customer-data MCP server ke pool se separate hai. Do reasons: data pool saturated ho tab bhi audit ko succeed hona chahiye, aur audit writes ko same connections ke liye business writes se compete nahi karna chahiye. Jo audit subsystem usi system ke zariye starve ho sakta hai jise woh audit kar raha hai, woh audit subsystem nahi.

Audit infrastructure jitni complicated dikhti hai utni hai nahi. Concept 7 already audit_log aur capability_invocations ship karta hai; Concept 10 ka helper already dono ko ek transaction mein likhta hai. Is Decision mein naya kaam sirf yeh hai ke har meaningful boundary par us helper ko call kiya jaye. Mechanics simple hain; discipline yeh yaad rakhne mein hai ke yeh consistently karna hai.

OpenCode mein kya change hota hai. Identical. Audit code plain Python hai; tool-specific differences nahi.

verify Decision 7 worked. Run one throwaway conversation against agent, then ask database whether the audit fired:

-- Pick the most recent conversation
SELECT id FROM conversations ORDER BY started_at DESC LIMIT 1;

-- ... then for that conversation_id, every meaningful event should have a row
SELECT action, target, created_at FROM audit_log
WHERE conversation_id = '<id-from-above>'
ORDER BY created_at;

Running from the standalone Python cli.py, you should see at minimum: message_received, at least one capability_invoked, and message_sent. (No skill_activated row from the standalone path; Decision 8 explains why and shows how to see one inside Claude Code.) If you see only the MCP-side rows (capability_invoked + domain actions like refund_issued), agent-side log_capability helper is wired but not firing; check that you invoked it from the Runner.run_streamed event loop, not just at run start. If you see zero rows: the audit connection pool isn't connecting; check audit.py's asyncpg.create_pool(...) against aap ka NEON_DATABASE_URL.


Decision 8: verify end-to-end with the test scenario

Aap kya karte hain. Brief se customer scenario run karein:

You: I haven't received my refund from order #4429 — it's been two weeks.

Trace what happens. Running from the standalone Python cli.py, four things you should see in the audit log within a few seconds:

  1. action=message_received: user's message arrives, conversation row created.
  2. action=capability_invoked, target=mcp:lookup_customer: agent finds the customer behind order #4429.
  3. action=capability_invoked, target=mcp:find_similar_resolved_tickets: vector search finds similar past cases.
  4. action=message_sent: agent ki draft reply, recorded.

Agar refund authorized hota: aik action=capability_invoked, target=mcp:issue_refund row, plus MCP tool khud (Decision 6) ki likhi hui action=refund_issued row jo payload mein amount aur reason carry karti. Agar refund not authorized ho (policy check fail ho), to reason ke saath action=refund_blocked row: replay ke liye no-action case action case jitna hi aham hai. In sab action names Concept 10 ki canonical vocabulary table se match karte hain.

Standalone Python path se jo aap nahi dekhen ge woh skill_activated row hai. Skill discovery Claude Code / OpenCode client capability hai (Quick Win ke qareeb "One precondition" callout); bare Agent(...) .claude/skills/ scan nahi karta. Decision 4 ki teen Skills disk par valid aur ready hain, lekin standalone agent lookup_customer aur find_similar_resolved_tickets tak MCP boundary ke through directly pohanchta hai, summarize-ticket skill ke through nahi. Skills layer ko fire hota dekhne ke liye same agent ko Claude Code ke andar run karein (next callout).

Partial-implementation success state

Rows 1 and 4 (message_received, message_sent) come from Decision 7's agent-side audit wiring. Rows 2, 3, and the refund rows come from the MCP server itself (Decision 6). If you implemented Decisions 1-6 + 8 but skipped Decision 7 (a defensible slice for a learning pass; the MCP-side audit is already enough to prove the architecture), aap dekhenge only the MCP-side rows. That is also a success state: the business writes have a receipt; agent-side trace is what you add when aap ko chahiye to replay reasoning, not just outcomes.

See the Skills layer fire: run it inside Claude Code

Everything above runs through the standalone Python cli.py, which cannot load .claude/skills/. To exercise the Skills layer the brief promised, open this same project in Claude Code (or OpenCode) and give it the scenario prompt directly. Now skill discovery is active: the client scans .claude/skills/, model matches the summarize-ticket description, and aap ka agent-side log_capability wiring records a skill_activated row alongside the capability_invoked rows. Same audit schema, same canonical vocabulary, one more action value: the row the standalone path structurally cannot produce. Yeh moment the three layers, Skills + system of record + MCP, are all visible in one trace.

Then run the replay query, the one that proves aap ka audit log is replayable:

SELECT al.created_at, al.action, al.target, al.payload, al.result
FROM audit_log al
WHERE conversation_id = $1
ORDER BY al.created_at;

Read the result. You should be able to reconstruct, line by line, what agent did and why, without re-running model. If aap nahin kar sakte (if a step happened that isn't in the log, or if a row is in the log but the action it claims happened isn't reflected in the business tables) there's a wiring bug. Fix it now.

Why this scenario. Yeh un architectural pieces ko exercise karta hai jo yeh course add karta hai: MCP-backed tools (lookup_customer, find_similar_resolved_tickets) Neon system of record ke against run hotay hain, aur audit trail poora path record karti hai, sab SQL mein replayable. Previous course ke chat agent mein in mein se kuch bhi nahi tha. Skills layer teesra piece hai: .claude/skills/ mein built aur ready, aur jab aap same agent Claude Code ke andar chalate hain (upar wali callout), jahan skill discovery active hoti hai, to yeh trace mein join hoti hai aur baqi rows ke saath aik skill_activated row nazar aati hai.

What changes in OpenCode. The verification is client-agnostic: run inside Claude Code or OpenCode and you get the same trace, skill_activated row included, as long as agent code, Skills, and MCP server are the same. The standalone Python cli.py produces the same trace minus that one row, for the reason Decision 8 explains.


What just happened

Eight Decisions, aur previous course ka chat agent ab Worker hai. Dekhein kya change hua:

  • Capability code se bahar move hui. Teen Skills .claude/skills/ mein hain, version-controlled, agents ke across sharable.
  • State moved into system of record. Six tables in Neon, with semantic search via pgvector, and a real domain schema for customers/orders/tickets/refunds.
  • Runtime database access is mediated. agent doesn't have asyncpg in its imports; it talks to Postgres through a scoped MCP server that exposes exactly three tools.
  • Har action trace chhor kar jata hai. Audit log weeks ya months baad bhi kisi bhi conversation ka full reasoning trace SQL mein replay kar sakta hai.

OpenAI Agents SDK ab bhi wahi hai. Aap ka sandbox runtime (worked example mein Cloudflare) ab bhi wahi hai. Previous course ki streaming, guardrails, aur tracing ab bhi wahi hain. Jo cheez badli hai woh upar baithi hui architecture hai: Skills capabilities rakhti hain, Neon truth rakhta hai, aur MCP dono ko wire karta hai.

Thesis isi ko agent aur Worker ke darmiyan farq kehti hai.


Part 5: Where this course leaves off

Cost shape of a Worker: what does it actually cost to operate?

Previous course mein "every turn re-bills the world" wali thread thi; yeh course ab tak dollars par quiet raha hai. Gap close karte hue, specifically Part 4 worked example ke liye: aik customer-support Worker jo 200 conversations per day karta hai, ~10 messages per conversation, average context 8K tokens per turn, teen Skills, teen MCP tools.

Cost ki chaar lines, size ke order mein.

LineDriverRough monthly costWhen it bites
Model inferenceinput tokens x turns x $/M$60 - $200Volume x prompt size. Cache hits on stable prefixes (CLAUDE.md, Skills metadata, system prompt) typically recover 60 - 80% of input cost.
Neon Postgresstorage + active compute$0 - $25Free tier covers a single Worker doing this volume. Scale-to-zero means idle hours cost nothing. Paid tier kicks in once a project crosses the 0.5 GB / 100 CU-hours free-plan limits (see Neon pricing for current numbers).
Embeddingschunks x $0.02/M tokens$0.30 - $3One-time cost for seed data plus incremental embedding of new tickets/messages. Negligible unless you're embedding entire conversation histories continuously.
Cloudflare Sandboxcontainer minutes$0 - $15Depends on session length and concurrency. Idle reaping helps; long-running sessions don't.

Total monthly for the Part 4 Worker: roughly $60 - $240. model is the largest line, by an order of magnitude over everything else. system of record hai essentially free at this volume. Skills discipline (progressive disclosure, Concept 2) is also cost discipline: a Worker with bloated SKILL.md bodies pays for every turn that activates them; a Worker with descriptions-only-at-discovery and tight bodies-on-activation pays for what it uses.

Three knobs that move the dial most. (1) Cache hit rate: keep aap ka CLAUDE.md, system prompt, and Skills metadata stable; cache misses cost 5 - 10x more than hits. (2) Model tier: the same Worker on a cheaper model (DeepSeek V4 Flash, Claude Haiku) often does 80% of the job at 10% of the cost; route hard decisions to the frontier model only when needed. (3) Audit table growth: audit_log is the largest table by row count (Concept 7's PRIMM Predict); partition or archive past 90 days if you don't actually query historical audit data.

The honest scaling number. Is shape par 50 Workers ki workforce (jo aap ka next course cover karta hai) inference ke liye roughly $3,000 - $12,000/month, Neon ke liye $50 - $200, embeddings ke liye single-digit dollars, aur sandbox compute ke liye $100 - $500 hoti hai. Infrastructure layer cheap rehti hai; model bill scale hota hai. Isi liye previous course ki har cost-discipline habit (cache hits, model tiering, context hygiene) compound hoti hai jab aap aik Worker se bohat se Workers tak jate hain.


Swap guide: the architecture is invariant, the products are not

Yeh course names specific vendors at every layer (OpenAI Agents SDK, Cloudflare Sandbox, Neon, OpenAI embeddings, MCP Python SDK). That's because a teaching example needs concrete answers, not "use any LLM runtime you like." But the architecture works with any compliant alternative. Five swaps the course ki design explicitly anticipates:

  • Postgres host: Neon -> Supabase, AWS RDS, self-hosted. Anything with pgvector works. You lose branching and scale-to-zero (those are Neon-specific), but the six-table schema, the embedding pipeline, the audit-trail discipline, and the custom MCP server pattern are all transferable byte-for-byte. The only change is the connection string and possibly the SSL config.
  • Vector storage: pgvector -> Pinecone, Weaviate, Qdrant. If you reject the "one database for both relational and vector data" argument from Concept 6, swap the embeddings table for a vector-DB client. The cost: two stores to keep consistent (Concept 6 argues this is rarely worth it). The benefit: better recall at very large scales (10M+ vectors), and managed-service operational simplicity.
  • Embedding model: OpenAI -> Cohere, Voyage, BGE-small (local). Change one constant (EMBEDDING_MODEL) and one column dimension (VECTOR(n)). Run a one-shot re-embed of existing data. Concept 9's pipeline doesn't change.
  • Sandbox: Cloudflare Sandbox -> E2B, Modal, Daytona, apna Docker. Isolated process boundaries aur clean restart wali koi bhi cheez kaam karti hai. SandboxAgent runtime backend-agnostic hai; worked example bas Cloudflare use karta hai. Skills ke scripts/ same tareeqe se execute hotay hain. Previous course ka trust-boundary diagram ab bhi apply hota hai.
  • Agent runtime: OpenAI Agents SDK -> LangGraph, CrewAI, Pydantic AI, apna loop. The MCP boundary is what survives; every modern agent framework has an MCP client. Skills work in any agent that can load SKILL.md files (Claude Code, OpenCode, Goose, and increasingly Cursor/Windsurf). The audit-trail discipline is framework-agnostic Python.

Kyan't swap easily. The MCP protocol itself, the Skills format spec, and the audit-trail discipline. These are the invariants: the things you'd port between products, not the things you'd port between. Same architectural shape, replaceable implementations underneath.


What this course doesn't cover (yet)

Ab aap ke paas aik Worker hai jo thesis ke set kiye hue Seven Invariants mein se do satisfy karta hai. Specifically: yeh engine par run karta hai (Invariant 4, previous course se), aur system of record ke against run karta hai (Invariant 5, is course se). Baqi five Invariants woh hain jo production AI-Native Companies require karti hain, aur subsequent courses cover karte hain. Yahan har aik sirf aik bullet hai, section nahi.

  • Invariant 1: The human is the principal. Authored specs, approval gates, budget declarations. Architecture for setting intent and owning outcomes, covered in Part 6 of the book.
  • Invariant 2: Every human needs a delegate. A personal agent at the edge that holds aap ka context, represents aap ka judgment, and brokers work to the workforce. Thesis names OpenClaw as the current realization.
  • Invariant 3: The workforce needs a manager. An orchestrator that assigns work, enforces budgets, audits execution, exposes hiring as a callable capability. Thesis names Paperclip.
  • Invariant 6: Workforce policy ke under expandable hai. Aik meta-layer jahan authorized agent human ko wake kiye baghair prompt generate karta hai, runtime provision karta hai, aur new Worker register karta hai. Claude Managed agents is ki aik realization hai.
  • Invariant 7: The workforce runs on a nervous system. Triggers (schedules, webhooks, inbound API calls) wake agent under the authority envelope. Inngest (durable functions and background jobs) is one realization for general workforce events; Claude Code Routines is the coding-agent-specific path.

Neon system of record se read karne wala single Worker is course ki taught architecture ka smallest unit hai. Next course us Worker ko workforce mein extend karta hai: multiple Workers jinhein manager coordinate karta hai, demand par expandable, triggers se woken. Same OpenAI Agents SDK foundation, same Cloudflare Sandbox runtime, same Skills format, same Neon system of record. Architecture invariant hai.


How to actually get good at this

Reading this crash course does not make you good at building Workers. Using it does. The path looks the same as for previous course: you start manual, feel the friction, and let each piece of friction teach you which Concept it belongs to.

The mapping for this course:

  • "Why isn't my skill firing when it should?" -> description quality (Concept 3). Rewrite. Test by inventing five different ways user might phrase the trigger.
  • "Why is agent inventing data database doesn't have?" -> agent hain't actually calling the MCP server. Check the trace; check the mcp_servers=[...] registration.
  • "Why is my audit log incomplete?" -> the audit write isn't in the same code path as the action (Concept 10). Move it next to the action, in the same transaction.
  • "Kyun my pgvector results irrelevant?" -> either chunking is wrong (Concept 9), or the embedding model at insert-time doesn't match the embedding model at query-time. Re-embed.
  • "Why is my MCP server slow under load?" -> connection pool inside the server is too small, or the tools list isn't cached on the client. Concept 15.
  • "Why does the Neon MCP server feel scary in production?" -> because Neon's own docs say it's not for production. Write a custom MCP server (Concept 14). Pehla one takes 30 minutes; the second takes 10.

Build the architecture one piece at a time. Skills, system of record, aur MCP ko aik hi weekend mein add karne ki koshish na karein. Previous course ka chat agent lein. Pehle system of record add karein (Decisions 3 - 5) aur dekhein aap ka debugging experience kaise badalta hai. Phir aik Skill add karein (Decision 4) aur dekhein model usay use karne ka faisla kaise karta hai. MCP boundary sab se last add karein (Decision 6). Har step apni learning hai; teeno ko aik bari rewrite mein combine kar dein to woh wall ban jate hain.

The portability dividend extends. Skills you write here work in any Agent Skills-compatible client. Schemas you write here work in any Postgres. MCP servers you write here work with any MCP client: Claude, GPT, Gemini, local models. Architecture is invariant; the products are 2026. When the products change, aap ka code mostly doesn't.

Aap kis cheez par waqt lagate hain, us mein shift

Decision 4 ke baad aap ka kaam shape badalta hai. Code likhna agent ko brief karne wali cheez ban jata hai; description review karna (config file ka woh field jise aap normally skim karte) load-bearing craft ban jata hai. Jis description ko draft aur refine karne mein aap ne 30 minutes lagaye, woh agent ke generated 200 lines MCP server code se zyada architectural work kar rahi hoti hai, kyunke description hi routing surface hai jo model har turn read karta hai.

Do practical shifts. Pehla, aap "is ko implement kaise karun?" poochna chhor kar "real user trigger ko paanch different tareeqon se kaise phrase kar sakta hai?" poochna shuru karte hain. Code downstream hai; description ghalat ho to agent code tak kabhi pohanchta hi nahi, aur code quality irrelevant ho jati hai. Doosra, authorship ki jagah review load-bearing skill ban jati hai. Agent draft karta hai; aap decide karte hain ke draft un trigger cases mein kaam karta hai ya nahi jin ke liye aap ne description likhi thi. Sab se mushkil hissa yeh urge resist karna hai ke jab aap khud teen minutes mein solve kar sakte hain to draft rewrite na kar dein - wahi discipline jo aap ko MCP boundary bypass karne se rokta hai.


Quick reference

The 15 concepts in one line each

  1. agent Skill is a folder. SKILL.md plus optional scripts/references/assets.
  2. Progressive disclosure. Metadata at startup -> full body on activation -> references on demand.
  3. SKILL.md frontmatter + body hai. Name, description, optional metadata, phir operational instructions.
  4. Skills files ke taur par travel karti hain. Same SKILL.md Claude Code aur OpenCode mein bina modification kaam karta hai.
  5. Small Skills ko filesystem handoff ke zariye compose karein jab isolation orchestration simplicity se zyada matter kare.
  6. Postgres + pgvector almost har agent workload ke liye separate vector DB se better hai. Neon branching, scale-to-zero, aur MCP server add karta hai.
  7. Six tables minimum operational schema hain: conversations, messages, documents, embeddings, audit_log, capability_invocations.
  8. pgvector basics: VECTOR(1536) + <=> cosine distance + HNSW index. Dono ends par same embedding model use karein.
  9. Embedding pipeline: semantic boundaries par chunk karein (~400 tokens with overlap), batch-embed karein, model metadata ke saath store karein.
  10. Audit logging nahi. Har meaningful action usi transaction mein row likhta hai jis mein woh action record hota hai.
  11. MCP protocol hai, service nahi. Teen primitives (tools, resources, prompts), teen transports (stdio, streamable HTTP, legacy SSE).
  12. Neon MCP server development ke liye hai. Schema design, branch-based migrations. Production runtime ke liye nahi.
  13. OpenAI Agents SDK has a built-in MCP client. from agents.mcp import MCPServerStdio, MCPServerStreamableHttp. Use async with. Cache list_tools in production.
  14. Custom MCP servers apni value earn karte hain scope, isolation, aur reusability ke zariye. Aik agent ke single function ke liye server na likhein.
  15. Load ke under MCP: remote ke liye streamable HTTP, tools cache karein, connections reuse karein, server ke andar pool karein, _meta ke zariye trace context propagate karein.

Decision tree: @function_tool vs. custom MCP server vs. vendor MCP server

Capability used by one agent, one process, one function?
→ @function_tool

Capability that multiple agents (or multiple deployments) will reuse?
→ Custom MCP server

Vendor provides one and it does what you need?
→ Vendor MCP server (don't rebuild)

Sensitive operations, needing narrow scope?
→ Custom MCP server (NOT a broad `run_sql` interface)

Long-running, background, or process-isolated work?
→ Custom MCP server (process isolation buys safety)

File location quick-ref

WhatPath
Project Skills.claude/skills/<name>/SKILL.md (OpenCode reads as fallback)
Personal Skills~/.claude/skills/<name>/SKILL.md (OpenCode reads as fallback)
Schema migrationsmigrations/NNN_*.sql
Embedding codesrc/chat_agent/embedding/{chunker,embedder,seeder}.py
Custom MCP serversrc/customer_data_mcp/server.py
MCP server registration (Claude Code).claude/settings.json mcpServers
MCP server registration (OpenCode)opencode.json mcp block
Audit helpersrc/chat_agent/audit.py

When something feels wrong

Skill not firing when it should
→ Description too vague. Rewrite with "Use when..." and specific keywords (Concept 3).

Skill firing when it shouldn't
→ Description too broad. Add explicit constraints in the description.

pgvector returning irrelevant results
→ Embedding model mismatch (insert vs. query). Verify the model column in
the embeddings table. Re-embed if needed.

MCP tool not appearing in agent
→ Server not registered, or list_tools cache stale. Check mcp_servers=[...]
and try cache_tools_list=False temporarily.

Audit log has gaps
→ Action and audit write are in different code paths. Move them next to
each other, ideally same transaction.

Agent timing out on Postgres operations under load
→ MCP server's connection pool too small. Check asyncpg.create_pool(max_size=...).

MCP server hangs on startup with torch / sentence-transformers / large imports
→ Default client_session_timeout_seconds=5 is too short for servers that
load ML models at import. Bump to 60. See Concept 13's callout.

CREATE TABLE fails: relation "notes" already exists
→ You're pointing at a database that already has tables. Use a fresh
database or Neon branch for the Quick Win (Step 2).

An OLD asyncpg refuses the Neon DSN with "unsupported startup parameter"
→ Only older asyncpg versions choke on channel_binding=require. The
version this course pins accepts it; if you are pinned to an old
asyncpg, either upgrade or strip that one parameter from the DSN.
TLS posture is unchanged either way.

Neon -pooler endpoint: schema-qualified SQL needed, search_path ignored
→ The pooled endpoint silently drops server_settings startup params
and resets SET search_path on connection release. If your tables
are not in the public schema, schema-qualify them (see the
Decision 6 MCP-server skeleton's get_pool comment). A reader who
built in public is unaffected.

Non-OpenAI key getting 401 against api.openai.com
→ Set OPENAI_BASE_URL to your provider's OpenAI-compatible endpoint
(e.g., https://api.deepseek.com/v1) before running the agent.

Agent fails partway with a 401 / auth / BadRequestError
→ Wrong key, wrong provider, or expired key. Run the curl key smoke-test
from Step 5 before the full run; it fails in one second instead of
four files deep.

Neon MCP server returning errors in production agent code
→ You're using it wrong. Neon's docs are explicit: development only.
Write a custom MCP server instead (Concept 14, ~30 minutes).

Appendix: Audience, prerequisites, glossary

Audience, prerequisites, previous course ka recap, aik Postgres basic guide, first-pass reading guide, aur glossary. Isay reference ke taur par use karein: main flow mein koi term unfamiliar ho to wapas flip karein, ya start karne se pehle A.1 parh lein agar aap sure nahi ke prerequisites complete hain.

A.1: Audience & prerequisites

Audience. Yeh intermediate-to-advanced course hai, apne predecessors se denser. Build AI Agents complete karna recommended hai, required nahi: companion zip ka starter project us course ki end state scratch se reproduce karta hai, is liye aap sirf OPENAI_API_KEY ke saath yahan se shuru kar sakte hain. Jo cheez zaruri hai woh us course ki sikhayi hui cheezon ke saath comfort hai, kyun ke hum us ke chat agent ko extend karte hain, dobara explain nahi karte. OpenAI Agents SDK, agent loop, sessions, streaming, function tools, sandboxing: sab assumed hain, chahe aap ne previous course mein seekhe hon ya pehle se jaante hon.

Prerequisites. yeh page assumes four things.

  1. You have completed Build AI Agents with the OpenAI Agents SDK and Cloudflare Sandbox. Yeh non-negotiable. We pick up where its worked example left off: same project layout, same agents.py, same cli.py, same tools.py. If you have not, do that course first; this one will read as friction without it.
  2. You have the Agentic Coding Crash Course discipline. Plan mode, rules files (CLAUDE.md / AGENTS.md), slash commands, context discipline. Yeh course ki worked example uses Skills as slash commands at one point, so the rules-file discipline is load-bearing.
  3. Aap ne kam az kam aik Chapter 42 ka PRIMM-AI+ cycle kiya hai. Is course ke Predict prompts assume karte hain ke aap predict, run, investigate, modify, make jaante hain.
  4. Aap ke paas working Postgres mental model hai. Tables, indexes, transactions, foreign keys. Aap ko DBA hona zaruri nahi. Aap ko pata hona chahiye SELECT ... WHERE kya karta hai, index kis liye hota hai, aur roughly JOIN kya karta hai. Agar aap ne kisi bhi language mein aik CRUD app likhi hai, aap calibrated hain.

A.2: Previous course ne kya sikhaya jo yeh course assume karta hai

Full course: Build AI Agents with the OpenAI Agents SDK and Cloudflare Sandbox. Us course ki aath cheezen jin par yeh course directly build karta hai:

  1. The state-and-trust frame. Every agent primitive is SDK's answer to a state question or a trust question. Yeh course extends both axes: state into system of record, trust into a skills library.
  2. agent loop. Model decides -> is_final? -> run_tool (trust boundary) -> history grows -> next turn. Yeh course adds MCP tool calls and Skills invocations to that loop, but the loop shape doesn't change.
  3. @function_tool. A typed Python function exposed to model. Yeh course ki Concept 14 contrasts it with MCP-exposed tools; aap ko chahiye to know what @function_tool is to understand when not to reach for MCP.
  4. Sessions. Previous course ka SQLiteSession ab bhi kaam karta hai. Yeh course usay replace nahi karta; is ke saath Postgres-backed audit trail add karta hai.
  5. Streaming events. Runner.run_streamed aur RunItemStreamEvent. Hum in events se skill activations aur MCP tool calls log karte hain (Decision 7).
  6. guardrails. Input and output guardrails. Yeh course doesn't add new guardrail concepts; you carry over what you have.
  7. Cloudflare Sandbox. SandboxAgent with Shell() and Filesystem(). Yeh course ki Worker still deploys to a sandbox; system of record lives outside it (in Neon).
  8. Dual-tool pattern (Claude Code + OpenCode). .claude/skills/ mein aap ki written Skills dono mein kaam karti hain. MCP server registration tool config ke hisaab se differ karti hai; server khud identical hai.

Stop signal. Agar "agent loop is model -> tool -> history -> loop, with max_turns capping it" review jaisa lage, continue karein. Agar yeh naya material lage, stop karein aur pehle previous course karein. Is course ka worked example usi course ke chat agent ko evolve karta hai; us foundation ke baghair parhna friction ban jata hai.

A.3: Postgres essentials this course uses

Agar aap ne kisi bhi language mein aik CRUD application likhi hai, to aap calibrated hain. Jo cheezen aap dekhen ge:

  • Tables, primary keys, foreign keys. Hamare schema mein six tables, har aik UUID primary key aur apne parent ke explicit foreign keys ke saath.
  • Indexes. Lookup columns par regular B-tree indexes; vector columns par HNSW indexes. Indexes queries accelerate karte hain; inserts par cost karte hain.
  • Transactions. BEGIN ... COMMIT (ya asyncpg mein async with conn.transaction():) multiple writes ko group karta hai taake sab hon ya koi bhi na ho. Concept 10 ke mutabiq audit write aur jis action ko woh log karta hai, dono aik transaction mein jate hain.
  • JSONB columns. Postgres ka native JSON type. Arbitrary key-value data store karta hai, JSON operators se queryable. audit_log.payload aur audit_log.result mein use hota hai.
  • Extensions. CREATE EXTENSION vector pgvector enable karta hai. Doosri extensions bhi hoti hain (text search ke liye pg_trgm, spatial ke liye postgis); humein sirf pgvector chahiye.

Stop signal. Agar "JOIN tickets ON tickets.customer_id = customers.id" obvious lage, continue karein. Agar JOIN syntax unfamiliar hai, worked example friction jaisa lage ga. Pehle 90-minute Postgres tutorial kar lein.

A.4: How to read yeh page on first pass

Same rule as previous course in the series:

  • Expand on first read: anything labeled "What aap dekhenge," "Sample transcript," "Expected output," "verify." These contain the runnable behavior to check predictions against.
  • Skip on first read: anything labeled "What skill.md looks like in full," "The complete migration SQL," and other full-file listings in Part 4's worked example. The narrative above each block tells you what changed; you only need file contents when you actually build.
  • Optional throughout: the "Try with AI" blocks at the end of each concept. Extension prompts for Claude Code or OpenCode; skip without guilt if aap ka tool isn't set up.

First pass ka goal three-layer model internalize karna hai (Skills capability layer hain, Neon system of record state layer hai, MCP connector hai) aur yeh samajhna hai ke yeh us OpenAI Agents SDK + Cloudflare Sandbox stack ke upar kaise baithte hain jo aap already jaante hain. Second pass, keyboard par aap ke hands ke saath, woh jagah hai jahan aap build karte hain.

A.5: Glossary

Yeh woh terms hain jo first encounter par reader ko sab se zyada trip kar sakti hain. Har term context mein dobara explain hoti hai jab woh appear hoti hai.

  • Skill: A folder with a SKILL.md file and optional scripts, references, and assets. The folder is the skill; file inside it is the entry point. agent load karta hai Skills via progressive disclosure: name+description at startup, full instructions when triggered, referenced files on demand.

  • SKILL.md: The skill ki entry file. YAML frontmatter with name and description (and optional metadata), then markdown body with the instructions agent follows when the skill activates.

  • Progressive disclosure: Teen-stage skill-loading model. Discovery: agent read karta hai names and descriptions of all available Skills at startup. Activation: agent read karta hai the full SKILL.md of the matching skill when task triggers it. Execution: agent load karta hai referenced files (or runs bundled scripts) on demand during execution.

  • system of record (SoR): Authoritative store of state the Worker reads from and writes to. Thesis term for "database that holds the truth." For this course: a Neon Postgres database.

  • Neon: Managed Postgres service jisme serverless branching, scale-to-zero, aur free tier hai. Doosre managed Postgres ke muqablay mein is ka differentiator branching (seconds mein copy-on-write database copies) aur first-class MCP server hai.

  • pgvector: Postgres extension jo similarity search ke liye vector column type plus distance operators add karti hai. Aik database ko relational data aur embedding-based semantic search dono hold karne deti hai.

  • Embedding: Fixed-length numerical vector jo text ke piece (ya other data) ko is tarah represent karta hai ke semantic similarity vector distance par map hoti hai. Embedding model se generated (text-embedding-3-small OpenAI default hai).

  • MCP (Model Context Protocol): Open standard jo define karta hai AI agents external tools, resources, aur prompts se kaise connect karte hain. Client/server architecture, teen primitives (tools, resources, prompts), aur teen transports (stdio, SSE, streamable HTTP) define karta hai.

  • MCP server: A program that exposes capabilities (tools/resources/prompts) to MCP clients. The Neon MCP server is one example; aap likh sakte hain apna in Python or TypeScript.

  • MCP client: agent-side counterpart that connects to MCP servers, lists their capabilities, and surfaces them to model. OpenAI Agents SDK has an MCP client built in.

  • Tool (MCP): One of three MCP primitives. A function model can invoke. From model's perspective, an MCP tool looks similar to a @function_tool, but the implementation lives in the MCP server, not agent ki process.

  • Resource (MCP): One of three MCP primitives. A read-only data source agent can fetch. files, database query results, API responses. Read but not write.

  • Prompt (MCP): One of three MCP primitives. A reusable prompt template the server provides for model to invoke. Less common than tools and resources; useful for standardized templates across teams.

  • Audit log: Database table jo Worker ke har meaningful action ko record karti hai (har tool call, har database write, har capability invocation) aisi form mein jise company baad mein replay, query, aur reason kar sake.

A.6: What this appendix does NOT replace

Full curriculum mein manager layer (Paperclip), meta-layer (runtime par AI Workers hire karna), aur trigger gateway (Inngest) par courses shamil hain. Un mein se koi yahan summarize nahi kiya gaya, kyun ke woh is course se pehle nahi balki isay extend karte hain. Jab wahan pohanchein to unhein parhein; is course ki Worker ko abhi un ki zaroorat nahi.