Agentic Engineering Fundamentals: A 45-Minute Crash Course

8 Concepts, 80% of Real Use

Prerequisite: the Agentic Coding Crash Course. That page teaches the tools (Claude Code, OpenCode, plan mode, CLAUDE.md, skills, MCP, hooks). This page teaches the discipline you use them with. The two are complementary: tools without discipline produce vibe code; discipline without tools is theory.

"Code is not cheap. Bad code is the most expensive it has ever been." Matt Pocock

"Vibe coding is about raising the floor for everyone in terms of what they can do in software. Agentic engineering is about preserving the quality bar of what existed before in professional software." Andrej Karpathy

A narrative is loose in the industry: AI is a new paradigm, so the old engineering rules no longer apply; specifications are the new source code; the model is the compiler; the diff doesn't matter as long as the program behaves. This is comforting, and it is wrong.

The thesis of this chapter, and the throughline of every Digital FTE in this book, is the opposite. Software fundamentals matter more in the AI era than they did before it. The reason is mechanical, not sentimental. The interface you design is the interface the agent learns from; the names you choose are the names it reuses; the boundaries you draw are the boundaries it respects. An agent in a clean, well-tested codebase produces code several quality tiers above the same agent in a tangled one. Architecture is no longer just a property of the code; it is an input to the agent. Bad code yields bad agents. Good code yields agents that look astonishingly competent.

This chapter teaches the workflow that makes that competence repeatable: a seven-stage pipeline (idea → grilling → PRD → issues → implementation → review → QA) implemented through small, composable skills that work identically in Claude Code and OpenCode. Skills, specs, and architectural patterns written for one drop into the other unchanged. The method is constant. The tool is the variable.

By the end of the chapter you will be able to:

Locate yourself on the vibe coding ↔ agentic engineering spectrum and choose the discipline that matches the stakes of your work.
Diagnose the six failure modes of AI coding and apply the cure for each.
Run a complete grill → PRD → vertical-slice issues → AFK implementation loop in either Claude Code or OpenCode.
Write a SKILL.md that the agent loads only when needed, rather than burning tokens on every turn.
Refactor a codebase from "shallow modules" into "deep modules" so that AI feedback loops actually work.
Use the working vocabulary fluently: smart zone, dumb zone, clearing, compaction, handoff, AFK, tracer bullet, design concept, grilling, jagged intelligence.

The Pipeline at a Glance

Before any of the theory, here is the operating shape the chapter teaches. Seven stages, five skills, one direction of flow. Every later section either explains a row or shows it in code.

#	Stage	What happens	Input → Output	Skill	Section
1	Idea → Aligned concept	Agent interviews you Socratically until the design is shared	wish → design concept	`grill-me`	§6.1
2	Concept → Destination	Synthesise the conversation into a PRD	conversation → PRD	`to-prd`	§6.2
3	PRD → Backlog	Split the PRD into vertical-slice tickets	PRD → tracer-bullet issues	`to-issues`	§6.3
4	Issue → Slice	Implement one slice, test-first	issue → reviewable diff	`tdd`	§6.4
5	Slices → Drained backlog	AFK loop drains the queue inside sandboxes	issues → PRs	(orchestrator)	§6.5
6	Diff → Decision	Human reads the diff, runs QA	PR → merge or new issue	(taste, not automated)	§6.6
7	Codebase health, ongoing	Find shallow modules; propose deepenings	codebase → RFC	`improve-codebase-architecture`	§7.4

Stages 1–3 are the day shift: human in the loop. Stages 4–5 are the night shift: the agent runs AFK in a sandbox. Stage 6 is back to the day shift. Stage 7 runs on a weekly cron and feeds new issues into stage 3. The whole pipeline runs identically in Claude Code and OpenCode.

New to programming? Read this first.

This chapter assumes you have written code, used git, run a test suite, and opened a pull request before. If those are familiar, skip this box and continue.

If they're not yet familiar, the chapter is still readable as a conceptual map. You will get: the shape of the workflow, the vocabulary you need to understand AI-coding conversations, a diagnostic catalogue of common failures, and the architectural philosophy that makes agents work well in real codebases. You will not be able to run the example code yet; that takes a few weeks of programming foundations first. The honest path is: read this chapter once for the map, learn the prerequisites, then come back and follow the code.

The bare-minimum vocabulary you need to follow the conceptual sections:

Repo (short for repository): a project's folder of code, tracked by git.

Branch: a parallel version of the repo where you can experiment without affecting the main code. A worktree is a related concept: a copy of the repo on disk, attached to a branch.

Commit: a saved snapshot of changes, with a short message describing them.

Pull request (PR): a proposed change submitted for review before being merged into the main branch. This is the thing humans review in the chapter's stage 6.

Test / test suite: code that checks that other code is correct, run automatically. "Tests pass" means the checks all came out green.

Sandbox (or container): an isolated environment, like a sealed mini-computer, where the agent can run, write files, and break things without touching the rest of your system.

Token: the unit of text a language model processes. Roughly 3/4 of a word on average. A 100k-token context window holds about 75,000 words.

Terminal / shell / bash: a text-based way of running commands on a computer. Lines starting with $ in this chapter are commands you type into the terminal.

1. From Vibe Coding to Agentic Engineering

Two things have changed in close succession. The first made the second necessary.

1.1 Software 3.0: A New Computing Paradigm

Andrej Karpathy describes software in three eras. Software 1.0 is what most engineers spent their careers writing: explicit code, executed by a CPU, working on structured data. Software 2.0 is the era of learned weights: programming by curating datasets and training neural networks rather than writing branching logic. Software 3.0 is the era we live in now: programming by prompting, where the LLM is a kind of programmable computer, and what you put in the context window is the lever you pull on it.

What changes between eras is the artifact you produce. In 1.0 the artifact was executable code. In 3.0 the artifact is increasingly a piece of text intended for an agent. When OpenCode ships its installer, it doesn't ship a bash script; it ships a paragraph of natural language meant to be pasted into a coding agent. The agent reads the environment, debugs in the loop, and gets to a working install. The installer is no longer a program; it is a skill.

This generalises. Documentation written for humans ("go to this URL, click Settings…") becomes documentation written for agents ("give this to your coding agent and it will configure your project"). UIs are no longer the only interface; the agent becomes a second class of user of every system you build, and of every system you depend on. Agent-native infrastructure (APIs, docs, tooling, and deployment pipelines designed for agents first) is the next platform layer.

This chapter is about operating in Software 3.0. Skills (§5) are 3.0 artifacts. PRDs and tickets (§6) are 3.0 artifacts. AGENTS.md and CONTEXT.md files (§3, Failure 2) are 3.0 artifacts. The code itself is increasingly downstream of all of them.

1.2 Vibe Coding Raises the Floor; Agentic Engineering Preserves the Ceiling

Karpathy also coined vibe coding: letting the agent write the code, accepting its output without reading the diff, judging it by whether the program runs. Vibe coding is real, useful, and here to stay. It is how a non-programmer ships a useful tool over a weekend; it is how Karpathy describes building MenuGen, his side project that converts restaurant menu photos into menus with rendered dish images. Vibe coding raises the floor of what an individual can produce in software. The economic consequences of that floor-raise are large, and mostly good.

A second discipline is now emerging on top of it: agentic engineering. Where vibe coding raises the floor, agentic engineering preserves the ceiling: the quality bar of professional software. The agent does most of the typing; you remain responsible for security, data integrity, maintainability, contracts, and user experience. Vibe coding does not introduce vulnerabilities; the engineer using it carelessly does. The bar does not move just because the typist changed.

	Vibe coding	Agentic engineering
Goal	Raise the floor of what's buildable	Preserve the ceiling of what's professional
Reviewer	Often none; judge by whether it runs	Human reads the diff; automated review on top
Architecture	Whatever the agent emits	Designed by the engineer; implemented by the agent
Tests	Optional	Non-negotiable; TDD on the critical path
Codebase health	Drift accepted	Refactor on a schedule; deepen modules
Failure handling	"It works for me"	Reproducible; tested; explained
Right setting	Side projects, prototypes, throwaway tools	Production systems, regulated work, anything multi-user

The principles and workflows in this chapter are the discipline of agentic engineering, not the freedom of vibe coding. When you build a Digital FTE that an organisation will trust with payroll, customer escalations, or financial reconciliation, vibe coding is malpractice. You need the floor and the ceiling: raised throughput and preserved quality.

The gap between a mediocre agentic engineer and a strong one is much wider than the old "10× engineer" gap. Karpathy: "10× is not the speed-up you gain. People who are very good at this peak a lot more than 10× from my perspective right now." Closing that gap is the work of this chapter.

2. Three Constraints Every Coding Agent Inherits

A coding agent is not a magical engineer; it is a model wrapped in a harness. Three properties of that pairing shape every workflow we build on top of it: a finite attention budget, no persistent state, and a jagged capability profile.

2.1 The Smart Zone and the Dumb Zone

When a model predicts the next token (a chunk of text, roughly three-quarters of an English word), it weighs every other token already in the context window. Each token has a finite attention budget: a fixed share of influence to spend on the rest. A window of N tokens has on the order of N² attention relationships competing for that fixed budget.
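The quadratic growth is easy to underestimate, so here is a minimal arithmetic sketch. The function names are illustrative, and the uniform split is a deliberate simplification: real attention weights are learned and far from uniform, so read this as intuition about dilution, not as a model of any particular system.

```python
def attention_pairs(n_tokens: int) -> int:
    """Ordered token pairs in a window of n_tokens: the N^2 term."""
    return n_tokens * n_tokens


def uniform_share(n_tokens: int) -> float:
    """Share each token would receive if one token spread its fixed
    budget evenly over the window.

    Simplifying assumption: real attention is learned, not uniform.
    """
    return 1.0 / n_tokens


# Each 10x growth in the window is a 100x growth in competing pairs.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens: {attention_pairs(n):.0e} pairs, "
          f"uniform share {uniform_share(n):.0e}")
```

Going from a 10k-token session to a 100k-token one multiplies the pairwise relationships by a hundred while each token's budget stays fixed, which is the dilution the section describes.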

The consequence is non-negotiable. Early in a session the agent is in its smart zone: sharp, focused, with good recall. As the session grows, each token's signal is diluted by competitors. The agent drifts into the dumb zone: it forgets the schema you pasted at the top, invents fields that aren't in the type file, mis-binds two variables with the same name, contradicts its own earlier reasoning. Same model, same parameters; just more mouths feeding from the same plate.

The practical ceiling, across current frontier models, regardless of whether marketing claims a 200k or 1M context window, sits well below the advertised window for coding work. Practitioner reports converge on something like 100k tokens as a rough waterline before drift starts to show, but the exact number is less important than the shape: beyond some fraction of the advertised window you have not been given more capability; you have been given more dumb zone to spend money in. Large windows help with retrieval over long documents; they do not extend the reasoning horizon for code by the same factor.

Token usage:    0k ────────── 50k ────── 100k ────── 200k ────── 1M
Quality:        ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░
                ↑                  ↑
                smart zone         dumb zone begins

Concretely, what does the transition look like in a real session? Roughly this:

turn  5  → you paste users.ts schema (8 fields: id, email, name, ...)
turn  9  → agent uses User.email correctly
turn 23  → agent builds a route, refers to User.id, all good
turn 47  → context is now ~80k tokens
turn 52  → agent writes  user.emailAddress  ← field doesn't exist
turn 55  → agent invents user.preferences   ← also not in the schema
           ⇒ smart zone exited.
           ⇒ /clear, re-paste schema in a fresh session, continue.

Same model, same prompt at turn 52 as at turn 9. The only thing that changed was the attention budget. The cure is not to push through. Size each unit of work to fit inside the smart zone, and when a unit is complete, throw the session away and start a new one.
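One way to make "size each unit of work to fit inside the smart zone" concrete is a rough budget check before pasting more material into a session. This is a sketch under stated assumptions: the 0.75 words-per-token ratio and the 100k waterline are the chapter's rules of thumb, not measured limits, and the helper names are hypothetical. A real harness would use the model's own tokenizer.

```python
WORDS_PER_TOKEN = 0.75        # rule of thumb from the text, not a tokenizer
SMART_ZONE_TOKENS = 100_000   # rough waterline from the text, not a hard limit


def estimate_tokens(text: str) -> int:
    """Estimate token count from whitespace-separated words."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_smart_zone(used_tokens: int, next_chunk: str,
                    limit: int = SMART_ZONE_TOKENS) -> bool:
    """True if adding next_chunk keeps the session at or under the waterline."""
    return used_tokens + estimate_tokens(next_chunk) <= limit


schema = "id email name created_at " * 50   # stand-in for a pasted schema
if fits_smart_zone(80_000, schema):
    print("ok to continue in this session")
else:
    print("write a handoff artifact, /clear, and start fresh")
```

The check is deliberately crude. Its value is in forcing the decision to clear before drift shows up in the output, not after.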

2.2 The Memento Problem

Models are stateless. They carry nothing across model provider requests. Continuity within a session is the harness re-feeding the context on every turn; continuity across sessions is something a memory system wrote to disk and reloads at the next session start.

This is a feature. The most reliable thing about an agent is that clearing the context returns it to a known-good state. The agent that just spent forty turns drifting into the dumb zone is the same agent that, five seconds after a /clear, will read your fresh prompt with a fresh attention budget and produce excellent work.

There are two ways to recover when a session bloats:

Clearing: end the session, start a fresh one. Total reset.
Compaction: summarise the previous session and seed a new one with the summary. Lossy.

Most developers reach for compaction first because it feels less destructive. Treat that instinct with suspicion: compaction preserves some of the dumb-zone reasoning that got you into trouble. Clearing, paired with a small written handoff artifact (a PRD, a ticket, an AGENTS.md), gives the next session the same starting state every time. Predictable starts produce predictable finishes.

Working principle. Treat the agent like the protagonist of Memento. Plan around its forgetting. Make every important fact survive in the environment (AGENTS.md, a CONTEXT.md, a skill, a ticket), not in the chat history.

2.3 Jagged Intelligence

The first two constraints are about how much the agent can attend to. The third is about what it is good at, and it is the one that catches engineers most off guard.

LLMs are jagged. They are not uniformly smart; they peak sharply in some domains and stagnate in others, with little correlation to how hard the task seems to a human. A state-of-the-art model can refactor a hundred-thousand-line codebase or find a zero-day vulnerability, and in the same session tell you to walk to a car wash fifty metres away rather than drive. The two abilities are connected only by which RL environments the labs happened to train on.

Frontier models are trained heavily with reinforcement learning on tasks where the outcome is verifiable: math problems with checkable answers, code that compiles and passes tests, formal proofs. The model learns brilliantly inside those circuits because the reward signal is clean. Outside them, it operates on pre-training intuition with no comparable feedback to sharpen it. The capability profile looks like a mountain range with deep valleys: peaks at competitive coding and code refactoring, a valley at common-sense planning over physical-world distances.

capability
   │
   │      ╱╲           ╱╲
   │     ╱  ╲    ╱╲   ╱  ╲
   │    ╱    ╲  ╱  ╲ ╱    ╲     ╱╲
   │   ╱      ╲╱    ╲╱      ╲   ╱  ╲
   │  ╱                      ╲ ╱    ╲___
   └────────────────────────────────────────► task
       code   refactor  math       car-wash    common-sense
                                   walking     physical reasoning

The jagged-intelligence constraint has four operational implications.

First, code is the lucky domain. You are working on one of the highest peaks on the entire surface, not because coding is intrinsically easier, but because the labs prioritised it economically and trained on it heavily. Treat this as good fortune, not as evidence that the model "is intelligent." Outside this peak, the same model can be confidently wrong about things a child would get right.

Second, your feedback loops are how you stay in the verifiable circuits. Static types, automated tests, lints, and compile errors are the same reward signal the model was trained against. When the agent runs your tests and sees them fail, it is operating in the feedback shape that produced its strongest behaviours during training. Without those signals, it is back on pre-training intuition with no correction. This is the deeper why behind Failure 3 and the tdd skill: tests do not merely catch bugs; they keep the agent on the peak.

Third, you have to know which circuit you are in. When the agent does something a junior engineer wouldn't, it is often because you have wandered off the peak, into a region the labs did not train for. "Why would you cross-reference users by email instead of by an explicit user_id?" Karpathy asks, after watching his agent do exactly that on his MenuGen project. The agent was outside its strongest circuits, in identity modelling across third-party services. The fix was not a better prompt; it was Karpathy stepping in with explicit architectural guidance.

Fourth, when starting fresh, choose your stack to land inside a peak. The jagged map is not symmetric across languages or frameworks. Boris Cherny is matter-of-fact about why Claude Code is built in TypeScript and React: "It's very on distribution for the model." When other constraints permit, prefer mainstream choices: Python and TypeScript over niche languages, Postgres over exotic stores, popular frameworks over hand-rolled ones. You are not picking the technology you would write in alone; you are picking what your agent workforce writes well in. The long tail will catch up; until then, on-distribution choices buy years of effective leverage.

Animals vs. ghosts. Karpathy describes LLMs as ghosts, not animals: statistical simulations shaped by data and reward, not biological intelligences shaped by evolution. The consequence: yelling at an agent does not improve it; sympathy does not improve it; "think step by step" does not wake dormant cognition. What works is putting the agent on a peak (clear context, verifiable feedback, well-named code, a precise spec) and letting the trained behaviour fire. Treat agent psychology as physics, not personality.

3. Six Failure Modes of AI Coding

The three constraints produce predictable failures. Six in particular show up often enough to treat as a closed catalogue. The table below is diagnostic; the paragraphs that follow expand each row into symptom, root cause, and the cure that the rest of the chapter encodes as a skill.

#	Symptom	Root cause	Cure	Skill	Where
1	"The agent didn't do what I wanted."	No shared design concept between you and the agent	Force alignment before any asset is written, via Socratic interview	`grill-me`	§5, §6.1
2	"The agent is way too verbose."	No ubiquitous language; you and the agent name the same things differently	Maintain a `CONTEXT.md` of domain terms loaded every session	`grill-with-docs`	§5, §6.1
3	"The code doesn't work."	Weak feedback loops; the agent is coding blind	Loud environment (types, tests, lints) + TDD red-green-refactor	`tdd`	§5, §6.4
4	"We built a ball of mud."	Shallow modules; agents produce them faster than humans clean them up	Invest in module design daily; periodic deepening pass	`improve-codebase-architecture`	§7
5	"My brain can't keep up."	You are reading every line at 5× the normal pace	Gray-box principle: design interfaces, delegate implementations	(architectural habit)	§7.3
6	"I'm reviewing more code than I'm building."	Throughput moved the bottleneck to review	Split review into automated + human layers; vertical slices keep diffs small	`automated-review` (recipe in §6.5; not in the upstream pack)	§6.5, §7

Failure 1: "The agent didn't do what I wanted."

The most common failure is misalignment. You had a clear picture of the feature; the agent built something subtly different; you disagree about what "done" even means. This is a communication problem, not a model problem. Frederick P. Brooks named the missing thing in The Design of Design: the design concept, the shared, ephemeral idea of what is being built. PRDs, specs, and conversations are assets that try to capture the design concept; none of them is the concept itself.

Cure: force the design concept to stabilise before any code or formal asset is written. The technique is grilling: the agent interviews you Socratically, one decision at a time, walking down each branch of the design tree, proposing its own recommendation for each question, until both sides are aligned. Section 5 shows the skill.

Failure 2: "The agent is way too verbose."

A fresh agent dropped into your project does not know your jargon. Your codebase calls them lessons and the agent calls them course units. Your team says materialisation cascade and the agent writes a paragraph describing the same idea. The two of you are talking past each other and burning tokens doing it.

This is the same problem domain-driven design solved twenty-plus years ago: ubiquitous language. A project needs a single shared vocabulary that code, tests, conversation, and documentation all draw from. With agents it has a second benefit: tighter vocabulary means fewer thinking tokens spent unfolding ambiguity and more attention on the work.

Cure: maintain a CONTEXT.md at the repo root with the project's domain terms, loaded into every session. Section 5 shows how grilling and CONTEXT.md pair in the same skill.

Failure 3: "The code doesn't work."

You aligned with the agent. You wrote a clean spec. The agent produced code, and the code is broken, sometimes obviously, sometimes silently. The diagnosis is almost always weak feedback loops. The agent is coding blind.

The Pragmatic Programmer warns against outrunning your headlights: taking on work bigger than the rate of feedback can illuminate. Agents do this constantly, and worse than humans, because they will happily write a thousand lines before checking whether any of them compile. A coding agent's effective IQ is bounded by the quality of the feedback its environment provides.

Cure: make the environment loud, with static types, type-checked imports, automated tests, fast lints, a pre-commit hook, and browser access when the work is visual. Then enforce test-driven development so the agent takes small, deliberate steps: failing test, make it pass, refactor, repeat. The tdd skill in §5 encodes this.
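The red-green-refactor step can be sketched in a few lines. This is an illustration only, not the tdd skill itself: the function `slugify` and its test are hypothetical, chosen because the behaviour is small enough to pin down completely.

```python
import re


# Step 1 (red): the test is written first. Run before step 2 exists, it
# fails with NameError, which is the loud signal the agent needs.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  many   spaces  ") == "many-spaces"
    assert slugify("") == ""


# Step 2 (green): the smallest implementation that makes the test pass.
def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Step 3 (refactor): internals may now change freely; the test pins behaviour.
test_slugify()
print("tests pass")
```

The point of the ordering is that the agent never writes more code than one failing test can illuminate, which is the "headlights" constraint from the paragraph above.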

Failure 4: "We built a ball of mud."

Agents accelerate everything, including the rate at which a codebase becomes unmaintainable. Without intervention they produce shallow modules (many tiny files exposing many small functions, with implicit dependencies threading between them) because shallow modules are easier to generate one at a time. An agent that cannot navigate its own codebase produces worse code with every pass. The codebase becomes a poison loop.

John Ousterhout, in A Philosophy of Software Design, gives the alternative: deep modules. A few large modules with simple interfaces and a lot of functionality hidden behind them. Deep modules are easier for agents to test (the test boundary is the interface), easier to reason about (callers don't need to know the implementation), and easier to delegate (you design the interface; the agent writes the implementation).

Cure: invest in module design every day (Kent Beck), and run improve-codebase-architecture periodically to find shallow modules and propose deepenings. Section 7 covers the principle in depth.
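The shallow/deep contrast is easier to see side by side. The following is a toy sketch with hypothetical function names, not code from the chapter's worked example: the same price-parsing behaviour, first as three shallow helpers the caller must compose in the right order, then as one deep function with a single entry point.

```python
# Shallow: three tiny functions. Every caller must know all three exist
# and must apply them in this exact order.
def strip_currency(s: str) -> str:
    return s.replace("$", "")


def strip_commas(s: str) -> str:
    return s.replace(",", "")


def to_cents(s: str) -> int:
    return round(float(s) * 100)


# Deep: one small interface; the steps and their order are hidden inside.
def parse_price_cents(text: str) -> int:
    """Parse a price such as '$1,234.50' into integer cents."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(dollars or "0") * 100 + int((cents + "00")[:2])


shallow = to_cents(strip_commas(strip_currency("$1,234.50")))
deep = parse_price_cents("$1,234.50")
print(shallow, deep)
```

With the deep version, the test boundary is one function and the implementation can be delegated or rewritten without touching any caller.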

Failure 5: "My brain can't keep up."

A surprising failure mode, and a serious one. Senior engineers working with agents for the first time often report being more tired, not less, despite shipping more code. With the agent producing code at three to five times the normal pace, the engineer has to hold the whole system in their head at the new pace. Without architectural discipline, cognitive load multiplies instead of dividing.

Cure: the gray-box principle. Design module interfaces with full attention; delegate the implementation to the agent; verify the module from outside via its tests, not by reading every line inside. You hold the architectural map; the agent fills in the bricks. Section 7.3 expands this.

Failure 6: "I'm reviewing more code than I'm building."

The flip side of throughput. Once the agent ships fast, the bottleneck moves to code review, and review work expands to fill it. The cure is to split review into two layers: a high-throughput automated layer that catches the bulk of routine issues, and a low-throughput human layer that focuses on what the automated layer cannot.

Cure: an automated-review skill that runs in a fresh session, with only the diff, the project's coding standards, and a security checklist as input, and produces a structured comment on the PR before the human opens it. Run it pre-merge as a CI step; it catches contract regressions, missing tests, common security antipatterns, and mismatches against project conventions. The human reviewer arrives at a pre-triaged PR, with attention freed for taste, product fit, and the ambiguous calls the automated layer flagged. Vertical slices (§6) keep each diff small; persistent review loops (§6.5.3) let the automated reviewer run on a schedule rather than only at merge time. None of this eliminates human review; it relocates the human's attention to where judgement is non-substitutable.

These are the six failures the rest of the chapter eliminates, in order.

4. The End-to-End Workflow

Everything that follows hangs off this skeleton: the shape of the whole pipeline, fixed in mind before descending into skills and code.

4.1 The Day Shift / Night Shift Model

Two kinds of work. Human-in-the-loop work requires a person at the keyboard answering questions and making judgement calls: alignment, design, taste, QA. AFK ("away from keyboard") work runs unattended in a sandbox and shows you the diff in the morning: implementation, refactors, test fills.

The pipeline alternates:

flowchart TD
    subgraph DAY1["DAY SHIFT - human-in-the-loop"]
        A[Idea] --> B[Grill]
        B --> C[PRD]
        C --> D[Issues - vertical slices]
    end

    D --> BACKLOG[(backlog of issues)]

    subgraph NIGHT["NIGHT SHIFT - AFK, sandboxed"]
        E[Implementation Loop<br/>TDD per slice] --> F[Automated Review<br/>separate session]
    end

    BACKLOG --> E
    F --> PRS[(review-ready PRs)]

    subgraph DAY2["DAY SHIFT - back to human"]
        G[Human Review<br/>read the diff] --> H[QA] --> I[Merge]
    end

    PRS --> G
    H -. new issues from QA .-> BACKLOG

    classDef human fill:#e8f1ff,stroke:#3b6ea8,color:#0d2a4d
    classDef afk fill:#fff5e6,stroke:#a36a1a,color:#3d2700
    class DAY1,DAY2 human
    class NIGHT afk

Every transition is a handoff. Every handoff is mediated by a small, durable artifact (a CONTEXT.md, a PRD, a ticket, a diff), not by a long-running session. Long-running sessions die in the dumb zone; durable artifacts survive forever. This is the architectural insight that makes the rest work.

4.2 The Limits of "Specs-to-Code"

Specs are useful. The PRDs in §6.2 are specs. The issues in §6.3 are mini-specs. CONTEXT.md is a spec. The argument here is narrower than a blanket rejection: it is against treating specs as the whole workflow, where you write a specification, compile it through an agent, ignore the resulting code, and, if anything goes wrong, edit the spec and recompile. As one stage of the pipeline, specs are essential. As a closed loop that replaces the rest of the pipeline, they break down, for two reasons.

Code is the battleground. Hidden inside the code are constraints the spec did not anticipate: the existing module the feature must integrate with, the data shape the database actually returns, the bug that only emerges when the cache is cold. A spec that does not respond to these drifts further from reality with every recompilation, and every round produces worse code than the last because the agent inherits a longer history of unrooted suggestions.

Specs decay. A gamification-prd.md written in March is, by July, a document about a system that no longer exists: names have changed, boundaries have moved, requirements have evolved. An agent loading that spec to "extend" the system inherits a faithfulness problem before it writes a line.

The right model is the one in §4.1: specs are handoff artifacts at one stage of the pipeline, not the source of truth for the system. They guide one or two sessions of implementation, then retire. The code, the tests, and CONTEXT.md are what persist.

Karpathy makes the same observation about plan mode: it rushes to produce an asset before the reasoning is settled, when the right move is to "work with your agent to design a spec that is very detailed" before any code is written. The grilling-then-PRD-then-issues pipeline is what that looks like: plan mode rushes to an asset; the pipeline reaches a design concept first and lets the asset fall out of it.

4.3 Vertical Slices and Tracer Bullets

The most important shape decision in §4.1 is how to split a PRD into issues. The temptation is to slice horizontally: one issue for the database, one for the API, one for the UI. This is wrong. With horizontal slicing the agent gets no end-to-end feedback until the third issue lands; bugs accumulate at the seams; and any one issue can stall the others.

The right shape is the vertical slice, a tracer bullet, after The Pragmatic Programmer's analogy of glowing rounds that let an anti-aircraft gunner see where the fire is going. Each issue cuts thinly through every layer the feature touches. Shoot a tracer to see whether the aim is right, then fire fully, knowing you'll hit.

flowchart LR
    subgraph H["Horizontal slicing - bad<br/>(no integrated feedback until phase 3)"]
        direction TB
        H1[Frontend - phase 3]
        H2[API - phase 2]
        H3[Database - phase 1]
        H1 -.- H2 -.- H3
    end

    subgraph V["Vertical slicing - good (tracer bullets)"]
        direction TB
        V1[Slice 1<br/>F→A→D] ~~~ V2[Slice 2<br/>F→A→D] ~~~ V3[Slice 3<br/>F→A→D] ~~~ V4[Slice 4<br/>F→A→D]
    end

    classDef bad fill:#fde8e8,stroke:#a83838,color:#5a0d0d
    classDef good fill:#e8f5e8,stroke:#3b8a3b,color:#0d3a0d
    class H bad
    class V good

Section 6.3 walks through what vertical slicing looks like on the worked example, including how the dependency graph between slices admits parallel execution. For now, the concept is enough: each issue ships one end-to-end path; sequencing falls out of dependencies, not phases.
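To see how sequencing can fall out of dependencies rather than phases, here is a minimal sketch using Python's standard-library `graphlib`. The slice names and their dependencies are invented for illustration; they are not the chapter's worked example. Slices in the same batch have no dependency on each other and could run in parallel sandboxes.

```python
from graphlib import TopologicalSorter

# Hypothetical backlog: each slice maps to the slices it depends on.
slices = {
    "slice-1-signup": set(),
    "slice-2-login": {"slice-1-signup"},
    "slice-3-profile": {"slice-1-signup"},
    "slice-4-settings": {"slice-2-login", "slice-3-profile"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group slices into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises CycleError on circular deps
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)               # mark finished, unblocking successors
    return batches


for i, batch in enumerate(parallel_batches(slices), start=1):
    print(f"batch {i}: {batch}")
```

Here the login and profile slices land in the same batch because both depend only on signup, so an orchestrator could hand them to two agents at once.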

5. Skills as Encoded Process

Every cure needs encoding as a reusable, agent-loadable artifact. That artifact is a skill.

Principle vs. instance. Five principles run this pipeline: grilling, PRD-synthesis, vertical-slicing, TDD, deepening. Each has a current best-in-class implementation in someone's skill pack. Implementations evolve; principles do not. The live registry of community skills is skills.sh; Matt Pocock's pack lives at skills.sh/mattpocock and supplies the worked examples below. When a better grill-me ships next quarter, swap the instance; the grilling principle in your pipeline does not move. The architectural invariant is the same one §7.3 teaches at the code level: the interface is stable; the implementation is mutable.

5.1 What a Skill Is, and What It Isn't

Skill (n.): a teachable capability bundled as a unit (instructions and resources for doing a task well), kept in the environment and loaded into the context window only when relevant. The unit of progressive disclosure in a harness.

A skill is what the agent reads; a tool is what the agent calls. A skill might say "when the user asks for a deploy, run bash deploy.sh and verify with the gh tool": the skill is prose; bash and gh are tools.

A skill is also on-demand. AGENTS.md is loaded every turn and pays a token cost on every model provider request; a skill is loaded only when the agent decides it is needed. Anything that does not need to be in context every turn belongs in a skill, not in AGENTS.md. This is progressive disclosure in action.

And a skill is portable. The same SKILL.md runs in Claude Code and OpenCode unchanged. The discipline travels with the file; the harness is interchangeable.

5.2 Where Skills Live

Both harnesses scan well-known directories at session start, read the YAML frontmatter of each SKILL.md, and surface the names and descriptions to the agent. The body is loaded only when the agent decides the skill is relevant.

The skills CLI installs a community pack into .agents/skills/, the cross-tool standard location. A directory of installed skills looks like this:

project/
└── .agents/
    └── skills/
        └── grill-me/
            └── SKILL.md

The same SKILL.md format works in both harnesses unchanged. What differs is which directories each harness scans, and that changes one step of the install.

Claude Code 2.1.141 scans .claude/skills/<name>/SKILL.md (and globally ~/.claude/skills/). It does not scan .agents/skills/. The skills CLI installs into .agents/skills/, and it links the install into .claude/skills/ only when that directory already exists. So create it first, then install:

mkdir -p .claude/skills
npx skills@latest add mattpocock/skills

With .claude/skills/ present before the install, every skill is linked into it and Claude Code discovers the pack. (If you installed first and Claude Code cannot find /grill-me, the cause is the missing directory: create .claude/skills/, then re-run the install.)
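If you want to confirm the linking worked, a small check along these lines can help. This is a sketch assuming the two directory layouts described above; `missing_claude_links` is a hypothetical helper, not part of the skills CLI.

```python
from pathlib import Path


def missing_claude_links(project: Path) -> list[str]:
    """Names of skills installed under .agents/skills/ that have no
    matching .claude/skills/<name>/SKILL.md for Claude Code to find."""
    installed = project / ".agents" / "skills"
    visible = project / ".claude" / "skills"
    if not installed.is_dir():
        return []
    return sorted(
        skill.name
        for skill in installed.iterdir()
        if (skill / "SKILL.md").is_file()
        # is_file() follows symlinks, so a linked install counts as present
        and not (visible / skill.name / "SKILL.md").is_file()
    )


if __name__ == "__main__":
    missing = missing_claude_links(Path.cwd())
    print("all skills visible" if not missing else f"not linked: {missing}")
```

An empty list means every installed skill is visible to Claude Code; a non-empty list is the cue to create .claude/skills/ and re-run the install.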

Invoke a skill by asking in plain language ("grill me on this plan"), and the agent loads it on a frontmatter-description match. Claude Code additionally accepts an explicit slash invocation: type /grill-me to load that skill by name.

One format, both harnesses, no translation step. The install path is the only thing that differs, and it differs by one mkdir.

5.3 Anatomy of a SKILL.md

A SKILL.md has two parts: YAML frontmatter (the metadata the harness scans) and a markdown body (the instructions the agent reads on load).

The most-starred skill in Matt Pocock's pack, grill-me, is here in full: seven lines of body.

---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.

That is the entire skill, and grill-me is the most-used skill in a pack that has drawn tens of thousands of GitHub stars. Three observations generalise:

Skills do not have to be long to be impactful. This one is essentially three sentences and it transforms the planning conversation. Add length only when length earns its place.
The frontmatter is doing real work. The harness shows the agent the description, not the body, so the description must be specific enough that the agent will load it at the right moments. "Use when user wants to stress-test a plan, get grilled on their design, or mentions 'grill me'" is much better than "for grilling."
The body addresses the agent in the second person, in the same tone you would use with a junior collaborator. "Interview me relentlessly." "Ask the questions one at a time." Direct, declarative, no hedging.

A more elaborate skill (to-prd, to-issues, tdd, improve-codebase-architecture) extends the same shape with numbered steps, a template, and pointers to other skills. The principle holds: encode the process; do not encode the answer.

5.4 Five Daily Principles (and Today's Best Skills for Each)

The five principles correspond one-to-one with the stages of the pipeline in §4.1. Each principle has a current best-in-class implementation: a SKILL.md installable today. The table below references the most-used pack (Matt Pocock's, at skills.sh/mattpocock). Each skill name links to its canonical SKILL.md; the bodies are short and worth reading.

Stage	Skill	What it does
Idea → Aligned design concept	`grill-me`	Socratic interview until alignment is reached.
Aligned concept → Destination doc	`to-prd`	Synthesises the conversation into a PRD with user stories, implementation decisions, and a list of modules to be modified.
PRD → Backlog of issues	`to-issues`	Breaks the PRD into vertical-slice tickets with explicit blocking relationships.
Issue → Implemented slice	`tdd`	Red–green–refactor on one slice at a time.
Codebase health, ongoing	`improve-codebase-architecture`	Finds shallow modules; proposes deepenings; opens an RFC issue.

Before any of these run. Matt's pack expects a one-time per-repo bootstrap step, setup-matt-pocock-skills, which scaffolds the repo's issue-tracker config and an ## Agent skills block in your AGENTS.md / CLAUDE.md, and sets up a docs/agents/ directory. The engineering skills read from this scaffolding (and to-prd / to-issues also draw on docs/adr/ if it exists), so run the setup once after installing the pack and before the first to-issues or tdd invocation.

Each skill's frontmatter description: field is the line the harness scans at session start to decide what to surface to the agent. That description determines whether the agent loads the skill at the right moment, so it carries real weight. grill-me's full SKILL.md appears verbatim in §5.3; for the others, here is what each one does (paraphrased from the installed skills, not quoted verbatim):

to-prd turns the current conversation into a PRD and publishes it to the project's issue tracker. It does not re-interview you; it synthesises what is already in context.
to-issues breaks a plan, spec, or PRD into independently grabbable issues on the project's issue tracker, sliced vertically with explicit blocking relationships, and labels each one ready for an agent to pick up.
tdd runs a strict red-green-refactor loop for building a feature or fixing a bug: one failing test first, just enough code to pass, refactor, repeat, with tests at module interfaces rather than internal helpers.
improve-codebase-architecture finds deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/, and proposes them without modifying code.

A reader who wants the exact frontmatter should cat the installed SKILL.md files or open the linked sources; the wording above is a faithful summary, not a quote. Note one behaviour the summaries make explicit and a reader will see directly: to-prd and to-issues both write to your issue tracker, not just to a local file.

Three properties generalise across all five:

The description is doing the loading work. It must be specific enough that the agent recognises when to load the skill, not just what the skill is about. "Use when…" clauses and explicit negative scope are where this specificity lives.
Skills name their boundaries. to-prd does not interview again; improve-codebase-architecture does not modify the codebase. These negative clauses are how skills compose without stepping on each other.
Skills name their pairings. tdd is implicitly paired with the issue it implements; to-issues is paired with the PRD it splits. The pipeline is a chain of skills, each handing off to the next.

Skill loading depends on your model's instruction-following

This pipeline's architecture (skills, vertical slices, deep modules, sandboxes) is model-independent. Its operational reliability is not. A frontier-class instruction-follower (Claude Sonnet/Opus, GPT-5-class, Gemini 2.5 Pro) loads the right skill from a description match, executes a multi-step skill body in order, and self-terminates a grilling interview when alignment is reached. On an economy or local model (deepseek-chat, Haiku-class, Llama-70B, most local models), those behaviours degrade: skills miss their trigger, multi-step sequencing slips, and literal-output contracts (the NO_MORE_TASKS signal in §6.5) get broken. The cure recalled from §2.3 applies here too: on a weaker model, scaffold harder. Invoke skills explicitly by name rather than relying on description-matching, keep skill bodies short and declarative, and state what the model must not do, not only what it should.

A sixth skill in Matt Pocock's pack closes the loop on Failure 2 (verbose agent / no shared vocabulary): grill-with-docs. It is the same Socratic interview as grill-me, but it also updates CONTEXT.md and the docs/adr/ Architecture Decision Records inline as decisions crystallise during the conversation. In Matt's Software Fundamentals Matter More Than Ever talk it began life as a standalone "ubiquitous language skill" that scanned the codebase and wrote a domain glossary; it has since been folded into the grilling skill itself, on the principle that terminology is best resolved at the moment a decision is being made, not as a separate post-hoc pass. Use grill-me for greenfield design conversations where there is no project context yet; use grill-with-docs once the repo has a CONTEXT.md and ADRs you want kept current.

Build your own skills first; reach for someone else's pack second. The best skill is the one that captures your team's process. Forking mattpocock/skills is an excellent place to start. Owning the stack, being able to read every skill you load, is what gives you observability when something goes wrong.

6. The Pipeline in Practice

This section runs the workflow end to end on a worked example: adding a gamification service to a course platform. The same example is built once in Python and once in TypeScript; every step runs in both Claude Code and OpenCode. The point is not the example; it is the shape. Internalise the shape once and the language and harness become details.

6.1 Stage 1: Grilling the Idea

A Slack message from a stakeholder:

"Our retention numbers aren't great. Students sign up, do a few lessons, then drop off. I'd love to add some gamification to the platform: points, streaks, levels."

This is not a spec; it is a wish. Hand it to an agent and say "build this" and you will get something that doesn't match what the stakeholder had in mind, because neither of you yet knows what they had in mind. Invoke grill-me.

$ claude
> /grill-me
> Read the Slack message in client-brief.md and grill me on adding gamification.

The agent reads the brief, explores the codebase to understand the existing models (Lesson, Quiz, Enrollment), and starts asking questions, one at a time, with a recommended answer for each:

Q1. What earns points? My recommendation: lesson completion + quiz pass.
    Skip video-watch events (too noisy, gameable).

Q2. Are points retroactive for existing lesson_progress records?
    My recommendation: yes, backfill once on migration, then live going forward.

Q3. Streak granularity? My recommendation: daily, with a 1-day grace period.

Q4. Level thresholds: linear, quadratic, or hand-tuned?
    My recommendation: quadratic, capped at level 50.
...

Sessions stop anywhere from fifteen to a hundred questions. The session ends when the agent reports that no decisions remain unresolved and you have a stable design concept. No code, no PRD. The output of grilling is a chat history; the chat history is the design concept made explicit.

6.2 Stage 2: From Conversation to PRD

Once the design concept stabilises, invoke to-prd. The skill does not interview you again; it synthesises what you have already said into a Product Requirements Document.

> /to-prd

The output is a markdown document following a fixed template:

# PRD: Course Platform Gamification

## Problem Statement

Students drop off after a handful of lessons. Retention metrics
indicate completion rates ... [synthesised from the brief]

## Solution

Add a points/streaks/levels gamification layer ...

## User Stories

1. As a student, I earn 10 points when I complete a lesson.
2. As a student, I earn 25 points when I pass a quiz.
3. As a student, I see my current streak on the dashboard.
4. As a student, I see my level on my profile.
5. As an admin, I can see aggregate engagement metrics.
   ... [12-20 more, each independently verifiable]

## Modules Touched

- NEW: gamification_service (deep module, owns points + streaks + levels)
- MODIFIED: lesson_progress_service (emits events on completion)
- MODIFIED: dashboard route (reads from gamification_service)
- NEW DB: point_events table, streak_state table

## Implementation Decisions

- Level formula: floor(sqrt(total_points / 50))
- Streak grace: 1 missed day allowed
- Backfill: one-time job at deploy

## Out of Scope

- Leaderboards (separate PRD)
- Push notifications (separate PRD)
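To make the level formula in the PRD's Implementation Decisions concrete, here is a small worked sketch. The cap of 50 is an assumption carried over from the grilling recommendation (Q4); the sample PRD itself does not state it. Integer square root is used so the result is exact, since floor(sqrt(x)) equals floor(sqrt(floor(x))) for non-negative x.

```python
import math

LEVEL_CAP = 50  # assumption: from the Q4 grilling recommendation, not the PRD text


def level_for(total_points: int) -> int:
    """floor(sqrt(total_points / 50)), capped at LEVEL_CAP."""
    if total_points < 0:
        raise ValueError("total_points must be non-negative")
    return min(LEVEL_CAP, math.isqrt(total_points // 50))


# At 10 points per lesson: level 1 after 5 lessons, level 2 after 20, level 3 after 45.
for points in (0, 49, 50, 200, 450):
    print(points, level_for(points))
```

The quadratic shape is visible in the thresholds: each level needs 50 × level² points, so later levels take progressively longer to reach.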

What to read in the PRD before approving it. Skim for drift, don't proofread. You and the agent already share the design concept from the grilling session, and the agent is excellent at summarisation; line-by-line reading is dumb-zone work. Focus your attention on the four places summarisation can drift: user stories (did any get dropped or invented?), modules touched (does the boundary still match what you discussed?), implementation decisions (do they match the calls you made during grilling?), and out of scope (did the boundary creep?). Two minutes of focused skimming catches almost all the failures; reading the whole document catches the same failures and costs ten times the attention.
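One way to keep that skim focused is to pull out only the four sections worth checking. The sketch below assumes the PRD uses the `## ` headings shown in the template above; `sections_to_review` is a hypothetical helper, not part of any skill.

```python
REVIEW_SECTIONS = (
    "User Stories",
    "Modules Touched",
    "Implementation Decisions",
    "Out of Scope",
)


def sections_to_review(prd_markdown: str) -> dict[str, str]:
    """Return the body of each `## ` section where summarisation can drift."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in prd_markdown.splitlines():
        if line.startswith("## "):
            title = line[3:].strip()
            # Track only the sections on the review list; skip the rest.
            current = title if title in REVIEW_SECTIONS else None
            if current is not None:
                sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}
```

A review section that is missing from the returned dictionary is itself a drift signal: the summary dropped something the template requires.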

6.3 Stage 3: From PRD to Vertical-Slice Issues

The PRD describes the destination. The next skill describes the journey: how to break the PRD into independently grabbable issues, sliced vertically, with blocking relationships between them.

Run to-issues. For the gamification PRD it produces a small Kanban board:

┌────────────────────────────────────────────────────────────┐
│ Issue #1 - Award points for lesson completion (E2E)        │
│   blocked by: nothing.       Type: AFK.                    │
│   Touches: schema, service, lesson route, dashboard widget │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│ Issue #2 - Award points for quiz pass (E2E)                │
│   blocked by: #1.            Type: AFK.                    │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│ Issue #3 - Streak counter (E2E)                            │
│   blocked by: #1.            Type: AFK.                    │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│ Issue #4 - Level threshold + UI badge                      │
│   blocked by: #2.            Type: AFK.                    │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│ Issue #5 - Retroactive backfill of historical lessons      │
│   blocked by: #1.            Type: human-in-the-loop.      │
└────────────────────────────────────────────────────────────┘

Several properties are non-accidental:

Issue #1 ships a working slice. If the team merged only #1 and stopped, the platform would have a functioning (if minimal) gamification feature. Under horizontal slicing, "phase 1" would have produced a database table that did nothing.
The DAG admits parallelism. #2 and #3 can run in parallel sessions on parallel branches once #1 is merged. Two AFK agents, two PRs by morning.
#5 is flagged human-in-the-loop, not AFK. Backfills touch historical data; a human watches each step. The Type field tells the AFK loop in §6.5 to skip it.
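The parallelism in that board can be read off mechanically. The sketch below encodes the five issues as data and groups the AFK ones into "waves" that could run concurrently; the `Issue` class and `afk_waves` function are illustrative names, not part of to-issues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    number: int
    kind: str  # "AFK" or "human-in-the-loop"
    blocked_by: frozenset[int] = frozenset()


BOARD = [
    Issue(1, "AFK"),
    Issue(2, "AFK", frozenset({1})),
    Issue(3, "AFK", frozenset({1})),
    Issue(4, "AFK", frozenset({2})),
    Issue(5, "human-in-the-loop", frozenset({1})),
]


def afk_waves(issues: list[Issue]) -> list[list[int]]:
    """Group AFK issues into waves; every issue in a wave has all of its
    blockers completed by an earlier wave, so a wave can run in parallel."""
    done: set[int] = set()
    pending = [issue for issue in issues if issue.kind == "AFK"]
    waves: list[list[int]] = []
    while pending:
        ready = [issue.number for issue in pending if issue.blocked_by <= done]
        if not ready:
            break  # the rest wait on human-in-the-loop work or a blocker cycle
        waves.append(ready)
        done.update(ready)
        pending = [issue for issue in pending if issue.number not in done]
    return waves


print(afk_waves(BOARD))  # [[1], [2, 3], [4]]
```

Issue #5 never appears in a wave because its type excludes it from the AFK queue, which matches the board's intent.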

6.4 Stage 4: Implementation: TDD on One Slice

Pick the unblocked top of the queue: issue #1. Invoke tdd. The skill enforces strict red–green–refactor: write one failing test, watch it fail, write just enough code to make it pass, watch it pass, refactor with all tests still green, repeat.

Why TDD specifically? Two reasons.

It forces small steps. Without TDD, an agent produces six files of code and writes a test layer around them afterwards. Those tests tend to cheat; they exercise the implementation, not the behaviour. With TDD the test is written first, before the implementation exists, so it cannot be shaped to fit what the agent wrote.
It provides feedback every minute. Every test pass is a checkpoint. If the agent drifts, the next failing test catches it before it produces a hundred lines of garbage.

Here is the slice for issue #1 in both languages: a deep GamificationService module with a small interface, a wide implementation, and a focused test file. The tdd skill assumes a working test runner: install one before you start, pip install pytest for the Python slice or npm install -D vitest for the TypeScript slice, or the first red step fails on a missing runner rather than a missing implementation.

What matters here. The example below shows two things visible without reading the syntax:

The service has a tiny public interface: just two methods (award_lesson_completion and total_points). Everything else is hidden inside the class. Callers cannot reach the internals.

The test calls only those two methods. The test does not poke at internal helpers. It checks the behaviour a caller would see ("after three completions, the total is 30"), not how the service computes it.

That shape (small interface, wide implementation, tests at the boundary) is what §7 calls a deep module. The Python and TypeScript versions are line-for-line equivalent.

Python, then TypeScript:

# gamification/service.py - the deep module's interface

from dataclasses import dataclass
from datetime import datetime
from typing import Protocol


@dataclass(frozen=True)
class PointAward:
    student_id: str
    points: int
    reason: str
    awarded_at: datetime


class PointEventStore(Protocol):
    def append(self, award: PointAward) -> None: ...
    def total_for_student(self, student_id: str) -> int: ...


class GamificationService:
    """Awards and totals points. Streaks and levels live here too,
    but in the same module, so the interface stays small."""

    LESSON_COMPLETION_POINTS = 10

    def __init__(self, store: PointEventStore, clock=datetime.utcnow) -> None:
        self._store = store
        self._clock = clock

    def award_lesson_completion(self, student_id: str) -> PointAward:
        award = PointAward(
            student_id=student_id,
            points=self.LESSON_COMPLETION_POINTS,
            reason="lesson_completion",
            awarded_at=self._clock(),
        )
        self._store.append(award)
        return award

    def total_points(self, student_id: str) -> int:
        return self._store.total_for_student(student_id)

# gamification/test_service.py - written FIRST

from datetime import datetime
from gamification.service import GamificationService, PointAward


class InMemoryStore:
    def __init__(self) -> None:
        self._events: list[PointAward] = []

    def append(self, award: PointAward) -> None:
        self._events.append(award)

    def total_for_student(self, student_id: str) -> int:
        return sum(a.points for a in self._events if a.student_id == student_id)


def test_lesson_completion_awards_ten_points():
    store = InMemoryStore()
    fixed_clock = lambda: datetime(2026, 5, 10, 12, 0, 0)
    svc = GamificationService(store, clock=fixed_clock)

    award = svc.award_lesson_completion("student-42")

    assert award.points == 10
    assert award.reason == "lesson_completion"
    assert svc.total_points("student-42") == 10


def test_multiple_completions_accumulate():
    svc = GamificationService(InMemoryStore())
    for _ in range(3):
        svc.award_lesson_completion("student-42")
    assert svc.total_points("student-42") == 30

// gamification/service.ts - the deep module's interface

export interface PointAward {
  readonly studentId: string;
  readonly points: number;
  readonly reason: string;
  readonly awardedAt: Date;
}

export interface PointEventStore {
  append(award: PointAward): void;
  totalForStudent(studentId: string): number;
}

export class GamificationService {
  static readonly LESSON_COMPLETION_POINTS = 10;

  constructor(
    private readonly store: PointEventStore,
    private readonly clock: () => Date = () => new Date(),
  ) {}

  awardLessonCompletion(studentId: string): PointAward {
    const award: PointAward = {
      studentId,
      points: GamificationService.LESSON_COMPLETION_POINTS,
      reason: "lesson_completion",
      awardedAt: this.clock(),
    };
    this.store.append(award);
    return award;
  }

  totalPoints(studentId: string): number {
    return this.store.totalForStudent(studentId);
  }
}

// gamification/service.test.ts - written FIRST

import { describe, it, expect } from "vitest";
import { GamificationService, PointAward, PointEventStore } from "./service";

class InMemoryStore implements PointEventStore {
  private events: PointAward[] = [];
  append(a: PointAward) {
    this.events.push(a);
  }
  totalForStudent(id: string) {
    return this.events
      .filter((e) => e.studentId === id)
      .reduce((sum, e) => sum + e.points, 0);
  }
}

describe("GamificationService", () => {
  it("awards ten points on lesson completion", () => {
    const fixedClock = () => new Date("2026-05-10T12:00:00Z");
    const svc = new GamificationService(new InMemoryStore(), fixedClock);

    const award = svc.awardLessonCompletion("student-42");

    expect(award.points).toBe(10);
    expect(award.reason).toBe("lesson_completion");
    expect(svc.totalPoints("student-42")).toBe(10);
  });

  it("accumulates across multiple completions", () => {
    const svc = new GamificationService(new InMemoryStore());
    for (let i = 0; i < 3; i++) svc.awardLessonCompletion("student-42");
    expect(svc.totalPoints("student-42")).toBe(30);
  });
});

That is a deep module at work: a two-method public interface (awardLessonCompletion, totalPoints) over an implementation free to grow to thousands of lines. To prove the claim rather than assert it, here is what happens when issue #3 (streak counter) lands.

What matters here. Watch the public interface, not the lines. Before this slice the service had two methods (awardLessonCompletion, totalPoints). After this slice it has three (the same two plus currentStreak). The implementation grew significantly, with a streak store, an activity log, and a date helper, but none of that leaks out. Callers see one new method. Existing callers do nothing differently. Existing tests stay green. The new test asserts only on the new method. That is what "deep" means in practice: behaviour grows; the surface barely moves.

Python, then TypeScript:

# gamification/service.py - interface gains ONE method, nothing else changes

class GamificationService:
    LESSON_COMPLETION_POINTS = 10

    def __init__(self, store, streaks=None, clock=datetime.utcnow):
        self._store = store
        self._streaks = streaks or InMemoryStreakStore()  # internal detail
        self._clock = clock

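    def award_lesson_completion(self, student_id: str) -> PointAward:
        # unchanged signature; internally also updates streak state
        now = self._clock()  # read the clock once so the award and the streak agree
        award = PointAward(
            student_id=student_id,
            points=self.LESSON_COMPLETION_POINTS,
            reason="lesson_completion",
            awarded_at=now,
        )
        self._store.append(award)
        self._streaks.record_activity(student_id, now.date())
        return award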

    def total_points(self, student_id: str) -> int:        # unchanged
        return self._store.total_for_student(student_id)

    def current_streak(self, student_id: str) -> int:      # NEW - only addition
        return self._streaks.streak_length(student_id, today=self._clock().date())

# gamification/test_service.py - existing tests untouched; ONE new test added

from datetime import date, datetime, time


def test_streak_grows_with_consecutive_daily_completions():
    days = [date(2026, 5, 8), date(2026, 5, 9), date(2026, 5, 10)]
    today = {"value": days[0]}  # mutable holder: the clock can be read any number of times
    clock = lambda: datetime.combine(today["value"], time())
    svc = GamificationService(InMemoryStore(), clock=clock)

    for day in days:
        today["value"] = day
        svc.award_lesson_completion("student-42")

    assert svc.current_streak("student-42") == 3

// gamification/service.ts - interface gains ONE method, nothing else changes

export class GamificationService {
  static readonly LESSON_COMPLETION_POINTS = 10;

  constructor(
    private readonly store: PointEventStore,
    private readonly streaks: StreakStore = new InMemoryStreakStore(),
    private readonly clock: () => Date = () => new Date(),
  ) {}

  awardLessonCompletion(studentId: string): PointAward {
    // unchanged signature; internally also updates streak state
    const award: PointAward = {
      /* ... */
    };
    this.store.append(award);
    this.streaks.recordActivity(studentId, this.clock());
    return award;
  }

  totalPoints(studentId: string): number {
    // unchanged
    return this.store.totalForStudent(studentId);
  }

  currentStreak(studentId: string): number {
    // NEW - only addition
    return this.streaks.streakLength(studentId, this.clock());
  }
}

// gamification/service.test.ts - existing tests untouched; ONE new test added

it("grows the streak across consecutive daily completions", () => {
  const days = [
    new Date("2026-05-08T12:00:00Z"),
    new Date("2026-05-09T12:00:00Z"),
    new Date("2026-05-10T12:00:00Z"),
  ];
  // advance `now` explicitly so the clock stays valid after the loop ends
  let now = days[0];
  const svc = new GamificationService(
    new InMemoryStore(),
    undefined,
    () => now,
  );

  for (const day of days) {
    now = day;
    svc.awardLessonCompletion("student-42");
  }

  expect(svc.currentStreak("student-42")).toBe(3);
});

Three things happened, all diagnostic of a healthy deep module:

The interface grew by one method, not five. A shallow alternative would have exposed recordActivity, streakLength, streakStore, setActivityCalendar: internal mechanics leaking into the boundary. The deep version gives callers exactly what they need (currentStreak) and nothing else.
Existing tests did not change. The behaviour they pin still holds; the test file is purely additive. That is what testing at the interface buys you.
The new behaviour got one test at the same boundary. The streak store, activity log, and date helper are not tested directly; they are tested indirectly via currentStreak's contract, which is the right level.

The next slice (issue #4, level threshold) follows the same pattern: one method added, existing tests untouched, one new behaviour test at the boundary.
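The streak listings above reference an InMemoryStreakStore without showing it. One possible shape is sketched below. It is an assumption, not the chapter's implementation: it interprets the PRD's "1 missed day allowed" as tolerating a gap of at most two calendar days between consecutive activities, and between the latest activity and today.

```python
from datetime import date, timedelta


class InMemoryStreakStore:
    """Hypothetical internal helper behind GamificationService.current_streak."""

    GRACE_DAYS = 1  # assumption: one missed day does not break the streak

    def __init__(self) -> None:
        self._active_days: dict[str, set[date]] = {}

    def record_activity(self, student_id: str, day: date) -> None:
        self._active_days.setdefault(student_id, set()).add(day)

    def streak_length(self, student_id: str, today: date) -> int:
        """Count active days walking back from today until a gap exceeds the grace."""
        max_gap = timedelta(days=1 + self.GRACE_DAYS)
        days = sorted(
            (d for d in self._active_days.get(student_id, ()) if d <= today),
            reverse=True,
        )
        streak = 0
        previous = today
        for day in days:
            if previous - day > max_gap:
                break
            streak += 1
            previous = day
        return streak
```

None of this surfaces in the service's public interface; callers still see only current_streak, which is the point of keeping the store internal.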

6.5 Stage 5: The AFK Loop

You have five issues in the backlog and a tdd skill installed. You do not want to sit at the keyboard while the agent grinds through them. You want to push five tracer bullets through the system in parallel, eat dinner, and review five PRs in the morning.

The AFK loop is a shell script: gather the unblocked AFK issues, hand them to the agent with a clear prompt, run inside a sandboxed container, repeat until the queue is empty. Two implementations follow: a minimal bash version (works with either harness) and a structured TypeScript orchestrator that runs slices in parallel.

6.5.1 Minimal AFK loop (bash)

What matters here. The script does five things in a loop until there is nothing left to do: (1) read all open issues from a folder; (2) read the recent commit history; (3) hand both to the agent with a clear prompt; (4) the agent picks one issue and implements it; (5) check whether the queue is empty, and if so, stop. The human is not at the keyboard during any of this. The script starts and walks itself.

#!/usr/bin/env bash
# ralph.sh - the simplest AFK loop. Works with either harness.
# Loops over /issues/*.md, picks the highest-priority AFK issue,
# implements it inside a sandbox, commits, repeats until done.
set -euo pipefail   # bash safety: exit on any error, undefined var, or failed pipe

PROMPT_FILE="${1:-prompts/implement.md}"
ISSUES_DIR="${2:-issues}"

# Two env vars carry the harness difference. AGENT_CMD is the binary;
# AGENT_PERM_FLAG is its skip-approvals flag, which is NOT the same
# string in both harnesses (see the tool-tabs below). Everything else
# in this script is byte-identical across Claude Code and OpenCode.
CMD="${AGENT_CMD:-claude}"
PERM_FLAG="${AGENT_PERM_FLAG:---permission-mode acceptEdits}"

while :; do
  ISSUES=$(cat "$ISSUES_DIR"/*.md 2>/dev/null || true)
  COMMITS=$(git log --oneline -5)

  PROMPT=$(cat "$PROMPT_FILE")

  RESULT=$($CMD $PERM_FLAG <<EOF
$PROMPT

## Open issues
$ISSUES

## Recent commits
$COMMITS
EOF
)

  # Exit only on a line that is *exactly* the sentinel, so the loop
  # does not stop if the agent merely quotes the token in prose.
  if echo "$RESULT" | grep -qx "NO_MORE_TASKS"; then
    echo "queue drained - exiting"
    break
  fi
done

<!-- prompts/implement.md - fed to the agent on every iteration -->

You are operating AFK on the gamification project.

1. From the open issues, pick the highest-priority issue whose
   `Type:` is `AFK` and whose blockers are all closed.
   If none, reply with a line containing only `NO_MORE_TASKS` and stop.
2. Read the PRD it references.
3. Use the `tdd` skill to implement one vertical slice.
4. Run the project feedback loops (typecheck, tests, lint).
   Do not commit if any fail.
5. Commit referencing the issue number and close the issue.
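The exit condition in the script depends on an exact-line match, which is worth seeing in isolation. The sketch below roughly mirrors the `grep -qx` check in Python; `queue_drained` is an illustrative name.

```python
SENTINEL = "NO_MORE_TASKS"


def queue_drained(agent_output: str) -> bool:
    """True only when some line is exactly the sentinel, nothing more.

    A sentence that merely mentions the token does not count, which is
    what stops the loop from exiting early on chatty output.
    """
    return any(line == SENTINEL for line in agent_output.splitlines())


print(queue_drained("Nothing left to do.\nNO_MORE_TASKS\n"))         # True
print(queue_drained("I will reply NO_MORE_TASKS when I am done."))   # False
```

This is also why the prompt asks for "a line containing only" the sentinel: the contract is literal, and a model that wraps the token in prose breaks it.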

The skills, prompt, and issues are byte-identical across the two harnesses. What differs is the harness binary and its skip-approvals flag: Claude Code uses --permission-mode acceptEdits, OpenCode uses --dangerously-skip-permissions for the same effect. The two env vars below carry that difference; the heredoc on stdin works for both.

AGENT_CMD="claude" \
  AGENT_PERM_FLAG="--permission-mode acceptEdits" ./ralph.sh

6.5.2 Parallel AFK orchestrator (TypeScript)

The bash version runs slices sequentially. Once you trust the loop, the next leverage point is parallel execution: pick all unblocked issues, spin up a sandboxed worktree per issue, run them concurrently, merge. The orchestrator below sketches the pattern; production-grade implementations exist as dedicated sandboxing libraries in both the Claude Code and OpenCode ecosystems.

What matters here. Three ideas; everything else is plumbing:

Parallel, not sequential. Instead of doing slice 1, then slice 2, then slice 3, the orchestrator does all three at the same time, each in its own isolated workspace. By morning you have three pull requests instead of one.

Each parallel run is sandboxed. A "sandboxed worktree" is a separate copy of the codebase (a git worktree is git's built-in way of having multiple checked-out copies) running inside a container that can't damage your laptop. If the agent does something wrong, the blast radius is one worktree.

The reviewer is a separate agent in a fresh session. A different agent, with a different (cheaper) model, looks only at the diff and compares it to the project's coding standards. Reviewing in the same chat that wrote the code would be reviewing in the dumb zone.

The code itself is a mid-level Node.js script; the Promise.all line is where the parallelism happens.

// orchestrator.ts - parallel AFK loop with sandboxed worktrees
import { spawn } from "node:child_process";
import { readdir, readFile } from "node:fs/promises";

interface Issue {
  id: string; // e.g. "issue-001"
  title: string;
  type: "AFK" | "human-in-the-loop";
  blockedBy: string[]; // ids of blocking issues
  closed: boolean;
}

const HARNESS = process.env.AGENT_CMD ?? "claude"; // "claude" or "opencode run"

async function loadIssues(dir: string): Promise<Issue[]> {
  const files = await readdir(dir);
  return Promise.all(
    files.map(async (f) => {
      const raw = await readFile(`${dir}/${f}`, "utf8");
      return parseIssue(f, raw); // omitted for brevity
    }),
  );
}

function unblocked(issues: Issue[]): Issue[] {
  const closed = new Set(issues.filter((i) => i.closed).map((i) => i.id));
  return issues.filter(
    (i) =>
      !i.closed && i.type === "AFK" && i.blockedBy.every((b) => closed.has(b)),
  );
}

function runInSandbox(issue: Issue): Promise<{ ok: boolean; branch: string }> {
  return new Promise((resolve) => {
    const branch = `afk/${issue.id}`;
    // 1. create a git worktree on a fresh branch
    // 2. start a docker container with that worktree mounted r/w
    // 3. run the harness inside, with the implement.md prompt
    const proc = spawn("scripts/run-sandbox.sh", [HARNESS, branch, issue.id], {
      stdio: "inherit",
    });
    proc.on("exit", (code) => resolve({ ok: code === 0, branch }));
  });
}

async function main() {
  let issues = await loadIssues("./issues");

  while (true) {
    const ready = unblocked(issues);
    if (ready.length === 0) {
      console.log("backlog drained or fully blocked - exiting");
      break;
    }

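    // run all unblocked issues in parallel, one sandbox each
    const results = await Promise.all(ready.map(runInSandbox));

    // guard: if every slice failed, stop instead of retrying the same issues forever
    if (results.every((r) => !r.ok)) {
      console.log("no slice succeeded this round - stopping for human attention");
      break;
    }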

    // automated review on each successful branch BEFORE merge
    // (in a fresh session - smart-zone reviewer)
    for (const r of results.filter((r) => r.ok)) {
      await reviewBranch(r.branch);
    }

    // reload issues from disk; agents may have closed some and opened others
    issues = await loadIssues("./issues");
  }
}

async function reviewBranch(branch: string): Promise<void> {
  // spawn a *separate* agent session, smaller model, with the
  // diff and the coding-standards skill as input. Open a comment
  // on the PR. Do NOT auto-merge.
}

main();

Three principles are embedded in the orchestrator, and they matter more than the code:

**Sandboxes are mandatory.** AFK with --permission-mode bypassPermissions and no sandbox is how repositories get destroyed. Every slice gets a fresh container, a fresh worktree, no production credentials, and no network egress beyond what it needs.
**The reviewer is a separate agent.** A reviewer in the same session as the implementer is reviewing in the dumb zone. A reviewer in a fresh session, given only the diff and the standards, sees the work clearly. A smaller model is fine for review (and is often more critical); use the larger one for implementation.
**The loop reloads issues from disk every iteration.** When QA generates new issues in §6.6, they appear in the queue automatically.
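
To see how the dependency gate and the per-iteration reload interact, here is a small self-contained sketch. It reuses the shape of `Issue` and the `unblocked` filter from the orchestrator; the four-issue backlog and the `simulate` helper are made up for illustration, and "running" an issue is reduced to marking it closed.

```typescript
// Hypothetical simulation: not part of the orchestrator, only its scheduling rule.
interface Issue {
  id: string;
  type: "AFK" | "human-in-the-loop";
  blockedBy: string[];
  closed: boolean;
}

// Same rule as the orchestrator: open, AFK, and every blocker already closed.
function unblocked(issues: Issue[]): Issue[] {
  const closed = new Set(issues.filter((i) => i.closed).map((i) => i.id));
  return issues.filter(
    (i) => !i.closed && i.type === "AFK" && i.blockedBy.every((b) => closed.has(b)),
  );
}

// Returns the ids picked up in each wave until nothing is runnable.
function simulate(backlog: Issue[]): string[][] {
  const issues = backlog.map((i) => ({ ...i })); // work on copies
  const waves: string[][] = [];
  for (;;) {
    const ready = unblocked(issues);
    if (ready.length === 0) break;
    waves.push(ready.map((i) => i.id));
    for (const i of ready) i.closed = true; // stand-in for a successful agent run
  }
  return waves;
}

const waves = simulate([
  { id: "issue-001", type: "AFK", blockedBy: [], closed: false },
  { id: "issue-002", type: "AFK", blockedBy: ["issue-001"], closed: false },
  { id: "issue-003", type: "human-in-the-loop", blockedBy: [], closed: false },
  { id: "issue-004", type: "AFK", blockedBy: ["issue-003"], closed: false },
]);
console.log(waves);
```

In this toy backlog, issue-004 never runs: it waits on a human-in-the-loop issue, so the loop ends in the "fully blocked" state rather than the "drained" one.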

6.5.3 Persistent Loops and Ambient Agents

The loops above run once per backlog. They start, drain the queue, and stop. The next evolution is to keep them running.

A loop, in Boris Cherny's sense, is an agent invocation scheduled with cron to run every minute, every five minutes, or every thirty minutes against a small standing job. Each invocation is a fresh session, so it starts in the smart zone every time and never accumulates dumb-zone drift. The agent does not stay alive; the job stays alive, and a new agent is born to handle each tick.

A working set of loops on a project might include:

A PR janitor: reruns flaky CI, rebases against main, and fixes typo and lint comments left by reviewers.
A CI healer: when a flaky test starts failing intermittently, investigates and fixes it.
A feedback clusterer: pulls incoming user feedback every thirty minutes, groups it by theme, and posts a summary to Slack.

These are not tools. They are ambient agents: a persistent, low-intensity AI workforce running alongside the project, handling the background tax that historically ate engineering hours, such as PR janitorial work, CI hygiene, ticket triage, dependency upkeep, log digestion, and monitoring summaries. No single task justifies a full AFK run; together they consume real time. Run them as loops and they vanish from the engineer's day.

A minimal persistent loop is a cron line plus a prompt file:

What matters here. A cron job runs a command on a schedule: every Tuesday at 9am, say, or every 30 minutes. The five fields */30 * * * * mean "at every 30th minute, of every hour, of every day" (crontab.guru decodes any schedule). The line below tells the operating system: "every half hour, go to my project folder and run the PR-janitor agent for one tick." Each tick is a fresh agent session that lasts as long as it takes to handle whatever PRs need attention, then exits. The job lives forever; the agents are disposable.

# crontab -e
# every 30 minutes, run the PR-janitor agent in the project
*/30 * * * * cd /home/me/project && \
  AGENT_CMD="claude" ./scripts/run-once.sh prompts/pr-janitor.md

<!-- prompts/pr-janitor.md -->

You are the PR janitor for this project.

1. List my open PRs (`gh pr list --author @me`). # gh = GitHub's CLI
2. For each PR:
   - If CI failed on a known-flaky test, retrigger only that job.
   - If the PR has merge conflicts with main, attempt a clean rebase.
     If the rebase is non-trivial, leave a comment and stop.
   - If a reviewer left a typo / lint comment, fix it and push.
3. Commit only changes you can explain in one sentence.
4. Do nothing else. Output a one-line summary.

A heavier pattern is the routine: the same loop executed server-side rather than from your laptop's cron, so it survives sleep, reboots, and travel. Server-side scheduled-agent features are emerging across coding-agent products; treat the local-cron version as the development form and the server-side version as the production form. The prompt is the same; only the scheduler changes.

Two design rules govern persistent loops:

Every tick is a fresh session. No state survives between ticks except what is written to the environment (PRs, CI logs, a small status file). The loop is stateless on purpose; the prompt carries the role.
Every loop has one job. A loop that does PR-janitor work and CI healing and feedback clustering will degrade into a session that does none of them well. One loop per role, like one skill per role.
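
The first rule can be sketched as a tick function whose only memory is a file on disk. This is a minimal illustration, not a prescribed design: the `LoopStatus` shape, the `runTick` helper, and the PR numbers are hypothetical, and the place where a real tick would launch an agent session is left as a comment.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical status-file shape; the chapter does not prescribe one.
interface LoopStatus {
  ticks: number;
  handledPrs: number[];
}

function readStatus(path: string): LoopStatus {
  if (!existsSync(path)) return { ticks: 0, handledPrs: [] };
  return JSON.parse(readFileSync(path, "utf8")) as LoopStatus;
}

// One tick: read state from disk, do the work, write state back, exit.
// Nothing is kept in module-level variables between ticks.
function runTick(statusPath: string, openPrs: number[]): number[] {
  const status = readStatus(statusPath);
  const todo = openPrs.filter((pr) => !status.handledPrs.includes(pr));
  // ... a real tick would start a fresh agent session for `todo` here ...
  const next: LoopStatus = {
    ticks: status.ticks + 1,
    handledPrs: [...status.handledPrs, ...todo],
  };
  writeFileSync(statusPath, JSON.stringify(next));
  return todo;
}

// Two ticks against a throwaway status file in a temp directory.
const statusPath = join(mkdtempSync(join(tmpdir(), "loop-")), "status.json");
const firstTick = runTick(statusPath, [101, 102]);
const secondTick = runTick(statusPath, [101, 102, 103]);
console.log(firstTick, secondTick, readStatus(statusPath).ticks);
```

The second tick only sees PR 103 as new because the first tick wrote what it handled to the status file; deleting that file returns the loop to a blank slate.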

The AFK pattern is now complete end to end: §6.5.1 runs one slice sequentially; §6.5.2 runs many slices in parallel; §6.5.3 keeps the workforce running indefinitely on rhythms the project itself generates. Each step adds throughput without adding anyone to the team: the operational shape of a Digital FTE workforce.

6.6 Stage 6: Human Review and QA

The morning after the loop runs, you have N pull requests. Read the diffs, not the agent's summary of the diffs. The summary is the agent's word for what it did; the diff is what it actually did. The two often differ in subtle ways that only matter at production scale.

A concrete example, from the gamification slice in §6.4. The agent's PR summary said: "Added points for lesson completion. Tests pass. Dashboard widget shows current total." The diff said the same, except that the QA pass found that opening the dashboard before any lesson had been completed crashed with TypeError: Cannot read property 'awarded_at' of null. The agent had handled the empty state in the service (returning 0 from total_points), but the React widget assumed a last_award_at timestamp existed. A null check, an easy fix; but the agent's tests did not cover the empty-state UI render, because the slice's user story implicitly assumed there was at least one award. That observation goes back into the backlog as a new issue ("Add empty-state to dashboard widget; cover with a test"), blocked by nothing, type AFK. The PR merges; the night shift picks up the new issue tomorrow. This loop, where the human finds the gap, the ticket goes back into the queue, and the agent fixes it AFK, is what makes the pipeline self-improving.
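
The shape of that bug and its fix can be shown without React. The sketch below is an assumption-laden reduction: the chapter names only `awarded_at` and `total_points`, so the `DashboardData` shape, the `last_award` field, the label strings, and the sample timestamp are all hypothetical.

```typescript
// Hypothetical shapes: the chapter names only `awarded_at` and `total_points`.
interface Award {
  awarded_at: string; // ISO timestamp
}

interface DashboardData {
  total_points: number;
  last_award: Award | null; // null until the first lesson is completed
}

// Roughly what the crash implies: the award is dereferenced without a check.
function lastAwardLabelUnsafe(data: DashboardData): string {
  return `Last award: ${(data.last_award as Award).awarded_at}`;
}

// Fixed version: the empty state is an explicit branch.
function lastAwardLabel(data: DashboardData): string {
  if (data.last_award === null) return "No awards yet";
  return `Last award: ${data.last_award.awarded_at}`;
}

const empty: DashboardData = { total_points: 0, last_award: null };
const active: DashboardData = {
  total_points: 30,
  last_award: { awarded_at: "2025-01-15T09:00:00Z" }, // made-up sample value
};

// The missing test: render the empty state and make sure it does not throw.
let crashed = false;
try {
  lastAwardLabelUnsafe(empty);
} catch (e) {
  crashed = e instanceof TypeError;
}
console.log(crashed, lastAwardLabel(empty), lastAwardLabel(active));
```

The point of the example is the last few lines: an empty-state case in the test suite would have caught the crash before QA did.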

QA produces the most valuable artifact in the pipeline: new issues. Every bug found, every UX concern, every edge case the original PRD missed becomes a new ticket on the Kanban board with the appropriate blocking relationships. The board never empties; it keeps producing slices.

This is also the stage where taste lives. Automating QA is a temptation worth resisting: an agent reviewing an agent's UI reaches an opinion that nobody in particular holds, and the result is the gently subpar, no-rough-edges slop that characterises unsupervised AI output. A human deciding "this padding is wrong" and "this label is too long" is an irreducible step. The agent ships at five times the normal pace; your job is to make sure it ships your taste at five times the normal pace, rather than nobody's.

7. Architecture Principles for AI-Friendly Codebases

The workflow and the codebase are inseparable: the cleaner the architecture, the better the agent performs inside it. Architecture is no longer just an end in itself; it is an input to your AI workforce.

7.1 Deep Modules over Shallow Modules

A module is deep when it has a small interface and a lot of behaviour behind it; shallow when the interface and the implementation are roughly the same size.

flowchart TB
    subgraph S["Shallow modules - bad"]
        direction LR
        s1[ ] ~~~ s2[ ] ~~~ s3[ ] ~~~ s4[ ] ~~~ s5[ ]
        s6[ ] ~~~ s7[ ] ~~~ s8[ ] ~~~ s9[ ] ~~~ s10[ ]
        SLABEL["many small pieces<br/>callers thread through<br/>implicit dependencies"]
    end

    subgraph D["Deep module - good"]
        direction TB
        DI["small interface<br/>━━━━━━━━━━━"]
        DBODY["large internal<br/>implementation<br/>(hidden from callers)"]
        DI --> DBODY
    end

    classDef bad fill:#fde8e8,stroke:#a83838,color:#5a0d0d
    classDef good fill:#e8f5e8,stroke:#3b8a3b,color:#0d3a0d
    classDef shallowCell fill:#f0d0d0,stroke:#a83838,color:#5a0d0d
    class S bad
    class D good
    class s1,s2,s3,s4,s5,s6,s7,s8,s9,s10 shallowCell

For an agent the difference is decisive. In a shallow codebase the agent traces many pairwise dependencies between many small files; signal-to-noise per token degrades; tests sprawl across module boundaries because no single boundary contains enough behaviour to be worth testing in isolation. In a deep codebase the agent reads one interface and trusts the boundary. Tests sit at the interface. Behaviour can be added internally without disturbing callers, and without re-testing them.

To make the difference concrete, here is what a shallow version of GamificationService would have looked like: the way an agent without architectural guidance tends to write the same feature.

What matters here. Count the number of exported items in each block. The shallow version exposes nine top-level functions that callers must remember to call in the right order and combination. The deep version exposes three methods on a single class; whatever needs to happen behind the scenes happens behind the scenes. The bug to avoid: in the shallow version, a caller can forget to invoke validateAntiCheat and silently corrupt the system. In the deep version, the caller cannot reach validateAntiCheat at all; it is hidden inside awardLessonCompletion, which calls it automatically. Hiding the right things is the entire job of a deep module.

// gamification/index.ts - SHALLOW: the interface IS the implementation
// (signatures only: `declare` marks the bodies as omitted, and PointEvent
// is one of the data classes not shown here)
export declare function awardPoints(studentId: string, reason: string, n: number): void;
export declare function totalPoints(studentId: string): number;
export declare function recordStreakActivity(studentId: string, day: Date): void;
export declare function streakLength(studentId: string, today: Date): number;
export declare function computeLevel(totalPoints: number): number;
export declare function validateAntiCheat(
  studentId: string,
  event: PointEvent,
): boolean;
export declare function backfillHistorical(studentId: string, since: Date): void;
export declare function pointsForLessonCompletion(): number;
export declare function pointsForQuizPass(): number;
// ... + the data classes each function depends on

Nine top-level functions, each callable from anywhere, each silently dependent on the others (awardPoints must call validateAntiCheat; the dashboard must call awardPoints and recordStreakActivity and computeLevel for one lesson completion; if any caller forgets one, the system silently drifts out of consistency).

Compare with the deep version from §6.4:

// gamification/service.ts - DEEP: small interface, large hidden body
// (signatures only: `declare` marks the method bodies as omitted)
export declare class GamificationService {
  awardLessonCompletion(studentId: string): PointAward; // does ALL of the above internally
  totalPoints(studentId: string): number;
  currentStreak(studentId: string): number;
  // streak recording, anti-cheat, level calc, point amounts → all hidden
}

Three methods. Internally, the same nine concerns exist, but they are not the interface. Callers cannot forget to call validateAntiCheat, because callers cannot call it at all. Tests sit on three methods, not nine. New behaviour (recordStreak, level thresholds, backfill) is added inside without changing the contract: exactly the property §6.4 demonstrates.
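
As a rough illustration of how the hidden concerns can sit behind the small interface, the sketch below fills in a toy, in-memory body. Only the class name, `awardLessonCompletion`, `totalPoints`, and the `PointAward` type name come from the chapter; the point value, the one-second anti-cheat rule, the injected clock, and the fields of `PointAward` are invented for the example, and `currentStreak` is left out for brevity.

```typescript
// Hypothetical in-memory sketch; point values and the anti-cheat rule are invented.
interface PointAward {
  studentId: string;
  points: number;
  accepted: boolean;
}

class GamificationService {
  // Private members are the hidden body: the compiler rejects outside access.
  private totals = new Map<string, number>();
  private lastAwardAt = new Map<string, number>();

  // The clock is injectable so the example is deterministic.
  constructor(private readonly now: () => number = Date.now) {}

  awardLessonCompletion(studentId: string): PointAward {
    // Anti-cheat runs on every award; callers have no way to skip it.
    if (!this.passesAntiCheat(studentId)) {
      return { studentId, points: 0, accepted: false };
    }
    const points = this.pointsForLessonCompletion();
    this.totals.set(studentId, this.totalPoints(studentId) + points);
    this.lastAwardAt.set(studentId, this.now());
    return { studentId, points, accepted: true };
  }

  totalPoints(studentId: string): number {
    return this.totals.get(studentId) ?? 0;
  }

  // Invented rule: reject a completion within one second of the previous one.
  private passesAntiCheat(studentId: string): boolean {
    const last = this.lastAwardAt.get(studentId);
    return last === undefined || this.now() - last >= 1000;
  }

  private pointsForLessonCompletion(): number {
    return 10;
  }
}

let clock = 0;
const service = new GamificationService(() => clock);
const first = service.awardLessonCompletion("s1");
const replay = service.awardLessonCompletion("s1"); // same instant: rejected
clock = 5000;
const later = service.awardLessonCompletion("s1");
console.log(first.accepted, replay.accepted, later.accepted, service.totalPoints("s1"));
```

A caller that writes `service.passesAntiCheat("s1")` gets a compile error, which is the whole point: the check cannot be forgotten because it cannot be reached.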

Heuristic. If your IDE's Outline view of a module is longer than its public interface, the module is shallow. Deepen it.

7.2 Test at the Interface

A corollary of §7.1. Tests sit on module interfaces, not on internal functions. A test on an internal function pins the implementation; refactoring the internals breaks the test even when the externally visible behaviour is correct. A test on the interface pins the behaviour; the internals change freely as long as the contract holds.

This is what the tdd skill enforces by default: tests target the interface; the agent refactors internals between green steps; the suite gives full coverage from a small surface area.
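
A small sketch of why interface-level tests survive refactors. Everything here is hypothetical (the `PointsLedger` interface, both classes, and the check function are invented for illustration): two different internals sit behind one interface, and a single behavioural check passes against both.

```typescript
// Hypothetical example: two internals behind one interface, one behavioural test.
interface PointsLedger {
  award(studentId: string, points: number): void;
  total(studentId: string): number;
}

// Internals v1: keep every event and sum on read.
class EventLogLedger implements PointsLedger {
  private events: { studentId: string; points: number }[] = [];
  award(studentId: string, points: number): void {
    this.events.push({ studentId, points });
  }
  total(studentId: string): number {
    return this.events
      .filter((e) => e.studentId === studentId)
      .reduce((sum, e) => sum + e.points, 0);
  }
}

// Internals v2 (after a refactor): keep a running total per student.
class RunningTotalLedger implements PointsLedger {
  private totals = new Map<string, number>();
  award(studentId: string, points: number): void {
    this.totals.set(studentId, this.total(studentId) + points);
  }
  total(studentId: string): number {
    return this.totals.get(studentId) ?? 0;
  }
}

// The check touches only the interface, so it cannot tell the two apart.
function ledgerContractHolds(ledger: PointsLedger): boolean {
  ledger.award("s1", 10);
  ledger.award("s1", 5);
  ledger.award("s2", 7);
  return (
    ledger.total("s1") === 15 &&
    ledger.total("s2") === 7 &&
    ledger.total("s3") === 0
  );
}

console.log(
  ledgerContractHolds(new EventLogLedger()),
  ledgerContractHolds(new RunningTotalLedger()),
);
```

A test that asserted on the private `events` array would have broken the moment the internals moved to a running total, even though no caller could observe a difference.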

7.3 Design the Interface, Delegate the Implementation

The most important habit for a senior engineer working with agents.

You decide what the module exposes: the contract, the names, the invariants. These decisions affect every caller; they shape the architecture; they require taste and the whole system in mind.

The agent decides how the contract is satisfied: internal data structures, helper placement, order of operations. These affect only the inside of one module; mistakes are recoverable; the architectural map is not needed.

This is the gray box principle. From the outside the module is fully specified: interface visible, internals invisible by design. From the inside the agent is free to do excellent work, constrained only by the interface contract. A senior engineer can hold the architectural map of a million-line codebase in their head because the map contains only interfaces.

This is what makes the brain-saturation problem of Failure 5 tractable. You cannot read every line the agent writes; that road leads to burnout. You can keep the module map in your head and read every interface change carefully. The change-set on interfaces is small; the change-set inside modules is large. Concentrating attention on the small set is what scales.

7.4 The `improve-codebase-architecture` Skill

Codebases drift toward shallow over time, especially with agents in them. The fix is a periodic deepening pass.

Even Karpathy, working at the frontier with the latest models, describes the experience plainly: "Sometimes I get a little bit of a heart attack because the code is very bloaty and there's a lot of copy paste, and awkward abstractions that are brittle. It works, but it's just really gross." This is not a deep model failing; it is the model performing inside the verifiable circuit of "does the code run" without a corresponding reward for "is the code well-designed." The deepening pass supplies the reward the labs did not.

---
name: improve-codebase-architecture
description: Find shallow-module candidates in the codebase and propose deepenings. Run weekly, or after a burst of feature work.
---

You are an architecture reviewer. Walk the codebase and find places
where understanding one concept requires bouncing between many small
files; where pure functions have been extracted only for testability,
not behaviour; where modules are tightly coupled at the seams.

Surface a numbered list of deepening candidates. For each, briefly:

- which existing files would collapse into the new deep module
- what the new interface would be (3-5 method signatures, no more)
- what behaviour would move inside, freeing callers from knowing it

Do NOT make changes. Open a markdown RFC describing the highest-value
candidate as an issue, blocked by nothing, type AFK.

A weekly run produces one deepening RFC. It enters the same Kanban board that feature work flows through. It is implemented by the same TDD-on-vertical-slices loop. The codebase gets healthier on a schedule, not by accident.

8. Working Vocabulary

Precise vocabulary speeds up reasoning. The full reference is the Dictionary of AI Coding; the subset below is the minimum needed to read and write the rest of this book.

Term	Meaning
Model	The parameters. Stateless. Does next-token prediction; nothing else.
Harness	Everything around the model that turns it into an agent: tools, system prompt, context-window management, permissions. Claude Code is a harness; OpenCode is a harness.
Agent	A model + harness operating in a context window with tools. What you actually talk to.
Context window	The fixed-size view, measured in tokens, that the model sees on each request. Finite. The only surface through which the model perceives anything.
Smart zone / dumb zone	The early-session region where attention is sharp / the late-session region where attention is diluted by competing tokens.
Hallucination	Confidently wrong output. Factuality hallucinations come from gaps in parametric knowledge; faithfulness hallucinations come from drift in the dumb zone. The fixes differ.
Clearing	Ending the session and starting a fresh one. A hard reset. Returns the agent to a known state.
Compaction	Summarising the session in-memory to seed a new one. Lossy; preserves some dumb-zone reasoning.
Handoff	Transferring context from one session to another via an artifact (PRD, ticket, CONTEXT.md).
AFK	"Away from keyboard." The user kicks off a session and lets it run unattended in a sandbox.
Skill	A teachable capability bundled as a `SKILL.md` file. Loaded on demand. The unit of progressive disclosure.
Tracer bullet / vertical slice	An issue that ships one thin path through every layer of the system, end to end.
Deep module	A module with a small interface and a large internal implementation. The shape that makes AI codebases scalable.
Design concept	The shared, ephemeral idea of what is being built, held in common between user and agent. Not an asset.
Grilling	A technique for forming a design concept: the agent interviews the user Socratically, one decision at a time.
Vibe coding	Accepting agent code without human review. Distinct from "low-quality coding"; the term names the review stance, not the output.
Agentic engineering	The discipline of using agents in production work while preserving the quality bar of professional software. The opposite stance to vibe coding: floor raised, ceiling held.
Jagged intelligence	The empirical fact that LLM capability peaks sharply on tasks the labs trained for via verifiable RL (math, code), and stagnates outside those circuits. The agent that refactors 100k lines may also tell you to walk to a car wash 50 m away.
On distribution	The property of being well represented in the model's training data, and therefore handled competently by it. When starting fresh, choose stacks the model is already strong in.
Loop / Routine	A persistent ambient agent: a fresh session invoked on a schedule (cron locally; "routine" server-side) against a small standing job. Each tick is stateless; the role persists in the prompt.

A working coder should use any of these without hesitation. "I'm going to clear, then run tdd on the next unblocked vertical slice" and "That's a faithfulness hallucination; the docs are still in context, it just stopped reading them around turn forty" are the kinds of sentences that separate a vague conversation from one that actually gets work done.

9. Practical Drills

Three exercises. Do them in order. Each takes thirty minutes to two hours.

Drill 1: Install and run grill-me on a real idea. Choose a feature you have been putting off scoping. Install the skill pack in a clean repo following §5.2 (Claude Code readers: mkdir -p .claude/skills first, then npx skills@latest add mattpocock/skills). Open Claude Code (or OpenCode), invoke /grill-me, and answer questions until the agent stops. Do not shortcut. Count the questions. Note which decisions you would not have surfaced on your own.

What "good" looks like. A grilling session on a non-trivial feature tends to run on the order of 15–40 questions and 30–90 minutes before the agent reports alignment. Under roughly 10 questions usually means the idea was too small or you answered too generously; over 60 usually means the agent is fishing, so interrupt and ask it to commit to one recommendation per question. By the end you should be able to paraphrase at least three decisions that emerged that you had not considered going in. If you cannot, it was a survey, not a grilling. A useful diagnostic ratio: roughly one in five questions should surface a decision you had not pre-resolved.
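
The rules of thumb above can be written down as a tiny scoring helper. The thresholds (10, 60, three decisions, one in five) are the ones stated in the text; the function names and verdict labels are hypothetical, and the numbers are guidance rather than hard limits.

```typescript
// Thresholds follow the rule-of-thumb numbers in the text; the helper itself is hypothetical.
type Verdict = "too-shallow" | "fishing" | "survey" | "grilling";

function assessGrilling(questions: number, newDecisions: number): Verdict {
  if (questions < 10) return "too-shallow"; // idea too small, or answers too generous
  if (questions > 60) return "fishing"; // interrupt: ask for one recommendation per question
  if (newDecisions < 3) return "survey"; // fewer than three decisions you had not considered
  return "grilling";
}

// Share of questions that surfaced a decision you had not pre-resolved
// (the text suggests roughly one in five, i.e. about 0.2).
function surfaceRatio(questions: number, newDecisions: number): number {
  return questions === 0 ? 0 : newDecisions / questions;
}

console.log(assessGrilling(25, 5), surfaceRatio(25, 5));
```

Counting questions and new decisions during the drill is enough input; the value of writing it down is that "it felt thorough" gets replaced by two numbers.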

Drill 2: Write a vertical slice as a tracer bullet. Take any unfinished feature in your codebase. Write a single user story that traces the smallest possible end-to-end path. Implement it under the tdd skill. Notice how short the slice is. Notice how much earlier integration bugs surface than they would have under horizontal slicing.

What "good" looks like. The slice lands in under one session with a test, an implementation, and a reviewable diff in one PR. If it doesn't, the slice was too thick; split it. The integration friction you hit during the slice is the value of the drill; capture it as new issues, and do not expand the current slice to absorb it.

Drill 3: Deepen a module. Run improve-codebase-architecture on a codebase you know well. Choose the highest-value candidate. Do not implement it yet; sketch the new interface on paper (3–5 method signatures, no more). Compare the surface area of the new interface to the old one (the sum of public symbols across the files that would collapse). The ratio is your concrete measure of how shallow the codebase had become.

What "good" looks like. A genuine deepening typically collapses several small modules (on the order of 5 to 15) into one deep one, with a public-symbol ratio (old : new) on the order of 3:1 or higher. If the ratio is closer to 1:1, the candidate was not actually shallow; choose a different one.
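
If counting by hand gets tedious, the ratio can be approximated with a few lines of code. This is a rough sketch under stated assumptions: `countExports` is a hypothetical regex-based counter that only recognises named `export function/class/const/let/interface/type/enum` declarations (no `export default`, no re-exports), and the three source strings are invented sample files. A serious tool would use the TypeScript compiler API instead.

```typescript
// Rough, regex-based count of named exports in TypeScript source text.
function countExports(source: string): number {
  const pattern =
    /^\s*export\s+(?:async\s+)?(?:function|class|const|let|interface|type|enum)\s+\w+/gm;
  return (source.match(pattern) ?? []).length;
}

// old : new ratio, where `newMethodCount` is the number of method
// signatures in the proposed deep interface (3-5 in the drill).
function deepeningRatio(oldFiles: string[], newMethodCount: number): number {
  const before = oldFiles.reduce((sum, src) => sum + countExports(src), 0);
  return newMethodCount === 0 ? Infinity : before / newMethodCount;
}

// Invented sample: three small files that would collapse into one module.
const shallowFiles = [
  "export function awardPoints() {}\nexport function totalPoints() {}\nexport function computeLevel() {}",
  "export function recordStreakActivity() {}\nexport function streakLength() {}\nfunction helper() {}",
  "export const LESSON_POINTS = 10;\nexport const QUIZ_POINTS = 5;\nexport function validateAntiCheat() {}\nexport function backfillHistorical() {}",
];

const publicBefore = shallowFiles.reduce((sum, src) => sum + countExports(src), 0);
const ratio = deepeningRatio(shallowFiles, 3);
console.log(publicBefore, ratio);
```

Here nine public symbols collapse into a three-method interface, a 3:1 ratio, which sits at the threshold the drill describes for a genuine deepening.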

A short checklist for daily work:

Did I /clear before starting today's session?
Did I use grill-me for any non-trivial change?
Are my issues vertical slices, not horizontal phases?
Is every implementation slice running through tdd?
Are AFK runs in a sandbox?
Is the reviewer a separate session from the implementer?
Did I read the diff, not the summary?

10. Closing: The Strategic Programmer

Here is the picture to take away.

Your agent is an excellent tactical programmer: a sergeant on the ground who can take any well-specified hill, in any language, in any framework, in the middle of the night, and bring back a working slice by morning. You do not need to teach it how to write a function or a test. The harness, the model, and the tools have already solved that.

What the sergeant cannot do is decide which hill. It cannot tell you whether the system being built is the system the business needs. It cannot tell you whether the third module you are about to ask for should exist as a separate module at all, or be folded into an existing deep one. It cannot tell you that the code you have asked for violates a domain constraint that has not been written down anywhere. It cannot keep the architectural map of the system in mind across months and years; it has no months and years; it has the current session and a few files on disk.

Everything above the sergeant is the strategic programmer's role, which is your role. Aligning with the stakeholder. Forming the design concept. Choosing the slice. Designing the interface. Reading the diff. Holding the map. Investing in the design of the system every day, as Kent Beck wrote thirty years ago for humans, and which now applies to the mixed workforce of human engineers and Digital FTEs that will build the next decade of software.

The strategic programmer's tools are described in this chapter. The pipeline (§4). The six failures (§3) and their cures. The skills (§5) that encode the cures. The architecture (§7) that makes the agent good. The vocabulary (§8) that lets you reason about all of it. Across Claude Code and OpenCode, the discipline is the same. Across Python and TypeScript, the discipline is the same. Across whatever models and harnesses exist five years from now, the discipline will still be the same.

The narrative at the start of this chapter, that AI replaces software fundamentals, is wrong because it confuses who is writing the code with what good code looks like. The author has changed; the standard has not. Codebases that were good for humans are good for agents. Codebases that were bad for humans are bad for agents, and worse, because agents amplify badness.

Read the old books. The Pragmatic Programmer. A Philosophy of Software Design. Domain-Driven Design. Extreme Programming Explained. The Design of Design. Every page predates this technology, and every page applies more sharply now than when it was written. They are how the strategic programmer learns to think on timescales the sergeant cannot reach.

One line is worth carrying away, from Karpathy: "You can outsource your thinking, but you can't outsource your understanding." The agent will do the typing, the searching, the boilerplate, the API-detail recall, the tedious refactor. It will increasingly do the thinking too: generate options, weigh them, draft solutions, run experiments. What remains uniquely yours is the understanding of why this system is being built, what it is for, who relies on it, and what it must never do. Understanding is what lets you direct the agent at all. Without it, the agent has no destination, and a fast agent without a destination is just an expensive way to get lost.

The corollary, from Boris Cherny: when coding is solved and domain knowledge is the bottleneck, **the best person to write software is the one who understands the domain best, not the one who has historically written software**. The best author of accounting software is a really good accountant. The historical analogy is the printing press: before Gutenberg, reading was a specialist trade practised by a small literate minority; within decades of his press, printed output exploded; over the following centuries literacy became a broad majority skill while ceasing to be a profession. The same arc is now beginning for software. In a generation, building software will be a thing professionals in every domain do as a matter of course (accountants who write their own ledgers, doctors who write their own clinical workflows, lawyers who write their own contract analysers, teachers who write their own curriculum tools), and the role we call "engineer" will mean something narrower and deeper: the person who designs the substrate the rest of the workforce builds on.

This is the workforce shape this book is about. The Digital FTE you will manufacture in the chapters that follow is a domain expert's tool: built by an agentic engineer, but specified, governed, and used by the accountant, underwriter, analyst, or case manager who owns the work. The principles and workflow of this chapter are what make those Digital FTEs trustworthy enough to deserve that ownership. The pipeline, skills, deep modules, persistent loops, sandboxes, smart-zone discipline, jagged-intelligence awareness: all in service of software a domain expert can rely on without reading a line of code. That is agentic engineering's contract with the people it serves.

That is the work. That is the chapter.

Further Reading

Matt Pocock, Software Fundamentals Matter More Than Ever: the keynote that informs this chapter's thesis.
Matt Pocock, Full Walkthrough: Workflow for AI Coding: a two-hour live walkthrough of the pipeline in §4 and §5.
Matt Pocock, 5 Claude Code Skills I Use Every Single Day: the daily-skills reference.
Matt Pocock, Dictionary of AI Coding: the authoritative glossary; the source of §8.
Matt Pocock, Skills for Real Engineers: the installable skill pack used throughout.
Andrej Karpathy, From Vibe Coding to Agentic Engineering: the talk that names the discipline, articulates the Software 1.0/2.0/3.0 framing, and introduces the jagged intelligence and animals vs. ghosts lenses used in §1 and §2.
Boris Cherny (Anthropic), Why Coding Is Solved, and What Comes Next: the creator of Claude Code on his personal workflow, the "on-distribution" argument for stack choice, persistent loops and routines, and the printing-press analogy used in §1.2, §2.3, §6.5.3, and §10.
John Ousterhout, A Philosophy of Software Design: deep modules, shallow modules.
David Thomas & Andrew Hunt, The Pragmatic Programmer: tracer bullets, headlights.
Eric Evans, Domain-Driven Design: ubiquitous language.
Kent Beck, Extreme Programming Explained: invest in design every day.
Frederick P. Brooks, The Design of Design: design tree, design concept.

Companion Skills (This Chapter)

The chapter's pipeline runs through six skills from Matt Pocock's pack, all linked here for direct reading:

grill-me: the Socratic interview that produces the design concept.
grill-with-docs: grilling that also writes CONTEXT.md and ADRs inline (the "ubiquitous language" lineage from §3, Failure 2).
to-prd: synthesise the conversation into a PRD.
to-issues: split the PRD into tracer-bullet tickets.
tdd: red-green-refactor, one slice at a time.
improve-codebase-architecture: find shallow modules, propose deepenings, open an RFC.

A one-time bootstrap, setup-matt-pocock-skills, runs first per repo and scaffolds the issue-tracker config and the docs/agents/ layout the engineering skills depend on.

Matt's pack ships fourteen skills in total (full repo). Beyond the seven-stage pipeline skills and setup-matt-pocock-skills, it also includes diagnose (disciplined bug debugging), triage (state-machine ticket triage), zoom-out (broader-context reframing), prototype (throwaway design prototypes), write-a-skill (the meta-skill for creating new ones), handoff (the session-to-session handoff artifact discipline from §4.1), and caveman (terse-prompt mode). They sit outside the seven-stage pipeline but compose with it, and each runs identically in Claude Code and OpenCode. See Part 5: Building OpenClaw Apps for the Agent Factory Skillpack reference and additional book-specific skills.
