
Problem Solving with General Agents: A 90-Minute Crash Course

7 Principles - 4 Tools - 80% of Real Use


Monday morning, two people open the same agent. The task is the same too: review a folder of vendor contracts, flag non-standard clauses, and produce a comparison memo.

Person A finishes in 22 minutes with clean, verified output. Person B spends 90 minutes in correction loops, the context gets polluted, and in the end they start over.

Same agent. Same capability. What was the difference?

Person A knew seven things that Person B did not. This course teaches those seven things.

Who this is for. Anyone about to use a general agent to solve a real problem: engineers who use Claude Code or OpenCode, and domain experts (lawyers, accountants, marketers, HR leaders, consultants, founders) who use Claude Cowork or OpenWork. The domain changes; the discipline does not.

A general agent is an AI co-worker that takes actions on your behalf: it runs commands, reads files, writes files, and calls services. It is not a chatbot that answers questions; it is a tool with hands.

The four tools:

  • Engineering: Claude Code, Anthropic's terminal-native tool, and OpenCode, an open-source, model-agnostic terminal tool.
  • Knowledge work: Claude Cowork, Anthropic's desktop agent for domain experts, and OpenWork, Different AI's open-source desktop agent.

Where the tools genuinely differ, this crash course uses a four-column table. Everything else works the same way in all four. Each principle's examples come from domains such as legal, accounting, marketing, hiring, engineering, and ops; the example that matches your own work will land hardest, but the principle is the same everywhere.

One thesis runs underneath this whole crash course. This is Mode 1 of general agent use: the problem-solving engagement. You open the tool, solve one thing, the session ends, and the outcome ships. Mode 2, the manufacturing engagement, is when you use Claude Code or OpenCode to build a durable AI Worker for an AI-Native Company; that is a separate course with a separate rule set. In the near future most professionals will stay in Mode 1, and the seven principles below make it reliable.

What "80% of Real Use" means. This is a content-coverage claim, not a metric about users or sessions. Most of the real work you will hand to a general agent in Mode 1 problem solving (coding, contract review, financial modeling, hiring loops, research briefs, marketing operations) uses these same seven principles. Long-tail edge cases, such as deep performance tuning, multi-agent orchestration, and custom evals, are covered in the depth chapters this crash course points to. The 80% is the high-value subset that gives the best return on learning time.

Prerequisites. This crash course assumes you have completed AI Prompting in 2026 and at least one tool-pair crash course: Claude Code & OpenCode or Cowork & OpenWork. The discipline here sits on top of the tool surface; it does not replace it.

Three reading paths: 30-minute taste, 90-minute essential, full read.

Choose your path:

  • 30-minute taste (first-time readers, especially non-engineering): Read only Principles 1, 3, and 5, and do the Hands-on: Hello world under each. That is enough to feel the three big shifts: briefing hands instead of talking to a chatbot, not trusting "looks right", and keeping decisions in a file.
  • 90-minute essential path (standard read): Read the intro, Part 1, Parts 2-4, and, in each principle, the table, the examples, and Hands-on: Hello world.
  • Full read (~2 hours): Everything, including the Now Apply to Your Own Work subsections. Best done after the 90-minute path has had a few days to settle into real work. Do not try to finish it all in one sitting.

Non-engineering readers: skim the code blocks in the examples, and feel free to skip the system-of-record note in Principle 5. The rest of the page is for you.

Safety. Agents act on your behalf: they read files, write files, run commands, and call services. Do not grant broad access until you understand the tool's permission model. That configuration decides which files the agent can touch and which services it can call. Once set, it persists across sessions; set it wrong, and a single bad prompt can go further than you intended. Start in read-only or approve-each-step mode. Principle 6 makes this concrete.

Want the deep version? This crash course is a single read of the seven principles. For the full treatment, see Chapter 18: The Seven Principles of General Agent Problem Solving. For tool-specific depth, there are Claude Code and OpenCode: A 90-Minute Crash Course and Cowork and OpenWork: A 90-Minute Crash Course. This page is the principles; those pages are the surfaces.


Essentials: five bullets

Internalize just these five points and you get most of the value:

  1. Action over talk. A general agent's value comes from doing work: running commands, reading files, calling services. Steer every prompt toward an action or an artifact, not just an explanation.
  2. Code and structured artifacts over prose. When precision matters, ask for a schema, table, code block, or checklist, not a paragraph. Output quality improves quickly once the format is constrained.
  3. Verify, don't trust blindly. Every meaningful output needs a verification step: tests for code, a rubric for a memo, cross-model review for a high-stakes deliverable. "Looks right" is a failure mode.
  4. Small steps, atomic checkpoints. Break the work into reversible units. After each unit, commit, snapshot, or save a version. Do not let the agent run for an hour without a checkpoint.
  5. Files are memory. The conversation is volatile; the filesystem is durable. Anything that must be remembered across sessions (decisions, plans, conventions, glossaries) belongs in a file, not in chat history.

The remaining two principles, constraints and observability, operationalize the first five. They keep the agent in your lane and tell you whether it stayed there.

Figure 1: The five core disciplines (action over talk; code over prose; verify, don't trust; small atomic steps; files are memory), wrapped by the two operational principles. Print it and stick it on your monitor.


Why These Principles Feel Old: The Lindy Effect

Technologies that have already survived for decades generally outlast flash trends. Terminal, files, Git, SQL: all of them come from an earlier era, yet they still do real work today. This pattern is called the Lindy Effect. The technical version: in many categories, long past survival is evidence of future survival; a tool that has been useful for 40 years is probably more durable than one that has been useful for 4. The practical version: in the right category, age is evidence of resilience.

This matters because general agents do not work in a world separate from existing infrastructure. They act through the same surfaces engineers have used for decades: terminal, files, Bash, Git, SQL, logs, schemas, tests, version control. The agent reasons in natural language; it acts through proven interfaces. Those interfaces survived because they work.

Three implications:

  1. Old technologies become more important. Bash lets agents execute. Git lets agents track and reverse. SQL lets them query structured truth. Files become the agent's persistent working memory. The Lindy stack is the agent's stack.
  2. Coding does not disappear; the human's role changes. People used to write most of the code themselves; now two jobs matter more: defining the problem precisely, in the shape of a spec, schema, or typed signature, and reading the output well enough to verify it. The agent writes, modifies, tests, and executes. The defining and reading skills survive every automation shift.
  3. Agents need five properties from their action surfaces: stable, typed, reversible, inspectable, governable.

Five properties agents need: Stable, Typed, Reversible, Inspectable, Governable, with long-lived technologies that already have each property.

The agentic era does not replace the old stack; it activates it. The seven principles below are the operator discipline for using those same foundations through an agent.


Part 1: The Seven Principles

| # | Principle | Failure mode it prevents |
| --- | --- | --- |
| 1 | Bash is the Key | "The agent just talks; it doesn't act" |
| 2 | Code as Universal Interface | "My prose request keeps getting misread" |
| 3 | Verification as Core Step | "The output looks right but breaks in production" |
| 4 | Small, Reversible Decomposition | "One big change ruined my afternoon" |
| 5 | Persisting State in Files | "The agent forgot what we decided yesterday" |
| 6 | Constraints and Safety | "The agent touched files it had no permission to touch" |
| 7 | Observability | "I don't know what the agent actually did" |

These are not in order of importance; they are in order of building dependency. Each principle stands on the ones before it. Read them in sequence at least once.

P1 and P2 look similar but solve different problems. P1 is about whether the agent acts at all; its failure mode is narration instead of action. P2 is about the shape of the output; its failure mode is fluent prose when you needed a structured artifact. You need both: P1 gets you action, and P2 gets a useful artifact out of that action.

Figure: Dependency pyramid of the seven principles. P1 is the widest foundation and P7 the apex; each principle above stands on the ones below it.

The thesis in one line. Principles govern the session; tools are the interfaces to that session. Learn to think with the principles, and the skill transfers to every tool.


Principle 1 - Bash is the Key

What "Bash" means. The terminal is the black-screen text interface that comes with every laptop. Bash is the command language used inside it. When an agent runs Bash, it is typing the same kind of commands you would type in the Terminal app, or in PowerShell on Windows. For Cowork and OpenWork users, the same principle appears on a different surface: step cards instead of typed commands. Either way, the agent acts on your computer, and you watch it act.

Failure mode: "Why is the agent just talking instead of doing the work?"

The defining capability of a general agent is that it can take actions: run a command, read a file, write a file, call a service, and chain those actions to complete a task. It is not a chatbot that knows code; it is a co-worker with hands. The first principle is to treat it that way.

Novice trap. New users ask the agent questions: "How do I summarize last week's customer interviews?" A wandering essay comes back. The agent had hands; you asked for advice. The fix: specify the action. "How should I summarize..." is a chatbot prompt. "Read every transcript in /interviews/week-12; from each transcript extract the customer name, top three pain points, and pricing objections; save them to week-12-themes.md ordered by pain-point frequency" is an agentic prompt.

What "Bash" Means in Each Tool

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Action surface | Terminal: shell commands on your machine | Same | Local Linux VM; reads/writes only granted folders | Same as Cowork |
| Visible as | Commands streaming in the terminal | Same | Step cards in the side panel | Timeline of step chevrons |
| Approval default | Asks before every Bash action; allow-listed commands run silently | Same; configurable | Asks before writing files, sending messages, or scheduling | Same; per-tool granularity |
| Quiet failure | An approval prompt left waiting that you did not notice | Global "permission": "allow" set without thinking | Hidden instructions in an untrusted document | Same; amplified by many connectors |

Mental model: the agent has hands. Brief the hands, not the brain.

Examples

The shape is always the same: an action on specific inputs, with a specific result. The chatbot column is where new users often spend their first month; the agent column is where productive use happens.

Litigation, 47 deposition PDFs:

  • Chatbot: "What does indemnification mean in deposition transcripts?" -> essay, files untouched.
  • Agent:
Search every PDF in /depositions for "indemnification" and close synonyms.
For each hit, return file name, page number, and surrounding paragraph.
Save to indemnification-hits.md.

Cluttered Downloads folder:

  • Chatbot: "How do I organize a messy Downloads folder?" -> generic folder-hygiene blog post.
  • Agent: "My ~/Downloads folder is a mess. What's actually in there?" -> the agent runs ls -la, find, and du -sh itself, classifies file types, and shows the space hogs. The principle is not "use Bash"; the principle is: use the action surface and let the agent pick the command.

Accounting, bank reconciliation:

Open bank-statement-march.csv and gl-export-march.xlsx. Match each bank
transaction to a GL entry (same date +/-2 days, same amount, same vendor).
List unmatched items in march-reconciliation-gaps.md, split into
"in bank not GL" and "in GL not bank".

Marketing, Q3 campaign performance:

Read every campaign-2025-Q3-*.csv in /campaigns/Q3. Produce a table:
campaign name, send date, sends, opens, open rate, clicks, click rate,
conversions. Sort by open rate descending. Save to Q3-campaign-summary.md.

Prompt pattern: Whenever you start typing a question, ask: can this be rephrased as action + artifact? Usually the answer is yes.
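The bank reconciliation prompt in the examples above states its matching rule precisely enough (same amount, same vendor, dates within 2 days) that it can be sketched in code. The following is a minimal illustration of what an agent might write for that step, not what any of the four tools actually generates. The Txn type, the reconcile function, the case-insensitive vendor comparison, and the sample rows are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    # Hypothetical row shape; real exports will have more columns.
    day: date
    amount: Decimal
    vendor: str


def reconcile(bank, gl, window_days=2):
    """Match each bank transaction to at most one GL entry.

    A pair matches when the amounts are equal, the vendor names are equal
    after trimming and lowercasing (an assumption), and the dates are
    within window_days of each other.
    Returns (in_bank_not_gl, in_gl_not_bank).
    """
    window = timedelta(days=window_days)
    unused_gl = list(gl)
    in_bank_not_gl = []
    for b in bank:
        match = next(
            (g for g in unused_gl
             if g.amount == b.amount
             and g.vendor.strip().lower() == b.vendor.strip().lower()
             and abs(g.day - b.day) <= window),
            None,
        )
        if match is None:
            in_bank_not_gl.append(b)
        else:
            unused_gl.remove(match)  # each GL entry can be matched only once
    return in_bank_not_gl, unused_gl


# Made-up sample rows, for illustration only.
bank = [
    Txn(date(2025, 3, 3), Decimal("120.00"), "Globex"),
    Txn(date(2025, 3, 10), Decimal("75.50"), "Initech"),
]
gl = [
    Txn(date(2025, 3, 5), Decimal("120.00"), "globex"),
    Txn(date(2025, 3, 20), Decimal("75.50"), "Initech"),
]
only_bank, only_gl = reconcile(bank, gl)
```

The two returned lists correspond to the "in bank not GL" and "in GL not bank" sections the prompt asks for. Amounts use Decimal rather than float so that equality checks on currency are exact.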

Hands-on: Hello world

Setup (30 seconds):

  1. Download Pack 1 - Cluttered folder and unzip it.
  2. Open the unzipped folder in your tool and grant read access to the downloads/ subfolder.

Paste this prompt verbatim:

What's in ./downloads/?

That is the whole prompt. Three words. No "how to look", no file to write, no structure. Just the question.

What you should see. The agent runs a short cascade of commands on its own. Something like this will stream in the terminal or step cards:

$ ls -lh ./downloads/
total 0
-rw-r--r-- invoice-globex-march.pdf 0B
-rw-r--r-- invoice-globex-march (1).pdf 0B
-rw-r--r-- invoice-globex-march-final.pdf 0B
...
(41 more entries)
-rw-r--r-- SIZES.txt 1.1K

$ find ./downloads -type f | wc -l
53

$ cat ./downloads/SIZES.txt
88K invoice-globex-march.pdf
88K invoice-globex-march (1).pdf
91K invoice-globex-march-final.pdf
...

The agent then gives a grounded summary in chat: 53 files, covering invoices, vendor contracts, roadmap drafts, design zips, screenshots, and installers, plus the duplicate clusters and the largest items.

Principle moment. You did not write ls, find, or cat. The agent picked them itself and decided the order too. That is the action surface. It did not move files, write anything to disk, or invent sizes; on seeing empty stubs, it opened SIZES.txt. A chatbot prompt would have produced generic advice; this prompt produces an answer grounded in your specific 53 files.

If that doesn't happen: if the agent narrates instead of running commands, or asks a clarifying question before touching the folder, reply: "Just look. Run the commands." That is the P1 correction: make the verb clear again.

Now Apply to Your Own Work

The curated Downloads folder was easy. The real test is the folder you have been avoiding: the two-year-old Dropbox, the inbox with nine thousand emails, the shared drive where every client has a different naming convention.

Write the brief, not the method. Keep it short: name the input, name the output, and state the read-only boundary:

The folder at <path> has been collecting <thing> for <how long>.
Inspect it and write me a <named output file> that <decision the
output should support>. Read-only, don't change anything.

In Claude Code / OpenCode, notice that you did not type the commands. In Cowork / OpenWork, the execution view fills with step cards. If you find yourself writing "use find" or "open the spreadsheet", you are dictating the method instead of the outcome. Cut the how verbs; keep only what you want at the end.


Principle 2 - Code as Universal Interface

Failure mode: "Why does a prose request keep getting misread, and why does the agent stall at the edge of what existing apps can do?"

Sarah had 3,000 photos from a Southeast Asia trip, scattered across her phone, camera, and a backup drive, with filenames like IMG_4521.jpg. She wanted them organized by country and city, with dates in the filenames and duplicates removed based on actual image content. She tried three photo apps; each did part of the job, none did the combination.

She gave a general agent one paragraph: "I have 3,000 photos in three folders. I want them organized by country and city based on location data, renamed YYYY-MM-DD-original.jpg, duplicates detected by image content." Fifteen minutes later the job was done. The agent wrote a short program: it read EXIF location data, reverse-geocoded it, renamed files by date, and hashed the image bytes to find duplicates. Sarah wrote no code; code was the agent's interface to her computer.

This principle has two halves: for actions richer than a single command, code is the agent's acting medium; and the shape you ask for is itself the interface. Natural language is ambiguous; a schema, a typed signature, or a structured template is not.
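To make the duplicate-detection step in Sarah's story concrete, here is a minimal sketch of hashing file bytes and grouping files that share a digest. It is an illustration under assumptions, not the program the agent wrote: the function name find_duplicates, the choice of SHA-256, and the tiny stand-in files are all hypothetical. It catches only byte-identical copies; a resized or re-encoded copy of the same photo would need a different technique.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder):
    """Group files under folder by the SHA-256 of their bytes.

    Returns one list of paths per group that has two or more files.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Stand-in files: three tiny "photos" in two folders, two of them identical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "phone").mkdir()
    (root / "camera").mkdir()
    (root / "phone" / "IMG_4521.jpg").write_bytes(b"same pixels")
    (root / "camera" / "IMG_0007.jpg").write_bytes(b"same pixels")
    (root / "camera" / "IMG_0008.jpg").write_bytes(b"other pixels")
    duplicates = find_duplicates(root)
    duplicate_names = [[p.name for p in group] for group in duplicates]
```

Filenames play no part in the comparison, which is why this approach finds copies that a name-based search would miss.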

Isn't Bash Already Code?

| Surface | Role | What it does |
| --- | --- | --- |
| Bash (P1) | hands | navigate, search, move, observe, one command at a time |
| Code (P2) | brain | compute, transform, orchestrate, persist, integrate |

Bash opens the folder; code reads every file, hashes the bytes, compares them, and writes the deduplication report. Once the work moves from "look here, move that" to "compute, decide, build a thing", you are in P2.

Code Unlocks Five Powers

  1. Precise thinking. Code computes; it does not approximate. In an expense analysis, the agent calculates category totals to the cent, flags spikes, and produces quarter-over-quarter percentages.
  2. Workflow orchestration. Real tasks are branching trees: PDF + "Invoice" -> Finances, image -> Images, else -> Other. Without code, the agent asks at every branch; with code, it writes the tree once and runs it end to end.
  3. Organized memory. Big jobs need a place for intermediate state, scratch files, cached lookups, and the final report. The filesystem becomes the task's working memory.
  4. Universal compatibility. The guest list is in a spreadsheet, dietary notes are in emails, RSVPs are in a web form, itineraries are in PDFs. No single app reads them all; code does.
  5. Instant tool creation. When no app does the exact job, the agent builds a small tool: a data model, scripts, a weekly report, a tracker.

This is not a checklist to memorize; it is a vocabulary that helps you notice what was possible where you used to accept the limits of an off-the-shelf tool.
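The branching tree named under workflow orchestration (PDF + "Invoice" -> Finances, image -> Images, else -> Other) is small enough to write out. The sketch below is one possible reading, with assumptions: "Invoice" is matched against the filename rather than the file contents, the set of image extensions is illustrative, and the function names are hypothetical. It only builds a plan and moves nothing.

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def destination(filename):
    """Return the target folder for one file, per the three-branch tree."""
    path = Path(filename)
    suffix = path.suffix.lower()
    # Assumption: "Invoice" is looked for in the filename, case-insensitively.
    if suffix == ".pdf" and "invoice" in path.stem.lower():
        return "Finances"
    if suffix in IMAGE_SUFFIXES:
        return "Images"
    return "Other"


def plan_moves(filenames):
    """Build the whole move plan in one pass, without touching any file."""
    return {name: destination(name) for name in filenames}


plan = plan_moves([
    "invoice-globex-march.pdf",
    "screenshot-01.PNG",
    "roadmap-draft.pdf",
])
```

Returning a plan instead of moving files keeps the step read-only, so the plan can be reviewed before anything changes on disk.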

Two Jobs Are Still Yours

For engineers: define the problem in the shape of a precise spec, interface, schema, typed signature, structured output, or constraint; then read the code well enough to verify it.

For domain experts: define the shape of the deliverable: template, sections, max lengths, column structure, allowed values. Then read the output for factual grounding: does each claim trace back to a source document? Does each number tie to a row?

The spec-writing skill and the reading skill survive every automation shift.

Examples

The pattern is universal: you give the shape, the agent fills it.

  • Lawyer, deposition summary: one row per witness, columns: admissions, denials, follow-ups, with a page:line citation in every cell.
  • Consultant, interview synthesis: fixed sections: stated problems, unstated problems with evidence, quotes, open questions; 1 page max.
  • HR, candidate screening: per-resume template; required quals Y/N with evidence; recommendation ADVANCE / HOLD / DECLINE.
  • Sales, deal review memo: summary, risks, mitigations, decision GO / NO-GO / HOLD, open questions.
  • Database schema: NOT NULL, CHECK, and REFERENCES in CREATE TABLE; the database rejects bad data at write time.
  • Function signature: signature first, tests second, implementation last.
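The database schema example can be tried with Python's built-in sqlite3 module. This is a minimal sketch: the table and column names are hypothetical, borrowed from the HR screening example, and SQLite stands in for whatever database you actually use. One SQLite-specific detail: REFERENCES is enforced only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE candidates (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE screenings (
    id             INTEGER PRIMARY KEY,
    candidate_id   INTEGER NOT NULL REFERENCES candidates(id),
    recommendation TEXT NOT NULL
        CHECK (recommendation IN ('ADVANCE', 'HOLD', 'DECLINE'))
);
""")

conn.execute("INSERT INTO candidates (id, name) VALUES (1, 'A. Example')")
conn.execute("INSERT INTO screenings (candidate_id, recommendation) VALUES (1, 'ADVANCE')")

rejected = []
bad_rows = [
    (1, "MAYBE"),   # value outside the allowed set: CHECK fails
    (1, None),      # missing recommendation: NOT NULL fails
    (99, "HOLD"),   # no such candidate: REFERENCES fails
]
for candidate_id, recommendation in bad_rows:
    try:
        conn.execute(
            "INSERT INTO screenings (candidate_id, recommendation) VALUES (?, ?)",
            (candidate_id, recommendation),
        )
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))  # the database refused the write

stored = conn.execute("SELECT COUNT(*) FROM screenings").fetchone()[0]
```

All three bad rows are refused at write time and only the valid row is stored. The allowed values live in the schema, so the agent cannot drift from them no matter how the prompt was worded.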

Escape hatch. For brainstorms, creative drafts, and explainers, prose is fine. The signal to move toward structure: the output is still wrong after two iterations.

Subtle wrinkle. Code-as-interface applies to inputs too. If you need to compare five vendor proposals, hand them over as a table with consistent columns, not as five prose blocks.

Hands-on: Hello world

Setup (30 seconds):

  1. Download Pack 2 - Receipts and unzip it. receipts/ contains 15 fake receipts: photos, PDFs, and screenshots; two are planted outliers.
  2. Open the folder in your tool and grant read access to receipts/.

Paste this prompt verbatim:

I want to understand why general agents that write code are more powerful
than specialized tools.

Here is my situation: I have a folder ./receipts/ with 15 receipts in mixed
formats - 5 phone photos of paper receipts, 5 PDF email receipts, and 5 app
screenshots. I need to:
1. Extract the date and amount from each receipt
2. Categorize them (groceries, dining, transportation, etc.)
3. Create a monthly summary showing totals by category
4. Flag any unusually large purchases

Walk me through how you would approach this. Don't write actual code; I'm
still learning. Instead, explain:
- What different steps would you take, in order?
- How does this approach give you flexibility a pre-built receipt app
would not have?
- Which of the Five Powers (precise thinking, workflow orchestration,
organized memory, universal compatibility, instant tool creation) is
each step using?

What you should see. The agent first inspects receipts/: three subfolders, 15 mixed-format files. It then gives a plan of five to eight steps: OCR/vision for JPG/PNG, PDF text extraction, a structured row per receipt, category classification, monthly aggregation, and an outlier threshold. Each step should name the Power it uses.

Principle moment. The agent did not say "open Expensify", because no specialized tool handles three formats, custom category rules, a custom outlier rule, and an arbitrary output path all at once. The agent sketched a pipeline: format-crossing, precise computation, orchestration, organized memory, tool creation. That is what "code as universal interface" buys you.

Optional follow-up:

Now execute step 1 only. Read every file in ./receipts/ across all three
subfolders, extract the date and amount from each, and save the results to
extracted.csv with columns: file_path, date, amount, source_format
(photo / pdf / screenshot). Show me the file when you're done.

Now Apply to Your Own Work

Pick a recurring task from your own work that is spread across two or more apps: deal data from the CRM into a scorecard, reconciling expenses across three accounts, turning inbound resumes/contracts/PDFs into a structured report. This is the two-or-more-tools diagnostic.

Describe the situation, give the inputs and the output shape, then say:

Walk me through how you'd approach this. Name which of the Five Powers
each step uses. Then, when I say go, execute step 1 only and produce the
artifact for it.

Watch for this failure: asking "write me a script" instead of "explain your approach". When the design choices surface first, you know at execution time what the agent will do and why.


Principle 3 - Verification as a Core Step

Failure mode: "Why does the output look right but break in production?"

Finished-looking output is not verified output. Models produce plausible output, and plausible is not the same as correct. They can miscount list items, cite a nonexistent paragraph, or produce code that compiles but silently fails on the third edge case. Verification must be a step in the workflow, not an afterthought.

What Verification Means in Each Tool

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Primary mechanism | Unit tests, type-checks, linters | Same | Output rubric: required sections present? claims sourced? | Same |
| Automated gate | A hook in .claude/settings.json can block the commit | Same, via a plugin | A second agent pass scores the output against the rubric | Same; a smaller model can also be used |
| Cross-model review | A tool from a different model family critiques the diff | Same | Second chat: "Find what's wrong with this memo" | Cross-pass with a second provider |
| Quiet skip | Tests pass, but the right things were never tested | Same | "Memo looks good" without a source check | Same |

Key rule: The agent that produced an output is the worst verifier of that output. Verification needs an independent path: your own reading, a different model, a test, a type-checker, a database constraint.

Examples

The shape is always the same: every factual claim on its own row, every row with a source location, every unsourced row flagged.

  • Litigation citations: the brief cited a case for the wrong proposition; the verification prompt makes the agent open the underlying opinion and quote the exact supporting paragraph.
  • Insurance policy limits: the summary says $250K, but the water damage sublimit is $100K; the prompt makes the agent quote every policy figure from the exact policy section.
  • Clinical research: the claim "no Grade 3 events" is contradicted by the CRF rows; verification quotes the rows and catches the discrepancy.

High-stakes deliverable prompt pattern:

Before saving the final version, verification pass:
- List every factual claim in the draft
- For each one, identify the source location and quote the supporting text
- Flag any claim you cannot ground
Refuse to save until every flag is resolved.

Finance number mismatch: the agent's SQL put West revenue at $4.2M; the ledger says $3.8M. Asking the same agent "is the query correct?" is not independent verification. Open the query in a SQL editor, read the WHERE, JOIN, and GROUP BY, predict the rows you expect, and then run it. Rehearse destructive queries inside BEGIN; ... ROLLBACK;.
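The BEGIN; ... ROLLBACK; rehearsal can be tried safely against a throwaway in-memory database. The sketch below uses Python's built-in sqlite3 module; the revenue table and its three rows are made up for the example, and a production database may have its own rules about what a transaction can undo.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN and ROLLBACK below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE revenue (region TEXT NOT NULL, amount INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO revenue (region, amount) VALUES (?, ?)",
    [("West", 100), ("West", 250), ("East", 400)],  # made-up rows
)


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM revenue").fetchone()[0]


before = count_rows()

conn.execute("BEGIN")
cursor = conn.execute("DELETE FROM revenue WHERE region = ?", ("West",))
would_delete = cursor.rowcount   # how many rows the statement touches
during = count_rows()            # state inside the open transaction
conn.execute("ROLLBACK")         # rehearsal over: nothing is kept

after = count_rows()
```

Compare would_delete with the number of rows you predicted before running the statement. If they differ, the WHERE clause does not mean what you thought, and you found out without losing data.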

Hands-on: Hello world

Setup (30 seconds):

  1. Download Pack 5 - Verification. It contains deliverable/Q3-variance-memo-DRAFT.md and the sources/ CSVs.
  2. Open the folder in your tool and grant read access to deliverable/ and sources/.

Paste this prompt verbatim:

Read deliverable/Q3-variance-memo-DRAFT.md. For every factual claim
(numbers, named causes, "largest/biggest" rankings), find the supporting
evidence in sources/ and quote the exact rows or cells. Flag any claim
where the source disagrees or where no row supports it. Save the audit
to VERIFICATION.md with two sections: Confirmed and Flags.

What you should see. The agent reads the memo, opens the three CSVs, and writes VERIFICATION.md. Every claim should carry a state: GROUNDED, DISCREPANT, or UNSUPPORTED. The planted errors include a rent transposition, a salaries sign flip, a totals miscount, a fabricated cause, and a wrong superlative.

Principle moment. Five claims read as fluent and professional; the verification pass stands them up against the source. The verification pass is not necessarily smarter; it is a different step.

If the audit only says "all claims appear consistent", reply: "For each claim, quote the exact CSV row or cell that supports it. If you can't quote a row, the claim is unsupported. Re-run."

Now Apply to Your Own Work

Pick this week's output whose cost of being wrong is highest: a memo with numbers, a brief with citations, an analysis recommending a decision. Name the output and the sources:

Verify every factual claim in <output-file>. For each claim, quote the
exact row, sentence, or section from <sources> that supports it. Flag
any claim you can't ground. Save the audit to <output>-verification.md.

The audit should contain literal quotes, not summary judgments. If there is no source quote, the claim is unsupported.
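The "literal quotes" rule also gives you an independent check on the audit itself: every quoted passage must actually appear in a source. The sketch below is a rough illustration with hypothetical names (audit_quotes, normalize) and made-up sample data. It confirms only that a quote exists verbatim in some source; whether the quote really supports the claim, or contradicts it (the DISCREPANT state), still takes a human read.

```python
import re


def normalize(text):
    """Collapse whitespace so line wrapping does not hide a real match."""
    return re.sub(r"\s+", " ", text).strip()


def audit_quotes(claims, sources):
    """Check that each quoted piece of evidence appears literally in a source.

    claims: list of (claim, quoted_evidence) pairs.
    sources: dict mapping a source name to its full text.
    Returns a list of (claim, state, source_name) tuples.
    """
    flat = {name: normalize(text) for name, text in sources.items()}
    results = []
    for claim, quote in claims:
        needle = normalize(quote)
        # An empty quote never counts as evidence.
        hit = next(
            (name for name, text in flat.items() if needle and needle in text),
            None,
        )
        results.append((claim, "GROUNDED" if hit else "UNSUPPORTED", hit))
    return results


# Made-up data for illustration only.
sources = {
    "rent.csv": "month,account,amount\n2025-07,Rent,12400\n2025-08,Rent,12400\n",
}
claims = [
    ("July rent was 12,400", "2025-07,Rent,12400"),
    ("Rent rose in September", "2025-09,Rent,14200"),
    ("Rent was flat", ""),
]
report = audit_quotes(claims, sources)
```

The second claim is flagged because its quoted row does not exist in any source, which is exactly the kind of fabricated evidence a fluent audit can hide.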


Principle 4 - Small, Reversible Decomposition

Failure mode: "Why did one big change ruin my afternoon?"

Break the work into the smallest reversible units. Land each unit, verify it, checkpoint it, then start the next. Big atomic changes take longer to debug, are harder to review, and turn the failure mode from "throw away five minutes" into "throw away an hour".

Rule of thumb: if reversing a change takes more than two minutes, the change was too big.

Decomposition and Reversibility in Each Tool

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Atomic unit | Git commit after every working step | Same | Numbered versions (memo-v1.md) or drafts/ | Same; /undo via git |
| Undo | git revert or git reset; Esc Esc rewinds the conversation | /undo reverts conversation + file changes | Copy a numbered version back | /undo |
| Course correction | Esc to interrupt | Same | Stop button | Same |
| Breaks when | A 200-line refactor across 15 files in one prompt | Same | The original is overwritten | Same |

Enforcement prompt:

Break this task into the smallest steps you can. After each step:
1. Show me what you did
2. Run the verification check for that step
3. Commit / save a numbered version
4. Wait for my OK before starting the next step
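For the "save a numbered version" half of step 3, the essential property is that an existing version is never overwritten. Here is a minimal sketch of a helper an agent might write for that; the function name and the <stem>-vN naming are assumptions modeled on the memo-v1.md convention, not a feature of any of the four tools.

```python
import re
import tempfile
from pathlib import Path


def save_numbered_version(folder, stem, text, suffix=".md"):
    """Write text to the next free <stem>-vN<suffix> and return its path.

    Existing versions are never overwritten, so any earlier step can be
    restored by copying its file back.
    """
    folder = Path(folder)
    pattern = re.compile(rf"^{re.escape(stem)}-v(\d+){re.escape(suffix)}$")
    taken = [
        int(m.group(1))
        for p in folder.iterdir()
        if (m := pattern.match(p.name))
    ]
    target = folder / f"{stem}-v{max(taken, default=0) + 1}{suffix}"
    target.write_text(text, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = save_numbered_version(tmp, "memo", "Step 1: facts")
    second = save_numbered_version(tmp, "memo", "Step 1: facts\nStep 2: legal theory")
    saved_names = sorted(p.name for p in Path(tmp).iterdir())
    first_text = first.read_text(encoding="utf-8")
```

After the second save, memo-v1.md still holds the step 1 text, so undoing step 2 is a file copy rather than a rewrite.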

Examples

  • Settlement letter: facts -> pause -> legal theory -> pause -> demand -> pause -> deadline. Catching drift at step 2 is cheap; catching it after the whole letter means a rewrite.
  • Q3 board memo: a one-shot six pages in the wrong tone, versus outline -> section 1 -> section 2, with issues caught at every boundary.
  • Excel model: assumptions tab -> revenue build -> operating expenses; each tab validated against the previous one.
  • Brand guide rewrite: voice principles -> tone by audience -> do's/don'ts, so the rules from page 11 do not get lost by page 12.

The system-level lesson comes from the Pixar disaster: reversibility should be a property of the system, not a daily discipline you might forget. A git commit after every meaningful step turns a disaster into a nuisance.

Hands-on: Hello world

Setup (30 seconds):

  1. Download Pack 3 - Decomposition. It contains inputs/case-brief.md and inputs/firm-style-guide.md.
  2. Open the folder in your tool and grant read access to inputs/.

Paste this prompt verbatim:

Draft a demand letter for the dispute in ./inputs/case-brief.md, following
./inputs/firm-style-guide.md. Do it twice: once as a single prompt
(save as letter-A-big-prompt.md), then again in four steps, facts,
legal theory, demand, deadline, pausing after each so I can read.
Save the final decomposed version as letter-B-final.md.

What you should see. Run A produces a fluent draft in one shot. Run B stops after facts, then after legal theory, demand, and deadline. Open the two files side by side. Run A usually leaves in a banned phrase, an ungrounded damages figure, a vague deadline, or a settlement-floor disclosure; Run B comes out cleaner because each section is short enough to check.

Principle moment. The model was not bad; the task was too large. Split into four sections across four prompts, with checkpoints, the same intelligence produces better work.

Now Apply to Your Own Work

Pick a multi-section deliverable that drifted when you one-shotted it. Write four to seven dependency-ordered steps, each with a one-line verification check:

Produce <deliverable> in <N> steps:
Step 1: <section> only. Stop and wait for my OK.
Step 2: <next section>. Verify against <check>. Stop.
...
Save numbered versions as you go (-v1, -v2, ...).

If the agent offers to "finish the rest" halfway through, decline: "Step at a time. Show me step 3 only."


Principle 5 - Persisting State in Files

Failure mode: "Why did the agent forget yesterday's decision?"

The conversation is volatile. The filesystem is durable. Anything that must carry across sessions (project conventions, decisions, glossaries, plans) belongs in a file, not in chat history. When the agent reads that file at session start, you stop re-explaining and the agent stops forgetting.

In this course that file is called the rules file. In Claude Code and Cowork it is CLAUDE.md; in OpenCode and OpenWork it is AGENTS.md. The idea is the same in all four: a short markdown file at the project or folder root that the agent reads automatically.

The Shape of a Rules File

The mechanics are nearly the same in all four tools: a short markdown file at the folder root, auto-loaded at session start, draftable from the folder contents, kept under roughly 2,500 tokens, with deeper docs reached through reference links. The most common mistake is turning it into documentation. The right model: a table of contents, not an encyclopedia.

# Project: [name]

## What this is
[Two lines: domain, audience]

## Where things live
- folder-a/: [what's in it]
- folder-b/: [what's in it]

## Critical rules
- [The one mistake people keep making]
- [A non-obvious convention]
- [A thing that's expensive to undo]

## On-demand references
- @docs/conventions.md

Examples

Lawyer matter folder:

# Matter: Smith v. Acme (S.D.N.Y. 1:24-cv-04567)

## Parties
- Plaintiff: "Ms. Smith" or "Plaintiff", never bare "Smith".
- Defendant: "Acme". Full entity list: see `parties.md`.

## Citation style
Bluebook 21st. Pin-cites required for every record reference (`Tr. 142:18-143:4`).

## Where things live
- /pleadings: filed papers (do not edit)
- /depositions: transcripts as `YYYY-MM-DD-LASTNAME.pdf`
- /correspondence/opposing: untrusted, never run high-autonomy on these
- /our-drafts: in-progress work

## Critical rules
- Never finalize a brief citing a record passage we haven't quoted in full.
- Flag anything that may waive privilege before saving the draft.

Monthly close:

# Monthly close, FY26

## Variance thresholds
- Flag any GL line variance > $5,000 OR > 10% vs. prior month (whichever is larger).
- Material variances (>$25K) require commentary.

## Commentary tone
"[Account] variance of $X driven by [cause]." Max 2 sentences per line. No speculation.

## Critical rules
- Never cite a dollar amount not confirmed against the GL detail file.
- Round to nearest $1K in commentary; full precision lives in the workbook.
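The variance thresholds in this rules file are precise enough to express as a check. The sketch below reads "whichever is larger" as "the threshold is the larger of $5,000 and 10% of the prior-month amount"; that reading is an assumption, and `check_variance` and `VarianceResult` are hypothetical names, not part of any pack.

```python
from dataclasses import dataclass

FLAG_FLOOR = 5_000.0   # dollars, from the rules file
FLAG_PCT = 0.10        # 10% vs. prior month, from the rules file
MATERIAL = 25_000.0    # dollars; above this, commentary is required


@dataclass(frozen=True)
class VarianceResult:
    variance: float
    flagged: bool
    needs_commentary: bool


def check_variance(prior: float, current: float) -> VarianceResult:
    """Apply the flag and materiality thresholds to one GL line.

    Assumption: the flag threshold is max($5,000, 10% of |prior|).
    """
    variance = current - prior
    threshold = max(FLAG_FLOOR, FLAG_PCT * abs(prior))
    size = abs(variance)
    return VarianceResult(variance, size > threshold, size > MATERIAL)


if __name__ == "__main__":
    print(check_variance(prior=40_000, current=46_000))
    print(check_variance(prior=200_000, current=212_000))
```

Under this reading a large account can carry a variance above $25K that is still below its 10% flag threshold, so the two results are reported separately rather than collapsed into one.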

Hiring loop:

# Hiring loop: Senior PM, Growth team

## Job spec
Lives at `job-spec.md`. Required qualifications are the must-haves;
preferred are signals.

## Panel calibration
- Required-qualification gaps: hard fail, no further review.
- Preferred-qualification matches: count and weight per `weighting.md`.
- Credential discrepancies (school, dates, title): flag for human
verification, never auto-accept.

## Where things live
- /inbound: incoming resumes as PDF
- /shortlist: candidates advanced to phone screen
- /scorecards: panel scorecards as `scorecard-CANDIDATE-INTERVIEWER.md`

## Critical rules
- Never include candidate names in scheduled-task outputs (privacy).
- Always flag credential claims for human verification before advancing
a candidate.

Engineering project:

# Project: my-app

## Stack
Next.js 14, TypeScript, Postgres 16 on Neon (free tier), Drizzle ORM.

## Commands
- `npm run dev`: local server (also runs db:migrate)
- `npm test`: vitest
- `npm run db:branch <name>`: spin a Neon branch for risky migrations

## Critical rules
- Never edit files in `src/generated/`. They're rebuilt by codegen.
- All API routes use auth middleware in `src/lib/auth.ts`.
- Destructive migrations rehearse on a Neon branch first, never on `main`.
- Run `npm test` before committing; do not commit a red build.

Second persistence pattern: for multi-session tasks, save the plan to a file such as docs/plans/feature-name.md. To resume: "Read docs/plans/q4-launch.md and continue from step 4."

Hierarchy: Conversation = volatile. Project files = durable. Referenced files = on-demand.

Hands-on: Hello world

Setup (30 seconds):

  1. Download Pack 6 - Hiring loop persistence. The folder contains the job spec, the weighting file, five resumes, and a reference CLAUDE.md.
  2. Open the folder in your tool. Do not look at the reference rules file yet.

Run A:

Read every resume in inbound/. For each candidate produce a short
recommendation: ADVANCE, HOLD, or DECLINE, with a one-sentence
rationale. Save to inbound-screen-runA.md.

Draft the rules file:

Read this folder. Draft a CLAUDE.md (under 250 words) covering what
this folder is, where things live, the hiring conventions, and three
to five critical decision rules, especially around credential
verification and required-vs-preferred gaps.

Edit the draft and save it as CLAUDE.md at the folder root.

Run B, same prompt:

Read every resume in inbound/. For each candidate produce a short
recommendation: ADVANCE, HOLD, or DECLINE, with a one-sentence
rationale. Save to inbound-screen-runB.md.

Principle moment. Open Run A and Run B side by side. Carlos's MBA date mismatch stays buried in Run A; in Run B the credential-verification rule turns it into a HOLD. You did not mention credentials again in the Run B prompt. The rule was in the file, and the agent read it at session start.

Now Apply It to Your Own Work

Pick a folder where you retype the same context every session. Paste:

Read this folder. Draft a CLAUDE.md (or AGENTS.md) under 250 words:
what this is, where things live, three to five conventions I would
normally state manually, and three rules that are expensive to get
wrong. Cite the files you read to justify each line.

Cut the generic lines. If a line could be true of any folder, it is not useful. If the file runs past 500 words, it has become documentation.
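The 500-word ceiling is easy to check mechanically. The following is a rough sketch, assuming whitespace-separated words are a good enough count and that the file uses the "## Critical rules" heading from the template earlier in this section; `lint_rules_file` is a hypothetical helper.

```python
import re


def lint_rules_file(text: str, max_words: int = 500) -> list[str]:
    """Return warnings for a rules file that has grown into documentation.

    Assumptions: words are whitespace-separated tokens, and the file follows
    the template's '## Critical rules' heading.
    """
    warnings = []
    words = len(text.split())
    if words > max_words:
        warnings.append(f"{words} words: over the {max_words}-word limit")
    if not re.search(r"^## Critical rules\s*$", text, flags=re.MULTILINE):
        warnings.append("no '## Critical rules' section")
    return warnings


if __name__ == "__main__":
    sample = "# Project: my-app\n\n## Critical rules\n- Never edit src/generated/.\n"
    print(lint_rules_file(sample) or "rules file looks lean")
```

The check cannot tell whether a line is generic; that judgment still needs a human read.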


Principle 6 - Constraints and Safety

Failure mode: "Why did the agent touch files it had no permission to touch?"

Constraints are not friction; they enable autonomy. An agent that can do anything has to be watched every second. An agent constrained to a specific folder, connector list, and approval mode can be taken up to the walk-away rung.

Three Universal Trust Levers

  1. Scope: which files, folders, and data the agent can see.
  2. Connections: which external services the agent can reach.
  3. Approvals: when the agent pauses for your OK.

| Lever | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Scope | cwd per-directory | Same | "Choose folder" card | workspace folder |
| Connections | MCP servers in .mcp.json or ~/.claude.json | MCP in opencode.json | Customize > Connectors | Extensions tab |
| Approvals | allow/deny lists; Shift+Tab plan mode | permissions; Tab Plan agent | per-action cards | permission stack |

Autonomy Ladder

A five-rung ladder: Watching closely to Scheduled. Figure 2: autonomy ladder. Climb deliberately as the track record builds; step back down when the task type changes.

  1. Watching closely: the default for a novel task. Read the plan, watch each step, approve every action.
  2. Ambient supervision: the task has run three or four times without a surprise; read and approve the plan, then check the execution view after a few minutes.
  3. Walk away: the pattern is trusted; start the task and come back to the finished deliverable.
  4. Act without asking: no approval pauses, but you are still actively watching. Only after 5+ clean runs and with trusted inputs.
  5. Scheduled / automated: recurring and hands-off; only for tasks already trusted at walk-away.

Rule: if you do not trust a task at walk-away, do not schedule it.
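The ladder and its scheduling rule can be written down as a small ordered type. This is an illustrative sketch only: `Rung`, `can_schedule`, and `starting_rung` are hypothetical names, and dropping all the way back to rung 1 when the task type changes is an assumption (the text only says to step back down).

```python
from enum import IntEnum


class Rung(IntEnum):
    WATCHING_CLOSELY = 1
    AMBIENT_SUPERVISION = 2
    WALK_AWAY = 3
    ACT_WITHOUT_ASKING = 4
    SCHEDULED = 5


def can_schedule(trusted_rung: Rung) -> bool:
    """Scheduling is allowed only for tasks already trusted at walk-away."""
    return trusted_rung >= Rung.WALK_AWAY


def starting_rung(trusted_rung: Rung, task_type_changed: bool) -> Rung:
    """Pick the rung for the next run.

    Assumption: a changed task type is treated as novel, so the run
    restarts at the watching-closely default.
    """
    return Rung.WATCHING_CLOSELY if task_type_changed else trusted_rung


if __name__ == "__main__":
    print(can_schedule(Rung.AMBIENT_SUPERVISION))
    print(starting_rung(Rung.WALK_AWAY, task_type_changed=True).name)
```

Using an ordered enum makes "at or above walk-away" a plain comparison rather than a list of special cases.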

Prompt-Injection Trap

If the agent reads outside content (an opposing-counsel email, an inbound resume, a vendor PDF, an unknown webpage), that content can carry hidden instructions. Defense:

  • Do not run high autonomy on untrusted content.
  • If the plan includes files or connectors you never mentioned, do not approve it.
  • Stop the moment the session drifts.

Examples

  • Lawyer: one matter per project; broad access to /matters can cause cross-matter contamination.
  • Dispatcher: a read-only CRM OAuth scope stops route optimization from writing back.
  • Healthcare: keep the restricted PHI folder out of the agent session; grant only the de-identified /operations folder.
  • Procurement: if a hidden instruction in a vendor PDF tries to send email, the connector scope should catch it.

Pattern: install-time constraints are durable; prompt constraints are aspirational.

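A Claude Code hook that blocks rm -rf (the hook command receives the tool call as JSON on stdin; exit code 2 blocks the call):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "if grep -q 'rm -rf'; then echo 'Blocked: rm -rf denied by hook' >&2; exit 2; fi"
          }
        ]
      }
    ]
  }
}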

The constraint lives in config, not in the prompt. The same shape works for git push -f, npm publish *, and DROP TABLE.
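Once several commands need blocking, a one-line shell test gets hard to read, and a small script with a deny list may be easier to maintain. This is a sketch under stated assumptions: the payload shape `{"tool_input": {"command": "..."}}` is assumed here and should be checked against your tool's hook documentation, and `DENY_PATTERNS` and `blocked_reason` are hypothetical names. The patterns are deliberately simple and miss variants such as `rm -r -f`.

```python
import json
import re
from typing import Optional

# Illustrative deny list; extend it from your own incidents.
DENY_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\b.*\s(-f|--force)\b",
    r"\bnpm\s+publish\b",
    r"\bdrop\s+table\b",
]


def blocked_reason(payload_json: str) -> Optional[str]:
    """Return a reason string if the command matches a deny pattern.

    Assumption: the hook payload is JSON with the shell command at
    payload["tool_input"]["command"].
    """
    payload = json.loads(payload_json)
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in DENY_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"Blocked: command matches deny pattern {pattern!r}"
    return None


if __name__ == "__main__":
    import sys

    reason = blocked_reason(sys.stdin.read())
    if reason:
        print(reason, file=sys.stderr)
        sys.exit(2)  # same convention as the shell hook: exit 2 blocks the call
```

Saved as a script, this would be called from the hook's command field in place of the inline shell test.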

Hands-on: Hello world

Setup (90 seconds):

  1. Reuse Pack 1.
  2. Tighten the tool's permission config:
    • Claude Code: in .claude/settings.json, allow reads only in downloads/, deny writes, and deny Bash(rm:*).
    • OpenCode: a similar map in opencode.json.
    • Cowork / OpenWork: in the folder-grants UI, allow access only to downloads/, and set the approval mode to "ask before every action".

Prompt:

Read ./downloads/ and write me an ORGANIZATION-PLAN.md with what's
in there, the duplicates, and a proposed structure. Don't move
anything.

What you should see. Reads are allowed; a write at the root triggers a permission denial or an approval card; any rm, mv, or edit action is denied. End state: the plan file in an allowed location, the original files unchanged, and at least one visible denial or approval moment in the transcript.
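The behavior described above amounts to a small decision function over an action and a path. The sketch below is not how any of the four tools implements permissions; `is_within` and `decide` are hypothetical, and mapping writes to "ask" is an assumption drawn from the "permission denial or approval card" wording.

```python
from pathlib import Path


def is_within(allowed_root: str, target: str) -> bool:
    """True if target resolves to a path inside allowed_root.

    Paths are resolved first, so '..' segments cannot escape the folder, and
    the comparison is by path component, so 'downloads-old' does not pass as
    'downloads'.
    """
    root = Path(allowed_root).resolve()
    candidate = Path(target).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        return False
    return True


def decide(action: str, target: str, allowed_root: str = "downloads") -> str:
    """Return 'allow', 'ask', or 'deny' for a proposed action (illustrative)."""
    if action == "read":
        return "allow" if is_within(allowed_root, target) else "deny"
    if action == "write":
        return "ask"  # assumption: writes surface an approval card
    return "deny"     # rm, mv, edit, and anything unrecognized


if __name__ == "__main__":
    for action, target in [
        ("read", "downloads/invoice.pdf"),
        ("read", "downloads/../notes.txt"),
        ("write", "ORGANIZATION-PLAN.md"),
        ("rm", "downloads/invoice.pdf"),
    ]:
        print(f"{action:5} {target:28} -> {decide(action, target)}")
```

Defaulting unknown actions to "deny" keeps the policy closed: a new action type has to be allowed explicitly.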

Now Apply It to Your Own Work

Audit your regular tool: granted folders, connectors and OAuth scopes, approval modes. Remove or narrow any scope you do not actively need this week. If you keep typing "read-only please" in prompts, move it into config. Keep a monthly 15-minute revoke habit; adding scope is easy, removing it takes discipline.


Principle 7 - Observability

Failure mode: "Why don't I know what the agent did?"

You can only direct what you can see. Every meaningful action the agent takes should be visible in near real time. When something goes wrong, the log should let you reconstruct the exact path. Observability is the basis for debugging a drifted session, climbing the autonomy ladder, and trusting the output.

Where to See the Agent's Work in Each Tool

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Real-time view | terminal streams every action | Same | three-panel UI: conversation, execution, file tracker | vertical timeline |
| Plan stage | plan mode before action; the plan can be written to disk | Plan agent | numbered plan message | Same |
| Per-step trace | command/edit inline with output | Same | each step card | Same |
| Session export | /share transcript | Same | history/export | Same |

Discipline: on every novel task, watch the execution view at least once.

Examples

  • Fleet routing: a route-optimization run suddenly starts notifying drivers; the execution view would have caught the shift at delivery 4 of 5.
  • Discovery responses: a per-step approval card catches a document incorrectly marked non-privileged.
  • Controller: a close-commentary run opens a payroll-confidential file; the trace shows that a stale AGENTS.md reference had widened the scope.

Observability prompt:

"After each step, before moving on, state in one line:
(a) what you just did
(b) what changed (file path, command output, connector call)
(c) what's next
Don't skip this even on small steps."

Silent agent: the service shows "active" but no report has arrived; the logs show "Waiting for database connection..." repeating since Friday. Observability is what exposes the difference between running and doing useful work.
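A repeated log message like that one can be surfaced without reading the whole log. This is a minimal sketch, assuming each log line is a single timestamp token followed by the message; the log format, the sample lines, and the function names are all hypothetical.

```python
from collections import Counter


def strip_timestamp(line: str) -> str:
    """Drop the first whitespace-delimited token (assumed to be a timestamp)."""
    parts = line.strip().split(" ", 1)
    return parts[1] if len(parts) == 2 else parts[0]


def stalled_messages(log_lines: list[str], min_repeats: int = 3) -> dict[str, int]:
    """Return messages that repeat at least min_repeats times."""
    counts = Counter(strip_timestamp(line) for line in log_lines if line.strip())
    return {msg: n for msg, n in counts.items() if n >= min_repeats}


if __name__ == "__main__":
    sample_log = [
        "09:00:00 Waiting for database connection...",
        "09:01:00 Waiting for database connection...",
        "09:02:00 Waiting for database connection...",
        "09:03:00 Scheduler tick",
    ]
    for message, count in stalled_messages(sample_log).items():
        print(f"{count}x {message}")
```

A process manager reporting "active" only says the process exists; a count of identical waiting messages says what it has been doing.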

Five Symptoms That a Session Has Gone Off the Rails

Five warning symptoms: unrelated chat references, vague responses, contradicted constraints, apologies without progress, unauthorized scope.

  1. The agent starts referring to unrelated earlier chat.
  2. Responses get longer and vaguer.
  3. It contradicts earlier constraints.
  4. It apologizes repeatedly without making progress.
  5. It proposes a plan that touches files, folders, or connectors you never mentioned.

When this happens, stop typing. One more prompt will only tangle the context further. Run /clear or open a new session, paste only the necessary facts, and continue from a file.

Hands-on: Hello world

Use Pack 1 a third time. This time the lesson is the trace, not the artifact.

Prompt:

Read ./downloads/ and write me an ORGANIZATION-PLAN.md with what's
in there, duplicates, and a proposed structure. As you go, narrate
each step in one line: what you opened, what you looked at, what
you concluded. Don't skip steps, even small ones.

Watch the execution view live. Note one surprise: an unexpected file, a longer step, a duplicate read, a tool call, an inference. This is the calibration that makes walking away safe.

Now Apply It to Your Own Work

Pick a recurring task you usually walk away from. Today, sit through the entire run. Make three columns: step, expected?, surprise? For each surprise, decide: does the task need to change, do the constraints need tightening, or do your expectations need updating?

The watch-once habit is required before promoting a novel task to walk-away. A new folder, prompt, or connector makes the task novel again.


Part 2: Four-Phase Workflow

In production, the seven principles collapse into a loop of four phases. Once the loop is in your hands, the principles fire automatically inside the phases.

A loop: Explore, Plan, Implement, Commit. Figure 3: Seven principles, four phases, one loop.

  1. Explore (Bash + Observability): read the relevant files and surface the unknowns. Read-only, no writes.
  2. Plan (Code-as-Interface + Persistence): build a structured plan, save it, then review and edit it. Most of the leverage is here.
  3. Implement (Decomposition + Verification): execute the plan in small atomic steps; verify and commit or save after each one.
  4. Commit (Constraints + Observability): run the final verification and persist decisions to the rules file.

Whether the artifact is a merged PR, a redlined MSA, a variance pack, or a hiring debrief, the phases stay the same; only the inputs and outputs change.

Five Failure Patterns

Five failure patterns mapped to principles.

| # | Pattern | Symptom | Which principle prevents it |
| --- | --- | --- | --- |
| 1 | The Drift | the agent gradually wanders from the brief | Persistence (P5) |
| 2 | The Confident Wrong | plausible output that is quietly incorrect | Verification (P3) |
| 3 | The Big Bang | one huge change nukes hours of work | Decomposition (P4) |
| 4 | The Scope Creep | the agent touches unauthorized things | Constraints (P6) |
| 5 | The Black Box | a 20-minute run and no idea what it did | Observability (P7) |

Read the table in both directions: each principle prevents its pattern, and when a pattern shows up, reach for the principle in the right-hand column.
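Reading the table in both directions is a two-way lookup. A small illustrative sketch follows, with the mapping copied from the table; the function names are hypothetical.

```python
# Pattern -> principle number, copied from the failure-pattern table.
PATTERN_TO_PRINCIPLE = {
    "The Drift": 5,
    "The Confident Wrong": 3,
    "The Big Bang": 4,
    "The Scope Creep": 6,
    "The Black Box": 7,
}

PRINCIPLE_NAMES = {
    3: "Verification",
    4: "Decomposition",
    5: "Persistence",
    6: "Constraints",
    7: "Observability",
}


def reach_for(pattern: str) -> str:
    """Pattern observed -> the principle to reach for."""
    number = PATTERN_TO_PRINCIPLE[pattern]
    return f"{PRINCIPLE_NAMES[number]} (P{number})"


def prevented_by(principle_number: int) -> list[str]:
    """Principle applied -> the failure patterns it prevents."""
    return [p for p, n in PATTERN_TO_PRINCIPLE.items() if n == principle_number]


if __name__ == "__main__":
    print(reach_for("The Big Bang"))
    print(prevented_by(7))
```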


Part 3: A Worked Example

Now run it end to end: review a complex incoming artifact, identify what matters, and produce a structured response with verified claims.

  • Engineer track: review a contractor PR, flag the risks, and write a response.
  • Domain-expert track: review a vendor MSA, flag deviations from the redline standard, and produce a comparison memo.

The domains differ; the workflow is identical.

Hands-on: Hello world

Setup (60 seconds):

  1. Download Pack 4 - Worked example. It contains inbound/vendor-msa-v1.md, redline-standard.md, and a folder-level CLAUDE.md.
  2. Open the folder in your tool.

Paste each phase's prompt in order. Wait for the promised artifact before sending the next prompt.

Phase 1, Explore (P1 and P7):

Claude Code / OpenCode:

Don't make any edits yet. Read the PR diff in `git diff main...feature-x`.
Read the related files the diff touches. Summarize:
- What this PR is changing (one paragraph)
- Which files are touched (list)
- Any obvious risks (bullets, max 5)
Save the summary to `reviews/pr-explore.md`. No code edits.

Cowork / OpenWork:

Don't draft anything yet. Read inbound/vendor-msa-v1.md and
redline-standard.md. Summarize:
- What this MSA is for (one paragraph)
- The clause structure (numbered outline by section)
- Any obvious deviations from our standard (bullets, max 7)
Save to vendor-msa-explore.md. No drafting yet.

Phase 2, Plan (P2 and P5):

Engineer:

Read `reviews/pr-explore.md`. Produce a review plan:
## Review plan
- Files to inspect in depth (max 5)
- Tests to run
- Concerns to flag (numbered, severity: HIGH / MED / LOW)
- Questions for the contractor (numbered)
Save to `reviews/pr-plan.md`. Pause for my approval before continuing.

Domain expert:

Read vendor-msa-explore.md. Produce a redline plan:
## Redline plan
- Clauses to review in depth (max 6, by section number)
- Deviations to flag (numbered, severity: HIGH / MED / LOW)
- Counter-proposals (numbered, parallel to deviations)
- Open questions for the vendor (max 3)
Save to msa-plan.md. Pause for my approval before continuing.

Phase 3, Implement (P4 and P3):

Execute the plan one item at a time. After each item:
1. Produce the output
2. Verify it against the source, quote the specific lines
supporting each claim (section cite for the MSA; file:line
for the PR)
3. Save a numbered version (e.g., step3.md)
4. Wait for my OK before the next item.
If you can't ground a claim, flag it instead of fabricating.
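The "quote the specific lines" requirement in the Phase 3 prompt gives you something mechanical to check: every quoted passage should appear in the source. The sketch below is a hedged illustration; `ungrounded_quotes` is a hypothetical helper, the sample clause text is invented for the demo, and collapsing whitespace before matching is an assumption meant to tolerate line wrapping.

```python
def _normalize(text: str) -> str:
    """Collapse runs of whitespace so wrapped lines still match."""
    return " ".join(text.split())


def ungrounded_quotes(quotes: list[str], source_text: str) -> list[str]:
    """Return the quotes that do not appear verbatim in the source.

    Empty quotes count as ungrounded. Matching is exact after whitespace
    normalization, so paraphrases are reported as missing.
    """
    source = _normalize(source_text)
    missing = []
    for quote in quotes:
        needle = _normalize(quote)
        if not needle or needle not in source:
            missing.append(quote)
    return missing


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    source = "4.2 Vendor may terminate on\n30 days notice.\n7.1 Liability is capped at fees paid."
    quotes = ["Vendor may terminate on 30 days notice.", "Liability is uncapped"]
    print(ungrounded_quotes(quotes, source))
```

A passing check shows only that the words exist in the source, not that they support the claim; the human read at each checkpoint still decides that.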

Phase 4, Commit (P6 and P7):

Final verification pass:
- Every cited claim is grounded in a source location
- The structure matches the plan
- The tone matches the project's voice (refer to CLAUDE.md / AGENTS.md)
Then assemble the final deliverable with: executive summary,
the numbered findings, a review checklist, and a "Rules-file
proposals" section listing anything we learned that belongs in
CLAUDE.md / AGENTS.md for next time.

What you should see. Each phase lands its own file: *-explore.md, *-plan.md, numbered step1.md/step2.md, and the final deliverable. The plan is the audit trail, the numbered steps are the work, and the final file is what ships.
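A quick way to confirm that each phase landed its file is to look for the expected name patterns in the working folder. This is a rough sketch: `missing_phase_artifacts` is a hypothetical helper, the glob patterns follow the file names used in the prompts above, and the final deliverable is not checked because its name is not fixed.

```python
from pathlib import Path

# Name patterns taken from the phase prompts; searched recursively because
# the engineer track saves under reviews/.
EXPECTED_GLOBS = {
    "explore": "*-explore.md",
    "plan": "*-plan.md",
    "implement": "step*.md",
}


def missing_phase_artifacts(folder: str) -> list[str]:
    """Return the phases that have no matching file under folder."""
    root = Path(folder)
    return [
        phase
        for phase, pattern in EXPECTED_GLOBS.items()
        if not any(root.rglob(pattern))
    ]


if __name__ == "__main__":
    missing = missing_phase_artifacts(".")
    print("all phase artifacts present" if not missing else f"missing: {missing}")
```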

| | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| Run surface | Terminal | Terminal | Cowork desktop app | OpenWork desktop app |
| File access | cwd; .claude/settings.json | cwd; opencode.json | Choose folder card | workspace folder |
| Plan mode | Shift+Tab | Tab Plan agent | built-in plan stage | Same |
| Verification gate | hook on commit | plugin on commit | second-pass rubric | Same |

Part 4: Capstone - Apply the Full Loop to Your Own Work

Part 3 was a curated example; the capstone is the open-ended version: same loop, your work, your stakes.

Setup:

  1. Pick a recurring task that takes 60+ minutes: a privilege log batch, variance commentary, a campaign report, a candidate brief, a discovery-call synthesis, an investor update, a code review.
  2. Open your tool, set up the folder, and initialize CLAUDE.md or AGENTS.md. Ten lines are enough to start; the rest will be earned from runs.

Run:

| Phase | What you do | Principle invoked |
| --- | --- | --- |
| 1. Explore | read the relevant inputs, write a structured summary file; no other writes yet | 1, 7 |
| 2. Plan | produce a structured plan; save, read, edit, approve | 2, 5 |
| 3. Implement | one step at a time, verification after each | 4, 3 |
| 4. Commit | final verification, summary, rules file update | 6, 7 |

Afterward, journal five questions:

  1. How did total time compare with your manual baseline?
  2. Which principle was hardest?
  3. What did you add to the rules file?
  4. Which constraint did you tighten?
  5. Which failure pattern showed up: Drift, Confident Wrong, Big Bang, Scope Creep, Black Box?

Re-run the same task next week with the rules file in place. The second run is typically 40-60% faster; by the third run the discipline starts to become invisible.


Part 5: How to Actually Get Better at This

Reading this crash course does not make you good; using it on real work does. Friction is the curriculum:

  • "Why is the agent only chatting?" -> P1; turn the prompt into an action plus an artifact.
  • "Why is the output subtly wrong?" -> P2; constrain the format.
  • "Why did the confident answer turn out wrong?" -> P3; add a check step.
  • "Why did one prompt wreck half the work?" -> P4; break it up.
  • "Why does the agent keep asking for the same context?" -> P5; put it in the rules file.
  • "Why did the agent touch a folder I never mentioned?" -> P6; tighten the scope.
  • "Why don't I know what the agent did?" -> P7; read the execution view.

Do not build the response in advance; build it when you hit the friction. The rules file should grow from ten lines to twelve to twenty, each line earned from a real mistake.

Portability dividend. Once you build this awareness in one tool, it transfers to all four. The configs change; the principles do not.

You have completed the course if you can do these five things in real work:

  1. Reframe a chatbot prompt as an agent task with an explicit artifact.
  2. Write the output shape (schema, table, or template) before the content.
  3. Name two independent verification paths for any output, and invoke one.
  4. Break non-trivial work into atomic units, with a checkpoint after each unit.
  5. Maintain an earned rules file and explain session behavior from the execution trace.

Where to Go Next

  • Build engineering depth -> Part 2: Agent Workflow Primitives. Chapters 19-20 deepen P1/P2; 21 and 21B take P5 from rules file to system of record; 21A deepens P3, 22 deepens P1/P6, and 23 deepens P4.
  • Deepen the principles -> Chapter 18: The Seven Principles of General Agent Problem Solving.
  • Stay in Mode 1 and get faster -> run the capstone on three more recurring tasks.
  • Expand your tool surface -> pick the other tool in your family: Claude Code ↔ OpenCode, or Cowork ↔ OpenWork. To cross families, take the other 90-minute tool-pair crash course.
  • Mode 2, manufacturing engagements -> when you need to move beyond one-at-a-time problems and build AI Workers, start with the Agent Factory Thesis plus Spec-Driven Development.
  • Teach your team -> the Part 4 capstone works well as a team exercise, after each person has done a solo run on their own recurring task.

Quick Reference

Seven principles, one line each

Doing-principles:

  1. Bash is the Key. Brief the hands, not the brain.
  2. Code as Universal Interface. Specify the shape; eliminate prose ambiguity.
  3. Verification as Core Step. "Looks right" is the failure mode; force a check.
  4. Small, Reversible Decomposition. Atomic units; verify each, commit each.
  5. Persisting State in Files. The conversation is volatile; files are the memory.

Operating principles:

  6. Constraints and Safety. Constraints enable autonomy.
  7. Observability. You cannot direct what you cannot see.

Four-phase workflow

EXPLORE   -> read & summarize (read-only)
PLAN      -> produce a structured plan, save it, review it
IMPLEMENT -> small steps, verify each, commit each
COMMIT    -> final verification, summary, update the rules file

Five failure patterns

| Pattern | Reach for |
| --- | --- |
| The Drift | Persistence (P5) |
| The Confident Wrong | Verification (P3) |
| The Big Bang | Decomposition (P4) |
| The Scope Creep | Constraints (P6) |
| The Black Box | Observability (P7) |

Autonomy ladder

Watching closely -> Ambient supervision -> Walk away -> Act without asking -> Scheduled

Where the principles live in each tool

| Principle | Claude Code | OpenCode | Cowork | OpenWork |
| --- | --- | --- | --- | --- |
| 1. Bash | Terminal | Terminal | Local Linux VM | Local Linux VM |
| 2. Code-as-Interface | code blocks, schemas | code blocks, schemas | templates, .xlsx schemas | templates, .xlsx schemas |
| 3. Verification | tests, hooks | tests, plugins | rubric pass, cross-model | rubric pass, cross-model |
| 4. Decomposition | Git commits, Esc Esc | Git commits, /undo | numbered versions | numbered versions, /undo |
| 5. Persistence | CLAUDE.md | AGENTS.md (+ CLAUDE.md) | CLAUDE.md | AGENTS.md |
| 6. Constraints | .claude/settings.json | opencode.json | folder/connector/approval | folder/connector/approval |
| 7. Observability | terminal stream | terminal stream | execution view | timeline |

When something feels wrong

Agent apologizing without progress, rewriting the same thing,
contradicting earlier constraints, proposing scope you didn't ask for?
-> Context is poisoned. Stop typing. Reset and continue from a file.
Don't try to fix it with another prompt.

Last substantially revised: May 2026. Tool names, free-tier mechanics, and version-specific details are accurate as of that date.