Updated Mar 07, 2026

Hands-On Exercise — First Extraction and SKILL.md Draft

Lessons 1 through 8 taught the methodology. This exercise is where you apply it. Over the next two and a half hours, you will conduct a knowledge extraction interview on your own professional domain, write a first-draft SKILL.md, test it against scenarios you design, and revise the instructions that fail. The output is a real artifact — a SKILL.md that encodes your professional expertise — not a hypothetical exercise.

The gap between your first draft and your revised draft — how much more specific you had to become — is a direct measure of how much tacit knowledge your first draft was relying on. If the two versions look similar, you have not yet surfaced the knowledge that matters. If the revision is noticeably more specific, the extraction worked.

What You Need

Time: 150 minutes, ideally in a single uninterrupted session. If you must split it, break between Step 1 and Step 2.

Materials: A notebook or document for interview notes, access to your AI assistant for Step 2, and a professional domain in which you have at least two years of working experience.

Optional but recommended: A colleague or partner willing to spend about fifty minutes as your interviewer (a five-minute briefing plus the forty-five-minute interview in Step 1). The partner variant produces richer material because a partner notices when you skip over something that seems obvious to you.

What you will produce: Interview notes, a first-draft SKILL.md, five validation scenarios, a scored assessment, and a targeted rewrite with reflection.

What Success Looks Like

You will know this exercise worked if three things are true at the end. Your revised SKILL.md is noticeably more specific than your first draft. At least one of your validation scenarios revealed a gap you did not anticipate. And you can articulate, in two sentences, what you know that you did not realise you knew.

Step 1: The Expert Interview (60 minutes)

The five questions from Lesson 2 are your extraction tool. Choose the partner interview variant if you have someone available; use the self-interview variant if you do not.

Briefing your interviewer (5 minutes). Give your interviewer this context: "I am going to describe a professional function I perform at work. Your job is to ask me five questions, in order, and to write down my answers as specifically as you can. The most useful thing you can do is notice when I give a vague answer and push for a concrete example. When I say 'it depends', ask me what it depends on. When I say 'you just know', ask me how I know. You are not evaluating my work — you are helping me articulate what I do that I have never written down."

The five questions (45 minutes total, roughly 8-10 minutes each). These are the same five questions from Lesson 2. The interviewer coaching notes below help your partner push past generic answers to the specific material your SKILL.md needs.

Question 1: "Walk me through a recent example of this work going well." The interviewer should listen for the difference between what you describe doing and what a textbook would prescribe. When you say "then I check the numbers", the interviewer should ask: "Which numbers? In what order? What are you looking for?" Follow up with: "What did you look for first?" "What told you this was going the right way?"

Question 2: "Tell me about a time this work went wrong — not because of bad luck, but because of a judgement call that turned out to be mistaken." The interviewer should push for specific signals, not categories. "The numbers didn't look right" is a category. "The gross margin was above 40% for a logistics company, which usually means they've capitalised something they shouldn't have" is a signal. Follow up with: "At what point could the mistake have been caught?" "Is there a signal you now look for that you weren't looking for then?"

Question 3: "What is the thing that a junior professional in this role consistently gets wrong that a senior one never does?" The interviewer should probe for the reasoning. What does the senior see that the junior does not? Follow up with: "Can you give me a specific example?" "How long does it typically take someone to learn this, and why does it take that long?"

Question 4: "If you had to write a one-page decision guide for this work — something that would help someone make the right call in ninety percent of situations — what would be on it?" The interviewer should distinguish between generic principles and load-bearing heuristics. Load-bearing heuristics, the rules you actually rely on when making the call, are far more instructive for a SKILL.md than principles that would apply to any role. Follow up with: "What's the first thing on the page?" "Is there a heuristic you use that isn't in this guide because it's too hard to explain?"

Question 5: "What are the situations where you would not trust an automated system to handle this — and why?" The interviewer should push for the threshold. "It depends on the size" is not enough. "Above £5m I always escalate; between £2m and £5m it depends on the sector and the borrower's track record" is a threshold. Follow up with: "Can you describe a situation where the context was so unusual that no standard procedure applied?"

Self-Interview Variant

If you do not have a partner, open a blank document and set a timer for ten minutes per question. Write your answers as though explaining to a capable colleague who will take over this function next week. Write continuously. Do not edit. When you catch yourself writing "then I review the documents", stop and specify which documents, in what order, and what you are looking for in each one.

The self-interview prompts below are adapted from the five questions in Lesson 2. They are reworded for self-reflection rather than conversation, but each one targets the same extraction purpose as its corresponding question:

  1. "Describe the last time you performed this task from start to finish. Not the idealised version — the actual version." (Targets Q1: recent success — activates episodic memory about decision-making logic)
  2. "Think of a time your judgement turned out to be wrong — not bad luck, but a call you would make differently now. What did you miss?" (Targets Q2: instructive failure — surfaces defensive knowledge)
  3. "What does a competent new hire in your role consistently get wrong that you never do? What do you see that they do not?" (Targets Q3: junior vs senior gap — identifies the expertise differential)
  4. "If you had to write a one-page cheat sheet for someone covering your role next week, what would be on it? What would you most want to prevent them from doing?" (Targets Q4: one-page decision guide — compresses operational heuristics)
  5. "Describe a situation where you could have escalated but chose not to, and one where you chose to escalate when you could have handled it yourself. What determined the difference?" (Targets Q5: automation boundaries — defines human-in-the-loop requirements)

The North Star Summary (10 minutes)

Immediately after the interview, write a two-paragraph summary — the same format taught in Lesson 3. The first paragraph captures the most important decision-making logic the interview surfaced: the core analytical process, the key signals, the sequence in which the expert evaluates information. The second paragraph captures the most important escalation condition: the situations where human judgement is genuinely irreplaceable and the boundaries of what the agent should handle autonomously.

This summary is your anchor for everything that follows. Write it while the interview is fresh. If the SKILL.md you produce in Step 2 does not clearly encode the substance of both paragraphs, something has been lost in the translation.

Step 2: Write the First-Draft SKILL.md (30 minutes)

Using your interview notes and the template below, write a complete first-draft SKILL.md. Do not leave any section blank. If you are uncertain about a section, write your best attempt and mark it with "[UNCERTAIN]" — you will revisit these marks during scoring.

You may use your AI assistant to help structure your notes into prose. But the content — the specific knowledge, the judgement calls, the boundary conditions — must come from your interview notes, not from the model's general knowledge. If the assistant adds a principle that did not come from your interview, delete it.

SKILL.md Template

Persona (2-3 paragraphs). Paragraph 1 establishes the professional standing and communication register — the agent's level of expertise, how it communicates, and its relationship to the user. Paragraph 2 establishes the epistemic standard — how the agent handles uncertainty, distinguishes conclusions from inferences, and manages incomplete information. Paragraph 3 establishes the identity constraint — the single most important thing this agent is not. State it as professional identity, not as a rule.

Questions (in-scope and out-of-scope). List three to five in-scope categories, each with a specific description of what falls within the category, what data sources it works with, and what outputs it produces. Then list at least four out-of-scope types, each with an explanation of why it is out of scope and a positive redirection telling the user what to do instead. The out-of-scope section should be at least as long as any single in-scope category.

Principles (five to seven). Each Principle should be specific enough that you can run a scenario against it and confirm the agent followed it. Name each Principle and state the failure mode it prevents. Include at least one escalation Principle with specific conditions under which the agent routes to a human.
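Before moving on to Step 3, a quick mechanical check can confirm that no template section was left blank and show how many "[UNCERTAIN]" marks remain to revisit during scoring. The sketch below is illustrative only: it assumes the draft uses Markdown-style "#" headings named after the three template sections, which is a hypothetical convention rather than something this lesson prescribes, and the sample draft text is a stand-in for your own.

```python
import re

# Section names come from the template above. The "#" heading style is an
# assumed convention for this sketch, not a requirement of the lesson.
REQUIRED_SECTIONS = ("Persona", "Questions", "Principles")


def check_draft(text: str) -> dict:
    """Report missing template sections and count [UNCERTAIN] marks."""
    headings = {
        match.group(1).lower()
        for match in re.finditer(r"^#+[ \t]*(.+?)[ \t]*$", text, flags=re.MULTILINE)
    }
    missing = [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
    return {
        "missing_sections": missing,
        "uncertain_marks": text.count("[UNCERTAIN]"),
    }


# Hypothetical, deliberately incomplete draft used only to exercise the check.
draft = """# Credit analysis skill

## Persona
You are a senior credit analyst, not a credit approver.

## Principles
1. Cashflow first. [UNCERTAIN] Read cashflow statements before balance sheets.
"""

report = check_draft(draft)
print(report)
```

A check like this only confirms that the sections exist. Whether each section encodes what the interview surfaced is still a judgement you make by reading the draft against your north star summary.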

Step 3: Design Five Test Scenarios (30 minutes)

Design five scenarios using this template for each:

Scenario [number]: [short title]
Category: Standard / Edge / Adversarial / High-stakes / Uncertain
Test input: the exact query a user would submit
What a correct response looks like: 2-3 sentences
What a common failure looks like: 2-3 sentences
Primary scoring component: Accuracy / Calibration / Boundary compliance

Design one scenario from each of the first four categories (Standard, Edge, Adversarial, and High-stakes), plus one "uncertain" scenario: a query where you genuinely do not know how your SKILL.md would handle it. The uncertain scenario is the most diagnostic. If you cannot think of one, you have not yet thought carefully enough about the boundaries of your instructions.
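If you keep your scenarios in a structured form, the template fields and the one-per-category requirement can be checked automatically. The following is a minimal sketch, not a prescribed format: the class name, field names, and the `missing_categories` helper are hypothetical, and the sample scenario borrows the uncertain case from the credit analyst example later in this lesson, with "Calibration" chosen as its primary component purely for illustration.

```python
from dataclasses import dataclass

CATEGORIES = ("Standard", "Edge", "Adversarial", "High-stakes", "Uncertain")
COMPONENTS = ("Accuracy", "Calibration", "Boundary compliance")


@dataclass
class Scenario:
    """One test scenario, mirroring the fields of the template above."""
    number: int
    title: str
    category: str
    test_input: str
    correct_response: str
    common_failure: str
    primary_component: str

    def __post_init__(self) -> None:
        # Reject values that are not in the template's fixed lists.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.primary_component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.primary_component}")


def missing_categories(scenarios: list[Scenario]) -> list[str]:
    """Return the categories that no scenario covers yet."""
    covered = {s.category for s in scenarios}
    return [c for c in CATEGORIES if c not in covered]


uncertain = Scenario(
    number=5,
    title="Audited versus management accounts",
    category="Uncertain",
    test_input="Management accounts show improving margins but the audited "
               "accounts from six months earlier differ. Which should I rely on?",
    correct_response="Presents both sources with dates and verification status "
                     "and flags the discrepancy.",
    common_failure="Defaults to the more recent source without comment.",
    primary_component="Calibration",  # illustrative choice, not from the lesson
)

print(missing_categories([uncertain]))
```

With only the uncertain scenario written, the helper lists the four categories that still need a scenario.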

Step 4: Score and Identify Weaknesses (20 minutes)

For each scenario, read your SKILL.md as though you were the agent. Score three components — accuracy, calibration, and boundary compliance — as pass or fail. A scenario passes overall only when all three components pass.
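The pass rule above is a strict conjunction: one failed component fails the whole scenario. A minimal sketch of that rule follows, assuming you record each component as a boolean. The dictionary layout, the key names, and the sample scores are hypothetical.

```python
COMPONENTS = ("accuracy", "calibration", "boundary_compliance")


def scenario_passes(scores: dict[str, bool]) -> bool:
    """A scenario passes overall only when all three components pass."""
    return all(scores[component] for component in COMPONENTS)


# Hypothetical scores for two scenarios, for illustration only.
results = {
    "Scenario 1": {"accuracy": True, "calibration": True, "boundary_compliance": True},
    "Scenario 5": {"accuracy": True, "calibration": False, "boundary_compliance": True},
}

failed = [name for name, scores in results.items() if not scenario_passes(scores)]
print(failed)  # ['Scenario 5']
```

Recording the components separately, rather than a single pass or fail, keeps the information you need next: which component failed points towards which instruction is weak.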

After scoring, identify the two specific instructions in your SKILL.md most responsible for the failures. Quote each instruction, explain why it is weak (too vague, missing, contradictory, or incomplete), and write your first thought on what it needs to say instead.

Step 5: Targeted Rewrite and Reflection (10 minutes)

Rewrite the two weakest instructions. The rewrite should be specific enough that the scoring outcome changes from Fail to Pass for the scenario that exposed the weakness. Do not rewrite the entire SKILL.md — the discipline is targeted revision. Check that your rewrite does not conflict with existing instructions elsewhere in the document.

Then write a two-sentence reflection on what this exercise taught you about the difference between knowing something and instructing an agent to do it. If your reflection is generic — "I learned that writing instructions is hard" — push deeper. What specific piece of knowledge did you discover you had not articulated?

Credit Analyst Worked Example (Abbreviated)

To illustrate what the exercise produces, here is an abbreviated version of the credit analyst output.

North star summary:

The most important decision-making logic this interview surfaced is the analyst's practice of reading cashflow statements before balance sheets, checking revenue quality through working capital cycle analysis rather than trusting headline figures, and recognising when the financials tell a different story from the one the borrower is presenting. When receivables days increase while revenue is flat, the analyst treats revenue as weakening regardless of what the income statement shows.

The critical escalation condition is a three-part boundary: credit decisions above £25 million go to the senior committee regardless of analysis quality; assessments involving borrowers connected to board members or senior executives are routed to an independent reviewer; and any fact pattern the analyst has not previously encountered is flagged and referred to a specialist rather than assessed using a framework that may not apply.
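An escalation condition stated this precisely can be written down as a rule and tested. The sketch below encodes the three-part boundary from the summary. It rests on two assumptions the summary does not spell out: that the three conditions are independent and can apply at the same time, and that "above £25 million" excludes exactly £25 million. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

COMMITTEE_THRESHOLD_GBP = 25_000_000  # threshold taken from the summary above


@dataclass
class CreditCase:
    amount_gbp: float
    borrower_connected_to_insider: bool  # board member or senior executive
    novel_fact_pattern: bool             # not previously encountered


def escalation_routes(case: CreditCase) -> list[str]:
    """Return every human route the case triggers (assumed to be cumulative)."""
    routes = []
    if case.amount_gbp > COMMITTEE_THRESHOLD_GBP:  # strictly above, by assumption
        routes.append("senior committee")
    if case.borrower_connected_to_insider:
        routes.append("independent reviewer")
    if case.novel_fact_pattern:
        routes.append("specialist")
    return routes


print(escalation_routes(CreditCase(30_000_000, False, True)))
# ['senior committee', 'specialist']
print(escalation_routes(CreditCase(25_000_000, False, False)))
# []
```

Writing the rule out this way exposes the questions a vague instruction hides, such as whether the threshold is inclusive and what happens when two conditions apply to the same case.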

Identity constraint: "You are a senior credit analyst, not a credit approver. You produce analysis that supports human decision-making; you do not substitute for it."

Uncertain scenario: "The borrower's management accounts show improving margins but the audited accounts from six months earlier show a different picture. Which should I rely on?" The analyst was uncertain whether the SKILL.md gave clear guidance on the hierarchy of data sources when auditability and recency conflict.

Weakest instruction found: "Use the most recent data available" — too vague. It did not specify how to handle the tension between recency and auditability. Rewrite: "When audited and unaudited sources cover overlapping periods, present both with their respective dates and verification status. Flag the discrepancy and state which conclusions change depending on which source is used. Do not default to the more recent source when the older source carries a higher standard of verification."

Reflection: "I discovered that my instinct for which data source to trust is not a single rule — it is a conditional hierarchy that depends on the stakes of the conclusion and the verification standard of the source. My first draft assumed the agent would resolve the tension the same way I do, without being told how."

Timing Summary

Step | Activity | Duration
1 | Expert interview (partner or self) + north star summary | 60 min
2 | Write first-draft SKILL.md | 30 min
3 | Design five test scenarios | 30 min
4 | Score and identify weaknesses | 20 min
5 | Targeted rewrite and reflection | 10 min
Total | | 150 min

Try With AI

Use these prompts after completing the exercise to deepen the extraction.

Prompt 1: SKILL.md Stress Test

I have just written a first-draft SKILL.md for my professional
domain. I want you to stress-test it by generating three adversarial
scenarios I did not think of.

Here is my SKILL.md:
[PASTE YOUR COMPLETE SKILL.md]

For each adversarial scenario:
1. Write the test input
2. Identify which Principle it is designed to probe
3. Predict how my current SKILL.md would handle it
4. Describe the correct response
5. If there is a gap, suggest a specific instruction that would
close it

What you're learning: Your own adversarial scenarios are limited by what you thought to test. An AI assistant can generate scenarios you did not anticipate — which is precisely the kind of gap that shadow mode would surface in production. Using AI to stress-test before deployment is a practical application of the methodology that extends your validation coverage beyond your own imagination.

Prompt 2: Specificity Audit

Review my SKILL.md and identify every instruction that fails the
testability criterion — instructions that are too vague to be
confirmed through a scenario test.

Here is my SKILL.md:
[PASTE YOUR COMPLETE SKILL.md]

For each vague instruction:
1. Quote the instruction
2. Explain why it is not testable
3. Propose a specific, testable rewrite
4. Write one scenario that would confirm the agent follows the
rewritten instruction

Then calculate a "specificity score" — the percentage of instructions
that passed the testability criterion on first draft.

What you're learning: The testability criterion from Lesson 6 is the single most important quality check for SKILL.md Principles. Running a systematic audit after completing the exercise reveals how much specificity you still need to add — and the specificity score gives you a concrete measure of progress when you compare your first draft to your revision.
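The specificity score is a simple percentage, so you can also compute it yourself from the audit results. A minimal sketch follows, assuming each instruction has been marked as testable or not. The function name and the sample audit values are hypothetical.

```python
def specificity_score(testable_flags: list[bool]) -> float:
    """Percentage of instructions that pass the testability criterion."""
    if not testable_flags:
        raise ValueError("no instructions to score")
    return 100 * sum(testable_flags) / len(testable_flags)


# Hypothetical audit results: one flag per instruction, True means testable.
first_draft = [True, False, False, True, False, True, False]
revision = [True, True, False, True, True, True, True]

print(round(specificity_score(first_draft), 1))  # 42.9
print(round(specificity_score(revision), 1))     # 85.7
```

Comparing the two numbers gives the first-draft-to-revision measure of progress described above.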

Prompt 3: Extraction Gap Analysis

Compare my interview notes with my SKILL.md and identify knowledge
that appeared in the interview but did not make it into the SKILL.md.

Interview notes:
[PASTE YOUR INTERVIEW NOTES OR NORTH STAR SUMMARY]

SKILL.md:
[PASTE YOUR SKILL.md]

For each piece of knowledge that was lost in translation:
1. Quote the relevant interview material
2. Identify which SKILL.md section it belongs in
3. Draft the instruction that would encode it
4. Explain why it might have been dropped (too vague in the notes?
too hard to formulate? seemed obvious?)

What you're learning: The gap between interview notes and SKILL.md is where the articulation problem from Lesson 1 manifests in practice. Some knowledge is lost because the notes were too vague to act on (extraction quality issue). Some is lost because the author assumed it was obvious (tacit knowledge issue). Identifying which loss mechanism applies to each gap teaches you where to focus in future extraction interviews.
