From Extraction to SKILL.md
The output of the Knowledge Extraction Method — whether from Method A, Method B, or their combination — is raw material. It is a set of notes, extracted instructions, a contradiction map, a gap list, and a north star summary. The work of producing a SKILL.md from this material is a writing task, not a knowledge task. It requires organising what you have learned into the three sections of the Agent Skills Pattern — Persona, Questions, and Principles — at the level of specificity that produces consistent agent behaviour.
Chapter 15 taught the architecture of these three sections: what each one is and why it matters. This lesson teaches how to write them from extraction outputs. The distinction is between understanding the structure and being able to produce the content. The structure is stable. The content is what the extraction methodology provides.
By the end of this lesson, you will be able to take the raw material from a Method A interview, a Method B document extraction, or both, and translate it into a SKILL.md draft that encodes something worth encoding. The draft will not be production-ready — that is what the Validation Loop in Lessons 7 and 8 achieves — but it will be substantive enough to test.
Writing the Persona
The Persona section defines the agent not by its capabilities but by its professional character — the identity that governs its behaviour in situations the SKILL.md does not explicitly cover. This is where you synthesise the professional identity that emerged from the Method A interview and the authoritative context that Method B provided.
The most common error in first-draft Persona sections is vagueness. "You are an experienced financial professional who provides helpful and accurate analysis" is not a Persona. It is a marketing blurb. It does not tell the agent how to behave when the data is incomplete, when two analyses give conflicting results, when a user asks for a level of confidence the data does not support, or when the query is within the agent's scope but at the edge of the available information.
Chapter 15 defined the Persona section through four structural questions — the agent's professional standing, its relationship to the user, its characteristic tone, and what it will never claim to be. Those four questions describe what a Persona contains. The extraction process surfaces the answers through a different lens — three questions that organise the interview output into functional Persona language:
What is this agent's professional level and authority? Not "experienced" — but the specific level of experience and the specific scope of authority. A senior analyst with credit committee authority operates differently from a junior analyst who produces recommendations for committee review. The Persona must specify which one the agent emulates, because the distinction determines how the agent frames its outputs (authoritative recommendation vs analysis for review) and what actions it considers within its scope.
What does this agent value in its own outputs? Not "accuracy" — but the specific quality standards the agent holds itself to. Does it prioritise completeness over speed, or speed over completeness? Does it err on the side of caution or directness? Does it present a single recommendation or multiple options with tradeoffs? The answer to these questions shapes the character of every output the agent produces.
What does this agent do when it does not know? This is the most important question. An agent that has been told to be helpful without being told how to handle uncertainty will fill the gap with confident-sounding approximations — which is the most dangerous failure mode in professional deployment. The Persona must specify the agent's uncertainty behaviour: what language it uses, what it discloses, and at what point it stops attempting to help and routes to a human.
The north star summary from the Method A interview is the primary source for the Persona. The specific quality standards the expert described — the epistemic rigour, the communication style, the non-negotiable constraints — translate directly into Persona language.
Credit analyst Persona — vague vs functional:
| Vague Persona (insufficient) | Functional Persona (production-quality) |
|---|---|
| "You are an experienced credit analyst who provides helpful and accurate financial analysis" | "You are a senior credit analyst with credit committee authority, specialising in mid-market corporate lending. You prioritise analytical rigour over speed — an incomplete analysis delivered on time is less valuable than a thorough analysis delivered with a stated delay. When data is insufficient to support a conclusion, you state what you can confirm, what you cannot, and what additional information would resolve the uncertainty. You never present an inference as a confirmed finding." |
The functional version governs behaviour. The vague version does not.
Writing the Questions Section
The Questions section defines what the agent is for — and, equally importantly, what it is not for. The structure is a list of capability categories followed by an explicit Out of Scope section.
Both halves of this section are important. An agent without a clear capability definition is unfocused — it attempts everything without prioritising. An agent without a clear Out of Scope definition is dangerous — it produces confident-sounding outputs for queries it is not equipped to handle reliably.
The capability categories should come from the specific examples that appeared in the Method A interview and the explicit rules that appeared in Pass One of the Method B extraction. If the credit analyst described four specific types of analysis in the interview — initial credit assessment, annual review, covenant monitoring, and sector risk evaluation — those four types are the foundation of the Questions section.
The Out of Scope section should address the specific boundary conditions that appeared in the answer to Interview Question 5 — the situations the expert identified where they would not trust an automated system — and any gaps from Pass Three of the document extraction that have been resolved as escalation conditions rather than policy statements.
Credit analyst Questions section (abbreviated):
In scope:
- Initial credit assessment for mid-market corporate lending (£2m–£50m facilities)
- Annual credit reviews against existing covenant packages
- Covenant compliance monitoring and breach classification (technical vs substantive)
- Sector risk assessment using current concentration data
Out of scope:
- Credit decisions above £50 million (route to senior credit committee)
- Assessment of borrowers with board or senior executive relationships (route to independent reviewer)
- Fact patterns not previously encountered in the training corpus (flag explicitly and route to specialist)
- Client-facing communications of any kind (all external correspondence requires human review)
The Out of Scope section is not a list of things the agent "probably shouldn't do." It is a precise definition of the boundary between autonomous operation and human involvement. Every item in it should be traceable to a specific extraction output — an interview answer, a documented policy, or a gap resolution.
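Because every Out of Scope item is a precise boundary with a defined route, the list can be sanity-checked as a set of routing rules. The sketch below (Python, with illustrative parameter names and thresholds drawn from the credit analyst example above) encodes the four out-of-scope conditions; a real SKILL.md expresses these as prose instructions, not code:

```python
# Illustrative sketch: the credit analyst's Out of Scope list as routing rules.
# Parameter names and the order of checks are assumptions for demonstration;
# the thresholds come from the example Out of Scope list in the text.

def route(amount_gbp: float, related_party: bool,
          novel_pattern: bool, client_facing: bool) -> str:
    """Return who handles the query: a human route, or autonomous operation."""
    if amount_gbp > 50_000_000:
        return "senior credit committee"     # decisions above £50m
    if related_party:
        return "independent reviewer"        # board/executive relationships
    if novel_pattern:
        return "specialist (flagged as novel)"  # unfamiliar fact pattern
    if client_facing:
        return "human review"                # all external correspondence
    return "autonomous"

# A £10m assessment with no boundary conditions stays autonomous:
print(route(10_000_000, False, False, False))  # autonomous
```

Writing the list this way exposes ambiguities early: if two boundary conditions apply at once, the SKILL.md should say which route wins, just as the ordering of checks does here.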
Writing the Principles Section
The Principles section is where the tacit knowledge from the Method A interview becomes explicit agent instructions. It is the longest section of a mature SKILL.md and the one that requires the most revision as the agent's performance is validated.
Each Principle should be a specific, testable instruction. The test for specificity is simple: can you run a scenario against this instruction and confirm that the agent followed it? If the answer is no, the instruction is too vague.
| Vague Instruction (fails testability) | Testable Principle (passes testability) |
|---|---|
| "Be accurate" | "When a specific figure cannot be confirmed against an approved data source, use the phrase 'my records show' rather than a declarative statement, and flag the figure for human verification" |
| "Consider the context" | "When a net debt increase occurs in the context of a capital investment programme with contracted revenue, assess it as a strategic investment rather than a deterioration signal" |
| "Prioritise risk management" | "When the earnings release date is within ninety days of the credit decision, run a sensitivity against the worst-case earnings scenario rather than the base case" |
| "Be careful with unusual situations" | "When the query involves a fact pattern not previously encountered in the training corpus, flag it explicitly as novel rather than applying an existing framework that may not fit" |
The failure modes surfaced by Interview Question 2 — the cases where expert judgement went wrong — typically translate into the most important Principles. They are the instructions that prevent the specific errors the expert has personally encountered. They are also, characteristically, the instructions that are hardest to write cleanly without the interview context, because the instruction's importance only becomes clear when you understand the mistake that led to it.
Uncertainty Calibration Vocabulary
One of the most consistently underspecified elements of first-draft SKILL.md files — and one of the most valuable to invest effort in getting right — is the uncertainty calibration vocabulary: the precise language the agent should use to distinguish data-supported conclusions, reasonable inferences, and open questions from one another.
Without explicit calibration vocabulary, agents default to one of two problematic patterns. They either present everything with equal confidence — which is misleading when the data quality varies — or they hedge everything with generic disclaimers — which is unhelpful because it provides no signal about what the agent is actually confident about.
A well-specified calibration vocabulary gives the agent graduated language that communicates confidence levels precisely.
| Confidence Level | Calibration Language | When to Use |
|---|---|---|
| Data-confirmed | "The financial statements show..." / "The documented policy states..." | Conclusion drawn directly from a verified source |
| Strongly supported | "Based on the available data, the indication is..." | Conclusion drawn from multiple consistent data points |
| Reasonable inference | "My assessment suggests..." / "The pattern is consistent with..." | Conclusion drawn from partial data or analogous situations |
| Uncertain | "My records show X, but I have not been able to confirm Y" | Conclusion based on incomplete information with identified gaps |
| Outside scope | "This falls outside my current analysis. I recommend consulting [specific resource]" | Query that requires expertise or data the agent does not have |
Chapter 15's annotated example in Lesson 5 showed one version of this vocabulary for a financial research agent. The vocabulary is domain-specific — a legal agent's calibration language differs from a clinical agent's — but the principle is universal: the agent must have explicit language for each level of confidence, and the choice of language must be governed by the quality of the evidence, not by the agent's desire to appear helpful.
The Translation Process
The translation from extraction outputs to SKILL.md follows a consistent sequence.
Write the Persona first. Use the north star summary as the primary source. The Persona should capture the professional identity, the quality standards, and the uncertainty behaviour in a single paragraph or short set of paragraphs.
Write the Questions section second. Use the specific examples from the Method A interview for in-scope categories and the Question 5 answers and Pass Three gap resolutions for the Out of Scope section.
Write the Principles section third. Start with the load-bearing heuristics from Interview Question 4. Add the defensive Principles from Question 2 (failure mode prevention). Add the contextual distinctions from Question 3 (junior vs senior expertise gap). Add the explicit rules from Method B's Pass One extraction that survived the contradiction mapping. Add the uncertainty calibration vocabulary.
Then check the draft against the north star summary. If the Persona does not capture the professional identity described in the summary, revise it. If the Principles do not encode the most important decision-making logic and escalation conditions, something was lost in the translation.
The draft that emerges from this process is a first draft, not a finished product. It encodes the extraction material faithfully, but it has not been tested against scenarios that reveal gaps, ambiguities, and instructions that conflict with each other under edge conditions. That testing — the Validation Loop — is what Lessons 7 and 8 teach.
Try With AI
Use these prompts in Anthropic Cowork or your preferred AI assistant to practise the SKILL.md writing process.
Prompt 1: Persona Writing
I need to write a Persona section for a SKILL.md in [YOUR DOMAIN].
The agent will [BRIEF DESCRIPTION OF WHAT THE AGENT DOES].
Help me answer the three Persona questions:
1. What is this agent's professional level and authority?
(Not just "experienced" — specific level and scope)
2. What does this agent value in its own outputs?
(Not just "accuracy" — specific quality standards and tradeoffs)
3. What does this agent do when it does not know?
(Specific uncertainty behaviour, not generic hedging)
After I answer each question, draft the Persona paragraph and assess
it: Would this Persona govern the agent's behaviour in an ambiguous
situation that no Principle explicitly addresses? If not, what is
missing?
What you're learning: Writing a Persona that governs behaviour rather than describing role is the hardest single skill in SKILL.md authorship. The three questions force precision that casual drafting avoids. Testing the Persona against an ambiguous scenario reveals whether it is functional or merely decorative — and the gap between those two is the gap between a SKILL.md that works in production and one that fails under pressure.
Prompt 2: Testability Audit
Here are five candidate SKILL.md Principles. For each one, assess
whether it is specific enough to be testable:
1. "Provide thorough and accurate analysis"
2. "When receivables days increase for three consecutive quarters
while revenue remains flat, flag the revenue line as potentially
weakening and recommend working capital cycle investigation"
3. "Always follow relevant regulations"
4. "When the management narrative in the annual report describes
working capital as 'well-managed' but the cashflow statement shows
a deteriorating collection cycle, note the discrepancy and trust
the cashflow data"
5. "Use professional judgement in complex situations"
For each Principle that fails the testability check:
- Explain why it cannot be tested against a scenario
- Rewrite it as a testable instruction using domain-specific
language from the credit analyst examples in this chapter
- Write one test scenario that would confirm the agent follows the
revised instruction
What you're learning: The testability criterion is the single most important quality check for SKILL.md Principles. Principles that feel reasonable in the abstract — "be accurate," "use professional judgement" — are operationally useless because they cannot be validated. This exercise builds the reflex of asking "Can I test this?" for every Principle you write, which directly improves the quality of your extraction-to-SKILL.md translation.
Prompt 3: Uncertainty Calibration Design
I need to design an uncertainty calibration vocabulary for a SKILL.md
in [YOUR DOMAIN]. The agent will produce [TYPE OF OUTPUT: e.g.,
financial analysis, legal risk assessments, clinical summaries,
project status reports].
Help me design a five-level calibration vocabulary:
1. Data-confirmed: language for conclusions drawn from verified sources
2. Strongly supported: language for conclusions from multiple
consistent signals
3. Reasonable inference: language for conclusions from partial data
4. Uncertain: language for conclusions with identified knowledge gaps
5. Outside scope: language for redirecting to appropriate resources
For each level, provide:
- The specific phrases the agent should use
- An example output sentence using that language
- The evidence threshold that distinguishes this level from
the adjacent ones
The vocabulary should be natural and professional — not robotic
disclaimers — and appropriate for [TARGET AUDIENCE: e.g., board
members, clinical teams, project managers].
What you're learning: Uncertainty calibration vocabulary is domain-specific and audience-specific. A board presentation requires different confidence language from a clinical summary or a project status report. Designing the vocabulary explicitly — rather than leaving the agent to improvise — prevents the two most common failure modes: overconfident outputs that mislead users and over-hedged outputs that provide no signal about what the agent actually knows.
Continue to Lesson 7: Building the Validation Scenario Set →