Capstone: Shippable Agent Skill

You've now learned every pattern this chapter teaches: persona design (Lesson 1), skill composition (Lesson 2), MCP-wrapping (Lessons 3-4), script execution (Lessons 5-6), and workflow orchestration (Lesson 7).

But learning patterns is not the same as building products.

This capstone is different. You're not following steps or solving a predetermined problem. You're doing what every AI-native entrepreneur must do: articulate a problem clearly enough that AI can build a solution to it, then validate that solution works in the real world.

You're building a specification-first Digital FTE—a complete, shippable execution skill that solves a real customer problem. This skill will orchestrate the accumulated intelligence from all previous lessons, handle production edge cases, and be ready to monetize.

Why This Capstone Matters

In the agent economy, the developers who win are those who can:

Identify customer problems that are specific enough to solve but broad enough to charge for
Write specifications clearly so AI can implement without ambiguity
Compose existing intelligence rather than reinventing patterns
Validate ruthlessly that the product works before claiming it's production-ready
Price strategically to capture value without leaving money on the table

This capstone exercises all five. You're not learning to code faster. You're learning to build products that customers pay for.

Phase 1: Domain Specification (Spec FIRST)

Before any implementation, define what problem you're solving for whom.

Select Your Domain

Choose a domain where you have expertise or strong interest. This skill should solve a real problem you've experienced or observed:

Domain Categories:

Professional Services: Legal document review, financial analysis, code review, technical writing
Content Operations: Social media scheduling, newsletter writing, content moderation, SEO optimization
Data Processing: Sales pipeline analysis, customer segmentation, report generation, data validation
System Operations: Log analysis, configuration validation, performance monitoring, security compliance
Creative: Image description, metadata generation, caption writing, style adaptation

Real Examples:

Legal Brief Analyzer: Ingest case documents, extract relevant precedents, summarize arguments, flag contradictions
Sales Data Processor: Import sales data, identify anomalies, segment customers, forecast pipeline
Code Review Enforcer: Check PRs against team standards, catch security issues, suggest improvements
Content Quality Inspector: Validate blog posts for SEO, readability, brand consistency, factual accuracy

Pick something specific. "Data processing tool" is too vague. "CSV sales data analyzer that identifies top performers, flags stagnant accounts, and generates forecasts" is right-sized.

Write Your Specification

Create a skill-spec.md that answers these questions:

Intent (The Customer Problem)

## Intent

**What customer problem does this skill solve?**

A [company type] needs to [specific business outcome] but currently [current friction]. This skill [specific solution] enabling them to [measurable business result].

**Example:**
Sales teams need to identify which accounts are growing vs stagnating to prioritize outreach. Currently, managers spend 4-6 hours weekly manually reviewing spreadsheets. This skill analyzes customer transaction history, flags trends, and generates weekly reports in 5 minutes, freeing managers for strategy.

**Why customers would buy:**
- [Time saved per week/month]
- [Cost reduction: (current cost) → (new cost)]
- [Risk reduced: (current failure rate) → (new failure rate)]
- [Revenue impact: (new revenue opportunity OR cost avoidance)]

Success Criteria (How You'll Know It Works)

Define 5-7 measurable success criteria that a customer would use to evaluate if the skill works:

## Success Criteria

- Accuracy: [specific metric] achieved on test data (e.g., "Correctly identifies 95%+ of anomalies in clean data")
- Speed: [completes in X time] (e.g., "Processes 1000 records in under 30 seconds")
- Reliability: [uptime or error rate] (e.g., "Handles 99%+ of real-world data formats without crashes")
- Completeness: [coverage metric] (e.g., "Generates all required report fields with no missing values")
- Safety: [risk mitigation] (e.g., "Flags confidence levels <80% for manual review before acting")
- Integration: [system compatibility] (e.g., "Imports from Salesforce, HubSpot, and CSV formats")
- Learning: [improvement mechanism] (e.g., "Generates error logs enabling continuous pattern improvement")

Constraints (What's NOT Included)

Define explicit boundaries—what this skill does NOT do:

## Constraints (Non-Goals)

- Does NOT: Real-time decision-making (only batch processing)
- Does NOT: Modify source data (analysis only, read-only)
- Does NOT: Handle unstructured text (CSV/database inputs only)
- Does NOT: Provide legal interpretation (flags issues, humans decide)

Why: [reasoning for each boundary]

Acceptance Tests (Proof of Success)

Write concrete test cases proving each success criterion:

## Acceptance Tests

### Test 1: Accuracy on Clean Data
Input: [clean_data.csv with known patterns]
Expected: [specific output meeting criterion]
Pass/Fail Criteria: [measurable validation rule]

### Test 2: Robustness to Missing Values
Input: [data_with_nulls.csv]
Expected: [skill either fills intelligently or flags for review]
Pass/Fail Criteria: [specific error handling rule]

### Test 3: Performance at Scale
Input: [1000_records.csv]
Expected: [completes in <30 seconds]
Pass/Fail Criteria: [timing measurement]

### Test 4: Integration with MCP
Input: [real-world data from Salesforce MCP]
Expected: [processes without format errors]
Pass/Fail Criteria: [data shape validation]

### Test 5: Error Recovery
Input: [malformed_data.csv]
Expected: [skill detects issue, suggests fix, asks for confirmation]
Pass/Fail Criteria: [user can recovery without restarting]

Architecture (Which Skills Compose)

Map which skills from Lessons 1-7 you'll use:

## Architecture

### Component 1: Data Ingestion (Lesson 4 - MCP Wrapping)
- Which MCP? [e.g., "Salesforce connector MCP from Chapter 37"]
- Skill wrapping it: [e.g., "sales-data-fetcher skill"]
- What it does: Fetches data, validates format, transforms to internal schema

### Component 2: Data Analysis (Lesson 6 - Script Execution)
- Script type: [Python, Bash, SQL]
- Skill orchestrating it: [e.g., "anomaly-detector skill"]
- What it does: Runs analysis script, catches errors, retries on failure

### Component 3: Workflow Coordination (Lesson 7 - Orchestration)
- Master skill: [e.g., "sales-analyzer orchestrator"]
- Coordination: Calls Component 1 → Component 2 → generates report
- Error recovery: If Component 2 fails, Component 3 retries with constraints

### Data Flow
[Diagram showing: MCP Input → Component 1 → Component 2 → Component 3 → Output]

Phase 2: Skill Composition

With specification complete, design how components integrate.

Map Component Dependencies

Which skills call which?

orchestrator-skill (master coordinator)
├─ mcp-wrapping-skill (data source)
│  └─ Context7 MCP (external data)
├─ script-execution-skill (analysis)
│  └─ Python code generation
└─ validation-skill (error checking)
   └─ Test data against spec

Design Data Contracts

How does each skill accept/produce data?

## Data Contracts

### MCP-Wrapping Skill Input/Output
Input: {source: "salesforce", filters: {...}}
Output: {records: [...], schema: {...}, validation_passed: bool}

### Script-Execution Skill Input/Output
Input: {data: [...], analysis_type: "anomaly_detection", params: {...}}
Output: {results: [...], errors: [...], iterations: n}

### Orchestrator Skill Input/Output
Input: {customer_id: "...", report_type: "weekly", confidence_threshold: 0.8}
Output: {report: {...}, success: bool, issues_requiring_human_review: [...]}

Define Error Recovery Paths

What happens when each component fails?

## Error Recovery Strategy

### If MCP Fails (Network timeout, auth issue)
→ Retry 3x with exponential backoff
→ If still failing, return cached data (if available) with staleness warning
→ If no cache, escalate to human with actionable error

### If Script Execution Fails (Bad data format)
→ Log specific error message
→ Generate corrected input (remove nulls, handle encoding)
→ Retry script execution with modified input
→ If still failing after 3 iterations, flag for manual review

### If Orchestrator Detects Inconsistency
→ Example: Analysis claims "high growth" but raw data shows decline
→ Generate diagnostic to investigate root cause
→ Present both interpretations to user
→ Ask user to validate which interpretation is correct

Phase 3: Specification → Implementation

Now you have a clear spec. Time to build.

Create the Skill Implementation

Work with AI to implement your SKILL.md file that orchestrates all components:

---
name: "your-domain-skill"
version: "1.0.0"
description: "[From your spec intent]"
proficiency_level: "B2"
category: "Applied"
---

# Persona

You are a [domain] execution orchestrator. Your job is to:

1. Accept customer specifications from intent above
2. Invoke appropriate data sources (MCP-wrapping skill)
3. Execute analysis (script-execution skill)
4. Validate results against success criteria
5. Generate human-readable output
6. Flag confidence issues for human review
7. Iterate if partial success until spec satisfaction

...

Test Each Component

Before testing the full orchestration, validate each component works:

MCP Component: Does it fetch data correctly?
Script Component: Does it analyze data without errors?
Validation Component: Does it catch actual problems?
Integration: When combined, does data flow correctly between components?

Document Implementation Decisions

As you build, document why you made each architectural choice:

## Implementation Notes

### Why we chose [MCP X] over [MCP Y]
- Comparison: [criteria]
- Decision: X provides [advantage] that Y lacks

### Why error recovery retries 3x maximum
- Risk: Infinite loops would timeout
- Benefit: 3x usually sufficient for transient failures
- Tradeoff: Some failures might need manual intervention

### Why we validate output against original spec
- Purpose: Ensure analysis actually answers customer question
- Example: If spec asked for "anomalies" and we found "seasonal patterns", both valid but different

Phase 4: Acceptance Testing & Validation

Production readiness means: Every acceptance test passes. No exceptions.

Run Full Acceptance Test Suite

Execute each test case from your spec:

## Test Results

### Test 1: Accuracy on Clean Data
- Input: clean_data.csv (100 records)
- Expected: 95%+ anomaly detection accuracy
- Actual: ✓ PASS (97.2% accuracy)

### Test 2: Robustness to Missing Values
- Input: data_with_nulls.csv (50 records, 15% nulls)
- Expected: Handles nulls intelligently
- Actual: ✓ PASS (Imputes using mean, documents assumptions)

### Test 3: Performance at Scale
- Input: 1000_records.csv
- Expected: <30 seconds
- Actual: ✓ PASS (24.3 seconds)

### Test 4: Integration with MCP
- Input: Real Salesforce data
- Expected: Processes without format errors
- Actual: ✓ PASS (5 test runs, 0 format failures)

### Test 5: Error Recovery
- Input: malformed_data.csv (bad encoding)
- Expected: Detects error, asks user
- Actual: ✓ PASS (Error caught, user prompted, recovery successful)

Document Any Spec Gaps

If testing reveals issues, update specification:

## Discovered During Validation

### Issue 1: Specification was unclear about timezone handling
- Original: "Process timestamps from multiple regions"
- Realized: Didn't specify how to normalize timezones
- Updated Spec: "Normalize all timestamps to UTC before analysis, document source timezone in output"
- Test Added: Verify timezone conversion accuracy

### Issue 2: Edge case not covered: Empty dataset
- Original spec didn't mention: "What if data has 0 records?"
- Solution: Return empty results with explanatory message
- Updated spec: "If input has <1 record, return {success: true, records_processed: 0, message: '...'}"

Phase 5: Production Packaging

Now package the skill for customers to use.

Create Customer Documentation

Write a simple guide customers will use:

# [Your Skill] User Guide

## What This Does

[2-3 sentence summary of business value]

## What You Need

- Input data format: [CSV / database connection / API]
- Required fields: [column names or schema]
- Recommended data size: [e.g., "Works best with 100-10,000 records"]

## How to Use It

1. [Step 1]
2. [Step 2]
3. [Step 3]

## Understanding Results

[Explain key fields in output]

## Troubleshooting

**Problem**: [Common failure scenario]
**Solution**: [How to fix]

## Support

Contact: [your email]
Response time: [e.g., "24 hours"]

Version Your Skill

Use semantic versioning:

version: "1.0.0"  # Major.Minor.Patch
# Major: Breaking changes to input/output format
# Minor: New features that are backward compatible
# Patch: Bug fixes

Create Installation Instructions

How does a customer deploy this skill?

## Installation

### Option 1: Claude Code (Recommended)
1. Save this SKILL.md file to ~/.claude/skills/your-domain-skill/SKILL.md
2. In Claude Code, type: /skill-list to see your new skill
3. Activate it: /skill-activate your-domain-skill
4. Test it: Ask Claude "Help me analyze my data with your-domain-skill"

### Option 2: Custom Integration
[For customers integrating with their own LLM provider]

Phase 6: Digital FTE Positioning

Finally, position this skill as a product.

Identify Your Customer

Who would pay for this?

## Customer Profile

### Primary Segment
- Title: [e.g., "Sales Manager at B2B SaaS companies"]
- Pain point: [specific problem they face]
- Current solution: [what they do today, poorly]
- Budget: [How much would they pay monthly/annually?]

### Secondary Segments
[Other customer types who might benefit]

Articulate Business Value

Why should customers choose your skill over alternatives?

## Value Proposition

**Problem**: [Customer's current pain]
**Our solution**: [What this skill does]
**Outcome**: [Specific business result]
**Differentiation**: [Why ours is better than]
- Alternative 1: [comparison on key dimension]
- Alternative 2: [comparison on key dimension]

### ROI Calculation
- Time saved per week: [X hours]
- Cost per hour: [$Y]
- Monthly savings: [X × Y × 4.33 weeks = $Z]
- Skill cost: [$P/month]
- Payback period: [Z/P months]

Select Monetization Model

Choose how you'll charge:

## Monetization Model

### Option 1: Subscription
- Price: $[X]/month or $[Y]/year
- Users included: [1 / unlimited]
- Data volume: [up to X records/month]
- Pros: Predictable revenue, customer lock-in
- Cons: Requires ongoing support

### Option 2: Success Fee
- Price: [X%] of business value generated
- Example: "2% of monthly savings" or "5% of revenue increase"
- Pros: Aligned incentives, higher ceiling
- Cons: Requires tracking customer results

### Option 3: License
- Price: $[X] one-time fee for unlimited use
- Pros: Simple, customer owns it
- Cons: No recurring revenue

### Recommendation
[Which model fits this skill best and why?]

Design Go-to-Market

How will customers find and adopt your skill?

## Go-to-Market Strategy

### Positioning
"[Your skill] is the [category] for [audience] that [specific benefit]."

Example: "sales-analyzer is the revenue intelligence tool for B2B SaaS sales leaders that identifies growth opportunities in 5 minutes instead of 5 hours."

### Customer Acquisition
1. [Channel 1]: [Tactic]
   - Target: [Who you'll reach]
   - Cost: [acquisition cost]
   - Conversion: [estimated %]

2. [Channel 2]: [Tactic]

### Success Metrics
- Month 1: [X customers]
- Month 3: [X customers]
- Month 6: [X customers]
- Target: [$Y MRR by month 12]

Try With AI: Build Your Capstone Skill

This is where you move from learning patterns to building products.

Ask AI to review your domain choice and help sharpen your specification:

I'm building a Digital FTE in [domain] that solves [customer problem].

My current specification:
- Intent: [your intent statement]
- Success criteria: [your success criteria]
- Non-goals: [your constraints]

Review my specification. Is it:
1. Specific enough that you could implement without asking clarifying questions?
2. Focused enough (not trying to solve everything)?
3. Measurable (could you write a test that proves it works)?

If any answer is "no", what's missing from my specification?

What you're learning: How clear specifications prevent implementation rework. Vague specs require clarification loops; clear specs compile cleanly.

Prompt 2: Architecture Design

Have AI help you design which skills compose together:

I need to compose skills from Chapter 39 into an orchestrator that implements my specification above.

The skills I have available:
- Lesson 1: [Advanced Skill Patterns]
- Lesson 2: [Skill Composition patterns]
- Lesson 3-4: [MCP-wrapping skill for my data source]
- Lesson 5-6: [Script-execution skill for my analysis]
- Lesson 7: [Orchestration skill combining them]

Design an architecture showing:
1. Which skills I should compose
2. Data flow between them
3. Error recovery if any component fails
4. How each component validates its output before passing to next

Draw a diagram using ASCII or describe in text.

What you're learning: Composition thinking—reusing tested components rather than building from scratch. This is Digital FTE production practice.

Prompt 3: Implementation & Testing

Ask AI to help implement while you validate each decision:

Implement my orchestrator skill (SKILL.md) that composes the architecture above.

As you implement, explain:
1. Why you chose this persona for the orchestrator
2. What decision questions activate autonomous behavior
3. How error recovery paths will work in practice
4. Which parts need explicit safety constraints

Then, let's test it against my acceptance criteria:
[Paste your 5-7 success criteria]

Create a test plan showing which tests validate which criteria.

What you're learning: How specifications drive implementation. AI uses your spec to make decisions without ambiguity. If it needs to guess, your spec was incomplete.

Prompt 4: Validation & Gap Analysis

Run your tests and document what happens:

I ran my skill against my test data with these results:

[Paste your test results]

Issues discovered:
1. [Issue 1 and what caused it]
2. [Issue 2 and what caused it]

For each issue, should I:
- Fix the skill implementation?
- Update the specification to be more accurate?
- Both?

Help me update either the spec or the implementation to make all tests pass.

What you're learning: The specification ↔ implementation feedback loop. Tests reveal gaps in both. Fixing only the code misses the root cause (underspecified requirements).

Prompt 5: Business Positioning

Finally, articulate why someone would pay for this:

I've built a working Digital FTE that [what it does]. Now help me position it as a product:

What customer segment has the most acute version of this problem?
What's the annual cost of them NOT having this solution? (time wasted, missed revenue, etc.)
What would be a fair price that captures 20-30% of that value?
What should my go-to-market strategy focus on (direct sales, self-serve, marketplace)?
What's one surprising use case for this skill that customers might not think of?

Use this to help me write a positioning statement and pricing proposal.

What you're learning: Digital FTEs aren't just technical products—they're customer solutions. This prompt teaches the business side of the agent economy.

Success Criteria (Validation Checklist)

Your capstone is complete when:

Specification Phase:

Spec clearly describes a customer problem, not a technology exercise
Success criteria are measurable (could write automated tests)
Constraints explicitly define what's NOT included
Architecture diagram shows 3+ composed skills from Lessons 1-7

Implementation Phase:

Skill implementation based on spec (not invented during coding)
Each component (MCP, script, orchestration) tested independently
Implementation decisions documented with reasoning
Skill version defined using semantic versioning

Validation Phase:

All acceptance tests designed before implementation
All acceptance tests run and documented (pass/fail)
Any spec gaps discovered during testing are documented and remediated
Results prove skill meets original specification

Production Phase:

Customer-facing documentation complete (not internal jargon)
Installation instructions for both Claude Code and custom integrations
Troubleshooting guide addresses common failure modes
Version control / release notes prepared

Business Phase:

Customer profile clearly defined (title, problem, budget)
ROI calculation shows when customers break even on investment
Monetization model selected with reasoning (subscription/success-fee/license)
Go-to-market positioning statement (single sentence describing what/for-whom/why)

Reference: Capstone Skill Template

If you get stuck, use this template:

---
name: "my-digital-fte-skill"
version: "1.0.0"
description: "Production-ready skill orchestrating code execution pattern"
proficiency_level: "B2"
---

# Specification

## Intent
[2-3 sentences describing customer problem and solution]

## Success Criteria
1. [Measurable criterion 1]
2. [Measurable criterion 2]
3. [Measurable criterion 3]

## Non-Goals
- Does NOT: [explicit boundary 1]
- Does NOT: [explicit boundary 2]

## Architecture
[Component 1] → [Component 2] → [Component 3] → Output

# Persona

You are a [domain] orchestrator that:
1. [Action 1]
2. [Action 2]
3. [Action 3]

# Questions

- [Question 1: What analysis is needed?]
- [Question 2: What data validates success?]
- [Question 3: What errors require human intervention?]

# Principles

- [Principle 1: Safety/Reliability]
- [Principle 2: Completeness]

# Implementation

[Your skill implementation details]

# Business Positioning

- **Customer**: [Who pays]
- **Problem**: [What pain]
- **Value**: [Specific outcome]

What This Capstone Teaches

This capstone represents mastery of one critical truth: In the agent economy, your value is not your code. Your value is your specification and your composition skills.

You've now learned that the developers who build billion-dollar AI businesses are exactly those who:

Write clear specifications that prevent ambiguity
Compose reusable intelligence rather than reinventing patterns
Validate ruthlessly through acceptance testing
Position strategically as products with customer value

You just completed all four. This skill is now deployable—it could be sold as-is through subscription, license, or success-fee model. Scale this process across multiple skills, and you're building the Digital FTE empire that will define the next decade of software.

Why This Capstone Matters​

Phase 1: Domain Specification (Spec FIRST)​

Select Your Domain​

Write Your Specification​

Intent (The Customer Problem)​

Success Criteria (How You'll Know It Works)​

Constraints (What's NOT Included)​

Acceptance Tests (Proof of Success)​

Architecture (Which Skills Compose)​

Phase 2: Skill Composition​

Map Component Dependencies​

Design Data Contracts​

Define Error Recovery Paths​

Phase 3: Specification → Implementation​

Create the Skill Implementation​

Test Each Component​

Document Implementation Decisions​

Phase 4: Acceptance Testing & Validation​

Run Full Acceptance Test Suite​

Document Any Spec Gaps​

Phase 5: Production Packaging​

Create Customer Documentation​

Version Your Skill​

Create Installation Instructions​

Phase 6: Digital FTE Positioning​

Identify Your Customer​

Articulate Business Value​

Select Monetization Model​

Design Go-to-Market​

Try With AI: Build Your Capstone Skill​

Prompt 1: Specification Refinement​

Prompt 2: Architecture Design​

Prompt 3: Implementation & Testing​

Prompt 4: Validation & Gap Analysis​

Prompt 5: Business Positioning​

Success Criteria (Validation Checklist)​

Reference: Capstone Skill Template​

What This Capstone Teaches​