Capstone: Shippable Agent Skill
You've now learned every pattern this chapter teaches: persona design (Lesson 1), skill composition (Lesson 2), MCP-wrapping (Lessons 3-4), script execution (Lessons 5-6), and workflow orchestration (Lesson 7).
But learning patterns is not the same as building products.
This capstone is different. You're not following steps or solving a predetermined problem. You're doing what every AI-native entrepreneur must do: articulate a problem clearly enough that AI can build a solution to it, then validate that solution works in the real world.
You're building a specification-first Digital FTE—a complete, shippable execution skill that solves a real customer problem. This skill will orchestrate the accumulated intelligence from all previous lessons, handle production edge cases, and be ready to monetize.
Why This Capstone Matters
In the agent economy, the developers who win are those who can:
- Identify customer problems that are specific enough to solve but broad enough to charge for
- Write specifications clearly so AI can implement without ambiguity
- Compose existing intelligence rather than reinventing patterns
- Validate ruthlessly that the product works before claiming it's production-ready
- Price strategically to capture value without leaving money on the table
This capstone exercises all five. You're not learning to code faster. You're learning to build products that customers pay for.
Phase 1: Domain Specification (Spec FIRST)
Before any implementation, define what problem you're solving for whom.
Select Your Domain
Choose a domain where you have expertise or strong interest. This skill should solve a real problem you've experienced or observed:
Domain Categories:
- Professional Services: Legal document review, financial analysis, code review, technical writing
- Content Operations: Social media scheduling, newsletter writing, content moderation, SEO optimization
- Data Processing: Sales pipeline analysis, customer segmentation, report generation, data validation
- System Operations: Log analysis, configuration validation, performance monitoring, security compliance
- Creative: Image description, metadata generation, caption writing, style adaptation
Real Examples:
- Legal Brief Analyzer: Ingest case documents, extract relevant precedents, summarize arguments, flag contradictions
- Sales Data Processor: Import sales data, identify anomalies, segment customers, forecast pipeline
- Code Review Enforcer: Check PRs against team standards, catch security issues, suggest improvements
- Content Quality Inspector: Validate blog posts for SEO, readability, brand consistency, factual accuracy
Pick something specific. "Data processing tool" is too vague. "CSV sales data analyzer that identifies top performers, flags stagnant accounts, and generates forecasts" is right-sized.
Write Your Specification
Create a skill-spec.md that answers these questions:
Intent (The Customer Problem)
## Intent
**What customer problem does this skill solve?**
A [company type] needs to [specific business outcome] but currently [current friction]. This skill [specific solution] enabling them to [measurable business result].
**Example:**
Sales teams need to identify which accounts are growing vs stagnating to prioritize outreach. Currently, managers spend 4-6 hours weekly manually reviewing spreadsheets. This skill analyzes customer transaction history, flags trends, and generates weekly reports in 5 minutes, freeing managers for strategy.
**Why customers would buy:**
- [Time saved per week/month]
- [Cost reduction: (current cost) → (new cost)]
- [Risk reduced: (current failure rate) → (new failure rate)]
- [Revenue impact: (new revenue opportunity OR cost avoidance)]
Success Criteria (How You'll Know It Works)
Define 5-7 measurable success criteria that a customer would use to evaluate if the skill works:
## Success Criteria
- Accuracy: [specific metric] achieved on test data (e.g., "Correctly identifies 95%+ of anomalies in clean data")
- Speed: [completes in X time] (e.g., "Processes 1000 records in under 30 seconds")
- Reliability: [uptime or error rate] (e.g., "Handles 99%+ of real-world data formats without crashes")
- Completeness: [coverage metric] (e.g., "Generates all required report fields with no missing values")
- Safety: [risk mitigation] (e.g., "Flags confidence levels <80% for manual review before acting")
- Integration: [system compatibility] (e.g., "Imports from Salesforce, HubSpot, and CSV formats")
- Learning: [improvement mechanism] (e.g., "Generates error logs enabling continuous pattern improvement")
Constraints (What's NOT Included)
Define explicit boundaries—what this skill does NOT do:
## Constraints (Non-Goals)
- Does NOT: Real-time decision-making (only batch processing)
- Does NOT: Modify source data (analysis only, read-only)
- Does NOT: Handle unstructured text (CSV/database inputs only)
- Does NOT: Provide legal interpretation (flags issues, humans decide)
Why: [reasoning for each boundary]
Acceptance Tests (Proof of Success)
Write concrete test cases proving each success criterion:
## Acceptance Tests
### Test 1: Accuracy on Clean Data
Input: [clean_data.csv with known patterns]
Expected: [specific output meeting criterion]
Pass/Fail Criteria: [measurable validation rule]
### Test 2: Robustness to Missing Values
Input: [data_with_nulls.csv]
Expected: [skill either fills intelligently or flags for review]
Pass/Fail Criteria: [specific error handling rule]
### Test 3: Performance at Scale
Input: [1000_records.csv]
Expected: [completes in <30 seconds]
Pass/Fail Criteria: [timing measurement]
### Test 4: Integration with MCP
Input: [real-world data from Salesforce MCP]
Expected: [processes without format errors]
Pass/Fail Criteria: [data shape validation]
### Test 5: Error Recovery
Input: [malformed_data.csv]
Expected: [skill detects issue, suggests fix, asks for confirmation]
Pass/Fail Criteria: [user can recovery without restarting]
Architecture (Which Skills Compose)
Map which skills from Lessons 1-7 you'll use:
## Architecture
### Component 1: Data Ingestion (Lesson 4 - MCP Wrapping)
- Which MCP? [e.g., "Salesforce connector MCP from Chapter 37"]
- Skill wrapping it: [e.g., "sales-data-fetcher skill"]
- What it does: Fetches data, validates format, transforms to internal schema
### Component 2: Data Analysis (Lesson 6 - Script Execution)
- Script type: [Python, Bash, SQL]
- Skill orchestrating it: [e.g., "anomaly-detector skill"]
- What it does: Runs analysis script, catches errors, retries on failure
### Component 3: Workflow Coordination (Lesson 7 - Orchestration)
- Master skill: [e.g., "sales-analyzer orchestrator"]
- Coordination: Calls Component 1 → Component 2 → generates report
- Error recovery: If Component 2 fails, Component 3 retries with constraints
### Data Flow
[Diagram showing: MCP Input → Component 1 → Component 2 → Component 3 → Output]
Phase 2: Skill Composition
With specification complete, design how components integrate.
Map Component Dependencies
Which skills call which?
orchestrator-skill (master coordinator)
├─ mcp-wrapping-skill (data source)
│ └─ Context7 MCP (external data)
├─ script-execution-skill (analysis)
│ └─ Python code generation
└─ validation-skill (error checking)
└─ Test data against spec
Design Data Contracts
How does each skill accept/produce data?
## Data Contracts
### MCP-Wrapping Skill Input/Output
Input: {source: "salesforce", filters: {...}}
Output: {records: [...], schema: {...}, validation_passed: bool}
### Script-Execution Skill Input/Output
Input: {data: [...], analysis_type: "anomaly_detection", params: {...}}
Output: {results: [...], errors: [...], iterations: n}
### Orchestrator Skill Input/Output
Input: {customer_id: "...", report_type: "weekly", confidence_threshold: 0.8}
Output: {report: {...}, success: bool, issues_requiring_human_review: [...]}
Define Error Recovery Paths
What happens when each component fails?
## Error Recovery Strategy
### If MCP Fails (Network timeout, auth issue)
→ Retry 3x with exponential backoff
→ If still failing, return cached data (if available) with staleness warning
→ If no cache, escalate to human with actionable error
### If Script Execution Fails (Bad data format)
→ Log specific error message
→ Generate corrected input (remove nulls, handle encoding)
→ Retry script execution with modified input
→ If still failing after 3 iterations, flag for manual review
### If Orchestrator Detects Inconsistency
→ Example: Analysis claims "high growth" but raw data shows decline
→ Generate diagnostic to investigate root cause
→ Present both interpretations to user
→ Ask user to validate which interpretation is correct
Phase 3: Specification → Implementation
Now you have a clear spec. Time to build.
Create the Skill Implementation
Work with AI to implement your SKILL.md file that orchestrates all components:
---
name: "your-domain-skill"
version: "1.0.0"
description: "[From your spec intent]"
proficiency_level: "B2"
category: "Applied"
---
# Persona
You are a [domain] execution orchestrator. Your job is to:
1. Accept customer specifications from intent above
2. Invoke appropriate data sources (MCP-wrapping skill)
3. Execute analysis (script-execution skill)
4. Validate results against success criteria
5. Generate human-readable output
6. Flag confidence issues for human review
7. Iterate if partial success until spec satisfaction
...
Test Each Component
Before testing the full orchestration, validate each component works:
- MCP Component: Does it fetch data correctly?
- Script Component: Does it analyze data without errors?
- Validation Component: Does it catch actual problems?
- Integration: When combined, does data flow correctly between components?
Document Implementation Decisions
As you build, document why you made each architectural choice:
## Implementation Notes
### Why we chose [MCP X] over [MCP Y]
- Comparison: [criteria]
- Decision: X provides [advantage] that Y lacks
### Why error recovery retries 3x maximum
- Risk: Infinite loops would timeout
- Benefit: 3x usually sufficient for transient failures
- Tradeoff: Some failures might need manual intervention
### Why we validate output against original spec
- Purpose: Ensure analysis actually answers customer question
- Example: If spec asked for "anomalies" and we found "seasonal patterns", both valid but different
Phase 4: Acceptance Testing & Validation
Production readiness means: Every acceptance test passes. No exceptions.
Run Full Acceptance Test Suite
Execute each test case from your spec:
## Test Results
### Test 1: Accuracy on Clean Data
- Input: clean_data.csv (100 records)
- Expected: 95%+ anomaly detection accuracy
- Actual: ✓ PASS (97.2% accuracy)
### Test 2: Robustness to Missing Values
- Input: data_with_nulls.csv (50 records, 15% nulls)
- Expected: Handles nulls intelligently
- Actual: ✓ PASS (Imputes using mean, documents assumptions)
### Test 3: Performance at Scale
- Input: 1000_records.csv
- Expected: <30 seconds
- Actual: ✓ PASS (24.3 seconds)
### Test 4: Integration with MCP
- Input: Real Salesforce data
- Expected: Processes without format errors
- Actual: ✓ PASS (5 test runs, 0 format failures)
### Test 5: Error Recovery
- Input: malformed_data.csv (bad encoding)
- Expected: Detects error, asks user
- Actual: ✓ PASS (Error caught, user prompted, recovery successful)
Document Any Spec Gaps
If testing reveals issues, update specification:
## Discovered During Validation
### Issue 1: Specification was unclear about timezone handling
- Original: "Process timestamps from multiple regions"
- Realized: Didn't specify how to normalize timezones
- Updated Spec: "Normalize all timestamps to UTC before analysis, document source timezone in output"
- Test Added: Verify timezone conversion accuracy
### Issue 2: Edge case not covered: Empty dataset
- Original spec didn't mention: "What if data has 0 records?"
- Solution: Return empty results with explanatory message
- Updated spec: "If input has <1 record, return {success: true, records_processed: 0, message: '...'}"
Phase 5: Production Packaging
Now package the skill for customers to use.
Create Customer Documentation
Write a simple guide customers will use:
# [Your Skill] User Guide
## What This Does
[2-3 sentence summary of business value]
## What You Need
- Input data format: [CSV / database connection / API]
- Required fields: [column names or schema]
- Recommended data size: [e.g., "Works best with 100-10,000 records"]
## How to Use It
1. [Step 1]
2. [Step 2]
3. [Step 3]
## Understanding Results
[Explain key fields in output]
## Troubleshooting
**Problem**: [Common failure scenario]
**Solution**: [How to fix]
## Support
Contact: [your email]
Response time: [e.g., "24 hours"]
Version Your Skill
Use semantic versioning:
version: "1.0.0" # Major.Minor.Patch
# Major: Breaking changes to input/output format
# Minor: New features that are backward compatible
# Patch: Bug fixes
Create Installation Instructions
How does a customer deploy this skill?
## Installation
### Option 1: Claude Code (Recommended)
1. Save this SKILL.md file to ~/.claude/skills/your-domain-skill/SKILL.md
2. In Claude Code, type: /skill-list to see your new skill
3. Activate it: /skill-activate your-domain-skill
4. Test it: Ask Claude "Help me analyze my data with your-domain-skill"
### Option 2: Custom Integration
[For customers integrating with their own LLM provider]
Phase 6: Digital FTE Positioning
Finally, position this skill as a product.
Identify Your Customer
Who would pay for this?
## Customer Profile
### Primary Segment
- Title: [e.g., "Sales Manager at B2B SaaS companies"]
- Pain point: [specific problem they face]
- Current solution: [what they do today, poorly]
- Budget: [How much would they pay monthly/annually?]
### Secondary Segments
[Other customer types who might benefit]
Articulate Business Value
Why should customers choose your skill over alternatives?
## Value Proposition
**Problem**: [Customer's current pain]
**Our solution**: [What this skill does]
**Outcome**: [Specific business result]
**Differentiation**: [Why ours is better than]
- Alternative 1: [comparison on key dimension]
- Alternative 2: [comparison on key dimension]
### ROI Calculation
- Time saved per week: [X hours]
- Cost per hour: [$Y]
- Monthly savings: [X × Y × 4.33 weeks = $Z]
- Skill cost: [$P/month]
- Payback period: [Z/P months]
Select Monetization Model
Choose how you'll charge:
## Monetization Model
### Option 1: Subscription
- Price: $[X]/month or $[Y]/year
- Users included: [1 / unlimited]
- Data volume: [up to X records/month]
- Pros: Predictable revenue, customer lock-in
- Cons: Requires ongoing support
### Option 2: Success Fee
- Price: [X%] of business value generated
- Example: "2% of monthly savings" or "5% of revenue increase"
- Pros: Aligned incentives, higher ceiling
- Cons: Requires tracking customer results
### Option 3: License
- Price: $[X] one-time fee for unlimited use
- Pros: Simple, customer owns it
- Cons: No recurring revenue
### Recommendation
[Which model fits this skill best and why?]
Design Go-to-Market
How will customers find and adopt your skill?
## Go-to-Market Strategy
### Positioning
"[Your skill] is the [category] for [audience] that [specific benefit]."
Example: "sales-analyzer is the revenue intelligence tool for B2B SaaS sales leaders that identifies growth opportunities in 5 minutes instead of 5 hours."
### Customer Acquisition
1. [Channel 1]: [Tactic]
- Target: [Who you'll reach]
- Cost: [acquisition cost]
- Conversion: [estimated %]
2. [Channel 2]: [Tactic]
### Success Metrics
- Month 1: [X customers]
- Month 3: [X customers]
- Month 6: [X customers]
- Target: [$Y MRR by month 12]
Try With AI: Build Your Capstone Skill
This is where you move from learning patterns to building products.
Prompt 1: Specification Refinement
Ask AI to review your domain choice and help sharpen your specification:
I'm building a Digital FTE in [domain] that solves [customer problem].
My current specification:
- Intent: [your intent statement]
- Success criteria: [your success criteria]
- Non-goals: [your constraints]
Review my specification. Is it:
1. Specific enough that you could implement without asking clarifying questions?
2. Focused enough (not trying to solve everything)?
3. Measurable (could you write a test that proves it works)?
If any answer is "no", what's missing from my specification?
What you're learning: How clear specifications prevent implementation rework. Vague specs require clarification loops; clear specs compile cleanly.
Prompt 2: Architecture Design
Have AI help you design which skills compose together:
I need to compose skills from Chapter 39 into an orchestrator that implements my specification above.
The skills I have available:
- Lesson 1: [Advanced Skill Patterns]
- Lesson 2: [Skill Composition patterns]
- Lesson 3-4: [MCP-wrapping skill for my data source]
- Lesson 5-6: [Script-execution skill for my analysis]
- Lesson 7: [Orchestration skill combining them]
Design an architecture showing:
1. Which skills I should compose
2. Data flow between them
3. Error recovery if any component fails
4. How each component validates its output before passing to next
Draw a diagram using ASCII or describe in text.
What you're learning: Composition thinking—reusing tested components rather than building from scratch. This is Digital FTE production practice.
Prompt 3: Implementation & Testing
Ask AI to help implement while you validate each decision:
Implement my orchestrator skill (SKILL.md) that composes the architecture above.
As you implement, explain:
1. Why you chose this persona for the orchestrator
2. What decision questions activate autonomous behavior
3. How error recovery paths will work in practice
4. Which parts need explicit safety constraints
Then, let's test it against my acceptance criteria:
[Paste your 5-7 success criteria]
Create a test plan showing which tests validate which criteria.
What you're learning: How specifications drive implementation. AI uses your spec to make decisions without ambiguity. If it needs to guess, your spec was incomplete.
Prompt 4: Validation & Gap Analysis
Run your tests and document what happens:
I ran my skill against my test data with these results:
[Paste your test results]
Issues discovered:
1. [Issue 1 and what caused it]
2. [Issue 2 and what caused it]
For each issue, should I:
- Fix the skill implementation?
- Update the specification to be more accurate?
- Both?
Help me update either the spec or the implementation to make all tests pass.
What you're learning: The specification ↔ implementation feedback loop. Tests reveal gaps in both. Fixing only the code misses the root cause (underspecified requirements).
Prompt 5: Business Positioning
Finally, articulate why someone would pay for this:
I've built a working Digital FTE that [what it does]. Now help me position it as a product:
1. What customer segment has the most acute version of this problem?
2. What's the annual cost of them NOT having this solution? (time wasted, missed revenue, etc.)
3. What would be a fair price that captures 20-30% of that value?
4. What should my go-to-market strategy focus on (direct sales, self-serve, marketplace)?
5. What's one surprising use case for this skill that customers might not think of?
Use this to help me write a positioning statement and pricing proposal.
What you're learning: Digital FTEs aren't just technical products—they're customer solutions. This prompt teaches the business side of the agent economy.
Success Criteria (Validation Checklist)
Your capstone is complete when:
Specification Phase:
- Spec clearly describes a customer problem, not a technology exercise
- Success criteria are measurable (could write automated tests)
- Constraints explicitly define what's NOT included
- Architecture diagram shows 3+ composed skills from Lessons 1-7
Implementation Phase:
- Skill implementation based on spec (not invented during coding)
- Each component (MCP, script, orchestration) tested independently
- Implementation decisions documented with reasoning
- Skill version defined using semantic versioning
Validation Phase:
- All acceptance tests designed before implementation
- All acceptance tests run and documented (pass/fail)
- Any spec gaps discovered during testing are documented and remediated
- Results prove skill meets original specification
Production Phase:
- Customer-facing documentation complete (not internal jargon)
- Installation instructions for both Claude Code and custom integrations
- Troubleshooting guide addresses common failure modes
- Version control / release notes prepared
Business Phase:
- Customer profile clearly defined (title, problem, budget)
- ROI calculation shows when customers break even on investment
- Monetization model selected with reasoning (subscription/success-fee/license)
- Go-to-market positioning statement (single sentence describing what/for-whom/why)
Reference: Capstone Skill Template
If you get stuck, use this template:
---
name: "my-digital-fte-skill"
version: "1.0.0"
description: "Production-ready skill orchestrating code execution pattern"
proficiency_level: "B2"
---
# Specification
## Intent
[2-3 sentences describing customer problem and solution]
## Success Criteria
1. [Measurable criterion 1]
2. [Measurable criterion 2]
3. [Measurable criterion 3]
## Non-Goals
- Does NOT: [explicit boundary 1]
- Does NOT: [explicit boundary 2]
## Architecture
[Component 1] → [Component 2] → [Component 3] → Output
# Persona
You are a [domain] orchestrator that:
1. [Action 1]
2. [Action 2]
3. [Action 3]
# Questions
- [Question 1: What analysis is needed?]
- [Question 2: What data validates success?]
- [Question 3: What errors require human intervention?]
# Principles
- [Principle 1: Safety/Reliability]
- [Principle 2: Completeness]
# Implementation
[Your skill implementation details]
# Business Positioning
- **Customer**: [Who pays]
- **Problem**: [What pain]
- **Value**: [Specific outcome]
What This Capstone Teaches
This capstone represents mastery of one critical truth: In the agent economy, your value is not your code. Your value is your specification and your composition skills.
You've now learned that the developers who build billion-dollar AI businesses are exactly those who:
- Write clear specifications that prevent ambiguity
- Compose reusable intelligence rather than reinventing patterns
- Validate ruthlessly through acceptance testing
- Position strategically as products with customer value
You just completed all four. This skill is now deployable—it could be sold as-is through subscription, license, or success-fee model. Scale this process across multiple skills, and you're building the Digital FTE empire that will define the next decade of software.