# Build a Script-Execution Skill
You've learned the pattern (Lesson 5): write code from specification → execute it → analyze errors → iterate. Now you're going to build a skill that orchestrates this loop autonomously.
But here's what makes this different from following a tutorial: You'll specify what problem you're solving FIRST, then let AI help you build the skill while you validate each decision. You're not just learning a pattern—you're learning to think about error recovery, convergence criteria, and edge cases the way production systems demand.
## Step 1: Write Your Specification
Before touching any skill code, write a specification for the problem you're solving. You'll use a CSV data processing task because it's concrete and has natural edge cases.
Choose one of these:
- CSV Analysis: Analyze customer or sales data for patterns
- CSV Transformation: Clean and restructure messy CSV data
- CSV Aggregation: Group data by dimensions and calculate metrics
Or define your own data processing task.
### Your Specification
Write this to a file (skill-spec.md) or document:
```markdown
# CSV Analysis Skill Specification

## Intent
[What does this skill do? Be specific about the business problem it solves]

## Input
- data_file: [type and format, e.g., "CSV with columns: customer_id, purchase_date, amount"]
- parameters: [what configuration does the skill accept?]

## Output
- format: [JSON, CSV, report?]
- required_fields: [exact fields that must be in the output]
- validation_rules: [how to verify the output is correct]

## Success Criteria
- All data processed without loss
- Output format exactly matches the specification
- Edge cases handled gracefully (malformed rows, missing values, etc.)
- Execution completes within 30 seconds

## Edge Cases to Handle
- [Case 1: e.g., "Empty CSV file"]
- [Case 2: e.g., "Missing column header"]
- [Case 3: e.g., "Non-numeric values in amount field"]
```
Key principle: Your specification must be complete enough that AI can generate correct code without additional context. If your spec is vague, the generated code will be equally vague.
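If you want the spec machine-readable from the start, the same information can live in a plain Python dict. This sketch is illustrative: the key names (`input`, `output`, `required_fields`, and so on) are assumptions chosen to line up with the convergence check in Step 5, not a required schema.

```python
# One possible in-code representation of the spec above. The field names are
# illustrative, chosen to match the convergence check sketched in Step 5.
csv_analysis_spec = {
    'intent': 'Summarize total and average purchase amount per customer',
    'input': {
        'data_file': 'CSV with columns: customer_id, purchase_date, amount',
        'parameters': {'min_purchases': 1},
    },
    'output': {
        'format': 'JSON',
        'required_fields': ['customer_id', 'total_amount', 'avg_amount'],
    },
    'success_criteria': [
        'All rows processed without loss',
        'Execution completes within 30 seconds',
    ],
    'edge_cases': ['empty file', 'missing column header', 'non-numeric amount'],
}
```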
## Step 2: Design Your Skill's Persona and Questions
Before building, define how your skill thinks about this problem.
```yaml
# Skill Persona
persona: |
  You are a data orchestrator: your role is to write Python scripts that
  process data robustly. When you encounter errors, you read the error message
  carefully, understand why the code failed, and generate corrected code.
  You validate results against the specification. You handle edge cases explicitly
  rather than hoping they don't occur.

questions:
  - "What does the data structure look like? (columns, data types, edge cases)"
  - "What transformation or analysis does the specification require?"
  - "What output format must the code produce?"
  - "What validation proves the output is correct?"
  - "What edge cases are most likely to occur in real data?"

principles:
  - "Validate data before processing: check columns exist, types are correct"
  - "Fail explicitly: raise errors with clear messages rather than silently producing wrong results"
  - "Test assumptions: don't assume column names; inspect actual data first"
  - "Document the transformation: add comments explaining the logic"
```
## Step 3: Build the Skill Core with AI Collaboration
Now you're going to build this skill with AI. You'll test code, discover error patterns you hadn't anticipated, and learn what your actual data requires. As you iterate, the skill improves—not because you're following a formula, but because specification-driven feedback drives real improvements.
### Part A: Generate Initial Implementation
Your prompt to AI:
I'm building a skill that processes CSV data. Here's my specification:
[PASTE YOUR SPEC]
Generate a Python skill implementation that:
1. Reads the CSV file
2. Validates the data structure (check columns, types)
3. Performs the required transformation/analysis
4. Returns results in the specified format
5. Includes error handling for common CSV issues
The code should be production-quality: defensive, making no assumptions about the data format.
AI will generate code. Study it. Does it match your specification?
Critical evaluation:
- Does the code check for expected columns before using them?
- Does it handle missing/null values?
- Does it validate the output format matches your spec?
Document what you notice:
Things I observe:
- [Good pattern in the approach]
- [Assumption that might not hold]
- [Edge case not addressed yet]
Use these observations to guide your feedback to AI when iterating.
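For reference, here is what "check for expected columns before using them" can look like in practice. A minimal sketch using only the standard library; the required column names are placeholders from the spec example, so substitute your own.

```python
import csv

# Placeholder column names from the example spec; replace with yours
REQUIRED_COLUMNS = {'customer_id', 'purchase_date', 'amount'}

def validate_columns(path: str) -> None:
    """Fail fast, with a clear message, if expected columns are missing."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path} is empty: no header row found")
        missing = REQUIRED_COLUMNS - set(header)
        if missing:
            raise ValueError(f"{path} is missing required columns: {sorted(missing)}")
```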
### Part B: Test with Real Data
Get or create sample CSV data that matches your specification's expected format.
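If you don't have real data handy, a few lines of Python can produce a plausible fixture. The columns below are the hypothetical ones from the earlier spec example; swap in your own.

```python
import csv

# Hypothetical sample rows matching the example spec's columns
rows = [
    {'customer_id': 'C001', 'purchase_date': '2024-01-05', 'amount': '19.99'},
    {'customer_id': 'C002', 'purchase_date': '2024-01-06', 'amount': '5.00'},
    {'customer_id': 'C001', 'purchase_date': '2024-01-09', 'amount': '42.50'},
]

with open('test_data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['customer_id', 'purchase_date', 'amount'])
    writer.writeheader()
    writer.writerows(rows)
```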
Run the generated code:
```bash
# Save AI-generated code to analysis.py
# Create test_data.csv with sample data
# Run it
python analysis.py test_data.csv
```
What happens?
- ✓ Success: Output matches specification → Great! Move to Part C
- ✗ Syntax Error: Code won't even parse
- ✗ Runtime Error: Code runs but crashes (KeyError, TypeError, etc.)
- ✗ Logic Error: Code runs, output is wrong or incomplete
### Part C: Recover from Errors
This is where error recovery becomes visible.
If you got a syntax error:
Show AI the error:
"Here's my code:
[show problematic section]
Error: [paste error message]
What's wrong and how do I fix it?"
AI explains and provides corrected code.
If you got a runtime error:
Show AI the error:
"The code crashed with:
[error message and traceback]
What does this error mean?
What assumption did the code make that's wrong?
How should I fix it to handle real data?"
If output is wrong/incomplete:
"My spec requires [required output].
My code produces [what it actually produces].
What's missing? How should the code be changed to match the spec?"
### Part D: Iterate Until Convergence
Keep improving until:
- ✓ Code runs without errors
- ✓ Output matches your specification exactly
- ✓ Edge cases are handled (test with malformed data)
- ✓ Execution completes within the time limit
Test with multiple scenarios:
```bash
# Test 1: Clean data (happy path)
python analysis.py clean_data.csv

# Test 2: Missing columns
python analysis.py missing_columns.csv

# Test 3: Non-numeric values where numeric expected
python analysis.py malformed_data.csv

# Test 4: Empty file
python analysis.py empty.csv

# Test 5: Large file (check performance)
python analysis.py large_data.csv
```
For each test, document:
- Did it run without error? (Yes/No)
- Does output match spec format? (Yes/No)
- Are edge cases handled gracefully? (Yes/No)
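Rather than hand-editing fixture files, you can generate the edge cases too. A sketch: the file names match the test commands above, and the columns are the hypothetical spec's.

```python
import csv

# Truly empty: no header, no rows
with open('empty.csv', 'w') as f:
    pass

# Header present, but 'amount' deliberately absent
with open('missing_columns.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['customer_id', 'purchase_date'])
    writer.writerow(['C001', '2024-01-05'])

# Non-numeric value where a number is expected
with open('malformed_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['customer_id', 'purchase_date', 'amount'])
    writer.writerow(['C001', '2024-01-05', 'not-a-number'])
```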
## Step 4: Build the Iteration Loop (The Skill Automating the Pattern)
Now that you've manually gone through the loop, you're going to build a skill that does this automatically.
Your skill needs these components:
```python
def build_analysis_skill():
    """
    The full script-execution skill that orchestrates:
    1. Generate code from spec
    2. Execute the code
    3. Check for errors
    4. Generate fixes if needed
    5. Iterate until convergence
    """

    # Component 1: Code Generation
    def generate_code(specification: dict) -> str:
        """Generate Python code from the specification using AI."""
        # Prompt AI with: "Given this spec: [spec],
        # write complete Python code that implements it"
        # Return the generated code
        pass

    # Component 2: Code Execution
    def execute_code(code: str, input_file: str,
                     timeout: int = 30) -> tuple[bool, str, str]:
        """Execute code; return (success, output, error_message)."""
        # Run the code with subprocess
        # Capture stdout and stderr
        # Return results with timeout protection
        pass

    # Component 3: Error Analysis
    def analyze_error(error_message: str, code: str) -> str:
        """Understand what went wrong."""
        # Parse the error type (SyntaxError, RuntimeError, etc.)
        # Extract the problematic line
        # Return a clear analysis of the issue
        pass

    # Component 4: Fix Generation
    def generate_fix(error_analysis: str, code: str, spec: dict) -> str:
        """Generate corrected code."""
        # Prompt AI: "This code failed with: [error]
        # Here's the problem: [analysis]
        # The spec is: [spec]
        # Generate corrected code that fixes this"
        pass

    # Component 5: Convergence Check
    def check_convergence(output: str, spec: dict) -> bool:
        """Does the output satisfy the specification?"""
        # Validate: all required fields present
        # Validate: output format correct
        # Validate: no error messages in output
        # Return True if the spec is satisfied
        pass

    # Component 6: Main Iteration Loop
    def execute_skill(specification: dict, input_file: str) -> str:
        """Main skill that orchestrates everything."""
        max_iterations = 5
        iteration = 0
        code = None

        while iteration < max_iterations:
            iteration += 1

            if iteration == 1:
                # First iteration: generate from the spec
                code = generate_code(specification)

            # Execute the code
            success, output, error = execute_code(code, input_file)

            if success and check_convergence(output, specification):
                # ✓ Converged! Specification is satisfied
                return output

            if not success:
                # ✗ Error occurred
                analysis = analyze_error(error, code)
                code = generate_fix(analysis, code, specification)
                # Loop continues; retry with the fixed code
            else:
                # ✗ Code ran, but the output doesn't match the spec
                fix_request = (
                    f"Output is incomplete: {output}. "
                    f"Required by spec: {specification}. "
                    f"Generate code that adds the missing parts."
                )
                code = generate_fix(fix_request, code, specification)
                # Loop continues; retry with the improved code

        # If we get here, max iterations were reached without converging
        raise RuntimeError(f"Failed to converge after {max_iterations} iterations")

    return execute_skill
```
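To make Component 3 concrete, here is one possible `analyze_error` implementation that classifies a captured traceback with plain string inspection. It is a sketch, not the only approach; a production version might use the `traceback` module or feed the raw stderr to AI directly.

```python
def analyze_error(error_message: str, code: str) -> str:
    """Classify a captured stderr traceback and point at the failing line.

    A minimal sketch: it only inspects the error text, not the code itself.
    """
    # The exception type is usually on the last non-empty line,
    # e.g. "KeyError: 'amount'"
    lines = [ln for ln in error_message.strip().splitlines() if ln.strip()]
    last = lines[-1] if lines else 'UnknownError'
    error_type = last.split(':', 1)[0].strip()

    # Tracebacks name the failing location as:  File "...", line N, in ...
    failing_line = next(
        (ln.strip() for ln in reversed(lines) if ln.strip().startswith('File "')),
        'location unknown',
    )

    hints = {
        'SyntaxError': 'The generated code does not parse; regenerate the block.',
        'KeyError': 'The code assumed a column or key that the data lacks.',
        'TypeError': 'A value had an unexpected type, e.g. text where a number was expected.',
        'ValueError': 'A conversion failed, often non-numeric text in a numeric field.',
    }
    hint = hints.get(error_type, 'Read the message and the failing line together.')
    return f"{error_type} at {failing_line}: {last}. Hint: {hint}"
```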
## Step 5: Implementation Guidance with AI

You're going to build this skill with AI's help, while testing and validating each component yourself.
### Get AI Help Building the Iteration Loop
I'm building a Python skill that generates code, executes it, and iterates
until a specification is satisfied.
Here's my specification:
[PASTE YOUR SPEC]
Here's my first attempt at code generation and execution:
[PASTE YOUR MANUAL CODE FROM STEP 3-4]
Now I need to build an automated loop that:
1. Generates code once (given spec)
2. Executes code (capture output/errors, 30-second timeout)
3. If error: analyze error, prompt you to generate fixed code
4. If output doesn't match spec: prompt you to improve code
5. Check convergence (spec is satisfied) → Stop
6. Repeat until convergence or 5 iterations max
Show me how to structure this as a Python class/functions.
Include error handling, timeout protection, and convergence checking.
### Build Convergence Validation
This is critical. Your skill must STOP when the specification is satisfied.
```python
import json

def convergence_check(output: str, specification: dict) -> dict:
    """
    Validate whether output satisfies the specification.

    Returns: {
        'converged': bool,
        'missing': [list of unsatisfied requirements],
        'issues': [any problems found]
    }
    """
    results = {
        'converged': True,
        'missing': [],
        'issues': []
    }

    # Check all required fields are present (a naive substring check)
    for field in specification.get('output', {}).get('required_fields', []):
        if field not in output:
            results['missing'].append(f"Field missing: {field}")
            results['converged'] = False

    # Check output format (if JSON specified)
    if specification.get('output', {}).get('format') == 'JSON':
        try:
            json.loads(output)
        except json.JSONDecodeError:
            results['issues'].append("Output is not valid JSON")
            results['converged'] = False

    # Add domain-specific validation based on your spec
    # Example: if analyzing customers, verify segments exist
    if 'required_segments' in specification:
        for segment in specification['required_segments']:
            if segment not in output:
                results['missing'].append(f"Segment missing: {segment}")
                results['converged'] = False

    return results
```
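A quick usage example, assuming the dict-style spec shown earlier (the field values here are made up):

```python
# Hypothetical output from a generated script, checked against a minimal spec
spec = {'output': {'format': 'JSON', 'required_fields': ['customer_id', 'total_amount']}}
output = '{"customer_id": "C001", "total_amount": 62.49}'

result = convergence_check(output, spec)
print(result['converged'])  # True -> stop iterating
print(result['missing'])    # []   -> nothing left to fix
```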
### Add Timeout and Resource Protection
```python
import subprocess

def execute_code_safely(code: str, input_file: str,
                        timeout: int = 30) -> tuple[bool, str, str]:
    """
    Execute Python code with timeout and error capture.
    Returns: (success: bool, output: str, error: str)
    """
    # Write the code to a temporary file
    with open('_temp_analysis.py', 'w') as f:
        f.write(code)

    try:
        # Run with a timeout
        result = subprocess.run(
            ['python', '_temp_analysis.py', input_file],
            capture_output=True,
            text=True,
            timeout=timeout
        )
        if result.returncode == 0:
            # Success
            return (True, result.stdout, '')
        else:
            # Execution failed
            return (False, result.stdout, result.stderr)
    except subprocess.TimeoutExpired:
        return (False, '', f'TimeoutError: Execution exceeded {timeout} seconds')
    except Exception as e:
        return (False, '', f'ExecutionError: {e}')
```
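Calling it looks like this. The one-line script is a stand-in for AI-generated code; note that the sketch above leaves `_temp_analysis.py` on disk, so clean it up when you're done.

```python
import os

# Stand-in for AI-generated code
code = 'import sys; print(f"processing {sys.argv[1]}")'

success, stdout, stderr = execute_code_safely(code, 'test_data.csv', timeout=30)
if success:
    print('Output:', stdout)
else:
    print('Failed:', stderr)  # feed this back into fix generation

# Remove the temporary file the sketch leaves behind
os.remove('_temp_analysis.py')
```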
## Step 6: Test Your Skill Against Edge Cases
Your skill should handle:
### Test 1: Clean Data (Happy Path)

```python
skill = ScriptExecutionSkill(
    specification=your_spec,
    input_file='clean_data.csv'
)
result = skill.execute()

assert result is not None
assert 'error' not in result.lower()
```
Expected: Succeeds on first iteration
### Test 2: Malformed Data (Edge Case)

```python
# CSV with missing columns, non-numeric values, etc.
result = skill.execute(input_file='malformed_data.csv')

# The skill should detect the error, fix the code, and retry
assert 'error' not in result.lower()  # After recovery, output is still valid
```
Expected: Skill generates fix after detecting error
### Test 3: Empty File (Non-Recoverable)

```python
result = skill.execute(input_file='empty.csv')

# This SHOULD fail (non-recoverable)
assert result is None or 'error' in result.lower()
```
Expected: Skill recognizes this is non-recoverable, stops gracefully
### Test 4: Timeout Scenario

```python
# A spec with large data processing that might time out
result = skill.execute(input_file='large_data.csv', timeout=5)

# The skill should time out gracefully, not hang
# (check for None first so .lower() is never called on None)
assert result is None or 'timeout' in result.lower()
```
Expected: Skill times out, reports clearly
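To run all four scenarios in one pass, a small harness helps. This sketch assumes the `ScriptExecutionSkill` interface used in the tests above and the `RuntimeError` raised by the iteration loop in Step 4.

```python
# Scenario files match the fixtures generated earlier
scenarios = [
    ('clean_data.csv', 'should pass on first iteration'),
    ('malformed_data.csv', 'should recover via fix generation'),
    ('empty.csv', 'should fail gracefully (non-recoverable)'),
    ('large_data.csv', 'should finish or time out cleanly'),
]

for input_file, expectation in scenarios:
    try:
        result = skill.execute(input_file=input_file)
        status = 'ok' if result else 'no result'
    except RuntimeError as e:  # raised when iterations are exhausted
        status = f'did not converge: {e}'
    print(f"{input_file}: {status} ({expectation})")
```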
## Step 7: Document Your Skill
Real skills are documented for others to use.
````markdown
# CSV Analysis Skill

## Purpose
[What problem does this solve?]

## Usage
```python
from my_skill import ScriptExecutionSkill

skill = ScriptExecutionSkill(
    specification={
        'input': ['customers.csv'],
        'output': {'format': 'JSON', 'required_fields': [...]},
        'success_criteria': [...]
    },
    input_file='customers.csv'
)
result = skill.execute()
print(result)
```

## How It Works
- Specification defines what the code must do
- Skill generates Python code from the spec
- Code executes against the input file
- Errors trigger automatic fix generation
- Iteration continues until the spec is satisfied or max retries are reached

## Success Metrics
- Execution time: < 30 seconds
- Convergence rate: 95%+ (passes with clean data)
- Edge case handling: gracefully recovers or fails clearly

## Known Limitations
- [What doesn't it handle?]
- [When should you use something else?]
````
---
## Try With AI
Now you'll refine your skill with AI collaboration, focused on error recovery and robustness.
### Prompt 1: Design Error Recovery Patterns
I've built a skill that generates Python code from specifications and executes it. It encounters three types of errors:
- Syntax errors (code won't parse)
- Runtime errors (code crashes during execution)
- Logic errors (code runs but output is wrong)
For each error type, help me design the recovery strategy:
Syntax errors:
- How should I prompt you to generate fixed code?
- What context should I provide?
Runtime errors:
- How should I parse the error message?
- What information helps you generate a better fix?
Logic errors:
- How do I detect these (they don't produce error messages)?
- How should I describe the problem to you?
Show me the exact prompts I should use for each type.
**What you're learning**: How to design prompts that help AI generate fixes, not just re-suggest the same broken code.
### Prompt 2: Implement Convergence Testing
My specification requires these success criteria: [PASTE YOUR CRITERIA FROM YOUR SPEC]
I need a function that validates whether code output satisfies these criteria.
For each criterion, what should the validation check?
- How do I verify the output format is correct?
- How do I verify all required fields are present?
- How do I detect if the output is incomplete or wrong?
Show me a Python function that validates all criteria and returns which ones passed, which ones failed, and what's missing.
**What you're learning**: How to translate specification requirements into automated validation that tells you exactly when to stop iterating.
### Prompt 3: Test Your Skill with Intentional Failures
I want to test my skill's error recovery. Help me design test cases:
Test Case 1: Missing column
- Create CSV data where a required column is missing
- Show me what error the generated code will produce
- What should my skill do to recover?
Test Case 2: Wrong data type
- Create data where a numeric column contains text
- Show the error this produces
- How should the skill fix this?
Test Case 3: Timeout scenario
- What operation would cause a timeout?
- How should my skill handle timeouts gracefully?
For each test case, show me:
- The test data
- The error produced
- How my skill should recover
**What you're learning**: Testing is not about success cases—it's about understanding how your skill behaves when things break.
### Prompt 4: Validate Convergence Against Diverse Inputs
My skill has processed the following test scenarios:
- Test 1 - Clean data: PASSED
- Test 2 - Missing column: RECOVERED (3 iterations)
- Test 3 - Empty file: FAILED (non-recoverable)
- Test 4 - Malformed values: RECOVERED (2 iterations)
Based on these results:
- Is my skill ready for production?
- What patterns suggest robustness?
- What edge cases might still break it?
- What should I test next?
Help me evaluate the skill's readiness.
**What you're learning**: Testing isn't a binary pass/fail. It's about understanding your skill's behavior patterns and building confidence in its robustness.
---
## Success Criteria
Your skill is complete when:
✓ **Specification is clear and complete** — AI can generate code from it without asking questions
✓ **Code executes successfully on clean data** — Happy path works
✓ **Error recovery works** — Syntax and runtime errors trigger fixes
✓ **Convergence is detected** — Skill stops when spec is satisfied
✓ **Edge cases are handled** — Tested with malformed, empty, large data
✓ **Iteration limits work** — Skill stops after 5 attempts or timeout
✓ **Skill is documented** — Someone else could use it
Your skill will become a reusable component in Lesson 7 (orchestration) when you combine it with MCP-wrapping skills to create complete workflows.
---
**Takeaway**: You didn't just learn the write-execute-analyze loop—you built a skill that automates it. You discovered that error recovery isn't magic; it's specification clarity + intelligent prompting + convergence validation. In Lesson 7, you'll orchestrate this skill with MCP-wrapping skills to build complex workflows that combine code execution with external tools.