Chapter 20: Computation & Data Extraction Workflow
"You already command files with Bash. Now command computation with Python: same pipes, same principles, zero syntax memorization."
The file processing chapter gave you power over files: finding them, organizing them, renaming hundreds in one command. You directed an agent through Bash commands such as ls, find, mv, and cp, and it handled the tedious work while you made the decisions.
Now try adding up the dollar amounts in a bank statement. echo $((14.50 + 23.75)) throws a syntax error, because Bash arithmetic is integer-only. Bash, the tool that moved a thousand files without breaking a sweat, can't add two prices. The foundation has a hard wall: decimal math.
This chapter breaks through that wall. You'll build Python scripts that slot into your Unix toolkit exactly where Bash falls short: reading from stdin, writing to stdout, chaining through pipes. The agent writes the code. You make the decisions. The language changes; the workflow doesn't.
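To make the wall and the fix concrete, here is a minimal sketch of the core of a stdin-summing tool. It assumes one number per line and uses Python's decimal module so that prices add exactly. The function name sum_decimals is hypothetical, and the sum.py the agent writes for you may look different.

```python
from decimal import Decimal


def sum_decimals(lines):
    """Add up one decimal number per line, skipping blank lines."""
    total = Decimal("0")
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines, e.g. a trailing newline
        total += Decimal(text)
    return total


# Stand-in for piped input; a real tool would pass sys.stdin here instead.
sample = ["14.50\n", "23.75\n"]
print(sum_decimals(sample))  # 38.25
```

Because the function accepts any iterable of lines, the same logic works on a list in a test and on sys.stdin in a pipe.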
What You'll Learn
A personal library of Unix-styled Python commands in ~/tools:
| Tool | What It Does |
|---|---|
| sum.py | Sums decimal numbers from stdin |
| sum-expenses.py | Parses real bank CSVs with quoted fields |
| extract-column.py | Pulls one column from any CSV |
| filter.py | Keeps numbers matching a condition |
| stats.py | Prints sum, count, average, min, max |
| tax-prep | Categorizes and totals deductible expenses |
By the end: cat ~/finances/2025/*.csv | tax-prep produces a categorized report your accountant can use. One command. Every year.
Why This Matters
Every lesson in this chapter follows the same rhythm from the book's core thesis: you describe what the tool should accept and produce (the first 10%), the agent writes the Python code (the 80%), and you verify against a known correct answer before trusting it (the final 10%). The hand-calculated total of $1,751.29 is not a curiosity; it IS your 10%. You cannot verify agent output without an independent ground truth.
This is Spec-Driven Development applied to computation: your description of what the tool should do is the spec, the agent's code is the implementation, and your test data is the acceptance criteria. You do not write Python from memory. You write precise requirements, and the agent fulfills them.
The Unix-styled tools you build here are not just personal utilities. In Part 3, these computation patterns become skills you install in AI Employees: an Expense Reconciler that runs nightly, a report generator that categorizes transactions automatically, a data pipeline that routes outputs to the right place without you in the loop. Master the pattern here, deploy it as a worker later.
Prerequisites
From the Seven Principles chapter:
- You understand the Seven Principles conceptually, especially "Bash is the Key" and "Verification as Core Step"
From the file processing chapter:
- You can navigate directories, run Bash commands, and direct an agent through conversations
- You've used the pipe operator (|) to chain commands together
- You've experienced the safety-first pattern: backup → verify → proceed
Technical Requirements:
- Python 3.x installed (see setup below)
- Unix-like terminal (macOS, Linux, or WSL on Windows)
- Access to Claude Code or similar AI assistant
Python Setup: verify Python is installed before starting Lesson 1:
python3 --version
If you see a version number (3.x), you're ready. If not, install Python from python.org or use your system's package manager (brew install python on macOS, sudo apt install python3 on Ubuntu).
The conversations shown in this chapter are illustrative: they show the flow of interaction and the kind of output you should expect. Your actual Claude Code sessions will look different. Focus on the pattern (what you asked for and why), not the exact words the agent used.
Seven Principles (Compact)
| Principle | Chapter 20 Application |
|---|---|
| P1 Bash is the Key | Unix pipes connect Python tools to the shell: cat \*.csv \| sum-expenses.py \| filter.py |
| P2 Code as Universal Interface | Python scripts read stdin and write stdout, slotting into any pipe chain |
| P3 Verification as Core Step | Exit codes, expected totals, and test CSVs confirm every tool before batch use |
| P4 Small Reversible Decomposition | One tool, one job: sum.py before filter.py before stats.py before tax-prep |
| P5 Persisting State in Files | Scripts live in ~/tools; logs and categorized reports are saved for annual reuse |
| P6 Constraints and Safety | Test on sample data first; never mutate source CSVs; validate totals against known sums |
| P7 Observability | Named tools, verbose output flags, and hand-calculated totals make agent work auditable |
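The verification principle (P3) leans on exit codes and expected totals. As a rough sketch of what that check can look like, the snippet below runs a tiny stand-in tool as a child process, feeds it text on stdin, and inspects both its exit code and its output. The TOOL source and the run_tool helper are hypothetical illustrations, not the chapter's actual scripts. The convention assumed here is exit code 0 on success and 1 when a line is not a number.

```python
import subprocess
import sys

# Hypothetical stand-in for a tool like sum.py: adds numbers from stdin,
# exits 0 on success and 1 if any line is not a valid number.
TOOL = """
import sys
from decimal import Decimal, InvalidOperation
try:
    print(sum(Decimal(line.strip()) for line in sys.stdin if line.strip()))
except InvalidOperation:
    sys.exit(1)
"""


def run_tool(text):
    """Feed text to the tool's stdin; return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", TOOL],
        input=text,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


print(run_tool("14.50\n23.75\n"))  # (0, '38.25')
print(run_tool("14.50\noops\n"))   # (1, '')
```

Checking the exit code as well as the printed total means a tool that fails on bad input cannot be mistaken for one that succeeded.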
Sample Data
Use this bank statement CSV throughout the chapter. Save it as ~/finances/sample-2025.csv:
Date,Description,Amount
2025-01-02,STARBUCKS #1234,-5.75
2025-01-03,TRADER JOES #567,-87.32
2025-01-04,CVS/PHARMACY #1234,-45.67
2025-01-05,SHELL OIL STATION,-52.10
2025-01-06,NETFLIX SUBSCRIPTION,-15.99
2025-01-07,"AMAZON, INC.",-89.50
2025-01-08,WALGREENS #5678,-23.45
2025-01-09,DR MARTINEZ MEDICAL,-150.00
2025-01-10,WHOLE FOODS MKT,-62.18
2025-01-11,DR PEPPER SNAPPLE,-4.99
2025-01-12,UNITED WAY DONATION,-100.00
2025-01-13,SPOTIFY PREMIUM,-10.99
2025-01-14,OFFICE DEPOT #901,-89.50
2025-01-15,CVSMITH CONSULTING,-200.00
2025-01-16,TARGET STORE #442,-34.56
2025-01-17,RED CROSS DONATION,-50.00
2025-01-18,UBER TRIP,-18.75
2025-01-19,STAPLES #2233,-42.30
2025-01-20,CHEVRON GAS,-48.90
2025-01-21,PHARMACY RX PLUS,-67.80
2025-01-22,APPLE.COM/BILL,-9.99
2025-01-23,COSTCO WHSE #1123,-156.42
2025-01-24,ZOOM VIDEO COMM,-14.99
2025-01-25,DEPOSIT - PAYROLL,3200.00
2025-01-26,ATM WITHDRAWAL,-200.00
2025-01-27,VENMO PAYMENT,-35.00
2025-01-28,GOODWILL DONATION,-75.00
2025-01-29,HULU SUBSCRIPTION,-17.99
2025-01-30,PET SMART #890,-42.15
2025-01-31,INTEREST PAYMENT,2.47
Your hand-calculated expense total (all 28 debits, excluding the two credits): $1,751.29.
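The quoted "AMAZON, INC." row is why a naive split on commas is not enough: that line splits into four pieces instead of three. A minimal sketch of summing only the debits, assuming the column is named Amount as in the sample and using Python's csv module to handle the quoting, might look like this. The function name total_debits is hypothetical, and the excerpt uses four rows from the sample rather than the whole file.

```python
import csv
import io
from decimal import Decimal


def total_debits(stream):
    """Sum the absolute value of every negative Amount in a bank CSV."""
    reader = csv.DictReader(stream)  # handles quoted fields with commas
    total = Decimal("0")
    for row in reader:
        amount = Decimal(row["Amount"])
        if amount < 0:  # credits (deposits, interest) are skipped
            total += -amount
    return total


# Four rows taken from the sample data; a real tool would read sys.stdin.
excerpt = io.StringIO(
    "Date,Description,Amount\n"
    "2025-01-02,STARBUCKS #1234,-5.75\n"
    '2025-01-07,"AMAZON, INC.",-89.50\n'
    "2025-01-25,DEPOSIT - PAYROLL,3200.00\n"
    "2025-01-31,INTEREST PAYMENT,2.47\n"
)
print(total_debits(excerpt))  # 95.25
```

Run against the full sample file, the same logic should reproduce the hand-calculated $1,751.29, which is exactly the comparison you use to decide whether to trust the tool.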
How This Chapter Teaches
Every lesson in this chapter uses the PRIMM-AI+ cycle. Use it every time you direct an agent to build or run a computation tool.
| Stage | What You Do |
|---|---|
| Predict | Before running anything, write down what you expect the script to output: the numeric total, the exit code, and which lines it will process. Record your confidence score from 1 to 5. |
| Run | Direct the agent with a specific, stdin-focused prompt. The agent writes the code and executes it. You do not type Python. |
| Investigate | Compare the actual output to your prediction. If they differ, write your own one-sentence explanation of why before asking the agent to clarify. |
| Modify | Change one input value or one requirement (a different column name, a different filter threshold). Predict the new result first, then verify. |
| Make | Build a similar tool for a domain unrelated to bank statements, independently. Passing means the tool reads stdin, writes stdout, and produces the correct result on known-answer test data. |
Apply this cycle whether you are summing numbers, parsing CSV, decomposing tools, or categorizing transactions.
Chapter Structure
| Lesson | Title | Duration | Key Skill |
|---|---|---|---|
| 1 | From Broken Math to Your First Tool | 30 min | Build a Python utility from scratch |
| 2 | The Testing Loop | 25 min | Verify with exit codes and test data |
| 3 | Parsing Real Data | 30 min | Parse CSV, install permanently |
| 4 | One Tool, One Job | 25 min | Decompose into composable Unix tools |
| 5 | Data Wrangling & Domain Transfer | 40 min | Categorize with regex, prove it transfers to server logs |
| 6 | Capstone: Tax Season Prep | 40 min | Generate tax-ready report |
Total Duration: 190 minutes (~3.2 hours)
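The regex categorization in Lesson 5 has to cope with look-alike merchants that appear in the sample data, such as DR MARTINEZ MEDICAL versus DR PEPPER SNAPPLE, and CVS/PHARMACY versus CVSMITH CONSULTING. The sketch below shows one way word boundaries can keep those apart. The category names, patterns, and the categorize function are assumptions made for illustration, not the chapter's actual rules.

```python
import re

# Hypothetical rules for illustration only. The \b word boundaries stop
# "CVS" from matching inside "CVSMITH".
RULES = [
    ("medical", re.compile(r"\b(CVS|WALGREENS|PHARMACY|MEDICAL)\b")),
    ("charity", re.compile(r"\bDONATION\b")),
]


def categorize(description):
    """Return the first matching category name, or 'uncategorized'."""
    text = description.upper()
    for name, pattern in RULES:
        if pattern.search(text):
            return name
    return "uncategorized"


for merchant in ["CVS/PHARMACY #1234", "CVSMITH CONSULTING",
                 "DR MARTINEZ MEDICAL", "DR PEPPER SNAPPLE"]:
    print(merchant, "->", categorize(merchant))
```

Matching on the keyword MEDICAL rather than the prefix DR is a deliberate choice in this sketch: a prefix rule would sweep the soft-drink purchase into medical expenses.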