Chapter 20: Computation & Data Extraction Workflow

"You already command files with Bash. Now command computation with Python: same pipes, same principles, zero syntax memorization."

The file processing chapter gave you power over files: finding them, organizing them, renaming hundreds in one command. You directed an agent through Bash commands (ls, find, mv, cp), and it handled the tedious work while you made the decisions.

Now try adding up the dollar amounts in a bank statement. echo $((14.50 + 23.75)) throws a syntax error, because Bash arithmetic expansion handles integers only. Bash, the tool that moved a thousand files without breaking a sweat, can't add two prices. The foundation has a hard wall: decimal math.

This chapter breaks through that wall. You'll build Python scripts that slot into your Unix toolkit exactly where Bash falls short: reading from stdin, writing to stdout, chaining through pipes. The agent writes the code. You make the decisions. The language changes; the workflow doesn't.
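
As a rough sketch of the stdin-to-stdout shape described above (not the chapter's actual sum.py, which the agent writes in Lesson 1), a decimal-summing tool might look like the following. The names `sum_lines` and `main` are illustrative assumptions, and `Decimal` is used here so that prices add exactly rather than picking up binary floating-point rounding.

```python
import io
import sys
from decimal import Decimal, InvalidOperation


def sum_lines(lines):
    """Return the exact decimal total of every non-blank line."""
    total = Decimal("0")
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            total += Decimal(text)  # raises InvalidOperation on non-numeric text
    return total


def main(stdin, stdout, stderr):
    """Read numbers from stdin, write the total to stdout, return an exit code."""
    try:
        total = sum_lines(stdin)
    except InvalidOperation:
        print("error: input contains a non-numeric line", file=stderr)
        return 1
    print(total, file=stdout)
    return 0


# Demo with simulated streams. An installed script would instead end with:
#     sys.exit(main(sys.stdin, sys.stdout, sys.stderr))
out = io.StringIO()
code = main(io.StringIO("14.50\n23.75\n"), out, sys.stderr)
print(out.getvalue().strip(), code)  # prints: 38.25 0
```

With the installed ending in place, the same two prices that broke Bash could be piped in as printf '14.50\n23.75\n' | python3 sum.py.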

What You'll Learn

A personal library of Unix-styled Python commands in ~/tools:

| Tool | What It Does |
| --- | --- |
| sum.py | Sums decimal numbers from stdin |
| sum-expenses.py | Parses real bank CSVs with quoted fields |
| extract-column.py | Pulls one column from any CSV |
| filter.py | Keeps numbers matching a condition |
| stats.py | Prints sum, count, average, min, max |
| tax-prep | Categorizes and totals deductible expenses |

By the end: cat ~/finances/2025/*.csv | tax-prep produces a categorized report your accountant can use. One command. Every year.
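
To give a feel for the categorizing step, here is a minimal sketch using a few descriptions from the chapter's sample data. The category names and regex patterns are hypothetical, chosen only for illustration; they are not the rules the chapter's tax-prep tool ends up with, and real deductibility rules are a question for your accountant. The word boundaries (`\b`) are the point: they keep CVSMITH CONSULTING from matching CVS and keep DR PEPPER SNAPPLE out of the medical bucket.

```python
import re
from decimal import Decimal

# Hypothetical patterns for illustration only.
CATEGORIES = {
    "medical": re.compile(r"\b(CVS|WALGREENS|PHARMACY|MEDICAL)\b"),
    "charity": re.compile(r"\bDONATION\b"),
}


def categorize(description):
    """Return the first matching category name, or None if nothing matches."""
    for name, pattern in CATEGORIES.items():
        if pattern.search(description):
            return name
    return None


def totals_by_category(rows):
    """Total the absolute amounts of (description, amount) pairs per category."""
    totals = {}
    for description, amount in rows:
        category = categorize(description)
        if category is not None:
            totals[category] = totals.get(category, Decimal("0")) + abs(Decimal(amount))
    return totals


rows = [
    ("CVS/PHARMACY #1234", "-45.67"),
    ("CVSMITH CONSULTING", "-200.00"),   # must NOT count as CVS
    ("DR MARTINEZ MEDICAL", "-150.00"),
    ("DR PEPPER SNAPPLE", "-4.99"),      # must NOT count as medical
    ("UNITED WAY DONATION", "-100.00"),
]
for name, total in sorted(totals_by_category(rows).items()):
    print(name, total)
```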

Why This Matters

Every lesson in this chapter follows the same rhythm from the book's core thesis: you describe what the tool should accept and produce (the first 10%), the agent writes the Python code (the 80%), and you verify against a known correct answer before trusting it (the final 10%). The hand-calculated total of $1,751.29 is not a curiosity; it IS your 10%. You cannot verify agent output without an independent ground truth.

This is Spec-Driven Development applied to computation: your description of what the tool should do is the spec, the agent's code is the implementation, and your test data is the acceptance criteria. You do not write Python from memory. You write precise requirements, and the agent fulfills them.

The Unix-styled tools you build here are not just personal utilities. In Part 3, these computation patterns become skills you install in AI Employees: an Expense Reconciler that runs nightly, a report generator that categorizes transactions automatically, a data pipeline that routes outputs to the right place without you in the loop. Master the pattern here, deploy it as a worker later.

Prerequisites

From the Seven Principles chapter:

  • You understand the Seven Principles conceptually, especially "Bash is the Key" and "Verification as Core Step"

From the file processing chapter:

  • You can navigate directories, run Bash commands, and direct an agent through conversations
  • You've used the pipe operator (|) to chain commands together
  • You've experienced the safety-first pattern: backup → verify → proceed

Technical Requirements:

  • Python 3.x installed (see setup below)
  • Unix-like terminal (macOS, Linux, or WSL on Windows)
  • Access to Claude Code or similar AI assistant

Python Setup: verify Python is installed before starting Lesson 1:

python3 --version

If you see a version number (3.x), you're ready. If not, install Python from python.org or use your system's package manager (brew install python on macOS, sudo apt install python3 on Ubuntu).

About the Claude Code Conversations

The conversations shown in this chapter are illustrative: they show the flow of interaction and the kind of output you should expect. Your actual Claude Code sessions will look different. Focus on the pattern (what you asked for and why), not the exact words the agent used.

Seven Principles (Compact)

| Principle | Chapter 20 Application |
| --- | --- |
| P1 Bash is the Key | Unix pipes connect Python tools to the shell: cat *.csv \| sum-expenses.py \| filter.py |
| P2 Code as Universal Interface | Python scripts read stdin and write stdout, slotting into any pipe chain |
| P3 Verification as Core Step | Exit codes, expected totals, and test CSVs confirm every tool before batch use |
| P4 Small Reversible Decomposition | One tool, one job: sum.py before filter.py before stats.py before tax-prep |
| P5 Persisting State in Files | Scripts live in ~/tools, logs and categorized reports saved for annual reuse |
| P6 Constraints and Safety | Test on sample data first; never mutate source CSVs; validate totals against known sums |
| P7 Observability | Named tools, verbose output flags, and hand-calculated totals make agent work auditable |
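
The "one tool, one job" idea from P2 and P4 can be sketched inside a single Python file, with generator functions standing in for the separate scripts. The function names below (`extract_column`, `keep_below`, `stats`) are assumptions that loosely mirror extract-column.py, filter.py, and stats.py; the chapter's real tools are separate programs joined by shell pipes, and their exact behavior is defined in the lessons, not here.

```python
import csv
from decimal import Decimal


def extract_column(lines, name):
    """Yield one named column from CSV text; the csv module handles quoted fields."""
    for row in csv.DictReader(lines):
        yield row[name]


def keep_below(values, limit):
    """Yield only the numbers strictly below the limit."""
    for value in values:
        number = Decimal(value)
        if number < limit:
            yield number


def stats(numbers):
    """Return sum, count, average, min, and max (None where undefined)."""
    numbers = list(numbers)
    total = sum(numbers, Decimal("0"))
    count = len(numbers)
    return {
        "sum": total,
        "count": count,
        "average": total / count if count else None,
        "min": min(numbers) if count else None,
        "max": max(numbers) if count else None,
    }


sample = [
    "Date,Description,Amount",
    '2025-01-07,"AMAZON, INC.",-89.50',   # comma inside a quoted field
    "2025-01-25,DEPOSIT - PAYROLL,3200.00",
    "2025-01-18,UBER TRIP,-18.75",
]
# Each stage consumes the previous one's output, like a shell pipe.
result = stats(keep_below(extract_column(sample, "Amount"), Decimal("0")))
print(result["sum"], result["count"])  # prints: -108.25 2
```

Because each stage only knows about the stream it receives, any one of them can be swapped or tested alone, which is the property the separate scripts get from stdin and stdout.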

Sample Data

Use this bank statement CSV throughout the chapter. Save it as ~/finances/sample-2025.csv:

Date,Description,Amount
2025-01-02,STARBUCKS #1234,-5.75
2025-01-03,TRADER JOES #567,-87.32
2025-01-04,CVS/PHARMACY #1234,-45.67
2025-01-05,SHELL OIL STATION,-52.10
2025-01-06,NETFLIX SUBSCRIPTION,-15.99
2025-01-07,"AMAZON, INC.",-89.50
2025-01-08,WALGREENS #5678,-23.45
2025-01-09,DR MARTINEZ MEDICAL,-150.00
2025-01-10,WHOLE FOODS MKT,-62.18
2025-01-11,DR PEPPER SNAPPLE,-4.99
2025-01-12,UNITED WAY DONATION,-100.00
2025-01-13,SPOTIFY PREMIUM,-10.99
2025-01-14,OFFICE DEPOT #901,-89.50
2025-01-15,CVSMITH CONSULTING,-200.00
2025-01-16,TARGET STORE #442,-34.56
2025-01-17,RED CROSS DONATION,-50.00
2025-01-18,UBER TRIP,-18.75
2025-01-19,STAPLES #2233,-42.30
2025-01-20,CHEVRON GAS,-48.90
2025-01-21,PHARMACY RX PLUS,-67.80
2025-01-22,APPLE.COM/BILL,-9.99
2025-01-23,COSTCO WHSE #1123,-156.42
2025-01-24,ZOOM VIDEO COMM,-14.99
2025-01-25,DEPOSIT - PAYROLL,3200.00
2025-01-26,ATM WITHDRAWAL,-200.00
2025-01-27,VENMO PAYMENT,-35.00
2025-01-28,GOODWILL DONATION,-75.00
2025-01-29,HULU SUBSCRIPTION,-17.99
2025-01-30,PET SMART #890,-42.15
2025-01-31,INTEREST PAYMENT,2.47

Your hand-calculated expense total (all 28 debits, excluding the two credits): $1,751.29.

How This Chapter Teaches

Every lesson in this chapter uses the PRIMM-AI+ cycle. Use it every time you direct an agent to build or run a computation tool.

| Stage | What You Do |
| --- | --- |
| Predict | Before running anything, write down what you expect the script to output: the numeric total, the exit code, and which lines it will process. Record your confidence score from 1 to 5. |
| Run | Direct the agent with a specific, stdin-focused prompt. The agent writes the code and executes it. You do not type Python. |
| Investigate | Compare the actual output to your prediction. If they differ, write your own one-sentence explanation of why before asking the agent to clarify. |
| Modify | Change one input value or one requirement (a different column name, a different filter threshold). Predict the new result first, then verify. |
| Make | Build a similar tool for a domain unrelated to bank statements, independently. Passing means the tool reads stdin, writes stdout, and produces the correct result on known-answer test data. |

Apply this cycle whether you are summing numbers, parsing CSV, decomposing tools, or categorizing transactions.

Chapter Structure

| Lesson | Title | Duration | Key Skill |
| --- | --- | --- | --- |
| 1 | From Broken Math to Your First Tool | 30 min | Build a Python utility from scratch |
| 2 | The Testing Loop | 25 min | Verify with exit codes and test data |
| 3 | Parsing Real Data | 30 min | Parse CSV, install permanently |
| 4 | One Tool, One Job | 25 min | Decompose into composable Unix tools |
| 5 | Data Wrangling & Domain Transfer | 40 min | Categorize with regex, prove it transfers to server logs |
| 6 | Capstone: Tax Season Prep | 40 min | Generate tax-ready report |

Total Duration: 190 minutes (~3.2 hours)