Capstone: Tax Season Prep

Six tools, five lessons of verification habits, one question: can you orchestrate them into a workflow that runs every year?

You have a library of Unix-styled Python commands in ~/tools, a verification habit that won't let you submit numbers you haven't proved, and 12 months of bank CSVs sitting in ~/finances/2025/. Now put them together.

No bank CSV yet? Start here.

Ask Claude Code to generate realistic test data before doing anything else:

Generate a bank statement CSV with 20 transactions.
Include: CVS PHARMACY ($45.67), WALGREENS ($23.45), DR MARTINEZ MEDICAL ($150.00),
DR PEPPER SNAPPLE ($4.99), UNITED WAY DONATION ($100.00), OFFICE DEPOT ($89.50),
CVSMITH CONSULTING ($200.00), and 13 random transactions.
Use columns: Date, Description, Amount (negative for debits).
Save as ~/finances/test-2025.csv.

Calculate expected totals by hand BEFORE running anything:

  • Medical (CVS + WALGREENS + DR MARTINEZ): $219.12
  • Charitable (UNITED WAY): $100.00
  • Business (OFFICE DEPOT): $89.50
  • POTENTIAL DEDUCTIONS: $408.62

Those hand-calculated numbers are your verification baseline.
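If you want a second opinion on your hand arithmetic, the short sketch below recomputes the baseline with Python's `decimal` module. The amounts are copied from the test-data prompt above; the variable names are illustrative and not part of any tool in ~/tools.

```python
from decimal import Decimal

# Amounts copied from the test-data prompt (deductible items only).
medical = [Decimal("45.67"), Decimal("23.45"), Decimal("150.00")]
charitable = [Decimal("100.00")]
business = [Decimal("89.50")]

# Decimal avoids binary floating-point rounding in the baseline itself.
baseline = {
    "Medical": sum(medical, Decimal("0")),
    "Charitable": sum(charitable, Decimal("0")),
    "Business": sum(business, Decimal("0")),
}
baseline_total = sum(baseline.values(), Decimal("0"))

for name, value in baseline.items():
    print(f"{name}: ${value}")
print(f"POTENTIAL DEDUCTIONS: ${baseline_total}")
# Prints 219.12, 100.00, 89.50 and a total of 408.62
```

DR PEPPER SNAPPLE and CVSMITH CONSULTING are deliberately left out: they are the false positives the tool must reject.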

Step 1: Take Inventory

You have a library of tools in ~/tools:

| Tool | What It Does | Built In |
| --- | --- | --- |
| sum.py | Sums decimal numbers from stdin | Lesson 1 |
| sum-expenses.py | Extracts and sums the Amount column from bank CSVs | Lesson 3 |
| extract-column.py | Pulls one column from any CSV | Lesson 4 |
| filter.py | Keeps numbers matching a condition | Lesson 4 |
| stats.py | Prints sum, count, average, min, max | Lesson 4 |
| tax-categorize.py | Categorizes transactions, prints subtotals by category | Lesson 5 |

What's missing: a tax-prep command that adds a POTENTIAL DEDUCTIONS total and runs from any folder without typing python3 ~/tools/....

Step 2: Build tax-prep

Open Claude Code:

You: I have tax-categorize.py in ~/tools. It categorizes bank transactions and
prints subtotals by Medical, Charitable, and Business categories.

I want a tax-prep command I can run like this:
cat ~/finances/2025/*.csv | tax-prep

It should do everything tax-categorize.py does, plus print a
POTENTIAL DEDUCTIONS total at the end (the sum of all category subtotals).

Claude Code: I'll update tax-categorize.py to include the final total,
then install it as a permanent tax-prep command.

[Updates: ~/tools/tax-categorize.py — adds POTENTIAL DEDUCTIONS sum]

#!/usr/bin/env python3
import sys, csv, re

# Regex patterns per tax category; \b keeps matches on whole words.
CATEGORIES = {
    'medical': [r'\bCVS\b', r'\bWALGREENS\b', r'\bPHARMACY\b', r'\bMEDICAL\b'],
    'charitable': [r'\bDONATION\b', r'\bRED CROSS\b', r'\bUNITED WAY\b'],
    'business': [r'\bOFFICE DEPOT\b', r'\bSTAPLES\b', r'\bSUBSCRIPTION\b'],
}
# Descriptions that look deductible but are not; checked before any category.
FALSE_POSITIVES = [r'\bDR\.?\s*PEPPER\b', r'\bCVSMITH\b']

def categorize(description):
    """Return the category name for a description, or None if nothing matches."""
    desc_upper = description.upper()
    for fp_pattern in FALSE_POSITIVES:
        if re.search(fp_pattern, desc_upper, re.IGNORECASE):
            return None
    for category, patterns in CATEGORIES.items():
        for pattern in patterns:
            if re.search(pattern, desc_upper, re.IGNORECASE):
                return category
    return None

reader = csv.DictReader(sys.stdin)
totals = {}
items = {}
for row in reader:
    amount = float(row['Amount'].replace('$', '').replace(',', ''))
    if amount >= 0:  # Skip credits/refunds; only process debits
        continue
    cat = categorize(row['Description'])
    if cat:
        expense = abs(amount)
        totals.setdefault(cat, 0.0)
        items.setdefault(cat, [])
        totals[cat] += expense
        items[cat].append((row['Description'], expense))

# Print every categorized transaction before the totals.
for cat, entries in items.items():
    for desc, amount in entries:
        print(f"{cat.upper()}: {desc}: ${amount:.2f}")

print("\n--- TOTALS ---")
potential_deductions = 0.0
for cat, total in totals.items():
    print(f"{cat.title()}: ${total:.2f}")
    potential_deductions += total

print(f"\nPOTENTIAL DEDUCTIONS: ${potential_deductions:.2f}")

[Runs: chmod +x ~/tools/tax-categorize.py]
[Checks: echo $SHELL → /bin/zsh]
[Appends to ~/.zshrc:]
alias tax-prep='python3 ~/tools/tax-categorize.py'
[Runs: source ~/.zshrc]

Done. Try it:
cat ~/finances/test-2025.csv | tax-prep

Step 3: Verify Before Touching Real Data

Run it on your test data (or the generated test file from the tip above):

cat ~/finances/test-2025.csv | tax-prep

Expected output:

MEDICAL: CVS/PHARMACY #1234: $45.67
MEDICAL: WALGREENS #5678: $23.45
MEDICAL: DR MARTINEZ MEDICAL: $150.00
CHARITABLE: UNITED WAY: $100.00
BUSINESS: OFFICE DEPOT #901: $89.50

--- TOTALS ---
Medical: $219.12
Charitable: $100.00
Business: $89.50

POTENTIAL DEDUCTIONS: $408.62

DR PEPPER SNAPPLE and CVSMITH CONSULTING are absent. The totals match your hand calculations. Now you can trust it on real data.
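Comparing the report to your baseline by eye works for five lines. For longer reports, the same check can be scripted. The sketch below is one possible approach, not part of the lesson's toolset: `parse_totals` and `check_report` are hypothetical helpers, and the parser assumes the report layout shown above (a `--- TOTALS ---` marker followed by `Label: $amount` lines).

```python
import re
from decimal import Decimal


def parse_totals(report):
    """Return {label: Decimal} for each 'Label: $amount' line after '--- TOTALS ---'."""
    totals = {}
    in_totals = False
    for line in report.splitlines():
        line = line.strip()
        if line == "--- TOTALS ---":
            in_totals = True
            continue
        match = re.fullmatch(r"(.+?): \$([0-9]+\.[0-9]{2})", line)
        if in_totals and match:
            totals[match.group(1)] = Decimal(match.group(2))
    return totals


def check_report(report, expected, forbidden):
    """Return a list of problems; an empty list means the report matches the baseline."""
    problems = []
    actual = parse_totals(report)
    for label, amount in expected.items():
        if actual.get(label) != amount:
            problems.append(f"{label}: expected {amount}, got {actual.get(label)}")
    for name in forbidden:
        if name in report:
            problems.append(f"false positive present: {name}")
    return problems


# Hand-calculated baseline and the two known false positives.
EXPECTED = {
    "Medical": Decimal("219.12"),
    "Charitable": Decimal("100.00"),
    "Business": Decimal("89.50"),
    "POTENTIAL DEDUCTIONS": Decimal("408.62"),
}
FORBIDDEN = ["DR PEPPER", "CVSMITH"]

SAMPLE_REPORT = """MEDICAL: CVS/PHARMACY #1234: $45.67
MEDICAL: WALGREENS #5678: $23.45
MEDICAL: DR MARTINEZ MEDICAL: $150.00
CHARITABLE: UNITED WAY: $100.00
BUSINESS: OFFICE DEPOT #901: $89.50

--- TOTALS ---
Medical: $219.12
Charitable: $100.00
Business: $89.50

POTENTIAL DEDUCTIONS: $408.62
"""

print(check_report(SAMPLE_REPORT, EXPECTED, FORBIDDEN))  # prints []
```

To use it on real output, you could redirect the report to a file (for example `cat ~/finances/test-2025.csv | tax-prep > report.txt`) and pass the file's text to `check_report`.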

Checkpoint: Prove tax-prep Is Permanent
  1. Close your terminal completely
  2. Open a brand new terminal
  3. Navigate to any folder: cd ~/Desktop
  4. Run: cat ~/finances/test-2025.csv | tax-prep

If you see the report, your command is installed. If you see "command not found", check your ~/.zshrc alias.

Step 4: Process a Full Year

Your bank exports one CSV per month. By year's end, you'll have twelve files. If you cat *.csv to combine them, every file's header row (Date,Description,Amount) ends up mixed into the data. Your script hits eleven extra header rows where it expects numbers.
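To see the problem in miniature, the sketch below concatenates two tiny in-memory "files" the way cat would and reports which rows fail to parse as numbers. The sample rows and the `find_bad_rows` helper are made up for illustration.

```python
import csv
import io

# Two hypothetical monthly exports, each with its own header row.
january = "Date,Description,Amount\n2025-01-05,CVS PHARMACY,-45.67\n"
february = "Date,Description,Amount\n2025-02-03,WALGREENS,-23.45\n"

# Equivalent to `cat january.csv february.csv`: two headers in one stream.
combined = january + february


def find_bad_rows(text):
    """Return (line_number, raw_amount) for every row whose Amount is not a number."""
    bad = []
    # The first line is consumed as the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            float(row["Amount"])
        except ValueError:
            bad.append((line_no, row["Amount"]))
    return bad


print(find_bad_rows(combined))  # prints [(3, 'Amount')]
```

The second header is read as a data row, and its Amount field is the literal text "Amount". A script that calls float() on that field without a guard stops with a ValueError.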

The fix uses two commands you already know from the File Processing chapter:

# Header from first file only
head -1 ~/finances/2025/january.csv > ~/finances/combined-2025.csv

# Data rows from ALL files (skip each file's header)
tail -n +2 -q ~/finances/2025/*.csv >> ~/finances/combined-2025.csv

# Now process the clean combined file
cat ~/finances/combined-2025.csv | tax-prep

| Command | What It Does |
| --- | --- |
| head -1 | First line only (the header row) |
| tail -n +2 | Everything from line 2 onward (skips header) |
| -q | Quiet mode: no filename prefixes in output |
| >> | Append (don't overwrite) |

Result: one file, one header row, all data rows.
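The same header-once logic can be expressed in Python, which may help if you ever need it inside a script rather than at the shell. This is a sketch under the assumption that every file starts with the same header row; `combine_csv_texts` is a hypothetical helper, not one of the ~/tools commands.

```python
def combine_csv_texts(texts):
    """Join CSV texts, keeping the header from the first non-empty one only."""
    combined = []
    header = None
    for text in texts:
        lines = text.splitlines()
        if not lines:
            continue  # skip empty files
        if header is None:
            header = lines[0]      # like `head -1` on the first file
            combined.append(header)
        combined.extend(lines[1:])  # like `tail -n +2` on every file
    return "\n".join(combined) + "\n"


# Hypothetical monthly exports; the second one lacks a trailing newline.
january = "Date,Description,Amount\n2025-01-05,CVS PHARMACY,-45.67\n"
february = "Date,Description,Amount\n2025-02-03,WALGREENS,-23.45"

print(combine_csv_texts([january, february]), end="")
```

Because the function works on split lines, it does not depend on each file ending with a newline.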

Or skip the intermediate file entirely:

# Direct pipeline — no temp file needed
cat ~/finances/2025/*.csv | grep -v "^Date" | \
{ echo "Date,Description,Amount"; cat; } | tax-prep

The command from the README works exactly as promised.

What Just Happened?

Remember the Seven Principles chapter? You just used all seven in one workflow, without a checklist and without thinking about it. That is the point. Principles are not rules you consult. They are habits you act on.

| Principle | Where It Appeared |
| --- | --- |
| Bash is the Key | cat, head, tail, pipes orchestrated all data flow |
| Code as Universal Interface | Python scripts executed computation: no hallucinated math |
| Verification as Core Step | Test data with hand-calculated totals BEFORE real files |
| Small, Reversible Decomposition | Composable single-purpose tools (Lesson 4), each testable independently |
| Persisting State in Files | Scripts in ~/tools, report saved to a file |
| Constraints and Safety | False positive guards prevented miscategorized deductions |
| Observability | Every transaction printed before the totals section |

When Things Break: Quick Diagnostic Chain

Six months from now, something will stop working. Maybe you updated your shell, maybe Python changed versions, maybe you moved to a new machine. Here's what to check:

# 1. Does the alias exist?
alias tax-prep
# If "not found" → re-add to ~/.zshrc (or ~/.bashrc), then source

# 2. Does the script exist where the alias points?
ls -la ~/tools/tax-categorize.py
# If "not found" → script was moved or deleted

# 3. Can the script run?
python3 ~/tools/tax-categorize.py <<< "Date,Description,Amount"
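# If error → python3 is missing from your PATH or the script itself is broken
# (the alias calls python3 directly, so the shebang is not involved)

| Symptom | Check | Fix |
| --- | --- | --- |
| "command not found" | alias tax-prep | Re-add alias to shell config, then source |
| "No such file" | ls ~/tools/tax-categorize.py | Script was moved: update the alias path |
| "Permission denied" | ls -la ~/tools/tax-categorize.py | Restore permissions: chmod u+rx ~/tools/tax-categorize.py |
| Script errors on run | python3 --version | Python version changed: check which python3 is on your PATH (the shebang matters only if you run the script directly) |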

Setup is the agent's job. Diagnosis is yours, because when it breaks at 11pm before a deadline, you need to know the three places to look.
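If you would rather run the three checks in one pass, they can be sketched in Python. The `diagnose` function below is hypothetical. It also makes two simplifying assumptions: the alias is detected by a plain text search of the shell config (Python cannot see shell aliases directly), and the alias line has the form `alias tax-prep='...'`.

```python
import os
import shutil
from pathlib import Path


def diagnose(script_path, rc_path, alias_name="tax-prep"):
    """Return a list of (check, message) findings; an empty list means all checks passed."""
    findings = []
    script = Path(script_path).expanduser()
    rc_file = Path(rc_path).expanduser()

    # 1. Is the alias defined in the shell config? (text search only)
    if not rc_file.is_file() or f"alias {alias_name}=" not in rc_file.read_text():
        findings.append(("alias", f"no 'alias {alias_name}=' line in {rc_file}"))

    # 2. Does the script exist where the alias points, and is it readable?
    if not script.is_file():
        findings.append(("script", f"{script} is missing"))
    elif not os.access(script, os.R_OK):
        findings.append(("script", f"{script} is not readable"))

    # 3. Is a python3 interpreter available on PATH?
    if shutil.which("python3") is None:
        findings.append(("python", "python3 not found on PATH"))

    return findings


if __name__ == "__main__":
    for check, message in diagnose("~/tools/tax-categorize.py", "~/.zshrc"):
        print(f"[{check}] {message}")
```

An empty result does not prove the script is correct. It only rules out the three most common setup failures, so rerun the test file afterwards.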


The Victory

Before this chapter, Bash couldn't add decimals and you had no way to catch silent bugs in agent-generated code. Now you have a library of verified Unix-styled Python commands, a verification habit that applies to any domain, and the instinct to catch the agent's mistakes before they become yours. Tax prep was the exercise. The skill is the workflow.

Challenge: Prove It Transfers (30 Minutes)

You've run the tax prep workflow on financial data. Lesson 5 proved it works on server logs. Now prove you can do it from scratch on a domain neither lesson covered. No walkthrough, just the goal.

Save this as ~/grades/midterm-2025.csv:

Student,Assignment,Score,Max_Points,Category
Alice,Homework 1,85,100,homework
Bob,Homework 1,92,100,homework
Alice,Quiz 1,18,20,quiz
Bob,Quiz 1,15,20,quiz
Charlie,Homework 1,0,100,homework
Alice,Midterm,78,100,exam
Bob,Midterm,88,100,exam
Charlie,Quiz 1,19,20,quiz
DR CHARLES,Homework 1,95,100,homework
Alice,EXTRA CREDIT,5,0,bonus
Charlie,Midterm,72,100,exam
DR CHARLES,Quiz 1,20,20,quiz

Your task:

  1. Calculate each student's weighted average (homework 30%, quizzes 20%, exams 50%)
  2. Handle the edge cases: DR CHARLES is a student named Charles, not a "DR" prefix to filter. EXTRA CREDIT has Max_Points=0, a division-by-zero trap. Charlie has a 0/100 homework.
  3. Flag students with any single score below 60%
  4. Produce a grade report with per-student averages and an AT-RISK section

Hand-calculate first:

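| Student | Homework | Quiz | Exam | Weighted Avg |
| --- | --- | --- | --- | --- |
| Alice | 85% | 90% | 78% | 82.5% |
| Bob | 92% | 75% | 88% | 86.6% |
| Charlie | 0% | 95% | 72% | 55.0% |
| DR CHARLES | 95% | 100% | - | (no exam) |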
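The weighted averages in the table follow from homework × 0.30 + quiz × 0.20 + exam × 0.50. The sketch below only rechecks that arithmetic; it is not a solution to the challenge and deliberately ignores the CSV and its edge cases. Returning None for a student with a missing category is an assumption made here to mirror the "(no exam)" entry, not a rule the challenge specifies.

```python
from decimal import Decimal

# Category weights from the task description.
WEIGHTS = {
    "homework": Decimal("0.30"),
    "quiz": Decimal("0.20"),
    "exam": Decimal("0.50"),
}


def weighted_average(percentages):
    """Return the weighted average, or None if any weighted category is missing."""
    if any(category not in percentages for category in WEIGHTS):
        return None  # assumption: no average without all three categories
    return sum(
        (WEIGHTS[c] * Decimal(str(percentages[c])) for c in WEIGHTS),
        Decimal("0"),
    )


# Per-category percentages from the hand-calculated table.
students = {
    "Alice": {"homework": 85, "quiz": 90, "exam": 78},
    "Bob": {"homework": 92, "quiz": 75, "exam": 88},
    "Charlie": {"homework": 0, "quiz": 95, "exam": 72},
    "DR CHARLES": {"homework": 95, "quiz": 100},
}

for name, scores in students.items():
    average = weighted_average(scores)
    print(name, "(no exam)" if average is None else f"{average:.1f}%")
```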

Edge cases to handle: Charlie has a 0/100 homework; that's your at-risk flag. EXTRA CREDIT has Max_Points=0, so your script crashes or silently produces infinity unless you handle it. DR CHARLES is a student named Charles, not a "DR" prefix to filter.

If your report handles all three edge cases, the pattern transferred. You didn't need bank statements or server logs. You needed the workflow.

Reflection: What You Actually Learned

The agent wrote all the code. You made all the decisions that mattered.

| What It Looked Like | What You Actually Learned |
| --- | --- |
| Building sum.py and decomposing into tools | Designing Unix-style architectures where each piece is independently testable |
| Testing with known data, spotting Dr. Pepper | Trusting nothing until you've verified it, and finding bugs in output that looks correct |
| CSV parsing, redirecting the agent from awk | Redirecting an agent when its first approach fails: your domain knowledge steers the fix |
| Writing the prompts | Specifying outcomes and interfaces: the one contribution the agent cannot make for itself |

The specific tools (Python, regex, find/xargs) will change. The patterns (verify first, compose through pipes, guard against false positives) will not.


Try With AI

Prompt 1: Add a NEEDS REVIEW Section

My tax-prep command categorizes transactions correctly. But some
transactions don't match any category — they're just silently ignored.

Modify it to print a NEEDS REVIEW section at the end listing all
uncategorized transactions with amounts, so I can review them manually.

What you're learning: A director decision disguised as a feature request. You're telling the agent the tool must make its own uncertainty visible rather than silently ignore it. "Print what you couldn't categorize" is not an implementation detail; it's a design principle you imposed. The agent wired the NEEDS REVIEW output; you decided that discarding uncategorized data silently was unacceptable. That call was yours.
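For a sense of what the agent might produce, here is a minimal sketch of the idea: keep the rows that match no category instead of dropping them. The `categorize` stand-in is far simpler than tax-categorize.py, and `partition` and `format_needs_review` are hypothetical names. Your agent's actual change may look different.

```python
import re


def categorize(description):
    """Stand-in categorizer (hypothetical; much simpler than tax-categorize.py)."""
    if re.search(r"\bCVS\b|\bWALGREENS\b", description.upper()):
        return "medical"
    return None


def partition(rows):
    """Split (description, amount) rows into categorized and needs-review lists."""
    categorized, needs_review = [], []
    for description, amount in rows:
        category = categorize(description)
        if category:
            categorized.append((category, description, amount))
        else:
            needs_review.append((description, amount))  # kept, not discarded
    return categorized, needs_review


def format_needs_review(needs_review):
    """Render the NEEDS REVIEW section as text."""
    lines = ["--- NEEDS REVIEW ---"]
    lines += [f"{description}: ${amount:.2f}" for description, amount in needs_review]
    return "\n".join(lines)


rows = [("CVS PHARMACY", 45.67), ("DR PEPPER SNAPPLE", 4.99)]
categorized, needs_review = partition(rows)
print(format_needs_review(needs_review))
```

The design point is in `partition`: the else branch makes the tool's uncertainty visible rather than letting unmatched rows vanish.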

Prompt 2: Add Date Filtering

My tax-prep processes all transactions in the CSV. For quarterly
estimates, I need to filter by date range:

cat finances.csv | tax-prep --start 2025-01-01 --end 2025-03-31

Add date filtering. Keep the stdin reading pattern so it still works
with pipes and cat.

What you're learning: Interface-first directing. Notice the prompt specifies exactly what the command should look like from the outside (tax-prep --start 2025-01-01 --end 2025-03-31) before mentioning implementation. You designed the interface; the agent wired argparse to match it. This is the same move as "reads from stdin and prints the total" in Lesson 1: you specify the contract, the agent writes the code that fulfills it.
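One way the agent could wire that interface is sketched below. It assumes the Date column holds ISO dates (YYYY-MM-DD), which your bank's export may not; `parse_args` and `filter_rows` are hypothetical names, and the sample rows are invented.

```python
import argparse
import csv
import io
from datetime import date


def parse_args(argv):
    """Parse --start/--end as ISO dates; both are optional."""
    parser = argparse.ArgumentParser(prog="tax-prep")
    parser.add_argument("--start", type=date.fromisoformat, default=date.min)
    parser.add_argument("--end", type=date.fromisoformat, default=date.max)
    return parser.parse_args(argv)


def filter_rows(stream, start, end):
    """Yield CSV rows whose Date falls within [start, end], inclusive."""
    for row in csv.DictReader(stream):
        # Assumption: dates are ISO formatted, e.g. 2025-03-31.
        if start <= date.fromisoformat(row["Date"]) <= end:
            yield row


# Hypothetical sample data standing in for stdin.
SAMPLE = (
    "Date,Description,Amount\n"
    "2025-02-14,CVS PHARMACY,-45.67\n"
    "2025-04-02,OFFICE DEPOT,-89.50\n"
)

args = parse_args(["--start", "2025-01-01", "--end", "2025-03-31"])
for row in filter_rows(io.StringIO(SAMPLE), args.start, args.end):
    print(row["Date"], row["Description"], row["Amount"])
```

In the real command, the script would pass `sys.argv[1:]` to `parse_args` and `sys.stdin` to `filter_rows`, which keeps the pipe-friendly contract intact.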

Prompt 3: Transfer to Your Domain

I work with [your domain] data in CSV format. The data has
[describe columns]. I need to categorize it by [your categories]
and flag items that don't cleanly fit.

Apply the verification-first pattern: create test data with known
answers first, verify totals match before processing real files,
then build a permanent command I can reuse.

What you're learning: Full pattern transfer. You're applying the verification-first orchestration to a domain you actually work in. Notice which parts of the pattern carry over unchanged (verify first, flag ambiguous items, make it permanent) and which require domain-specific knowledge (your categories, your false positives).