Updated Feb 16, 2026

Parsing Real Data

You download your bank statement, point sum-expenses at it, and get a number. It looks reasonable. You almost move on. Then something nags at you — that Amazon order was $89.50, but you can't find it in the output. You scroll through the CSV and spot the problem: "AMAZON, INC." — a comma hiding inside quotes. Your script just silently split that merchant name in half and summed the wrong column.

This is the wall that every real-world data task hits. Clean test data behaves. Real data cheats.

Checkpoint: Use YOUR Bank Data

Download your actual bank statement as CSV. Most banks offer this under "Download Transactions" or "Export" in online banking.

If your bank doesn't offer CSV export, use the test data below instead.

Privacy note: Your bank CSV may contain account numbers. Before processing, you can remove sensitive columns. The scripts only need the Description and Amount columns.
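
If you'd rather strip those columns with a script than by hand, here's a minimal sketch. It assumes your export has columns literally named Description and Amount; the keep-columns.py name and those column names are this example's assumptions, so adjust them to match your bank.

#!/usr/bin/env python3
# keep-columns.py - copy only the Description and Amount columns (column names are assumptions)
import sys
import csv

reader = csv.DictReader(sys.stdin)   # looks columns up by header name
writer = csv.DictWriter(sys.stdout, fieldnames=["Description", "Amount"])
writer.writeheader()
for row in reader:
    writer.writerow({"Description": row["Description"], "Amount": row["Amount"]})

Run it as python keep-columns.py < full-statement.csv > safe-statement.csv, then work from the trimmed file.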

The Problem

Here's what a bank statement CSV looks like:

Date,Description,Amount
2024-01-02,Coffee Shop,-5.50
2024-01-03,Grocery Store,-127.43
2024-01-07,"AMAZON, INC.",-89.50

You need to sum the Amount column. The third column. Simple, right?

Look at line 4 carefully. The description is "AMAZON, INC." -- it contains a comma INSIDE the quoted field. If you split on commas, you'll get the wrong column.

Try the Obvious Approach First

Before asking Claude Code for help, try the obvious solution yourself. The Amount is the third column. awk splits on commas. This should work:

echo '2024-01-07,"AMAZON, INC.",-89.50' | awk -F',' '{print $3}'

Output:

 INC."

That's garbage. Not -89.50. Not even close. awk split on EVERY comma, including the one inside the quoted field. It saw four fields:

  1. 2024-01-07
  2. "AMAZON
  3. INC."
  4. -89.50

Field 3 is INC." — not the amount you wanted. And the worst part? No error. No warning. Just a wrong value that looks plausible enough to miss.
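And awk isn't uniquely at fault. Any naive comma split makes the same mistake; a Python one-liner produces the same four pieces:

# naive splitting reproduces awk's four fields, including the broken ' INC."'
line = '2024-01-07,"AMAZON, INC.",-89.50'
print(line.split(','))   # ['2024-01-07', '"AMAZON', ' INC."', '-89.50']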

Now try the rows WITHOUT quotes:

echo '2024-01-02,Coffee Shop,-5.50' | awk -F',' '{print $3}'

Output:

-5.50

That works. The trap only triggers on quoted fields — the rows where merchant names contain commas. Some rows work, some silently fail. This is the worst kind of bug.
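If you want to know which rows of your own statement hit the trap, one rough check is to compare a naive comma split against a real CSV parse and flag the lines that disagree. A sketch, assuming your download is saved as statement.csv (that filename is this example's assumption):

# flag-risky-rows.py - list lines where naive comma-splitting disagrees with a real CSV parse
# (statement.csv is an assumed filename - point it at your own download)
import csv

with open("statement.csv", newline="") as f:
    for lineno, line in enumerate(f, start=1):
        if not line.strip():
            continue                               # ignore blank lines
        naive = line.rstrip("\r\n").split(",")     # what awk sees
        parsed = next(csv.reader([line]))          # what a real CSV parser sees
        if len(naive) != len(parsed):
            print(f"line {lineno} would be mis-split by awk: {line.strip()}")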

The Conversation

Now bring this problem to Claude Code. Show it what you found:

You: I have a bank statement CSV. I tried using awk to get the Amount
column but it breaks on rows like: 2024-01-07,"AMAZON, INC.",-89.50
The comma inside the quotes throws off the column count. How do I
sum the Amount column correctly?

Watch what happens:

Claude Code: That's the classic CSV quoting trap. awk treats every
comma as a delimiter, even inside quotes. I'll use Python's csv
module which understands quoting rules.

[Creates: sum-expenses.py]

#!/usr/bin/env python3
# sum-expenses.py - Sum the Amount column from bank statement CSV
import sys
import csv

total = 0
reader = csv.reader(sys.stdin)
next(reader) # Skip header row

for row in reader:
    amount_str = row[2]  # Amount is 3rd column (index 2)
    amount = float(amount_str.replace('$', '').replace(',', ''))
    total += abs(amount)

print(f"Total: ${total:.2f}")

[Creates: test_bank.csv]
Date,Description,Amount
2024-01-02,Coffee Shop,-5.50
2024-01-03,Grocery Store,-127.43
2024-01-07,"AMAZON, INC.",-89.50

[Runs: cat test_bank.csv | python sum-expenses.py]

Total: $222.43

Notice the collaboration. YOU discovered the awk failure by running it yourself. You brought that specific failure to the agent, and that detail told it exactly what problem to solve. The agent knew about Python's csv module — something you might not have found on your own. You taught the agent about your data. The agent taught you about the right tool.

Python's csv module understands quoting rules. It knows that commas inside quotes don't count as separators. The module handles:

  • Quoted fields with commas inside
  • Escaped quotes within quoted fields
  • Different line ending styles
  • Edge cases you haven't thought of yet
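
You can watch those rules in action in a few lines. The sample below is made up, but it packs in a quoted comma, an escaped quote, and Windows-style line endings:

# csv-rules-demo.py - csv.reader handles quoted commas, escaped quotes, and \r\n endings
import csv
import io

sample = (
    'Date,Description,Amount\r\n'
    '2024-01-07,"AMAZON, INC.",-89.50\r\n'
    '2024-01-08,"BOB""S DINER",-12.00\r\n'
)
for row in csv.reader(io.StringIO(sample, newline="")):
    print(row)

# ['Date', 'Description', 'Amount']
# ['2024-01-07', 'AMAZON, INC.', '-89.50']
# ['2024-01-08', 'BOB"S DINER', '-12.00']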

When to Use Which

Use awk for:

  • Log files with consistent delimiters
  • Tab-separated data
  • Data you control (no embedded delimiters)

Use the csv module for:

  • Bank exports
  • Downloaded datasets
  • Any external CSV

Rule of thumb: If the CSV came from outside your control, use a proper CSV parser.

The Pattern

Here's the pattern for handling structured data:

"I have [structured data type]. Process [specific column/field].
Be careful - [edge case that could break naive parsing]."

Mentioning the edge case helps the agent choose robust solutions. Compare:

Prompt                                                       Agent's Likely Response
"Sum the third column of this CSV"                           Might use awk (faster, simpler)
"Sum the Amount column. Some merchant names have commas"     Uses csv module (handles edge cases)

The second prompt gives context that guides the agent to the right tool. You're teaching the agent about your data -- and the agent is teaching you which tools handle that data correctly.

You have a script that handles real bank data — commas in quotes, dollar signs, the works. But right now it's stuck in whatever folder you built it in. Next month when you download a fresh bank statement into ~/finances/, you'll have to remember the exact path to this script. If you have to remember where a tool lives, it's not a tool yet. (And once it IS a tool, you'll still need to teach it the difference between a pharmacy and a Dr. Pepper. That's coming.)


Try With AI

Prompt 1: Understand the CSV Trap

Show me what happens when I try to parse "AMAZON, INC.",-89.50 with awk.
Why does it give the wrong result? What does Python's csv module do differently?

What you're learning: Deep understanding of the failure mode. The agent demonstrates the problem step by step, so you understand why csv modules exist -- not just that you should use them.

Prompt 2: Extend the Parser

My sum-expenses.py works well. Can you modify it to:
1. Accept a column number as an argument (so I can sum any column)
2. Handle empty lines gracefully
3. Show how many transactions were processed

Keep the stdin pattern so it still works with pipes.

What you're learning: Iterative improvement. You have working code and clear requirements. The agent extends functionality while preserving the composable pattern.
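
If you want something to compare the agent's answer against, here is one possible shape of that extended script. It's a sketch, not the answer your agent will necessarily give; the sum-column.py name, the argument handling, and the default column are all choices made here.

#!/usr/bin/env python3
# sum-column.py - sketch of the extended parser (names and defaults are this sketch's choices)
import sys
import csv

column = int(sys.argv[1]) if len(sys.argv) > 1 else 2   # column index, default: 3rd column
total = 0
count = 0

reader = csv.reader(sys.stdin)   # still reads stdin, so it still works with pipes
next(reader)                     # skip header row
for row in reader:
    if not row:                  # handle empty lines gracefully
        continue
    amount = float(row[column].replace('$', '').replace(',', ''))
    total += abs(amount)
    count += 1

print(f"Transactions: {count}")
print(f"Total: ${total:.2f}")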

Prompt 3: Adapt to Your Bank Format

My bank's CSV has different column names than the example. The amount
column is called 'Debit' instead of 'Amount' and there's a separate
'Credit' column. How do I modify sum-expenses.py to handle my specific
bank format?

What you're learning: Adapting scripts to YOUR data. Every bank exports differently. The agent helps you customize the tool for your specific situation.
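
For comparison, one way such an answer might look is a switch from positional indexing to csv.DictReader, which looks columns up by name instead of position. The sketch below assumes columns literally called Debit and Credit; those names are an assumption about your export, not a rule.

#!/usr/bin/env python3
# sum-debits.py - sketch for a Debit/Credit style export (column names are assumptions)
import sys
import csv

total = 0
reader = csv.DictReader(sys.stdin)   # reads the header row and maps each row by column name
for row in reader:
    debit = (row.get("Debit") or "").strip()
    if debit:                        # blank Debit cells (credit rows) are skipped
        total += abs(float(debit.replace('$', '').replace(',', '')))

print(f"Total spent: ${total:.2f}")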