Tabular Data: CSV
A CSV file (Comma-Separated Values) is a plain text file where each line is a row and values are separated by commas. Every spreadsheet application (Excel, Google Sheets) can open CSV files. This lesson teaches you to export SmartNotes as a spreadsheet and import notes from CSV data.
Python's csv module handles reading/writing with DictReader/DictWriter for named-column access. The main challenge is type conversion: CSV stores everything as strings, so word_count arrives as "6" not 6, and tags arrives as "beginner, python" not ["beginner", "python"].
James shows Emma his notebook. "Markdown for reading, JSON for saving. But my manager at the warehouse used spreadsheets for everything. Inventory reports, shift schedules, budget tracking. If I could export notes as a spreadsheet, anyone could sort and filter them."
Emma nods. "CSV. The universal spreadsheet format. Every tool reads it: Excel, Google Sheets, LibreOffice, even command-line utilities. The tradeoff is that CSV stores everything as plain text. Your word_count of 6 becomes the string '6'. Your tags list becomes a single string. You handle the conversions."
What CSV Looks Like
A CSV file is a text file with one row per line and commas between values:
title,body,word_count,author,is_draft,tags
Python Tips,Learn the basics of Python programming.,6,James,True,"beginner, python"
Debugging,Fix errors quickly.,3,James,True,debug
Cooking Pasta,Boil water and add salt.,6,Emma,False,cooking
The first line is the header row. It names each column. Every subsequent line is a data row. Values that contain commas are wrapped in quotes.
Open any spreadsheet application and paste this text. You will see a clean table with columns and rows.
Reading CSV with csv.DictReader
csv.DictReader reads each row as a dictionary, using the header row as keys:
import csv
from pathlib import Path

path = Path("notes.csv")

# First, create a sample CSV file
path.write_text(
    "title,body,word_count,author,is_draft,tags\n"
    "Python Tips,Learn basics,2,James,True,\"beginner, python\"\n"
    "Debugging,Fix errors,2,James,True,debug\n"
)

# Now read it
with open(path, "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
Output:
{'title': 'Python Tips', 'body': 'Learn basics', 'word_count': '2', 'author': 'James', 'is_draft': 'True', 'tags': 'beginner, python'}
{'title': 'Debugging', 'body': 'Fix errors', 'word_count': '2', 'author': 'James', 'is_draft': 'True', 'tags': 'debug'}
Notice that word_count is '2' (a string), not 2 (an integer). And is_draft is 'True' (a string), not True (a boolean). CSV does not distinguish between types. Everything is text.
The encoding="utf-8" parameter tells Python how to decode the file's bytes into characters. Encoding is the rule for converting characters (like A, é, or 中) into the bytes stored on disk. UTF-8 is the standard encoding for modern text files and supports every language. If you open a file without specifying the encoding, Python uses your system's default, which may not be UTF-8 on Windows.
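To see what a mismatch looks like in practice, here is a small sketch. It writes a scratch file (the name demo_cp1252.txt is just an example) using the Windows-1252 encoding, then tries to read it back as UTF-8:

```python
from pathlib import Path

# Write "café" using Windows-1252: the é becomes the single byte 0xE9
path = Path("demo_cp1252.txt")
path.write_bytes("café".encode("cp1252"))

# Reading those bytes as UTF-8 fails: 0xE9 is not valid UTF-8 here
try:
    path.read_text(encoding="utf-8")
except UnicodeDecodeError as err:
    print(f"UTF-8 decode failed: {err.reason}")

# Reading with the encoding the file was actually saved in works
print(path.read_text(encoding="cp1252"))

path.unlink()  # remove the scratch file
```

The lesson: always specify the encoding explicitly, and if a read fails with UnicodeDecodeError, ask what encoding the file was actually saved in.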
The Type Conversion Challenge
To create Note objects from CSV rows, you must convert each value to the correct type. The conversion uses a few techniques worth explaining first:
| Technique | What it does | Example |
|---|---|---|
| int(text) | Converts a string to an integer | int("6") → 6 |
| text.strip().lower() | Removes whitespace, then lowercases | " True ".strip().lower() → "true" |
| text.split(",") | Splits a string into a list at each comma | "a, b".split(",") → ["a", " b"] |
from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)

def note_from_csv_row(row: dict[str, str]) -> Note:
    """Convert a CSV row dictionary to a Note.

    Handles type conversions:
    - word_count: str → int
    - is_draft: str → bool
    - tags: comma-separated str → list[str]
    """
    # Convert the comma-separated tags string into a list
    tags: list[str] = []
    for tag in row["tags"].split(","):
        cleaned = tag.strip()
        if cleaned:
            tags.append(cleaned)
    # Convert "True"/"False" string to a real boolean
    is_draft = row["is_draft"].strip().lower() == "true"
    return Note(
        title=row["title"],
        body=row["body"],
        word_count=int(row["word_count"]),
        author=row.get("author", "Anonymous"),
        is_draft=is_draft,
        tags=tags,
    )
Test it:
row = {
    "title": "Python Tips",
    "body": "Learn basics",
    "word_count": "6",
    "author": "James",
    "is_draft": "True",
    "tags": "beginner, python",
}
note = note_from_csv_row(row)
print(note)
print(f"word_count type: {type(note.word_count)}")
print(f"is_draft type: {type(note.is_draft)}")
print(f"tags type: {type(note.tags)}")
Output:
Note(title='Python Tips', body='Learn basics', word_count=6, author='James', is_draft=True, tags=['beginner', 'python'])
word_count type: <class 'int'>
is_draft type: <class 'bool'>
tags type: <class 'list'>
The string "6" became the integer 6. The string "True" became the boolean True. The string "beginner, python" became the list ["beginner", "python"]. Every field has its correct type.
Writing CSV with csv.DictWriter
csv.DictWriter writes dictionaries to CSV, using a list of field names as the header:
import csv
from pathlib import Path
from dataclasses import asdict

def save_notes_as_csv(notes: list[Note], file_path: Path) -> None:
    """Save notes to a CSV file.

    - Tags are stored as comma-separated strings
    - Creates parent directories if needed
    """
    file_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = ["title", "body", "word_count", "author", "is_draft", "tags"]
    with open(file_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for note in notes:
            row = asdict(note)
            row["tags"] = ", ".join(row["tags"])
            writer.writerow(row)
The newline="" parameter prevents double-spaced output on Windows. Without it, each row gets an extra blank line.
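You can see why by inspecting what the csv module actually writes. A quick sketch using an in-memory buffer:

```python
import csv
from io import StringIO

buf = StringIO()
writer = csv.writer(buf)
writer.writerow(["title", "body"])

# The csv module terminates each row with \r\n itself
print(repr(buf.getvalue()))
```

This prints 'title,body\r\n'. On Windows, a file opened in text mode also translates \n to \r\n, turning each row ending into \r\r\n, which spreadsheets display as a blank line. Passing newline="" disables that translation and lets the csv module control line endings.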
Test it:
notes = [
    Note("Python Tips", "Learn basics", 2, "James", tags=["beginner", "python"]),
    Note("Debugging", "Fix errors", 2, "James", tags=["debug"]),
    Note("Cooking Pasta", "Boil water", 2, "Emma", False, tags=["cooking"]),
]
save_path = Path("data") / "notes.csv"
save_notes_as_csv(notes, save_path)
print(save_path.read_text())
Output:
title,body,word_count,author,is_draft,tags
Python Tips,Learn basics,2,James,True,"beginner, python"
Debugging,Fix errors,2,James,True,debug
Cooking Pasta,Boil water,2,Emma,False,cooking
Open this file in a spreadsheet application. You will see a clean table with headers. You can sort by word count, filter by author, or search for tags.
Loading Notes from CSV
Combine DictReader with the type conversion function:
def load_notes_from_csv(file_path: Path) -> list[Note]:
    """Load notes from a CSV file.

    - Returns an empty list if the file does not exist
    - Converts types from CSV strings to Python types
    """
    if not file_path.exists():
        return []
    notes: list[Note] = []
    with open(file_path, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            notes.append(note_from_csv_row(row))
    return notes
Verify the round trip:
loaded = load_notes_from_csv(Path("data") / "notes.csv")
print(f"Loaded {len(loaded)} notes")
for note in loaded:
    print(f"  {note.title}: {note.tags} (draft={note.is_draft})")
Output:
Loaded 3 notes
Python Tips: ['beginner', 'python'] (draft=True)
Debugging: ['debug'] (draft=True)
Cooking Pasta: ['cooking'] (draft=False)
Comparing the Three Formats
You now know three file formats. Each has strengths:
| Feature | Markdown (L1) | JSON (L2) | CSV (L3) |
|---|---|---|---|
| Human readable | Excellent | Good | Good (in spreadsheets) |
| Machine parseable | Fragile | Excellent | Good |
| Type preservation | None | Partial (no datetime) | None (all strings) |
| Spreadsheet compatible | No | No | Yes |
| Nested data (lists) | Manual formatting | Native support | Must flatten |
| Best for | Reading, documentation | Saving/loading program data | Sharing with non-programmers |
PRIMM-AI+ Practice: Predict the Parse
Predict [AI-FREE]
Press Shift+Tab to enter Plan Mode.
What does this code print?
StringIO lets you treat a string as if it were a file (so csv.DictReader can read from it without creating an actual file on disk):
import csv
from io import StringIO
csv_text = "name,age,active\nAlice,30,True\nBob,25,False"
reader = csv.DictReader(StringIO(csv_text))
rows = list(reader)
print(type(rows[0]["age"]))
print(rows[1]["active"])
print(rows[1]["active"] == False)
Write your predictions. Rate your confidence from 1 to 5.
Check your predictions
<class 'str'>
False
False
The tricky part: rows[1]["active"] is the string "False", not the boolean False. And "False" == False is False because a string is never equal to a boolean. This is why type conversion matters.
Run
Press Shift+Tab to exit Plan Mode.
Create csv_practice.py with the code above. Run it and compare to your predictions.
Investigate
If you want to go deeper, run /investigate @csv_practice.py in Claude Code and ask: "What happens if a CSV value contains a comma? How does the csv module handle quoting?"
The answer involves quoting. The csv module handles it automatically, but understanding the mechanism prevents surprises.
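A quick sketch of that mechanism, using an in-memory buffer so no file is needed: the writer quotes any field that contains the delimiter, and the reader undoes the quoting on the way back in.

```python
import csv
from io import StringIO

buf = StringIO()
writer = csv.writer(buf)
# The second field contains commas, so the writer wraps it in quotes
writer.writerow(["Tips", "commas, inside, a value"])
print(buf.getvalue().strip())

# Reading it back: the quoted field comes out as a single value again
row = next(csv.reader(StringIO(buf.getvalue())))
print(row)
```

The written line is Tips,"commas, inside, a value", and reading it back yields exactly two fields, not four.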
Modify
Change save_notes_as_csv to include only non-draft notes. Add a parameter include_drafts: bool = True that controls whether drafts are included. Update the function and test it.
Make [Mastery Gate]
Write a function csv_to_json(csv_path: Path, json_path: Path) -> int that reads a CSV file of notes, converts them to Note objects, and saves them as a JSON file. Return the number of notes converted. In Claude Code, type /tdg to guide you through the cycle:
- Write the stub with types and docstring
- Write 3+ tests (empty file, normal data, verify JSON output is valid)
- Prompt AI to implement
- Run
uv run ruff check && uv run pyright && uv run pytest
Try With AI
If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.
Prompt 1: Handle Encoding Issues
What happens if I try to read a CSV file that was saved
with Windows-1252 encoding but I open it with UTF-8?
Show me the error and how to detect and handle encoding
mismatches.
What you're learning: Encoding mismatches are one of the most common production file I/O bugs. The AI shows you the error pattern and defensive coding techniques.
Prompt 2: Compare csv.reader vs csv.DictReader
Show me the same CSV parsing task using csv.reader (list-based)
and csv.DictReader (dict-based). When is each one better?
What are the tradeoffs?
What you're learning: csv.reader gives positional access (like Markdown parsing). csv.DictReader gives named access (like JSON). You evaluate which fits your use case.
Prompt 3: Test Edge Cases
In Claude Code, type:
/tdg
Use the TDG workflow to write and test validate_csv_notes(file_path: Path) -> list[str] that reads a CSV file and returns a list of validation errors. Check for: missing required fields (title, body), negative word counts, and empty tags. Write tests first, then generate.
What you're learning: Validation functions are essential for data imported from external sources. CSV files can contain any text, including invalid data. Your tests define what "valid" means.
James exports his notebook as CSV and opens it in a spreadsheet application. Three rows, six columns, sortable by any field. He sorts by word count, then filters by the "python" tag.
"In the warehouse, we exported inventory data as CSV every night," he says. "The overnight team loaded it into a spreadsheet to plan the next day's picking routes. Different teams, different tools, same data."
"That is interoperability," Emma says. "Your SmartNotes program stores data in JSON because JSON preserves types. But when someone outside your program needs the data, CSV is the common language."
She pauses. "You have three export formats. Three import formats. Six functions. Right now, each one is standalone. But real data work chains them together: read CSV, filter by tag, export as Markdown. That is a pipeline. And pipelines are what make data processing powerful."
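Emma's pipeline idea can be sketched end to end with just the standard library. This is a simplification: the tag filter here is a plain substring check on the raw CSV tags field, rather than the list-based tags a Note object would carry.

```python
import csv
from io import StringIO

# Sample CSV export, like the one produced earlier in this lesson
csv_text = (
    "title,body,word_count,author,is_draft,tags\n"
    'Python Tips,Learn basics,2,James,True,"beginner, python"\n'
    "Debugging,Fix errors,2,James,True,debug\n"
)

# Stage 1: read CSV rows
rows = list(csv.DictReader(StringIO(csv_text)))

# Stage 2: filter by tag (simplified substring check on the raw field)
python_rows = [r for r in rows if "python" in r["tags"]]

# Stage 3: export as Markdown
markdown = "\n".join(f"## {r['title']}\n\n{r['body']}\n" for r in python_rows)
print(markdown)
```

Each stage feeds the next: the output of the read becomes the input of the filter, and the filtered rows become the input of the export. That chaining is the shape of every data pipeline.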