Tabular Data: CSV
A CSV file (Comma-Separated Values) is a plain text file where each line is a row and values are separated by commas. Every spreadsheet application (Excel, Google Sheets) can open CSV files. This lesson teaches you to export SmartNotes as a spreadsheet and import notes from CSV data.
Python's csv module handles reading/writing with DictReader/DictWriter for named-column access. The main challenge is type conversion: CSV stores everything as strings, so word_count arrives as "6" not 6, and tags arrives as "beginner, python" not ["beginner", "python"].
James shows Emma his notebook. "Markdown for reading, JSON for saving. But my manager at the warehouse used spreadsheets for everything. Inventory reports, shift schedules, budget tracking. If I could export notes as a spreadsheet, anyone could sort and filter them."
Emma nods. "CSV. The universal spreadsheet format. Every tool reads it: Excel, Google Sheets, LibreOffice, even command-line utilities. The tradeoff is that CSV stores everything as plain text. Your word_count of 6 becomes the string '6'. Your tags list becomes a single string. You handle the conversions."
What CSV Looks Like
A CSV file is a text file with one row per line and commas between values:
title,body,word_count,author,is_draft,tags
Python Tips,Learn the basics of Python programming.,6,James,True,"beginner, python"
Debugging,Fix errors quickly.,3,James,True,debug
Cooking Pasta,Boil water and add salt.,6,Emma,False,cooking
The first line is the header row. It names each column. Every subsequent line is a data row. Values that contain commas are wrapped in quotes.
Open any spreadsheet application and paste this text. You will see a clean table with columns and rows.
Reading CSV with csv.DictReader
csv.DictReader reads each row as a dictionary, using the header row as keys:
import csv
from pathlib import Path

path = Path("notes.csv")

# First, create a sample CSV file
path.write_text(
    "title,body,word_count,author,is_draft,tags\n"
    "Python Tips,Learn basics,2,James,True,\"beginner, python\"\n"
    "Debugging,Fix errors,2,James,True,debug\n"
)

# Now read it
with open(path, "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
Output:
{'title': 'Python Tips', 'body': 'Learn basics', 'word_count': '2', 'author': 'James', 'is_draft': 'True', 'tags': 'beginner, python'}
{'title': 'Debugging', 'body': 'Fix errors', 'word_count': '2', 'author': 'James', 'is_draft': 'True', 'tags': 'debug'}
Notice that word_count is '2' (a string), not 2 (an integer). And is_draft is 'True' (a string), not True (a boolean). CSV does not distinguish between types. Everything is text.
The encoding="utf-8" parameter tells Python how to decode the file's bytes into characters. Encoding is the rule for converting characters (like A, é, or 中) into the bytes stored on disk. UTF-8 is the standard encoding for modern text files and supports every language. If you open a file without specifying the encoding, Python uses your system's default, which may not be UTF-8 on Windows.
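To see what a mismatch looks like in practice, here is a small sketch. It writes a scratch file (the name demo_cp1252.txt is just an example) using the Windows-1252 encoding, then tries to read it back as UTF-8:

```python
from pathlib import Path

# Write "café" using Windows-1252: the é becomes the single byte 0xE9
path = Path("demo_cp1252.txt")
path.write_bytes("café".encode("cp1252"))

# Reading those bytes as UTF-8 fails: 0xE9 is not valid UTF-8 here
try:
    path.read_text(encoding="utf-8")
except UnicodeDecodeError as err:
    print(f"UTF-8 decode failed: {err.reason}")

# Reading with the encoding the file was actually saved in works
print(path.read_text(encoding="cp1252"))

path.unlink()  # remove the scratch file
```

The lesson: always specify the encoding explicitly, and if a read fails with UnicodeDecodeError, ask what encoding the file was actually saved in.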
The Type Conversion Challenge
To create Note objects from CSV rows, you must convert each value to the correct type. The conversion uses a few techniques worth explaining first:
| Technique | What it does | Example |
|---|---|---|
| int(text) | Converts a string to an integer | int("6") → 6 |
| text.strip().lower() | Removes whitespace, then lowercases | " True ".strip().lower() → "true" |
| text.split(",") | Splits a string into a list at each comma | "a, b".split(",") → ["a", " b"] |
from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)

def note_from_csv_row(row: dict[str, str]) -> Note:
    """Convert a CSV row dictionary to a Note.

    Handles type conversions:
    - word_count: str → int
    - is_draft: str → bool
    - tags: comma-separated str → list[str]
    """
    # Convert the comma-separated tags string into a list
    tags: list[str] = []
    for tag in row["tags"].split(","):
        cleaned = tag.strip()
        if cleaned:
            tags.append(cleaned)
    # Convert "True"/"False" string to a real boolean
    is_draft = row["is_draft"].strip().lower() == "true"
    return Note(
        title=row["title"],
        body=row["body"],
        word_count=int(row["word_count"]),
        author=row.get("author", "Anonymous"),
        is_draft=is_draft,
        tags=tags,
    )
Test it:
row = {
    "title": "Python Tips",
    "body": "Learn basics",
    "word_count": "6",
    "author": "James",
    "is_draft": "True",
    "tags": "beginner, python",
}
note = note_from_csv_row(row)
print(note)
print(f"word_count type: {type(note.word_count)}")
print(f"is_draft type: {type(note.is_draft)}")
print(f"tags type: {type(note.tags)}")
Output:
Note(title='Python Tips', body='Learn basics', word_count=6, author='James', is_draft=True, tags=['beginner', 'python'])
word_count type: <class 'int'>
is_draft type: <class 'bool'>
tags type: <class 'list'>
The string "6" became the integer 6. The string "True" became the boolean True. The string "beginner, python" became the list ["beginner", "python"]. Every field has its correct type.
Writing CSV with csv.DictWriter
csv.DictWriter writes dictionaries to CSV, using a list of field names as the header:
import csv
from pathlib import Path
from dataclasses import asdict

def save_notes_as_csv(notes: list[Note], file_path: Path) -> None:
    """Save notes to a CSV file.

    - Tags are stored as comma-separated strings
    - Creates parent directories if needed
    """
    file_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = ["title", "body", "word_count", "author", "is_draft", "tags"]
    with open(file_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for note in notes:
            row = asdict(note)
            row["tags"] = ", ".join(row["tags"])
            writer.writerow(row)
The newline="" parameter prevents double-spaced output on Windows. Without it, each row gets an extra blank line.
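You can see why by inspecting what the csv module actually writes. A quick sketch using an in-memory buffer:

```python
import csv
from io import StringIO

buf = StringIO()
writer = csv.writer(buf)
writer.writerow(["title", "body"])

# The csv module terminates each row with \r\n itself
print(repr(buf.getvalue()))
```

This prints 'title,body\r\n'. On Windows, a file opened in text mode also translates \n to \r\n, turning each row ending into \r\r\n, which spreadsheets display as a blank line. Passing newline="" disables that translation and lets the csv module control line endings.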
Test it:
notes = [
    Note("Python Tips", "Learn basics", 2, "James", tags=["beginner", "python"]),
    Note("Debugging", "Fix errors", 2, "James", tags=["debug"]),
    Note("Cooking Pasta", "Boil water", 2, "Emma", False, tags=["cooking"]),
]
save_path = Path("data") / "notes.csv"
save_notes_as_csv(notes, save_path)
print(save_path.read_text())
Output:
title,body,word_count,author,is_draft,tags
Python Tips,Learn basics,2,James,True,"beginner, python"
Debugging,Fix errors,2,James,True,debug
Cooking Pasta,Boil water,2,Emma,False,cooking
Open this file in a spreadsheet application. You will see a clean table with headers. You can sort by word count, filter by author, or search for tags.
Loading Notes from CSV
Combine DictReader with the type conversion function:
def load_notes_from_csv(file_path: Path) -> list[Note]:
    """Load notes from a CSV file.

    - Returns an empty list if the file does not exist
    - Converts types from CSV strings to Python types
    """
    if not file_path.exists():
        return []
    notes: list[Note] = []
    with open(file_path, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            notes.append(note_from_csv_row(row))
    return notes
Verify the round trip:
loaded = load_notes_from_csv(Path("data") / "notes.csv")
print(f"Loaded {len(loaded)} notes")
for note in loaded:
    print(f"  {note.title}: {note.tags} (draft={note.is_draft})")
Output:
Loaded 3 notes
Python Tips: ['beginner', 'python'] (draft=True)
Debugging: ['debug'] (draft=True)
Cooking Pasta: ['cooking'] (draft=False)
Comparing the Three Formats
You now know three file formats. Each has strengths:
| Feature | Markdown (L1) | JSON (L2) | CSV (L3) |
|---|---|---|---|
| Human readable | Excellent | Good | Good (in spreadsheets) |
| Machine parseable | Fragile | Excellent | Good |
| Type preservation | None | Partial (no datetime) | None (all strings) |
| Spreadsheet compatible | No | No | Yes |
| Nested data (lists) | Manual formatting | Native support | Must flatten |
| Best for | Reading, documentation | Saving/loading program data | Sharing with non-programmers |
PRIMM-AI+ Practice: Predict the Parse
Predict [AI-FREE]
Press Shift+Tab to enter Plan Mode.
What does this code print?
StringIO lets you treat a string as if it were a file (so csv.DictReader can read from it without creating an actual file on disk):
import csv
from io import StringIO
csv_text = "name,age,active\nAlice,30,True\nBob,25,False"
reader = csv.DictReader(StringIO(csv_text))
rows = list(reader)
print(type(rows[0]["age"]))
print(rows[1]["active"])
print(rows[1]["active"] == False)
Write your predictions. Rate your confidence from 1 to 5.
Check your predictions
<class 'str'>
False
False
The tricky part: rows[1]["active"] is the string "False", not the boolean False. And "False" == False is False because a string is never equal to a boolean. This is why type conversion matters.
Run
Press Shift+Tab to exit Plan Mode.
Create csv_practice.py with the code above. Run it and compare to your predictions.
Investigate
If you want to go deeper, run /investigate @csv_practice.py in Claude Code and ask: "What happens if a CSV value contains a comma? How does the csv module handle quoting?"
The answer involves quoting. The csv module handles it automatically, but understanding the mechanism prevents surprises.
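A quick sketch of that mechanism, using an in-memory buffer so no file is needed: the writer quotes any field that contains the delimiter, and the reader undoes the quoting on the way back in.

```python
import csv
from io import StringIO

buf = StringIO()
writer = csv.writer(buf)
# The second field contains commas, so the writer wraps it in quotes
writer.writerow(["Tips", "commas, inside, a value"])
print(buf.getvalue().strip())

# Reading it back: the quoted field comes out as a single value again
row = next(csv.reader(StringIO(buf.getvalue())))
print(row)
```

The written line is Tips,"commas, inside, a value", and reading it back yields exactly two fields, not four.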
Modify
Change save_notes_as_csv to include only non-draft notes. Add a parameter include_drafts: bool = True that controls whether drafts are included. Update the function and test it.
Make [Mastery Gate]
Write a function csv_to_json(csv_path: Path, json_path: Path) -> int that reads a CSV file of notes, converts them to Note objects, and saves them as a JSON file. Return the number of notes converted. In Claude Code, type /tdg to guide you through the cycle:
- Write the stub with types and docstring
- Write 3+ tests (empty file, normal data, verify JSON output is valid)
- Prompt AI to implement
- Run
uv run ruff check && uv run pyright && uv run pytest
Try With AI
If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.
Prompt 1: Handle Encoding Issues
What happens if I try to read a CSV file that was saved
with Windows-1252 encoding but I open it with UTF-8?
Show me the error and how to detect and handle encoding
mismatches.
What you're learning: Encoding mismatches are one of the most common production file I/O bugs. The AI shows you the error pattern and defensive coding techniques.
Prompt 2: Compare csv.reader vs csv.DictReader
Show me the same CSV parsing task using csv.reader (list-based)
and csv.DictReader (dict-based). When is each one better?
What are the tradeoffs?
What you're learning: csv.reader gives positional access (like Markdown parsing). csv.DictReader gives named access (like JSON). You evaluate which fits your use case.
Prompt 3: Test Edge Cases
In Claude Code, type:
/tdg
Use the TDG workflow to write and test validate_csv_notes(file_path: Path) -> list[str] that reads a CSV file and returns a list of validation errors. Check for: missing required fields (title, body), negative word counts, and empty tags. Write tests first, then generate.
What you're learning: Validation functions are essential for data imported from external sources. CSV files can contain any text, including invalid data. Your tests define what "valid" means.
James exports his notebook as CSV and opens it in a spreadsheet application. Three rows, six columns, sortable by any field. He sorts by word count, then filters by the "python" tag.
"In the warehouse, we exported inventory data as CSV every night," he says. "The overnight team loaded it into a spreadsheet to plan the next day's picking routes. Different teams, different tools, same data."
"That is interoperability," Emma says. "Your SmartNotes program stores data in JSON because JSON preserves types. But when someone outside your program needs the data, CSV is the common language."
She pauses. "You have three export formats. Three import formats. Six functions. Right now, each one is standalone. But real data work chains them together: read CSV, filter by tag, export as Markdown. That is a pipeline. And pipelines are what make data processing powerful."
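Emma's pipeline idea can be sketched end to end with just the standard library. This is a simplification: the tag filter here is a plain substring check on the raw CSV tags field, rather than the list-based tags a Note object would carry.

```python
import csv
from io import StringIO

# Sample CSV export, like the one produced earlier in this lesson
csv_text = (
    "title,body,word_count,author,is_draft,tags\n"
    'Python Tips,Learn basics,2,James,True,"beginner, python"\n'
    "Debugging,Fix errors,2,James,True,debug\n"
)

# Stage 1: read CSV rows
rows = list(csv.DictReader(StringIO(csv_text)))

# Stage 2: filter by tag (simplified substring check on the raw field)
python_rows = [r for r in rows if "python" in r["tags"]]

# Stage 3: export as Markdown
markdown = "\n".join(f"## {r['title']}\n\n{r['body']}\n" for r in python_rows)
print(markdown)
```

Each stage feeds the next: the output of the read becomes the input of the filter, and the filtered rows become the input of the export. That chaining is the shape of every data pipeline.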