SmartNotes Analytics Capstone
This is a timed challenge where you build a complete analytics module for SmartNotes. You choose which Python tools to use for each task: comprehensions for building collections, generators for streaming large data, and functional tools (sorted, lru_cache) for transforming and caching results. No guidance beyond the problem statement.
25 minutes. Design a NoteAnalytics class with methods for tag cloud, word statistics, author summaries, and reading time reports. Choose the right tool (comprehension, generator, functional) for each method. Write tests first, generate the implementation, verify.
Emma writes a list of questions on the whiteboard:
- What tags are used most frequently?
- What is the average word count across all notes?
- Which author has the most published notes?
- How long would it take to read all notes about "python"?
- What are the top 5 longest notes?
"Your SmartNotes app can answer all of these," she says. "Build an analytics module. Twenty-five minutes."
The Problem
Build a NoteAnalytics class that answers questions about a collection of notes:
analytics = NoteAnalytics(notes)
# Tag cloud: tag → frequency
print(analytics.tag_cloud())
# {'python': 3, 'debug': 2, 'cooking': 1, ...}
# Word statistics
print(analytics.word_stats())
# {'total': 850, 'average': 85.0, 'longest': 200, 'shortest': 20}
# Author summary
print(analytics.author_summary())
# {'James': {'count': 5, 'total_words': 500}, 'Emma': {'count': 3, ...}}
# Reading time for a subset
print(analytics.reading_time(tag="python"))
# 3.4 (minutes)
# Top N notes by word count
print(analytics.top_notes(n=3))
# [Note(...), Note(...), Note(...)]
Your deliverables:
| File | Purpose |
|---|---|
| analytics.py | NoteAnalytics class stub with types and docstrings |
| test_analytics.py | 10+ tests covering all methods and edge cases |
| tdg_journal.md | Debugging journal documenting your TDG cycle |
Start the timer.
Step 1: Specify (5 minutes)
Design the NoteAnalytics class. For each method, decide which tool from this chapter to use. You practiced all of these:
- List comprehensions (Lesson 1): for filtering and transforming lists
- Dict/set comprehensions (Lesson 2): for building lookup tables and extracting unique values
- Generators (Lesson 3): for processing data without building intermediate lists
- Key functions and lambda (Lesson 4): sorted(key=) for ranking, lambda for inline key functions
- Caching and partial (Lesson 5): lru_cache for caching, partial for specialized functions
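A quick refresher on the Lesson 4 and 5 tools named above (the function names and data here are illustrative, not part of the deliverable):

```python
from functools import lru_cache, partial

# sorted(key=) with a lambda ranks items by a derived value
assert sorted(["bb", "a", "ccc"], key=lambda s: len(s)) == ["a", "bb", "ccc"]

# lru_cache memoizes a pure function; repeated calls hit the cache
@lru_cache(maxsize=None)
def word_count(text: str) -> int:
    return len(text.split())

assert word_count("a b c") == 3

# partial pre-fills arguments to make a specialized function
sort_by_length = partial(sorted, key=len)
assert sort_by_length(["bb", "a", "ccc"]) == ["a", "bb", "ccc"]
```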
| Method | Returns | Best tool | Why |
|---|---|---|---|
| tag_cloud() | dict[str, int] | Dict comprehension or loop | Grouping (one-to-many) needs a loop |
| word_stats() | dict[str, float] | Generator expressions with sum(), max(), min() | Aggregation without intermediate lists |
| author_summary() | dict[str, dict[str, int]] | Loop with dict building | Nested grouping |
| reading_time(tag) | float | Generator + sum() | Filter + aggregate in one pass |
| top_notes(n) | list[Note] | sorted() with key + slice | Sorting needs sorted(key=) |
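To see why the first row says grouping needs a loop: one note contributes many tags, so a single dict comprehension cannot build the counts. A hedged sketch with illustrative tag lists (a loop, and the equivalent Counter fed by a flattening generator):

```python
from collections import Counter

notes_tags = [["python", "debug"], ["python"], ["cooking"]]

# Loop version: accumulate one count per tag occurrence
cloud: dict[str, int] = {}
for tags in notes_tags:
    for tag in tags:
        cloud[tag] = cloud.get(tag, 0) + 1

# Counter + generator expression does the same grouping in one line
assert cloud == Counter(tag for tags in notes_tags for tag in tags)
assert cloud["python"] == 2
```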
Hint: class stub
from dataclasses import dataclass, field

@dataclass
class Note:
    title: str
    body: str
    word_count: int
    author: str = "Anonymous"
    is_draft: bool = True
    tags: list[str] = field(default_factory=list)

class NoteAnalytics:
    """Analytics engine for a collection of notes.

    Provides tag frequency analysis, word statistics,
    author summaries, reading time estimates, and
    note ranking.
    """

    def __init__(self, notes: list[Note]) -> None:
        ...

    def tag_cloud(self) -> dict[str, int]:
        """Return a dict mapping each tag to its frequency across all notes."""
        ...

    def word_stats(self) -> dict[str, float]:
        """Return total, average, longest, and shortest word counts.

        - Returns {'total': 0, 'average': 0.0, 'longest': 0, 'shortest': 0}
          for an empty list
        """
        ...

    def author_summary(self) -> dict[str, dict[str, int]]:
        """Return per-author stats: count and total_words.

        Example: {'James': {'count': 3, 'total_words': 250}}
        """
        ...

    def reading_time(
        self, tag: str | None = None, words_per_minute: int = 250
    ) -> float:
        """Estimate reading time in minutes.

        - If tag is provided, only include notes with that tag
        - Returns 0.0 if no notes match
        """
        ...

    def top_notes(self, n: int = 5) -> list[Note]:
        """Return the top N notes by word count, longest first."""
        ...
Run uv run pyright analytics.py. Fix any type errors before writing tests.
Step 2: Test (7 minutes)
Write at least 10 tests. Think about what each method should do with normal data and edge cases:
- tag_cloud: notes with overlapping tags, notes with no tags, empty list
- word_stats: normal notes, single note, empty list (all zeros)
- author_summary: multiple authors, single author, empty list
- reading_time: all notes, filtered by tag, tag with no matches
- top_notes: n smaller than list, n larger than list, n=0
Hint: test fixtures
import pytest
from analytics import NoteAnalytics, Note

@pytest.fixture
def sample_notes() -> list[Note]:
    return [
        Note("Python Tips", "Learn basics of coding", 50, "James", False, ["python", "beginner"]),
        Note("Debug Guide", "How to fix Python errors", 120, "James", False, ["python", "debug"]),
        Note("Cooking", "Boil water and add salt", 30, "Emma", False, ["cooking"]),
        Note("Draft Note", "Work in progress", 10, "James", True, ["python"]),
    ]

@pytest.fixture
def analytics(sample_notes: list[Note]) -> NoteAnalytics:
    return NoteAnalytics(sample_notes)

def test_tag_cloud(analytics: NoteAnalytics) -> None:
    cloud = analytics.tag_cloud()
    assert cloud["python"] == 3
    assert cloud["cooking"] == 1

def test_word_stats_empty() -> None:
    analytics = NoteAnalytics([])
    stats = analytics.word_stats()
    assert stats["total"] == 0
    assert stats["average"] == 0.0
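One more illustrative edge-case test in the same spirit, covering the top_notes edges from the list above. The top_n function here is a hypothetical stand-in for the real method, written against plain word counts so the test is self-contained:

```python
def top_n(word_counts: list[int], n: int = 5) -> list[int]:
    # Same pattern the real method needs: sort descending, then slice
    return sorted(word_counts, reverse=True)[:n]

def test_top_n_edges() -> None:
    counts = [50, 120, 30, 10]
    assert top_n(counts, n=0) == []                  # n=0: empty list
    assert top_n(counts, n=10) == [120, 50, 30, 10]  # n > len: all notes
    assert top_n(counts, n=2) == [120, 50]           # normal case
```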
Run uv run pytest test_analytics.py -v. Every test should FAIL (RED).
Step 3: Generate (3 minutes)
Open Claude Code and prompt:
Implement all methods in NoteAnalytics in analytics.py
so that every test in test_analytics.py passes.
Do not modify the test file.
Use comprehensions, generators, and sorted(key=) where
appropriate. Do not use for-loop-and-append where a
comprehension would be clearer.
The prompt explicitly requests the tools from this chapter. The AI should use comprehensions, generators, and functional tools in its implementation.
Step 4: Verify (3 minutes)
uv run ruff check analytics.py
uv run pyright analytics.py
uv run pytest test_analytics.py -v
Step 5: Debug (5 minutes)
If tests failed, apply the debugging loop. You can type /debug in Claude Code to walk through each step. Common issues:
- tag_cloud returns wrong counts (check if it counts tags per note or per occurrence)
- word_stats crashes on empty list (division by zero in average)
- reading_time does not filter by tag (check the if tag logic)
- top_notes returns all notes instead of top N (missing slice)
Document failures in tdg_journal.md.
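The empty-list crash is usually a missing guard before the average is computed. A hedged sketch of the fix, written against a plain list of word counts rather than Note objects:

```python
def word_stats(word_counts: list[int]) -> dict[str, float]:
    if not word_counts:  # guard first: avoids ZeroDivisionError below
        return {"total": 0, "average": 0.0, "longest": 0, "shortest": 0}
    total = sum(word_counts)
    return {
        "total": total,
        "average": total / len(word_counts),
        "longest": max(word_counts),
        "shortest": min(word_counts),
    }

assert word_stats([]) == {"total": 0, "average": 0.0, "longest": 0, "shortest": 0}
assert word_stats([50, 120, 30])["average"] == 200 / 3
```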
Step 6: Read (2 minutes)
After all tests pass, review the generated code:
- Which tools did the AI use? Does it use comprehensions, generators, or loops? Does the choice match the table in Step 1?
- Would a different tool be better? If the AI used a loop where a comprehension would be clearer, note it.
- Test an untested input. Try analytics.reading_time(tag="nonexistent"). Does it return 0.0 or crash?
- Check for hardcoded values. Does reading_time use the words_per_minute parameter or hardcode 250?
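The hardcoded-value check can be made concrete: a correct reading_time scales with words_per_minute, so calling it at a non-default speed exposes a hardcoded 250. A minimal sketch, using a total word count instead of real notes:

```python
def reading_time(total_words: int, words_per_minute: int = 250) -> float:
    return total_words / words_per_minute

assert reading_time(500) == 2.0                        # default 250 wpm
assert reading_time(500, words_per_minute=100) == 5.0  # slower reader, longer time
# A hardcoded implementation would return 2.0 for both calls.
```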
Try With AI
If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.
Prompt 1: Review Tool Choices
Here is my NoteAnalytics implementation:
[paste analytics.py]
For each method, tell me which data transformation tool
was used (comprehension, generator, loop, sorted, etc.)
and whether it was the best choice. Suggest improvements
where a different tool would be more readable or efficient.
What you're learning: Tool selection is a design skill. The AI evaluates your choices (or the AI's choices) and suggests alternatives. You build judgment for when to use each tool.
Prompt 2: Add a Report Generator
Add a method to NoteAnalytics called generate_report()
that returns a formatted string with all analytics.
Use a generator function that yields lines one at a time.
Show the implementation and explain why a generator is
appropriate for report generation.
What you're learning: Generators are natural for building reports: each section is yielded independently, and the caller decides how to consume the lines (print them, write them to a file, or stream them over a network).
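A hedged sketch of the idea, with the method extracted as a plain function and an illustrative report format:

```python
from collections.abc import Iterator

def report_lines(tag_cloud: dict[str, int]) -> Iterator[str]:
    yield "SmartNotes Report"
    yield "================="
    # Most-used tags first: sorted(key=) with a lambda, descending
    for tag, count in sorted(tag_cloud.items(), key=lambda kv: kv[1], reverse=True):
        yield f"{tag}: {count}"

# The caller decides how to consume the lines
report = "\n".join(report_lines({"python": 3, "debug": 2}))
```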
Prompt 3: Rate My Capstone
I completed the NoteAnalytics capstone using TDG. Here
is my approach:
1. Designed class with 5 methods
2. Chose tools: [describe your tool choices]
3. Wrote [N] tests
4. AI implementation: [passed first try / needed fixes]
5. PRIMM review: tested [describe untested input]
6. Tool review: [found appropriate / found improvements]
Rate my Phase 6 mastery. Am I ready for Phase 7?
What you're learning: Phase completion assessment. The AI evaluates your mastery across all Phase 6 skills: file I/O (Ch 62), modules (Ch 63), and data transformation (Ch 64).
James looks at the analytics output. Tag cloud, word stats, author summaries, reading times, top notes. All computed from the same list of notes using different tools.
"Three chapters," he says. "Files, modules, transformations. SmartNotes went from a single function to a package with persistence, organization, and analytics."
Emma pulls up the package structure:
smartnotes/
├── __init__.py
├── __main__.py
├── models.py
├── search.py
├── storage.py
└── analytics.py
"Look at what you built. A package that reads three file formats, organizes code into modules, and transforms data with comprehensions and generators. Every method was tested before it was implemented. Every type was checked by pyright."
James thinks about the journey. "Phase 1 taught me to read Python. Phase 2 taught me to specify. Phase 3 taught me to test. Phase 4 taught me to debug. Phase 5 taught me to design objects. Phase 6 taught me to build real software: files, structure, and efficient data processing."
"And the TDG cycle worked at every scale," Emma says. "From a single function in Phase 2 to a multi-module package in Phase 6. Stub, test, generate, verify. The method does not change. The complexity of what you build does."
She writes on the whiteboard:
SmartNotes has persistence, structure, and analytics. But it is still a library. Users interact with it through Python code. Phase 7 gives it a command-line interface: smartnotes add, smartnotes search, smartnotes export. Then it becomes a tool anyone can use.
James nods. "From library to application."
"From library to product."