
Why Concurrency?

If you're new to programming

Blocking means your program waits and does nothing while the disk or network works. This lesson shows you the problem and explains why Python needs a special tool to stop wasting time.

If you've coded before

Key ideas: the GIL, I/O-bound vs CPU-bound work, and cooperative vs preemptive scheduling. This lesson motivates async/await by timing sequential I/O and showing why threads alone are insufficient for Python.

James runs smartnotes export --format all on his collection of 100 notes. The command writes Markdown files, then JSON, then CSV. He watches the terminal cursor blink. Two seconds per format, three formats, six seconds total. He runs it again with time:

time smartnotes export --format all

Six seconds. For writing text files. His laptop writes gigabytes per second to the SSD. Something is wrong.

"In the warehouse," James says, "we had six loading docks and one forklift. The forklift picked up a pallet, drove to dock 1, waited for the truck driver to strap it down, then drove back, picked up the next pallet, drove to dock 2, waited again. Six docks, but only one was active at any time. The other five trucks sat idle."

Emma nods. "Your export command is that forklift. It writes one file, waits for the disk to confirm, then starts the next file. The disk is fast. The waiting is slow."


What Is Blocking?

Create a file called sequential.py:

import time


def export_note(note_id: int) -> None:
    """Simulate exporting a single note to disk."""
    # flush=True shows the partial line before the wait begins
    print(f"  Exporting note {note_id}...", end="", flush=True)
    time.sleep(1)  # Simulate disk I/O
    print(" done.")


def main() -> None:
    start: float = time.perf_counter()

    for i in range(5):
        export_note(i)

    elapsed: float = time.perf_counter() - start
    print(f"\nTotal time: {elapsed:.2f} seconds")


if __name__ == "__main__":
    main()

Run it:

uv run python sequential.py

Output:

  Exporting note 0... done.
  Exporting note 1... done.
  Exporting note 2... done.
  Exporting note 3... done.
  Exporting note 4... done.

Total time: 5.01 seconds

Five notes, one second each, five seconds total. Each call to time.sleep(1) blocks the entire program. While note 0 is "writing," notes 1 through 4 sit in the queue doing nothing.

time.sleep() is a stand-in for real disk I/O. When Python writes a file, the program pauses until the operating system confirms the write completed. That pause is blocking I/O.

| Term | Meaning |
| --- | --- |
| Blocking | The program stops and waits for an operation to finish |
| I/O | Input/Output: reading files, writing files, network requests |
| Blocking I/O | The program stops and waits for a file or network operation |

The total time equals the sum of every individual wait. Five one-second operations take five seconds. There is no overlap.


Concurrency vs Parallelism

Two strategies exist for doing multiple things:

| | Concurrency | Parallelism |
| --- | --- | --- |
| Workers | One | Multiple |
| How | Switch tasks when one is waiting | Run tasks at the same time on different CPUs |
| Analogy | One forklift, six docks; drive to the next dock while waiting for strapping | Six forklifts, six docks; all loading at once |
| Python support | asyncio (cooperative switching) | multiprocessing (separate processes) |
| Best for | I/O-bound work (file writes, network calls) | CPU-bound work (math, image processing) |

Concurrency does not require multiple workers. One worker can handle multiple tasks by switching to a different task whenever the current one is waiting. The total wall-clock time drops because waiting periods overlap.
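As a preview of where this chapter is heading, here is a minimal sketch of one worker overlapping several waits. It assumes asyncio.sleep as a stand-in for disk I/O; the async export_note and export_all functions are hypothetical illustrations, not the SmartNotes implementation, and later lessons explain the syntax properly.

```python
import asyncio
import time


async def export_note(note_id: int, delay: float = 1.0) -> int:
    """Simulate exporting one note; asyncio.sleep stands in for disk I/O."""
    # While this task waits, the event loop is free to run other tasks.
    await asyncio.sleep(delay)
    return note_id


async def export_all(count: int, delay: float = 1.0) -> float:
    """Start every export, wait for all of them, return elapsed seconds."""
    start: float = time.perf_counter()
    await asyncio.gather(*(export_note(i, delay) for i in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed: float = asyncio.run(export_all(5))
    print(f"Total time: {elapsed:.2f} seconds")
```

With five simulated one-second exports, the total should land close to one second rather than five, because the five waits overlap on a single thread.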

For SmartNotes export, the bottleneck is I/O (waiting for disk writes), not CPU work. One forklift visiting six docks while waiting for strapping is faster than one forklift driving to dock 1 and standing there until the truck leaves.


Python's GIL

CPython, the standard Python interpreter, has a mechanism called the Global Interpreter Lock (GIL). The GIL allows only one thread to execute Python bytecode at a time. This means:

| Type of work | Threads help? | Why |
| --- | --- | --- |
| I/O-bound (file writes, HTTP requests) | Yes | Threads release the GIL while waiting for I/O |
| CPU-bound (math, compression, parsing) | No | GIL blocks other threads from running Python code |

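For SmartNotes, the work is I/O-bound, and the GIL does not prevent I/O concurrency, so threads would help here. The cost lies in the threads themselves: code that shares data across threads needs locks and careful reasoning about race conditions, and the GIL still rules threads out for CPU-bound speedups. Python provides asyncio: a concurrency model designed specifically for I/O, without that complexity.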
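The I/O-bound row of the table can be checked with a small timing sketch. This is an illustration under assumptions, not the approach Lesson 2 takes: fake_write, run_sequential, and run_threaded are hypothetical names, and time.sleep stands in for a blocking write (it releases the GIL while waiting, as real blocking I/O calls do).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_write(note_id: int, delay: float = 0.2) -> int:
    """Simulate a blocking write; time.sleep releases the GIL while it waits."""
    time.sleep(delay)
    return note_id


def run_sequential(count: int, delay: float = 0.2) -> float:
    """Run the simulated writes one after another; return elapsed seconds."""
    start: float = time.perf_counter()
    for i in range(count):
        fake_write(i, delay)
    return time.perf_counter() - start


def run_threaded(count: int, delay: float = 0.2) -> float:
    """Run the simulated writes in a thread pool (count must be at least 1)."""
    start: float = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        # list() forces every result, so all writes finish before we stop timing
        list(pool.map(lambda i: fake_write(i, delay), range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"Sequential: {run_sequential(5):.2f}s")
    print(f"Threaded:   {run_threaded(5):.2f}s")
```

The threaded run should finish in roughly the time of one write, because each thread gives up the GIL while it sleeps. Replacing the sleep with a pure-Python calculation would be expected to remove most of that gain, which is the CPU-bound row of the table.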

This chapter teaches both approaches (threads in Lesson 2, async in Lessons 3 onward) so you understand the tradeoffs. The destination is async/await.


The SmartNotes Export Problem

Here is a simplified version of what smartnotes export --format all does internally. Save it as export_benchmark.py:

import time
from pathlib import Path


def export_notes_sequential(
    notes: list[str], directory: Path, formats: list[str]
) -> None:
    """Export notes in multiple formats, one at a time."""
    for fmt in formats:
        for i, note in enumerate(notes):
            filename: str = f"note_{i}.{fmt}"
            filepath: Path = directory / filename
            # Simulate writing the file
            time.sleep(0.1)
            print(f"  Wrote {filepath.name}")


def main() -> None:
    notes: list[str] = [f"Note {i}" for i in range(10)]
    formats: list[str] = ["md", "json", "csv"]
    directory: Path = Path("export_output")

    start: float = time.perf_counter()
    export_notes_sequential(notes, directory, formats)
    elapsed: float = time.perf_counter() - start

    print(f"\nExported {len(notes)} notes x {len(formats)} formats")
    print(f"Total time: {elapsed:.2f} seconds")
    print(f"Files written: {len(notes) * len(formats)}")


if __name__ == "__main__":
    main()

Run it:

uv run python export_benchmark.py

Output:

  Wrote note_0.md
  Wrote note_1.md
  ...
  Wrote note_9.csv

Exported 10 notes x 3 formats
Total time: 3.02 seconds
Files written: 30

Thirty files at 0.1 seconds each: 3 seconds. Every write blocks the next one. The disk could handle all 30 writes in a fraction of a second if they overlapped.

The question: how do you start the next file write while the previous one is still in progress?


PRIMM-AI+ Practice: Predict the Timing

Predict [AI-FREE]

Press Shift+Tab to enter Plan Mode.

A sequential loop calls time.sleep(0.2) inside a loop of 8 iterations. On paper, predict:

  1. The total execution time
  2. What happens to total time if you change time.sleep(0.2) to time.sleep(0.5)
  3. What happens to total time if you add a 9th iteration

Rate your confidence from 1 to 5.

Check your predictions
  1. 8 iterations x 0.2 seconds = 1.6 seconds (plus tiny overhead)
  2. 8 x 0.5 = 4.0 seconds (total scales linearly with sleep duration)
  3. 9 x 0.2 = 1.8 seconds (total scales linearly with iteration count)

The pattern: total time = iterations x sleep duration. This is the signature of sequential blocking code. Every additional item adds its full wait time.

Run

Press Shift+Tab to exit Plan Mode.

Create a file called timing_test.py:

import time


def process_items(count: int, delay: float) -> float:
    """Process items sequentially and return elapsed time."""
    start: float = time.perf_counter()
    for _ in range(count):
        time.sleep(delay)
    return time.perf_counter() - start


def main() -> None:
    elapsed: float = process_items(8, 0.2)
    print(f"8 items x 0.2s = {elapsed:.2f}s")

    elapsed = process_items(8, 0.5)
    print(f"8 items x 0.5s = {elapsed:.2f}s")

    elapsed = process_items(9, 0.2)
    print(f"9 items x 0.2s = {elapsed:.2f}s")


if __name__ == "__main__":
    main()

Run it and compare to your predictions.

Investigate

Run /investigate @timing_test.py in Claude Code and ask: "Why does total time scale linearly? What would a non-blocking version look like? What is the theoretical minimum time for 8 items that each need 0.2 seconds of I/O?"

The theoretical minimum is 0.2 seconds: if all 8 I/O operations overlap, total time equals the duration of one operation.
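The two extremes can be written down as plain arithmetic. The sketch below uses two hypothetical helper names, sequential_time and overlapped_lower_bound, and assumes an idealized model: no overhead, and every wait can overlap fully with every other.

```python
def sequential_time(delays: list[float]) -> float:
    """Blocking code pays for every wait in turn: the total is the sum."""
    return sum(delays)


def overlapped_lower_bound(delays: list[float]) -> float:
    """If every wait overlaps, the total cannot drop below the longest one."""
    return max(delays, default=0.0)


if __name__ == "__main__":
    delays: list[float] = [0.2] * 8
    print(f"Sequential estimate:    {sequential_time(delays):.1f}s")
    print(f"Overlapped lower bound: {overlapped_lower_bound(delays):.1f}s")
```

For eight equal 0.2-second waits, this gives 1.6 seconds sequentially against a lower bound of 0.2 seconds. Real concurrent code lands somewhere above the lower bound, since switching between tasks is not free.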

Modify

Change process_items to accept a list of different delay values (delays: list[float]) instead of a uniform delay. Predict the total time for [0.1, 0.3, 0.2, 0.5] before running.

Make [Mastery Gate]

Write a function time_sequential(tasks: list[float]) -> float that takes a list of simulated I/O durations, processes them sequentially using time.sleep, and returns the total elapsed time. Use /tdg in Claude Code:

  1. Write the stub with types and docstring
  2. Write 3+ tests: empty list returns near-zero, single item returns approximately that item's duration, multiple items return approximately the sum
  3. Generate the implementation
  4. Verify with uv run ruff check, uv run pyright, uv run pytest

Try With AI

Opening Claude Code

If Claude Code is not already running, open your terminal, navigate to your SmartNotes project folder, and type claude. If you need a refresher, Chapter 44 covers the setup.

Prompt 1: Profile Real File I/O

I have a script that writes 30 files using time.sleep(0.1)
to simulate I/O. Replace time.sleep with actual file writes
using pathlib.Path.write_text. Time the real version and
compare to the simulated version.

How different are the timings? What does this tell me about
when I/O blocking actually matters?

What you're learning: Simulated delays are useful for learning, but real I/O timings vary. Small local file writes may complete in microseconds, while network I/O takes milliseconds. Understanding when blocking matters helps you decide when concurrency is worth the effort.
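If you want a baseline to compare against the AI's answer, here is one possible sketch of the real-write version. The function name write_real_files and the use of a temporary directory are assumptions made for this illustration; timings will vary by machine, so no expected numbers are given.

```python
import tempfile
import time
from pathlib import Path


def write_real_files(directory: Path, count: int = 30) -> float:
    """Write `count` small text files and return the elapsed seconds."""
    start: float = time.perf_counter()
    for i in range(count):
        # A real blocking write replaces the time.sleep(0.1) stand-in
        (directory / f"note_{i}.md").write_text(f"Note {i}\n", encoding="utf-8")
    return time.perf_counter() - start


if __name__ == "__main__":
    # A temporary directory keeps the experiment from cluttering the project
    with tempfile.TemporaryDirectory() as tmp:
        elapsed: float = write_real_files(Path(tmp))
        print(f"30 real writes: {elapsed:.4f} seconds")
```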

Prompt 2: Identify Blocking Operations

Here is my SmartNotes export function:

[paste export_notes_sequential from this lesson]

Identify every blocking operation in this code. For each one,
explain what the program is waiting for and estimate how much
time it wastes. Then suggest which of these operations could
overlap with each other.

What you're learning: The first step in optimization is identifying where time is wasted. You are building a diagnostic skill: read sequential code, spot the blocking calls, estimate the cost. This analysis is what you will use to decide which operations to make concurrent.

Prompt 3: Real-World Concurrency Scenarios

I understand that sequential I/O wastes time because each
operation waits for the previous one. Give me 3 real-world
Python scenarios where concurrency helps and 2 where it does
not help. For each, explain whether concurrency or parallelism
is the right tool.

What you're learning: Concurrency is not always the answer. CPU-bound work (image processing, encryption) needs parallelism, not concurrency. Short I/O operations may not justify the complexity. The AI helps you build a mental classifier for "is concurrency worth it here?"


James looks at the timing output. "The export takes 3 seconds. Each file write is 0.1 seconds. Thirty writes. 30 times 0.1 is 3. Every file waits for the one before it."

"That is the problem," Emma says. "Now translate it back to the warehouse."

James thinks for a moment. "One forklift. Six docks. The forklift drives to dock 1, loads the pallet, waits for the driver to secure it, drives back, picks up the next pallet. If the forklift could drop off a pallet at dock 1, immediately drive to dock 2 with the next pallet, and come back to dock 1 only when the driver signals 'ready for more,' the same forklift handles all six docks in the time it takes to secure one pallet."

"That is concurrency," Emma says. "One forklift, but it never sits idle."

"So Python has a way to do that?"

"Two ways, actually. Threads are the obvious one. The standard library includes a threading module. And there is a newer approach called async/await that was designed specifically for I/O."

"Which one should I use?"

Emma pauses. "Honestly? I am not sure I can give you a blanket answer. Threads are older and more widely documented. Async is cleaner for I/O but the syntax takes getting used to. Lesson 2 shows you threads. Lesson 3 shows you async. By the end, you will have your own answer."