Updated Feb 23, 2026

How Kafka Fits: The Mental Model

You understand why events beat direct API calls. You know the difference between events and commands. Now you need a mental model for how Kafka actually works - one you can carry into production debugging sessions and architecture discussions.

This lesson builds that model through a familiar analogy: the newspaper industry. By the end, you'll be able to sketch Kafka's architecture on a whiteboard, explain consumer groups to a teammate, and trace exactly what happens when your agent publishes a "task.created" event.

First, the concepts, grounded along the way by a few short client-code sketches. Then, in Lessons 4-8, you'll deploy and code against a real Kafka cluster.

The Newspaper Analogy

Imagine a major newspaper operation. Every day, the newspaper:

  1. Receives stories from journalists (producers)
  2. Organizes content into sections - Sports, Business, Technology (topics)
  3. Prints copies at multiple facilities (partitions)
  4. Tracks delivery progress so subscribers can resume where they left off (offsets)
  5. Serves different subscriber types - home delivery, office delivery, digital (consumer groups)

Kafka works the same way. Let's map each concept.

Topics: Named Streams of Events

A topic is a named stream of events - like a newspaper section.

Newspaper             Kafka
Sports section        task-events topic
Business section      user-events topic
Technology section    notification-events topic

When your Task API creates a task, it publishes to the task-events topic. When a user signs up, the auth service publishes to user-events. Each topic is independent - consumers subscribe to the topics they care about.

Key insight: Topics are categories, not destinations. Unlike a traditional message queue where messages go to one consumer, Kafka topics are logs that multiple consumers can read independently.

Topic: task-events

┌───────────────────────────────────────────────────────────────┐
│ [task.created] [task.updated] [task.completed] [task.created] │
│       ↑             ↑               ↑               ↑         │
│   offset 0      offset 1        offset 2        offset 3      │
└───────────────────────────────────────────────────────────────┘
Time →

Events are appended to the end. They're never modified or deleted (until retention expires). This append-only log model is what makes Kafka different from traditional queues.
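
To make "append to the log" concrete, here is a minimal producer sketch in Python. It assumes the confluent-kafka client and a broker at localhost:9092 - both placeholders; the course's actual tooling arrives in Lessons 4-8.

import json
from confluent_kafka import Producer  # assumed client library

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder address

def on_delivery(err, msg):
    # The broker reports back exactly where the event was appended.
    if err is None:
        print(f"Appended to {msg.topic()} partition {msg.partition()} at offset {msg.offset()}")

producer.produce(
    "task-events",                               # the topic: a named stream
    key="task-123",                              # the key (used for routing - next section)
    value=json.dumps({"type": "task.created"}),  # the event payload
    callback=on_delivery,
)
producer.flush()  # block until the broker acknowledges the append

Notice there is no consumer address anywhere - the producer only knows the topic.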

Partitions: Parallelism Units

A topic handling millions of events per second can't live on a single machine. Kafka solves this with partitions - independent segments of a topic that can live on different machines.

Think of partitions as newspaper printing facilities in different cities:

Printing Facility     Kafka Partition
New York plant        Partition 0
Chicago plant         Partition 1
Los Angeles plant     Partition 2

Each partition:

  • Stores a subset of events from the topic
  • Lives on a specific broker (server)
  • Maintains strict ordering within itself
  • Can be consumed independently by different consumers

Topic: task-events (3 partitions)

Partition 0: [task.created] [task.updated] [task.created]
Partition 1: [task.completed] [task.created] [task.updated]
Partition 2: [task.created] [task.deleted] [task.completed]

Critical concept: Ordering is guaranteed within a partition, but not across partitions. If you need events for a specific task to be processed in order, they must go to the same partition. Kafka uses the message key to determine partition assignment - events with the same key always go to the same partition.

Key: "task-123" → hash → Partition 1
Key: "task-456" → hash → Partition 0
Key: "task-123" → hash → Partition 1 (same key = same partition)

Offsets: Your Bookmark

When you read a book, you use a bookmark. Kafka uses offsets.

An offset is a sequential number assigned to each event within a partition. It tells consumers "where they are" in the stream:

Partition 0:
┌────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ ← Offsets
├────┼────┼────┼────┼────┼────┤
│ E1 │ E2 │ E3 │ E4 │ E5 │ E6 │ ← Events
└────┴────┴────┴────┴────┴────┘

Consumer position: "I've read up to offset 3"

Why offsets matter:

  1. Resume after crash: Consumer restarts at last committed offset
  2. Replay events: Reset offset to 0 to reprocess all events
  3. Skip ahead: Jump to latest offset to ignore old events
  4. Track lag: Difference between latest offset and consumer position shows how far behind you are

Unlike traditional queues that delete messages after delivery, Kafka retains events based on time or size limits (default: 7 days). Multiple consumers can read the same events independently, each tracking their own offset.
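
Replay in code is just moving the bookmark. A sketch with the confluent-kafka client (assumed, as before) that pins a consumer to partition 0 and rereads it from offset 0:

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "replay-demo",              # offsets are tracked per group
    "enable.auto.commit": False,            # we move the bookmark ourselves
})

# Start reading partition 0 of task-events at offset 0, ignoring any
# previously committed position.
consumer.assign([TopicPartition("task-events", 0, 0)])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up within the timeout
    if msg.error():
        continue
    print(f"offset {msg.offset()}: {msg.value()}")

consumer.close()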

Producers and Consumers: Writers and Readers

Producers write events to topics. They're like journalists filing stories:

Producer (Task API) → Topic: task-events
                        ├── Partition 0 (if key hashes to 0)
                        ├── Partition 1 (if key hashes to 1)
                        └── Partition 2 (if key hashes to 2)

Consumers read events from topics. They're like subscribers reading the newspaper:

Topic: task-events → Consumer (Notification Service)
                   → Consumer (Audit Service)
                   → Consumer (Analytics Service)

Each consumer maintains its own offset. The notification service might be at offset 1000 while the analytics service is at offset 500 - they're independent.
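
In client code, that independence comes entirely from the group.id setting. A sketch (same assumed client and placeholder address):

from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    return Consumer({
        "bootstrap.servers": "localhost:9092",  # placeholder
        "group.id": group_id,                   # the identity that owns the offsets
        "auto.offset.reset": "earliest",        # start at offset 0 if no commit exists yet
    })

# Two services, two groups: each receives every event on the topic and
# advances its own offsets without affecting the other.
notifications = make_consumer("notification-service")
analytics = make_consumer("analytics-service")
notifications.subscribe(["task-events"])
analytics.subscribe(["task-events"])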

Consumer Groups: Team Coordination

Here's where Kafka gets powerful. A consumer group is a team of consumers that share the work of reading a topic.

Imagine home delivery for a large city. One delivery person can't cover all routes. So you assign:

  • Route A: Delivery person 1
  • Route B: Delivery person 2
  • Route C: Delivery person 3

Each route is covered by exactly one person. If person 2 calls in sick, their route gets reassigned to someone else.

Kafka consumer groups work identically:

Consumer Group: "notification-service"

Partition 0 → Consumer 1
Partition 1 → Consumer 2
Partition 2 → Consumer 3

The rules:

  1. Each partition goes to exactly one consumer in a group
  2. A consumer can handle multiple partitions (if consumers < partitions)
  3. Extra consumers sit idle (if consumers > partitions)
  4. Different groups read independently (audit-service group has its own offsets)

Topic: task-events (3 partitions)

Consumer Group: "notification-service"      Consumer Group: "audit-service"
├── Consumer 1 → Partition 0                ├── Consumer A → Partition 0
├── Consumer 2 → Partition 1                ├── Consumer A → Partition 1
└── Consumer 3 → Partition 2                └── Consumer A → Partition 2
(3 consumers, parallel processing)          (1 consumer, all partitions)

Scaling insight: Want more parallelism? Add partitions. Want to process faster? Add consumers (up to partition count). Have 10 partitions but 15 consumers? 5 consumers will be idle.
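
Joining the team requires no coordination code on your side. This sketch is one group member; run three copies and the broker's group coordinator hands each copy one partition (client and address assumed as before):

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "notification-service",     # same id = same team
})

def on_assign(consumer, partitions):
    # Called after every rebalance with this member's share of the work.
    print("Assigned partitions:", [p.partition for p in partitions])

consumer.subscribe(["task-events"], on_assign=on_assign)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(f"partition {msg.partition()}, offset {msg.offset()}")

Kill one copy and on_assign fires again in the survivors - that's the sick delivery person's route being reassigned.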

Brokers: The Printing Presses

A broker is a Kafka server that stores partitions and serves producers/consumers. A Kafka cluster is a group of brokers working together.

Kafka Cluster (KRaft Mode)
┌─────────────────────────────────────────────────────────────┐
│ Controller Nodes (metadata via Raft consensus)              │
│ └─ __cluster_metadata topic                                 │
├─────────────────────────────────────────────────────────────┤
│ Broker 1             Broker 2             Broker 3          │
│ ├── task-events P0   ├── task-events P1   ├── task-events P2│
│ └── user-events P0   └── user-events P1   └── user-events P2│
└─────────────────────────────────────────────────────────────┘

KRaft mode (Kafka Raft): In Kafka 4.0+, cluster metadata is managed by a built-in Raft consensus protocol. No external ZooKeeper needed. This simplifies deployment and reduces operational complexity.

Each partition has:

  • One leader: Handles all reads and writes
  • Zero or more follower replicas: Copy the leader's data for fault tolerance

If a broker dies, another broker's replica becomes the new leader. Producers and consumers automatically reconnect to the new leader.
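
Replication is set per topic at creation time. A sketch using confluent-kafka's admin API - an assumption here, since in this course you'll create topics through Strimzi resources in later lessons:

from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder

# 3 partitions for parallelism; replication_factor=2 means each partition
# has a leader plus one follower replica on a different broker, so a
# single broker failure cannot lose acknowledged data.
futures = admin.create_topics([
    NewTopic("task-events", num_partitions=3, replication_factor=2)
])
futures["task-events"].result()  # raises if creation failed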

The Message Journey: End to End

Let's trace what happens when your Task API publishes a "task.created" event:

┌────────────────────────────────────────────────────────────────┐
│                        MESSAGE JOURNEY                         │
└────────────────────────────────────────────────────────────────┘

Step 1: PRODUCE
┌──────────────┐
│   Task API   │ ──produce("task-events", key="task-123", value={...})──▶
│  (Producer)  │
└──────────────┘

Step 2: ROUTE TO PARTITION
┌────────────────────────────────────────────────────────────────┐
│ hash("task-123") % 3 = 1  →  Partition 1                       │
└────────────────────────────────────────────────────────────────┘

Step 3: WRITE TO BROKER
┌────────────────────────────────────────────────────────────────┐
│ Broker 2 (Partition 1 Leader)                                  │
│ ├── Append event to log at offset 47                           │
│ ├── Replicate to Broker 1 (replica)                            │
│ └── Acknowledge to producer: "Offset 47 committed"             │
└────────────────────────────────────────────────────────────────┘

Step 4: CONSUMERS POLL
┌────────────────────────────────────────────────────────────────┐
│ Consumer Group: "notification-service"                         │
│ └── Consumer 2 (assigned to Partition 1)                       │
│     ├── poll() → receives event at offset 47                   │
│     ├── Process: Send email notification                       │
│     └── commit(offset=47) → "I've processed this"              │
│                                                                │
│ Consumer Group: "audit-service"                                │
│ └── Consumer A (assigned to all partitions)                    │
│     ├── poll() → receives event at offset 47                   │
│     ├── Process: Write to audit log                            │
│     └── commit(offset=47) → "I've processed this"              │
└────────────────────────────────────────────────────────────────┘

What happens at each step:

Step   Component      Action
1      Producer       Serializes event, determines partition from key
2      Kafka Client   Hashes key to select partition
3      Broker         Appends to partition log, replicates, acknowledges
4      Consumer       Polls for new events, processes, commits offset

Key observations:

  • The producer doesn't know which consumers will read the event
  • Multiple consumer groups read the same event independently
  • Each consumer group tracks its own offset per partition
  • If a consumer crashes after processing but before committing, it will reprocess the event on restart (at-least-once delivery - see the sketch below)
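
That last observation is the one your application must design for: commit only after the side effect succeeds, and make handlers idempotent because redelivery is possible. A minimal sketch of the safe ordering (assumed client as before; send_email_notification is a hypothetical handler):

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "notification-service",
    "enable.auto.commit": False,            # commit manually, after processing
})
consumer.subscribe(["task-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    send_email_notification(msg.value())  # hypothetical handler - do the work first
    consumer.commit(msg)                  # THEN advance the bookmark; a crash in
                                          # between means safe reprocessing, not loss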

Why This Model Matters

Understanding Kafka's mental model helps you:

  1. Debug production issues: "Consumer lag is high on partition 2" - you know exactly what that means
  2. Design event schemas: Keys determine partition assignment and ordering
  3. Scale correctly: More partitions = more parallelism, but rebalancing overhead
  4. Choose delivery semantics: Offset commit timing affects exactly-once vs at-least-once
  5. Explain to teammates: The newspaper analogy makes Kafka accessible

In the next lesson, you'll deploy a real Kafka cluster with Strimzi and see these concepts in action.


Reflect on Your Skill

You built a kafka-events skill in Lesson 0. Test and improve it based on what you learned.

Test Your Skill

Using my kafka-events skill, explain how Kafka would handle 10 producers writing task events and 3 consumer groups reading them.
Does my skill correctly explain topics, partitions, consumer groups, and offset management?

Identify Gaps

Ask yourself:

  • Did my skill explain the relationship between topics, partitions, and consumer groups?
  • Did it cover how offsets enable replay and parallel processing?

Improve Your Skill

If you found gaps:

My kafka-events skill is missing Kafka's core mental model (topics, partitions, offsets, consumer groups).
Update it to include when to partition topics and how consumer groups enable parallel processing.

Try With AI

Use your AI companion to reinforce and extend this mental model.

Prompt 1: Test Your Understanding

I just learned about Kafka's architecture. Quiz me on it - ask me to explain:
1. What's the relationship between topics, partitions, and offsets?
2. How do consumer groups enable parallel processing?
3. What happens when a consumer crashes mid-processing?

Challenge my answers and correct any misconceptions.

What you're learning: Active recall strengthens mental models. Your AI partner acts as an expert interviewer, testing edge cases you might not have considered.

Prompt 2: Visualize Your Use Case

I'm building an agent system where:
- A Task API creates tasks
- A Notification Service sends alerts
- An Audit Service logs all changes
- An Analytics Service computes metrics

Help me design the Kafka topology. Ask me:
- How many topics do I need?
- How should I choose partition counts?
- What should my consumer groups look like?
- What keys should I use for ordering guarantees?

What you're learning: Applying abstract concepts to your specific domain. The AI helps you think through trade-offs rather than prescribing a solution.

Prompt 3: Explore Edge Cases

Walk me through what happens in these Kafka failure scenarios:
1. A broker crashes while a producer is sending a message
2. A consumer processes a message but crashes before committing
3. A consumer group rebalances while processing is in progress

For each scenario, explain what Kafka guarantees and what my application
needs to handle.

What you're learning: Failure modes reveal how well you understand the system. Understanding what Kafka guarantees versus what your code must handle is crucial for production reliability.

Safety Note

When exploring Kafka with AI, verify configuration recommendations against official documentation. Default settings differ between development and production, and incorrect settings (like acks=0 for critical data) can cause data loss. Always test failure scenarios in a non-production environment first.