
Chapter 81: Pipecat - Lesson Plan

Generated by: chapter-planner v2.0.0 (Reasoning-Activated)
Source: Part 11 README, Deep Search Report, Chapter 80 context
Created: 2026-01-01
Constitution: v6.0.0 (Reasoning Mode)


I. Chapter Analysis

Chapter Type

TECHNICAL (SKILL-FIRST L00 Pattern) - This chapter uses the Skill-First Learning pattern where students build the pipecat skill FIRST from official documentation, then learn the framework by improving that skill across subsequent lessons.

Recognition signals:

  • Learning objectives use "implement/create/build"
  • Code examples required for every lesson
  • Skill artifact created in Lesson 0
  • Subsequent lessons TEST and IMPROVE the skill
  • Follows L00 pattern established in Parts 5-7 and Chapter 80

Concept Density Analysis

Core Concepts (from Deep Search + Part 11 README): 9 concepts

  1. LEARNING-SPEC.md (skill specification before building)
  2. Frame-based pipeline architecture (core abstraction)
  3. Frame types: AudioRawFrame, TextFrame, EndFrame, control signals
  4. Processors: Transformations on frame streams
  5. Pipelines: Composition of processors
  6. Transport abstraction (Daily WebRTC, FastAPI WebSocket, local audio)
  7. Provider plugins (40+ integrations: STT, LLM, TTS providers)
  8. Speech-to-Speech integration (OpenAI Realtime, Gemini Live, Nova Sonic)
  9. Custom processor implementation

Complexity Assessment: Standard (framework with clear abstractions, modular design)

Proficiency Tier: B1-B2 (Part 11 requires Parts 6, 7, 9, 10 completed; students have production experience)

Justified Lesson Count: 3 lessons

  • Lesson 0: Build Your Pipecat Skill (L00 pattern)
  • Lesson 1: Frame-Based Pipeline Architecture (Layer 2: AI Collaboration)
  • Lesson 2: Multi-Provider Integration & Custom Processors (Layer 2 + Layer 3)

Reasoning:

  • 9 concepts across 3 lessons (2 + 4 + 3) averages 3 concepts per lesson
  • B1-B2 limit is 10 concepts per lesson - well within limit
  • Pipecat is conceptually simpler than LiveKit (modular vs distributed)
  • Chapter 80 covered similar voice AI fundamentals, reducing cognitive load
  • 3 lessons sufficient for skill-first framework mastery

II. Success Evals (from Part 11 README + Chapter 81 Description)

Success Criteria (what students must achieve):

  1. Skill Creation: Students build a working pipecat skill from official documentation using /fetching-library-docs
  2. Frame Understanding: Students can explain frames as the data unit flowing through pipelines
  3. Pipeline Architecture: Students implement voice agents with processor composition
  4. Transport Flexibility: Students configure different transports (Daily, WebSocket, local)
  5. Provider Integration: Students connect multiple STT/LLM/TTS providers via plugins
  6. S2S Integration: Students use OpenAI Realtime, Gemini Live, or Nova Sonic through Pipecat
  7. Custom Processors: Students implement custom processors for domain-specific transformations

All lessons below map to these evals.


III. Lesson Sequence


Lesson 0: Build Your Pipecat Skill

Title: Build Your Pipecat Skill

Learning Objectives:

  • Write a LEARNING-SPEC.md that defines what you want to learn about Pipecat
  • Fetch official Pipecat documentation using /fetching-library-docs
  • Create a pipecat skill grounded in official documentation (not AI memory)
  • Verify the skill works by building a minimal voice agent pipeline

Stage: Layer 1 (Manual Foundation) + Layer 2 (AI Collaboration via /skill-creator)

CEFR Proficiency: B1

New Concepts (count: 2):

  1. LEARNING-SPEC.md (specification before skill creation)
  2. Documentation-grounded skill creation

Cognitive Load Validation: 2 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #1 (Skill Creation)

Key Sections:

  1. Clone the Skills Lab Fresh (~3 min)

    • Why fresh clone: No state assumptions from previous work
    • Command: git clone [skills-lab-repo] && cd skills-lab
    • Verify clean environment
  2. Write Your LEARNING-SPEC.md (~7 min)

    • What is LEARNING-SPEC.md: Your specification for what you want to learn
    • Template structure:
      # Learning Specification: Pipecat

      ## What I Want to Learn
      - How Pipecat's frame-based pipeline works
      - How to compose processors into voice agents
      - How to integrate multiple AI providers
      - How to use S2S models through Pipecat

      ## Why This Matters
      - Pipecat has 40+ provider integrations
      - Frame-based design enables custom transformations
      - Transport-agnostic means flexible deployment

      ## Success Criteria
      - [ ] Skill can scaffold a basic voice pipeline
      - [ ] Skill explains frame types and flow
      - [ ] Skill guides multi-provider configuration
      - [ ] Skill includes custom processor patterns
    • Write YOUR specification (not a copy)
  3. Fetch Official Documentation (~5 min)

    • Use /fetching-library-docs to get Pipecat docs
    • Why official docs: AI memory is unreliable for API details
    • What to look for: Frame architecture, processors, transports, plugins
    • Save relevant excerpts for skill creation
  4. Create Your Skill with /skill-creator (~10 min)

    • Invoke /skill-creator with your LEARNING-SPEC.md and fetched docs
    • Skill structure: Persona + Questions + Principles
    • Review generated skill for accuracy
    • Commit to .claude/skills/pipecat/SKILL.md
  5. Verify Your Skill Works (~5 min)

    • Test: "Create a minimal Pipecat voice pipeline"
    • Verify generated code matches official patterns
    • If issues found: Improve skill and re-test
    • Skill is now your knowledge artifact

Duration Estimate: 30 minutes

File Output: .claude/skills/pipecat/SKILL.md

Prerequisites:

  • Part 10 completed (chat interfaces)
  • Chapter 79 completed (voice AI fundamentals)
  • Chapter 80 recommended (LiveKit comparison context)

Try With AI Prompts:

  1. Draft Your LEARNING-SPEC.md

    I'm about to learn Pipecat, a frame-based voice AI framework with
    40+ provider integrations. Help me write a LEARNING-SPEC.md:

    My context:
    - I just learned LiveKit Agents in Chapter 80
    - I want to understand Pipecat's different approach (frames vs jobs)
    - My goal is flexibility in provider selection for my Digital FTEs

    Help me define:
    1. What specific aspects of Pipecat should I focus on?
    2. How does it differ from LiveKit (what's unique)?
    3. What success criteria would prove I've learned it?

    Make this MY specification, not a generic template.

    What you're learning: Comparative specification - defining learning goals relative to what you already know.

  2. Analyze the Official Docs

    I fetched Pipecat documentation. Here are the key sections:
    [paste relevant excerpts from /fetching-library-docs output]

    Help me understand:
    1. What are frames? How do they flow through pipelines?
    2. What patterns appear repeatedly in the examples?
    3. What makes Pipecat different from LiveKit's approach?
    4. What should my skill definitely include?

    I want to build a GROUNDED skill, not one based on assumptions.

    What you're learning: Documentation analysis - extracting unique patterns from primary sources.

  3. Review Your Generated Skill

    Here's the skill /skill-creator generated:
    [paste SKILL.md content]

    Compare this to the official documentation:
    1. Does it accurately represent Pipecat's frame architecture?
    2. Are there any claims not supported by the docs?
    3. How does it compare to my livekit-agents skill?
    4. What should be removed as incorrect or speculative?

    Help me make this skill ACCURATE and DISTINCT from LiveKit.

    What you're learning: Validation - ensuring AI-generated content matches authoritative sources.


Lesson 1: Frame-Based Pipeline Architecture

Title: Frame-Based Pipeline Architecture

Learning Objectives:

  • Explain frames as the fundamental data unit in Pipecat pipelines
  • Distinguish frame types: AudioRawFrame, TextFrame, EndFrame, control signals
  • Implement processors that transform frame streams
  • Compose processors into complete voice pipelines
  • Configure different transports (Daily WebRTC, WebSocket, local)

Stage: Layer 2 (AI Collaboration) - Use skill to build, improve skill based on learnings

CEFR Proficiency: B1

New Concepts (count: 4):

  1. Frame-based architecture (core abstraction)
  2. Frame types and their purposes
  3. Processors and pipelines
  4. Transport abstraction

Cognitive Load Validation: 4 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #2 (Frame Understanding), #3 (Pipeline Architecture), #4 (Transport Flexibility)

Key Sections:

  1. The Frame Abstraction (~7 min)

    • What is a frame: The data unit flowing through pipelines
    • Why frames: Uniform interface for diverse data types
    • Frame lifecycle: Creation, transformation, consumption
    • Comparison to LiveKit: Jobs vs Frames mental model
    • Diagram: Frame flow through pipeline
  2. Frame Types (~8 min)

    • AudioRawFrame: Raw audio data (samples, sample rate, channels)
    • TextFrame: Transcribed text or LLM responses
    • EndFrame: Signals end of conversation/stream
    • Control Frames: StartInterruptionFrame, StopInterruptionFrame
    • System Frames: Lifecycle management, pipeline control
    • Code: Working with different frame types (see the frame-type sketch after this list)
  3. Processors: The Building Blocks (~10 min)

    • What processors do: Transform input frames to output frames
    • Processor interface: async process_frame method
    • Examples: STT processor (Audio -> Text), LLM processor (Text -> Text)
    • Chaining: Output of one becomes input of next
    • Code: Implementing a basic processor (see the processor sketch after this list)
  4. Pipelines: Composing Processors (~8 min)

    • Pipeline construction: List of processors in order
    • Frame routing: How frames flow through
    • Parallel pipelines: Multiple processing paths
    • Error handling: What happens when processor fails
    • Code: Building a complete voice pipeline (see the pipeline sketch after this list)
  5. Transport Abstraction (~7 min)

    • Transport role: Audio I/O to/from the outside world
    • Daily WebRTC: Browser-based realtime communication
    • FastAPI WebSocket: Custom backend integration
    • Local Audio: Microphone/speaker for testing
    • Code: Configuring different transports (the pipeline sketch after this list shows the Daily transport and where to swap it)
  6. Improve Your Skill (~5 min)

    • Reflect: What frame patterns did you learn?
    • Update .claude/skills/pipecat/SKILL.md
    • Add: Frame type guidance, processor patterns
    • Test: Does improved skill generate better code?
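
The frame-type sketch below illustrates Section 2: branching on frame type is the pattern every processor uses internally. It is a minimal sketch assuming the commonly documented pipecat-ai import path (pipecat.frames.frames) and constructors; frame class names and paths vary by release, so verify against the docs fetched in Lesson 0.

    # Sketch only: frame class names and import paths vary by pipecat-ai release.
    from pipecat.frames.frames import EndFrame, Frame, TextFrame

    def describe(frame: Frame) -> str:
        """Branch on frame type -- the same pattern processors use internally."""
        if isinstance(frame, TextFrame):
            return f"text: {frame.text!r}"           # transcription or LLM output
        if isinstance(frame, EndFrame):
            return "end of stream"                   # signals pipeline shutdown
        return f"other: {type(frame).__name__}"      # audio, control, system frames

    print(describe(TextFrame(text="hello")))
    print(describe(EndFrame()))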
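
The processor sketch below illustrates Section 3: a custom FrameProcessor that transforms TextFrames and passes everything else through. It assumes the commonly documented base-class hooks (process_frame, push_frame, FrameDirection); treat it as a sketch to check against the official docs rather than a guaranteed current API.

    # Sketch only: base-class hook names follow commonly documented pipecat-ai
    # patterns and may differ by release -- verify against your fetched docs.
    from pipecat.frames.frames import Frame, TextFrame
    from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

    class ShoutProcessor(FrameProcessor):
        """Uppercases TextFrames; passes every other frame through unchanged."""

        async def process_frame(self, frame: Frame, direction: FrameDirection):
            await super().process_frame(frame, direction)   # base class handles system frames
            if isinstance(frame, TextFrame):
                frame = TextFrame(text=frame.text.upper())  # transform the payload
            await self.push_frame(frame, direction)         # forward to the next processor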
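
The pipeline sketch below illustrates Sections 4 and 5: processors composed in order, bracketed by a transport's input and output. Service and transport class names (DeepgramSTTService, OpenAILLMService, CartesiaTTSService, DailyTransport) follow commonly documented Pipecat examples, but import paths and parameters shift between releases, and the room URL, token, and API keys are placeholders.

    # Sketch only: class names and import paths follow commonly documented
    # Pipecat examples and shift between releases. Install with something like
    # pip install "pipecat-ai[daily,deepgram,openai,cartesia]" (extras names
    # also vary). Room URL, token, and API keys below are placeholders.
    import asyncio

    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask
    from pipecat.services.cartesia import CartesiaTTSService
    from pipecat.services.deepgram import DeepgramSTTService
    from pipecat.services.openai import OpenAILLMService
    from pipecat.transports.services.daily import DailyParams, DailyTransport

    async def main():
        # Transport: swap DailyTransport for a WebSocket or local-audio transport
        # to change deployment target without touching the processors below.
        transport = DailyTransport(
            "https://example.daily.co/room",   # placeholder room URL
            None,                              # placeholder token
            "Pipecat Bot",
            DailyParams(audio_in_enabled=True, audio_out_enabled=True),
        )

        stt = DeepgramSTTService(api_key="DEEPGRAM_KEY")                       # Audio -> Text
        llm = OpenAILLMService(api_key="OPENAI_KEY", model="gpt-4o-mini")      # Text -> Text
        tts = CartesiaTTSService(api_key="CARTESIA_KEY", voice_id="VOICE_ID")  # Text -> Audio

        # Pipeline: an ordered list of processors; frames flow top to bottom.
        # Real pipelines usually add a context aggregator around the LLM;
        # omitted here to keep the shape visible.
        pipeline = Pipeline([
            transport.input(),    # audio frames in from the room
            stt,
            llm,
            tts,
            transport.output(),   # audio frames back out to the room
        ])

        await PipelineRunner().run(PipelineTask(pipeline))

    if __name__ == "__main__":
        asyncio.run(main())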

Duration Estimate: 45 minutes

Three Roles Integration (Layer 2):

AI as Teacher:

  • Skill explains frame lifecycle patterns you didn't know
  • "Control frames propagate immediately, bypassing queued frames"

AI as Student:

  • You refine skill's transport explanation based on your deployment needs
  • "Add WebSocket transport pattern for my Next.js frontend"

AI as Co-Worker:

  • Iterate on pipeline composition together
  • First attempt misses error handling -> AI suggests try/except pattern -> you validate

Try With AI Prompts:

  1. Understand the Frame Abstraction

    I'm learning Pipecat's frame-based architecture. Coming from LiveKit's
    job-based model, help me understand:

    1. What's a frame? How is it different from a LiveKit job?
    2. What frame types exist and when do I use each?
    3. How do frames flow through a pipeline?
    4. What happens when a frame reaches the end?

    Use diagrams or pseudocode to clarify the flow.

    What you're learning: Mental model translation - mapping new concepts to familiar ones.

  2. Build a Processor Chain

    Help me build a complete voice pipeline using my pipecat skill:

    Requirements:
    - Transport: Daily WebRTC (for browser testing)
    - STT: Deepgram Nova-3
    - LLM: GPT-4o-mini
    - TTS: Cartesia Sonic

    Walk me through each processor and how frames flow between them.
    After we build it, I'll test and report what works.

    What you're learning: Processor composition - building systems from modular components.

  3. Compare Transport Options

    I need to choose the right transport for my use case:

    Scenario A: Browser-based voice agent for customer support
    Scenario B: CLI tool for voice interaction during development
    Scenario C: WebSocket integration with existing FastAPI backend

    Use my pipecat skill to recommend transports for each and explain
    the tradeoffs. I'll implement one and report back.

    What you're learning: Transport selection - matching infrastructure to requirements.


Lesson 2: Multi-Provider Integration & Custom Processors

Title: Multi-Provider Integration & Custom Processors

Learning Objectives:

  • Configure multiple STT/LLM/TTS providers via Pipecat's plugin system
  • Integrate speech-to-speech models (OpenAI Realtime, Gemini Live, Nova Sonic)
  • Implement custom processors for domain-specific transformations
  • Finalize pipecat skill for production use

Stage: Layer 2 (AI Collaboration) + Layer 3 (Intelligence Design)

CEFR Proficiency: B1-B2

New Concepts (count: 3):

  1. Provider plugins (40+ integrations)
  2. S2S model integration
  3. Custom processor implementation

Cognitive Load Validation: 3 concepts <= 10 limit (B1-B2) -> WITHIN LIMIT

Maps to Evals: #5 (Provider Integration), #6 (S2S Integration), #7 (Custom Processors)

Key Sections:

  1. The Plugin Ecosystem (~8 min)

    • Pipecat's 40+ provider integrations
    • Plugin categories: STT, LLM, TTS, Transport, Vision
    • Installation: pip install pipecat-ai[provider]
    • Provider comparison: Latency, cost, quality tradeoffs
    • Table: Key providers and their strengths
  2. Swapping Providers (~10 min)

    • The modular advantage: Change one processor, keep the pipeline
    • STT providers: Deepgram, Whisper, AssemblyAI, Gladia
    • LLM providers: OpenAI, Anthropic, Google, Together, local
    • TTS providers: Cartesia, ElevenLabs, Azure, Deepgram Aura
    • Code: Switching from Deepgram to Whisper with one line (see the provider-swap sketch after this list)
  3. Speech-to-Speech Integration (~12 min)

    • Why S2S: Native voice understanding + generation
    • OpenAI Realtime: via RTVIProcessor
    • Gemini Live: via GeminiMultimodalLive
    • AWS Nova Sonic: via Nova plugin
    • When to use S2S vs cascaded pipeline
    • Code: Configuring OpenAI Realtime through Pipecat (a cascaded-vs-S2S shape sketch follows this list)
  4. Custom Processors (~10 min)

    • When to customize: Domain-specific transformations
    • Processor base class: FrameProcessor
    • Example: Sentiment analysis processor (Text -> Emotion + Text)
    • Example: Translation processor (Text -> Translated Text)
    • Example: Content filter (blocks inappropriate content)
    • Code: Implementing a custom processor (see the content-filter sketch after this list)
  5. Finalize Your Skill (~5 min)

    • Complete skill review: Does it cover all learnings?
    • Add: Provider selection guidance
    • Add: Custom processor patterns
    • Final test: Use skill to scaffold a multi-provider voice system
    • Commit: Production-ready skill artifact
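
The provider-swap sketch below illustrates Section 2's modularity claim: changing STT vendors touches one constructor call, not the pipeline. DeepgramSTTService and WhisperSTTService are commonly documented Pipecat plugins, but constructor arguments and import paths vary by release.

    # Sketch only: plugin class names follow commonly documented pipecat-ai
    # integrations; constructor arguments and import paths vary by release.
    from pipecat.services.deepgram import DeepgramSTTService
    from pipecat.services.whisper import WhisperSTTService

    def make_stt(provider: str):
        """Return an STT processor; the surrounding pipeline never changes."""
        if provider == "deepgram":
            return DeepgramSTTService(api_key="DEEPGRAM_KEY")  # hosted, low latency
        if provider == "whisper":
            return WhisperSTTService()  # local Whisper model, no API key needed
        raise ValueError(f"unknown STT provider: {provider}")

    # The swap is one line at the call site; the Pipeline([...]) built around
    # it (transport, LLM, TTS) stays exactly the same as in Lesson 1.
    stt = make_stt("whisper")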
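
The structural sketch below illustrates Section 3's cascaded-versus-S2S decision. The stt, llm, tts, and s2s_service arguments are deliberate placeholders: substitute whichever service classes your fetched docs name for your providers (OpenAI Realtime, Gemini Live, or Nova Sonic on the S2S side).

    # Sketch only: the point is the pipeline shape. stt, llm, tts, and
    # s2s_service are placeholders for whichever Pipecat service classes your
    # fetched docs name for your chosen providers.
    from pipecat.pipeline.pipeline import Pipeline

    def build_cascaded_pipeline(transport, stt, llm, tts) -> Pipeline:
        """Cascaded: audio -> text -> text -> audio, one provider per hop."""
        return Pipeline([transport.input(), stt, llm, tts, transport.output()])

    def build_s2s_pipeline(transport, s2s_service) -> Pipeline:
        """Speech-to-speech: one speech-native service replaces the STT/LLM/TTS trio."""
        return Pipeline([transport.input(), s2s_service, transport.output()])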
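
The content-filter sketch below illustrates Section 4 (and mirrors Try With AI prompt 3): a processor that injects a disclaimer ahead of sensitive user text. The keyword list and disclaimer wording are illustrative placeholders, and the base-class hooks are the same commonly documented ones assumed in Lesson 1.

    # Sketch only: base-class hooks as in Lesson 1; keywords and disclaimer
    # text are illustrative placeholders, not a real compliance policy.
    from pipecat.frames.frames import Frame, TextFrame
    from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

    SENSITIVE_KEYWORDS = {"diagnosis", "prescription", "lawsuit", "contract"}
    DISCLAIMER = "Note: I can share general information, not medical or legal advice."

    class SensitiveTopicGate(FrameProcessor):
        """Injects a disclaimer TextFrame ahead of user text on sensitive topics."""

        async def process_frame(self, frame: Frame, direction: FrameDirection):
            await super().process_frame(frame, direction)
            if isinstance(frame, TextFrame) and any(
                word in frame.text.lower() for word in SENSITIVE_KEYWORDS
            ):
                await self.push_frame(TextFrame(text=DISCLAIMER), direction)
            await self.push_frame(frame, direction)  # original frame always continues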

Duration Estimate: 45 minutes

Three Roles Integration (Layer 2 + Layer 3):

AI as Teacher:

  • Skill guides provider selection for your use case
  • "For realtime transcription, Deepgram Nova-3 has 90ms latency vs Whisper's 300ms"

AI as Student:

  • You teach skill your domain constraints
  • "I need HIPAA compliance - add provider filtering"

AI as Co-Worker:

  • Design custom processor together
  • You specify transformation logic -> AI generates code -> you validate behavior

Skill Finalization: At lesson end, students have a production-ready pipecat skill that:

  • Scaffolds voice pipelines with frame-based architecture
  • Guides provider selection across 40+ integrations
  • Supports S2S model configuration
  • Includes custom processor patterns

Try With AI Prompts:

  1. Choose the Right Providers

    I need to build a voice agent with these constraints:

    - Latency: Under 500ms total response time
    - Cost: Under $0.05 per minute
    - Quality: Natural-sounding voice, accurate transcription
    - Region: Must support EU data residency

    Use my pipecat skill to recommend:
    1. Which STT provider?
    2. Which LLM provider?
    3. Which TTS provider?

    Explain the tradeoffs and alternatives for each.

    What you're learning: Provider selection - balancing latency, cost, quality, compliance.

  2. Integrate Speech-to-Speech

    I want to try OpenAI's Realtime API through Pipecat instead of
    building my own pipeline. Help me:

    1. Configure RTVIProcessor for OpenAI Realtime
    2. Understand what I lose vs the cascaded approach
    3. Understand what I gain (latency, naturalness)
    4. Set up function calling through the S2S model

    Use my pipecat skill. I'll test and report latency numbers.

    What you're learning: S2S integration - using native voice models through framework abstraction.

  3. Build a Custom Processor

    I need a custom processor that:

    1. Receives TextFrame from STT
    2. Detects if user is asking about sensitive topics (medical, legal)
    3. If sensitive: Adds disclaimer frame before LLM response
    4. If not sensitive: Passes through unchanged

    Help me implement this using my pipecat skill. Walk through:
    - Processor class structure
    - Frame handling logic
    - Testing approach

    What you're learning: Custom processor implementation - extending Pipecat for domain needs.


IV. Skill Dependency Graph

Skill Dependencies:

Lesson 0: Build Skill (foundation)
|
Lesson 1: Frame Architecture (requires skill)
|
Lesson 2: Multi-Provider + Custom Processors (requires frame understanding)

Cross-Chapter Dependencies:

  • Requires: Chapter 79 (Voice AI Fundamentals) - architecture mental models
  • Requires: Chapter 80 (LiveKit Agents) - comparison context, voice pipeline understanding
  • Prepares for: Chapter 82 (OpenAI Realtime API) - direct API access after framework abstraction
  • Prepares for: Chapter 85 (Capstone) - production voice agent

V. Assessment Plan

Formative Assessments (During Lessons)

  • Lesson 0: Skill generation verification (skill works, matches docs)
  • Lesson 1: Pipeline code review (correct frame flow)
  • Lesson 2: Provider swap demonstration (change provider without breaking pipeline)

Summative Assessment (End of Chapter)

Chapter 81 Quiz:

  1. Frame Architecture: Explain how frames flow through processors
  2. Frame Types: When to use AudioRawFrame vs TextFrame vs EndFrame
  3. Transports: Compare Daily vs WebSocket vs Local transports
  4. Providers: How to swap STT provider without changing pipeline
  5. Custom Processors: When and how to implement custom transformations

Practical Assessment:

  • Build a voice pipeline that uses two different provider combinations
  • Implement a custom processor for your domain
  • Demonstrate transport flexibility (run same pipeline on different transports)

VI. Validation Checklist

Chapter-Level Validation:

  • Chapter type identified: TECHNICAL (SKILL-FIRST L00 Pattern)
  • Concept density analysis documented: 9 concepts across 3 lessons
  • Lesson count justified: 3 lessons (~3 concepts each, within B1-B2 limit)
  • All evals covered by lessons
  • All lessons map to at least one eval

Stage Progression Validation:

  • Lesson 0: Layer 1 + Layer 2 (skill creation with AI collaboration)
  • Lesson 1: Layer 2 (AI collaboration, skill improvement)
  • Lesson 2: Layer 2 + Layer 3 (provider integration, custom processors)
  • No premature spec-driven content (that's Chapter 85 Capstone)

Cognitive Load Validation:

  • Lesson 0: 2 concepts <= 10 (B1 limit) PASS
  • Lesson 1: 4 concepts <= 10 (B1 limit) PASS
  • Lesson 2: 3 concepts <= 10 (B1-B2 limit) PASS

L00 Pattern Requirements:

  • Lesson 0 creates skill from official documentation
  • Fresh clone of skills-lab (no state assumptions)
  • LEARNING-SPEC.md written before skill creation
  • /fetching-library-docs used for documentation
  • Skill tested and verified before proceeding
  • Each subsequent lesson TESTS and IMPROVES the skill
  • "Improve Your Skill" section in each lesson

Three Roles Validation (Layer 2 lessons):

  • Each Layer 2 lesson demonstrates AI as Teacher
  • Each Layer 2 lesson demonstrates AI as Student
  • Each Layer 2 lesson demonstrates AI as Co-Worker (convergence)

Canonical Source Validation:

  • Skills format follows .claude/skills/<name>/SKILL.md pattern
  • Lesson 0 references /fetching-library-docs for official docs
  • Provider patterns align with Pipecat plugin system

VII. File Structure

63-pipecat/
├── _category_.json # Existing
├── README.md # Chapter overview (create)
├── 00-build-pipecat-skill.md # Lesson 0: L00 pattern (create)
├── 01-frame-pipeline-architecture.md # Lesson 1 (create)
├── 02-multi-provider-integration.md # Lesson 2 (create)
└── 03-chapter-quiz.md # Assessment (create)

VIII. Summary

Chapter 81: Pipecat is a 3-lesson SKILL-FIRST technical chapter:

Lesson | Title                                          | Concepts | Duration | Evals
0      | Build Your Pipecat Skill                       | 2        | 30 min   | #1
1      | Frame-Based Pipeline Architecture              | 4        | 45 min   | #2, #3, #4
2      | Multi-Provider Integration & Custom Processors | 3        | 45 min   | #5, #6, #7

Total: 9 concepts, ~120 minutes, creates production-ready pipecat skill

Skill Output: .claude/skills/pipecat/SKILL.md - a reusable Digital FTE component grounded in official documentation.

Comparison to Chapter 80 (LiveKit):

  • Chapter 80: 4 lessons, 10 concepts, distributed architecture focus
  • Chapter 81: 3 lessons, 9 concepts, modular composition focus
  • Together: Complete voice framework toolkit for Digital FTEs