Chapter 81: Pipecat - Lesson Plan
Generated by: chapter-planner v2.0.0 (Reasoning-Activated)
Source: Part 11 README, Deep Search Report, Chapter 80 context
Created: 2026-01-01
Constitution: v6.0.0 (Reasoning Mode)
I. Chapter Analysis
Chapter Type
TECHNICAL (SKILL-FIRST L00 Pattern) - This chapter uses the Skill-First Learning pattern where students build the pipecat skill FIRST from official documentation, then learn the framework by improving that skill across subsequent lessons.
Recognition signals:
- Learning objectives use "implement/create/build"
- Code examples required for every lesson
- Skill artifact created in Lesson 0
- Subsequent lessons TEST and IMPROVE the skill
- Follows L00 pattern established in Parts 5-7 and Chapter 80
Concept Density Analysis
Core Concepts (from Deep Search + Part 11 README): 9 concepts
- LEARNING-SPEC.md (skill specification before building)
- Frame-based pipeline architecture (core abstraction)
- Frame types: AudioRawFrame, TextFrame, EndFrame, control signals
- Processors: Transformations on frame streams
- Pipelines: Composition of processors
- Transport abstraction (Daily WebRTC, FastAPI WebSocket, local audio)
- Provider plugins (40+ integrations: STT, LLM, TTS providers)
- Speech-to-Speech integration (OpenAI Realtime, Gemini Live, Nova Sonic)
- Custom processor implementation
Complexity Assessment: Standard (framework with clear abstractions, modular design)
Proficiency Tier: B1-B2 (Part 11 requires Parts 6, 7, 9, 10 completed; students have production experience)
Justified Lesson Count: 3 lessons
- Lesson 0: Build Your Pipecat Skill (L00 pattern)
- Lesson 1: Frame-Based Pipeline Architecture (Layer 2: AI Collaboration)
- Lesson 2: Multi-Provider Integration & Custom Processors (Layer 2 + Layer 3)
Reasoning:
- 9 concepts across 3 lessons = an average of 3 concepts per lesson (2 / 4 / 3 split)
- B1-B2 limit is 10 concepts per lesson - well within limit
- Pipecat is conceptually simpler than LiveKit (modular vs distributed)
- Chapter 80 covered similar voice AI fundamentals, reducing cognitive load
- 3 lessons sufficient for skill-first framework mastery
II. Success Evals (from Part 11 README + Chapter 81 Description)
Success Criteria (what students must achieve):
- Skill Creation: Students build a working pipecat skill from official documentation using /fetching-library-docs
- Frame Understanding: Students can explain frames as the data unit flowing through pipelines
- Pipeline Architecture: Students implement voice agents with processor composition
- Transport Flexibility: Students configure different transports (Daily, WebSocket, local)
- Provider Integration: Students connect multiple STT/LLM/TTS providers via plugins
- S2S Integration: Students use OpenAI Realtime, Gemini Live, or Nova Sonic through Pipecat
- Custom Processors: Students implement custom processors for domain-specific transformations
All lessons below map to these evals.
III. Lesson Sequence
Lesson 0: Build Your Pipecat Skill
Title: Build Your Pipecat Skill
Learning Objectives:
- Write a LEARNING-SPEC.md that defines what you want to learn about Pipecat
- Fetch official Pipecat documentation using /fetching-library-docs
- Create a pipecat skill grounded in official documentation (not AI memory)
- Verify the skill works by building a minimal voice agent pipeline
Stage: Layer 1 (Manual Foundation) + Layer 2 (AI Collaboration via /skill-creator)
CEFR Proficiency: B1
New Concepts (count: 2):
- LEARNING-SPEC.md (specification before skill creation)
- Documentation-grounded skill creation
Cognitive Load Validation: 2 concepts <= 10 limit (B1) -> WITHIN LIMIT
Maps to Evals: #1 (Skill Creation)
Key Sections:
- Clone the Skills Lab Fresh (~3 min)
- Why fresh clone: No state assumptions from previous work
- Command: git clone [skills-lab-repo] && cd skills-lab
- Verify clean environment
- Write Your LEARNING-SPEC.md (~7 min)
- What is LEARNING-SPEC.md: Your specification for what you want to learn
- Template structure:
# Learning Specification: Pipecat
## What I Want to Learn
- How Pipecat's frame-based pipeline works
- How to compose processors into voice agents
- How to integrate multiple AI providers
- How to use S2S models through Pipecat
## Why This Matters
- Pipecat has 40+ provider integrations
- Frame-based design enables custom transformations
- Transport-agnostic means flexible deployment
## Success Criteria
- [ ] Skill can scaffold a basic voice pipeline
- [ ] Skill explains frame types and flow
- [ ] Skill guides multi-provider configuration
- [ ] Skill includes custom processor patterns
- Write YOUR specification (not a copy)
- Fetch Official Documentation (~5 min)
- Use /fetching-library-docs to get Pipecat docs
- Why official docs: AI memory is unreliable for API details
- What to look for: Frame architecture, processors, transports, plugins
- Save relevant excerpts for skill creation
- Create Your Skill with /skill-creator (~10 min)
- Invoke /skill-creator with your LEARNING-SPEC.md and fetched docs
- Skill structure: Persona + Questions + Principles
- Review generated skill for accuracy
- Commit to .claude/skills/pipecat/SKILL.md
- Verify Your Skill Works (~5 min)
- Test: "Create a minimal Pipecat voice pipeline"
- Verify generated code matches official patterns
- If issues found: Improve skill and re-test
- Skill is now your knowledge artifact
Duration Estimate: 30 minutes
File Output: .claude/skills/pipecat/SKILL.md
Prerequisites:
- Part 10 completed (chat interfaces)
- Chapter 79 completed (voice AI fundamentals)
- Chapter 80 recommended (LiveKit comparison context)
Try With AI Prompts:
- Draft Your LEARNING-SPEC.md
I'm about to learn Pipecat, a frame-based voice AI framework with
40+ provider integrations. Help me write a LEARNING-SPEC.md:
My context:
- I just learned LiveKit Agents in Chapter 80
- I want to understand Pipecat's different approach (frames vs jobs)
- My goal is flexibility in provider selection for my Digital FTEs
Help me define:
1. What specific aspects of Pipecat should I focus on?
2. How does it differ from LiveKit (what's unique)?
3. What success criteria would prove I've learned it?
Make this MY specification, not a generic template.
What you're learning: Comparative specification - defining learning goals relative to what you already know.
- Analyze the Official Docs
I fetched Pipecat documentation. Here are the key sections:
[paste relevant excerpts from /fetching-library-docs output]
Help me understand:
1. What are frames? How do they flow through pipelines?
2. What patterns appear repeatedly in the examples?
3. What makes Pipecat different from LiveKit's approach?
4. What should my skill definitely include?
I want to build a GROUNDED skill, not one based on assumptions.
What you're learning: Documentation analysis - extracting unique patterns from primary sources.
- Review Your Generated Skill
Here's the skill /skill-creator generated:
[paste SKILL.md content]
Compare this to the official documentation:
1. Does it accurately represent Pipecat's frame architecture?
2. Are there any claims not supported by the docs?
3. How does it compare to my livekit-agents skill?
4. What should be removed as incorrect or speculative?
Help me make this skill ACCURATE and DISTINCT from LiveKit.
What you're learning: Validation - ensuring AI-generated content matches authoritative sources.
Lesson 1: Frame-Based Pipeline Architecture
Title: Frame-Based Pipeline Architecture
Learning Objectives:
- Explain frames as the fundamental data unit in Pipecat pipelines
- Distinguish frame types: AudioRawFrame, TextFrame, EndFrame, control signals
- Implement processors that transform frame streams
- Compose processors into complete voice pipelines
- Configure different transports (Daily WebRTC, WebSocket, local)
Stage: Layer 2 (AI Collaboration) - Use skill to build, improve skill based on learnings
CEFR Proficiency: B1
New Concepts (count: 4):
- Frame-based architecture (core abstraction)
- Frame types and their purposes
- Processors and pipelines
- Transport abstraction
Cognitive Load Validation: 4 concepts <= 10 limit (B1) -> WITHIN LIMIT
Maps to Evals: #2 (Frame Understanding), #3 (Pipeline Architecture), #4 (Transport Flexibility)
Key Sections:
- The Frame Abstraction (~7 min)
- What is a frame: The data unit flowing through pipelines
- Why frames: Uniform interface for diverse data types
- Frame lifecycle: Creation, transformation, consumption
- Comparison to LiveKit: Jobs vs Frames mental model
- Diagram: Frame flow through pipeline
- Frame Types (~8 min)
- AudioRawFrame: Raw audio data (samples, sample rate, channels)
- TextFrame: Transcribed text or LLM responses
- EndFrame: Signals end of conversation/stream
- Control Frames: StartInterruptionFrame, StopInterruptionFrame
- System Frames: Lifecycle management, pipeline control
- Code: Working with different frame types
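To make these types concrete, here is a minimal, illustrative Python sketch. It assumes the frame classes are importable from pipecat.frames.frames and that AudioRawFrame carries audio, sample_rate, and num_channels fields; import paths and field names can shift between Pipecat releases, so verify them against the docs your skill is grounded in.

```python
# Illustrative sketch of Pipecat's core frame types (field names assumed
# from the docs; verify against your installed version).
from pipecat.frames.frames import AudioRawFrame, EndFrame, TextFrame


def describe(frame) -> str:
    """Hypothetical helper: summarize a frame for logging/debugging."""
    if isinstance(frame, AudioRawFrame):
        return f"audio: {len(frame.audio)} bytes @ {frame.sample_rate} Hz, {frame.num_channels} ch"
    if isinstance(frame, TextFrame):
        return f"text: {frame.text!r}"
    if isinstance(frame, EndFrame):
        return "end of stream"
    return f"other: {type(frame).__name__}"


frames = [
    AudioRawFrame(audio=b"\x00" * 3200, sample_rate=16000, num_channels=1),  # ~100 ms of 16-bit silence
    TextFrame(text="Hello, how can I help?"),
    EndFrame(),  # signals the pipeline to shut down cleanly
]
for f in frames:
    print(describe(f))
```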
- Processors: The Building Blocks (~10 min)
- What processors do: Transform input frames to output frames
- Processor interface: async process_frame method
- Examples: STT processor (Audio -> Text), LLM processor (Text -> Text)
- Chaining: Output of one becomes input of next
- Code: Implementing a basic processor
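A minimal processor sketch following Pipecat's documented FrameProcessor pattern: call the base class first, transform the frames you care about, and push everything downstream. Import paths are assumptions to confirm against the official docs.

```python
# Minimal custom processor following Pipecat's FrameProcessor pattern
# (import paths assumed; verify against the official docs).
from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class UppercaseProcessor(FrameProcessor):
    """Uppercases TextFrames; passes every other frame through untouched."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)  # let the base class handle system frames
        if isinstance(frame, TextFrame):
            frame = TextFrame(text=frame.text.upper())  # transform the payload
        await self.push_frame(frame, direction)  # forward to the next processor
```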
- Pipelines: Composing Processors (~8 min)
- Pipeline construction: List of processors in order
- Frame routing: How frames flow through
- Parallel pipelines: Multiple processing paths
- Error handling: What happens when processor fails
- Code: Building a complete voice pipeline
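A sketch of the cascaded composition this section describes. The service class names, constructor arguments, and module paths are assumptions based on Pipecat's plugin naming and may differ in your installed version; production pipelines also typically wrap the LLM stage with a context aggregator, omitted here for brevity.

```python
# Sketch of a cascaded STT -> LLM -> TTS pipeline. Service names and
# constructor arguments are assumptions; a context aggregator around the
# LLM stage is omitted for brevity.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService


async def run_voice_agent(transport):
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o-mini")
    tts = CartesiaTTSService(api_key=os.environ["CARTESIA_API_KEY"], voice_id="<voice-id>")  # placeholder id

    # Order matters: frames flow through the list top to bottom.
    pipeline = Pipeline([
        transport.input(),   # audio in from the outside world
        stt,                 # AudioRawFrame -> TextFrame
        llm,                 # user text -> assistant text
        tts,                 # TextFrame -> AudioRawFrame
        transport.output(),  # audio out to the outside world
    ])
    await PipelineRunner().run(PipelineTask(pipeline))
```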
- Transport Abstraction (~7 min)
- Transport role: Audio I/O to/from the outside world
- Daily WebRTC: Browser-based realtime communication
- FastAPI WebSocket: Custom backend integration
- Local Audio: Microphone/speaker for testing
- Code: Configuring different transports
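A hedged sketch of configuring the Daily WebRTC transport; the constructor shape and DailyParams fields are assumptions to confirm against the transport docs. A WebSocket or local-audio transport slots into the same position in the pipeline without touching any other processor.

```python
# Sketch of a Daily WebRTC transport configuration (constructor shape and
# DailyParams fields assumed; other transports slot into the same position).
import os

from pipecat.transports.services.daily import DailyParams, DailyTransport


def make_daily_transport() -> DailyTransport:
    return DailyTransport(
        os.environ["DAILY_ROOM_URL"],   # room the browser client joins
        os.environ.get("DAILY_TOKEN"),  # optional meeting token
        "pipecat-bot",                  # bot display name
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )
```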
- Improve Your Skill (~5 min)
- Reflect: What frame patterns did you learn?
- Update .claude/skills/pipecat/SKILL.md
- Add: Frame type guidance, processor patterns
- Test: Does improved skill generate better code?
Duration Estimate: 45 minutes
Three Roles Integration (Layer 2):
AI as Teacher:
- Skill explains frame lifecycle patterns you didn't know
- "Control frames propagate immediately, bypassing queued frames"
AI as Student:
- You refine skill's transport explanation based on your deployment needs
- "Add WebSocket transport pattern for my Next.js frontend"
AI as Co-Worker:
- Iterate on pipeline composition together
- First attempt misses error handling -> AI suggests try/except pattern -> you validate
Try With AI Prompts:
- Understand the Frame Abstraction
I'm learning Pipecat's frame-based architecture. Coming from LiveKit's
job-based model, help me understand:
1. What's a frame? How is it different from a LiveKit job?
2. What frame types exist and when do I use each?
3. How do frames flow through a pipeline?
4. What happens when a frame reaches the end?
Use diagrams or pseudocode to clarify the flow.
What you're learning: Mental model translation - mapping new concepts to familiar ones.
- Build a Processor Chain
Help me build a complete voice pipeline using my pipecat skill:
Requirements:
- Transport: Daily WebRTC (for browser testing)
- STT: Deepgram Nova-3
- LLM: GPT-4o-mini
- TTS: Cartesia Sonic
Walk me through each processor and how frames flow between them.
After we build it, I'll test and report what works.
What you're learning: Processor composition - building systems from modular components.
- Compare Transport Options
I need to choose the right transport for my use case:
Scenario A: Browser-based voice agent for customer support
Scenario B: CLI tool for voice interaction during development
Scenario C: WebSocket integration with existing FastAPI backend
Use my pipecat skill to recommend transports for each and explain
the tradeoffs. I'll implement one and report back.
What you're learning: Transport selection - matching infrastructure to requirements.
Lesson 2: Multi-Provider Integration & Custom Processors
Title: Multi-Provider Integration & Custom Processors
Learning Objectives:
- Configure multiple STT/LLM/TTS providers via Pipecat's plugin system
- Integrate speech-to-speech models (OpenAI Realtime, Gemini Live, Nova Sonic)
- Implement custom processors for domain-specific transformations
- Finalize the pipecat skill for production use
Stage: Layer 2 (AI Collaboration) + Layer 3 (Intelligence Design)
CEFR Proficiency: B1-B2
New Concepts (count: 3):
- Provider plugins (40+ integrations)
- S2S model integration
- Custom processor implementation
Cognitive Load Validation: 3 concepts <= 10 limit (B1-B2) -> WITHIN LIMIT
Maps to Evals: #5 (Provider Integration), #6 (S2S Integration), #7 (Custom Processors)
Key Sections:
- The Plugin Ecosystem (~8 min)
- Pipecat's 40+ provider integrations
- Plugin categories: STT, LLM, TTS, Transport, Vision
- Installation: pip install pipecat-ai[provider]
- Provider comparison: Latency, cost, quality tradeoffs
- Table: Key providers and their strengths
- Swapping Providers (~10 min)
- The modular advantage: Change one processor, keep the pipeline
- STT providers: Deepgram, Whisper, AssemblyAI, Gladia
- LLM providers: OpenAI, Anthropic, Google, Together, local
- TTS providers: Cartesia, ElevenLabs, Azure, Deepgram Aura
- Code: Switching from Deepgram to Whisper with one line
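A sketch of the "one line" swap, assuming Deepgram and Whisper STT services live under pipecat.services.deepgram and pipecat.services.whisper (each installed via the matching pipecat-ai extra); module paths may differ by version.

```python
# The "one line" swap: only the STT service changes, the pipeline stays
# identical. Module paths assumed; each provider is a pipecat-ai extra.
import os

from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.whisper import WhisperSTTService

# Before: hosted Deepgram STT
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

# After: local Whisper STT -- everything downstream is untouched
stt = WhisperSTTService()

processors = [
    # transport.input(),
    stt,   # the only element that differs between the two configurations
    # llm, tts, transport.output(), ...
]
```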
- Speech-to-Speech Integration (~12 min)
- Why S2S: Native voice understanding + generation
- OpenAI Realtime: via the OpenAI Realtime Beta service plugin (OpenAIRealtimeBetaLLMService)
- Gemini Live: via GeminiMultimodalLiveLLMService
- AWS Nova Sonic: via the Nova Sonic plugin
- When to use S2S vs cascaded pipeline
- Code: Configuring OpenAI Realtime through Pipecat
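A hedged sketch of the S2S configuration: one speech-to-speech service replaces the STT/LLM/TTS trio. The class name OpenAIRealtimeBetaLLMService and its import path are assumptions drawn from Pipecat's plugin naming, not verified here; confirm against the speech-to-speech pages in the official docs before relying on them.

```python
# Hedged sketch: one speech-to-speech service replaces the STT/LLM/TTS trio.
# OpenAIRealtimeBetaLLMService and its import path are assumptions; confirm
# against the official S2S docs.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai_realtime_beta import OpenAIRealtimeBetaLLMService


def build_s2s_pipeline(transport) -> Pipeline:
    s2s = OpenAIRealtimeBetaLLMService(api_key=os.environ["OPENAI_API_KEY"])
    return Pipeline([
        transport.input(),
        s2s,                  # audio in -> audio out, no separate STT/TTS stages
        transport.output(),
    ])
```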
- Custom Processors (~10 min)
- When to customize: Domain-specific transformations
- Processor base class: FrameProcessor
- Example: Sentiment analysis processor (Text -> Emotion + Text)
- Example: Translation processor (Text -> Translated Text)
- Example: Content filter (blocks inappropriate content)
- Code: Implementing a custom processor
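A sketch of the content-filter example above, built on the same FrameProcessor pattern as Lesson 1. The keyword list and disclaimer text are illustrative placeholders for whatever classifier and policy your domain actually requires.

```python
# Sketch of a sensitive-topic filter. The keyword list and disclaimer text
# are illustrative placeholders for your own classifier/policy.
from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

SENSITIVE_TERMS = ("diagnosis", "prescription", "lawsuit")  # hypothetical list


class SensitiveTopicFilter(FrameProcessor):
    """Prepends a disclaimer instruction to sensitive user turns before the LLM."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame) and any(t in frame.text.lower() for t in SENSITIVE_TERMS):
            note = "[Respond with a 'this is not professional advice' disclaimer] "
            frame = TextFrame(text=note + frame.text)
        await self.push_frame(frame, direction)  # non-sensitive frames pass through unchanged
```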
- Finalize Your Skill (~5 min)
- Complete skill review: Does it cover all learnings?
- Add: Provider selection guidance
- Add: Custom processor patterns
- Final test: Use skill to scaffold a multi-provider voice system
- Commit: Production-ready skill artifact
Duration Estimate: 45 minutes
Three Roles Integration (Layer 2 + Layer 3):
AI as Teacher:
- Skill guides provider selection for your use case
- "For realtime transcription, Deepgram Nova-3 has 90ms latency vs Whisper's 300ms"
AI as Student:
- You teach skill your domain constraints
- "I need HIPAA compliance - add provider filtering"
AI as Co-Worker:
- Design custom processor together
- You specify transformation logic -> AI generates code -> you validate behavior
Skill Finalization:
At lesson end, students have a production-ready pipecat skill that:
- Scaffolds voice pipelines with frame-based architecture
- Guides provider selection across 40+ integrations
- Supports S2S model configuration
- Includes custom processor patterns
Try With AI Prompts:
- Choose the Right Providers
I need to build a voice agent with these constraints:
- Latency: Under 500ms total response time
- Cost: Under $0.05 per minute
- Quality: Natural-sounding voice, accurate transcription
- Region: Must support EU data residency
Use my pipecat skill to recommend:
1. Which STT provider?
2. Which LLM provider?
3. Which TTS provider?
Explain the tradeoffs and alternatives for each.
What you're learning: Provider selection - balancing latency, cost, quality, compliance.
- Integrate Speech-to-Speech
I want to try OpenAI's Realtime API through Pipecat instead of
building my own pipeline. Help me:
1. Configure Pipecat's OpenAI Realtime service (OpenAIRealtimeBetaLLMService) in my pipeline
2. Understand what I lose vs the cascaded approach
3. Understand what I gain (latency, naturalness)
4. Set up function calling through the S2S model
Use my pipecat skill. I'll test and report latency numbers.
What you're learning: S2S integration - using native voice models through framework abstraction.
- Build a Custom Processor
I need a custom processor that:
1. Receives TextFrame from STT
2. Detects if user is asking about sensitive topics (medical, legal)
3. If sensitive: Adds disclaimer frame before LLM response
4. If not sensitive: Passes through unchanged
Help me implement this using my pipecat skill. Walk through:
- Processor class structure
- Frame handling logic
- Testing approach
What you're learning: Custom processor implementation - extending Pipecat for domain needs.
IV. Skill Dependency Graph
Skill Dependencies:
Lesson 0: Build Skill (foundation)
|
Lesson 1: Frame Architecture (requires skill)
|
Lesson 2: Multi-Provider + Custom Processors (requires frame understanding)
Cross-Chapter Dependencies:
- Requires: Chapter 79 (Voice AI Fundamentals) - architecture mental models
- Requires: Chapter 80 (LiveKit Agents) - comparison context, voice pipeline understanding
- Prepares for: Chapter 82 (OpenAI Realtime API) - direct API access after framework abstraction
- Prepares for: Chapter 85 (Capstone) - production voice agent
V. Assessment Plan
Formative Assessments (During Lessons)
- Lesson 0: Skill generation verification (skill works, matches docs)
- Lesson 1: Pipeline code review (correct frame flow)
- Lesson 2: Provider swap demonstration (change provider without breaking pipeline)
Summative Assessment (End of Chapter)
Chapter 81 Quiz:
- Frame Architecture: Explain how frames flow through processors
- Frame Types: When to use AudioRawFrame vs TextFrame vs EndFrame
- Transports: Compare Daily vs WebSocket vs Local transports
- Providers: How to swap STT provider without changing pipeline
- Custom Processors: When and how to implement custom transformations
Practical Assessment:
- Build a voice pipeline that uses two different provider combinations
- Implement a custom processor for your domain
- Demonstrate transport flexibility (run same pipeline on different transports)
VI. Validation Checklist
Chapter-Level Validation:
- Chapter type identified: TECHNICAL (SKILL-FIRST L00 Pattern)
- Concept density analysis documented: 9 concepts across 3 lessons
- Lesson count justified: 3 lessons (~3 concepts each, within B1-B2 limit)
- All evals covered by lessons
- All lessons map to at least one eval
Stage Progression Validation:
- Lesson 0: Layer 1 + Layer 2 (skill creation with AI collaboration)
- Lesson 1: Layer 2 (AI collaboration, skill improvement)
- Lesson 2: Layer 2 + Layer 3 (provider integration, custom processors)
- No premature spec-driven content (that's Chapter 85 Capstone)
Cognitive Load Validation:
- Lesson 0: 2 concepts <= 10 (B1 limit) PASS
- Lesson 1: 4 concepts <= 10 (B1 limit) PASS
- Lesson 2: 3 concepts <= 10 (B1-B2 limit) PASS
L00 Pattern Requirements:
- Lesson 0 creates skill from official documentation
- Fresh clone of skills-lab (no state assumptions)
- LEARNING-SPEC.md written before skill creation
- /fetching-library-docs used for documentation
- Skill tested and verified before proceeding
- Each subsequent lesson TESTS and IMPROVES the skill
- "Improve Your Skill" section in each lesson
Three Roles Validation (Layer 2 lessons):
- Each Layer 2 lesson demonstrates AI as Teacher
- Each Layer 2 lesson demonstrates AI as Student
- Each Layer 2 lesson demonstrates AI as Co-Worker (convergence)
Canonical Source Validation:
- Skills format follows the .claude/skills/<name>/SKILL.md pattern
- Lesson 0 references /fetching-library-docs for official docs
- Provider patterns align with Pipecat plugin system
VII. File Structure
63-pipecat/
├── _category_.json # Existing
├── README.md # Chapter overview (create)
├── 00-build-pipecat-skill.md # Lesson 0: L00 pattern (create)
├── 01-frame-pipeline-architecture.md # Lesson 1 (create)
├── 02-multi-provider-integration.md # Lesson 2 (create)
└── 03-chapter-quiz.md # Assessment (create)
VIII. Summary
Chapter 81: Pipecat is a 3-lesson SKILL-FIRST technical chapter:
| Lesson | Title | Concepts | Duration | Evals |
|---|---|---|---|---|
| 0 | Build Your Pipecat Skill | 2 | 30 min | #1 |
| 1 | Frame-Based Pipeline Architecture | 4 | 45 min | #2, #3, #4 |
| 2 | Multi-Provider Integration & Custom Processors | 3 | 45 min | #5, #6, #7 |
Total: 9 concepts, ~120 minutes, creates production-ready pipecat skill
Skill Output: .claude/skills/pipecat/SKILL.md - a reusable Digital FTE component grounded in official documentation.
Comparison to Chapter 80 (LiveKit):
- Chapter 80: 4 lessons, 10 concepts, distributed architecture focus
- Chapter 81: 3 lessons, 9 concepts, modular composition focus
- Together: Complete voice framework toolkit for Digital FTEs