Updated Feb 23, 2026

Chapter 84: Phone & Browser Integration - Lesson Plan

Generated by: chapter-planner v2.0.0 (Reasoning-Activated)
Source: Part 11 README, Deep Search Report, Chapters 79-83 context
Created: 2026-01-02
Constitution: v6.0.0 (Reasoning Mode)

I. Chapter Analysis

Chapter Type

TECHNICAL (SKILL-FIRST L00 Pattern) - This chapter uses the Skill-First Learning pattern where students build TWO skills (voice-telephony and web-audio-capture) FIRST from official documentation, then learn integration patterns by improving those skills across subsequent lessons.

Recognition signals:

Learning objectives use "implement/integrate/configure/deploy"
Code examples required for every lesson
TWO skill artifacts created in Lesson 0
Subsequent lessons TEST and IMPROVE both skills
Follows L00 pattern established in Parts 5-7
Bridges frameworks (Chapters 80-83) to real communication channels

Concept Density Analysis

Core Concepts (from Deep Search + Part 11 README): 9 concepts

SIP protocol fundamentals and LiveKit native SIP
Twilio Voice integration (inbound/outbound calls)
Telnyx as cost-effective telephony alternative
Call queuing, transfer, and hold patterns
Web Audio API and getUserMedia for microphone access
AudioWorklet for low-latency browser audio processing
Silero VAD in browser via WebAssembly
WebRTC vs WebSocket for browser-server audio transport
Production telephony patterns (IVR, recording, failover)

Complexity Assessment: Standard (integration patterns building on framework knowledge)

Proficiency Tier: B1-B2 (Part 11 requires Parts 6, 7, 9, 10 + Chapters 79-83 completed; students have voice AI framework experience)

Justified Lesson Count: 4 lessons

Lesson 0: Build Your Integration Skills (L00 pattern - creates BOTH skills)
Lesson 1: Phone Integration with SIP & Twilio
Lesson 2: Browser Audio Capture & VAD
Lesson 3: Production Telephony Patterns

Reasoning:

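9 core concepts across 4 lessons = ~2.25 concepts per lesson (Section III also counts Lesson 0's 2 skill-creation concepts, giving 11 new concepts, ~2.75 per lesson)
B1-B2 limit is 10 concepts per lesson - either count is well within the limit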
L00 pattern requires skill-first approach
Two related but distinct skill domains (telephony vs browser audio)
Chapter bridges to Chapter 85 Capstone where both skills compose

II. Success Evals (from Part 11 README + Chapter 84 Description)

Success Criteria (what students must achieve):

Skill Creation: Students build working voice-telephony and web-audio-capture skills from official documentation
SIP Understanding: Students can explain SIP protocol fundamentals and when to use LiveKit native SIP vs Twilio
Twilio Integration: Students implement inbound AND outbound voice calls via Twilio Voice
Cost Optimization: Students configure Telnyx as cost-effective alternative (understand ~$0.002/min vs Twilio pricing)
Browser Audio Mastery: Students capture microphone audio using Web Audio API with low-latency AudioWorklet processing
VAD Implementation: Students run Silero VAD in browser via WebAssembly for client-side voice activity detection
Production Patterns: Students implement IVR patterns, call recording, and failover strategies

All lessons below map to these evals.

III. Lesson Sequence

Lesson 0: Build Your Integration Skills

Title: Build Your Integration Skills

Learning Objectives:

Write a LEARNING-SPEC.md that defines what you want to learn about telephony and browser audio
Fetch official documentation for Twilio Voice, Telnyx, Web Audio API, and Silero VAD
Create TWO skills grounded in official documentation:
- voice-telephony skill for phone integration
- web-audio-capture skill for browser audio
Verify both skills work with minimal integration tests

Stage: Layer 1 (Manual Foundation) + Layer 2 (AI Collaboration via /skill-creator)

CEFR Proficiency: B1

New Concepts (count: 2):

Dual-skill LEARNING-SPEC.md (specification for two related domains)
Documentation-grounded integration skill creation

Cognitive Load Validation: 2 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #1 (Skill Creation)

Key Sections:

Clone the Skills Lab Fresh (~3 min)
- Why fresh clone: No state assumptions from previous chapters
- Command: git clone [skills-lab-repo] && cd skills-lab
- Verify clean environment

Write Your LEARNING-SPEC.md for BOTH Skills (~10 min)

What is LEARNING-SPEC.md: Your specification for two related skill domains

Template structure:

# Learning Specification: Voice Integration Skills

## Skill 1: voice-telephony

### What I Want to Learn
- How SIP protocol works for voice calls
- How to integrate Twilio Voice for inbound/outbound calls
- How Telnyx provides cost-effective alternative (~$0.002/min)
- Call flow patterns (queuing, transfer, hold)

### Why This Matters
- Phone integration makes Digital FTEs accessible via PSTN
- Need to connect voice agents to real phone numbers

### Success Criteria
- [ ] Skill can guide Twilio Voice setup (inbound + outbound)
- [ ] Skill explains SIP vs HTTP webhook tradeoffs
- [ ] Skill includes cost comparison (Twilio vs Telnyx)

## Skill 2: web-audio-capture

### What I Want to Learn
- How Web Audio API captures microphone input
- How AudioWorklet provides low-latency processing
- How to run Silero VAD in browser via WebAssembly
- WebRTC vs WebSocket for audio transport

### Why This Matters
- Browser-based voice enables web apps without phone network
- Client-side VAD reduces server round-trips

### Success Criteria
- [ ] Skill can scaffold getUserMedia + AudioWorklet setup
- [ ] Skill explains Silero VAD WASM integration
- [ ] Skill guides WebRTC vs WebSocket decision

Write YOUR specification (not a copy)

Fetch Official Documentation (~8 min)
- For voice-telephony:
  - Twilio Voice API docs
  - Telnyx Voice docs
  - LiveKit SIP integration docs
- For web-audio-capture:
  - MDN Web Audio API
  - AudioWorklet specification
  - Silero VAD GitHub/docs
- Why official docs: Telephony APIs change frequently, and an AI model's training data may be out of date
- Save relevant excerpts for skill creation
Create Your Skills with /skill-creator (~12 min)
- Invoke /skill-creator for voice-telephony skill
  - Persona: Telephony integration specialist
  - Questions: SIP vs webhooks, provider selection, call flow
  - Principles: Cost optimization, reliability, compliance
- Invoke /skill-creator for web-audio-capture skill
  - Persona: Browser audio engineering specialist
  - Questions: Latency requirements, VAD strategy, transport selection
  - Principles: Low latency, browser compatibility, efficient processing
- Commit to:
  - .claude/skills/voice-telephony/SKILL.md
  - .claude/skills/web-audio-capture/SKILL.md
Verify Both Skills Work (~7 min)
- Test voice-telephony: "How do I set up inbound Twilio Voice calls?"
- Test web-audio-capture: "How do I capture microphone audio with AudioWorklet?"
- Verify generated guidance matches official documentation
- If issues found: Improve skills and re-test
- Skills are now your knowledge artifacts

Duration Estimate: 40 minutes

File Outputs:

.claude/skills/voice-telephony/SKILL.md
.claude/skills/web-audio-capture/SKILL.md

Prerequisites:

Part 10 completed (chat interfaces)
Chapters 79-83 completed (voice frameworks and direct APIs)
LiveKit Agents and Pipecat skills from Chapters 80-81

Try With AI Prompts:

Draft Your Dual-Skill LEARNING-SPEC.md

I need to build TWO related skills for voice integration:
1. voice-telephony - connecting to phone networks
2. web-audio-capture - browser-based audio

Help me write a LEARNING-SPEC.md that covers BOTH skills:

My context:
- I've mastered LiveKit Agents (Chapter 80) and Pipecat (Chapter 81)
- I understand direct APIs (Chapters 82-83)
- My goal is connecting my Task Manager to real phones AND browsers

Help me define:
1. What aspects of each domain should I focus on?
2. How do these skills complement each other?
3. What success criteria prove I've learned both?
4. What's explicitly out of scope (skill boundaries)?

Make this MY specification, covering both domains coherently.

What you're learning: Dual-domain specification - defining related but distinct skill boundaries.

Analyze Telephony Documentation

I fetched documentation for Twilio Voice, Telnyx, and LiveKit SIP.
Here are the key sections:
[paste relevant excerpts from /fetching-library-docs output]

Help me understand:
1. When should I use LiveKit's native SIP vs Twilio webhooks?
2. What's Telnyx's pricing advantage (~$0.002/min)?
3. What patterns appear in call flow handling?
4. What compliance considerations exist (recording, GDPR)?

I want a GROUNDED voice-telephony skill, not assumptions.

What you're learning: Provider comparison - understanding when to use which telephony approach.

Review Your Generated Skills

Here are the skills /skill-creator generated:

voice-telephony:
[paste SKILL.md content]

web-audio-capture:
[paste SKILL.md content]

Compare each to official documentation:
1. Does voice-telephony accurately represent Twilio/Telnyx integration?
2. Does web-audio-capture correctly describe AudioWorklet patterns?
3. Are there claims not supported by docs?
4. What's missing that should be included?

Help me make BOTH skills ACCURATE, not just comprehensive.

What you're learning: Multi-skill validation - ensuring related skills are consistent and accurate.

Lesson 1: Phone Integration with SIP & Twilio

Title: Phone Integration with SIP & Twilio

Learning Objectives:

Explain SIP protocol fundamentals and how LiveKit provides native SIP support
Implement inbound voice calls via Twilio Voice (receive calls to your voice agent)
Implement outbound voice calls via Twilio Voice (agent calls users)
Configure Telnyx as cost-effective alternative with ~$0.002/min pricing
Account for 8kHz PSTN audio quality, which reduces the advantages of native speech-to-speech (S2S) models

Stage: Layer 2 (AI Collaboration) - Use skill to build, improve skill based on learnings

CEFR Proficiency: B1

New Concepts (count: 3):

SIP protocol and LiveKit native SIP support
Twilio Voice integration (inbound + outbound)
Telnyx as cost-effective telephony alternative

Cognitive Load Validation: 3 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #2 (SIP Understanding), #3 (Twilio Integration), #4 (Cost Optimization)

Key Sections:

SIP Protocol Fundamentals (~8 min)
- What is SIP: Session Initiation Protocol for voice/video calls
- SIP vs HTTP webhooks: When to use each approach
- PSTN: Public Switched Telephone Network (traditional phone system)
- LiveKit native SIP: Direct SIP trunk integration without middleware
- Why this matters: SIP = lower latency, HTTP webhooks = simpler setup
- Diagram: SIP call flow from PSTN to voice agent
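To make "SIP vs HTTP webhooks" concrete, here is a minimal sketch of what SIP signaling actually looks like: the request/response ladder of a successful call and a bare INVITE built from the RFC 3261 message format. It uses only the Python standard library and sends nothing; every URI, host, and tag is a hypothetical placeholder. A real INVITE would also carry an SDP body offering codecs such as PCMU (G.711 mu-law at 8kHz).

```python
"""Sketch: SIP call ladder and a minimal INVITE (RFC 3261 message format).

All URIs, hosts, and tags are hypothetical placeholders; nothing is sent.
"""
import uuid

# Successful call setup and teardown. Signaling travels over SIP;
# the audio itself flows separately over RTP.
CALL_LADDER = [
    ("caller -> callee", "INVITE"),
    ("callee -> caller", "100 Trying"),
    ("callee -> caller", "180 Ringing"),
    ("callee -> caller", "200 OK"),
    ("caller -> callee", "ACK"),
    ("both directions", "RTP media (the actual audio)"),
    ("either party", "BYE"),
    ("other party", "200 OK"),
]


def build_invite(from_uri: str, to_uri: str, local_host: str, cseq: int = 1) -> str:
    """Return a minimal SIP INVITE with no SDP body (Content-Length: 0)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:10]  # RFC 3261 "magic cookie" prefix
    tag = uuid.uuid4().hex[:8]                  # identifies the caller's dialog side
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    lines = [
        f"INVITE {to_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{from_uri}>",
        "Content-Length: 0",
    ]
    # Like HTTP, SIP uses CRLF line endings and a blank line before the body.
    return "\r\n".join(lines) + "\r\n\r\n"
```

The point for the lesson: with native SIP, the trunk provider and LiveKit exchange messages like these directly, while the webhook approach adds an HTTP request to your server before the call is bridged.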
LiveKit Native SIP Support (~7 min)
- How LiveKit handles SIP: Direct trunk connection
- SIP trunk providers: What they provide (phone numbers, routing)
- Configuration: Connecting SIP trunk to LiveKit
- When to use: High-volume, low-latency requirements
- Code: LiveKit SIP trunk configuration
Twilio Voice Integration (~12 min)
- Inbound calls:
  - Purchase Twilio phone number
  - Configure webhook URL for incoming calls
  - TwiML response to connect to voice agent
  - Code: Flask/FastAPI webhook handler
- Outbound calls:
  - Initiate call via Twilio REST API
  - Connect outbound call to voice agent session
  - Code: Programmatic outbound calling
- Integration patterns: How Twilio connects to LiveKit/Pipecat
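The inbound and outbound flows above can be sketched with the standard library alone. This is a sketch, not the course's reference implementation: `inbound_twiml` returns the TwiML a FastAPI/Flask webhook would send back (`<Dial><Sip>` to bridge into a SIP endpoint such as a LiveKit trunk, or `<Connect><Stream>` to send call audio to a WebSocket, as Pipecat-style transports expect), and `outbound_call_request` builds, without sending, the POST to Twilio's Calls resource. The SIP URI, WebSocket URL, phone numbers, and credentials are hypothetical placeholders.

```python
"""Sketch: Twilio inbound TwiML and an outbound-call request (stdlib only).

Hypothetical placeholders: SIP URI, WebSocket URL, numbers, credentials.
In a web framework, return inbound_twiml(...) from the webhook route
with the media type "application/xml".
"""
import base64
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional


def inbound_twiml(sip_uri: Optional[str] = None, stream_url: Optional[str] = None) -> str:
    """TwiML that hands an incoming call to a voice agent."""
    response = ET.Element("Response")
    if sip_uri:
        # <Dial><Sip>: bridge the caller to a SIP endpoint (e.g. a LiveKit trunk).
        dial = ET.SubElement(response, "Dial")
        ET.SubElement(dial, "Sip").text = sip_uri
    elif stream_url:
        # <Connect><Stream>: send the call's audio to a WebSocket server.
        connect = ET.SubElement(response, "Connect")
        ET.SubElement(connect, "Stream", url=stream_url)
    else:
        # No agent configured: avoid dead air, apologize, and hang up.
        ET.SubElement(response, "Say").text = "Sorry, no agent is available right now."
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


def outbound_call_request(account_sid: str, auth_token: str, to: str,
                          from_: str, twiml: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Twilio's Calls resource."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    body = urllib.parse.urlencode({"To": to, "From": from_, "Twiml": twiml}).encode()
    # Twilio's REST API uses HTTP Basic auth with AccountSid:AuthToken.
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending is one `urllib.request.urlopen(req)` call; in practice the official `twilio` Python package wraps the same request as `Client(sid, token).calls.create(to=..., from_=..., twiml=...)`.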
Telnyx Cost Optimization (~8 min)
- Pricing comparison (approximate US rates; check current pricing pages):
  - Twilio: ~$0.0085/min + ~$1/month per number
  - Telnyx: ~$0.002/min + similar number costs
- When to use Telnyx: High volume, cost-sensitive applications
- API comparison: Similar patterns, different endpoints
- Code: Telnyx integration (parallel to Twilio)
- Trade-offs: Feature parity, support, reliability
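The cost comparison reduces to simple arithmetic. The sketch below uses the approximate per-minute rates quoted in this plan; the per-number monthly fee is a hypothetical placeholder, since the plan only says number costs are "similar".

```python
"""Sketch: monthly telephony cost at a given call volume.

Per-minute rates are the approximate figures quoted in this plan;
the per-number fee is a hypothetical placeholder. Verify current pricing.
"""
PER_MINUTE_USD = {"twilio": 0.0085, "telnyx": 0.002}
PER_NUMBER_MONTHLY_USD = {"twilio": 1.00, "telnyx": 1.00}  # hypothetical


def monthly_cost(provider: str, minutes: int, numbers: int = 1) -> float:
    """Usage charges plus phone-number rental, rounded to cents."""
    usage = minutes * PER_MINUTE_USD[provider]
    rental = numbers * PER_NUMBER_MONTHLY_USD[provider]
    return round(usage + rental, 2)


def monthly_savings(minutes: int, numbers: int = 1) -> float:
    """How much cheaper Telnyx is than Twilio at this volume."""
    return round(monthly_cost("twilio", minutes, numbers)
                 - monthly_cost("telnyx", minutes, numbers), 2)
```

At 5,000 minutes/month with one number, this gives $43.50 for Twilio versus $11.00 for Telnyx; at 1,000 minutes, $9.50 versus $3.00. The gap grows linearly with volume, which is why Telnyx matters most for high-volume workloads.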
8kHz PSTN Audio Considerations (~5 min)
- PSTN audio quality: 8kHz sample rate (vs 16kHz+ for web)
- Impact on models: Native S2S advantages reduced at 8kHz
- Recommendation: Cascaded pipeline often better for phone
- Code: Configuring voice agent for PSTN audio quality
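Phone audio usually arrives as 8kHz G.711 mu-law (Twilio Media Streams, for example, send base64-encoded 8kHz mu-law payloads), while many speech pipelines expect 16-bit PCM at 16kHz. The sketch below assumes that 16kHz target, decodes mu-law with the standard G.711 expansion, and upsamples 2x by linear interpolation. Upsampling only changes the sample grid; it cannot restore content above ~4kHz that the phone channel already discarded, which is why 8kHz audio narrows the gap between native S2S and cascaded pipelines.

```python
"""Sketch: decode 8 kHz G.711 mu-law phone audio and upsample to 16 kHz.

Assumption: the agent pipeline expects 16-bit PCM at 16 kHz.
"""
import base64
from typing import List


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit-range sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    # 0x84 (132) is the mu-law bias added at encode time; remove it here.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: List[int]) -> List[int]:
    """8 kHz -> 16 kHz by inserting the midpoint between neighbors."""
    out: List[int] = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s  # repeat last sample
        out.extend([s, (s + nxt) // 2])
    return out


def decode_payload(b64_payload: str) -> List[int]:
    """Base64 mu-law payload -> 16 kHz PCM samples."""
    pcm_8k = [ulaw_to_pcm16(b) for b in base64.b64decode(b64_payload)]
    return upsample_2x(pcm_8k)
```

Linear interpolation is the simplest choice; a production resampler would use a proper interpolation filter, but the information limit of the 8kHz source stays the same either way.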
Improve Your Skill (~5 min)
- Reflect: What telephony patterns did you learn?
- Update .claude/skills/voice-telephony/SKILL.md
- Add: Provider selection decision tree
- Test: Does improved skill guide Twilio setup correctly?

Duration Estimate: 45 minutes

Three Roles Integration (Layer 2):

AI as Teacher:

Skill explains SIP vs webhook tradeoffs you didn't know
"For high-volume call centers, native SIP removes the webhook round-trip from call setup - measure the latency difference in your own tests"

AI as Student:

You refine skill's cost comparison based on your research
"Telnyx pricing section needs specific per-minute rates for comparison"

AI as Co-Worker:

Iterate on Twilio integration until calls connect correctly
First attempt misses TwiML response -> AI suggests fix -> you validate with test call

Try With AI Prompts:

Understand SIP vs Webhooks

I'm learning phone integration for voice agents. Use my voice-telephony
skill to help me understand:

1. When should I use LiveKit's native SIP vs Twilio webhooks?
2. What's the latency difference between approaches?
3. What infrastructure do I need for each?
4. Which approach works better with my existing LiveKit agent?

Use diagrams or flow charts to clarify the decision.

What you're learning: Architecture selection - choosing the right telephony approach for your context.

Implement Twilio Inbound Calls

Help me implement inbound Twilio Voice calls that connect to my
LiveKit voice agent:

Requirements:
- Twilio phone number: [your number]
- Voice agent: Running on LiveKit (from Chapter 80)
- Webhook: FastAPI endpoint for incoming calls
- Response: Connect caller to voice agent room

Walk me through the code step by step. After we build it, I'll make
a test call and report what works.

What you're learning: Inbound telephony - receiving real phone calls to your voice agent.

Compare Provider Costs

I need to choose between Twilio and Telnyx for my Task Manager's
phone integration. My constraints:

- Expected volume: 1,000-5,000 minutes/month
- Budget sensitivity: Cost matters for this use case
- Feature needs: Inbound + outbound, basic IVR
- Reliability: Must be available 99.9% of the time

Use my voice-telephony skill to:
1. Calculate monthly costs for both providers
2. Compare feature parity
3. Recommend which to use and why

I'll validate your cost estimates against current pricing pages.

What you're learning: Cost optimization - making informed provider decisions based on usage patterns.

Lesson 2: Browser Audio Capture & VAD

Title: Browser Audio Capture & VAD

Learning Objectives:

Implement microphone capture using Web Audio API and getUserMedia
Configure AudioWorklet for low-latency browser audio processing
Run Silero VAD in browser via WebAssembly (<1ms per 30ms audio chunk)
Compare WebRTC vs WebSocket for browser-server audio transport
Connect browser audio to voice agents (LiveKit or Pipecat)

Stage: Layer 2 (AI Collaboration) - Advanced features, skill improvement

CEFR Proficiency: B1-B2

New Concepts (count: 3):

Web Audio API + getUserMedia for microphone capture
AudioWorklet for low-latency processing
Silero VAD in browser via WebAssembly

Cognitive Load Validation: 3 concepts <= 10 limit (B1-B2) -> WITHIN LIMIT

Maps to Evals: #5 (Browser Audio Mastery), #6 (VAD Implementation)

Key Sections:

Web Audio API Fundamentals (~7 min)
- Browser audio context: The audio processing graph
- Security requirement: a secure context (HTTPS, or localhost during development) is mandatory for microphone access
- getUserMedia: Requesting microphone permissions
- Audio graph: Source -> Processing -> Destination
- Code: Basic microphone capture
AudioWorklet for Low Latency (~10 min)
- Why AudioWorklet: Replaces deprecated ScriptProcessorNode
- How it works: Audio processing in separate thread
- Latency: one 128-sample render quantum (~2.67ms at 48kHz) vs. 256-16384-sample buffers for the legacy ScriptProcessorNode
- Implementation:
  - AudioWorkletProcessor class (runs in worklet thread)
  - Registration and instantiation
  - Message passing between main thread and worklet
- Code: Complete AudioWorklet pipeline
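A real AudioWorkletProcessor is written in JavaScript, but the per-quantum work is easy to see in a language-neutral sketch. Each `process()` call receives one 128-frame render quantum; assuming the AudioContext runs at 48kHz and the agent wants 16kHz PCM, the processor downsamples by 3, converts floats to 16-bit integers, and would post the result to the main thread via `this.port.postMessage`. The sketch below is Python for clarity; because 128 is not divisible by 3, it carries leftover samples into the next quantum instead of dropping them.

```python
"""Sketch: per-render-quantum work an AudioWorkletProcessor might do.

Python for clarity; a real worklet does this in JavaScript inside process().
Assumption: 48 kHz AudioContext, 16 kHz 16-bit PCM wanted by the agent.
"""
from typing import List, Sequence

RENDER_QUANTUM = 128  # frames per process() call in the Web Audio API


def quantum_latency_ms(sample_rate: int) -> float:
    """Duration of one render quantum, i.e. the worklet's buffering delay."""
    return RENDER_QUANTUM / sample_rate * 1000


class Downsampler:
    """Integer-ratio downsampler that keeps leftovers between quanta."""

    def __init__(self, in_rate: int = 48000, out_rate: int = 16000) -> None:
        if in_rate % out_rate:
            raise ValueError("this sketch only handles integer rate ratios")
        self.factor = in_rate // out_rate
        self._pending: List[float] = []

    def push(self, quantum: Sequence[float]) -> List[int]:
        samples = self._pending + list(quantum)
        usable = len(samples) - len(samples) % self.factor
        out: List[int] = []
        for i in range(0, usable, self.factor):
            # Averaging each group is a crude low-pass filter against aliasing.
            avg = sum(samples[i:i + self.factor]) / self.factor
            clipped = max(-1.0, min(1.0, avg))  # Web Audio floats are nominally [-1, 1]
            out.append(int(clipped * 32767))    # scale to 16-bit PCM
        self._pending = samples[usable:]        # carry the remainder forward
        return out
```

Three consecutive quanta yield 42, 43, and 43 output samples (128 total for 384 inputs), so no audio is lost at quantum boundaries. A production worklet would replace the averaging with a proper low-pass filter.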
Silero VAD in Browser (~12 min)
- What is Silero VAD: Voice Activity Detection model (~2MB)
- Performance: <1ms per 30ms audio chunk
- WebAssembly integration: Running ML model in browser
- ONNX Runtime Web: Executing VAD model
- Why client-side VAD:
  - Reduces server round-trips
  - Lower latency for speech detection
  - Bandwidth savings (only send speech)
- Code: Silero VAD WASM integration
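Silero VAD returns a speech probability for each audio chunk; the client still has to turn those probabilities into "speech started" and "speech ended" decisions. A common approach, sketched below, is hysteresis: a higher threshold to enter speech, a lower one to stay in it, plus a hangover count so short dips inside a word do not end the utterance. The thresholds and hangover length are hypothetical defaults to tune, and the probabilities would come from running the Silero ONNX model in the browser (e.g., via ONNX Runtime Web). Python is used for clarity.

```python
"""Sketch: turn per-chunk Silero VAD probabilities into speech events.

Thresholds and hangover length are hypothetical defaults to tune.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class VadGate:
    start_threshold: float = 0.5  # probability needed to enter speech
    end_threshold: float = 0.35   # lower bar to stay in speech (hysteresis)
    hangover_chunks: int = 8      # consecutive quiet chunks before speech ends
    speaking: bool = False
    quiet_run: int = 0

    def update(self, prob: float) -> Optional[str]:
        """Feed one chunk's probability; return "speech_start", "speech_end", or None."""
        if not self.speaking:
            if prob >= self.start_threshold:
                self.speaking, self.quiet_run = True, 0
                return "speech_start"
            return None
        if prob < self.end_threshold:
            self.quiet_run += 1
            if self.quiet_run >= self.hangover_chunks:
                self.speaking, self.quiet_run = False, 0
                return "speech_end"
        else:
            self.quiet_run = 0  # a brief dip inside speech resets the count
        return None
```

To get the bandwidth savings, transmit a chunk only while `gate.speaking` is true (many implementations also keep a short pre-roll buffer so the first syllable is not clipped).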
WebRTC vs WebSocket Transport (~10 min)
- WebRTC:
  - Designed for realtime audio/video
  - NAT traversal built-in
  - Adaptive bitrate
  - Used by: LiveKit, Daily
- WebSocket:
  - Simpler implementation
  - No NAT traversal (server must be reachable)
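  - No built-in codec negotiation or adaptive bitrate (the app chooses a fixed encoding, e.g., raw PCM)
  - Used by: OpenAI Realtime API (WebSocket mode), Pipecat WebSocket transports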
- Decision matrix: When to use each
- Code: Both transport implementations
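Part of the WebRTC vs WebSocket decision is bandwidth. A WebSocket path typically sends a fixed encoding such as raw 16-bit PCM, while WebRTC negotiates a compressed codec (Opus) and can adapt its bitrate. The sketch below does the arithmetic; `OPUS_KBPS` is a hypothetical target bitrate, and the figures ignore protocol overhead.

```python
"""Sketch: per-user and fleet bandwidth for browser audio transport.

Assumptions: the WebSocket path sends raw 16-bit mono PCM, optionally
base64-wrapped inside JSON messages; OPUS_KBPS is a hypothetical WebRTC
target. Protocol overhead (headers, RTP/TCP framing) is ignored.
"""

OPUS_KBPS = 32.0  # hypothetical Opus bitrate; WebRTC can adapt it at runtime


def pcm_kbps(sample_rate: int, bits_per_sample: int = 16, channels: int = 1,
             base64_wrapped: bool = False) -> float:
    """Fixed-encoding PCM bitrate in kilobits per second."""
    kbps = sample_rate * bits_per_sample * channels / 1000
    # Base64 turns every 3 bytes into 4 characters: a 4/3 size increase.
    return kbps * 4 / 3 if base64_wrapped else kbps


def fleet_mbps(per_user_kbps: float, users: int) -> float:
    """Aggregate ingest for `users` concurrent streams, in megabits per second."""
    return per_user_kbps * users / 1000
```

Raw 16kHz PCM works out to 256 kbps per user, about 128 Mbps for 500 concurrent users, versus about 16 Mbps at the assumed Opus bitrate. That gap, plus NAT traversal, is a large part of why WebRTC is the production default.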
Connecting to Voice Agents (~6 min)
- LiveKit integration: Browser SDK connects to LiveKit room
- Pipecat integration: WebSocket transport for browser audio
- Complete flow: getUserMedia -> AudioWorklet -> VAD -> Transport -> Agent
- Code: End-to-end browser voice integration
Improve Your Skill (~5 min)
- Reflect: What browser audio patterns did you learn?
- Update .claude/skills/web-audio-capture/SKILL.md
- Add: AudioWorklet setup guidance, VAD integration
- Test: Can skill guide browser audio capture correctly?

Duration Estimate: 50 minutes

Three Roles Integration (Layer 2):

AI as Teacher:

Skill explains AudioWorklet architecture you didn't know
"AudioWorkletProcessor runs in a separate thread, avoiding main thread jank"

AI as Student:

You provide browser compatibility requirements
"Need to support Safari - confirm AudioWorklet works in our minimum supported Safari version (available since Safari 14.1)"

AI as Co-Worker:

Iterate on Silero VAD integration until detection works
First attempt: WASM loading fails -> debug together -> fix async loading

Try With AI Prompts:

Implement Microphone Capture

I want to capture microphone audio in the browser for my voice agent.
Use my web-audio-capture skill to help me:

Requirements:
- HTTPS context (production deployment)
- Low latency (< 50ms processing delay)
- Works on Chrome, Firefox, Safari
- Outputs PCM audio at 16kHz

Help me implement:
1. getUserMedia with proper constraints
2. AudioWorklet for processing
3. Error handling for permission denied

I'll test in each browser and report what works.

What you're learning: Cross-browser audio capture - handling the messy reality of browser APIs.

Run Silero VAD in Browser

I want to detect speech client-side using Silero VAD in WebAssembly.
This reduces server round-trips and latency.

Use my web-audio-capture skill to help me:

1. Load the Silero VAD ONNX model (~2MB)
2. Run inference in AudioWorklet (< 1ms per chunk)
3. Send speech/non-speech events to main thread
4. Only transmit audio when speech detected

Walk me through the WASM integration. I'll test with real speech
and measure latency.

What you're learning: Client-side ML - running voice models directly in the browser.

Choose Transport Layer

I need to send browser audio to my voice agent. Help me choose:

My constraints:
- Users may be on restrictive networks (corporate firewalls)
- Latency critical (< 300ms end-to-end)
- Backend: LiveKit-based voice agent from Chapter 80
- Scale: 100-500 concurrent users

Use my web-audio-capture skill to:
1. Compare WebRTC vs WebSocket for my use case
2. Explain NAT traversal implications
3. Recommend approach with justification

I'll validate with network tests behind my company firewall.

What you're learning: Transport selection - understanding real-world network constraints.

Lesson 3: Production Telephony Patterns

Title: Production Telephony Patterns

Learning Objectives:

Implement IVR (Interactive Voice Response) patterns with menu navigation
Configure call recording with compliance considerations (GDPR, consent)
Design failover and redundancy for high-availability telephony
Optimize costs across provider mix (Twilio + Telnyx hybrid)
Finalize both skills for production use

Stage: Layer 3 (Intelligence Design) + Layer 4 elements (production patterns)

CEFR Proficiency: B2

New Concepts (count: 3):

IVR patterns and menu navigation
Call recording with compliance
Failover and redundancy design

Cognitive Load Validation: 3 concepts <= 10 limit (B2) -> WITHIN LIMIT

Maps to Evals: #7 (Production Patterns)

Key Sections:

IVR Pattern Implementation (~10 min)
- What is IVR: Automated phone menus ("Press 1 for sales...")
- Modern IVR: Natural language intent detection vs DTMF tones
- Implementation approaches:
  - TwiML for simple menus
  - AI intent classification for natural conversation
- Common patterns:
  - Initial greeting and intent gathering
  - Department routing
  - Queue position updates
  - Callback scheduling
- Code: Complete IVR flow with voice agent integration
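The routing core of a modern IVR is small: classify the caller's intent, route to a destination, re-prompt when the intent is unclear, and escalate to a human after repeated failures. In the sketch below, the keyword matcher is a hypothetical stand-in for an AI intent classifier, the destination names are hypothetical, and DTMF digits are accepted as a fallback for callers who press keys.

```python
"""Sketch: natural-language IVR routing with a human fallback.

The keyword classifier is a hypothetical stand-in for an AI intent
classifier; destination names are hypothetical placeholders.
"""
from typing import Dict, Optional, Tuple

ROUTES: Dict[str, str] = {"tasks": "task-agent", "billing": "billing-agent"}
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "tasks": ("task", "todo", "deadline", "remind"),
    "billing": ("bill", "invoice", "charge", "refund", "payment"),
}
DTMF: Dict[str, str] = {"1": "tasks", "2": "billing"}  # "press 1 for..." fallback
HUMAN = "human-escalation"


def classify(utterance: str) -> Optional[str]:
    """Return a single intent, or None when zero or several intents match."""
    text = utterance.lower()
    hits = [intent for intent, words in KEYWORDS.items() if any(w in text for w in words)]
    return hits[0] if len(hits) == 1 else None


def route(utterance: str = "", digits: str = "", attempts: int = 0,
          max_attempts: int = 2) -> str:
    """Pick a destination; re-prompt, then escalate, when intent is unclear."""
    intent = DTMF.get(digits) or classify(utterance)
    if intent:
        return ROUTES[intent]
    return "reprompt" if attempts + 1 < max_attempts else HUMAN
```

Swapping `classify` for an LLM call changes nothing else in the flow, which keeps the routing and escalation rules testable without a model.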
Call Queuing, Transfer, and Hold (~8 min)
- Queuing: Managing concurrent calls beyond agent capacity
  - Queue position announcements
  - Estimated wait time
  - Music/message on hold
- Transfer: Moving calls between agents
  - Warm transfer (context preserved)
  - Cold transfer (no context)
  - Transfer to external numbers
- Hold: Pausing active conversation
  - Hold music configuration
  - Return to conversation
- Code: Queue and transfer implementation
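Queue position announcements and estimated wait times can start from a very simple model. The sketch below assumes wait time is roughly (callers ahead / available agents) × average handle time; the agent count and handle time are hypothetical inputs, and real contact-center estimates also use arrival rates and historical data.

```python
"""Sketch: FIFO call queue with position and a naive wait estimate.

Assumption: wait ~= (callers ahead / agents) * average handle time.
"""
import math
from collections import deque
from typing import Deque


class CallQueue:
    def __init__(self, agents: int, avg_handle_s: float) -> None:
        self.agents = agents
        self.avg_handle_s = avg_handle_s
        self._calls: Deque[str] = deque()

    def enqueue(self, call_id: str) -> int:
        """Add a caller; return their 1-based position."""
        self._calls.append(call_id)
        return len(self._calls)

    def dequeue(self) -> str:
        """Hand the longest-waiting caller to the next free agent."""
        return self._calls.popleft()

    def position(self, call_id: str) -> int:
        return list(self._calls).index(call_id) + 1

    def estimated_wait_s(self, call_id: str) -> float:
        ahead = self.position(call_id) - 1
        return ahead / self.agents * self.avg_handle_s

    def announcement(self, call_id: str) -> str:
        # Round up and never promise "0 minutes" to a waiting caller.
        minutes = max(1, math.ceil(self.estimated_wait_s(call_id) / 60))
        unit = "minute" if minutes == 1 else "minutes"
        return (f"You are number {self.position(call_id)} in line. "
                f"Estimated wait: about {minutes} {unit}.")
```

With 2 agents and a 3-minute average handle time, the fifth caller hears "You are number 5 in line. Estimated wait: about 6 minutes."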
Call Recording and Compliance (~10 min)
- Why record: Training, compliance, dispute resolution
- Compliance requirements:
  - GDPR: Lawful basis (often explicit consent), right to erasure (deletion)
  - CCPA: California privacy requirements
  - Two-party consent states (US)
- Implementation:
  - Recording announcement ("This call may be recorded...")
  - Consent capture
  - Secure storage
  - Retention policies
- Code: Compliant call recording setup
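The implementation steps above map onto three rules a recording store can enforce in code: never start recording without captured consent, purge recordings past the retention period, and delete every recording for a caller on an erasure request. The sketch below encodes those rules; the announcement text and 90-day retention period are hypothetical policy values that must come from your own legal review.

```python
"""Sketch: consent-gated recording with retention and erasure.

The retention period and announcement text are hypothetical policy values.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

ANNOUNCEMENT = "This call may be recorded for quality and training purposes."
RETENTION = timedelta(days=90)  # hypothetical retention period


@dataclass
class Recording:
    call_id: str
    caller_id: str
    created: datetime
    consent_at: datetime  # audit trail: when consent was captured


@dataclass
class RecordingStore:
    items: Dict[str, Recording] = field(default_factory=dict)

    def start(self, call_id: str, caller_id: str, consent_given: bool,
              now: datetime) -> Optional[Recording]:
        """Start recording only if consent was captured after the announcement."""
        if not consent_given:
            return None
        rec = Recording(call_id, caller_id, created=now, consent_at=now)
        self.items[call_id] = rec
        return rec

    def purge_expired(self, now: datetime) -> List[str]:
        """Delete recordings older than the retention period."""
        expired = [cid for cid, r in self.items.items() if now - r.created > RETENTION]
        for cid in expired:
            del self.items[cid]
        return expired

    def erase_caller(self, caller_id: str) -> List[str]:
        """Handle an erasure request: delete every recording for one caller."""
        hits = [cid for cid, r in self.items.items() if r.caller_id == caller_id]
        for cid in hits:
            del self.items[cid]
        return hits
```

Running `purge_expired` on a schedule and returning the deleted IDs gives you an auditable record that the retention policy is actually enforced.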
Failover and Redundancy (~10 min)
- Failure scenarios:
  - Provider outage (Twilio down)
  - Network issues (SIP trunk failure)
  - Agent unavailability
- Failover strategies:
  - Multi-provider: Telnyx as Twilio backup
  - Geographic redundancy: Multiple regions
  - Graceful degradation: Voicemail when agents unavailable
- Health checks and monitoring
- Code: Multi-provider failover configuration
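A multi-provider setup needs a rule for when to stop sending calls to a failing provider and when to try it again. The sketch below uses an ordered priority list with a simple circuit breaker: a provider is skipped after `max_failures` consecutive errors and retried once `cooldown_s` has elapsed. Provider names, the threshold, and the cooldown are hypothetical; `now` is passed in explicitly so the logic is testable, and `pick` returning `None` is the cue for graceful degradation (e.g., voicemail).

```python
"""Sketch: ordered multi-provider failover with a simple circuit breaker.

Provider names, failure threshold, and cooldown are hypothetical.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Health:
    failures: int = 0                  # consecutive failures
    opened_at: Optional[float] = None  # when the breaker tripped, if it did


class Failover:
    def __init__(self, providers: List[str], max_failures: int = 3,
                 cooldown_s: float = 60.0) -> None:
        self.providers = providers  # priority order, primary first
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.health: Dict[str, Health] = {p: Health() for p in providers}

    def _available(self, provider: str, now: float) -> bool:
        h = self.health[provider]
        return h.opened_at is None or now - h.opened_at >= self.cooldown_s

    def pick(self, now: float) -> Optional[str]:
        """First available provider, or None (degrade to voicemail)."""
        for p in self.providers:
            if self._available(p, now):
                return p
        return None

    def record(self, provider: str, ok: bool, now: float) -> None:
        """Report a call outcome (or health-check result) for a provider."""
        if ok:
            self.health[provider] = Health()  # success closes the breaker
            return
        h = self.health[provider]
        h.failures += 1
        if h.failures >= self.max_failures:
            h.opened_at = now  # trip (or re-trip after a failed retry)
```

Feeding periodic health-check results into `record` as well as real call outcomes lets the breaker trip before customers hit the outage.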
Cost Optimization Strategies (~7 min)
- Provider mix: High-volume on Telnyx, premium on Twilio
- Call routing: Cheapest route first (Least Cost Routing)
- Off-peak pricing: Scheduling outbound calls
- Monitoring: Cost dashboards and alerts
- Example: worked savings estimate for a hybrid provider split (actual savings depend on traffic mix and current rates)
- Code: Cost-optimized routing configuration
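Least Cost Routing usually works per destination: look up each provider's rate for the longest matching number prefix, then pick the cheapest eligible provider. Every rate in the sketch below is a hypothetical placeholder, not a quote from either provider; the `exclude` parameter lets you skip providers that health checks have marked down.

```python
"""Sketch: least-cost routing by longest matching destination prefix.

All rates are hypothetical placeholders, not real provider pricing.
"""
from typing import Dict, Iterable, Optional, Tuple

# provider -> {E.164 prefix: USD per minute}
RATE_TABLE: Dict[str, Dict[str, float]] = {
    "twilio": {"+1": 0.0140, "+44": 0.0200, "+447": 0.0500},
    "telnyx": {"+1": 0.0070, "+447": 0.0400},
}


def rate_for(provider: str, number: str) -> Optional[float]:
    """Rate for the longest prefix of `number` this provider prices, if any."""
    prefixes = [p for p in RATE_TABLE[provider] if number.startswith(p)]
    if not prefixes:
        return None
    return RATE_TABLE[provider][max(prefixes, key=len)]  # longest prefix wins


def cheapest_route(number: str, exclude: Iterable[str] = ()) -> Optional[Tuple[str, float]]:
    """Cheapest (provider, rate) for this destination, skipping excluded providers."""
    skip = set(exclude)
    candidates = []
    for provider in RATE_TABLE:
        if provider in skip:
            continue
        rate = rate_for(provider, number)
        if rate is not None:
            candidates.append((rate, provider))
    if not candidates:
        return None  # no route: alert rather than silently dropping the call
    rate, provider = min(candidates)
    return provider, rate
```

With these placeholder rates, a US number routes to Telnyx, a UK landline (+4420...) routes to Twilio because only Twilio prices the +44 prefix, and excluding Telnyx sends US calls back to Twilio.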
Finalize Both Skills (~5 min)
- Complete voice-telephony skill review:
  - IVR patterns included?
  - Compliance guidance present?
  - Failover strategies documented?
- Complete web-audio-capture skill review:
  - AudioWorklet patterns complete?
  - VAD integration guidance present?
  - Transport selection covered?
- Final test: Use skills to scaffold production telephony system
- Commit: Production-ready skill artifacts

Duration Estimate: 50 minutes

Three Roles Integration (Layer 3 + Layer 4):

AI as Teacher:

Skill guides compliance requirements you didn't know
"Two-party consent states require ALL parties to consent to recording - announce it first"

AI as Student:

You teach skill your production constraints
"My users are in EU - add GDPR-specific consent flow"

AI as Co-Worker:

Design failover architecture together
You specify uptime requirements -> AI generates multi-provider config -> You validate in staging

Skill Finalization: At lesson end, students have production-ready skills that:

voice-telephony:

Guides SIP and webhook integration
Includes Twilio + Telnyx setup
Provides IVR patterns and call flows
Covers compliance and recording
Includes failover strategies

web-audio-capture:

Scaffolds getUserMedia + AudioWorklet
Guides Silero VAD WASM integration
Covers transport layer selection
Includes browser compatibility guidance

Try With AI Prompts:

Design IVR Flow

I want to build an IVR for my Task Manager phone agent:

Flow requirements:
- Greeting: "Hi, this is Task Manager. How can I help?"
- Intent detection: Natural language, not "press 1 for..."
- Routing: Task queries -> Task Agent, Billing -> Billing Agent
- Fallback: Unclear intent -> Human escalation

Use my voice-telephony skill to help me design:
1. The complete IVR flow diagram
2. Intent classification approach
3. Context passing between IVR and agent
4. Error handling for unrecognized intent

I'll implement and test with real callers.

What you're learning: Modern IVR design - natural conversation, not phone trees.

Implement Compliant Recording

My voice agent needs call recording for:
- Training data collection
- Dispute resolution
- Quality assurance

But I have users in:
- United States (various states)
- European Union (GDPR)

Use my voice-telephony skill to help me:
1. What consent announcements are required?
2. How do I capture and store consent?
3. What retention policies should I implement?
4. How do I handle deletion requests?

Walk me through compliant recording setup.

What you're learning: Compliance engineering - building legally compliant voice systems.

Design Failover Architecture

My production voice agent needs 99.9% uptime. Current setup:
- Primary: Twilio Voice
- Voice agent: LiveKit on Kubernetes
- Expected volume: 10,000 calls/month

Use my voice-telephony skill to design failover:
1. What if Twilio has an outage?
2. What if my LiveKit cluster goes down?
3. What if network connectivity fails?
4. How do I monitor and alert on failures?

I need a multi-provider, multi-region architecture.
Walk me through the design and implementation.

What you're learning: High-availability design - operating voice agents at production scale.

IV. Skill Dependency Graph

Skill Dependencies:

Lesson 0: Build Both Skills (foundation)
    ↓
Lesson 1: Phone Integration (requires voice-telephony skill)
    ↓
Lesson 2: Browser Audio (requires web-audio-capture skill)
    ↓
Lesson 3: Production Patterns (requires both skills + all concepts)

Cross-Chapter Dependencies:

Requires: Chapter 79 (Voice AI Fundamentals) - architecture mental models
Requires: Chapter 80 (LiveKit Agents) - framework integration points
Requires: Chapter 81 (Pipecat) - alternative framework patterns
Requires: Chapter 82 (OpenAI Realtime) - direct API for custom scenarios
Requires: Chapter 83 (Gemini Live) - multimodal integration
Requires: Part 7 (Cloud Native) - Kubernetes deployment patterns
Prepares for: Chapter 85 (Capstone) - production voice agent with phone + browser

V. Assessment Plan

Formative Assessments (During Lessons)

Lesson 0: Dual-skill generation verification (both skills work, match docs)
Lesson 1: Twilio call integration test (make and receive a test call)
Lesson 2: Browser audio demo (capture and stream to agent)
Lesson 3: IVR flow test (complete call journey works)

Summative Assessment (End of Chapter)

Chapter 84 Quiz:

SIP: Explain when to use native SIP vs HTTP webhooks
Cost: Compare Twilio and Telnyx pricing for 5,000 min/month
Browser Audio: Describe AudioWorklet's latency advantages
VAD: Explain why client-side Silero VAD reduces latency
Compliance: What consent is required for call recording in EU?

Practical Assessment:

Receive inbound call via Twilio to your voice agent
Capture browser audio and stream to voice agent
Demonstrate IVR flow with intent detection
Show call recording with consent announcement

VI. Validation Checklist

Chapter-Level Validation:

Chapter type identified: TECHNICAL (SKILL-FIRST L00 Pattern)
Concept density analysis documented: 9 concepts across 4 lessons
Lesson count justified: 4 lessons (~2.25 concepts each, within B1-B2 limit)
All evals covered by lessons
All lessons map to at least one eval

Stage Progression Validation:

Lesson 0: Layer 1 + Layer 2 (dual-skill creation with AI collaboration)
Lessons 1-2: Layer 2 (AI collaboration, skill improvement)
Lesson 3: Layer 3 + Layer 4 elements (intelligence design, production patterns)
No premature spec-driven content (that's Chapter 85 Capstone)

Cognitive Load Validation:

Lesson 0: 2 concepts <= 10 (B1 limit) PASS
Lesson 1: 3 concepts <= 10 (B1 limit) PASS
Lesson 2: 3 concepts <= 10 (B1-B2 limit) PASS
Lesson 3: 3 concepts <= 10 (B2 limit) PASS

L00 Pattern Requirements:

Lesson 0 creates TWO skills from official documentation
Fresh clone of skills-lab (no state assumptions)
LEARNING-SPEC.md written before skill creation (covers both skills)
/fetching-library-docs used for documentation
Both skills tested and verified before proceeding
Each subsequent lesson TESTS and IMPROVES relevant skill
"Improve Your Skill" section in each lesson

Three Roles Validation (Layer 2 lessons):

Each Layer 2 lesson demonstrates AI as Teacher
Each Layer 2 lesson demonstrates AI as Student
Each Layer 2 lesson demonstrates AI as Co-Worker (convergence)

Canonical Source Validation:

Skills format follows .claude/skills/<name>/SKILL.md pattern
Lesson 0 references /fetching-library-docs for official docs
Telephony patterns align with real provider documentation (Twilio, Telnyx)
Browser audio patterns align with MDN Web Audio API documentation

VII. File Structure

66-phone-browser-integration/
├── _category_.json               # Existing
├── README.md                     # Chapter overview (create)
├── 00-build-your-skills.md       # Lesson 0: L00 pattern - BOTH skills (create)
├── 01-phone-integration.md       # Lesson 1 (create)
├── 02-browser-audio-vad.md       # Lesson 2 (create)
├── 03-production-patterns.md     # Lesson 3 (create)
└── 04-chapter-quiz.md            # Assessment (create)

VIII. Summary

Chapter 84: Phone & Browser Integration is a 4-lesson SKILL-FIRST technical chapter:

Lesson	Title	Concepts	Duration	Evals
0	Build Your Integration Skills	2	40 min	#1
1	Phone Integration with SIP & Twilio	3	45 min	#2, #3, #4
2	Browser Audio Capture & VAD	3	50 min	#5, #6
3	Production Telephony Patterns	3	50 min	#7

Total: 11 new concepts (9 core concepts + 2 skill-creation concepts in Lesson 0), ~185 minutes, creates TWO production-ready skills

Skill Outputs:

.claude/skills/voice-telephony/SKILL.md - Phone integration patterns grounded in Twilio/Telnyx documentation
.claude/skills/web-audio-capture/SKILL.md - Browser audio capture patterns grounded in MDN/Silero documentation

IX. Technical Reference

Key Technical Facts (from Research Report)

Telephony:

LiveKit has native SIP telephony support (direct trunk integration)
Telnyx pricing: ~$0.002/min for telephony vs ~$0.0085/min Twilio
PSTN audio quality: 8kHz reduces native S2S model advantages

Browser Audio:

Microphone access (getUserMedia) requires a secure context (HTTPS, or localhost during development)
AudioWorklet latency: ~128 samples (2.67ms at 48kHz)
Silero VAD: <1ms per 30ms audio chunk, ~2MB model

Transport:

WebRTC: NAT traversal, adaptive bitrate, production standard
WebSocket: Simpler, no NAT traversal, good for prototyping

Integration Points with Previous Chapters

Previous Chapter	Integration Point
Chapter 80 (LiveKit)	SIP trunk connects to LiveKit room
Chapter 81 (Pipecat)	WebSocket transport for browser audio
Chapter 82 (OpenAI Realtime)	Direct WebSocket for custom scenarios
Chapter 83 (Gemini Live)	Multimodal browser integration
