Chapter 84: Phone & Browser Integration - Lesson Plan

Generated by: chapter-planner v2.0.0 (Reasoning-Activated)
Source: Part 11 README, Deep Search Report, Chapters 79-83 context
Created: 2026-01-02
Constitution: v6.0.0 (Reasoning Mode)


I. Chapter Analysis

Chapter Type

TECHNICAL (SKILL-FIRST L00 Pattern) - This chapter uses the Skill-First Learning pattern where students build TWO skills (voice-telephony and web-audio-capture) FIRST from official documentation, then learn integration patterns by improving those skills across subsequent lessons.

Recognition signals:

  • Learning objectives use "implement/integrate/configure/deploy"
  • Code examples required for every lesson
  • TWO skill artifacts created in Lesson 0
  • Subsequent lessons TEST and IMPROVE both skills
  • Follows L00 pattern established in Parts 5-7
  • Bridges frameworks (Chapters 80-83) to real communication channels

Concept Density Analysis

Core Concepts (from Deep Search + Part 11 README): 9 concepts

  1. SIP protocol fundamentals and LiveKit native SIP
  2. Twilio Voice integration (inbound/outbound calls)
  3. Telnyx as cost-effective telephony alternative
  4. Call queuing, transfer, and hold patterns
  5. Web Audio API and getUserMedia for microphone access
  6. AudioWorklet for low-latency browser audio processing
  7. Silero VAD in browser via WebAssembly
  8. WebRTC vs WebSocket for browser-server audio transport
  9. Production telephony patterns (IVR, recording, failover)

Complexity Assessment: Standard (integration patterns building on framework knowledge)

Proficiency Tier: B1-B2 (Part 11 requires Parts 6, 7, 9, 10 + Chapters 79-83 completed; students have voice AI framework experience)

Justified Lesson Count: 4 lessons

  • Lesson 0: Build Your Integration Skills (L00 pattern - creates BOTH skills)
  • Lesson 1: Phone Integration with SIP & Twilio
  • Lesson 2: Browser Audio Capture & VAD
  • Lesson 3: Production Telephony Patterns

Reasoning:

  • 9 concepts across 4 lessons = ~2.25 concepts per lesson
  • B1-B2 limit is 10 concepts per lesson - well within limit
  • L00 pattern requires skill-first approach
  • Two related but distinct skill domains (telephony vs browser audio)
  • Chapter bridges to Chapter 85 Capstone where both skills compose

II. Success Evals (from Part 11 README + Chapter 84 Description)

Success Criteria (what students must achieve):

  1. Skill Creation: Students build working voice-telephony and web-audio-capture skills from official documentation
  2. SIP Understanding: Students can explain SIP protocol fundamentals and when to use LiveKit native SIP vs Twilio
  3. Twilio Integration: Students implement inbound AND outbound voice calls via Twilio Voice
  4. Cost Optimization: Students configure Telnyx as cost-effective alternative (understand ~$0.002/min vs Twilio pricing)
  5. Browser Audio Mastery: Students capture microphone audio using Web Audio API with low-latency AudioWorklet processing
  6. VAD Implementation: Students run Silero VAD in browser via WebAssembly for client-side voice activity detection
  7. Production Patterns: Students implement IVR patterns, call recording, and failover strategies

All lessons below map to these evals.


III. Lesson Sequence


Lesson 0: Build Your Integration Skills

Title: Build Your Integration Skills

Learning Objectives:

  • Write a LEARNING-SPEC.md that defines what you want to learn about telephony and browser audio
  • Fetch official documentation for Twilio Voice, Telnyx, Web Audio API, and Silero VAD
  • Create TWO skills grounded in official documentation:
    • voice-telephony skill for phone integration
    • web-audio-capture skill for browser audio
  • Verify both skills work with minimal integration tests

Stage: Layer 1 (Manual Foundation) + Layer 2 (AI Collaboration via /skill-creator)

CEFR Proficiency: B1

New Concepts (count: 2):

  1. Dual-skill LEARNING-SPEC.md (specification for two related domains)
  2. Documentation-grounded integration skill creation

Cognitive Load Validation: 2 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #1 (Skill Creation)

Key Sections:

  1. Clone the Skills Lab Fresh (~3 min)

    • Why fresh clone: No state assumptions from previous chapters
    • Command: git clone [skills-lab-repo] && cd skills-lab
    • Verify clean environment
  2. Write Your LEARNING-SPEC.md for BOTH Skills (~10 min)

    • What is LEARNING-SPEC.md: Your specification for two related skill domains
    • Template structure:
      # Learning Specification: Voice Integration Skills

      ## Skill 1: voice-telephony

      ### What I Want to Learn
      - How SIP protocol works for voice calls
      - How to integrate Twilio Voice for inbound/outbound calls
      - How Telnyx provides cost-effective alternative (~$0.002/min)
      - Call flow patterns (queuing, transfer, hold)

      ### Why This Matters
      - Phone integration makes Digital FTEs accessible via PSTN
      - Need to connect voice agents to real phone numbers

      ### Success Criteria
      - [ ] Skill can guide Twilio Voice setup (inbound + outbound)
      - [ ] Skill explains SIP vs HTTP webhook tradeoffs
      - [ ] Skill includes cost comparison (Twilio vs Telnyx)

      ## Skill 2: web-audio-capture

      ### What I Want to Learn
      - How Web Audio API captures microphone input
      - How AudioWorklet provides low-latency processing
      - How to run Silero VAD in browser via WebAssembly
      - WebRTC vs WebSocket for audio transport

      ### Why This Matters
      - Browser-based voice enables web apps without phone network
      - Client-side VAD reduces server round-trips

      ### Success Criteria
      - [ ] Skill can scaffold getUserMedia + AudioWorklet setup
      - [ ] Skill explains Silero VAD WASM integration
      - [ ] Skill guides WebRTC vs WebSocket decision
    • Write YOUR specification (not a copy)
  3. Fetch Official Documentation (~8 min)

    • For voice-telephony:
      • Twilio Voice API docs
      • Telnyx Voice docs
      • LiveKit SIP integration docs
    • For web-audio-capture:
      • MDN Web Audio API
      • AudioWorklet specification
      • Silero VAD GitHub/docs
    • Why official docs: Telephony APIs change frequently; AI memory unreliable
    • Save relevant excerpts for skill creation
  4. Create Your Skills with /skill-creator (~12 min)

    • Invoke /skill-creator for voice-telephony skill
      • Persona: Telephony integration specialist
      • Questions: SIP vs webhooks, provider selection, call flow
      • Principles: Cost optimization, reliability, compliance
    • Invoke /skill-creator for web-audio-capture skill
      • Persona: Browser audio engineering specialist
      • Questions: Latency requirements, VAD strategy, transport selection
      • Principles: Low latency, browser compatibility, efficient processing
    • Commit to:
      • .claude/skills/voice-telephony/SKILL.md
      • .claude/skills/web-audio-capture/SKILL.md
  5. Verify Both Skills Work (~7 min)

    • Test voice-telephony: "How do I set up inbound Twilio Voice calls?"
    • Test web-audio-capture: "How do I capture microphone audio with AudioWorklet?"
    • Verify generated guidance matches official documentation
    • If issues found: Improve skills and re-test
    • Skills are now your knowledge artifacts

Duration Estimate: 40 minutes

File Outputs:

  • .claude/skills/voice-telephony/SKILL.md
  • .claude/skills/web-audio-capture/SKILL.md
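
A minimal sketch of a file-level check for the two outputs above, assuming it runs from the root of the skills-lab clone. The helper name `missing_skills` is hypothetical, and it only confirms that each SKILL.md exists and is non-empty; whether the guidance matches official documentation still needs the manual tests in section 5.

```python
from pathlib import Path

# Skill names come from this plan; the helper itself is a hypothetical illustration.
SKILL_NAMES = ("voice-telephony", "web-audio-capture")


def missing_skills(repo_root: Path, names=SKILL_NAMES) -> list[str]:
    """Return the skills whose SKILL.md is absent or empty under repo_root."""
    missing = []
    for name in names:
        skill_file = repo_root / ".claude" / "skills" / name / "SKILL.md"
        if not skill_file.is_file() or not skill_file.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing


problems = missing_skills(Path.cwd())
print("All skills present" if not problems else f"Missing or empty: {problems}")
```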

Prerequisites:

  • Part 10 completed (chat interfaces)
  • Chapters 79-83 completed (voice frameworks and direct APIs)
  • LiveKit Agents and Pipecat skills from Chapters 80-81

Try With AI Prompts:

  1. Draft Your Dual-Skill LEARNING-SPEC.md

    I need to build TWO related skills for voice integration:
    1. voice-telephony - connecting to phone networks
    2. web-audio-capture - browser-based audio

    Help me write a LEARNING-SPEC.md that covers BOTH skills:

    My context:
    - I've mastered LiveKit Agents (Chapter 80) and Pipecat (Chapter 81)
    - I understand direct APIs (Chapters 82-83)
    - My goal is connecting my Task Manager to real phones AND browsers

    Help me define:
    1. What aspects of each domain should I focus on?
    2. How do these skills complement each other?
    3. What success criteria prove I've learned both?
    4. What's explicitly out of scope (skill boundaries)?

    Make this MY specification, covering both domains coherently.

    What you're learning: Dual-domain specification - defining related but distinct skill boundaries.

  2. Analyze Telephony Documentation

    I fetched documentation for Twilio Voice, Telnyx, and LiveKit SIP.
    Here are the key sections:
    [paste relevant excerpts from /fetching-library-docs output]

    Help me understand:
    1. When should I use LiveKit's native SIP vs Twilio webhooks?
    2. What's Telnyx's pricing advantage (~$0.002/min)?
    3. What patterns appear in call flow handling?
    4. What compliance considerations exist (recording, GDPR)?

    I want a GROUNDED voice-telephony skill, not assumptions.

    What you're learning: Provider comparison - understanding when to use which telephony approach.

  3. Review Your Generated Skills

    Here are the skills /skill-creator generated:

    voice-telephony:
    [paste SKILL.md content]

    web-audio-capture:
    [paste SKILL.md content]

    Compare each to official documentation:
    1. Does voice-telephony accurately represent Twilio/Telnyx integration?
    2. Does web-audio-capture correctly describe AudioWorklet patterns?
    3. Are there claims not supported by docs?
    4. What's missing that should be included?

    Help me make BOTH skills ACCURATE, not just comprehensive.

    What you're learning: Multi-skill validation - ensuring related skills are consistent and accurate.


Lesson 1: Phone Integration with SIP & Twilio

Title: Phone Integration with SIP & Twilio

Learning Objectives:

  • Explain SIP protocol fundamentals and how LiveKit provides native SIP support
  • Implement inbound voice calls via Twilio Voice (receive calls to your voice agent)
  • Implement outbound voice calls via Twilio Voice (agent calls users)
  • Configure Telnyx as cost-effective alternative with ~$0.002/min pricing
  • Handle 8kHz PSTN audio quality considerations (native speech-to-speech model advantages are reduced at 8kHz)

Stage: Layer 2 (AI Collaboration) - Use skill to build, improve skill based on learnings

CEFR Proficiency: B1

New Concepts (count: 3):

  1. SIP protocol and LiveKit native SIP support
  2. Twilio Voice integration (inbound + outbound)
  3. Telnyx as cost-effective telephony alternative

Cognitive Load Validation: 3 concepts <= 10 limit (B1) -> WITHIN LIMIT

Maps to Evals: #2 (SIP Understanding), #3 (Twilio Integration), #4 (Cost Optimization)

Key Sections:

  1. SIP Protocol Fundamentals (~8 min)

    • What is SIP: Session Initiation Protocol for voice/video calls
    • SIP vs HTTP webhooks: When to use each approach
    • PSTN: Public Switched Telephone Network (traditional phone system)
    • LiveKit native SIP: Direct SIP trunk integration without middleware
    • Why this matters: SIP = lower latency, HTTP webhooks = simpler setup
    • Diagram: SIP call flow from PSTN to voice agent
  2. LiveKit Native SIP Support (~7 min)

    • How LiveKit handles SIP: Direct trunk connection
    • SIP trunk providers: What they provide (phone numbers, routing)
    • Configuration: Connecting SIP trunk to LiveKit
    • When to use: High-volume, low-latency requirements
    • Code: LiveKit SIP trunk configuration
  3. Twilio Voice Integration (~12 min)

    • Inbound calls:
      • Purchase Twilio phone number
      • Configure webhook URL for incoming calls
      • TwiML response to connect to voice agent
      • Code: Flask/FastAPI webhook handler
    • Outbound calls:
      • Initiate call via Twilio REST API
      • Connect outbound call to voice agent session
      • Code: Programmatic outbound calling
    • Integration patterns: How Twilio connects to LiveKit/Pipecat
  4. Telnyx Cost Optimization (~8 min)

    • Pricing comparison:
      • Twilio: ~$0.0085/min + $1/month per number
      • Telnyx: ~$0.002/min + similar number costs
    • When to use Telnyx: High volume, cost-sensitive applications
    • API comparison: Similar patterns, different endpoints
    • Code: Telnyx integration (parallel to Twilio)
    • Trade-offs: Feature parity, support, reliability
  5. 8kHz PSTN Audio Considerations (~5 min)

    • PSTN audio quality: 8kHz sample rate (vs 16kHz+ for web)
    • Impact on models: Native S2S advantages reduced at 8kHz
    • Recommendation: Cascaded pipeline often better for phone
    • Code: Configuring voice agent for PSTN audio quality
  6. Improve Your Skill (~5 min)

    • Reflect: What telephony patterns did you learn?
    • Update .claude/skills/voice-telephony/SKILL.md
    • Add: Provider selection decision tree
    • Test: Does improved skill guide Twilio setup correctly?
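
The SIP call flow named in the first key section can be outlined as a message sequence. The sketch below lists the basic call setup and teardown exchange from RFC 3261; the party labels are placeholders, and a real deployment would have additional hops (trunk provider, SIP service, media bridge) between them.

```python
# Basic SIP call setup and teardown (RFC 3261).
# ">" means caller to callee, "<" means callee to caller.
BASIC_CALL_FLOW = [
    (">", "INVITE"),       # caller proposes a session (SDP offer in the body)
    ("<", "100 Trying"),   # provisional response: request received
    ("<", "180 Ringing"),  # provisional response: callee is being alerted
    ("<", "200 OK"),       # callee answers (SDP answer in the body)
    (">", "ACK"),          # caller confirms; RTP media flows after this
    (">", "BYE"),          # either side may hang up; here the caller does
    ("<", "200 OK"),       # hangup acknowledged
]


def render_flow(flow, caller="PSTN/SIP trunk", callee="voice agent"):
    """Format the message sequence as one text line per SIP message."""
    lines = []
    for direction, message in flow:
        arrow = "->" if direction == ">" else "<-"
        lines.append(f"{caller} {arrow} {callee}: {message}")
    return lines


for line in render_flow(BASIC_CALL_FLOW):
    print(line)
```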

Duration Estimate: 45 minutes

Three Roles Integration (Layer 2):

AI as Teacher:

  • Skill explains SIP vs webhook tradeoffs you didn't know
  • "For high-volume call centers, native SIP reduces latency by 50-100ms vs webhooks"

AI as Student:

  • You refine skill's cost comparison based on your research
  • "Telnyx pricing section needs specific per-minute rates for comparison"

AI as Co-Worker:

  • Iterate on Twilio integration until calls connect correctly
  • First attempt misses TwiML response -> AI suggests fix -> you validate with test call
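
To make the TwiML step concrete, here is a framework-free sketch of the XML an inbound webhook could return. It assumes the agent is reached through a Twilio media stream (`<Connect><Stream>`); the plan does not specify the bridge, and dialing a SIP URI with `<Dial><Sip>` is another option. The function name and the WebSocket URL are placeholders.

```python
import xml.etree.ElementTree as ET


def inbound_twiml(stream_url: str, greeting: str = "") -> str:
    """Build a TwiML document that optionally greets the caller, then
    connects the call audio to a WebSocket media stream."""
    response = ET.Element("Response")
    if greeting:
        ET.SubElement(response, "Say").text = greeting
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": stream_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


# wss://agent.example.com/media is a placeholder, not a real endpoint.
print(inbound_twiml("wss://agent.example.com/media", "Connecting you now."))
```

A Flask or FastAPI handler would return this string as the response body with an XML content type.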

Try With AI Prompts:

  1. Understand SIP vs Webhooks

    I'm learning phone integration for voice agents. Use my voice-telephony
    skill to help me understand:

    1. When should I use LiveKit's native SIP vs Twilio webhooks?
    2. What's the latency difference between approaches?
    3. What infrastructure do I need for each?
    4. Which approach works better with my existing LiveKit agent?

    Use diagrams or flow charts to clarify the decision.

    What you're learning: Architecture selection - choosing the right telephony approach for your context.

  2. Implement Twilio Inbound Calls

    Help me implement inbound Twilio Voice calls that connect to my
    LiveKit voice agent:

    Requirements:
    - Twilio phone number: [your number]
    - Voice agent: Running on LiveKit (from Chapter 80)
    - Webhook: FastAPI endpoint for incoming calls
    - Response: Connect caller to voice agent room

    Walk me through the code step by step. After we build it, I'll make
    a test call and report what works.

    What you're learning: Inbound telephony - receiving real phone calls to your voice agent.

  3. Compare Provider Costs

    I need to choose between Twilio and Telnyx for my Task Manager's
    phone integration. My constraints:

    - Expected volume: 1,000-5,000 minutes/month
    - Budget sensitivity: Cost matters for this use case
    - Feature needs: Inbound + outbound, basic IVR
    - Reliability: Must work 99.9% of the time

    Use my voice-telephony skill to:
    1. Calculate monthly costs for both providers
    2. Compare feature parity
    3. Recommend which to use and why

    I'll validate your cost estimates against current pricing pages.

    What you're learning: Cost optimization - making informed provider decisions based on usage patterns.
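
The cost comparison in the last prompt reduces to simple arithmetic. This sketch uses the approximate rates quoted in this plan (~$0.0085/min for Twilio, ~$0.002/min for Telnyx, ~$1/month per number); the Telnyx number fee is assumed equal to Twilio's, and all figures should be checked against current pricing pages.

```python
# Approximate rates quoted in this plan; verify against current pricing pages.
TWILIO_PER_MIN = 0.0085
TELNYX_PER_MIN = 0.002
NUMBER_FEE = 1.00  # ~$1/month quoted for Twilio; assumed the same for Telnyx


def monthly_cost(minutes: int, per_min: float, number_fee: float = NUMBER_FEE) -> float:
    """Usage charge plus one phone number rental, in USD."""
    return round(minutes * per_min + number_fee, 2)


for minutes in (1_000, 5_000):
    twilio = monthly_cost(minutes, TWILIO_PER_MIN)
    telnyx = monthly_cost(minutes, TELNYX_PER_MIN)
    print(f"{minutes:>5} min: Twilio ${twilio:.2f}, Telnyx ${telnyx:.2f}, "
          f"difference ${twilio - telnyx:.2f}")
```

With these figures, 5,000 minutes comes to about $43.50 on Twilio and $11.00 on Telnyx, so at this volume the absolute saving is modest and feature or support differences may weigh more than price.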


Lesson 2: Browser Audio Capture & VAD

Title: Browser Audio Capture & VAD

Learning Objectives:

  • Implement microphone capture using Web Audio API and getUserMedia
  • Configure AudioWorklet for low-latency browser audio processing
  • Run Silero VAD in browser via WebAssembly (<1ms per 30ms audio chunk)
  • Compare WebRTC vs WebSocket for browser-server audio transport
  • Connect browser audio to voice agents (LiveKit or Pipecat)

Stage: Layer 2 (AI Collaboration) - Advanced features, skill improvement

CEFR Proficiency: B1-B2

New Concepts (count: 3):

  1. Web Audio API + getUserMedia for microphone capture
  2. AudioWorklet for low-latency processing
  3. Silero VAD in browser via WebAssembly

Cognitive Load Validation: 3 concepts <= 10 limit (B1-B2) -> WITHIN LIMIT

Maps to Evals: #5 (Browser Audio Mastery), #6 (VAD Implementation)

Key Sections:

  1. Web Audio API Fundamentals (~7 min)

    • Browser audio context: The audio processing graph
    • Security requirement: Secure context (HTTPS, or localhost during development) mandatory for microphone access
    • getUserMedia: Requesting microphone permissions
    • Audio graph: Source -> Processing -> Destination
    • Code: Basic microphone capture
  2. AudioWorklet for Low Latency (~10 min)

    • Why AudioWorklet: Replaces deprecated ScriptProcessorNode
    • How it works: Audio processing in separate thread
    • Latency: 128-sample render quantum (~2.67ms at 48kHz) vs buffers of 256+ samples for legacy ScriptProcessorNode
    • Implementation:
      • AudioWorkletProcessor class (runs in worklet thread)
      • Registration and instantiation
      • Message passing between main thread and worklet
    • Code: Complete AudioWorklet pipeline
  3. Silero VAD in Browser (~12 min)

    • What is Silero VAD: Voice Activity Detection model (~2MB)
    • Performance: <1ms per 30ms audio chunk
    • WebAssembly integration: Running ML model in browser
    • ONNX Runtime Web: Executing VAD model
    • Why client-side VAD:
      • Reduces server round-trips
      • Lower latency for speech detection
      • Bandwidth savings (only send speech)
    • Code: Silero VAD WASM integration
  4. WebRTC vs WebSocket Transport (~10 min)

    • WebRTC:
      • Designed for realtime audio/video
      • NAT traversal built-in
      • Adaptive bitrate
      • Used by: LiveKit, Daily
    • WebSocket:
      • Simpler implementation
      • No NAT traversal (server must be reachable)
      • Fixed encoding
      • Used by: OpenAI Realtime direct connection
    • Decision matrix: When to use each
    • Code: Both transport implementations
  5. Connecting to Voice Agents (~6 min)

    • LiveKit integration: Browser SDK connects to LiveKit room
    • Pipecat integration: WebSocket transport for browser audio
    • Complete flow: getUserMedia -> AudioWorklet -> VAD -> Transport -> Agent
    • Code: End-to-end browser voice integration
  6. Improve Your Skill (~5 min)

    • Reflect: What browser audio patterns did you learn?
    • Update .claude/skills/web-audio-capture/SKILL.md
    • Add: AudioWorklet setup guidance, VAD integration
    • Test: Can skill guide browser audio capture correctly?
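
The "only send speech" idea from the VAD section can be sketched as a gate over per-chunk speech probabilities. The logic is independent of the model, so it is shown in Python for consistency with the other examples; in the browser it would sit between the VAD output and the transport. The threshold and hangover values are illustrative assumptions, not Silero defaults.

```python
def gate_chunks(speech_probs, threshold=0.5, hangover_chunks=3):
    """Return a send/skip flag per audio chunk.

    A chunk is sent when its speech probability reaches the threshold, or
    when it falls within `hangover_chunks` chunks after the last speech
    chunk, so trailing syllables are not clipped.
    """
    flags = []
    remaining_hangover = 0
    for prob in speech_probs:
        if prob >= threshold:
            remaining_hangover = hangover_chunks  # speech: reset the hangover
            flags.append(True)
        elif remaining_hangover > 0:
            remaining_hangover -= 1               # recent speech: keep sending
            flags.append(True)
        else:
            flags.append(False)                   # silence: skip transmission
    return flags


# Made-up probabilities for illustration, one per 30 ms chunk.
probs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1, 0.1]
print(gate_chunks(probs, hangover_chunks=2))
```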

Duration Estimate: 50 minutes

Three Roles Integration (Layer 2):

AI as Teacher:

  • Skill explains AudioWorklet architecture you didn't know
  • "AudioWorkletProcessor runs in a separate thread, avoiding main thread jank"

AI as Student:

  • You provide browser compatibility requirements
  • "Need to support Safari - check AudioWorklet polyfill options"

AI as Co-Worker:

  • Iterate on Silero VAD integration until detection works
  • First attempt: WASM loading fails -> debug together -> fix async loading

Try With AI Prompts:

  1. Implement Microphone Capture

    I want to capture microphone audio in the browser for my voice agent.
    Use my web-audio-capture skill to help me:

    Requirements:
    - HTTPS context (production deployment)
    - Low latency (< 50ms processing delay)
    - Works on Chrome, Firefox, Safari
    - Outputs PCM audio at 16kHz

    Help me implement:
    1. getUserMedia with proper constraints
    2. AudioWorklet for processing
    3. Error handling for permission denied

    I'll test in each browser and report what works.

    What you're learning: Cross-browser audio capture - handling the messy reality of browser APIs.

  2. Run Silero VAD in Browser

    I want to detect speech client-side using Silero VAD in WebAssembly.
    This reduces server round-trips and latency.

    Use my web-audio-capture skill to help me:

    1. Load the Silero VAD ONNX model (~2MB)
    2. Run inference in AudioWorklet (< 1ms per chunk)
    3. Send speech/non-speech events to main thread
    4. Only transmit audio when speech detected

    Walk me through the WASM integration. I'll test with real speech
    and measure latency.

    What you're learning: Client-side ML - running voice models directly in the browser.

  3. Choose Transport Layer

    I need to send browser audio to my voice agent. Help me choose:

    My constraints:
    - Users may be on restrictive networks (corporate firewalls)
    - Latency critical (< 300ms end-to-end)
    - Backend: LiveKit-based voice agent from Chapter 80
    - Scale: 100-500 concurrent users

    Use my web-audio-capture skill to:
    1. Compare WebRTC vs WebSocket for my use case
    2. Explain NAT traversal implications
    3. Recommend approach with justification

    I'll validate with network tests behind my company firewall.

    What you're learning: Transport selection - understanding real-world network constraints.
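
The first prompt above asks for 16 kHz PCM output, while browsers commonly capture at 48 kHz. The conversion arithmetic is sketched below in Python for consistency with the other examples; inside an AudioWorklet the same steps would be written in JavaScript. Averaging groups of three samples is a crude stand-in for a proper resampling filter, and the function name is hypothetical.

```python
import struct


def to_pcm16_16k(samples_48k):
    """Convert 48 kHz float samples in [-1.0, 1.0] to 16 kHz 16-bit PCM bytes.

    Averages each group of three samples (a crude low-pass plus decimation).
    Leftover samples at the end are dropped here; a streaming version would
    carry them over to the next call.
    """
    out = []
    for i in range(0, len(samples_48k) - 2, 3):
        avg = (samples_48k[i] + samples_48k[i + 1] + samples_48k[i + 2]) / 3.0
        clipped = max(-1.0, min(1.0, avg))   # guard against out-of-range input
        out.append(int(clipped * 32767))     # scale to signed 16-bit range
    return struct.pack(f"<{len(out)}h", *out)  # little-endian int16


# One 128-sample render quantum yields 42 output samples (2 inputs left over).
quantum = [0.5] * 128
pcm = to_pcm16_16k(quantum)
print(len(pcm), "bytes")
```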


Lesson 3: Production Telephony Patterns

Title: Production Telephony Patterns

Learning Objectives:

  • Implement IVR (Interactive Voice Response) patterns with menu navigation
  • Configure call recording with compliance considerations (GDPR, consent)
  • Design failover and redundancy for high-availability telephony
  • Optimize costs across provider mix (Twilio + Telnyx hybrid)
  • Finalize both skills for production use

Stage: Layer 3 (Intelligence Design) + Layer 4 elements (production patterns)

CEFR Proficiency: B2

New Concepts (count: 3):

  1. IVR patterns and menu navigation
  2. Call recording with compliance
  3. Failover and redundancy design

Cognitive Load Validation: 3 concepts <= 10 limit (B2) -> WITHIN LIMIT

Maps to Evals: #7 (Production Patterns)

Key Sections:

  1. IVR Pattern Implementation (~10 min)

    • What is IVR: Automated phone menus ("Press 1 for sales...")
    • Modern IVR: Natural language intent detection vs DTMF tones
    • Implementation approaches:
      • TwiML for simple menus
      • AI intent classification for natural conversation
    • Common patterns:
      • Initial greeting and intent gathering
      • Department routing
      • Queue position updates
      • Callback scheduling
    • Code: Complete IVR flow with voice agent integration
  2. Call Queuing, Transfer, and Hold (~8 min)

    • Queuing: Managing concurrent calls beyond agent capacity
      • Queue position announcements
      • Estimated wait time
      • Music/message on hold
    • Transfer: Moving calls between agents
      • Warm transfer (context preserved)
      • Cold transfer (no context)
      • Transfer to external numbers
    • Hold: Pausing active conversation
      • Hold music configuration
      • Return to conversation
    • Code: Queue and transfer implementation
  3. Call Recording and Compliance (~10 min)

    • Why record: Training, compliance, dispute resolution
    • Compliance requirements:
      • GDPR: Explicit consent, right to deletion
      • CCPA: California privacy requirements
      • Two-party consent states (US)
    • Implementation:
      • Recording announcement ("This call may be recorded...")
      • Consent capture
      • Secure storage
      • Retention policies
    • Code: Compliant call recording setup
  4. Failover and Redundancy (~10 min)

    • Failure scenarios:
      • Provider outage (Twilio down)
      • Network issues (SIP trunk failure)
      • Agent unavailability
    • Failover strategies:
      • Multi-provider: Telnyx as Twilio backup
      • Geographic redundancy: Multiple regions
      • Graceful degradation: Voicemail when agents unavailable
    • Health checks and monitoring
    • Code: Multi-provider failover configuration
  5. Cost Optimization Strategies (~7 min)

    • Provider mix: High-volume on Telnyx, premium on Twilio
    • Call routing: Cheapest route first (Least Cost Routing)
    • Off-peak pricing: Scheduling outbound calls
    • Monitoring: Cost dashboards and alerts
    • Example: 50% cost reduction with hybrid approach
    • Code: Cost-optimized routing configuration
  6. Finalize Both Skills (~5 min)

    • Complete voice-telephony skill review:
      • IVR patterns included?
      • Compliance guidance present?
      • Failover strategies documented?
    • Complete web-audio-capture skill review:
      • AudioWorklet patterns complete?
      • VAD integration guidance present?
      • Transport selection covered?
    • Final test: Use skills to scaffold production telephony system
    • Commit: Production-ready skill artifacts
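
Failover and least-cost routing from the sections above combine into one selection rule: pick the cheapest provider that is currently healthy. This is a minimal sketch; the `Provider` class and `pick_provider` function are hypothetical, the rates are the approximate ones quoted in this plan, and the `healthy` flag stands in for a real health check.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    per_min_usd: float
    healthy: bool = True  # in practice, set by a health check


def pick_provider(providers):
    """Least-cost routing with failover: cheapest provider that is healthy."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        return None  # caller should degrade gracefully, e.g. voicemail
    return min(candidates, key=lambda p: p.per_min_usd)


providers = [Provider("twilio", 0.0085), Provider("telnyx", 0.002)]
print(pick_provider(providers).name)  # telnyx (cheapest healthy route)
providers[1].healthy = False
print(pick_provider(providers).name)  # twilio (failover)
```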

Duration Estimate: 50 minutes

Three Roles Integration (Layer 3 + Layer 4):

AI as Teacher:

  • Skill guides compliance requirements you didn't know
  • "Two-party consent states require ALL parties agree to recording - announce it first"

AI as Student:

  • You teach skill your production constraints
  • "My users are in EU - add GDPR-specific consent flow"

AI as Co-Worker:

  • Design failover architecture together
  • You specify uptime requirements -> AI generates multi-provider config -> You validate in staging

Skill Finalization: At lesson end, students have production-ready skills that:

voice-telephony:

  • Guides SIP and webhook integration
  • Includes Twilio + Telnyx setup
  • Provides IVR patterns and call flows
  • Covers compliance and recording
  • Includes failover strategies

web-audio-capture:

  • Scaffolds getUserMedia + AudioWorklet
  • Guides Silero VAD WASM integration
  • Covers transport layer selection
  • Includes browser compatibility guidance

Try With AI Prompts:

  1. Design IVR Flow

    I want to build an IVR for my Task Manager phone agent:

    Flow requirements:
    - Greeting: "Hi, this is Task Manager. How can I help?"
    - Intent detection: Natural language, not "press 1 for..."
    - Routing: Task queries -> Task Agent, Billing -> Billing Agent
    - Fallback: Unclear intent -> Human escalation

    Use my voice-telephony skill to help me design:
    1. The complete IVR flow diagram
    2. Intent classification approach
    3. Context passing between IVR and agent
    4. Error handling for unrecognized intent

    I'll implement and test with real callers.

    What you're learning: Modern IVR design - natural conversation, not phone trees.

  2. Implement Compliant Recording

    My voice agent needs call recording for:
    - Training data collection
    - Dispute resolution
    - Quality assurance

    But I have users in:
    - United States (various states)
    - European Union (GDPR)

    Use my voice-telephony skill to help me:
    1. What consent announcements are required?
    2. How do I capture and store consent?
    3. What retention policies should I implement?
    4. How do I handle deletion requests?

    Walk me through compliant recording setup.

    What you're learning: Compliance engineering - building legally compliant voice systems.

  3. Design Failover Architecture

    My production voice agent needs 99.9% uptime. Current setup:
    - Primary: Twilio Voice
    - Voice agent: LiveKit on Kubernetes
    - Expected volume: 10,000 calls/month

    Use my voice-telephony skill to design failover:
    1. What if Twilio has an outage?
    2. What if my LiveKit cluster goes down?
    3. What if network connectivity fails?
    4. How do I monitor and alert on failures?

    I need a multi-provider, multi-region architecture.
    Walk me through the design and implementation.

    What you're learning: High-availability design - operating voice agents at production scale.
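
The 99.9% uptime target in the last prompt translates into a concrete downtime budget, and redundancy raises the theoretical ceiling. The sketch below works through both; the per-provider availability of 0.999 is a hypothetical input, and the combined figure assumes independent failures and instant failover, which real systems only approximate.

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Downtime budget in minutes for a given availability over `days` days."""
    return (1.0 - availability) * days * 24 * 60


def combined_availability(*availabilities: float) -> float:
    """Availability of redundant providers, assuming independent failures
    and instant, perfect failover (both optimistic assumptions)."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


print(round(allowed_downtime_minutes(0.999), 1))      # 43.2 minutes per 30 days
print(round(combined_availability(0.999, 0.999), 6))  # 0.999999
```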


IV. Skill Dependency Graph

Skill Dependencies:

Lesson 0: Build Both Skills (foundation)

Lesson 1: Phone Integration (requires voice-telephony skill)

Lesson 2: Browser Audio (requires web-audio-capture skill)

Lesson 3: Production Patterns (requires both skills + all concepts)
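
The lesson dependencies above form a small directed graph, and a valid study order is any topological ordering of it. A minimal sketch using the standard library, reading "requires both skills + all concepts" as Lesson 3 depending on Lessons 1 and 2:

```python
from graphlib import TopologicalSorter

# Each lesson maps to the lessons it depends on, as listed above.
DEPENDS_ON = {
    "Lesson 0": set(),
    "Lesson 1": {"Lesson 0"},
    "Lesson 2": {"Lesson 0"},
    "Lesson 3": {"Lesson 1", "Lesson 2"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Lessons 1 and 2 do not depend on each other, so they may appear in either order between Lesson 0 and Lesson 3.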

Cross-Chapter Dependencies:

  • Requires: Chapter 79 (Voice AI Fundamentals) - architecture mental models
  • Requires: Chapter 80 (LiveKit Agents) - framework integration points
  • Requires: Chapter 81 (Pipecat) - alternative framework patterns
  • Requires: Chapter 82 (OpenAI Realtime) - direct API for custom scenarios
  • Requires: Chapter 83 (Gemini Live) - multimodal integration
  • Requires: Part 7 (Cloud Native) - Kubernetes deployment patterns
  • Prepares for: Chapter 85 (Capstone) - production voice agent with phone + browser

V. Assessment Plan

Formative Assessments (During Lessons)

  • Lesson 0: Dual-skill generation verification (both skills work, match docs)
  • Lesson 1: Twilio call integration test (make and receive test call)
  • Lesson 2: Browser audio demo (capture and stream to agent)
  • Lesson 3: IVR flow test (complete call journey works)

Summative Assessment (End of Chapter)

Chapter 84 Quiz:

  1. SIP: Explain when to use native SIP vs HTTP webhooks
  2. Cost: Compare Twilio and Telnyx pricing for 5,000 min/month
  3. Browser Audio: Describe AudioWorklet's latency advantages
  4. VAD: Explain why client-side Silero VAD reduces latency
  5. Compliance: What consent is required for call recording in EU?

Practical Assessment:

  • Receive inbound call via Twilio to your voice agent
  • Capture browser audio and stream to voice agent
  • Demonstrate IVR flow with intent detection
  • Show call recording with consent announcement

VI. Validation Checklist

Chapter-Level Validation:

  • Chapter type identified: TECHNICAL (SKILL-FIRST L00 Pattern)
  • Concept density analysis documented: 9 concepts across 4 lessons
  • Lesson count justified: 4 lessons (~2.25 concepts each, within B1-B2 limit)
  • All evals covered by lessons
  • All lessons map to at least one eval

Stage Progression Validation:

  • Lesson 0: Layer 1 + Layer 2 (dual-skill creation with AI collaboration)
  • Lessons 1-2: Layer 2 (AI collaboration, skill improvement)
  • Lesson 3: Layer 3 + Layer 4 elements (intelligence design, production patterns)
  • No premature spec-driven content (that's Chapter 85 Capstone)

Cognitive Load Validation:

  • Lesson 0: 2 concepts <= 10 (B1 limit) PASS
  • Lesson 1: 3 concepts <= 10 (B1 limit) PASS
  • Lesson 2: 3 concepts <= 10 (B1-B2 limit) PASS
  • Lesson 3: 3 concepts <= 10 (B2 limit) PASS

L00 Pattern Requirements:

  • Lesson 0 creates TWO skills from official documentation
  • Fresh clone of skills-lab (no state assumptions)
  • LEARNING-SPEC.md written before skill creation (covers both skills)
  • /fetching-library-docs used for documentation
  • Both skills tested and verified before proceeding
  • Each subsequent lesson TESTS and IMPROVES relevant skill
  • "Improve Your Skill" section in each lesson

Three Roles Validation (Layer 2 lessons):

  • Each Layer 2 lesson demonstrates AI as Teacher
  • Each Layer 2 lesson demonstrates AI as Student
  • Each Layer 2 lesson demonstrates AI as Co-Worker (convergence)

Canonical Source Validation:

  • Skills format follows .claude/skills/<name>/SKILL.md pattern
  • Lesson 0 references /fetching-library-docs for official docs
  • Telephony patterns align with real provider documentation (Twilio, Telnyx)
  • Browser audio patterns align with MDN Web Audio API documentation

VII. File Structure

66-phone-browser-integration/
├── _category_.json # Existing
├── README.md # Chapter overview (create)
├── 00-build-your-skills.md # Lesson 0: L00 pattern - BOTH skills (create)
├── 01-phone-integration.md # Lesson 1 (create)
├── 02-browser-audio-vad.md # Lesson 2 (create)
├── 03-production-patterns.md # Lesson 3 (create)
└── 04-chapter-quiz.md # Assessment (create)

VIII. Summary

Chapter 84: Phone & Browser Integration is a 4-lesson SKILL-FIRST technical chapter:

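| Lesson | Title | Concepts | Duration | Evals |
|--------|-------|----------|----------|-------|
| 0 | Build Your Integration Skills | 2 | 40 min | #1 |
| 1 | Phone Integration with SIP & Twilio | 3 | 45 min | #2, #3, #4 |
| 2 | Browser Audio Capture & VAD | 3 | 50 min | #5, #6 |
| 3 | Production Telephony Patterns | 3 | 50 min | #7 |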

Total: 9 core concepts (11 lesson-level concepts as counted per lesson above), ~185 minutes, creates TWO production-ready skills

Skill Outputs:

  • .claude/skills/voice-telephony/SKILL.md - Phone integration patterns grounded in Twilio/Telnyx documentation
  • .claude/skills/web-audio-capture/SKILL.md - Browser audio capture patterns grounded in MDN/Silero documentation

IX. Technical Reference

Key Technical Facts (from Research Report)

Telephony:

  • LiveKit has native SIP telephony support (direct trunk integration)
  • Telnyx pricing: ~$0.002/min for telephony vs ~$0.0085/min Twilio
  • PSTN audio quality: 8kHz reduces native S2S model advantages
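
One way to see why 8kHz audio limits model quality: the sample rate caps the representable bandwidth at half its value, and upsampling cannot restore what was never captured. A minimal sketch, with hypothetical function names and naive linear interpolation standing in for a real resampler:

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a sampled signal can represent."""
    return sample_rate_hz / 2


def upsample_2x(samples):
    """Double the sample rate by linear interpolation (e.g. 8 kHz -> 16 kHz).

    This only satisfies a model's expected input rate; it cannot restore
    frequencies above the original Nyquist limit.
    """
    if not samples:
        return []
    out = []
    for current, nxt in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + nxt) / 2)     # midpoint between neighbours
    out.extend([samples[-1], samples[-1]])  # hold the final sample
    return out


print(nyquist_hz(8_000), nyquist_hz(16_000))  # 4000.0 8000.0
print(upsample_2x([0.0, 1.0, 0.0]))           # [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```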

Browser Audio:

  • Microphone access via getUserMedia requires a secure context (HTTPS, or localhost during development)
  • AudioWorklet latency: ~128 samples (2.67ms at 48kHz)
  • Silero VAD: <1ms per 30ms audio chunk, ~2MB model
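
The latency figures above follow from simple sample-rate arithmetic, worked through below. The 16 kHz rate used for the VAD chunk is an assumption taken from the capture requirements in Lesson 2, not a figure from the research report.

```python
def duration_ms(samples: int, sample_rate_hz: int) -> float:
    """Time span covered by `samples` at the given sample rate."""
    return samples / sample_rate_hz * 1000


def samples_in(ms: float, sample_rate_hz: int) -> int:
    """Number of samples in a window of `ms` milliseconds."""
    return round(ms * sample_rate_hz / 1000)


print(round(duration_ms(128, 48_000), 2))  # 2.67 ms per 128-sample block
print(samples_in(30, 16_000))              # 480 samples per 30 ms chunk at 16 kHz
print(round(1 / 30, 3))                    # 0.033: 1 ms of compute per 30 ms of audio
```

The last line shows that a VAD taking under 1 ms per 30 ms chunk uses less than about 3.3% of real time.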

Transport:

  • WebRTC: NAT traversal, adaptive bitrate, production standard
  • WebSocket: Simpler, no NAT traversal, good for prototyping

Integration Points with Previous Chapters

| Previous Chapter | Integration Point |
|------------------|-------------------|
| Chapter 80 (LiveKit) | SIP trunk connects to LiveKit room |
| Chapter 81 (Pipecat) | WebSocket transport for browser audio |
| Chapter 82 (OpenAI Realtime) | Direct WebSocket for custom scenarios |
| Chapter 83 (Gemini Live) | Multimodal browser integration |