Chapter 79: Voice AI Fundamentals
Build mental models before building systems. In this chapter you create a voice-foundations skill that covers voice architectures, latency targets, and the modern voice stack.
Goals
- Understand the voice AI landscape (frameworks vs. direct APIs)
- Map the STT → LLM → TTS pipeline and latency budgets
- Learn transport options (WebRTC, WebSockets, HTTP streaming)
- Capture the fundamentals in a reusable voice-foundations skill
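A latency budget for the STT → LLM → TTS pipeline can be pictured as a simple sum of per-stage allowances checked against an end-to-end target. The sketch below is only an illustration: the stage names, the millisecond figures, the 800 ms target, and the helpers `totalLatencyMs` and `overBudget` are all hypothetical placeholders, not measurements or recommendations from this chapter.

```typescript
// Minimal sketch of a voice pipeline latency budget.
// All figures are assumed, illustrative values in milliseconds.
interface StageBudget {
  stage: string;
  budgetMs: number;
}

// Hypothetical allowances for each hop of STT → LLM → TTS plus transport.
const pipeline: StageBudget[] = [
  { stage: "transport (uplink)", budgetMs: 50 },
  { stage: "STT (final transcript)", budgetMs: 200 },
  { stage: "LLM (first token)", budgetMs: 350 },
  { stage: "TTS (first audio byte)", budgetMs: 150 },
  { stage: "transport (downlink)", budgetMs: 50 },
];

// Sum the per-stage allowances to get the end-to-end budget.
function totalLatencyMs(stages: StageBudget[]): number {
  return stages.reduce((sum, s) => sum + s.budgetMs, 0);
}

// True when the summed allowances exceed the end-to-end target.
function overBudget(stages: StageBudget[], targetMs: number): boolean {
  return totalLatencyMs(stages) > targetMs;
}

const TARGET_MS = 800; // assumed end-to-end target, not a standard

console.log(`total budget: ${totalLatencyMs(pipeline)} ms`);
console.log(`over target: ${overBudget(pipeline, TARGET_MS)}`);
```

Writing the budget down as data makes it easy to see which stage to trim when the total overshoots the target, and to swap in measured values once a real pipeline exists.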
Lesson Progression
- Voice landscape and use cases
- Latency budgets and quality targets
- Architecture options: frameworks vs. raw APIs
- Capstone: a documented voice-foundations skill
Outcome & Method
You finish with a concise voice-foundations skill that informs every later implementation chapter.
Prerequisites
- Parts 6-9 (agent APIs, deployment, TypeScript async patterns)