Chapter 79: Voice AI Fundamentals

Build mental models before building systems. In this chapter you create a voice-foundations skill covering voice architectures, latency targets, and the modern voice stack.


Goals

  • Understand the voice AI landscape (frameworks vs. direct APIs)
  • Map the STT → LLM → TTS pipeline (speech-to-text, language model, text-to-speech) and its latency budgets
  • Learn transport options (WebRTC, WebSockets, HTTP streaming)
  • Capture fundamentals in a reusable voice-foundations skill

Lesson Progression

  • Voice landscape and use cases
  • Latency budgets and quality targets
  • Architecture options: frameworks vs. raw APIs
  • Capstone: documented voice-foundations skill

Outcome & Method

You finish with a concise voice-foundations skill that informs all later implementation chapters.


Prerequisites

  • Parts 6-9 (agent APIs, deployment, TypeScript async patterns)