Function Calling in Voice
The Realtime API does not just speak—it acts. Through function calling, your voice agent executes real operations: creating tasks, querying databases, sending notifications. But voice-triggered actions require different UX patterns than chat-based tools.
This lesson teaches you to implement voice functions that feel natural: filler speech during async operations, confirmation for destructive actions, and graceful error recovery.
Voice Function Calling: The Difference
In chat interfaces, tool calls are invisible—users wait for a response. In voice, silence is awkward. A 3-second database query in chat is fine. A 3-second silence in voice feels broken.
Chat Tool Call
User: "Create a task to review the proposal"
[2 second wait - acceptable]
Agent: "I've created your task."
Voice Tool Call (Wrong)
User: "Create a task to review the proposal"
[2 seconds of silence - feels broken]
Agent: "I've created your task."
Voice Tool Call (Right)
User: "Create a task to review the proposal"
Agent: "Creating that task for you..."
[function executes]
Agent: "Done. I've added 'review the proposal' to your task list."
The pattern: Fill silence with acknowledgment. Confirm completion with results.
Defining Voice Functions
Function definitions for the Realtime API follow the OpenAI function calling schema, but require voice-specific considerations.
Basic Function Schema
task_functions = [
{
"type": "function",
"name": "create_task",
"description": "Create a new task for the user. Use when the user asks to add, create, or make a new task.",
"parameters": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The task title. Extract the core action from the user's request."
},
"due_date": {
"type": "string",
"description": "When the task is due, in YYYY-MM-DD format. Parse natural language like 'tomorrow', 'next Friday', 'end of week'."
},
"priority": {
"type": "string",
"enum": ["low", "medium", "high"],
"description": "Task priority. Default to 'medium' if not specified."
}
},
"required": ["title"]
}
},
{
"type": "function",
"name": "list_tasks",
"description": "List the user's tasks. Use when user asks what tasks they have, what's on their list, or what's due.",
"parameters": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["all", "pending", "completed", "overdue"],
"description": "Which tasks to show. Default to 'pending' if not specified."
},
"due_filter": {
"type": "string",
"enum": ["today", "tomorrow", "this_week", "all"],
"description": "Filter by due date. Default to 'all' if not specified."
}
}
}
},
{
"type": "function",
"name": "complete_task",
"description": "Mark a task as complete. Use when user says they finished, completed, or done with a task.",
"parameters": {
"type": "object",
"properties": {
"task_identifier": {
"type": "string",
"description": "How the user referred to the task. Could be title, number, or description."
}
},
"required": ["task_identifier"]
}
},
{
"type": "function",
"name": "delete_task",
"description": "Permanently delete a task. This is destructive and cannot be undone. Requires confirmation.",
"parameters": {
"type": "object",
"properties": {
"task_identifier": {
"type": "string",
"description": "How the user referred to the task to delete."
},
"confirmed": {
"type": "boolean",
"description": "Whether the user has confirmed deletion. Must be true to proceed."
}
},
"required": ["task_identifier", "confirmed"]
}
}
]
Voice-Specific Description Guidelines
| Guideline | Why | Example |
|---|---|---|
| Include trigger phrases | Model knows when to call | "Use when user asks to add, create, or make..." |
| Mention defaults | Reduces clarification requests | "Default to 'medium' if not specified" |
| Flag destructive actions | Enables confirmation logic | "This is destructive and cannot be undone" |
| Describe natural language parsing | Improves extraction | "Parse natural language like 'tomorrow', 'next Friday'" |
Registering Functions with the Session
Functions are registered during session configuration:
def send_session_config(data_channel, instructions: str, functions: list):
"""Configure session with voice functions."""
config = {
"type": "session.update",
"session": {
"instructions": f"""
{instructions}
When using tools:
- Acknowledge the request before calling the function
- Provide brief filler while waiting ("Let me check that...")
- Confirm results clearly and concisely
- If an error occurs, explain what happened and suggest next steps
""",
"tools": functions,
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"silence_duration_ms": 500
},
"input_audio_format": "pcm16",
"output_audio_format": "pcm16"
}
}
data_channel.send(json.dumps(config))
Handling Function Calls
When the model decides to call a function, you receive events through the DataChannel:
Function Call Event Flow
Model decides to call function
│
├── response.function_call_arguments.delta (streaming arguments)
│
├── response.function_call_arguments.done (arguments complete)
│
└── You execute function and send result
Implementation
class FunctionHandler:
"""Handles voice function calls."""
def __init__(self, data_channel, task_service):
self.data_channel = data_channel
self.task_service = task_service
self.pending_calls = {} # call_id -> accumulated arguments
async def handle_event(self, event: dict):
"""Process function-related events."""
event_type = event.get("type")
if event_type == "response.function_call_arguments.delta":
# Accumulate streaming arguments
call_id = event["call_id"]
delta = event.get("delta", "")
if call_id not in self.pending_calls:
self.pending_calls[call_id] = {
"name": event["name"],
"arguments": ""
}
self.pending_calls[call_id]["arguments"] += delta
elif event_type == "response.function_call_arguments.done":
# Arguments complete, execute function
call_id = event["call_id"]
call_info = self.pending_calls.pop(call_id, None)
if call_info:
result = await self._execute_function(
call_info["name"],
json.loads(call_info["arguments"])
)
self._send_result(call_id, result)
async def _execute_function(self, name: str, args: dict) -> dict:
"""Execute the function and return result."""
print(f"[function] Executing {name} with {args}")
try:
if name == "create_task":
task = await self.task_service.create(
title=args["title"],
due_date=args.get("due_date"),
priority=args.get("priority", "medium")
)
return {
"success": True,
"message": f"Created task: {task.title}",
"task_id": task.id,
"due_date": task.due_date
}
elif name == "list_tasks":
tasks = await self.task_service.list(
status=args.get("status", "pending"),
due_filter=args.get("due_filter", "all")
)
return {
"success": True,
"count": len(tasks),
"tasks": [
{"title": t.title, "due": t.due_date, "priority": t.priority}
for t in tasks
]
}
elif name == "complete_task":
task = await self.task_service.complete(args["task_identifier"])
return {
"success": True,
"message": f"Completed: {task.title}"
}
elif name == "delete_task":
if not args.get("confirmed"):
return {
"success": False,
"needs_confirmation": True,
"message": "Deletion requires confirmation. Ask user to confirm."
}
await self.task_service.delete(args["task_identifier"])
return {
"success": True,
"message": "Task deleted permanently"
}
else:
return {"success": False, "error": f"Unknown function: {name}"}
except Exception as e:
return {
"success": False,
"error": str(e),
"suggestion": "Please try again or rephrase your request."
}
def _send_result(self, call_id: str, result: dict):
"""Send function result back to the model."""
response = {
"type": "conversation.item.create",
"item": {
"type": "function_call_output",
"call_id": call_id,
"output": json.dumps(result)
}
}
self.data_channel.send(json.dumps(response))
# Trigger response generation
self.data_channel.send(json.dumps({"type": "response.create"}))
Output:
[function] Executing create_task with {'title': 'review proposal', 'due_date': '2025-01-03'}
[realtime] Function result sent
[agent] Done. I've created a task to review the proposal, due on Friday.
Filler Speech Patterns
Users expect feedback during async operations. The model can generate filler speech before executing functions.
Instructing the Model for Filler
Include filler guidance in your instructions:
instructions = """
You are a voice assistant for Task Manager.
**Before calling any function:**
1. Acknowledge the request naturally: "Sure, I'll do that" or "Let me check"
2. For operations that might take time, add brief filler: "One moment..."
**After function results:**
1. Summarize the result concisely
2. If there are multiple items, list the top 3 and mention the total
3. Offer next steps: "Would you like to add another?" or "Anything else?"
**Filler examples:**
- Creating task: "Sure, creating that task for you..."
- Listing tasks: "Let me pull up your tasks..."
- Completing task: "Marking that as done..."
- Error: "Hmm, I ran into an issue..."
"""
Filler Timing
The model generates filler before the function call event:
User: "What tasks do I have today?"
│
▼
Model generates filler: "Let me check your tasks for today..."
│
▼
Function call event: list_tasks(due_filter="today")
│
▼
Your code executes function (may take 500ms)
│
▼
Send result to model
│
▼
Model generates response: "You have 3 tasks today: review proposal,
send invoices, and team standup."
The user hears filler immediately, then the result—no awkward silence.
Custom Filler for Long Operations
For operations exceeding 2-3 seconds, consider multi-stage filler:
async def execute_with_progress(self, name: str, args: dict):
"""Execute function with progress updates."""
# Stage 1: Immediate acknowledgment (model-generated)
# Already handled by instructions
# Stage 2: If operation is slow, send interim status
operation = asyncio.create_task(self._execute_function(name, args))
await asyncio.sleep(2) # Wait 2 seconds
if not operation.done():
# Send interim status through DataChannel
self._speak("Still working on that, almost there...")
return await operation
def _speak(self, text: str):
"""Make the model speak specific text."""
event = {
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "assistant",
"content": [{"type": "text", "text": text}]
}
}
self.data_channel.send(json.dumps(event))
self.data_channel.send(json.dumps({"type": "response.create"}))
Confirmation Patterns for Destructive Actions
Not all actions should execute immediately. Destructive operations need confirmation.
Identifying Destructive Actions
| Action | Destructive? | Why |
|---|---|---|
| Create task | No | Easily undone |
| Complete task | Maybe | Could revert, but context-dependent |
| Delete task | Yes | Permanent data loss |
| Clear all tasks | Yes | Massive data loss |
| Send email | Yes | Cannot unsend |
| Cancel subscription | Yes | May have consequences |
Two-Step Confirmation Pattern
# Step 1: Function called without confirmation
def handle_delete_request(self, task_identifier: str, confirmed: bool):
if not confirmed:
# Tell model to ask for confirmation
return {
"success": False,
"needs_confirmation": True,
"task_to_delete": task_identifier,
"message": "Please confirm deletion by asking: 'Are you sure you want to delete this task?'"
}
# Step 2: User confirms, function called again with confirmed=True
self.task_service.delete(task_identifier)
return {"success": True, "message": "Task deleted"}
Conversation Flow
User: "Delete the proposal review task"
Agent: "You want to delete 'review proposal'. This can't be undone.
Should I go ahead and delete it?"
User: "Yes, delete it"
Agent: "Done. I've permanently deleted the proposal review task."
Timeout for Confirmation
Confirmations should expire:
class ConfirmationManager:
"""Manages pending confirmations with timeout."""
def __init__(self, timeout_seconds: int = 30):
self.pending = {} # action_id -> (action_data, timestamp)
self.timeout = timeout_seconds
def request_confirmation(self, action_id: str, action_data: dict) -> dict:
"""Store pending confirmation."""
self.pending[action_id] = (action_data, time.time())
return {
"needs_confirmation": True,
"action_id": action_id,
"message": f"Confirm: {action_data['description']}?"
}
def confirm(self, action_id: str) -> dict | None:
"""Retrieve confirmed action if still valid."""
if action_id not in self.pending:
return None
action_data, timestamp = self.pending.pop(action_id)
if time.time() - timestamp > self.timeout:
return {
"expired": True,
"message": "Confirmation expired. Please try again."
}
return action_data
def cleanup_expired(self):
"""Remove expired confirmations."""
now = time.time()
expired = [
action_id for action_id, (_, ts) in self.pending.items()
if now - ts > self.timeout
]
for action_id in expired:
del self.pending[action_id]
Error Handling in Voice
Errors in voice context need different handling than chat:
Error Categories
| Category | Voice Response | Example |
|---|---|---|
| User Error | Clarify and suggest | "I couldn't find a task called 'meeting'. Did you mean 'team meeting'?" |
| Temporary Failure | Apologize and retry offer | "I'm having trouble reaching the task system. Want me to try again?" |
| Permanent Failure | Explain and provide alternative | "I can't delete tasks right now. Try again in a few minutes, or use the app." |
| Unknown Error | Graceful fallback | "Something went wrong. Can you try rephrasing that?" |
Error Response Implementation
def format_error_for_voice(self, error: Exception, function_name: str) -> dict:
"""Convert error to voice-appropriate response."""
error_str = str(error).lower()
# User error: bad input
if "not found" in error_str:
return {
"success": False,
"error_type": "not_found",
"spoken_message": "I couldn't find that task. Could you describe it differently?",
"suggestion": "Try using the exact task title or a unique phrase from it."
}
# Temporary failure: service unavailable
if "timeout" in error_str or "connection" in error_str:
return {
"success": False,
"error_type": "temporary",
"spoken_message": "I'm having trouble reaching the task system right now.",
"suggestion": "Would you like me to try again?"
}
# Permission error
if "permission" in error_str or "unauthorized" in error_str:
return {
"success": False,
"error_type": "permission",
"spoken_message": "You don't have permission to do that.",
"suggestion": "You may need to check your account settings."
}
# Unknown error: graceful fallback
return {
"success": False,
"error_type": "unknown",
"spoken_message": "Something went wrong with that request.",
"suggestion": "Could you try rephrasing? Or we can try again."
}
Retry Pattern
async def execute_with_retry(
self,
name: str,
args: dict,
max_retries: int = 2
) -> dict:
"""Execute function with automatic retry on temporary failures."""
for attempt in range(max_retries + 1):
try:
result = await self._execute_function(name, args)
if result.get("success"):
return result
# Check if retryable
if result.get("error_type") == "temporary" and attempt < max_retries:
await asyncio.sleep(1) # Brief pause
continue
return result
except Exception as e:
if attempt < max_retries:
await asyncio.sleep(1)
continue
return self.format_error_for_voice(e, name)
return {
"success": False,
"error_type": "exhausted",
"spoken_message": "I tried a few times but couldn't complete that. Please try again later."
}
Complete Function Calling Example
Here is a complete voice agent with function calling:
import asyncio
import json
from datetime import datetime, timedelta
# Simulated task service
class TaskService:
def __init__(self):
self.tasks = [
{"id": 1, "title": "Review proposal", "due": "2025-01-02", "priority": "high", "status": "pending"},
{"id": 2, "title": "Send invoices", "due": "2025-01-02", "priority": "medium", "status": "pending"},
{"id": 3, "title": "Team standup", "due": "2025-01-02", "priority": "low", "status": "pending"},
]
self.next_id = 4
async def create(self, title: str, due_date: str = None, priority: str = "medium"):
await asyncio.sleep(0.5) # Simulate database
task = {
"id": self.next_id,
"title": title,
"due": due_date or (datetime.now() + timedelta(days=1)).strftime("%Y-%m-%d"),
"priority": priority,
"status": "pending"
}
self.tasks.append(task)
self.next_id += 1
return type("Task", (), task)()
async def list(self, status: str = "pending", due_filter: str = "all"):
await asyncio.sleep(0.3) # Simulate database
filtered = [t for t in self.tasks if status == "all" or t["status"] == status]
return [type("Task", (), t)() for t in filtered]
async def complete(self, identifier: str):
await asyncio.sleep(0.3)
for task in self.tasks:
if identifier.lower() in task["title"].lower():
task["status"] = "completed"
return type("Task", (), task)()
raise ValueError(f"Task not found: {identifier}")
class VoiceFunctionAgent:
"""Voice agent with function calling."""
def __init__(self, data_channel):
self.data_channel = data_channel
self.task_service = TaskService()
self.pending_calls = {}
def get_functions(self) -> list:
"""Return function definitions for session config."""
return [
{
"type": "function",
"name": "create_task",
"description": "Create a new task. Use when user asks to add, create, or make a task.",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string", "description": "Task title"},
"due_date": {"type": "string", "description": "Due date in YYYY-MM-DD"},
"priority": {"type": "string", "enum": ["low", "medium", "high"]}
},
"required": ["title"]
}
},
{
"type": "function",
"name": "list_tasks",
"description": "List user's tasks. Use when user asks what tasks they have.",
"parameters": {
"type": "object",
"properties": {
"status": {"type": "string", "enum": ["all", "pending", "completed"]},
"due_filter": {"type": "string", "enum": ["today", "all"]}
}
}
},
{
"type": "function",
"name": "complete_task",
"description": "Mark task complete. Use when user says they finished a task.",
"parameters": {
"type": "object",
"properties": {
"task_identifier": {"type": "string", "description": "Task name or description"}
},
"required": ["task_identifier"]
}
}
]
def get_instructions(self) -> str:
"""Return voice-optimized instructions."""
return """
You are a voice assistant for Task Manager.
When calling functions:
1. Acknowledge first: "Sure, I'll check that" or "Creating that now"
2. After results, summarize concisely
3. For errors, explain simply and suggest next steps
Keep responses under 30 words. Be conversational, not robotic.
"""
async def handle_event(self, event: dict):
"""Process events from DataChannel."""
event_type = event.get("type")
if event_type == "response.function_call_arguments.delta":
call_id = event["call_id"]
if call_id not in self.pending_calls:
self.pending_calls[call_id] = {"name": event["name"], "arguments": ""}
self.pending_calls[call_id]["arguments"] += event.get("delta", "")
elif event_type == "response.function_call_arguments.done":
call_id = event["call_id"]
call_info = self.pending_calls.pop(call_id, None)
if call_info:
result = await self._execute(call_info["name"], call_info["arguments"])
self._send_result(call_id, result)
async def _execute(self, name: str, args_str: str) -> dict:
"""Execute function and return result."""
try:
args = json.loads(args_str) if args_str else {}
print(f"[function] {name}({args})")
if name == "create_task":
task = await self.task_service.create(**args)
return {"success": True, "message": f"Created: {task.title}", "due": task.due}
elif name == "list_tasks":
tasks = await self.task_service.list(**args)
if not tasks:
return {"success": True, "message": "No tasks found", "count": 0}
return {
"success": True,
"count": len(tasks),
"tasks": [{"title": t.title, "due": t.due} for t in tasks[:5]]
}
elif name == "complete_task":
task = await self.task_service.complete(args["task_identifier"])
return {"success": True, "message": f"Completed: {task.title}"}
else:
return {"success": False, "error": f"Unknown: {name}"}
except Exception as e:
return {"success": False, "error": str(e), "spoken": "I couldn't do that. Try again?"}
def _send_result(self, call_id: str, result: dict):
"""Send function result to model."""
self.data_channel.send(json.dumps({
"type": "conversation.item.create",
"item": {
"type": "function_call_output",
"call_id": call_id,
"output": json.dumps(result)
}
}))
self.data_channel.send(json.dumps({"type": "response.create"}))
Sample Conversation:
User: "What tasks do I have?"
Agent: "Let me check your tasks... You have 3 pending tasks: review proposal
due tomorrow, send invoices also tomorrow, and team standup. Want me
to mark any as complete?"
User: "Create a task to call the client"
Agent: "Sure, creating that... Done. I added 'call the client' to your list,
due tomorrow. Anything else?"
User: "Mark the standup as done"
Agent: "Got it... Team standup is now marked complete. You have 3 tasks left."
Try With AI
Prompt 1: Design Voice Functions
I'm building voice function calling for my Task Manager. Help me design:
1. Function schemas for: create, list, complete, delete, and reschedule tasks
2. Voice-optimized descriptions that trigger correctly
3. Parameter defaults to minimize clarification questions
4. Destructive action flags for confirmation logic
Consider edge cases:
- User says "remind me to..." (is this a task?)
- User says "I'm done with everything" (complete all?)
- User mentions time without date ("at 3pm")
What you are learning: Function design for voice context. Good schemas reduce clarification loops and improve user experience.
Prompt 2: Implement Filler Patterns
My voice agent has these latency characteristics:
- create_task: 500ms average, 2s worst case
- list_tasks: 300ms average, 1s worst case
- external_api_call: 1-5 seconds
Help me design filler speech patterns:
1. What should the model say before each function type?
2. How do I handle operations that might take 5+ seconds?
3. Should filler be model-generated or code-generated?
4. How do I avoid filler for fast operations (feels patronizing)?
Provide example instructions and code.
What you are learning: UX timing patterns. Voice interfaces have stricter timing requirements than visual interfaces.
Prompt 3: Error Recovery Flows
My voice agent encounters these errors. Design voice-appropriate recovery:
1. Task not found: User said "complete the meeting task" but no exact match
2. Network timeout: Database didn't respond in 3 seconds
3. Validation error: User tried to create task with due date in the past
4. Rate limit: Too many requests, need to slow down
5. Permission denied: User tried to delete someone else's task
For each:
- What should the agent say?
- What action should the agent offer?
- How do we avoid user frustration?
Consider: Users are on the phone, can't see a screen, can't easily re-type.
What you are learning: Error UX design. Voice errors need immediate, actionable guidance—users cannot see error codes or retry buttons.
Safety Note
Function calling in voice carries unique risks:
- Misheard commands: "Delete all tasks" vs "Delete Paul's tasks"—implement confirmation for bulk operations
- Unauthorized callers: Voice does not prove identity—require PIN for sensitive operations
- Background noise triggers: TV says "buy ten shares"—your agent should not execute
- Social engineering: Caller pretends to be account owner—implement verification
Test with adversarial scenarios before production. Monitor early calls for unexpected function invocations.