Mocking LLM Calls
Every LLM call costs money. Every mock costs nothing.
You learned in Lesson 1 that an unmocked test suite can cost $1,000-2,000 per month. You also learned that mocked tests run in milliseconds instead of seconds. But knowing the "why" doesn't tell you the "how."
This lesson teaches you respx—the library that makes zero-cost LLM testing possible. By the end, you'll mock OpenAI completions, tool calls, error responses, and Anthropic messages with confidence.
Why respx Works for LLM Mocking
Here's the key insight: respx mocks httpx at the transport layer.
The OpenAI SDK, Anthropic SDK, and most modern Python HTTP clients use httpx under the hood. When you mock at the transport layer, you intercept requests before they leave your machine. The SDK never knows the difference—it receives your mock response as if it came from the real API.
Your Code → OpenAI SDK → httpx → respx intercepts → Mock Response
(no network call happens; respx answers the request locally)
This architecture means:
- Zero API calls (no charges)
- Instant responses (no network latency)
- Complete control (you define exactly what the LLM "returns")
- Deterministic tests (same mock, same result, every time)
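To see the transport-layer interception on its own, here is a minimal sketch that mocks a plain httpx call with no SDK involved (the URL is purely illustrative):

import httpx
import respx

@respx.mock
def test_transport_interception():
    """respx answers the request locally; nothing reaches the network."""
    respx.get("https://example.com/ping").mock(
        return_value=httpx.Response(200, json={"ok": True})
    )
    response = httpx.get("https://example.com/ping")
    assert response.json() == {"ok": True}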
respx Basics: Three Styles
respx offers three ways to set up mocks. Choose based on your testing needs.
Decorator Style (Recommended for Single Tests)
import pytest
import respx
import httpx
@pytest.mark.asyncio
@respx.mock
async def test_openai_completion():
"""Mock with decorator—cleanest for single tests."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
200,
json={
"choices": [{
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}],
"usage": {"total_tokens": 25}
}
)
)
# Your agent code runs here—it thinks it's calling OpenAI
result = await call_openai("Say hello")
assert "Hello" in result
assert respx.calls.call_count == 1
Output:
tests/test_agent.py::test_openai_completion PASSED
The @respx.mock decorator activates mocking for that test. When the test ends, mocking automatically deactivates.
Context Manager Style (For Explicit Scope)
@pytest.mark.asyncio
async def test_with_context_manager():
"""Mock with context manager—explicit control over scope."""
with respx.mock:
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(200, json={
"choices": [{"message": {"content": "Response"}}]
})
)
result = await call_openai("Test")
assert "Response" in result
# After the `with` block, mocking is disabled
Output:
tests/test_agent.py::test_with_context_manager PASSED
Use context managers when you need mocking for only part of a test.
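For example, here is a sketch where only the LLM call is mocked and the rest of the test runs unmocked (`call_openai` is the same illustrative helper used above; `save_draft` is a hypothetical local function):

@pytest.mark.asyncio
async def test_partial_mocking():
    """Mock only the LLM call; everything after the block is unmocked."""
    with respx.mock:
        respx.post("https://api.openai.com/v1/chat/completions").mock(
            return_value=httpx.Response(200, json={
                "choices": [{"message": {"content": "Draft"}}]
            })
        )
        draft = await call_openai("Write a draft")

    # Outside the `with` block, httpx behaves normally again
    save_draft(draft)  # hypothetical helper; makes no HTTP calls
    assert "Draft" in draft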
Fixture Style (For Shared Mocks)
@pytest.fixture
def mock_openai():
"""Fixture for tests that share mock configuration."""
with respx.mock:
yield respx
@pytest.mark.asyncio
async def test_with_fixture(mock_openai):
mock_openai.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(200, json={
"choices": [{"message": {"content": "Fixture response"}}]
})
)
result = await call_openai("Test")
assert "Fixture" in result
Output:
tests/test_agent.py::test_with_fixture PASSED
Fixtures shine when multiple tests need the same mock setup.
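For instance, the fixture can pre-register the route so every test that depends on it gets the same canned reply (a sketch, reusing the illustrative `call_openai` helper):

@pytest.fixture
def mock_openai_route():
    """Pre-configured OpenAI route shared by multiple tests."""
    with respx.mock:
        route = respx.post("https://api.openai.com/v1/chat/completions").mock(
            return_value=httpx.Response(200, json={
                "choices": [{"message": {"role": "assistant", "content": "Shared reply"}}]
            })
        )
        yield route

@pytest.mark.asyncio
async def test_uses_shared_route(mock_openai_route):
    result = await call_openai("Anything")
    assert "Shared reply" in result
    assert mock_openai_route.called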
Mocking OpenAI Responses
The OpenAI chat completions API returns a specific structure. Your mock must match it exactly, or your code will fail trying to parse the response.
Basic Completion Response
@pytest.mark.asyncio
@respx.mock
async def test_openai_chat_completion():
"""Mock a standard chat completion response."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
200,
json={
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699876543,
"model": "gpt-4",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "I'd be happy to help you with that task."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 12,
"total_tokens": 27
}
}
)
)
from openai import AsyncOpenAI
client = AsyncOpenAI(api_key="test-key-not-used")
response = await client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Help me with this task"}]
)
assert response.choices[0].message.content == "I'd be happy to help you with that task."
assert response.usage.total_tokens == 27
Output:
tests/test_openai.py::test_openai_chat_completion PASSED
Critical structure details:
- Response must have a `choices` array (not `choice`)
- Each choice has a `message` with `role` and `content`
- `usage` tracks tokens for cost monitoring
- `finish_reason` indicates completion state
Full Response with All Fields
When your code inspects metadata, include the complete structure:
COMPLETE_OPENAI_RESPONSE = {
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699876543,
"model": "gpt-4o-2024-08-06",
"system_fingerprint": "fp_abc123",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Your complete response here.",
"refusal": None
},
"logprobs": None,
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 50,
"total_tokens": 75,
"completion_tokens_details": {
"reasoning_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
}
}
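A test can then point the mock at this constant and assert on metadata fields; here is a sketch using the official SDK client as in the earlier example:

@pytest.mark.asyncio
@respx.mock
async def test_completion_metadata():
    """Reuse the complete response constant defined above."""
    respx.post("https://api.openai.com/v1/chat/completions").mock(
        return_value=httpx.Response(200, json=COMPLETE_OPENAI_RESPONSE)
    )
    from openai import AsyncOpenAI
    client = AsyncOpenAI(api_key="test-key-not-used")
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hi"}]
    )
    assert response.system_fingerprint == "fp_abc123"
    assert response.usage.completion_tokens == 50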
Mocking Tool Calls
When your agent uses tools, the LLM returns a tool_calls array instead of (or alongside) content. This is where mocking gets interesting.
Single Tool Call
@pytest.mark.asyncio
@respx.mock
async def test_agent_tool_call():
"""Mock an LLM response that triggers a tool."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
200,
json={
"choices": [{
"message": {
"role": "assistant",
"content": None, # No content when making tool call
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "create_task",
"arguments": '{"title": "Buy groceries", "priority": "high"}'
}
}]
},
"finish_reason": "tool_calls"
}]
}
)
)
# Your agent processes this and calls the create_task tool
result = await agent.process("Create a task to buy groceries")
# Verify tool was parsed correctly
assert result.tool_calls[0].function.name == "create_task"
assert '"title": "Buy groceries"' in result.tool_calls[0].function.arguments
Output:
tests/test_agent.py::test_agent_tool_call PASSED
Tool call structure requirements:
- `content` is `None` when the LLM makes a tool call
- `tool_calls` is an array (multiple tools can be called)
- Each tool call has `id`, `type`, and `function`
- `function.arguments` is a JSON string, not a parsed object (see the snippet after this list)
- `finish_reason` is `"tool_calls"`, not `"stop"`
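Because the arguments arrive as a string, your agent (and your assertions) typically need `json.loads` before reading individual fields, roughly like this:

import json

# `result` is the parsed LLM response from the test above
tool_call = result.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # JSON string -> dict
assert args["title"] == "Buy groceries"
assert args["priority"] == "high"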
Multiple Tool Calls
@pytest.mark.asyncio
@respx.mock
async def test_multiple_tool_calls():
"""Mock LLM calling multiple tools in parallel."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
200,
json={
"choices": [{
"message": {
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": "call_1",
"type": "function",
"function": {
"name": "get_weather",
"arguments": '{"city": "London"}'
}
},
{
"id": "call_2",
"type": "function",
"function": {
"name": "get_weather",
"arguments": '{"city": "Paris"}'
}
}
]
},
"finish_reason": "tool_calls"
}]
}
)
)
result = await agent.process("What's the weather in London and Paris?")
assert len(result.tool_calls) == 2
cities = [tc.function.arguments for tc in result.tool_calls]
assert '"London"' in cities[0]
assert '"Paris"' in cities[1]
Output:
tests/test_agent.py::test_multiple_tool_calls PASSED
Mocking Error Responses
Production agents must handle failures gracefully. respx lets you simulate every error condition.
Rate Limit (429)
@pytest.mark.asyncio
@respx.mock
async def test_rate_limit_handling():
"""Test agent handles rate limiting gracefully."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
429,
json={
"error": {
"message": "Rate limit exceeded. Please retry after 60 seconds.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
},
headers={"Retry-After": "60"}
)
)
# Your error handling should catch this
with pytest.raises(RateLimitError) as exc_info:
await agent.process("Test query")
assert "rate limit" in str(exc_info.value).lower()
Output:
tests/test_errors.py::test_rate_limit_handling PASSED
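If your agent retries after a 429 instead of raising, you can queue a failure followed by a success with side_effect (a sketch; the retry behavior of agent.process is assumed):

@pytest.mark.asyncio
@respx.mock
async def test_rate_limit_retry_succeeds():
    """First call is rate limited, the retry succeeds."""
    respx.post("https://api.openai.com/v1/chat/completions").mock(
        side_effect=[
            httpx.Response(
                429,
                json={"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}},
                headers={"Retry-After": "1"}
            ),
            httpx.Response(200, json={
                "choices": [{"message": {"role": "assistant", "content": "Recovered"}}]
            }),
        ]
    )
    await agent.process("Test query")  # assumes the agent retries once on 429
    assert respx.calls.call_count == 2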
Server Error (500)
@pytest.mark.asyncio
@respx.mock
async def test_server_error_handling():
"""Test agent handles OpenAI server errors."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
500,
json={
"error": {
"message": "Internal server error",
"type": "server_error",
"code": "internal_error"
}
}
)
)
with pytest.raises(ServerError):
await agent.process("Test query")
Output:
tests/test_errors.py::test_server_error_handling PASSED
Timeout
@pytest.mark.asyncio
@respx.mock
async def test_timeout_handling():
"""Test agent handles network timeouts."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
side_effect=httpx.TimeoutException("Connection timed out")
)
with pytest.raises(AgentTimeoutError) as exc_info:
await agent.process("Test query")
assert "timed out" in str(exc_info.value).lower()
Output:
tests/test_errors.py::test_timeout_handling PASSED
Malformed Response
@pytest.mark.asyncio
@respx.mock
async def test_malformed_response_handling():
"""Test agent handles unexpected response structure."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(
200,
json={"unexpected": "structure", "no_choices": True}
)
)
with pytest.raises(AgentResponseError):
await agent.process("Test query")
Output:
tests/test_errors.py::test_malformed_response_handling PASSED
Sequential Responses (Multi-Turn Testing)
Real agent conversations involve multiple LLM calls. Use side_effect with a list to mock sequential responses.
@pytest.mark.asyncio
@respx.mock
async def test_multi_turn_agent_flow(client):
"""Test complete agent pipeline: tool call → execution → response."""
respx.post("https://api.openai.com/v1/chat/completions").mock(
side_effect=[
# First call: LLM decides to use a tool
httpx.Response(200, json={
"choices": [{
"message": {
"role": "assistant",
"content": None,
"tool_calls": [{
"id": "call_1",
"type": "function",
"function": {
"name": "create_task",
"arguments": '{"title": "Write tests"}'
}
}]
},
"finish_reason": "tool_calls"
}]
}),
# Second call: LLM responds after seeing tool result
httpx.Response(200, json={
"choices": [{
"message": {
"role": "assistant",
"content": "I've created the task 'Write tests' for you."
},
"finish_reason": "stop"
}]
})
]
)
response = await client.post(
"/api/agent/chat",
json={"message": "Create a task to write tests"}
)
assert response.status_code == 200
assert "created" in response.json()["response"].lower()
# Verify the LLM was called twice
assert respx.calls.call_count == 2
Output:
tests/test_integration.py::test_multi_turn_agent_flow PASSED
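As an extension of that test, you can also inspect the second recorded request to confirm the tool result was sent back, assuming the agent follows the standard OpenAI convention of a role "tool" message carrying the matching tool_call_id:

import json

# Inspect the body of the second intercepted request
second_request = respx.calls[1].request
payload = json.loads(second_request.content)

# The conversation should now include the tool result message
tool_messages = [m for m in payload["messages"] if m.get("role") == "tool"]
assert tool_messages, "expected a tool result to be sent back to the LLM"
assert tool_messages[0]["tool_call_id"] == "call_1"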
Verifying Mock Calls
respx tracks every request it intercepts. Use this to verify your code makes the right API calls.
@pytest.mark.asyncio
@respx.mock
async def test_request_verification():
"""Verify correct data is sent to OpenAI."""
route = respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(200, json={
"choices": [{"message": {"content": "Response"}}]
})
)
await agent.process("Hello, agent!")
# Verify route was called
assert route.called
assert route.call_count == 1
# Inspect the request
request = route.calls[0].request
body = request.content.decode()
# Verify request body contains expected data
assert "Hello, agent!" in body
assert "gpt-4" in body # Expected model
# Verify headers
assert "Bearer" in request.headers.get("Authorization", "")
Output:
tests/test_agent.py::test_request_verification PASSED
Mocking Anthropic API
Anthropic's API structure differs from OpenAI. Here's the correct format:
@pytest.mark.asyncio
@respx.mock
async def test_anthropic_completion():
"""Mock Anthropic messages API."""
respx.post("https://api.anthropic.com/v1/messages").mock(
return_value=httpx.Response(
200,
json={
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [{
"type": "text",
"text": "Hello! I'm Claude, happy to help."
}],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 15,
"output_tokens": 12
}
}
)
)
from anthropic import AsyncAnthropic
client = AsyncAnthropic(api_key="test-key")
response = await client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello!"}]
)
assert response.content[0].text == "Hello! I'm Claude, happy to help."
Output:
tests/test_anthropic.py::test_anthropic_completion PASSED
Key differences from OpenAI:
- Endpoint is `/v1/messages` (not `/v1/chat/completions`)
- `content` is an array of content blocks, not a string (see the adapter snippet after this list)
- `stop_reason` instead of `finish_reason`
- `input_tokens`/`output_tokens` instead of `prompt_tokens`/`completion_tokens`
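Because `content` is a list of blocks, code that expects a plain string needs a small adapter, for example:

# `response` is the Anthropic message object from the test above
text = "".join(block.text for block in response.content if block.type == "text")
assert text == "Hello! I'm Claude, happy to help."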
Anthropic Tool Use
@pytest.mark.asyncio
@respx.mock
async def test_anthropic_tool_use():
"""Mock Anthropic tool use response."""
respx.post("https://api.anthropic.com/v1/messages").mock(
return_value=httpx.Response(
200,
json={
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [{
"type": "tool_use",
"id": "toolu_abc123",
"name": "create_task",
"input": {"title": "New task", "priority": "high"}
}],
"model": "claude-sonnet-4-20250514",
"stop_reason": "tool_use",
"usage": {"input_tokens": 20, "output_tokens": 30}
}
)
)
# Your agent code handles this tool call
result = await agent.process("Create a high-priority task")
assert result.content[0].type == "tool_use"
assert result.content[0].name == "create_task"
Output:
tests/test_anthropic.py::test_anthropic_tool_use PASSED
Hands-On Exercise
Create a complete test file that achieves zero API calls.
Create tests/test_agent_mock.py:
"""
Complete agent testing with mocked LLM calls.
Requirements:
1. All tests pass with zero network calls
2. Mock OpenAI completion response
3. Mock tool call response
4. Mock 429 rate limit error
5. Verify all mocks were called correctly
"""
import pytest
import respx
import httpx
from app.agent import Agent, RateLimitError
@pytest.fixture
def agent():
"""Create agent instance for testing."""
return Agent(api_key="test-key-not-used")
# TODO: Implement these tests
@pytest.mark.asyncio
@respx.mock
async def test_basic_completion(agent):
"""Test agent handles basic completion."""
# Your mock here
pass
@pytest.mark.asyncio
@respx.mock
async def test_tool_call_execution(agent):
"""Test agent parses and executes tool calls."""
# Your mock here
pass
@pytest.mark.asyncio
@respx.mock
async def test_rate_limit_error(agent):
"""Test agent handles 429 gracefully."""
# Your mock here
pass
@pytest.mark.asyncio
@respx.mock
async def test_multi_turn_conversation(agent):
"""Test two-turn conversation with tool."""
# Your mock with side_effect here
pass
Run your tests with:
pytest tests/test_agent_mock.py -v
Success criteria:
- All 4 tests pass
- `pytest --cov` shows mocking code is covered
- No network calls (tests complete in under 1 second)
Try With AI
Work with your AI assistant to build comprehensive mocks.
Prompt 1: Generate Mock for Your Agent
My agent uses OpenAI and calls these tools:
- create_task(title: str, priority: str, due_date: str | None)
- search_tasks(query: str, status: str = "all")
- update_task(task_id: int, updates: dict)
Generate respx mocks for:
1. A completion that calls create_task with all parameters
2. A completion that calls search_tasks and returns results
3. A multi-turn flow: search → no results → create
Use the exact OpenAI response structure with choices[0].message.tool_calls.
What you're learning: AI generates complex mock structures that you can verify against the OpenAI documentation. You validate the structure is correct before trusting it.
Prompt 2: Fix Mock Structure
My test is failing. Here's the mock I'm using:
respx.post("https://api.openai.com/v1/chat/completions").mock(
return_value=httpx.Response(200, json={
"response": {
"tool_calls": [{"name": "create_task", "args": {"title": "Test"}}]
}
})
)
My code expects `response.choices[0].message.tool_calls[0].function.name`.
What's wrong with my mock? Fix it to match the real OpenAI response format.
What you're learning: You teach the AI the correct structure by pointing out what your code expects. This reinforces your understanding of the API response format.
Prompt 3: Build Error Scenario Suite
I want comprehensive error coverage for my agent. Help me design tests for:
1. Rate limit (429) with retry logic - what headers should I include?
2. Server error (500) with fallback behavior
3. Timeout with graceful degradation
4. Invalid JSON response
5. Missing required fields in response
For each, show me the respx mock and what my agent should do in response.
Which should we implement first?
What you're learning: Collaborative error scenario planning. AI suggests scenarios you might miss; you prioritize based on your agent's production requirements.
Reflect on Your Skill
Test Your Skill
Using my agent-tdd skill, show me how to mock an OpenAI API call
that returns a tool call. Does my skill include:
- The correct choices[0].message.tool_calls structure?
- The function.arguments as JSON string (not parsed object)?
- The finish_reason of "tool_calls"?
Also check: does my skill cover Anthropic API mocking?
Identify Gaps
After running that prompt, ask yourself:
- Does my skill include both decorator and fixture mock styles?
- Does it show sequential response mocking with `side_effect`?
- Does it cover error responses (429, 500, timeout)?
- Does it explain how to verify mock calls were made?
Improve Your Skill
If you found gaps:
My agent-tdd skill needs comprehensive LLM mocking patterns. Add:
1. OpenAI mock patterns:
- Basic completion with choices[0].message.content
- Tool calls with tool_calls array structure
- Sequential responses using side_effect list
2. Anthropic mock patterns:
- Messages API endpoint and response structure
- Tool use with content[0].type = "tool_use"
3. Error mocking:
- Rate limit (429) with Retry-After header
- Server error (500)
- Timeout using side_effect=httpx.TimeoutException
4. Call verification:
- route.called and route.call_count
- Inspecting request.content
Make sure each pattern shows the exact JSON structure required.
Your skill grows with each lesson. The mocking patterns you add now will save hours on every future agent project.