Cost Tracking and Billing

Your Digital FTE will work 24/7. It will process thousands of requests. Every API call costs money. Every token sent to Claude costs money.

If you don't track these costs precisely, you'll ship a product that loses money on every customer.

This lesson teaches you to see the economics of your agent—and design pricing that ensures profitability.

The Economics Problem

When you run an agent, costs accumulate invisibly:

Per-message cost: Each message to Claude API costs based on tokens sent and received
Model selection cost: Haiku (2-3x cheaper) vs Opus (5-10x more expensive)
Context window cost: Longer context = more tokens = higher cost
Iteration cost: More turns (max_turns) = more messages = more cost

A customer using your Digital FTE for 10 interactions doesn't think about the cost. But you should: each interaction might cost $0.05, $0.50, or $5.00 depending on how efficiently your agent works.

The margin problem: If your customer pays $99/month for your Digital FTE, but it costs you $150 in API calls to service them, you lose money. You need cost visibility.

The pricing problem: You can't price intelligently without knowing your cost basis. Subscription? Pay-per-use? Success fee? Your choice depends on understanding what your agent actually costs to run.

The Solution: Real-Time Cost Tracking

The Claude Agent SDK gives you exact cost data in every message. You extract it. You track it. You make it visible.

Specification: Cost Tracking System

Intent: Capture every dollar your Digital FTE costs so you can track profitability per customer, per workflow, and overall.

Success Criteria:

✓ Per-message token usage visible as messages stream
✓ Total cost accumulated across full agent execution
✓ Cost tracked by customer, workflow, or time period
✓ Cost alerts when exceeding thresholds
✓ Historical cost data supports pricing analysis

Constraints:

Must extract costs in real-time (no post-execution analysis delays)
Must handle partial message streams (agent may be interrupted)
Must distinguish model costs (Haiku 2-3x cheaper than Opus)

Pattern: Extracting Per-Message Usage

Every message in the agent stream has cost data. You iterate through the stream and extract it:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def track_agent_cost():
    """Track costs as agent executes in real-time."""
    total_tokens = 0
    total_cost = 0
    message_costs = []

    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Bash", "Glob"],
        permission_mode="acceptEdits"
    )

    async for message in query(
        prompt="Analyze the codebase structure and identify test coverage gaps",
        options=options
    ):
        # Extract per-message token usage
        if message.type == "assistant" and hasattr(message, "usage"):
            input_tokens = message.usage.get("input_tokens", 0)
            output_tokens = message.usage.get("output_tokens", 0)
            message_total = input_tokens + output_tokens
            total_tokens += message_total

            # Store for analysis
            message_costs.append({
                "message_id": getattr(message, "id", None),
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
                "total_tokens": message_total,
                "timestamp": getattr(message, "timestamp", None)
            })

            print(f"Message {len(message_costs)}: {message_total} tokens")

        # Extract final cost after execution completes
        if message.type == "result" and hasattr(message, "total_cost_usd"):
            total_cost = message.total_cost_usd
            print(f"\nExecution Summary:")
            print(f"  Total tokens: {total_tokens}")
            print(f"  Total cost: ${total_cost:.4f}")

    return {
        "total_tokens": total_tokens,
        "total_cost_usd": total_cost,
        "message_breakdown": message_costs
    }

# Run and see real costs
result = asyncio.run(track_agent_cost())
print(f"\nFinal cost: ${result['total_cost_usd']:.4f}")

Output:

Message 1: 1247 tokens
Message 2: 892 tokens
Message 3: 1501 tokens

Execution Summary:
  Total tokens: 3640
  Total cost: $0.0438

Pattern: Understanding total_cost_usd ★

The total_cost_usd field is the key metric for monetization. It's the final answer to "How much did this agent execution cost?"

async def billing_from_costs():
    """Extract costs and map to customer billing."""
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Grep"],
        permission_mode="acceptEdits"
    )

    async for message in query(
        prompt="Find all TODO comments in the project",
        options=options
    ):
        # Only process the final result message
        if message.type == "result" and hasattr(message, "total_cost_usd"):
            agent_cost = message.total_cost_usd

            # Example: Subscription with 60% margin requirement
            customer_id = "customer_acme_corp"
            monthly_budget = 99.00  # What customer pays monthly
            max_cost_per_execution = monthly_budget * 0.40  # Reserve 40% margin

            if agent_cost > max_cost_per_execution:
                print(f"WARNING: Cost ${agent_cost:.4f} exceeds budget ${max_cost_per_execution:.4f}")
                print("This customer needs a premium plan or optimization")
            else:
                print(f"Cost ${agent_cost:.4f} - Margin preserved ✓")

            return {
                "customer_id": customer_id,
                "execution_cost": agent_cost,
                "margin_status": "OK" if agent_cost < max_cost_per_execution else "EXCEEDED"
            }

# This is how billing systems work: cost → margin calculation → pricing decision
asyncio.run(billing_from_costs())

Output:

Cost $0.0438 - Margin preserved ✓

Pattern: Cost Aggregation by Customer

In production, you track costs per customer to understand profitability:

from datetime import datetime, timedelta
import json

class CustomerBillingTracker:
    """Track agent costs per customer for subscription billing."""

    def __init__(self, storage_path: str = "billing_data.json"):
        self.storage_path = storage_path
        self.data = self._load_data()

    def _load_data(self) -> dict:
        """Load existing billing data."""
        try:
            with open(self.storage_path, "r") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save_data(self):
        """Persist billing data."""
        with open(self.storage_path, "w") as f:
            json.dump(self.data, f, indent=2)

    def record_execution(self, customer_id: str, cost_usd: float, execution_type: str):
        """Record a single agent execution cost for a customer."""
        if customer_id not in self.data:
            self.data[customer_id] = {
                "total_cost": 0,
                "executions": [],
                "created_date": datetime.now().isoformat()
            }

        self.data[customer_id]["total_cost"] += cost_usd
        self.data[customer_id]["executions"].append({
            "cost": cost_usd,
            "type": execution_type,
            "timestamp": datetime.now().isoformat()
        })

        self._save_data()
        return self.data[customer_id]

    def monthly_cost(self, customer_id: str, month_back: int = 0) -> float:
        """Calculate customer's total cost for a specific month."""
        if customer_id not in self.data:
            return 0.0

        target_date = datetime.now() - timedelta(days=30 * month_back)
        month_start = target_date.replace(day=1)
        month_end = (month_start + timedelta(days=32)).replace(day=1)

        total = 0.0
        for execution in self.data[customer_id]["executions"]:
            exec_date = datetime.fromisoformat(execution["timestamp"])
            if month_start <= exec_date < month_end:
                total += execution["cost"]

        return total

    def profitability(self, customer_id: str, subscription_price: float) -> dict:
        """Calculate if customer is profitable."""
        monthly_cost = self.monthly_cost(customer_id)
        margin = subscription_price - monthly_cost
        margin_pct = (margin / subscription_price * 100) if subscription_price else 0

        return {
            "customer_id": customer_id,
            "monthly_cost": round(monthly_cost, 4),
            "subscription_price": subscription_price,
            "margin": round(margin, 2),
            "margin_pct": round(margin_pct, 1),
            "profitable": margin > 0
        }

# Usage in billing system
tracker = CustomerBillingTracker()

# After each agent execution, record the cost
tracker.record_execution(
    customer_id="customer_acme_corp",
    cost_usd=0.0438,
    execution_type="code_analysis"
)
tracker.record_execution(
    customer_id="customer_acme_corp",
    cost_usd=0.0312,
    execution_type="documentation_check"
)

# Check profitability
profitability = tracker.profitability(
    customer_id="customer_acme_corp",
    subscription_price=99.00
)
print(f"Customer margin: ${profitability['margin']:.2f} ({profitability['margin_pct']}% margin)")

Output:

Customer margin: $98.93 (99.9% margin)

This is healthy profitability. Costs stay well below subscription price.

Billing Models for Digital FTEs

With cost visibility, you can price intelligently. Different models work for different agents:

Model 1: Subscription Billing

How it works: Fixed monthly price. Customer uses agent as much as they want.

Best for: Predictable workflows (code review, security analysis, content moderation)

Pricing calculation:

Monthly subscription = (Target margin % × Average monthly cost per customer) / (1 - Target margin %)

Example:
- Average customer uses agent 20 times/month
- Cost per execution: $0.05
- Monthly cost per customer: $1.00
- Target margin: 70%
- Monthly subscription: $1.00 / (1 - 0.70) = $3.33

Set subscription at $9.99/month → 90% margin

Pros: Predictable revenue, simple billing Cons: Requires cost discipline; high-usage customers become margin drains

Model 2: Success-Fee Billing

How it works: Customer pays commission on value delivered (e.g., 2% of savings, $5 per lead)

Best for: High-ROI agents (lead generation, cost savings, revenue impact)

Pricing calculation:

Success fee = Value delivered × Commission %

Example: Legal contract review
- Agent saves customer $500 per contract reviewed (avoids expensive mistakes)
- Commission: 2% × $500 = $10 per contract
- Cost to run agent: $0.15
- Margin: $10 - $0.15 = $9.85 (98%+)

Pros: Unlimited upside, aligned incentives, justifiable pricing Cons: Requires outcome tracking, customer trust

Model 3: Usage-Based Billing

How it works: Customer pays per execution or per token consumed

Best for: Variable-workload agents (document analysis, research, real-time classification)

Pricing calculation:

Price per execution = (Cost per execution) / (1 - Desired margin %)

Example:
- Cost to run: $0.25
- Desired margin: 70%
- Price: $0.25 / (1 - 0.70) = $0.83 per execution
- Set price at $1.00 per execution → 75% margin

Pros: Fair to customers, scales with usage, easy to implement Cons: Customer cost uncertainty, requires usage tracking

Model 4: Hybrid Billing

How it works: Base subscription + usage overage fees

Best for: Most Digital FTEs (subscriptions handle baseline, overages capture high-volume users)

Pricing calculation:

Base subscription: $49/month (covers 50 executions)
Overage: $0.50 per execution beyond 50

Customer A (10 executions): Pays $49
Customer B (100 executions): Pays $49 + (50 × $0.50) = $74
Average margin: 65-75%

Pros: Predictable base revenue, captures high-value customers Cons: More complex billing, requires tracking

Cost Optimization: Maximizing Margin

Lower costs = higher margins. Here are the levers:

Optimization 1: max_turns to Prevent Runaway Costs

An agent with unlimited iterations can spiral into expensive loops. Set max_turns to bound costs:

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Bash", "Glob"],
    max_turns=5,  # Maximum 5 message iterations
    permission_mode="acceptEdits"
)

async for message in query(
    prompt="Debug the authentication service",
    options=options
):
    if message.type == "result":
        # Agent stops after 5 turns, even if incomplete
        # This guarantees cost ≤ (Cost per message × 5)
        total_cost = message.total_cost_usd
        print(f"Bounded cost execution: ${total_cost:.4f}")

Effect: Reduces cost variability. Some tasks may fail, but costs are predictable.

Optimization 2: Model Selection by Task

Use cheaper models for simple tasks, expensive models for complex reasoning:

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Grep"],
    agents={
        "classifier": AgentDefinition(
            description="Simple classification task",
            prompt="Classify this document as: [urgent|normal|low-priority]",
            tools=["Read"],
            model="haiku"  # 2-3x cheaper, fast for simple tasks
        ),
        "analyzer": AgentDefinition(
            description="Complex reasoning required",
            prompt="Analyze security implications of this code",
            tools=["Read", "Grep"],
            model="sonnet"  # Better reasoning for complex analysis
        )
    }
)

# Classification: $0.01 per execution (using Haiku)
# Security analysis: $0.08 per execution (using Sonnet)
# Overall: 2x cheaper than using Sonnet for everything

Effect: Reduces cost by 50-75% for simple tasks by model selection.

Optimization 3: Context Pruning

Longer context = more tokens = higher cost. Remove old messages:

from claude_agent_sdk import ClaudeSDKClient

async def prune_context_for_cost():
    """Long-running agent that prunes old messages to save costs."""
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Bash"],
        # Add setting to auto-prune old messages
        extra_args={"context-pruning-strategy": "aggressive"}
    )

    async with ClaudeSDKClient(options) as client:
        # First task: large project analysis
        await client.query("Analyze project structure")
        async for msg in client.receive_response():
            print(f"Cost: {msg.total_cost_usd:.4f}")

        # Prune old messages before next task
        # This keeps context window small and costs down
        await client.prune_messages()

        # Second task: much cheaper because context is smaller
        await client.query("Generate summary")
        async for msg in client.receive_response():
            print(f"Cost after pruning: {msg.total_cost_usd:.4f}")  # Lower

Effect: Reduces cost 30-50% in long-running sessions by removing irrelevant history.

Digital FTE Monetization: Connecting Cost to Revenue

Cost tracking enables pricing. Pricing enables revenue. Revenue enables iteration.

Here's the monetization flow:

Phase 1: Track Costs

Measure real execution costs per customer
Identify cost drivers (which features/workflows cost most?)
Establish baseline economics

Phase 2: Set Pricing

Calculate break-even price based on target margin
Choose model (subscription, success fee, usage)
Test pricing with early customers

Phase 3: Monitor Profitability

Track monthly margin per customer
Identify unprofitable use cases
Optimize high-cost workflows

Phase 4: Scale Revenue

Acquire customers at profitable pricing
Reinvest margin into product improvements
Expand to new use cases

Example: A code review Digital FTE

Step	Metric	Value
1. Track costs	Avg cost per code review	$0.12
2. Set pricing	Subscription (30 reviews/month)	$14.99/month
3. Profitability	Margin per customer	$10.35 (69%)
4. Scale	100 customers → revenue	$1,499/month
5. Reinvest	Improve agent quality	$500/month → R&D

This is how Digital FTEs become sustainable revenue.

Try With AI

Build a cost-aware billing system for your Digital FTE.

Prompt 1: Cost Tracking Architecture

Ask Claude: "I'm building a Digital FTE that analyzes code repositories. I need a Python system that tracks how much each customer's usage costs me. The agent runs multiple times per customer per month. How should I structure a cost tracking system that records total_cost_usd, aggregates by customer, and calculates monthly margin? Show me the class design and key methods."

What you're learning: How to design systems that make costs visible and actionable.

Prompt 2: Pricing Model Recommendation

Based on the cost data you're tracking, ask Claude: "My code analysis Digital FTE costs an average of $0.08 per execution. Typical customers run it 40 times per month. I want 70% margin. Should I price this as subscription ($25/month), usage-based ($1.50 per execution), or success-fee (10% of time saved)? What are the tradeoffs and how do I validate which model customers prefer?"

What you're learning: How to evaluate pricing models against cost structure and customer preferences.

Prompt 3: Cost Optimization Audit

Bring your implementation code and ask Claude: "Here's my agent configuration. It's costing too much. Help me audit the costs: Which parts are expensive? Should I change max_turns? Can I use cheaper models for any tasks? Can I optimize the prompts to get the same results with fewer tokens? Show me specific changes with estimated cost reduction."

What you're learning: How to optimize margins without sacrificing quality.

What emerged from iteration: A cost-conscious approach to agent development where economics inform design. You're not just building agents—you're building profitable products.

Reflect on Your Skill

You built a claude-agent skill in Lesson 0. Test and improve it based on what you learned.

Test Your Skill

Using my claude-agent skill, implement cost tracking for agent executions.
Does my skill cover token usage and total_cost_usd extraction?

Identify Gaps

Ask yourself:

Did my skill explain how to extract per-message usage and total_cost_usd?
Did it cover billing model design (subscription, usage-based, success-fee)?

Improve Your Skill

If you found gaps:

My claude-agent skill is missing cost tracking and monetization patterns.
Update it to include:
- Token usage extraction from messages
- total_cost_usd tracking
- Billing model design for Digital FTEs

The Economics Problem​

The Solution: Real-Time Cost Tracking​

Specification: Cost Tracking System​

Pattern: Extracting Per-Message Usage​

Pattern: Understanding total_cost_usd ★​

Pattern: Cost Aggregation by Customer​

Billing Models for Digital FTEs​

Model 1: Subscription Billing​

Model 2: Success-Fee Billing​

Model 3: Usage-Based Billing​

Model 4: Hybrid Billing​

Cost Optimization: Maximizing Margin​

Optimization 1: max_turns to Prevent Runaway Costs​

Optimization 2: Model Selection by Task​

Optimization 3: Context Pruning​

Digital FTE Monetization: Connecting Cost to Revenue​

Try With AI​

Prompt 1: Cost Tracking Architecture​

Prompt 2: Pricing Model Recommendation​

Prompt 3: Cost Optimization Audit​

Reflect on Your Skill​

Test Your Skill​

Identify Gaps​

Improve Your Skill​

The Economics Problem

The Solution: Real-Time Cost Tracking

Specification: Cost Tracking System

Pattern: Extracting Per-Message Usage

Pattern: Understanding total_cost_usd ★

Pattern: Cost Aggregation by Customer

Billing Models for Digital FTEs

Model 1: Subscription Billing

Model 2: Success-Fee Billing

Model 3: Usage-Based Billing

Model 4: Hybrid Billing

Cost Optimization: Maximizing Margin

Optimization 1: max_turns to Prevent Runaway Costs

Optimization 2: Model Selection by Task

Optimization 3: Context Pruning

Digital FTE Monetization: Connecting Cost to Revenue

Try With AI

Prompt 1: Cost Tracking Architecture

Prompt 2: Pricing Model Recommendation

Prompt 3: Cost Optimization Audit

Reflect on Your Skill

Test Your Skill

Identify Gaps

Improve Your Skill