Economics of Custom Models
Fine-tuning is cheap. A single training run might cost $5. So why hesitate?
Because training cost is the smallest line item. Data preparation takes days. Evaluation requires expertise. Deployment needs infrastructure. Maintenance never ends.
This lesson builds your economic intuition for LLMOps projects. You'll learn to calculate true costs, identify break-even points, and avoid projects that look promising but destroy value.
The Cost Iceberg
What you see: "$5 training run"
What's below the surface:
Visible Costs (10%)
└── Training compute: $5-100
Hidden Costs (90%)
├── Data preparation: 20-100 hours
├── Evaluation development: 10-40 hours
├── Infrastructure setup: 5-20 hours
├── Monitoring and maintenance: Ongoing
├── Iteration cycles: 3-10x multiplier
└── Opportunity cost: What else could you build?
Understanding the full iceberg prevents budget surprises and failed projects.
Training Costs: The Small Part
Training cost depends on three factors: how much data you train on (tokens times epochs), how fast your hardware can process it (which depends on model size), and what that hardware costs per hour.
Cost Formula
Training Cost = (GPU Hours) x (Price per GPU Hour)
GPU Hours = (Training Tokens per Epoch / Throughput in Tokens per GPU-Hour) x (Number of Epochs)
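A minimal sketch of this formula in Python; the token count, throughput, and GPU price below are illustrative assumptions, not measured benchmarks:

```python
def training_cost(tokens_per_epoch: float, tokens_per_gpu_hour: float,
                  epochs: int, price_per_gpu_hour: float) -> float:
    """Training Cost = GPU Hours x Price per GPU Hour."""
    gpu_hours = (tokens_per_epoch / tokens_per_gpu_hour) * epochs
    return gpu_hours * price_per_gpu_hour

# Illustrative: 5,000 examples x ~800 tokens, ~2M tokens per GPU-hour, 3 epochs, $1.80/GPU-hour
print(f"${training_cost(4_000_000, 2_000_000, 3, 1.80):.2f}")  # ~$10.80
```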
Real-World Examples
| Model | Training Data | Hardware | Duration | Cost |
|---|---|---|---|---|
| Llama 3.1 8B (QLoRA) | 5,000 examples | RTX 4090 (local) | 2 hours | $0 (owned) |
| Llama 3.1 8B (QLoRA) | 5,000 examples | A100 40GB (cloud) | 1 hour | $2-4 |
| Llama 3.1 70B (LoRA) | 10,000 examples | 4x A100 80GB | 8 hours | $80-120 |
| Mistral 7B (QLoRA) | 2,000 examples | T4 (cloud) | 3 hours | $1-2 |
Key insight: Training costs are remarkably low for consumer-scale projects. A $50 cloud budget covers extensive experimentation.
Platform Comparison
| Platform | GPU Options | Cost per Hour | Best For |
|---|---|---|---|
| Lambda Labs | A100, H100 | $1.10-3.29 | Extended training |
| RunPod | A100, RTX 4090 | $0.44-2.49 | Budget experiments |
| Vast.ai | Various | $0.30-2.00 | Spot instances |
| Together.ai | Managed training | Per-token pricing | Simplicity |
| Unsloth (local) | Your GPU | $0 | Iteration speed |
Recommendation: Start with local training (Unsloth) for iteration speed. Move to cloud only when you need larger models or faster throughput.
Inference Costs: The Big Part
Training happens once. Inference happens continuously. A model that costs $10 to train might cost $1,000 per month to run.
API-Based Inference
Using frontier models through APIs:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 |
| Anthropic | Claude 3.5 Haiku | $0.25 | $1.25 |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 |
Self-Hosted Inference
Running your own fine-tuned model:
| Setup | Monthly Cost | Capacity |
|---|---|---|
| RTX 4090 (home) | $15 electricity | ~20 QPS (7B model) |
| A10G (AWS) | $600-800 | ~30 QPS (7B model) |
| A100 (Lambda) | $800-1,200 | ~50 QPS (7B model) |
| T4 (GCP) | $200-300 | ~10 QPS (7B model) |
QPS = Queries per second capacity
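What each query actually costs on self-hosted hardware depends on utilization, which a peak QPS rating alone doesn't capture. A rough sketch that amortizes a fixed monthly bill over the queries actually served (all figures illustrative):

```python
def self_hosted_cost_per_query(monthly_server_cost: float, peak_qps: float,
                               utilization: float) -> float:
    """Amortize a fixed monthly server cost over the queries actually served."""
    seconds_per_month = 30 * 24 * 3600
    queries_served = peak_qps * seconds_per_month * utilization
    return monthly_server_cost / queries_served

# The same $1,000/month, 50-QPS server looks cheap when busy and expensive when idle
print(self_hosted_cost_per_query(1_000, 50, utilization=0.50))  # ~$0.000015 per query
print(self_hosted_cost_per_query(1_000, 50, utilization=0.01))  # ~$0.00077 per query
```

This is why per-query figures for the same hardware can differ by orders of magnitude depending on traffic.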
Cost Per Query Comparison
For a typical task management query (500 input tokens, 200 output tokens):
| Approach | Cost per Query | 10,000 Queries/Month |
|---|---|---|
| GPT-4o | $0.0033 | $33 |
| GPT-4o-mini | $0.00020 | $2 |
| Self-hosted 7B (cloud) | $0.00008 | $0.80 + infra |
| Self-hosted 7B (local) | ~$0.00001 | $0.10 + electricity |
The pattern: At low volume, APIs are cheaper (no infrastructure). At high volume, self-hosting wins (amortized costs).
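The API-side arithmetic behind this comparison, as a sketch using the per-million-token prices from the provider table and the 500-in / 200-out query above:

```python
def api_cost_per_query(input_tokens: int, output_tokens: int,
                       input_price_per_m: float, output_price_per_m: float) -> float:
    """Per-query cost from per-million-token API prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

gpt_4o = api_cost_per_query(500, 200, 2.50, 10.00)      # ~$0.0033
gpt_4o_mini = api_cost_per_query(500, 200, 0.15, 0.60)  # ~$0.0002
print(f"GPT-4o:      ${gpt_4o:.5f}/query, ${gpt_4o * 10_000:.2f} per 10,000 queries")
print(f"GPT-4o-mini: ${gpt_4o_mini:.5f}/query, ${gpt_4o_mini * 10_000:.2f} per 10,000 queries")
```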
Break-Even Analysis
When does self-hosting make sense? Calculate the break-even point.
Break-Even Formula
Break-Even = Fixed Costs / (API Cost per Query - Self-Host Cost per Query)
Fixed Costs = Training + Infrastructure Setup (one-time costs; the recurring server bill is already inside the self-host cost per query)
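The same formula as a small helper; the example call plugs in the 5-million-query scenario worked through below ($1,020 of one-time costs, $0.0002 per API query, and $800 / 5,000,000 = $0.00016 per self-hosted query):

```python
def break_even_queries(fixed_costs: float, api_cost_per_query: float,
                       self_host_cost_per_query: float) -> float:
    """Queries needed before self-hosting recovers its one-time costs."""
    savings_per_query = api_cost_per_query - self_host_cost_per_query
    if savings_per_query <= 0:
        return float("inf")  # self-hosting never catches up
    return fixed_costs / savings_per_query

print(f"{break_even_queries(1_020, 0.0002, 0.00016):,.0f} queries")
# 25,500,000 queries, i.e. about 5.1 months at 5,000,000 queries/month
```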
Example: Task Management Assistant
Scenario: 50,000 queries per month
Option A: GPT-4o-mini API
- Cost per query: $0.00020
- Monthly cost: 50,000 x $0.00020 = $10
- No setup costs
Option B: Self-hosted Llama 3.1 8B (cloud)
- Training: $20 (including iterations)
- Infrastructure setup: 10 hours x $100/hr = $1,000
- Monthly server: $400
- Cost per query: $400 / 50,000 = $0.008
- First month total: $1,420
Break-even calculation:
- Monthly savings after setup: $10 - $400 = -$390
Wait—self-hosting is MORE expensive? Yes, at 50,000 queries/month.
Revised scenario: 500,000 queries per month
Option A: GPT-4o-mini API
- Monthly cost: 500,000 x $0.00020 = $100
Option B: Self-hosted (same infrastructure)
- Monthly server: $400
- Still more expensive!
Revised scenario: 5,000,000 queries per month
Option A: GPT-4o-mini API
- Monthly cost: $1,000
Option B: Self-hosted (scaled infrastructure)
- Monthly server: $800 (larger instance)
- Monthly savings: $200
- Payback period: ($20 training + $1,000 setup) / $200 = 5.1 months
The insight: Self-hosting requires massive scale OR specific requirements (privacy, customization) to justify economically.
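A sketch that sweeps monthly volume to reproduce the three scenarios above; the GPT-4o-mini price and the $400 / $800 server costs are the example's assumptions:

```python
def monthly_api_cost(queries: int, cost_per_query: float = 0.0002) -> float:
    """GPT-4o-mini pricing from the example."""
    return queries * cost_per_query

def monthly_self_host_cost(queries: int) -> float:
    """$400 instance, upgraded to $800 at very high volume (assumption from the example)."""
    return 800.0 if queries >= 5_000_000 else 400.0

for queries in (50_000, 500_000, 5_000_000):
    api, hosted = monthly_api_cost(queries), monthly_self_host_cost(queries)
    winner = "self-hosting wins" if hosted < api else "API wins"
    print(f"{queries:>9,} queries/month: API ${api:>5,.0f} vs self-host ${hosted:>3,.0f} -> {winner}")
```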
When Economics Favor Fine-Tuning
Cost alone rarely justifies fine-tuning. These factors do:
1. Capability Gap
Scenario: GPT-4o-mini can't reliably produce your required JSON format. GPT-4o can, but costs 16x more.
Analysis:
- GPT-4o-mini at 50,000 queries: $10/month but 40% failure rate
- GPT-4o at 50,000 queries: $165/month with 5% failure rate
- Fine-tuned 7B: $400/month infrastructure + $20 training, 3% failure rate
Decision: Stepping up from GPT-4o-mini to GPT-4o pays off once failures cost more than $155/month. Fine-tuning beats GPT-4o only if cutting the failure rate from 5% to 3% is worth the extra ~$235/month of infrastructure.
2. Latency Requirements
Scenario: Your application needs <200ms response time.
Analysis:
- GPT-4o-mini: 500-2000ms typical
- Self-hosted 7B: 50-150ms typical
Decision: If latency matters, self-hosting may be the only option.
3. Data Privacy
Scenario: You process sensitive data that cannot leave your infrastructure.
Analysis: API-based models are eliminated by policy.
Decision: Self-hosting is required regardless of cost.
4. Unique Capability
Scenario: You need domain expertise no commercial model provides.
Analysis: Fine-tuning is the only way to achieve the capability.
Decision: Cost comparison is irrelevant—build or don't build.
Hidden Costs Deep Dive
The non-obvious costs that derail projects:
Data Preparation (40-60% of total effort)
| Activity | Hours (typical) | Cost at $100/hr |
|---|---|---|
| Data collection | 10-40 | $1,000-4,000 |
| Data cleaning | 5-20 | $500-2,000 |
| Format conversion | 2-8 | $200-800 |
| Quality review | 10-30 | $1,000-3,000 |
| Iteration cycles | 20-60 | $2,000-6,000 |
| Total | 47-158 | $4,700-15,800 |
Compare to training compute: $20.
Evaluation Development (20-30% of effort)
| Activity | Hours (typical) | Cost at $100/hr |
|---|---|---|
| Define success metrics | 2-5 | $200-500 |
| Build evaluation sets | 10-20 | $1,000-2,000 |
| Implement eval pipeline | 5-15 | $500-1,500 |
| Analyze results | 5-10 | $500-1,000 |
| Total | 22-50 | $2,200-5,000 |
Without proper evaluation, you can't know if your model works.
Infrastructure and Operations
| Activity | Setup Hours | Monthly Hours | Monthly Cost |
|---|---|---|---|
| Server configuration | 5-10 | 0 | $0 |
| Monitoring setup | 3-8 | 0 | $0 |
| Ongoing maintenance | 0 | 2-5 | $200-500 |
| Incident response | 0 | 0-10 | $0-1,000 |
| Model updates | 0 | 2-4 | $200-400 |
Iteration Multiplier
First attempts rarely work. Plan for 3-10 complete cycles:
Realistic Project Cost = Base Estimate x Iteration Multiplier
Iteration Multiplier:
- Well-defined task: 2-3x
- Medium complexity: 4-6x
- Novel or complex: 8-10x
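The multiplier as a quick sketch; the $3,000 base estimate is purely illustrative:

```python
ITERATION_MULTIPLIER = {"well_defined": (2, 3), "medium": (4, 6), "novel": (8, 10)}

def realistic_cost(base_estimate: float, complexity: str) -> tuple[float, float]:
    """Realistic Project Cost = Base Estimate x Iteration Multiplier (low, high)."""
    low, high = ITERATION_MULTIPLIER[complexity]
    return base_estimate * low, base_estimate * high

print(realistic_cost(3_000, "medium"))  # (12000, 18000): budget $12,000-18,000, not $3,000
```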
The ROI Framework
Justify LLMOps investment with this framework:
Step 1: Quantify Current State
What does the task cost today?
Current Cost = (Human Hours per Task) x (Hourly Rate) x (Tasks per Month)
Example:
Current Cost = 0.5 hours x $50/hr x 1,000 tasks = $25,000/month
Step 2: Project Future State
What will it cost with a custom model?
Future Cost = (Infrastructure) + (Maintenance) + (Human Oversight)
Example:
Future Cost = $800 + $400 + $5,000 = $6,200/month
Step 3: Calculate Savings
Monthly Savings = Current Cost - Future Cost
Monthly Savings = $25,000 - $6,200 = $18,800
Annual Savings = $225,600
Step 4: Account for Investment
Total Investment = Development + Training + Setup
Total Investment = $20,000 + $500 + $2,000 = $22,500
Payback Period = Investment / Monthly Savings
Payback Period = $22,500 / $18,800 = 1.2 months
Step 5: Calculate ROI
First Year ROI = (Annual Savings - Investment) / Investment
First Year ROI = ($225,600 - $22,500) / $22,500 = 903%
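The five steps collapse into a few lines of arithmetic. A sketch using the example's numbers, where the future monthly cost and total investment are the assumptions from Steps 2 and 4:

```python
def roi_summary(hours_per_task: float, hourly_rate: float, tasks_per_month: int,
                future_monthly_cost: float, total_investment: float):
    current_cost = hours_per_task * hourly_rate * tasks_per_month       # Step 1
    monthly_savings = current_cost - future_monthly_cost                # Steps 2-3
    annual_savings = monthly_savings * 12
    payback_months = total_investment / monthly_savings                 # Step 4
    first_year_roi = (annual_savings - total_investment) / total_investment  # Step 5
    return monthly_savings, payback_months, first_year_roi

savings, payback, roi = roi_summary(0.5, 50, 1_000,
                                    future_monthly_cost=6_200, total_investment=22_500)
print(f"${savings:,.0f}/month saved, payback in {payback:.1f} months, first-year ROI {roi:.0%}")
# $18,800/month saved, payback in 1.2 months, first-year ROI 903%
```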
This is a strong project. Most LLMOps projects targeting manual work show similar numbers—if the task is well-suited to automation.
Cost Pitfalls to Avoid
Pitfall 1: Underestimating Data Costs
Mistake: "We have lots of data, so preparation will be quick."
Reality: Raw data is never training-ready. Budget 50-100 hours for data work.
Pitfall 2: Ignoring Evaluation Costs
Mistake: "We'll know if it works when we deploy it."
Reality: Without evaluation, you'll deploy a model that fails silently, costing far more than proper eval development.
Pitfall 3: Single-Run Thinking
Mistake: "Training costs $20, so the project costs $20."
Reality: Plan for 5-10 training runs as you iterate. Budget $100-200 for training compute.
Pitfall 4: Forgetting Maintenance
Mistake: "Once deployed, it runs forever."
Reality: Models need updates as requirements change. Budget 5-10 hours monthly.
Pitfall 5: Over-Optimizing Training Cost
Mistake: "I'll use cheaper hardware to save $10 per run."
Reality: Slower iteration costs more in human time than you save in compute. Optimize for iteration speed, not training cost.
The Task API Economics
Let's apply this framework to our running example.
Goal: Build a Task Management Assistant as a Digital FTE
Current state: No automation (or expensive API calls)
Investment estimate:
- Data preparation: 40 hours x $100 = $4,000
- Training iterations: 10 runs x $10 = $100
- Evaluation development: 20 hours x $100 = $2,000
- Infrastructure setup: 10 hours x $100 = $1,000
- Total investment: $7,100
Monthly operating costs:
- Self-hosted inference: $400
- Maintenance: $200
- Monthly total: $600
Value creation:
- Tasks handled: 10,000/month
- Time saved per task: 3 minutes
- Total time saved: 500 hours/month
- Value at $50/hr: $25,000/month
ROI calculation:
- Monthly value: $25,000
- Monthly cost: $600
- Monthly savings: $24,400
- Payback period: 0.3 months (9 days)
- First-year ROI: 4,000%+
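A minimal sketch reproducing these figures from the estimates above:

```python
investment = 4_000 + 100 + 2_000 + 1_000       # data prep + training + evaluation + setup
monthly_cost = 400 + 200                        # inference + maintenance
monthly_value = 10_000 * (3 / 60) * 50          # tasks x hours saved per task x $50/hr
monthly_savings = monthly_value - monthly_cost

print(f"Investment:       ${investment:,}")
print(f"Monthly savings:  ${monthly_savings:,.0f}")
print(f"Payback:          {investment / monthly_savings:.2f} months (~9 days)")
print(f"First-year ROI:   {(monthly_savings * 12 - investment) / investment:.0%}")
```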
This is why LLMOps matters. The economics, when applied to suitable problems, are extraordinary.
Try With AI
Build your economic intuition by analyzing real scenarios.
Prompt 1: Calculate Your Use Case
I'm considering fine-tuning a model for [your use case]. Help me build a cost model:
Current state:
- [How is the task done today?]
- [How long does it take per instance?]
- [How many instances per month?]
Proposed solution:
- Base model: [7B/13B/70B]
- Training data: [estimated examples]
- Expected query volume: [per month]
Calculate:
1. Training investment (including iterations)
2. Monthly operating costs
3. Value created
4. Payback period
5. Key assumptions that could change the analysis
What you're learning: By applying the framework to your actual situation, you build the muscle memory for economic analysis. You'll discover which assumptions matter most and where to focus diligence.
Prompt 2: Break-Even Sensitivity
I calculated that self-hosting breaks even at 2 million queries/month versus GPT-4o-mini.
Create a sensitivity analysis:
1. How does break-even change if API prices drop 50%?
2. How does it change if my infrastructure costs are 2x higher?
3. What's the break-even if I need 99.9% uptime (requiring redundancy)?
4. At what failure rate does the fine-tuned model need to beat the API to justify the investment?
What you're learning: This develops your intuition for which variables matter most. You'll discover that break-even is often sensitive to a few key assumptions—knowing which ones lets you focus due diligence.
Prompt 3: Hidden Cost Audit
Review my LLMOps project plan and identify hidden costs I might have missed:
Project: [Description]
Budget:
- Training: $200
- Infrastructure: $500/month
- Total: $6,200/year
What costs am I likely underestimating? What activities are missing from my plan? Suggest a more realistic budget with justification.
What you're learning: This exercise calibrates your mental model of project scope. You'll internalize the full iceberg of costs, making future estimates more accurate and reducing budget surprises.