Skip to main content

Economics of Custom Models

Fine-tuning is cheap. A single training run might cost $5. So why hesitate?

Because training cost is the smallest line item. Data preparation takes days. Evaluation requires expertise. Deployment needs infrastructure. Maintenance never ends.

This lesson builds your economic intuition for LLMOps projects. You'll learn to calculate true costs, identify break-even points, and avoid projects that look promising but destroy value.

The Cost Iceberg

What you see: "$5 training run"

What's below the surface:

Visible Costs (10%)
└── Training compute: $5-100

Hidden Costs (90%)
├── Data preparation: 20-100 hours
├── Evaluation development: 10-40 hours
├── Infrastructure setup: 5-20 hours
├── Monitoring and maintenance: Ongoing
├── Iteration cycles: 3-10x multiplier
└── Opportunity cost: What else could you build?

Understanding the full iceberg prevents budget surprises and failed projects.

Training Costs: The Small Part

Training costs depend on three factors: model size, training duration, and hardware choice.

Cost Formula

Training Cost = (GPU Hours) x (Price per GPU Hour)

GPU Hours = (Tokens / Throughput) x (Number of Epochs)

Real-World Examples

ModelTraining DataHardwareDurationCost
Llama 3.1 8B (QLoRA)5,000 examplesRTX 4090 (local)2 hours$0 (owned)
Llama 3.1 8B (QLoRA)5,000 examplesA100 40GB (cloud)1 hour$2-4
Llama 3.1 70B (LoRA)10,000 examples4x A100 80GB8 hours$80-120
Mistral 7B (QLoRA)2,000 examplesT4 (cloud)3 hours$1-2

Key insight: Training costs are remarkably low for consumer-scale projects. A $50 cloud budget covers extensive experimentation.

Platform Comparison

PlatformGPU OptionsCost per HourBest For
Lambda LabsA100, H100$1.10-3.29Extended training
RunPodA100, RTX 4090$0.44-2.49Budget experiments
Vast.aiVarious$0.30-2.00Spot instances
Together.aiManaged trainingPer-token pricingSimplicity
Unsloth (local)Your GPU$0Iteration speed

Recommendation: Start with local training (Unsloth) for iteration speed. Move to cloud only when you need larger models or faster throughput.

Inference Costs: The Big Part

Training happens once. Inference happens continuously. A model that costs $10 to train might cost $1,000 per month to run.

API-Based Inference

Using frontier models through APIs:

ProviderModelInput (per 1M tokens)Output (per 1M tokens)
OpenAIGPT-4o$2.50$10.00
OpenAIGPT-4o-mini$0.15$0.60
AnthropicClaude 3.5 Sonnet$3.00$15.00
AnthropicClaude 3.5 Haiku$0.25$1.25
GoogleGemini 1.5 Flash$0.075$0.30

Self-Hosted Inference

Running your own fine-tuned model:

SetupMonthly CostCapacity
RTX 4090 (home)$15 electricity~20 QPS (7B model)
A10G (AWS)$600-800~30 QPS (7B model)
A100 (Lambda)$800-1,200~50 QPS (7B model)
T4 (GCP)$200-300~10 QPS (7B model)

QPS = Queries per second capacity

Cost Per Query Comparison

For a typical task management query (500 input tokens, 200 output tokens):

ApproachCost per Query10,000 Queries/Month
GPT-4o$0.0033$33
GPT-4o-mini$0.00020$2
Self-hosted 7B (cloud)$0.00008$0.80 + infra
Self-hosted 7B (local)~$0.00001$0.10 + electricity

The pattern: At low volume, APIs are cheaper (no infrastructure). At high volume, self-hosting wins (amortized costs).

Break-Even Analysis

When does self-hosting make sense? Calculate the break-even point.

Break-Even Formula

Break-Even = Fixed Costs / (API Cost per Query - Self-Host Cost per Query)

Fixed Costs = Training + Infrastructure Setup + One Month Server

Example: Task Management Assistant

Scenario: 50,000 queries per month

Option A: GPT-4o-mini API

  • Cost per query: $0.00020
  • Monthly cost: 50,000 x $0.00020 = $10
  • No setup costs

Option B: Self-hosted Llama 3.1 8B (cloud)

  • Training: $20 (including iterations)
  • Infrastructure setup: 10 hours x $100/hr = $1,000
  • Monthly server: $400
  • Cost per query: $400 / 50,000 = $0.008
  • First month total: $1,420

Break-even calculation:

  • Monthly savings after setup: $10 - $400 = -$390

Wait—self-hosting is MORE expensive? Yes, at 50,000 queries/month.

Revised scenario: 500,000 queries per month

Option A: GPT-4o-mini API

  • Monthly cost: 500,000 x $0.00020 = $100

Option B: Self-hosted (same infrastructure)

  • Monthly server: $400
  • Still more expensive!

Revised scenario: 5,000,000 queries per month

Option A: GPT-4o-mini API

  • Monthly cost: $1,000

Option B: Self-hosted (scaled infrastructure)

  • Monthly server: $800 (larger instance)
  • Monthly savings: $200
  • Payback period: $1,020 / $200 = 5.1 months

The insight: Self-hosting requires massive scale OR specific requirements (privacy, customization) to justify economically.

When Economics Favor Fine-Tuning

Cost alone rarely justifies fine-tuning. These factors do:

1. Capability Gap

Scenario: GPT-4o-mini can't reliably produce your required JSON format. GPT-4o can, but costs 16x more.

Analysis:

  • GPT-4o-mini at 50,000 queries: $10/month but 40% failure rate
  • GPT-4o at 50,000 queries: $165/month with 5% failure rate
  • Fine-tuned 7B: $400/month infrastructure + $20 training, 3% failure rate

Decision: If failure costs exceed $155/month, fine-tuning wins.

2. Latency Requirements

Scenario: Your application needs <200ms response time.

Analysis:

  • GPT-4o-mini: 500-2000ms typical
  • Self-hosted 7B: 50-150ms typical

Decision: If latency matters, self-hosting may be the only option.

3. Data Privacy

Scenario: You process sensitive data that cannot leave your infrastructure.

Analysis: API-based models are eliminated by policy.

Decision: Self-hosting is required regardless of cost.

4. Unique Capability

Scenario: You need domain expertise no commercial model provides.

Analysis: Fine-tuning is the only way to achieve the capability.

Decision: Cost comparison is irrelevant—build or don't build.

Hidden Costs Deep Dive

The non-obvious costs that derail projects:

Data Preparation (40-60% of total effort)

ActivityHours (typical)Cost at $100/hr
Data collection10-40$1,000-4,000
Data cleaning5-20$500-2,000
Format conversion2-8$200-800
Quality review10-30$1,000-3,000
Iteration cycles20-60$2,000-6,000
Total47-158$4,700-15,800

Compare to training compute: $20.

Evaluation Development (20-30% of effort)

ActivityHours (typical)Cost at $100/hr
Define success metrics2-5$200-500
Build evaluation sets10-20$1,000-2,000
Implement eval pipeline5-15$500-1,500
Analyze results5-10$500-1,000
Total22-50$2,200-5,000

Without proper evaluation, you can't know if your model works.

Infrastructure and Operations

ActivitySetup HoursMonthly HoursMonthly Cost
Server configuration5-100$0
Monitoring setup3-80$0
Ongoing maintenance02-5$200-500
Incident response00-10$0-1,000
Model updates02-4$200-400

Iteration Multiplier

First attempts rarely work. Plan for 3-10 complete cycles:

Realistic Project Cost = Base Estimate x Iteration Multiplier

Iteration Multiplier:
- Well-defined task: 2-3x
- Medium complexity: 4-6x
- Novel or complex: 8-10x

The ROI Framework

Justify LLMOps investment with this framework:

Step 1: Quantify Current State

What does the task cost today?

Current Cost = (Human Hours per Task) x (Hourly Rate) x (Tasks per Month)

Example:
Current Cost = 0.5 hours x $50/hr x 1,000 tasks = $25,000/month

Step 2: Project Future State

What will it cost with a custom model?

Future Cost = (Infrastructure) + (Maintenance) + (Human Oversight)

Example:
Future Cost = $800 + $400 + $5,000 = $6,200/month

Step 3: Calculate Savings

Monthly Savings = Current Cost - Future Cost
Monthly Savings = $25,000 - $6,200 = $18,800

Annual Savings = $225,600

Step 4: Account for Investment

Total Investment = Development + Training + Setup
Total Investment = $20,000 + $500 + $2,000 = $22,500

Payback Period = Investment / Monthly Savings
Payback Period = $22,500 / $18,800 = 1.2 months

Step 5: Calculate ROI

First Year ROI = (Annual Savings - Investment) / Investment
First Year ROI = ($225,600 - $22,500) / $22,500 = 903%

This is a strong project. Most LLMOps projects targeting manual work show similar numbers—if the task is well-suited to automation.

Cost Pitfalls to Avoid

Pitfall 1: Underestimating Data Costs

Mistake: "We have lots of data, so preparation will be quick."

Reality: Raw data is never training-ready. Budget 50-100 hours for data work.

Pitfall 2: Ignoring Evaluation Costs

Mistake: "We'll know if it works when we deploy it."

Reality: Without evaluation, you'll deploy a model that fails silently, costing far more than proper eval development.

Pitfall 3: Single-Run Thinking

Mistake: "Training costs $20, so the project costs $20."

Reality: Plan for 5-10 training runs as you iterate. Budget $100-200 for training compute.

Pitfall 4: Forgetting Maintenance

Mistake: "Once deployed, it runs forever."

Reality: Models need updates as requirements change. Budget 5-10 hours monthly.

Pitfall 5: Over-Optimizing Training Cost

Mistake: "I'll use cheaper hardware to save $10 per run."

Reality: Slower iteration costs more in human time than you save in compute. Optimize for iteration speed, not training cost.

The Task API Economics

Let's apply this framework to our running example.

Goal: Build a Task Management Assistant as a Digital FTE

Current state: No automation (or expensive API calls)

Investment estimate:

  • Data preparation: 40 hours x $100 = $4,000
  • Training iterations: 10 runs x $10 = $100
  • Evaluation development: 20 hours x $100 = $2,000
  • Infrastructure setup: 10 hours x $100 = $1,000
  • Total investment: $7,100

Monthly operating costs:

  • Self-hosted inference: $400
  • Maintenance: $200
  • Monthly total: $600

Value creation:

  • Tasks handled: 10,000/month
  • Time saved per task: 3 minutes
  • Total time saved: 500 hours/month
  • Value at $50/hr: $25,000/month

ROI calculation:

  • Monthly value: $25,000
  • Monthly cost: $600
  • Monthly savings: $24,400
  • Payback period: 0.3 months (9 days)
  • First-year ROI: 4,000%+

This is why LLMOps matters. The economics, when applied to suitable problems, are extraordinary.

Try With AI

Build your economic intuition by analyzing real scenarios.

Prompt 1: Calculate Your Use Case

I'm considering fine-tuning a model for [your use case]. Help me build a cost model:

Current state:
- [How is the task done today?]
- [How long does it take per instance?]
- [How many instances per month?]

Proposed solution:
- Base model: [7B/13B/70B]
- Training data: [estimated examples]
- Expected query volume: [per month]

Calculate:
1. Training investment (including iterations)
2. Monthly operating costs
3. Value created
4. Payback period
5. Key assumptions that could change the analysis

What you're learning: By applying the framework to your actual situation, you build the muscle memory for economic analysis. You'll discover which assumptions matter most and where to focus diligence.

Prompt 2: Break-Even Sensitivity

I calculated that self-hosting breaks even at 2 million queries/month versus GPT-4o-mini.

Create a sensitivity analysis:
1. How does break-even change if API prices drop 50%?
2. How does it change if my infrastructure costs are 2x higher?
3. What's the break-even if I need 99.9% uptime (requiring redundancy)?
4. At what failure rate does the fine-tuned model need to beat the API to justify the investment?

What you're learning: This develops your intuition for which variables matter most. You'll discover that break-even is often sensitive to a few key assumptions—knowing which ones lets you focus due diligence.

Prompt 3: Hidden Cost Audit

Review my LLMOps project plan and identify hidden costs I might have missed:

Project: [Description]
Budget:
- Training: $200
- Infrastructure: $500/month
- Total: $6,200/year

What costs am I likely underestimating? What activities are missing from my plan? Suggest a more realistic budget with justification.

What you're learning: This exercise calibrates your mental model of project scope. You'll internalize the full iceberg of costs, making future estimates more accurate and reducing budget surprises.