
Model Guidance Strategy

James pulled up the architecture diagrams from Lesson 5. Something had been nagging him since the comparison table. Architectures 1, 2, and 3 all had the same component in common: a model router. Free learners got DeepSeek. Paid learners got GPT-5.4 mini. Premium learners got Claude Sonnet. OpenRouter acted as the gateway. Claude Code Router sat inside every NanoClaw container.

"Architecture 4 does not have any of that," he said. "No OpenRouter. No Claude Code Router. No routing logic at all. But the learners still use different models. Where did the routing go?"

Emma shrugged. "It did not go anywhere. It was eliminated. The learner opens OpenClaw, picks whatever model they want, and connects to TutorClaw's MCP server. We have zero control over that choice and zero cost exposure from it."


You are doing exactly what James is doing. You have seen the 37x cost range across models (Lesson 2) and the four architectures compared (Lesson 5). Now you are looking at the gap: if you cannot route learners to specific models, what do you do instead?

What Architecture 4 Removes

In Architectures 1 through 3, model routing required dedicated infrastructure:

| Component | Role in Routing | Approximate Monthly Cost |
|---|---|---|
| OpenRouter gateway | Unified API for multiple LLM providers | $200-400 (API fees + overhead) |
| Claude Code Router | Shim inside NanoClaw containers routing by tier | $200-400 (container compute) |
| NanoClaw containers | Per-learner containers running routing logic | $100-200 (orchestration) |
| Routing configuration | Tier-to-model mapping, fallback logic, monitoring | Engineering time |

Total routing infrastructure: $500-1,000/month.

Architecture 4 removes all four rows. The learner picks their model in OpenClaw. The cost of model routing drops to $0.

That $500-1,000/month saving is not a token cost reduction. It is infrastructure that no longer exists. The learner's LLM cost is a separate number that the learner pays directly to the model provider. Panaversity never touches it.

TutorClaw's Model Guidance Table

Routing is gone, but guidance remains. TutorClaw publishes recommendations in the shim skill's documentation and in the MCP server's structured responses:

| Learner's Budget | Recommended Model | Expected Quality | Approx. Cost/Day |
|---|---|---|---|
| Tight (free API credits) | DeepSeek V3.2 or GPT-5 Nano | Good for PRIMM-Lite, weaker on full PRIMM-AI+ | $0.01-$0.05 |
| Moderate | GPT-5.4 mini | Good for full PRIMM-AI+, solid code execution feedback | $0.05-$0.15 |
| Comfortable | Claude Sonnet 4.5 | Excellent for all features, best pedagogical depth | $0.15-$0.40 |
| Premium / corporate | Claude Opus 4.6 | Maximum quality for complex reasoning | $0.50-$2.00 |

This table does not control anything. A learner on the Tight budget can connect Claude Opus if they want. A premium corporate user can connect DeepSeek. The table is a recommendation, not a gate.

The Calculation: Routing Infrastructure vs Guidance

Compare the monthly cost of the two approaches:

Model routing (Architectures 1-3):

  • Routing infrastructure: $500-1,000/month
  • Plus the operator's LLM token costs: $2,000-12,000/month (varies by architecture)
  • Total model-related cost: $2,500-13,000/month

Model guidance (Architecture 4):

  • Routing infrastructure: $0
  • Operator's LLM token costs: $0 (learner pays their own)
  • Cost of publishing guidance: $0 incremental (it lives in the shim skill documentation and the MCP responses that are already built)
  • Total model-related cost: $0

The savings are not just in tokens. The entire category of routing infrastructure disappears.

Design Your Own Guidance Table

TutorClaw's table works for a tutoring product. Your product has different features, different quality thresholds, and different user expectations. To build your own table, you need three inputs:

  1. Your product's core features (what does it do at minimum? what degrades with weaker models?)
  2. Your user segments (who are your budget-conscious users? who will pay for quality?)
  3. Model pricing from Lesson 2 (the 37x range from $0.40/M tokens to $15/M tokens and above)

A token budget calculator helps translate user behavior into daily cost:

Daily cost = (exchanges/day) x (avg tokens/exchange) x (price per token)

For TutorClaw, a typical study session involves roughly 20-30 exchanges at around 2,000-4,000 tokens per exchange (input + output combined). At Claude Sonnet 4.5 pricing:

30 exchanges x 3,000 tokens x ($15 / 1,000,000 tokens) = $1.35/day

But at DeepSeek V3.2 pricing:

30 exchanges x 3,000 tokens x ($0.40 / 1,000,000 tokens) = $0.036/day

That is the 37x range in practice. Both learners connect to the same MCP server. Both get the same structured pedagogical guidance. The difference is how the LLM interprets and presents that guidance.
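The two worked examples above can be sketched as a small token budget calculator. This is a minimal illustration of the lesson's formula; the model names and per-million-token prices are the figures quoted in this lesson, not live provider pricing.

```python
def daily_cost(exchanges_per_day: int, tokens_per_exchange: int,
               price_per_million_tokens: float) -> float:
    """Estimate a learner's daily LLM spend in dollars.

    Daily cost = exchanges/day x avg tokens/exchange x price per token.
    """
    total_tokens = exchanges_per_day * tokens_per_exchange
    return total_tokens * price_per_million_tokens / 1_000_000

# The two learners from the text: same session shape, different models.
sonnet = daily_cost(30, 3_000, 15.00)   # Claude Sonnet 4.5 pricing
deepseek = daily_cost(30, 3_000, 0.40)  # DeepSeek V3.2 pricing

print(f"Sonnet:   ${sonnet:.2f}/day")        # $1.35/day
print(f"DeepSeek: ${deepseek:.3f}/day")      # $0.036/day
print(f"Ratio:    {sonnet / deepseek:.1f}x") # 37.5x
```

Plugging in your own exchange counts and token sizes turns the guidance table's Cost/Day column from a guess into an estimate you can defend.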

Why the MCP Server Makes This Work

The get_pedagogical_guidance tool returns structured responses: step-by-step instructions, concept breakdowns, assessment criteria. These are not open-ended prompts that require a strong model to interpret correctly. They are explicit structures that even a weaker model can follow.

For a strong model like Claude Sonnet 4.5, the structured response becomes a scaffold for richer pedagogical conversation. The model adds nuance, asks follow-up questions, adapts its teaching style.

For a weaker model like DeepSeek V3.2, the same structured response acts as a strict template. The model follows the steps more literally, with less embellishment. The pedagogy still works because the logic comes from the MCP server, not the model.

This is the core design insight: the intelligence lives in the server's structured responses, not in the LLM's general capability. The LLM is the delivery mechanism. The MCP server is the brain.
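To make the "structure, not capability" point concrete, here is a hypothetical sketch of the kind of structured response the lesson describes. The tool name get_pedagogical_guidance comes from the text; the field names, concept, and step wording below are illustrative assumptions, not the real TutorClaw schema.

```python
# Hypothetical shape of a structured pedagogical response. The field
# names and example content are assumptions for illustration only.
guidance = {
    "tool": "get_pedagogical_guidance",
    "concept": "list comprehensions",
    "steps": [
        "Predict: ask the learner what [x * 2 for x in nums] returns",
        "Run: have the learner execute the snippet and compare",
        "Investigate: vary the expression and the iterable",
    ],
    "assessment_criteria": [
        "Learner explains the output before running the code",
        "Learner can rewrite the comprehension as a for loop",
    ],
}

# A weaker model can follow this as a strict template; a stronger model
# can treat it as a scaffold and add nuance. Either way, the teaching
# logic is encoded in the structure, not in the model.
for i, step in enumerate(guidance["steps"], start=1):
    print(f"{i}. {step}")
```

Because every step and criterion is spelled out explicitly, any reasonably instruction-following LLM can deliver the sequence; model quality changes the richness of the delivery, not the pedagogy.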

The Product Design Question

Here is the shift that matters. In Architectures 1 through 3, the economics question was: "How do we minimize our model costs?" The operator was paying for LLM inference, so every token mattered. Routing learners to cheaper models was a cost optimization strategy.

In Architecture 4, the operator pays $0 for LLM inference. The economics question becomes: "How do we make our pedagogical intelligence valuable enough that learners choose to pay for it regardless of their model costs?"

That is a product design question, not an infrastructure question. The answer is not "use cheaper models." The answer is: build structured intelligence into the MCP server that makes every model better at teaching your subject.

Try With AI

Exercise 1: Calculate the Learner LLM Cost Distribution

Give your AI assistant TutorClaw's guidance table and learner distribution, then compute the total:

TutorClaw has 16,000 learners with this distribution:
- 75% (12,000) use the Tight tier at $0.01-$0.05/day
- 19% (3,040) use the Moderate tier at $0.05-$0.15/day
- 6% (960) use the Comfortable or Premium tier at $0.15-$2.00/day

Using the midpoint of each range, calculate:
1. The average daily LLM cost per learner
2. The total monthly LLM cost across all 16,000 learners
3. Remember: Panaversity pays none of this. Who pays it?

Then compare this total to what Panaversity would have paid
under Architecture 1 ($12,000/month in LLM costs). How much
money moved from operator expense to learner expense?

What you are learning: The total LLM cost did not disappear. It shifted from the operator's P&L to the learner's personal spending. This is the Great Inversion applied to model costs specifically. The operator's margin improves not because the cost went away, but because it moved to the other side of the balance sheet.
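If you want to check the totals your AI assistant produces, the midpoint calculation can be sketched directly. The tier figures come from the exercise prompt; the 30-day month is an assumption.

```python
# Learner distribution from Exercise 1: (learners, low $/day, high $/day).
tiers = [
    (12_000, 0.01, 0.05),  # Tight
    (3_040, 0.05, 0.15),   # Moderate
    (960, 0.15, 2.00),     # Comfortable / Premium
]

# Total daily spend across all learners, using each range's midpoint.
total_daily = sum(n * (low + high) / 2 for n, low, high in tiers)
learners = sum(n for n, _, _ in tiers)

print(f"Average daily cost per learner: ${total_daily / learners:.3f}")
print(f"Total monthly cost (30 days):   ${total_daily * 30:,.0f}")
```

Compare the monthly figure this prints against Architecture 1's $12,000/month operator cost: the spend did not vanish, it moved to the learners' side of the ledger.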

Exercise 2: Design a Guidance Table for Your Own Product

Pick a product idea (or use TutorClaw as a starting point) and design your own model guidance table:

I am building an AI-powered [your product type] that helps
[your user] with [your core task]. The product's key features are:
1. [Feature that needs strong reasoning]
2. [Feature that works with basic generation]
3. [Feature that depends on structured data from an MCP server]

Design a model guidance table with four budget tiers. For each tier:
- Recommend a specific model (use real model names and pricing)
- Describe which features work well and which degrade
- Estimate the daily cost based on [X] interactions per day
at roughly [Y] tokens per interaction

Format it as a Markdown table matching this structure:
| Budget Level | Recommended Model | Expected Quality | Approx. Cost/Day |

What you are learning: Model guidance is a product design exercise, not a cost minimization exercise. The quality column forces you to think about what your product actually needs from the LLM versus what comes from your structured backend. Products with strong server-side logic degrade less across model tiers.

Exercise 3: Compare Routing Infrastructure Cost

Ask your AI assistant to trace the request flow under both approaches:

In Architecture 2 of TutorClaw, a free-tier learner sends a message.
Trace the full request flow: which components does the message pass
through? Include OpenRouter, Claude Code Router, the NanoClaw
container, and the model endpoint.

Now trace the same message in Architecture 4. The same learner,
the same question. Which components are gone? What replaced them?

Finally: estimate the monthly infrastructure cost of the routing
components that were removed. Do not include LLM token costs in
this estimate, only the infrastructure (API gateway fees, container
compute, orchestration overhead).

What you are learning: The difference between token costs and routing infrastructure costs. Most discussions about LLM economics focus on per-token pricing. This exercise reveals the hidden cost layer: the infrastructure needed to route requests to different models. Architecture 4 eliminates this layer entirely, saving $500-1,000/month in infrastructure alone, before any token savings.


James sat quietly for a moment. "It is like recommending tools to warehouse workers," he said finally. "I used to manage a distribution center. We told every new hire: get the Milwaukee M18 drill, it handles everything we throw at it. Some guys bought the DeWalt instead because it was cheaper. Some bought the Makita because they already had the batteries. We did not care. The drill was their tool. What mattered was that our standard operating procedures worked regardless of which drill they brought."

"And if a new drill came out that was twice as good for half the price?" Emma asked.

"Then everyone benefits. Our procedures do not change. The work gets done faster because the tool is better, but the procedures, the intelligence, stay the same." He pointed at the guidance table on his screen. "That is what TutorClaw does. The MCP server is the operating procedure. The model is the drill."

Emma nodded. "That is exactly right. And I should be honest about what I do not know here. I designed the MCP responses to degrade gracefully across model quality levels. I have tested Claude and GPT extensively. But I have not tested every model on the market. The structured responses should work with models I have not verified, because the logic is explicit enough for any instruction-following LLM. But that is an assumption, not a confirmed fact. I am confident in the architecture; I am less confident in how edge-case models handle the structured guidance."

"So what do you do about that?"

"Publish the guidance table, collect feedback from learners who use different models, and update the recommendations as the data comes in." She set down her pen. "But the bigger question is not which model a learner picks. It is what this architecture makes TutorClaw, economically. You have a product that costs almost nothing to run, serves thousands of learners, and lets each of them bring their own compute. That is not just a cost structure. That is a new kind of economic actor."

James looked at the guidance table, then at the cost comparison from earlier lessons. "A Digital FTE."

"Exactly. And that is what we formalize next."
