# Stress-Test Your Numbers
James had the Python calculator from Lesson 4 open on his screen. He had run it with TutorClaw's numbers. He trusted the output. But trust was the problem.
"These numbers look good," he said. "Almost too good. Nearly ninety percent margin. What breaks it? What if fewer people convert? What if Cloudflare changes its pricing? What if I am wrong about something I did not even think to question?"
Emma nodded. "That is the right instinct. Every financial model is a set of assumptions wearing a suit. The question is not whether your assumptions are correct today. The question is which assumptions, if they change, will break your model."
You are doing exactly what James is doing. You have a working financial model. Now you systematically break it to find out where the real risks hide.
## Stress Test 1: Conversion Rate
Open the economics.py calculator from Lesson 4 (or paste the function into a conversation with your AI assistant). Start with TutorClaw's defaults:
```python
calculate_economics(
    total_learners=16_000,
    free_fraction=0.75,
    paid_fraction=0.19,
    premium_fraction=0.06,
    paid_price_usd=1.75,
    premium_price_usd=10.50,
    infra_cost=60.00,
    stripe_percent=0.029,
    stripe_flat=0.30,
)
```
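If you do not have economics.py handy, here is a minimal sketch reconstructed from the call above and the margins quoted later in this lesson. Lesson 4's version may differ in return format and rounding, so treat this as a stand-in, not the canonical implementation:

```python
def calculate_economics(
    total_learners: int,
    free_fraction: float,   # kept for signature parity; free users pay nothing
    paid_fraction: float,
    premium_fraction: float,
    paid_price_usd: float,
    premium_price_usd: float,
    infra_cost: float,
    stripe_percent: float,
    stripe_flat: float,
) -> dict:
    paid_users = int(total_learners * paid_fraction)
    premium_users = int(total_learners * premium_fraction)
    gross_revenue = paid_users * paid_price_usd + premium_users * premium_price_usd
    # Stripe takes a percentage of volume plus a flat fee per transaction;
    # each paying subscriber is one transaction per month.
    stripe_fees = gross_revenue * stripe_percent + (paid_users + premium_users) * stripe_flat
    net_revenue = gross_revenue - stripe_fees - infra_cost
    margin = net_revenue / gross_revenue * 100 if gross_revenue else 0.0
    return {
        "gross_revenue": gross_revenue,
        "stripe_fees": stripe_fees,
        "net_revenue": net_revenue,
        "gross_margin_percent": margin,
    }

baseline = calculate_economics(
    total_learners=16_000,
    free_fraction=0.75,
    paid_fraction=0.19,
    premium_fraction=0.06,
    paid_price_usd=1.75,
    premium_price_usd=10.50,
    infra_cost=60.00,
    stripe_percent=0.029,
    stripe_flat=0.30,
)
print(baseline)  # gross_margin_percent comes out near 88.9
```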
Now change paid_fraction while keeping everything else constant. Run the calculator at each value:
| paid_fraction | premium_fraction | Expected Behavior |
|---|---|---|
| 0.19 | 0.06 | Baseline: TutorClaw's current numbers |
| 0.10 | 0.06 | Paid drops nearly in half |
| 0.05 | 0.06 | Only 5% of free users convert |
| 0.01 | 0.06 | Near-zero paid conversion |
| 0.19 | 0.03 | Premium drops in half |
| 0.19 | 0.01 | Near-zero premium conversion |
Run each scenario. Write down the net revenue and gross margin for each row. Then answer:
- At what paid_fraction does net revenue drop to zero (with premium at 6%)?
- At what premium_fraction does net revenue drop to zero (with paid at 19%)?
- Which variable is the model more sensitive to: paid conversion or premium conversion?
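One way to run the sweep is a short loop. The helper below is a condensed stand-in for the Lesson 4 calculator (same arithmetic, fewer parameters), so its figures are illustrative rather than authoritative:

```python
def net_revenue(paid_fraction, premium_fraction, infra_cost=60.0,
                learners=16_000, paid_price=1.75, premium_price=10.50,
                stripe_pct=0.029, stripe_flat=0.30):
    # Condensed version of the Lesson 4 calculator arithmetic.
    paid = int(learners * paid_fraction)
    premium = int(learners * premium_fraction)
    gross = paid * paid_price + premium * premium_price
    fees = gross * stripe_pct + (paid + premium) * stripe_flat
    return gross - fees - infra_cost

# Sweep paid_fraction with premium held at 6%.
for pf in (0.19, 0.10, 0.05, 0.01, 0.0):
    print(f"paid_fraction={pf:.2f}: net=${net_revenue(pf, 0.06):,.2f}")
# Even at paid_fraction=0.0, the premium tier alone keeps net revenue
# positive at $60 infrastructure: the model never crosses zero here.
```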
The answer to question 3 reveals something important. Premium subscribers at $10.50/month contribute far more revenue per person than paid subscribers at $1.75/month. Losing premium subscribers hurts more than losing paid subscribers, even though there are fewer of them. This is the leverage effect of tiered pricing.
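The arithmetic behind the leverage effect, using the baseline tier counts from the source numbers:

```python
# Revenue contribution per tier at baseline (16,000 learners).
paid_users = int(16_000 * 0.19)      # 3,040 paid subscribers
premium_users = int(16_000 * 0.06)   # 960 premium subscribers
paid_rev = paid_users * 1.75         # $5,320/month
premium_rev = premium_users * 10.50  # $10,080/month

# 6% of learners generate roughly two-thirds of gross revenue.
share = premium_rev / (paid_rev + premium_rev)
print(f"Premium share of gross revenue: {share:.0%}")
```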
## Stress Test 2: Infrastructure Costs
Now hold conversion rates at baseline (19% paid, 6% premium) and increase infra_cost:
| infra_cost | What This Simulates |
|---|---|
| $60 | Baseline: current TutorClaw infrastructure |
| $200 | Larger VPS, paid R2 tier, managed database |
| $500 | Adding your own inference server |
| $1,000 | Full managed infrastructure with redundancy |
| $5,000 | Cloud-hosted with auto-scaling |
Run each scenario. Write down the gross margin for each row.
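A sketch of the sweep as a loop, again using a condensed stand-in for the Lesson 4 calculator (illustrative arithmetic; your economics.py output may round differently):

```python
def margin_at_infra(infra_cost, learners=16_000, paid=0.19, premium=0.06,
                    paid_price=1.75, premium_price=10.50,
                    stripe_pct=0.029, stripe_flat=0.30):
    # Gross margin (%) after Stripe fees and infrastructure.
    p, q = int(learners * paid), int(learners * premium)
    gross = p * paid_price + q * premium_price
    fees = gross * stripe_pct + (p + q) * stripe_flat
    return (gross - fees - infra_cost) / gross * 100

for cost in (60, 200, 500, 1_000, 5_000):
    print(f"${cost:>5}/mo infra -> {margin_at_infra(cost):.1f}% margin")
```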
Notice something: even at $1,000/month infrastructure (16x the current cost), the gross margin barely moves. At $60, the margin is ~88.9% (using the calculator's precise tier counts, including Stripe flat fees). At $1,000, it drops to ~82.8%. Infrastructure cost is not where TutorClaw's model breaks.
Now try $12,300 (Architecture 1's total cost). The margin collapses to roughly 9%. That is the difference the Great Inversion makes: the jump from $60 to $12,300 is almost entirely LLM inference cost that Architecture 4 pushes to the learner.
## Stress Test 3: The Great Inversion Applied
This is the capstone exercise. You leave TutorClaw behind and apply the pattern to a new product.
Step 1: Pick a product. Choose an AI product idea. Some options:
- A coding assistant that reviews pull requests and suggests fixes
- A customer support bot that handles returns, refunds, and FAQs
- A content generator that writes marketing copy, blog posts, or social media
- A legal document reviewer that flags risks in contracts
- Your own idea
Step 2: Estimate the cost structure. For your chosen product at 10,000 users, fill in this table twice: once for a traditional architecture (you host the LLM), once for an inverted architecture (the user provides their own LLM via OpenClaw):
| Cost Component | Traditional (You Host LLM) | Inverted (User Provides LLM) |
|---|---|---|
| LLM inference | $__/month | $0 |
| Server/VPS | $__/month | $__/month |
| Storage | $__/month | $__/month |
| Payment processing | $__/month | $__/month |
| Total | $__/month | $__/month |
Step 3: Model revenue. Choose a pricing structure (freemium, flat subscription, usage-based). Estimate monthly revenue at 10,000 users.
Step 4: Calculate gross margin for both architectures. Use the formula:
Gross margin = (Revenue - Total costs) / Revenue * 100
Step 5: Scale it. Repeat the cost estimate at 1,000 users and 100,000 users. At what scale does the traditional architecture become unsustainable? At what point does the cost difference between traditional and inverted become larger than the revenue itself?
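Steps 2 through 5 can be sketched as a parameterized cost model. Every rate below is an assumed illustration, not a quoted price; swap in your own estimates from Step 2:

```python
def monthly_costs(users: int, hosts_llm: bool) -> float:
    llm = users * 0.15 if hosts_llm else 0.0   # assumed inference cost per user/month
    server = 20 + users * 0.002                # assumed VPS base + per-user overhead
    storage = 5 + users * 0.001                # assumed object storage
    return llm + server + storage

def scaled_margin(users: int, hosts_llm: bool,
                  price: float = 2.00, pay_rate: float = 0.10) -> float:
    # Assumes 10% of users pay $2/month; replace with your Step 3 model.
    revenue = users * pay_rate * price
    return (revenue - monthly_costs(users, hosts_llm)) / revenue * 100

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} users: hosted {scaled_margin(n, True):5.1f}% "
          f"| inverted {scaled_margin(n, False):5.1f}%")
```

With these assumed rates, both architectures scale linearly, but the hosted margin flattens near the per-user inference cost while the inverted margin climbs toward the fixed-cost ceiling; the gap between them widens with every order of magnitude.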
Step 6: Challenge the inversion. Ask yourself: can the users of this product actually provide their own LLM? A developer using a coding assistant probably can. An accountant reviewing contracts might not. A customer calling a support line definitely cannot. The Great Inversion requires that the end user has both the technical ability and the willingness to run their own inference.
## Try With AI
### Exercise 1: Automated Sensitivity Analysis
Here are TutorClaw's baseline numbers:
- 16,000 learners: 75% free, 19% paid ($1.75/mo), 6% premium ($10.50/mo)
- Infrastructure: $60/month
- Stripe: 2.9% + $0.30 per transaction
Run a sensitivity analysis. For each variable below, find the exact threshold where net revenue drops to zero:
1. paid_fraction (holding premium at 6%)
2. premium_fraction (holding paid at 19%)
3. infra_cost (holding conversions at 19%/6%)
Which variable has the tightest margin of safety? Which has the widest? What does this tell you about where to focus risk mitigation?
What you are learning: Sensitivity analysis reveals which assumptions matter and which are noise. A variable with a wide margin of safety (like infrastructure cost) does not need constant monitoring. A variable with a tight margin (like premium conversion rate) is where you should invest in retention and growth strategies.
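You can check the AI's thresholds yourself with a few lines. The helper below condenses the Lesson 4 calculator arithmetic (illustrative; your economics.py may differ), with TutorClaw's baseline as defaults:

```python
def net_revenue(paid=0.19, premium=0.06, infra=60.0, learners=16_000):
    # Condensed TutorClaw arithmetic: two paid tiers, Stripe fees, infra.
    p, q = int(learners * paid), int(learners * premium)
    gross = p * 1.75 + q * 10.50
    fees = gross * 0.029 + (p + q) * 0.30
    return gross - fees - infra

# Infrastructure breakeven: net revenue hits zero exactly when infra
# equals gross revenue minus Stripe fees, so solve it directly.
infra_breakeven = net_revenue(infra=0.0)
print(f"infra_cost breakeven: ${infra_breakeven:,.2f}")

# With $60 infrastructure, neither conversion rate alone can push net
# revenue to zero -- the other tier still carries the costs.
print(net_revenue(paid=0.0), net_revenue(premium=0.0))
```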
### Exercise 2: Design Two Architectures
I want to build a customer support chatbot for e-commerce. Users send messages about orders, returns, and product questions. The bot handles 80% of inquiries automatically; 20% get escalated to a human agent.
Design two architectures:
Architecture A: I host the LLM (estimate costs for GPT-4o-mini or Claude Haiku at ~500 messages/user/month for 10,000 users)
Architecture B: Each business customer uses their own OpenClaw instance connected to my MCP server for support intelligence
For each architecture at 10,000 users, estimate:
- Monthly LLM cost
- Monthly infrastructure cost
- Total cost
- Gross margin (assuming $29/mo per business customer)
At what user count does Architecture A become unprofitable?
What you are learning: Applying the Factory/Edge decomposition to a non-education product. The support bot's intelligence (escalation rules, product knowledge, tone guidelines) lives in the Factory layer. The customer's conversation context and LLM live at the Edge. This exercise tests whether the Great Inversion generalizes beyond TutorClaw.
### Exercise 3: Challenge the Inversion
The Great Inversion works for TutorClaw because learners are already AI-literate: they use OpenClaw daily and choosing an LLM model is second nature.
Consider these three products:
1. An AI tax preparer for small businesses (users: accountants)
2. An AI writing tutor for middle school students (users: 12-year-olds)
3. An AI code reviewer for developer teams (users: software engineers)
For each product:
- Can the end user realistically provide their own LLM? Why or why not?
- If they cannot, what is the minimum viable "inversion" you could offer? (Partial inversion? Guided setup? Pre-configured OpenClaw?)
- At what point does the user base become AI-literate enough for full inversion?
Which products can invert today? Which might invert in 2-3 years as AI tools become more mainstream?
What you are learning: The Great Inversion is not universal. It depends on the end user's technical literacy and willingness to manage their own infrastructure. For some products, the inversion is natural (developer tools, AI-native education). For others, it requires either a gentler onboarding path or a hybrid architecture where the operator handles LLM inference for users who cannot or will not do it themselves.
James leaned back. He had broken TutorClaw's model from every angle. The conversion rate could drop to nearly nothing and the product stayed alive because the infrastructure cost was so low. Infrastructure itself could multiply 16x and the margin only slipped from roughly 89% to 83%. The model was resilient because the Great Inversion removed the single largest cost component: LLM inference.
Then he had designed a support chatbot with both architectures. The traditional version needed $3,000/month in LLM costs at 10,000 users. The inverted version needed $80. Same revenue. Wildly different margin. The pattern held.
"But there is a catch," James said. "My support bot serves e-commerce managers. They are not developers. They are not going to set up OpenClaw and pick an LLM model. TutorClaw works because your learners are already using these tools every day."
Emma was quiet for a moment. "That is the part I am genuinely uncertain about," she said. "TutorClaw works because learners are AI-literate. They run OpenClaw. They understand what a model is. A product for accountants who have never heard of an API key? I am not sure the inversion applies there. Not yet." She paused. "Maybe in two years, when AI assistants are as common as email clients, every professional will have something like OpenClaw on their machine. But today? For non-technical users? I honestly do not know if the Great Inversion is a universal pattern or a pattern that only works when your users are already on the other side of the literacy gap."
James nodded. "So the thesis has a boundary condition."
"Every thesis does." Emma closed the calculator. "And every architecture does too. TutorClaw looks clean and inevitable now. Four components, one cost table, a thesis that holds up under stress. But it did not start this way. We tried five other designs before landing on Architecture 4. The economics made the final choice obvious, but the path to that choice was anything but."
"How many pivots?"
"Six. And each one felt like starting over, even though most of the work carried forward. That story, the sequence of decisions that produced this architecture, is what we cover next."