Designing the Tool Surface

Emma dropped a blank document on James's screen. "Nine tools. Design all of them before you write a single line of code."

James scrolled through the empty page. "All nine? I barely got register_learner working in the last chapter."

"Exactly. You know how to describe a tool, steer a spec, and verify the result. That is the building skill. This is the planning skill." She tapped the screen. "Write a contract for every tool. Name, inputs, outputs, who can call it, what data it needs. Like writing job descriptions before you hire anyone."

James thought about that. In his warehouse days, he never hired someone without a job description. The description told you what the person did, what they needed access to, and who they reported to. A function was no different.

"Nine job descriptions," he said.

"Nine job descriptions. Then we build."


You have the same blank sheet and the same constraint: contracts first, code second.

In this lesson, you design all 9 TutorClaw tools as specifications. No code, no terminal, no Claude Code. You define what each tool does, what it accepts, what it returns, who can call it, and what data it depends on. By the end, you have the complete blueprint for Lessons 3 through 7.

The Tool Contract

Every tool gets a contract with six parts:

| Part | What It Answers |
| --- | --- |
| Name | What is the tool called? Lowercase, underscores, verb-first. |
| Description | When should the agent call this tool? (The most important part, from Ch57 L4.) |
| Inputs | What parameters does the tool accept? Types and whether required. |
| Outputs | What does the tool return on success? On error? |
| Tier access | Who can call this tool: everyone, gated by tier, or free-tier only? |
| Depends on | Which resources does the tool read, write, or call? |
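
If it helps to see the contract as data, here is a minimal sketch of one possible shape. The class names `Param` and `ToolContract`, the `problems` method, and the `KNOWN_VERBS` list are all hypothetical and chosen for illustration only; the `depends_on` field mirrors the "Depends on" row used in the per-tool contracts.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Lowercase words joined by underscores, e.g. "register_learner".
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)*$")
# Illustrative verb list (an assumption): the verbs used by the nine tool names.
KNOWN_VERBS = {"register", "get", "update", "generate", "assess", "submit"}
TIERS = {"all", "gated", "free_only"}


@dataclass(frozen=True)
class Param:
    name: str
    type: str
    required: bool = True


@dataclass(frozen=True)
class ToolContract:
    name: str
    description: str
    inputs: tuple[Param, ...]
    outputs: tuple[Param, ...]
    tier: str
    depends_on: tuple[str, ...]

    def problems(self) -> list[str]:
        """Return a list of contract problems; an empty list means it passes."""
        found = []
        if not NAME_PATTERN.match(self.name):
            found.append("name must be lowercase words joined by underscores")
        elif self.name.split("_")[0] not in KNOWN_VERBS:
            found.append("name should start with a verb")
        if not self.description.strip():
            found.append("description is required")
        if self.tier not in TIERS:
            found.append(f"unknown tier: {self.tier}")
        if not self.depends_on:
            found.append("every tool depends on at least one resource")
        return found


REGISTER_LEARNER = ToolContract(
    name="register_learner",
    description="Register a new learner by name. Call this when a user "
                "wants to sign up or create a new learning profile.",
    inputs=(Param("name", "string"),),
    outputs=(Param("learner_id", "string"), Param("welcome_message", "string")),
    tier="all",
    depends_on=("json_state_file",),
)
```

Calling `REGISTER_LEARNER.problems()` returns an empty list. A contract named `LearnerRegister` with no dependencies would come back with two problems to fix before any code is written.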

You already know the first tool. Start there.

The Three State Tools (You Know These)

These manage learner data. All three depend on a single resource: the JSON state file stored locally in the data/ directory.

Tool 1: register_learner

You built this in Chapter 57. Now write the full contract:

| Field | Value |
| --- | --- |
| Name | register_learner |
| Description | Register a new learner by name. Call this when a user wants to sign up or create a new learning profile. |
| Inputs | name (string, required): the learner's display name |
| Outputs | learner_id (string): unique identifier; welcome_message (string): personalized greeting |
| Tier | All |
| Depends on | JSON state file |
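
To make the "Depends on" row concrete, here is a rough sketch of how this contract could map onto a function that writes to a JSON state file. This is not the implementation from Chapter 57. The file layout (a top-level `learners` object), the default `tier` and `confidence` values, the error shape, and the helper names `load_state` and `save_state` are all assumptions.

```python
import json
import uuid
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the state file, or start with an empty state if it does not exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"learners": {}}


def save_state(path: Path, state: dict) -> None:
    """Write the whole state back to disk, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def register_learner(name: str, state_path: Path) -> dict:
    """Contract: name (string, required) -> learner_id, welcome_message."""
    clean = name.strip()
    if not clean:
        # Assumed error shape; the contract only says errors must be defined.
        return {"error": "name is required"}

    state = load_state(state_path)
    learner_id = uuid.uuid4().hex
    # Assumed record layout: new learners start on the free tier.
    state["learners"][learner_id] = {
        "name": clean,
        "tier": "free",
        "confidence": 0.5,
    }
    save_state(state_path, state)
    return {
        "learner_id": learner_id,
        "welcome_message": f"Welcome, {clean}!",
    }
```

The contract stays the source of truth: the function signature follows the Inputs row, the returned keys follow the Outputs row, and the only resource touched is the state file.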

Tool 2: get_learner_state

| Field | Value |
| --- | --- |
| Name | get_learner_state |
| Description | Retrieve the current learning state for a learner. Call this at the start of every tutoring conversation to know where the learner left off. |
| Inputs | learner_id (string, required): the learner's unique identifier |
| Outputs | current_chapter (integer): chapter in progress; stage (string): PRIMM stage (predict, run, investigate); confidence (float): 0.0 to 1.0; exchanges_remaining (integer): daily exchanges left |
| Tier | All |
| Depends on | JSON state file |

Tool 3: update_progress

| Field | Value |
| --- | --- |
| Name | update_progress |
| Description | Record a learning interaction and adjust the learner's confidence score. Call this after every exchange where the learner answers a question or completes an exercise. |
| Inputs | learner_id (string, required); chapter (integer, required): chapter of the interaction; stage (string, required): PRIMM stage completed; correct (boolean, required): whether the learner's response was correct |
| Outputs | updated_confidence (float): new confidence score; next_stage (string): recommended next PRIMM stage |
| Tier | All |
| Depends on | JSON state file |
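
The contract says the confidence score is adjusted but does not say how. The sketch below fills that gap with an invented rule purely to show the input and output shapes: a fixed step of 0.1 up or down, clamped to the 0.0 to 1.0 range, with the stage advancing only on a correct answer. The step size, the stage-advance rule, and the in-memory `record` argument (standing in for the learner's entry in the state file) are all assumptions.

```python
STAGES = ("predict", "run", "investigate")
STEP = 0.1  # assumed adjustment per interaction; the contract does not fix this


def update_progress(record: dict, chapter: int, stage: str, correct: bool) -> dict:
    """Adjust confidence and recommend the next stage for one interaction."""
    if stage not in STAGES:
        return {"error": f"unknown stage: {stage}"}

    delta = STEP if correct else -STEP
    # Round to avoid float noise, then clamp to the 0.0-1.0 range.
    confidence = round(record.get("confidence", 0.5) + delta, 2)
    confidence = min(1.0, max(0.0, confidence))

    if correct:
        # Assumed rule: move to the next stage, wrapping back to "predict".
        next_stage = STAGES[(STAGES.index(stage) + 1) % len(STAGES)]
    else:
        # Assumed rule: repeat the same stage after an incorrect answer.
        next_stage = stage

    record.update(current_chapter=chapter, stage=next_stage, confidence=confidence)
    return {"updated_confidence": confidence, "next_stage": next_stage}
```

Writing even a throwaway rule like this exposes questions the contract should answer, such as what happens after a correct "investigate" answer.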

The Six New Tools

These are the tools you have not seen before. Design each one carefully.

Tool 4: get_chapter_content

| Field | Value |
| --- | --- |
| Name | get_chapter_content |
| Description | Fetch the content for a specific chapter. Call this when the learner asks to study a topic or the agent needs chapter material for a teaching interaction. Free-tier learners can access chapters 1 through 5 only. |
| Inputs | learner_id (string, required); chapter_number (integer, required): which chapter to fetch |
| Outputs | title (string): chapter title; content (string): full chapter text; exercises_available (boolean): whether exercises exist for this chapter |
| Tier | Gated |
| Depends on | JSON state file (for tier check), local content directory |
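
"Gated" means the tier check happens inside the tool, before any content is returned. A minimal sketch of that check follows, assuming the learner record and the chapter data have already been loaded into plain dictionaries. The error codes `chapter_not_found` and `upgrade_required` and the dictionary layouts are hypothetical; the chapter 5 cutoff comes from the contract.

```python
FREE_TIER_MAX_CHAPTER = 5  # from the contract: free learners get chapters 1-5


def get_chapter_content(learner: dict, chapter_number: int, chapters: dict) -> dict:
    """Return chapter content, or an error if the chapter is missing or gated.

    `learner` stands in for the record read from the JSON state file and
    `chapters` for the local content directory (both simplified to dicts).
    """
    if chapter_number not in chapters:
        return {"error": "chapter_not_found"}

    is_free = learner.get("tier", "free") == "free"
    if is_free and chapter_number > FREE_TIER_MAX_CHAPTER:
        # Assumed error shape that lets the agent decide to offer an upgrade.
        return {"error": "upgrade_required", "max_free_chapter": FREE_TIER_MAX_CHAPTER}

    chapter = chapters[chapter_number]
    return {
        "title": chapter["title"],
        "content": chapter["content"],
        "exercises_available": bool(chapter.get("exercises")),
    }
```

A free learner asking for chapter 6 gets the `upgrade_required` error, which is the "paywall" moment that the get_upgrade_url description refers to.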

Tool 5: generate_guidance

| Field | Value |
| --- | --- |
| Name | generate_guidance |
| Description | Generate stage-appropriate pedagogical guidance using PRIMM-Lite methodology. Call this to get the next teaching prompt for the learner based on their current stage (predict, run, or investigate). |
| Inputs | learner_id (string, required); chapter_number (integer, required); stage (string, required): current PRIMM stage |
| Outputs | guidance_text (string): the teaching prompt; expected_response_type (string): what kind of answer to expect (prediction, observation, explanation) |
| Tier | All |
| Depends on | JSON state file |

Tool 6: assess_response

| Field | Value |
| --- | --- |
| Name | assess_response |
| Description | Evaluate the learner's answer to a teaching prompt and update their confidence score. Call this after the learner responds to a guidance prompt. |
| Inputs | learner_id (string, required); chapter_number (integer, required); stage (string, required); learner_response (string, required): the learner's answer |
| Outputs | assessment (string): feedback on the answer; correct (boolean): whether the response met expectations; updated_confidence (float): adjusted confidence score; recommendation (string): what to do next |
| Tier | All |
| Depends on | JSON state file |

Tool 7: get_exercises

| Field | Value |
| --- | --- |
| Name | get_exercises |
| Description | Return practice exercises matched to the learner's weak areas. Call this when the learner wants to practice or when their confidence in a topic drops below the threshold. Free-tier learners get exercises for chapters 1 through 5 only. |
| Inputs | learner_id (string, required); chapter_number (integer, optional): specific chapter, or omit to get exercises for weakest areas |
| Outputs | exercises (list): each with id, prompt, difficulty, target_concept; weak_areas (list): concepts the learner struggles with |
| Tier | Gated |
| Depends on | JSON state file (for tier check and weak areas), local content directory |

Tool 8: submit_code

| Field | Value |
| --- | --- |
| Name | submit_code |
| Description | Execute the learner's code in a sandboxed environment and return the output. Call this when the learner writes code and wants to test it. Free-tier learners have limited daily submissions. |
| Inputs | learner_id (string, required); code (string, required): the Python code to execute |
| Outputs | stdout (string): standard output; stderr (string): error output if any; exit_code (integer): 0 for success |
| Tier | Gated |
| Depends on | JSON state file (for tier check), mock sandbox |
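
A mock sandbox can be as small as a function that ignores the code and returns canned output in the contract's shape. The sketch below pairs that with a free-tier daily limit. The limit of 5, the `submissions_today` counter, and the `daily_limit_reached` error are invented for illustration, since the contract only says submissions are "limited". Resetting the counter each day is left out.

```python
FREE_DAILY_SUBMISSIONS = 5  # hypothetical limit; the contract gives no number


def mock_sandbox(code: str) -> dict:
    """Stand-in for real execution: never runs the code, returns canned output."""
    return {"stdout": "mock output\n", "stderr": "", "exit_code": 0}


def submit_code(learner: dict, code: str, sandbox=mock_sandbox) -> dict:
    """Check the learner's tier and daily count, then hand the code to the sandbox.

    `learner` stands in for the record read from the JSON state file.
    """
    used = learner.get("submissions_today", 0)
    is_free = learner.get("tier", "free") == "free"
    if is_free and used >= FREE_DAILY_SUBMISSIONS:
        return {"error": "daily_limit_reached", "limit": FREE_DAILY_SUBMISSIONS}

    learner["submissions_today"] = used + 1
    return sandbox(code)
```

Because the sandbox is passed in as a parameter, swapping the mock for a real one later would not change the tool's contract or its gating logic.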

Tool 9: get_upgrade_url

| Field | Value |
| --- | --- |
| Name | get_upgrade_url |
| Description | Generate a Stripe checkout link for upgrading from free to paid tier. Call this ONLY for free-tier learners who hit a paywall or ask about upgrading. Never call this for paid learners. |
| Inputs | learner_id (string, required) |
| Outputs | checkout_url (string): Stripe checkout link; current_tier (string): confirms learner is on free tier |
| Tier | Free only |
| Depends on | JSON state file, Stripe API |
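
The description tells the agent never to call this tool for paid learners, but a tool can also defend itself. Here is a minimal sketch of that guard. It makes no real Stripe calls: the checkout link comes from an injected function, and `mock_checkout_url`, its placeholder URL, and the error codes are all hypothetical.

```python
from typing import Callable


def mock_checkout_url(learner_id: str) -> str:
    """Placeholder link; a real version would ask the payment provider for one."""
    return f"https://example.invalid/checkout?learner={learner_id}"


def get_upgrade_url(
    learner_id: str,
    learners: dict,
    create_checkout_url: Callable[[str], str] = mock_checkout_url,
) -> dict:
    """Return a checkout link for free-tier learners only.

    `learners` stands in for the learner records in the JSON state file.
    """
    learner = learners.get(learner_id)
    if learner is None:
        return {"error": "learner_not_found"}

    tier = learner.get("tier", "free")
    if tier != "free":
        # Free only: paid learners have nothing to upgrade to.
        return {"error": "already_paid", "current_tier": tier}

    return {"checkout_url": create_checkout_url(learner_id), "current_tier": tier}
```

The description steers the agent and the guard catches the cases where steering fails.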

The Tier Access Matrix

Three categories. Every tool falls into exactly one:

| Category | Meaning | Tools |
| --- | --- | --- |
| All | Every learner can call these regardless of tier | register_learner, get_learner_state, update_progress, generate_guidance, assess_response |
| Gated | The tool checks the learner's tier before returning content. Free learners get limited access (chapters 1-5, limited exercises, limited code submissions). | get_chapter_content, get_exercises, submit_code |
| Free only | Only free-tier learners should call this tool. Paid learners have no reason to upgrade. | get_upgrade_url |

Notice the logic. "All" tools are the core tutoring loop: register, check state, teach, assess, record progress. Every learner needs these regardless of whether they pay.

"Gated" tools deliver the content that makes paying worthwhile: full chapter access, targeted exercises, code execution. Free learners get a taste (chapters 1 through 5). Paid learners get everything.

"Free only" has exactly one tool. Once a learner has paid, showing them a checkout link is pointless and confusing.
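
The matrix is small enough to hold in a single lookup table. The sketch below is one possible encoding, with hypothetical names `TIER_ACCESS` and `may_call`. Note that a "gated" tool is still callable by everyone; the limit on what it returns is applied inside the tool.

```python
# Category for each of the nine tools, copied from the tier access matrix.
TIER_ACCESS = {
    "register_learner": "all",
    "get_learner_state": "all",
    "update_progress": "all",
    "generate_guidance": "all",
    "assess_response": "all",
    "get_chapter_content": "gated",
    "get_exercises": "gated",
    "submit_code": "gated",
    "get_upgrade_url": "free_only",
}


def may_call(tool: str, learner_tier: str) -> bool:
    """Decide whether a learner on the given tier should reach this tool at all."""
    category = TIER_ACCESS[tool]
    if category == "free_only":
        return learner_tier == "free"
    # "all" and "gated" tools accept every tier; gated tools restrict
    # their own output after checking the learner's tier.
    return True


def tools_in(category: str) -> list[str]:
    """List the tools in one category, in alphabetical order."""
    return sorted(t for t, c in TIER_ACCESS.items() if c == category)
```

Adding a tier or moving a tool between categories then becomes a one-line change to the table.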

The Dependency Graph

Every tool depends on at least one resource. There are exactly four resource types in TutorClaw:

| Resource | What It Is | Where It Lives |
| --- | --- | --- |
| JSON state file | Learner records, progress, tier, confidence, exchanges | data/ directory |
| Local content directory | Chapter text and exercise files | content/ directory |
| Mock sandbox | Code execution environment (mock for now, real later) | In-process |
| Stripe API | Payment processing (mock for now, real in Lesson 14) | External service |

The dependency map:

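| Tool | JSON State | Local Content | Mock Sandbox | Stripe API |
| --- | --- | --- | --- | --- |
| register_learner | writes | | | |
| get_learner_state | reads | | | |
| update_progress | reads + writes | | | |
| get_chapter_content | reads (tier) | reads | | |
| generate_guidance | reads | | | |
| assess_response | reads + writes | | | |
| get_exercises | reads (tier + weak areas) | reads | | |
| submit_code | reads (tier) | | executes | |
| get_upgrade_url | reads (tier) | | | calls |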

Two patterns to notice. First: every tool depends on the JSON state file. It is the backbone of the entire system. If the state file is corrupted, every tool breaks. Second: only two tools touch external resources beyond JSON and content files. submit_code needs a sandbox; get_upgrade_url needs Stripe. Everything else is local file reads and writes.
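
Both patterns can be checked mechanically once the dependency map is written down as data. The sketch below copies the map into a dictionary; the resource keys and the function name `tools_affected_by` are hypothetical.

```python
# Which resources each tool touches, copied from the dependency map.
DEPENDENCIES = {
    "register_learner": {"json_state"},
    "get_learner_state": {"json_state"},
    "update_progress": {"json_state"},
    "get_chapter_content": {"json_state", "local_content"},
    "generate_guidance": {"json_state"},
    "assess_response": {"json_state"},
    "get_exercises": {"json_state", "local_content"},
    "submit_code": {"json_state", "mock_sandbox"},
    "get_upgrade_url": {"json_state", "stripe_api"},
}


def tools_affected_by(resource: str) -> list[str]:
    """List every tool that would break if this resource became unavailable."""
    return sorted(tool for tool, deps in DEPENDENCIES.items() if resource in deps)
```

`tools_affected_by("json_state")` lists all nine tools, while `tools_affected_by("mock_sandbox")` and `tools_affected_by("stripe_api")` each list exactly one.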

A Tutoring Session Through the Dependency Lens

Trace a single session to see how the tools chain together:

  1. Learner sends "Teach me about variables" via WhatsApp
  2. Agent calls get_learner_state (reads JSON) to check where the learner left off
  3. Agent calls get_chapter_content (reads JSON for tier, reads content directory) to fetch Chapter 1
  4. Agent calls generate_guidance (reads JSON) to get the PRIMM "predict" prompt
  5. Agent presents the prompt. Learner responds with a prediction.
  6. Agent calls assess_response (reads + writes JSON) to evaluate the prediction
  7. Agent calls update_progress (reads + writes JSON) to record the interaction

Seven steps, five tool calls. All five hit the JSON state file. One also hits the content directory. Zero hit Stripe or the sandbox. This is the typical pattern: most of a tutoring session lives inside state and content. Payment and code execution are occasional events, not every-turn events.
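
The same tally can be produced by walking the trace in code. In this sketch the seven steps are encoded as a list, of which five are tool calls (steps 2, 3, 4, 6, and 7), and each tool's resources are copied from the dependency map. The names `SESSION`, `RESOURCES`, and `resource_hits` are hypothetical.

```python
from collections import Counter

# Resources touched by each tool in this session, per the dependency map.
RESOURCES = {
    "get_learner_state": ("json_state",),
    "get_chapter_content": ("json_state", "local_content"),
    "generate_guidance": ("json_state",),
    "assess_response": ("json_state",),
    "update_progress": ("json_state",),
}

# The seven steps of the trace: ("tool", name) marks a tool call,
# ("chat", text) marks a message between learner and agent.
SESSION = [
    ("chat", "Learner sends 'Teach me about variables'"),
    ("tool", "get_learner_state"),
    ("tool", "get_chapter_content"),
    ("tool", "generate_guidance"),
    ("chat", "Agent presents the prompt; learner responds"),
    ("tool", "assess_response"),
    ("tool", "update_progress"),
]


def resource_hits(session: list) -> tuple:
    """Return (number of tool calls, Counter of resource hits) for a session."""
    calls = [name for kind, name in session if kind == "tool"]
    hits = Counter(res for name in calls for res in RESOURCES[name])
    return len(calls), hits
```

Running `resource_hits(SESSION)` gives five tool calls, five hits on the JSON state file, one on the content directory, and none on Stripe or the sandbox.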

Try With AI

Exercise 1: Write a Tool Contract from Scratch

Pick a tool that does NOT exist in TutorClaw yet: a get_learning_path tool that recommends the next three chapters based on the learner's confidence scores across all completed chapters.

I want to add a tool called get_learning_path to TutorClaw. It takes
a learner_id and returns the next three recommended chapters based
on the learner's confidence scores. Write the full tool contract:
name, description, inputs, outputs, tier access, and dependencies.

What you are learning: Writing a tool contract forces you to think through the agent's decision: when would the agent call this tool instead of another? The description must make that unambiguous.

Exercise 2: Find the Bottleneck

Look at the dependency graph above and identify the single resource that, if it became slow or unavailable, would affect the most tools.

Looking at TutorClaw's dependency graph, which resource is the
biggest bottleneck? If I wanted to make TutorClaw more resilient,
which dependency should I address first and why?

What you are learning: Dependency mapping reveals risk. The JSON state file is a single point of failure for all 9 tools. This is fine for a local prototype (the whole point of this chapter), but it is the first thing you would change when scaling to production.

Exercise 3: Redesign the Tier Matrix

TutorClaw currently has two tiers (free and paid). Imagine a third tier, "student," that gets chapters 1 through 10 and limited code submissions but no exercises.

TutorClaw currently has free and paid tiers. Design a third tier
called "student" that gets chapters 1-10, limited code submissions
(10 per day), but no exercises. Which tools need their contracts
updated? Write the updated tier access matrix.

What you are learning: Tier design is product design, not engineering. Changing a tier means updating tool contracts and gating logic, not rewriting code. The spec absorbs the change; Claude Code implements it.


James finished the last contract and leaned back. "Nine tools. Six fields each. That took longer than I expected."

Emma scanned the page. She paused at submit_code. "You listed 'mock sandbox' as the dependency. Walk me through why."

"Because we are not actually executing learner code in production yet. The mock returns hardcoded output so we can test the flow. Real sandboxing is a scaling problem for later, not a product problem for now."

Emma nodded slowly. "When I designed my first tool server, I spent two weeks building a real sandbox before I had a single tool working end to end." She paused. "I should have mocked it first and shipped."

James grinned. "Like shipping crates from the warehouse. You do not build custom crates for every product. You use standard boxes, ship the product, and upgrade the packaging when volume justifies it."

"Standard boxes." Emma wrote the phrase in the margin of the spec. "I am stealing that analogy for the next chapter." She closed the document. "Your nine job descriptions are done. Lesson 3: you hand the first three to Claude Code and watch it build them."