Context Engineer Your Tools

Emma set her coffee down and pulled up a screenshot on her phone. A product page. Five-star reviews on top, then a wall of one-star reviews at the bottom.

"This was mine," she said. "An agent product I shipped two years ago. Nine tools, clean architecture, solid tests. It died in eight weeks."

James leaned in. "What happened?"

"Tool selection. The agent called the billing tool when users asked about their learning progress. The word 'account' appeared in both descriptions. Users would say 'where is my account' meaning their learning profile, and the agent would pull up their payment history." She turned the phone off. "Two months. That is how long it lasted. We patched routing logic, we added conditional checks, we built a classifier on top. None of it worked. The fix was always in the descriptions."

"What was wrong with them?"

"They were one line each. Nine tools, nine lines. The agent had to guess from nine vague sentences which tool to call." She looked at his screen. "Your tool descriptions have the same problem. Let me show you."


You are doing exactly what James is doing. You have nine tools in your TutorClaw server and AGENTS.md orchestrating them. The tools work. The tests pass. But the descriptions that tell the agent when to call each tool are still the one-line versions from when you first built them.

In this lesson, you send three ambiguous messages from WhatsApp, watch the agent pick the wrong tool, then rewrite all nine descriptions using a pattern that fixes the problem.

Step 1: Send Three Ambiguous Messages

Open WhatsApp and send these three messages to your TutorClaw agent. Before each one, predict which tool should fire. Then watch the dashboard.

Message 1: "Help me understand where I am in the course"

Should trigger get_learner_state (position in curriculum). But "understand" might trigger generate_guidance, and "in the course" might trigger get_chapter_content. Watch which tool badge lights up.

Message 2: "I'm stuck on this concept and need help"

Should trigger generate_guidance (pedagogical support). But "stuck" might trigger get_exercises, and "need help" is so generic that any tool could claim it. Watch the dashboard.

Message 3: "Can I try something?"

Should trigger get_exercises (practice). But "try" might trigger submit_code, and the vagueness gives the agent nothing to work with. Watch the dashboard.

Record what happened:

| Message | Expected Tool | Actual Tool | Correct? |
| --- | --- | --- | --- |
| "Help me understand where I am in the course" | get_learner_state | ? | ? |
| "I'm stuck on this concept and need help" | generate_guidance | ? | ? |
| "Can I try something?" | get_exercises | ? | ? |

At least one should fire the wrong tool. If all three are correct, send increasingly ambiguous messages until you find a miss. Every multi-tool agent has a selection boundary where descriptions blur.

Step 2: The Two-Layer Description Pattern

The problem with one-line descriptions is that they answer only one question: "What does this tool do?" The agent also needs to know: "When should I call this tool instead of the other eight?"

The fix is a two-layer description.

Layer 1 (Short): One sentence for the agent's initial tool selection scan. Precise, no ambiguity. This is the job title on a resume.

Layer 2 (Behavioral): Detailed guidance about WHEN to use this tool, WHEN NOT to use it, and what to do with the result. This is the full job description with responsibilities and boundaries.

Here is what this looks like for get_chapter_content:

Before (one-line):

Fetch chapter content for the learner.

The agent sees "chapter content" and calls this tool any time someone mentions a chapter. That includes "what chapter am I on?" (wrong tool: that is get_learner_state) and "give me exercises from chapter 3" (wrong tool: that is get_exercises).

After (two-layer):

Layer 1: Fetch the text content for a specific chapter number.
Use ONLY when the learner needs to read course material.

Layer 2: Call this tool when the learner explicitly asks to read,
study, or review a chapter. NEVER call this when the learner asks
about their progress (use get_learner_state instead). NEVER call
this when the learner wants practice problems (use get_exercises
instead). After fetching content, pass it to generate_guidance
to create a PRIMM-structured teaching response.

Layer 1 tells the agent what the tool does. Layer 2 tells the agent how to make the right decision.
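In code, the two layers can live in a single description string, with a blank line separating the quick-scan sentence from the behavioral guidance. This is a minimal sketch, assuming a Python server where the description is a plain string passed at tool registration; the constant name and the layer-splitting convention are illustrative, not part of any SDK.

```python
from textwrap import dedent

# Illustrative two-layer description for get_chapter_content.
# Layer 1 is everything before the blank line; Layer 2 is everything after.
GET_CHAPTER_CONTENT_DESC = dedent("""\
    Fetch the text content for a specific chapter number.
    Use ONLY when the learner needs to read course material.

    Call this tool when the learner explicitly asks to read, study,
    or review a chapter. NEVER call this when the learner asks about
    their progress (use get_learner_state instead). NEVER call this
    when the learner wants practice problems (use get_exercises
    instead). After fetching content, pass it to generate_guidance
    to create a PRIMM-structured teaching response.
    """)

# Split on the blank line to recover the two layers.
layer1, _, layer2 = GET_CHAPTER_CONTENT_DESC.partition("\n\n")
```

Keeping both layers in one string means the agent sees them together during tool selection, while the blank-line convention keeps the quick-scan sentence easy to spot.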

Step 3: Add NEVER Statements

NEVER statements work because of how agents select tools. With nine tools and an ambiguous message, the agent eliminates poor matches first, then picks from what remains. Positive descriptions tell the agent what qualifies. NEVER statements tell the agent what to eliminate. Elimination narrows the field faster.

For each tool, write at least one NEVER statement that prevents the most common wrong selection:

| Tool | NEVER Statement | Why |
| --- | --- | --- |
| register_learner | NEVER call this for existing learners. If the learner already has an ID, they are already registered. | Prevents re-registration when someone says "sign me up for chapter 5" |
| get_learner_state | NEVER call this to change the learner's progress. This tool reads state only. Use update_progress to write changes. | Prevents confusion with update_progress |
| update_progress | NEVER call this to check where a learner is. Use get_learner_state to read current position. | Prevents confusion with get_learner_state |
| get_chapter_content | NEVER call this when the learner wants exercises or practice problems. Use get_exercises instead. | Prevents confusion with get_exercises |
| get_exercises | NEVER call this when the learner wants to read or study chapter material. Use get_chapter_content instead. | Prevents confusion with get_chapter_content |
| generate_guidance | NEVER call this to fetch raw content. This tool generates teaching responses, not chapter text. Use get_chapter_content for raw content. | Prevents confusion with get_chapter_content |
| assess_response | NEVER call this for general questions or conversation. Only call this when the learner submits an answer to a specific exercise or question. | Prevents triggering on casual messages |
| submit_code | NEVER call this for non-code messages. Only call this when the learner explicitly submits code to run. | Prevents triggering on "let me try this idea" |
| get_upgrade_url | NEVER call this for paid-tier learners. Check the learner's tier first using get_learner_state. | Prevents showing upgrade options to paying customers |

Each NEVER statement targets one specific confusion. Target the most common wrong selection, not every possible one.

Step 4: Add Cross-Tool References

NEVER statements tell the agent what not to do. Cross-tool references tell it what to do instead. If the agent reaches the wrong tool's description, the reference redirects it to the right one.

get_learner_state:

If the learner wants to update their progress after completing
something, use update_progress instead. If the learner wants chapter
content, use get_chapter_content instead.

generate_guidance:

If the learner just wants to read chapter material without teaching
structure, use get_chapter_content instead. If the learner wants
exercises to practice, use get_exercises instead.

get_exercises:

If the learner wants to submit code they already wrote, use
submit_code instead. If the learner wants to study before
practicing, use get_chapter_content first.

Cross-references create a navigation map inside your tool descriptions. An agent that lands on the wrong tool can redirect itself without failing the request.
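To keep all nine descriptions structurally consistent, the layers, NEVER statements, and cross-references can be assembled from one template. This helper is a hypothetical sketch (not part of any SDK); the function name, parameters, and the get_exercises example text are assumptions based on the pattern above.

```python
# Illustrative helper: compose a two-layer description so every tool
# follows the same shape (Layer 1, blank line, triggers, NEVERs, redirects).
def build_description(layer1: str, when: str,
                      never: list[str], cross_refs: list[str]) -> str:
    lines = [layer1, "", f"Call this tool when {when}."]
    lines += [f"NEVER {n}." for n in never]
    lines += [f"If {c}." for c in cross_refs]
    return "\n".join(lines)

desc = build_description(
    layer1="Fetch exercises for a specific chapter number.",
    when="the learner wants practice problems",
    never=["call this when the learner wants to read or study "
           "chapter material (use get_chapter_content instead)"],
    cross_refs=["the learner wants to submit code they already "
                "wrote, use submit_code instead"],
)
```

A shared builder also makes the Step 5 rewrite easier to review: every description diff has the same shape, so a missing NEVER statement or cross-reference stands out immediately.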

Step 5: Describe the Updates to Claude Code

Open Claude Code in your tutorclaw-mcp project and send this message:

I need to rewrite all 9 tool descriptions in the TutorClaw MCP server
using a two-layer pattern.

For each tool, the description should have:

Layer 1: One sentence stating what the tool does. Precise and
unambiguous.

Layer 2: Behavioral guidance with:
- WHEN to call this tool (specific triggers)
- WHEN NOT to call this tool (NEVER statements for the most
common wrong selection)
- Cross-references to related tools ("use X instead")

Here are the NEVER statements for each tool:

register_learner: NEVER call for existing learners
get_learner_state: NEVER call to change progress (read only)
update_progress: NEVER call to check current state (write only)
get_chapter_content: NEVER call for exercises or practice
get_exercises: NEVER call for reading material
generate_guidance: NEVER call for raw content
assess_response: NEVER call for general conversation
submit_code: NEVER call for non-code messages
get_upgrade_url: NEVER call for paid-tier learners

Update all 9 tool descriptions with this pattern.

Claude Code updates the descriptions across the server. It knows where they live because it built the tool registration code.

Step 6: Verify the Fix

Resend the same three messages from Step 1. Same words, same order.

Message 1: "Help me understand where I am in the course" should now fire get_learner_state. The NEVER statements on generate_guidance and get_chapter_content eliminate them as candidates.

Message 2: "I'm stuck on this concept and need help" should now fire generate_guidance. Its description says "Call this when the learner is stuck or needs pedagogical support." NEVER statements on get_exercises and get_chapter_content eliminate them.

Message 3: "Can I try something?" should now fire get_exercises. Its description says "Call this when the learner wants to practice." The NEVER statement on submit_code eliminates it.

| Message | Expected Tool | Actual Tool (Before) | Actual Tool (After) | Fixed? |
| --- | --- | --- | --- | --- |
| "Help me understand where I am in the course" | get_learner_state | ? | ? | ? |
| "I'm stuck on this concept and need help" | generate_guidance | ? | ? | ? |
| "Can I try something?" | get_exercises | ? | ? | ? |

If any still miss, read the description for the tool that incorrectly fired. Does it have a NEVER statement for this scenario? Does the correct tool's description have a clear positive match? Refine and test again.
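The two structural checks in that debugging loop can be automated with a small lint pass. This is a hypothetical sketch: it assumes the nine descriptions are available as a name-to-string dict, and it only checks for the presence of a NEVER statement and at least one mention of another tool's name.

```python
# Hypothetical lint: flag descriptions missing a NEVER statement or a
# cross-reference to another tool. The dict and tool names are examples.
def lint_descriptions(descriptions: dict[str, str]) -> list[str]:
    problems = []
    names = set(descriptions)
    for tool, desc in descriptions.items():
        if "NEVER" not in desc:
            problems.append(f"{tool}: no NEVER statement")
        if not any(other in desc for other in names - {tool}):
            problems.append(f"{tool}: no cross-reference to another tool")
    return problems

problems = lint_descriptions({
    "a_tool": "Does A. NEVER use this for B (use b_tool instead).",
    "b_tool": "Does B.",
})
```

A lint like this cannot tell you a description is good, only that it is structurally incomplete; the behavioral test is still resending the three messages.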

Try With AI

Exercise 1: Find a Fourth Ambiguous Message

Invent a message that could plausibly trigger two or more tools. Test it against your updated descriptions:

Send this message to TutorClaw via WhatsApp: "I think I got it,
what is next?"

Which tool fires? Is it the right one? If not, which tool
description needs a NEVER statement or cross-reference to fix it?

What you are learning: Real users send messages that do not map cleanly to any single tool. Finding these boundary cases before your users do is the difference between a product that feels intelligent and one that feels broken.

Exercise 2: Audit the Description Lengths

Ask Claude Code to review the balance between Layer 1 and Layer 2 across all nine tools:

List all 9 tool descriptions. For each one, count the words in
Layer 1 (the short selection sentence) and Layer 2 (the behavioral
guidance). Flag any tool where Layer 2 is longer than 4 sentences.
Too much guidance dilutes the signal.

What you are learning: Tool descriptions follow the same principle as AGENTS.md: enough context to make the right decision, not so much that the agent drowns in instructions. Brevity is a design constraint, not a limitation.
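If you want to run the same audit locally, a rough sketch is below. It assumes the blank-line convention between layers and uses a crude punctuation-based sentence count; the 4-sentence threshold mirrors the exercise prompt above.

```python
import re

# Rough audit sketch: Layer 1 is the text before the first blank line,
# Layer 2 everything after. Sentence splitting here is approximate.
def audit(description: str, max_sentences: int = 4) -> dict:
    layer1, _, layer2 = description.partition("\n\n")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", layer2.strip()) if s]
    return {
        "layer1_words": len(layer1.split()),
        "layer2_sentences": len(sentences),
        "too_long": len(sentences) > max_sentences,
    }
```

Treat the numbers as a smell test, not a rule: a five-sentence Layer 2 that prevents a real confusion beats a three-sentence one that does not.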

Exercise 3: Compare Before and After

Ask Claude Code to show you the original one-line descriptions alongside the new two-layer versions:

Show me a before-and-after comparison of the tool descriptions
for get_learner_state, get_chapter_content, and generate_guidance.
Original one-line version next to the new two-layer version.
Which specific additions prevent the most confusion?

What you are learning: Context engineering is not about writing more. It is about writing the right constraints. The NEVER statements and cross-references you added are often fewer than 30 words per tool, but they change the agent's behavior dramatically.


James re-sent the three messages. He watched the dashboard.

Message 1: get_learner_state. Correct.

Message 2: generate_guidance. Correct.

Message 3: get_exercises. Correct.

"Two layers. NEVER statements. Cross-references." He leaned back. "The agent is not smarter. It just has better directions."

Emma nodded. "That is context engineering. You did not change the model. You did not change the code. You changed the context the model reads before it makes a decision." She picked up her coffee. "Two months. That is how long my product lasted because I thought one-line descriptions were enough. We built routing logic on top, conditional checks, a classifier to pre-sort messages. None of it worked. The fix was always in the descriptions. Thirty words per tool. That is all it needed."

James looked at the nine updated descriptions on his screen. "So the agent reads these every time it gets a message?"

"Every time. Nine descriptions, nine NEVER statements, nine sets of cross-references. The agent scans all of them, eliminates the ones that say NEVER for this scenario, and picks from what remains. The better your descriptions, the smaller the remaining set, the more accurate the selection."

"What is next?"

"You have nine tools. You have AGENTS.md. You have descriptions that actually work." She set her cup down. "You do not have tests. The restart test from Lesson 3 proved your state tools survive. The WhatsApp test from Lesson 8 proved your server connects. But you do not have a test suite that proves all nine tools work correctly in every scenario. Lesson 11: we build that."