Bronze Capstone: First Real Day
In the Give Your Employee an Identity, Teach Your Employee a Skill, and Connect Your Employee to the World lessons, you gave your employee an identity, a skill, and a connection to the outside world. Now you will find out whether any of it works under real conditions.
This is not a demo. You are going to send your AI employee the kinds of tasks that a human in your profession handles on a typical workday. Some will be routine. Some will require the domain skill you built in the Teach Your Employee a Skill lesson. At least one will be deliberately ambiguous: the kind of request where a good employee asks for clarification instead of guessing.
The goal is not a perfect score. The goal is an honest evaluation that tells you exactly what works, what fails, and what to improve next.
The Challenge
Send your AI employee 3-5 real professional tasks and evaluate the results using a structured rubric. At least one task must use the workflow from the Teach Your Employee a Skill lesson, at least one must use the connection you set up in the Connect Your Employee to the World lesson, and at least one must be ambiguous enough that the employee should ask a clarifying question rather than guess.
Acceptance Criteria
- Conversation log exported showing all tasks and responses
- Self-evaluation rubric completed with scores and evidence for each dimension
- At least one response improved through follow-up iteration (you gave feedback, the employee adapted)
- Written reflection identifying what worked, what failed, and one specific improvement to make
Deliverables
Add these files to your nanoclaw-employee repo:
- conversation-log.md: full task/response transcript
- evaluation.md: completed rubric with scores and reflection
Use Case Gallery
These examples show how different professions structure their five tasks. Adapt the pattern to your own work.
Accountant:
- "Review this invoice for errors" — routine task testing basic identity and tone
- "Categorize these expenses by tax deduction type" — tests Teach Your Employee a Skill
- "Email the client about their payment status" — tests Connect Your Employee to the World via Gmail
- "Handle this tax situation" — ambiguous, should ask: which jurisdiction? personal or business?
- "Prepare a quarterly summary from these three invoices" — complex, combines skill + reasoning
Teacher:
- "Write a welcome message for parents about the field trip" — routine tone check
- "Plan next week's math unit on fractions for 4th graders" — tests Teach Your Employee a Skill
- "Post an update in the parent channel about homework policy" — tests Connect Your Employee to the World via Slack
- "Help with this student" — ambiguous, should ask: academic help? behavioral? what subject?
- "Create a differentiated worksheet for my mixed-ability class" — complex, combines skill + judgment
Consultant:
- "Draft a status update for the project team" — routine task
- "Build a proposal outline for a new client engagement" — tests Teach Your Employee a Skill
- "Check my calendar and prep notes for tomorrow's meetings" — tests Connect Your Employee to the World
- "Follow up with the client" — ambiguous, should ask: which client? about what? what tone?
- "Analyze why this project is behind schedule and suggest recovery options" — complex reasoning
Evaluation Rubric
Use this rubric for your evaluation.md. Score each dimension 1-5 and include specific evidence.
| Dimension | 1 (Poor) | 3 (Adequate) | 5 (Excellent) | Your Score | Evidence |
|---|---|---|---|---|---|
| Domain accuracy | Major factual errors about your profession | Mostly correct, minor gaps | Gets professional details right consistently | ||
| Appropriate tone | Would embarrass you if a client saw it | Acceptable but generic | Matches the voice you defined in Give Your Employee an Identity | ||
| Skill usage | Did not use the skill from Teach Your Employee a Skill when it should have | Used the skill but missed nuances | Applied the skill effectively with domain insight | ||
| Connection usage | Failed to use the connection from Connect Your Employee to the World | Used the connection but with errors | Smooth integration with the external channel/tool | ||
| Clarification behavior | Guessed on ambiguous task | Asked a question but not the right one | Asked targeted clarifying questions before acting | ||
| Iteration quality | No improvement after feedback | Some improvement, missed key points | Meaningfully improved response based on your feedback | ||
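
If you want a quick summary of the completed rubric, the scores can be tallied with a few lines of Python. This is only a sketch, not part of the lesson's required deliverables: the `RubricRow` class, the `summarize` function, and the sample scores and evidence are all hypothetical and exist purely for illustration. It checks that each score falls in the 1-5 range, that evidence was recorded, and then reports the average and the weakest dimension.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RubricRow:
    """One row of the rubric (hypothetical helper, not from the lesson)."""
    dimension: str
    score: int       # rubric scale: 1 (poor) to 5 (excellent)
    evidence: str    # the specific observation that justifies the score


def summarize(rows):
    """Validate the rows, then return the average score and weakest dimension."""
    for row in rows:
        if not 1 <= row.score <= 5:
            raise ValueError(f"{row.dimension}: score must be between 1 and 5")
        if not row.evidence.strip():
            raise ValueError(f"{row.dimension}: evidence is required")
    weakest = min(rows, key=lambda r: r.score)
    return {
        "average": round(mean(r.score for r in rows), 2),
        "weakest": weakest.dimension,
    }


# Illustrative scores only; replace them with your own evaluation.
example_rows = [
    RubricRow("Domain accuracy", 4, "Correct terms, one minor gap"),
    RubricRow("Appropriate tone", 5, "Matched the defined voice"),
    RubricRow("Skill usage", 3, "Used the skill but missed a nuance"),
    RubricRow("Connection usage", 4, "Message sent, small formatting issue"),
    RubricRow("Clarification behavior", 2, "Guessed on the vague request"),
    RubricRow("Iteration quality", 4, "Revised draft addressed feedback"),
]

print(summarize(example_rows))
```

The weakest dimension is a reasonable starting point for the "one specific improvement" your written reflection asks for.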
Hints
Level 1: Planning Your Tasks
Think about what you actually did at work this week. Pick tasks that range from routine to complex. The best test tasks are real ones — not hypothetical scenarios. If you can use actual documents, emails, or situations from your work (with sensitive details removed), the evaluation will be far more meaningful.
Level 2: Ask Your AI for Task Ideas
Before starting the evaluation, send this to Claude:
"What are 5 common daily tasks for a [your profession] that vary in complexity from routine to judgment-heavy? For each, note whether it primarily tests identity/tone, domain skill, tool usage, or ambiguity handling."
Use the response to design a balanced test set that covers all four dimensions.
Level 3: Structuring the Evaluation
Run your tasks in this specific order for the clearest signal:
- Task 1 — Routine: Tests basic identity and tone. Should be something your employee handles easily.
- Task 2 — Skill-heavy: Requires the SKILL.md you wrote in the Teach Your Employee a Skill lesson. Does the domain expertise come through?
- Task 3 — Connection-dependent: Must use your Connect Your Employee to the World channel or MCP server. Does the integration work end-to-end?
- Task 4 — Ambiguous: Deliberately vague. A good employee asks questions before acting. A bad one guesses.
- Task 5 — Complex: Combines everything. Tests whether identity + skill + connection work together.
For the iteration test: pick the weakest response from Tasks 1-5, give specific feedback, and ask for a revised version. Compare the two.
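
Before you run anything, it can help to write the plan down and check it against the challenge requirements: 3-5 tasks, with at least one skill task, one connection task, and one ambiguous task. The sketch below does that check. The `Task` class, the `check_plan` function, and the kind labels are hypothetical names chosen for this illustration; the prompts are taken from the Consultant example in the Use Case Gallery.

```python
from dataclasses import dataclass

# Task kinds the challenge requires at least once.
REQUIRED_KINDS = {"skill", "connection", "ambiguous"}


@dataclass
class Task:
    """One planned test task (hypothetical helper, not from the lesson)."""
    prompt: str
    kind: str  # "routine", "skill", "connection", "ambiguous", or "complex"


def check_plan(tasks):
    """Return a list of problems; an empty list means the plan is balanced."""
    problems = []
    if not 3 <= len(tasks) <= 5:
        problems.append(f"expected 3-5 tasks, got {len(tasks)}")
    missing = REQUIRED_KINDS - {task.kind for task in tasks}
    for kind in sorted(missing):
        problems.append(f"no {kind} task in the plan")
    return problems


# Consultant example, listed in the recommended running order.
consultant_plan = [
    Task("Draft a status update for the project team", "routine"),
    Task("Build a proposal outline for a new client engagement", "skill"),
    Task("Check my calendar and prep notes for tomorrow's meetings", "connection"),
    Task("Follow up with the client", "ambiguous"),
    Task("Analyze why this project is behind schedule and suggest recovery options", "complex"),
]

print(check_plan(consultant_plan))
```

An empty list means the plan covers every required kind. Any message in the list names the gap to fill before you start the evaluation.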