
Superpowers: From Plan to Ship

James has a plan. Twelve tasks, each with file paths, actions, and verification steps. He generated it in the last lesson using brainstorming and writing-plans, and it sits in docs/plans/ ready for execution. But he has been here before. The sticky notes on his monitor have task breakdowns from three projects ago. The Notion board has a sprint plan from last month. Plans are easy. Execution is where plans go to die.

"I have a great plan," James says, scrolling through the task list. "But I always have great plans. The problem is somewhere around task 4, when I realize task 2 broke something in task 1, and by task 7 I've abandoned the plan entirely and I'm improvising."

Emma nods. "What if each task ran in a fresh context, so task 4 couldn't accidentally inherit the mistakes from task 2? And what if every task was reviewed against the plan before moving on, so you'd catch drift at task 2 instead of discovering it at task 7?"


You have the same plan James has. The brainstorming skill shaped your idea; the writing-plans skill broke it into granular steps. Now you will watch what happens when Claude executes that plan, tests it, reviews it, and prepares it for merging.

Executing the Plan

What do you think happens when you hand Claude a 12-task plan and say "execute it"? Before running anything, make a prediction: Does Claude attempt all 12 tasks at once? Does it ask permission before each task? What happens if task 3 fails?

Running Your Plan

If you just finished writing-plans in the same session, Claude will offer to execute the plan immediately. It presents two options (subagent-driven or parallel session) and you can say yes to start right away.

If you are returning to a plan later, open Claude Code in your project directory and ask Claude to execute it:

Execute the implementation plan in docs/plans/[your-plan-name].md

Either way, Claude activates one of two execution skills depending on the plan's complexity and your choice.

Two Execution Approaches

| Skill | How It Works | Best For |
| --- | --- | --- |
| executing-plans | Runs tasks in batches, pauses for your review between batches | Plans where you want checkpoint control |
| subagent-driven-development | Dispatches a fresh subagent per task with two-stage review | Plans with independent tasks that benefit from isolation |

Subagent-driven-development is the more interesting pattern. Here is what you observe when it runs:

  1. Claude reads Task 1 from the plan
  2. A fresh subagent launches with only the context needed for that task
  3. The subagent completes the work
  4. Stage 1 review: Did the subagent build what the plan specified? (spec compliance)
  5. Stage 2 review: Is the code well-structured? (code quality)
  6. Claude moves to Task 2 with a new subagent

Why fresh subagents matter: When a single Claude session handles all 12 tasks, context accumulates. By task 8, Claude carries the memory of tasks 1-7, including wrong turns, abandoned approaches, and patched workarounds. A fresh subagent for each task starts clean. It reads the plan, reads the relevant files, and executes without inheriting baggage from previous tasks.
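The loop above can be sketched in pseudocode. This is an illustrative sketch only, not the plugin's actual implementation; `run_subagent` and the two reviewer functions are hypothetical stand-ins for work the plugin delegates to real agents:

```python
# Hypothetical sketch of the subagent-driven loop. run_subagent and the
# reviewers stand in for real agent calls; only the shape of the loop matters.

def run_subagent(task):
    # A fresh subagent sees only the task description, never the
    # conversation history of earlier tasks.
    return {"task": task, "output": f"work for: {task}"}

def spec_compliant(result, task):
    # Stage 1 review: did the subagent build what the plan specified?
    return task in result["output"]

def good_quality(result):
    # Stage 2 review: is the work well-structured? (stubbed as always-true)
    return True

def execute_plan(tasks):
    completed = []
    for task in tasks:
        result = run_subagent(task)           # fresh context per task
        if not spec_compliant(result, task):  # Stage 1: spec compliance
            raise ValueError(f"spec drift on: {task}")
        if not good_quality(result):          # Stage 2: code quality
            raise ValueError(f"quality issue on: {task}")
        completed.append(result)
    return completed
```

The point of the sketch is the structure: every iteration starts from the plan alone, and both review gates must pass before the next task begins.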

What Two-Stage Review Looks Like

After each subagent completes its task, two checks run automatically:

Stage 1 (Spec Compliance): "Did you build what was asked?"

The reviewer compares the subagent's output against the plan's task description. If the plan said "Write SKILL.md with persona, core instructions, and output format" and the subagent created a SKILL.md with only a persona and no output format, Stage 1 flags the gap.

Stage 2 (Code Quality): "Is the code well-built?"

Separate from whether it matches the spec, the reviewer checks structure, naming, error handling, and maintainability.

This two-stage pattern prevents a common failure: code that looks clean but doesn't match what you asked for. Style and correctness are different questions, and Superpowers asks them separately.


Testing Before Code: The TDD Skill

James watches the subagents execute his plan. Task 2 involves writing the skill's core instructions. He expects Claude to write the SKILL.md first, then check if it works.

Instead, something surprising happens. Claude writes the expected behavior first:

## Test: Main-point detection on a clear paragraph

**Input:** "Named Ranges let you assign meaningful labels to cells.
Instead of remembering that B7 holds revenue, you name it 'Revenue'
and use that name in every formula."

**Expected output:** Skill identifies main point as "Named Ranges
assign meaningful labels to cells." No clarity issues flagged.

## Test: Main-point detection on a buried paragraph

**Input:** "There are many approaches to organizing data, and while
some prefer tables, others use ranges, but the key insight that most
people miss is that Named Ranges eliminate formula errors entirely."

**Expected output:** Skill flags main point as buried. Suggests
moving the key insight to the first sentence.

The test defines what the skill should do before the skill exists. This is not a mistake. This is RED.

The RED-GREEN-REFACTOR Cycle

The Superpowers TDD skill enforces a strict discipline:

| Phase | What Happens | Purpose |
| --- | --- | --- |
| RED | Write expected behavior that the skill cannot yet produce | Define what success looks like before building anything |
| GREEN | Write the minimum skill instructions to produce the expected output | Build only what is needed, nothing more |
| REFACTOR | Improve the skill instructions while keeping the tests green | Clean up without changing behavior |

The key insight: TDD is not about testing. It is about designing through constraints. The expected behavior tells you what to build before you build it. Without that definition, you decide what to build while building it, which is how scope creeps, edge cases get missed, and "quick fixes" become permanent.
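The same discipline applied to ordinary code looks like this. The example below is illustrative, not part of James's plan; `detect_buried_point` is a hypothetical function, and the heuristic is deliberately minimal:

```python
# RED: the tests are written first, against a function that does not exist yet.
# Running them at this point fails, which is the point: they define success
# before anything is built.

def test_main_point_first():
    assert detect_buried_point(
        "Named Ranges let you assign meaningful labels to cells. "
        "Instead of remembering that B7 holds revenue, you name it."
    ) is False  # main point leads the paragraph: nothing to flag

def test_main_point_buried():
    assert detect_buried_point(
        "There are many approaches to organizing data. Some prefer tables. "
        "The key insight that most people miss is that Named Ranges "
        "eliminate errors."
    ) is True   # main point arrives late: flag it

# GREEN: the minimum implementation that satisfies the tests above.
# (A real skill would use far richer heuristics; this is deliberately crude.)
def detect_buried_point(paragraph: str) -> bool:
    first_sentence = paragraph.split(".")[0]
    return "key insight" in paragraph and "key insight" not in first_sentence
```

Notice the order: the tests constrain the implementation, so GREEN contains exactly what RED demanded and nothing else.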

What Happens If You Skip RED

The TDD skill enforces this order strictly. If you (or a subagent) write the implementation before defining expected behavior, the skill flags it. This feels aggressive until you experience the alternative: checks written after the fact tend to verify what the skill already does rather than what it should do. They pass by definition and catch nothing.

Comparing Approaches

| Build First, Check After | Define First, Build After |
| --- | --- |
| Check describes what the skill happens to do | Check describes what the skill must do |
| Checks pass by default (they mirror what was built) | Checks fail first (they define requirements) |
| Edge cases discovered when users complain | Edge cases discovered during design |
| Changing the skill is risky (checks are fragile) | Changing the skill is safe (checks verify behavior) |

"I've checked my work after the fact for years," James admits. "The checks always pass on the first run. That should have been a warning sign."


Code Review Against the Plan

Your tasks are executed. Your tests are green. Before merging, one more step: code review.

Ask Claude to review your completed work:

Review the code changes against the original plan

Claude activates the requesting-code-review skill. This is not a generic code review. It reviews against your plan, checking whether the implementation matches the specification you created during brainstorming and planning.

How Spec-Based Review Works

The review examines each task from the plan and compares the implementation:

  • Did task 2 produce the skill instructions described in the plan? Not "are these good instructions?" but "are these the instructions you said you would write?"
  • Do the verification steps from the plan actually pass? The plan said "invoke skill on a sample paragraph, confirm it produces feedback in the expected format." Does it?
  • Did the implementation introduce scope the plan didn't include? Adding features the plan didn't specify is scope creep, even if the features seem useful.
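The three checks above reduce to a comparison between two sets: what the plan promised and what the implementation delivered. The sketch below is a simplified illustration (a real review reads prose and diffs, not labels), with all names hypothetical:

```python
# Illustrative sketch of spec-based review as a set comparison. The plan and
# the implementation are reduced to labeled deliverables for clarity.

def review_against_plan(planned: set, implemented: set) -> dict:
    return {
        "missing": planned - implemented,      # plan items never built
        "scope_creep": implemented - planned,  # built items the plan never asked for
        "matched": planned & implemented,      # items that line up
    }

findings = review_against_plan(
    planned={"persona", "core instructions", "output format"},
    implemented={"persona", "core instructions", "tone presets"},
)
# findings["missing"] == {"output format"}     -> a Stage 1 gap
# findings["scope_creep"] == {"tone presets"}  -> unplanned scope
```

Both failure directions matter: a missing deliverable breaks the spec, and an unplanned one is scope creep even when it looks useful.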

Severity Levels

Review findings come with severity classifications:

| Severity | Meaning | Action Required |
| --- | --- | --- |
| Critical | Blocks merge; implementation contradicts the plan or breaks existing functionality | Must fix before proceeding |
| Important | Should fix; implementation works but deviates from plan intent or has quality issues | Fix now or create follow-up task |
| Minor | Note for later; style preferences, optimization opportunities, documentation gaps | Address when convenient |

Disagreeing with the Review

Here is where critical evaluation becomes essential. The code review is not infallible. Sometimes the review flags something as "critical" that you know is correct because you have context the reviewer lacks.

Emma tells James about a project where a review flagged her skill design as "critical: does not match plan." The plan specified a summary report format, but during building she discovered that inline suggestions worked better for the use case. She updated the skill but not the plan. The review was technically correct (it didn't match the plan) but practically wrong (the plan was outdated).

"The review caught a real discrepancy," Emma says. "But the right response was to update the plan, not revert the skill. Knowing the difference requires your judgment, not the tool's."

When you disagree with a finding:

  1. Read the finding carefully. Does the review have a point you missed?
  2. If you still disagree, articulate why. "The plan specified X, but during implementation I discovered Y, which makes Z the better approach because..."
  3. Update the plan to match the new decision, or fix the code to match the plan. Either is valid; leaving them mismatched is not.

Finishing the Branch

All tasks executed. Tests green. Code reviewed. Time to ship.

Finish this development branch

Claude activates the finishing-a-development-branch skill and runs through a checklist:

Step 1: Verify tests pass

The skill runs the full test suite one final time. Not just the tests you wrote during TDD, but all existing tests. A common failure: your new code passes its own tests but breaks something elsewhere. The finishing skill catches this before you merge.

Step 2: Present merge options

| Option | What It Does | When to Use |
| --- | --- | --- |
| Merge | Merges the branch into the target branch directly | Small features, high confidence |
| PR | Creates a pull request for team review | Team projects, code that affects others |
| Keep | Keeps the branch alive without merging | Work-in-progress, waiting for feedback |
| Discard | Deletes the branch and its changes | Experiment that didn't work out |

Step 3: Clean up workspace

After your merge decision, the skill cleans up the git worktree (if you created one in Step 2 of the 7-step workflow). No orphaned branches, no forgotten worktrees cluttering your project.
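The three steps amount to a gated checklist: tests must pass, then the merge decision maps to a handful of git operations, then the worktree is cleaned up. The sketch below is a hypothetical illustration (the skill itself runs interactively); `finish_branch` returns the commands it would run rather than executing anything:

```python
# Hypothetical sketch of the finishing checklist. tests_pass would come from a
# real test run; the function returns git commands instead of executing them.

def finish_branch(choice: str, branch: str, tests_pass: bool) -> list:
    if not tests_pass:
        # Step 1 gate: never merge over a red test suite.
        raise RuntimeError("fix failing tests before finishing the branch")
    commands = {
        "merge":   [f"git merge {branch}", f"git branch -d {branch}"],
        "pr":      [f"git push -u origin {branch}"],  # then open the PR for review
        "keep":    [],                                 # leave the branch alive
        "discard": [f"git branch -D {branch}"],        # force-delete unmerged work
    }
    # Step 3: remove the worktree, assuming one was created earlier.
    cleanup = [f"git worktree remove ../{branch}"]
    return commands[choice] + cleanup
```

For example, `finish_branch("merge", "writing-clarity", tests_pass=True)` would yield the merge, the branch deletion, and the worktree cleanup, in that order.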

The Full Cycle, Complete

James merges his branch. Clean commits, each one verified by a subagent, tested before building, reviewed against the plan. The writing-clarity skill he brainstormed in Lesson 18 is now a working plugin.

"I used to spend half my time fixing mistakes from the other half," he says. "The plan caught scope issues before I built them. The tests caught problems before I introduced them. The review caught drift before I merged it."

Emma pauses. "I'll be honest: I still haven't figured out when to let execution run fully autonomously versus when to intervene. Some tasks are clear enough for subagent-driven-development. Others need the checkpoint pauses from executing-plans. I don't have a rule for which is which. I'm still calibrating."

She points at his merged branch. "But here's what matters. The 7-step workflow isn't about making you faster, though it does. It's about making your first attempt closer to your last attempt. The gap between 'what I planned' and 'what I shipped' used to be weeks of rework. Now it's a code review comment."

The full cycle, from a rough idea to merged code, used the same plugin you installed in Lesson 18. Seven skills, one workflow, one philosophy: think before you build, verify as you go, review before you ship. In the next lesson, you will see how Boris Cherny (Anthropic's Head of Claude Code) combines these patterns with parallel sessions, CLAUDE.md customization, and session management into a personal development system.

Try With AI

Execute your plan from Lesson 18:

Execute the implementation plan from Lesson 18. Watch how Claude
dispatches subagents for each task. Count how many subagents are
spawned. For each one, note: what task it is working on, whether
it passes Stage 1 (spec compliance), and whether it passes
Stage 2 (code quality).

What you're learning: How orchestrated execution works. Each subagent gets fresh context, preventing the accumulation of errors that happens when one long session handles everything. The two-stage review separates "did you build the right thing" from "did you build it well."


Test before you build:

Pick one task from your plan. Ask Claude to define the expected
behavior BEFORE building that task. Read the expected behavior
carefully: what does it require the output to look like? Now let
Claude build it. Compare: does the result match what you would
have built if you had started directly without defining success first?

What you're learning: Defining success first changes what you build. The expected behavior constrains the output, eliminating features you didn't ask for and exposing edge cases you would have missed. Work built to satisfy a definition looks different from work built from imagination.


Challenge a code review:

Run a code review on your executed plan. Read every finding.
Find one suggestion you disagree with. Write a response explaining
why your approach is better than what the review recommends.
Be specific: reference the plan, the implementation, and the
tradeoff the review missed.

What you're learning: Critical evaluation of AI feedback. Not every review finding is correct. The ability to accept valid criticism, reject invalid criticism, and articulate the difference is what separates someone who uses AI from someone who is used by AI.
