
Orchestrate Other Agents

James had a task that needed three things: research a market trend, analyze competitors, and draft a summary. He typed all three into one message and sent it. His agent produced a single paragraph that mentioned no sources, skipped the competitor analysis entirely, and ended with "Let me know if you need more detail."

"At the warehouse," James said, "when a large order came in with three product lines, I never asked one supplier to handle all three. I split it. One per supplier. Tracked who delivered first." He looked at his WhatsApp. "This agent tried to do all three in one pass and did none of them well."

"What tools does it have?" Emma asked.

James checked the dashboard. "Something called sessions_spawn. It can spawn a subagent to handle a subtask."

"Did it use it?"

"No. It answered directly."

Emma pulled up a chair. "Two problems. First: the model. Free-tier models see tools but do not use them. They answer badly instead of delegating well. Second: you have not told it how to work. At the warehouse, who decided to split orders across suppliers?"

"I did. The supplier did not decide that on their own."

"Same here. The orchestrator pattern is not automatic. You design the delegation. The agent executes it." She opened a new session. "Clear the context. Try again. Tell it exactly what to delegate and to whom."


You are doing exactly what James is doing: two agents share your WhatsApp number, with routing that sends each message to the right one. But they work in isolation. Now you learn to make agents delegate to each other. By the end of this lesson, your main agent will spawn subagents for complex tasks, and you will understand the concurrency model that determines how many customers can be served simultaneously.

The Model Limitation

Before you try any orchestration commands, you need to know this: the model must be capable enough to use the tools.

Free-tier and lightweight models exhibit a specific failure pattern with orchestration tools. The model does not simply ignore sessions_spawn. It actively fabricates reasons not to use it:

  1. You ask the agent to spawn a subagent
  2. The model claims it "does not have permission" or that the tool is "limited to specific types of sessions" (both hallucinations)
  3. You push back and ask if it has sessions_spawn
  4. The model admits it has the tool but invents restrictions that do not exist
  5. After two or three rounds of insistence, the model finally calls the tool, and it works

This is worse than ignoring the tool. The model generates plausible-sounding refusals that make you think the platform is broken when the problem is the model's reluctance to delegate. The same pattern from Lesson 8 (hallucinated cron flags) applies here: free-tier models treat tools as suggestions they can work around, not capabilities they should use.

The Workaround

Two steps:

  1. Clear the session context. Old conversation history containing refusal patterns causes the model to mimic those refusals. Start a new session.
  2. Be direct and persistent. The model may resist on the first attempt. Push back:
Do you have the sessions_spawn tool? Use it now. Spawn a subagent
that researches [your topic]. Do not answer directly.

If the model claims permission issues or limitations, those are hallucinations. The tool works. Insist.

With a capable model (Claude Sonnet, GPT-4 class), this workaround is unnecessary. The model recognizes when delegation is appropriate and uses the tool unprompted. The $50-100/month model cost from Lesson 14's deployment budget is not optional for production orchestration.

Model Quality and Orchestration

If your model fabricates reasons not to use sessions_spawn (permission errors, authorization limits, scope restrictions), those are hallucinations, not real platform constraints. Test with persistent prompting. If the tool fires after you insist but not on natural requests, your model is not reliable enough for autonomous orchestration. Upgrade before deploying to customers.

Subagent Delegation with sessions_spawn

Your agent has a tool called sessions_spawn. It creates a subagent that runs a task independently and "announces" the result back to your chat. You will try this in Exercise 1.

Subagents run in the background by default. Your main agent remains responsive while the subagent works. When the subagent finishes, its result appears as an announcement in your chat. You do not wait for it; you can send other messages while it runs. This is the same pattern as running a background job at a warehouse: you dispatch the order, continue with other work, and get notified when it ships.
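The dispatch-and-notify pattern can be sketched in plain Python with `concurrent.futures`. This is a hypothetical illustration of the behavior described above, not OpenClaw's implementation; the function and callback names are made up:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def research_task(topic):
    # Stand-in for the subagent's work; sleep simulates slow model calls.
    time.sleep(0.1)
    return f"summary of {topic}"

def announce(future):
    # Fires when the subagent finishes, like the announcement in your chat.
    print("ANNOUNCE:", future.result())

executor = ThreadPoolExecutor()
future = executor.submit(research_task, "AI agent platforms")  # dispatch
future.add_done_callback(announce)                             # notify later
print("main agent still responsive")  # prints before the task completes
executor.shutdown(wait=True)
```

The main thread keeps running (and could handle other messages) while the worker executes; the callback delivers the result when it is ready.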

Managing Subagents

Two commands cover what you need right now:

Command              Purpose
/subagents list      Show active and recent subagents
/subagents kill #1   Stop a subagent

/subagents list shows each subagent's model, runtime, status, and token usage; you will use it in Exercise 1. /subagents kill stops a subagent that is taking too long or producing unwanted results.

Nesting: Orchestrator Patterns

By default, subagents cannot spawn their own subagents. To enable orchestrator patterns (where a coordinator spawns multiple workers), increase maxSpawnDepth:

openclaw config set agents.defaults.subagents.maxSpawnDepth 2

With depth 2: the main agent (depth 0) spawns an orchestrator (depth 1), which spawns workers (depth 2). Results cascade upward: workers announce to orchestrators, orchestrators announce to you.
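The depth check can be sketched as a recursive function. This is a hypothetical model of the limit's semantics (names invented for illustration), not OpenClaw's internals:

```python
MAX_SPAWN_DEPTH = 2  # mirrors agents.defaults.subagents.maxSpawnDepth

def spawn(child_depth, fan_out=0):
    """Each spawn checks the child's depth against the limit;
    results cascade upward to the parent."""
    if child_depth > MAX_SPAWN_DEPTH:
        raise RuntimeError("spawn depth exceeded")
    if fan_out:
        # An orchestrator at this depth fans work out to its own subagents.
        return {"depth": child_depth,
                "workers": [spawn(child_depth + 1) for _ in range(fan_out)]}
    return {"depth": child_depth}

# Main agent (depth 0) spawns an orchestrator (depth 1) with 3 workers (depth 2).
tree = spawn(1, fan_out=3)
```

With the default depth of 1, the orchestrator's own `spawn` calls would exceed the limit and fail; raising the limit to 2 is what makes the worker layer legal.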

Tracking Nested Flows

When subagents spawn other subagents, the work forms a tree. Use openclaw tasks flow list to see active flows and openclaw tasks flow cancel <id> to stop an entire tree at once. You will not need this until you increase maxSpawnDepth above 1.

The Two-Layer Concurrency Model

With orchestration working, the natural question is: what happens when multiple customers message simultaneously? The answer is a two-layer queueing system.

Layer 1: Session Lane (Per-Customer)

Every customer gets their own session lane. Within a session lane, maxConcurrent is 1. Messages from the same customer are processed sequentially.

Customer A sends: "Book 2pm tomorrow"
Customer A sends: "Wait, make that 3pm"

Session Lane A: [Book 2pm] → [Change to 3pm]
↑ processed first ↑ waits

Why sequential? Because conversation context matters. If "Book 2pm" and "Change to 3pm" run in parallel, you get a race condition: the booking might happen before the correction is processed. Sequential processing within a session guarantees message order.
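The ordering guarantee is just FIFO processing. A minimal sketch (illustrative only; the message parsing is invented):

```python
from queue import Queue

session_lane = Queue()  # FIFO: preserves message order within one customer

session_lane.put("Book 2pm tomorrow")
session_lane.put("Wait, make that 3pm")

booking = {}
while not session_lane.empty():
    msg = session_lane.get()       # strictly in arrival order
    if msg.startswith("Book"):
        booking["time"] = "2pm"
    elif "3pm" in msg:
        booking["time"] = "3pm"    # the correction lands after the booking

# booking["time"] is "3pm": the later message always wins
```

If the two messages ran in parallel, either one could finish last, and the stale "2pm" could overwrite the correction. Sequential draining makes the final state deterministic.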

Layer 2: Global Lane (Shared)

The global lane is shared across all session lanes. Its maxConcurrent defaults to 4.

Session Lane A ──┐
Session Lane B ──┤
Session Lane C ──┼── Global Lane (max 4 concurrent) ──→ Model
Session Lane D ──┤
Session Lane E ──┘ ↑ Customer E waits here

When a session lane's message is ready, it enqueues into the global lane. The global lane allows up to 4 concurrent executions. Four different customers can have their messages processed simultaneously.

What Happens with 5 Simultaneous Customers

Five customers message at the same second:

  1. Each message enters its own session lane
  2. All five session lanes enqueue into the global lane
  3. The global lane accepts the first 4 immediately
  4. Customer 5 queues in the global lane
  5. When one of the 4 finishes (typical agent turn: 3-8 seconds), customer 5 is dequeued

Customer 5 waits approximately 3-8 seconds. The wait time equals the processing time of the fastest of the first four. Not minutes. Seconds.
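The arithmetic generalizes to any burst size. A small calculator, under the simplifying assumption that every turn takes the same fixed time and all messages arrive in the same second:

```python
import math

def start_delay(position, max_concurrent=4, turn_seconds=5.0):
    """Seconds a customer waits before processing starts.
    Customers 1..max_concurrent are wave 0 and start immediately."""
    wave = (position - 1) // max_concurrent
    return wave * turn_seconds

def time_to_clear(customers, max_concurrent=4, turn_seconds=5.0):
    """Seconds until an entire simultaneous burst is processed."""
    return math.ceil(customers / max_concurrent) * turn_seconds

start_delay(5)       # 5.0: customer 5 waits exactly one wave
time_to_clear(20)    # 25.0: a 20-message blast clears in five waves
```

Real turns vary between roughly 3 and 8 seconds, so treat these as estimates of the wave count, not exact latencies.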

Scaling to 55 Customers

Real-world messaging is bursty but not uniform:

Time of Day       Active Customers   Simultaneous Messages    Queue Behavior
Morning check     10-15              1-2 at same second       No queueing
Peak hour         20-25              2-3 at same second       Rare, brief queueing
Blast scenario    20+ respond        10-20 at same second     Clears in waves, ~25 sec total

With maxConcurrent=4, even the worst case (a blast notification triggers 20 simultaneous responses) clears in waves of 4. Four served immediately, four more after the first wave finishes (~5 seconds), and so on. The full queue clears in roughly 25 seconds. Acceptable for WhatsApp, where the human expectation is a response within a minute.

No Cross-Contamination

Session lanes are completely independent. Customer A's conversation history, context, and memory are never visible during Customer B's agent turn. The global lane controls concurrency (how many run at once), not isolation (what each sees). Isolation is handled by the session system from Lesson 2.

Adjusting Concurrency

The default is configurable:

openclaw config get agents.defaults.maxConcurrent

On a more powerful server, increase it:

openclaw config set agents.defaults.maxConcurrent 8

One constraint: your model provider must handle the parallel request volume. At maxConcurrent=4, that is 4 simultaneous API calls. Free-tier providers with 15 requests per minute will hit their rate limit within seconds. This is another reason production orchestration requires a paid model provider.
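The steady-state request rate is easy to estimate. A rough sketch, assuming each concurrency slot issues one model call per agent turn (a simplification; real turns may make several calls):

```python
def requests_per_minute(max_concurrent, avg_turn_seconds):
    """Approximate steady-state API call rate when all slots stay busy."""
    return max_concurrent * (60 / avg_turn_seconds)

requests_per_minute(4, 5)   # 48.0 calls/min: triple a 15 rpm free tier
```

Even halving the assumption still exceeds a 15 requests-per-minute cap, which is why a free-tier provider saturates within the first wave.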

ACP: Run Claude Code from WhatsApp

ACP (Agent Client Protocol) is how OpenClaw controls external coding agents. It is not a theoretical API but a working bridge to Claude Code, Codex, Cursor, Copilot, Gemini CLI, and other supported harnesses. From WhatsApp, you can spawn a Claude Code session that reads your codebase, runs commands, and reports back.

Enable the ACP Plugin

The acpx plugin ships bundled with OpenClaw but is not enabled by default. Run /acp doctor in your WhatsApp chat to check:

/acp doctor

If the output shows ACP_BACKEND_MISSING, follow the installation steps that /acp doctor prints. The exact install command depends on your system and how you installed OpenClaw. After installing, enable it:

openclaw config set plugins.entries.acpx.enabled true
openclaw config set acp.enabled true
openclaw config set plugins.entries.acpx.config.permissionMode approve-reads
openclaw gateway restart

The permissionMode line is critical. ACP sessions are non-interactive: Claude Code cannot prompt you for permission through WhatsApp. Without a permission mode set, every file read or command execution is denied and you get ACP_TURN_FAILED: Permission denied. The three options:

Mode            What it allows
approve-reads   Read files and list directories. Block writes/exec
approve-all     Read, write, and execute commands
deny-all        Block everything (useful for testing setup only)

Start with approve-reads. Move to approve-all only when you trust the task and understand the risk.
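The three modes reduce to a simple policy table. A hypothetical sketch of the semantics (the dictionary and function names are invented for illustration, not OpenClaw's internals):

```python
# Each mode maps action categories to allow/deny.
POLICIES = {
    "approve-reads": {"read": True,  "write": False, "exec": False},
    "approve-all":   {"read": True,  "write": True,  "exec": True},
    "deny-all":      {"read": False, "write": False, "exec": False},
}

def allowed(mode, action):
    """Non-interactive gate: no prompt is possible, so the answer
    comes entirely from the configured mode."""
    return POLICIES[mode][action]

allowed("approve-reads", "read")   # True: file reads pass
allowed("approve-reads", "exec")   # False: command execution is denied
```

This is why an unset mode breaks everything: with no policy to consult, every action falls through to a denial.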

Run /acp doctor again. The output should show healthy: yes before you continue.

Spawning an External Agent

From WhatsApp:

/acp spawn claude --bind here

The --bind here flag binds the Claude Code session to your current conversation so that /acp steer commands reach it. Without --bind here, the session spawns but is unbound and you cannot send it instructions.

ACP sessions are persistent by default. The session stays alive after completing a task, and you can send multiple /acp steer commands to the same session. This is a continuous conversation with Claude Code, not a one-shot task.

Wait for the spawn confirmation before sending any commands. The session takes a few seconds to initialize. If you send /acp steer before the session is ready, you get ACP_SESSION_INIT_FAILED because the steer targets your main WhatsApp agent instead of the spawned Claude Code session.

Once the spawn confirms, send it work:

/acp steer Summarize the README.md in my current project

Claude Code may take 10-30 seconds to respond depending on the task. Check /acp status to confirm the session is processing. When done, close the session:

/acp status
/acp close

Thread mode on Discord and Slack

On channels that support threads (Discord, Slack), use --thread auto to place the ACP session in its own thread for continuous back-and-forth work. WhatsApp does not support threads, so --bind here is the only option.

Your Agent Can Spawn Claude Code Too

You used /acp spawn to manually start a Claude Code session. Your agent can do this on its own. The sessions_spawn tool supports runtime="acp", which means the agent can programmatically delegate a coding task to Claude Code without you typing any slash command.

Ask your agent:

Review the failing tests in my project and summarize the issues.
Use sessions_spawn with runtime acp to delegate this to Claude Code.

The agent spawns a Claude Code session, sends it the task, and announces the result back to your chat. This is how your personal AI employee hands off technical work to a coding specialist: you describe what you need, and the agent decides whether to handle it directly or delegate to Claude Code.

ACP sessions run on your machine

ACP sessions are NOT sandboxed. Claude Code spawned via /acp spawn claude has the same filesystem access as Claude Code running in your terminal. The permissionMode you configured in the setup section applies to all ACP sessions, whether you spawn them manually or the agent spawns them programmatically.

The Agent OS Mental Model (Expanded)

In Lesson 8, you learned the Agent OS mental model: gateway as kernel, workspace files as firmware, heartbeats as cron daemon, plugins as device drivers. The concurrency model adds a new layer:

OS Concept          OpenClaw Equivalent
Kernel              Gateway (routes messages, manages sessions)
Firmware            Workspace files (loaded every message)
Cron daemon         Heartbeats and cron jobs
Device drivers      Plugins (TTS, tools, hooks)
Process scheduler   Two-layer concurrency (session + global)
Networking          Channels (WhatsApp, Discord, Telegram)

The process scheduler analogy is exact. An operating system multiplexes CPU time across processes. OpenClaw multiplexes model inference across customer sessions. Session lanes are per-process queues. The global lane is the CPU scheduler. maxConcurrent is the number of cores.

Queue Internals (Brief)

Three details worth knowing for debugging:

  • Generation tracking: Each lane has a generation counter that increments on gateway restart. Stale tasks from a previous run are ignored. This prevents zombie tasks from corrupting queue state.
  • Gateway draining: When the gateway shuts down, new enqueue attempts fail immediately instead of silently queuing work that will never execute.
  • Wait callbacks: Tasks that sit in queue beyond 2 seconds trigger a warning. This is how the gateway detects congestion before customers notice.
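Generation tracking in particular is worth seeing concretely. A hypothetical sketch of the idea (class and method names invented; not OpenClaw's actual queue code):

```python
class Lane:
    """Tasks enqueued before a restart carry a stale generation
    number and are dropped instead of executing as zombies."""

    def __init__(self):
        self.generation = 0
        self.tasks = []

    def enqueue(self, task):
        self.tasks.append((self.generation, task))

    def restart(self):
        self.generation += 1  # a gateway restart invalidates prior work

    def drain(self):
        # Only current-generation tasks execute.
        return [t for gen, t in self.tasks if gen == self.generation]

lane = Lane()
lane.enqueue("old task")
lane.restart()
lane.enqueue("new task")
lane.drain()   # only "new task" survives the restart
```

The same counter comparison is what lets the queue discard stale work in O(1) per task, with no need to scrub the queue on restart.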

Try With AI

Exercise 1: Spawn a Subagent

Send your agent a delegation request on WhatsApp:

Use sessions_spawn to research the top 3 trends in AI agent
platforms for 2026. Spawn a subagent for this task.

Watch for the announcement when the subagent finishes. While it runs, check its status:

/subagents list

If the agent answered directly without spawning, your model ignored the tool. Start a new session and try with more forceful prompting, or upgrade your model.

What you are learning: sessions_spawn delegates work to a subagent. The subagent runs independently and announces results back. /subagents list shows running subagents. Model quality determines whether delegation works.

Exercise 2: Spawn Claude Code via ACP

If you have Claude Code installed, try spawning it from WhatsApp:

/acp spawn claude --bind here

Wait for the spawn confirmation, then send it a task:

/acp steer Summarize the README.md in my current project

Check session status and close when done:

/acp status
/acp close

If /acp doctor shows errors, follow the setup steps in the ACP section above before retrying.

What you are learning: ACP turns external coding agents into OpenClaw-managed sessions. You control Claude Code, Codex, or Gemini CLI from the same WhatsApp chat you use for everything else. This is how your personal AI employee delegates technical work to coding specialists.

Exercise 3: Understand the Queue

Ask your agent on WhatsApp:

If I have maxConcurrent set to 4 and 7 customers message at the
same second, each taking 5 seconds to process, how long does
customer 7 wait? Explain the two-layer concurrency model.

Compare the agent's answer with your own reasoning. Customers 1-4 start immediately. Customers 5-7 queue. After 5 seconds, the first 4 finish and 5-7 start. Customer 7 waits approximately 5 seconds, not 15.

What you are learning: The concurrency model processes in parallel waves, not sequentially. Understanding this lets you predict latency and tune maxConcurrent for your workload.


When Emma came back, James had a calculation written on a sticky note. "Four parallel, one queues. Five seconds worst case for number five." He held up the note. "At 55 customers, peak hour is maybe 3 simultaneous. No queueing at all."

Emma looked at the note, then at his screen. "You calculated that without being asked."

"Token costs in Lesson 4. Heartbeat batching in Lesson 8. This is the same thinking. How much does it cost, how long does it take, what is the constraint." He peeled the sticky note off the monitor. "The constraint is the model provider, not the gateway. Free tier at 15 requests per minute would choke at maxConcurrent=4."

"What about the spawn test?"

"First attempt: the model ignored the tool completely. Answered directly with garbage. I cleared the session, sent the explicit prompt, and sessions_spawn fired. Fourteen seconds for the subagent to run." He paused. "Then I spawned Claude Code with /acp spawn claude. From WhatsApp. It reviewed a test file and summarized the issues. I controlled a coding agent from my phone."

Emma leaned back. "The model is the weakest link."

"The model is always the weakest link. Everything else is infrastructure."

She almost smiled. "You sound like an engineer."

"I sound like someone who spent ten lessons breaking things and reading logs."

James leaned back. "At the warehouse, we had a conveyor system with four packing stations. Orders queued at the entrance. Four orders packed simultaneously. The fifth waited until a station opened. Same math. Same constraint: throughput equals stations times speed."

Emma was quiet for a moment. "The concurrency math holds for steady traffic. Burst patterns are where I am less certain. Twenty customers responding to the same blast notification within the same second, each with a spawned subagent. The queue model says it clears in waves, but I have not tested that at the edge." She closed her laptop. "You have two agents, orchestration, and a concurrency model you understand. Before we add anything else, we should think about who approves what these agents do. Hooks and security. Lesson 13."
