
Axiom I: Shell as Orchestrator

A junior developer inherits a legacy project. The build process is a 400-line bash script called deploy.sh. It downloads dependencies, compiles code, runs tests, builds Docker images, pushes to a registry, updates Kubernetes manifests, and sends Slack notifications. When it breaks — and it breaks often — nobody can debug it because the logic is tangled with the coordination. Variable names collide. Error handling is inconsistent. A failed test still triggers the deployment because someone forgot an exit 1 three months ago.

The senior engineer on the team rewrites the entire process in a weekend. The new version: a 12-line Makefile. Each target calls a proper program. The Python test suite handles testing. Docker handles image building. kubectl handles deployment. A small Go binary handles notifications. The Makefile does nothing except decide what runs, in what order, with what inputs.

The system went from fragile to obvious. Not because the senior wrote better bash. Because she stopped using bash for computation and started using it for what it was designed for: orchestration.


The Problem Without This Axiom

When developers first encounter the shell, they treat it as a programming language. They write loops, parse strings, manipulate data, implement business logic — all inside .sh files. This works for small tasks but collapses at scale.

The symptoms are predictable:

  • Debugging becomes archaeology. A 300-line bash script has no type system, no stack traces, no IDE support. When it fails on line 247, you read from line 1.
  • Testing becomes impossible. You cannot unit test a bash function that depends on global state, environment variables, and the output of twelve prior commands.
  • Collaboration becomes hazardous. Two developers editing the same deployment script inevitably break each other's assumptions about variable scope.
  • AI agents cannot reason about it. An AI reading a 500-line bash script sees an opaque wall of string manipulation. An AI reading a 12-line Makefile sees clear intent: build this, test that, deploy there.

The root cause in every case: computation and coordination are tangled together. The script is simultaneously deciding what to do and how to do it. These are fundamentally different responsibilities.


The Axiom Defined

Axiom I: The shell is the universal coordination layer for all agent work. Programs do computation; the shell orchestrates programs.

This axiom draws a clear architectural boundary:

| Responsibility | Belongs To | Examples |
| --- | --- | --- |
| Coordination | Shell | Sequencing, parallelism, piping, error routing, environment setup |
| Computation | Programs | Data transformation, business logic, parsing, validation, algorithms |

The shell's job is to answer: What runs? In what order? With what inputs? What happens if it fails?

A program's job is to answer: Given this input, what is the correct output?

When you respect this boundary, every component becomes independently testable, replaceable, and understandable. When you violate it, you get the 400-line deploy.sh.


From Principle to Axiom

In Chapter 4, you learned Principle 1: Bash is the Key — terminal access is the fundamental capability that makes AI agentic rather than passive. That principle answered the question: What enables agency?

This axiom answers a different question: How should the shell be used once you have it?

|  | Principle 1 | Axiom I |
| --- | --- | --- |
| Question | What enables agency? | How should the agent use the shell? |
| Answer | Terminal access | As an orchestration layer |
| Focus | Capability | Architecture |
| Level | "Can I act?" | "How should I act?" |
| Metaphor | Having a key to the building | Knowing which rooms to use for what |

The principle gave you access. The axiom gives you discipline. An agent that has terminal access but uses it for 500-line computation scripts is like a conductor who grabs a violin mid-performance — technically capable, architecturally wrong.


Practical Application

Composition Primitives

The shell provides three orchestration primitives that cover the vast majority of coordination needs:

Pipes connect programs into data pipelines. Each program does one thing; the shell routes data between them.

# Orchestration: the shell routes data between four programs
# Each program handles its own computation
cat server.log | grep "ERROR" | sort -t' ' -k2 | uniq -c

Here, cat reads, grep filters, sort orders, uniq counts. The shell contributes zero logic of its own; it only connects the programs.

Exit codes communicate success or failure between steps.

# Orchestration: the shell decides what happens based on program results
python run_tests.py && docker build -t myapp . && docker push myapp:latest

The && operator is pure orchestration: "run the next program only if the previous one succeeded." The shell makes no judgment about what "success" means — it trusts the program's exit code.
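When the failure path needs its own step (a message, a rollback, a ticket), the same routing can be written as an explicit conditional. This is a sketch reusing the commands above; the error message is illustrative:

# Same orchestration, with an explicit failure branch
if python run_tests.py; then
    docker build -t myapp . && docker push myapp:latest
else
    echo "Tests failed; skipping build and push" >&2
    exit 1
fi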

Redirection routes data to files, devices, or other processes.

# Orchestration: the shell routes output to appropriate destinations
python analyze.py < input.csv > results.json 2> errors.log

The shell connects the program to its inputs and outputs. The program never knows or cares where its data comes from or goes.

Makefiles as Orchestration

When coordination involves multiple steps with dependencies, a Makefile expresses the relationships declaratively:

# This entire file is orchestration. Zero computation.
.PHONY: all test build deploy clean

all: test build deploy

test:
	python -m pytest tests/ --tb=short
	npm run lint

build: test
	docker build -t taskapi:latest .

deploy: build
	kubectl apply -f k8s/deployment.yaml
	kubectl rollout status deployment/taskapi

clean:
	rm -rf dist/ __pycache__/ .pytest_cache/
	docker rmi taskapi:latest 2>/dev/null || true

Notice what the Makefile does NOT do:

  • It does not parse test output to decide if tests passed (pytest handles that via exit codes)
  • It does not implement Docker image layer logic (Docker handles that)
  • It does not manage Kubernetes rollout strategy (kubectl handles that)

The Makefile's only job: sequence the programs and respect their exit codes. This is orchestration in its purest form.
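Invoking individual targets shows how Make walks the dependency graph. These commands assume the Makefile above; they are typical usage, not part of the file:

make test     # runs only the test target
make build    # runs test first, because build depends on test
make deploy   # runs test, then build, then deploy; the chain stops at the first failure
make clean    # no dependencies; removes build artifacts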

The Shell in Agent Workflows

When Claude Code works on your project, observe what it actually does in the terminal:

# Claude Code's typical workflow is pure orchestration:
grep -r "def process_payment" src/ # Find the function (grep does the search)
python -m pytest tests/test_payment.py # Run relevant tests (pytest does the testing)
# [edits the file using its own capabilities]
python -m pytest tests/test_payment.py # Verify the fix (pytest validates)
git diff # Show what changed (git does the diffing)

The AI agent uses the shell exactly as Axiom I prescribes: as a coordination layer that invokes specialized programs. It does not write 50-line bash scripts to parse test output. It does not implement custom search algorithms in awk. It calls programs and interprets their results.

This is why shell access makes AI agents effective — the shell gives them a universal interface to all tools, and the orchestration pattern means they never need to reimplement tool logic.


The Complexity Threshold

Not every piece of shell code is orchestration. The moment your shell script starts doing computation, you have crossed the complexity threshold and should extract that logic into a proper program.

Heuristics for detecting the threshold:

| Signal | Shell (Orchestration) | Program (Computation) |
| --- | --- | --- |
| Lines of logic | Under 20 lines | Over 20 lines of actual logic |
| Control flow | Linear or single conditional | Nested loops, complex branching |
| String manipulation | Filenames and paths | Parsing, formatting, transformation |
| Error handling | Exit codes and set -e | Try/catch, recovery strategies, retries |
| State | Environment variables for config | Data structures, accumulators, caches |
| Testing | Not needed (trivial coordination) | Required (complex logic) |

Example: Crossing the threshold

This starts as orchestration but has crossed into computation:

# BAD: This is computation disguised as shell
total=0
for file in $(find . -name "*.py"); do
    module=$(echo "$file" | sed 's|^\./||' | sed 's|/|.|g' | sed 's|\.py$||')
    if python -c "import $module" 2>/dev/null; then
        count=$(grep -c "def " "$file")
        total=$((total + count))
        if [ "$count" -gt 10 ]; then
            echo "WARNING: $file has $count functions, consider splitting"
        fi
    fi
done
echo "Total functions in importable modules: $total"

The fix: extract the computation into a program.

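The original analyze_modules.py is not reproduced here; the following is a minimal sketch of what such a program might look like, assuming it counts function definitions per file with the ast module (rather than importing each module) and exits non-zero when any file exceeds the threshold:

#!/usr/bin/env python3
"""Sketch of analyze_modules.py (illustrative, not the original program)."""
import ast
import sys
from pathlib import Path

THRESHOLD = 10  # assumed limit, mirroring the shell example above

def count_functions(path: Path) -> int:
    """Count function definitions in a Python file."""
    tree = ast.parse(path.read_text(), filename=str(path))
    return sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               for node in ast.walk(tree))

def main(root: str) -> int:
    total = 0
    flagged = 0
    for path in sorted(Path(root).rglob("*.py")):
        count = count_functions(path)
        total += count
        if count > THRESHOLD:
            print(f"WARNING: {path} has {count} functions, consider splitting")
            flagged += 1
    print(f"Total functions: {total}")
    return 1 if flagged else 0  # the exit code is the program's verdict

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))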

# The SHELL orchestrates — one line, clear intent
python analyze_modules.py src/ || echo "Code complexity review needed"

The program is testable, type-checkable, debuggable with a real debugger, and readable by any Python developer. The shell line is pure orchestration: run this program, handle its exit code.


Anti-Patterns

| Anti-Pattern | What It Looks Like | Why It Fails | The Fix |
| --- | --- | --- | --- |
| The Mega-Script | 500-line bash with loops, parsing, error handling | Untestable, undebuggable, impossible to reason about | Extract computation into programs; shell only orchestrates |
| Ignoring Exit Codes | Commands chained with ; instead of && | Failures cascade silently; deployment proceeds after broken tests | Use &&, set -e, or explicit if checks |
| Reinventing Make | Custom Python/Node build script that shells out to tools | Adds dependency, startup time, and maintenance burden for pure sequencing | Use Make (or Just, Task) when the work is pure sequencing |
| Shell as Data Processor | awk, sed, cut pipelines exceeding 3 stages | Brittle, unreadable, impossible to test edge cases | Write a Python/Go program for complex data transformation |
| Environment Spaghetti | 30 export statements before the real commands | Coupling, ordering bugs, invisible state | Use .env files, explicit arguments, or config programs |
| Ignoring the Universal Interface | Custom REST client in bash (curl + jq + loops) | Error handling is painful, JSON parsing is fragile | Write a small program that calls the API and outputs structured results |
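As a concrete illustration of the Ignoring Exit Codes fix, here is the same three-step sequence written with ; and with the two safer alternatives; the commands are the ones used earlier in this chapter:

# Fragile: ';' ignores failures, so the push runs even when tests fail
python run_tests.py; docker build -t myapp .; docker push myapp:latest

# Safer: '&&' stops the chain at the first non-zero exit code
python run_tests.py && docker build -t myapp . && docker push myapp:latest

# Safer in a script: set -e aborts the script on any failing command
set -e
python run_tests.py
docker build -t myapp .
docker push myapp:latest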

Try With AI

Prompt 1: Classify Orchestration vs Computation

I'm learning about shell orchestration. Look at this shell script and classify each section as either ORCHESTRATION (coordination between programs) or COMPUTATION (logic that should be a program):

#!/bin/bash
set -e

# Section A
export DB_URL="postgres://localhost/myapp"
export REDIS_URL="redis://localhost:6379"

# Section B
python -m pytest tests/ && npm run test

# Section C
for f in $(find src/ -name "*.ts"); do
    lines=$(wc -l < "$f")
    if [ "$lines" -gt 300 ]; then
        imports=$(grep -c "^import" "$f")
        ratio=$((lines / (imports + 1)))
        if [ "$ratio" -gt 50 ]; then
            echo "WARN: $f may need splitting ($lines lines, $imports imports)"
        fi
    fi
done

# Section D
docker build -t myapp . && docker push myapp:latest

For each section, explain your classification and suggest how to refactor any computation into a proper program.

What you're learning: How to see the architectural boundary between coordination and computation in real shell code. You are developing the pattern recognition to identify when shell usage has crossed from orchestration (its strength) into computation (where proper programs belong).

Prompt 2: Design a Makefile Orchestration Layer

I have a project with these manual steps that I currently run by hand:

1. Lint Python code with ruff
2. Run Python tests with pytest
3. Check TypeScript types with tsc --noEmit
4. Run frontend tests with vitest
5. Build the Docker image
6. Run integration tests against the container
7. Push the image to registry if all tests pass
8. Deploy to staging with kubectl

Help me design a Makefile that orchestrates these steps. Requirements:
- Each target should be one or two lines (pure orchestration)
- Dependencies between targets should be explicit
- Failing at any step must stop the pipeline
- I want to be able to run individual targets (just lint, just test)

After showing the Makefile, explain which parts are orchestration and confirm that no target contains computation logic.

What you're learning: How to express workflow coordination declaratively using Make's dependency graph. You are practicing the discipline of keeping each target to pure orchestration — calling programs rather than implementing logic — and making the sequencing explicit through target dependencies.

Prompt 3: Refactor a Threshold Violation

Here is a shell script from my project. It works, but I suspect it has crossed the complexity threshold. Analyze it and help me refactor:

#!/bin/bash
REPORT=""
TOTAL_ISSUES=0

for dir in src/*/; do
    module=$(basename "$dir")
    py_files=$(find "$dir" -name "*.py" | wc -l)
    test_files=$(find "$dir" -name "test_*.py" | wc -l)

    coverage=0
    if [ "$py_files" -gt 0 ]; then
        coverage=$((test_files * 100 / py_files))
    fi

    if [ "$coverage" -lt 60 ]; then
        REPORT="$REPORT\nLOW COVERAGE: $module ($coverage% - $test_files tests for $py_files files)"
        TOTAL_ISSUES=$((TOTAL_ISSUES + 1))
    fi
done

if [ "$TOTAL_ISSUES" -gt 0 ]; then
    echo -e "Coverage Report:\n$REPORT"
    echo "Total issues: $TOTAL_ISSUES"
    exit 1
fi
echo "All modules have adequate test coverage"

Refactor this into:
1. A proper program (Python) that handles the computation
2. A shell one-liner that orchestrates it

Explain what made the original cross the threshold.

What you're learning: How to apply the complexity threshold heuristics to real code. You are practicing the mechanical skill of extracting computation into a testable, type-safe program while reducing the shell's role to pure coordination — calling the program and acting on its exit code.


Safety Note

Shell orchestration is powerful precisely because it coordinates programs that can modify your system. When building orchestration layers: always use set -e (or && chaining) so failures halt the pipeline rather than cascading silently. Never orchestrate destructive operations (rm -rf, git reset --hard, kubectl delete) without explicit confirmation gates. Test your orchestration on non-production resources first. The shell's power as a universal coordinator means its mistakes are also universal — a mis-orchestrated pipeline can deploy broken code, delete data, or corrupt state across every tool it touches.
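
A confirmation gate can be as small as a prompt that the orchestration must pass before a destructive command runs. This is a sketch; the namespace name is illustrative:

# Sketch: confirmation gate before a destructive step
read -r -p "Really delete the staging namespace? (y/N) " answer
if [ "$answer" = "y" ]; then
    kubectl delete namespace staging
else
    echo "Aborted." >&2
    exit 1
fi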