Updated Feb 26, 2026

Workflow Patterns & Reusable Skills

In Lessons 9 and 10, you deployed and diagnosed production services so that a fresh server can run your agents reliably. Now you face a different problem: you have deployed five agents, and each deployment followed the same pattern -- create a user, set permissions, write a service file, start the service, verify health. Five times you typed those commands. Five scripts sit in different directories.

You open those scripts side by side. Three have bugs. One forgot StartLimitBurst. Another hardcodes a port that conflicts with a service you deployed later. The third skips the health check entirely -- it declares success after systemctl start without verifying the agent actually responds. The other two scripts work, but they differ in small ways: different RestartSec values, different MemoryMax limits, different user names. There is no canonical version. Every deployment is a fork of whatever you remembered at the time.

The next time you need to deploy, you stop. You look at the five scripts, the three bugs, the two inconsistencies, and you realize: copying is not scaling. Every copy is a chance to introduce a new variation, a new bug, a new gap in your process. The pattern is solid. The execution is fragile. What you need is one authoritative version -- a deployment pattern that you write once, test once, and reuse every time.

In Chapter 6, the book itself is built on reusable skills -- SKILL.md files that package expertise for permanent reuse. Now you apply the same pattern to operations. Instead of copying deployment scripts, you will build a single, tested deployment workflow, and then package it as a reusable skill that an AI coding agent can execute without your supervision.

This lesson teaches you to recognize when repetition has become a liability, and how to turn that liability into leverage.

The Principle

The fourth time you copy a script is the time you should have written it once.


Deployment Patterns: Three Approaches

Not every deployment needs the same strategy. Here are three patterns, each with different trade-offs.

Simple Restart

Stop the old version, deploy the new code, start the service.

# Simple restart deployment
sudo systemctl stop my-agent
# Deploy new code (copy files, update configs)
sudo cp -r /tmp/agent-v2/* /opt/agent/
sudo systemctl start my-agent

Output:

(no output -- stop and start complete silently on success)

Pros: Simple. One service file. No extra infrastructure.

Cons: Downtime between stop and start. If the new version fails, you must manually restore the old code and restart. During the gap, any requests to your agent fail.
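The manual-restore pain can be reduced by snapshotting the old code before overwriting it. Below is a sketch of the pattern using throwaway directories standing in for /opt/agent and /tmp/agent-v2; on a real server the copies would run under sudo around a systemctl stop/start.

```shell
# Backup-before-overwrite: keep the old version next to the new one,
# so "restore" is a single copy instead of a redeploy.
APP=$(mktemp -d)    # stand-in for /opt/agent
NEW=$(mktemp -d)    # stand-in for /tmp/agent-v2
echo "v1" > "$APP/agent_main.py"
echo "v2" > "$NEW/agent_main.py"

STAMP=$(date +%Y%m%d-%H%M%S)
cp -r "$APP" "${APP}.bak-${STAMP}"   # snapshot the current version first
cp -r "$NEW"/. "$APP"/               # then deploy the new code over it

cat "$APP/agent_main.py"                  # the live dir now holds v2
cat "${APP}.bak-${STAMP}/agent_main.py"   # the snapshot still holds v1
```

If the new version fails, restore is: stop the service, copy the snapshot back over the live directory, start again.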

Blue-Green Deployment

Run two copies of your agent (blue and green). One is live, the other is idle. Deploy to the idle one, verify it works, then cut traffic over through your router (reverse proxy/load balancer). If the new version fails, switch back instantly.

Pros: Near-zero downtime with a proper routing cutover. Instant rollback. You verify before switching.

Cons: Requires two service files, a routing layer for real cutover, and temporarily uses double the resources during the switch.
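For the cutover itself, a reverse proxy in front of both instances only needs one line changed. A hypothetical nginx fragment (the file path and single-server layout are assumptions, not part of this chapter's setup):

```nginx
# /etc/nginx/conf.d/agent.conf (hypothetical)
server {
    listen 80;
    location / {
        # 8000 = blue, 8001 = green. Edit this line, run `sudo nginx -t`,
        # then `sudo nginx -s reload` to cut traffic over with no restart.
        proxy_pass http://127.0.0.1:8001;
    }
}
```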

Rolling Deployment

If you run multiple instances of the same agent (say three copies behind a load balancer), update them one at a time. Each instance gets the new code while the others continue serving.

Pros: Gradual rollout. If one instance fails, the others still serve traffic.

Cons: Mixed versions run simultaneously during the rollout. Requires multiple instances and a load balancer (more infrastructure than a single-server setup typically has).
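Sketched as a script, a rolling update is a loop with a per-instance health gate. The instance names (my-agent-1 through my-agent-3) and ports (8001-8003) are hypothetical; the DRY_RUN default prints each privileged command instead of executing it, so you can inspect the plan before running it for real.

```shell
#!/bin/bash
# rolling-update.sh - rolling update sketch with a per-instance health gate.
# DRY_RUN=1 (the default here) prints privileged commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

rolling_update() {
  local i
  for i in 1 2 3; do
    run sudo systemctl stop "my-agent-$i"
    run sudo cp -r /tmp/agent-v2/. "/opt/agent-$i/"
    run sudo systemctl start "my-agent-$i"
    # Halt the rollout if this instance is unhealthy -- the remaining
    # instances keep serving the old version while you investigate.
    run curl -sf "http://localhost:800$i/health" || { echo "instance $i unhealthy; halting"; return 1; }
  done
  echo "rolling update complete"
}

rolling_update
```

Set DRY_RUN=0 only once the printed plan matches what you expect.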

When to Use Each Pattern

| Pattern | Best For | Downtime | Rollback Speed | Resource Cost | Complexity |
| --- | --- | --- | --- | --- | --- |
| Simple restart | Development, staging, low-traffic agents | Seconds to minutes | Manual (slow) | 1x (minimal) | Low |
| Blue-green | Production single-server agents (with reverse proxy/load balancer) | Near-zero | Instant (switch back) | 2x during switch | Medium |
| Rolling | Multi-instance production agents | Zero | Gradual (per instance) | 1x + 1 instance | High |

For most single-server agent deployments, blue-green is the sweet spot. It can eliminate user-visible downtime when your router switches traffic only after health checks pass. The rest of this lesson focuses on implementing that pattern.


Implementing Blue-Green Deployment

Blue-green deployment uses two systemd services -- one "blue" and one "green." At any time, exactly one is live (receiving traffic). The other is idle, waiting for the next deployment.

Step 1: Create Two Service Files

The blue service runs on port 8000, the green on port 8001. A control-state file tracks the active color, and your router handles real traffic cutover.

Create the blue service:

sudo nano /etc/systemd/system/my-agent-blue.service
[Unit]
Description=Agent Blue Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
User=agent-runner
WorkingDirectory=/opt/agent-blue
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%

[Install]
WantedBy=multi-user.target

Create the green service:

sudo nano /etc/systemd/system/my-agent-green.service
[Unit]
Description=Agent Green Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
User=agent-runner
WorkingDirectory=/opt/agent-green
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8001
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%

[Install]
WantedBy=multi-user.target

Set up the dedicated service user and directory structure:

id -u agent-runner >/dev/null 2>&1 || sudo useradd -r -s /usr/sbin/nologin -d /opt -M agent-runner
sudo mkdir -p /opt/agent-blue /opt/agent-green
sudo cp /opt/agent/agent_main.py /opt/agent-blue/
sudo cp /opt/agent/agent_main.py /opt/agent-green/
sudo chown -R agent-runner:agent-runner /opt/agent-blue /opt/agent-green

Output:

(no output -- directories and files created silently)

Reload systemd and start the blue instance as the initial live version:

sudo systemctl daemon-reload
sudo systemctl start my-agent-blue
sudo systemctl enable my-agent-blue

Output:

Created symlink /etc/systemd/system/multi-user.target.wants/my-agent-blue.service → /etc/systemd/system/my-agent-blue.service.

Verify it's running:

curl -s http://localhost:8000/health | python3 -m json.tool

Output:

{
    "status": "healthy",
    "agent": "running",
    "timestamp": "2026-02-11T10:30:01.234567"
}

Step 2: Track Which Instance Is Live

Use a simple file to record the active color:

echo "blue" | sudo tee /opt/agent-active-color

Output:

blue

This file is control state only. Scripts read it to know which color should be considered active. It does not reroute external traffic by itself.
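The mapping from control state to port can live in one small helper that other scripts source -- one authoritative version of the lookup instead of a copy in every script. A minimal sketch (the function name is an invention for illustration):

```shell
# active_port: translate the control-state color into the live port.
active_port() {
  case "$1" in
    blue)  echo 8000 ;;
    green) echo 8001 ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Demo against a throwaway control file (stand-in for /opt/agent-active-color):
tmp=$(mktemp)
echo "blue" > "$tmp"
active_port "$(cat "$tmp")"   # prints 8000
rm -f "$tmp"
```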

Step 3: The Blue-Green Deploy Script

This script automates the full deployment: deploy to the idle instance, verify health, mark control state, perform a routing cutover, and provide rollback.

sudo nano /usr/local/bin/blue-green-deploy.sh
#!/bin/bash
# blue-green-deploy.sh - Health-gated blue-green deployment for agent_main.py
set -euo pipefail

NEW_CODE_DIR="${1:?Usage: blue-green-deploy.sh /path/to/new/code}"
ACTIVE_COLOR_FILE="/opt/agent-active-color"

# Determine current and target colors
CURRENT_COLOR=$(cat "$ACTIVE_COLOR_FILE")
if [ "$CURRENT_COLOR" = "blue" ]; then
    TARGET_COLOR="green"
    TARGET_PORT=8001
    CURRENT_PORT=8000
else
    TARGET_COLOR="blue"
    TARGET_PORT=8000
    CURRENT_PORT=8001
fi

echo "=== Blue-Green Deployment ==="
echo "Current live: $CURRENT_COLOR (port $CURRENT_PORT)"
echo "Deploying to: $TARGET_COLOR (port $TARGET_PORT)"
echo ""

# Step 1: Deploy new code to target
echo "[1/5] Deploying new code to $TARGET_COLOR..."
sudo cp -r "$NEW_CODE_DIR"/* "/opt/agent-${TARGET_COLOR}/"
echo "Done."

# Step 2: Start the target instance
echo "[2/5] Starting my-agent-${TARGET_COLOR}..."
sudo systemctl start "my-agent-${TARGET_COLOR}"
sleep 3
echo "Done."

# Step 3: Health check the target
echo "[3/5] Checking health on port ${TARGET_PORT}..."
HEALTH_RESPONSE=$(curl -sf "http://localhost:${TARGET_PORT}/health" 2>&1) || {
    echo "FAILED: Health check did not pass on $TARGET_COLOR."
    echo "Rolling back: stopping $TARGET_COLOR."
    sudo systemctl stop "my-agent-${TARGET_COLOR}"
    exit 1
}
echo "Health response: $HEALTH_RESPONSE"
echo "Health check passed."

# Step 4: Mark active color and perform routing cutover
echo "[4/5] Marking $TARGET_COLOR as active in control state..."
echo "$TARGET_COLOR" | sudo tee "$ACTIVE_COLOR_FILE" > /dev/null
echo "Control state updated: $TARGET_COLOR"
echo "ACTION REQUIRED: update your reverse proxy/load balancer to port ${TARGET_PORT}."
read -r -p "Press Enter only after traffic cutover is confirmed... " _

# Step 5: Stop the old instance
echo "[5/5] Stopping old instance (my-agent-${CURRENT_COLOR})..."
sudo systemctl stop "my-agent-${CURRENT_COLOR}"
echo "Done."

echo ""
echo "=== Deployment Complete ==="
echo "Active: $TARGET_COLOR on port $TARGET_PORT"
echo "Previous instance (${CURRENT_COLOR}) is stopped and available for rollback."
echo ""
echo "To rollback, run:"
echo " sudo systemctl start my-agent-${CURRENT_COLOR}"
echo " sudo systemctl stop my-agent-${TARGET_COLOR}"
echo " echo ${CURRENT_COLOR} | sudo tee ${ACTIVE_COLOR_FILE}"

Make it executable:

sudo chmod +x /usr/local/bin/blue-green-deploy.sh

Output:

(no output -- permissions set silently)

Step 4: Run the Deployment

Simulate deploying a new version by updating the code in a staging directory and running the script:

sudo mkdir -p /tmp/agent-v2
sudo cp /opt/agent/agent_main.py /tmp/agent-v2/

Run the blue-green deploy:

sudo blue-green-deploy.sh /tmp/agent-v2

Output:

=== Blue-Green Deployment ===
Current live: blue (port 8000)
Deploying to: green (port 8001)

[1/5] Deploying new code to green...
Done.
[2/5] Starting my-agent-green...
Done.
[3/5] Checking health on port 8001...
Health response: {"status":"healthy","agent":"running","timestamp":"2026-02-11T10:35:12.456789"}
Health check passed.
[4/5] Marking green as active in control state...
Control state updated: green
ACTION REQUIRED: update your reverse proxy/load balancer to port 8001.
Press Enter only after traffic cutover is confirmed...
[5/5] Stopping old instance (my-agent-blue)...
Done.

=== Deployment Complete ===
Active: green on port 8001
Previous instance (blue) is stopped and available for rollback.

To rollback, run:
sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color

Step 5: Rollback Procedure

If the new version has a bug that the health check didn't catch, rollback is three commands plus one routing change:

sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color
# Then switch your reverse proxy/load balancer back to port 8000

Output:

blue

Verify the rollback:

curl -s http://localhost:8000/health | python3 -m json.tool

Output:

{
    "status": "healthy",
    "agent": "running",
    "timestamp": "2026-02-11T10:36:45.123456"
}

The old version is back in under 10 seconds. No redeployment needed.
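You can also automate the rollback decision. The sketch below polls an arbitrary health command and runs an arbitrary rollback command on the first failure; both are parameters, so the demo uses stand-in commands (in real use you would pass the curl health check and the three rollback commands above). The function name is an invention for illustration.

```shell
# watch_and_rollback: run a health command several times; on the first
# failure, run the rollback command and stop. Commands are passed as strings.
watch_and_rollback() {
  local health_cmd="$1" rollback_cmd="$2" tries="${3:-3}" i
  for i in $(seq 1 "$tries"); do
    if eval "$health_cmd"; then
      echo "healthy (check $i/$tries)"
    else
      echo "unhealthy on check $i; rolling back"
      eval "$rollback_cmd"
      return 1
    fi
  done
  echo "stable after $tries checks"
}

# Demo with stand-in commands (real usage would pass the curl health check):
watch_and_rollback "true" "echo would roll back" 2
```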


Monitoring Integration

Deploying an agent is half the job. Keeping it healthy afterward requires monitoring: rotating logs before they fill the disk, alerting when disk space runs low, and verifying health on a schedule.

Log Rotation with logrotate

Your agent writes logs. Without rotation, those logs grow until they fill the disk and crash everything.

logrotate is a standard Linux tool that rotates, compresses, and removes old log files automatically.

Create a logrotate configuration for your agent:

sudo nano /etc/logrotate.d/agent-logs
/var/log/agent/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    create 0640 agent-runner agent-runner
    postrotate
        systemctl reload my-agent-blue my-agent-green 2>/dev/null || true
    endscript
}

Each directive serves a purpose:

| Directive | What It Does |
| --- | --- |
| weekly | Rotate logs once per week |
| rotate 4 | Keep 4 rotated files (4 weeks of history) |
| compress | Compress rotated files with gzip |
| delaycompress | Wait one rotation before compressing (so the most recent rotated file is still plain text for easy reading) |
| missingok | Don't error if a log file is missing |
| notifempty | Skip rotation if the log file is empty |
| create 0640 agent-runner agent-runner | Create new log file with these permissions and ownership |
| postrotate | After rotating, signal the service to reopen log files |

Create the log directory:

sudo mkdir -p /var/log/agent
sudo chown agent-runner:agent-runner /var/log/agent

Output:

(no output -- directory created and ownership set)

Test the configuration (dry run):

sudo logrotate -d /etc/logrotate.d/agent-logs

Output:

reading config file /etc/logrotate.d/agent-logs

Handling 1 log files in /var/log/agent/*.log
glob pattern expanded to:
/var/log/agent/agent.log

considering log /var/log/agent/agent.log
Now: 2026-02-11 10:40
Last rotated at 2026-02-11 00:00
log does not need rotating (log has been rotated within the last week)

The -d flag runs a dry run -- it shows what logrotate would do without actually doing it. Use this to verify your configuration before trusting it with production logs.

Disk Space Alerts

Log rotation prevents gradual disk fill, but other things consume space too -- temporary files, core dumps, downloaded models. A simple bash script can check disk usage and alert you.

sudo nano /usr/local/bin/check-disk-space.sh
#!/bin/bash
# check-disk-space.sh - Alert when disk usage exceeds threshold
set -euo pipefail

THRESHOLD=80
ALERT_LOG="/var/log/agent/disk-alerts.log"

# Get disk usage percentage for the root filesystem
USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    echo "[$TIMESTAMP] WARNING: Disk usage at ${USAGE}% (threshold: ${THRESHOLD}%)" | tee -a "$ALERT_LOG"

    # Show top space consumers
    echo "Top 5 directories by size:" | tee -a "$ALERT_LOG"
    du -sh /var/log/* /opt/* /tmp/* 2>/dev/null | sort -rh | head -5 | tee -a "$ALERT_LOG"
else
    echo "Disk usage OK: ${USAGE}%"
fi

Make it executable:

sudo chmod +x /usr/local/bin/check-disk-space.sh

Output:

(no output -- permissions set silently)

Test it:

sudo check-disk-space.sh

Output (when disk usage is normal):

Disk usage OK: 36%

Output (when disk usage exceeds the threshold):

[2026-02-11 10:45:00] WARNING: Disk usage at 85% (threshold: 80%)
Top 5 directories by size:
4.2G /var/log/journal
1.8G /opt/agent-blue
1.8G /opt/agent-green
512M /tmp/model-cache
128M /var/log/agent

Health Check Scheduling with cron

In Lesson 9, you created the canonical health check script at /usr/local/bin/check-agent-health.sh. For blue-green setups, add a thin wrapper so cron checks whichever color is currently active.

Create the active-color-aware wrapper:

sudo nano /usr/local/bin/check-active-agent-health.sh
#!/bin/bash
set -euo pipefail

ACTIVE_COLOR_FILE="/opt/agent-active-color"
ACTIVE_COLOR=$(cat "$ACTIVE_COLOR_FILE")

case "$ACTIVE_COLOR" in
    blue)
        SERVICE="my-agent-blue"
        PORT=8000
        ;;
    green)
        SERVICE="my-agent-green"
        PORT=8001
        ;;
    *)
        echo "[FAIL] Unknown active color: $ACTIVE_COLOR"
        exit 1
        ;;
esac

/usr/local/bin/check-agent-health.sh "$SERVICE" "$PORT"

Make it executable:

sudo chmod +x /usr/local/bin/check-active-agent-health.sh

Add cron entries for both health checks and disk monitoring:

sudo crontab -e

Add these lines:

# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-active-agent-health.sh >> /var/log/agent/health-check.log 2>&1

# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1

Verify the cron entries were saved:

sudo crontab -l

Output:

# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-active-agent-health.sh >> /var/log/agent/health-check.log 2>&1

# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1

Now your agent is monitored around the clock. Health checks run every 5 minutes, disk alerts run every hour, and logs rotate weekly with 4 weeks of compressed history.


The Complete Deployment Workflow

Here is what a full multi-step deployment looks like when you combine everything from this chapter: user creation, service installation, health verification, and monitoring setup -- all in one script.

sudo nano /usr/local/bin/full-deploy.sh
#!/bin/bash
# full-deploy.sh - Complete agent deployment workflow
# Combines: user creation, service install, start, health check, monitoring
set -euo pipefail

AGENT_NAME="${1:?Usage: full-deploy.sh <agent-name>}"
AGENT_DIR="/opt/${AGENT_NAME}"
AGENT_USER="agent-runner"
SERVICE_FILE="/etc/systemd/system/${AGENT_NAME}.service"

echo "=== Full Deployment: ${AGENT_NAME} ==="

# Step 1: Create dedicated user (if not exists)
echo "[1/6] Checking agent user..."
if id "$AGENT_USER" &>/dev/null; then
    echo "User $AGENT_USER already exists."
else
    sudo useradd -r -s /usr/sbin/nologin -d /opt -M "$AGENT_USER"
    echo "Created system user: $AGENT_USER"
fi

# Step 2: Create directory and deploy code
echo "[2/6] Deploying agent code..."
sudo mkdir -p "$AGENT_DIR"
sudo cp /tmp/agent-release/agent_main.py "$AGENT_DIR/"
sudo chown -R "$AGENT_USER":"$AGENT_USER" "$AGENT_DIR"
echo "Code deployed to $AGENT_DIR"

# Step 3: Install systemd service
echo "[3/6] Installing systemd service..."
sudo tee "$SERVICE_FILE" > /dev/null <<EOF
[Unit]
Description=Digital FTE Agent: ${AGENT_NAME}
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
User=${AGENT_USER}
WorkingDirectory=${AGENT_DIR}
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%

[Install]
WantedBy=multi-user.target
EOF
echo "Service file installed: $SERVICE_FILE"

# Step 4: Start service
echo "[4/6] Starting service..."
sudo systemctl daemon-reload
sudo systemctl enable "$AGENT_NAME"
sudo systemctl start "$AGENT_NAME"
sleep 3
echo "Service started."

# Step 5: Health check
echo "[5/6] Verifying health..."
if curl -sf http://localhost:8000/health > /dev/null 2>&1; then
    echo "Health check PASSED."
else
    echo "Health check FAILED. Check logs:"
    echo "  sudo journalctl -u $AGENT_NAME -n 20"
    exit 1
fi

# Step 6: Confirm
echo "[6/6] Verifying final state..."
STATUS=$(systemctl is-active "$AGENT_NAME")
echo "Service status: $STATUS"

echo ""
echo "=== Deployment Complete ==="
echo "Service: $AGENT_NAME"
echo "Status: $STATUS"
echo "Health: http://localhost:8000/health"
echo "Logs: sudo journalctl -u $AGENT_NAME -f"

Make it executable:

sudo chmod +x /usr/local/bin/full-deploy.sh

Output:

(no output -- permissions set silently)

Run the full deployment:

sudo mkdir -p /tmp/agent-release
sudo cp /opt/agent/agent_main.py /tmp/agent-release/
sudo full-deploy.sh my-agent

Output:

=== Full Deployment: my-agent ===
[1/6] Checking agent user...
User agent-runner already exists.
[2/6] Deploying agent code...
Code deployed to /opt/my-agent
[3/6] Installing systemd service...
Service file installed: /etc/systemd/system/my-agent.service
[4/6] Starting service...
Service started.
[5/6] Verifying health...
Health check PASSED.
[6/6] Verifying final state...
Service status: active

=== Deployment Complete ===
Service: my-agent
Status: active
Health: http://localhost:8000/health
Logs: sudo journalctl -u my-agent -f

One command. User created, code deployed, service installed, agent started, health verified. This is the difference between a manual checklist and an automated workflow -- the script never forgets a step.


A Note on Docker

Docker packages your application with its dependencies in a container -- an isolated environment that runs the same way regardless of the host system. For single-server deployments like the ones in this chapter, systemd is simpler and has zero overhead: your agent runs directly on the host OS with no abstraction layer. Docker excels when you need environment consistency across development, staging, and production servers, or when you're deploying to orchestrated environments like Kubernetes. You'll learn Docker in a dedicated chapter later in this book. For now, systemd is the right tool for your deployment -- it gives you service management, restart policies, resource limits, and logging with nothing extra to install.


Building Reusable SKILL.md Operations

You have a working deployment workflow. But right now it exists as a script -- useful, but tied to your specific setup. What happens when the next project needs a different port, a different user, a different runtime? You copy the script, tweak it, and now you have two scripts to maintain. Sound familiar?

This is exactly the problem from the opening: copying is not scaling. The solution is to package your deployment expertise as a reusable skill -- a structured file that an AI coding agent can read, understand, and execute for any agent deployment, not just this one.

Recognizing Patterns Worth Formalizing

Before writing a skill, confirm the pattern is worth the effort. Apply three criteria:

| Criterion | Threshold | Why |
| --- | --- | --- |
| Frequency | Recurring 2+ times | One-off operations are not worth formalizing |
| Complexity | More than 3 steps | Simple commands do not need orchestration |
| Value | Saves time or prevents errors | Creation effort must pay for itself |

Apply the framework to your chapter work:

| Pattern | Freq | Complex | Value | Verdict |
| --- | --- | --- | --- | --- |
| Create user + set permissions | 4+ | 4 steps | Prevents security mistakes | Skill |
| Write + enable systemd service | 3+ | 5 steps | Prevents config errors | Skill |
| Health check sequence | 4+ | 3 steps | Catches silent failures | Skill |
| Run ls | 100+ | 1 step | Trivial | Not a skill |
| Full deploy pipeline (all above) | 3+ | 12+ steps | Saves 20+ min per deploy | Skill |

The full deployment pipeline passes all three thresholds.

The SKILL.md Structure

Skills live in a directory with a specific format:

.claude/skills/linux-agent-ops/SKILL.md

Every SKILL.md starts with YAML frontmatter:

---
name: linux-agent-ops
description: |
  Expert guidance for deploying AI agents as systemd services on Linux.
  Use when creating agent users, writing service files, setting permissions,
  or verifying agent health. Covers the full deploy-verify-monitor cycle.
---

Two fields: name and description. The description must be specific enough that an AI agent knows when to invoke it.

Body: Persona + Questions + Principles

After the frontmatter, the body follows a three-part pattern:

Persona defines the expertise level and mindset:

## Persona

You are a Linux operations engineer deploying AI agents to production
servers. Every step must be repeatable, every failure must be recoverable,
and nothing runs as root unless absolutely necessary.

Key Questions capture the decisions that vary between deployments:

## Key Questions

1. **What user should own the agent process?**
Default: Dedicated `agent-runner` user with no login shell.

2. **What port does the agent listen on?**
Default: 8000. Must not conflict with other services.

3. **What restart policy?**
Default: Restart=on-failure with RestartSec=5. Never Restart=always.

4. **What resource limits?**
Default: MemoryMax=512M, CPUQuota=50%.

5. **How to verify health?**
Default: HTTP GET to /health returns 200.

6. **What runtime?**
Options: Python (uvicorn), Node.js (node), compiled binary.

Each question has a default. Defaults make skills fast -- you only override what differs.

Principles encode hard-won lessons as rules:

## Principles

1. Never run agents as root. Create a dedicated user.
2. Always use Restart=on-failure, never Restart=always.
3. Always set MemoryMax and CPUQuota resource limits.
4. Always include StartLimitBurst and StartLimitIntervalSec.
5. Always verify health after deployment via /health endpoint.
6. Script everything you do more than once.

Combine all three sections plus an Implementation section listing deploy steps into a single .claude/skills/linux-agent-ops/SKILL.md file. An AI coding agent reads this file, understands the procedure, asks the right questions, follows the principles, and executes -- without a human walking it through.
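Principles like these can also be enforced mechanically. Below is a sketch of a lint function that checks a service file for the required directives; the directive list mirrors the principles above, and the function name is an invention for illustration.

```shell
# lint_service: flag a systemd unit file that violates the principles above.
lint_service() {
  local file="$1" fail=0 d
  for d in "User=" "Restart=on-failure" "MemoryMax=" "CPUQuota=" "StartLimitBurst="; do
    grep -q "^${d}" "$file" || { echo "[FAIL] missing ${d}"; fail=1; }
  done
  if grep -q "^User=root" "$file"; then echo "[FAIL] runs as root"; fail=1; fi
  if [ "$fail" -eq 0 ]; then echo "[OK] all principles satisfied"; fi
  return "$fail"
}

# Demo against a throwaway unit file that follows every rule:
tmp=$(mktemp)
printf '%s\n' "[Service]" "User=agent-runner" "Restart=on-failure" \
  "MemoryMax=512M" "CPUQuota=50%" "StartLimitBurst=5" > "$tmp"
lint_service "$tmp"   # prints [OK] all principles satisfied
rm -f "$tmp"
```

Run it against every service file before enabling it, and a forgotten MemoryMax becomes a loud failure instead of a silent drift.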

The Deploy Script as Skill Implementation

The skill specification tells an AI agent what to do. A script does it. This script combines the user creation and permissions work, the service file writing, and the health verification from Lesson 9:

#!/bin/bash
# deploy-agent.sh -- Implements the linux-agent-ops skill
# Usage: sudo ./deploy-agent.sh <agent-name> <port> <exec-start-cmd>
set -euo pipefail

AGENT_NAME="${1:?Usage: deploy-agent.sh <name> <port> <exec-start>}"
AGENT_PORT="${2:?Missing port}"
EXEC_START="${3:?Missing ExecStart command}"
AGENT_USER="agent-runner"
AGENT_DIR="/opt/${AGENT_NAME}"
SERVICE_FILE="/etc/systemd/system/${AGENT_NAME}.service"

echo "=== Deploying ${AGENT_NAME} on port ${AGENT_PORT} ==="

# Step 1: Create dedicated user
if id "${AGENT_USER}" &>/dev/null; then
    echo "[OK] User ${AGENT_USER} exists"
else
    useradd -r -s /usr/sbin/nologin "${AGENT_USER}"
    echo "[OK] Created user ${AGENT_USER}"
fi

# Step 2: Create directory
mkdir -p "${AGENT_DIR}"
chown -R "${AGENT_USER}:${AGENT_USER}" "${AGENT_DIR}"
echo "[OK] Directory ${AGENT_DIR} ready"

# Step 3: Write systemd service file
cat > "${SERVICE_FILE}" <<EOF
[Unit]
Description=AI Agent: ${AGENT_NAME}
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
User=${AGENT_USER}
WorkingDirectory=${AGENT_DIR}
ExecStart=${EXEC_START}
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=50%

[Install]
WantedBy=multi-user.target
EOF
echo "[OK] Service file written"

# Step 4: Enable and start
systemctl daemon-reload
systemctl enable "${AGENT_NAME}"
systemctl start "${AGENT_NAME}"
echo "[OK] Service enabled and started"

# Step 5: Verify health
sleep 5
if systemctl is-active --quiet "${AGENT_NAME}"; then
    echo "[OK] Service is running"
else
    echo "[FAIL] Service failed to start"
    journalctl -u "${AGENT_NAME}" --no-pager -n 10
    exit 1
fi

if curl -sf "http://localhost:${AGENT_PORT}/health" > /dev/null 2>&1; then
    echo "[OK] Health endpoint responding"
else
    echo "[WARN] Health endpoint not responding"
fi

echo "=== Deployment complete: ${AGENT_NAME} ==="

Output:

=== Deploying my-agent on port 8000 ===
[OK] Created user agent-runner
[OK] Directory /opt/my-agent ready
[OK] Service file written
[OK] Service enabled and started
[OK] Service is running
[OK] Health endpoint responding
=== Deployment complete: my-agent ===

Every principle maps to the script:

| Principle | Script Implementation |
| --- | --- |
| Never run as root | User=${AGENT_USER} in service file |
| Restart=on-failure | Restart=on-failure with RestartSec=5 |
| Resource limits | MemoryMax=512M, CPUQuota=50% |
| Start-limit protection | StartLimitBurst=5, StartLimitIntervalSec=60 |
| Verify health | curl to /health after startup |
| Script everything | The script replaces 12+ manual commands |

Testing Skills on Fresh Systems

Your script works on your server. But your development server has accumulated state: users already created, packages installed, directories existing. Your script may silently depend on that accumulated state.

Common hidden assumptions:

| Assumption | What Breaks | Fix |
| --- | --- | --- |
| Python is installed | uvicorn not found | Add dependency check |
| curl is installed | Health check fails | Check command -v curl |
| Network is available | curl times out | Add readiness check |
| Previous service exists | Stale config loaded | Script writes fresh file |

Add prerequisite checking to the top of your deploy script:

# Add after set -euo pipefail in deploy-agent.sh
for cmd in curl systemctl useradd; do
    command -v "${cmd}" &>/dev/null || { echo "[FAIL] Missing: ${cmd}"; exit 1; }
done
echo "[OK] All prerequisites available"

Output:

[OK] All prerequisites available

This turns a mysterious mid-deployment failure into an immediate, clear error at the start.

Without the skill: SSH in, type 12 commands from memory, hope you remember the right MemoryMax value. Takes 20 minutes. Error-prone. Cannot be delegated.

With the skill: An AI coding agent reads your SKILL.md, asks the right questions, follows the principles, runs the script. Takes 2 minutes. Consistent. Fully delegatable.

That gap -- between manual expertise and delegatable intelligence -- is the core of Digital FTE construction. Every skill you create makes your AI agents more capable. Every principle you encode prevents a class of errors permanently.


Minimum Viable Skill

If you take one thing from this lesson: put your entire deployment sequence into a single deploy-agent.sh script with set -euo pipefail. A script any team member can run on any fresh server is worth more than a 20-step deployment checklist that only you understand.

Exercises

Exercise 1: Document a Blue-Green Deployment Plan

Write a deployment plan for updating your agent from v1 to v2 using the blue-green pattern. Your plan should include:

  1. The name of the currently active service (and its port)
  2. The name of the idle service you will deploy to (and its port)
  3. The health check command you will run before switching
  4. The exact commands to switch traffic
  5. The exact rollback commands if something goes wrong

Write your plan as a text file:

nano ~/blue-green-plan.txt

Verification -- your plan should contain all five sections:

grep -c "active service\|idle service\|health check\|switch traffic\|rollback" ~/blue-green-plan.txt

Expected output:

5

If the count is lower than 5, your plan is missing sections. Review the blue-green deployment walkthrough above and add the missing pieces.

Exercise 2: Write a Skill Specification

Write a SKILL.md for "deploy-agent" with YAML frontmatter (name, description), a Persona section, Key Questions (at least five covering user creation, service config, permissions, monitoring, validation), and Principles (at least four rules).

mkdir -p /tmp/test-skill/deploy-agent
nano /tmp/test-skill/deploy-agent/SKILL.md

Verification -- check that all five deployment dimensions appear:

grep -ci "user" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "service\|systemd" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "permission\|chown\|chmod" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "health\|monitor" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "valid\|verif\|test" /tmp/test-skill/deploy-agent/SKILL.md

Expected output (each line should show at least 1):

3
4
2
3
2

Exercise 3: Write a Full Deployment Script

Write a deployment script that combines user creation, service file installation, service start, and health verification into one automated pass. You can use full-deploy.sh above as a reference, but write it yourself.

After writing and running your script, verify both conditions:

systemctl is-active my-agent

Expected output:

active

curl -s http://localhost:8000/health

Expected output:

{"status":"healthy","agent":"running","timestamp":"..."}

Both checks must pass. If systemctl is-active shows inactive or failed, check journalctl -u my-agent -n 20 for error details. If curl fails, verify the port number matches what's in your service file's ExecStart.


Try With AI

"Ask Claude: 'I need to update my production agent with zero downtime. Compare restart, blue-green, and rolling deployment approaches for a single-server setup. Which do you recommend and why?'"

What you're learning: AI helps you evaluate production trade-offs by considering factors (resource overhead, complexity, rollback speed) that are hard to assess without experience. Notice how AI weighs the trade-offs differently for a single-server constraint versus a multi-server fleet.

"Tell Claude: 'I've deployed agents 5 times following a similar pattern: create user, set permissions, write service file, enable service, verify health. Help me formalize this as a reusable deployment skill with a Persona, Questions, and Principles structure.' Review and refine the structure AI produces."

What you're learning: AI helps transform tacit operational knowledge into explicit, structured, reusable intelligence. Compare the structure AI produces with the Persona + Questions + Principles pattern from this lesson. Notice where AI adds categories you hadn't considered and where your hands-on experience catches assumptions AI makes.

Safety Reminder

Always test deployment scripts on a non-production server first. Blue-green deployments involve stopping services -- a typo in the active color or port number can take down your live agent. Run through the full cycle (deploy, verify, switch, rollback) in a staging environment before trusting the script with production traffic.


The deployment skill you package in this lesson IS the SupportBot deployment. Lesson 12 uses it. One command, one clean server, one running production agent. Every principle you encoded -- non-root users, restart policies, resource limits, health verification -- fires automatically.

Lesson 12 is the moment everything connects. You will write the specification, deploy SupportBot to a production server, validate five layers of production readiness, and package the result as a script anyone on your team can run. The midnight panic from Lesson 1 was the problem. Lesson 12 is the solution you built with your own hands across eleven lessons of Linux mastery.
