Workflow Patterns & Reusable Skills
In Lessons 9 and 10, you deployed and diagnosed production services so that a fresh server can run your agents reliably. Now you face a different problem: you have deployed five agents, and each deployment followed the same pattern -- create a user, set permissions, write a service file, start the service, verify health. Five times you typed those commands. Five scripts sit in different directories.
You open those scripts side by side. Three have bugs. One forgot StartLimitBurst. Another hardcodes a port that conflicts with a service you deployed later. The third skips the health check entirely -- it declares success after systemctl start without verifying the agent actually responds. The other two scripts work, but they differ in small ways: different RestartSec values, different MemoryMax limits, different user names. There is no canonical version. Every deployment is a fork of whatever you remembered at the time.
The fourth time you need to deploy, you stop. You look at the five scripts, the three bugs, the two inconsistencies, and you realize: copying is not scaling. Every copy is a chance to introduce a new variation, a new bug, a new gap in your process. The pattern is solid. The execution is fragile. What you need is one authoritative version -- a deployment pattern that you write once, test once, and reuse every time.
In Chapter 6, the book itself is built on reusable skills -- SKILL.md files that package expertise for permanent reuse. Now you apply the same pattern to operations. Instead of copying deployment scripts, you will build a single, tested deployment workflow, and then package it as a reusable skill that an AI coding agent can execute without your supervision.
This lesson teaches you to recognize when repetition has become a liability, and how to turn that liability into leverage.
The fourth time you copy a script is the time you should have written it once.
Deployment Patterns: Three Approaches
Not every deployment needs the same strategy. Here are three patterns, each with different trade-offs.
Simple Restart
Stop the old version, deploy the new code, start the service.
# Simple restart deployment
sudo systemctl stop my-agent
# Deploy new code (copy files, update configs)
sudo cp -r /tmp/agent-v2/* /opt/agent/
sudo systemctl start my-agent
Output:
(no output -- stop and start complete silently on success)
Pros: Simple. One service file. No extra infrastructure.
Cons: Downtime between stop and start. If the new version fails, you must manually restore the old code and restart. During the gap, any requests to your agent fail.
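The manual-restore risk is easy to soften by snapshotting before the copy. Here is a minimal sketch of that idea, using temp directories so it runs without root -- in production you would substitute /opt/agent and the systemctl stop/start around it:

```shell
#!/bin/bash
# Snapshot-then-deploy sketch (temp paths stand in for /opt/agent).
set -euo pipefail

app=$(mktemp -d)       # stands in for /opt/agent
release=$(mktemp -d)   # stands in for /tmp/agent-v2
echo "v1" > "$app/agent_main.py"
echo "v2" > "$release/agent_main.py"

cp -r "$app" "${app}.bak"        # snapshot the current version before overwriting
cp -r "$release"/* "$app"/       # deploy the new code over the old

cat "$app/agent_main.py"         # v2 is now live
cat "${app}.bak/agent_main.py"   # v1 preserved -- restore by moving it back
```

With the snapshot in place, the "manually restore the old code" step becomes a single move of the .bak directory back into place.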
Blue-Green Deployment
Run two copies of your agent (blue and green). One is live, the other is idle. Deploy to the idle one, verify it works, then cut traffic over through your router (reverse proxy/load balancer). If the new version fails, switch back instantly.
Pros: Near-zero downtime with a proper routing cutover. Instant rollback. You verify before switching.
Cons: Requires two service files, a routing layer for real cutover, and temporarily uses double the resources during the switch.
Rolling Deployment
If you run multiple instances of the same agent (say three copies behind a load balancer), update them one at a time. Each instance gets the new code while the others continue serving.
Pros: Gradual rollout. If one instance fails, the others still serve traffic.
Cons: Mixed versions run simultaneously during the rollout. Requires multiple instances and a load balancer (more infrastructure than a single-server setup typically has).
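The rollout order can be sketched as a loop with a health gate between instances. The instance names and the deploy/health functions below are stubs -- the point is the control flow, not the commands inside each step:

```shell
#!/bin/bash
# Rolling-update control flow (deploy and health check are stubbed).
set -euo pipefail

instances=(agent-1 agent-2 agent-3)   # hypothetical instance names

deploy_to()  { echo "deploying new code to $1"; }  # stub: copy files, restart service
is_healthy() { true; }                             # stub: curl the instance's /health

for inst in "${instances[@]}"; do
    deploy_to "$inst"
    if is_healthy "$inst"; then
        echo "$inst healthy -- continuing rollout"
    else
        echo "$inst unhealthy -- halting rollout" >&2
        exit 1
    fi
done
echo "rolling update complete"
```

The gate is what makes the pattern safe: a bad version stops the rollout after one instance instead of taking down all three.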
When to Use Each Pattern
| Pattern | Best For | Downtime | Rollback Speed | Resource Cost | Complexity |
|---|---|---|---|---|---|
| Simple restart | Development, staging, low-traffic agents | Seconds to minutes | Manual (slow) | 1x (minimal) | Low |
| Blue-green | Production single-server agents (with reverse proxy/load balancer) | Near-zero | Instant (switch back) | 2x during switch | Medium |
| Rolling | Multi-instance production agents | Zero | Gradual (per instance) | 1x + 1 instance | High |
For most single-server agent deployments, blue-green is the sweet spot. It can eliminate user-visible downtime when your router switches traffic only after health checks pass. The rest of this lesson focuses on implementing that pattern.
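The "routing cutover" the table assumes is typically one line in a reverse proxy. As an illustration only -- this lesson does not set up nginx, and the file path and server block here are assumptions -- an nginx cutover can look like:

```nginx
# /etc/nginx/conf.d/agent.conf (hypothetical)
server {
    listen 80;
    location / {
        # Point at the live color: 8000 (blue) or 8001 (green).
        proxy_pass http://127.0.0.1:8001;
    }
}
```

After editing, sudo nginx -t && sudo systemctl reload nginx validates the config and applies the switch without dropping in-flight connections.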
Implementing Blue-Green Deployment
Blue-green deployment uses two systemd services -- one "blue" and one "green." At any time, exactly one is live (receiving traffic). The other is idle, waiting for the next deployment.
Step 1: Create Two Service Files
The blue service runs on port 8000, the green on port 8001. A control-state file tracks the active color, and your router handles real traffic cutover.
Create the blue service:
sudo nano /etc/systemd/system/my-agent-blue.service
[Unit]
Description=Agent Blue Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=agent-runner
WorkingDirectory=/opt/agent-blue
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
Create the green service:
sudo nano /etc/systemd/system/my-agent-green.service
[Unit]
Description=Agent Green Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=agent-runner
WorkingDirectory=/opt/agent-green
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8001
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
Set up the dedicated service user and directory structure:
id -u agent-runner >/dev/null 2>&1 || sudo useradd -r -s /usr/sbin/nologin -d /opt -M agent-runner

sudo mkdir -p /opt/agent-blue /opt/agent-green
sudo cp /opt/agent/agent_main.py /opt/agent-blue/
sudo cp /opt/agent/agent_main.py /opt/agent-green/
sudo chown -R agent-runner:agent-runner /opt/agent-blue /opt/agent-green
Output:
(no output -- directories and files created silently)
Reload systemd and start the blue instance as the initial live version:
sudo systemctl daemon-reload
sudo systemctl start my-agent-blue
sudo systemctl enable my-agent-blue
Output:
Created symlink /etc/systemd/system/multi-user.target.wants/my-agent-blue.service → /etc/systemd/system/my-agent-blue.service.
Verify it's running:
curl -s http://localhost:8000/health | python3 -m json.tool
Output:
{
"status": "healthy",
"agent": "running",
"timestamp": "2026-02-11T10:30:01.234567"
}
Step 2: Track Which Instance Is Live
Use a simple file to record the active color:
echo "blue" | sudo tee /opt/agent-active-color
Output:
blue
This file is control state only. Scripts read it to know which color should be considered active. It does not reroute external traffic by itself.
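Any script can then derive the idle (deploy-target) color from this file. A tiny sketch, using a temp file so it runs without root -- the real path is /opt/agent-active-color:

```shell
#!/bin/bash
# Read the active color and derive the idle (deploy-target) color.
set -euo pipefail

state_file=$(mktemp)               # stands in for /opt/agent-active-color
echo "blue" > "$state_file"

active=$(cat "$state_file")
if [ "$active" = "blue" ]; then idle="green"; else idle="blue"; fi

echo "active=$active idle=$idle"   # active=blue idle=green
```

This is exactly the logic the deploy script in Step 3 uses to decide where the new code goes.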
Step 3: The Blue-Green Deploy Script
This script automates the full deployment: deploy to the idle instance, verify health, mark control state, perform a routing cutover, and provide rollback.
sudo nano /usr/local/bin/blue-green-deploy.sh
#!/bin/bash
# blue-green-deploy.sh - Health-gated blue-green deployment for agent_main.py
set -euo pipefail
NEW_CODE_DIR="${1:?Usage: blue-green-deploy.sh /path/to/new/code}"
ACTIVE_COLOR_FILE="/opt/agent-active-color"
# Determine current and target colors
CURRENT_COLOR=$(cat "$ACTIVE_COLOR_FILE")
if [ "$CURRENT_COLOR" = "blue" ]; then
TARGET_COLOR="green"
TARGET_PORT=8001
CURRENT_PORT=8000
else
TARGET_COLOR="blue"
TARGET_PORT=8000
CURRENT_PORT=8001
fi
echo "=== Blue-Green Deployment ==="
echo "Current live: $CURRENT_COLOR (port $CURRENT_PORT)"
echo "Deploying to: $TARGET_COLOR (port $TARGET_PORT)"
echo ""
# Step 1: Deploy new code to target
echo "[1/5] Deploying new code to $TARGET_COLOR..."
sudo cp -r "$NEW_CODE_DIR"/* "/opt/agent-${TARGET_COLOR}/"
echo "Done."
# Step 2: Start the target instance
echo "[2/5] Starting my-agent-${TARGET_COLOR}..."
sudo systemctl start "my-agent-${TARGET_COLOR}"
sleep 3
echo "Done."
# Step 3: Health check the target
echo "[3/5] Checking health on port ${TARGET_PORT}..."
HEALTH_RESPONSE=$(curl -sf "http://localhost:${TARGET_PORT}/health" 2>&1) || {
echo "FAILED: Health check did not pass on $TARGET_COLOR."
echo "Rolling back: stopping $TARGET_COLOR."
sudo systemctl stop "my-agent-${TARGET_COLOR}"
exit 1
}
echo "Health response: $HEALTH_RESPONSE"
echo "Health check passed."
# Step 4: Mark active color and perform routing cutover
echo "[4/5] Marking $TARGET_COLOR as active in control state..."
echo "$TARGET_COLOR" | sudo tee "$ACTIVE_COLOR_FILE" > /dev/null
echo "Control state updated: $TARGET_COLOR"
echo "ACTION REQUIRED: update your reverse proxy/load balancer to port ${TARGET_PORT}."
read -r -p "Press Enter only after traffic cutover is confirmed... " _
# Step 5: Stop the old instance
echo "[5/5] Stopping old instance (my-agent-${CURRENT_COLOR})..."
sudo systemctl stop "my-agent-${CURRENT_COLOR}"
echo "Done."
echo ""
echo "=== Deployment Complete ==="
echo "Active: $TARGET_COLOR on port $TARGET_PORT"
echo "Rolled back instance (${CURRENT_COLOR}) is stopped."
echo ""
echo "To rollback, run:"
echo " sudo systemctl start my-agent-${CURRENT_COLOR}"
echo " sudo systemctl stop my-agent-${TARGET_COLOR}"
echo " echo ${CURRENT_COLOR} | sudo tee ${ACTIVE_COLOR_FILE}"
Make it executable:
sudo chmod +x /usr/local/bin/blue-green-deploy.sh
Output:
(no output -- permissions set silently)
Step 4: Run the Deployment
Simulate deploying a new version by updating the code in a staging directory and running the script:
sudo mkdir -p /tmp/agent-v2
sudo cp /opt/agent/agent_main.py /tmp/agent-v2/
Run the blue-green deploy:
sudo blue-green-deploy.sh /tmp/agent-v2
Output:
=== Blue-Green Deployment ===
Current live: blue (port 8000)
Deploying to: green (port 8001)
[1/5] Deploying new code to green...
Done.
[2/5] Starting my-agent-green...
Done.
[3/5] Checking health on port 8001...
Health response: {"status":"healthy","agent":"running","timestamp":"2026-02-11T10:35:12.456789"}
Health check passed.
[4/5] Marking green as active in control state...
Control state updated: green
ACTION REQUIRED: update your reverse proxy/load balancer to port 8001.
[Press Enter after cutover]
[5/5] Stopping old instance (my-agent-blue)...
Done.
=== Deployment Complete ===
Active: green on port 8001
Previous instance (blue) is stopped and available for rollback.
To rollback, run:
sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color
Step 5: Rollback Procedure
If the new version has a bug that the health check didn't catch, rollback is three commands plus one routing change:
sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color
# Then switch your reverse proxy/load balancer back to port 8000
Output:
blue
Verify the rollback:
curl -s http://localhost:8000/health | python3 -m json.tool
Output:
{
"status": "healthy",
"agent": "running",
"timestamp": "2026-02-11T10:36:45.123456"
}
The old version is back in under 10 seconds. No redeployment needed.
Monitoring Integration
Deploying an agent is half the job. Keeping it healthy afterward requires monitoring: rotating logs before they fill the disk, alerting when disk space runs low, and verifying health on a schedule.
Log Rotation with logrotate
Your agent writes logs. Without rotation, those logs grow until they fill the disk and crash everything.
logrotate is a standard Linux tool that rotates, compresses, and removes old log files automatically.
Create a logrotate configuration for your agent:
sudo nano /etc/logrotate.d/agent-logs
/var/log/agent/*.log {
weekly
rotate 4
compress
delaycompress
missingok
notifempty
create 0640 agent-runner agent-runner
postrotate
systemctl reload my-agent-blue my-agent-green 2>/dev/null || true
endscript
}
Each directive serves a purpose:
| Directive | What It Does |
|---|---|
| weekly | Rotate logs once per week |
| rotate 4 | Keep 4 rotated files (4 weeks of history) |
| compress | Compress rotated files with gzip |
| delaycompress | Wait one rotation before compressing (so the most recent rotated file is still plain text for easy reading) |
| missingok | Don't error if a log file is missing |
| notifempty | Skip rotation if the log file is empty |
| create 0640 agent-runner agent-runner | Create the new log file with these permissions and ownership |
| postrotate | After rotating, ask the services to reopen log files (the trailing guard keeps rotation from failing on services that don't support reload) |
Create the log directory:
sudo mkdir -p /var/log/agent
sudo chown agent-runner:agent-runner /var/log/agent
Output:
(no output -- directory created and ownership set)
Test the configuration (dry run):
sudo logrotate -d /etc/logrotate.d/agent-logs
Output:
reading config file /etc/logrotate.d/agent-logs
Handling 1 log files in /var/log/agent/*.log
glob pattern expanded to:
/var/log/agent/agent.log
considering log /var/log/agent/agent.log
Now: 2026-02-11 10:40
Last rotated at 2026-02-11 00:00
log does not need rotating (log has been rotated within the last week)
The -d flag runs a dry run -- it shows what logrotate would do without actually doing it. Use this to verify your configuration before trusting it with production logs.
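To watch a rotation happen immediately instead of waiting a week, you can point logrotate at a throwaway config with a private state file: -f forces a rotation, and -s keeps state out of /var/lib/logrotate. Everything below uses temp paths, and the guard skips gracefully if logrotate is not installed:

```shell
#!/bin/bash
# Forced-rotation demo against a private config and state file.
set -euo pipefail
command -v logrotate >/dev/null || { echo "logrotate not installed"; exit 0; }

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/app.log"
cat > "$tmp/rotate.conf" <<EOF
$tmp/app.log {
    rotate 2
    missingok
}
EOF

logrotate -f -s "$tmp/state" "$tmp/rotate.conf"
ls "$tmp"    # app.log.1 now holds the old contents
```

This lets you rehearse the real /etc/logrotate.d/agent-logs behavior without touching production logs or the system state file.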
Disk Space Alerts
Log rotation prevents gradual disk fill, but other things consume space too -- temporary files, core dumps, downloaded models. A simple bash script can check disk usage and alert you.
sudo nano /usr/local/bin/check-disk-space.sh
#!/bin/bash
# check-disk-space.sh - Alert when disk usage exceeds threshold
set -euo pipefail
THRESHOLD=80
ALERT_LOG="/var/log/agent/disk-alerts.log"
# Get disk usage percentage for the root filesystem
USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$TIMESTAMP] WARNING: Disk usage at ${USAGE}% (threshold: ${THRESHOLD}%)" | tee -a "$ALERT_LOG"
# Show top space consumers
echo "Top 5 directories by size:" | tee -a "$ALERT_LOG"
du -sh /var/log/* /opt/* /tmp/* 2>/dev/null | sort -rh | head -5 | tee -a "$ALERT_LOG"
else
echo "Disk usage OK: ${USAGE}%"
fi
Make it executable:
sudo chmod +x /usr/local/bin/check-disk-space.sh
Output:
(no output -- permissions set silently)
Test it:
sudo check-disk-space.sh
Output (when disk usage is normal):
Disk usage OK: 36%
Output (when disk usage exceeds the threshold):
[2026-02-11 10:45:00] WARNING: Disk usage at 85% (threshold: 80%)
Top 5 directories by size:
4.2G /var/log/journal
1.8G /opt/agent-blue
1.8G /opt/agent-green
512M /tmp/model-cache
128M /var/log/agent
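A note on the parsing: df / | tail -1 | awk '{print $5}' works on most systems, but GNU df can emit just the percentage column directly, which avoids counting columns when long device names wrap onto a second line. A hedged alternative, assuming GNU coreutils:

```shell
#!/bin/bash
# Percentage-only disk usage via GNU df's --output flag.
set -euo pipefail

usage=$(df --output=pcent / | tail -1 | tr -dc '0-9')
echo "root filesystem at ${usage}%"
```

Either form yields a bare integer you can compare against THRESHOLD with -gt, as the script above does.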
Health Check Scheduling with cron
In Lesson 9, you created the canonical health check script at /usr/local/bin/check-agent-health.sh. For blue-green setups, add a thin wrapper so cron checks whichever color is currently active.
Create the active-color-aware wrapper:
sudo nano /usr/local/bin/check-active-agent-health.sh
#!/bin/bash
set -euo pipefail
ACTIVE_COLOR_FILE="/opt/agent-active-color"
ACTIVE_COLOR=$(cat "$ACTIVE_COLOR_FILE")
case "$ACTIVE_COLOR" in
blue)
SERVICE="my-agent-blue"
PORT=8000
;;
green)
SERVICE="my-agent-green"
PORT=8001
;;
*)
echo "[FAIL] Unknown active color: $ACTIVE_COLOR"
exit 1
;;
esac
/usr/local/bin/check-agent-health.sh "$SERVICE" "$PORT"
Make it executable:
sudo chmod +x /usr/local/bin/check-active-agent-health.sh
Add cron entries for both health checks and disk monitoring:
sudo crontab -e
Add these lines:
# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-active-agent-health.sh >> /var/log/agent/health-check.log 2>&1
# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1
Verify the cron entries were saved:
sudo crontab -l
Output:
# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-active-agent-health.sh >> /var/log/agent/health-check.log 2>&1
# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1
Now your agent is monitored around the clock. Health checks run every 5 minutes, disk alerts run every hour, and logs rotate weekly with 4 weeks of compressed history.
The Complete Deployment Workflow
Here is what a full multi-step deployment looks like when you combine everything from this chapter: user creation, service installation, health verification, and monitoring setup -- all in one script.
sudo nano /usr/local/bin/full-deploy.sh
#!/bin/bash
# full-deploy.sh - Complete agent deployment workflow
# Combines: user creation, service install, start, health check, monitoring
set -euo pipefail
AGENT_NAME="${1:?Usage: full-deploy.sh <agent-name>}"
AGENT_DIR="/opt/${AGENT_NAME}"
AGENT_USER="agent-runner"
SERVICE_FILE="/etc/systemd/system/${AGENT_NAME}.service"
echo "=== Full Deployment: ${AGENT_NAME} ==="
# Step 1: Create dedicated user (if not exists)
echo "[1/6] Checking agent user..."
if id "$AGENT_USER" &>/dev/null; then
echo "User $AGENT_USER already exists."
else
sudo useradd -r -s /usr/sbin/nologin -d /opt -M "$AGENT_USER"
echo "Created system user: $AGENT_USER"
fi
# Step 2: Create directory and deploy code
echo "[2/6] Deploying agent code..."
sudo mkdir -p "$AGENT_DIR"
sudo cp /tmp/agent-release/agent_main.py "$AGENT_DIR/"
sudo chown -R "$AGENT_USER":"$AGENT_USER" "$AGENT_DIR"
echo "Code deployed to $AGENT_DIR"
# Step 3: Install systemd service
echo "[3/6] Installing systemd service..."
sudo tee "$SERVICE_FILE" > /dev/null <<EOF
[Unit]
Description=Digital FTE Agent: ${AGENT_NAME}
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=${AGENT_USER}
WorkingDirectory=${AGENT_DIR}
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
EOF
echo "Service file installed: $SERVICE_FILE"
# Step 4: Start service
echo "[4/6] Starting service..."
sudo systemctl daemon-reload
sudo systemctl enable "$AGENT_NAME"
sudo systemctl start "$AGENT_NAME"
sleep 3
echo "Service started."
# Step 5: Health check
echo "[5/6] Verifying health..."
if curl -sf http://localhost:8000/health > /dev/null 2>&1; then
echo "Health check PASSED."
else
echo "Health check FAILED. Check logs:"
echo " sudo journalctl -u $AGENT_NAME -n 20"
exit 1
fi
# Step 6: Confirm
echo "[6/6] Verifying final state..."
STATUS=$(systemctl is-active "$AGENT_NAME")
echo "Service status: $STATUS"
echo ""
echo "=== Deployment Complete ==="
echo "Service: $AGENT_NAME"
echo "Status: $STATUS"
echo "Health: http://localhost:8000/health"
echo "Logs: sudo journalctl -u $AGENT_NAME -f"
Make it executable:
sudo chmod +x /usr/local/bin/full-deploy.sh
Output:
(no output -- permissions set silently)
Run the full deployment:
sudo mkdir -p /tmp/agent-release
sudo cp /opt/agent/agent_main.py /tmp/agent-release/
sudo full-deploy.sh my-agent
Output:
=== Full Deployment: my-agent ===
[1/6] Checking agent user...
User agent-runner already exists.
[2/6] Deploying agent code...
Code deployed to /opt/my-agent
[3/6] Installing systemd service...
Service file installed: /etc/systemd/system/my-agent.service
[4/6] Starting service...
Service started.
[5/6] Verifying health...
Health check PASSED.
[6/6] Verifying final state...
Service status: active
=== Deployment Complete ===
Service: my-agent
Status: active
Health: http://localhost:8000/health
Logs: sudo journalctl -u my-agent -f
One command. User created, code deployed, service installed, agent started, health verified. This is the difference between a manual checklist and an automated workflow -- the script never forgets a step.
A Note on Docker
Docker packages your application with its dependencies in a container -- an isolated environment that runs the same way regardless of the host system. For single-server deployments like the ones in this chapter, systemd is simpler and has zero overhead: your agent runs directly on the host OS with no abstraction layer. Docker excels when you need environment consistency across development, staging, and production servers, or when you're deploying to orchestrated environments like Kubernetes. You'll learn Docker in a dedicated chapter later in this book. For now, systemd is the right tool for your deployment -- it gives you service management, restart policies, resource limits, and logging with nothing extra to install.
Building Reusable SKILL.md Operations
You have a working deployment workflow. But right now it exists as a script -- useful, but tied to your specific setup. What happens when the next project needs a different port, a different user, a different runtime? You copy the script, tweak it, and now you have two scripts to maintain. Sound familiar?
This is exactly the problem from the opening: copying is not scaling. The solution is to package your deployment expertise as a reusable skill -- a structured file that an AI coding agent can read, understand, and execute for any agent deployment, not just this one.
Recognizing Patterns Worth Formalizing
Before writing a skill, confirm the pattern is worth the effort. Apply three criteria:
| Criterion | Threshold | Why |
|---|---|---|
| Frequency | Recurring 2+ times | One-off operations are not worth formalizing |
| Complexity | More than 3 steps | Simple commands do not need orchestration |
| Value | Saves time or prevents errors | Creation effort must pay for itself |
Apply the framework to your chapter work:
| Pattern | Freq | Complex | Value | Verdict |
|---|---|---|---|---|
| Create user + set permissions | 4+ | 4 steps | Prevents security mistakes | Skill |
| Write + enable systemd service | 3+ | 5 steps | Prevents config errors | Skill |
| Health check sequence | 4+ | 3 steps | Catches silent failures | Skill |
| Run ls | 100+ | 1 step | Trivial | Not a skill |
| Full deploy pipeline (all above) | 3+ | 12+ steps | Saves 20+ min per deploy | Skill |
The full deployment pipeline passes all three thresholds.
The SKILL.md Structure
Skills live in a directory with a specific format:
.claude/skills/linux-agent-ops/SKILL.md
Every SKILL.md starts with YAML frontmatter:
---
name: linux-agent-ops
description: |
Expert guidance for deploying AI agents as systemd services on Linux.
Use when creating agent users, writing service files, setting permissions,
or verifying agent health. Covers the full deploy-verify-monitor cycle.
---
Two fields: name and description. The description must be specific enough that an AI agent knows when to invoke it.
Body: Persona + Questions + Principles
After the frontmatter, the body follows a three-part pattern:
Persona defines the expertise level and mindset:
## Persona
You are a Linux operations engineer deploying AI agents to production
servers. Every step must be repeatable, every failure must be recoverable,
and nothing runs as root unless absolutely necessary.
Key Questions capture the decisions that vary between deployments:
## Key Questions
1. **What user should own the agent process?**
Default: Dedicated `agent-runner` user with no login shell.
2. **What port does the agent listen on?**
Default: 8000. Must not conflict with other services.
3. **What restart policy?**
Default: Restart=on-failure with RestartSec=5. Never Restart=always.
4. **What resource limits?**
Default: MemoryMax=512M, CPUQuota=50%.
5. **How to verify health?**
Default: HTTP GET to /health returns 200.
6. **What runtime?**
Options: Python (uvicorn), Node.js (node), compiled binary.
Each question has a default. Defaults make skills fast -- you only override what differs.
Principles encode hard-won lessons as rules:
## Principles
1. Never run agents as root. Create a dedicated user.
2. Always use Restart=on-failure, never Restart=always.
3. Always set MemoryMax and CPUQuota resource limits.
4. Always include StartLimitBurst and StartLimitIntervalSec.
5. Always verify health after deployment via /health endpoint.
6. Script everything you do more than once.
Combine all three sections plus an Implementation section listing deploy steps into a single .claude/skills/linux-agent-ops/SKILL.md file. An AI coding agent reads this file, understands the procedure, asks the right questions, follows the principles, and executes -- without a human walking it through.
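Before committing a skill, a quick structural check catches a missing section. A minimal sketch -- the SKILL.md content here is a stand-in written to a temp file, standing in for .claude/skills/linux-agent-ops/SKILL.md:

```shell
#!/bin/bash
# Verify a SKILL.md contains frontmatter plus the three body sections.
set -euo pipefail

skill=$(mktemp)
cat > "$skill" <<'EOF'
---
name: linux-agent-ops
description: Deploy AI agents as systemd services on Linux.
---
## Persona
## Key Questions
## Principles
EOF

for section in "## Persona" "## Key Questions" "## Principles"; do
    if grep -qx "$section" "$skill"; then
        echo "found: $section"
    else
        echo "MISSING: $section" >&2
        exit 1
    fi
done
echo "skill structure OK"
```

The same grep pattern works as a pre-commit hook once your skills live in version control.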
The Deploy Script as Skill Implementation
The skill specification tells an AI agent what to do. A script does it. This script combines user creation and permissions (Lesson 7), service file writing (Lesson 9), and health verification from Lesson 9:
#!/bin/bash
# deploy-agent.sh -- Implements the linux-agent-ops skill
# Usage: sudo ./deploy-agent.sh <agent-name> <port> <exec-start-cmd>
set -euo pipefail
AGENT_NAME="${1:?Usage: deploy-agent.sh <name> <port> <exec-start>}"
AGENT_PORT="${2:?Missing port}"
EXEC_START="${3:?Missing ExecStart command}"
AGENT_USER="agent-runner"
AGENT_DIR="/opt/${AGENT_NAME}"
SERVICE_FILE="/etc/systemd/system/${AGENT_NAME}.service"
echo "=== Deploying ${AGENT_NAME} on port ${AGENT_PORT} ==="
# Step 1: Create dedicated user
if id "${AGENT_USER}" &>/dev/null; then
echo "[OK] User ${AGENT_USER} exists"
else
useradd -r -s /usr/sbin/nologin -d /opt -M "${AGENT_USER}"
echo "[OK] Created user ${AGENT_USER}"
fi
# Step 2: Create directory
mkdir -p "${AGENT_DIR}"
chown -R "${AGENT_USER}:${AGENT_USER}" "${AGENT_DIR}"
echo "[OK] Directory ${AGENT_DIR} ready"
# Step 3: Write systemd service file
cat > "${SERVICE_FILE}" <<EOF
[Unit]
Description=AI Agent: ${AGENT_NAME}
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=${AGENT_USER}
WorkingDirectory=${AGENT_DIR}
ExecStart=${EXEC_START}
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=50%
[Install]
WantedBy=multi-user.target
EOF
echo "[OK] Service file written"
# Step 4: Enable and start
systemctl daemon-reload
systemctl enable "${AGENT_NAME}"
systemctl start "${AGENT_NAME}"
echo "[OK] Service enabled and started"
# Step 5: Verify health
sleep 5
if systemctl is-active --quiet "${AGENT_NAME}"; then
echo "[OK] Service is running"
else
echo "[FAIL] Service failed to start"
journalctl -u "${AGENT_NAME}" --no-pager -n 10
exit 1
fi
if curl -sf "http://localhost:${AGENT_PORT}/health" > /dev/null 2>&1; then
echo "[OK] Health endpoint responding"
else
echo "[WARN] Health endpoint not responding"
fi
echo "=== Deployment complete: ${AGENT_NAME} ==="
Output:
=== Deploying my-agent on port 8000 ===
[OK] Created user agent-runner
[OK] Directory /opt/my-agent ready
[OK] Service file written
[OK] Service enabled and started
[OK] Service is running
[OK] Health endpoint responding
=== Deployment complete: my-agent ===
Every principle maps to the script:
| Principle | Script Implementation |
|---|---|
| Never run as root | User=${AGENT_USER} in service file |
| Restart=on-failure | Restart=on-failure with RestartSec=5 |
| Resource limits | MemoryMax=512M, CPUQuota=50% |
| Start-limit protection | StartLimitBurst=5, StartLimitIntervalSec=60 |
| Verify health | curl to /health after startup |
| Script everything | The script replaces 12+ manual commands |
Testing Skills on Fresh Systems
Your script works on your server. But your development server has accumulated state: users already created, packages installed, directories existing. Your script may silently depend on that accumulated state.
Common hidden assumptions:
| Assumption | What Breaks | Fix |
|---|---|---|
| Python is installed | uvicorn not found | Add dependency check |
| curl is installed | Health check fails | Check command -v curl |
| Network is available | curl times out | Add readiness check |
| Previous service exists | Stale config loaded | Script writes fresh file |
Add prerequisite checking to the top of your deploy script:
# Add after set -euo pipefail in deploy-agent.sh
for cmd in curl systemctl useradd; do
command -v "${cmd}" &>/dev/null || { echo "[FAIL] Missing: ${cmd}"; exit 1; }
done
echo "[OK] All prerequisites available"
Output:
[OK] All prerequisites available
This turns a mysterious mid-deployment failure into an immediate, clear error at the start.
Without the skill: SSH in, type 12 commands from memory, hope you remember the right MemoryMax value. Takes 20 minutes. Error-prone. Cannot be delegated.
With the skill: An AI coding agent reads your SKILL.md, asks the right questions, follows the principles, runs the script. Takes 2 minutes. Consistent. Fully delegatable.
That gap -- between manual expertise and delegatable intelligence -- is the core of Digital FTE construction. Every skill you create makes your AI agents more capable. Every principle you encode prevents a class of errors permanently.
If you take one thing from this lesson: put your entire deployment sequence into a single deploy-agent.sh script with set -euo pipefail. A script any team member can run on any fresh server is worth more than a 20-step deployment checklist that only you understand.
Exercises
Exercise 1: Document a Blue-Green Deployment Plan
Write a deployment plan for updating your agent from v1 to v2 using the blue-green pattern. Your plan should include:
- The name of the currently active service (and its port)
- The name of the idle service you will deploy to (and its port)
- The health check command you will run before switching
- The exact commands to switch traffic
- The exact rollback commands if something goes wrong
Write your plan as a text file:
nano ~/blue-green-plan.txt
Verification -- your plan should contain all five sections:
grep -c "active service\|idle service\|health check\|switch traffic\|rollback" ~/blue-green-plan.txt
Expected output:
5
If the count is lower than 5, your plan is missing sections. Review the blue-green deployment walkthrough above and add the missing pieces.
Exercise 2: Write a Skill Specification
Write a SKILL.md for "deploy-agent" with YAML frontmatter (name, description), a Persona section, Key Questions (at least five covering user creation, service config, permissions, monitoring, validation), and Principles (at least four rules).
mkdir -p /tmp/test-skill/deploy-agent
nano /tmp/test-skill/deploy-agent/SKILL.md
Verification -- check that all five deployment dimensions appear:
grep -ci "user" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "service\|systemd" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "permission\|chown\|chmod" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "health\|monitor" /tmp/test-skill/deploy-agent/SKILL.md
grep -ci "valid\|verif\|test" /tmp/test-skill/deploy-agent/SKILL.md
Expected output (each line should show at least 1):
3
4
2
3
2
Exercise 3: Write a Full Deployment Script
Write a deployment script that combines user creation, service file installation, service start, and health verification into one automated pass. You can use full-deploy.sh above as a reference, but write it yourself.
After writing and running your script, verify both conditions:
systemctl is-active my-agent
Expected output:
active
curl -s http://localhost:8000/health
Expected output:
{"status":"healthy","agent":"running","timestamp":"..."}
Both checks must pass. If systemctl is-active shows inactive or failed, check journalctl -u my-agent -n 20 for error details. If curl fails, verify the port number matches what's in your service file's ExecStart.
Try With AI
"Ask Claude: 'I need to update my production agent with zero downtime. Compare restart, blue-green, and rolling deployment approaches for a single-server setup. Which do you recommend and why?'"
What you're learning: AI helps you evaluate production trade-offs by considering factors (resource overhead, complexity, rollback speed) that are hard to assess without experience. Notice how AI weighs the trade-offs differently for a single-server constraint versus a multi-server fleet.
"Tell Claude: 'I've deployed agents 5 times following a similar pattern: create user, set permissions, write service file, enable service, verify health. Help me formalize this as a reusable deployment skill with a Persona, Questions, and Principles structure.' Review and refine the structure AI produces."
What you're learning: AI helps transform tacit operational knowledge into explicit, structured, reusable intelligence. Compare the structure AI produces with the Persona + Questions + Principles pattern from this lesson. Notice where AI adds categories you hadn't considered and where your hands-on experience catches assumptions AI makes.
Always test deployment scripts on a non-production server first. Blue-green deployments involve stopping services -- a typo in the active color or port number can take down your live agent. Run through the full cycle (deploy, verify, switch, rollback) in a staging environment before trusting the script with production traffic.
The deployment skill you package in this lesson IS the SupportBot deployment. Lesson 12 uses it. One command, one clean server, one running production agent. Every principle you encoded -- non-root users, restart policies, resource limits, health verification -- fires automatically.
Lesson 12 is the moment everything connects. You will write the specification, deploy SupportBot to a production server, validate five layers of production readiness, and package the result as a script anyone on your team can run. The midnight panic from Lesson 1 was the problem. Lesson 12 is the solution you built with your own hands across eleven lessons of Linux mastery.