Advanced Workflow Integration Patterns
You've deployed agents one at a time. You can write a service file, set restart policies, configure start-limit protection, and check health endpoints. Each skill works in isolation.
But production deployment is never one skill at a time. You need to update an agent without downtime. You need logs that don't fill the disk. You need alerts when disk space runs low. You need a single script that creates a user, installs a service, starts the agent, and verifies health -- all in one automated pass.
The jump from "deploy one agent" to "deploy reliably every time" requires patterns. This lesson teaches three deployment patterns, shows you how to implement the most important one (blue-green), and then layers monitoring on top so your deployments stay healthy after they go live.
Deployment Patterns: Three Approaches
Not every deployment needs the same strategy. Here are three patterns, each with different trade-offs.
Simple Restart
Stop the old version, deploy the new code, start the service.
# Simple restart deployment
sudo systemctl stop my-agent
# Deploy new code (copy files, update configs)
sudo cp -r /tmp/agent-v2/* /opt/agent/
sudo systemctl start my-agent
Output:
(no output -- stop and start complete silently on success)
Pros: Simple. One service file. No extra infrastructure.
Cons: Downtime between stop and start. If the new version fails, you must manually restore the old code and restart. During the gap, any requests to your agent fail.
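One way to soften the manual-restore problem is to snapshot the current code before overwriting it. A minimal sketch, assuming the backup lives at /opt/agent.bak (an arbitrary path chosen for illustration):

# Simple restart with a safety copy of the old version
sudo systemctl stop my-agent
sudo rm -rf /opt/agent.bak
sudo cp -a /opt/agent /opt/agent.bak   # keep the old version for a fast manual rollback
sudo cp -r /tmp/agent-v2/* /opt/agent/
sudo systemctl start my-agent

If the new version misbehaves, restoring is a copy back from /opt/agent.bak followed by another restart -- still downtime, but no scrambling to find the old code.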
Blue-Green Deployment
Run two copies of your agent (blue and green). One is live, the other is idle. Deploy to the idle one, verify it works, then switch traffic. If the new version fails, switch back instantly.
Pros: Zero downtime. Instant rollback. You verify before switching.
Cons: Requires two service files and temporarily uses double the resources during the switch.
Rolling Deployment
If you run multiple instances of the same agent (say three copies behind a load balancer), update them one at a time. Each instance gets the new code while the others continue serving.
Pros: Gradual rollout. If one instance fails, the others still serve traffic.
Cons: Mixed versions run simultaneously during the rollout. Requires multiple instances and a load balancer (more infrastructure than a single-server setup typically has).
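To make the pattern concrete, a rolling update over three instances might look like the sketch below. It assumes hypothetical systemd template instances my-agent@1 through my-agent@3 listening on ports 8001-8003 behind a load balancer -- none of which are set up in this chapter; the loop is only here to illustrate the shape of the rollout.

# Rolling update sketch (hypothetical my-agent@N instances on ports 8001-8003)
for i in 1 2 3; do
    echo "Updating instance ${i}..."
    sudo systemctl stop "my-agent@${i}"
    sudo cp -r /tmp/agent-v2/* "/opt/agent-instance-${i}/"
    sudo systemctl start "my-agent@${i}"
    sleep 3
    # Do not move on until this instance reports healthy
    curl -sf "http://localhost:800${i}/health" > /dev/null || { echo "Instance ${i} unhealthy, aborting rollout"; exit 1; }
done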
When to Use Each Pattern
| Pattern | Best For | Downtime | Rollback Speed | Resource Cost | Complexity |
|---|---|---|---|---|---|
| Simple restart | Development, staging, low-traffic agents | Seconds to minutes | Manual (slow) | 1x (minimal) | Low |
| Blue-green | Production single-server agents | Zero | Instant (switch back) | 2x during switch | Medium |
| Rolling | Multi-instance production agents | Zero | Gradual (per instance) | 1x + 1 instance | High |
For most single-server agent deployments, blue-green is the sweet spot. It eliminates downtime without requiring the multi-instance infrastructure that rolling deployments need. The rest of this lesson focuses on implementing it.
Implementing Blue-Green Deployment
Blue-green deployment uses two systemd services -- one "blue" and one "green." At any time, exactly one is live (receiving traffic). The other is idle, waiting for the next deployment.
Step 1: Create Two Service Files
The blue service runs on port 8000, the green on port 8001. A marker file (created in Step 2) records which one is currently "live."
Create the blue service:
sudo nano /etc/systemd/system/my-agent-blue.service
[Unit]
Description=Agent Blue Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=nobody
WorkingDirectory=/opt/agent-blue
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
Create the green service:
sudo nano /etc/systemd/system/my-agent-green.service
[Unit]
Description=Agent Green Instance
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=nobody
WorkingDirectory=/opt/agent-green
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8001
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
Set up the directory structure:
sudo mkdir -p /opt/agent-blue /opt/agent-green
sudo cp /opt/agent/agent_main.py /opt/agent-blue/
sudo cp /opt/agent/agent_main.py /opt/agent-green/
Output:
(no output -- directories and files created silently)
Reload systemd and start the blue instance as the initial live version:
sudo systemctl daemon-reload
sudo systemctl start my-agent-blue
sudo systemctl enable my-agent-blue
Output:
Created symlink /etc/systemd/system/multi-user.target.wants/my-agent-blue.service → /etc/systemd/system/my-agent-blue.service.
Verify it's running:
curl -s http://localhost:8000/health | python3 -m json.tool
Output:
{
"status": "healthy",
"agent": "running",
"timestamp": "2026-02-11T10:30:01.234567"
}
Step 2: Track Which Instance Is Live
Use a simple file to record the active color:
echo "blue" | sudo tee /opt/agent-active-color
Output:
blue
This file is the single source of truth. Scripts read it to know which instance is currently serving traffic.
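Anything else that needs to know where the live instance is -- a reverse proxy config generator, a smoke test, a deploy script -- can read this file and resolve it to a port. A hypothetical helper (the script name and path are illustrative; nothing else in this lesson depends on it):

sudo nano /usr/local/bin/agent-live-port.sh

#!/bin/bash
# agent-live-port.sh - print the port of the currently live instance (illustrative helper)
COLOR=$(cat /opt/agent-active-color)
if [ "$COLOR" = "blue" ]; then
    echo 8000
else
    echo 8001
fi

After sudo chmod +x /usr/local/bin/agent-live-port.sh, probing whichever instance is live becomes a one-liner: curl -s "http://localhost:$(agent-live-port.sh)/health".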
Step 3: The Blue-Green Deploy Script
This script automates the full deployment: deploy to the idle instance, verify its health, switch traffic to it, stop the old instance, and print the rollback commands.
sudo nano /usr/local/bin/blue-green-deploy.sh
#!/bin/bash
# blue-green-deploy.sh - Zero-downtime deployment for agent_main.py
set -euo pipefail
NEW_CODE_DIR="${1:?Usage: blue-green-deploy.sh /path/to/new/code}"
ACTIVE_COLOR_FILE="/opt/agent-active-color"
# Determine current and target colors
CURRENT_COLOR=$(cat "$ACTIVE_COLOR_FILE")
if [ "$CURRENT_COLOR" = "blue" ]; then
TARGET_COLOR="green"
TARGET_PORT=8001
CURRENT_PORT=8000
else
TARGET_COLOR="blue"
TARGET_PORT=8000
CURRENT_PORT=8001
fi
echo "=== Blue-Green Deployment ==="
echo "Current live: $CURRENT_COLOR (port $CURRENT_PORT)"
echo "Deploying to: $TARGET_COLOR (port $TARGET_PORT)"
echo ""
# Step 1: Deploy new code to target
echo "[1/5] Deploying new code to $TARGET_COLOR..."
sudo cp -r "$NEW_CODE_DIR"/* "/opt/agent-${TARGET_COLOR}/"
echo "Done."
# Step 2: Start the target instance
echo "[2/5] Starting my-agent-${TARGET_COLOR}..."
sudo systemctl start "my-agent-${TARGET_COLOR}"
sleep 3
echo "Done."
# Step 3: Health check the target
echo "[3/5] Checking health on port ${TARGET_PORT}..."
HEALTH_RESPONSE=$(curl -sf "http://localhost:${TARGET_PORT}/health" 2>&1) || {
echo "FAILED: Health check did not pass on $TARGET_COLOR."
echo "Rolling back: stopping $TARGET_COLOR."
sudo systemctl stop "my-agent-${TARGET_COLOR}"
exit 1
}
echo "Health response: $HEALTH_RESPONSE"
echo "Health check passed."
# Step 4: Switch traffic (update the active-color file)
echo "[4/5] Switching live traffic to $TARGET_COLOR..."
echo "$TARGET_COLOR" | sudo tee "$ACTIVE_COLOR_FILE" > /dev/null
echo "Live instance is now: $TARGET_COLOR (port $TARGET_PORT)"
# Step 5: Stop the old instance
echo "[5/5] Stopping old instance (my-agent-${CURRENT_COLOR})..."
sudo systemctl stop "my-agent-${CURRENT_COLOR}"
echo "Done."
echo ""
echo "=== Deployment Complete ==="
echo "Active: $TARGET_COLOR on port $TARGET_PORT"
echo "Previous instance (${CURRENT_COLOR}) is stopped."
echo ""
echo "To rollback, run:"
echo " sudo systemctl start my-agent-${CURRENT_COLOR}"
echo " sudo systemctl stop my-agent-${TARGET_COLOR}"
echo " echo ${CURRENT_COLOR} | sudo tee ${ACTIVE_COLOR_FILE}"
Make it executable:
sudo chmod +x /usr/local/bin/blue-green-deploy.sh
Output:
(no output -- permissions set silently)
Step 4: Run the Deployment
Simulate deploying a new version by updating the code in a staging directory and running the script:
sudo mkdir -p /tmp/agent-v2
sudo cp /opt/agent/agent_main.py /tmp/agent-v2/
Run the blue-green deploy:
sudo blue-green-deploy.sh /tmp/agent-v2
Output:
=== Blue-Green Deployment ===
Current live: blue (port 8000)
Deploying to: green (port 8001)
[1/5] Deploying new code to green...
Done.
[2/5] Starting my-agent-green...
Done.
[3/5] Checking health on port 8001...
Health response: {"status":"healthy","agent":"running","timestamp":"2026-02-11T10:35:12.456789"}
Health check passed.
[4/5] Switching live traffic to green...
Live instance is now: green (port 8001)
[5/5] Stopping old instance (my-agent-blue)...
Done.
=== Deployment Complete ===
Active: green on port 8001
Previous instance (blue) is stopped.
To rollback, run:
sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color
Step 5: Rollback Procedure
If the new version has a bug that the health check didn't catch, rollback is three commands:
sudo systemctl start my-agent-blue
sudo systemctl stop my-agent-green
echo blue | sudo tee /opt/agent-active-color
Output:
blue
Verify the rollback:
curl -s http://localhost:8000/health | python3 -m json.tool
Output:
{
"status": "healthy",
"agent": "running",
"timestamp": "2026-02-11T10:36:45.123456"
}
The old version is back in under 10 seconds. No redeployment needed.
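If you would rather not type those commands from memory during an incident, the same logic can be scripted by reading the active-color file and flipping it. A sketch -- the script name blue-green-rollback.sh is an assumption, not something installed earlier in this lesson:

#!/bin/bash
# blue-green-rollback.sh - switch back to the previously live instance (illustrative sketch)
set -euo pipefail
ACTIVE_COLOR_FILE="/opt/agent-active-color"
CURRENT_COLOR=$(cat "$ACTIVE_COLOR_FILE")
if [ "$CURRENT_COLOR" = "blue" ]; then
    PREVIOUS_COLOR="green"
else
    PREVIOUS_COLOR="blue"
fi
echo "Rolling back from $CURRENT_COLOR to $PREVIOUS_COLOR..."
sudo systemctl start "my-agent-${PREVIOUS_COLOR}"
sleep 3
echo "$PREVIOUS_COLOR" | sudo tee "$ACTIVE_COLOR_FILE" > /dev/null
sudo systemctl stop "my-agent-${CURRENT_COLOR}"
echo "Live instance is now: $PREVIOUS_COLOR"

The order matters: start the previous instance and repoint the marker file before stopping the misbehaving one, so there is never a moment with nothing running.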
Monitoring Integration
Deploying an agent is half the job. Keeping it healthy afterward requires monitoring: rotating logs before they fill the disk, alerting when disk space runs low, and verifying health on a schedule.
Log Rotation with logrotate
Your agent writes logs. Without rotation, those logs grow until they fill the disk and crash everything.
logrotate is a standard Linux tool that rotates, compresses, and removes old log files automatically.
Create a logrotate configuration for your agent:
sudo nano /etc/logrotate.d/agent-logs
/var/log/agent/*.log {
weekly
rotate 4
compress
delaycompress
missingok
notifempty
create 0640 nobody nobody
postrotate
systemctl reload my-agent-blue my-agent-green 2>/dev/null || true
endscript
}
Each directive serves a purpose:
| Directive | What It Does |
|---|---|
| weekly | Rotate logs once per week |
| rotate 4 | Keep 4 rotated files (4 weeks of history) |
| compress | Compress rotated files with gzip |
| delaycompress | Wait one rotation before compressing (so the most recent rotated file is still plain text for easy reading) |
| missingok | Don't error if a log file is missing |
| notifempty | Skip rotation if the log file is empty |
| create 0640 nobody nobody | Create new log file with these permissions and ownership |
| postrotate | After rotating, signal the service to reopen log files |
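One related directive worth knowing: postrotate runs once for every log file the glob matches. If /var/log/agent/ ever holds more than one log, add sharedscripts above postrotate so the reload runs only once per rotation:

/var/log/agent/*.log {
    # ... same directives as above ...
    sharedscripts
    postrotate
        systemctl reload my-agent-blue my-agent-green 2>/dev/null || true
    endscript
}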
Create the log directory:
sudo mkdir -p /var/log/agent
sudo chown nobody:nobody /var/log/agent
Output:
(no output -- directory created and ownership set)
Test the configuration (dry run):
sudo logrotate -d /etc/logrotate.d/agent-logs
Output:
reading config file /etc/logrotate.d/agent-logs
Handling 1 log files in /var/log/agent/*.log
glob pattern expanded to:
/var/log/agent/agent.log
considering log /var/log/agent/agent.log
Now: 2026-02-11 10:40
Last rotated at 2026-02-11 00:00
log does not need rotating (log has been rotated within the last week)
The -d flag runs a dry run -- it shows what logrotate would do without actually doing it. Use this to verify your configuration before trusting it with production logs.
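Once the dry run looks clean, you can force a real rotation to exercise the full cycle, including the postrotate step. The -f flag rotates even when the schedule says rotation is not yet due:

sudo logrotate -f /etc/logrotate.d/agent-logs
ls /var/log/agent/

Expect to see a rotated file (for example agent.log.1) next to a fresh agent.log -- the exact names depend on what your agent has already written, and if the log is still empty, notifempty skips it entirely.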
Disk Space Alerts
Log rotation prevents gradual disk fill, but other things consume space too -- temporary files, core dumps, downloaded models. A simple bash script can check disk usage and alert you.
sudo nano /usr/local/bin/check-disk-space.sh
#!/bin/bash
# check-disk-space.sh - Alert when disk usage exceeds threshold
set -euo pipefail
THRESHOLD=80
ALERT_LOG="/var/log/agent/disk-alerts.log"
# Get disk usage percentage for the root filesystem
USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$TIMESTAMP] WARNING: Disk usage at ${USAGE}% (threshold: ${THRESHOLD}%)" | tee -a "$ALERT_LOG"
# Show top space consumers
echo "Top 5 directories by size:" | tee -a "$ALERT_LOG"
    # || true keeps set -e from aborting if du hits an unreadable or missing path
    du -sh /var/log/* /opt/* /tmp/* 2>/dev/null | sort -rh | head -5 | tee -a "$ALERT_LOG" || true
else
echo "Disk usage OK: ${USAGE}%"
fi
Make it executable:
sudo chmod +x /usr/local/bin/check-disk-space.sh
Output:
(no output -- permissions set silently)
Test it:
sudo check-disk-space.sh
Output (when disk usage is normal):
Disk usage OK: 36%
Output (when disk usage exceeds the threshold):
[2026-02-11 10:45:00] WARNING: Disk usage at 85% (threshold: 80%)
Top 5 directories by size:
4.2G /var/log/journal
1.8G /opt/agent-blue
1.8G /opt/agent-green
512M /tmp/model-cache
128M /var/log/agent
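An alert that only lands in a log file helps only if someone reads the file. If you have a chat or paging webhook, extend the warning branch to push the alert out as well. A sketch, assuming a hypothetical webhook URL -- the payload shape depends on whatever service receives it:

# Inside the warning branch of check-disk-space.sh (illustrative addition)
WEBHOOK_URL="https://example.com/hooks/agent-alerts"   # hypothetical endpoint
curl -sf -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"Disk usage at ${USAGE}% on $(hostname) (threshold: ${THRESHOLD}%)\"}" \
    || echo "[$TIMESTAMP] Webhook notification failed" | tee -a "$ALERT_LOG"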
Health Check Scheduling with cron
In Lesson 10, you created the canonical health check script at /usr/local/bin/check-agent-health.sh. Instead of duplicating that script here, schedule it with cron so it runs automatically.
Add cron entries for both health checks and disk monitoring:
sudo crontab -e
Add these lines:
# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-agent-health.sh my-agent-blue >> /var/log/agent/health-check.log 2>&1
# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1
Verify the cron entries were saved:
sudo crontab -l
Output:
# Agent health check every 5 minutes
*/5 * * * * /usr/local/bin/check-agent-health.sh my-agent-blue >> /var/log/agent/health-check.log 2>&1
# Disk space check every hour
0 * * * * /usr/local/bin/check-disk-space.sh >> /var/log/agent/disk-alerts.log 2>&1
Now your agent is monitored around the clock. Health checks run every 5 minutes, disk alerts run every hour, and logs rotate weekly with 4 weeks of compressed history.
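After the first few scheduled runs, confirm the jobs are actually writing output (the exact lines depend on what the health check script from Lesson 10 prints):

tail -n 5 /var/log/agent/health-check.log
tail -n 5 /var/log/agent/disk-alerts.log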
The Complete Deployment Workflow
Here is what a full multi-step deployment looks like when you combine everything from this chapter: user creation, service installation, health verification, and monitoring setup -- all in one script.
sudo nano /usr/local/bin/full-deploy.sh
#!/bin/bash
# full-deploy.sh - Complete agent deployment workflow
# Combines: user creation, service install, start, health check, monitoring
set -euo pipefail
AGENT_NAME="${1:?Usage: full-deploy.sh <agent-name>}"
AGENT_DIR="/opt/${AGENT_NAME}"
AGENT_USER="agent-runner"
SERVICE_FILE="/etc/systemd/system/${AGENT_NAME}.service"
echo "=== Full Deployment: ${AGENT_NAME} ==="
# Step 1: Create dedicated user (if not exists)
echo "[1/6] Checking agent user..."
if id "$AGENT_USER" &>/dev/null; then
echo "User $AGENT_USER already exists."
else
sudo useradd -r -s /usr/sbin/nologin -d /opt -M "$AGENT_USER"
echo "Created system user: $AGENT_USER"
fi
# Step 2: Create directory and deploy code
echo "[2/6] Deploying agent code..."
sudo mkdir -p "$AGENT_DIR"
sudo cp /tmp/agent-release/agent_main.py "$AGENT_DIR/"
sudo chown -R "$AGENT_USER":"$AGENT_USER" "$AGENT_DIR"
echo "Code deployed to $AGENT_DIR"
# Step 3: Install systemd service
echo "[3/6] Installing systemd service..."
sudo tee "$SERVICE_FILE" > /dev/null <<EOF
[Unit]
Description=Digital FTE Agent: ${AGENT_NAME}
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=${AGENT_USER}
WorkingDirectory=${AGENT_DIR}
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
EOF
echo "Service file installed: $SERVICE_FILE"
# Step 4: Start service
echo "[4/6] Starting service..."
sudo systemctl daemon-reload
sudo systemctl enable "$AGENT_NAME"
sudo systemctl start "$AGENT_NAME"
sleep 3
echo "Service started."
# Step 5: Health check
echo "[5/6] Verifying health..."
if curl -sf http://localhost:8000/health > /dev/null 2>&1; then
echo "Health check PASSED."
else
echo "Health check FAILED. Check logs:"
echo " sudo journalctl -u $AGENT_NAME -n 20"
exit 1
fi
# Step 6: Confirm
echo "[6/6] Verifying final state..."
STATUS=$(systemctl is-active "$AGENT_NAME")
echo "Service status: $STATUS"
echo ""
echo "=== Deployment Complete ==="
echo "Service: $AGENT_NAME"
echo "Status: $STATUS"
echo "Health: http://localhost:8000/health"
echo "Logs: sudo journalctl -u $AGENT_NAME -f"
Make it executable:
sudo chmod +x /usr/local/bin/full-deploy.sh
Output:
(no output -- permissions set silently)
Run the full deployment:
sudo mkdir -p /tmp/agent-release
sudo cp /opt/agent/agent_main.py /tmp/agent-release/
sudo full-deploy.sh my-agent
Output:
=== Full Deployment: my-agent ===
[1/6] Checking agent user...
User agent-runner already exists.
[2/6] Deploying agent code...
Code deployed to /opt/my-agent
[3/6] Installing systemd service...
Service file installed: /etc/systemd/system/my-agent.service
[4/6] Starting service...
Service started.
[5/6] Verifying health...
Health check PASSED.
[6/6] Verifying final state...
Service status: active
=== Deployment Complete ===
Service: my-agent
Status: active
Health: http://localhost:8000/health
Logs: sudo journalctl -u my-agent -f
One command. User created, code deployed, service installed, agent started, health verified. This is the difference between a manual checklist and an automated workflow -- the script never forgets a step.
A Note on Docker
Docker packages your application with its dependencies in a container -- an isolated environment that runs the same way regardless of the host system. For single-server deployments like the ones in this chapter, systemd is simpler: your agent runs directly on the host OS with no container layer in between. Docker excels when you need environment consistency across development, staging, and production servers, or when you're deploying to orchestrated environments like Kubernetes. You'll learn Docker in a dedicated chapter later in this book. For now, systemd is the right tool for your deployment -- it gives you service management, restart policies, resource limits, and logging with nothing extra to install.
Exercises
Exercise 1: Document a Blue-Green Deployment Plan
Write a deployment plan for updating your agent from v1 to v2 using the blue-green pattern. Your plan should include:
- The name of the currently active service (and its port)
- The name of the idle service you will deploy to (and its port)
- The health check command you will run before switching
- The exact commands to switch traffic
- The exact rollback commands if something goes wrong
Write your plan as a text file:
nano ~/blue-green-plan.txt
Verification -- your plan should contain all five sections:
grep -ic "active service\|idle service\|health check\|switch traffic\|rollback" ~/blue-green-plan.txt
Expected output:
5
If the count is lower than 5, your plan is missing sections (grep counts matching lines, so keep each section on its own line). Review the blue-green deployment walkthrough above and add the missing pieces.
Exercise 2: Configure logrotate for Agent Logs
Create a logrotate configuration that rotates agent logs weekly, keeps 4 weeks of history, and compresses old logs.
sudo nano /etc/logrotate.d/agent-logs
Write the configuration (refer to the logrotate section above), then verify:
cat /etc/logrotate.d/agent-logs
Expected output:
/var/log/agent/*.log {
weekly
rotate 4
compress
delaycompress
missingok
notifempty
create 0640 nobody nobody
postrotate
systemctl reload my-agent-blue my-agent-green 2>/dev/null || true
endscript
}
Confirm the configuration parses without errors:
sudo logrotate -d /etc/logrotate.d/agent-logs
Expected output (no errors):
reading config file /etc/logrotate.d/agent-logs
...
If any lines starting with error: appear in the output, check your syntax -- a common mistake is forgetting the endscript keyword after postrotate.
Exercise 3: Write a Full Deployment Script
Write a deployment script that combines user creation, service file installation, service start, and health verification into one automated pass. You can use full-deploy.sh above as a reference, but write it yourself.
After writing and running your script, verify both conditions:
systemctl is-active my-agent
Expected output:
active
curl -s http://localhost:8000/health
Expected output:
{"status":"healthy","agent":"running","timestamp":"..."}
Both checks must pass. If systemctl is-active shows inactive or failed, check journalctl -u my-agent -n 20 for error details. If curl fails, verify the port number matches what's in your service file's ExecStart.
Try With AI
"Ask Claude: 'I need to update my production agent with zero downtime. Compare restart, blue-green, and rolling deployment approaches for a single-server setup. Which do you recommend and why?'"
What you're learning: AI helps you evaluate production trade-offs by considering factors (resource overhead, complexity, rollback speed) that are hard to assess without experience. Notice how AI weighs the trade-offs differently for a single-server constraint versus a multi-server fleet.
"Tell Claude: 'I chose blue-green deployment. My constraint: only one server with 2GB RAM, so I can't run both versions simultaneously for long. Design a deployment script that minimizes overlap time.' Review the approach."
What you're learning: Providing real resource constraints forces practical solutions instead of textbook ideals. Compare AI's response to the script in this lesson -- it may suggest techniques like pre-pulling code, faster health checks, or immediate shutdown of the old instance to reduce the overlap window.
"Ask Claude: 'Review my complete deployment workflow: [paste your deployment script from Exercise 3]. What failure scenarios does it NOT handle? Add error handling for each one.'"
What you're learning: Production workflows need resilience against failures at every step -- something that emerges through iterative review. AI will likely identify scenarios you didn't consider: what if the user already exists with wrong permissions? What if the port is already in use? What if disk space is too low to copy files? Each failure mode AI finds teaches you to think more defensively about automation.
Always test deployment scripts on a non-production server first. Blue-green deployments involve stopping services -- a typo in the active color or port number can take down your live agent. Run through the full cycle (deploy, verify, switch, rollback) in a staging environment before trusting the script with production traffic.