Capstone — Spec-First Production Deployment
Everything you've learned — CLI navigation, file operations, scripting, security, networking, systemd, debugging, deployment patterns, reusable skills — comes together here. This capstone is different from every other lesson in this chapter: you will write a specification FIRST, then implement it, then validate it. This is how production deployments work in professional environments.
The difference between a hobbyist deployment and a production deployment is not the tools. It is the process. Hobbyists type commands until the service starts. Production engineers define success criteria before touching a terminal, implement against those criteria, and validate systematically. You are going to do the second thing.
You will deploy the same agent_main.py from Lesson 10 — a FastAPI agent with a health check endpoint. But this time, the deployment will be specification-driven, validated layer by layer, and packaged so anyone can reproduce it.
The Spec-First Approach
Think about what happens when you deploy without a specification. You SSH into a server, start typing commands, fix problems as they appear, and eventually something works. Three weeks later, the server reboots and the agent doesn't come back. You SSH in again, try to remember what you did, and spend an hour reconstructing the setup.
A deployment specification prevents this. It is a document that answers four questions before you touch the terminal:
| Question | Section in Spec | Example |
|---|---|---|
| What am I deploying? | Service Definition | FastAPI agent on port 8000, managed by systemd |
| How is it protected? | Security Requirements | Dedicated non-root user, firewall restricts access |
| How do I know it's healthy? | Monitoring Plan | Health endpoint, journalctl logs, disk alerts |
| How do I prove it works? | Validation Criteria | 5-layer check: service, network, security, logs, resources |
The specification becomes your implementation checklist. Every command you run traces back to a requirement. Every validation check maps to a success criterion. Nothing is ad-hoc.
Writing Your DEPLOYMENT-SPEC.md
Here is the template. You will fill it out for agent_main.py before implementing anything.
# DEPLOYMENT-SPEC.md — Agent Production Deployment
## Service Definition
- **Agent name**: agent-prod
- **Application**: agent_main.py (FastAPI with uvicorn)
- **Port**: 8000 (direct binding, no reverse proxy)
- **System user**: agent-prod (dedicated, no login shell)
- **Working directory**: /opt/agent-prod/
- **Restart policy**: on-failure (not always) with 5s delay
- **Start-limit protection**: max 5 restarts in 60 seconds
- **Resource limits**: MemoryMax=512M, CPUQuota=25%
## Security Requirements
- [ ] Dedicated system user `agent-prod` (no root execution)
- [ ] User has no login shell (/usr/sbin/nologin)
- [ ] Application files owned by agent-prod:agent-prod
- [ ] Service file permissions: 644 (root-owned)
- [ ] Firewall allows only port 8000/tcp for agent traffic
- [ ] No password-based SSH (key-based only, from Lesson 7)
## Monitoring Plan
- **Health check**: GET /health returns {"status": "healthy", ...}
- **Log location**: journalctl -u agent-prod (systemd journal)
- **Log rotation**: managed by journald (MaxRetentionSec, SystemMaxUse)
- **Disk alerts**: monitor /opt/agent-prod/ usage
- **Crash detection**: systemd restart counter + journal entries
## Validation Criteria
- [ ] Layer 1 — Service: `systemctl is-active agent-prod` returns "active"
- [ ] Layer 2 — Network: `curl -s localhost:8000/health` returns healthy JSON
- [ ] Layer 3 — Security: agent runs as agent-prod user, not root
- [ ] Layer 4 — Monitoring: logs appear in journalctl within last 5 minutes
- [ ] Layer 5 — Resources: MemoryMax shows 536870912 (512MB in bytes)
Expected output (after saving the file):
$ wc -l DEPLOYMENT-SPEC.md
30 DEPLOYMENT-SPEC.md
Read through each section. Notice how every requirement is testable. "Dedicated system user" is not a vague goal; it maps to a specific check: ps -eo user,comm | grep uvicorn must show agent-prod (which ps may abbreviate to agent-p+), not root. This is what separates a specification from a wish list.
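Because the spec uses Markdown checkboxes, its progress can itself be measured. The following is a minimal sketch, assuming the spec keeps the `- [ ]` / `- [x]` checkbox style from the template; `spec_progress` is a hypothetical helper name, not part of the lesson's scripts.

```shell
# spec_progress FILE
# Count unchecked ("- [ ]") and checked ("- [x]") requirement lines in a spec.
# Hypothetical helper: assumes every requirement starts at column 0 with "- [".
spec_progress() {
    local spec_file="$1"
    local unchecked checked
    # grep -c prints 0 (and exits non-zero) when nothing matches; we only use the count
    unchecked=$(grep -c '^- \[ ]' "$spec_file")
    checked=$(grep -c '^- \[[xX]]' "$spec_file")
    echo "unchecked=$unchecked checked=$checked"
}

# Example usage:
#   spec_progress DEPLOYMENT-SPEC.md
```

Ticking a box only after its verification command passes keeps the spec honest as a record of what has actually been proven.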
Implementing the Specification
Now you implement. Every step below references which spec section it satisfies. If a step does not trace to the spec, it does not belong here.
Step 1: Create the Agent User (Security Requirements)
Create a dedicated system user with no login shell:
sudo useradd -r -s /usr/sbin/nologin agent-prod
Expected output:
(no output — silent success means the user was created)
Verify the user exists:
id agent-prod
Expected output:
uid=998(agent-prod) gid=998(agent-prod) groups=998(agent-prod)
The -r flag creates a system user (low UID, no home directory). The -s /usr/sbin/nologin flag prevents anyone from logging in as this user. The agent process runs as this user, but no human can SSH in as it.
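The "no login shell" requirement can be checked mechanically, because the login shell is the seventh colon-separated field of the user's passwd entry. A small sketch, where `has_no_login` is a hypothetical helper and the list of non-login shells is an assumption about common Linux defaults:

```shell
# has_no_login
# Reads one passwd-format line on stdin and succeeds if its login shell
# (field 7) is a shell that refuses interactive logins.
# Hypothetical helper; the accepted paths are assumed common defaults.
has_no_login() {
    local shell
    shell=$(cut -d: -f7)
    case "$shell" in
        /usr/sbin/nologin|/sbin/nologin|/bin/false) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
#   getent passwd agent-prod | has_no_login && echo "no login shell"
```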
Step 2: Set Up the Working Directory (Service Definition)
Create the application directory and copy the agent:
sudo mkdir -p /opt/agent-prod
Expected output:
(no output — directory created)
sudo cp /opt/agent/agent_main.py /opt/agent-prod/
Expected output:
(no output — file copied)
This copies the same agent_main.py you deployed in Lesson 10. If you haven't set it up yet, create it now with the FastAPI agent code from that lesson.
Step 3: Install Dependencies (Service Definition)
sudo pip install fastapi uvicorn
Expected output:
Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages
Requirement already satisfied: uvicorn in /usr/local/lib/python3.10/dist-packages
If you see "Successfully installed" instead, that is also correct — it means the packages were not previously installed.
Step 4: Set Ownership and Permissions (Security Requirements)
sudo chown -R agent-prod:agent-prod /opt/agent-prod
Expected output:
(no output — ownership changed)
Verify:
ls -la /opt/agent-prod/
Expected output:
total 12
drwxr-xr-x 2 agent-prod agent-prod 4096 Feb 11 10:00 .
drwxr-xr-x 4 root root 4096 Feb 11 10:00 ..
-rw-r--r-- 1 agent-prod agent-prod 892 Feb 11 10:00 agent_main.py
Every file is owned by agent-prod. The agent process can read its own files, but the restricted user cannot modify system files outside this directory.
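Reading `ls -la` output works for one file, but it does not scale once the directory grows. One way to sketch the same ownership check so that it covers every file is to ask `find` for anything owned by someone else; `files_not_owned_by` is a hypothetical helper name.

```shell
# files_not_owned_by DIR OWNER
# Print every path under DIR whose owner is not OWNER (a user name or numeric UID).
# Empty output means the ownership requirement holds for the whole tree.
files_not_owned_by() {
    local dir="$1" owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Example usage:
#   if [ -z "$(files_not_owned_by /opt/agent-prod agent-prod)" ]; then
#       echo "ownership OK"
#   fi
```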
Step 5: Write the systemd Service File (Service Definition + Security)
sudo nano /etc/systemd/system/agent-prod.service
Add this content:
[Unit]
Description=Production Digital FTE Agent
After=network.target
# Start-limit protection: stop retrying after repeated failures.
# These settings belong in [Unit]; systemd ignores StartLimitIntervalSec= in [Service].
StartLimitBurst=5
StartLimitIntervalSec=60
[Service]
Type=simple
User=agent-prod
Group=agent-prod
WorkingDirectory=/opt/agent-prod
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
# Restart policy: recover from crashes; a clean exit stays stopped
Restart=on-failure
RestartSec=5
# Resource limits: prevent runaway consumption
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
Expected output (after saving):
(no output — file saved)
Each directive traces to the specification:
| Directive | Spec Requirement |
|---|---|
| User=agent-prod | Security: dedicated non-root user |
| Restart=on-failure | Service: restart after a crash, stay down after a clean exit |
| RestartSec=5 | Service: 5-second delay between restarts |
| StartLimitBurst=5 | Service: max 5 restarts in the interval |
| StartLimitIntervalSec=60 | Service: 60-second window for counting restarts |
| MemoryMax=512M | Service: 512MB memory ceiling |
| CPUQuota=25% | Service: 25% CPU ceiling |
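Why Restart=on-failure and not Restart=always? Because always restarts the service after every exit, including a clean one: if the agent shuts itself down deliberately with exit code 0, systemd starts it again. on-failure restarts only after an unclean exit (a non-zero exit code, a fatal signal, or a timeout), so crashes get automatic recovery while a clean shutdown stays down. Under either policy, systemctl stop keeps the service stopped, because systemd does not auto-restart a unit that it was asked to stop.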
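The difference between the two policies can be written down as a small decision table. This is only an illustrative model of the behavior described in the systemd.service documentation, not something systemd itself provides; `will_restart` and the two exit categories are names invented for this sketch.

```shell
# will_restart POLICY EXIT_KIND
#   POLICY:    no | on-failure | always
#   EXIT_KIND: clean   (exit code 0, or a clean signal such as SIGTERM)
#              unclean (non-zero exit code, fatal signal, or timeout)
# Models automatic restarts only. A manual `systemctl stop` is outside this
# table: the unit stays stopped regardless of policy.
will_restart() {
    local policy="$1" exit_kind="$2"
    case "$policy:$exit_kind" in
        always:*)           echo yes ;;
        on-failure:unclean) echo yes ;;
        *)                  echo no ;;
    esac
}

will_restart on-failure unclean   # prints: yes
will_restart on-failure clean     # prints: no
will_restart always clean         # prints: yes
```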
Step 6: Enable and Start the Service (Service Definition)
sudo systemctl daemon-reload
Expected output:
(no output — daemon-reload completes silently on success)
sudo systemctl enable agent-prod
Expected output:
Created symlink /etc/systemd/system/multi-user.target.wants/agent-prod.service → /etc/systemd/system/agent-prod.service.
sudo systemctl start agent-prod
Expected output:
(no output — a silent start means success)
Verify it is running:
sudo systemctl status agent-prod
Expected output:
● agent-prod.service - Production Digital FTE Agent
Loaded: loaded (/etc/systemd/system/agent-prod.service; enabled; preset: enabled)
Active: active (running) since Wed 2026-02-11 10:05:00 UTC; 5s ago
Main PID: 5432 (uvicorn)
Tasks: 2 (limit: 4915)
Memory: 48.3M (max: 512.0M)
CPU: 320ms
CGroup: /system.slice/agent-prod.service
└─5432 /usr/local/bin/python3 /usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Look at the Memory line — it shows (max: 512.0M), confirming your resource limit is applied.
Step 7: Configure the Firewall (Security Requirements)
sudo ufw allow 8000/tcp
Expected output:
Rule added
Rule added (v6)
Verify the rule:
sudo ufw status | grep 8000
Expected output:
8000/tcp ALLOW Anywhere
8000/tcp (v6) ALLOW Anywhere (v6)
Port 8000 is now open for agent traffic. All other ports remain blocked (except SSH on port 22, which you configured in Lesson 7).
When you learn nginx in a future chapter, you can add a reverse proxy in front for SSL termination and load balancing. For now, direct port binding keeps the architecture simple and gives you one fewer component to debug.
Step 8: Verify the Health Endpoint (Monitoring Plan)
curl -s localhost:8000/health
Expected output:
{"status":"healthy","agent":"running","timestamp":"2026-02-11T10:05:15.234567"}
This is the same health check endpoint from Lesson 10. It confirms the agent is alive and responding to requests.
Your implementation is complete. Every step traced back to a requirement in the specification. Now you validate.
Layered Validation
Validation is not "it seems to work." Validation is systematic proof that every requirement in your specification is met. Each layer tests a different dimension of the deployment.
Layer 1 — Service
Is the service running?
systemctl is-active agent-prod
Expected output (PASS):
active
Expected output (FAIL):
inactive
If it fails, check the journal: journalctl -u agent-prod -n 20 --no-pager
Layer 2 — Network
Is the agent responding to requests?
curl -s localhost:8000/health | python3 -m json.tool
Expected output (PASS):
{
"status": "healthy",
"agent": "running",
"timestamp": "2026-02-11T10:10:00.123456"
}
Expected output (FAIL):
curl: (7) Failed to connect to localhost port 8000: Connection refused
If it fails but the service is active, check that uvicorn is binding to the correct port: ss -tlnp | grep 8000
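Note that the compact response and the pretty-printed one differ in whitespace, so a check that greps for an exact string can pass on one and fail on the other. A whitespace-tolerant match is one possible sketch; `is_healthy_json` is a hypothetical helper, and a pattern match like this is a heuristic rather than a real JSON parser.

```shell
# is_healthy_json
# Reads a JSON health response on stdin and succeeds if it contains
# "status": "healthy", with or without spaces around the colon.
# Heuristic pattern match, not a JSON parser.
is_healthy_json() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Example usage:
#   curl -s localhost:8000/health | is_healthy_json && echo "healthy"
```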
Layer 3 — Security
Is the agent running as the correct user (not root)?
ps -eo user,comm | grep uvicorn
Expected output (PASS):
agent-p+ uvicorn
Expected output (FAIL):
root uvicorn
The agent-p+ is a truncated display of agent-prod. If you see root, the User= directive in your service file is wrong or missing.
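A script that compares this column against the expected user has to cope with the truncation. The sketch below treats a trailing `+` as "prefix only"; `user_matches` is a hypothetical helper, and a prefix match is weaker than an exact one (a user named agent-private would also display as agent-p+).

```shell
# user_matches OBSERVED EXPECTED
# Succeeds if OBSERVED (as printed by ps) refers to EXPECTED.
# A trailing "+" means ps truncated the name, so only the visible prefix is compared.
user_matches() {
    local observed="$1" expected="$2"
    case "$observed" in
        *+)
            local prefix="${observed%+}"
            # True only if EXPECTED starts with the non-empty visible prefix
            [ "${expected#"$prefix"}" != "$expected" ]
            ;;
        *)
            [ "$observed" = "$expected" ]
            ;;
    esac
}

# Example usage:
#   user_matches "agent-p+" "agent-prod" && echo "runs as agent-prod"
```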
Layer 4 — Monitoring
Are logs flowing?
journalctl -u agent-prod --since "5 min ago" --no-pager | head -5
Expected output (PASS):
Feb 11 10:05:00 server systemd[1]: Started Production Digital FTE Agent.
Feb 11 10:05:01 server uvicorn[5432]: INFO: Started server process [5432]
Feb 11 10:05:01 server uvicorn[5432]: INFO: Waiting for application startup.
Feb 11 10:05:01 server uvicorn[5432]: INFO: Application startup complete.
Feb 11 10:05:01 server uvicorn[5432]: INFO: Uvicorn running on http://0.0.0.0:8000
Expected output (FAIL):
-- No entries --
If no entries appear, either the service has not started recently or journald is not capturing its output.
Layer 5 — Resources
Are resource limits applied?
systemctl show agent-prod --property=MemoryMax
Expected output (PASS):
MemoryMax=536870912
Expected output (FAIL):
MemoryMax=infinity
The value 536870912 is 512 MB in bytes. If you see infinity, the MemoryMax=512M directive is missing from the [Service] section or you forgot to run systemctl daemon-reload after editing.
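The conversion is plain arithmetic: systemd reads the K, M, G, and T suffixes as powers of 1024, so 512M is 512 × 1024 × 1024 bytes. A short sketch of that calculation, where `to_bytes` is a hypothetical helper that handles whole numbers only:

```shell
# to_bytes VALUE
# Convert a size such as 512M into bytes, using base-1024 suffixes (K, M, G, T).
# Hypothetical helper; supports integers only, and passes plain numbers through.
to_bytes() {
    local value="$1"
    local number="${value%[KMGT]}"   # strip one trailing suffix letter, if any
    case "$value" in
        *K) echo $(( number * 1024 )) ;;
        *M) echo $(( number * 1024 * 1024 )) ;;
        *G) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *T) echo $(( number * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$value" ;;
    esac
}

to_bytes 512M   # prints: 536870912
```

Deriving the expected value this way means the spec can state the limit in readable units while the validation compares exact bytes.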
The Complete Validation Script
Combine all five layers into a single executable script:
#!/bin/bash
# validate-deployment.sh — Layered deployment validation
# Runs 5 validation layers against DEPLOYMENT-SPEC.md requirements
set -u # Exit on undefined variables
AGENT_NAME="agent-prod"
AGENT_PORT="8000"
AGENT_USER="agent-prod"
EXPECTED_MEMORY="536870912"
PASS=0
FAIL=0
check() {
    local layer="$1"
    local description="$2"
    local result="$3"
    if [ "$result" -eq 0 ]; then
        echo " PASS: $description"
        PASS=$((PASS + 1))
    else
        echo " FAIL: $description"
        FAIL=$((FAIL + 1))
    fi
}
echo "=== Deployment Validation: $AGENT_NAME ==="
echo ""
# LAYER 1: Service
echo "[Layer 1] Service"
systemctl is-active --quiet "$AGENT_NAME"
check "1" "Service is active" $?
systemctl is-enabled --quiet "$AGENT_NAME"
check "1" "Service is enabled (starts on boot)" $?
echo ""
# LAYER 2: Network
echo "[Layer 2] Network"
HEALTH=$(curl -s -o /dev/null -w "%{http_code}" "localhost:$AGENT_PORT/health" 2>/dev/null)
[ "$HEALTH" = "200" ]
check "2" "Health endpoint returns 200" $?
curl -s "localhost:$AGENT_PORT/health" | grep -q '"status":"healthy"'
check "2" "Health response contains status:healthy" $?
echo ""
# LAYER 3: Security
echo "[Layer 3] Security"
SERVICE_USER=$(ps -eo user,comm --no-headers | grep uvicorn | awk '{print $1}' | head -1)
[ "$SERVICE_USER" = "$AGENT_USER" ] || [ "$SERVICE_USER" = "agent-p+" ]
check "3" "Agent runs as $AGENT_USER (not root)" $?
id "$AGENT_USER" > /dev/null 2>&1
check "3" "Dedicated user $AGENT_USER exists" $?
echo ""
# LAYER 4: Monitoring
echo "[Layer 4] Monitoring"
LOGCOUNT=$(journalctl -u "$AGENT_NAME" --since "5 min ago" --no-pager 2>/dev/null | wc -l)
[ "$LOGCOUNT" -gt 1 ]
check "4" "Logs flowing in journal (found $LOGCOUNT lines)" $?
echo ""
# LAYER 5: Resources
echo "[Layer 5] Resources"
ACTUAL_MEMORY=$(systemctl show "$AGENT_NAME" --property=MemoryMax --value)
[ "$ACTUAL_MEMORY" = "$EXPECTED_MEMORY" ]
check "5" "MemoryMax is $EXPECTED_MEMORY (512MB)" $?
echo ""
echo "=== Results: $PASS passed, $FAIL failed ==="
if [ "$FAIL" -eq 0 ]; then
echo "ALL CHECKS PASSED — deployment meets specification"
exit 0
else
echo "VALIDATION FAILED — review failed checks against DEPLOYMENT-SPEC.md"
exit 1
fi
Save this as validate-deployment.sh and run it:
chmod +x validate-deployment.sh
sudo bash validate-deployment.sh
Expected output (all passing):
=== Deployment Validation: agent-prod ===
[Layer 1] Service
PASS: Service is active
PASS: Service is enabled (starts on boot)
[Layer 2] Network
PASS: Health endpoint returns 200
PASS: Health response contains status:healthy
[Layer 3] Security
PASS: Agent runs as agent-prod (not root)
PASS: Dedicated user agent-prod exists
[Layer 4] Monitoring
PASS: Logs flowing in journal (found 12 lines)
[Layer 5] Resources
PASS: MemoryMax is 536870912 (512MB)
=== Results: 8 passed, 0 failed ===
ALL CHECKS PASSED — deployment meets specification
Expected output (with failure):
=== Deployment Validation: agent-prod ===
[Layer 1] Service
PASS: Service is active
PASS: Service is enabled (starts on boot)
[Layer 2] Network
FAIL: Health endpoint returns 200
FAIL: Health response contains status:healthy
...
=== Results: 6 passed, 2 failed ===
VALIDATION FAILED — review failed checks against DEPLOYMENT-SPEC.md
When a layer fails, work backward: Layer 2 (Network) failed but Layer 1 (Service) passed means the service is running but not responding. Check if uvicorn bound to the right port (ss -tlnp | grep 8000) or if the firewall is blocking local connections.
Packaging for Repeatability
Your deployment works. But can someone else reproduce it? Can YOU reproduce it on a new server next month? Packaging turns your manual commands into a script that does everything in one run.
The Complete deploy.sh
#!/bin/bash
# deploy.sh — Automated agent deployment
# Implements DEPLOYMENT-SPEC.md requirements in a single script
# Usage: sudo bash deploy.sh
set -e # Exit on any error
AGENT_NAME="agent-prod"
AGENT_USER="agent-prod"
AGENT_PORT="8000"
AGENT_DIR="/opt/$AGENT_NAME"
SERVICE_FILE="/etc/systemd/system/$AGENT_NAME.service"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "=== Deploying $AGENT_NAME ==="
# Step 1: Create dedicated user (idempotent)
echo "[1/7] Creating user $AGENT_USER..."
if id "$AGENT_USER" &>/dev/null; then
echo " User $AGENT_USER already exists — skipping"
else
useradd -r -s /usr/sbin/nologin "$AGENT_USER"
echo " Created system user $AGENT_USER"
fi
# Step 2: Create working directory
echo "[2/7] Setting up $AGENT_DIR..."
mkdir -p "$AGENT_DIR"
echo " Directory ready"
# Step 3: Copy application
echo "[3/7] Copying agent_main.py..."
cp "$SCRIPT_DIR/agent_main.py" "$AGENT_DIR/"
echo " Application copied"
# Step 4: Install dependencies
echo "[4/7] Installing Python dependencies..."
pip install --quiet fastapi uvicorn
echo " Dependencies installed"
# Step 5: Set permissions
echo "[5/7] Setting ownership and permissions..."
chown -R "$AGENT_USER:$AGENT_USER" "$AGENT_DIR"
echo " Ownership set to $AGENT_USER"
# Step 6: Write and activate systemd service (idempotent)
echo "[6/7] Configuring systemd service..."
cat > "$SERVICE_FILE" << 'EOF'
[Unit]
Description=Production Digital FTE Agent
After=network.target
# Start-limit settings belong in [Unit]; systemd ignores StartLimitIntervalSec= in [Service]
StartLimitBurst=5
StartLimitIntervalSec=60
[Service]
Type=simple
User=agent-prod
Group=agent-prod
WorkingDirectory=/opt/agent-prod
ExecStart=/usr/local/bin/uvicorn agent_main:app --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
MemoryMax=512M
CPUQuota=25%
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable "$AGENT_NAME"
echo " Service configured and enabled"
# Step 7: Configure firewall (idempotent)
echo "[7/7] Configuring firewall..."
if ufw status | grep -q "$AGENT_PORT/tcp"; then
echo " Firewall rule for $AGENT_PORT/tcp already exists — skipping"
else
ufw allow "$AGENT_PORT/tcp"
echo " Firewall rule added for port $AGENT_PORT"
fi
# Start the service
echo ""
echo "Starting $AGENT_NAME..."
systemctl restart "$AGENT_NAME"
sleep 2
# Quick verification
if systemctl is-active --quiet "$AGENT_NAME"; then
echo "Service is active"
else
echo "WARNING: Service failed to start — check: journalctl -u $AGENT_NAME -n 20"
exit 1
fi
HEALTH=$(curl -s "localhost:$AGENT_PORT/health" 2>/dev/null || echo "UNREACHABLE")
echo "Health check: $HEALTH"
echo ""
echo "=== Deployment complete ==="
echo "Next: Run validate-deployment.sh for full layered validation"
Expected output:
=== Deploying agent-prod ===
[1/7] Creating user agent-prod...
Created system user agent-prod
[2/7] Setting up /opt/agent-prod...
Directory ready
[3/7] Copying agent_main.py...
Application copied
[4/7] Installing Python dependencies...
Dependencies installed
[5/7] Setting ownership and permissions...
Ownership set to agent-prod
[6/7] Configuring systemd service...
Service configured and enabled
[7/7] Configuring firewall...
Firewall rule added for port 8000
Starting agent-prod...
Service is active
Health check: {"status":"healthy","agent":"running","timestamp":"2026-02-11T10:15:00.123456"}
=== Deployment complete ===
Next: Run validate-deployment.sh for full layered validation
Notice the idempotency patterns: Step 1 checks if the user exists before creating it. Step 7 checks if the firewall rule exists before adding it. Running deploy.sh twice produces the same result as running it once. This matters because real deployments get re-run — after updates, after server migrations, after debugging. A script that fails on its second run is not production-ready.
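The same check-before-change idea extends to files. As a sketch of one possible pattern (not what deploy.sh currently does), a script can write new content to a temporary file and replace the target only when the content differs; `write_if_changed` is a hypothetical helper, and the 644 mode is taken from the spec's service-file requirement.

```shell
# write_if_changed TARGET
# Reads the desired content on stdin and replaces TARGET only if it differs.
# Prints "updated" or "unchanged" so the caller can decide whether a reload is needed.
write_if_changed() {
    local target="$1" tmp
    tmp=$(mktemp)
    cat > "$tmp"
    if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
        rm -f "$tmp"
        echo "unchanged"
    else
        mv "$tmp" "$target"
        chmod 644 "$target"   # mktemp creates files as 600; the spec asks for 644
        echo "updated"
    fi
}

# Example usage (hypothetical file name agent-prod.service.new):
#   write_if_changed /etc/systemd/system/agent-prod.service < agent-prod.service.new
```

Reporting whether anything changed lets a caller skip daemon-reload and the service restart on runs where the unit file is already current.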
Production Readiness Checklist
Before declaring this deployment complete, verify every requirement from the specification:
| Requirement | Status | Verification Command |
|---|---|---|
| Service running | ✓ or ✗ | systemctl is-active agent-prod |
| Starts on boot | ✓ or ✗ | systemctl is-enabled agent-prod |
| Health check passing | ✓ or ✗ | curl -s localhost:8000/health |
| Runs as agent-prod | ✓ or ✗ | ps -eo user,comm \| grep uvicorn |
| No login shell | ✓ or ✗ | grep agent-prod /etc/passwd (shows /usr/sbin/nologin) |
| Correct file ownership | ✓ or ✗ | ls -la /opt/agent-prod/ |
| Firewall configured | ✓ or ✗ | sudo ufw status \| grep 8000 |
| Memory limit applied | ✓ or ✗ | systemctl show agent-prod --property=MemoryMax → 536870912 |
| CPU limit applied | ✓ or ✗ | systemctl show agent-prod --property=CPUQuotaPerSecUSec → 250ms |
| Restart protection | ✓ or ✗ | systemctl show agent-prod --property=StartLimitBurst → 5 |
| Logs flowing | ✓ or ✗ | journalctl -u agent-prod --since "5 min ago" |
| Crash recovery works | ✓ or ✗ | Kill process, wait 10s, check is-active |
Every row links back to a line in DEPLOYMENT-SPEC.md. If any row shows ✗, you know exactly what to fix and how to verify the fix.
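The "Crash recovery works" row is the only one that involves waiting. Rather than sleeping for a fixed time, a drill can poll until the service is back or a deadline passes. A minimal sketch, where `wait_until` is a hypothetical helper and the commented drill assumes the agent-prod unit from this lesson:

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if TIMEOUT_SECONDS pass without a success.
wait_until() {
    local timeout="$1"
    shift
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example drill (run on a test server, not executed here):
#   sudo systemctl kill --signal=SIGKILL agent-prod
#   sleep 1   # give systemd a moment to notice the crash
#   wait_until 10 systemctl is-active --quiet agent-prod && echo "recovered"
```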
Exercises
Exercise 1: Write Your DEPLOYMENT-SPEC.md
Using the template from this lesson, write a complete DEPLOYMENT-SPEC.md for deploying agent_main.py to production. Fill in every section: Service Definition, Security Requirements, Monitoring Plan, and Validation Criteria.
Verification:
grep -c "##" DEPLOYMENT-SPEC.md
Expected output:
4
Your spec should have at least 4 section headers. Each section should contain specific, testable requirements — not vague goals.
Exercise 2: Implement the Specification
Follow the implementation steps in this lesson to deploy the agent. Do not skip ahead — implement each step and verify it before moving to the next.
Verification:
systemctl is-active agent-prod
Expected output:
active
If you see inactive or failed, check the journal: journalctl -u agent-prod -n 20 --no-pager
Exercise 3: Run the Layered Validation Script
Save the validation script from this lesson as validate-deployment.sh, make it executable, and run it.
Verification:
sudo bash validate-deployment.sh
Expected output (final line):
ALL CHECKS PASSED — deployment meets specification
If any layer fails, trace the failure to the corresponding spec requirement and fix the underlying issue before re-running.
Exercise 4: Package into deploy.sh and Test
Save the deployment script, place agent_main.py alongside it, and test on a clean state. First, tear down the existing deployment:
sudo systemctl stop agent-prod
sudo systemctl disable agent-prod
sudo rm /etc/systemd/system/agent-prod.service
sudo systemctl daemon-reload
sudo userdel agent-prod
sudo rm -rf /opt/agent-prod
Expected output:
Removed /etc/systemd/system/multi-user.target.wants/agent-prod.service.
Now deploy from scratch:
sudo bash deploy.sh
Verification:
sudo bash validate-deployment.sh | tail -1
Expected output:
ALL CHECKS PASSED — deployment meets specification
A single script, from clean server to production-validated deployment. That is repeatability.
Try With AI
Ask Claude: "Review my DEPLOYMENT-SPEC.md. Are there any production concerns I haven't addressed? Here is the spec: [paste your spec from Exercise 1]." Incorporate Claude's suggestions into your spec before implementing.
What you're learning: AI identifies gaps in specifications — backup strategy, update procedures, rollback plans — that specification authors commonly miss. A specification you think is complete often has blind spots that a second reviewer (human or AI) catches immediately.
Tell Claude your deployment constraints: "I'm deploying to a VPS with 1GB RAM and 20GB disk. The agent processes 100 requests/hour. Optimize my deployment for these constraints." Compare the response to your original MemoryMax and CPUQuota values.
What you're learning: Providing real constraints produces deployments optimized for YOUR situation, not generic best practices. An agent processing 100 requests per hour on a 1GB VPS needs very different resource limits than one handling 10,000 requests per hour on a 64GB server. AI can do the math if you give it the numbers.
After deployment, ask Claude: "My agent is deployed and passing health checks. What should I monitor over the next 24 hours to ensure stability? Create a monitoring checklist." Follow the checklist and report what you observe.
What you're learning: Production is not done at deployment — monitoring, alerting, and observability are ongoing responsibilities. The first 24 hours after deployment reveal patterns (memory growth, log volume, response time drift) that no amount of pre-deployment testing can predict.
Production deployment involves system-level changes — creating users, modifying firewall rules, installing systemd services. Always test on a non-production server first (a VM, a cloud instance you can destroy, or a local container). Keep SSH access working at all times: if you lock yourself out by misconfiguring the firewall, you lose access to the server entirely. Run sudo ufw status before and after any firewall change to confirm you have not blocked your own SSH connection.
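That before-and-after check can be scripted. The sketch below scans `ufw status` output for a rule that admits SSH; `ssh_allowed` is a hypothetical helper, and the pattern assumes SSH is on port 22 and that the rule appears as 22, 22/tcp, or the OpenSSH application profile.

```shell
# ssh_allowed
# Reads `ufw status` output on stdin and succeeds if a rule allowing
# (or rate-limiting) SSH is present.
# Assumption: SSH listens on port 22; adjust the pattern for a custom port.
ssh_allowed() {
    grep -Eq '^(22(/tcp)?|OpenSSH)[[:space:]]+(ALLOW|LIMIT)'
}

# Example usage:
#   sudo ufw status | ssh_allowed || echo "WARNING: no SSH rule found"
```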