Making Your Agent Unkillable

Ali's agent has a proper home now. Organized directories, secrets in .env, logs that persist. He starts the agent with a simple command. It works. He watches the logs scroll by, satisfied.

Then he closes his laptop to go to bed.

Morning. Coffee. He opens the laptop, SSHes back into Dev's server, checks the dashboard. The agent has not run since he went to sleep. Three days of missing data just became four.

He starts it again. It works again. The server reboots for a security update at 3 AM. The agent dies again.

"Close your laptop. Your agent dies. Reboot the server. Your agent dies. It's a pet that dies every time you stop watching it."

The fix is one file.

Why Your Agent Keeps Dying

When you start a program in a terminal, that program becomes a child of the terminal session. Close the terminal, and the operating system sends a hangup signal (SIGHUP) to the processes in that session: "Your parent is gone. Time to die."

This is what happened to Ali's agent. He started it in his SSH session. When he closed his laptop, the SSH connection dropped. The server killed every process that belonged to that session. Including his agent.
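
The hangup can be observed directly. The sketch below is a minimal illustration in Python, assuming a Unix-like system and that the code runs in the main thread; the function and handler names are made up for this example. It installs a handler for SIGHUP, sends that signal to its own process (standing in for a closed terminal), and records what arrived.

```python
import os
import signal


def simulate_hangup():
    """Send SIGHUP to this process and report which signals the handler saw."""
    received = []

    def on_hangup(signum, frame):
        # Record the signal by name instead of letting it end the process.
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGHUP, on_hangup)
    try:
        # Stand-in for the terminal closing: deliver SIGHUP to ourselves.
        os.kill(os.getpid(), signal.SIGHUP)
    finally:
        # Restore whatever handler was installed before the experiment.
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
    return received
```

A program that installs no handler gets the default action for SIGHUP, which is to terminate. That default is what ended Ali's agent.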

Think of it this way. A process is a phone call. When you hang up, the conversation is over. Nobody keeps talking.

A service is a security guard hired by the building. The building doesn't care if the guard takes a coffee break or swaps shifts. What matters is that someone is always at the desk. If the guard leaves, the building finds a replacement.

Ali's agent is a phone call. He needs it to be a security guard.

The Fix: One File

Linux has a built-in building manager called systemd. It manages every service on the server: the database, the web server, the SSH daemon that let Ali connect in the first place. All of them are systemd services. All of them survive reboots.

To make your agent a systemd service, you write a unit file: a plain text file that answers five questions:

What should run?
As which user?
When should it start?
What if it crashes?
How much memory can it use?

That's it. One file. Let's write it.

The Unit File, Line by Line

What you tell Claude Code: "Create a systemd service file for my competitor-tracker agent. It should run as user ali, start after the network is available, restart if it crashes, and use no more than 512 MB of memory."

What the agent creates at /etc/systemd/system/competitor-tracker.service:

[Unit]
Description=Competitor Tracker Agent
After=network.target

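Description: A human-readable name. Shows up in logs and status commands.
After=network.target: Don't start this service until the network stack has been brought up. Your agent needs internet access to scrape pricing data, and starting too early would cause immediate failures. (Strictly, network.target does not wait for a working connection. If the agent must be online the moment it starts, network-online.target is the stricter choice.)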

[Service]
Type=simple
User=ali
WorkingDirectory=/opt/agents/competitor-tracker
EnvironmentFile=/opt/agents/competitor-tracker/.env
ExecStart=/usr/bin/python3 /opt/agents/competitor-tracker/src/main.py
Restart=on-failure
RestartSec=5
MemoryMax=512M

Type=simple: The process you start is the service. No forking, no backgrounding. The simplest and most common type.
User=ali: Run as Ali, not as root. Never run agents as root. (More on this in Lesson 5.)
WorkingDirectory: The agent runs as if you cd'd into this directory first. Relative paths in your code resolve from here.
EnvironmentFile: Load environment variables from .env. Your database password, API keys, and configuration are all available to the agent without hardcoding.
ExecStart: The exact command to run. Full absolute path to Python and the script. No ambiguity.
Restart=on-failure: If the agent crashes (exits with a non-zero code or is terminated abnormally), restart it. If it exits cleanly, or you intentionally stop it with systemctl stop, don't restart.
RestartSec=5: Wait 5 seconds before restarting. This prevents tight crash loops: if the agent has a bug that makes it crash on startup, it restarts at most about 12 times per minute instead of thousands, and it doesn't flood your logs.
MemoryMax=512M: If the agent uses more than 512 MB of RAM, kill it. This prevents a memory leak from eating all server resources and crashing everything else.
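
Inside the agent, the values loaded by EnvironmentFile show up as ordinary environment variables. The following is a minimal sketch of how src/main.py might read them, assuming Python and two hypothetical variable names, DATABASE_URL and API_KEY, which are not taken from Ali's actual .env file.

```python
import os

# Hypothetical variable names, chosen only for illustration.
REQUIRED_VARS = ("DATABASE_URL", "API_KEY")


def load_settings(environ=None):
    """Return the required settings, or raise if any are missing or empty."""
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # An uncaught exception ends the process with a non-zero exit code,
        # which Restart=on-failure treats as a crash.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing at startup with a clear message puts the problem in the service's logs right away, instead of letting it surface later as a confusing error.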

[Install]
WantedBy=multi-user.target

WantedBy=multi-user.target: Start this service when the server boots into its normal operating mode. This is what makes your agent survive reboots.

Pause.

Read the unit file again. Every line answers one of the five questions. There is no magic here. It's a job description for your agent, written in a format that Linux understands.
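
To make the mapping concrete, here is a small Python sketch that reads the unit file text and pulls out the answer to each of the five questions. It assumes the simple file shown above; configparser is only a rough stand-in for systemd's own parser (real unit files may repeat a key, which configparser rejects), and the function name is made up for this example.

```python
import configparser

UNIT_TEXT = """\
[Unit]
Description=Competitor Tracker Agent
After=network.target

[Service]
Type=simple
User=ali
WorkingDirectory=/opt/agents/competitor-tracker
EnvironmentFile=/opt/agents/competitor-tracker/.env
ExecStart=/usr/bin/python3 /opt/agents/competitor-tracker/src/main.py
Restart=on-failure
RestartSec=5
MemoryMax=512M

[Install]
WantedBy=multi-user.target
"""


def five_answers(unit_text):
    """Map the five questions from the lesson to values in the unit file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names like ExecStart case-sensitive
    parser.read_string(unit_text)
    unit, service, install = parser["Unit"], parser["Service"], parser["Install"]
    return {
        "What should run?": service["ExecStart"],
        "As which user?": service["User"],
        "When should it start?": f"after {unit['After']}, at boot via {install['WantedBy']}",
        "What if it crashes?": f"Restart={service['Restart']} after {service['RestartSec']}s",
        "How much memory can it use?": service["MemoryMax"],
    }


for question, answer in five_answers(UNIT_TEXT).items():
    print(f"{question} {answer}")
```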

Bringing the Service to Life

The file exists, but systemd doesn't know about it yet. Three commands make it real.

What you tell Claude Code: "Register the competitor-tracker service, set it to start on boot, and start it now."

What the agent does:

sudo systemctl daemon-reload
sudo systemctl enable competitor-tracker
sudo systemctl start competitor-tracker

daemon-reload: "Hey systemd, I added a new service file. Re-read all your files."
enable: "Start this service automatically on every boot."
start: "Start it right now."

What you tell Claude Code: "Check if the competitor-tracker service is running."

What the agent does:

systemctl status competitor-tracker

What you see:

● competitor-tracker.service - Competitor Tracker Agent
     Loaded: loaded (/etc/systemd/system/competitor-tracker.service; enabled)
     Active: active (running) since Sat 2026-02-28 02:15:33 UTC; 5s ago
   Main PID: 4821 (python3)
     Memory: 47.2M (max: 512.0M)

The green dot. active (running). The agent is alive.
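
If you ever want a script to make the same check, systemctl also has an is-active subcommand that prints a single word and exits with code 0 only when the unit is active. The sketch below wraps it in Python. The function name and the run parameter (a hook for substituting a fake runner in tests) are assumptions of this example, not part of the lesson's setup.

```python
import subprocess


def is_active(service, run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = run(
        ["systemctl", "is-active", service],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.returncode == 0 and result.stdout.strip() == "active"
```

On the server, is_active("competitor-tracker") would be the scripted version of looking for the green dot.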

The Moment of Truth

Now for the test that matters.

Close your terminal. Disconnect from the server. Wait ten seconds. SSH back in.

What you tell Claude Code: "Is the competitor-tracker service still running?"

What the agent does:

systemctl status competitor-tracker

What you see:

● competitor-tracker.service - Competitor Tracker Agent
     Active: active (running) since Sat 2026-02-28 02:15:33 UTC; 2min ago

The agent is alive. You closed the terminal. You disconnected. The agent kept running. The building manager did its job.

This is the moment. A process would have died. A service survived.

What Happens When It Crashes

Ali's agent has a bug. Once every few days, it encounters a malformed API response and crashes.

Before systemd, this meant Ali woke up to a dead agent and missing data. Now, watch what happens.

What you tell Claude Code: "Show me the journal log for competitor-tracker from the last hour."

What the agent does:

journalctl -u competitor-tracker --since "1 hour ago"

What you see:

Feb 28 03:42:17 server competitor-tracker[4821]: ERROR: Malformed API response
Feb 28 03:42:17 server systemd[1]: competitor-tracker.service: Main process exited, code=exited, status=1/FAILURE
Feb 28 03:42:22 server systemd[1]: competitor-tracker.service: Scheduled restart job, restart counter is at 1.
Feb 28 03:42:22 server competitor-tracker[4897]: Starting competitor tracker agent...
Feb 28 03:42:23 server competitor-tracker[4897]: Agent running successfully

The agent crashed at 03:42:17. Systemd waited 5 seconds (RestartSec=5). At 03:42:22, it started a new instance. By 03:42:23, the agent was running again. Ali slept through the entire event.
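
The status=1/FAILURE in that log is the agent's exit code, and a non-zero exit code is what triggers Restart=on-failure. Below is a minimal Python sketch of how an agent could turn a malformed response into that exit code. The "prices" field and both function names are hypothetical; Ali's real main.py is not shown in this lesson.

```python
import json
import sys


def parse_prices(raw_response):
    """Parse an API response body; raises ValueError if it is malformed."""
    data = json.loads(raw_response)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, dict) or "prices" not in data:
        raise ValueError("response has no 'prices' field")
    return data["prices"]


def run_once(raw_response):
    """Process one response and return a process exit code (0 = success)."""
    try:
        prices = parse_prices(raw_response)
    except ValueError as error:
        # Anything written to stderr or stdout ends up in the journal.
        print(f"ERROR: Malformed API response: {error}", file=sys.stderr)
        return 1
    print(f"Parsed {len(prices)} prices")
    return 0

# In a real entry point, the result would be passed to sys.exit(), e.g.
# sys.exit(run_once(body)), so systemd sees the code and can react to it.
```

Restarting is a safety net, not a fix. The log line tells Ali what to repair, and systemd keeps the data flowing until he does.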

A Note About tmux

You may have heard of tmux, a tool that keeps terminal sessions alive after you disconnect. For interactive sessions you want to keep alive, like a monitoring dashboard or a long-running data migration you're watching, tmux is the right tool.

For agents that run 24/7, systemd is the right tool. tmux keeps a session alive. systemd keeps a service alive, restarts it after crashes, starts it on boot, and enforces resource limits. Your agents need systemd.

The Five systemctl Commands

You'll direct Claude Code to use these. You don't need to memorize them, but recognizing them helps you understand the output.

Command	What it does
`systemctl start <service>`	Start the service now
`systemctl stop <service>`	Stop the service now
`systemctl restart <service>`	Stop then start
`systemctl status <service>`	Show current state, PID, memory
`systemctl enable <service>`	Start automatically on boot

And two for viewing logs:

Command	What it does
`journalctl -u <service>`	Show all logs for this service
`journalctl -u <service> -f`	Follow logs in real time (like a live feed)
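
journalctl can also emit one JSON object per line with -o json, which is easier for a script to filter than the text format. The sketch below is a minimal Python example that pulls error messages out of such output. The sample entries are hand-written for illustration (real entries carry many more fields), and the function name is an assumption of this example.

```python
import json


def error_messages(journal_json_lines):
    """Return MESSAGE fields containing 'ERROR' from `journalctl -o json` output."""
    messages = []
    for line in journal_json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE", "")
        # MESSAGE is normally a string; skip anything else.
        if isinstance(message, str) and "ERROR" in message:
            messages.append(message)
    return messages


# Hand-written sample in the shape of `journalctl -u competitor-tracker -o json`.
SAMPLE = "\n".join([
    '{"MESSAGE": "ERROR: Malformed API response", "_PID": "4821"}',
    '{"MESSAGE": "Starting competitor tracker agent...", "_PID": "4897"}',
    '{"MESSAGE": "Agent running successfully", "_PID": "4897"}',
])

print(error_messages(SAMPLE))
```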

Ali's agent is unkillable. It survives terminal closures. It survives reboots. It restarts after crashes. It runs under memory limits that protect the rest of the server.

He feels accomplished. Then Dev checks the server and goes pale.

"It's running as root. With password SSH. Anyone on the internet could..." He doesn't finish the sentence.

PRIMM-AI+ Practice: Creating and Verifying a systemd Service

In this practice you will direct your agent to create a systemd unit file, start the service, and confirm it survives a terminal closure.

Predict [AI-FREE]

Before you direct the agent to create the unit file, write down:

What are the five questions a systemd unit file must answer?
What will systemctl status show for a service that is running correctly?
Your confidence score from 1 to 5.

Do not ask the agent until those notes are written.

Run

Start your session:

$ claude

Then type the prompt below at the > prompt.

What you tell the agent

Create a systemd service file for an agent at
/opt/agents/my-test-agent/src/main.py. It should run as the
current user, start after the network is ready, restart on
failure with a 5-second delay, and use no more than 256 MB
of memory. Then enable it, start it, and show me the status.

Investigate

First, write your own one-sentence explanation of what Restart=on-failure does differently from Restart=always. Then ask the agent: "What happens if the service crashes on startup repeatedly with Restart=always set?"

Modify

Change one requirement: update the MemoryMax limit from 256M to 128M. Predict whether systemctl status output will change immediately after the update, then direct the agent and confirm.

Make [Mastery Gate]

Without re-reading the lesson, direct the agent to create a systemd service for a second agent of your choosing, verify it is running, then deliberately stop it and confirm it does not restart automatically when stopped intentionally. Passing means you can explain why it did not restart.

Try With AI

Prompt 1: Restart Policies

My agent's systemd service uses Restart=on-failure. Explain the
difference between Restart=always and Restart=on-failure. When
would I want each one? What happens if my agent has a bug that
makes it crash immediately on startup and I have Restart=always?

What you're practicing: Understanding restart behavior. The difference between these two values determines whether your agent recovers gracefully or enters an infinite crash loop.

Prompt 2: Adapt for a Different Stack

I have a Node.js Express server that I want to run as a systemd
service. Take the competitor-tracker unit file from this lesson
and modify it for Node.js. What lines change? What stays the
same? Explain each change.

What you're practicing: Transferring the systemd pattern to different technologies. The unit file structure is universal; only the ExecStart line changes significantly.

Prompt 3: History and Context

Before systemd, Linux used "init scripts" to manage services.
What problems did init scripts have that systemd solved? Why
does every major Linux distribution use systemd now? Were there
controversies about the switch?

What you're practicing: Understanding why systemd exists, not just how to use it. Knowing the historical context helps you appreciate what the tool does for you automatically.
