Process Documentation — SOPs and Runbooks

The SOP existed. It was created two years ago during an ERP implementation, when the consultant who set up the system wrote down how to run the month-end financial close. It was thorough: 13 steps, named roles, system references. Three people were given copies. It was filed in a shared drive folder called "Finance Procedures."

Six months later, the ERP was upgraded. Steps 4 and 7 changed. The approval threshold was raised from £5,000 to £10,000. A new reconciliation requirement was added. Nobody updated the SOP. The consultant was gone. The Finance Manager, who had been with the company throughout, remembered the old process and knew to follow the new one. New team members read the SOP and followed it as written, making errors at steps 4 and 7. The errors were not obvious: they surfaced as an occasional discrepancy that someone quietly fixed each month.

Two years after the SOP was written, an auditor asked for the process documentation for month-end close. The organisation produced the SOP. The auditor compared it to the actual process. The differences were significant enough to constitute a control failure.

This is not a story about negligence. It is a story about the three-stage documentation failure — and why process documentation is not a one-time task.

Plugin Setup Reminder

This exercise requires the Operations plugin (official) and the Operations Intelligence plugin (custom). If you have not installed them, follow the instructions in the Chapter 38 prerequisites before continuing.

The Three-Stage Process Documentation Failure

Process documents decay predictably. Understanding the decay cycle is the first step in designing documentation that resists it.

| Stage | What Happens | Why It Happens | Consequence |
| --- | --- | --- | --- |
| Stage 1: Creation | Document reflects current reality | Process documented at audit, onboarding, or implementation | Document is accurate and useful |
| Stage 2: Divergence | Process changes; document does not | System upgrades, team changes, policy updates: none trigger a document update | Document becomes partially wrong |
| Stage 3: Harm | Document describes a process nobody follows | Accumulated divergence makes the document unreliable | New employees follow the document and make errors; auditors find discrepancies |

The transition from Stage 1 to Stage 2 is not a failure of intent — it is a failure of design. Nobody updates the SOP when a system changes because nobody owns that update as a named, time-bound task. Document Control is the mechanism that assigns that ownership. Without a review date and a named reviewer, the SOP has no maintenance process, only a maintenance aspiration.
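The difference between a maintenance process and a maintenance aspiration can be made checkable. The sketch below is one possible way to do that, not part of either plugin: the `DocumentControl` record, its field names, and the `maintenance_problems` helper are all hypothetical, and the dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocumentControl:
    """Hypothetical Document Control record for one SOP."""
    title: str
    version: str
    author: str
    effective_date: date
    reviewer: Optional[str]      # named role that owns the next review
    next_review: Optional[date]  # date the next review is due


def maintenance_problems(doc: DocumentControl, today: date) -> list[str]:
    """Return the reasons this SOP has no working maintenance process."""
    problems = []
    if not doc.reviewer:
        problems.append("no named reviewer")
    if doc.next_review is None:
        problems.append("no review date")
    elif doc.next_review < today:
        overdue_days = (today - doc.next_review).days
        problems.append(f"review overdue by {overdue_days} days")
    return problems


# The month-end close SOP from the opening story: no reviewer, no review date.
month_end_close = DocumentControl(
    title="Month-end financial close",
    version="1.0",
    author="Implementation consultant",
    effective_date=date(2023, 1, 31),  # illustrative date, not from the story
    reviewer=None,
    next_review=None,
)

print(maintenance_problems(month_end_close, today=date(2025, 1, 31)))
# ['no named reviewer', 'no review date']
```

An empty list means the document has both an owner and a review date that has not yet passed. It says nothing about whether the content is still accurate; that is what the review itself is for.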

SOP vs Runbook — The Distinction That Matters

The most common process documentation mistake is using a format that does not match the purpose. SOPs and runbooks address the same operational domain, recurring processes, but they serve different audiences and perform different functions.

| Dimension | SOP (Standard Operating Procedure) | Runbook |
| --- | --- | --- |
| Primary purpose | Governance: who is responsible, what the controls are, what approvals are required | Execution: exactly how to perform the task, step by step |
| Primary audience | Management, auditors, compliance reviewers, new employees | The person doing the work today |
| Format emphasis | RACI, controls, error handling, document control | Exact commands/actions, expected results, failure modes |
| Level of detail | Policy-level: describes what must happen | Procedure-level: describes exactly how to make it happen |
| When referenced | Process design, audit, onboarding, scope disputes | Active task execution |
| Created with | /process-doc | /runbook |

A practical test: If someone is actively executing a task and needs to know the next step, they want a runbook. If an auditor asks what controls govern the payment approval process, they want an SOP.

Many processes need both. The monthly supplier payment run needs an SOP (who approves what, what controls exist, what the audit trail requires) and a runbook (exactly which SAP transactions to run, in which order, with what expected results, and what to do if the bank portal is unavailable).

Creating SOPs with /process-doc

The /process-doc command produces process documentation from a description. You describe the process — or paste existing documentation — and the command structures it into a complete SOP.

What makes an SOP trustworthy — the quality standards:

| Quality Standard | Requirement |
| --- | --- |
| Named roles | Every step names a specific job title, not "the team" |
| Single action per step | Each numbered step does exactly one thing |
| Embedded controls | Controls at the specific risk points, not listed generically at the start |
| Error handling | Explicit guidance for each step that can fail |
| Document Control | Version, date, author, and review date: the maintenance contract |

Worked example. You want to document the monthly supplier payment run. You type:

/process-doc
Document our monthly supplier payment run as a complete SOP.

Context:
- Finance team: 3 people (AP Clerk, Finance Manager, CFO)
- ERP: SAP S/4HANA
- Frequency: last working day of each month
- Inputs: approved purchase orders, vendor invoices, bank details
- Approvals: invoices >£10,000 require CFO approval before payment
- Key risk: paying wrong account (invoice redirection fraud)
- Control requirement: four-eyes review — AP Clerk prepares, Finance Manager approves

Produce a complete SOP with: Purpose, Scope, RACI matrix, numbered process
steps with named roles, embedded controls at risk points, error handling
for each step that can fail, and Document Control section.

What to expect: A structured SOP with clear ownership, numbered steps, controls at exactly the right points, and a Document Control section that makes maintenance a named responsibility.

| SOP Element | What to Verify |
| --- | --- |
| RACI matrix | Every role (R, A, C, I) is populated for every step, with no empty cells |
| Step structure | Each step has one action, one owner, and one output |
| Controls | Controls appear inside the steps where the risk occurs, not as a separate generic list |
| Error handling | Every step that can fail has explicit guidance: not "escalate to manager" but "escalate to Finance Manager within 1 business day; hold payment" |
| Document Control | Version number, effective date, author, and next review date are all present |

What to evaluate:

  • Does every step name a job title, not a person's name or "the team"?
  • Can you tell, for every step, exactly who does it, what system they use, and what output they produce?
  • Are the controls embedded at the specific risk points — not just listed in an introduction paragraph?
  • Would someone following this SOP know what to do if a step fails?
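The "no empty cells" rule for the RACI matrix is mechanical enough to check in code once the matrix is in a structured form. The following is a minimal sketch under that assumption: the dictionary layout, the step names, and the `empty_raci_cells` helper are hypothetical, and only the three job titles come from the worked example.

```python
RACI_LETTERS = ("R", "A", "C", "I")


def empty_raci_cells(matrix: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (step, letter) pairs whose RACI cell is missing or blank."""
    gaps = []
    for step, cells in matrix.items():
        for letter in RACI_LETTERS:
            if not cells.get(letter, "").strip():
                gaps.append((step, letter))
    return gaps


# Hypothetical extract of a payment-run RACI; step names are illustrative.
payment_run_raci = {
    "Prepare payment proposal": {
        "R": "AP Clerk", "A": "Finance Manager", "C": "", "I": "CFO",
    },
    "Approve payment run": {
        "R": "Finance Manager", "A": "CFO", "C": "AP Clerk", "I": "AP Clerk",
    },
}

print(empty_raci_cells(payment_run_raci))
# [('Prepare payment proposal', 'C')]
```

A check like this only finds blank cells. Whether the job title in a populated cell is the right one still needs a reviewer who knows the process.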

Creating Runbooks with /runbook

The /runbook command produces execution-focused documentation for recurring operational tasks. Where an SOP answers "what is the process?", a runbook answers "how do I run it right now?"

Worked example. You want a runbook for the IT failover procedure that the on-call engineer must execute when the primary server goes down. You type:

/runbook
Create an operational runbook for our server failover procedure.

Scenario: Primary web server (prod-web-01) becomes unresponsive.
On-call engineer receives PagerDuty alert and needs to execute failover
to standby server (prod-web-02).

Environment:
- Primary: prod-web-01 (AWS EC2 t3.large, eu-west-1a)
- Standby: prod-web-02 (AWS EC2 t3.large, eu-west-1b, pre-configured)
- Load balancer: Application Load Balancer (ALB) controlling traffic
- DNS: Route 53 with health checks
- RDS: Multi-AZ, automated failover (does not require manual action)

Produce: Prerequisites checklist, exact step-by-step procedure with
expected results and failure actions for each step, verification steps,
troubleshooting table, rollback procedure, and escalation paths.

What to expect: Exact commands, expected results, explicit failure handling, and escalation paths — documentation that can be executed at 2am under pressure by an engineer who has not done this specific failover before.

| Runbook Element | What to Verify |
| --- | --- |
| Prerequisites | All access, tools, and permissions required are listed as checkboxes before step 1 |
| Step precision | Each step gives the exact action: not "update the load balancer" but "remove prod-web-01 from the target group" with the exact console path or CLI command |
| Expected results | Every step has a stated expected outcome so the engineer knows if it worked |
| Failure actions | Every step with a failure mode has explicit guidance on what to do next |
| Verification | The runbook ends with checks that confirm the procedure completed successfully |
| Escalation | Named escalation contacts for each failure scenario |
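Two of the checks above, expected results and failure actions, can be expressed as a simple completeness test over the runbook's steps. The sketch below is illustrative only: `RunbookStep` and `incomplete_steps` are hypothetical names, the step wording is invented for the example, and it assumes every step can fail, which is stricter than the table requires.

```python
from dataclasses import dataclass


@dataclass
class RunbookStep:
    """Hypothetical representation of one runbook step."""
    number: int
    action: str
    expected_result: str = ""
    failure_action: str = ""


def incomplete_steps(steps: list[RunbookStep]) -> dict[int, list[str]]:
    """Map step number to the fields that step is missing."""
    report = {}
    for step in steps:
        missing = []
        if not step.expected_result.strip():
            missing.append("expected result")
        if not step.failure_action.strip():
            missing.append("failure action")
        if missing:
            report[step.number] = missing
    return report


# Illustrative steps; the wording is an assumption, not plugin output.
failover = [
    RunbookStep(
        1,
        "Remove prod-web-01 from the target group",
        expected_result="prod-web-01 no longer receives traffic",
        failure_action="Escalate to the named contact for this scenario",
    ),
    RunbookStep(2, "Update the load balancer"),  # vague, and nothing to verify
]

print(incomplete_steps(failover))
# {2: ['expected result', 'failure action']}
```

Step precision and escalation naming are judgement calls that a script like this cannot make. It only confirms that the fields an engineer needs at 2am are not blank.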

The SOP Quality Test

After producing a process document with either command, apply this test before using it:

Give the SOP or runbook to someone who understands the systems involved but has not performed this specific process. Ask them to follow it. Note every point where they stop and ask a question.

Each question is a documentation gap. The most common gaps are:

  • Steps that assume knowledge not in the document ("update the approval record" — where? in what system? how?)
  • Controls that describe intent but not mechanism ("verify bank details" — against what? where is the Vendor Master? what counts as a match?)
  • Error handling that escalates without specifying to whom, how, or within what timeframe

The test catches gaps that the author cannot see because they already know the answers.

Process Gap Analysis

When an existing process has known failure points, use /process-doc with a gap-analysis approach rather than rewriting from scratch. This is more efficient and produces targeted recommendations:

/process-doc
Analyse our order-to-cash process for gaps. I want to understand where
the process is breaking down.

Current process summary:
[Describe the process as it actually works today, including workarounds
and known issues]

Known failure points:
- Customer dispute resolution averages 45 days (target: 14 days)
- Credit checks skipped for repeat customers with outstanding balances
- Revenue recognition errors in month-end close — approximately 2-3 per quarter

Produce: A gap analysis mapping each known issue to the specific process step
where it occurs, the root cause category (missing control / missing step /
unclear ownership / system gap), and a prioritised remediation plan.

Exercise: Document Two Processes (Exercise 2)

Type: Process documentation
Time: 45 minutes
Plugin commands: Official /process-doc + /runbook
Goal: Produce an SOP with complete RACI and Document Control and a runbook for a recurring IT or operational task, then apply the quality test to both

Keep This File

The SOP library you build here feeds directly into Lesson 6 (Change Management). When you map the impact of a change, the first question is: which documented processes does this change affect? Keep your process documentation in this Cowork session — Lesson 6 will reference it when building change impact assessments.

Step 1 — Choose Your Two Processes

Select one process for SOP documentation and one for a runbook. Recommendations:

For the SOP:

  • Monthly supplier payment run (finance operations)
  • New employee onboarding process (HR and IT operations)
  • Client invoicing and accounts receivable process
  • Any process where an auditor might ask "who is responsible and what controls govern this?"

For the runbook:

  • Monthly data backup verification
  • Server or application restart procedure for a critical system
  • End-of-month financial reporting generation
  • Any procedure that must be executed reliably by different people, possibly under time pressure

Step 2 — Run /process-doc for the SOP

/process-doc
Document [your chosen process] as a complete, audit-ready SOP.

Context:
- Organisation: 200-person UK professional services firm
- Team involved: [describe the team and roles]
- Frequency: [daily / weekly / monthly / as triggered]
- Systems used: [name the systems — ERP, CRM, etc.]
- Key approvals required: [describe the approval structure]
- Key risks: [describe the 2-3 highest-risk points in the process]
- Regulatory requirements: [any compliance requirements this process must satisfy]

Produce an SOP with:
1. Purpose and Scope
2. RACI matrix covering every major step
3. Numbered steps — one action per step, named role, system, output
4. Controls embedded at each of the risk points I identified
5. Error handling for every step that can fail
6. Document Control section (version, effective date, author, review date)

What to evaluate:

  • Does every step in the RACI have a specific job title in the Responsible column — not "Finance", not "the team", but "AP Clerk" or "Finance Manager"?
  • Does every step do exactly one thing? If a step contains "and then" or "also", it needs to be split.
  • Are controls embedded at the specific risk points you identified — not just mentioned in the introduction?
  • Is error handling present for every step that can go wrong — and does it specify who to escalate to, how, and within what timeframe?
  • Does the Document Control section include a next review date? If not, the SOP has no maintenance contract.
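Two of the questions above lend themselves to a quick automated first pass: whether the owner is a specific job title, and whether a step hides a second action behind "and then" or "also". The sketch below is one rough way to flag candidates for review. The `lint_step` helper and the list of generic owners are assumptions, and the sample steps are invented.

```python
import re

# Owner labels the lesson treats as too generic to be a named role.
GENERIC_OWNERS = {"finance", "the team"}

# Phrases that suggest a step contains more than one action.
COMPOUND_MARKERS = re.compile(r"\b(and then|also)\b", re.IGNORECASE)


def lint_step(owner: str, action: str) -> list[str]:
    """Return review findings for one SOP step."""
    findings = []
    if owner.strip().lower() in GENERIC_OWNERS:
        findings.append("owner is not a specific job title")
    if COMPOUND_MARKERS.search(action):
        findings.append("step may contain more than one action")
    return findings


print(lint_step("Finance", "Match the invoice to the purchase order and then post it"))
# ['owner is not a specific job title', 'step may contain more than one action']

print(lint_step("AP Clerk", "Match the invoice to the purchase order"))
# []
```

Keyword matching will produce false positives and miss compound steps phrased differently, so treat the findings as prompts for a human read-through rather than a pass or fail result.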

Step 3 — Run /runbook for the Operational Procedure

/runbook
Create a runbook for [your chosen operational procedure].

Context:
- When this is used: [trigger — e.g., on-call alert, monthly schedule, ad-hoc request]
- Who runs it: [role — e.g., on-call engineer, finance admin]
- Systems involved: [name systems, versions, access required]
- Expected duration: [how long the procedure normally takes]
- What can go wrong: [describe the 2-3 most likely failure modes]

Produce a runbook with:
1. Prerequisites checklist (access, tools, data needed before starting)
2. Exact step-by-step procedure — each step with expected result and failure action
3. Verification checklist at the end
4. Troubleshooting table for the most common failure modes
5. Rollback procedure if something goes wrong
6. Escalation contacts for each failure scenario

What to evaluate:

  • Could someone unfamiliar with the procedure follow it without asking a single clarifying question? This is the definitive quality test.
  • Does every step have an expected result? Without it, the operator does not know if the step succeeded.
  • Does every step with a failure mode have explicit guidance — not "escalate if something goes wrong" but "if [specific symptom], then [specific action]"?
  • Is the rollback procedure realistic? Does it describe exactly how to undo the procedure, or does it say "revert to previous state" without specifying how?
  • Are escalation contacts named (role plus contact method), not just described by function?

Step 4 — Apply the Engineer Review Test

Find a colleague (or imagine one) who knows your systems but has not performed this specific process. Walk through the SOP and runbook with them:

  1. Read each step aloud and ask: "Could you execute this step right now, with only what is written?"
  2. Mark every step where they would need to ask a clarifying question
  3. Return to /process-doc or /runbook with those specific gaps: "The following steps are unclear — rewrite them with the additional detail needed to eliminate ambiguity"

Deliverable: A completed SOP (with RACI and Document Control), a completed runbook (with prerequisites, exact steps, and escalation paths), and a list of gaps found in the engineer review test with the updated versions addressing each gap.

Try With AI

Reproduce: Apply what you just learned to a simple case.

Create an SOP for our weekly expense claim approval process.

Context:
- Employees submit expense claims in [Concur / our expense system]
- Finance admin reviews and codes to correct cost centres
- Line managers approve claims up to £500
- Finance Director approves claims above £500
- Payment runs weekly on Fridays
- Key risk: duplicate claims and claims without valid receipts

Produce: Purpose, Scope, RACI matrix, numbered steps with one action per step,
controls at each risk point, error handling, and Document Control.

What you are learning: Expense approval is a good learning process because most people are familiar with it from the employee side, making it easy to check whether the SOP captures reality. Pay attention to how the RACI distributes responsibility — and whether the controls are embedded at the right steps or listed generically at the start.

Adapt: Modify the scenario to match your organisation.

I need to document a process that currently exists only in the head of our
[role — e.g., Office Manager / Finance Lead / IT Administrator].

Process: [describe what the process does and when it runs]
Who is involved: [roles and what they do — approximate is fine]
Systems used: [which tools, platforms, or systems are involved]
Known failure points: [what sometimes goes wrong]
Regulatory or audit requirement: [if any — e.g., SOX, ISO 27001, GDPR]

Produce an SOP and a companion runbook. For the SOP: full RACI, controls
at risk points, error handling, Document Control. For the runbook: exact
steps the person executing this process would need, with failure actions.

What you are learning: The gap between a process that "someone knows" and a process that "the organisation knows" is the difference between institutional knowledge and institutional resilience. Documenting it correctly the first time means the process survives the person.

Apply: Extend to a new situation the lesson didn't cover directly.

Our [process name] SOP was last updated 18 months ago. Since then:
- [System/tool] has been upgraded and [specific steps] have changed
- The approval threshold has been raised from [old amount] to [new amount]
- A new compliance requirement has been added: [describe requirement]
- The [role] who owned the process has left; [new role] has taken over

Produce:
1. An updated SOP reflecting the current reality
2. A change notification email to all users of the SOP explaining what changed
and why, with the effective date
3. A Document Control entry showing what changed between version 1 (old) and
version 2 (new), with the reason for each change

Use the previous SOP as the baseline: [describe or paste the old version]

What you are learning: Updating an existing SOP is the most common documentation task — and the one most often skipped because it feels less urgent than creating a new one. This prompt tests whether you can produce a version-controlled update, a change notification, and a Document Control record — the complete maintenance workflow, not just the document itself.
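If both versions of an SOP are available as text, the raw material for the Document Control change entry can be pulled out with a line diff. The sketch below uses Python's standard `difflib` module. The step wording is hypothetical (only the £5,000 to £10,000 threshold change comes from the opening story), and `changed_lines` is an illustrative helper.

```python
import difflib

# Hypothetical step text for two versions of the same SOP.
v1_steps = [
    "Step 4: Obtain approval for items over £5,000",
    "Step 5: Post approved items to the ledger",
]
v2_steps = [
    "Step 4: Obtain approval for items over £10,000",
    "Step 5: Post approved items to the ledger",
]


def changed_lines(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (removed, added) lines between two versions of an SOP."""
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


removed, added = changed_lines(v1_steps, v2_steps)
for old_text, new_text in zip(removed, added):
    # The reason for each change has to come from a person, not the diff.
    print(f"v1: {old_text}\nv2: {new_text}\nReason: [to be completed]\n")
```

The diff shows what changed but not why. The reason column in the change record still has to be written by whoever owns the process.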
