Updated Feb 23, 2026

Capstone: Full Production Deployment

Throughout this chapter, you've built cloud deployment knowledge step by step: cluster provisioning with doctl and hetzner-k3s, load balancer configuration, stack deployment with Dapr and Traefik, secrets management, and production verification. Now it's time to apply everything to a complete production scenario.

In previous lessons, you deployed components individually and tested pieces in isolation. Production deployment is different: it's about the complete sequence, from empty cloud account to working HTTPS endpoint serving real traffic.

This capstone brings it all together. You'll write a deployment specification FIRST, provision real infrastructure, deploy your complete stack, verify with HTTPS, and execute a clean teardown. The specification-first approach is what separates professional cloud engineering from Vibe Coding.

The result: your multi-cloud-deployer skill becomes production-ready. It's not a theoretical exercise. It's a verified, tested Digital FTE component you can use for real deployments.


Phase 1: Write the Deployment Specification

Before touching any CLI tools, write your specification. This forces you to think about environment choices, resource requirements, success criteria, and cost constraints before any infrastructure exists.

Create this file in your project directory:

File: deployment-spec.md

# Deployment Specification: Task API Production

## Target Environment
- **Provider**: [DOKS | Hetzner | AKS]
- **Region**: [region]
- **Cluster Name**: task-api-prod
- **Domain**: tasks.yourdomain.com

## Resource Requirements
- **Nodes**: 3
- **Node Size**: 2 vCPU, 4GB RAM
- **Replicas**: 2
- **Memory Limit**: 512Mi
- **CPU Limit**: 500m

## Stack Components
- [x] Task API (FastAPI + SQLModel)
- [x] Dapr sidecar (state management)
- [x] Traefik Ingress
- [x] cert-manager + Let's Encrypt

## Success Criteria
- [ ] Cluster provisioned in < 10 minutes
- [ ] Full deployment in < 30 minutes
- [ ] HTTPS endpoint accessible
- [ ] Health check returns 200
- [ ] Teardown verified ($0 cost)

## Non-Goals
- [ ] Database persistence (use in-memory for capstone)
- [ ] Monitoring stack (covered in Ch55)
- [ ] CI/CD integration (covered in Ch54)

Why specification first?

Without a spec, you'd start running doctl kubernetes cluster create and figure things out as you go. That's Vibe Coding. You might choose the wrong node size. You might forget TLS configuration until you're debugging certificate errors at 2 AM. You might deploy to the most expensive region.

The spec makes constraints explicit BEFORE you provision. It's your contract with yourself and your budget.
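You can even make the spec machine-checkable before provisioning. A minimal sketch, assuming the section headings shown above (the `check_spec` helper is hypothetical, not part of any tool in this chapter):

```python
# Hypothetical helper: lint deployment-spec.md for required sections
REQUIRED_SECTIONS = [
    "## Target Environment",
    "## Resource Requirements",
    "## Stack Components",
    "## Success Criteria",
    "## Non-Goals",
]

def check_spec(text: str) -> list[str]:
    """Return the required sections missing from the spec text."""
    return [s for s in REQUIRED_SECTIONS if s not in text]

sample = "## Target Environment\n## Success Criteria\n"
print(check_spec(sample))  # lists the three sections the sample omits
```

Run this against `deployment-spec.md` before touching any CLI: an incomplete spec fails fast, before any money is spent.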


Phase 2: Choose Your Deployment Path

This capstone supports three deployment paths. Each path produces identical outcomes with different providers and cost profiles.

Path A: DigitalOcean DOKS (Production Path)

Best for: Teams, real traffic, managed SLA

Monthly cost: ~$48+ (3-node cluster minimum)

Provisioning command:

doctl kubernetes cluster create task-api-prod \
  --region nyc1 \
  --node-pool "name=default;size=s-2vcpu-4gb;count=3" \
  --wait

Output:

Notice: Cluster is provisioning, waiting for cluster to be running
..............
Notice: Cluster created, fetching credentials
Notice: Adding cluster credentials to kubeconfig file found in "/home/user/.kube/config"

Connect and verify:

doctl kubernetes cluster kubeconfig save task-api-prod
kubectl get nodes

Output:

NAME                      STATUS   ROLES    AGE   VERSION
task-api-prod-default-0   Ready    <none>   2m    v1.28.2
task-api-prod-default-1   Ready    <none>   2m    v1.28.2
task-api-prod-default-2   Ready    <none>   2m    v1.28.2
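If you script the provisioning step, you can gate the next phase on every node reporting Ready. A sketch that parses `kubectl get nodes` output (the `all_nodes_ready` helper is an assumption, not a kubectl feature):

```python
# Hypothetical check: did every node in `kubectl get nodes` output reach Ready?
def all_nodes_ready(kubectl_output: str) -> bool:
    rows = kubectl_output.strip().splitlines()[1:]  # skip the header row
    return bool(rows) and all(row.split()[1] == "Ready" for row in rows)

output = """\
NAME                      STATUS   ROLES    AGE   VERSION
task-api-prod-default-0   Ready    <none>   2m    v1.28.2
task-api-prod-default-1   Ready    <none>   2m    v1.28.2
task-api-prod-default-2   Ready    <none>   2m    v1.28.2
"""
print(all_nodes_ready(output))  # True
```

The same parser works unchanged for all three provider paths, since `kubectl get nodes` output is identical everywhere.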

Path B: Hetzner + K3s (Budget Path)

Best for: Personal practice, budget-conscious learners

Monthly cost: ~$15 (3x CX22 servers)

Provisioning command:

hetzner-k3s create \
  --cluster-name task-api-prod \
  --location fsn1 \
  --masters-pool-size 1 \
  --workers-pool-size 2 \
  --instance-type cx22

Output:

Creating cluster task-api-prod in fsn1...
Creating master node...
Creating worker nodes...
Installing k3s on master...
Joining workers to cluster...
Cluster ready. Kubeconfig saved to ./kubeconfig

Connect and verify:

export KUBECONFIG=./kubeconfig
kubectl get nodes

Output:

NAME                       STATUS   ROLES                  AGE   VERSION
task-api-prod-master-1     Ready    control-plane,master   3m    v1.28.2+k3s1
task-api-prod-worker-1     Ready    <none>                 2m    v1.28.2+k3s1
task-api-prod-worker-2     Ready    <none>                 2m    v1.28.2+k3s1

Path C: Azure AKS (Enterprise Path)

Best for: Enterprise environments, Azure ecosystem integration

Monthly cost: ~$75+ (3-node cluster)

Provisioning command:

az aks create \
  --resource-group task-api-rg \
  --name task-api-prod \
  --location eastus \
  --node-count 3 \
  --node-vm-size Standard_B2s \
  --generate-ssh-keys

Connect and verify:

az aks get-credentials --resource-group task-api-rg --name task-api-prod
kubectl get nodes

Phase 3: Deploy the Complete Stack

With your cluster provisioned, deploy the complete production stack. This sequence is universal across all providers.

Step 1: Install Dapr

dapr init -k --wait

Output:

Making the jump to hyperspace...
Deploying the Dapr control plane to your cluster...
Success! Dapr has been installed to namespace dapr-system.

Verify:

dapr status -k

Output:

NAME                   NAMESPACE    HEALTHY  STATUS   REPLICAS  VERSION
dapr-sentry            dapr-system  True     Running  1         1.12.0
dapr-operator          dapr-system  True     Running  1         1.12.0
dapr-placement-server  dapr-system  True     Running  1         1.12.0
dapr-sidecar-injector  dapr-system  True     Running  1         1.12.0

Step 2: Install Traefik Ingress

helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik -n traefik --create-namespace

Output:

NAME: traefik
NAMESPACE: traefik
STATUS: deployed

Get the Load Balancer IP:

kubectl get svc traefik -n traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Output:

143.244.156.78

Record this IP for DNS configuration.
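Note that the jsonpath query returns an empty string until the provider finishes allocating the load balancer. A small guard before you paste the value into DNS can catch that (the `is_public_ipv4` helper is an assumption):

```python
import ipaddress

# Hypothetical guard: reject empty or malformed output from the jsonpath query
def is_public_ipv4(value: str) -> bool:
    try:
        ip = ipaddress.IPv4Address(value.strip())
    except ValueError:
        return False
    return not ip.is_private

print(is_public_ipv4("143.244.156.78"))  # True
print(is_public_ipv4(""))                # False -- LB not provisioned yet
```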

Step 3: Install cert-manager

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Output:

NAME: cert-manager
NAMESPACE: cert-manager
STATUS: deployed

Step 4: Configure Let's Encrypt ClusterIssuer

File: cluster-issuer.yaml

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: your-email@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: traefik

Apply:

kubectl apply -f cluster-issuer.yaml

Output:

clusterissuer.cert-manager.io/letsencrypt-prod created

Step 5: Deploy Task API

File: task-api-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: task-api
  template:
    metadata:
      labels:
        app: task-api
      annotations:
        # Dapr reads these from the pod template, not from Deployment metadata
        dapr.io/enabled: "true"
        dapr.io/app-id: "task-api"
        dapr.io/app-port: "8000"
    spec:
      containers:
      - name: task-api
        image: yourusername/task-api:v1
        ports:
        - containerPort: 8000
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
          requests:
            memory: "256Mi"
            cpu: "250m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: task-api
spec:
  selector:
    app: task-api
  ports:
  - port: 80
    targetPort: 8000

Apply:

kubectl apply -f task-api-deployment.yaml

Output:

deployment.apps/task-api created
service/task-api created

Step 6: Configure Ingress with TLS

File: task-api-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: task-api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
  - hosts:
    - tasks.yourdomain.com
    secretName: task-api-tls
  rules:
  - host: tasks.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: task-api
            port:
              number: 80

Apply:

kubectl apply -f task-api-ingress.yaml

Output:

ingress.networking.k8s.io/task-api-ingress created

Phase 4: The Convergence Loop

Real deployments rarely work perfectly on the first try. This phase demonstrates the iterative refinement process that produces working production systems.

Iteration 1: Initial Deployment Attempt

After applying all resources, check status:

kubectl get pods
kubectl get ingress
kubectl get certificate

Common issues and their indicators:

| Issue | Symptom | Detection Command |
|---|---|---|
| Image pull failure | ImagePullBackOff status | kubectl describe pod task-api-xxx |
| DNS not propagated | Certificate stuck at False | kubectl get certificate |
| Health check failing | CrashLoopBackOff | kubectl logs task-api-xxx |
| Load Balancer pending | EXTERNAL-IP: \<pending\> | kubectl get svc traefik -n traefik |
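The table above maps directly onto a small triage helper, useful if you wrap these checks in a script. A sketch (the `diagnose` mapping mirrors the table and is an assumption, not a kubectl feature):

```python
# Hypothetical triage map mirroring the troubleshooting table above
DIAGNOSES = {
    "ImagePullBackOff": "Image pull failure -- run: kubectl describe pod <name>",
    "CrashLoopBackOff": "Health check or app crash -- run: kubectl logs <name>",
    "Pending": "Scheduling or Load Balancer pending -- run: kubectl describe pod <name>",
}

def diagnose(status: str) -> str:
    return DIAGNOSES.get(status, "Unknown status -- start with: kubectl describe pod <name>")

print(diagnose("ImagePullBackOff"))
```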

Iteration 2: Troubleshoot and Fix

For image pull issues:

kubectl describe pod task-api-xxx | grep -A 5 "Events"

If you see "unauthorized," create an image pull secret:

kubectl create secret docker-registry regcred \
  --docker-server=docker.io \
  --docker-username=yourusername \
  --docker-password=yourtoken

For DNS issues:

Point your domain's A record to the Load Balancer IP. Verify propagation:

dig tasks.yourdomain.com +short

Output:

143.244.156.78
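Instead of rerunning dig by hand, you can poll propagation from a script using the standard library resolver. A sketch (results depend on your local DNS cache; the `dns_matches` helper is an assumption):

```python
import socket

# Hypothetical check: does the domain resolve to the Load Balancer IP yet?
def dns_matches(host: str, expected_ip: str) -> bool:
    try:
        return socket.gethostbyname(host) == expected_ip
    except socket.gaierror:
        return False  # name not resolvable yet

# Example against a name guaranteed to resolve locally:
print(dns_matches("localhost", "127.0.0.1"))
```

Call this in a loop with a sleep between attempts until it returns True, then proceed to the certificate check.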

For certificate issues:

Check cert-manager logs:

kubectl logs -n cert-manager -l app=cert-manager

Iteration 3: Verify Success

When all issues are resolved:

# All pods running
kubectl get pods

Output:

NAME                        READY   STATUS    RESTARTS   AGE
task-api-7b9f5d6c4d-abc12   2/2     Running   0          5m
task-api-7b9f5d6c4d-def34   2/2     Running   0          5m

# Certificate ready
kubectl get certificate

Output:

NAME           READY   SECRET         AGE
task-api-tls   True    task-api-tls   3m

# HTTPS endpoint working
curl https://tasks.yourdomain.com/health

Output:

{"status": "healthy", "service": "task-api"}
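Beyond eyeballing the curl output, a verification script can assert on the response body. A sketch (the `healthy` helper is an assumption; the JSON shape matches the output above):

```python
import json

# Hypothetical assertion on the /health response body
def healthy(body: str) -> bool:
    try:
        return json.loads(body).get("status") == "healthy"
    except json.JSONDecodeError:
        return False  # e.g. an HTML error page from the ingress

print(healthy('{"status": "healthy", "service": "task-api"}'))  # True
print(healthy("<html>502 Bad Gateway</html>"))                  # False
```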

Phase 5: Production Verification Checklist

Go back to your specification and verify each success criterion:

| Success Criterion | Status | Evidence |
|---|---|---|
| Cluster provisioned in < 10 minutes | PASS | kubectl get nodes shows Ready in 8m |
| Full deployment in < 30 minutes | PASS | Total deployment time: 22 minutes |
| HTTPS endpoint accessible | PASS | curl https://tasks.yourdomain.com/health returns 200 |
| Health check returns 200 | PASS | Response: {"status": "healthy"} |
| Teardown verified ($0 cost) | PENDING | Execute Phase 6 |

Phase 6: Complete Teardown with Cost Verification

Production capstones aren't complete without teardown. You must prove zero ongoing costs.

For DigitalOcean DOKS

# Delete the cluster
doctl kubernetes cluster delete task-api-prod --force

# Verify no lingering resources
doctl kubernetes cluster list
doctl compute load-balancer list

Output:

ID    Name    Region    Status
(empty - no clusters)

For Hetzner + K3s

# Delete the cluster
hetzner-k3s delete --cluster-name task-api-prod

# Verify no servers remain
hcloud server list

Output:

ID   NAME   STATUS   IPV4   IPV6   DATACENTER
(empty - no servers)

For Azure AKS

# Delete the resource group (removes everything)
az group delete --name task-api-rg --yes --no-wait

# Verify deletion
az aks list --output table

Cost Verification

Check your provider dashboard:

  1. Log in to your cloud provider's billing console
  2. Verify no active resources
  3. Confirm next billing cycle shows $0 for this project
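The dashboard check can be backed by parsing the CLI output: after teardown, every provider `list` command should print a header row and nothing else. A sketch (the `resource_count` helper is an assumption):

```python
# Hypothetical check: count data rows in a provider CLI `list` output
def resource_count(cli_output: str) -> int:
    lines = [ln for ln in cli_output.strip().splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)  # subtract the header row

empty = "ID    Name    Region    Status\n"
busy = "ID    Name    Region    Status\n123   task-api-prod   nyc1   running\n"
print(resource_count(empty))  # 0
print(resource_count(busy))   # 1
```

The same check works on `doctl kubernetes cluster list`, `hcloud server list`, and `az aks list --output table`: any nonzero count means you are still being billed.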

Update your specification checklist:

| Success Criterion | Status |
|---|---|
| Teardown verified ($0 cost) | PASS |

All criteria met. Specification satisfied.


Final Skill Evaluation Rubric

Your multi-cloud-deployer skill has been tested and refined throughout this chapter. Evaluate its production readiness:

| Criterion | Weight | Pass Criteria | Your Score |
|---|---|---|---|
| Spec completeness | 20% | All required sections present | /20 |
| Deployment success | 40% | HTTPS endpoint accessible | /40 |
| Skill quality | 30% | Generates commands for 3+ providers | /30 |
| Teardown verification | 10% | $0 ongoing cost confirmed | /10 |

Total: /100

Scoring guide:

  • 90-100: Your skill is production-ready and sellable as a Digital FTE component
  • 70-89: Minor gaps to address, but functional for personal use
  • 50-69: Significant improvements needed before production use
  • Below 50: Return to earlier lessons and rebuild skill foundations
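The rubric arithmetic can be captured in a few lines if you want to track scores over time. A sketch (the band thresholds copy the scoring guide above; the `score_skill` helper itself is an assumption):

```python
# Weights from the rubric: spec 20, deployment 40, skill 30, teardown 10
WEIGHTS = {"spec": 20, "deployment": 40, "skill": 30, "teardown": 10}

def score_skill(scores: dict[str, int]) -> tuple[int, str]:
    """Sum per-criterion scores (each capped at its weight) and band the total."""
    total = sum(min(scores.get(k, 0), w) for k, w in WEIGHTS.items())
    if total >= 90:
        return total, "production-ready"
    if total >= 70:
        return total, "functional, minor gaps"
    if total >= 50:
        return total, "needs significant work"
    return total, "rebuild foundations"

print(score_skill({"spec": 20, "deployment": 35, "skill": 25, "teardown": 10}))
```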

Reflect on Your Skill

Your multi-cloud-deployer skill has evolved through this chapter. It started as a skeleton created from official documentation. Now it's been tested against real cloud infrastructure.

Final Test: Ask your skill:

Generate a complete deployment plan for Task API on DigitalOcean DOKS
with Dapr, Ingress, and TLS.

Evaluate the output:

  • Does it include all provisioning commands (doctl kubernetes cluster create)?
  • Does it include the deployment sequence (Dapr -> Traefik -> cert-manager -> app)?
  • Does it include verification steps (kubectl get pods, curl https://...)?
  • Does it include teardown commands (doctl kubernetes cluster delete)?

Your skill is production-ready when:

  • It generates accurate CLI commands for at least 3 providers (DOKS, Hetzner, AKS)
  • It includes the universal "provision -> connect -> deploy" pattern
  • It can answer questions about cost and tradeoffs between providers
  • It produces deployment sequences that work on first try (or identifies likely issues)

This skill is now part of your Digital FTE portfolio.

You don't just "know cloud deployment." You OWN a verified, production-tested skill that can deploy AI agent services to any major cloud provider. This is the outcome of the Skill-First Learning Pattern: not knowledge, but assets.


Try With AI

You've completed the capstone by following the specification-first approach. Now extend your deployment skills through AI collaboration.

Prompt 1: Specification Review

Review my deployment specification for gaps:

[Paste your deployment-spec.md content]

Questions to consider:
- What failure scenarios am I not accounting for?
- Should I add rollback procedures?
- What monitoring should I configure for production?

What you're learning: AI can review specifications and identify blind spots. It might suggest failover strategies, backup procedures, or monitoring configurations you hadn't considered. You evaluate each suggestion against your actual production requirements.

Prompt 2: Multi-Cloud Comparison

I've deployed my Task API to DigitalOcean DOKS. Help me understand:

1. What would change if I deployed to GKE instead?
2. What would change if I deployed to EKS?
3. What's truly universal vs provider-specific?

Focus on practical differences that affect deployment time and cost.

What you're learning: The "provision -> connect -> deploy" pattern is universal, but provisioning commands differ. AI helps you understand which skills transfer directly and which require adaptation.

Prompt 3: Production Hardening

My Task API is deployed with HTTPS. What production hardening steps
am I missing? Consider:

- Network policies
- Pod security standards
- Resource quotas
- Backup procedures

Prioritize by risk: what gaps would cause the biggest production issues?

What you're learning: A working deployment isn't a hardened deployment. AI helps you identify the gap between "it runs" and "it's production-ready." You evaluate each recommendation against your application's actual risk profile.

Safety note: When sharing deployment specifications with AI, redact actual domain names, IP addresses, and cloud credentials. Replace real values with placeholders like yourdomain.com or YOUR_API_TOKEN.