Updated Feb 23, 2026

Capstone: Full Production Deployment

Throughout this chapter, you've built cloud deployment knowledge step by step: cluster provisioning with doctl and hetzner-k3s, load balancer configuration, stack deployment with Dapr and Traefik, secrets management, and production verification. Now it's time to apply everything to a complete production scenario.

In previous lessons, you deployed and tested components one at a time. Production deployment is different: what matters is the complete sequence, from an empty cloud account to a working HTTPS endpoint serving real traffic.

This capstone brings it all together. You'll write a deployment specification FIRST, provision real infrastructure, deploy your complete stack, verify with HTTPS, and execute a clean teardown. The specification-first approach is what separates professional cloud engineering from Vibe Coding.

The result: your multi-cloud-deployer skill becomes production-ready. It's not a theoretical exercise. It's a verified, tested Digital FTE component you can use for real deployments.

Phase 1: Write the Deployment Specification

Before touching any CLI tools, write your specification. This forces you to think about environment choices, resource requirements, success criteria, and cost constraints before any infrastructure exists.

Create this file in your project directory:

File: deployment-spec.md

# Deployment Specification: Task API Production

## Target Environment
- **Provider**: [DOKS | Hetzner | AKS]
- **Region**: [region]
- **Cluster Name**: task-api-prod
- **Domain**: tasks.yourdomain.com

## Resource Requirements
- **Nodes**: 3
- **Node Size**: 2 vCPU, 4GB RAM
- **Replicas**: 2
- **Memory Limit**: 512Mi
- **CPU Limit**: 500m

## Stack Components
- [x] Task API (FastAPI + SQLModel)
- [x] Dapr sidecar (state management)
- [x] Traefik Ingress
- [x] cert-manager + Let's Encrypt

## Success Criteria
- [ ] Cluster provisioned in < 10 minutes
- [ ] Full deployment in < 30 minutes
- [ ] HTTPS endpoint accessible
- [ ] Health check returns 200
- [ ] Teardown verified ($0 cost)

## Non-Goals
- [ ] Database persistence (use in-memory for capstone)
- [ ] Monitoring stack (covered in Ch55)
- [ ] CI/CD integration (covered in Ch54)

Why specification first?

Without a spec, you'd start running doctl kubernetes cluster create and figure things out as you go. That's Vibe Coding. You might choose the wrong node size. You might forget TLS configuration until you're debugging certificate errors at 2 AM. You might deploy to the most expensive region.

The spec makes constraints explicit BEFORE you provision. It's your contract with yourself and your budget.
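One way to make the spec enforceable is to check it before running any provisioning command. The sketch below assumes your deployment-spec.md uses the five section headings from the template above; the function name check_spec is hypothetical and not part of any tool used in this chapter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a deployment spec contains every section
# from the template before any infrastructure is provisioned.
check_spec() {
  local spec_file="$1"
  local missing=0
  local section
  for section in "Target Environment" "Resource Requirements" \
                 "Stack Components" "Success Criteria" "Non-Goals"; do
    # Match a level-2 markdown heading whose whole line is this title.
    if ! grep -qx "## ${section}" "$spec_file"; then
      echo "missing section: ${section}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_spec deployment-spec.md before Phase 2 prints any missing section and returns a non-zero status, so it can gate a provisioning script.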

Phase 2: Choose Your Deployment Path

This capstone supports three deployment paths. All three end in the same place, a running Kubernetes cluster you can reach with kubectl, but they differ in provider and cost.

Path A: DigitalOcean DOKS (Production Path)

Best for: Teams, real traffic, managed SLA

Monthly cost: ~$48+ (3-node cluster minimum)

Provisioning command:

doctl kubernetes cluster create task-api-prod \
  --region nyc1 \
  --node-pool "name=default;size=s-2vcpu-4gb;count=3" \
  --wait

Output:

Notice: Cluster is provisioning, waiting for cluster to be running
..............
Notice: Cluster created, fetching credentials
Notice: Adding cluster credentials to kubeconfig file found in "/home/user/.kube/config"

Connect and verify:

doctl kubernetes cluster kubeconfig save task-api-prod
kubectl get nodes

Output:

NAME                      STATUS   ROLES    AGE   VERSION
task-api-prod-default-0   Ready    <none>   2m    v1.28.2
task-api-prod-default-1   Ready    <none>   2m    v1.28.2
task-api-prod-default-2   Ready    <none>   2m    v1.28.2

Path B: Hetzner + K3s (Budget Path)

Best for: Personal practice, budget-conscious learners

Monthly cost: ~$15 (3x CX22 servers)

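Provisioning command (the flags below are illustrative; recent hetzner-k3s releases read the same settings from a YAML file passed with --config, so check hetzner-k3s create --help for your version):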

hetzner-k3s create \
  --cluster-name task-api-prod \
  --location fsn1 \
  --masters-pool-size 1 \
  --workers-pool-size 2 \
  --instance-type cx22

Output:

Creating cluster task-api-prod in fsn1...
Creating master node...
Creating worker nodes...
Installing k3s on master...
Joining workers to cluster...
Cluster ready. Kubeconfig saved to ./kubeconfig

Connect and verify:

export KUBECONFIG=./kubeconfig
kubectl get nodes

Output:

NAME                       STATUS   ROLES                  AGE   VERSION
task-api-prod-master-1     Ready    control-plane,master   3m    v1.28.2+k3s1
task-api-prod-worker-1     Ready    <none>                 2m    v1.28.2+k3s1
task-api-prod-worker-2     Ready    <none>                 2m    v1.28.2+k3s1

Path C: Azure AKS (Enterprise Path)

Best for: Enterprise environments, Azure ecosystem integration

Monthly cost: ~$75+ (3-node cluster)

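Provisioning commands (the resource group must exist before the cluster is created):

az group create --name task-api-rg --location eastus

az aks create \
  --resource-group task-api-rg \
  --name task-api-prod \
  --location eastus \
  --node-count 3 \
  --node-vm-size Standard_B2s \
  --generate-ssh-keys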

Connect and verify:

az aks get-credentials --resource-group task-api-rg --name task-api-prod
kubectl get nodes
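The monthly figures above overstate what a capstone run costs if you tear the cluster down the same day. As a rough sketch, assuming the provider bills pro rata by the hour and a month averages 730 hours (both assumptions; load balancers and other add-ons are billed on top), you can estimate the spend like this. The function name prorated_cost is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate the cost of a short-lived cluster from its monthly price.
# Assumes hourly pro-rata billing and a 730-hour month.
prorated_cost() {
  # $1 = monthly cost in USD, $2 = hours the cluster stays up
  awk -v monthly="$1" -v hours="$2" \
    'BEGIN { printf "%.2f\n", monthly / 730 * hours }'
}
```

For a three-hour session, prorated_cost 48 3 gives the DOKS estimate, prorated_cost 15 3 the Hetzner estimate, and prorated_cost 75 3 the AKS estimate.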

Phase 3: Deploy the Complete Stack

With your cluster provisioned, deploy the complete production stack. This sequence is universal across all providers.

Step 1: Install Dapr

dapr init -k --wait

Output:

Making the jump to hyperspace...
Deploying the Dapr control plane to your cluster...
Success! Dapr has been installed to namespace dapr-system.

Verify:

dapr status -k

Output:

NAME                   NAMESPACE    HEALTHY  STATUS   REPLICAS  VERSION
dapr-sentry            dapr-system  True     Running  1         1.12.0
dapr-operator          dapr-system  True     Running  1         1.12.0
dapr-placement-server  dapr-system  True     Running  1         1.12.0
dapr-sidecar-injector  dapr-system  True     Running  1         1.12.0

Step 2: Install Traefik Ingress

helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik -n traefik --create-namespace

Output:

NAME: traefik
NAMESPACE: traefik
STATUS: deployed

Get the Load Balancer IP:

kubectl get svc traefik -n traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Output:

143.244.156.78

Record this IP for DNS configuration.
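The load balancer IP is often empty for a minute or two after the Helm install, so a single kubectl call can return nothing. A minimal polling sketch follows; wait_for_value and lb_ip are hypothetical helper names, and the attempt count and delay are placeholders to tune.

```shell
#!/usr/bin/env bash
# Poll a command until it prints a non-empty value or attempts run out.
# Usage: wait_for_value <attempts> <sleep_seconds> <command...>
wait_for_value() {
  local attempts="$1" delay="$2" value
  shift 2
  while [ "$attempts" -gt 0 ]; do
    # Ignore errors from the probe; an empty result just means "not yet".
    value="$("$@" 2>/dev/null || true)"
    if [ -n "$value" ]; then
      echo "$value"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Same jsonpath query as above, wrapped so it can be polled.
lb_ip() {
  kubectl get svc traefik -n traefik \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

Calling wait_for_value 30 10 lb_ip retries for up to five minutes and prints the IP once the provider has assigned it.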

Step 3: Install cert-manager

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Output:

NAME: cert-manager
NAMESPACE: cert-manager
STATUS: deployed

Step 4: Configure Let's Encrypt ClusterIssuer

File: cluster-issuer.yaml

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: your-email@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: traefik

Apply:

kubectl apply -f cluster-issuer.yaml

Output:

clusterissuer.cert-manager.io/letsencrypt-prod created

Step 5: Deploy Task API

File: task-api-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-api
  # Dapr annotations belong on the pod template below; the sidecar injector ignores them here.
spec:
  replicas: 2
  selector:
    matchLabels:
      app: task-api
  template:
    metadata:
      labels:
        app: task-api
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "task-api"
        dapr.io/app-port: "8000"
    spec:
      containers:
        - name: task-api
          image: yourusername/task-api:v1
          ports:
            - containerPort: 8000
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
            requests:
              memory: "256Mi"
              cpu: "250m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: task-api
spec:
  selector:
    app: task-api
  ports:
    - port: 80
      targetPort: 8000

Apply:

kubectl apply -f task-api-deployment.yaml

Output:

deployment.apps/task-api created
service/task-api created
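The "created" messages only confirm that the API server accepted the objects, not that the pods are running. A small sketch that applies a manifest and then blocks on kubectl rollout status until the Deployment is ready or a timeout expires; the function name deploy_and_wait and the 120s default are assumptions.

```shell
#!/usr/bin/env bash
# Apply a manifest, then wait for the named Deployment to finish rolling out.
# Usage: deploy_and_wait <manifest> <deployment-name> [timeout]
deploy_and_wait() {
  local manifest="$1" deployment="$2" timeout="${3:-120s}"
  kubectl apply -f "$manifest" || return 1
  # Returns non-zero if the rollout does not complete within the timeout.
  kubectl rollout status "deployment/${deployment}" --timeout="$timeout"
}
```

For this lesson the call would be deploy_and_wait task-api-deployment.yaml task-api.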

Step 6: Configure Ingress with TLS

File: task-api-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: task-api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - tasks.yourdomain.com
      secretName: task-api-tls
  rules:
    - host: tasks.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: task-api
                port:
                  number: 80

Apply:

kubectl apply -f task-api-ingress.yaml

Output:

ingress.networking.k8s.io/task-api-ingress created

Phase 4: The Convergence Loop

Real deployments rarely work perfectly on the first try. This phase demonstrates the iterative refinement process that produces working production systems.

Iteration 1: Initial Deployment Attempt

After applying all resources, check status:

kubectl get pods
kubectl get ingress
kubectl get certificate

Common issues and their indicators:

| Issue | Symptom | Detection Command |
| --- | --- | --- |
| Image pull failure | `ImagePullBackOff` status | `kubectl describe pod task-api-xxx` |
| DNS not propagated | Certificate stuck at `False` | `kubectl get certificate` |
| Health check failing | `CrashLoopBackOff` | `kubectl logs task-api-xxx` |
| Load Balancer pending | `EXTERNAL-IP: <pending>` | `kubectl get svc traefik -n traefik` |
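The pod rows of this table can be turned into a first-pass triage step. The sketch below reads the output of kubectl get pods --no-headers on standard input and prints the suggested detection command for each pod that is not healthy. The function name triage_pods is hypothetical, and the fallback to kubectl describe pod for unlisted statuses is an assumption, not something the table prescribes.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods --no-headers` output on stdin and print the
# suggested next command for each pod that is not Running or Completed.
triage_pods() {
  local name ready status _rest
  while read -r name ready status _rest; do
    [ -n "$name" ] || continue   # skip blank lines
    case "$status" in
      Running|Completed)             ;;  # healthy, nothing to do
      ImagePullBackOff|ErrImagePull) echo "kubectl describe pod ${name}" ;;
      CrashLoopBackOff)              echo "kubectl logs ${name}" ;;
      *)                             echo "kubectl describe pod ${name}" ;;
    esac
  done
}
```

Usage: kubectl get pods --no-headers | triage_pods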

Iteration 2: Troubleshoot and Fix

For image pull issues:

kubectl describe pod task-api-xxx | grep -A 5 "Events"

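If you see "unauthorized," create an image pull secret, then reference it from the Deployment's pod spec under imagePullSecrets (name: regcred) and re-apply the manifest: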

kubectl create secret docker-registry regcred \
  --docker-server=docker.io \
  --docker-username=yourusername \
  --docker-password=yourtoken

For DNS issues:

Point your domain's A record to the Load Balancer IP. Verify propagation:

dig tasks.yourdomain.com +short

Output:

143.244.156.78
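Comparing the dig output with the load balancer IP by eye is easy to get wrong when several records come back. A minimal sketch of that comparison, assuming you pass the expected IP and the raw resolver output as arguments; the function name dns_points_to is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when the expected IP appears as a whole line in the resolver output.
# $1 = expected Load Balancer IP, $2 = newline-separated output of `dig +short`
dns_points_to() {
  # -x matches the entire line and -F treats the dots literally.
  printf '%s\n' "$2" | grep -qxF "$1"
}
```

Usage: dns_points_to 143.244.156.78 "$(dig tasks.yourdomain.com +short)" && echo "DNS ready". cert-manager's HTTP-01 challenge can only succeed once this check passes.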

For certificate issues:

Check cert-manager logs:

kubectl logs -n cert-manager -l app=cert-manager

Iteration 3: Verify Success

When all issues are resolved:

# All pods running
kubectl get pods

Output:

NAME                        READY   STATUS    RESTARTS   AGE
task-api-7b9f5d6c4d-abc12   2/2     Running   0          5m
task-api-7b9f5d6c4d-def34   2/2     Running   0          5m

# Certificate ready
kubectl get certificate

Output:

NAME           READY   SECRET         AGE
task-api-tls   True    task-api-tls   3m

# HTTPS endpoint working
curl https://tasks.yourdomain.com/health

Output:

{"status": "healthy", "service": "task-api"}
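The spec asks for a 200 status, which the response body alone does not prove. The sketch below captures the status code explicitly with curl's --write-out option; because curl validates the server certificate by default, a PASS also implies the TLS chain was accepted. The function names http_status and check_health and the 10-second timeout are assumptions.

```shell
#!/usr/bin/env bash
# Print the HTTP status code for a URL; curl reports 000 when the request
# never completes (DNS failure, TLS error, timeout).
http_status() {
  curl --silent --output /dev/null --max-time 10 \
       --write-out '%{http_code}' "$1"
}

# Report PASS only for a 200 response.
check_health() {
  local code
  code="$(http_status "$1" || true)"
  if [ "$code" = "200" ]; then
    echo "PASS ${1} returned 200"
  else
    echo "FAIL ${1} returned ${code:-no response}"
    return 1
  fi
}
```

Usage: check_health https://tasks.yourdomain.com/health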

Phase 5: Production Verification Checklist

Go back to your specification and verify each success criterion:

| Success Criterion | Status | Evidence |
| --- | --- | --- |
| Cluster provisioned in < 10 minutes | PASS | `kubectl get nodes` shows Ready in 8m |
| Full deployment in < 30 minutes | PASS | Total deployment time: 22 minutes |
| HTTPS endpoint accessible | PASS | `curl https://tasks.yourdomain.com/health` returns 200 |
| Health check returns 200 | PASS | Response: `{"status": "healthy"}` |
| Teardown verified ($0 cost) | PENDING | Execute Phase 6 |
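The two timing criteria are easier to defend with recorded timestamps than with an estimate. A minimal sketch, assuming you capture date +%s before and after each phase; the function name within_budget is hypothetical.

```shell
#!/usr/bin/env bash
# Compare elapsed time between two epoch timestamps against a minute budget.
# Usage: within_budget <start_epoch> <end_epoch> <budget_minutes>
within_budget() {
  local elapsed="$(( $2 - $1 ))"
  local minutes="$(( elapsed / 60 ))"   # whole minutes, rounded down
  # The spec says "< N minutes", so hitting the limit exactly is a failure.
  if [ "$elapsed" -lt "$(( $3 * 60 ))" ]; then
    echo "PASS ${minutes}m (< ${3}m)"
  else
    echo "FAIL ${minutes}m (limit ${3}m)"
    return 1
  fi
}
```

Usage: run start=$(date +%s) before provisioning, then within_budget "$start" "$(date +%s)" 10 once kubectl get nodes shows every node Ready.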

Phase 6: Complete Teardown with Cost Verification

Production capstones aren't complete without teardown. You must prove zero ongoing costs.

For DigitalOcean DOKS

# Delete the cluster
doctl kubernetes cluster delete task-api-prod --force

# Verify no lingering resources (the load balancer created for Traefik is billed separately; delete it if it is still listed)
doctl kubernetes cluster list
doctl compute load-balancer list

Output:

ID    Name    Region    Status
(empty - no clusters)

For Hetzner + K3s

# Delete the cluster
hetzner-k3s delete --cluster-name task-api-prod

# Verify no servers remain
hcloud server list

Output:

ID   NAME   STATUS   IPV4   IPV6   DATACENTER
(empty - no servers)

For Azure AKS

# Delete the resource group (removes everything)
az group delete --name task-api-rg --yes --no-wait

# Verify deletion (--no-wait returns immediately, so re-run until the cluster no longer appears)
az aks list --output table
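Each verification command above should print a header row and nothing else. The sketch below automates that check, assuming the listing prints exactly one header line as in the sample outputs shown; the function name listing_is_empty is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only when a CLI listing on stdin contains nothing after its header.
# Assumes the first line is a header row.
listing_is_empty() {
  local rows
  # Count non-blank lines after the header; grep -c exits 1 on zero matches.
  rows="$(tail -n +2 | grep -c '[^[:space:]]' || true)"
  [ "$rows" -eq 0 ]
}
```

Usage: doctl kubernetes cluster list | listing_is_empty && echo "no clusters remain". The same pipe works for doctl compute load-balancer list and hcloud server list.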

Cost Verification

Check your provider dashboard:

1. Log in to your cloud provider's billing console
2. Verify no active resources
3. Confirm next billing cycle shows $0 for this project

Update your specification checklist:

| Success Criterion | Status |
| --- | --- |
| Teardown verified ($0 cost) | PASS |

All criteria met. Specification satisfied.

Final Skill Evaluation Rubric

Your multi-cloud-deployer skill has been tested and refined throughout this chapter. Evaluate its production readiness:

| Criterion | Weight | Pass Criteria | Your Score |
| --- | --- | --- | --- |
| Spec completeness | 20% | All required sections present | /20 |
| Deployment success | 40% | HTTPS endpoint accessible | /40 |
| Skill quality | 30% | Generates commands for 3+ providers | /30 |
| Teardown verification | 10% | $0 ongoing cost confirmed | /10 |

Total: /100

Scoring guide:

- 90-100: Your skill is production-ready and sellable as a Digital FTE component
- 70-89: Minor gaps to address, but functional for personal use
- 50-69: Significant improvements needed before production use
- Below 50: Return to earlier lessons and rebuild skill foundations
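The rubric arithmetic can be sketched as a small function that sums the four scores and maps the total to the bands in the scoring guide. The function name score_band and the shortened band labels are illustrative.

```shell
#!/usr/bin/env bash
# Sum the four rubric scores and report the matching band.
# Usage: score_band <spec /20> <deployment /40> <skill /30> <teardown /10>
score_band() {
  local total="$(( $1 + $2 + $3 + $4 ))"
  local band
  if   [ "$total" -ge 90 ]; then band="production-ready"
  elif [ "$total" -ge 70 ]; then band="minor gaps"
  elif [ "$total" -ge 50 ]; then band="significant improvements needed"
  else                           band="rebuild foundations"
  fi
  echo "${total}/100: ${band}"
}
```

For example, full marks everywhere except 20 of 30 for skill quality is score_band 20 40 20 10, which lands exactly on the 90-point threshold.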

Reflect on Your Skill

Your multi-cloud-deployer skill has evolved through this chapter. It started as a skeleton created from official documentation. Now it's been tested against real cloud infrastructure.

Final Test: Ask your skill:

Generate a complete deployment plan for Task API on DigitalOcean DOKS
with Dapr, Ingress, and TLS.

Evaluate the output:

- Does it include all provisioning commands (doctl kubernetes cluster create)?
- Does it include the deployment sequence (Dapr -> Traefik -> cert-manager -> app)?
- Does it include verification steps (kubectl get pods, curl https://...)?
- Does it include teardown commands (doctl kubernetes cluster delete)?

Your skill is production-ready when:

- It generates accurate CLI commands for at least 3 providers (DOKS, Hetzner, AKS)
- It includes the universal "provision -> connect -> deploy" pattern
- It can answer questions about cost and tradeoffs between providers
- It produces deployment sequences that work on first try (or identifies likely issues)

This skill is now part of your Digital FTE portfolio.

You don't just "know cloud deployment." You OWN a verified, production-tested skill that can deploy AI agent services to any major cloud provider. This is the outcome of the Skill-First Learning Pattern: not knowledge, but assets.

Try With AI

You've completed the capstone by following the specification-first approach. Now extend your deployment skills through AI collaboration.

Prompt 1: Specification Review

Review my deployment specification for gaps:

[Paste your deployment-spec.md content]

Questions to consider:
- What failure scenarios am I not accounting for?
- Should I add rollback procedures?
- What monitoring should I configure for production?

What you're learning: AI can review specifications and identify blind spots. It might suggest failover strategies, backup procedures, or monitoring configurations you hadn't considered. You evaluate each suggestion against your actual production requirements.

Prompt 2: Multi-Cloud Comparison

I've deployed my Task API to DigitalOcean DOKS. Help me understand:

1. What would change if I deployed to GKE instead?
2. What would change if I deployed to EKS?
3. What's truly universal vs provider-specific?

Focus on practical differences that affect deployment time and cost.

What you're learning: The "provision -> connect -> deploy" pattern is universal, but provisioning commands differ. AI helps you understand which skills transfer directly and which require adaptation.

Prompt 3: Production Hardening

My Task API is deployed with HTTPS. What production hardening steps
am I missing? Consider:

- Network policies
- Pod security standards
- Resource quotas
- Backup procedures

Prioritize by risk: what gaps would cause the biggest production issues?

What you're learning: A working deployment isn't a hardened deployment. AI helps you identify the gap between "it runs" and "it's production-ready." You evaluate each recommendation against your application's actual risk profile.

Safety note: When sharing deployment specifications with AI, redact actual domain names, IP addresses, and cloud credentials. Replace real values with placeholders like yourdomain.com or YOUR_API_TOKEN.
