
Principle 6: Constraints and Safety

You give an AI system access to your codebase. It's working well—making helpful changes, running tests, suggesting improvements. Then you notice something odd in git history. The AI deleted a directory you didn't ask it to touch. It ran commands you don't remember approving. It's refactoring code you specifically said not to change.

This isn't science fiction; incidents like these really happen. AI systems are powerful, and power without constraints is dangerous.

This principle is about balancing capability with safety. You want AI to be effective—but not so effective it causes damage. You want autonomy—but not so much autonomy that you lose control. The solution is thoughtful constraints and safety measures.

The Risk Spectrum: Understanding What Can Go Wrong

Before designing safety measures, understand what you're protecting against.

Category 1: Data Loss (Destructive Operations)

AI deletes or overwrites important data:

  • rm -rf on the wrong directory
  • Overwriting files without confirmation
  • Git operations that discard work
  • Database changes without backups

Impact: Hours to weeks of lost work
Likelihood: Medium—AI follows instructions literally

Category 2: Security Vulnerabilities

AI introduces security issues:

  • Hardcoded credentials in code
  • Insecure authentication implementations
  • SQL injection vulnerabilities
  • Dependency confusion attacks

Impact: System compromise, data breach
Likelihood: Medium—AI doesn't automatically think like an attacker

Category 3: Cost Overruns

AI generates expensive operations:

  • Infinite loops in cloud resources
  • API calls without rate limiting
  • Inefficient algorithms consuming compute
  • Unintended large-scale operations

Impact: Unexpected cloud bills
Likelihood: Low—AI tries to be efficient, but doesn't know costs

Category 4: Reputation Damage

AI makes changes that affect users:

  • Offensive content in user-facing materials
  • Bugs that corrupt user data
  • Performance issues that cause downtime
  • Privacy violations

Impact: Lost trust, user churn
Likelihood: Low—but high impact

Category 5: Workflow Disruption

AI interferes with team processes:

  • Commits that break CI/CD
  • Changes that conflict with others' work
  • Changes to agreed-upon conventions
  • Conflicting changes across branches

Impact: Team friction, lost productivity
Likelihood: Medium—AI doesn't know team context

The Safety Hierarchy: Defense in Depth

No single safety measure is sufficient. You need layers—each protecting against different failure modes.

Layer 1: Technical Constraints

What: Hard limits on what AI can do

Examples:

# Read-only filesystem access (sandbox)
# Network restrictions (no external API calls)
# Resource limits (CPU, memory, disk)
# Whitelisted commands only

Protects against: Accidental damage, runaway processes
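
Concretely, several of these limits can be approximated by running the session inside a container: a read-only mount, no network, and capped resources. A minimal sketch with Docker, where the image, paths, and limits are placeholders to adjust for your stack:

# Project mounted read-only, no network access, capped CPU and memory.
# node:18 and /workspace are placeholders; substitute your own image and paths.
docker run -it --network none --memory 2g --cpus 2 \
  -v "$(pwd)":/workspace:ro -w /workspace \
  node:18 bash

Drop the :ro suffix (or mount a scratch copy) when you want the agent to write; the network and resource limits can stay.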

Layer 2: Permission Controls

What: Require approval for certain actions

Examples:

# Approve before: deleting files
# Approve before: running git push
# Approve before: installing packages
# Approve before: modifying config files

Protects against: Unintended destructive operations

Layer 3: Environment Isolation

What: Separate AI work from production

Examples:

# AI works in staging/sandbox environment
# Production requires manual deployment
# Separate database instances
# Separate API keys/tokens

Protects against: Production incidents

Layer 4: Process Controls

What: Workflow that incorporates safety

Examples:

# Always review diffs before applying
# Run tests before committing
# Peer review for AI-generated changes
# Rollback plans prepared in advance

Protects against: Bad code reaching production
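
Some process controls can be enforced mechanically instead of from memory. For example, a git pre-commit hook can refuse any commit, AI-generated or not, when the test suite fails; npm test below stands in for whatever your project's test command is.

#!/bin/sh
# Save as .git/hooks/pre-commit and make it executable: chmod +x .git/hooks/pre-commit
# Blocks the commit if the test suite fails.
npm test || {
  echo "Tests failed - commit blocked."
  exit 1
}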

Layer 5: Human Verification

What: Human review before impact

Examples:

# Review AI suggestions before accepting
# Manual approval for deployments
# Security review for sensitive changes
# Testing in isolated environment first

Protects against: All categories—final safety net

Permission Models: Choosing Your Safety Level

Different AI tools offer different permission models. Understanding them helps you choose appropriate settings.

Model 1: Permissive (Auto-Approve Safe Operations)

How it works: AI executes read operations and safe writes automatically; prompts for destructive actions

Best for: Experienced users, trusted AI, familiar codebase

Example configuration:

Auto-approve:
- Read operations (cat, grep, find)
- Test execution (npm test, pytest)
- Git status/diff
- File creation (new files only)

Require approval:
- File deletion (rm)
- Git reset/rebase
- Package installation
- Config file changes

Model 2: Confirming (Approve All Writes)

How it works: AI prompts before any write operation

Best for: New AI collaboration, unfamiliar codebase, learning phase

Example configuration:

Auto-approve:
- Read operations only

Require approval:
- All write operations
- All file modifications
- All command execution

Model 3: Restricted (Sandbox Mode)

How it works: AI can only read; cannot modify anything

Best for: Exploration, code review, understanding unfamiliar codebases

Example configuration:

Auto-approve:
- Read operations only

Blocked:
- All write operations
- All command execution
- All file modifications

Choosing Your Model

Situation                        | Recommended Model    | Rationale
First time with AI               | Confirming           | Build trust before autonomy
Routine work on familiar project | Permissive           | Efficiency for safe operations
Exploring unfamiliar code        | Restricted           | Understand before modifying
Production systems               | Confirming + Staging | Extra caution for critical systems
Prototype/experimental work      | Permissive           | Speed over caution, rollback available

The Destructive Operations List

Know which commands require extra scrutiny. These should always trigger confirmation:

File Operations

rm, rm -rf          # Delete files/directories
mv                  # Move (can overwrite)
cp                  # Copy (can overwrite)
> file              # Redirect and overwrite
dd                  # Low-level disk write

Version Control

git reset --hard    # Discard all changes
git rebase          # Rewrite history
git push --force    # Overwrite remote
git clean -fd       # Delete untracked files
git checkout -- .   # Discard working directory changes

Package Management

npm install         # Can change dependencies
pip install         # Can change dependencies
apt install         # System-level changes
brew install        # System-level changes

System Operations

sudo                # Elevated privileges
systemctl           # Service management
kill -9             # Force kill processes
reboot, shutdown    # System operations

Data Operations

DROP DATABASE       # Database destruction
DELETE FROM         # Data deletion (without WHERE)
TRUNCATE TABLE      # Remove all data
UPDATE (no WHERE)   # Modify all rows
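
The same pattern-matching idea behind these approval prompts can be scripted for your own terminal sessions: a small wrapper that asks before anything matching a dangerous pattern. A minimal sketch; the pattern list is illustrative, not exhaustive, and it only protects commands that are actually routed through the wrapper.

# guard: prompt before running commands that match known destructive patterns.
# Usage: guard rm -rf build/
guard() {
  case "$*" in
    *"rm -rf"*|*"git reset --hard"*|*"git push --force"*|*"DROP DATABASE"*)
      printf 'Destructive command: %s\nRun it? [y/N] ' "$*"
      read -r answer
      [ "$answer" = "y" ] || { echo "Aborted."; return 1; }
      ;;
  esac
  "$@"
}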

Sandboxing: Isolating AI Work

The most effective safety measure: don't let AI touch production directly.

Sandbox Strategies

1. Docker Container Sandbox

# Run AI work in container
docker run -it -v $(pwd):/workspace -w /workspace node:18 bash

# AI works inside container
# Can't affect host system
# Can't access production resources

2. Staging Environment

AI works on: staging.example.com
Manual deploy: production.example.com

AI can make all the changes it wants to staging
You review before promoting to production

3. Feature Branch Workflow

# AI works on feature branch
git checkout -b feature/ai-work

# Changes isolated from main
# Merge only after review
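
When the review is done, merging back stays deliberately manual. A sequence like the following keeps the final review step explicit; the branch name matches the example above.

# Review everything the AI changed, then merge with an explicit merge commit
git checkout main
git diff main..feature/ai-work
git merge --no-ff feature/ai-work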

4. Separate Credentials

# .env.ai - AI's environment
AI_DATABASE_URL=postgresql://localhost:5432/sandbox_db
AI_API_KEY=sandbox_key_limited_permissions

# .env - Production (never shown to AI)
PRODUCTION_DATABASE_URL=postgresql://prod-server:5432/real_db
PRODUCTION_API_KEY=production_key_full_permissions
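
The separation only helps if the session is actually started with the sandbox file. One way to make that harder to get wrong, sketched with a placeholder agent command (your-ai-agent is not a real CLI; substitute the tool you use):

# Export only the sandbox credentials into this shell, then start the agent.
set -a
. ./.env.ai
set +a
your-ai-agent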

Trust Gradualism: Easing into Autonomy

Don't go from zero autonomy to full autonomy overnight. Build trust gradually.

Phase 1: Observation Only (Week 1)

  • AI reads files and explains them
  • AI suggests changes but doesn't apply them
  • You manually apply AI suggestions
  • Goal: Understand AI's capabilities and patterns

Phase 2: Supervised Autonomy (Week 2-4)

  • AI makes changes in sandbox/feature branches
  • You review all diffs before applying
  • Destructive operations always require approval
  • Goal: Build confidence with safety net

Phase 3: Selective Autonomy (Month 2-3)

  • AI autonomously handles safe operations (tests, linting)
  • AI handles routine refactors within approved patterns
  • Destructive operations still require approval
  • Goal: Accelerate routine work while maintaining oversight

Phase 4: Calibrated Autonomy (Month 3+)

  • AI autonomously handles most operations
  • Pre-approve known-safe command patterns
  • Approval only for novel or high-risk operations
  • Goal: Maximum efficiency with maintained safety

Trust Signals to Track

Track these to decide when to increase autonomy:

  • Error rate: How often does AI make mistakes?
  • Correction ease: How easy is it to fix AI mistakes?
  • Pattern adherence: Does AI follow project conventions?
  • Risk awareness: Does AI avoid known dangerous operations?

Safety Checklist: Before Each Session

Before starting an AI session, verify:

Environment:

  • Working in correct directory (not production)
  • On correct branch (feature branch, not main)
  • Environment variables set correctly (sandbox credentials)
  • Uncommitted work is backed up or committed

Tool Configuration:

  • Permission mode appropriate for task
  • Destructive operations require approval
  • Read-only mode if just exploring
  • Logging enabled for audit trail

Mental Model:

  • Clear task scope (what AI should and shouldn't do)
  • Identified high-risk operations to watch for
  • Rollback plan if things go wrong
  • Stopping point defined
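
The environment checks above are easy to script. A minimal pre-session check might look like the sketch below; it assumes your mainline branch is called main and that production paths contain "prod", so adapt both to your setup.

#!/bin/sh
# Pre-session sanity check: wrong branch, dirty tree, or a suspicious
# working directory should stop you before the AI session starts.
branch=$(git rev-parse --abbrev-ref HEAD)
[ "$branch" = "main" ] && echo "WARNING: you are on main, not a feature branch."
[ -n "$(git status --porcelain)" ] && echo "WARNING: uncommitted changes - commit or stash first."
case "$(pwd)" in
  *prod*) echo "WARNING: current directory looks like production." ;;
esac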

Incident Response: What to Do When Something Goes Wrong

Despite all precautions, things will go wrong. Have a plan.

Immediate Actions

# 1. Stop the AI
# Ctrl+C or stop button

# 2. Assess damage
git status                # What changed?
git diff                  # What's the diff?

# 3. If bad, revert
git checkout -- .         # Revert working directory
git reset --hard          # Reset to last commit

# 4. If already committed
git revert HEAD           # Revert the commit
git reset --hard HEAD~1   # Or remove commit entirely
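
Note that uncommitted changes discarded with git checkout -- . are gone for good, which is why the pre-session checklist asks you to commit or back up first. Committed work is more forgiving: even after a hard reset, the commit usually remains reachable through the reflog for a while.

# Recover a commit that a hard reset appeared to discard
git reflog                # find the SHA of the lost commit
git reset --hard <sha>    # restore to it (replace <sha> with the real hash)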

Post-Incident Review

After an incident, ask:

  • What happened?
  • Why did safeguards fail?
  • What constraint would have prevented this?
  • How do I adjust permissions/configuration?

Example: AI Deleted Wrong Directory

Incident: AI ran rm -rf node_modules/ in the wrong directory, deleting source files.

Immediate: Ctrl+C immediately. Assess damage with git status.

Recovery: git checkout -- . to restore from git.

Prevention for next time:

  • Add safeguard: AI must pwd before destructive operations
  • Change permission mode: require approval for all rm commands
  • Add alias: alias rm='rm -i' (interactive mode prompts before each deletion)

Why This Principle Matters: Trust Through Safety

Paradoxically, constraints enable autonomy. When you have good safety measures:

  • You feel comfortable giving AI more autonomy
  • You can focus on high-level direction rather than worrying
  • AI can be more effective without risking disaster

Without safety measures, you're constantly on edge—afraid to let AI do anything meaningful. With safety measures, you can collaborate confidently.

The goal isn't to prevent AI from doing anything. The goal is to prevent AI from doing certain things—while enabling everything else.

This Principle in Both Interfaces

"Cowork asks before deleting anything. This isn't just UX—it's architectural."

Constraints manifest differently in each interface, but the underlying safety model is the same.

Constraint Type | Claude Code                              | Claude Cowork
Boundary        | Permission flags, CLAUDE.md restrictions | Folder selection, connector permissions
Action          | Configured via settings/hooks            | Built-in confirmation dialogs
Resource        | API cost monitoring, token limits        | Subscription limits apply
Output          | Specified in prompts/Skills              | Skills define output formats

In Cowork: The confirmation dialogs ARE the constraint system. When Cowork asks "Should I delete this file?" or "Should I modify this document?", it's implementing the same safety principle that Claude Code's permission model provides. The difference is that Cowork's constraints are built into the GUI—you don't configure them, you respond to them.

The paradox applies equally: In both interfaces, constraints enable capability. When you trust the safety model, you give the agent more autonomy. Without constraints, you'd never let either agent do meaningful work on important files.

Try With AI

Prompt 1: Risk Assessment Exercise

I want to practice assessing safety risks in AI workflows.

Here's a scenario: I'm planning to have AI help me [describe a task—refactor database schema, update authentication system, optimize performance, etc.]

Help me assess:
1. What could go wrong? Brainstorm specific risks
2. What's the impact if something does go wrong?
3. What safety measures should I put in place?

For each safety measure, categorize it:
- Technical constraint (what to restrict)
- Permission control (what to approve)
- Environment isolation (where to work)
- Process control (how to work)
- Human verification (when to review)

Then, help me create a specific plan: "Before starting, I will X. During work, AI can Y but not Z. After work, I will verify W."

What you're learning: How to identify risks and design appropriate safety measures. You're developing the skill of anticipating problems before they occur and structuring AI work to be safe by design.

Prompt 2: Permission Model Design

I want to design an appropriate permission model for my situation.

Here's my context:
- [Your experience level with AI]
- [How familiar you are with your codebase]
- [What you're working on—prototype, production, personal project, team project]
- [Your risk tolerance—low, medium, high]

Help me design a permission model:
1. Should I use Permissive, Confirming, or Restricted mode? Why?
2. What operations should be auto-approved?
3. What operations should require approval?
4. What operations should be blocked entirely?

Also, help me understand:
- When should I move to a more permissive model?
- What signals should I track to build trust?
- What would cause me to tighten restrictions?

What you're learning: How to choose appropriate permission models based on context and experience. You're learning to calibrate autonomy based on trust and risk—balancing safety with effectiveness.

Prompt 3: Sandbox Setup

I want to set up a safe sandbox environment for AI work.

Help me design a sandboxing strategy for my project:
- [Project type—web app, data pipeline, scripts, etc.]
- [Current setup—local development, cloud, etc.]

I want to make sure:
1. AI can't affect production systems
2. AI can experiment freely without risk
3. I can easily promote AI work to production after review

Design a sandbox setup that includes:
- Directory structure (sandbox vs production)
- Git workflow (branches, merge process)
- Environment variables (sandbox credentials)
- Database approach (separate instances)
- Deployment process (manual promotion after review)

After we design it, help me actually set it up step by step.

What you're learning: How to create isolated environments where AI can work safely. You're learning to structure your workflow so that AI experimentation never puts production at risk—enabling confident collaboration.

Safety Note

When in doubt, start with more restrictions and ease into autonomy. It's always easier to loosen constraints later than to recover from a preventable incident. The best safety measure is a cautious approach—especially when you're just starting with AI collaboration.