Updated Feb 26, 2026

Capstone Project - Type-Safe Configuration Manager

The Configuration Problem in Production

Every real application needs configuration. When you deploy to production, you need different database credentials than your local development environment. Your API timeout settings change. Your logging level shifts from DEBUG to INFO. These values cannot be hardcoded in your source code—they belong in configuration files, environment variables, or command-line arguments.

But here's where it gets tricky: configuration is fragile. A typo in an environment variable name silently falls back to a default value instead of failing. A missing required setting crashes the app hours into production rather than at startup. Different environments apply different precedence rules, so developers can't tell where a value came from. Without validation at startup, you don't discover a missing or mistyped field until the code path that reads it runs.

This is the capstone project: you'll build a production-quality ConfigManager that:

  • Loads configuration from multiple sources (YAML files, environment variables, CLI arguments)
  • Enforces type safety with Pydantic and Generics
  • Implements clear precedence rules (CLI overrides environment, environment overrides files)
  • Validates configuration on startup, failing fast if anything is wrong
  • Provides helpful error messages so developers know exactly what's misconfigured
  • Includes comprehensive tests proving it works in all scenarios

By the end, you'll have a portfolio-worthy project demonstrating mastery of Pydantic, Generics, and production engineering practices—something you can show in technical interviews or include on GitHub.


Section 1: Requirements and Architecture

Before writing a single line of code, let's clarify what a production config system needs.

Functional Requirements (What it does)

Your ConfigManager must:

  1. Load from YAML files — Read config.yaml, dev.yaml, or prod.yaml and parse structured data
  2. Load from environment variables — Allow overrides via APP_DATABASE_HOST, APP_LOG_LEVEL, etc.
  3. Load from CLI arguments — Accept --debug or --log-level=DEBUG to override everything else
  4. Merge with precedence — CLI args win over env vars, which win over file values, which win over defaults
  5. Validate everything — Ensure types, required fields, and constraints are satisfied

Non-Functional Requirements (How it must work)

  1. Type-safe access — Use Generics so config.get[DatabaseConfig]("database") returns a typed object with full IDE autocomplete
  2. Fail fast — If config is invalid, crash at startup with a clear error, not 3 hours into production
  3. Testable — Unit tests can verify each loading strategy independently
  4. Documented — A user reading the code understands why each piece exists
  5. Secure — Never log passwords; handle secrets safely

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                        ConfigManager                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. ConfigLoader                                            │
│     ├─ load_yaml() → dict                                   │
│     ├─ load_env() → dict                                    │
│     └─ load_cli() → dict                                    │
│                                                             │
│  2. Merge with Precedence                                   │
│     ├─ defaults (BaseModel field defaults)                  │
│     ├─ + YAML file values                                   │
│     ├─ + Environment variable values                        │
│     └─ + CLI argument values (highest priority)             │
│                                                             │
│  3. Pydantic Validation                                     │
│     └─ Validate merged dict against AppConfig model         │
│                                                             │
│  4. Generic[T] Wrapper                                      │
│     ├─ Type-safe access: config.get[DatabaseConfig]("db")   │
│     └─ IDE autocomplete on DatabaseConfig fields            │
│                                                             │
│  5. Return Validated AppConfig                              │
│     └─ App uses with confidence: no more type errors        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

💬 AI Colearning Prompt

"Compare Pydantic BaseSettings vs manually reading environment variables with os.getenv(). What are the tradeoffs of each approach?"


Design Decisions: Why This Architecture?

Why Pydantic? Type hints alone don't enforce constraints. Pydantic validates at runtime, ensuring your configuration is actually correct before your app tries to use it.

Why BaseSettings? It automates the common pattern of "load from env vars with a prefix." Without it, you'd manually check os.environ.get("APP_DATABASE_HOST") for every single field.

Why Generic[T] wrapper? When you write config.get("database"), Python doesn't know what type you're getting back—is it a dict? A DatabaseConfig object? The Generic wrapper lets you specify the return type: config.get[DatabaseConfig]("database"), and your IDE gives you perfect autocomplete on all DatabaseConfig fields.

🎓 Expert Insight

In AI-native development, configuration is your specification for deployment. When you use Pydantic for config, you're creating executable documentation: the schema IS the validation IS the type hints. This specification-as-code pattern scales from local development to production without translation layers.


Section 2: Defining Config Models

Let's build the nested Pydantic models that describe your application's configuration.

Creating the DatabaseConfig Model

from pydantic import BaseModel, Field

class DatabaseConfig(BaseModel):
    """Database connection configuration."""

    host: str = "localhost"
    port: int = 5432
    name: str  # Required field: no default
    user: str  # Required field
    password: str = Field(
        default="",
        repr=False,  # Security: don't show password in repr()
    )

DatabaseConfig is a plain BaseModel, so it never reads environment variables on its own; settings such as env_prefix only take effect on a BaseSettings class. The environment mapping for these fields (for example APP_DATABASE__HOST to host) is configured once on the top-level AppConfig, which removes manual string matching and reduces typos.

Creating the APIConfig Model

class APIConfig(BaseModel):
    """External API configuration."""

    base_url: str
    timeout: int = 30  # Seconds, with a sensible default
    retry_count: int = 3

Creating the Top-Level AppConfig Model

from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    """Complete application configuration."""

    debug: bool = False
    log_level: str = "INFO"
    database: DatabaseConfig
    api: APIConfig

    # Pydantic v2 style; replaces the deprecated inner `class Config`
    model_config = SettingsConfigDict(
        env_file=".env",  # Load from .env file if it exists
        env_prefix="APP_",
        env_nested_delimiter="__",  # APP_DATABASE__HOST maps to database.host
    )

The env_nested_delimiter is key: it lets you set nested values from environment variables. APP_DATABASE__HOST=prod-db.example.com sets database.host without touching the other database fields.

🤝 Practice Exercise

Ask your AI: "Scaffold the three config models (DatabaseConfig, APIConfig, AppConfig) with realistic defaults and validation constraints. Add validation to ensure port is 1-65535, timeout is > 0, and log_level is one of 'DEBUG', 'INFO', 'WARNING', 'ERROR'."

Expected Outcome: You'll see how to structure nested configuration models with comprehensive validation, understanding how Field() constraints and validators work together to enforce business rules at the config layer.
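
For comparison with what your AI produces, here is a minimal sketch of the three constraints named in the exercise. It assumes Pydantic is installed. The Constrained* class names and the is_valid helper are hypothetical, chosen so they don't clash with the lesson's own models.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ConstrainedDatabaseConfig(BaseModel):
    """Hypothetical variant of DatabaseConfig with a bounded port."""

    host: str = "localhost"
    port: int = Field(default=5432, ge=1, le=65535)  # valid TCP port range
    name: str
    user: str


class ConstrainedAPIConfig(BaseModel):
    """Hypothetical variant of APIConfig with a positive timeout."""

    base_url: str
    timeout: int = Field(default=30, gt=0)  # seconds, must be positive


class ConstrainedLogging(BaseModel):
    """Hypothetical holder for a log level limited to four names."""

    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"


def is_valid(model: type[BaseModel], **values: object) -> bool:
    """Return True if the values pass validation, False otherwise."""
    try:
        model(**values)
    except ValidationError:
        return False
    return True
```

Field bounds cover simple numeric ranges and Literal covers a fixed set of names; rules that span several fields would need a validator instead.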


Section 3: Multi-Source Loading

Now implement the ConfigLoader that reads from all three sources and merges them with proper precedence.

Loading from YAML Files

import yaml
from pathlib import Path
from typing import Any

def load_yaml_config(filepath: str) -> dict[str, Any]:
    """Load configuration from a YAML file."""
    config_path: Path = Path(filepath)

    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {filepath}")

    with open(config_path) as f:
        return yaml.safe_load(f) or {}

Loading from Environment Variables

import os
from typing import Any

def load_env_config() -> dict[str, Any]:
    """Load configuration from environment variables with APP_ prefix."""
    result: dict[str, Any] = {}

    for key, value in os.environ.items():
        if not key.startswith("APP_"):
            continue

        # APP_DEBUG=true → debug: "true"
        # APP_DATABASE__HOST=localhost → database.host: "localhost"
        config_key: str = key[4:].lower()  # Remove "APP_" prefix

        # Handle nested keys: DATABASE__HOST → database.host
        parts: list[str] = config_key.split("__")
        current: dict[str, Any] = result
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = _parse_value(value)

    return result

def _parse_value(value: str) -> Any:
    """Return the raw string and let Pydantic coerce it during validation.

    Guessing types here is risky: an all-digit password such as "12345"
    would become an int, and Pydantic v2 does not coerce int to str.
    Pydantic already converts "5432" to int and "true" to bool based on
    the declared field type.
    """
    return value
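
The double-underscore convention is easier to see on a small, fixed input. The sketch below is a hypothetical variant (nest_env is not part of the lesson's code) that takes the environment as a parameter, so it can be exercised without touching os.environ. Values are left as strings, on the assumption that validation converts them later.

```python
from typing import Any, Mapping


def nest_env(environ: Mapping[str, str], prefix: str = "APP_") -> dict[str, Any]:
    """Turn PREFIX_A__B=value entries into {"a": {"b": "value"}}."""
    result: dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # unrelated variable such as PATH
        parts = key[len(prefix):].lower().split("__")
        current = result
        for part in parts[:-1]:
            # Create the intermediate section on first use
            current = current.setdefault(part, {})
        current[parts[-1]] = value  # kept as a string
    return result


# Hypothetical environment used only for illustration
fake_env = {
    "APP_DEBUG": "true",
    "APP_DATABASE__HOST": "db.internal",
    "APP_DATABASE__PORT": "5432",
    "PATH": "/usr/bin",
}
nested = nest_env(fake_env)
```

One caveat of this sketch: it does not guard against a name being used both as a plain value and as a section (for example APP_DATABASE and APP_DATABASE__HOST together).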

Loading from CLI Arguments

import argparse
from typing import Any

def load_cli_config() -> dict[str, Any]:
    """Load configuration from command-line arguments."""
    parser = argparse.ArgumentParser()

    # Top-level arguments
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--log-level", default=None, help="Logging level: DEBUG, INFO, WARNING, ERROR")

    # Database arguments
    parser.add_argument("--database-host", help="Database host")
    parser.add_argument("--database-port", type=int, help="Database port")
    parser.add_argument("--database-name", help="Database name")
    parser.add_argument("--database-user", help="Database user")
    parser.add_argument("--database-password", help="Database password")

    # API arguments
    parser.add_argument("--api-base-url", help="API base URL")
    parser.add_argument("--api-timeout", type=int, help="API timeout (seconds)")
    parser.add_argument("--api-retry-count", type=int, help="Number of retries")

    args = parser.parse_args()

    # Convert flat CLI args to nested dict matching AppConfig structure
    result: dict[str, Any] = {}

    if args.debug:
        result["debug"] = True
    if args.log_level is not None:
        result["log_level"] = args.log_level

    # Build the nested sections. Compare against None rather than truthiness
    # so that explicit values such as 0 or "" are not dropped.
    sections: dict[str, tuple[str, ...]] = {
        "database": ("host", "port", "name", "user", "password"),
        "api": ("base_url", "timeout", "retry_count"),
    }
    for section, fields in sections.items():
        values: dict[str, Any] = {}
        for field in fields:
            value = getattr(args, f"{section}_{field}")  # e.g. args.database_host
            if value is not None:
                values[field] = value
        if values:
            result[section] = values

    return result
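
Because parse_args() reads sys.argv by default, the loader above is awkward to call from a test runner that has its own flags. A possible refinement, sketched here with a hypothetical parse_cli function and only two of the lesson's flags, is to accept the argument list as a parameter.

```python
import argparse
from typing import Any, Optional, Sequence


def parse_cli(argv: Optional[Sequence[str]] = None) -> dict[str, Any]:
    """Parse a reduced set of flags into a nested dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log-level", default=None)
    parser.add_argument("--database-port", type=int, default=None)

    # argv=None makes argparse fall back to sys.argv[1:]
    args = parser.parse_args(argv)

    result: dict[str, Any] = {}
    if args.log_level is not None:
        result["log_level"] = args.log_level
    if args.database_port is not None:
        result.setdefault("database", {})["port"] = args.database_port
    return result


overrides = parse_cli(["--log-level=ERROR", "--database-port", "6543"])
```

Production code can still call parse_cli() with no arguments, while tests pass an explicit list and never need to patch sys.argv.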

Merging with Precedence

from functools import reduce
from typing import Any

def merge_configs(*configs: dict[str, Any]) -> dict[str, Any]:
    """
    Merge configuration dictionaries with precedence.
    Later arguments override earlier ones.
    """
    def merge_dict(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
        result: dict[str, Any] = base.copy()
        for key, value in override.items():
            if isinstance(value, dict) and key in result and isinstance(result[key], dict):
                result[key] = merge_dict(result[key], value)
            else:
                result[key] = value
        return result

    return reduce(merge_dict, configs, {})
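
To make the precedence rule concrete, here is a small worked example. It is a self-contained sketch: deep_merge and merge_all are hypothetical stand-ins that follow the same recursive idea as the function above, and the four layers are invented sample data.

```python
from functools import reduce
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where override wins, merging nested dicts key by key."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_all(*layers: dict[str, Any]) -> dict[str, Any]:
    """Fold the layers left to right so that later layers take precedence."""
    return reduce(deep_merge, layers, {})


# Sample layers, lowest precedence first
defaults = {"log_level": "INFO", "database": {"host": "localhost", "port": 5432}}
yaml_layer = {"database": {"host": "yaml-db", "name": "app"}}
env_layer = {"database": {"host": "env-db"}}
cli_layer = {"log_level": "ERROR"}

merged = merge_all(defaults, yaml_layer, env_layer, cli_layer)
```

Here merged keeps port from the defaults and name from the YAML layer, takes host from the environment layer, and takes log_level from the CLI layer. A shallow dict.update() would instead have replaced the whole database section at each step.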

The Complete load_config() Function

from pydantic import ValidationError

def load_config(yaml_file: str = "config.yaml") -> AppConfig:
    """
    Load configuration from all sources with precedence:
    1. Defaults (from AppConfig field defaults)
    2. YAML file (config.yaml or prod.yaml)
    3. Environment variables (APP_*)
    4. CLI arguments (--flag)

    Returns validated AppConfig instance.
    """
    # Load from each source; a missing YAML file is treated as empty
    yaml_config: dict[str, Any]
    try:
        yaml_config = load_yaml_config(yaml_file)
    except FileNotFoundError:
        yaml_config = {}

    env_config: dict[str, Any] = load_env_config()
    cli_config: dict[str, Any] = load_cli_config()

    # Merge with precedence: later overrides earlier
    merged: dict[str, Any] = merge_configs(yaml_config, env_config, cli_config)

    # Validate with Pydantic
    try:
        return AppConfig(**merged)
    except ValidationError as e:
        # Provide helpful error message
        print("Configuration validation failed:")
        for error in e.errors():
            print(f"  - {error['loc']}: {error['msg']}")
        raise

Section 4: Generic Type-Safe Access

Now we add the ConfigValue[T] wrapper that provides type-safe configuration access with IDE autocomplete.

Why Type-Safe Access Matters

Without Generics, when you retrieve a config subsection, Python doesn't know its type:

# ❌ Without Generics: type is lost
config = load_config()
db = config.get("database")  # Returns Any: the IDE can't tell a dict from a DatabaseConfig

With Generics, you make the type explicit:

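# ✅ With Generics: type is crystal clear
# (a plain method can't be subscripted, so config.get[DatabaseConfig](...) would raise TypeError)
config = load_config()
db: DatabaseConfig = ConfigValue[DatabaseConfig](config.get("database")).get()  # IDE knows exactly what this is
print(db.host)  # IDE autocomplete works perfectly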

Implementing ConfigValue[T]

from typing import Generic, TypeVar

T = TypeVar('T')  # Generic type parameter

class ConfigValue(Generic[T]):
    """Type-safe wrapper for configuration values."""

    def __init__(self, value: T) -> None:
        """Initialize with a typed value."""
        self._value = value

    def get(self) -> T:
        """Retrieve the typed value."""
        return self._value

    def __repr__(self) -> str:
        """String representation (useful for debugging)."""
        return f"ConfigValue({self._value!r})"

Adding get() Method to AppConfig

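from typing import Any

class AppConfig(BaseSettings):
    """Complete application configuration."""

    # Settings configuration from Section 2 omitted for brevity
    debug: bool = False
    log_level: str = "INFO"
    database: DatabaseConfig
    api: APIConfig

    def get(self, key: str) -> Any:
        """Retrieve a configuration value by key."""
        # Check declared fields only; hasattr() would also match methods
        # such as model_dump
        if key not in type(self).model_fields:
            raise KeyError(f"Configuration has no key: {key}")
        return getattr(self, key)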

Using Type-Safe Access

# Load configuration
config: AppConfig = load_config()

# Type-safe access with full IDE autocomplete
db_config: DatabaseConfig = config.database
print(f"Connecting to {db_config.host}:{db_config.port}")

api_config: APIConfig = config.api
print(f"API timeout: {api_config.timeout} seconds")

# Using ConfigValue wrapper (if you prefer explicit typing)
db: ConfigValue[DatabaseConfig] = ConfigValue[DatabaseConfig](config.database)
actual_db: DatabaseConfig = db.get()
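
The subscript notation config.get[DatabaseConfig](...) used in this lesson is shorthand for the idea of a typed lookup; a plain method cannot be subscripted at runtime. One possible way to get a typed return value is to pass the expected class as an argument and check it. This is a sketch rather than the lesson's ConfigManager: get_typed, FakeDatabase, and FakeApp are hypothetical names, and dataclasses stand in for the Pydantic models so the example runs on the standard library alone.

```python
from dataclasses import dataclass
from typing import Any, TypeVar

T = TypeVar("T")


@dataclass
class FakeDatabase:
    """Stand-in for DatabaseConfig."""

    host: str = "localhost"
    port: int = 5432


@dataclass
class FakeApp:
    """Stand-in for AppConfig."""

    database: FakeDatabase
    debug: bool = False


def get_typed(config: Any, key: str, expected: type[T]) -> T:
    """Fetch config.<key> and verify at runtime that it is an `expected`."""
    try:
        value = getattr(config, key)
    except AttributeError:
        raise KeyError(f"Configuration has no key: {key}") from None
    if not isinstance(value, expected):
        raise TypeError(
            f"{key!r} is {type(value).__name__}, expected {expected.__name__}"
        )
    return value


app = FakeApp(database=FakeDatabase())
db = get_typed(app, "database", FakeDatabase)  # type checkers infer FakeDatabase
```

Because the return type follows the class that is passed in, a type checker can infer it without an annotation at the call site, and the isinstance check catches a mismatch at runtime.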

Section 5: Error Handling and Validation

Production systems must fail gracefully. Configuration errors should crash at startup with clear messages, not 3 hours into production.

Validating Required Fields

from pathlib import Path
from typing import Any

from pydantic import ValidationError

def load_config_safe(yaml_file: str = "config.yaml") -> AppConfig:
    """Load configuration with detailed error reporting."""

    # Load from all sources
    yaml_config: dict[str, Any] = load_yaml_config(yaml_file) if Path(yaml_file).exists() else {}
    env_config: dict[str, Any] = load_env_config()
    cli_config: dict[str, Any] = load_cli_config()

    merged: dict[str, Any] = merge_configs(yaml_config, env_config, cli_config)

    # Validate and provide helpful errors
    try:
        return AppConfig(**merged)

    except ValidationError as e:
        print("=" * 50)
        print("CONFIGURATION ERROR - Cannot start application")
        print("=" * 50)

        for error in e.errors():
            field_path: str = ".".join(str(x) for x in error["loc"])
            error_type: str = error["type"]
            message: str = error["msg"]

            print(f"Field: {field_path}")
            print(f"  Error: {message}")
            print(f"  Type: {error_type}")

        print("Configuration sources (in order of precedence):")
        print("  1. Defaults (from config.py)")
        print(f"  2. YAML file: {yaml_file}")
        print("  3. Environment variables (APP_*)")
        print("  4. CLI arguments (--flag)")

        raise
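
Failing fast also means the process should exit with a nonzero status, so that a supervisor or deployment script notices the failure. The following is a minimal sketch of an entry-point wrapper; run_or_exit and its two callables are hypothetical, and the broad except clause is an assumption made to keep the example free of third-party imports.

```python
import sys
from typing import Callable, TypeVar

C = TypeVar("C")


def run_or_exit(load: Callable[[], C], start: Callable[[C], None]) -> int:
    """Load the config, then start the app; return a process exit code."""
    try:
        config = load()
    except Exception as exc:  # assumption: narrow to ValidationError in real code
        print(f"Cannot start: {exc}", file=sys.stderr)
        return 1  # nonzero status signals the failed startup
    start(config)
    return 0
```

A script could then end with sys.exit(run_or_exit(load_config_safe, run_app)), where run_app is whatever function starts your application.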

Logging Configuration Sources

import logging

logger = logging.getLogger(__name__)

def load_config_with_logging(yaml_file: str = "config.yaml") -> AppConfig:
    """Load configuration and log what sources were used."""

    yaml_config: dict[str, Any] = load_yaml_config(yaml_file) if Path(yaml_file).exists() else {}
    env_config: dict[str, Any] = load_env_config()
    cli_config: dict[str, Any] = load_cli_config()

    if yaml_config:
        logger.info(f"Loaded YAML config from {yaml_file}")
    if env_config:
        logger.debug(f"Loaded environment variables: {list(env_config.keys())}")
    if cli_config:
        logger.debug(f"Loaded CLI arguments: {list(cli_config.keys())}")

    merged: dict[str, Any] = merge_configs(yaml_config, env_config, cli_config)
    config: AppConfig = AppConfig(**merged)

    # Log final configuration (without secrets)
    logger.info(f"Debug mode: {config.debug}")
    logger.info(f"Log level: {config.log_level}")
    logger.info(f"Database: {config.database.host}:{config.database.port}/{config.database.name}")
    logger.info(f"API timeout: {config.api.timeout}s")

    return config
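
Logging fields one by one keeps secrets out of the log as long as nobody adds the password line by accident. An alternative is to mask secret-looking keys before logging anything. This is a sketch under stated assumptions: redact is a hypothetical helper, and the SECRET_KEYS list is a guess you would adapt to your own field names.

```python
from typing import Any

# Assumed names of fields that must never appear in logs
SECRET_KEYS = frozenset({"password", "secret", "token", "api_key"})


def redact(config: dict[str, Any], mask: str = "***") -> dict[str, Any]:
    """Return a copy of config with secret-looking values replaced by mask."""
    clean: dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            clean[key] = redact(value, mask)  # recurse into nested sections
        elif key.lower() in SECRET_KEYS:
            clean[key] = mask
        else:
            clean[key] = value
    return clean


sample = {"debug": False, "database": {"host": "localhost", "password": "hunter2"}}
safe_to_log = redact(sample)
```

With a Pydantic v2 model you could pass config.model_dump() to such a helper and log the result in one line.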

Section 6: Testing

A production system needs comprehensive tests. You can't deploy configuration code to production without proving it handles all scenarios.

Test Setup with Temporary Files

import pytest
import tempfile
import os
from pathlib import Path

@pytest.fixture
def temp_yaml_config():
    """Create a temporary YAML config file for testing."""
    yaml_content = """
debug: false
log_level: INFO
database:
  host: localhost
  port: 5432
  name: testdb
  user: testuser
  password: testpass
api:
  base_url: https://api.example.com
  timeout: 30
  retry_count: 3
"""

    with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
        f.write(yaml_content)
        f.flush()
        yield f.name

    # Cleanup
    os.unlink(f.name)

Testing YAML Loading

def test_load_yaml_config(temp_yaml_config, monkeypatch):
    """Test that YAML files are loaded correctly."""
    # Keep pytest's own command-line flags away from argparse
    monkeypatch.setattr("sys.argv", ["app.py"])

    config: AppConfig = load_config(yaml_file=temp_yaml_config)

    assert config.debug is False
    assert config.log_level == "INFO"
    assert config.database.host == "localhost"
    assert config.database.port == 5432
    assert config.database.name == "testdb"
    assert config.api.timeout == 30

Testing Environment Variable Overrides

def test_env_override(temp_yaml_config, monkeypatch):
    """Test that environment variables override YAML values."""
    # Keep pytest's own command-line flags away from argparse
    monkeypatch.setattr("sys.argv", ["app.py"])

    # Set environment variable
    monkeypatch.setenv("APP_DEBUG", "true")
    monkeypatch.setenv("APP_DATABASE__HOST", "prod-db.example.com")

    config: AppConfig = load_config(yaml_file=temp_yaml_config)

    # Environment variables override YAML
    assert config.debug is True
    assert config.database.host == "prod-db.example.com"

    # Other values come from YAML
    assert config.database.port == 5432

Testing Precedence Rules

def test_cli_overrides_all(temp_yaml_config, monkeypatch):
    """Test that CLI arguments have highest precedence."""
    # Set environment variable
    monkeypatch.setenv("APP_LOG_LEVEL", "WARNING")

    # Set CLI argument (simulated)
    monkeypatch.setattr("sys.argv", [
        "app.py",
        "--log-level=ERROR",
        "--api-timeout=60"
    ])

    config: AppConfig = load_config(yaml_file=temp_yaml_config)

    # CLI wins over environment
    assert config.log_level == "ERROR"

    # CLI override of API timeout
    assert config.api.timeout == 60

Testing Validation Errors

def test_missing_required_field():
    """Test that missing required fields produce validation errors."""
    invalid_config = {
        "debug": False,
        "log_level": "INFO",
        "database": {
            "host": "localhost",
            "port": 5432,
            # Missing "name" and "user" fields!
        },
        "api": {
            "base_url": "https://api.example.com"
        }
    }

    with pytest.raises(ValidationError) as exc_info:
        AppConfig(**invalid_config)

    # Verify error messages are helpful
    errors = exc_info.value.errors()
    assert any("database" in str(e["loc"]) for e in errors)

Section 7: Project Deliverables

Your capstone project should include all of these components:

Project Structure

config-manager/
├── config_manager/
│ ├── __init__.py
│ ├── models.py # DatabaseConfig, APIConfig, AppConfig
│ ├── loader.py # load_yaml, load_env, load_cli, merge_configs
│ ├── manager.py # ConfigManager class with get[T]() method
│ └── exceptions.py # Custom exceptions
├── configs/
│ ├── dev.yaml # Development configuration
│ ├── prod.yaml # Production configuration
│ └── .env.example # Example environment variables
├── tests/
│ ├── conftest.py # Pytest fixtures
│ ├── test_yaml_loading.py
│ ├── test_env_loading.py
│ ├── test_precedence.py
│ ├── test_validation.py
│ └── test_integration.py
├── example_app.py # Demo application using ConfigManager
├── README.md # Project documentation
├── requirements.txt # Dependencies (pydantic, pydantic-settings, pyyaml, pytest, pytest-cov)
└── pytest.ini # Pytest configuration

Example Configuration Files

configs/dev.yaml:

debug: true
log_level: DEBUG
database:
  host: localhost
  port: 5432
  name: myapp_dev
  user: dev_user
  password: dev_password
api:
  base_url: http://localhost:8000
  timeout: 5
  retry_count: 1

configs/prod.yaml:

debug: false
log_level: INFO
database:
  host: prod-db.example.com
  port: 5432
  name: myapp_prod
  user: prod_user
  password: ${DB_PASSWORD}  # Placeholder: yaml.safe_load does not expand it; resolve it after loading or set APP_DATABASE__PASSWORD
api:
  base_url: https://api.example.com
  timeout: 30
  retry_count: 3
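
A placeholder such as ${DB_PASSWORD} arrives from yaml.safe_load as a literal string, so something has to substitute it before validation. Below is a sketch of one way to do that after loading; expand_placeholders is a hypothetical helper, and it takes the environment as a parameter so it can be tried without real secrets.

```python
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(node: Any, environ: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} in string values using environ."""
    if isinstance(node, dict):
        return {k: expand_placeholders(v, environ) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_placeholders(v, environ) for v in node]
    if isinstance(node, str):
        def substitute(match: re.Match) -> str:
            name = match.group(1)
            if name not in environ:
                # Fail fast instead of passing the raw placeholder along
                raise KeyError(f"Environment variable {name} is not set")
            return environ[name]

        return _PLACEHOLDER.sub(substitute, node)
    return node  # numbers, booleans, None are returned unchanged


# What the YAML loader would hand over for a file like prod.yaml
loaded = {
    "database": {"host": "prod-db.example.com", "password": "${DB_PASSWORD}"},
    "api": {"timeout": 30},
}
resolved = expand_placeholders(loaded, {"DB_PASSWORD": "s3cret"})
```

The standard library's os.path.expandvars performs a similar substitution but leaves unknown names untouched; this sketch raises instead, in line with the fail-fast goal. In real use you would pass os.environ as the second argument.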

Example Application

# example_app.py
import logging
from config_manager.manager import ConfigManager
from config_manager.models import APIConfig, AppConfig, DatabaseConfig

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def main():
    """Example application using ConfigManager."""

    # Load configuration
    config: AppConfig = ConfigManager.load(yaml_file="configs/dev.yaml")

    # Set up logging based on config
    logging.getLogger().setLevel(config.log_level)

    logger.info("Starting application")
    logger.info(f"Debug mode: {config.debug}")

    # Access database configuration with type safety
    db_config: DatabaseConfig = config.database
    logger.info(f"Connecting to {db_config.host}:{db_config.port}/{db_config.name}")

    # Access API configuration with type safety
    api_config: APIConfig = config.api
    logger.info(f"API base URL: {api_config.base_url} (timeout: {api_config.timeout}s)")

    logger.info("Application running successfully")

if __name__ == "__main__":
    main()

Test Coverage

Run tests with:

pytest tests/ -v --cov=config_manager

Aim for 90%+ test coverage of your ConfigManager code.


Common Mistakes to Avoid

Mistake 1: Not Validating at Startup

# ❌ WRONG: Missing required field isn't caught until runtime
def get_database_password() -> str | None:
    return os.environ.get("DATABASE_PASSWORD")  # Returns None if missing!

password: str | None = get_database_password()
# Crashes hours later when you try to use the connection

# ✅ RIGHT: Validate at startup, fail immediately
class DatabaseConfig(BaseModel):
    password: str  # Required field: no default

config: AppConfig = load_config()  # Raises ValidationError if password missing

Mistake 2: Hardcoding Defaults in Code

# ❌ WRONG: Change requires redeployment
def connect_to_api(timeout=30):
    requests.get(..., timeout=timeout)

# ✅ RIGHT: Defaults in config files, easily overridable
class APIConfig(BaseModel):
    timeout: int = 30  # Default value, but overridable via env/CLI

Mistake 3: Not Documenting Precedence

# ❌ WRONG: Developer doesn't know why their value isn't being used
config = load_from_env()  # Oops, ignoring YAML file!

# ✅ RIGHT: Clear precedence documented in code and README
"""
Load configuration with precedence (later wins):
1. Defaults (AppConfig field defaults)
2. YAML file (config.yaml)
3. Environment variables (APP_*)
4. CLI arguments (--flag)
"""

Mistake 4: Overcomplicating the System

# ❌ WRONG: Too many config sources creates confusion
configs = [
    load_from_yaml(),
    load_from_env(),
    load_from_cli(),
    load_from_consul(),  # Remote configuration!
    load_from_vault(),   # Secrets!
    load_from_redis(),   # Cache!
]

Lesson: Start simple. Add remote configs and secrets management only when you actually need them (that's your extension activity for B2+ students).


Try With AI

Integrate Pydantic and generics into a complete type-safe configuration system through AI collaboration.

🔍 Explore System Architecture:

"Design config manager using BaseSettings for environment loading, nested Pydantic models for validation, generic ConfigLoader[T: BaseModel] for type safety. List required components and validation strategy."

🎯 Practice Config Validation:

"Build DatabaseConfig, APIConfig, FeatureFlags models with Pydantic. Create AppConfig composing them. Use BaseSettings with env_prefix, env_nested_delimiter. Validate complex constraints with @model_validator."

🧪 Test Generic Loading:

"Create generic ConfigLoader[T] that loads from .env, JSON, or YAML, validates with Pydantic model T, handles errors gracefully. Show type safety preserving T throughout."

🚀 Apply Production System:

"Build complete config management system: environment variable loading, file fallbacks, validation with clear errors, type-safe access, hot reload support. Reflect on Pydantic+generics enabling this architecture."