
Capstone - Merge Task API Adapters

You've built two specialized adapters across Chapters 65-66: the TaskMaster persona adapter that gives your model a distinctive, encouraging voice, and the agentic adapter that enables reliable tool-calling. Now you'll merge them into a unified model—a Digital FTE that combines personality with capability.

This capstone is Layer 4: Spec-Driven Integration. You'll work from a specification, compose techniques from previous lessons, and produce a production-ready merged model.

The Specification

Task API Unified Model Specification

Intent: Merge persona and agentic adapters into a single model that maintains TaskMaster's voice while reliably executing tool calls—the complete Task API Digital FTE.

Success Criteria:

| Metric | Target | Measurement |
|--------|--------|-------------|
| Persona preservation | >85% | Voice consistency, tone, encouragement patterns |
| Agentic preservation | >90% | Tool selection, argument extraction, JSON validity |
| Combined capability | No regression | Both capabilities function when invoked together |
| Latency | <500 ms | p95 inference time |
| Memory | <8 GB | Inference RAM requirement |

Source Adapters:

  • ./adapters/task_api_persona (Chapter 65)
  • ./adapters/task_api_agentic (Chapter 66)
  • Base model: unsloth/Llama-3.2-3B-Instruct

Merge Strategy: TIES (primary), with DARE-TIES backup if conflicts detected

Non-Goals:

  • General conversational capability beyond Task API context
  • Support for models other than the base model used in training
  • Real-time streaming (batch inference is sufficient)

Phase 1: Adapter Analysis (15 minutes)

Load and Inspect Adapters

Before merging, understand what you're combining:

from safetensors import safe_open
from pathlib import Path
import numpy as np
import re

def analyze_adapter(adapter_path: str) -> dict:
    """Analyze adapter structure and weight distribution."""

    adapter_files = list(Path(adapter_path).glob("*.safetensors"))
    if not adapter_files:
        raise ValueError(f"No safetensors found in {adapter_path}")

    analysis = {
        "path": adapter_path,
        "layers": {},
        "total_params": 0,
        "weight_stats": []
    }

    for shard in adapter_files:
        with safe_open(str(shard), framework="pt", device="cpu") as f:
            for key in f.keys():
                tensor = f.get_tensor(key)
                params = tensor.numel()
                analysis["total_params"] += params

                # Track layer info: group by transformer layer index (e.g. "layers.0"),
                # falling back to the raw key for non-layer weights
                match = re.search(r"layers\.(\d+)\.", key)
                layer_name = f"layers.{match.group(1)}" if match else key
                if layer_name not in analysis["layers"]:
                    analysis["layers"][layer_name] = {"params": 0, "keys": []}
                analysis["layers"][layer_name]["params"] += params
                analysis["layers"][layer_name]["keys"].append(key)

                # Weight statistics
                analysis["weight_stats"].append({
                    "key": key,
                    "shape": list(tensor.shape),
                    "mean": float(tensor.mean()),
                    "std": float(tensor.std()),
                    "max": float(tensor.max()),
                    "min": float(tensor.min())
                })

    return analysis

# Analyze both adapters
persona_analysis = analyze_adapter("./adapters/task_api_persona")
agentic_analysis = analyze_adapter("./adapters/task_api_agentic")

print(f"Persona Adapter:")
print(f" Total parameters: {persona_analysis['total_params']:,}")
print(f" Layers: {len(persona_analysis['layers'])}")

print(f"\nAgentic Adapter:")
print(f" Total parameters: {agentic_analysis['total_params']:,}")
print(f" Layers: {len(agentic_analysis['layers'])}")

Output:

Persona Adapter:
Total parameters: 4,194,304
Layers: 28

Agentic Adapter:
Total parameters: 4,194,304
Layers: 28

Check Compatibility

def check_merge_compatibility(adapter_a: dict, adapter_b: dict) -> dict:
    """Verify adapters are compatible for merging."""

    compatibility = {
        "compatible": True,
        "issues": [],
        "warnings": []
    }

    # Check parameter count
    if adapter_a["total_params"] != adapter_b["total_params"]:
        compatibility["issues"].append(
            f"Parameter count mismatch: {adapter_a['total_params']} vs {adapter_b['total_params']}"
        )
        compatibility["compatible"] = False

    # Check layer structure
    if set(adapter_a["layers"].keys()) != set(adapter_b["layers"].keys()):
        compatibility["issues"].append("Layer structure mismatch")
        compatibility["compatible"] = False

    # Check for overlapping weight distributions (potential conflict)
    for stat_a in adapter_a["weight_stats"][:5]:  # Sample check
        for stat_b in adapter_b["weight_stats"][:5]:
            if stat_a["key"] == stat_b["key"]:
                # Check sign agreement
                if np.sign(stat_a["mean"]) != np.sign(stat_b["mean"]):
                    compatibility["warnings"].append(
                        f"Potential sign conflict: {stat_a['key']}"
                    )

    return compatibility

compat = check_merge_compatibility(persona_analysis, agentic_analysis)
print(f"Compatible: {compat['compatible']}")
if compat["issues"]:
    print(f"Issues: {compat['issues']}")
if compat["warnings"]:
    print(f"Warnings: {compat['warnings']}")

Output:

Compatible: True
Warnings: ['Potential sign conflict: model.layers.0.self_attn.q_proj.lora_A']

Sign conflicts are expected—that's why we use TIES.
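
To see how TIES handles these conflicts, here is a minimal NumPy sketch of its trim-elect-merge steps: trim low-magnitude deltas, elect a per-parameter sign by total magnitude, then average only the deltas that agree with the elected sign. This illustrates the algorithm; it is not mergekit's internal implementation.

import numpy as np

def ties_merge_sketch(deltas: list[np.ndarray], density: float = 0.5) -> np.ndarray:
    """Illustrative TIES merge of per-adapter delta tensors (not mergekit internals)."""
    trimmed = []
    for d in deltas:
        # Trim: keep only the top-`density` fraction of values by magnitude
        k = int(d.size * density)
        threshold = np.sort(np.abs(d).ravel())[-k] if k > 0 else np.inf
        trimmed.append(np.where(np.abs(d) >= threshold, d, 0.0))

    stacked = np.stack(trimmed)                       # shape: (n_adapters, *tensor_shape)
    # Elect: pick the sign with the larger total magnitude per parameter
    elected_sign = np.sign(np.sum(stacked, axis=0))
    # Merge: average only the deltas whose sign agrees with the elected sign
    agrees = np.sign(stacked) == elected_sign
    counts = np.maximum(agrees.sum(axis=0), 1)
    return np.where(agrees, stacked, 0.0).sum(axis=0) / counts

# Two toy deltas with a deliberate sign conflict in the first element
merged = ties_merge_sketch([np.array([0.8, -0.1]), np.array([-0.2, 0.5])])
print(merged)  # the conflicting element resolves toward the higher-magnitude (positive) delta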

Phase 2: Baseline Merge (15 minutes)

Create TIES Configuration

# merge_config.yaml
merge_method: ties
slices:
  - sources:
      - model: ./adapters/task_api_persona
        layer_range: [0, 28]
      - model: ./adapters/task_api_agentic
        layer_range: [0, 28]
parameters:
  weight: 0.5   # Equal weighting to start
  density: 0.5  # Default density
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
tokenizer_source: base
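
A merge run takes a few minutes, so a quick pre-flight check that the config parses and the local adapter paths exist can save a retry. A minimal sketch using plain YAML parsing (this inspects the file directly and is not a mergekit API):

import yaml
from pathlib import Path

# Pre-flight check: parse the config and confirm local adapter paths exist
with open("merge_config.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["merge_method"] in {"ties", "dare_ties"}, "unexpected merge method"
for source in cfg["slices"][0]["sources"]:
    path = source["model"]
    if path.startswith("./") and not Path(path).exists():
        raise FileNotFoundError(f"adapter path not found: {path}")
print("merge_config.yaml parses and adapter paths exist")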

Execute Merge

# Run merge with memory optimization
mergekit-yaml merge_config.yaml ./merged_v1 --lazy --low-cpu-mem

# Verify output
ls -la ./merged_v1/

Output:

Loading base model: unsloth/Llama-3.2-3B-Instruct
Loading adapter: ./adapters/task_api_persona
Loading adapter: ./adapters/task_api_agentic
Applying TIES merging...
Trimming with density=0.5
Resolving 1,245 sign conflicts
Merging remaining parameters
Saving to ./merged_v1
Merge complete in 3m 42s

total 6.1G
-rw-r--r-- 1 user user 2.0G model-00001-of-00003.safetensors
-rw-r--r-- 1 user user 2.0G model-00002-of-00003.safetensors
-rw-r--r-- 1 user user 2.1G model-00003-of-00003.safetensors
-rw-r--r-- 1 user user 654 config.json
-rw-r--r-- 1 user user 500K tokenizer.json
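
Before investing in the full evaluation suite, a quick smoke test confirms the merged checkpoint loads and generates. A minimal sketch with transformers (the prompt is illustrative; device_map="auto" assumes accelerate is installed, and you can drop it to load on CPU):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the freshly merged model and generate once as a sanity check
model = AutoModelForCausalLM.from_pretrained(
    "./merged_v1", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./merged_v1")

prompt = "Create a task to review the quarterly budget."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))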

Phase 3: Capability Evaluation (25 minutes)

Design Evaluation Suite

EVALUATION_SUITE = {
    "persona": {
        "description": "Evaluate TaskMaster voice preservation",
        "tests": [
            {
                "id": "persona_001",
                "input": "Hello!",
                "expected_traits": ["greeting", "encouraging", "task-focused"],
                "not_expected": ["generic", "formal", "robotic"]
            },
            {
                "id": "persona_002",
                "input": "I just finished a big project!",
                "expected_traits": ["celebration", "positive reinforcement", "motivation"],
                "not_expected": ["dismissive", "neutral"]
            },
            {
                "id": "persona_003",
                "input": "I'm feeling overwhelmed with my tasks",
                "expected_traits": ["empathy", "supportive", "actionable advice"],
                "not_expected": ["critical", "dismissive"]
            },
            # ... 50 persona trait tests
        ],
        "scoring": {
            "method": "trait_presence",
            "threshold": 0.85  # 85% trait match
        }
    },

    "agentic": {
        "description": "Evaluate tool-calling preservation",
        "tests": [
            {
                "id": "agentic_001",
                "input": "Create a task to review the budget",
                "expected_tool": "create_task",
                "expected_args": {"title": "review the budget"}
            },
            {
                "id": "agentic_002",
                "input": "What tasks do I have due this week?",
                "expected_tool": "list_tasks",
                "expected_args": {"due_before": ".*"}  # Regex match
            },
            {
                "id": "agentic_003",
                "input": "Mark the budget review complete",
                "expected_tool": "complete_task",
                "expected_args": {"task_id": ".*"}
            },
            # ... 50 tool-calling tests
        ],
        "scoring": {
            "method": "exact_match",
            "threshold": 0.90  # 90% tool accuracy
        }
    },

    "combined": {
        "description": "Evaluate persona + agentic working together",
        "tests": [
            {
                "id": "combined_001",
                "input": "I need help creating a task for my big presentation next week",
                "expected_tool": "create_task",
                "expected_traits": ["encouraging", "supportive"],
                "notes": "Should call tool AND respond with TaskMaster voice"
            },
            {
                "id": "combined_002",
                "input": "What should I focus on today?",
                "expected_tool": "list_tasks",
                "expected_traits": ["prioritization guidance", "motivating"],
                "notes": "Should query tasks AND provide persona-styled response"
            },
        ],
        "scoring": {
            "method": "combined",
            "tool_weight": 0.6,
            "persona_weight": 0.4,
            "threshold": 0.85
        }
    }
}
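
The trait_presence scoring method is only named in the suite above. One minimal way to implement it is keyword matching against each expected trait; a production version would more likely use an LLM judge. The keyword table here is illustrative, not taken from the adapters:

# Illustrative keyword proxies for TaskMaster traits (hypothetical, for this sketch only)
TRAIT_KEYWORDS = {
    "encouraging": ["you've got this", "great", "nice work", "keep going"],
    "empathy": ["that sounds", "understandable", "it's okay"],
    "task-focused": ["task", "next step", "priority"],
}

def trait_presence_score(response: str, expected_traits: list[str]) -> float:
    """Fraction of expected traits whose keyword proxies appear in the response."""
    text = response.lower()
    hits = sum(
        any(kw in text for kw in TRAIT_KEYWORDS.get(trait, []))
        for trait in expected_traits
    )
    return hits / max(len(expected_traits), 1)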

Run Evaluation

from transformers import AutoModelForCausalLM, AutoTokenizer

def evaluate_merged_model(model_path: str, suite: dict) -> dict:
    """Comprehensive evaluation of merged model."""

    # Load model
    model = AutoModelForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    results = {}

    # Evaluate persona
    print("Evaluating persona preservation...")
    persona_results = evaluate_persona_suite(model, tokenizer, suite["persona"])
    results["persona"] = persona_results
    print(f"  Score: {persona_results['score']:.2%}")

    # Evaluate agentic
    print("Evaluating agentic preservation...")
    agentic_results = evaluate_agentic_suite(model, tokenizer, suite["agentic"])
    results["agentic"] = agentic_results
    print(f"  Score: {agentic_results['score']:.2%}")

    # Evaluate combined
    print("Evaluating combined capability...")
    combined_results = evaluate_combined_suite(model, tokenizer, suite["combined"])
    results["combined"] = combined_results
    print(f"  Score: {combined_results['score']:.2%}")

    # Overall assessment
    results["overall"] = {
        "persona_pass": persona_results["score"] >= suite["persona"]["scoring"]["threshold"],
        "agentic_pass": agentic_results["score"] >= suite["agentic"]["scoring"]["threshold"],
        "combined_pass": combined_results["score"] >= suite["combined"]["scoring"]["threshold"],
    }
    results["overall"]["all_pass"] = all(results["overall"].values())

    return results

# Evaluate baseline merge
results_v1 = evaluate_merged_model("./merged_v1", EVALUATION_SUITE)

Output:

Evaluating persona preservation...
Score: 82.00%
Evaluating agentic preservation...
Score: 94.50%
Evaluating combined capability...
Score: 83.20%
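
The helpers evaluate_persona_suite, evaluate_agentic_suite, and evaluate_combined_suite carry over from earlier chapters and aren't reproduced here. If you need to reconstruct one, here is a minimal sketch of the agentic variant; it assumes the model emits a single JSON object of the form {"tool": "...", "arguments": {...}} and treats expected argument values as regex patterns:

import json
import re
import torch

def evaluate_agentic_suite(model, tokenizer, suite: dict) -> dict:
    """Sketch: score tool-call tests by tool-name match and per-argument regex match."""
    failures = []
    for test in suite["tests"]:
        inputs = tokenizer(test["input"], return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=128)
        # Keep only the newly generated tokens
        response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

        try:
            call = json.loads(response.strip())  # assumed format: {"tool": ..., "arguments": {...}}
            tool_ok = call.get("tool") == test["expected_tool"]
            args_ok = all(
                re.fullmatch(str(pattern), str(call.get("arguments", {}).get(key, "")))
                for key, pattern in test["expected_args"].items()
            )
            ok = tool_ok and args_ok
        except (json.JSONDecodeError, re.error):
            ok = False

        if not ok:
            failures.append({"id": test["id"], "reason": response[:80]})

    score = 1 - len(failures) / max(len(suite["tests"]), 1)
    return {"score": score, "failures": failures}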

Analyze Results

def print_evaluation_report(results: dict, version: str):
    """Print formatted evaluation report."""

    print("=" * 60)
    print(f"MERGED MODEL EVALUATION - {version}")
    print("=" * 60)

    for category in ["persona", "agentic", "combined"]:
        cat_results = results[category]
        status = "PASS" if results["overall"][f"{category}_pass"] else "FAIL"
        print(f"{category.upper():15} {cat_results['score']:.1%} [{status}]")

        # Show failures
        if cat_results.get("failures"):
            print(f"  Failed cases ({len(cat_results['failures'])}):")
            for fail in cat_results["failures"][:3]:
                print(f"    - {fail['id']}: {fail['reason'][:50]}")

    print("=" * 60)
    overall_status = "PASS" if results["overall"]["all_pass"] else "NEEDS IMPROVEMENT"
    print(f"OVERALL: {overall_status}")

print_evaluation_report(results_v1, "v1 (TIES default)")

Output:

============================================================
MERGED MODEL EVALUATION - v1 (TIES default)
============================================================
PERSONA         82.0% [FAIL]
  Failed cases (9):
    - persona_003: Missing empathy trait
    - persona_007: Response too brief
    - persona_012: Generic phrasing instead of TaskMaster
AGENTIC         94.5% [PASS]
COMBINED        83.2% [FAIL]
  Failed cases (8):
    - combined_001: Tool called but response lacked persona
    - combined_005: Persona present but no tool call
============================================================
OVERALL: NEEDS IMPROVEMENT

Persona preservation is below threshold. Let's tune.
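
Before changing the merge, it helps to check whether the persona failures share a cause. A quick tally over the failures list returned by the evaluation (field names as used in the report above):

from collections import Counter

# Group persona failures by reported reason to see what's actually breaking
reasons = Counter(fail["reason"] for fail in results_v1["persona"]["failures"])
for reason, count in reasons.most_common():
    print(f"{count:2d}x {reason}")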

Phase 4: Parameter Optimization (15 minutes)

Hypothesis: Persona Needs Higher Weight

The agentic adapter might be dominating. Try adjusting weights:

# merge_config_v2.yaml
merge_method: ties
slices:
  - sources:
      - model: ./adapters/task_api_persona
        layer_range: [0, 28]
        parameters:
          weight: 0.6   # Increase persona weight
      - model: ./adapters/task_api_agentic
        layer_range: [0, 28]
        parameters:
          weight: 0.4   # Decrease agentic weight
parameters:
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

Execute and Evaluate

mergekit-yaml merge_config_v2.yaml ./merged_v2 --lazy --low-cpu-mem

results_v2 = evaluate_merged_model("./merged_v2", EVALUATION_SUITE)
print_evaluation_report(results_v2, "v2 (persona=0.6, agentic=0.4)")

Output:

============================================================
MERGED MODEL EVALUATION - v2 (persona=0.6, agentic=0.4)
============================================================
PERSONA 87.5% [PASS]
AGENTIC 91.2% [PASS]
COMBINED 86.8% [PASS]
============================================================
OVERALL: PASS

Weight adjustment fixed persona preservation while maintaining agentic quality.
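
If one manual adjustment hadn't been enough, the same loop can be automated. A sketch of a small weight sweep, assuming the config is templated in Python and each merge is run through the mergekit CLI exactly as above (the helper name write_ties_config and the grid values are illustrative):

import subprocess
import yaml

def write_ties_config(path: str, persona_weight: float, density: float = 0.5) -> None:
    """Template the TIES config from Phase 2 with a given persona/agentic weight split."""
    cfg = {
        "merge_method": "ties",
        "slices": [{"sources": [
            {"model": "./adapters/task_api_persona", "layer_range": [0, 28],
             "parameters": {"weight": persona_weight}},
            {"model": "./adapters/task_api_agentic", "layer_range": [0, 28],
             "parameters": {"weight": round(1 - persona_weight, 2)}},
        ]}],
        "parameters": {"density": density},
        "base_model": "unsloth/Llama-3.2-3B-Instruct",
        "dtype": "float16",
    }
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f)

sweep_results = {}
for w in [0.5, 0.55, 0.6, 0.65]:
    config_path = f"merge_config_w{w}.yaml"
    output_dir = f"./merged_w{w}"
    write_ties_config(config_path, persona_weight=w)
    subprocess.run(["mergekit-yaml", config_path, output_dir, "--lazy", "--low-cpu-mem"], check=True)
    sweep_results[w] = evaluate_merged_model(output_dir, EVALUATION_SUITE)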

Try DARE-TIES for Compression

Can we achieve similar quality while keeping far fewer of the adapters' delta parameters?

# merge_config_v3.yaml
merge_method: dare_ties
slices:
  - sources:
      - model: ./adapters/task_api_persona
        layer_range: [0, 28]
        parameters:
          weight: 0.6
      - model: ./adapters/task_api_agentic
        layer_range: [0, 28]
        parameters:
          weight: 0.4
parameters:
  density: 0.3   # Keep only 30% of parameters
  rescale: true
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

mergekit-yaml merge_config_v3.yaml ./merged_v3 --lazy --low-cpu-mem

results_v3 = evaluate_merged_model("./merged_v3", EVALUATION_SUITE)
print_evaluation_report(results_v3, "v3 (DARE-TIES density=0.3)")

Output:

============================================================
MERGED MODEL EVALUATION - v3 (DARE-TIES density=0.3)
============================================================
PERSONA 84.2% [FAIL]
AGENTIC 88.5% [FAIL]
COMBINED 82.1% [FAIL]
============================================================
OVERALL: NEEDS IMPROVEMENT

DARE-TIES with 70% drop is too aggressive. Try density=0.4:
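
The v4 configuration isn't shown above; assuming it mirrors merge_config_v3.yaml with only the density raised, it would look like this (run with mergekit-yaml merge_config_v4.yaml ./merged_v4 --lazy --low-cpu-mem as before):

# merge_config_v4.yaml  (assumed: identical to merge_config_v3.yaml except density)
merge_method: dare_ties
slices:
  - sources:
      - model: ./adapters/task_api_persona
        layer_range: [0, 28]
        parameters:
          weight: 0.6
      - model: ./adapters/task_api_agentic
        layer_range: [0, 28]
        parameters:
          weight: 0.4
parameters:
  density: 0.4   # Keep 40% of parameters
  rescale: true
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16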

# Quick test with density=0.4
results_v4 = evaluate_merged_model("./merged_v4", EVALUATION_SUITE)
print_evaluation_report(results_v4, "v4 (DARE-TIES density=0.4)")

Output:

============================================================
MERGED MODEL EVALUATION - v4 (DARE-TIES density=0.4)
============================================================
PERSONA 86.1% [PASS]
AGENTIC 90.8% [PASS]
COMBINED 85.2% [PASS]
============================================================
OVERALL: PASS

DARE-TIES with density=0.4 drops 60% of each adapter's deltas and still passes every quality gate.

Phase 5: Final Selection and Packaging (10 minutes)

Compare All Versions

def compare_versions(results_dict: dict):
    """Compare all merge versions."""

    print("=" * 70)
    print(f"{'Version':<25} {'Persona':<12} {'Agentic':<12} {'Combined':<12} {'Status'}")
    print("=" * 70)

    for version, results in results_dict.items():
        persona = f"{results['persona']['score']:.1%}"
        agentic = f"{results['agentic']['score']:.1%}"
        combined = f"{results['combined']['score']:.1%}"
        status = "PASS" if results['overall']['all_pass'] else "FAIL"
        print(f"{version:<25} {persona:<12} {agentic:<12} {combined:<12} {status}")

    print("=" * 70)

compare_versions({
    "v1 (TIES default)": results_v1,
    "v2 (TIES weighted)": results_v2,
    "v3 (DARE density=0.3)": results_v3,
    "v4 (DARE density=0.4)": results_v4,
})

Output:

======================================================================
Version                   Persona      Agentic      Combined     Status
======================================================================
v1 (TIES default)         82.0%        94.5%        83.2%        FAIL
v2 (TIES weighted)        87.5%        91.2%        86.8%        PASS
v3 (DARE density=0.3)     84.2%        88.5%        82.1%        FAIL
v4 (DARE density=0.4)     86.1%        90.8%        85.2%        PASS
======================================================================

Select Production Model

Decision matrix:

| Version | Quality | Compression | Recommendation |
|---------|---------|-------------|----------------|
| v2 (TIES) | Highest | None | Best quality |
| v4 (DARE) | Acceptable | 60% of adapter deltas dropped | Best efficiency |

For production: Use v2 for maximum quality. For edge deployment: Use v4 for reduced memory.

Package for Deployment

import shutil
import json

def package_merged_model(
    source_dir: str,
    output_name: str,
    metadata: dict
) -> str:
    """Package merged model with metadata for deployment."""

    output_dir = f"./releases/{output_name}"
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Copy model files
    for f in Path(source_dir).glob("*"):
        if f.is_file():
            shutil.copy(f, output_dir)

    # Add metadata
    metadata_path = Path(output_dir) / "merge_metadata.json"
    with open(metadata_path, "w") as f:
        json.dump(metadata, f, indent=2)

    # Create README
    readme = f"""# {metadata['name']}

{metadata['description']}

## Capabilities
- TaskMaster persona (voice, encouragement, productivity focus)
- Agentic tool-calling (Task API integration)

## Metrics
- Persona preservation: {metadata['metrics']['persona']:.1%}
- Agentic preservation: {metadata['metrics']['agentic']:.1%}
- Combined capability: {metadata['metrics']['combined']:.1%}

## Usage
Load with transformers:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained("{output_dir}")
    tokenizer = AutoTokenizer.from_pretrained("{output_dir}")

## Source Adapters
- Persona: {metadata['sources']['persona']}
- Agentic: {metadata['sources']['agentic']}

## Merge Configuration
- Method: {metadata['merge_config']['method']}
- Weights: {metadata['merge_config']['weights']}
- Density: {metadata['merge_config']['density']}
"""

    with open(Path(output_dir) / "README.md", "w") as f:
        f.write(readme)

    print(f"Packaged model to {output_dir}")
    return output_dir


# Package production model
package_merged_model(
    source_dir="./merged_v2",
    output_name="task-api-unified-v1.0",
    metadata={
        "name": "Task API Unified Model",
        "version": "1.0.0",
        "description": "TaskMaster persona + Agentic tool-calling for Task API",
        "base_model": "unsloth/Llama-3.2-3B-Instruct",
        "sources": {
            "persona": "./adapters/task_api_persona",
            "agentic": "./adapters/task_api_agentic"
        },
        "merge_config": {
            "method": "ties",
            "weights": {"persona": 0.6, "agentic": 0.4},
            "density": 0.5
        },
        "metrics": {
            "persona": 0.875,
            "agentic": 0.912,
            "combined": 0.868
        }
    }
)

Output:

Packaged model to ./releases/task-api-unified-v1.0
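
Before shipping, a quick check that the packaged release is self-contained: the metadata parses, its metrics clear the thresholds from the spec and evaluation suite, and the directory loads on its own. A minimal sketch (paths match the packaging call above):

import json
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

release_dir = Path("./releases/task-api-unified-v1.0")

# Metadata must parse and report metrics above the thresholds used in this capstone
metadata = json.loads((release_dir / "merge_metadata.json").read_text())
assert metadata["metrics"]["persona"] >= 0.85
assert metadata["metrics"]["agentic"] >= 0.90
assert metadata["metrics"]["combined"] >= 0.85

# The packaged directory must load without referencing the working tree
AutoTokenizer.from_pretrained(release_dir)
AutoModelForCausalLM.from_pretrained(release_dir)
print("Release verified:", metadata["name"], metadata["version"])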

Checkpoint: Production Readiness

| Criterion | Status | Evidence |
|-----------|--------|----------|
| Adapters analyzed | Done | Compatible structure, sign conflicts identified |
| Baseline merge complete | Done | TIES default |
| Evaluation suite designed | Done | Persona, agentic, combined tests |
| Parameter tuning | Done | Weight adjustment (0.6/0.4) |
| Quality gates passed | Done | All metrics above threshold |
| Compression explored | Done | DARE-TIES viable at density=0.4 |
| Model packaged | Done | Release with metadata and README |

Your Task API Digital FTE is production-ready.

Reflect on Your Skill

Your model-merging skill is now complete. Review your capability:

  1. Adapter analysis: Compatibility checking, conflict detection
  2. Strategy selection: TIES vs DARE-TIES decision framework
  3. Parameter tuning: Weight and density optimization
  4. Evaluation design: Multi-capability preservation testing
  5. Production packaging: Metadata, documentation, versioning

This skill is reusable for any model merging project.

Try With AI

Prompt 1: Debug Quality Regression

My merged model passed all tests in isolation but fails in production:
- Users report the persona "feels off" sometimes
- Tool calls occasionally miss when embedded in conversation

My evaluation suite tests each capability separately. What am I missing?

Help me design tests that catch:
1. Context-dependent capability switching
2. Long conversation degradation
3. Edge cases where capabilities interact poorly

What you're learning: Evaluation completeness—understanding gaps between synthetic tests and real usage.

Prompt 2: Optimize for Different Deployment

I have my Task API unified model (3B params, 6GB VRAM).
I need to deploy it in three scenarios:

1. Cloud API: Maximize quality, cost is secondary
2. Edge device: Must fit in 4GB RAM
3. Batch processing: Optimize for throughput, not latency

For each scenario, what would you change about my merge configuration
or post-merge optimization? Walk through the tradeoffs.

What you're learning: Deployment-aware optimization—tailoring models to operational requirements.

Prompt 3: Plan Next Iteration

My v1.0 merged model works but I want v2.0 with improvements:

Current metrics:
- Persona: 87.5%
- Agentic: 91.2%
- Combined: 86.8%

I have ideas for improvement:
1. Add distilled reasoning from GPT-4
2. Retrain persona adapter with more data
3. Add a third "safety" adapter

Help me prioritize and plan:
1. Which improvement has highest ROI?
2. How would merging change with 3 adapters?
3. What risks should I watch for?

What you're learning: Roadmap planning—prioritizing improvements for production systems.

Safety Note

Merged models combine capabilities—including any biases or failure modes from source adapters. Your unified model may exhibit unexpected behaviors when persona and agentic capabilities interact. For production deployment, implement monitoring for:

  • Unusual response patterns
  • Tool calls with persona-style arguments (contamination)
  • Persona responses that should have been tool calls
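
A minimal sketch of such a monitor, using simple heuristics over logged interactions (the tool-call format and the persona markers are illustrative assumptions, not part of the trained adapters):

import json

PERSONA_MARKERS = ["you've got this", "great job", "let's tackle"]  # illustrative

def flag_response(user_input: str, response: str, expected_tool_call: bool) -> list[str]:
    """Return heuristic warning flags for a single logged interaction."""
    flags = []
    looks_like_tool_call = response.lstrip().startswith("{")

    if looks_like_tool_call:
        try:
            call = json.loads(response)
            # Persona-style text leaking into tool arguments (contamination)
            if any(marker in json.dumps(call.get("arguments", {})).lower()
                   for marker in PERSONA_MARKERS):
                flags.append("persona_text_in_tool_args")
        except json.JSONDecodeError:
            flags.append("malformed_tool_call")
    elif expected_tool_call:
        # Persona response where a tool call was expected
        flags.append("missing_tool_call")

    return flags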

Never assume merged model behavior is the simple sum of its parts. Continuous monitoring is essential.