# Build Your Model Merging Skill
You've trained two specialized adapters: a TaskMaster persona adapter (Chapter 65) and an agentic tool-calling adapter (Chapter 66). Now you need to combine them. But before you learn the theory of model merging, you'll build a model-merging skill that will guide your decisions throughout this chapter and beyond.
This is the Skill-First Learning Pattern. Instead of learning a technology and then maybe creating a skill, you create the skill first—grounded in official documentation—and then refine it as you learn. By the end of this lesson, you'll have reusable intelligence that captures MergeKit's merging strategies, YAML configuration patterns, and RAM optimization techniques.
## Why Skill-First for Model Merging?
Model merging has a deceptively simple surface: "combine two models into one." But beneath that simplicity lie crucial decisions:
| Decision | Wrong Choice Impact |
|---|---|
| Merge strategy | Wrong strategy = capability interference, degraded performance |
| Layer handling | Incorrect ranges = corrupted model weights |
| RAM management | No sharding = out-of-memory crashes on consumer hardware |
| Weight ratios | Imbalanced mixing = one capability dominates |
A skill captures this decision framework. When you encounter a merging scenario six months from now, you won't rely on memory—you'll invoke your skill.
## Step 1: Clone Your Skills Lab Fresh
Every chapter starts clean. No assumptions about prior state.
```bash
# Navigate to your workspace
cd ~/workspace

# Clone or reset skills-lab (use your preferred approach)
git clone https://github.com/your-org/skills-lab.git ch67-skills-lab
cd ch67-skills-lab
```
**Output:**

```
Cloning into 'ch67-skills-lab'...
remote: Enumerating objects: 245, done.
remote: Counting objects: 100% (245/245), done.
Receiving objects: 100% (245/245), 89.42 KiB | 1.29 MiB/s, done.
```
Alternatively, if you're working in an existing repo:
```bash
# Create a fresh directory for Chapter 67
mkdir -p ~/workspace/ch67-model-merging
cd ~/workspace/ch67-model-merging
```
## Step 2: Write Your LEARNING-SPEC
Before fetching documentation, articulate what you want to learn and how you'll know you've succeeded. This prevents aimless reading.
Create `LEARNING-SPEC.md`:

```markdown
# LEARNING-SPEC: Model Merging

## What I Want to Learn

- How to combine multiple LoRA adapters into a single model
- Which merging strategies exist (TIES, SLERP, DARE) and when to use each
- How to handle RAM constraints on consumer hardware (12GB limit)
- MergeKit YAML configuration patterns

## Why This Matters

I have two trained adapters:

1. TaskMaster persona adapter (distinctive voice)
2. Agentic tool-calling adapter (reliable JSON output)

I need to combine them without losing either capability. Wrong strategy choice
means one capability dominates or both degrade.

## Success Criteria

1. [ ] Skill can recommend appropriate merge strategy for a given scenario
2. [ ] Skill includes YAML configuration templates for common cases
3. [ ] Skill explains RAM optimization techniques for 12GB constraint
4. [ ] Skill distinguishes when to merge vs. when to retrain on combined data

## Source Documents

- MergeKit GitHub repository documentation
- Arcee AI blog posts on merging techniques
- HuggingFace model merging guides
```
Why write this first? Without a spec, you'll read documentation passively. With a spec, you read actively—hunting for answers to YOUR questions.
## Step 3: Fetch MergeKit Documentation
Now invoke your documentation-fetching skill. In Claude Code:
```
/fetching-library-docs MergeKit model merging
```
Or manually gather from the official repository:
```bash
# MergeKit GitHub: https://github.com/arcee-ai/mergekit
# Key documentation:
# - README.md for installation and basic usage
# - docs/ for strategy explanations
# - examples/ for YAML configuration templates
```
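If you want MergeKit available locally while you read, the repository README (at the time of writing) recommends an editable install from source. A minimal sketch; verify against the current README before running:

```bash
# Install MergeKit from source (check the README for the current procedure)
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
```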
### Key MergeKit Concepts
From the official documentation, extract these core patterns:
**Supported Merge Methods:**
| Method | Best For | How It Works |
|---|---|---|
| linear | Simple averaging | Weighted average of model parameters |
| slerp | Two similar models | Spherical linear interpolation preserving geometric properties |
| ties | Multiple distinct capabilities | Trim, elect sign, merge: removes redundant parameters, resolves sign conflicts |
| dare_ties | Complementary skills | Drop and rescale + TIES conflict resolution |
| passthrough | Layer extraction | Copy specific layers without modification |
**YAML Configuration Structure:**

```yaml
merge_method: ties
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 32]
      - model: ./adapter_2
        layer_range: [0, 32]
parameters:
  weight: 0.5    # per-source weight
  density: 0.5   # for TIES/DARE methods
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
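The same configuration shape generalizes to the other methods. As a point of contrast, slerp interpolates between exactly two models with a single factor instead of per-source weights. A minimal sketch with hypothetical model paths; the `t` parameter name follows MergeKit's published slerp examples, so verify against the current docs:

```yaml
# Sketch: slerp between two similar fine-tunes of the same base
merge_method: slerp
slices:
  - sources:
      - model: ./model_a
        layer_range: [0, 32]
      - model: ./model_b
        layer_range: [0, 32]
base_model: ./model_a   # slerp interpolates relative to one of the two sources
parameters:
  t: 0.5                # 0.0 = all model_a, 1.0 = all model_b
dtype: float16
```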
**RAM Optimization:**

```yaml
# Sharded merging for limited RAM
merge_method: ties
slices:
  # Process layers in batches
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
  # ... continue for remaining layers
```
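Once a configuration file exists, the merge itself is a single CLI call. A minimal sketch, assuming a hypothetical config file named `merge-config.yml`; the flag names follow the MergeKit README at the time of writing, so confirm them with `mergekit-yaml --help`:

```bash
# Run the merge described in merge-config.yml, writing the result to ./merged-model
# --lazy-unpickle and a small --out-shard-size help keep peak RAM down on 12GB machines
mergekit-yaml merge-config.yml ./merged-model \
  --lazy-unpickle \
  --out-shard-size 1B \
  --copy-tokenizer
```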
## Step 4: Create Your Model-Merging Skill
Now synthesize what you've learned into a reusable skill.
Create `.claude/skills/model-merging/SKILL.md`:
---
name: model-merging
description: "This skill should be used when combining multiple LoRA adapters or fine-tuned models into a single unified model. Use when students have trained separate adapters for different capabilities (persona, tool-calling, domain knowledge) and need to merge them without losing functionality."
---
# Model Merging Skill
## When to Use This Skill
Invoke when you need to:
- Combine multiple LoRA adapters into one model
- Merge fine-tuned models with complementary capabilities
- Optimize merged model for RAM-constrained environments
- Decide between merging strategies (TIES, SLERP, DARE)
## Decision Framework: Merge vs. Retrain
Before merging, consider:
| Question | If Yes | If No |
|----------|--------|-------|
| Are adapters trained on overlapping data? | Retrain on combined data | Safe to merge |
| Do capabilities interfere (conflicting outputs)? | Retrain with a multi-task dataset | Merge is viable |
| Is one adapter significantly larger/dominant? | Consider a weighted merge | Standard merge |
| Do you have compute budget for retraining? | Consider both options | Merge is the only option |
## Strategy Selection Guide
### Use SLERP When:
- Merging exactly 2 models
- Models are similar (same base, similar data)
- You want smooth interpolation between behaviors
### Use TIES When:
- Merging 2+ models with distinct capabilities
- Models may have parameter conflicts
- You want to preserve strongest signals from each
### Use DARE-TIES When:
- Merging complementary skills
- Adapter parameters are mostly redundant
- You want aggressive compression (drop 90%+ of the delta parameters)
### Use Linear When:
- Simple weighted average is sufficient
- Quick baseline before trying advanced methods
## YAML Configuration Templates
### Two-Adapter Merge (TIES)
```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```

### RAM-Optimized Sharded Merge

```yaml
merge_method: ties
# Process 8 layers at a time for 12GB RAM
slices:
  - sources:
      - model: ./adapter_1
        layer_range: [0, 8]
      - model: ./adapter_2
        layer_range: [0, 8]
  - sources:
      - model: ./adapter_1
        layer_range: [8, 16]
      - model: ./adapter_2
        layer_range: [8, 16]
  - sources:
      - model: ./adapter_1
        layer_range: [16, 24]
      - model: ./adapter_2
        layer_range: [16, 24]
  - sources:
      - model: ./adapter_1
        layer_range: [24, 32]
      - model: ./adapter_2
        layer_range: [24, 32]
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
### Weighted Merge (Favor One Adapter)

```yaml
merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.7   # Favor persona
      - model: ./agentic_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.3
parameters:
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16
```
## Evaluation Checklist

After merging, verify:

- [ ] Persona trait consistency (compare to persona-only model)
- [ ] Tool-calling accuracy (compare to agentic-only model)
- [ ] No capability regression (both should work)
- [ ] RAM usage within constraints
- [ ] Inference latency acceptable
## Common Pitfalls
| Pitfall | Solution |
|---|---|
| OOM during merge | Use sharded layer processing |
| Capability loss | Try TIES with higher density |
| One adapter dominates | Adjust per-source weights |
| Inconsistent outputs | Evaluate base model compatibility |
## Resources

- MergeKit: https://github.com/arcee-ai/mergekit
- TIES Paper: Yadav et al., 2023
- DARE Paper: Yu et al., 2023
## Step 5: Verify Your Skill
Test the skill by invoking it with a real scenario.
**Test prompt:**
```
I have two LoRA adapters:
- Persona adapter (200 examples, casual voice)
- Tool-calling adapter (500 examples, strict JSON output)

Both trained on Llama-3-8B. I have 12GB RAM available. Which merge strategy should I use and why?
```
**Expected skill-informed response:**
The skill should recommend:
1. **Strategy**: TIES (distinct capabilities, potential parameter conflicts)
2. **Weight balance**: Consider 0.4/0.6 favoring tool-calling (more training data)
3. **RAM handling**: Sharded merge processing 8 layers at a time
4. **Verification**: Test both persona consistency and JSON accuracy post-merge
If your skill provides this guidance, you've succeeded.
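Beyond checking the skill's advice, you can smoke-test a merged model itself once you produce one later in the chapter. A minimal sketch, assuming the merge was written to a hypothetical `./merged-model` directory and that Hugging Face transformers and PyTorch are installed; the prompts and generation settings are placeholders:

```python
# Quick post-merge smoke test: one persona-style prompt, one tool-calling prompt
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./merged-model"  # hypothetical output path of the merge

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=torch.float16).to(device)

prompts = [
    "Introduce yourself in two sentences.",                      # persona check
    "Return a JSON object calling the weather tool for Paris.",  # tool-calling check
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=128)
    # Print only the newly generated tokens
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```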
## Step 6: Update Your LEARNING-SPEC
Return to your specification and check off what you've accomplished:
```markdown
## Success Criteria
1. [x] Skill can recommend appropriate merge strategy for a given scenario
2. [x] Skill includes YAML configuration templates for common cases
3. [x] Skill explains RAM optimization techniques for 12GB constraint
4. [x] Skill distinguishes when to merge vs. when to retrain on combined data
```

## What You've Built
Your model-merging skill now captures:
| Component | Content |
|---|---|
| Decision framework | Merge vs. retrain criteria |
| Strategy selection | When to use TIES/SLERP/DARE |
| Configuration templates | Ready-to-use YAML |
| RAM optimization | Sharded merging patterns |
| Evaluation checklist | Post-merge verification |
This skill will guide your work throughout the remaining lessons and serve you in future projects.
## Try With AI
Use your AI companion to extend and validate your skill.
### Prompt 1: Challenge Your Decision Framework

```
Review the "Merge vs. Retrain" decision framework in my model-merging skill.
I'm concerned it might be too simplistic. Ask me challenging questions:

1. What if adapters have partial data overlap (30%)?
2. What if capabilities are complementary but built on different base models?
3. What if I need to merge 3+ adapters, not just 2?

Help me identify gaps in my decision framework and suggest improvements.
```
**What you're learning:** Critical evaluation of your own skill—developing the meta-skill of improving reusable intelligence through adversarial questioning.
### Prompt 2: Generate Edge Case Templates

```
My model-merging skill has templates for common cases, but I'm worried about
edge cases. Help me create YAML templates for:

1. Merging adapters with different LoRA ranks (r=16 vs r=32)
2. Merging an adapter with the base model itself (not two adapters)
3. Merging when one adapter is much larger (10x parameters)

For each, explain what could go wrong and how the template addresses it.
```
**What you're learning:** Template generalization—expanding your skill to handle scenarios beyond the happy path.
### Prompt 3: Prepare for the Chapter

```
I'm about to learn model merging in depth (Lessons 1-6). Looking at my
current model-merging skill, what concepts am I missing that I'll likely
need to add? Consider:

- Mathematical foundations I haven't captured
- Failure modes I haven't documented
- Evaluation metrics beyond my checklist

Don't explain these now—just list what I should watch for as I learn,
so I can update my skill as I go.
```
**What you're learning:** Proactive learning—identifying knowledge gaps before encountering them, turning passive learning into active skill refinement.
## Safety Note
Your skill is grounded in official MergeKit documentation, but merging techniques evolve. Before using configurations from your skill in production, verify against the current MergeKit repository. Model merging can produce unexpected behaviors—always evaluate merged models thoroughly before deployment.