
Merging Techniques Deep Dive

You've decided to merge your adapters rather than retrain. Now comes the technical question: how exactly do you combine model weights?

The simplest approach—averaging—often works surprisingly well. But when capabilities conflict or adapters are highly specialized, naive averaging degrades both. This lesson explores the major merging techniques: Linear, SLERP, TIES, and DARE. You'll understand not just what each does, but why it works and when to use it.

The Problem: Parameter Interference

Before diving into solutions, understand the problem they solve.

When you fine-tune two adapters on different tasks, they adjust overlapping parameters in different directions:

Base Model Weight (Layer 5, Neuron 42): 0.15

Persona Adapter Change: +0.08 → New value: 0.23
Agentic Adapter Change: -0.05 → New value: 0.10

What should the merged value be? Options:

  • Average: (0.23 + 0.10) / 2 = 0.165
  • Keep persona: 0.23
  • Keep agentic: 0.10
  • Something else?

Simple averaging gives 0.165, but this represents neither capability well. The persona adapter wanted to increase this weight; the agentic adapter wanted to decrease it. Averaging produces a compromise that satisfies neither.

This is parameter interference: overlapping modifications that cancel or distort each other when combined naively.

Why Interference Matters

Not all parameters experience interference:

| Interference Type | Frequency | Impact |
|---|---|---|
| No overlap | ~70% of parameters | Both changes preserved perfectly |
| Same direction | ~15% of parameters | Changes reinforce (good) |
| Opposite direction | ~15% of parameters | Changes cancel (bad) |

The ~15% of conflicting parameters can significantly harm capability. Merging techniques differ in how they handle this minority of problematic parameters.

Technique 1: Linear Interpolation (Baseline)

The simplest approach: weighted average of all parameters.

How It Works

# Conceptually:
merged_weight = alpha * adapter_1_weight + (1 - alpha) * adapter_2_weight

# Example with alpha = 0.5 (equal weighting):
merged = 0.5 * persona_value + 0.5 * agentic_value

For task vectors (differences from base model):

merged_delta = alpha * delta_1 + (1 - alpha) * delta_2
merged_model = base_model + merged_delta
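
To see this end to end, here is a minimal runnable sketch using PyTorch. The tensors are toy stand-ins for a single weight matrix; the values and alpha are illustrative, not taken from real checkpoints.

import torch

# Illustrative stand-ins for one weight matrix from each checkpoint.
base = torch.tensor([[0.15, -0.30], [0.42, 0.08]])
persona = torch.tensor([[0.23, -0.28], [0.40, 0.10]])   # base + persona delta
agentic = torch.tensor([[0.10, -0.35], [0.45, 0.08]])   # base + agentic delta

alpha = 0.5

# Merge in task-vector space: average the deltas, then add them back to base.
delta_persona = persona - base
delta_agentic = agentic - base
merged_delta = alpha * delta_persona + (1 - alpha) * delta_agentic
merged = base + merged_delta

# For linear merging (only), this is equivalent to averaging the full weights.
assert torch.allclose(merged, alpha * persona + (1 - alpha) * agentic)
print(merged)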

Strengths

| Strength | Why It Matters |
|---|---|
| Simple | Easy to understand, debug, implement |
| Fast | Single pass through parameters |
| Adjustable | Alpha controls the blend ratio |

Weaknesses

| Weakness | Impact |
|---|---|
| Ignores conflicts | Averaging destroys opposing changes |
| Dilutes strong signals | Important changes get reduced |
| No redundancy handling | Keeps unnecessary parameter changes |

When to Use

  • Quick baseline to see if merging is viable
  • Adapters trained on very similar data (few conflicts expected)
  • One adapter dominates (set alpha = 0.8 or higher)

MergeKit Configuration

merge_method: linear
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.5
      - model: ./agentic_adapter
        layer_range: [0, 32]
        parameters:
          weight: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

Technique 2: SLERP (Spherical Linear Interpolation)

SLERP interpolates vectors along the surface of a sphere rather than in a straight line.

The Intuition

Imagine model weights as a point on a high-dimensional sphere. Linear interpolation moves directly between two points—but this path cuts through the interior of the sphere, reducing magnitude.

SLERP stays on the sphere's surface, preserving the "size" of the weight vector.

        Linear (straight chord, cuts through the interior)
  A ●───────────────────────● B
     \                     /
      `──.             .──'
          `───────────'
        SLERP (arc along the sphere's surface)

Why Magnitude Matters

Research suggests that weight magnitude encodes learned strength. Reducing magnitude through linear interpolation may weaken learned behaviors. SLERP preserves magnitude while smoothly transitioning between models.
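
Here is a minimal SLERP sketch over two flattened weight tensors, assuming PyTorch and the standard spherical-interpolation formula. It mirrors the geometry described above rather than MergeKit's internal code, and the tensor sizes are illustrative.

import torch

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    # Use unit directions to find the angle between the two weight vectors.
    v0_u = v0 / (v0.norm() + eps)
    v1_u = v1 / (v1.norm() + eps)
    dot = torch.clamp(torch.dot(v0_u, v1_u), -1.0, 1.0)
    omega = torch.arccos(dot)            # angle between the vectors
    if omega.abs() < 1e-4:               # nearly parallel: fall back to linear
        return (1 - t) * v0 + t * v1
    sin_omega = torch.sin(omega)
    # Interpolate along the arc; magnitude is preserved when the endpoints
    # have similar norms, unlike the straight-line chord.
    return (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

# Illustrative use on two flattened weight tensors.
persona_w = torch.randn(4096)
agentic_w = torch.randn(4096)
merged_w = slerp(persona_w, agentic_w, t=0.5)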

When SLERP Helps

SLERP matters most when:

  • Combining exactly two models
  • Models have similar training (same task, different random seeds)
  • You want smooth interpolation without magnitude loss

Strengths

| Strength | Why It Matters |
|---|---|
| Preserves magnitude | Learned strength maintained |
| Smooth interpolation | Continuous transition between models |
| Geometric properties | Respects high-dimensional structure |

Weaknesses

| Weakness | Impact |
|---|---|
| Only two models | Cannot merge 3+ models directly |
| Still averages conflicts | Same sign-conflict problem as linear |
| Computationally heavier | Requires trigonometric operations |

MergeKit Configuration

merge_method: slerp
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  t: 0.5  # Interpolation factor (0 = first model, 1 = second)
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

Technique 3: TIES-Merging (Trim, Elect Sign, Merge)

TIES (Trim, Elect Signs, and Merge) directly addresses parameter interference through a three-step process.

Step 1: Trim

Most fine-tuning changes are small—essentially noise. TIES trims parameters with small magnitude changes:

Before Trim:
Parameter 1: +0.002 (tiny change)
Parameter 2: +0.150 (significant change)
Parameter 3: -0.001 (tiny change)
Parameter 4: -0.200 (significant change)

After Trim (keep top 50%):
Parameter 1: 0 (trimmed)
Parameter 2: +0.150 (kept)
Parameter 3: 0 (trimmed)
Parameter 4: -0.200 (kept)

This focuses the merge on parameters that actually matter.
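
A small sketch of the trim step, assuming PyTorch; the toy deltas and the 50% density match the worked example above, with the threshold taken per tensor by magnitude.

import torch

# Toy per-parameter changes (deltas) from one adapter.
delta = torch.tensor([0.002, 0.150, -0.001, -0.200])

density = 0.5                                  # keep the top 50% by magnitude
k = max(1, int(density * delta.numel()))
threshold = delta.abs().topk(k).values.min()   # smallest magnitude that survives

trimmed = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
print(trimmed)   # tensor([ 0.0000,  0.1500,  0.0000, -0.2000])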

Step 2: Elect Signs

For parameters where adapters disagree on direction, TIES "votes" to elect a consensus sign:

Parameter 42:
Adapter 1: +0.08 (positive)
Adapter 2: -0.05 (negative)
Adapter 3: +0.12 (positive)

Elected sign: POSITIVE (2 vs 1)

The winning sign determines the final direction.
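
A sketch of sign election across three toy (already-trimmed) delta tensors, assuming PyTorch. The TIES paper elects the sign carrying the larger total magnitude, which reduces to the majority vote shown above when the magnitudes are comparable.

import torch

# Trimmed deltas from three adapters for the same five parameters.
deltas = torch.tensor([
    [ 0.08,  0.00, -0.10,  0.20, 0.00],   # adapter 1
    [-0.05,  0.15,  0.12,  0.18, 0.00],   # adapter 2
    [ 0.12,  0.00,  0.11, -0.02, 0.00],   # adapter 3
])

# Elect one sign per parameter: the sign whose deltas carry more total magnitude.
elected_sign = torch.sign(deltas.sum(dim=0))
print(elected_sign)   # tensor([1., 1., 1., 1., 0.])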

Step 3: Merge with Resolved Signs

After electing signs, TIES averages only the values that agree with the elected sign:

Parameter 42 (elected sign: POSITIVE):
Adapter 1: +0.08 → include
Adapter 2: -0.05 → exclude
Adapter 3: +0.12 → include

Merged value: (0.08 + 0.12) / 2 = 0.10
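
Continuing the sketch in PyTorch: only entries that agree with the elected sign are kept, then averaged over the adapters that actually contributed. The numbers match Parameter 42 above.

import torch

deltas = torch.tensor([
    [ 0.08],   # adapter 1, parameter 42
    [-0.05],   # adapter 2, parameter 42
    [ 0.12],   # adapter 3, parameter 42
])

elected_sign = torch.sign(deltas.sum(dim=0))    # +1 here

# Keep only entries whose sign matches the elected sign.
agrees = (torch.sign(deltas) == elected_sign) & (deltas != 0)
kept = torch.where(agrees, deltas, torch.zeros_like(deltas))

# Average over the adapters that contributed a matching-sign value.
count = agrees.sum(dim=0).clamp(min=1)
merged_delta = kept.sum(dim=0) / count
print(merged_delta)   # tensor([0.1000])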

Why TIES Works

By eliminating noise (trim) and resolving conflicts (elect), TIES produces cleaner merged parameters:

| Problem | TIES Solution |
|---|---|
| Noise in fine-tuning | Trimmed away |
| Sign conflicts | Resolved by voting |
| Magnitude dilution | Only agreed values averaged |

Strengths

| Strength | Why It Matters |
|---|---|
| Handles conflicts | Sign election resolves interference |
| Reduces noise | Trimming focuses on important changes |
| Scales to N models | Can merge 3+ adapters at once |

Weaknesses

| Weakness | Impact |
|---|---|
| Loses minority signals | Minority sign gets discarded |
| Requires density tuning | Trim percentage affects results |
| More complex | Harder to debug than linear |

MergeKit Configuration

merge_method: ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5   # Per-model weight (applied to each model here)
  density: 0.5  # Keep top 50% of changes (trim rest)
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

Tuning Density

The density parameter controls how aggressively TIES trims:

| Density | Effect | Use When |
|---|---|---|
| 0.3 | Very aggressive trim | Adapters share many redundant changes |
| 0.5 | Balanced (default) | General purpose |
| 0.7 | Light trim | Adapters have distinct, important changes |
| 1.0 | No trim (sign election only) | You want conflict resolution but keep all values |

Technique 4: DARE (Drop And REscale)

DARE takes an even more aggressive approach: randomly drop most parameter changes, then rescale the survivors.

The Core Insight

Research found that fine-tuned model capabilities are surprisingly robust. You can drop 90% or even 99% of parameter changes and still preserve most functionality. The remaining 10% captures the essential learning.

How It Works

Step 1: Drop

Randomly set most delta parameters to zero:

Before Drop (drop_rate = 0.9):
delta_1: +0.08
delta_2: +0.15
delta_3: -0.05
delta_4: +0.20
delta_5: -0.12

After Drop (random selection):
delta_1: 0 (dropped)
delta_2: +0.15 (kept)
delta_3: 0 (dropped)
delta_4: 0 (dropped)
delta_5: -0.12 (kept)

Step 2: Rescale

Multiply surviving values to compensate for dropped values:

Rescale factor: 1 / (1 - drop_rate) = 1 / 0.1 = 10

Rescaled:
delta_2: +0.15 * 10 = +1.50
delta_5: -0.12 * 10 = -1.20

Why Rescaling Works

The intuition: if you keep 10% of changes, those changes need to do 10x the work to approximate the original effect. Rescaling maintains the expected magnitude of the aggregate change.
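
A minimal drop-and-rescale sketch, assuming PyTorch. Because the mask is sampled randomly, the surviving deltas differ from run to run, which is also why DARE merges vary between runs.

import torch

torch.manual_seed(0)                         # fix the randomness for the example

delta = torch.tensor([0.08, 0.15, -0.05, 0.20, -0.12])
drop_rate = 0.9                              # drop 90% of changes on average

# Step 1: Drop -- each delta survives independently with probability (1 - drop_rate).
keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))

# Step 2: Rescale -- survivors are scaled by 1 / (1 - drop_rate) so the *expected*
# aggregate change matches the original delta.
dare_delta = delta * keep_mask / (1.0 - drop_rate)

print(keep_mask)
print(dare_delta)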

DARE + TIES

MergeKit offers dare_ties: DARE's dropping combined with TIES's sign election. This handles:

  • Redundancy (DARE's aggressive dropping)
  • Conflicts (TIES's sign election)

Strengths

| Strength | Why It Matters |
|---|---|
| Extreme compression | Capabilities survive with only a small fraction of changes kept |
| Reduces interference | Fewer non-zero parameters means fewer conflicts |
| Fast merging | Sparse operations are efficient |

Weaknesses

| Weakness | Impact |
|---|---|
| Random drops | Results vary run to run |
| Risk of capability loss | Overly aggressive dropping hurts performance |
| Requires tuning | Drop rate is critical |

MergeKit Configuration

merge_method: dare_ties
slices:
  - sources:
      - model: ./persona_adapter
        layer_range: [0, 32]
      - model: ./agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5
  density: 0.3  # Keep only 30% of parameters (drop 70%)
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

Technique Comparison Summary

| Technique | Core Mechanism | Best For | Handles Conflicts? | RAM Efficient? |
|---|---|---|---|---|
| Linear | Weighted average | Quick baseline | No | Yes |
| SLERP | Spherical interpolation | 2 similar models | No | Yes |
| TIES | Trim + sign election | Distinct capabilities | Yes | Yes |
| DARE | Drop + rescale | Aggressive compression | Partial | Yes |
| DARE-TIES | Drop + sign election | Best of both | Yes | Yes |

Choosing Your Technique: Decision Tree

How many models are you merging?
│
├── TWO
│     Are they similar (same task, different seeds)?
│     ├── YES → SLERP
│     └── NO  → TIES or DARE-TIES
│
└── THREE or more → TIES or DARE-TIES (SLERP not available)

For two distinct adapters (like persona + agentic):

  1. Start with TIES (density=0.5)
  2. Evaluate both capabilities
  3. If one capability dominates, adjust weights
  4. If both weak, try lower density or DARE-TIES
  5. Compare to linear baseline to ensure TIES helps

Applying to Task API Adapters

For your TaskMaster persona + agentic tool-calling merge:

Recommendation: TIES

Rationale:

  • Two distinct capabilities (persona style vs. structured output)
  • Likely parameter conflicts (both modify generation behavior)
  • Both capabilities are critical (can't sacrifice either)

Starting configuration:

merge_method: ties
slices:
  - sources:
      - model: ./task_api_persona_adapter
        layer_range: [0, 32]
      - model: ./task_api_agentic_adapter
        layer_range: [0, 32]
parameters:
  weight: 0.5
  density: 0.5
base_model: unsloth/Llama-3.2-3B-Instruct
dtype: float16

If persona traits are weak post-merge, try:

  • Increase persona weight to 0.6
  • Decrease density to 0.3 (more aggressive trimming, which also reduces conflicts)

If tool-calling accuracy drops, try:

  • Increase agentic weight to 0.6
  • Try DARE-TIES (aggressive dropping may reduce interference with structured output)

Try With AI

Use your AI companion to deepen your understanding of these techniques.

Prompt 1: Visualize the Differences

I've read about Linear, SLERP, TIES, and DARE merging techniques.
Help me build intuition by walking through a concrete example:

Imagine two adapters that both modify the same 10 parameters.
Show me what happens to those 10 parameters under each technique:
1. Some parameters where adapters agree
2. Some where they disagree (opposite signs)
3. Some where one has a large change, the other small

Walk through step-by-step how each technique handles each case.
Use simple numbers I can follow mentally.

What you're learning: Concrete understanding—moving from abstract descriptions to intuitive grasp through worked examples.

Prompt 2: Diagnose My Merge Failure

I merged two adapters and the result is bad. Help me diagnose:

Merged model behavior:
- Persona traits are barely visible
- Tool-calling works but feels "generic"
- Neither capability is as strong as the individual adapters

My merge config:
merge_method: linear
weight: 0.5

What went wrong? What technique should I try instead?
Walk me through the reasoning, not just the answer.

What you're learning: Diagnostic reasoning—using technique knowledge to troubleshoot real merge failures.

Prompt 3: Update My Skill

After learning about TIES, SLERP, and DARE, my model-merging skill
from Lesson 0 feels incomplete. It has strategy selection guidance,
but it doesn't explain the *mechanisms*.

Help me write a new section for my skill called "Technique Mechanisms"
that explains in 5-6 sentences each:
1. Why TIES uses sign election (what problem it solves)
2. Why DARE's dropping works (the insight that enables it)
3. When SLERP beats linear (the magnitude preservation benefit)

Keep it brief—this is reference material, not a tutorial.

What you're learning: Skill refinement—integrating new knowledge into your reusable intelligence asset.

Safety Note

Merging techniques make assumptions about parameter distributions and interference patterns that may not hold for all models. The techniques were developed primarily for decoder-only language models (GPT, Llama). Results may vary for other architectures. Always validate merged models on representative test sets before deployment—technique selection affects model behavior in subtle ways.