# Build Your Fine-Tuning Skill
You are about to learn supervised fine-tuning. But here is the pattern that separates effective AI-native developers from those who struggle: build your skill first, then learn the technology.
In traditional learning, you study a topic, maybe take notes, and hope you remember it later. In Skill-First learning, you create a reusable intelligence asset before you even understand the technology deeply. This asset grows with you as you learn, and by the end of the chapter, you own a production-ready skill you can sell or deploy.
This lesson follows the same pattern you used in Part 6 and Part 7. Clone a fresh skills-lab, fetch official documentation, and build your llmops-fine-tuner skill from authoritative sources rather than memory.
## Why Skill-First for Fine-Tuning?
Fine-tuning has dozens of hyperparameters, multiple training strategies, and evolving best practices. Trying to memorize all of this is futile. But encoding it into a skill that you can invoke, test, and improve makes this knowledge permanently accessible and actionable.
| Traditional Approach | Skill-First Approach |
|---|---|
| Read tutorials, forget details | Build skill, query anytime |
| Notes scattered across files | Single authoritative source |
| Knowledge decays over time | Skill improves with use |
| Cannot delegate to AI | AI executes your skill |
By the end of this chapter, your llmops-fine-tuner skill will guide you through:
- Dataset preparation and validation
- Model and adapter configuration
- Training loop setup and monitoring
- Checkpoint management and export
## Step 1: Clone a Fresh Skills-Lab
Start with a clean environment so that leftover state from previous experiments cannot affect your work.
```bash
# Clone the skills-lab repository
git clone https://github.com/panaversity/skills-lab.git ~/skills-lab-ch64

# Navigate to the directory
cd ~/skills-lab-ch64

# Create the skill directory structure
mkdir -p .claude/skills/llmops-fine-tuner
```
Output:
```
Cloning into '/Users/you/skills-lab-ch64'...
```
## Step 2: Write Your LEARNING-SPEC.md
Before creating the skill, define what you are trying to accomplish. This specification guides both your learning and the skill you create.
```bash
# Create the learning spec
cat > LEARNING-SPEC.md << 'EOF'
# LEARNING-SPEC.md
## What I Want to Learn
Supervised fine-tuning of large language models using LoRA/QLoRA with Unsloth
on Colab Free Tier (T4 GPU, 15GB VRAM).
## Why This Matters
I want to create proprietary models that encode my domain expertise,
enabling Digital FTEs that are differentiated from generic foundation models.
## Success Criteria
1. I can explain why LoRA cuts trainable parameters by roughly 99%
2. I can configure hyperparameters (rank, alpha, learning rate) with reasoning
3. I can run a training loop on Colab T4 without OOM errors
4. I can export a fine-tuned model to GGUF for Ollama deployment
5. My skill accurately reflects official Unsloth documentation
## Constraints
- Must work on Colab Free Tier (T4, 15GB VRAM, 12GB RAM)
- Must use Unsloth for 2x speed and 60% memory reduction
- Must produce GGUF output for local deployment
## Running Example
Fine-tune a model to become a Task API Assistant (from Chapter 40)
that understands task management domain language.
EOF
```
## Step 3: Fetch Official Documentation
The skill must be grounded in official documentation, not AI memory, which may be outdated or hallucinated.
Use Claude Code or your AI assistant:
```
/fetching-library-docs unsloth fine-tuning

Fetch the official Unsloth documentation covering:
1. LoRA hyperparameters (rank, alpha, learning rate)
2. QLoRA 4-bit configuration
3. SFTTrainer setup
4. Model saving and export options
```
Key sources to reference:

- Official Unsloth documentation (https://docs.unsloth.ai)
- Hugging Face PEFT documentation (https://huggingface.co/docs/peft)
- Hugging Face TRL documentation for SFTTrainer (https://huggingface.co/docs/trl)
## Step 4: Create Your llmops-fine-tuner Skill
Based on the documentation, create your skill file:
````bash
cat > .claude/skills/llmops-fine-tuner/SKILL.md << 'EOF'
---
name: llmops-fine-tuner
description: This skill should be used when fine-tuning language models with LoRA/QLoRA using Unsloth. Use when preparing datasets, configuring training, running SFT, and exporting models.
---
# LLMOps Fine-Tuner Skill
## Purpose
Guide supervised fine-tuning of LLMs using Parameter-Efficient Fine-Tuning (PEFT)
with LoRA/QLoRA and Unsloth optimization on consumer GPUs.
## When to Use This Skill
Invoke this skill when:
- Preparing datasets for supervised fine-tuning
- Selecting base models appropriate for task and hardware
- Configuring LoRA/QLoRA hyperparameters
- Running training loops with SFTTrainer
- Monitoring training progress and detecting issues
- Exporting models (adapter, merged, GGUF)
## Prerequisite Knowledge
- Understanding of transformers architecture (attention layers, MLP)
- Familiarity with Hugging Face ecosystem (transformers, datasets, PEFT)
- Python proficiency for debugging training issues
## Hardware Context
**Colab Free Tier (T4 GPU):**
- 15GB VRAM
- 12GB RAM
- Use QLoRA (4-bit) for models up to 8B parameters
- Gradient accumulation required for effective batch sizes
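
To see why 4-bit loading is what makes an ~8B model fit in 15GB, here is a rough weight-memory estimate (a back-of-the-envelope sketch that ignores activations, the KV cache, LoRA adapters, and optimizer state):

```python
# Rough weight-memory estimate per precision. Ignores activations, KV cache,
# LoRA adapters, and optimizer state, so treat the numbers as lower bounds.
params_billion = 8                                  # example model size
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "nf4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gigabytes = params_billion * nbytes             # 1e9 params * bytes / 1e9 bytes per GB
    print(f"{dtype:>4}: ~{gigabytes:.0f} GB for weights alone")
```

At fp16 the weights alone (~16 GB) already exceed the T4's VRAM, which is why QLoRA's 4-bit loading is the practical path on this hardware.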
## LoRA Configuration Guide
### Core Parameters
| Parameter | Recommended | Range | Purpose |
|-----------|-------------|-------|---------|
| `r` (rank) | 16 | 8-64 | Controls adapter capacity |
| `lora_alpha` | 16 (= r) | r to 2*r | Scaling factor for updates |
| `lora_dropout` | 0 | 0-0.1 | Regularization (0 for most cases) |
| `target_modules` | All attention + MLP | Varies | Which layers to adapt |
### Target Modules (Recommended)
```python
target_modules = [
"q_proj", "k_proj", "v_proj", "o_proj", # Attention
"gate_proj", "up_proj", "down_proj", # MLP
]
```
### Rank Selection Framework
| Use Case | Recommended Rank | Rationale |
|---|---|---|
| Simple style transfer | 8 | Few new patterns needed |
| Domain adaptation | 16 | Moderate new knowledge |
| Complex tasks | 32-64 | Significant new capabilities |
| If unsure | 16 | Safe default, adjust based on results |
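
To make the rank trade-off concrete, here is a back-of-the-envelope sketch of how many trainable parameters LoRA adds (the hidden size below is illustrative, not exact for any specific model): a LoRA adapter on a weight matrix of shape d_out x d_in trains r x (d_in + d_out) parameters instead of d_in x d_out.

```python
# Back-of-the-envelope: trainable parameters added by LoRA at a given rank.
# The dimension below is an illustrative placeholder, not exact for any model.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA trains two low-rank matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

d = 3072                              # hypothetical hidden size
full = d * d                          # parameters in one full projection matrix
adapter = lora_params(d, d, r=16)

print(f"full matrix : {full:,} params")
print(f"LoRA (r=16) : {adapter:,} params")
print(f"fraction    : {adapter / full:.2%}")   # ~1%, i.e. roughly 99% fewer trainable params
```

Doubling the rank doubles the adapter's parameter count (and capacity), which is why the table scales rank with task complexity.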
## QLoRA Configuration
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                # NormalFloat4 (optimal)
    bnb_4bit_compute_dtype=torch.float16,     # T4 has no bfloat16 support; use torch.bfloat16 on Ampere+
    bnb_4bit_use_double_quant=True,           # Reduces memory further
)
```
**Why NF4?** The NormalFloat4 data type is information-theoretically optimal for normally distributed values, and the weights of a pretrained LLM are approximately normally distributed.
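
If the model is loaded without Unsloth (whose loader applies quantization internally when `load_in_4bit=True`), this config is passed to the standard Transformers loader. A minimal sketch, using the same checkpoint as the setup example below:

```python
# Plain Transformers path: pass the quantization config at load time.
# (Unsloth's FastLanguageModel handles this internally when load_in_4bit=True.)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",     # example checkpoint
    quantization_config=bnb_config,      # the BitsAndBytesConfig defined above
    device_map="auto",
)
```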
## Training Arguments Guide

### Starting Configuration
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=4,     # Limited by VRAM
    gradient_accumulation_steps=4,     # Effective batch = 16
    num_train_epochs=3,                # Start small
    learning_rate=2e-4,                # Higher for LoRA
    warmup_ratio=0.03,                 # 3% warmup
    weight_decay=0.01,                 # Light regularization
    fp16=True,                         # Mixed precision
    logging_steps=10,
    save_strategy="epoch",
    optim="adamw_8bit",                # Memory efficient
)
```
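
A quick sanity check on the batch-size arithmetic above (the dataset size is a made-up example):

```python
# Effective batch size and optimizer steps per epoch for the configuration above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
dataset_size = 1_000                 # hypothetical number of training examples

effective_batch = per_device_train_batch_size * gradient_accumulation_steps   # 16
steps_per_epoch = dataset_size // effective_batch                             # 62

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
```

With `warmup_ratio=0.03`, roughly the first 3% of total optimizer steps ramp the learning rate up from zero.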
### Learning Rate Guidelines
| Scenario | Learning Rate | Reasoning |
|---|---|---|
| Standard SFT | 2e-4 | Default for LoRA fine-tuning |
| Continued pre-training | 5e-5 | Gentler for knowledge preservation |
| Small datasets (<1K) | 1e-4 | Avoid overfitting |
| Large datasets (>10K) | 2e-4 to 5e-4 | Can be more aggressive |
## Unsloth Optimization

### Why Unsloth?
- 2x faster training through optimized kernels
- 60% less VRAM via memory-efficient attention
- No accuracy loss compared to standard training
### Setup Pattern
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,     # QLoRA mode
    dtype=None,            # Auto-detect
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```
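
A minimal sketch of wiring these pieces into TRL's `SFTTrainer`. Argument names have shifted across `trl` releases (for example, `dataset_text_field` and `max_seq_length` later moved into `SFTConfig`, and `tokenizer` became `processing_class`), so this follows the older-style signature used in many Unsloth notebooks; verify against the current docs. The dataset path and column name are hypothetical.

```python
# Minimal SFT wiring sketch. Assumes `model`, `tokenizer`, and `training_args`
# from the snippets above; the file name and "text" column are hypothetical.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",       # column holding pre-formatted training text
    max_seq_length=2048,
    args=training_args,
)

trainer.train()
```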
## Export Options
| Method | Size | Use Case |
|---|---|---|
| Adapter only | ~30MB | When base model available |
| Merged (FP16) | ~16GB | Cloud deployment with full precision |
| GGUF q4_k_m | ~4GB | Local Ollama deployment |
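
As a rough sketch of the first two rows (the helper names follow Unsloth's save utilities; confirm the exact names and `save_method` values against the current Unsloth docs):

```python
# Adapter-only export: small, but requires the base model at load time.
model.save_pretrained("lora_adapter")
tokenizer.save_pretrained("lora_adapter")

# Merged FP16 export: standalone full-precision weights (large).
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```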
### GGUF Export
```python
# Save as GGUF for Ollama
model.save_pretrained_gguf(
    "model-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
```
## Troubleshooting Decision Tree

### Out of Memory (OOM)

- Reduce `per_device_train_batch_size` (try 2, then 1)
- Increase `gradient_accumulation_steps` proportionally
- Reduce `max_seq_length`
- Enable `use_gradient_checkpointing="unsloth"`
- If still OOM: use a smaller base model
### Training Loss Not Decreasing
- Check learning rate (try 2e-4 first)
- Verify the data format matches the expected chat template (see the formatting sketch after this list)
- Ensure the tokenizer has a proper chat template
- Check for data quality issues
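
A minimal sketch of what "matches the expected template" means in practice, rendering one example through the tokenizer's own chat template (the messages are illustrative, drawn from the Task API Assistant running example):

```python
# Render one training example through the tokenizer's chat template so the text
# the model trains on matches the format it will see at inference time.
# The messages below are illustrative Task API Assistant examples.
messages = [
    {"role": "user", "content": "Create a task: review the Q3 report by Friday."},
    {"role": "assistant", "content": "Created task 'Review Q3 report', due Friday."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,               # return the formatted string, not token ids
    add_generation_prompt=False,  # the example already contains the assistant reply
)
print(text)
```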
### Overfitting (Val loss increases while train loss decreases)
- Reduce epochs (try 1-2)
- Increase weight_decay
- Add more training data
- Reduce rank if dataset is small
## Validation Checks
Before declaring training complete:
- Training loss decreased consistently
- No OOM errors during training
- Model generates coherent responses
- Response format matches training data
- Export completed without errors
EOF
````
## Step 5: Verify Your Skill
Test that your skill was created correctly:
```bash
# Check the skill exists
ls -la .claude/skills/llmops-fine-tuner/
# View the skill content
head -50 .claude/skills/llmops-fine-tuner/SKILL.md
```
Output:
```
total 8
drwxr-xr-x  3 you  staff    96 Jan  1 10:00 .
drwxr-xr-x  3 you  staff    96 Jan  1 10:00 ..
-rw-r--r--  1 you  staff  4521 Jan  1 10:00 SKILL.md
```
## What Happens Next
You now have an llmops-fine-tuner skill that is grounded in official documentation. As you progress through this chapter:
| Lesson | How Your Skill Improves |
|---|---|
| L01: SFT Fundamentals | Add mental model explanations |
| L02: PEFT and LoRA | Refine rank/alpha selection logic |
| L03: QLoRA | Add quantization troubleshooting |
| L04: Unsloth | Add platform-specific optimizations |
| L05+: Training workflow | Add monitoring patterns |
| Capstone | Validate skill produces working model |
Each lesson will include a "Reflect on Your Skill" section where you update and improve this skill based on what you learned.
## Try With AI
Use your AI companion (Claude, ChatGPT, Gemini, or similar).
### Prompt 1: Verify Skill Structure
```
I just created my llmops-fine-tuner skill. Review the structure and tell me:

1. Does it follow the SKILL.md format correctly?
2. Is the content grounded in documentation (not hallucinated)?
3. What sections should I add as I learn more about fine-tuning?

Here is my skill:
[paste your SKILL.md content]
```
What you are learning: Critical evaluation of your own skill structure. Your AI partner helps identify gaps before you invest time in an incomplete skill.
### Prompt 2: Connect to Your Domain
```
I want to fine-tune a model for [your domain: legal documents, medical records,
customer support, code review, etc.]. Looking at my llmops-fine-tuner skill,
what domain-specific sections should I add? What unique challenges does my
domain present for fine-tuning?
```
What you are learning: Domain adaptation. Fine-tuning is not one-size-fits-all. Your AI partner helps you anticipate domain-specific challenges.
### Prompt 3: Validate Against Official Docs
```
Compare my skill's LoRA configuration recommendations against the official
Unsloth documentation. Are there any discrepancies? Any best practices
I should add?

Specifically check:
1. Recommended rank values
2. Target modules list
3. Learning rate guidance
```
What you are learning: Documentation verification. You are building the habit of validating AI-generated content against authoritative sources.
## Safety Note
As you create skills from documentation, remember that AI tools may not have the most current information. Always verify critical configuration values against the official source. The Unsloth documentation is updated regularly as best practices evolve.