Catastrophic forgetting = when a model is fine-tuned on a new-task dataset, it loses general capabilities it learned during pre-training. E.g. fine-tune GPT on medical Q&A → it answers medical questions well but gets weaker at coding and reasoning, and it loses the "helpful assistant" tone.
Cause: gradient updates computed only on the new-task data move the shared weights away from the values learned in pre-training. The new-task loss drops, but old capabilities encoded in those shared weights get "overwritten".
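A deliberately tiny toy makes this mechanism concrete. It is an illustrative assumption, not an LLM: one shared weight `w` is first fit to task A (y = 2x, standing in for pre-training) and then trained only on task B (y = -x, standing in for fine-tuning). Nothing in the task-B gradient protects task A, so the task-A loss rises sharply while the task-B loss falls.

```python
# Toy model of catastrophic forgetting (illustrative, not an LLM):
# a single shared weight w is used by both "tasks".

def mse(w, data):
    """Mean squared error of the model y_hat = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, steps=200):
    """Plain gradient descent on the MSE of `data` only."""
    for _ in range(steps):
        # d(MSE)/dw = (2/n) * sum(x * (w*x - y))
        grad = 2 * sum(x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # stands in for pre-training: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # stands in for fine-tuning:  y = -x

w = sgd(0.0, task_a)                  # "pre-train"
loss_a_before = mse(w, task_a)
w = sgd(w, task_b)                    # "fine-tune" on task B only
loss_a_after = mse(w, task_a)
loss_b_after = mse(w, task_b)

print(f"task A loss: {loss_a_before:.4f} before FT, {loss_a_after:.4f} after FT")
print(f"task B loss after FT: {loss_b_after:.4f}")
```

Real models have many weights, some of them not shared, which is why forgetting there is partial rather than total; the toy only isolates the "shared weights get overwritten" effect.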
Symptoms:
- General benchmarks (MMLU, HellaSwag, HumanEval) drop significantly.
- Model becomes "too narrow": good only at the fine-tuned task; it refuses or performs poorly on anything else.
- Alignment degrades: harsh tone, no longer refusing harmful requests as before.
Prevention:
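1. Parameter-Efficient Fine-Tuning (PEFT): LoRA/QLoRA freeze the base weights and train only small low-rank adapters → core capabilities are largely preserved (forgetting is reduced, not eliminated, because the adapter still changes the model's outputs). Usually the simplest and most effective first step. Merge the adapter into the base weights for inference without extra overhead, or keep it separate so adapters can be swapped.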
2. Rehearsal / Replay: mix the new-task dataset with data from the pre-training distribution (or general-purpose instruction-tuning data). Roughly 70% task + 30% general is a common starting ratio. Popular general-purpose datasets for rehearsal: Alpaca, Dolly, UltraChat, SlimOrca.
3. Regularization
- KL divergence penalty: add a KL(new_model || base_model) term to the loss so the fine-tuned model's output distributions stay close to the base model's.
- EWC (Elastic Weight Consolidation): add a quadratic penalty on changes to weights that were important for the old task, with importance estimated by the (diagonal) Fisher information.
4. Low learning rate + few epochs: training too long or with too high a learning rate → heavy forgetting. Start at lr=1e-5 for full FT and 2e-4 for LoRA; 1–3 epochs are usually enough. Monitor dev-set loss and stop early once it starts rising (overfitting).
5. Freeze lower layers: unfreeze only the upper layers (closer to the output). Lower layers tend to learn generic features, upper layers more task-specific ones. A common choice is freezing the bottom 50–70% of layers.
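6. Multi-task training: train on several tasks concurrently instead of sequentially; because each update mixes gradients from all tasks, no single task overwrites the others, which reduces forgetting.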
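Item 1 above (PEFT) can be sketched as a minimal LoRA layer on plain Python lists. This is an illustrative assumption, not the Hugging Face PEFT implementation, and the names `lora_forward` and `merge` are hypothetical: the frozen weight `W` is never modified, only the low-rank factors `A` (r x d_in) and `B` (d_out x r) would be trained, and the output is `W x + (alpha / r) * B (A x)`. Because `B` starts at zero, the adapted model initially reproduces the base model exactly, and merging folds `(alpha / r) * B A` into `W` so inference needs a single matrix.

```python
# Minimal LoRA layer on plain Python lists (illustrative sketch, not a library API).
# W: d_out x d_in (frozen), A: r x d_in, B: d_out x r (the only trainable parts).

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(p, q):
    """(n x k) @ (k x m) -> n x m."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W itself is never modified."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def merge(W, A, B, alpha, r):
    """Fold the adapter into a new weight W' = W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# Usage with made-up numbers: d_out=2, d_in=3, rank r=1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
x = [1.0, 2.0, 3.0]
r, alpha = 1, 2.0

B_init = [[0.0], [0.0]]                   # LoRA starts with B = 0 ...
assert lora_forward(W, A, B_init, x, alpha, r) == matvec(W, x)  # ... so output == base

B = [[0.1], [-0.2]]                       # stand-in for trained adapter values
y_adapter = lora_forward(W, A, B, x, alpha, r)
y_merged = matvec(merge(W, A, B, alpha, r), x)
assert all(abs(a - b) < 1e-9 for a, b in zip(y_adapter, y_merged))
```

The merged and unmerged paths give the same output; merging trades adapter swappability for one fewer matrix product per layer at inference.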
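For item 2 (rehearsal), a hypothetical helper `mix_for_rehearsal` keeps every task example and samples just enough general examples (e.g. drawn from one of the instruction datasets listed) to hit the 70/30 split. Examples are treated as opaque objects; loading and formatting them is assumed to happen elsewhere.

```python
import random

def mix_for_rehearsal(task_examples, general_examples, general_frac=0.3, seed=0):
    """Keep all task examples and add enough general examples so that
    general / total is roughly general_frac; return the mix shuffled."""
    if not 0.0 <= general_frac < 1.0:
        raise ValueError("general_frac must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed -> reproducible mix
    n_task = len(task_examples)
    # Solve n_general / (n_task + n_general) = general_frac for n_general.
    n_general = round(n_task * general_frac / (1.0 - general_frac))
    if n_general > len(general_examples):
        raise ValueError("not enough general examples for the requested ratio")
    mixed = list(task_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)
    return mixed
```

For example, 700 task examples with the default `general_frac=0.3` pull in 300 general examples, i.e. a 70/30 mix of 1,000 items.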
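The two penalties in item 3 can be written out on plain lists to show what each term computes. This is a simplified sketch under stated assumptions: in a real setup `p_new`/`p_base` would be per-token next-token distributions from the model being fine-tuned and the frozen base model, `theta` would be model tensors, the Fisher diagonal would be estimated from squared gradients of the log-likelihood on old-task data, and `beta`/`lam` are hypothetical names for the penalty weights.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue            # 0 * log(0 / q) is taken as 0
        if q_i == 0.0:
            return math.inf     # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

def ewc_penalty(theta, theta_old, fisher, lam):
    """EWC term (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.
    fisher[i] is the (diagonal) Fisher information of parameter i on the old task."""
    return 0.5 * lam * sum(f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher))

def regularized_loss(task_loss, p_new, p_base, theta, theta_old, fisher, beta=0.1, lam=1.0):
    """Task loss + beta * KL(new || base) + EWC penalty (beta, lam are illustrative)."""
    return (task_loss
            + beta * kl_divergence(p_new, p_base)
            + ewc_penalty(theta, theta_old, fisher, lam))
```

The KL term constrains what the model outputs, while EWC constrains where its weights move; parameters with zero Fisher information are left free to change.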
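Item 4's "stop early once dev loss rises" rule can be sketched as a patience-based check over per-epoch dev losses. The function name and the `patience` and `min_delta` parameters are illustrative assumptions; the learning rates are the starting points quoted in item 4, not tuned values.

```python
# Starting learning rates quoted in item 4 (starting points, not tuned values).
STARTING_LR = {"full_ft": 1e-5, "lora": 2e-4}

def early_stop_epoch(dev_losses, patience=1, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops once dev loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs; keep the checkpoint
    from best_epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(dev_losses) - 1
```

For dev losses `[1.0, 0.8, 0.85, 0.9]` with `patience=1` this returns `(1, 2)`: keep the epoch-1 checkpoint and stop after epoch 2.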
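For item 5, a hypothetical helper picks which parameters to freeze from their names. It assumes Hugging Face Llama-style names such as `model.layers.12.mlp.up_proj.weight` and `model.embed_tokens.weight`; other architectures name their layers differently. With PyTorch you would then iterate `model.named_parameters()` and set `requires_grad = False` on every returned name.

```python
import re

# Assumed naming scheme (Hugging Face Llama-style), e.g.
#   "model.embed_tokens.weight", "model.layers.12.mlp.up_proj.weight", "lm_head.weight"
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")

def params_to_freeze(param_names, num_layers, freeze_frac=0.6, freeze_embeddings=True):
    """Return the names of parameters in the bottom `freeze_frac` of transformer layers
    (plus the input embeddings, which sit below layer 0, if requested)."""
    n_frozen = round(num_layers * freeze_frac)
    frozen = set()
    for name in param_names:
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) < n_frozen:
            frozen.add(name)
        elif freeze_embeddings and "embed_tokens" in name:
            frozen.add(name)
    return frozen
```

Selecting by name rather than by position keeps the choice explicit and easy to log, which helps when checking that the intended 50–70% were actually frozen.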
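Item 6 can be sketched as a batch sampler that mixes tasks inside every batch, so each gradient step sees several tasks. The sampling rule here, temperature-scaled mixing with probabilities proportional to n_i^(1/T), is an assumption (one common choice), not something the text prescribes; `task_sampling_probs` and `mixed_batches` are hypothetical names.

```python
import random

def task_sampling_probs(task_sizes, temperature=2.0):
    """p_i proportional to n_i ** (1 / T): T=1 is size-proportional,
    larger T moves toward uniform so small tasks are not drowned out."""
    weights = {task: n ** (1.0 / temperature) for task, n in task_sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}

def mixed_batches(datasets, batch_size, num_batches, temperature=2.0, seed=0):
    """Yield batches of (task, example) pairs where each slot's task is
    sampled independently, so every batch can mix several tasks."""
    rng = random.Random(seed)
    probs = task_sampling_probs({t: len(d) for t, d in datasets.items()}, temperature)
    tasks = list(probs)
    weights = [probs[t] for t in tasks]
    for _ in range(num_batches):
        chosen = rng.choices(tasks, weights=weights, k=batch_size)
        yield [(task, rng.choice(datasets[task])) for task in chosen]
```

With sizes 100 and 400, T=1 gives probabilities 0.2/0.8, while T=2 flattens them to 1/3 and 2/3.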
Evaluation must include: multi-domain benchmarks (MMLU, HellaSwag, HumanEval) before and after fine-tuning; a task-specific test set; and tracking of the "alignment tax" (does the model still refuse harmful requests and keep its tone?).
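The before/after comparison can be automated with a small, hypothetical checker that flags every score that dropped by more than a chosen threshold. It assumes higher-is-better scores on a common scale (e.g. accuracy in percentage points); an alignment check such as the refusal rate on a fixed set of harmful prompts can be tracked as just another entry.

```python
def regression_report(before, after, max_drop=1.0):
    """Return {benchmark: drop} for every score that fell by more than `max_drop`.

    Assumes higher is better and both dicts use the same scale
    (e.g. accuracy in percentage points).
    """
    flagged = {}
    for name, score_before in before.items():
        if name not in after:
            raise KeyError(f"no post-fine-tune score for {name!r}")
        drop = score_before - after[name]
        if drop > max_drop:
            flagged[name] = drop
    return flagged
```

A typical use is to fail the fine-tuning pipeline, or at least block a release, whenever the returned dict is non-empty.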
Rule of thumb: to extend the domain knowledge of a general-purpose LLM, prefer PEFT + rehearsal. Use full fine-tuning only when you have the infrastructure and can tolerate some capability loss.