
Learn transfer learning - the technique of leveraging knowledge from pretrained models to solve new tasks with less data and training time.

Tags: transfer-learning, pretrained-models, fine-tuning, computer-vision, nlp

Transfer Learning

Transfer learning uses knowledge gained from solving one problem to help solve a different but related problem. It's one of the most powerful techniques in modern deep learning.

Why Transfer Learning?

The Problem

  • Deep learning needs lots of data
  • Labeling data is expensive
  • Training from scratch takes time and compute
  • Many tasks have limited data

The Solution

  • Start with a model pretrained on a large dataset
  • Adapt it to your specific task
  • Achieve good results with less data and time

How It Works

The Intuition

Models learn hierarchical features:

Images (CNN):
  Layer 1: Edges, colors
  Layer 2: Textures, patterns
  Layer 3: Parts (eyes, wheels)
  Layer 4: Objects
  Layer 5: Scenes

Text (Transformer):
  Lower layers: Syntax, grammar
  Middle layers: Semantics
  Upper layers: Task-specific

Lower layers learn general features that transfer across tasks!
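
You can see this stage structure directly in a pretrained backbone. A quick sketch (assuming torchvision >= 0.13) that prints the top-level stages of a ResNet-50:

import torchvision.models as models

# Print the top-level stages of a ResNet-50; early stages capture
# generic edges and textures, later stages capture object-level features
resnet = models.resnet50(weights=None)
for name, module in resnet.named_children():
    print(name)  # conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc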

Transfer Learning Strategies

Strategy 1: Feature Extraction

Use the pretrained model as a fixed feature extractor:

import torch.nn as nn

# Freeze all pretrained parameters
for param in pretrained_model.parameters():
    param.requires_grad = False

# Add a new classifier head on top of the frozen backbone
# (assumes the backbone outputs a flat feature vector of size `features`)
model = nn.Sequential(
    pretrained_model,
    nn.Linear(features, num_classes)
)

# Only the new head is trained

When to use:

  • Very small dataset
  • Target task very similar to source
  • Limited compute

Strategy 2: Fine-tuning

Unfreeze some or all layers and train with a small learning rate:

from torch.optim import Adam

# Unfreeze the last block (`layer4` in ResNet naming)
for param in pretrained_model.layer4.parameters():
    param.requires_grad = True

# Optimize only the trainable parameters, with a small learning rate
optimizer = Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5
)

When to use:

  • Moderate dataset size
  • Have compute resources
  • Target task somewhat different from source

Strategy 3: Gradual Unfreezing

Unfreeze layers progressively during training:

Epoch 1-2: Train only head
Epoch 3-4: Unfreeze last layer, train
Epoch 5-6: Unfreeze more layers, train
...

Prevents catastrophic forgetting of pretrained features.
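
A minimal sketch of this schedule for a ResNet-style model; the train_for helper is hypothetical, and in practice you would rebuild or update the optimizer after each unfreezing step so the newly trainable parameters are included:

# Phase 1 (epochs 1-2): train only the head
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
train_for(model, epochs=2)  # hypothetical training helper

# Phase 2 (epochs 3-4): also unfreeze the last block
for param in model.layer4.parameters():
    param.requires_grad = True
train_for(model, epochs=2)

# Phase 3 (epochs 5-6): unfreeze the next block, and so on
for param in model.layer3.parameters():
    param.requires_grad = True
train_for(model, epochs=2)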

Fine-tuning Best Practices

Learning Rate

# Lower learning rate than training from scratch
lr = 1e-5  # typical range 1e-5 to 1e-4, vs ~1e-3 from scratch

# Discriminative learning rates (a different lr per parameter group)
optimizer = Adam([
    {'params': model.base.parameters(), 'lr': 1e-5},  # pretrained backbone
    {'params': model.head.parameters(), 'lr': 1e-3}   # new head
])

Data Augmentation

from torchvision import transforms

# Match normalization to the pretrained model's training data
transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                     std=[0.229, 0.224, 0.225])

Batch Size

  • With frozen layers you can often use larger batch sizes (no gradients are stored for the backbone)
  • When fine-tuning, reduce the batch size (gradients and optimizer state for unfrozen layers consume more memory)

Computer Vision Transfer Learning

Popular Pretrained Models

Model          Pretrained On           Use Case
ResNet         ImageNet                General, good baseline
EfficientNet   ImageNet                Efficient, accurate
ViT            ImageNet-21k            Strong with more data
CLIP           400M image-text pairs   Zero-shot, multimodal
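
For reference, all four can be loaded in a few lines. This sketch assumes torchvision >= 0.13 and the transformers library; note that torchvision's default ViT weights are ImageNet-1k, while the original ViT paper pretrained on ImageNet-21k:

import torchvision.models as models
from transformers import CLIPModel

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
clip = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')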

Example: Image Classification

import torchvision.models as models
import torch.nn as nn

# Load pretrained ResNet (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Replace the final layer for your task
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, num_classes)

# Feature extraction: freeze the backbone, train only the new head
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
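
A minimal training loop for this setup, assuming train_loader is a DataLoader yielding (image, label) batches:

import torch.nn as nn
from torch.optim import Adam

criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.fc.parameters(), lr=1e-3)  # only the new head

# Note: frozen BatchNorm layers still update running stats in train() mode
model.train()
for images, labels in train_loader:  # assumed DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()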

NLP Transfer Learning

Evolution

Word2Vec (2013): Pretrained word embeddings
↓
ELMo (2018): Contextualized embeddings
↓
BERT (2018): Pretrained transformers
↓
GPT-3 (2020): Massive pretrained models
↓
ChatGPT (2022): Instruction-tuned LLMs

Modern NLP Pattern

import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Load pretrained BERT
bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Add a task-specific head. BERT returns a ModelOutput rather than a
# plain tensor, so it can't be chained with nn.Sequential; wrap it instead.
class Classifier(nn.Module):
    def __init__(self, bert, num_classes):
        super().__init__()
        self.bert = bert
        self.head = nn.Linear(bert.config.hidden_size, num_classes)  # 768 for bert-base

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.last_hidden_state[:, 0])  # [CLS] token

classifier = Classifier(bert, num_classes)

# Fine-tune on your data
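
A quick usage sketch; the example texts are made up, and num_classes is whatever your task defines:

# Tokenize a small batch and run a forward pass
batch = tokenizer(['great movie', 'terrible plot'],
                  padding=True, truncation=True, return_tensors='pt')
logits = classifier(batch['input_ids'], batch['attention_mask'])
print(logits.shape)  # (2, num_classes)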

When Transfer Learning Helps

Dataset Size Guidelines

Your Data           Strategy
Very small (<1K)    Feature extraction only
Small (1K-10K)      Fine-tune top layers
Medium (10K-100K)   Fine-tune most/all layers
Large (>100K)       Fine-tune all, or train from scratch

Domain Similarity

Similarity         Approach
Very similar       Feature extraction works well
Somewhat similar   Fine-tune top layers
Different          Fine-tune more layers, or train from scratch

Negative Transfer

When transfer learning hurts:

  • Source and target domains too different
  • Pretrained model captures irrelevant features
  • Target dataset very large (doesn't need transfer)

Signs:

  • Fine-tuned model worse than from-scratch
  • Training loss doesn't decrease

Solutions:

  • Try different pretrained model
  • Fine-tune fewer layers
  • Train from scratch

Key Takeaways

  1. Transfer learning leverages pretrained models for new tasks
  2. Lower layers learn general features that transfer well
  3. Feature extraction: freeze model, train new head
  4. Fine-tuning: unfreeze layers, use small learning rate
  5. More data and a more distant source domain → unfreeze and fine-tune more layers
  6. Essential for modern CV and NLP (BERT, ResNet, etc.)