Beginner · Deep Learning

Learn data augmentation: techniques that artificially expand training data by creating modified versions of existing samples.

data-augmentation · regularization · computer-vision · training

Data Augmentation

Data augmentation creates new training samples by applying transformations to existing data. It's one of the most effective techniques for improving model generalization, especially with limited data.

Why Augmentation Works

The Intuition

  • Models should be invariant to certain transformations
  • A rotated cat is still a cat
  • Augmentation teaches these invariances
  • Effectively increases dataset size

Benefits

  1. Reduces overfitting: More diverse training data
  2. Improves generalization: Learns true patterns, not artifacts
  3. Handles data scarcity: Multiplies effective dataset size
  4. Encodes invariances: Makes model robust to transformations

Image Augmentation

Geometric Transformations

import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15),
    A.RandomCrop(height=224, width=224),
    A.Perspective(p=0.3),
])

Common transforms:

  • Flip (horizontal/vertical)
  • Rotation
  • Scaling
  • Translation (shift)
  • Cropping
  • Perspective/affine transforms

Color Transformations

transform = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.HueSaturationValue(p=0.5),
    A.RGBShift(p=0.3),
    A.CLAHE(p=0.3),  # Contrast Limited Adaptive Histogram Equalization
    A.ToGray(p=0.1),
])

Noise and Blur

transform = A.Compose([
    A.GaussianBlur(blur_limit=3, p=0.3),
    A.MotionBlur(p=0.2),
    A.GaussNoise(p=0.3),
    A.ISONoise(p=0.2),
])

Cutout / Random Erasing

transform = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=16, max_width=16, p=0.5),
])

Randomly masks out rectangular regions, forcing the model to rely on multiple features rather than any single region.

MixUp

Blend two images and their labels:

lambda_ = np.random.beta(alpha, alpha)
image = lambda_ * image1 + (1 - lambda_) * image2
label = lambda_ * label1 + (1 - lambda_) * label2

Labels become soft (e.g., 0.7 cat, 0.3 dog).
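The blending above can be sketched end to end. This is a minimal numpy version; the `mixup` function name and the toy one-hot labels are illustrative, not from any library.

```python
import numpy as np

def mixup(image1, label1, image2, label2, alpha=0.2, rng=None):
    """Blend two samples; labels should be one-hot (or soft) vectors."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    image = lam * image1 + (1 - lam) * image2
    label = lam * label1 + (1 - lam) * label2
    return image, label

# Two dummy 4x4 grayscale "images" with one-hot labels
img_cat, img_dog = np.ones((4, 4)), np.zeros((4, 4))
y_cat, y_dog = np.array([1.0, 0.0]), np.array([0.0, 1.0])
img, y = mixup(img_cat, y_cat, img_dog, y_dog)
# y is now a soft label whose entries still sum to 1
```

A small `alpha` (e.g. 0.2) keeps most mixes close to one of the two originals; `alpha = 1.0` makes the mixing coefficient uniform.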

CutMix

Paste patch from one image onto another:

# Cut patch from image2, paste onto image1
image1[y1:y2, x1:x2] = image2[y1:y2, x1:x2]
# lambda_ = fraction of image1's area left unchanged
label = lambda_ * label1 + (1 - lambda_) * label2
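A runnable numpy sketch of the idea, with the mixing coefficient recomputed from the actual patch area (the `cutmix` function name and patch-sampling details are illustrative):

```python
import numpy as np

def cutmix(image1, label1, image2, label2, alpha=1.0, rng=None):
    """Paste a random patch of image2 onto image1; mix labels by area."""
    rng = rng or np.random.default_rng()
    h, w = image1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # Patch sized so its area fraction is roughly (1 - lam)
    ph = int(h * np.sqrt(1 - lam))
    pw = int(w * np.sqrt(1 - lam))
    y1 = rng.integers(0, h - ph + 1)
    x1 = rng.integers(0, w - pw + 1)
    out = image1.copy()
    out[y1:y1 + ph, x1:x1 + pw] = image2[y1:y1 + ph, x1:x1 + pw]
    # Recompute lambda from the area actually pasted
    lam_adj = 1 - (ph * pw) / (h * w)
    label = lam_adj * label1 + (1 - lam_adj) * label2
    return out, label

img, y = cutmix(np.ones((32, 32)), np.array([1.0, 0.0]),
                np.zeros((32, 32)), np.array([0.0, 1.0]))
```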

AutoAugment / RandAugment

Learned or random sequences of augmentations:

from torchvision.transforms import RandAugment
transform = RandAugment(num_ops=2, magnitude=9)

Text Augmentation

Synonym Replacement

"The quick brown fox" → "The fast brown fox"

Random Insertion

"The quick fox" → "The very quick fox"

Random Swap

"I love cats" → "cats love I"

Random Deletion

"The quick brown fox" → "The brown fox"

Back-Translation

# English → French → English
"Hello world" → "Bonjour tout le monde" → "Hello everyone"

EDA (Easy Data Augmentation)

from eda import eda

augmented = eda(sentence, alpha_sr=0.1, alpha_ri=0.1, alpha_rs=0.1, p_rd=0.1, num_aug=4)

LLM-based Augmentation

prompt = f"Paraphrase this sentence: '{sentence}'"
augmented = llm.generate(prompt)

Audio Augmentation

# Time stretching
audio_stretched = librosa.effects.time_stretch(audio, rate=1.2)

# Pitch shifting
audio_shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=4)

# Add noise
audio_noisy = audio + 0.005 * np.random.randn(len(audio))

# Time masking (SpecAugment)
spec[t1:t2, :] = 0

# Frequency masking (SpecAugment)
spec[:, f1:f2] = 0
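The two SpecAugment masks above fit into one small function. This is a numpy sketch assuming a `(time, freq)` spectrogram layout; the function name and mask-width limits are illustrative:

```python
import numpy as np

def spec_augment(spec, max_t=10, max_f=8, rng=None):
    """Zero out one random time band and one frequency band of a spectrogram."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    t_len, f_len = spec.shape
    t = rng.integers(0, max_t + 1)            # time-mask width
    t0 = rng.integers(0, t_len - t + 1)
    spec[t0:t0 + t, :] = 0                    # time mask
    f = rng.integers(0, max_f + 1)            # frequency-mask width
    f0 = rng.integers(0, f_len - f + 1)
    spec[:, f0:f0 + f] = 0                    # frequency mask
    return spec

masked = spec_augment(np.ones((100, 80)))     # 100 frames x 80 mel bins
```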

Tabular Data Augmentation

Less common but possible:

SMOTE (for imbalanced data)

from imblearn.over_sampling import SMOTE
X_aug, y_aug = SMOTE().fit_resample(X, y)

Noise Injection

X_aug = X + np.random.normal(0, 0.01, X.shape)

Feature Mixup

# Similar to image MixUp
X_aug = lambda_ * X1 + (1 - lambda_) * X2
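Noise injection and feature mixup can both be applied vectorized over a whole design matrix. A minimal numpy sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 rows, 5 features

# Noise injection: small Gaussian jitter on every feature
X_noisy = X + rng.normal(0, 0.01, X.shape)

# Feature mixup: blend each row with a randomly permuted partner row
lam = rng.beta(0.4, 0.4, size=(100, 1))    # per-row mixing coefficients
perm = rng.permutation(100)
X_mix = lam * X + (1 - lam) * X[perm]
```

Note that noise injection only makes sense for continuous features; categorical columns need different handling (e.g. SMOTE variants such as SMOTENC).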

Best Practices

Match Domain

# Medical images: Usually no horizontal flip (anatomy matters)
# Satellite images: All rotations valid
# Text: Back-translation better than random swaps

Don't Augment Validation Set

train_transform = A.Compose([...augmentations...])
val_transform = A.Compose([A.Resize(224, 224)])  # No augmentation!

Augmentation Intensity

  • Start mild, increase if overfitting
  • Too strong can hurt performance
  • Task-dependent (classification vs detection)

Online vs Offline

Online (during training):

for batch in dataloader:
    augmented = transform(batch)  # Different each epoch

Offline (precompute):

for img in dataset:
    for i in range(5):
        save(transform(img))

Online is usually preferred (more diversity).
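The online loop can be made concrete with a toy transform. This numpy sketch stands in for a real augmentation pipeline; the random left-right flip is just a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = [rng.random((8, 8)) for _ in range(4)]   # four toy "images"

def transform(img):
    """Randomly flip left-right; re-sampled on every call."""
    return img[:, ::-1] if rng.random() < 0.5 else img

for epoch in range(3):
    for img in dataset:            # each epoch sees different versions
        augmented = transform(img)
```

Because the randomness is drawn fresh each epoch, the model effectively never sees the exact same batch twice, whereas the offline variant fixes a finite set of augmented copies up front.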

Common Pipeline

import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.RandomResizedCrop(224, 224, scale=(0.8, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    A.GaussNoise(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])

val_transform = A.Compose([
    A.Resize(256, 256),
    A.CenterCrop(224, 224),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])

Key Takeaways

  1. Augmentation artificially expands training data
  2. Teaches model invariances to transformations
  3. Image: flips, rotations, color jitter, cutout, mixup
  4. Text: synonym replacement, back-translation
  5. Don't augment validation/test sets
  6. Match augmentation to domain constraints
  7. Online augmentation provides more diversity