Generative Adversarial Networks (GANs)
GANs are a framework for training generative models through an adversarial game between two networks. They revolutionized synthetic data generation, especially for images.
The Adversarial Game
Two Players
- Generator (G): Creates fake samples
- Discriminator (D): Distinguishes real from fake
Noise z → [Generator] → Fake sample ──┐
                                      ├─→ [Discriminator] → Real or Fake?
Real sample ──────────────────────────┘
The Competition
- G tries to fool D
- D tries to catch G
- Both improve through competition
Training Objective
Minimax Game
min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]
Discriminator maximizes:
- log D(x): High score for real samples
- log(1 - D(G(z))): Low score for fake samples
Generator minimizes:
- log(1 - D(G(z))): Make D score fake as real
Alternative Generator Objective
In practice, the generator maximizes log D(G(z)) (the "non-saturating" loss) instead of minimizing log(1 - D(G(z))).
This gives much stronger gradients early in training, when D confidently rejects G's samples and log(1 - D(G(z))) saturates near zero gradient.
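Both objectives map directly onto binary cross-entropy against real/fake targets. A minimal PyTorch sketch (the scalar scores below are illustrative values, not output of a trained model):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def d_loss(d_real, d_fake):
    # Discriminator: maximize log D(x) + log(1 - D(G(z)))
    # = minimize BCE against target 1 for real, 0 for fake
    return bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

def g_loss_nonsaturating(d_fake):
    # Generator (non-saturating): maximize log D(G(z))
    # = minimize BCE against target 1
    return bce(d_fake, torch.ones_like(d_fake))

d_real = torch.tensor([0.9])  # D's score on a real sample
d_fake = torch.tensor([0.1])  # D's score on a fake sample
print(d_loss(d_real, d_fake))            # ≈ 0.21 (D is doing well)
print(g_loss_nonsaturating(d_fake))      # ≈ 2.30 (strong gradient signal for G)
```

Note how the fooled-discriminator case (d_fake near 0) gives the generator a large loss, and hence a large gradient, under the non-saturating formulation.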
Training Algorithm
```python
for epoch in range(num_epochs):
    # Train Discriminator (d_steps per generator update)
    for _ in range(d_steps):
        real = sample_real_data()
        fake = G(sample_noise()).detach()  # detach: don't backprop into G here
        d_loss = -(torch.log(D(real)) + torch.log(1 - D(fake))).mean()
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

    # Train Generator (non-saturating loss)
    fake = G(sample_noise())
    g_loss = -torch.log(D(fake)).mean()
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
```
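The loop above can be made fully runnable on a toy problem. In the sketch below, a tiny GAN learns to mimic a 1D Gaussian; all module sizes, step counts, and hyperparameters are illustrative choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

for step in range(500):
    real = 3.0 + 0.5 * torch.randn(64, 1)           # target distribution: N(3, 0.5²)
    fake = G(torch.randn(64, 8)).detach()           # don't update G on D's step
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))        # non-saturating generator loss
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = G(torch.randn(1000, 8))
print(samples.mean().item())  # should drift toward the real mean of 3.0
```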
Architecture
Basic GAN
```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, output_dim),
            nn.Tanh()  # output in [-1, 1], matching data normalized to that range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # probability that the input is real
        )

    def forward(self, x):
        return self.net(x)
```
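A quick shape sanity check. The Sequential stacks below mirror the classes above (re-declared inline so the snippet runs standalone); latent_dim = 100 and a flattened 28×28 = 784-dim output are illustrative:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(16, 100)         # batch of 16 latent vectors
fake = G(z)                      # (16, 784), values in [-1, 1] from Tanh
scores = D(fake)                 # (16, 1), probabilities in (0, 1) from Sigmoid
print(fake.shape, scores.shape)  # torch.Size([16, 784]) torch.Size([16, 1])
```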
DCGAN (Deep Convolutional)
For images, use transposed convolutions:
```python
# Generator: upsample with transposed convolutions
nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)  # 4×4 → 8×8

# Discriminator: downsample with strided convolutions
nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)  # 64×64 → 32×32
```
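The (kernel=4, stride=2, padding=1) combination exactly doubles or halves spatial size, which can be verified directly:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 512, 4, 4)
print(up(x).shape)      # torch.Size([1, 256, 8, 8])  — generator doubles H, W

img = torch.randn(1, 3, 64, 64)
print(down(img).shape)  # torch.Size([1, 64, 32, 32]) — discriminator halves H, W
```

The arithmetic: transposed conv output is (H−1)·stride − 2·padding + kernel = (4−1)·2 − 2 + 4 = 8; regular conv output is ⌊(H + 2·padding − kernel)/stride⌋ + 1 = (64 + 2 − 4)/2 + 1 = 32.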
GAN Variants
Conditional GAN (cGAN)
Condition on class label:
G(z, y) → image of class y
D(x, y) → real image of class y?
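A common way to implement the conditioning is to embed the label and concatenate it with the network's input. A minimal generator-side sketch (embedding size and layer widths are illustrative choices):

```python
import torch
import torch.nn as nn

num_classes, latent_dim, img_dim = 10, 100, 784

class CondGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Condition by concatenating the label embedding to the noise vector
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = CondGenerator()
z = torch.randn(4, latent_dim)
y = torch.tensor([0, 1, 2, 3])  # class labels to condition on
print(G(z, y).shape)            # torch.Size([4, 784])
```

The discriminator is conditioned the same way: embed y and concatenate it with (a flattened) x before the first layer.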
Wasserstein GAN (WGAN)
Use Wasserstein distance instead of JS divergence:
Critic objective: max_D E[D(real)] - E[D(fake)]  (estimates the Wasserstein distance; the generator maximizes E[D(fake)])
- No sigmoid in discriminator
- Clip weights or use gradient penalty
- More stable training
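The gradient penalty variant (WGAN-GP) enforces the critic's 1-Lipschitz constraint by penalizing its gradient norm on interpolates between real and fake samples. A sketch, with a stand-in linear critic for demonstration:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Random interpolates between real and fake samples
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    # Penalize deviation of the per-sample gradient norm from 1
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Linear(3, 1)  # stand-in critic (note: no sigmoid)
gp = gradient_penalty(critic, torch.randn(8, 3), torch.randn(8, 3))
```

The penalty is added to the critic loss E[D(fake)] - E[D(real)]; `create_graph=True` is needed so the penalty itself can be backpropagated.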
StyleGAN
State-of-the-art image generation:
- Mapping network: z → w
- Style injection at each layer
- Progressive growing
CycleGAN
Unpaired image-to-image translation:
Horse → Zebra (without paired examples)
Pix2Pix
Paired image-to-image translation:
Sketch → Photo (with paired training data)
Training Challenges
Mode Collapse
Problem: Generator produces limited variety
Solutions:
- Mini-batch discrimination
- Unrolled GANs
- Wasserstein loss
Training Instability
Problem: Oscillating or diverging loss
Solutions:
- Two-timescale update (TTUR)
- Spectral normalization
- Progressive training
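Spectral normalization is available directly in PyTorch as a wrapper that keeps each layer's largest singular value near 1, bounding the layer's Lipschitz constant:

```python
import torch
import torch.nn as nn

# Wrap discriminator layers; the weight matrix is rescaled by its
# spectral norm (largest singular value) on every forward pass.
layer = nn.utils.spectral_norm(nn.Linear(784, 512))

x = torch.randn(2, 784)
print(layer(x).shape)  # torch.Size([2, 512])
```

Typically every linear/conv layer of the discriminator is wrapped this way (as in SNGAN).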
Vanishing Gradients
Problem: D too good → no gradient for G
Solutions:
- Alternative G loss: max log D(G(z))
- Wasserstein loss
- Label smoothing
Training Tips
Architecture
- Use LeakyReLU in D
- Use BatchNorm (but not in D input layer)
- Tanh output for G; Sigmoid output for D (omit it for a WGAN critic)
Hyperparameters
```python
lr_G = lr_D = 0.0002   # 2e-4, the standard DCGAN learning rate
betas = (0.5, 0.999)   # Adam; beta1 = 0.5 instead of the default 0.9
latent_dim = 100
batch_size = 64
```
Best Practices
- Train D more than G (1-5 steps)
- Use label smoothing (0.9 instead of 1.0)
- Add noise to real images
- Monitor generated samples visually
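Two of these tricks in code (the smoothing value 0.9 and noise scale 0.05 are typical but arbitrary choices, and the random batch stands in for real images):

```python
import torch

batch_size = 64
real_images = torch.randn(batch_size, 784)  # stand-in for a real batch

# Label smoothing: target 0.9 for real samples instead of 1.0,
# preventing D from becoming overconfident
real_targets = torch.full((batch_size, 1), 0.9)

# Instance noise: add small Gaussian noise to real inputs before D sees them
noisy_real = real_images + 0.05 * torch.randn_like(real_images)
print(real_targets.mean().item(), noisy_real.shape)
```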
Evaluation
Inception Score (IS)
IS = exp(E_x[KL(p(y|x) || p(y))])
Higher = better quality and diversity.
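Given a matrix of per-sample class posteriors p(y|x) (from an Inception classifier in practice), the score is computable in a few lines; the 4-sample, 3-class probability matrix here is illustrative:

```python
import torch

# Rows: p(y|x) for each generated sample (4 samples, 3 classes)
p_yx = torch.tensor([[0.90, 0.05, 0.05],
                     [0.05, 0.90, 0.05],
                     [0.05, 0.05, 0.90],
                     [0.90, 0.05, 0.05]])
p_y = p_yx.mean(dim=0)  # marginal class distribution over all samples

# IS = exp( mean over x of KL(p(y|x) || p(y)) )
kl = (p_yx * (p_yx.log() - p_y.log())).sum(dim=1)
inception_score = kl.mean().exp()
print(inception_score.item())
```

Confident per-sample predictions (sharp p(y|x)) plus a broad marginal p(y) give a high score; IS is bounded between 1 and the number of classes.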
Fréchet Inception Distance (FID)
FID = ||μ_real - μ_fake||² + Tr(Σ_real + Σ_fake - 2√(Σ_real Σ_fake))
Lower = closer to real distribution.
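FID can be sketched with NumPy/SciPy directly from feature statistics; here random vectors stand in for Inception activations of real and generated images:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return np.sum((mu_r - mu_f) ** 2) + np.trace(sigma_r + sigma_f - 2 * covmean)

rng = np.random.default_rng(0)
same = rng.normal(size=(500, 8))
shifted = rng.normal(loc=2.0, size=(500, 8))
print(fid(same, same))     # ≈ 0 (identical distributions)
print(fid(same, shifted))  # large (means differ by 2 in every dimension)
```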
Applications
- Image generation: Faces, art, fashion
- Image-to-image: Style transfer, super-resolution
- Data augmentation: Generate training data
- Video synthesis: Deepfakes (ethically concerning)
- 3D generation: NeRF + GANs
GAN vs VAE vs Diffusion
| Aspect | GAN | VAE | Diffusion |
|---|---|---|---|
| Training | Adversarial | Reconstruction + KL | Denoising |
| Sample quality | High | Medium | Highest |
| Mode coverage | Can collapse | Good | Good |
| Training stability | Hard | Easy | Easy |
| Latent space | No guarantee | Structured | Implicit |
Key Takeaways
- GANs: Generator vs Discriminator game
- Generator creates, Discriminator judges
- Minimax objective drives both to improve
- Training is challenging (mode collapse, instability)
- WGAN improves stability with Wasserstein loss
- Largely superseded by diffusion models for image generation