Advanced · Deep Learning

Understand GANs - a framework where two networks compete to generate realistic synthetic data.

gans · generative · adversarial · image-generation

Generative Adversarial Networks (GANs)

GANs are a framework for training generative models through an adversarial game between two networks. They revolutionized synthetic data generation, especially for images.

The Adversarial Game

Two Players

  • Generator (G): Creates fake samples
  • Discriminator (D): Distinguishes real from fake
Noise z → [Generator] → Fake sample
                              ↓
Real sample ─────────────→ [Discriminator] → Real or Fake?

The Competition

  • G tries to fool D
  • D tries to catch G
  • Both improve through competition

Training Objective

Minimax Game

min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]

Discriminator maximizes:

  • log D(x): High score for real samples
  • log(1 - D(G(z))): Low score for fake samples

Generator minimizes:

  • log(1 - D(G(z))): Make D score fake as real

Alternative Generator Objective

In practice, use:

max log D(G(z))  instead of  min log(1 - D(G(z)))

Better gradients early in training.
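The gradient argument can be checked numerically. When D confidently rejects a fake (D(G(z)) ≈ 0), the saturating loss gives almost no gradient with respect to the discriminator's logit, while the non-saturating loss does. A small sketch:

```python
import torch

# Discriminator logit for a fake that D confidently rejects:
# sigmoid(-5) ≈ 0.007, so D(G(z)) ≈ 0.
logit = torch.tensor([-5.0], requires_grad=True)

# Saturating generator loss: minimize log(1 - D(G(z)))
loss_sat = torch.log(1 - torch.sigmoid(logit))
loss_sat.backward()
grad_sat = logit.grad.item()   # ≈ -0.007: gradient vanishes

logit2 = torch.tensor([-5.0], requires_grad=True)
# Non-saturating loss: minimize -log D(G(z))
loss_ns = -torch.log(torch.sigmoid(logit2))
loss_ns.backward()
grad_ns = logit2.grad.item()   # ≈ -0.99: strong learning signal
```

The saturating loss's gradient is proportional to D(G(z)) itself, so it shrinks exactly when the generator most needs a signal.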

Training Algorithm

for epoch in range(num_epochs):
    # Train Discriminator: push D(real) toward 1, D(fake) toward 0
    for _ in range(d_steps):
        real = sample_real_data()
        fake = G(sample_noise()).detach()  # detach: don't backprop into G

        d_loss = -(torch.log(D(real)) + torch.log(1 - D(fake))).mean()
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

    # Train Generator: non-saturating loss, push D(G(z)) toward 1
    fake = G(sample_noise())
    g_loss = -torch.log(D(fake)).mean()
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
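The loop above uses placeholder helpers (`sample_real_data`, `sample_noise`). A fully runnable toy version of the same loop on 1-D data, with illustrative architectures and hyperparameters, looks like this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: learn to generate samples from N(3, 1) given N(0, 1) noise.
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.LeakyReLU(0.2),
                  nn.Linear(16, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 3 + torch.randn(64, 1)
    fake = G(torch.randn(64, 1))

    # Discriminator step: real -> 1, fake -> 0 (detach so G is untouched)
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: non-saturating loss, push D(fake) toward 1
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

mean = G(torch.randn(1000, 1)).mean().item()  # should drift toward 3
```

Note the `BCELoss` calls are just the `-log D(...)` terms from the pseudocode written with library primitives.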

Architecture

Basic GAN

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, output_dim),
            nn.Tanh()  # Output in [-1, 1]
        )
    
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.net(x)
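A quick shape check of these modules, written here as equivalent `nn.Sequential` stacks (the 784 = 28×28 flattened-image size is an illustrative choice):

```python
import torch
import torch.nn as nn

# Compact equivalents of the Generator/Discriminator above
G = nn.Sequential(nn.Linear(100, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(16, 100)   # batch of 16 latent vectors
fake = G(z)                # -> (16, 784), values in [-1, 1] via Tanh
score = D(fake)            # -> (16, 1), values in (0, 1) via Sigmoid
```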

DCGAN (Deep Convolutional)

For images, replace the linear layers with convolutions: transposed convolutions upsample in the generator, strided convolutions downsample in the discriminator:

# Generator: upsample
nn.ConvTranspose2d(512, 256, 4, 2, 1)  # 4x4 → 8x8

# Discriminator: downsample
nn.Conv2d(3, 64, 4, 2, 1)  # 64x64 → 32x32
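The spatial-size arithmetic of those two layers can be verified directly:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 512, 4, 4)
y = up(x)          # 4x4 -> 8x8:   out = (in - 1)*stride - 2*pad + kernel
img = torch.randn(1, 3, 64, 64)
feat = down(img)   # 64x64 -> 32x32: out = (in + 2*pad - kernel)//stride + 1
```

Stacking such layers repeatedly doubles (or halves) the resolution until the target image size is reached.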

GAN Variants

Conditional GAN (cGAN)

Condition on class label:

G(z, y) → image of class y
D(x, y) → real image of class y?
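One common way to implement the conditioning is to embed the label y and concatenate it with the noise vector. A sketch (dimensions are illustrative, not from a specific paper):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, embed_dim=32, output_dim=784):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, output_dim), nn.Tanh())

    def forward(self, z, y):
        # Concatenate noise with the label embedding -> class-controlled output
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = ConditionalGenerator()
z = torch.randn(8, 100)
y = torch.randint(0, 10, (8,))   # one requested class per sample
img = G(z, y)                    # (8, 784)
```

The discriminator is conditioned the same way, so it judges "real image *of class y*" rather than just "real image".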

Wasserstein GAN (WGAN)

Use Wasserstein distance instead of JS divergence:

Critic objective (maximize): E[D(real)] - E[D(fake)]

  • No sigmoid on the critic's output (it scores samples rather than classifying them)
  • Enforce the Lipschitz constraint: weight clipping (WGAN) or a gradient penalty (WGAN-GP)
  • More stable training, with a loss that correlates with sample quality
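The gradient-penalty variant (WGAN-GP) penalizes the critic's gradient norm away from 1 on points interpolated between real and fake samples. A sketch, with a trivial linear critic standing in for a real one:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP: penalize critic gradient norm away from 1 on interpolates."""
    batch = real.size(0)
    eps = torch.rand(batch, 1)  # per-sample mixing weight in [0, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Illustrative usage on 1-D data
critic = torch.nn.Linear(1, 1)
gp = gradient_penalty(critic, torch.randn(32, 1), torch.randn(32, 1))
```

In training, `gp` is simply added to the critic loss; `create_graph=True` is what lets the penalty itself be backpropagated.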

StyleGAN

State-of-the-art image generation:

  • Mapping network: z → w
  • Style injection at each layer
  • Progressive growing
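The mapping network is architecturally simple: an MLP that maps the latent z into an intermediate style space w. A sketch (StyleGAN uses 8 such layers at width 512; only two are shown):

```python
import torch
import torch.nn as nn

# Mapping network z -> w (truncated sketch; 8 layers in the paper)
mapping = nn.Sequential(
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
)

z = torch.randn(4, 512)
w = mapping(z)  # style vector, injected at each synthesis layer
```

Decoupling w from z gives the synthesis network a less entangled space to draw styles from.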

CycleGAN

Unpaired image-to-image translation:

Horse → Zebra (without paired examples)

Pix2Pix

Paired image-to-image translation:

Sketch → Photo (with paired training data)

Training Challenges

Mode Collapse

Problem: Generator produces limited variety

Solutions:

  • Mini-batch discrimination
  • Unrolled GANs
  • Wasserstein loss
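A simplified version of the mini-batch idea: append a batch-wide standard-deviation statistic as an extra input feature, so the discriminator can detect a suspiciously uniform batch. This sketch follows the minibatch-stddev simplification rather than full mini-batch discrimination:

```python
import torch
import torch.nn as nn

class MinibatchStd(nn.Module):
    """Append the batch-wide feature std as one extra feature per sample."""
    def forward(self, x):
        std = x.std(dim=0).mean().expand(x.size(0), 1)  # same scalar for all
        return torch.cat([x, std], dim=1)

layer = MinibatchStd()
diverse = torch.randn(32, 16)     # varied batch  -> large std feature
collapsed = torch.zeros(32, 16)   # identical samples -> std feature is 0
```

If the generator collapses, every batch it produces carries a near-zero std feature, which the discriminator can exploit, pushing the generator back toward diversity.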

Training Instability

Problem: Oscillating or diverging loss

Solutions:

  • Two-timescale update (TTUR)
  • Spectral normalization
  • Progressive training

Vanishing Gradients

Problem: D too good → no gradient for G

Solutions:

  • Alternative G loss: max log D(G(z))
  • Wasserstein loss
  • Label smoothing

Training Tips

Architecture

  • Use LeakyReLU in D
  • Use BatchNorm (but not in D input layer)
  • Tanh output for G, no activation for D (WGAN)

Hyperparameters

lr_G = lr_D = 0.0002
betas = (0.5, 0.999)  # Adam
latent_dim = 100
batch_size = 64
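Wiring those values into the optimizers (G and D here are placeholder modules; the lowered beta1 = 0.5 is the standard DCGAN-style choice to damp Adam's momentum):

```python
import torch
import torch.nn as nn

latent_dim, batch_size = 100, 64
G = nn.Linear(latent_dim, 784)  # placeholders for the real networks
D = nn.Linear(784, 1)

g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))
```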

Best Practices

  • Train D more often than G (1-5 D steps per G step)
  • Use label smoothing (0.9 instead of 1.0)
  • Add noise to real images
  • Monitor generated samples visually

Evaluation

Inception Score (IS)

IS = exp(E[KL(p(y|x) || p(y))])

Higher = better quality and diversity.
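Given per-sample class posteriors p(y|x) (in practice from an Inception classifier), the score is a few lines. Sharp, diverse predictions score high; uniform predictions score exactly 1:

```python
import torch

def inception_score(p_yx):
    """IS = exp(mean_x KL(p(y|x) || p(y))), given rows of class posteriors."""
    p_y = p_yx.mean(dim=0, keepdim=True)               # marginal class dist
    kl = (p_yx * (p_yx.log() - p_y.log())).sum(dim=1)  # KL per sample
    return kl.mean().exp().item()

# Confident AND diverse: each of 10 samples is near-one-hot on its own class
sharp = torch.eye(10) * 0.99 + 0.001
# Uninformative: every sample predicts the uniform distribution
flat = torch.full((10, 10), 0.1)
```

Quality (sharp p(y|x)) and diversity (broad p(y)) both push the KL term up, which is why IS rewards their combination.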

Fréchet Inception Distance (FID)

FID = ||μ_real - μ_fake||² + Tr(Σ_real + Σ_fake - 2√(Σ_real Σ_fake))

Lower = closer to real distribution.
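For intuition, here is the formula specialized to diagonal covariances, where the matrix square root becomes elementwise (real FID uses full covariances of Inception features):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """FID for diagonal-covariance Gaussians (elementwise sqrt of Sigma product)."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    return float(((mu1 - mu2) ** 2).sum()
                 + (var1 + var2 - 2 * np.sqrt(var1 * var2)).sum())

# Identical distributions -> FID 0; a shifted mean -> positive FID
same = fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])     # 0.0
shifted = fid_diagonal([0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [1.0, 1.0])  # 9.0
```

Unlike IS, FID compares against the real data distribution directly, which is why it has largely replaced IS for benchmarking.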

Applications

  • Image generation: Faces, art, fashion
  • Image-to-image: Style transfer, super-resolution
  • Data augmentation: Generate training data
  • Video synthesis: Deepfakes (ethically concerning)
  • 3D generation: NeRF + GANs

GAN vs VAE vs Diffusion

Aspect              GAN           VAE                  Diffusion
Training            Adversarial   Reconstruction + KL  Denoising
Sample quality      High          Medium               Highest
Mode coverage       Can collapse  Good                 Good
Training stability  Hard          Easy                 Easy
Latent space        No guarantee  Structured           Implicit

Key Takeaways

  1. GANs: Generator vs Discriminator game
  2. Generator creates, Discriminator judges
  3. Minimax objective drives both to improve
  4. Training is challenging (mode collapse, instability)
  5. WGAN improves stability with Wasserstein loss
  6. Largely superseded by diffusion models for image generation