Autoencoders
Autoencoders are neural networks that learn to compress data into a lower-dimensional representation and then reconstruct the original input. They're powerful tools for dimensionality reduction, denoising, and generative modeling.
Architecture
Input → [Encoder] → Latent Code → [Decoder] → Reconstruction
x → z → x̂
Components
- Encoder: Compresses input to latent representation
- Latent Space: Compressed representation (bottleneck)
- Decoder: Reconstructs input from latent code
Training Objective
Loss = ||x - x̂||²
Minimize reconstruction error.
Basic Autoencoder
```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()  # assumes inputs are scaled to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat

    def encode(self, x):
        return self.encoder(x)
```
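A minimal training loop for a model like the one above might look as follows (a sketch: the compact `nn.Sequential` model, synthetic data, and hyperparameters all stand in for a real setup):

```python
import torch
import torch.nn as nn

# Compact stand-in for the Autoencoder class above: 784 -> 64 -> 784
model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),    # encoder
    nn.Linear(64, 784), nn.Sigmoid()  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

data = torch.rand(256, 784)  # synthetic stand-in for flattened images

for epoch in range(3):
    for i in range(0, len(data), 32):  # simple mini-batching
        x = data[i:i + 32]
        x_hat = model(x)
        loss = criterion(x_hat, x)     # reconstruction error ||x - x_hat||^2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Note that the target is the input itself: the network is its own supervision signal.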
Types of Autoencoders
Undercomplete Autoencoder
Latent dimension < input dimension:
784 → 64 → 784
Forces compression, learns important features.
Overcomplete Autoencoder
Latent dimension ≥ input dimension:
784 → 1000 → 784
Needs regularization (sparsity, noise, weight penalties) to avoid learning the identity mapping.
Sparse Autoencoder
Add sparsity penalty:
Loss = ||x - x̂||² + λ × sparsity(z)
Most latent units are inactive for any given input.
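One common version uses an L1 penalty on the latent activations (a sketch; the architecture and the weight λ are illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(32, 784)
z = encoder(x)
x_hat = decoder(z)

lam = 1e-3  # sparsity weight λ
recon_loss = ((x - x_hat) ** 2).sum(dim=1).mean()
sparsity_loss = z.abs().sum(dim=1).mean()  # L1 penalty drives most units toward 0
loss = recon_loss + lam * sparsity_loss
```

During training, the L1 term pushes each latent unit toward zero unless it earns its keep by reducing reconstruction error.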
Denoising Autoencoder (DAE)
Train to reconstruct from corrupted input:
x̃ = corrupt(x) # Add noise
z = encoder(x̃)
x̂ = decoder(z)
Loss = ||x - x̂||² # Reconstruct clean!
Learns robust features.
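The corruption step above can be sketched concretely with additive Gaussian noise (the noise level 0.3 and the tiny encoder/decoder are illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)                  # clean input
x_tilde = x + 0.3 * torch.randn_like(x)  # corrupted input
z = encoder(x_tilde)
x_hat = decoder(z)
loss = ((x - x_hat) ** 2).mean()         # compare against the CLEAN input
```

The key detail is in the last line: the loss targets `x`, not `x_tilde`, so the network must learn to undo the corruption.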
Contractive Autoencoder (CAE)
Penalize sensitivity to input:
Loss = ||x - x̂||² + λ × ||∂z/∂x||²
Learns smooth, stable representations.
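The Jacobian penalty ‖∂z/∂x‖² can be computed directly with autograd for a small example (a sketch on a single sample; real implementations use a closed form for efficiency):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(10, 4), nn.Tanh())
decoder = nn.Sequential(nn.Linear(4, 10), nn.Sigmoid())

x = torch.rand(1, 10, requires_grad=True)
z = encoder(x)

# Squared Frobenius norm of the Jacobian dz/dx, one latent unit at a time
jac_norm_sq = 0.0
for i in range(z.shape[1]):
    grad_i, = torch.autograd.grad(z[0, i], x,
                                  retain_graph=True, create_graph=True)
    jac_norm_sq = jac_norm_sq + (grad_i ** 2).sum()

x_hat = decoder(z)
lam = 0.1
loss = ((x - x_hat) ** 2).sum() + lam * jac_norm_sq
loss.backward()  # create_graph=True lets gradients flow through the penalty
```

Penalizing this norm makes the encoding insensitive to small input perturbations, which is what "contractive" refers to.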
Variational Autoencoder (VAE)
Key Difference
Encode to a distribution, not a point:
Encoder → μ, σ (mean and std of a Gaussian; implementations typically output log σ² for numerical stability)
z ~ N(μ, σ²) (sample from distribution)
Decoder(z) → x̂
Loss Function
Loss = Reconstruction + KL Divergence
= ||x - x̂||² + KL(q(z|x) || p(z))
KL term regularizes latent space to be Gaussian.
Reparameterization Trick
To backpropagate through sampling:
z = μ + σ × ε, where ε ~ N(0, 1)
Implementation
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU()
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)  # logvar = log σ², so std = exp(½ log σ²)
        eps = torch.randn_like(std)    # ε ~ N(0, I)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```
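The closed-form KL term used in `vae_loss` can be sanity-checked against `torch.distributions` (the μ and log σ² values here are arbitrary, for illustration only):

```python
import torch
from torch.distributions import Normal, kl_divergence

mu = torch.tensor([[0.5, -1.0]])
logvar = torch.tensor([[0.2, -0.3]])

# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), as in vae_loss
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# The same quantity computed numerically via torch.distributions
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_ref = kl_divergence(q, p).sum()
```

The two agree because, for diagonal Gaussians against a standard normal prior, the KL divergence has the exact closed form used in the loss.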
Applications
Dimensionality Reduction
```python
latent_codes = model.encode(data)
# Use for visualization, clustering
```
Non-linear alternative to PCA.
Denoising
Noisy image → Trained denoiser → Clean image
Anomaly Detection
```python
def detect_anomaly(x, threshold):
    x_hat = model(x)
    error = ((x - x_hat) ** 2).mean()
    return error > threshold  # high reconstruction error = anomaly
```
Generation (VAE)
```python
# Sample from the prior N(0, I)
z = torch.randn(1, latent_dim)
generated = model.decode(z)
```
Data Augmentation
Generate variations of training data.
Convolutional Autoencoders
For images, use conv layers:
```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),   # halves spatial size
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # halves it again
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```
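A quick shape check, assuming 1×28×28 inputs (e.g. MNIST): each stride-2 conv halves the spatial size, and the transposed convs with `output_padding=1` restore it exactly.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
    nn.Sigmoid(),
)

x = torch.rand(8, 1, 28, 28)
z = encoder(x)      # latent feature map: (8, 64, 7, 7)
x_hat = decoder(z)  # reconstruction: (8, 1, 28, 28)
```

`output_padding=1` resolves the ambiguity of inverting a stride-2 convolution, where two different input sizes can map to the same output size.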
Autoencoder vs PCA
| Aspect | PCA | Autoencoder |
|---|---|---|
| Transformation | Linear | Non-linear |
| Training | Closed-form | Gradient descent |
| Interpretability | Eigenvectors | Black box |
| Scalability | Memory intensive | Mini-batch |
With linear activations and MSE loss, an autoencoder learns the same subspace as PCA (though not necessarily the orthogonal eigenvector basis).
Tips for Training
Architecture
- Symmetry helps (encoder/decoder mirror)
- Start simple, add complexity
- Bottleneck size: tune empirically (too small underfits, too large invites the identity map)
Loss Functions
- MSE for real-valued data
- Binary cross-entropy for binary data
- Perceptual loss for images
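The first two choices can be sketched side by side (perceptual loss needs a pretrained feature network and is omitted; the tensors here are random placeholders):

```python
import torch
import torch.nn.functional as F

x = torch.rand(4, 784)      # targets in [0, 1]
x_hat = torch.rand(4, 784)  # reconstructions from a sigmoid output layer

mse = F.mse_loss(x_hat, x)              # real-valued data
bce = F.binary_cross_entropy(x_hat, x)  # binary or [0, 1]-valued data
```

BCE pairs naturally with a sigmoid output layer; MSE is the safer default when outputs are unbounded real values.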
Regularization
- Dropout in encoder
- Noise injection (denoising)
- Weight decay
Key Takeaways
- Autoencoders learn compressed representations
- Encoder → Latent → Decoder architecture
- Training minimizes reconstruction error
- VAEs enable generation by modeling latent distribution
- Applications: compression, denoising, anomaly detection
- For images, use convolutional architectures