Autoencoders
Autoencoders are neural networks that learn to compress data into a lower-dimensional representation and then reconstruct the original input. They're powerful tools for dimensionality reduction, denoising, and generative modeling.
Architecture
Input → [Encoder] → Latent Code → [Decoder] → Reconstruction
x → z → x̂
Components
- Encoder: Compresses input to latent representation
- Latent Space: Compressed representation (bottleneck)
- Decoder: Reconstructs input from latent code
Training Objective
Loss = ||x - x̂||²
Minimize reconstruction error.
Basic Autoencoder
```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()  # assumes inputs are scaled to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat

    def encode(self, x):
        return self.encoder(x)
```
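A minimal training loop for a model like the one above might look as follows (a sketch: the compact `nn.Sequential` model, synthetic data, and hyperparameters all stand in for a real setup):

```python
import torch
import torch.nn as nn

# Compact stand-in for the Autoencoder class above: 784 -> 64 -> 784
model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),    # encoder
    nn.Linear(64, 784), nn.Sigmoid()  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

data = torch.rand(256, 784)  # synthetic stand-in for flattened images

for epoch in range(3):
    for i in range(0, len(data), 32):  # simple mini-batching
        x = data[i:i + 32]
        x_hat = model(x)
        loss = criterion(x_hat, x)     # reconstruction error ||x - x_hat||^2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Note that the target is the input itself: the network is its own supervision signal.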
Types of Autoencoders
Undercomplete Autoencoder
Latent dimension < input dimension:
784 → 64 → 784
Forces compression, learns important features.
Overcomplete Autoencoder
Latent dimension ≥ input dimension:
784 → 1000 → 784
Needs regularization (sparsity, noise, weight penalties) to avoid learning the identity mapping.
Sparse Autoencoder
Add sparsity penalty:
Loss = ||x - x̂||² + λ × sparsity(z)
Most latent units are inactive for any given input.
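One common version uses an L1 penalty on the latent activations (a sketch; the architecture and the weight λ are illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(32, 784)
z = encoder(x)
x_hat = decoder(z)

lam = 1e-3  # sparsity weight λ
recon_loss = ((x - x_hat) ** 2).sum(dim=1).mean()
sparsity_loss = z.abs().sum(dim=1).mean()  # L1 penalty drives most units toward 0
loss = recon_loss + lam * sparsity_loss
```

During training, the L1 term pushes each latent unit toward zero unless it earns its keep by reducing reconstruction error.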
Denoising Autoencoder (DAE)
Train to reconstruct from corrupted input:
x̃ = corrupt(x) # Add noise
z = encoder(x̃)
x̂ = decoder(z)
Loss = ||x - x̂||² # Reconstruct clean!
Learns robust features.
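The corruption step above can be sketched concretely with additive Gaussian noise (the noise level 0.3 and the tiny encoder/decoder are illustrative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)                  # clean input
x_tilde = x + 0.3 * torch.randn_like(x)  # corrupted input
z = encoder(x_tilde)
x_hat = decoder(z)
loss = ((x - x_hat) ** 2).mean()         # compare against the CLEAN input
```

The key detail is in the last line: the loss targets `x`, not `x_tilde`, so the network must learn to undo the corruption.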
Contractive Autoencoder (CAE)
Penalize sensitivity to input:
Loss = ||x - x̂||² + λ × ||∂z/∂x||²
Learns smooth, stable representations.
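The Jacobian penalty ‖∂z/∂x‖² can be computed directly with autograd for a small example (a sketch on a single sample; real implementations use a closed form for efficiency):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(10, 4), nn.Tanh())
decoder = nn.Sequential(nn.Linear(4, 10), nn.Sigmoid())

x = torch.rand(1, 10, requires_grad=True)
z = encoder(x)

# Squared Frobenius norm of the Jacobian dz/dx, one latent unit at a time
jac_norm_sq = 0.0
for i in range(z.shape[1]):
    grad_i, = torch.autograd.grad(z[0, i], x,
                                  retain_graph=True, create_graph=True)
    jac_norm_sq = jac_norm_sq + (grad_i ** 2).sum()

x_hat = decoder(z)
lam = 0.1
loss = ((x - x_hat) ** 2).sum() + lam * jac_norm_sq
loss.backward()  # create_graph=True lets gradients flow through the penalty
```

Penalizing this norm makes the encoding insensitive to small input perturbations, which is what "contractive" refers to.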
Variational Autoencoder (VAE)
Key Difference
Encode to a distribution, not a point:
Encoder → μ, σ (mean and std of a Gaussian; implementations typically output log σ² for numerical stability)
z ~ N(μ, σ²) (sample from distribution)
Decoder(z) → x̂
Loss Function
Loss = Reconstruction + KL Divergence
= ||x - x̂||² + KL(q(z|x) || p(z))
KL term regularizes latent space to be Gaussian.
Reparameterization Trick
To backpropagate through sampling:
z = μ + σ × ε, where ε ~ N(0, 1)
Implementation
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU()
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)  # logvar = log σ², so std = exp(½ log σ²)
        eps = torch.randn_like(std)    # ε ~ N(0, I)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```
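The closed-form KL term used in `vae_loss` can be sanity-checked against `torch.distributions` (the μ and log σ² values here are arbitrary, for illustration only):

```python
import torch
from torch.distributions import Normal, kl_divergence

mu = torch.tensor([[0.5, -1.0]])
logvar = torch.tensor([[0.2, -0.3]])

# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), as in vae_loss
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# The same quantity computed numerically via torch.distributions
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_ref = kl_divergence(q, p).sum()
```

The two agree because, for diagonal Gaussians against a standard normal prior, the KL divergence has the exact closed form used in the loss.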
Applications
Dimensionality Reduction
```python
latent_codes = model.encode(data)
# Use for visualization, clustering
```
Non-linear alternative to PCA.
Denoising
Noisy image → Trained denoiser → Clean image
Anomaly Detection
```python
def detect_anomaly(x, threshold):
    x_hat = model(x)
    error = ((x - x_hat) ** 2).mean()
    return error > threshold  # high reconstruction error = anomaly
```
Generation (VAE)
```python
# Sample from the prior N(0, I)
z = torch.randn(1, latent_dim)
generated = model.decode(z)
```
Data Augmentation
Generate variations of training data.
Convolutional Autoencoders
For images, use conv layers:
```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),   # halves spatial size
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # halves it again
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```
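A quick shape check, assuming 1×28×28 inputs (e.g. MNIST): each stride-2 conv halves the spatial size, and the transposed convs with `output_padding=1` restore it exactly.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
    nn.Sigmoid(),
)

x = torch.rand(8, 1, 28, 28)
z = encoder(x)      # latent feature map: (8, 64, 7, 7)
x_hat = decoder(z)  # reconstruction: (8, 1, 28, 28)
```

`output_padding=1` resolves the ambiguity of inverting a stride-2 convolution, where two different input sizes can map to the same output size.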
Autoencoder vs PCA
| Aspect | PCA | Autoencoder |
|---|---|---|
| Transformation | Linear | Non-linear |
| Training | Closed-form | Gradient descent |
| Interpretability | Eigenvectors | Black box |
| Scalability | Memory intensive | Mini-batch |
With linear activations and MSE loss, an autoencoder learns the same subspace as PCA (though not necessarily the orthogonal eigenvector basis).
Tips for Training
Architecture
- Symmetry helps (encoder/decoder mirror)
- Start simple, add complexity
- Bottleneck size: tune empirically (too small underfits, too large invites the identity map)
Loss Functions
- MSE for real-valued data
- Binary cross-entropy for binary data
- Perceptual loss for images
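The first two choices can be sketched side by side (perceptual loss needs a pretrained feature network and is omitted; the tensors here are random placeholders):

```python
import torch
import torch.nn.functional as F

x = torch.rand(4, 784)      # targets in [0, 1]
x_hat = torch.rand(4, 784)  # reconstructions from a sigmoid output layer

mse = F.mse_loss(x_hat, x)              # real-valued data
bce = F.binary_cross_entropy(x_hat, x)  # binary or [0, 1]-valued data
```

BCE pairs naturally with a sigmoid output layer; MSE is the safer default when outputs are unbounded real values.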
Regularization
- Dropout in encoder
- Noise injection (denoising)
- Weight decay
Key Takeaways
- Autoencoders learn compressed representations
- Encoder → Latent → Decoder architecture
- Training minimizes reconstruction error
- VAEs enable generation by modeling latent distribution
- Applications: compression, denoising, anomaly detection
- For images, use convolutional architectures