Intermediate · Classical Machine Learning

Understand PCA - the fundamental dimensionality reduction technique that finds the most important directions of variance in your data.

dimensionality-reduction · unsupervised-learning · linear-algebra · preprocessing

Principal Component Analysis (PCA)

PCA is the most widely used dimensionality reduction technique. It transforms data into a new coordinate system where the axes (principal components) capture maximum variance.

[Figure: PCA visualization]

The Core Idea

Find directions (principal components) that:

  1. Capture maximum variance in the data
  2. Are orthogonal (perpendicular) to each other
  3. Are ordered by variance explained
Original 2D data:        After PCA:
    •  •                    PC1 →
   •  •  •                 •  •  •  •
  •  •  •  •               •  •  •
   •  •  •                   ↑
    •  •                    PC2

The Algorithm

Step 1: Center the Data

X_centered = X - mean(X)

Step 2: Compute Covariance Matrix

Σ = (1/(n-1)) × X_centeredᵀ × X_centered

Step 3: Find Eigenvectors and Eigenvalues

Σv = λv

Eigenvectors = principal component directions
Eigenvalues = variance explained by each component

Step 4: Sort and Select

Sort eigenvectors by eigenvalue (descending)
Keep top k components

Step 5: Transform

X_reduced = X_centered × W

Where W = [v₁, v₂, ..., vₖ] (top k eigenvectors)
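
The five steps above can be sketched directly in NumPy (the random data and k = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # illustrative data: 100 samples, 3 features

# Step 1: center the data
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the centered data
cov = np.cov(X_centered, rowvar=False)  # shape (3, 3)

# Step 3: eigendecomposition (eigh, since the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: sort descending by eigenvalue, keep the top k
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
k = 2
W = eigenvectors[:, :k]

# Step 5: project onto the top-k components
X_reduced = X_centered @ W
print(X_reduced.shape)  # (100, 2)
```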

Variance Explained

Individual Component

Variance explained by PCᵢ = λᵢ / Σλⱼ

Cumulative Variance

Cumulative = Σᵢ₌₁ᵏ λᵢ / Σλⱼ

Often keep components that explain 95% of variance.
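
Both formulas in a small NumPy sketch, with illustrative eigenvalues:

```python
import numpy as np

eigenvalues = np.array([4.0, 2.0, 1.0, 1.0])  # illustrative, already sorted

# Per-component and cumulative variance explained
ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(ratio)
print(ratio)       # [0.5   0.25  0.125 0.125]
print(cumulative)  # [0.5   0.75  0.875 1.   ]

# Smallest k whose cumulative variance reaches 95%
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(k)  # 4
```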

Choosing Number of Components

Scree Plot

Plot eigenvalues, look for "elbow":

Variance
    |\
    | \
    |  \__
    |     \___
    |_________
         PC

Cumulative Variance Threshold

pca = PCA(n_components=0.95)  # Keep 95% of variance

Kaiser Criterion

Keep components with eigenvalue > 1 (when using correlation matrix).
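
A minimal sketch of the criterion, with illustrative eigenvalues. On standardized data (correlation matrix), each original feature contributes variance 1, so eigenvalue > 1 means a component explains more than one feature's worth of variance:

```python
import numpy as np

# Illustrative eigenvalues from PCA on standardized data
eigenvalues = np.array([2.7, 1.3, 0.6, 0.4])

# Kaiser criterion: keep components with eigenvalue > 1
n_keep = int((eigenvalues > 1).sum())
print(n_keep)  # 2
```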

PCA Properties

What PCA Does

  • Decorrelates features
  • Orders by importance (variance)
  • Reduces dimensionality
  • Can reveal hidden structure

What PCA Doesn't Do

  • Consider class labels (unsupervised)
  • Guarantee better classification
  • Handle non-linear relationships
  • Work well with categorical data

When to Use PCA

Good for:

  • Visualization (reduce to 2-3 dimensions)
  • Noise reduction (remove low-variance components)
  • Feature decorrelation
  • Speeding up other algorithms
  • Handling multicollinearity

Not good for:

  • When all features are important
  • Non-linear relationships (use kernel PCA)
  • When interpretability is crucial
  • Sparse data (use TruncatedSVD)

Practical Considerations

Scaling

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ALWAYS scale before PCA!
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

PCA is sensitive to scale. Standardize first!

Interpreting Components

# Component loadings (how much each feature contributes)
components = pca.components_  # Shape: (n_components, n_features)

# PC1 = 0.5*feature1 + 0.3*feature2 - 0.4*feature3 + ...
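
For example, pairing each PC1 loading with its feature name (using the iris data as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X_scaled = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X_scaled)

# Each row of components_ is a unit vector of loadings over the features
for name, loading in zip(data.feature_names, pca.components_[0]):
    print(f"{name}: {loading:+.2f}")
```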

Reconstruction

import numpy as np

# Transform to lower dimension
X_reduced = pca.transform(X_scaled)

# Reconstruct (with some loss)
X_reconstructed = pca.inverse_transform(X_reduced)

# Reconstruction error
error = np.mean((X_scaled - X_reconstructed)**2)
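
A quick way to see the loss shrink as more components are kept (iris data as a stand-in):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# Reconstruction error falls as k grows; at k = n_features it is ~0
errors = []
for k in range(1, 5):
    pca = PCA(n_components=k).fit(X_scaled)
    X_rec = pca.inverse_transform(pca.transform(X_scaled))
    errors.append(np.mean((X_scaled - X_rec) ** 2))
print([round(e, 4) for e in errors])
```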

Code Example

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

# Scale data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit PCA
pca = PCA(n_components=0.95)  # Keep 95% variance
X_reduced = pca.fit_transform(X_scaled)

print(f"Original dimensions: {X.shape[1]}")
print(f"Reduced dimensions: {X_reduced.shape[1]}")
print(f"Variance explained: {pca.explained_variance_ratio_}")

# Plot cumulative variance
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Variance Explained')
plt.show()

PCA vs Other Methods

Method        Linear   Supervised   Use Case
PCA           Yes      No           General dim reduction
LDA           Yes      Yes          Classification
t-SNE         No       No           Visualization
UMAP          No       No           Visualization + structure
Autoencoders  No       No           Complex patterns

Kernel PCA

For non-linear relationships:

from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=2, kernel='rbf')
X_kpca = kpca.fit_transform(X)

Kernel PCA implicitly maps the data to a higher-dimensional feature space via the kernel trick, then applies PCA there.
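
A quick sketch of where this helps (the gamma value is illustrative): two concentric circles, which no linear projection can separate.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: linear PCA cannot untangle them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel PCA works in an implicit higher-dimensional feature space,
# where the inner and outer circle become (roughly) linearly separable
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (200, 2)
```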

Key Takeaways

  1. PCA finds orthogonal directions of maximum variance
  2. Principal components are eigenvectors of covariance matrix
  3. Always standardize features before PCA
  4. Keep enough components to explain ~95% variance
  5. Good for visualization, denoising, and preprocessing
  6. Use kernel PCA for non-linear patterns
