Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are fundamental concepts in linear algebra that appear throughout machine learning, from PCA to understanding how neural networks transform data.
The Basic Idea
For a square matrix A, an eigenvector v is a non-zero vector that, when multiplied by A, only gets scaled (not rotated):
Av = λv
Where:
- v is the eigenvector (direction that stays the same)
- λ (lambda) is the eigenvalue (the scaling factor)
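This relation is easy to check numerically with NumPy (the matrix values below are illustrative):

```python
import numpy as np

# A small symmetric matrix (values chosen for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns eigenvalues and eigenvectors (as columns of V).
eigenvalues, V = np.linalg.eig(A)

# Verify A v = λ v for the first eigenpair.
v = V[:, 0]
lam = eigenvalues[0]
print(np.allclose(A @ v, lam * v))  # True: v is only scaled, not rotated
```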
Intuitive Understanding
Think of a matrix as a transformation. Most vectors change direction when transformed. But eigenvectors are special - they only stretch or shrink, staying on the same line.
Example: Consider a 2D stretch transformation:
- Vectors along the stretch direction are eigenvectors
- The stretch factor is the eigenvalue
Computing Eigenvectors
To find eigenvalues, solve:
det(A - λI) = 0
This gives the eigenvalues (the roots of the characteristic polynomial). Then for each λ, solve:
(A - λI)v = 0
to find the corresponding eigenvector.
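As a worked example (the matrix is illustrative), take A = [[4, 1], [2, 3]]. Then det(A − λI) = (4 − λ)(3 − λ) − 2 = λ² − 7λ + 10 = (λ − 5)(λ − 2), so the eigenvalues are 5 and 2; substituting λ = 5 into (A − λI)v = 0 gives v proportional to (1, 1). NumPy confirms:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, V = np.linalg.eig(A)

# Eigenvalues 5 and 2 (returned in no guaranteed order).
print(np.sort(eigenvalues).round(6))  # [2. 5.]

# The eigenvector for λ = 5 is proportional to (1, 1):
v5 = V[:, np.argmax(eigenvalues)]
print(np.allclose(v5[0], v5[1]))  # True: both components equal
```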
Properties
Symmetric Matrices (A = Aᵀ)
- All eigenvalues are real (not complex)
- Eigenvectors are orthogonal
- Can always be orthogonally diagonalized (A = QΛQᵀ with Q orthogonal)
This is why covariance matrices (symmetric!) are so nice for PCA.
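These properties can be checked directly; np.linalg.eigh is NumPy's specialized routine for symmetric matrices (the random matrix here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2               # symmetrize so that S == S.T

# eigh is the specialized routine for symmetric/Hermitian matrices:
# it returns real eigenvalues (ascending) and orthonormal eigenvectors.
w, V = np.linalg.eigh(S)

print(np.allclose(V.T @ V, np.eye(4)))  # True: eigenvectors are orthonormal
print(w.dtype)                          # float64: eigenvalues are real
```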
Eigendecomposition
A diagonalizable matrix (one with a full set of linearly independent eigenvectors) can be decomposed as:
A = VΛV⁻¹
Where V contains eigenvectors and Λ is diagonal with eigenvalues.
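A quick numerical sketch of the decomposition (illustrative matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, V = np.linalg.eig(A)
Lambda = np.diag(w)          # eigenvalues on the diagonal

# Reconstruct A = V Λ V⁻¹
A_rebuilt = V @ Lambda @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))  # True
```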
Trace and Determinant
- Sum of eigenvalues = trace(A)
- Product of eigenvalues = det(A)
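Both identities are easy to confirm (illustrative matrix, eigenvalues 5 and 2):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
w = np.linalg.eigvals(A)

print(np.isclose(w.sum(), np.trace(A)))        # True: sum equals trace (7)
print(np.isclose(w.prod(), np.linalg.det(A)))  # True: product equals det (10)
```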
Applications in Machine Learning
Principal Component Analysis (PCA)
PCA finds directions of maximum variance:
- Compute covariance matrix of data
- Find its eigenvectors and eigenvalues
- Eigenvectors are principal components
- Eigenvalues indicate variance explained
Larger eigenvalue = more important direction.
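The steps above can be sketched in NumPy (the synthetic data and variable names are illustrative, not a library API):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2D data (illustrative): a linear mix of independent noise.
X = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.0],
                                              [1.5, 0.5]])
X = X - X.mean(axis=0)                   # center the data

# 1. Covariance matrix; 2. its eigendecomposition (eigh: symmetric input).
cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh sorts ascending, so reorder descending by variance.
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order]                 # principal components
explained = eigenvalues[order] / eigenvalues.sum()  # variance ratios

X_projected = X @ components[:, :1]      # project onto the first PC
print(explained)  # descending variance ratios; they sum to 1
```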
Spectral Clustering
- Build similarity/Laplacian matrix
- Compute eigenvectors
- Cluster in the eigenvector space
Eigenvectors reveal the cluster structure.
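A minimal sketch of these steps for two clusters, using the unnormalized graph Laplacian and the sign of its second eigenvector (the Fiedler vector); the toy similarity matrix is illustrative:

```python
import numpy as np

# Toy similarity graph: two groups of 3 nodes, tightly connected
# within groups and weakly across (all weights illustrative).
W = np.array([
    [0.00, 1.00, 1.00, 0.01, 0.00, 0.00],
    [1.00, 0.00, 1.00, 0.00, 0.00, 0.00],
    [1.00, 1.00, 0.00, 0.00, 0.00, 0.00],
    [0.01, 0.00, 0.00, 0.00, 1.00, 1.00],
    [0.00, 0.00, 0.00, 1.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 1.00, 1.00, 0.00],
])

# Unnormalized graph Laplacian L = D - W.
D = np.diag(W.sum(axis=1))
L = D - W

# eigh sorts eigenvalues ascending; the second eigenvector (the
# Fiedler vector) separates the two clusters by sign.
w, V = np.linalg.eigh(L)
fiedler = V[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)  # nodes 0-2 get one label, nodes 3-5 the other
```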
Google's PageRank
The importance scores of web pages form the eigenvector of the link matrix corresponding to its dominant eigenvalue, 1.
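A toy sketch, assuming a 3-page web with a column-stochastic link matrix (all values illustrative):

```python
import numpy as np

# Column-stochastic link matrix: column j holds the out-link
# probabilities of page j (illustrative 3-page web).
M = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])
damping = 0.85
n = M.shape[0]
G = damping * M + (1 - damping) / n   # "Google matrix", still column-stochastic

# Power iteration converges to the eigenvector with eigenvalue 1.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = G @ rank

print(np.allclose(G @ rank, rank))  # True: a fixed point, i.e. eigenvalue 1
```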
Neural Network Analysis
- Hessian eigenvalues: Indicate loss landscape curvature
- Large eigenvalues → sharp minima
- Eigenspectrum helps understand trainability
Markov Chains
The stationary distribution is a left eigenvector of the transition matrix with eigenvalue 1 (equivalently, an eigenvector of Pᵀ).
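For a row-stochastic transition matrix P, the stationary distribution π satisfies πP = π. A sketch with an illustrative two-state chain:

```python
import numpy as np

# Row-stochastic transition matrix (illustrative two-state chain).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# π P = π means π is a left eigenvector of P with eigenvalue 1,
# i.e. an (ordinary) eigenvector of Pᵀ.
w, V = np.linalg.eig(P.T)
i = np.argmin(np.abs(w - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(V[:, i])
pi = pi / pi.sum()               # normalize to a probability distribution

print(np.allclose(pi @ P, pi))   # True: π is stationary
```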
Singular Value Decomposition (SVD)
SVD generalizes eigendecomposition to non-square matrices:
A = UΣVᵀ
- U: left singular vectors (eigenvectors of AAᵀ)
- Σ: singular values (square roots of the eigenvalues of AᵀA or AAᵀ)
- V: right singular vectors (eigenvectors of AᵀA)
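The correspondence can be verified directly (random illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))   # non-square matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values are the square roots of the eigenvalues of AᵀA.
eigvals_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
print(np.allclose(s, np.sqrt(eigvals_AtA)))       # True

# Rows of Vt (columns of V) are eigenvectors of AᵀA.
print(np.allclose((A.T @ A) @ Vt[0], s[0] ** 2 * Vt[0]))  # True
```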
SVD is used for:
- Matrix factorization in recommendations
- Latent semantic analysis in NLP
- Image compression
Numerical Considerations
Power Iteration
Simple algorithm to find the largest-magnitude eigenvalue and its eigenvector:
- Start with random vector v
- Repeat: v = Av / ||Av||
- Converges to principal eigenvector
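A minimal implementation of the loop above (the tolerance, seed, and iteration cap are illustrative choices):

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Estimate the largest-magnitude eigenvalue and eigenvector of A."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        Av = A @ v
        v_new = Av / np.linalg.norm(Av)   # normalize each step
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    # Rayleigh quotient gives the eigenvalue estimate.
    return v @ A @ v, v

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_iteration(A)
print(round(lam, 6))  # 3.0 (the dominant eigenvalue)
```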
Computational Cost
- Full eigendecomposition: O(n³)
- Just top k eigenvectors: Much faster (Lanczos, randomized methods)
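As a rough sketch of extracting only the top k eigenpairs of a symmetric matrix, here is power iteration combined with deflation; production code would use Lanczos-type methods instead (e.g. scipy.sparse.linalg.eigsh), and the helper name and matrix below are illustrative:

```python
import numpy as np

def top_k_eigen(A, k, num_iters=500):
    """Top-k eigenpairs of a symmetric A via power iteration + deflation.

    Sketch only: real libraries use Lanczos or randomized methods.
    """
    A = A.astype(float).copy()
    rng = np.random.default_rng(0)
    values, vectors = [], []
    for _ in range(k):
        v = rng.standard_normal(A.shape[0])
        for _ in range(num_iters):
            v = A @ v
            v /= np.linalg.norm(v)
        lam = v @ A @ v
        values.append(lam)
        vectors.append(v)
        A -= lam * np.outer(v, v)   # deflate: remove the found component
    return np.array(values), np.column_stack(vectors)

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
w, V = top_k_eigen(S, k=2)

# Compare against the two largest eigenvalues from a full decomposition.
print(np.allclose(np.sort(w)[::-1], np.linalg.eigvalsh(S)[::-1][:2]))  # True
```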
Key Takeaways
- Eigenvectors are directions preserved by a transformation
- Eigenvalues are the corresponding scaling factors
- PCA uses eigenvectors of covariance matrix
- Symmetric matrices have real eigenvalues and orthogonal eigenvectors
- SVD extends these ideas to rectangular matrices