Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are fundamental concepts in linear algebra that appear throughout machine learning, from PCA to understanding how neural networks transform data.
The Basic Idea
For a square matrix A, an eigenvector v is a non-zero vector that, when multiplied by A, only gets scaled (not rotated):
Av = λv
Where:
- v is the eigenvector (direction that stays the same)
- λ (lambda) is the eigenvalue (the scaling factor)
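This relation is easy to check numerically with NumPy (the matrix values below are illustrative):

```python
import numpy as np

# A small symmetric matrix (values chosen for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns eigenvalues and eigenvectors (as columns of V).
eigenvalues, V = np.linalg.eig(A)

# Verify A v = λ v for the first eigenpair.
v = V[:, 0]
lam = eigenvalues[0]
print(np.allclose(A @ v, lam * v))  # True: v is only scaled, not rotated
```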
Intuitive Understanding
Think of a matrix as a transformation. Most vectors change direction when transformed. But eigenvectors are special - they only stretch or shrink, staying on the same line.
Example: Consider a 2D stretch transformation:
- Vectors along the stretch direction are eigenvectors
- The stretch factor is the eigenvalue
Computing Eigenvectors
To find eigenvalues, solve:
det(A - λI) = 0
This gives the eigenvalues (the roots of the characteristic polynomial). Then for each λ, solve:
(A - λI)v = 0
to find the corresponding eigenvector.
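As a worked example (the matrix is illustrative), take A = [[4, 1], [2, 3]]. Then det(A − λI) = (4 − λ)(3 − λ) − 2 = λ² − 7λ + 10 = (λ − 5)(λ − 2), so the eigenvalues are 5 and 2; substituting λ = 5 into (A − λI)v = 0 gives v proportional to (1, 1). NumPy confirms:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, V = np.linalg.eig(A)

# Eigenvalues 5 and 2 (returned in no guaranteed order).
print(np.sort(eigenvalues).round(6))  # [2. 5.]

# The eigenvector for λ = 5 is proportional to (1, 1):
v5 = V[:, np.argmax(eigenvalues)]
print(np.allclose(v5[0], v5[1]))  # True: both components equal
```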
Properties
Symmetric Matrices (A = Aᵀ)
- All eigenvalues are real (not complex)
- Eigenvectors are orthogonal
- Can always be orthogonally diagonalized (A = QΛQᵀ with Q orthogonal)
This is why covariance matrices (symmetric!) are so nice for PCA.
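These properties can be checked directly; np.linalg.eigh is NumPy's specialized routine for symmetric matrices (the random matrix here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2               # symmetrize so that S == S.T

# eigh is the specialized routine for symmetric/Hermitian matrices:
# it returns real eigenvalues (ascending) and orthonormal eigenvectors.
w, V = np.linalg.eigh(S)

print(np.allclose(V.T @ V, np.eye(4)))  # True: eigenvectors are orthonormal
print(w.dtype)                          # float64: eigenvalues are real
```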
Eigendecomposition
A diagonalizable matrix (one with a full set of linearly independent eigenvectors) can be decomposed as:
A = VΛV⁻¹
Where V contains eigenvectors and Λ is diagonal with eigenvalues.
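A quick numerical sketch of the decomposition (illustrative matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, V = np.linalg.eig(A)
Lambda = np.diag(w)          # eigenvalues on the diagonal

# Reconstruct A = V Λ V⁻¹
A_rebuilt = V @ Lambda @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))  # True
```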
Trace and Determinant
- Sum of eigenvalues = trace(A)
- Product of eigenvalues = det(A)
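Both identities are easy to confirm (illustrative matrix, eigenvalues 5 and 2):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
w = np.linalg.eigvals(A)

print(np.isclose(w.sum(), np.trace(A)))        # True: sum equals trace (7)
print(np.isclose(w.prod(), np.linalg.det(A)))  # True: product equals det (10)
```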
Applications in Machine Learning
Principal Component Analysis (PCA)
PCA finds directions of maximum variance:
- Compute covariance matrix of data
- Find its eigenvectors and eigenvalues
- Eigenvectors are principal components
- Eigenvalues indicate variance explained
Larger eigenvalue = more important direction.
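The steps above can be sketched in NumPy (the synthetic data and variable names are illustrative, not a library API):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2D data (illustrative): a linear mix of independent noise.
X = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.0],
                                              [1.5, 0.5]])
X = X - X.mean(axis=0)                   # center the data

# 1. Covariance matrix; 2. its eigendecomposition (eigh: symmetric input).
cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh sorts ascending, so reorder descending by variance.
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order]                 # principal components
explained = eigenvalues[order] / eigenvalues.sum()  # variance ratios

X_projected = X @ components[:, :1]      # project onto the first PC
print(explained)  # descending variance ratios; they sum to 1
```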
Spectral Clustering
- Build similarity/Laplacian matrix
- Compute eigenvectors
- Cluster in the eigenvector space
Eigenvectors reveal the cluster structure.
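A minimal sketch of these steps for two clusters, using the unnormalized graph Laplacian and the sign of its second eigenvector (the Fiedler vector); the toy similarity matrix is illustrative:

```python
import numpy as np

# Toy similarity graph: two groups of 3 nodes, tightly connected
# within groups and weakly across (all weights illustrative).
W = np.array([
    [0.00, 1.00, 1.00, 0.01, 0.00, 0.00],
    [1.00, 0.00, 1.00, 0.00, 0.00, 0.00],
    [1.00, 1.00, 0.00, 0.00, 0.00, 0.00],
    [0.01, 0.00, 0.00, 0.00, 1.00, 1.00],
    [0.00, 0.00, 0.00, 1.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 1.00, 1.00, 0.00],
])

# Unnormalized graph Laplacian L = D - W.
D = np.diag(W.sum(axis=1))
L = D - W

# eigh sorts eigenvalues ascending; the second eigenvector (the
# Fiedler vector) separates the two clusters by sign.
w, V = np.linalg.eigh(L)
fiedler = V[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)  # nodes 0-2 get one label, nodes 3-5 the other
```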
Google's PageRank
The importance scores of web pages form the eigenvector of the link matrix corresponding to its dominant eigenvalue, 1.
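A toy sketch, assuming a 3-page web with a column-stochastic link matrix (all values illustrative):

```python
import numpy as np

# Column-stochastic link matrix: column j holds the out-link
# probabilities of page j (illustrative 3-page web).
M = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])
damping = 0.85
n = M.shape[0]
G = damping * M + (1 - damping) / n   # "Google matrix", still column-stochastic

# Power iteration converges to the eigenvector with eigenvalue 1.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = G @ rank

print(np.allclose(G @ rank, rank))  # True: a fixed point, i.e. eigenvalue 1
```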
Neural Network Analysis
- Hessian eigenvalues: Indicate loss landscape curvature
- Large eigenvalues → sharp minima
- Eigenspectrum helps understand trainability
Markov Chains
The stationary distribution is a left eigenvector of the transition matrix with eigenvalue 1 (equivalently, an eigenvector of Pᵀ).
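For a row-stochastic transition matrix P, the stationary distribution π satisfies πP = π. A sketch with an illustrative two-state chain:

```python
import numpy as np

# Row-stochastic transition matrix (illustrative two-state chain).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# π P = π means π is a left eigenvector of P with eigenvalue 1,
# i.e. an (ordinary) eigenvector of Pᵀ.
w, V = np.linalg.eig(P.T)
i = np.argmin(np.abs(w - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(V[:, i])
pi = pi / pi.sum()               # normalize to a probability distribution

print(np.allclose(pi @ P, pi))   # True: π is stationary
```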
Singular Value Decomposition (SVD)
SVD generalizes eigendecomposition to non-square matrices:
A = UΣVᵀ
- U: left singular vectors (eigenvectors of AAᵀ)
- Σ: singular values (square roots of the eigenvalues of AᵀA or AAᵀ)
- V: right singular vectors (eigenvectors of AᵀA)
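The correspondence can be verified directly (random illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))   # non-square matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values are the square roots of the eigenvalues of AᵀA.
eigvals_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
print(np.allclose(s, np.sqrt(eigvals_AtA)))       # True

# Rows of Vt (columns of V) are eigenvectors of AᵀA.
print(np.allclose((A.T @ A) @ Vt[0], s[0] ** 2 * Vt[0]))  # True
```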
SVD is used for:
- Matrix factorization in recommendations
- Latent semantic analysis in NLP
- Image compression
Numerical Considerations
Power Iteration
Simple algorithm to find the largest-magnitude eigenvalue and its eigenvector:
- Start with random vector v
- Repeat: v = Av / ||Av||
- Converges to principal eigenvector
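A minimal implementation of the loop above (the tolerance, seed, and iteration cap are illustrative choices):

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Estimate the largest-magnitude eigenvalue and eigenvector of A."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        Av = A @ v
        v_new = Av / np.linalg.norm(Av)   # normalize each step
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    # Rayleigh quotient gives the eigenvalue estimate.
    return v @ A @ v, v

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_iteration(A)
print(round(lam, 6))  # 3.0 (the dominant eigenvalue)
```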
Computational Cost
- Full eigendecomposition: O(n³)
- Just top k eigenvectors: Much faster (Lanczos, randomized methods)
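As a rough sketch of extracting only the top k eigenpairs of a symmetric matrix, here is power iteration combined with deflation; production code would use Lanczos-type methods instead (e.g. scipy.sparse.linalg.eigsh), and the helper name and matrix below are illustrative:

```python
import numpy as np

def top_k_eigen(A, k, num_iters=500):
    """Top-k eigenpairs of a symmetric A via power iteration + deflation.

    Sketch only: real libraries use Lanczos or randomized methods.
    """
    A = A.astype(float).copy()
    rng = np.random.default_rng(0)
    values, vectors = [], []
    for _ in range(k):
        v = rng.standard_normal(A.shape[0])
        for _ in range(num_iters):
            v = A @ v
            v /= np.linalg.norm(v)
        lam = v @ A @ v
        values.append(lam)
        vectors.append(v)
        A -= lam * np.outer(v, v)   # deflate: remove the found component
    return np.array(values), np.column_stack(vectors)

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
w, V = top_k_eigen(S, k=2)

# Compare against the two largest eigenvalues from a full decomposition.
print(np.allclose(np.sort(w)[::-1], np.linalg.eigvalsh(S)[::-1][:2]))  # True
```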
Key Takeaways
- Eigenvectors are directions preserved by a transformation
- Eigenvalues are the corresponding scaling factors
- PCA uses eigenvectors of covariance matrix
- Symmetric matrices have real eigenvalues and orthogonal eigenvectors
- SVD extends these ideas to rectangular matrices