Beginner · Classical Machine Learning

Understand K-Nearest Neighbors (KNN) - the simplest machine learning algorithm that makes predictions based on similar examples.

classification · regression · instance-based · non-parametric

K-Nearest Neighbors (KNN)

K-Nearest Neighbors is one of the simplest and most intuitive machine learning algorithms. It makes predictions based on the principle that similar things exist in close proximity.

The Algorithm

To predict for a new point:

  1. Calculate distance to all training points
  2. Find the k nearest neighbors
  3. Aggregate their labels/values
New point (?)        Training data
     ↓               ● ● ■ ●
   [?]    →    Find k=3 nearest:  ●  ●  ■
                                  ↓
                           Majority vote: ●
  • Classification: Majority vote among k neighbors
  • Regression: Average (or weighted average) of k neighbors
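The three steps above can be sketched directly in numpy. This is a minimal illustration for classification (the data and labels are made up for the example), not a production implementation:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # 1. Distance from x_new to every training point (Euclidean)
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two tight clusters, one per class
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 1.1]), k=3))  # → 0 (two of the 3 nearest are class 0)
```

For regression, replace the majority vote with `np.mean(y_train[nearest])`.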

Distance Metrics

The choice of distance measure matters!

Euclidean Distance (L2)

d(x, y) = √(Σ(xᵢ - yᵢ)²)

The straight-line distance. Most common.

Manhattan Distance (L1)

d(x, y) = Σ|xᵢ - yᵢ|

Sum of absolute differences. Often more robust than Euclidean distance in high dimensions.

Minkowski Distance

d(x, y) = (Σ|xᵢ - yᵢ|^p)^(1/p)

Generalization: p=1 is Manhattan, p=2 is Euclidean.

Cosine Similarity

sim(x, y) = (x · y) / (||x|| × ||y||)

Measures angle between vectors. Great for text/sparse data.
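All four metrics are one-liners in numpy; the vectors here are arbitrary examples:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))            # L2: straight-line distance
manhattan = np.sum(np.abs(x - y))                    # L1: sum of absolute differences
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)    # general p; p=1 is L1, p=2 is L2
cosine_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))  # angle, not magnitude
```

Note that cosine is a *similarity* (higher = more alike), while the others are distances (lower = more alike); to use cosine with KNN, rank by `1 - cosine_sim`.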

The Importance of k

k is the key hyperparameter:

k value      Behavior
k = 1        Most flexible, highest variance, sensitive to noise
Small k      Complex decision boundary, potential overfitting
Large k      Smoother boundary, potential underfitting
k = n        Always predicts the majority class

Rule of thumb: Start with k = √n (square root of the training-set size).

Tip: Use odd k for binary classification to avoid ties.
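In practice, k is usually chosen by validation rather than a formula. A plain-numpy sketch using leave-one-out accuracy on a hypothetical two-cluster dataset (the data and candidate k values are assumptions for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters as toy data
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# Leave-one-out accuracy for each candidate (odd) k
accs = {}
for k in [1, 3, 5, 7]:
    correct = sum(
        knn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], k) == y[i]
        for i in range(len(X))
    )
    accs[k] = correct / len(X)
    print(k, accs[k])
```

Pick the k with the best validation accuracy; ties are usually broken toward larger k (smoother boundary).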

Weighted KNN

Not all neighbors are equally relevant. Weight by distance:

prediction = Σ wᵢ × yᵢ / Σ wᵢ

where wᵢ = 1/d(x, xᵢ) or wᵢ = 1/d(x, xᵢ)²

Closer neighbors have more influence.
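The weighted formula above, applied to regression, in a short numpy sketch (the `eps` term is an implementation detail I've added to avoid division by zero when a query point coincides with a training point):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_new, k=3, eps=1e-9):
    """Distance-weighted average of the k nearest neighbors' values."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + eps)   # w_i = 1/d; eps guards against d = 0
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Toy 1-D regression data: y = x
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
pred = weighted_knn_regress(X, y, np.array([1.5]), k=3)  # ≈ 1.286, pulled toward x=1 and x=2
```

Squaring the distances (`w = 1.0 / (dists[nearest] + eps) ** 2`) sharpens the effect further.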

Feature Scaling is Critical

KNN uses distances, so feature scales matter enormously!

Problem: If age is 0-100 and salary is 0-1,000,000, salary dominates.

Solution: Always normalize features:

  • Min-max scaling: (x - min) / (max - min)
  • Standardization: (x - μ) / σ
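Both scalers are one line each in numpy, applied per column. The age/salary numbers here are made up to mirror the example above:

```python
import numpy as np

# Unscaled: the salary column would swamp age in any distance computation
X = np.array([[25.0, 40_000.0],
              [60.0, 45_000.0],
              [30.0, 120_000.0]])

# Min-max scaling to [0, 1], per column
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization to zero mean, unit variance, per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

Important: compute the min/max (or μ/σ) on the training set only, then apply the same transform to query points; otherwise test data leaks into the distance geometry.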

The Curse of Dimensionality

KNN suffers badly in high dimensions:

  1. Distances become meaningless: All points become equidistant
  2. Sparse data: Need exponentially more data to cover space
  3. Computation explodes: More dimensions = slower

Solutions:

  • Dimensionality reduction (PCA)
  • Feature selection
  • Reserve KNN for low-dimensional problems
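The PCA route can be sketched in plain numpy via the SVD; the 50-dimensional random data here is just a stand-in:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (plain-numpy sketch)."""
    Xc = X - X.mean(axis=0)               # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T       # coordinates in the reduced space

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))            # hypothetical 50-D dataset
X_low = pca_reduce(X, 5)                  # run KNN on this 5-D version instead
print(X_low.shape)                        # → (100, 5)
```

Distances in the reduced space are far more informative, and each query touches 5 coordinates per point instead of 50.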

Advantages

  1. Simple: Easy to understand and implement
  2. No training phase: Just store the data (KNN is a "lazy learner")
  3. Naturally handles multi-class: No modification needed
  4. Non-parametric: No assumptions about data distribution
  5. Adapts to new data: Just add points

Disadvantages

  1. Slow prediction: Must scan all training data
  2. Memory intensive: Stores entire dataset
  3. Curse of dimensionality: Fails in high dimensions
  4. Sensitive to scale: Must normalize features
  5. Sensitive to noise: Outliers affect predictions

Making KNN Faster

KD-Trees

Partition space for faster neighbor search.

  • O(log n) search in low dimensions
  • Degrades in high dimensions
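A KD-tree query, sketched here with SciPy's `scipy.spatial.KDTree` (this assumes SciPy is available; scikit-learn offers an equivalent `sklearn.neighbors.KDTree`):

```python
import numpy as np
from scipy.spatial import KDTree  # assumes SciPy is installed

rng = np.random.default_rng(0)
X = rng.random((1000, 3))          # 1000 points in 3-D

tree = KDTree(X)                   # build once up front
dist, idx = tree.query(X[0], k=3)  # then each query avoids scanning all 1000 points
```

Since `X[0]` is itself in the tree, its nearest neighbor is itself at distance 0; the other two entries are its true nearest neighbors.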

Ball Trees

Alternative structure, better for high dimensions.

Approximate Nearest Neighbors

Trade accuracy for speed.

  • Libraries: Annoy, FAISS, ScaNN
  • Used in production recommendation systems

When to Use KNN

Good for:

  • Low-dimensional data
  • When you need a quick baseline
  • Small datasets
  • When similar examples should have similar labels

Avoid for:

  • High-dimensional data (>20 features)
  • Large datasets (slow)
  • When features are on different scales (unless you scale)
  • Real-time predictions on big data

Key Takeaways

  1. KNN predicts based on the k most similar examples
  2. Always scale your features!
  3. Choose k to balance bias and variance
  4. Fails in high dimensions (curse of dimensionality)
  5. Simple but can be slow for large datasets
