Classical Machine Learning

Learn about ensemble methods - techniques that combine multiple models to achieve better performance than any single model alone.


Ensemble Methods

Ensemble methods combine multiple models to produce better predictions. The idea is simple: a committee of models is often smarter than any individual model.

Why Ensembles Work

Wisdom of the Crowd

If models make independent, zero-mean errors, averaging reduces the total error:

Single model error (std): ε
Average of N independent models: ε/√N

In practice model errors are partially correlated, so the real reduction is smaller; this is why diversity matters.
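The ε/√N claim is easy to verify with a quick simulation (a sketch assuming zero-mean, uncorrelated Gaussian errors; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0  # each model's error standard deviation

# simulated errors of 10 independent models on 100,000 examples
errors = rng.normal(0.0, eps, size=(10, 100_000))

single_error = errors[0].std()              # ~ eps
averaged_error = errors.mean(axis=0).std()  # ~ eps / sqrt(10) ~ 0.316
```

With correlated errors the averaged error would shrink far less, which is what the diversity advice later in this lesson is about.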

Bias-Variance Decomposition

  • Bagging: Reduces variance (averaging stabilizes)
  • Boosting: Reduces bias (sequential correction)
  • Stacking: Can reduce both

Bagging (Bootstrap Aggregating)

How It Works

  1. Create N bootstrap samples (sample with replacement)
  2. Train one model on each sample
  3. Aggregate predictions (vote or average)
Data → [Bootstrap 1] → Model 1 → \
     → [Bootstrap 2] → Model 2 →  → Aggregate → Prediction
     → [Bootstrap 3] → Model 3 → /
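The three steps above map directly onto scikit-learn's BaggingClassifier; a minimal sketch (dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# illustrative synthetic dataset
X, y = make_classification(n_samples=500, random_state=0)

# 50 deep trees, each trained on its own bootstrap sample,
# predictions aggregated by majority vote
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bagging_score = cross_val_score(bagging, X, y, cv=5).mean()
```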

Random Forest

Bagging + random feature selection:

  • Each tree sees random subset of features
  • Reduces correlation between trees
  • Even better variance reduction
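In scikit-learn the feature subsampling is controlled by max_features; a sketch on illustrative data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# max_features="sqrt": each split considers a random subset of ~sqrt(20) = 4
# features, which decorrelates the trees on top of the bootstrap sampling
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest_score = cross_val_score(forest, X, y, cv=5).mean()
```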

When to Use

  • High-variance models (deep trees)
  • Want to reduce overfitting
  • Can parallelize training

Boosting

How It Works

Train models sequentially, each focusing on previous errors:

Model 1: Train on all data
         ↓ identify errors
Model 2: Focus on Model 1's mistakes
         ↓ identify remaining errors
Model 3: Focus on remaining mistakes
         ↓
Final: Weighted combination of all models

AdaBoost

  • Reweight samples based on errors
  • Misclassified samples get higher weight
  • Each model votes with weight based on accuracy
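A minimal AdaBoost sketch with scikit-learn (dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# sequential ensemble of weak learners (shallow trees by default);
# each round upweights the samples the previous rounds misclassified
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada_score = cross_val_score(ada, X, y, cv=5).mean()
```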

Gradient Boosting

  • Fit new model to residuals (errors)
  • Add to ensemble with learning rate
  • More flexible than AdaBoost
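The residual-fitting loop can be written out by hand; a toy regression sketch (the data, tree depth, and learning rate are arbitrary choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

learning_rate = 0.5
pred = np.zeros_like(y)              # start from a zero prediction
for _ in range(20):
    residual = y - pred              # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)  # add the shrunken correction

train_mse = float(np.mean((y - pred) ** 2))
```

Each small tree only has to model what the ensemble so far gets wrong, and the learning rate keeps any single tree from overcorrecting.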

XGBoost, LightGBM, CatBoost

Optimized gradient boosting implementations:

  • Regularization
  • Efficient computation
  • Handling missing values
  • State-of-the-art for tabular data

When to Use

  • Want maximum predictive power
  • Tabular/structured data
  • Can accept longer training time

Stacking

How It Works

Use model predictions as features for a meta-model:

Level 0: [Model A, Model B, Model C]
              ↓         ↓         ↓
         pred_A    pred_B    pred_C
              \        |        /
               \       |       /
Level 1:      [Meta-Model (Blender)]
                      ↓
               Final Prediction

Key Points

  • Use cross-validation predictions to avoid leakage
  • Meta-model learns optimal combination
  • Can stack multiple levels
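The leakage-free construction can be sketched by hand with cross_val_predict, which yields each base model's out-of-fold predictions (the base models and dataset are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)
base_models = [
    RandomForestClassifier(random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# out-of-fold probabilities: no base model ever predicts rows it was trained on
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])
meta_model = LogisticRegression().fit(meta_features, y)
```

This is essentially what scikit-learn's StackingClassifier (shown in the code example below) does internally.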

When to Use

  • Diverse base models available
  • Competition/maximum performance needed
  • Have enough data for validation

Voting

Hard Voting

Majority vote for classification:

Model A: Class 1
Model B: Class 0
Model C: Class 1
Result: Class 1 (2 vs 1)

Soft Voting

Average probabilities:

Model A: [0.7, 0.3]
Model B: [0.4, 0.6]
Model C: [0.8, 0.2]
Average: [0.63, 0.37] → Class 0

Soft voting usually works better because it uses each model's confidence, not just its final vote.
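The soft-voting arithmetic above, computed directly (hard voting on the same probabilities is included for comparison):

```python
import numpy as np

# rows: models A, B, C; columns: P(class 0), P(class 1)
probs = np.array([
    [0.7, 0.3],
    [0.4, 0.6],
    [0.8, 0.2],
])

# hard voting: each model casts its argmax class, majority wins
hard_winner = int(np.bincount(probs.argmax(axis=1)).argmax())

# soft voting: average the probabilities first, then take the argmax
avg = probs.mean(axis=0)         # [0.633..., 0.366...]
soft_winner = int(avg.argmax())
```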

Comparison

Method     Reduces    Training      Diversity
Bagging    Variance   Parallel      Bootstrap samples
Boosting   Bias       Sequential    Error focus
Stacking   Both       Two-stage     Different algorithms
Voting     Variance   Independent   Different algorithms

Practical Tips

Diversity Matters

Ensemble gains come from diversity:

  • Different algorithms (trees, linear, neural)
  • Different hyperparameters
  • Different feature subsets
  • Different training samples

Diminishing Returns

1 model  → 5 models:  Big improvement
5 models → 10 models: Moderate improvement
10 → 100 models:      Small improvement

Computation vs Accuracy

  • Simple average of 3-5 models often sufficient
  • Beyond 10 models rarely worth the cost
  • Production constraints may limit ensemble size

Code Example

from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
    StackingClassifier
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Voting: average the models' predicted probabilities (soft voting)
voting = VotingClassifier([
    ('rf', RandomForestClassifier()),
    ('gb', GradientBoostingClassifier()),
    ('svm', SVC(probability=True))
], voting='soft')

# Stacking: a meta-model learns how to combine the base models
rf = RandomForestClassifier()
gb = GradientBoostingClassifier()
stacking = StackingClassifier(
    estimators=[('rf', rf), ('gb', gb)],
    final_estimator=LogisticRegression()
)

Key Takeaways

  1. Ensembles combine models for better predictions
  2. Bagging reduces variance (Random Forest)
  3. Boosting reduces bias (XGBoost, LightGBM)
  4. Stacking learns optimal combination
  5. Diversity among base models is crucial
  6. Often the winning approach in competitions

Practice Questions

Test your understanding with these related interview questions: