Beginner · Classical Machine Learning

Master linear regression - the foundational algorithm for predicting continuous values and understanding relationships between variables.

Tags: regression, linear-models, supervised-learning, interpretability

Linear Regression

Linear regression is the foundation of predictive modeling. It models the relationship between variables using a linear equation and serves as the starting point for understanding more complex algorithms.

The Model

Simple Linear Regression

One feature, one target:

y = β₀ + β₁x + ε

  • β₀: intercept (value of y when x = 0)
  • β₁: slope (change in y per unit change in x)
  • ε: error term

Multiple Linear Regression

Multiple features:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

In matrix form:

y = Xβ + ε

Finding the Coefficients

Ordinary Least Squares (OLS)

Minimize sum of squared errors:

min Σ(yᵢ - ŷᵢ)²

Closed-form solution:

β = (XᵀX)⁻¹Xᵀy
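The closed-form solution can be verified in a few lines of NumPy. This is an illustrative sketch on synthetic data (the data and seed are assumptions, not from the text); `np.linalg.solve` is used instead of an explicit inverse for numerical stability:

```python
import numpy as np

# Synthetic data with known coefficients: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 0.5, size=100)

# Design matrix X with a leading column of ones for the intercept β₀
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (XᵀX)β = Xᵀy rather than inverting XᵀX
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # close to [2, 3]
```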

Gradient Descent

For large datasets, iteratively update:

β = β - α × ∇Loss
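The update rule above can be sketched with plain NumPy gradient descent on the mean squared error (the synthetic data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

# Synthetic data: y = 1 + 2x + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=200)
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
alpha = 0.1                                    # learning rate
for _ in range(5000):
    grad = 2 / len(y) * X.T @ (X @ beta - y)   # gradient of mean squared error
    beta -= alpha * grad                       # β = β - α × ∇Loss

print(beta)  # converges toward [1, 2]
```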

Assumptions

Linear regression makes several assumptions:

1. Linearity

Relationship between X and y is linear.

Check: Plot residuals vs fitted values (should be random)

2. Independence

Observations are independent of each other.

Violation: Time series, clustered data

3. Homoscedasticity

Constant variance of errors across all X values.

Check: Residuals should have constant spread

4. Normality

Errors are normally distributed.

Check: Q-Q plot of residuals

5. No Multicollinearity

Features are not highly correlated with each other.

Check: VIF (Variance Inflation Factor) < 10

Interpreting Coefficients

Salary = 30,000 + 2,500×Years + 5,000×Degree
  • Intercept (30,000): Base salary with 0 years, no degree
  • Years (2,500): Each year adds $2,500, holding degree constant
  • Degree (5,000): Having degree adds $5,000, holding years constant
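Plugging values into the example equation makes the interpretation concrete (a minimal arithmetic check, not real salary data):

```python
# Example equation from above: Salary = 30,000 + 2,500×Years + 5,000×Degree
years, degree = 4, 1          # 4 years of experience, has a degree
salary = 30_000 + 2_500 * years + 5_000 * degree
print(salary)  # 45000
```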

Standardized Coefficients

To compare importance across features:

β_standardized = β × (σₓ / σᵧ)
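A quick sketch of why standardization matters: two features with equal real influence but very different scales get very different raw coefficients, while the standardized coefficients agree (synthetic data; the seed and scales are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(0, 1, n)       # small-scale feature
x2 = rng.normal(0, 100, n)     # large-scale feature
y = 5 * x1 + 0.05 * x2 + rng.normal(0, 0.1, n)   # both contribute sd 5 to y

X = np.column_stack([x1, x2])
model = LinearRegression().fit(X, y)

# β_standardized = β × (σₓ / σᵧ)
beta_std = model.coef_ * X.std(axis=0) / y.std()
print(model.coef_)   # raw: roughly [5, 0.05] — scale-dependent
print(beta_std)      # standardized: roughly equal
```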

Evaluation Metrics

R² (Coefficient of Determination)

R² = 1 - (SS_res / SS_tot)

SS_res = Σ(y - ŷ)²  (residual sum of squares)
SS_tot = Σ(y - ȳ)²  (total sum of squares)
  • R² = 1: Perfect fit
  • R² = 0: Model no better than mean
  • Can be negative when the model fits worse than simply predicting the mean

Adjusted R²

Penalizes adding useless features:

Adj R² = 1 - (1-R²)(n-1)/(n-p-1)
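Both formulas can be checked by hand against scikit-learn (synthetic data; the sizes and coefficients are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1.0, n)
y_pred = LinearRegression().fit(X, y).predict(X)

ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(r2)       # identical to r2_score(y, y_pred)
print(adj_r2)   # slightly below r2, penalized for the 3 features
```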

RMSE (Root Mean Squared Error)

RMSE = √(Σ(y - ŷ)² / n)

In same units as target.

MAE (Mean Absolute Error)

MAE = Σ|y - ŷ| / n

More robust to outliers than RMSE.
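A tiny worked comparison shows the robustness difference: one large error inflates RMSE far more than MAE (the numbers are made up for illustration):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 8.0, 10.0])
y_pred = np.array([2.5, 5.0, 7.0, 14.0])   # last prediction misses badly

errors = y_true - y_pred
rmse = np.sqrt(np.mean(errors ** 2))   # ~2.08, dominated by the single big error
mae = np.mean(np.abs(errors))          # 1.375, less affected

print(rmse, mae)
```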

Regularized Linear Regression

Ridge (L2)

Loss = Σ(y - ŷ)² + λΣβ²

Shrinks coefficients, keeps all features.

Lasso (L1)

Loss = Σ(y - ŷ)² + λΣ|β|

Can zero out coefficients (feature selection).

Elastic Net

Loss = Σ(y - ŷ)² + λ₁Σ|β| + λ₂Σβ²

Combines L1 and L2.
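The practical difference between the penalties is easy to demonstrate: with irrelevant features present, Ridge shrinks every coefficient while Lasso drives the useless ones exactly to zero (synthetic data; the `alpha` values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 200)   # only two features matter

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(ridge.coef_)  # all five nonzero, shrunk toward zero
print(lasso.coef_)  # the three irrelevant coefficients are exactly zero
```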

Common Issues

Multicollinearity

Problem: Correlated features → unstable coefficients

Solutions:

  • Remove correlated features
  • Use regularization (Ridge)
  • PCA before regression

Outliers

Problem: Large errors dominate OLS

Solutions:

  • Remove outliers
  • Use robust regression (Huber loss)
  • Transform target (log)

Non-linear Relationships

Problem: Linear model can't capture curves

Solutions:

  • Polynomial features: x, x², x³
  • Log transform: log(x)
  • Splines
  • Use non-linear models
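As a sketch of the polynomial-features fix: a plain line fails badly on a quadratic relationship, while adding an x² term recovers it (synthetic data and the pipeline setup are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=300).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.2, 300)   # quadratic, not linear

linear = LinearRegression().fit(x, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

print(r2_score(y, linear.predict(x)))  # poor: a straight line can't follow the curve
print(r2_score(y, poly.predict(x)))    # near 1
```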

Code Example

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error

# Basic linear regression (X_train, X_test, y_train, y_test assumed from an earlier train_test_split)
model = LinearRegression()
model.fit(X_train, y_train)

print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

# Predictions
y_pred = model.predict(X_test)

# Evaluation
print(f"R²: {r2_score(y_test, y_pred):.3f}")
print(f"RMSE: {mean_squared_error(y_test, y_pred) ** 0.5:.3f}")  # take the square root; squared=False is removed in recent scikit-learn

# Regularized versions
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

When to Use Linear Regression

Good for:

  • Problems where interpretability matters
  • Linear relationships
  • Baseline model
  • Understanding feature effects

Consider alternatives when:

  • Complex non-linear patterns
  • Need better accuracy (try gradient boosting)
  • Classification problem (use logistic regression)

Key Takeaways

  1. Linear regression fits y = Xβ + ε
  2. OLS minimizes squared errors
  3. Check assumptions: linearity, independence, homoscedasticity, normality, no multicollinearity
  4. R² measures explained variance
  5. Use regularization (Ridge, Lasso) to prevent overfitting
  6. Coefficients show effect of each feature (holding others constant)

Practice Questions

Test your understanding with these related interview questions: