Linear Regression
Linear regression models the relationship between a target and one or more features with a linear equation. It is a standard baseline for predictive modeling and the starting point for understanding more complex algorithms.
The Model
Simple Linear Regression
One feature, one target:
y = β₀ + β₁x + ε
β₀: intercept (y when x=0)
β₁: slope (change in y per unit x)
ε: error term
Multiple Linear Regression
Multiple features:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
In matrix form:
y = Xβ + ε
Finding the Coefficients
Ordinary Least Squares (OLS)
Minimize sum of squared errors:
min Σ(yᵢ - ŷᵢ)²
Closed-form solution:
β = (XᵀX)⁻¹Xᵀy
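As a sketch, the normal equation can be solved directly with NumPy on toy data (the true coefficients below are illustrative; `np.linalg.solve` on XᵀX is numerically more stable than forming the inverse explicitly):

```python
import numpy as np

# Toy data: y = 2 + 3x plus a little noise (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 0.1, 50)

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) beta = X^T y instead of inverting X^T X
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [2, 3]
```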
Gradient Descent
For large datasets, iteratively update:
β = β - α × ∇Loss
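A minimal gradient-descent sketch for the squared-error loss (the learning rate and iteration count are illustrative choices, not tuned recommendations):

```python
import numpy as np

# Toy data: y = 1 + 4x plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = 1.0 + 4.0 * x + rng.normal(0, 0.05, 200)
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
alpha = 0.5  # learning rate (illustrative)
for _ in range(5000):
    # Gradient of the mean squared loss (1/n)*||y - X beta||^2
    grad = -2 / len(y) * X.T @ (y - X @ beta)
    beta -= alpha * grad  # beta = beta - alpha * gradient
print(beta)  # approximately [1, 4]
```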
Assumptions
Linear regression makes several assumptions:
1. Linearity
Relationship between X and y is linear.
Check: Plot residuals vs fitted values (should be random)
2. Independence
Observations are independent of each other.
Violation: Time series, clustered data
3. Homoscedasticity
Constant variance of errors across all X values.
Check: Residuals should have constant spread
4. Normality
Errors are normally distributed.
Check: Q-Q plot of residuals
5. No Multicollinearity
Features are not highly correlated with each other.
Check: VIF (Variance Inflation Factor) < 10
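As an illustration of the VIF check, the statistic can be computed from scratch by regressing each feature on the others (the `vif` helper below is a hypothetical utility written for this sketch, not a library function; statsmodels offers an equivalent):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (no intercept column)."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns (with intercept)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)   # c is nearly collinear with a
X = np.column_stack([a, b, c])
print(vif(X))  # VIFs for a and c are large; b stays near 1
```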
Interpreting Coefficients
Suppose a fitted model predicts salary from years of experience and a degree indicator (1 if the person has a degree, 0 otherwise):
Salary = 30,000 + 2,500×Years + 5,000×Degree
- Intercept (30,000): Base salary with 0 years, no degree
- Years (2,500): Each year adds $2,500, holding degree constant
- Degree (5,000): Having degree adds $5,000, holding years constant
Standardized Coefficients
To compare importance across features:
β_standardized = β × (σₓ / σᵧ)
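A quick sketch of why standardization matters: two features on very different scales can have equal real influence, which the raw coefficients hide but the standardized ones reveal (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(0, 1, 500)       # small-scale feature
x2 = rng.normal(0, 100, 500)     # large-scale feature
y = 5 * x1 + 0.05 * x2 + rng.normal(0, 0.1, 500)  # equal real influence

# Fit with an intercept column; raw coefficients differ by 100x
X = np.column_stack([np.ones(500), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# beta_j * (sigma_xj / sigma_y) puts features on a common scale
beta_std = beta[1:] * np.array([x1.std(), x2.std()]) / y.std()
print(beta_std)  # the two standardized coefficients are nearly equal
```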
Evaluation Metrics
R² (Coefficient of Determination)
R² = 1 - (SS_res / SS_tot)
SS_res = Σ(y - ŷ)² (residual sum of squares)
SS_tot = Σ(y - ȳ)² (total sum of squares)
- R² = 1: Perfect fit
- R² = 0: Model no better than predicting the mean
- R² < 0: Model worse than predicting the mean (possible, e.g. on test data)
Adjusted R²
Penalizes adding useless features:
Adj R² = 1 - (1-R²)(n-1)/(n-p-1), where n is the number of samples and p the number of features.
RMSE (Root Mean Squared Error)
RMSE = √(Σ(y - ŷ)² / n)
In same units as target.
MAE (Mean Absolute Error)
MAE = Σ|y - ŷ| / n
More robust to outliers than RMSE.
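The four metrics above can be computed directly from their formulas; the `regression_metrics` helper and the data values below are illustrative:

```python
import numpy as np

def regression_metrics(y_true, y_pred, p):
    """R², adjusted R², RMSE, and MAE, following the formulas above.

    p is the number of features (used by adjusted R²)."""
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    rmse = np.sqrt(ss_res / n)
    mae = np.mean(np.abs(y_true - y_pred))
    return r2, adj_r2, rmse, mae

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])
print(regression_metrics(y_true, y_pred, p=1))
```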
Regularized Linear Regression
Ridge (L2)
Loss = Σ(y - ŷ)² + λΣβ²
Shrinks coefficients, keeps all features.
Lasso (L1)
Loss = Σ(y - ŷ)² + λΣ|β|
Can zero out coefficients (feature selection).
Elastic Net
Loss = Σ(y - ŷ)² + λ₁Σ|β| + λ₂Σβ²
Combines L1 and L2.
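A small sketch of the behavioral difference: on synthetic data where only two of five features matter, Lasso zeroes out the irrelevant coefficients while Ridge merely shrinks everything (the alpha values are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# 5 features, but only the first two actually drive y
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge keeps all coefficients nonzero; Lasso drives the
# irrelevant ones exactly to zero (implicit feature selection)
print(np.round(ridge.coef_, 3))
print(np.round(lasso.coef_, 3))
```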
Common Issues
Multicollinearity
Problem: Correlated features → unstable coefficients
Solutions:
- Remove correlated features
- Use regularization (Ridge)
- PCA before regression
Outliers
Problem: Large errors dominate OLS
Solutions:
- Remove outliers
- Use robust regression (Huber loss)
- Transform target (log)
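The robust-regression option can be sketched with scikit-learn's `HuberRegressor` (data and outlier magnitudes below are illustrative):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, (100, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.2, 100)
y[:5] += 50  # inject a few large outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)

# Squared error lets the outliers distort the OLS fit;
# the Huber loss bounds their influence, so the slope stays near 2
print(ols.coef_[0], huber.coef_[0])
```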
Non-linear Relationships
Problem: Linear model can't capture curves
Solutions:
- Polynomial features: x, x², x³
- Log transform: log(x)
- Splines
- Use non-linear models
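The polynomial-features option can be sketched as a scikit-learn pipeline: the model is still linear in its parameters, but fitting it on x and x² captures the curve (degree and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic ground truth that a plain linear fit cannot capture
rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, (200, 1))
y = 1 + 2 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.1, 200)

# Expand to [1, x, x^2], then fit an ordinary linear regression
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(x, y)
print(poly.score(x, y))  # R² close to 1
```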
Code Example
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data so the example runs end to end
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Basic linear regression
model = LinearRegression()
model.fit(X_train, y_train)
print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

# Predictions
y_pred = model.predict(X_test)

# Evaluation (np.sqrt avoids the deprecated squared=False argument)
print(f"R²: {r2_score(y_test, y_pred):.3f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.3f}")

# Regularized versions
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
When to Use Linear Regression
Good when:
- Interpretability is important
- Relationships are approximately linear
- You need a quick baseline model
- You want to understand feature effects
Consider alternatives when:
- Patterns are complex and non-linear
- You need better accuracy (try gradient boosting)
- The problem is classification (use logistic regression)
Key Takeaways
- Linear regression fits y = Xβ + ε
- OLS minimizes squared errors
- Check assumptions: linearity, independence, homoscedasticity, normality, no multicollinearity
- R² measures explained variance
- Use regularization (Ridge, Lasso) to prevent overfitting
- Coefficients show effect of each feature (holding others constant)