Intermediate · Classical Machine Learning

Learn about model explainability - techniques to understand why models make predictions, essential for trust, debugging, and compliance.

Tags: explainability, shap, lime, interpretability, feature-importance

Model Explainability & Interpretability

Explainability answers: "Why did the model make this prediction?" It's crucial for trust, debugging, regulatory compliance, and scientific understanding.

Interpretability vs Explainability

Interpretable models: Inherently understandable

  • Linear regression, decision trees, rule lists
  • Can directly inspect how features affect output

Explainable AI (XAI): Methods to explain black-box models

  • SHAP, LIME, attention visualization
  • Post-hoc explanations of complex models

Global vs Local Explanations

Global: How does the model work overall?

  • Feature importance
  • Partial dependence plots
  • Model-level summaries

Local: Why this specific prediction?

  • LIME, SHAP values for one instance
  • Counterfactual explanations

Inherently Interpretable Models

Linear Models

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ

Interpretation: 
- βᵢ = change in y per unit change in xᵢ (holding others constant)
- Sign indicates direction of effect
- Magnitude indicates importance (if features scaled)
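As a quick sketch (toy data, coefficients chosen by hand), scikit-learn recovers the βᵢ exactly when the target is exactly linear, and each coefficient reads directly as "change in y per unit change in that feature":

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data: predict price from size (m^2) and age (years)
X = np.array([[50, 10], [80, 5], [120, 20], [60, 2], [100, 15]], dtype=float)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 30  # target is exactly linear by construction

model = LinearRegression().fit(X, y)
print(model.coef_)       # ≈ [2.0, -1.5]: +2 per extra m^2, -1.5 per extra year
print(model.intercept_)  # ≈ 30
```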

Decision Trees

            [Age > 30?]
            /         \
          Yes          No
           ↓            ↓
     [Income > 50K?]  [Deny]
       /      \
    [Approve] [Deny]

Explanation: "Denied because age ≤ 30"
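A tree like this can be trained and printed with scikit-learn's `export_text`; the loan data below is invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical loan data: columns [age, income_k]; label 1 = approve
X = np.array([[25, 60], [40, 80], [35, 30], [22, 85], [50, 90], [28, 55]])
y = np.array([0, 1, 0, 0, 1, 0])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income_k"]))

# the path a sample takes through the tree is its explanation
print(tree.predict([[22, 85]]))  # young applicant denied despite high income
```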

Rule Lists

IF age > 30 AND income > 50K THEN approve
ELSE IF credit_score > 700 THEN approve
ELSE deny

Feature Importance

Permutation Importance

Shuffle one feature, measure accuracy drop:

from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, y_test, n_repeats=10)
importances = result.importances_mean

Pros: model-agnostic; works with any fitted model.
Cons: slow (the model is re-scored for every shuffle); correlated features split importance between them.

Tree-Based Importance

# Mean decrease in impurity
importances = model.feature_importances_

# Or use permutation for more reliable results

Warning: MDI (mean decrease in impurity) importance is biased toward high-cardinality features.
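The bias is easy to demonstrate: append a pure-noise feature with near-unique values to synthetic data and compare MDI against permutation importance (a sketch, with arbitrary seeds and sizes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=4, n_informative=2,
                           random_state=0)
# append a pure-noise feature with ~unique values per row
noise = rng.randint(0, 500, size=(500, 1)).astype(float)
X = np.hstack([X, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

mdi = rf.feature_importances_
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print("MDI on noise feature:        ", mdi[-1])                   # inflated
print("Permutation on noise feature:", perm.importances_mean[-1])  # near zero
```

The trees split on the noise feature freely (it has many unique values), so its MDI is inflated, while permuting it on held-out data barely moves accuracy.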

SHAP (SHapley Additive exPlanations)

Based on game theory - fairly distribute prediction among features:

import shap

explainer = shap.TreeExplainer(model)  # Or KernelExplainer for any model
shap_values = explainer.shap_values(X)

# Summary plot
shap.summary_plot(shap_values, X)

# Single prediction explanation
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])

Interpreting SHAP Values

Base value (average prediction): 0.5
Feature contributions:
  +0.3  income = $80K
  +0.15 age = 45
  -0.1  debt_ratio = 0.4
  ------
Final prediction: 0.85

Sum of SHAP values = prediction - base value
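For a linear model with independent features, SHAP values have a closed form, βᵢ(xᵢ − E[xᵢ]), which makes the additivity property easy to verify by hand (toy numbers below):

```python
import numpy as np

# closed form for a linear model with independent features:
# shap_i = beta_i * (x_i - E[x_i])
beta0, beta = 0.5, np.array([2.0, -1.0, 0.5])
X = np.array([[1.0, 3.0, 2.0],
              [0.0, 1.0, 4.0],
              [2.0, 2.0, 0.0]])
x = X[0]

base_value = beta0 + beta @ X.mean(axis=0)  # average prediction
shap_values = beta * (x - X.mean(axis=0))   # per-feature contributions
prediction = beta0 + beta @ x

# additivity: base value + sum of SHAP values == prediction
print(base_value + shap_values.sum(), prediction)  # 0.5 0.5
```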

SHAP Plots

Summary plot: Feature importance + direction of effects

Dependence plot: How one feature affects predictions

Force plot: Single prediction breakdown

Waterfall plot: Step-by-step contribution

LIME (Local Interpretable Model-agnostic Explanations)

Approximate complex model locally with simple model:

import lime.lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=['Negative', 'Positive']
)

# explain_instance expects a 1-D numpy array, not a pandas Series
exp = explainer.explain_instance(X_test.iloc[0].values, model.predict_proba)
exp.show_in_notebook()

How LIME Works

  1. Generate perturbed samples around the instance
  2. Get model predictions for perturbations
  3. Fit weighted linear model locally
  4. Linear coefficients = feature importance for that instance

Pros: model-agnostic; explanations are intuitive.
Cons: explanations can be unstable, and depend on the perturbation method.
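The four steps above can be sketched in a few lines of NumPy (a toy re-implementation, not the lime package; the noise scale and kernel width are arbitrary choices):

```python
import numpy as np

def lime_explain(predict_fn, x, n_samples=2000, width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))  # 1. perturb
    yz = predict_fn(Z)                                       # 2. query model
    w = np.exp(-((Z - x) ** 2).sum(axis=1) / width ** 2)     # 3. proximity weights
    A = np.hstack([np.ones((n_samples, 1)), Z])              # 4. weighted least squares
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], yz * sw, rcond=None)
    return coef[1:]  # local per-feature importance

# black box f(x) = x0^2 + 3*x1; its local gradient at x = [1, 0] is [2, 3]
f = lambda Z: Z[:, 0] ** 2 + 3 * Z[:, 1]
print(lime_explain(f, np.array([1.0, 0.0])))
```

The fitted local coefficients land near the true gradient [2, 3], even though the black box is nonlinear globally.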

Partial Dependence Plots

Show average effect of a feature on predictions:

from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(model, X, features=['age', 'income'])

Shows: the marginal effect of the feature, averaged over the other features.
Limitation: assumes the feature is independent of the others.
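Under the hood, a PDP just clamps the feature to each grid value and averages the model's predictions over the data; a minimal sketch:

```python
import numpy as np

# partial dependence by hand: clamp feature j to v, average predictions
def partial_dependence(predict, X, j, grid):
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict(Xv).mean())
    return np.array(out)

# black box with an interaction term: f = x0 + 2*x0*x1
f = lambda X: X[:, 0] + 2 * X[:, 0] * X[:, 1]
X = np.random.default_rng(0).uniform(-1, 1, size=(500, 2))
pd_vals = partial_dependence(f, X, 0, np.array([-1.0, 0.0, 1.0]))
print(pd_vals)  # ≈ [-1, 0, 1]: the interaction averages out since E[x1] ≈ 0
```

Note how averaging hides the interaction with x1 entirely, which is exactly the heterogeneity ICE plots are designed to reveal.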

Individual Conditional Expectation (ICE)

Like PDP but shows line for each instance:

PartialDependenceDisplay.from_estimator(
    model, X, features=['age'], kind='both'  # Shows both ICE and PDP
)

Reveals heterogeneous effects across instances.

Counterfactual Explanations

"What would need to change for a different prediction?"

Prediction: Loan denied

Counterfactual:
- If income increased from $40K to $52K → Approved
- OR if debt_ratio decreased from 0.45 to 0.30 → Approved

import dice_ml

# data and model are dice_ml.Data / dice_ml.Model wrappers
# around the dataset and the trained classifier
dice_exp = dice_ml.Dice(data, model)
counterfactuals = dice_exp.generate_counterfactuals(instance, total_CFs=3)
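The underlying search can be sketched without a library: scan candidate values of one feature until the model's decision flips. The approval rule below is a toy stand-in for a trained classifier:

```python
import numpy as np

def counterfactual_1d(predict, x, feature, grid):
    """Return (feature, value) of the first change that flips the prediction."""
    original = predict(x)
    for v in grid:
        x2 = x.copy()
        x2[feature] = v
        if predict(x2) != original:
            return feature, v
    return None

# toy approval rule: approve iff income_k > 50 and debt_ratio < 0.35
predict = lambda x: int(x[0] > 50 and x[1] < 0.35)
x = np.array([40.0, 0.30])  # currently denied
print(counterfactual_1d(predict, x, 0, range(41, 100)))  # → (0, 51)
```

Libraries like DiCE do this search over many features at once while keeping the counterfactuals sparse and plausible.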

Attention Visualization (Deep Learning)

For transformer models, visualize attention weights:

# Hugging Face transformers can return attention weights directly
outputs = model(**inputs, output_attentions=True)
attentions = outputs.attentions  # one (batch, heads, seq, seq) tensor per layer

# Visualize which tokens each head attends to
from bertviz import head_view
head_view(attentions, tokens)

Warning: Attention ≠ explanation. High attention doesn't always mean importance for prediction.

Explainability Trade-offs

Model Type      | Accuracy | Interpretability
----------------|----------|---------------------
Linear/Logistic | Lower    | High
Decision Tree   | Lower    | High
Random Forest   | Higher   | Medium (importance)
XGBoost         | Higher   | Medium (SHAP)
Neural Network  | Highest  | Low (needs XAI)

Best Practices

1. Start Interpretable

# Try a simple model first; threshold and deploy() are placeholders
from sklearn.linear_model import LogisticRegression

baseline = LogisticRegression().fit(X_train, y_train)
baseline_score = baseline.score(X_test, y_test)
if baseline_score > threshold:
    deploy(baseline)  # interpretable by default!

2. Combine Methods

  • Global: Feature importance + PDP
  • Local: SHAP + counterfactuals

3. Validate Explanations

  • Do explanations match domain knowledge?
  • Are they consistent across similar instances?
  • Do they reveal actual model behavior?

4. Document Limitations

  • Explanations are approximations
  • May not capture all model behavior
  • Can be misleading if misused

Key Takeaways

  1. Interpretability builds trust and aids debugging
  2. Simple models (linear, trees) are inherently interpretable
  3. SHAP provides theoretically grounded feature attributions
  4. LIME gives local linear approximations
  5. Use multiple methods for robust understanding
  6. Always validate explanations against domain knowledge
  7. Trade-off exists between accuracy and interpretability
