Bayes' Theorem
Bayes' theorem is one of the most important concepts in probability theory and machine learning. It provides a mathematical framework for updating our beliefs when we receive new evidence.
The Formula
Bayes' theorem states:
P(A|B) = P(B|A) × P(A) / P(B)
Where:
- P(A|B) is the posterior probability - probability of A given we observed B
- P(B|A) is the likelihood - probability of observing B if A is true
- P(A) is the prior probability - our initial belief about A
- P(B) is the evidence - total probability of observing B
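The formula above can be sketched directly in code. This is a minimal illustration with made-up numbers; the evidence P(B) is expanded using the law of total probability, P(B) = P(B|A)P(A) + P(B|¬A)P(¬A).

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) via Bayes' theorem.

    The evidence P(B) is expanded with the law of total probability:
    P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2
posterior = bayes_posterior(0.3, 0.8, 0.2)
print(round(posterior, 4))  # 0.24 / (0.24 + 0.14) ≈ 0.6316
```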
Intuitive Understanding
Think of Bayes' theorem as a learning rule. You start with some belief (prior), observe some evidence, and update your belief accordingly (posterior).
Example: Suppose you want to know if someone has a disease based on a positive test result.
- Prior P(Disease): How common is the disease in the population?
- Likelihood P(Positive|Disease): How accurate is the test when someone has the disease?
- Evidence P(Positive): Overall rate of positive tests
- Posterior P(Disease|Positive): Probability of having disease given a positive test
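Plugging hypothetical numbers into this example makes the point concrete. Assume a 1% prevalence, a test with 99% sensitivity, and a 5% false-positive rate (all illustrative values, not data about any real test):

```python
prior = 0.01                 # P(Disease): prevalence in the population
sensitivity = 0.99           # P(Positive | Disease)
false_positive_rate = 0.05   # P(Positive | No Disease)

# Evidence via the law of total probability
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

posterior = sensitivity * prior / p_positive  # P(Disease | Positive)
print(round(posterior, 4))  # ≈ 0.1667
```

Even with a very accurate test, the posterior is only about 17% because the disease is rare — ignoring the low prior here is the classic base-rate fallacy.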
Why It Matters in ML
1. Naive Bayes Classifiers
One of the simplest yet surprisingly effective classification algorithms is based directly on Bayes' theorem. Despite the "naive" assumption of feature independence, it works remarkably well for text classification and spam filtering.
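A from-scratch sketch of a multinomial Naive Bayes spam filter shows how directly the theorem applies. The tiny training corpus is made up for illustration, and Laplace smoothing handles words unseen in a class:

```python
import math
from collections import Counter

spam_docs = [["win", "money", "now"], ["free", "money"]]
ham_docs = [["meeting", "tomorrow"], ["lunch", "tomorrow", "now"]]
vocab = {w for doc in spam_docs + ham_docs for w in doc}

def log_score(doc, docs, prior):
    """Log of P(class) * product of P(word | class), with Laplace smoothing."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    score = math.log(prior)
    for w in doc:
        # +1 smoothing avoids zero probabilities for unseen words
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(doc):
    p_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))
    s = log_score(doc, spam_docs, p_spam)
    h = log_score(doc, ham_docs, 1 - p_spam)
    return "spam" if s > h else "ham"

print(classify(["free", "money", "now"]))  # spam
```

Working in log space is the standard trick: multiplying many small probabilities underflows, while summing their logs does not.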
2. Bayesian Inference
Bayesian methods allow us to:
- Quantify uncertainty in model parameters
- Incorporate prior knowledge into models
- Make predictions with confidence intervals
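These points can be sketched with a Beta-Binomial conjugate update — estimating a coin's heads probability. The prior, the flip counts, and the sample size below are all arbitrary choices for illustration:

```python
import random

random.seed(0)

# Prior: Beta(1, 1), i.e. uniform. Data: 7 heads out of 10 flips (made up).
a, b = 1, 1
heads, tails = 7, 3
a_post, b_post = a + heads, b + tails   # conjugate update: Beta(8, 4)

# The posterior mean has a closed form
mean = a_post / (a_post + b_post)
print(round(mean, 3))  # 0.667

# Monte Carlo 95% credible interval from posterior samples
samples = sorted(random.betavariate(a_post, b_post) for _ in range(10_000))
lo, hi = samples[249], samples[9749]
print(round(lo, 2), round(hi, 2))
```

The credible interval quantifies parameter uncertainty directly: given the data and prior, there is a 95% probability the rate lies in that range.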
3. Probabilistic Graphical Models
Bayes' theorem underpins Bayesian networks and other probabilistic models that capture complex dependencies in data.
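A tiny Bayesian network (Rain → WetGrass ← Sprinkler) makes this concrete. The conditional probability tables below are invented; inference by full enumeration applies Bayes' theorem over the joint distribution:

```python
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.3, False: 0.7}
P_wet = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

# P(Rain=True | WetGrass=True) by enumerating the joint distribution
num = den = 0.0
for rain, sprinkler in product([True, False], repeat=2):
    joint = P_rain[rain] * P_sprinkler[sprinkler] * P_wet[(rain, sprinkler)]
    den += joint
    if rain:
        num += joint
print(round(num / den, 3))  # ≈ 0.457
```

Enumeration is exponential in the number of variables; real systems use algorithms like variable elimination or approximate sampling, but the underlying update is the same.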
The Prior Controversy
One challenge with Bayesian methods is choosing the prior. Critics argue this introduces subjectivity. However:
- With enough data, the posterior converges to the same answer for any reasonable prior (one that doesn't assign zero probability to the truth)
- Priors can encode valuable domain knowledge
- Uninformative priors can be used when we lack prior knowledge
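The convergence point can be demonstrated with a quick sketch: two very different Beta priors, updated with the same data, end up with nearly identical posteriors. The observed counts below are invented:

```python
# Pretend we observed 600 heads in 1000 flips (true rate ~0.6, made up)
heads, n = 600, 1000

for a, b in [(1, 1), (20, 2)]:  # uniform prior vs. strongly biased prior
    a_post = a + heads
    b_post = b + (n - heads)
    mean = a_post / (a_post + b_post)
    print(f"prior Beta({a},{b}) -> posterior mean {mean:.3f}")
```

Both posterior means land within about 0.01 of the empirical rate: with 1000 observations, the likelihood overwhelms even a prior strongly biased toward heads.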
Bayesian vs. Frequentist
Two schools of thought in statistics:
| Aspect | Bayesian | Frequentist |
|---|---|---|
| Probability | Degree of belief | Long-run frequency |
| Parameters | Random variables | Fixed unknowns |
| Uncertainty | Credible intervals (from the posterior) | Confidence intervals |
| Prior knowledge | Explicitly incorporated | Not directly used |
Practical Applications
- Spam filtering: P(spam|words in email)
- Medical diagnosis: P(disease|symptoms)
- A/B testing: Bayesian analysis of conversion rates
- Recommendation systems: P(user likes item|past behavior)
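The A/B testing application is easy to sketch: estimate P(variant B beats variant A) by sampling both conversion-rate posteriors. All counts below are hypothetical:

```python
import random

random.seed(1)

# Observed data: (conversions, visitors) for each variant (made up)
conv_a, n_a = 120, 1000
conv_b, n_b = 145, 1000

# Beta(1, 1) priors; the conjugate update gives Beta posteriors
def sample_rate(conv, n):
    return random.betavariate(1 + conv, 1 + n - conv)

trials = 20_000
wins_b = sum(sample_rate(conv_b, n_b) > sample_rate(conv_a, n_a)
             for _ in range(trials))
print(f"P(B > A) ≈ {wins_b / trials:.3f}")
```

Unlike a p-value, this directly answers the question stakeholders actually ask: "how likely is it that B is better than A?"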
Key Takeaways
- Bayes' theorem updates beliefs based on evidence
- Posterior ∝ Likelihood × Prior
- It's the foundation for probabilistic machine learning
- Naive Bayes is a direct application for classification
- With sufficient data, the prior becomes less important