Bayes & Conditional Probability

Conditional probability: the tool that sets statisticians apart, because they calculate differently.

Fundamentals 15 min Beginner May 10, 2026

A medical test is 95% accurate. You test positive. How likely are you to actually be sick? Most people answer "95%." For a disease that affects 1% of the population, the real answer is about 16%. This is not a trick; it is Bayes' theorem in action.

Your brain is wired to get probabilities wrong — especially when rare events are involved. This article walks you through three tools: conditional probability (what "given that" means mathematically), Bayes' theorem (the formula for reversing probabilities), and Bayesian updating (how AI systems learn from data).

Conditional Probability — "Given That..."

Conditional Probability

Analogy
You draw a card from a standard 52-card deck. P(Ace of Hearts) = 1/52. Someone tells you: "The card is red." This eliminates 26 cards. Your new universe is 26 red cards. P(Ace of Hearts | red) = 1/26. The information "red" halved your possibilities and doubled your probability. That is conditional probability: new information shrinks the possibility space.

Example

The card analogy works perfectly because the numbers are exact and small enough to verify by hand. In reality, you work with estimated frequencies (disease prevalence, test accuracy), not perfectly known card counts.
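The card example is small enough to check by brute force. The sketch below enumerates the deck and applies the standard definition P(A | B) = P(A and B) / P(B) by shrinking the outcome space to the cards where B holds. The helper names (`probability`, `conditional`) are illustrative choices, not part of the article.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# All 52 equally likely outcomes as (rank, suit) pairs.
DECK = list(product(RANKS, SUIT_COLOR))


def probability(event, space):
    """Share of the equally likely outcomes in `space` for which `event` is true."""
    return Fraction(sum(1 for card in space if event(card)), len(space))


def conditional(event, given, space):
    """P(event | given): keep only outcomes where `given` holds, then count."""
    reduced = [card for card in space if given(card)]
    return probability(event, reduced)


def is_ace_of_hearts(card):
    return card == ("A", "hearts")


def is_red(card):
    return SUIT_COLOR[card[1]] == "red"


p_ace = probability(is_ace_of_hearts, DECK)
p_ace_given_red = conditional(is_ace_of_hearts, is_red, DECK)
print(p_ace, p_ace_given_red)  # 1/52 1/26
```

Using `Fraction` keeps the results exact, which mirrors the point that card counts are known precisely while real-world frequencies are only estimated.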

Example: The Medical Test

1% of the population has disease X. The test detects sick people 95% of the time (sensitivity). In healthy people, it falsely shows positive 5% of the time. Take 1,000 people and work with expected counts (averages, so fractions are fine):

1,000 people: 10 sick, 990 healthy
Of 10 sick: 9.5 test positive (95%)
Of 990 healthy: 49.5 false positives (5%)
Total positive: 9.5 + 49.5 = 59
P(sick | positive) = 9.5/59 ≈ 16%

You test positive. The probability of actually being sick: only about 16%. The 990 healthy people produce roughly 50 false alarms that swamp the 9 or 10 true positives. The base rate (1% prevalence) dominates.
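The same bookkeeping can be sketched in a few lines of Python. The function name `expected_counts` and its dictionary keys are illustrative. The counts are expected values, so they can be fractional (9.5 true positives rather than a whole number of people).

```python
def expected_counts(population, prevalence, sensitivity, false_positive_rate):
    """Split a population into the four expected test outcomes."""
    sick = population * prevalence
    healthy = population - sick
    true_positive = sick * sensitivity
    false_positive = healthy * false_positive_rate
    return {
        "true_positive": true_positive,
        "false_negative": sick - true_positive,
        "false_positive": false_positive,
        "true_negative": healthy - false_positive,
    }


counts = expected_counts(1000, prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05)
all_positive = counts["true_positive"] + counts["false_positive"]
p_sick_given_positive = counts["true_positive"] / all_positive

print(counts)
print(round(p_sick_given_positive, 3))  # 0.161
```

Changing `prevalence` while leaving the test unchanged is a quick way to see how strongly the base rate drives the result.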

The Base Rate Fallacy

Do not confuse P(positive | sick) with P(sick | positive). Test accuracy (how well the test detects the sick) and diagnostic probability (how likely you are sick given a positive test) are completely different things. The base rate — how rare the disease is — makes all the difference.

Misconception: 95% Accurate Test = 95% Probability of Being Sick

95% describes P(positive | sick): how well the test finds sick people. But you want P(sick | positive): how likely you are to be sick. The accuracy number alone cannot answer that; you also need the base rate (how common the disease is) and the false positive rate. For rare diseases, even a very good test can produce mostly false alarms.
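To see how much the base rate matters, the sketch below holds the test fixed (95% sensitivity, 5% false positive rate, as in the example) and varies only the prevalence. The prevalence values other than 1% are illustrative choices, not figures from the article.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(sick | positive) for a given base rate and test quality."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Same test, different base rates (values other than 1% are illustrative).
results = {}
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.5):
    results[prevalence] = posterior(prevalence, sensitivity=0.95, false_positive_rate=0.05)
    print(f"prevalence {prevalence:.2%} -> P(sick | positive) = {results[prevalence]:.1%}")
```

Only when half the population is sick does the posterior reach the 95% that intuition expects; at 1% prevalence it is about 16%, and at 0.01% it is well below 1%.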

Bayes' Theorem — The Formula for Updating Beliefs

Bayes' Theorem

Analogy
You hear a noise at night. Your initial beliefs (priors): P(cat knocked something over) = 80%, P(burglar) = 0.1%. Then you hear glass breaking — P(glass | cat) = 10%, but P(glass | burglar) = 90%. Bayes updates your belief: P(burglar) rises significantly because glass breaking is much more likely under the burglar scenario. But the extremely low prior pulls the posterior back — it does not jump to 90%. The prior and the likelihood compete, and Bayes mediates.

Example

Humans do not compute numbers at night — they react with gut feelings and heuristics. Bayes requires explicit probabilities. This gap between intuitive and mathematical updating is exactly what the article teaches.
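The numbers in the analogy can be made explicit. The text gives priors and likelihoods only for the cat and the burglar, so this sketch compares just those two hypotheses (an assumption: all other possible causes are ignored, so the result is a relative comparison, not a full posterior).

```python
prior = {"cat": 0.80, "burglar": 0.001}
likelihood_glass = {"cat": 0.10, "burglar": 0.90}

# Odds form of Bayes: posterior odds = prior odds x likelihood ratio.
prior_odds = prior["burglar"] / prior["cat"]
likelihood_ratio = likelihood_glass["burglar"] / likelihood_glass["cat"]
posterior_odds = prior_odds * likelihood_ratio

# Probability of "burglar" if only these two explanations are considered.
weights = {h: prior[h] * likelihood_glass[h] for h in prior}
p_burglar = weights["burglar"] / sum(weights.values())

print(f"likelihood ratio: {likelihood_ratio:.0f}")
print(f"odds burglar:cat  before {prior_odds:.5f}, after {posterior_odds:.5f}")
print(f"P(burglar | glass), two hypotheses only: {p_burglar:.1%}")
```

Glass breaking makes the burglar about nine times more plausible relative to the cat, yet under these assumptions the burglar remains at roughly 1%: the low prior keeps the posterior far from 90%.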

The Four Building Blocks

Prior, P(A): Your belief about A before seeing evidence B (e.g., disease is rare: 1%)
Likelihood, P(B|A): How probable the evidence B is if A is true (e.g., test positive if sick: 95%)
Evidence, P(B): The total probability of seeing B across all scenarios
Posterior, P(A|B): Your updated belief about A after seeing B, the answer you actually want

Worked Example

P(positive) = P(positive | sick) x P(sick) + P(positive | healthy) x P(healthy)
            = 0.95 x 0.01 + 0.05 x 0.99
            = 0.0095 + 0.0495
            = 0.059

P(sick | positive) = P(positive | sick) x P(sick) / P(positive)
                   = 0.95 x 0.01 / 0.059
                   = 0.0095 / 0.059
                   ≈ 0.161 → about 16.1%

Frequency Table

Imagine 1,000 people and divide the sick people who test positive by everyone who tests positive. Intuitive, but it gets unwieldy when the counts are not round numbers.

Bayes' Formula

Prior x Likelihood / Evidence. Same answer (16%), but universally applicable — even when you cannot count 1,000 people.
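That the two routes agree can be checked exactly. The sketch below uses Python's `fractions` module to avoid rounding; the population of 10,000 is an illustrative choice that makes every count a whole number.

```python
from fractions import Fraction

prior = Fraction(1, 100)                # P(sick)
sensitivity = Fraction(95, 100)         # P(positive | sick)
false_positive_rate = Fraction(5, 100)  # P(positive | healthy)

# Route 1: frequency table (population size is an illustrative choice).
population = 10_000
sick = population * prior
true_positives = sick * sensitivity
false_positives = (population - sick) * false_positive_rate
by_counting = true_positives / (true_positives + false_positives)

# Route 2: Bayes' formula, prior x likelihood / evidence.
evidence = sensitivity * prior + false_positive_rate * (1 - prior)
by_formula = sensitivity * prior / evidence

print(by_counting, by_formula, f"{float(by_formula):.3f}")  # 19/118 19/118 0.161
```

Both routes reduce to the same fraction because the population size cancels out, which is why the formula works even when nobody can be counted.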

Misconception: Strong Evidence Overrides Everything

A high P(B|A) alone does not determine the result. With an extremely low prior (e.g., a disease that affects 0.01% of people), the posterior stays low even with a 99% test: assuming 99% sensitivity and a 1% false positive rate, a positive result lifts the probability only to about 1%. The prior always pulls the result in its direction. Only with moderate priors or multiple independent pieces of evidence can the evidence overcome the prior.
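A short sketch of how repeated evidence can overcome a tiny prior. It assumes a 0.01% prior, 99% sensitivity, and a 1% false positive rate (the false positive rate is an assumption, since the text only says "99% test"), and it assumes the repeated tests are independent given the true disease status.

```python
def update(prior, sensitivity, false_positive_rate):
    """One Bayes update after a positive result; returns P(sick | positive)."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


belief = 0.0001  # prior: disease affects 0.01% of people
history = []
for test_number in range(1, 4):
    # The posterior of the previous test becomes the prior for the next one.
    belief = update(belief, sensitivity=0.99, false_positive_rate=0.01)
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.1%}")
```

Under these assumptions, one positive result leaves the probability near 1%, a second lifts it to roughly 50%, and a third to about 99%. In practice, repeat tests on the same patient are often correlated, so the real gain would be smaller.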

Scenario: Medical Test

A disease affects a certain proportion of the population. A test detects the disease with a certain hit rate, but also produces false-positive results in healthy people. How likely is the disease really when the test comes back positive?

Bayes' Calculation

P(B)= P(B|A) × P(A) + P(B|¬A) × P(¬A)
P(B)= 0.9500 × 0.0100 + 0.0500 × 0.9900 = 0.0590
P(A|B)= P(B|A) × P(A) / P(B)
P(A|B)= 0.9500 × 0.0100 / 0.0590 = 0.1610

Result

P(A|B), the probability of disease given a positive test: 16.1%
Before (prior): 1%
After (posterior): 16.1%

The positive test raised the probability of disease by a factor of 16.1 (from 1% to 16.1%). The Bayes factor of the test itself, the likelihood ratio P(B|A) / P(B|¬A), is 0.95 / 0.05 = 19.

The Bayes Paradox

Although the test has 95% sensitivity, the probability of disease given a positive result is only 16.1%. This is due to the low base rate (1%): among 1,000 tested people, about 50 healthy people get a false alarm, while only 10 are sick and 9 or 10 of them test positive. The false positives overwhelm the real cases.

Visualized: 1,000 People Tested

Expected counts per 1,000 people:

Sick & tested positive (true positive): 9.5
Sick & tested negative (false negative): 0.5
Healthy & tested positive (false positive): 49.5
Healthy & tested negative (true negative): 940.5

Of 59 positive tests, only 9.5 are actually sick on average. This gives P(A|B) = 9.5/59 ≈ 16.1%.

Bayesian Updating — Learning Step by Step

Bayesian Updating

Analogy
A doctor listens to symptoms one at a time: Patient arrives → Prior: common cold is most likely. Symptom fever → Posterior shifts toward flu. Symptom rash → Posterior shifts further, now considering measles. Each symptom is a Bayes update: the posterior from the previous symptom becomes the prior for the next one.

Example

Doctors rarely compute explicit probabilities — they use pattern recognition and clinical experience. A Bayesian algorithm computes actual numbers. This contrast shows what Bayesian AI achieves that human intuition only approximates.

Example: Spam Filter (Naive Bayes)

A spam filter classifies emails using word frequencies. Prior: P(Spam) = 0.4, P(Ham) = 0.6. The email contains the word "winner":

P("winner" | Spam) = 0.8
P("winner" | Ham)  = 0.05

P(Spam | "winner") = (0.8 x 0.4) / (0.8 x 0.4 + 0.05 x 0.6)
                    = 0.32 / 0.35
                    ≈ 0.914 → 91.4% spam probability

If the email also contains "click," and that word is likewise more common in spam than in ham, another Bayes update further increases the spam probability. Each word is a piece of evidence. This is Naive Bayes: "naive" because it assumes words are independent given the class, a simplification that works remarkably well in practice.
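The word-by-word update can be written out by hand. In this sketch the prior and the "winner" likelihoods come from the example; the likelihoods for "click" are hypothetical values chosen only for illustration. Working in log space is a common habit for Naive Bayes because products of many small probabilities underflow.

```python
import math

PRIOR = {"spam": 0.4, "ham": 0.6}
WORD_LIKELIHOOD = {
    "winner": {"spam": 0.8, "ham": 0.05},  # values from the spam filter example
    "click": {"spam": 0.6, "ham": 0.2},    # hypothetical values, not from the article
}


def spam_posterior(words):
    """P(spam | words), assuming words are independent given the class."""
    log_score = {label: math.log(p) for label, p in PRIOR.items()}
    for word in words:
        for label in log_score:
            log_score[label] += math.log(WORD_LIKELIHOOD[word][label])
    score = {label: math.exp(value) for label, value in log_score.items()}
    return score["spam"] / sum(score.values())


print(f"{spam_posterior(['winner']):.3f}")           # 0.914
print(f"{spam_posterior(['winner', 'click']):.3f}")  # 0.970
```

Multiplying all word likelihoods at once gives the same result as updating one word at a time and feeding each posterior in as the next prior.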

Python: Naive Bayes

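from sklearn.naive_bayes import MultinomialNB

# Training features: word counts per email (one column per vocabulary word)
X_train = [[5, 1, 0], [4, 2, 0], [0, 1, 3], [1, 0, 4]]
y_train = ['spam', 'spam', 'ham', 'ham']

# Fit a multinomial Naive Bayes classifier (default smoothing)
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Classify a new email; predict returns an array with one label per input row
print(clf.predict([[3, 1, 0]]))  # → ['spam']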

Misconception: The Prior Does Not Matter — Just Let the Data Speak

With large datasets, yes — the prior's influence fades. But with limited data (common in practice: medical imaging, rare events, small user bases), the prior can dominate the posterior. Choosing a thoughtful prior is not bias — it is informed reasoning.
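One standard way to watch the prior fade is a Beta prior on an unknown rate with yes/no observations, where the posterior mean has a closed form. The Beta-Binomial setup, the two priors, and the data (an observed rate of 50%) are illustrative choices, not taken from the article.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta(prior_a + successes, prior_b + failures) posterior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two hypothetical priors: a flat one and a skeptical one centered on 10%.
PRIORS = {"flat Beta(1, 1)": (1, 1), "skeptical Beta(2, 18)": (2, 18)}

for trials in (4, 40, 4000):
    successes = trials // 2  # observed rate is always 50%
    for name, (a, b) in PRIORS.items():
        mean = posterior_mean(a, b, successes, trials)
        print(f"n={trials:>4}  {name:<22} posterior mean = {mean:.3f}")
```

With 4 observations the skeptical prior holds the estimate near 17%; with 4,000 observations both priors end up at about 50%.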

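Training an AI model follows a loosely similar pattern: the starting point is the model's random initial weights, the evidence is the training data, and the result is the trained model. Each training batch updates the weights, much as Bayes updates the posterior with each new piece of evidence. The parallel is informal: standard gradient-based training produces a single set of weights rather than a full posterior distribution. Still, understanding Bayes gives a useful mental model of how AI learns.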

Prior vs. Posterior Comparison

Before any evidence, the prior is a flat, spread-out distribution. After the evidence, the posterior is concentrated. Bayesian updating shifts belief from a vague guess toward a more precise estimate.

Underfitting vs. Overfitting

Underfitting

The model is too simple. It doesn't even recognize the obvious patterns in the training data. Like a student who hasn't understood the task.

Model complexity: Too low
Training error: High
Test error: High

Overfitting

The model is too complex. It memorizes every single data point, including noise. Like a student who memorizes answers instead of understanding.

Model complexity: Too high
Training error: Very low
Test error: High

Sweet Spot: Good Fit

The optimal compromise lies in the middle: complex enough to recognize real patterns, but simple enough to generalize to new data. Techniques like regularization, cross-validation, and early stopping help find this point.

Key Takeaways

  • P(A|B) ≠ P(B|A): confusing the direction of conditioning is a very common probability error, and ignoring the prior on top of it is the Base Rate Fallacy.
  • Bayes' formula P(A|B) = P(B|A) x P(A) / P(B) combines prior knowledge (prior) with new observation (likelihood) to produce the updated belief (posterior).
  • Bayesian updating is iterative: each posterior becomes the new prior. With little data, the prior dominates; with lots of data, the evidence dominates. Spam filters and medical diagnostics apply this principle directly, and AI training follows a loosely similar pattern.

Checkpoint: Bayes & Conditional Probability

  • I can explain why the base rate (prior) is so crucial when interpreting a positive test result.
  • I can explain the difference between P(A|B) and P(B|A) with an example.
  • I can compute a posterior using Bayes' formula when given the prior, likelihood, and evidence.