Probability & Expected Value

Expected value: the average of futures, weighted by probability.

Fundamentals 14 min Beginner May 10, 2026

A casino doesn't need luck; it needs math. With 37 roulette fields, one of them a green zero, every spin is an independent random event. Yet over millions of spins, the house reliably earns about 2.7 cents per euro bet. How can the outcome of a single spin be unpredictable while the long-run result is almost guaranteed?

The answer lies in three ideas: probability, distributions, and expected value. Together, they form the mathematical language that casinos, meteorologists, and AI systems use to make decisions under uncertainty.

The Language of Uncertainty

Probability

A roulette wheel is a closed system with exactly 37 possible outcomes and known, fixed probabilities — P(Red) = 18/37 ≈ 48.65%. The question "What is the probability that AI reaches human-level intelligence by 2030?" is a one-time event — you cannot repeat it 1,000 times and count. Here, an expert estimates the probability based on prior knowledge and new evidence: a degree of belief, not a relative frequency.

Both interpretations produce valid numbers but answer different questions — one about repeatable experiments, the other about unique events. For a medical diagnosis, for example, both perspectives have value.

Frequentist (Counting)

Probability = relative frequency over many repetitions. Example: P(Red) = 18/37 because 18 of 37 fields are red, so in the long run about 18 of every 37 spins land on red. Works in closed systems with repeatable experiments.

Bayesian (Believing)

Probability = degree of belief. Example: "20% chance of AGI by 2030" is based on prior knowledge and evidence, not counting. Works for one-off events that cannot be repeated.

Example: European Roulette

European Roulette: 37 fields
(18 red, 18 black, 1 green)

P(Red)           = 18/37 ≈ 48.65%
P(Number 17)     = 1/37  ≈ 2.7%
P(Red OR Black)  = 36/37 ≈ 97.3%

All 37 individual probabilities
sum to exactly 1.

Each field has exactly one probability, and all together they sum to 1; that is the fundamental rule of any probability distribution. The green zero is the source of the house edge: bets on red or black lose when it comes up, which shifts the odds slightly in the casino's favour.
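
The roulette numbers above can be checked with exact fractions. The following is a minimal sketch; the variable names are illustrative and not part of the lesson.

```python
from fractions import Fraction

# European roulette: 37 fields (18 red, 18 black, 1 green zero).
FIELDS = 37
p_red = Fraction(18, FIELDS)
p_black = Fraction(18, FIELDS)
p_green = Fraction(1, FIELDS)
p_single_number = Fraction(1, FIELDS)

# Red and black are mutually exclusive, so their probabilities add.
p_red_or_black = p_red + p_black

# Every field is equally likely, so the 37 probabilities sum to exactly 1.
total = sum(Fraction(1, FIELDS) for _ in range(FIELDS))

print(f"P(Red)          = {p_red} (about {float(p_red):.2%})")
print(f"P(Number 17)    = {p_single_number} (about {float(p_single_number):.2%})")
print(f"P(Red OR Black) = {p_red_or_black} (about {float(p_red_or_black):.2%})")
print(f"Sum over all fields = {total}")
```

Using `Fraction` instead of floats keeps the sum exact, so the "sums to 1" rule can be tested with a plain equality check.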

Misconception: The Gambler's Fallacy

"Five reds in a row — black must be next!" Wrong. Each spin is independent — the wheel has no memory. P(Black) remains 18/37 on every spin, regardless of what came before. This thinking error is called the Gambler's Fallacy and stems from treating independent events as if they were dependent.

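The independence claim can be tested with a simulation: look only at spins that directly follow at least five reds in a row and count how often black comes up. This is a sketch under stated assumptions; the field numbering (0 green, 1-18 red, 19-36 black) is a simplification chosen for the example, not the layout of a real wheel, and the function name is hypothetical.

```python
import random

def black_after_red_streak(spins=400_000, streak=5, seed=17):
    """Estimate P(black) on spins that directly follow `streak` or more reds."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    reds_in_a_row = 0
    followups = 0
    blacks = 0
    for _ in range(spins):
        field = rng.randint(0, 36)  # assumed: 0 green, 1-18 red, 19-36 black
        if reds_in_a_row >= streak:
            followups += 1
            if field >= 19:
                blacks += 1
        # Extend the streak on red, reset it on black or green.
        reds_in_a_row = reds_in_a_row + 1 if 1 <= field <= 18 else 0
    return blacks / followups

estimate = black_after_red_streak()
print(f"P(Black | five reds before) is about {estimate:.3f}")
print(f"Unconditional P(Black)      = {18 / 37:.3f}")
```

Both numbers should land close to each other: the streak carries no information about the next spin.
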
Interactive: Coin Flip Simulator

What happens when a coin isn't perfectly fair? The temperature slider controls the bias: at low temperature, the coin almost always lands on one side. At high temperature, the probabilities approach 50:50. Click Sample repeatedly and watch how the observed frequency converges to the theoretical probability as the sample size grows — the law of large numbers in action.

An LLM has generated the beginning "Das Wetter heute ___" ("The weather today ___") and computes probabilities for the next word. The most natural continuation is "ist" ("is") — but the temperature determines whether the model always picks the safe choice or dares more unusual continuations.

Temperature slider: 0.1 (focused) to 2.0 (creative)
Standard (T≈1.0): The probabilities computed from the original logits are used unchanged. Balance between precision and variety.

Probability Distribution (at T=1.0)

Heads: 55.0%
Tails: 45.0%
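
One common way to implement such a temperature control is to divide the log-probabilities by T and renormalise with a softmax. The sketch below applies that to the 55:45 distribution shown at T = 1.0. Whether the simulator uses exactly this formula is an assumption, and `apply_temperature` is a hypothetical helper name.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution with softmax(log(p) / T)."""
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

base = [0.55, 0.45]  # heads, tails at T = 1.0
for t in (0.1, 1.0, 2.0):
    heads, tails = apply_temperature(base, t)
    print(f"T = {t}: P(heads) = {heads:.3f}, P(tails) = {tails:.3f}")
```

Under this formula, a low temperature pushes the probability of the more likely side towards 1, T = 1.0 leaves the distribution unchanged, and a high temperature moves both sides towards 50:50.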

The Map of All Possibilities

Distribution (PMF & PDF)

A fair die has six equal bars in its distribution diagram — each face appears 1/6 of the time. A loaded die with P(6) = 0.5 and P(1-5) = 0.1 each has one tall bar and five short ones. Both are valid distributions because all values sum to 1. "Loaded" does not mean "broken" — it simply reflects a different reality.

Distribution Types: An Overview

Uniform Distribution: All outcomes equally likely. Example: fair die (1/6 each).
Normal Distribution: Bell-shaped, symmetric around the mean. Example: heights, measurement errors.
Poisson Distribution: Counts rare events in fixed time intervals. Example: customer calls per hour.
Binomial Distribution: Number of successes in n yes/no trials. Example: number of heads in 10 coin flips.
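
To get a feel for the four types, one can draw samples from each with the standard library alone. This is a rough sketch: the parameters (mean height 170 with standard deviation 10, 4 calls per hour) are made-up illustration values, and the `binomial` and `poisson` helpers are hand-written because this example avoids third-party libraries.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so repeated runs give the same samples
N = 20_000

def binomial(n, p):
    """Number of successes in n independent yes/no trials."""
    return sum(rng.random() < p for _ in range(n))

def poisson(lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

samples = {
    "uniform (fair die)": [rng.randint(1, 6) for _ in range(N)],
    "normal (mu=170, sigma=10)": [rng.gauss(170, 10) for _ in range(N)],
    "poisson (4 calls/hour)": [poisson(4) for _ in range(N)],
    "binomial (10 flips, p=0.5)": [binomial(10, 0.5) for _ in range(N)],
}

for name, values in samples.items():
    print(f"{name:28s} mean = {statistics.fmean(values):7.2f}")
```

The sample means should sit near 3.5, 170, 4, and 5 respectively, which are the theoretical means of the four distributions with these parameters.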

Example: Fair vs. Loaded Die

1,000 simulated rolls with a fair die: expect ~167 of each value (1-6). 1,000 rolls with a loaded die (P(6) = 0.5): expect ~500 sixes and ~100 of each other value. Both histograms sum to 1,000 rolls. In ML, the shape of the distribution (uniform or skewed) helps determine which algorithms and preprocessing steps are appropriate.
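
The two histograms can be reproduced with a short simulation. A minimal sketch, assuming the loaded die from the text (P(6) = 0.5, every other face 0.1); the helper name `roll_counts` is hypothetical.

```python
import random
from collections import Counter

rng = random.Random(6)  # fixed seed for reproducibility
FACES = [1, 2, 3, 4, 5, 6]
fair_weights = [1 / 6] * 6
loaded_weights = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # P(6) = 0.5

def roll_counts(weights, rolls=1000):
    """Roll a die `rolls` times and count how often each face appears."""
    return Counter(rng.choices(FACES, weights=weights, k=rolls))

fair = roll_counts(fair_weights)
loaded = roll_counts(loaded_weights)

for face in FACES:
    print(f"{face}: fair = {fair[face]:4d}   loaded = {loaded[face]:4d}")
```

The counts will not hit 167 or 500 exactly; they scatter around those values, and the scatter shrinks relative to the total as the number of rolls grows.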

Expected Value: The Average of the Future

The expected value E(X) is the weighted average of all possible outcomes: each outcome is multiplied by its probability, and the products are summed. This number does not predict what happens on the next trial; it is the anchor around which the average of many trials settles. The convergence of that average towards E(X) as the number of trials grows is called the Law of Large Numbers.
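
The convergence can be watched directly by tracking the running average of simulated die rolls. This is a sketch with an arbitrary seed and arbitrary checkpoints; `running_mean_of_die` is a name invented for this example.

```python
import random

def running_mean_of_die(checkpoints, seed=1):
    """Return the average of all rolls so far at each checkpoint."""
    rng = random.Random(seed)
    total = 0
    means = {}
    for n in range(1, max(checkpoints) + 1):
        total += rng.randint(1, 6)
        if n in checkpoints:
            means[n] = total / n
    return means

means = running_mean_of_die({10, 100, 1_000, 100_000})
for n, mean in sorted(means.items()):
    print(f"after {n:>7,} rolls: average = {mean:.3f}  (E(X) = 3.5)")
```

Early averages can be far from 3.5; the later ones should hug it closely.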

Calculating Expected Value

Fair die:
E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6)
E(X) = 21/6 = 3.5

Roulette, 1 EUR on Red:
Win: +1 EUR with P = 18/37
Lose: -1 EUR with P = 19/37
E(X) = 1·(18/37) + (-1)·(19/37) = -1/37 ≈ -0.027 EUR

→ Per euro bet, the casino earns about 2.7 cents on average.
→ Over 1 million bets of 1 EUR each: ~27,000 EUR expected profit.

E(X) = 3.5 for the fair die: a number that never appears on a single roll, yet closely predicts the long-run average over thousands of rolls. For roulette, E(X) ≈ -0.027 EUR means any individual player can win in the short run, but over many bets the house's profit is almost guaranteed.
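
Both calculations follow the same recipe, so they fit into one small function. A minimal sketch using exact fractions; the representation of a distribution as a list of (outcome, probability) pairs is a choice made for this example.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of outcome * probability over (outcome, probability) pairs."""
    total_probability = sum(p for _, p in distribution)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution)

fair_die = [(face, Fraction(1, 6)) for face in range(1, 7)]
red_bet = [(+1, Fraction(18, 37)), (-1, Fraction(19, 37))]  # 1 EUR on red

ev_die = expected_value(fair_die)
ev_red = expected_value(red_bet)

print(f"Fair die:     E(X) = {ev_die} = {float(ev_die)}")
print(f"1 EUR on red: E(X) = {ev_red} = {float(ev_red):.4f} EUR")
print(f"Expected casino profit over 1,000,000 bets: {float(-ev_red * 1_000_000):,.0f} EUR")
```

The check that the probabilities sum to 1 catches the most common mistake when writing a distribution down by hand.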

Deal or No Deal: Expected Value vs. Gut Feeling

In the game show "Deal or No Deal," three briefcases remain: 100 EUR, 1,000 EUR, and 500,000 EUR. The expected value is (100 + 1,000 + 500,000) / 3 ≈ 167,033 EUR. The banker offers 120,000 EUR. A purely rational, risk-neutral player rejects the offer — it is below the expected value. But most real humans accept, because a guaranteed 120,000 EUR feels more valuable than a 1-in-3 shot at 500,000 EUR. This gap between expected value and human behaviour was explained by Daniel Bernoulli in 1738: the subjective utility of money decreases as wealth increases.
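
Bernoulli's idea can be made concrete by comparing the expected value with a certainty equivalent under logarithmic utility. This is only a sketch: the logarithmic utility function follows Bernoulli's proposal, but the player's existing wealth of 10,000 EUR is a hypothetical figure chosen for illustration, and the result depends strongly on it.

```python
import math

briefcases = [100, 1_000, 500_000]  # EUR, each equally likely
offer = 120_000                     # banker's offer in EUR
wealth = 10_000                     # hypothetical savings of the player

expected_value = sum(briefcases) / len(briefcases)

# Assumption: utility of money is log(total wealth).
expected_utility = sum(math.log(wealth + x) for x in briefcases) / len(briefcases)

# Certainty equivalent: the guaranteed amount with the same utility as gambling.
certainty_equivalent = math.exp(expected_utility) - wealth

print(f"Expected value:       {expected_value:,.0f} EUR")
print(f"Certainty equivalent: {certainty_equivalent:,.0f} EUR")
print("Risk-neutral player:", "accept" if offer > expected_value else "reject")
print("Log-utility player: ", "accept" if offer > certainty_equivalent else "reject")
```

With these assumptions the certainty equivalent falls well below the banker's offer, so a log-utility player accepts the deal that a risk-neutral player rejects.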

Three Steps of Understanding

  1. Probability: Measure uncertainty as a number between 0 and 1
  2. Distribution: Assemble all uncertainties into a complete map
  3. Expected Value: Compress the map into a single decision-relevant number

Four Layers of Understanding

From Single Event to Decision

Single Event: "What can happen?"
Distribution: "How likely is each outcome?"
Expected Value: "What is the average result?"
Decision: "What should I do?"

In reinforcement learning, agents estimate the expected cumulative reward of each action and favour the action with the highest estimate; the exploration/exploitation trade-off arises because those estimates are themselves uncertain. Classifiers output probability distributions over classes; the predicted class is the one with the highest probability. Expected value and variance together quantify "how good on average" and "how much could it vary", and both are critical for real-world AI deployment. In Python, you can simulate these distributions with the `random` module, for example `random.gauss(mu, sigma)` for the normal distribution or `random.randint(a, b)` for a discrete uniform distribution.
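
As a toy illustration of "how good on average" versus "how much could it vary", the sketch below samples rewards for two made-up actions and picks the one with the higher estimated mean. The actions, their reward distributions, and the greedy rule are assumptions for this example, not a full reinforcement learning algorithm.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility
N = 10_000

# Two hypothetical actions with uncertain rewards.
rewards = {
    "A": [rng.gauss(1.0, 0.5) for _ in range(N)],  # normal: mean 1.0, std 0.5
    "B": [rng.randint(0, 3) for _ in range(N)],    # discrete uniform on 0..3
}

summary = {
    action: (statistics.fmean(values), statistics.pvariance(values))
    for action, values in rewards.items()
}

for action, (mean, variance) in summary.items():
    print(f"action {action}: mean reward = {mean:.2f}, variance = {variance:.2f}")

# Greedy choice: the action with the highest estimated expected reward.
best = max(summary, key=lambda action: summary[action][0])
print("greedy choice:", best)
```

Action B wins on expected value but also has the larger variance, which is the kind of trade-off a real deployment would have to weigh.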

Key Takeaways

  1. Probability is a number between 0 and 1 that makes uncertainty measurable. Frequentists count, Bayesians estimate — both approaches are valid.
  2. A distribution is the complete map of all possibilities. Whether fair or "loaded" — every valid distribution sums to 1.
  3. Expected value compresses the map into a single decision number. It does not predict the next outcome, but it shows where many trials cluster — and that is exactly how casinos, insurers, and AI agents make their decisions.

Quiz: Probability & Expected Value

What does a probability of 0.6 mean for an event?

Answer Key: 1) B · 2) B · 3) C · 4) B

Checkpoint: Probability & Expected Value

  • I can explain what probability means and describe the difference between frequentist and Bayesian interpretation.
  • I can calculate the expected value of a simple scenario (dice, roulette) and explain why this number often cannot occur on a single trial (e.g. 3.5 on a die).
  • I can recognise the Gambler's Fallacy and explain why independent events have no memory.