Probability & Expected Value

A casino doesn't need luck; it needs math. A European roulette wheel has 37 fields, one of them a green zero, and every spin is an independent random event. Yet over millions of spins, the house reliably earns about 2.7 cents per euro bet. How can the outcome of a single spin be unpredictable while the long-run result is almost guaranteed?

The answer lies in three ideas: probability, distributions, and expected value. Together, they form the mathematical language that casinos, meteorologists, and AI systems use to make decisions under uncertainty.

The Language of Uncertainty

Analogy:

A roulette wheel is a closed system with exactly 37 possible outcomes and known, fixed probabilities: P(Red) = 18/37 ≈ 48.65%. By contrast, the question "What is the probability that AI reaches human-level intelligence by 2030?" concerns a one-time event: you cannot repeat it 1,000 times and count. Here, an expert estimates the probability based on prior knowledge and new evidence, which yields a degree of belief, not a relative frequency.

Definition:

Probability assigns a number between 0 (impossible) and 1 (certain) to each possible event. Two schools interpret this number differently: frequentists define it as the long-run relative frequency across many identical experiments. Bayesians define it as a degree of belief, which can also be assigned to a single, unrepeatable event. Key rule: the probabilities of mutually exclusive, exhaustive outcomes must sum to exactly 1.

Both interpretations produce valid numbers but answer different questions — one about repeatable experiments, the other about unique events. For a medical diagnosis, for example, both perspectives have value.

Frequentist (Counting)

Probability = relative frequency over many repetitions. Example: P(Red) = 18/37 because 18 of 37 fields are red. Works in closed systems with repeatable experiments.

Bayesian (Believing)

Probability = degree of belief. Example: "20% chance of AGI by 2030" is based on prior knowledge and evidence, not counting. Works for one-off events that cannot be repeated.

European Roulette: 37 fields
(18 red, 18 black, 1 green)

P(Red)           = 18/37 ≈ 48.65%
P(Number 17)     = 1/37  ≈ 2.7%
P(Red OR Black)  = 36/37 ≈ 97.3%

All 37 individual probabilities
sum to exactly 1.

Each field has its own probability, and all 37 together sum to 1; that is the fundamental rule of any probability distribution. The green zero is the source of the house edge: a bet on red or black loses when it comes up, which shifts the odds slightly in the casino's favour.
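
The roulette numbers above can be checked with exact fractions. The following is a minimal sketch; the names `FIELD_COUNTS` and `p_colour` are chosen here for illustration and do not come from the original text.

```python
from fractions import Fraction

# European roulette: 18 red, 18 black, 1 green field (37 in total).
FIELD_COUNTS = {"red": 18, "black": 18, "green": 1}
TOTAL_FIELDS = sum(FIELD_COUNTS.values())

# Probability of each colour as an exact fraction.
p_colour = {colour: Fraction(n, TOTAL_FIELDS) for colour, n in FIELD_COUNTS.items()}

# Red and black are mutually exclusive, so their probabilities simply add.
p_red_or_black = p_colour["red"] + p_colour["black"]

# Each single number has probability 1/37; all 37 of them together sum to 1.
p_single_number = Fraction(1, TOTAL_FIELDS)
total_probability = p_single_number * TOTAL_FIELDS

print(f"P(Red)          = {p_colour['red']} ≈ {float(p_colour['red']):.2%}")
print(f"P(Number 17)    = {p_single_number} ≈ {float(p_single_number):.2%}")
print(f"P(Red OR Black) = {p_red_or_black} ≈ {float(p_red_or_black):.2%}")
print(f"Sum over all 37 fields = {total_probability}")
```

Using `Fraction` instead of floats keeps the sum exactly 1, so the rule can be tested with `==` rather than with a tolerance.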

"Five reds in a row — black must be next!" Wrong. Each spin is independent — the wheel has no memory. P(Black) remains 18/37 on every spin, regardless of what came before. This thinking error is called the Gambler's Fallacy and stems from treating independent events as if they were dependent.

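The independence claim can also be tested by simulation. The sketch below spins a simulated wheel many times and looks only at spins that follow at least five reds in a row; if the wheel had a memory, black would show up more often there. The function name, the number of spins, and the seed are illustrative assumptions.

```python
import random


def black_rate_after_red_streak(spins, streak=5, seed=0):
    """Share of black outcomes on spins that follow `streak` or more reds."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    colours = ["red"] * 18 + ["black"] * 18 + ["green"]
    red_run = 0      # length of the current run of consecutive reds
    follow_ups = 0   # spins that came directly after a long red streak
    blacks = 0       # how many of those follow-up spins were black
    for _ in range(spins):
        outcome = rng.choice(colours)
        if red_run >= streak:
            follow_ups += 1
            blacks += outcome == "black"
        red_run = red_run + 1 if outcome == "red" else 0
    return blacks / follow_ups if follow_ups else float("nan")


rate = black_rate_after_red_streak(spins=500_000)
print(f"Observed P(black after 5+ reds): {rate:.4f}")
print(f"Theoretical P(black):            {18 / 37:.4f}")
```

The two printed values should lie close together: the streak before a spin carries no information about that spin.
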
Interactive: Coin Flip Simulator

What happens when a coin isn't perfectly fair? The temperature slider controls the bias: at low temperature, the coin almost always lands on one side. At high temperature, the probabilities approach 50:50. Click Sample repeatedly and watch how the observed frequency converges to the theoretical probability as the sample size grows — the law of large numbers in action.

An LLM has generated the beginning "Das Wetter heute ___" ("The weather today ___") and computes probabilities for the next word. The most natural continuation is "ist" ("is") — but the temperature determines whether the model always picks the safe choice or dares more unusual continuations.

Temperature slider: 0.1 (focused) to 2.0 (creative), default 1.0. At the standard setting (T ≈ 1.0), the probabilities computed from the unscaled logits are used, balancing precision and variety.

Probability distribution at T = 1.0: Heads 55.0%, Tails 45.0%.
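
How a temperature can reshape a distribution is sketched below. This assumes the usual softmax-with-temperature rule (divide the log-probabilities by T, then renormalise); the simulator's exact formula is not given in the text, so treat `apply_temperature` as an illustration rather than a description of the widget. The 55/45 starting values are the ones shown for T = 1.0.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a distribution: softmax of log-probabilities divided by T."""
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def observed_heads_share(p_heads, flips, seed=0):
    """Flip a biased coin `flips` times and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(flips))
    return heads / flips


BASE = [0.55, 0.45]  # heads, tails at T = 1.0

# Low T sharpens the distribution, high T flattens it towards 50:50.
for t in (0.1, 1.0, 2.0):
    p_heads, p_tails = apply_temperature(BASE, t)
    print(f"T = {t}: heads {p_heads:.3f}, tails {p_tails:.3f}")

# Law of large numbers: the observed share approaches 0.55 as flips grow.
for flips in (10, 1_000, 100_000):
    print(f"{flips:>7} flips: observed heads share {observed_heads_share(0.55, flips):.3f}")
```

The same rescaling applies to an LLM's next-token distribution: only the number of outcomes changes, from two coin sides to the whole vocabulary.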

Interactive: Estimate π from pure randomness

Can you discover the circle constant π using nothing but chance? Astonishingly, yes. Throw random points uniformly into a square with a circle inscribed in it: the probability of landing inside the circle equals the ratio of the two areas, π/4, so 4 × hits/throws estimates π. The more points you throw, the closer the estimate tends to get to 3.14159. It is the same law of large numbers you saw with the coin flip above, this time turned into geometry.
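
The experiment fits in a few lines of Python. This is a minimal sketch assuming a square from -1 to 1 on both axes with a unit circle inside it; the function name `estimate_pi` and the seed are illustrative choices.

```python
import math
import random


def estimate_pi(throws, seed=0):
    """Estimate pi as 4 * hits / throws for points thrown into a square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(throws):
        # Uniform random point in the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The point is a hit if it lies inside the unit circle.
        if x * x + y * y <= 1.0:
            hits += 1
    return 4 * hits / throws


for throws in (100, 10_000, 200_000):
    estimate = estimate_pi(throws)
    print(f"{throws:>7} throws: pi ≈ {estimate:.4f} (error {abs(estimate - math.pi):.4f})")
```

The error shrinks only slowly, roughly with the square root of the number of throws, which is why the last digits take many more points than the first.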

The Map of All Possibilities

Analogy:

A fair die has six equal bars in its distribution diagram, because each face appears 1/6 of the time. A loaded die with P(6) = 0.5 and P(1) through P(5) = 0.1 each has one tall bar and five short ones. Both are valid distributions because no value is negative and all values sum to 1. "Loaded" does not mean "broken"; it simply reflects a different reality.

Definition:

A probability distribution assigns probabilities to every possible outcome of a random variable. For discrete variables (dice, coins), a Probability Mass Function (PMF) gives each outcome its own bar. For continuous variables (temperature, time), a Probability Density Function (PDF) describes the density — the probability of an exact value is 0, and only intervals have positive probability.
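
Written as formulas, the two cases look as follows. The symbols p(x) for the PMF and f(x) for the PDF are notation assumed here; the text above does not name them.

```latex
% Discrete case: the PMF gives each outcome its own probability.
p(x) = P(X = x) \ge 0, \qquad \sum_{x} p(x) = 1

% Continuous case: the PDF is a density; only intervals carry probability.
f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx, \qquad P(X = a) = 0
```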

Uniform Distribution: All outcomes equally likely. Example: fair die (1/6 each).

Normal Distribution: Bell-shaped, symmetric around the mean. Example: heights, measurement errors.

Poisson Distribution: Counts rare events in fixed time intervals. Example: customer calls per hour.

Binomial Distribution: Number of successes in n yes/no trials. Example: number of heads in 10 coin flips.

1,000 simulated rolls with a fair die: expect ~167 of each value (1-6). 1,000 rolls with a loaded die (P(6) = 0.5): expect ~500 sixes, ~100 of each other value. Both histograms sum to 1,000 rolls. The shape of the distribution — uniform or skewed — determines in ML which algorithms and preprocessing steps are appropriate.
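
The two histograms can be reproduced with `random.choices`, which draws outcomes according to given weights. The sketch below is illustrative; the constant names, the helper `roll_counts`, and the seeds are assumptions, and the counts will fluctuate around the expected values rather than match them exactly.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]
FAIR = [1 / 6] * 6           # uniform: every face equally likely
LOADED = [0.1] * 5 + [0.5]   # P(1)..P(5) = 0.1 each, P(6) = 0.5


def roll_counts(weights, rolls=1_000, seed=0):
    """Roll a die with the given face probabilities and count each face."""
    rng = random.Random(seed)
    return Counter(rng.choices(FACES, weights=weights, k=rolls))


fair_counts = roll_counts(FAIR, seed=1)
loaded_counts = roll_counts(LOADED, seed=2)

print("face  fair  loaded")
for face in FACES:
    print(f"{face:>4}  {fair_counts[face]:>4}  {loaded_counts[face]:>6}")
```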

Expected Value: The Average of the Future

The expected value E(X) is the probability-weighted average of all possible outcomes: each outcome is multiplied by its probability and the products are summed. This number does not predict what happens on the next trial; it is the value that the average of many trials settles around. This convergence of the observed average towards E(X) is called the Law of Large Numbers.

Fair die:
E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6)
E(X) = 21/6 = 3.5

Roulette, 1 EUR on Red:
Win: +1 EUR with P = 18/37
Lose: -1 EUR with P = 19/37
E(X) = 1·(18/37) + (-1)·(19/37) = -1/37 ≈ -0.027 EUR

→ Per euro bet, the casino earns 2.7 cents.
→ Over 1 million bets: ~27,000 EUR profit.

E(X) = 3.5 for the fair die is a number that never appears on a single roll, yet it closely predicts the long-run average over thousands of rolls. For roulette, E(X) ≈ -0.027 EUR per euro bet means any individual player can win in the short run, but the more bets are placed, the more certain the house's profit becomes.
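
Both calculations follow the same recipe, which can be sketched as a small function. The name `expected_value` and the representation of a distribution as (outcome, probability) pairs are choices made for this illustration.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of outcome * probability over (outcome, probability) pairs."""
    total = sum(p for _, p in distribution)
    if total != 1:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(outcome * p for outcome, p in distribution)


# Fair die: faces 1..6, each with probability 1/6.
die = [(face, Fraction(1, 6)) for face in range(1, 7)]

# 1 EUR on red: win 1 EUR with 18/37, lose 1 EUR with 19/37.
red_bet = [(1, Fraction(18, 37)), (-1, Fraction(19, 37))]

e_die = expected_value(die)
e_red = expected_value(red_bet)
house_profit = -e_red * 1_000_000  # expected casino profit over 1 million 1-EUR bets

print(f"E(die)     = {e_die} = {float(e_die)}")
print(f"E(red bet) = {e_red} ≈ {float(e_red):.4f} EUR")
print(f"Expected house profit over 1,000,000 bets ≈ {float(house_profit):,.0f} EUR")
```

The check on the probability sum catches a common mistake: forgetting an outcome, such as the green zero, silently distorts the result.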

In the game show "Deal or No Deal," three briefcases remain: 100 EUR, 1,000 EUR, and 500,000 EUR. The expected value is (100 + 1,000 + 500,000) / 3 ≈ 167,033 EUR. The banker offers 120,000 EUR. A purely rational, risk-neutral player rejects the offer because it is below the expected value. But most real humans accept, because a guaranteed 120,000 EUR feels more valuable than a 1-in-3 shot at 500,000 EUR. Daniel Bernoulli offered an explanation for this gap between expected value and human behaviour in 1738: each additional euro adds less subjective utility the wealthier you already are.
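
Bernoulli's idea can be made concrete with a logarithmic utility function, the form he proposed. The sketch below compares the gamble and the offer in euros and in utility; the starting wealth of 10,000 EUR is a hypothetical value picked for illustration, and the conclusion depends on it.

```python
import math

CASES = [100, 1_000, 500_000]   # remaining briefcases in EUR
OFFER = 120_000                 # banker's offer in EUR
WEALTH = 10_000                 # hypothetical wealth of the player before the game


def expected_value(amounts):
    """Plain average: every briefcase is equally likely."""
    return sum(amounts) / len(amounts)


def expected_log_utility(amounts, wealth):
    """Average of log(wealth + prize), i.e. expected utility under log utility."""
    return sum(math.log(wealth + amount) for amount in amounts) / len(amounts)


def certainty_equivalent(amounts, wealth):
    """Guaranteed payment that gives the same log utility as the gamble."""
    return math.exp(expected_log_utility(amounts, wealth)) - wealth


ev = expected_value(CASES)
ce = certainty_equivalent(CASES, WEALTH)

print(f"Expected value of playing on: {ev:,.0f} EUR")
print(f"Certainty equivalent:         {ce:,.0f} EUR")
print(f"Banker's offer:               {OFFER:,} EUR")
print("Risk-neutral player:", "reject" if ev > OFFER else "accept")
print("Log-utility player: ", "reject" if ce > OFFER else "accept")
```

Under these assumptions the certainty equivalent falls well below the offer, so a log-utility player accepts even though the expected value says reject.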

Three Steps of Understanding

1. Probability: Measure uncertainty as a number between 0 and 1
2. Distribution: Assemble all uncertainties into a complete map
3. Expected Value: Compress the map into a single decision-relevant number

From Single Event to Decision

Single Event: "What can happen?"

Distribution: "How likely is each outcome?"

Expected Value: "What is the average result?"

Decision: "What should I do?"

In reinforcement learning, agents model the expected cumulative reward of each action and pick the highest — the exploration/exploitation trade-off is fundamentally about uncertain expected values. Classifiers output probability distributions over classes; the predicted class is the one with the highest probability. Expected value and variance together quantify "how good on average" and "how much could it vary" — both are critical for real-world AI deployment. In Python, you simulate these distributions with the `random` module — for example `random.gauss()` for the normal distribution or `random.randint()` for discrete uniform.
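
As a minimal sketch of the two `random` functions named above, the following draws samples and summarises them with mean and variance, the "how good on average" and "how much could it vary" pair. The sample size and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same samples
SAMPLES = 50_000

# Discrete uniform: a fair die via random.randint (both ends inclusive).
die_rolls = [random.randint(1, 6) for _ in range(SAMPLES)]

# Normal distribution with mean 0 and standard deviation 1 via random.gauss.
noise = [random.gauss(0.0, 1.0) for _ in range(SAMPLES)]

die_mean = statistics.fmean(die_rolls)
die_var = statistics.pvariance(die_rolls)
noise_mean = statistics.fmean(noise)
noise_var = statistics.pvariance(noise)

# Theory: fair die has mean 3.5 and variance 35/12; the normal has 0 and 1.
print(f"die:    mean {die_mean:.3f} (theory 3.5),  variance {die_var:.3f} (theory {35 / 12:.3f})")
print(f"normal: mean {noise_mean:.3f} (theory 0.0),  variance {noise_var:.3f} (theory 1.0)")
```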

Probability is a number between 0 and 1 that makes uncertainty measurable. Frequentists count, Bayesians estimate — both approaches are valid.
A distribution is the complete map of all possibilities. Whether fair or "loaded" — every valid distribution sums to 1.
Expected value compresses the map into a single decision number. It does not predict the next outcome, but it shows where many trials cluster — and that is exactly how casinos, insurers, and AI agents make their decisions.

1. What does a probability of 0.6 mean for an event?

☐ A) The event is impossible
☐ B) In roughly 60 out of 100 comparable cases the event occurs — or the degree of belief is 60%
☐ C) The event occurs in exactly 6 out of 10 trials, without exception
☐ D) 60% of the data is incorrect

2. A game costs 2 EUR to play. With probability 1/4 you win 10 EUR back, otherwise nothing. What is the expected value per game?

☐ A) +10.00 EUR
☐ B) +0.50 EUR
☐ C) -2.00 EUR
☐ D) +2.50 EUR

3. In roulette, red comes up five times in a row. How does the probability for black change on the next spin?

☐ A) It increases because black is due
☐ B) It decreases because red is on a streak
☐ C) It stays at 18/37 — each spin is independent
☐ D) It is exactly 50%

4. A researcher says: "The probability that this new drug works is 70%." Which interpretation is he using?

☐ A) Frequentist — he has seen it work in 70% of cases
☐ B) Bayesian — he is expressing his degree of belief based on available evidence
☐ C) Neither — 70% is not a valid probability
☐ D) Both interpretations simultaneously

Answer Key: 1) B · 2) B · 3) C · 4) B

Checkpoint: Probability & Expected Value

I can explain what probability means and describe the difference between frequentist and Bayesian interpretation.
I can calculate the expected value of a simple scenario (dice, roulette) and explain why this number often cannot occur on a single trial (e.g. 3.5 on a die).
I can recognise the Gambler's Fallacy and explain why independent events have no memory.
