Dispersion measures explained: what the average deliberately conceals.
Fundamentals 13 min Beginner May 10, 2026
You know the average — but the average alone does not reveal how far the data actually spreads. Two data sets with the same mean can look completely different: one tightly clustered, the other wildly scattered.
Spread measures capture this hidden dimension. This article walks you through three increasingly precise tools: range (quick but crude), variance (precise but in squared units), and standard deviation (the gold standard that speaks the same language as your data).
Range — The Quick-and-Dirty Spread Measure
Analogy:
Berlin in July: between 8 °C and 38 °C. Technically true — a cold rain day hit 8 °C, a heat wave peaked at 38 °C. But most days sit between 20 °C and 28 °C. The range only shows the extremes and hides the typical cluster.
Limits:
With weather, you have prior knowledge to interpret the range. With an unfamiliar data set (a new sensor, a new market), you cannot. The analogy also does not model how data between the extremes is distributed.
Definition:
Range = maximum - minimum. It captures the total span of a data set in one subtraction. Strength: instant to compute, gives a first sense of scale. Weakness: it depends only on two extreme values and ignores all data between them. A single outlier can inflate the range without changing the typical behaviour of the data.
Example: Two Classes, Same Average
Class A: [40, 42, 45, 48, 50, 52, 55, 58, 60], range 20, mean 50. Class B: [10, 45, 48, 49, 50, 51, 52, 55, 90], range 80, mean 50. Judged by range alone, Class B looks four times more spread out, but seven of its nine students score between 45 and 55, a tighter cluster than Class A's 40 to 60. Only the two outliers (10 and 90) inflate the range.
Class A vs. Class B
Class B (outliers) Range 80 · Two extremes (10, 90) · Core also at 45-55
Class A (tight) Range 20 · All values between 40 and 60 · Evenly distributed
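The comparison can be checked in a few lines. This is a minimal sketch in plain Python; the helper name `value_range` and the idea of dropping the two extreme scores to look at the "core" are illustrative assumptions, not part of the original lesson.

```python
from statistics import mean

def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

class_a = [40, 42, 45, 48, 50, 52, 55, 58, 60]
class_b = [10, 45, 48, 49, 50, 51, 52, 55, 90]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    print(name, "mean:", mean(scores), "range:", value_range(scores))

# Drop the lowest and highest score of Class B (10 and 90) to see its core
core_b = sorted(class_b)[1:-1]
print("Class B core range:", value_range(core_b))
```

Both classes share the same mean, while the range of Class B shrinks from 80 to 10 once its two extremes are set aside.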
Misconception: Small Mean = Small Spread
Mean and spread are independent properties. [0, 0, 0, 100] and [24, 25, 25, 26] both have a mean of 25 but wildly different spreads. Only spread measures reveal the difference.
Variance — Average Squared Distance
Analogy:
Imagine shooting at a bulls-eye target. The mean is the centre of your shot cluster. Variance measures how far your shots scatter from the centre — but instead of the straight-line distance, you square each miss. A shot 3 cm off contributes 9 to the variance; a shot 1 cm off contributes only 1. A few wildly off shots dominate the score.
Limits:
In real target practice, you measure straight-line distance; variance uses squared distance, so the unit changes. Also, target practice is 2D; variance as computed here is 1D.
Definition:
Variance quantifies spread as the average of squared deviations from the mean. Four steps: (1) compute mean, (2) subtract mean from each value, (3) square each deviation, (4) average the squares. Squaring prevents positive and negative deviations from cancelling out (the sum of deviations is always 0). Downside: variance is in squared units (minutes squared if data is in minutes).
Variance in 4 Steps
1. Compute the mean (add all values, divide by count)
2. Compute deviations (each value minus the mean)
3. Square each deviation (negatives become positive, large ones get amplified)
4. Average the squares (divide by n for population, by n-1 for sample)
Dividing by n is correct when the data is the complete population. For a sample drawn from a larger population, divide by n-1 instead (Bessel's correction). The sample mean is computed from the same data, so the deviations from it come out systematically too small; dividing by n-1 compensates for this lost degree of freedom.
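The four steps translate almost line by line into code. The sketch below is one possible way to write them in plain Python; the function name `variance` and the `sample` flag are illustrative choices, and the data set is the one this article uses for its standard deviation calculation.

```python
def variance(values, sample=False):
    """Variance via the four steps; sample=True divides by n-1 (Bessel's correction)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("need at least 1 value (population) or 2 values (sample)")
    mean = sum(values) / n                          # step 1: mean
    deviations = [x - mean for x in values]         # step 2: deviations
    squared = [d ** 2 for d in deviations]          # step 3: squares
    return sum(squared) / (n - 1 if sample else n)  # step 4: average

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))               # population variance (divide by n)
print(variance(data, sample=True))  # sample variance (divide by n-1)
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, slightly larger.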
Deep Dive: Bessel's Correction
Why n-1 instead of n? When you compute the mean from your sample and then measure deviations from that mean, the squared deviations are systematically too small. The reason: the sample mean is the value that minimises the sum of squared distances to the sample points, so that sum is never larger (and usually smaller) than the sum of squared distances to the true population mean. Dividing it by n therefore underestimates the true population spread on average. Dividing by n-1 instead of n corrects this bias. The number n-1 is called the "degrees of freedom": once the mean and n-1 of the values are known, the last value is fixed and can no longer be chosen freely.
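The bias can be made visible with a small simulation. The following is a sketch under stated assumptions: samples of size 5 are drawn from a normal distribution with true variance 1, and the sample size, number of trials and seed are arbitrary illustrative choices.

```python
import random

def mean_of_variance_estimates(n=5, trials=20_000, seed=0):
    """Average the n-divided and (n-1)-divided variance estimates over many samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    corrected_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)            # sum of squared deviations
        biased_total += ss / n
        corrected_total += ss / (n - 1)
    return biased_total / trials, corrected_total / trials

biased, corrected = mean_of_variance_estimates()
print(f"divide by n:   {biased:.3f}")
print(f"divide by n-1: {corrected:.3f}")
```

With n = 5, the n-divided estimate should average out near (n-1)/n = 0.8 of the true variance, while the corrected estimate should land close to 1.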
Standard Deviation — Variance in Human Units
Analogy:
Waiting time at a service counter: average 10 minutes, standard deviation 1 minute — you can plan for 9 to 11 minutes. Average 10 minutes, standard deviation 6 minutes — some days nearly instant, others 16+ minutes. Same mean, totally different experience. Standard deviation translates the abstract variance into a concrete answer: how much does it vary?
Limits:
The 68-95-99.7 rule is exact only for normal distributions. For skewed data (income, web traffic), many values can fall outside 2 standard deviations on one side.
Definition:
Standard deviation = square root of variance. It restores the original unit (minutes, euros, degrees) and represents the "typical distance" of a data point from the mean. For approximately normal distributions, the 68-95-99.7 rule holds: ~68% of values within 1 standard deviation, ~95% within 2, ~99.7% within 3.
Calculation from the Variance Example
Data [2, 4, 4, 4, 5, 5, 7, 9] with mean 5. The squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the standard deviation is sqrt(4) = 2. The interval [5-2, 5+2] = [3, 7] contains 6 of 8 values (75%), roughly in line with the 68% expected for normally distributed data.
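The same calculation as a short script, in case you want to repeat it with your own numbers. This is a minimal sketch that uses the population formula (divide by n); the variable names are illustrative.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)  # population variance
std = math.sqrt(variance)

# Count how many values fall within one standard deviation of the mean
low, high = mean - std, mean + std
inside = [x for x in data if low <= x <= high]
share = len(inside) / len(data)

print(f"mean={mean}, variance={variance}, std={std}")
print(f"[{low}, {high}] contains {len(inside)} of {len(data)} values ({share:.0%})")
```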
The 68-95-99.7 Rule
1 Std Dev ~68% of all values lie within 1 standard deviation of the mean
2 Std Dev ~95% of all values lie within 2 standard deviations of the mean
3 Std Dev ~99.7% of all values lie within 3 standard deviations of the mean
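The three percentages are not arbitrary: they follow from the normal distribution itself. The sketch below derives them with `statistics.NormalDist` from the Python standard library (available since Python 3.8); the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} std: {share_within(k):.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%. The numbers only describe normally distributed data; for skewed data the shares can differ noticeably.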
Example: Temperature Comparison
Mean 20 °C, standard deviation 2 °C → about 68% of days between 18 °C and 22 °C (assuming roughly normally distributed temperatures). Same mean with standard deviation 8 °C → about 68% between 12 °C and 28 °C, a fundamentally different climate.
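To see how different the two climates are, you can ask what share of days lands in the same 18 °C to 22 °C band. The sketch below assumes daily temperatures are normally distributed, which is a simplification; `share_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_between(mean, std, low, high):
    """Share of values between low and high, assuming a normal distribution."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)

print(f"std 2: {share_between(20, 2, 18, 22):.0%} of days between 18 and 22 °C")
print(f"std 8: {share_between(20, 8, 18, 22):.0%} of days between 18 and 22 °C")
print(f"std 8: {share_between(20, 8, 12, 28):.0%} of days between 12 and 28 °C")
```

With a standard deviation of 8 °C, only about a fifth of the days fall into the band that covers about 68% of days when the standard deviation is 2 °C.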
Misconception: Variance and Standard Deviation Are the Same
Variance is in squared units; standard deviation is in original units. They measure the same concept (spread) on different scales. Reports and interpretations almost always use standard deviation because it is directly comparable to the data.
Deep Dive: Standardisation in ML
In machine learning pipelines, features are often z-normalised (standardised): z = (x - mean) / standard deviation. The result has mean 0 and standard deviation 1. Why? Distance-based algorithms like k-Nearest Neighbors and models trained with gradient descent are sensitive to feature scale: without normalisation, features with large numeric values dominate.
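```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data (assumed): rows are samples, columns are features
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Each feature (column) now has mean 0 and std 1 (population std, ddof=0)
```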
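To see what the scaler does to a single feature, here is the same z-normalisation written by hand. This is a sketch using only the standard library; the function name `z_scores` and the example heights are hypothetical. It divides by the population standard deviation (divide by n), which matches how scikit-learn documents `StandardScaler`.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise one feature: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (divide by n)
    if s == 0:
        raise ValueError("standard deviation is zero; the feature is constant")
    return [(x - m) / s for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical feature values
z = z_scores(heights_cm)
print([round(v, 2) for v in z])
print("mean:", round(mean(z), 6), "std:", round(pstdev(z), 6))
```

The guard against a zero standard deviation matters in practice: a constant feature carries no spread to normalise.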
Range vs. Standard Deviation
Range
Maximum minus minimum. Instant to compute. Shows only the extremes. A single outlier can distort everything. Good as a first overview.
Standard Deviation
Square root of average squared deviation. In original units. Considers all data points. The standard in statistics and ML.
Worked Example: Variance & Standard Deviation
Example data set (sorted): 3, 4, 5, 6, 8, 8, 9
Count: 7
Mean: 6.14
Variance: 4.41 (population, dividing by n)
Standard Deviation: 2.10
Range: 6
Adding a single outlier to a data set like this makes the variance explode.
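You can reproduce these figures for the data set 3, 4, 5, 6, 8, 8, 9 and experiment with your own values using the sketch below. It relies on the standard library's `statistics` module and the population formulas; `spread_summary` is an illustrative helper, and the outlier value 40 is a made-up addition used only to show the effect.

```python
from statistics import mean, pvariance, pstdev

def spread_summary(values):
    """Mean, population variance, standard deviation, count and range."""
    return {
        "mean": mean(values),
        "variance": pvariance(values),
        "std": pstdev(values),
        "count": len(values),
        "range": max(values) - min(values),
    }

data = [3, 4, 5, 6, 8, 8, 9]
with_outlier = data + [40]  # hypothetical outlier, not part of the example data

for label, values in (("original", data), ("with outlier", with_outlier)):
    summary = spread_summary(values)
    print(label, {k: round(v, 2) for k, v in summary.items()})
```

Because deviations are squared, the one added value raises the variance by far more than it raises the mean.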
Key Takeaways
Range (max - min) is quick but misleading — a single outlier can inflate it while 99% of data stays clustered.
Variance averages the squared deviations from the mean — squaring ensures large deviations count more and prevents positive/negative cancellation.
Standard deviation = sqrt(variance), restoring the original unit — combine it with the mean to say: most values lie within 1 standard deviation (68% for normal distributions).
Quiz: Measures of Spread
1. Why is the range misleading when outliers are present?
☐ A) The range is always zero
☐ B) A single extreme value can inflate the range even though 99% of data stays clustered
☐ C) The range uses squared units
☐ D) The range ignores the maximum
2. Temperature data: mean 20 °C, standard deviation 3 °C. Within which range do approximately 68% of values fall (assuming normal distribution)?
☐ A) 14 °C to 26 °C
☐ B) 17 °C to 23 °C
☐ C) 20 °C to 23 °C
☐ D) 11 °C to 29 °C
3. You compute the variance of a data set: 16. What is the standard deviation?
☐ A) 16
☐ B) 4
☐ C) 256
☐ D) 8
4. Two manufacturing lines produce screws. Line A: mean 50 mm, std 0.1 mm. Line B: mean 50 mm, std 2 mm. Which line likely produces more defective screws?
☐ A) Line A — smaller deviation means more defects
☐ B) Line B — larger standard deviation means more screws fall outside the tolerance range
☐ C) Both produce the same number of defects
☐ D) This cannot be determined
Answer Key: 1) B · 2) B · 3) B · 4) B
Learning Goals
I can use a data example to explain why the mean alone is not enough to judge a sensor's reliability.
I can compute the variance of a small data set step by step by hand.
I can apply the 68-95-99.7 rule to predict what percentage of manufactured parts fall within a given tolerance.