Spread of Data

Dispersion measures explained: what the average deliberately conceals.

Fundamentals 13 min Beginner May 10, 2026

You know the average — but the average alone does not reveal how far the data actually spreads. Two data sets with the same mean can look completely different: one tightly clustered, the other wildly scattered.

Spread measures capture this hidden dimension. This article walks you through three increasingly precise tools: range (quick but crude), variance (precise but in squared units), and standard deviation (the gold standard that speaks the same language as your data).

Range — The Quick-and-Dirty Spread Measure

Range

Analogy: Berlin in July ranges from 8 °C to 38 °C. Technically true: a cold, rainy day dropped to 8 °C and a heat wave peaked at 38 °C. But most days sit between 20 °C and 28 °C. The range shows only the extremes and hides the typical cluster.

Where the analogy breaks down: with weather, you have prior knowledge to interpret the range. With an unfamiliar data set (a new sensor, a new market), you do not. The analogy also says nothing about how the data between the extremes is distributed.

Example: Two Classes, Same Average

Class A: [40, 42, 45, 48, 50, 52, 55, 58, 60], range 20, mean 50. Class B: [10, 45, 48, 49, 50, 51, 52, 55, 90], range 80, mean 50. By range, Class B looks four times as spread out, yet seven of its nine students score between 45 and 55, a tighter core than Class A's. Only two outliers (10 and 90) inflate the range.
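A minimal Python sketch to check these numbers. The helper `describe` and the 45 to 55 "core" band are illustrative choices made here, not part of any library.

```python
from statistics import fmean

class_a = [40, 42, 45, 48, 50, 52, 55, 58, 60]
class_b = [10, 45, 48, 49, 50, 51, 52, 55, 90]

def describe(scores):
    """Hypothetical helper: mean, range, and how many scores fall in the 45-55 band."""
    core = sum(1 for s in scores if 45 <= s <= 55)
    return fmean(scores), max(scores) - min(scores), core

for name, scores in [("A", class_a), ("B", class_b)]:
    mean, spread, core = describe(scores)
    print(f"Class {name}: mean={mean:.1f}, range={spread}, in 45-55: {core}/{len(scores)}")
# Class A: mean=50.0, range=20, in 45-55: 5/9
# Class B: mean=50.0, range=80, in 45-55: 7/9
```

Same mean, four times the range, yet Class B has more students in the central band.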

Class A vs. Class B

  • Class A (tight): range 20, all values between 40 and 60, evenly distributed.
  • Class B (outliers): range 80, two extremes (10 and 90), core at 45-55.

Misconception: Same Mean = Same Spread

Mean and spread are independent properties. [0, 0, 0, 100] and [24, 25, 25, 26] both have a mean of 25 but wildly different spreads. Only spread measures reveal the difference.
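The same point in a few lines of Python, previewing the standard deviation covered later in this article (here the population version, `statistics.pstdev`):

```python
from statistics import fmean, pstdev

spread_out = [0, 0, 0, 100]
clustered = [24, 25, 25, 26]

for data in (spread_out, clustered):
    # Identical mean, very different population standard deviation
    print(data, "mean:", fmean(data), "std dev:", round(pstdev(data), 2))
# [0, 0, 0, 100] mean: 25.0 std dev: 43.3
# [24, 25, 25, 26] mean: 25.0 std dev: 0.71
```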

Variance — Average Squared Distance

Variance

Analogy: Imagine shooting at a bullseye target. The mean is the centre of your shot cluster. Variance measures how far your shots scatter from that centre, but instead of using the straight-line distance, you square each miss. A shot 3 cm off contributes 9 to the variance; a shot 1 cm off contributes only 1. A few wild misses dominate the score.

Where the analogy breaks down: in real target practice, you measure straight-line distance; variance uses squared distance, so the unit changes (cm becomes cm²). Also, target practice is two-dimensional, while variance as computed here is one-dimensional.

Variance in 4 Steps

  1. Compute the mean (add all values, divide by the count).
  2. Compute the deviations (each value minus the mean).
  3. Square each deviation (negatives become positive, large deviations get amplified).
  4. Average the squares (divide by n for a population, by n-1 for a sample).

Worked Example

Data: [2, 4, 4, 4, 5, 5, 7, 9]   Mean = 40/8 = 5

Deviations:  -3, -1, -1, -1,  0,  0, +2, +4   (Sum = 0)
Squared:      9,  1,  1,  1,  0,  0,  4, 16   (Sum = 32)

Population variance: 32/8 = 4
Sample variance:     32/7 = 4.57
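The four steps translate almost line for line into plain Python. This is a minimal sketch; the function name `variance_by_hand` is hypothetical, and in practice you would reach for the `statistics` functions below.

```python
def variance_by_hand(values, sample=False):
    """Variance via the four steps; sample=True divides by n-1 (Bessel's correction)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n                   # Step 1: the mean
    deviations = [x - mean for x in values]  # Step 2: deviations (they sum to 0)
    squares = [d * d for d in deviations]    # Step 3: square each deviation
    divisor = n - 1 if sample else n         # Step 4: n for a population, n-1 for a sample
    return sum(squares) / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_by_hand(data))               # 4.0
print(variance_by_hand(data, sample=True))  # 4.571428571428571
```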

Python: statistics Module

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

statistics.pvariance(data)  # → 4      (population, divides by n; int in, exact int out)
statistics.variance(data)   # → 4.571… (sample, divides by n-1)

Misconception: Always Divide by n

For a complete population: yes. For a sample drawn from a larger population: divide by n-1 (Bessel's correction). The sample mean is computed from the data itself, making deviations artificially small — n-1 compensates for this lost degree of freedom.

Why n-1 instead of n? When you compute the mean from your sample and then measure deviations from that mean, the deviations are systematically too small. The reason: the sample mean is, by construction, the value that minimises the sum of squared deviations within that sample, so it sits closer to the sample's values than the true population mean does. Dividing by n therefore underestimates the population spread; dividing by n-1 corrects this bias. The number n-1 is called the "degrees of freedom": if you know the mean of n values and any n-1 of them, the last value is fixed rather than free to vary.
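To see the bias rather than take it on faith, here is a small Monte Carlo sketch. The settings are arbitrary assumptions for illustration: samples of size 5, 20,000 trials, and a standard normal population whose true variance is 1. Dividing by n should average out near (n-1)/n = 0.8, while dividing by n-1 should land near 1.

```python
import random

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

def simulate(n=5, trials=20_000, seed=42):
    """Average both variance estimates over many samples from N(0, 1)."""
    rng = random.Random(seed)
    total_n = total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance = 1
        ss = sum_sq_dev(sample)
        total_n += ss / n                # divide by n
        total_n_minus_1 += ss / (n - 1)  # divide by n-1 (Bessel's correction)
    return total_n / trials, total_n_minus_1 / trials

biased, unbiased = simulate()
print(f"divide by n:   {biased:.3f}")    # close to (n-1)/n = 0.8
print(f"divide by n-1: {unbiased:.3f}")  # close to the true value 1.0
```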

Standard Deviation — Variance in Human Units

Standard Deviation

Analogy: Waiting time at a service counter. Average 10 minutes, standard deviation 1 minute: you can plan for 9 to 11 minutes. Average 10 minutes, standard deviation 6 minutes: some days nearly instant, others 16 minutes or more. Same mean, totally different experience. Standard deviation translates the abstract variance into a concrete answer to the question: how much does it vary?

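Where the analogy breaks down: the 68-95-99.7 rule holds only for normal distributions. For skewed data (income, web traffic), the values beyond 2 standard deviations tend to fall almost entirely on the long-tail side, and the shares within each band can differ noticeably from 68/95/99.7.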

Calculation from the Variance Example

Variance = 4, so standard deviation = sqrt(4) = 2. Data [2, 4, 4, 4, 5, 5, 7, 9] with mean 5. The interval [5-2, 5+2] = [3, 7] contains 6 of 8 values (75%) — close to the expected 68% for normally distributed data.
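The same calculation with Python's `statistics` module, plus a count of the values inside the interval. `pstdev` is the population standard deviation (square root of `pvariance`); `stdev` is the sample version with n-1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.fmean(data)  # 5.0
sd = statistics.pstdev(data)   # 2.0, the square root of the population variance 4

# Values inside [mean - sd, mean + sd] = [3, 7]
inside = [x for x in data if mean - sd <= x <= mean + sd]
print(len(inside), "of", len(data))      # 6 of 8
print(f"{len(inside) / len(data):.0%}")  # 75%

print(round(statistics.stdev(data), 2))  # 2.14, the sample version (n-1)
```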

The 68-95-99.7 Rule

  • 1 standard deviation: ~68% of all values lie within 1 standard deviation of the mean
  • 2 standard deviations: ~95% of all values lie within 2 standard deviations of the mean
  • 3 standard deviations: ~99.7% of all values lie within 3 standard deviations of the mean
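One way to see that the rule depends on the distribution's shape is to simulate data and count the share of values within k standard deviations. This sketch assumes a standard normal distribution for the "normal" case and an exponential distribution as one example of right-skewed data; both choices, the sample size, and the seed are arbitrary.

```python
import random
import statistics

def within_shares(values, ks=(1, 2, 3)):
    """Fraction of values within k population standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return [sum(1 for x in values if abs(x - mean) <= k * sd) / n for k in ks]

rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
skewed = [rng.expovariate(1.0) for _ in range(50_000)]  # exponential: right-skewed

print("normal:", [round(s, 3) for s in within_shares(normal)])  # roughly 0.68, 0.95, 0.997
print("skewed:", [round(s, 3) for s in within_shares(skewed)])  # roughly 0.86, 0.95, 0.98
```

For the exponential data, about 86% of values fall within 1 standard deviation instead of 68%, and everything outside the 2-standard-deviation band lies above the mean.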

Example: Temperature Comparison

Mean 20 °C, standard deviation 2 °C: if temperatures are roughly normally distributed, about 68% of days fall between 18 °C and 22 °C. Same mean with standard deviation 8 °C: about 68% fall between 12 °C and 28 °C, a fundamentally different climate.

Misconception: Variance and Standard Deviation Are the Same

Variance is in squared units; standard deviation is in original units. They measure the same concept (spread) on different scales. Reports and interpretations almost always use standard deviation because it is directly comparable to the data.

In machine learning pipelines, features are often z-normalised (standardised): z = (x - mean) / standard deviation. Each resulting feature has mean 0 and standard deviation 1. Why? Distance-based algorithms such as k-nearest neighbours compare features on their raw numeric scales, so without standardisation, features with large values dominate the distance. Gradient descent also tends to converge faster when features share a similar scale.

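import numpy as np
from sklearn.preprocessing import StandardScaler

# Example feature matrix: rows are samples, columns are features on very different scales
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # z = (x - mean) / std, computed per column
# Each feature now has mean 0, std 1 (StandardScaler uses the population std, ddof=0)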
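What the scaler does to each column can be written out by hand. A minimal sketch with a hypothetical helper `z_scores`, using the population standard deviation, which matches the biased estimator (ddof=0) that `StandardScaler` documents:

```python
import statistics

def z_scores(values):
    """Standardise: z = (x - mean) / std, with the population std (divide by n)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, std 2
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.fmean(z), statistics.pstdev(z))  # 0.0 1.0
```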

Range vs. Standard Deviation

Range

Maximum minus minimum. Instant to compute. Shows only the extremes, so a single outlier can inflate it dramatically. Good as a first overview.

Standard Deviation

Square root of the average squared deviation. In original units. Uses every data point. The standard in statistics and ML.

Interactive: Compute Variance & Standard Deviation

You have learned about range, variance, and standard deviation. Enter your own data points and watch live how the spread measures change. Compare the tight dataset with the wide one — and see how a single outlier makes the variance explode.

Example dataset (sorted): 3, 4, 5, 6, 8, 8, 9. Mean 6.14, variance 4.41 (population), standard deviation 2.10, count 7, range 6.
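To experiment with your own data outside the page, the same measures can be computed with a small helper. `spread_summary` is a hypothetical name; it uses population formulas (divide by n), which reproduce the values above. Adding one outlier (30, chosen arbitrarily) shows how sharply the variance reacts.

```python
import statistics

def spread_summary(values):
    """Mean, population variance and std dev, count, and range of a data set."""
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # population: divide by n
        "std_dev": statistics.pstdev(values),
        "count": len(values),
        "range": max(values) - min(values),
    }

base = [3, 4, 5, 6, 8, 8, 9]
with_outlier = base + [30]  # one hypothetical outlier

for data in (base, with_outlier):
    s = spread_summary(data)
    print(f"mean {s['mean']:.2f}  variance {s['variance']:.2f}  "
          f"std dev {s['std_dev']:.2f}  range {s['range']}")
# mean 6.14  variance 4.41  std dev 2.10  range 6
# mean 9.12  variance 66.11  std dev 8.13  range 27
```

A single extra value multiplies the variance by about 15, because its large deviation is squared.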

Key Takeaways

  • Range (max - min) is quick but misleading: a single outlier can inflate it even when nearly all other values stay tightly clustered.
  • Variance averages the squared deviations from the mean; squaring prevents positive and negative deviations from cancelling and makes large deviations count more.
  • Standard deviation = sqrt(variance), which restores the original unit. Combine it with the mean to say that most values lie within 1 standard deviation (about 68% for normal distributions).

Quiz: Measures of Spread


Why is the range misleading when outliers are present?

Answer Key: 1) B · 2) B · 3) B · 4) B

Learning Goals

  • I can use a data example to explain why the mean alone is not enough to judge a sensor's reliability.
  • I can compute the variance of a small data set step by step by hand.
  • I can apply the 68-95-99.7 rule to predict what percentage of manufactured parts fall within a given tolerance.