Spread of Data

You know the average — but the average alone does not reveal how far the data actually spreads. Two data sets with the same mean can look completely different: one tightly clustered, the other wildly scattered.

Spread measures capture this hidden dimension. This article walks you through three increasingly precise tools: range (quick but crude), variance (precise but in squared units), and standard deviation (the gold standard that speaks the same language as your data).

Range — The Quick-and-Dirty Spread Measure

Analogy:

Berlin in July: between 8 °C and 38 °C. Technically true: a cold rain day hit 8 °C, a heat wave peaked at 38 °C. But most days sit between 20 °C and 28 °C. The range only shows the extremes and hides the typical cluster.

Limits of the analogy:

With weather, you have prior knowledge to interpret the range. With an unfamiliar data set (a new sensor, a new market), you cannot. The analogy also does not model how data between the extremes is distributed.

Definition:

Range = maximum - minimum. It captures the total span of a data set in one subtraction. Strength: instant to compute, gives a first sense of scale. Weakness: it depends only on two extreme values and ignores all data between them. A single outlier can inflate the range without changing the typical behaviour of the data.

Example: Two Classes, Same Average

Class A: [40, 42, 45, 48, 50, 52, 55, 58, 60], range 20, mean 50.
Class B: [10, 45, 48, 49, 50, 51, 52, 55, 90], range 80, mean 50.

By range, Class B looks four times more spread out, but most of its students score in the same band as Class A. Only the two outliers (10 and 90) inflate the range.

Class A vs. Class B

Class A (tight): range 20 · all values between 40 and 60 · evenly distributed

Class B (outliers): range 80 · two extremes (10, 90) · core also at 45-55
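As a quick check of the comparison, the following Python sketch recomputes range and mean for both classes. The helper name `value_range` is an illustrative choice made here, not something from a library.

```python
from statistics import mean

def value_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    return max(values) - min(values)

class_a = [40, 42, 45, 48, 50, 52, 55, 58, 60]
class_b = [10, 45, 48, 49, 50, 51, 52, 55, 90]

for name, scores in (("A", class_a), ("B", class_b)):
    print(f"Class {name}: range = {value_range(scores)}, mean = {mean(scores)}")

# Dropping the two extremes of Class B shows how tight its core is.
core_b = sorted(class_b)[1:-1]
print(f"Class B without 10 and 90: range = {value_range(core_b)}")
```

Removing just two values shrinks the range of Class B from 80 to 10, which is why the range alone says little about the typical student.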

Misconception: Small Mean = Small Spread

Mean and spread are independent properties. [0, 0, 0, 100] and [24, 25, 25, 26] both have a mean of 25 but wildly different spreads. Only spread measures reveal the difference.
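A minimal sketch of this point, using only the range as the spread measure since it is the one tool introduced so far. The variable names are illustrative.

```python
from statistics import mean

lopsided = [0, 0, 0, 100]
tight = [24, 25, 25, 26]

for label, data in (("lopsided", lopsided), ("tight", tight)):
    spread = max(data) - min(data)  # range = maximum - minimum
    print(f"{label}: mean = {mean(data)}, range = {spread}")
```

Both lists report the same mean, while the range differs by a factor of 50.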

Variance — Average Squared Distance

Analogy:

Imagine shooting at a bulls-eye target. The mean is the centre of your shot cluster. Variance measures how far your shots scatter from that centre, but instead of using the straight-line distance, you square each miss. A shot 3 cm off contributes 9 to the sum of squares; a shot 1 cm off contributes only 1. A few wildly off shots dominate the score.

Limits of the analogy:

In real target practice, you measure straight-line distance; variance uses squared distance, so the unit changes. Also, target practice is 2D; variance as computed here is 1D.

Definition:

Variance quantifies spread as the average of squared deviations from the mean. Four steps: (1) compute mean, (2) subtract mean from each value, (3) square each deviation, (4) average the squares. Squaring prevents positive and negative deviations from cancelling out (the sum of deviations is always 0). Downside: variance is in squared units (minutes squared if data is in minutes).

Variance in 4 Steps

1. Compute the mean (add all values, divide by count)
2. Compute deviations (each value minus the mean)
3. Square each deviation (negatives become positive, large ones get amplified)
4. Average the squares (divide by n for population, by n-1 for sample)

Worked Example

Data: [2, 4, 4, 4, 5, 5, 7, 9]   Mean = 40/8 = 5

Deviations:  -3, -1, -1, -1,  0,  0, +2, +4   (Sum = 0)
Squared:      9,  1,  1,  1,  0,  0,  4, 16   (Sum = 32)

Population variance: 32/8 = 4
Sample variance:     32/7 = 4.57
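The four steps can be sketched directly in Python. The function name `variance_by_hand` and its `sample` flag are illustrative choices for this sketch, not part of any library.

```python
def variance_by_hand(values, sample=False):
    """Variance via the four steps; sample=True divides by n-1."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")
    mean = sum(values) / n                    # step 1: mean
    deviations = [x - mean for x in values]   # step 2: deviations
    squared = [d ** 2 for d in deviations]    # step 3: squares
    divisor = n - 1 if sample else n          # step 4: average
    return sum(squared) / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_by_hand(data))               # population variance, 32/8
print(variance_by_hand(data, sample=True))  # sample variance, 32/7
```

The intermediate lists are kept on purpose so that each step can be printed and compared with the hand calculation.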

Python: statistics Module

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance divides by n; sample variance divides by n-1.
print(statistics.pvariance(data))  # population: 32/8 = 4
print(statistics.variance(data))   # sample: 32/7 ≈ 4.571

Misconception: Always Divide by n

Dividing by n is correct only when the data is the complete population. For a sample drawn from a larger population, divide by n-1 (Bessel's correction). The sample mean is computed from the data itself, which makes the deviations artificially small; n-1 compensates for this lost degree of freedom.

Deep Dive: Bessel's Correction

Why n-1 instead of n? When you compute the mean from your sample and then measure deviations from that mean, the deviations are systematically too small. The reason: the sample mean is the point that minimises the sum of squared deviations within the sample, so deviations measured from it are, in total, never larger than deviations measured from the true population mean. Averaging over n therefore underestimates the population variance on average, and dividing by n-1 corrects this bias. The number n-1 is called the "degrees of freedom": once you know the mean and n-1 of the values, the last value is no longer free to vary.
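The bias can be checked without random numbers. The sketch below treats the worked-example data as a complete population and enumerates every ordered sample of size 2 drawn with replacement. The sample size and the with-replacement setup are assumptions chosen to keep the enumeration small.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 4, 4, 4, 5, 5, 7, 9]
true_variance = pvariance(population)  # spread of the full population

# Every ordered sample of size 2, drawn with replacement (8 * 8 = 64 samples).
samples = list(product(population, repeat=2))

# Average the two estimators over all possible samples.
avg_divide_by_n = mean(pvariance(s) for s in samples)
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)

print(f"true population variance:   {true_variance:.2f}")
print(f"average when dividing by n:   {avg_divide_by_n:.2f}")
print(f"average when dividing by n-1: {avg_divide_by_n_minus_1:.2f}")
```

Averaged over all samples, dividing by n-1 recovers the population variance, while dividing by n falls short by the factor (n-1)/n, which is one half for samples of size 2.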

Standard Deviation — Variance in Human Units

Analogy:

Waiting time at a service counter: average 10 minutes, standard deviation 1 minute, and you can plan for roughly 9 to 11 minutes. Average 10 minutes, standard deviation 6 minutes, and some days are nearly instant while others take 16+ minutes. Same mean, totally different experience. Standard deviation translates the abstract variance into a concrete answer: how much does it vary?

Limits of the analogy:

The 68-95-99.7 rule is exact only for normal distributions. For skewed data (income, web traffic), many values can fall outside 2 standard deviations on one side.

Definition:

Standard deviation = square root of variance. It restores the original unit (minutes, euros, degrees) and represents the "typical distance" of a data point from the mean. For approximately normal distributions, the 68-95-99.7 rule holds: ~68% of values within 1 standard deviation, ~95% within 2, ~99.7% within 3.

Calculation from the Variance Example

Variance = 4, so standard deviation = sqrt(4) = 2. Data [2, 4, 4, 4, 5, 5, 7, 9] with mean 5. The interval [5-2, 5+2] = [3, 7] contains 6 of 8 values (75%) — close to the expected 68% for normally distributed data.
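The same calculation as a short Python sketch, using the population standard deviation from the `statistics` module. The variable names are illustrative.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
center = mean(data)
spread = pstdev(data)  # population standard deviation = sqrt(population variance)

# Count how many values lie within one standard deviation of the mean.
low, high = center - spread, center + spread
inside = [x for x in data if low <= x <= high]
share = len(inside) / len(data)

print(f"std = {spread}, interval = [{low}, {high}]")
print(f"{len(inside)} of {len(data)} values inside ({share:.0%})")
```

With only eight values the share can only move in steps of 12.5 percentage points, so an exact match with 68% is not to be expected.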

The 68-95-99.7 Rule

Within 1 Std Dev: ~68% of values

Within 2 Std Dev: ~95% of values

Within 3 Std Dev: ~99.7% of values
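The three percentages follow from the normal distribution itself. A minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between -k and +k standard deviations.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} std dev: {share:.1%}")
```

Because any normal distribution is a shifted and scaled copy of the standard normal, the same shares apply for every mean and standard deviation, but only when the data is actually close to normal.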

Example: Temperature Comparison

Mean 20 °C, standard deviation 2 °C → about 68% of days fall between 18 °C and 22 °C, assuming roughly normal temperatures. Same mean with standard deviation 8 °C → about 68% between 12 °C and 28 °C: a fundamentally different climate.

Misconception: Variance and Standard Deviation Are the Same

Variance is in squared units; standard deviation is in original units. They measure the same concept (spread) on different scales. Reports and interpretations almost always use standard deviation because it is directly comparable to the data.

Deep Dive: Standardisation in ML

In machine learning pipelines, features are often z-normalised: z = (x - mean) / standard deviation. The transformed feature has mean 0 and standard deviation 1. Why? Distance-based algorithms such as k-Nearest Neighbors and optimisers such as gradient descent are sensitive to feature scale: without normalisation, features with large numeric values dominate.

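import numpy as np
from sklearn.preprocessing import StandardScaler

# X is an illustrative feature matrix (rows = samples, columns = features).
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Each column of X_scaled now has mean 0 and standard deviation 1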
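To see what the scaler does to a single feature, here is a hand-rolled sketch of the z-score formula using only the standard library. The function name `z_scores` is illustrative. It divides by the population standard deviation, which is also the form scikit-learn's `StandardScaler` uses.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / std for each value, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
print(z)
print(mean(z), pstdev(z))  # mean and std of the transformed values
```

Each z-score reads as "how many standard deviations above or below the mean": the value 9 sits two standard deviations above the mean of 5.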

Range vs. Standard Deviation

Range

Maximum minus minimum. Instant to compute. Shows only the extremes. A single outlier can distort everything. Good as a first overview.

Standard Deviation

Square root of average squared deviation. In original units. Considers all data points. The standard in statistics and ML.

Interactive: Compute Variance & Standard Deviation

You have learned about range, variance, and standard deviation. Enter your own data points and watch live how the spread measures change. Compare the tight dataset with the wide one — and see how a single outlier makes the variance explode.

Example data (sorted): 3, 4, 5, 6, 8, 8, 9

Count: 7
Mean: 6.14
Range: 6
Variance: 4.41 (dividing by n)
Standard Deviation: 2.10
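The same experiment can be run offline. The sketch below parses a comma-separated string and reports the measures shown for the example data. The function name `spread_summary` and the added outlier value 60 are assumptions made for illustration, and the population variance is used because it reproduces the 4.41 shown for this data.

```python
from statistics import mean, pstdev, pvariance

def spread_summary(text):
    """Parse comma-separated numbers and return basic spread measures."""
    values = sorted(float(part) for part in text.split(",") if part.strip())
    if not values:
        raise ValueError("need at least one number")
    return {
        "count": len(values),
        "mean": mean(values),
        "range": values[-1] - values[0],
        "variance": pvariance(values),  # divides by n
        "std": pstdev(values),
        "sorted": values,
    }

base = spread_summary("3, 4, 5, 6, 8, 8, 9")
with_outlier = spread_summary("3, 4, 5, 6, 8, 8, 9, 60")  # one extreme value added

for label, summary in (("base", base), ("with outlier", with_outlier)):
    print(f"{label}: mean={summary['mean']:.2f} variance={summary['variance']:.2f} "
          f"std={summary['std']:.2f} range={summary['range']:g}")
```

Comparing the two printed lines shows how a single extreme value drives the variance up far more than it moves the mean, because its deviation is squared.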

Key Takeaways

Range (max - min) is quick but can mislead: a single outlier can inflate it while 99% of the data stays clustered.
Variance averages the squared deviations from the mean: squaring makes large deviations count more and prevents positive and negative deviations from cancelling.
Standard deviation = sqrt(variance), which restores the original unit. Combine it with the mean to say: most values lie within 1 standard deviation (about 68% for normal distributions).

1. Why is the range misleading when outliers are present?

☐ A) The range is always zero
☐ B) A single extreme value can inflate the range even though 99% of data stays clustered
☐ C) The range uses squared units
☐ D) The range ignores the maximum

2. Temperature data: mean 20 °C, standard deviation 3 °C. Within which range do approximately 68% of values fall (assuming normal distribution)?

☐ A) 14 °C to 26 °C
☐ B) 17 °C to 23 °C
☐ C) 20 °C to 23 °C
☐ D) 11 °C to 29 °C

3. You compute the variance of a data set: 16. What is the standard deviation?

☐ A) 16
☐ B) 4
☐ C) 256
☐ D) 8

4. Two manufacturing lines produce screws. Line A: mean 50 mm, std 0.1 mm. Line B: mean 50 mm, std 2 mm. Which line likely produces more defective screws?

☐ A) Line A — smaller deviation means more defects
☐ B) Line B — larger standard deviation means more screws fall outside the tolerance range
☐ C) Both produce the same number of defects
☐ D) This cannot be determined

Answer Key: 1) B · 2) B · 3) B · 4) B

Learning Goals

I can use a data example to explain why the mean alone is not enough to judge a sensor's reliability.
I can compute the variance of a small data set step by step by hand.
I can apply the 68-95-99.7 rule to predict what percentage of manufactured parts fall within a given tolerance.
