"Average salary: $80,000" — yet almost everyone you know earns far less. That's not a lie; it's what happens when a few high earners pull the mean upward. Descriptive statistics are the tools that let you see through this kind of numerical illusion: a handful of key metrics to capture the full picture of any dataset.
1. Central Tendency: Where Is the "Center" of the Data?
Central tendency measures the representative value of a dataset. The three most common measures are:
Mean
The sum of all values divided by the count. Intuitive to calculate, but easily skewed by outliers.
Example: 9 employees earning $30,000 and one CEO earning $600,000 gives a mean of $87,000 — a number that represents nobody in the room.
Median
The middle value when data is sorted from smallest to largest (average of the two middle values if the count is even). Highly resistant to outliers — making it the preferred metric for salaries, housing prices, and income distributions.
Same example: the median is $30,000, which accurately reflects most employees' reality.
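To check the arithmetic in this example, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

# Nine employees at $30,000 plus one CEO at $600,000 (the example above).
salaries = [30_000] * 9 + [600_000]

mean_salary = mean(salaries)      # sum of all values / count
median_salary = median(salaries)  # even count: average of the two middle values

print(f"mean:   ${mean_salary:,.0f}")    # mean:   $87,000
print(f"median: ${median_salary:,.0f}")  # median: $30,000
```

One value out of ten is enough to move the mean by $57,000, while the median does not move at all.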
Mode
The most frequently occurring value. A dataset can have multiple modes or none at all. Best suited for categorical data (e.g., which product flavor is most popular, which city has the most orders).
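A small sketch of finding the mode of categorical data with the standard library. The flavor list is made-up data for illustration.

```python
from statistics import multimode

# Hypothetical order log: one entry per order, value = flavor chosen.
flavors = ["vanilla", "mango", "vanilla", "lime", "mango", "cocoa"]

# multimode returns every value tied for the highest count,
# in the order each was first seen.
popular = multimode(flavors)
print(popular)  # ['vanilla', 'mango']

# When every value appears exactly once, all of them are returned.
# Many textbooks describe that situation as "no mode".
print(multimode(["a", "b", "c"]))  # ['a', 'b', 'c']
```

`multimode` is used here instead of `mode` because it makes ties visible rather than silently picking one value.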
| Metric | Best For | Sensitivity to Outliers |
|---|---|---|
| Mean | Symmetric distributions, no extreme values | High (easily pulled) |
| Median | Skewed distributions (salaries, prices) | Low (robust) |
| Mode | Categorical data, finding popular choices | Not applicable |
2. Spread: How Scattered Is the Data?
Knowing where the center is isn't enough. "Class average: 75" could mean everyone scored exactly 75, or half scored 50 and half scored 100. Spread tells you how dispersed the values are.
Range
Maximum − Minimum. Simple to compute, but relies on only two data points and can be badly distorted by a single outlier.
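The sketch below illustrates both points: two classes with the same mean can have very different ranges, and one unusual score can dominate the range. The helper `value_range` and all scores are hypothetical.

```python
from statistics import mean

def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

# Two made-up classes with the same average of 75.
uniform_class = [75] * 10            # everyone scored exactly 75
split_class = [50] * 5 + [100] * 5   # half scored 50, half scored 100

print(mean(uniform_class), value_range(uniform_class))  # 75 0
print(mean(split_class), value_range(split_class))      # 75 50

# A single outlier can dominate the range.
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [5]
print(value_range(scores))               # 10
print(value_range(scores_with_outlier))  # 75
```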
Variance
The average of the squared differences from the mean. Squaring prevents positive and negative deviations from canceling out, and amplifies larger deviations.
- Population variance: divides by n; use when the data covers the entire population
- Sample variance: divides by n−1 (Bessel's correction); use when estimating the population variance from a sample
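The two divisors can be compared directly. This is a sketch on a small made-up dataset, computing each variance by hand and with the standard library's `pvariance` (divides by n) and `variance` (divides by n−1).

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; mean is 5
n = len(data)
center = mean(data)

# Sum of squared deviations from the mean.
squared_devs = sum((x - center) ** 2 for x in data)  # 32

population_var = squared_devs / n    # divide by n     -> 4.0
sample_var = squared_devs / (n - 1)  # divide by n - 1 -> about 4.571

# The standard library functions give the same results.
print(population_var, pvariance(data))
print(sample_var, variance(data))
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.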
Standard Deviation
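The square root of the variance. It is in the same unit as the original data, so it's much easier to interpret. Example: mean height 170 cm, standard deviation 8 cm → if heights are roughly normally distributed, about 68% of people fall between 162 and 178 cm (±1 SD). The 68% figure is a property of the normal distribution and does not carry over to strongly skewed data.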
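The 68% figure comes from the normal distribution. As a sketch, and assuming heights are exactly normal with mean 170 cm and SD 8 cm (an idealization), the share within ±1 SD can be computed with `statistics.NormalDist`:

```python
from statistics import NormalDist

# Assumption: heights follow a normal distribution with these parameters.
heights = NormalDist(mu=170, sigma=8)

# Share of the distribution between 162 cm and 178 cm (mean +/- 1 SD).
share_within_1sd = heights.cdf(178) - heights.cdf(162)

# Share between 154 cm and 186 cm (mean +/- 2 SD), for comparison.
share_within_2sd = heights.cdf(186) - heights.cdf(154)

print(f"{share_within_1sd:.1%}")  # 68.3%
print(f"{share_within_2sd:.1%}")  # 95.4%
```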
Coefficient of Variation (CV)
Standard deviation ÷ mean × 100%. Used to compare spread across datasets with different scales. A salary SD of $5,000 and a housing price SD of $500,000 can't be directly compared, but their CVs can. CV is only meaningful for data measured on a scale with a true zero and a mean well above zero.
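A sketch of the comparison, using the two SDs from the text. The means ($50,000 for salaries, $2,000,000 for housing prices) are assumptions added purely for illustration, as is the helper name `coefficient_of_variation`.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation / mean * 100."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return 100 * sd / mean

# SDs come from the text; the means are assumed for this example.
salary_cv = coefficient_of_variation(sd=5_000, mean=50_000)
housing_cv = coefficient_of_variation(sd=500_000, mean=2_000_000)

print(f"salary CV:  {salary_cv:.0f}%")   # salary CV:  10%
print(f"housing CV: {housing_cv:.0f}%")  # housing CV: 25%
```

Under these assumed means, housing prices vary more relative to their typical level than salaries do, which the raw SDs alone cannot show.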
3. Quartiles: A More Robust View of Spread
Standard deviation is sensitive to outliers. Quartiles offer a robust alternative:
- Q1 (First Quartile) — 25% of data falls below this value
- Q2 (Median) — 50% of data falls below this value
- Q3 (Third Quartile) — 75% of data falls below this value
- IQR (Interquartile Range) = Q3 − Q1 — spans the middle 50% of the data
Because the IQR depends only on the middle 50% of the values, a few extreme outliers in either tail barely move it. Box plots are built on Q1, Q2, Q3, and IQR, and are one of the most effective ways to visualize data distributions.
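The sketch below computes quartiles and the IQR with `statistics.quantiles`, then repeats the calculation after turning the largest value into an extreme outlier. The data is made up. Quartile conventions differ between tools, so the `method="inclusive"` choice here is one option among several and other software may report slightly different cut points.

```python
from statistics import quantiles, stdev

def quartile_summary(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quantile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest value replaced

print(quartile_summary(clean))         # (3.0, 5.0, 7.0, 4.0)
print(quartile_summary(with_outlier))  # (3.0, 5.0, 7.0, 4.0)

# The standard deviation, by contrast, reacts strongly to the outlier.
print(stdev(clean), stdev(with_outlier))
```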
4. Skewness and Kurtosis: The Shape of the Distribution
Skewness
Describes whether the distribution is symmetric:
- Skewness ≈ 0 — roughly symmetric; mean ≈ median
- Positive skew (right-skewed) — long tail on the right; a few very high values pull the mean up (e.g., income, wealth)
- Negative skew (left-skewed) — long tail on the left; a few very low values pull the mean down (e.g., exam scores near the maximum)
In a right-skewed, single-peaked distribution the usual ordering is mean > median > mode (a rule of thumb, not a guarantee), which is exactly why salary reports use the median as the more honest representative figure.
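Skewness can be computed as the third standardized moment. The sketch below uses the population (divide-by-n) form, which is one of several conventions, on the salary example from earlier. The function name `skewness` is illustrative.

```python
from statistics import mean, median

def skewness(values):
    """Population skewness: m3 / m2**1.5, where mk is the k-th central moment."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

salaries = [30_000] * 9 + [600_000]  # long tail on the right
symmetric = [1, 2, 3, 4, 5]

print(round(skewness(salaries), 2))  # 2.67 (positive: right-skewed)
print(skewness(symmetric))           # 0.0  (symmetric)
print(mean(salaries) > median(salaries))  # True: the tail pulls the mean up
```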
Kurtosis
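Describes how heavy the tails of the distribution are. It is often described as "peakedness," but it mainly reflects how much of the variation comes from extreme values:
- High kurtosis: heavier tails, so extreme values are more likely than under a normal distribution; often accompanied by a sharper central peak
- Low kurtosis: lighter tails, so extreme values are rare; the data tends to look flatter, without a sharp central peak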
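Kurtosis is commonly reported as excess kurtosis: the fourth standardized moment minus 3, so that a normal distribution scores 0. The sketch below uses the population (divide-by-n) form on two made-up datasets. The function name and the data are illustrative only.

```python
from statistics import mean

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]              # evenly spread, no extremes
heavy_tailed = [5, 5, 5, 5, 5, 5, 5, -20, 30]   # mostly at 5, two extremes

print(round(excess_kurtosis(flat), 2))          # -1.23 (low kurtosis)
print(round(excess_kurtosis(heavy_tailed), 2))  # 1.5   (high kurtosis)
```

Both datasets have the same mean of 5; what separates them is how much of the variance comes from the two extreme values.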
Summary
- Mean — the most common summary, but unreliable when outliers are present; always pair with the median
- Median — more representative for skewed distributions like salaries and housing prices
- Mode — best for categorical data or identifying the most popular option
- Standard deviation — quantifies spread in the same unit as the data
- IQR — a robust alternative to standard deviation that ignores extreme values
- Skewness — tells you whether the distribution is symmetric, and which central tendency metric to report
Descriptive statistics don't require advanced math — understanding the intuition behind each metric is what actually lets you read data honestly. The next time you see "average salary," remember to ask: What's the median?