"A poll shows the candidate at 48% support with a margin of error of ±3%." Do you know how that margin of error is calculated? Why can polling just 1,000 people represent millions of voters? Sampling statistics answers a fundamental question: you don't need to survey everyone to make reliable inferences about a population — as long as your method is sound.
1. Why Do We Need Sampling?
Ideally, surveying everyone (a census) gives the most accurate results. But in practice, a census is often infeasible:
- Too costly: Interviewing every customer takes enormous time and resources
- Too slow: By the time you finish, the question has changed
- Destructive testing: You can't test every light bulb's lifespan by burning them all out
- Infinite population: You can't survey future users in advance
The core insight of sampling statistics: A random, representative sample — even a tiny fraction of the population — lets you infer population characteristics with quantified precision.
2. Population vs. Sample
| Concept | Definition | Symbol | Example |
|---|---|---|---|
| Population | The entire group of interest | N | All registered voters |
| Sample | A subset drawn from the population | n | 1,068 randomly selected voters |
| Population parameter | The true characteristic (usually unknown) | μ, p | True support rate among all voters |
| Sample statistic | Estimated value from the sample | x̄, p̂ | 48% support rate in the sample |
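To make the distinction between a parameter and a statistic concrete, here is a minimal sketch that builds a synthetic population and then samples from it. The population size, the 48% true support rate, and the random seed are all assumptions chosen for illustration; in a real survey the parameter is unknown.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = supports the candidate, 0 = does not.
# The 0.48 support probability is an assumed value, not real data.
N = 200_000
population = [1 if rng.random() < 0.48 else 0 for _ in range(N)]

# Population parameter p: computable here only because we built the population.
p_true = sum(population) / N

# Sample statistic p_hat: computed from n members drawn at random without replacement.
n = 1_068
sample = rng.sample(population, n)
p_hat = sum(sample) / n

print(f"population size N = {N:,}, parameter p = {p_true:.4f}")
print(f"sample size n = {n:,}, statistic p_hat = {p_hat:.4f}")
```

The statistic will usually land close to the parameter but not exactly on it; the rest of the article is about quantifying that gap.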
3. Confidence Intervals: Quantifying Uncertainty
Stated in full, "48% ±3%" is a 95% confidence interval of 45% to 51%, where the ±3% is measured in percentage points. This means:
If you repeated this sampling procedure 100 times, approximately 95 of those intervals would contain the true population value.
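The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a true support rate of 48% and polls of 1,000 respondents (both illustrative values), builds a normal-approximation 95% interval for each simulated poll, and counts how often the interval contains the true value.

```python
import math
import random

def coverage(true_p, n, repeats, z=1.96, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Each respondent independently supports with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        p_hat = supporters / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repeats

rate = coverage(true_p=0.48, n=1_000, repeats=2_000)
print(f"intervals containing the true value: {rate:.1%}")
```

The printed share should sit near 95%, with some run-to-run variation if the seed is changed.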
Confidence Interval Formula (Proportions)
CI = p̂ ± Z × √(p̂(1−p̂)/n)
- p̂: Sample proportion (e.g., 0.48)
- Z: Z-score for confidence level (95% → 1.96, 99% → 2.576)
- n: Sample size
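As a worked example of the formula, the following sketch computes the interval for the poll in the introduction, taking p̂ = 0.48 and n = 1,068. The function name `proportion_ci` is just an illustrative choice.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Return (low, high, margin) for a normal-approximation interval."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, margin = proportion_ci(0.48, 1_068)
print(f"95% CI: {low:.1%} to {high:.1%} (margin {margin:.1%})")

# A higher confidence level uses a larger Z and gives a wider interval.
low99, high99, margin99 = proportion_ci(0.48, 1_068, z=2.576)
print(f"99% CI: {low99:.1%} to {high99:.1%} (margin {margin99:.1%})")
```

With these inputs the 95% margin comes out at roughly 3 percentage points, which is where the "±3%" in the headline comes from.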
4. How to Determine Sample Size
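Sample size formula: n = Z² × p(1−p) / E², where E is the desired margin of error expressed as a proportion (e.g., 0.03 for ±3%) and p is the expected proportion. Using p = 0.5 is the most conservative choice because it maximizes p(1−p).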
| Margin of Error | Required Sample Size (95% confidence, p = 0.5, rounded up) |
|---|---|
| ±10% | 97 |
| ±5% | 385 |
| ±3% | 1,068 |
| ±2% | 2,401 |
| ±1% | 9,604 |
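Key insight: Halving the margin of error requires 4× the sample size, because the margin shrinks with the square root of n. Another counterintuitive result: population size barely affects the required sample size as long as the sample is a small fraction of the population. A sample of 1,000 respondents works about as well for a city of 100,000 as for a country of 300 million.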
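The sample size formula and the two observations above can be sketched in a few lines. The function rounds up to the next whole respondent, so a result can be one higher than a figure rounded to the nearest integer. The finite population correction used here, n = n₀ / (1 + (n₀ − 1)/N), is a standard adjustment that the text does not spell out; it is included as an assumption to show how little the population size matters.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """n = z^2 * p(1-p) / E^2, rounded up to a whole respondent."""
    n = z**2 * p * (1 - p) / margin**2
    # round() removes floating-point noise before taking the ceiling.
    return math.ceil(round(n, 6))

def with_finite_population(n0, population):
    """Apply the finite population correction to an initial size n0."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

for margin in (0.10, 0.05, 0.03, 0.02, 0.01):
    print(f"±{margin:.0%}: {required_sample_size(margin):,}")

n0 = required_sample_size(0.03)
city = with_finite_population(n0, 100_000)
country = with_finite_population(n0, 300_000_000)
print(f"city of 100,000: {city:,}; country of 300 million: {country:,}")
```

Going from ±2% to ±1% takes the requirement from 2,401 to 9,604, exactly four times as many respondents, while the city and the country differ by only about a dozen.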
5. A/B Testing: Experimental Design for the Digital Age
Hypothesis Testing Steps
- Null hypothesis H₀: No difference between groups (the change has no effect)
- Alternative hypothesis H₁: Group B outperforms Group A
- Randomly assign users to A or B, collect data
- Calculate the p-value: the probability of observing a difference at least this large if H₀ were true
- If p < 0.05, reject H₀ — the difference is statistically significant
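The steps above can be sketched with a pooled two-proportion z-test, one common way to compare conversion rates; the text does not name a specific test, so treat this choice as an assumption. The conversion counts are hypothetical, and the p-value is one-sided to match H₁ (B outperforms A).

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, one-sided p-value) for H1: rate of B > rate of A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, estimated by pooling the data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(Z >= z) for a standard normal, computed with the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 10,000 users per group.
z, p_value = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

With these made-up counts (4.8% vs. 5.6% conversion) the p-value falls well below 0.05, so H₀ would be rejected.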
Interpreting p-Values Correctly
The p-value is the most commonly misunderstood statistic. It is not the probability that H₀ is true. It is: the probability of observing data at least this extreme, assuming H₀ is true.
Statistical significance ≠ practical importance. With a large enough sample, even a 0.01% difference can be statistically significant. Always evaluate effect size alongside p-values.
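To see how sample size alone can create significance, the sketch below holds a tiny lift fixed (5.00% vs. 5.01%, a 0.01 percentage point difference) and varies only the number of users per group. It assumes the observed rates equal those values exactly at every sample size, which is an idealization, and it uses a pooled one-sided z-test as an illustrative choice.

```python
import math

def one_sided_p_value(rate_a, rate_b, n_per_group):
    """One-sided p-value for H1: rate_b > rate_a with equal group sizes."""
    pooled = (rate_a + rate_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (rate_b - rate_a) / se
    return 0.5 * math.erfc(z / math.sqrt(2))

for n in (10_000, 1_000_000, 100_000_000):
    p = one_sided_p_value(0.0500, 0.0501, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>11,}  p = {p:.4f}  {verdict}")
```

The effect size never changes across the three rows; only the sample size does. Whether a 0.01 point lift is worth shipping is a business question that the p-value cannot answer.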
6. Common Sampling Biases
- Selection bias: Sample doesn't represent the population (e.g., online surveys excluding non-internet users)
- Survivorship bias: Only seeing cases that "survived," missing those that didn't
- Response bias: Respondents give socially desirable answers rather than honest ones
- Multiple comparisons: If you run 20 independent tests where no real effect exists, expect about 1 false positive at p < 0.05 by chance alone
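The arithmetic behind the multiple comparisons warning is short enough to check directly. This sketch assumes the 20 tests are independent and that every null hypothesis is true. The Bonferroni threshold at the end is one common correction, added here as an illustration rather than something the text prescribes.

```python
alpha = 0.05
tests = 20

# Expected number of false positives when every null hypothesis is true.
expected_false_positives = alpha * tests

# Probability that at least one of the independent tests is a false positive.
p_at_least_one = 1 - (1 - alpha) ** tests

# Bonferroni correction: a stricter per-test threshold that keeps the
# chance of any false positive across all tests at or below alpha.
bonferroni_alpha = alpha / tests

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"chance of at least one:   {p_at_least_one:.1%}")
print(f"Bonferroni threshold:     {bonferroni_alpha:.4f}")
```

Under these assumptions the chance of at least one spurious "win" is about 64%, which is why a single significant result out of many tests deserves skepticism.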
7. How to Read Statistics in the Media
- What's the sample size? n=30 vs. n=1,000 have very different reliability
- What's the confidence interval? Point estimates without margins of error are incomplete
- How was the sample collected? Random sampling supports representative results; self-selected online polls do not
- Is the difference actually significant? 48% vs. 46% with ±3% margin is statistically indistinguishable
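The last check can be approximated with a quick calculation. The sketch below assumes the two figures are independent estimates (for example, two separate polls), each with its own ±3% margin at the same confidence level, so the margins combine in quadrature. If both numbers come from the same poll the estimates are correlated and the margin on the difference changes, so treat this as a rough screen only.

```python
import math

def difference_is_distinguishable(est_a, est_b, margin_a, margin_b):
    """Compare the gap between two independent estimates to its margin."""
    margin_diff = math.sqrt(margin_a**2 + margin_b**2)
    return abs(est_a - est_b) > margin_diff, margin_diff

distinguishable, margin_diff = difference_is_distinguishable(0.48, 0.46, 0.03, 0.03)
print(f"gap = 2.0 points, margin on the gap = {margin_diff:.1%}")
print("distinguishable" if distinguishable else "statistically indistinguishable")
```

The 2 point gap is smaller than the roughly 4.2 point margin on the difference, so the two results cannot be told apart.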
Summary
- Sampling basics: Randomness and representativeness are the foundation of reliable inference
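- Confidence intervals: Quantify uncertainty; "95%" describes how often the procedure captures the true value across repeated samples, not a 95% chance that the true value lies in one particular interval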
- Sample size: Halving error requires 4× the sample; population size barely matters
- A/B testing: Random assignment + hypothesis testing; significance ≠ importance
- p-value: Probability of data at least this extreme, assuming H₀ is true; widely misread as the probability that H₀ is true
The ultimate goal of statistical inference isn't a precise number — it's making defensible judgments under uncertainty. The next time you see "margin of error ±3%," you'll know exactly what it means.