Complete Guide to Sampling Statistics: Sample Size, Confidence Intervals, and A/B Testing

"A poll shows the candidate at 48% support with a margin of error of ±3%." Do you know how that margin of error is calculated? Why can polling just 1,000 people represent millions of voters? Sampling statistics answers a fundamental question: you don't need to survey everyone to make reliable inferences about a population — as long as your method is sound.

1. Why Do We Need Sampling?

Ideally, surveying everyone (a census) gives the most accurate results. But in practice, a census is often infeasible:

Too costly: Interviewing every customer takes enormous time and resources
Too slow: By the time you finish, the question has changed
Destructive testing: You can't test every light bulb's lifespan by burning them all out
Infinite population: You can't survey future users in advance

The core insight of sampling statistics: A random, representative sample — even a tiny fraction of the population — lets you infer population characteristics with quantified precision.

2. Population vs. Sample

Concept	Definition	Symbol	Example
Population	The entire group of interest	N	All registered voters
Sample	A subset drawn from the population	n	1,068 randomly selected voters
Population parameter	The true characteristic (usually unknown)	μ, p	True support rate among all voters
Sample statistic	Estimated value from the sample	x̄, p̂	48% support rate in the sample

3. Confidence Intervals: Quantifying Uncertainty

"48% ±3%" fully stated is: 95% confidence interval of 45%–51%. This means:

If you repeated this sampling procedure 100 times and computed an interval from each sample, about 95 of those intervals would contain the true population value.

Confidence Interval Formula (Proportions)

CI = p̂ ± Z × √(p̂(1−p̂)/n)

p̂: Sample proportion (e.g., 0.48)
Z: Z-score for confidence level (95% → 1.96, 99% → 2.576)
n: Sample size
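
As a minimal sketch of the formula above, the following Python computes the interval for the poll example (48% support, n = 1,068) and then simulates repeated polls to check how often the interval contains the true value. The function names `proportion_ci` and `coverage` are illustrative, and the "true" support of 48% is an assumption made only so the simulation has something to compare against.

```python
import math
import random

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# The poll example: 48% support among 1,068 respondents, 95% confidence.
low, high = proportion_ci(0.48, 1068)
print(f"95% CI: {low:.1%} to {high:.1%}")

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n simulated respondents, each a supporter with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = proportion_ci(successes / n, n)
        if lo <= true_p <= hi:
            hits += 1
    return hits / trials

# Hypothetical "true" support of 48%; in a real poll this is unknown.
coverage_rate = coverage(true_p=0.48, n=1068, trials=1000)
print(f"Intervals containing the true value: {coverage_rate:.1%}")
```

The first line printed should be roughly 45.0% to 51.0%, matching "48% ±3%". The simulated coverage should land close to 95%, though the exact figure depends on the seed and the number of trials.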

Try it out: Enter your data into the Statistics Calculator to quickly compute mean and standard deviation, then use the formula above to calculate confidence intervals.

4. How to Determine Sample Size

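Sample size formula: n = Z² × p(1−p) / E²

p: Expected proportion (use 0.5 when unknown; it maximizes p(1−p) and so gives the most conservative n)
E: Desired margin of error as a decimal (e.g., 0.03 for ±3%)
Z: Z-score for confidence level (95% → 1.96)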

Margin of Error	Required Sample Size (95% confidence, p=0.5, rounded up)
±10%	97
±5%	385
±3%	1,068
±2%	2,401
±1%	9,604
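
The sample size formula can be sketched in a few lines of Python. The function name `required_sample_size` is illustrative. This version rounds up to the next whole respondent, so its results can be one higher than figures that were truncated instead.

```python
import math

def required_sample_size(margin_of_error, p=0.5, z=1.96):
    """Smallest whole n satisfying n >= z^2 * p(1-p) / E^2."""
    n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Round away floating-point noise before taking the ceiling, so an
    # exact result such as 2401 is not bumped up to 2402.
    return math.ceil(round(n, 6))

for e in (0.10, 0.05, 0.03, 0.02, 0.01):
    print(f"±{e:.0%}: n = {required_sample_size(e):,}")
```

Passing a different `p` (for example 0.1 when the expected proportion is known to be small) lowers the required n, because p(1−p) is largest at p = 0.5.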

Key insight: Halving the margin of error requires 4× the sample size, because E is squared in the denominator. Another counterintuitive result: population size barely affects required sample size. A random sample of about 1,000 respondents gives similar precision for a city of 100,000 or a country of 300 million.
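
Both claims can be checked numerically. The sketch below is not from the original text: it uses the standard finite population correction, n₀ / (1 + (n₀ − 1) / N), to adjust the basic formula for a population of size N. The function names are illustrative.

```python
import math

def base_sample_size(margin_of_error, p=0.5, z=1.96):
    """Unrounded n0 = z^2 * p(1-p) / E^2, ignoring population size."""
    return z ** 2 * p * (1 - p) / margin_of_error ** 2

def adjusted_sample_size(margin_of_error, population, p=0.5, z=1.96):
    """Apply the finite population correction n0 / (1 + (n0 - 1) / N)."""
    n0 = base_sample_size(margin_of_error, p, z)
    return math.ceil(round(n0 / (1 + (n0 - 1) / population), 6))

# Halving the margin of error from ±3% to ±1.5%.
ratio = base_sample_size(0.015) / base_sample_size(0.03)
print(f"Halving the margin multiplies n by {ratio:.1f}")

# Same ±3% target for two very different population sizes.
city = adjusted_sample_size(0.03, population=100_000)
country = adjusted_sample_size(0.03, population=300_000_000)
print(f"City of 100,000: n = {city:,}")
print(f"Country of 300 million: n = {country:,}")
```

The two adjusted sample sizes should differ by only about one percent, even though one population is 3,000 times larger than the other.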

5. A/B Testing: Experimental Design for the Digital Age

Hypothesis Testing Steps

Null hypothesis H₀: No difference between groups (the change has no effect)
Alternative hypothesis H₁: Group B outperforms Group A
Randomly assign users to A or B, collect data
Calculate p-value: probability of observing a difference at least this large if H₀ were true
If p < 0.05 (the conventional threshold, chosen before the test), reject H₀: the difference is statistically significant
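
The steps above can be sketched with a two-proportion z-test, one common way to compare conversion rates. The test is one-sided because H₁ states that B outperforms A. The conversion counts are hypothetical, invented for illustration, and the function name `two_proportion_z_test` is not from any particular library.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H1: rate B > rate A, using the pooled proportion."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, estimated from the pooled data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability under H0
    return z, p_value

# Hypothetical counts: 200 of 5,000 users convert in A, 250 of 5,000 in B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

With these made-up counts the p-value falls below 0.05, so H₀ would be rejected. The normal approximation used here assumes reasonably large groups; very small samples call for an exact test instead.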

Interpreting p-Values Correctly

The p-value is the most commonly misunderstood statistic. It is not the probability that H₀ is true. It is: the probability of observing data at least this extreme, assuming H₀ is true.

Statistical significance ≠ practical importance. With a large enough sample, even a 0.01% difference can be statistically significant. Always evaluate effect size alongside p-values.
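
A short sketch makes the point concrete. It assumes "a 0.01% difference" means a lift of 0.01 percentage points (10.00% vs. 10.01%), and the rates and group sizes are hypothetical. The same tiny lift is tested at two sample sizes with a one-sided two-proportion z-test.

```python
import math
from statistics import NormalDist

def one_sided_p(rate_a, rate_b, n_per_group):
    """p-value for H1: rate B > rate A, with equal group sizes."""
    pooled = (rate_a + rate_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    return 1 - NormalDist().cdf((rate_b - rate_a) / se)

# Hypothetical rates: 10.00% vs. 10.01%, a lift of 0.01 percentage points.
p_small = one_sided_p(0.1000, 0.1001, n_per_group=10_000)
p_huge = one_sided_p(0.1000, 0.1001, n_per_group=100_000_000)

print(f"n = 10,000 per group:      p = {p_small:.3f}")
print(f"n = 100,000,000 per group: p = {p_huge:.4f}")
```

The effect size is identical in both runs. Only the sample size changes, yet the larger run crosses the 0.05 threshold, which is why the lift itself should be judged separately from the p-value.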

Visualize your results: Plot A/B group distributions in the Chart Generator using bar or line charts to intuitively assess effect size beyond just the p-value.

6. Common Sampling Biases

Selection bias: Sample doesn't represent the population (e.g., online surveys excluding non-internet users)
Survivorship bias: Only seeing cases that "survived," missing those that didn't
Response bias: Respondents give socially desirable answers rather than honest ones
Multiple comparisons: If you run 20 tests at a 0.05 threshold, expect about 1 false positive by chance alone, even when no real effect exists
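
The multiple-comparisons arithmetic can be worked through directly. This sketch assumes the 20 tests are independent and that H₀ is true in every one of them. The Bonferroni threshold at the end is one common correction, added here for illustration and not taken from the original text.

```python
alpha = 0.05      # per-test significance threshold
num_tests = 20    # number of tests run at once

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = num_tests * alpha

# Probability of at least one false positive across independent tests.
prob_at_least_one = 1 - (1 - alpha) ** num_tests

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_threshold = alpha / num_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Chance of at least one:   {prob_at_least_one:.1%}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.4f}")
```

Under these assumptions the chance of at least one spurious "significant" result is roughly 64%, far above the 5% that a single test would suggest.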

7. How to Read Statistics in the Media

What's the sample size? At 95% confidence and p=0.5, n=30 gives a margin of error of about ±18%, while n=1,000 gives about ±3%
What's the confidence interval? Point estimates without margins of error are incomplete
How was the sample collected? Random sampling supports representative results; self-selected online polls do not, regardless of size
Is the difference actually significant? 48% vs. 46% with a ±3% margin is statistically indistinguishable, because the intervals (45%–51% and 43%–49%) overlap
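
As a rough first check when reading a report, you can test whether two reported intervals overlap. This is a quick heuristic and not a formal significance test; the helper names are illustrative.

```python
def interval(estimate, margin):
    """Return (low, high) for an estimate with a symmetric margin of error."""
    return estimate - margin, estimate + margin

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

first = interval(0.48, 0.03)   # 48% ±3%
second = interval(0.46, 0.03)  # 46% ±3%

if intervals_overlap(first, second):
    print("Intervals overlap: the gap may be sampling noise")
else:
    print("Intervals do not overlap: the gap is unlikely to be noise alone")
```

For the 48% vs. 46% example the intervals overlap, so the two-point gap on its own does not establish a real difference.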

Calculate percentages on the fly: When you need to verify percentage changes or proportions while reading reports, the Percentage Calculator keeps you from being misled by poorly presented statistics.

Summary

Sampling basics: Randomness and representativeness are the foundation of reliable inference
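Confidence intervals: Quantify uncertainty; "95%" describes how often the procedure captures the true value across repeated samples, not a 95% chance that the true value lies in one particular interval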
Sample size: Halving error requires 4× the sample; population size barely matters
A/B testing: Random assignment + hypothesis testing; significance ≠ importance
p-value: Probability of data at least this extreme if H₀ is true, not the probability that H₀ is true; widely misunderstood

The ultimate goal of statistical inference isn't a precise number — it's making defensible judgments under uncertainty. The next time you see "margin of error ±3%," you'll know exactly what it means.