Complete Guide to Sampling Statistics: Sample Size, Confidence Intervals, and A/B Testing

"A poll shows the candidate at 48% support with a margin of error of ±3%." Do you know how that margin of error is calculated? Why can polling just 1,000 people represent millions of voters? Sampling statistics rests on one fundamental idea: you don't need to survey everyone to make reliable inferences about a population, as long as your method is sound.

1. Why Do We Need Sampling?

Ideally, surveying everyone (a census) gives the most accurate results. But in practice, a census is often infeasible:

  • Too costly: Interviewing every customer takes enormous time and resources
  • Too slow: By the time you finish, the question has changed
  • Destructive testing: You can't test every light bulb's lifespan by burning them all out
  • Infinite population: You can't survey future users in advance

The core insight of sampling statistics: A random, representative sample — even a tiny fraction of the population — lets you infer population characteristics with quantified precision.

2. Population vs. Sample

Concept              | Definition                                | Symbol | Example
Population           | The entire group of interest              | N      | All registered voters
Sample               | A subset drawn from the population        | n      | 1,068 randomly selected voters
Population parameter | The true characteristic (usually unknown) | μ, p   | True support rate among all voters
Sample statistic     | Estimated value from the sample           | x̄, p̂   | 48% support rate in the sample

3. Confidence Intervals: Quantifying Uncertainty

Stated in full, "48% ±3%" is a 95% confidence interval of 45%–51%. This means:

If you repeated this sampling procedure 100 times and built an interval from each sample, approximately 95 of those intervals would contain the true population value.
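
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only illustrative: it assumes a true support rate of 0.48, samples of 1,000, and the normal-approximation interval for a proportion (the formula given in the following subsection). The function names, trial count, and seed are assumptions made for this example.

```python
import math
import random


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p=0.48, n=1000, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one random sample of n respondents and count supporters.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Share of intervals containing the true value: {coverage():.3f}")
```

The printed share should land close to 0.95. It will not be exactly 0.95, because the simulation itself is a random sample of possible polls.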

Confidence Interval Formula (Proportions)

CI = p̂ ± Z × √(p̂(1−p̂)/n)

  • p̂: Sample proportion (e.g., 0.48)
  • Z: Z-score for confidence level (95% → 1.96, 99% → 2.576)
  • n: Sample size
Try it out: Enter your data into the Statistics Calculator to quickly compute mean and standard deviation, then use the formula above to calculate confidence intervals.
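
As a worked example of the formula above, the following sketch computes the interval for the poll used throughout this guide (p̂ = 0.48, n = 1,068). The function name `proportion_ci` is an assumption made for illustration, not part of any particular library.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Return (low, high, margin) for CI = p_hat ± z * sqrt(p_hat(1 - p_hat) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_ci(0.48, 1068)
print(f"margin = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
# prints: margin = 0.030, CI = (0.450, 0.510)
```

Passing `z=2.576` instead gives the wider 99% interval.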

4. How to Determine Sample Size

Sample size formula: n = Z² × p(1−p) / E²

Margin of Error | Required Sample Size (95% confidence, p = 0.5, rounded up)
±10%            | 97
±5%             | 385
±3%             | 1,068
±2%             | 2,401
±1%             | 9,604

Key insight: Halving the margin of error requires 4× the sample size, because E is squared in the denominator. Another counterintuitive result: once the population is much larger than the sample, population size barely affects the required sample size. A sample of 1,000 respondents gives similar precision for a city of 100,000 or a country of 300 million.
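
The sample size formula and both insights can be sketched in a few lines. Two things here are assumptions added for illustration: results are rounded up to the next whole respondent (so some values can be one higher than a truncated figure), and the population-size effect is shown with the standard finite population correction n / (1 + (n − 1) / N), which this guide does not otherwise cover. The function names are hypothetical.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """n = z^2 * p(1 - p) / E^2, rounded up to a whole respondent."""
    raw = z**2 * p * (1 - p) / margin**2
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(raw, 6))


def finite_population_size(n, population):
    """Adjust n for a finite population of the given size (assumed correction)."""
    adjusted = n / (1 + (n - 1) / population)
    return math.ceil(round(adjusted, 6))


for margin in (0.10, 0.05, 0.03, 0.02, 0.01):
    print(f"±{margin:.0%}: n = {required_sample_size(margin)}")

n = required_sample_size(0.03)
print(finite_population_size(n, 100_000))      # city of 100,000
print(finite_population_size(n, 300_000_000))  # country of 300 million
```

Going from ±2% to ±1% moves the result from 2,401 to 9,604, exactly 4×, while the two population sizes change the ±3% requirement by only about a dozen respondents.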

5. A/B Testing: Experimental Design for the Digital Age

Hypothesis Testing Steps

  1. Null hypothesis H₀: No difference between groups (the change has no effect)
  2. Alternative hypothesis H₁: Group B outperforms Group A
  3. Randomly assign users to A or B, collect data
  4. Calculate p-value: probability of observing a difference at least this large if H₀ were true
  5. If p < 0.05, reject H₀ — the difference is statistically significant
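
The steps above can be sketched with a one-sided two-proportion z-test, which is one common choice for comparing conversion rates; this guide does not prescribe a specific test, so treat it as an assumption. The conversion counts below are hypothetical, and `ab_test_p_value` is an illustrative name.

```python
import math
from statistics import NormalDist


def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """One-sided two-proportion z-test for H1: B's rate is higher than A's.

    Returns (z, p_value), using the pooled proportion under H0.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this large if H0 were true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical data: 480 of 5,000 users convert in A, 560 of 5,000 in B.
z, p_value = ab_test_p_value(480, 5000, 560, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

A two-sided test (H₁: the groups differ in either direction) would double the tail probability, so the choice of H₁ should be made before looking at the data.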

Interpreting p-Values Correctly

The p-value is the most commonly misunderstood statistic. It is not the probability that H₀ is true. It is: the probability of observing data at least this extreme, assuming H₀ is true.

Statistical significance ≠ practical importance. With a large enough sample, even a 0.01% difference can be statistically significant. Always evaluate effect size alongside p-values.

Visualize your results: Plot A/B group distributions in the Chart Generator using bar or line charts to intuitively assess effect size beyond just the p-value.

6. Common Sampling Biases

  • Selection bias: Sample doesn't represent the population (e.g., online surveys excluding non-internet users)
  • Survivorship bias: Only seeing cases that "survived," missing those that didn't
  • Response bias: Respondents give socially desirable answers rather than honest ones
  • Multiple comparisons: If you run 20 tests at once and every null hypothesis is true, expect about 1 false positive at p < 0.05 by chance alone
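
The multiple comparisons arithmetic is easy to verify. This sketch assumes the tests are independent and that every null hypothesis is actually true; the function names are illustrative.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of false positives when every H0 is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests is a false positive."""
    return 1 - (1 - alpha) ** num_tests


print(expected_false_positives(20))                    # about 1 expected false positive
print(f"{prob_at_least_one_false_positive(20):.2f}")   # prints: 0.64
```

So with 20 tests there is roughly a 64% chance of seeing at least one "significant" result even when nothing real is going on.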

7. How to Read Statistics in the Media

  1. What's the sample size? n=30 vs. n=1,000 have very different reliability
  2. What's the confidence interval? Point estimates without margins of error are incomplete
  3. How was the sample collected? Random sampling = representative; self-selected online polls = not
  4. Is the difference actually significant? 48% vs. 46% with ±3% margin is statistically indistinguishable
Calculate percentages on the fly: When you need to verify percentage changes or proportions while reading reports, the Percentage Calculator keeps you from being misled by poorly presented statistics.

Summary

  • Sampling basics: Randomness and representativeness are the foundation of reliable inference
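  • Confidence intervals: Quantify uncertainty; "95%" describes how often the procedure captures the true value over repeated samples, not a 95% chance that the true value lies in one particular interval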
  • Sample size: Halving error requires 4× the sample; population size barely matters
  • A/B testing: Random assignment + hypothesis testing; significance ≠ importance
  • p-value: Probability of data at least this extreme given H₀ is true; widely misread as the probability that H₀ is true

The ultimate goal of statistical inference isn't a precise number — it's making defensible judgments under uncertainty. The next time you see "margin of error ±3%," you'll know exactly what it means.