A/B Sample Size Calculator

Calculate the necessary sample size for A/B testing based on conversion rates and desired statistical power.

How A/B Sample Size is Calculated

A/B sample size calculation uses four inputs to estimate how many visitors each variant needs before results become trustworthy.

The baseline conversion rate sets the starting point. The minimum detectable effect (MDE) defines the smallest lift worth catching. Statistical power is the probability of spotting that lift when it really exists. The significance level (alpha) controls how often random noise will be mistaken for a real win.

The calculator plugs these into a standard two-proportion formula, typically assuming a two-tailed test.

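Smaller effects, lower baselines (when the MDE is expressed as a relative lift), higher power, and a stricter alpha all push the required sample size up, often dramatically: halving the MDE roughly quadruples the visitors needed.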
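The sketch below shows one common form of the two-proportion calculation, using only the Python standard library. It is an illustration under stated assumptions, not necessarily the exact formula this calculator uses: the function name and parameters are hypothetical, the MDE is treated as a relative lift, the test is two-tailed, and both variants receive equal traffic.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, power=0.80, alpha=0.05):
    """Visitors needed per variant for a two-tailed two-proportion z-test.

    Assumes equal traffic per variant and an MDE given as a relative lift
    (0.20 means a 20 percent lift over the baseline).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Pooled variance under the null, separate variances under the alternative.
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # 5 percent baseline, 20 percent relative lift, 80 percent power, alpha 0.05.
    print(sample_size_per_variant(0.05, 0.20))
    # Halving the MDE to 10 percent needs roughly four times the visitors.
    print(sample_size_per_variant(0.05, 0.10))
```

With a 5 percent baseline and a 20 percent relative lift, this comes out at a little over 8,000 visitors per variant. Other calculators may differ slightly depending on which variance approximation they use.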

When to Use the A/B Sample Size Calculator

Run this calculator before launching any A/B test, ideally during the planning stage when you are still drafting the hypothesis.

It tells you whether the traffic you can realistically gather inside a reasonable window, usually one to four weeks, is enough to detect the lift you care about.
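As a rough illustration of that feasibility check, the sketch below converts a required sample size into a test duration. The function names and the traffic figures are hypothetical, and it assumes every eligible visitor enters the test and traffic is split evenly across variants.

```python
from math import ceil


def days_to_reach_sample(sample_per_variant, daily_visitors, variants=2):
    """Days of traffic needed to fill every variant, rounded up."""
    total_needed = sample_per_variant * variants
    return ceil(total_needed / daily_visitors)


def whole_weeks(days):
    """Round a duration up to full weeks so each weekday is covered equally."""
    return ceil(days / 7)


if __name__ == "__main__":
    # Hypothetical inputs: 8,158 visitors per variant, 1,500 eligible visitors per day.
    days = days_to_reach_sample(8158, 1500)
    print(days, "days, or", whole_weeks(days), "full weeks")
```

If the result lands well beyond four weeks, the usual options are a larger MDE, a higher-traffic page, or fewer variants.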

It is also useful when reviewing past tests that ended inconclusively, since underpowered designs are a common reason for flat results.

Marketers, product managers, and growth teams use it to scope experiments, prioritize the test backlog, and set honest expectations with stakeholders about how long a meaningful winner will take to surface.

Common Mistakes with A/B Sample Size

A frequent mistake is peeking at results and stopping the test the moment one variant looks ahead, which inflates false-positive rates well beyond the chosen alpha.
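The effect of peeking can be seen in a small simulation. The sketch below runs A/A tests, where both variants share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look against checking once at the end. All names and default values are hypothetical, and the exact rates depend on the seed and settings.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_statistic(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se


def simulate_aa_tests(rate=0.10, looks=10, batch=200, runs=400, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        crossed_at_any_look = False
        for _ in range(looks):
            # Add one batch of visitors to each arm, then take a look.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant = abs(z_statistic(conv_a, conv_b, n)) > critical
            crossed_at_any_look = crossed_at_any_look or significant
        peeking_hits += crossed_at_any_look  # would have stopped early
        final_hits += significant            # result at the planned end only
    return peeking_hits / runs, final_hits / runs


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"stop at first significant look: {peeking:.1%}")
    print(f"single look at planned end:     {fixed:.1%}")
```

The single-look rate should sit near the chosen alpha, while the peeking rate is typically several times higher.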

Another is picking an MDE that sounds ambitious, like a 20 percent lift on an already-optimized page. This produces a conveniently small sample size, but real-world gains are usually smaller, so the test ends up underpowered for the lifts that actually occur.

Teams also forget to account for low baseline conversion rates, which require far more traffic than higher-converting flows to detect the same relative lift.

Splitting traffic across too many variants, ignoring weekly seasonality, and mixing logged-in and anonymous users in the same bucket can all quietly invalidate the math behind the sample size you calculated.

A/B Sample Size vs Statistical Power

Sample size and statistical power are tightly linked but describe different things.

Power, usually set at 80 or 90 percent, is the chance your test will detect a real difference if one truly exists, while sample size is the number of users needed to reach that level of sensitivity.

Raising power from 80 to 90 percent can increase the required visitors by roughly a third, and pushing to 95 percent costs even more.
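The arithmetic behind these figures can be sketched with the usual approximation that sample size scales with the square of the sum of the two critical values, (z_alpha/2 + z_power)^2. The function name is hypothetical, and the example assumes a two-tailed test at alpha 0.05.

```python
from statistics import NormalDist


def relative_sample_size(power, reference_power=0.80, alpha=0.05):
    """Approximate sample size multiplier versus a reference power level.

    Uses the approximation that n is proportional to (z_alpha/2 + z_power)^2
    for a two-tailed test with everything else held fixed.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power)) / (z_alpha + z(reference_power))) ** 2


if __name__ == "__main__":
    for power in (0.80, 0.90, 0.95):
        print(f"{power:.0%} power: {relative_sample_size(power):.2f}x the 80% sample")
```

Under these assumptions, 90 percent power needs about 1.34 times the visitors of 80 percent power, and 95 percent power about 1.66 times.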

Lower power saves traffic but raises the risk of a false negative, that is, missing a genuine winner.

Most teams settle around 80 percent power as a workable balance between test duration and confidence in the outcome.