AB Test Calculator - Significance, Lift, and Sample Size
Use this AB test calculator to compare two variants with a two-proportion z-test. Get p-value, percentage lift, 95% confidence interval, and required sample size per variant.
What Is an AB Test Calculator?
An AB test calculator is a two-proportion hypothesis-testing tool that compares the conversion rates of variants A and B and returns a z-score, a two-tailed p-value, the percentage lift, an unpooled confidence interval, and the per-variant sample size needed to detect a chosen minimum effect. Enter the visitor and conversion counts, set the significance level and minimum detectable effect, and the AB test calculator tells you whether the difference is statistically significant.
- • Landing page and CTA tests: Marketing teams can plug in visitor and conversion counts to learn whether a new variant beats the control.
- • Email subject line and send-time tests: Email marketers can confirm that an open-rate or click-rate lift is real instead of short-term noise.
- • Product onboarding and checkout experiments: Product and growth teams can validate that a flow change produces a statistically significant lift before shipping it to everyone.
Under the null hypothesis that the two variants are equally good, the pooled conversion rate is the best estimate of the common conversion probability, and the difference is divided by the pooled standard error to produce a z-score.
For tests on continuous outcomes such as revenue per visitor or session duration, the z-test becomes a t-test and the T-Test Calculator is the right tool.
How the AB Test Calculator Works
The calculator runs a two-proportion z-test with the pooled standard error under the null hypothesis of equal conversion rates, then computes an unpooled confidence interval and a per-variant sample size using the two-proportion power formula.
- visitorsA (n1): Number of visitors, recipients, or sessions exposed to variant A (the control).
- conversionsA (x1): Number of conversion events recorded for variant A. May be zero; may not exceed visitorsA.
- visitorsB (n2): Number of visitors exposed to variant B (the treatment). May differ from n1 in a skewed traffic split.
- conversionsB (x2): Number of conversion events recorded for variant B.
- alpha: Two-tailed significance level. Conventional values are 0.10, 0.05, and 0.01.
- minimumDetectableEffect: Smallest absolute difference in conversion rates (as a decimal) the sample-size branch should be able to detect. Default 0.01 (one percentage point).
The z-score uses the pooled standard error because, under the null hypothesis, the best single estimate of the common conversion probability is the combined conversion rate. The confidence interval uses the unpooled standard error.
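The split between the pooled and unpooled standard error can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `two_proportion_test` and the returned keys are hypothetical, and the sketch assumes at least one conversion overall and a nonzero rate for variant A.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Pooled z-test plus unpooled confidence interval for (p2 - p1).

    Hypothetical helper for illustration; assumes 0 < pooled rate < 1
    and x1 > 0 so that the lift is defined.
    """
    p1, p2 = x1 / n1, x2 / n2

    # Pooled standard error: used for the z-score under the null hypothesis.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pool  # A minus B, so a better B gives a negative z
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p2 - p1
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {"z": z, "p_value": p_value, "ci": ci, "lift": diff / p1}


result = two_proportion_test(500, 10_000, 560, 10_000)
print(result)
```

With 500 of 10,000 conversions on A and 560 of 10,000 on B, the sketch gives a z-score near -1.894 and a two-tailed p-value near 0.058.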
The sample-size row uses the two-proportion power formula with z_alpha/2 and z_beta at 80 percent power, the same convention used by Optimizely, Evan Miller's guide, and most introductory statistics textbooks.
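Several variants of the two-proportion power formula exist, and the page does not say which one the calculator uses. The sketch below assumes the unpooled form n = (z_alpha/2 + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / delta^2, with p2 = p1 + delta, because that form reproduces the worked figure of about 8,155 visitors. The function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Visitors per variant to detect an absolute lift of `mde`.

    Assumed variant of the power formula (unpooled variances for both
    the alpha and beta terms); other textbook forms differ slightly.
    """
    p1 = baseline
    p2 = baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 at 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde ** 2)


print(required_sample_size(0.05, 0.01))  # 8155
```

Rounding up with `ceil` is a design choice: a fractional visitor cannot be recruited, and rounding down would leave the test slightly underpowered.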
Landing page test, 10,000 visitors per variant, 5.0% vs 5.6% conversion
visitorsA = 10,000, conversionsA = 500, visitorsB = 10,000, conversionsB = 560, alpha = 0.05, minimumDetectableEffect = 0.01
p1 = 0.0500, p2 = 0.0560. p_pool = 0.0530. SE_pool = 0.003168. z = (p1 - p2) / SE_pool = -1.8938. Two-tailed p-value = 0.0583. The 95% CI for (p2 - p1) is (-0.00021, 0.01221).
Z = -1.8938, two-tailed p = 0.0583, lift = 12 percent, required sample size per variant is about 8,155 visitors.
Borderline result. The observed lift is 12 percent, but with p = 0.0583 it falls just short of statistical significance at alpha = 0.05, and the confidence interval still includes zero.
OpenIntro Statistics uses the pooled conversion probability p_pool = (x1 + x2) / (n1 + n2) under the null hypothesis and a z-score of (p1 - p2) divided by sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2)).
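Written out in display form, with the landing page example substituted in as a check, the pooled test statistic looks like this. The hat notation is added here for readability and is not part of the original text.

```latex
\[
\hat{p}_{\text{pool}} = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}_{\text{pool}}\,(1 - \hat{p}_{\text{pool}})
          \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\]

\[
\hat{p}_{\text{pool}} = \frac{500 + 560}{10000 + 10000} = 0.053,
\qquad
z = \frac{0.050 - 0.056}{\sqrt{0.053 \times 0.947 \times 0.0002}}
  \approx -1.894
\]
```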
When the outcome is a categorical table rather than a per-visitor count, the Chi-Square Calculator runs the same hypothesis test on a contingency table.
Key Concepts Explained
Four ideas carry the meaning behind every result, and they are the same ideas you will see in any z-test on a statistics exam.
Two-proportion z-test
The statistical test for comparing two conversion rates. It assumes visitors in each variant are independent and the conversion probability is constant within each variant.
Pooled vs. unpooled standard error
The pooled standard error uses the combined conversion rate and is correct under the null hypothesis for the z-score. The unpooled standard error uses each variant's own conversion rate and is correct for the confidence interval.
P-value and statistical significance
The two-tailed p-value is the probability of seeing a difference at least as large as the observed one if the two variants truly have the same conversion rate.
Statistical power and minimum detectable effect
Statistical power (1 - beta) is the chance of detecting a real effect of a given size, with 80 percent as the conventional target.
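To make the link between power, sample size, and effect size concrete, the sketch below estimates power for a planned test. It is an approximation under stated assumptions: it uses the unpooled standard error throughout and ignores the negligible chance of rejecting in the wrong direction. The name `approximate_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p1, p2, n_per_variant, alpha=0.05):
    """Approximate two-sided power with equal visitors per variant.

    Normal approximation; ignores the far tail in the wrong direction.
    """
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_crit)


# 5% baseline, one-percentage-point lift, at two sample sizes per variant.
print(round(approximate_power(0.05, 0.06, 8_155), 3))
print(round(approximate_power(0.05, 0.06, 2_000), 3))
```

At 8,155 visitors per variant the estimate lands on the conventional 80 percent target, while 2,000 visitors per variant leaves the same test well short of it.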
The z-score and p-value come from the two-proportion z-test, the confidence interval comes from the unpooled standard error, and the sample size comes from rearranging the power formula.
A deeper walk-through of the normal distribution behind the z-score is the right next step, and the Z-Score Calculator shows the formula with a single-proportion example.
How to Use This Calculator
Five short steps give a complete read on any split test, from significance and lift to required sample size.
- 1 Enter visitors and conversions for variant A: Type the visitors exposed to variant A (the control) and the conversions for that variant. The default 10,000 visitors and 500 conversions reproduces a 5 percent baseline rate.
- 2 Enter visitors and conversions for variant B: Type the visitor and conversion counts for variant B, the treatment. The counts do not have to match variant A exactly; the calculator handles skewed traffic splits.
- 3 Set the significance level: Pick alpha of 0.10, 0.05, or 0.01. The default 0.05 corresponds to the conventional 95 percent confidence level.
- 4 Set the minimum detectable effect: Enter the smallest absolute difference (as a decimal, so 0.01 for one percentage point) that the sample-size calculation should be able to detect.
- 5 Read the results panel: The result panel shows the conversion rate for each variant, the absolute difference, the percentage lift, the z-score, the two-tailed p-value, the confidence interval, the required sample size per variant, and a plain-English significance label.
If you ran a 10-day landing page test with 10,000 visitors per variant, 500 conversions on A and 560 on B, using alpha = 0.05 with a minimum detectable effect of one percentage point, the calculator returns z = -1.894, p = 0.058, lift = 12 percent, and a required sample size of about 8,155 visitors per variant.
If the next decision is how wide the CI is for a single proportion rather than the difference between two, the Confidence Interval Calculator runs the same workflow on one sample at a time.
Benefits of Using This Calculator
A purpose-built calculator removes the hand-rolled spreadsheet work and gives product, marketing, and statistics users one place to read significance, lift, and sample size.
- • Fast two-proportion significance from raw counts: Enter visitor and conversion counts for variants A and B and immediately get a z-score and a two-tailed p-value.
- • Percentage lift and confidence interval in one view: The same form returns the absolute difference, the relative uplift, and the unpooled confidence interval.
- • Sample-size planning alongside the analysis: The required-sample-size row shows how many visitors per variant you would need to detect the chosen minimum effect.
- • Handles skewed traffic splits: When the two variants collect different visitor counts, the calculator uses each variant's own sample size in the standard error and confidence interval.
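- • Direct link between hypothesis test and confidence interval: Because the same alpha controls both the p-value threshold and the confidence level, the significance conclusion and the CI line up in all but borderline cases, where the pooled z-test and the unpooled interval can disagree slightly.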
If you fix the 'required sample size per variant' before the test starts and evaluate significance once, when each variant reaches that count, the p-value keeps its nominal false-positive rate. Checking repeatedly and stopping at the first significant result introduces peeking bias.
When you already have the z-score from another tool and just need the two-tailed p-value, the P-Value Calculator returns it directly without re-entering visitor and conversion counts.
Factors That Affect Your Results
Three variables determine what the result looks like, and two limitations tell you when to extend the analysis.
Sample size and minimum detectable effect
Small samples can leave a real lift short of significance at the chosen alpha. The required-sample-size row tells you how many visitors per variant are needed to detect a given effect.
Significance level (alpha) and statistical power
Lowering alpha (e.g. from 0.05 to 0.01) makes the test stricter and increases the required sample size. Raising power from 80 to 90 percent also increases the required sample size.
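The size of that increase can be read off the (z_alpha/2 + z_beta)^2 term that multiplies the power formula. The sketch below compares that multiplier across three settings. The helper name `sample_size_multiplier` is hypothetical, and the comparison assumes the baseline rate and minimum detectable effect stay fixed.

```python
from statistics import NormalDist


def sample_size_multiplier(alpha, power):
    """Return (z_alpha/2 + z_beta)^2, the factor that scales sample size."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


base = sample_size_multiplier(0.05, 0.80)
for alpha, power in [(0.05, 0.80), (0.01, 0.80), (0.05, 0.90)]:
    factor = sample_size_multiplier(alpha, power)
    print(f"alpha={alpha}, power={power}: "
          f"multiplier={factor:.2f}, relative to default={factor / base:.2f}x")
```

Under these assumptions, tightening alpha to 0.01 needs roughly 1.5 times the default sample, and raising power to 90 percent needs roughly 1.3 times.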
Skewed traffic split between variants
When variant A and variant B collect very different numbers of visitors, the standard error is larger than it would be with the same total traffic split 50/50. The confidence interval widens and the test loses power.
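A quick comparison shows the cost of a skewed split. The sketch keeps the total at 20,000 visitors and the rates at 5.0 and 5.6 percent, then compares a 50/50 split with a hypothetical 90/10 split. The helper name `unpooled_se` is an assumption for illustration.

```python
from math import sqrt


def unpooled_se(p1, n1, p2, n2):
    """Unpooled standard error of the difference between two rates."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


balanced = unpooled_se(0.05, 10_000, 0.056, 10_000)  # 50/50 split
skewed = unpooled_se(0.05, 18_000, 0.056, 2_000)     # 90/10 split

print(f"50/50 split: SE = {balanced:.6f}")
print(f"90/10 split: SE = {skewed:.6f}")
print(f"CI is about {skewed / balanced:.2f}x wider under the skewed split")
```

The standard error grows from about 0.0032 to about 0.0054, so the same traffic buys a noticeably wider interval when one variant is starved of visitors.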
- • The two-proportion z-test assumes visitors within each variant are independent and identically distributed, and that the conversion probability is constant during the test. Sequential peeking without correction can inflate the false-positive rate above the chosen alpha.
- • The required-sample-size formula assumes the conversion rates during the test match the planned baseline and minimum detectable effect. If the actual rates drift, the test may need more visitors than the row predicts.
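The peeking limitation can be illustrated with a small A/A simulation, where both variants share the same true rate and every significant result is a false positive. This is a sketch under assumed settings (1,000 simulated tests, 10 looks, 100 visitors per variant per look, a 10 percent conversion rate, and a fixed seed), and all function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(x1, n1, x2, n2, z_crit):
    """Pooled two-proportion z-test; returns False if the SE is zero."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    return abs((x1 / n1 - x2 / n2) / se) > z_crit


def simulate_aa_tests(runs=1000, looks=10, batch=100, rate=0.10,
                      alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = fixed_hits = 0
    for _run in range(runs):
        x1 = x2 = n = 0
        flagged_at_any_look = False
        significant_now = False
        for _look in range(looks):
            # Add one batch of visitors to each variant, same true rate.
            x1 += sum(rng.random() < rate for _ in range(batch))
            x2 += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant_now = is_significant(x1, n, x2, n, z_crit)
            flagged_at_any_look = flagged_at_any_look or significant_now
        peeking_hits += flagged_at_any_look  # stopped at first "win"
        fixed_hits += significant_now        # judged only at the final look
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"peeking at every look: {peeking_rate:.3f}")
print(f"single final look:     {fixed_rate:.3f}")
```

The single-look rate should stay near the chosen alpha, while the peeking rate comes out well above it, which is the inflation the first limitation describes.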
When the test runs for a fixed calendar window instead of a sample-size target, the 'required sample size per variant' row is a diagnostic: it tells you whether the planned window reaches the chosen power.
According to Wikipedia's Z-test article, the two-sample z-test for proportions uses the pooled standard error under the null hypothesis, with critical values of about 1.96 for a 95 percent two-tailed test and 0.84 for 80 percent power.
Omni Calculator's AB test calculator accepts visitors and conversions for each variant and returns conversion rates, relative uplift, a two-proportion z-test statistic, and a two-tailed p-value, the same workflow this calculator follows.
Frequently Asked Questions
Q: What is an A/B test calculator?
A: An A/B test calculator is a statistical tool that compares the conversion rates of two variants (a control and a treatment) using a two-proportion z-test. It returns the z-score, the two-tailed p-value, the absolute and relative lift, the unpooled confidence interval for the difference, and the sample size per variant needed to detect a chosen minimum effect at 80 percent power.
Q: How do you calculate A/B test significance?
A: Compute the conversion rates p1 = conversionsA / visitorsA and p2 = conversionsB / visitorsB, then compute the pooled conversion probability p_pool = (conversionsA + conversionsB) / (visitorsA + visitorsB). Divide the difference (p1 - p2) by the pooled standard error sqrt(p_pool * (1 - p_pool) * (1/visitorsA + 1/visitorsB)) to get a z-score, and convert the absolute z-score into a two-tailed p-value using the standard normal distribution.
Q: What is a good sample size for an A/B test?
A: The conventional starting point is the sample size that gives 80 percent power to detect the minimum effect you actually care about at the conventional alpha = 0.05. For a 5 percent baseline rate and a 1 percentage-point minimum lift, that is about 8,155 visitors per variant; for a 10 percent baseline and the same 1 percentage-point lift it rises to roughly 14,700 visitors per variant, because the variance p(1 - p) grows as the baseline moves toward 50 percent. The calculator's required-sample-size row gives the exact number for the chosen baseline and minimum detectable effect.
Q: How is the p-value calculated in an A/B test?
A: The p-value is two times the upper-tail probability of the standard normal distribution at the absolute value of the z-score, i.e. p = 2 * (1 - Phi(|z|)). It is the probability of seeing a difference at least as large as the observed one if the two variants truly have the same conversion rate. A p-value below the chosen alpha (commonly 0.05) means the observed difference is statistically significant.
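The formula p = 2 * (1 - Phi(|z|)) translates directly into Python's standard library, where `NormalDist().cdf` plays the role of Phi. The function name `two_tailed_p_value` is a hypothetical helper for illustration.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Two-tailed p-value for a z-score under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(two_tailed_p_value(-1.8938), 4))  # z-score from the worked example
print(round(two_tailed_p_value(1.96), 4))     # the familiar 5 percent boundary
```

Taking the absolute value first means the sign convention (A minus B or B minus A) does not affect the p-value.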
Q: What is a statistically significant A/B test result?
A: A result is statistically significant when the two-tailed p-value is below the chosen alpha, most often 0.05. That means a difference as large as the observed one would happen less than 5 percent of the time if the two variants were really equally good, so the observed lift is unlikely to be explained by sampling noise. Statistical significance does not imply a large or business-meaningful lift; it just means the lift is unlikely to be zero.
Q: How do you interpret the confidence interval of an A/B test?
A: The unpooled 95 percent confidence interval for (p2 - p1) gives a range of plausible values for the true difference between the two conversion rates. If the interval sits entirely on one side of zero, the lift is statistically significant at alpha = 0.05; if the interval crosses zero, the difference is not statistically significant. Wider intervals mean less certainty about the true lift and usually mean the test needs more visitors.