Power Analysis Calculator - Two-Sample Power and Sample Size
Use this power analysis calculator with the normal z-approximation to estimate power or the minimum sample size per group for a two-sample comparison.
What Is a Power Analysis Calculator?
A power analysis calculator estimates the sample size a study needs to detect an effect, or the probability that an existing study would detect an effect of a given size. Use this power analysis calculator before a two-sample comparison to set enrollment, after a pilot to confirm a chosen sample size, or when reviewing a manuscript to check that a published study had a reasonable chance of finding the effect it claims. The output is a screening tool for study design, not a promise of certainty.
- Plan enrollment for a two-arm trial: Set the effect size you expect to detect, pick alpha and target power, and read the per-group enrollment that reaches the target.
- Audit a finished study: Plug in the published effect size and actual sample sizes to see whether the test had a reasonable chance of finding that effect.
- Compare balanced and unbalanced designs: Change the enrollment ratio to see how unequal groups change the total enrollment needed for the same target power.
80% power is the conventional target for most published studies: an 80% chance of detecting the expected effect if it is real, and a 20% chance of a false-negative result. Power, sample size, effect size, and significance level are linked, so fixing any three determines the fourth; the calculator is the algebra that turns that relationship into a number you can defend in a protocol.
If you already have a target power and want a dedicated sample-size workflow with margin of error and confidence level, the Sample Size Calculator solves the same enrollment question with a different set of inputs.
How the Power Analysis Calculator Works
- Cohen's d: Standardized mean difference between the two groups. A d of 0.5 means the means differ by half a pooled standard deviation.
- Alpha: Significance level. The probability of rejecting the null hypothesis when it is actually true.
- Sample sizes: Number of participants in group 1 (n1) and group 2 (n2). The enrollment ratio sets n2 = n1 * ratio.
- Target power: Probability of rejecting the null hypothesis when the alternative is true, equal to 1 - beta, where beta is the Type II error rate.
- Tails: One- or two-tailed test. Two-tailed is the default for two-sample comparisons because the direction is usually not known in advance.
The calculator uses the standard normal z-approximation for a two-sample test of means with equal variances. The noncentrality parameter (ncp) summarizes the gap between the null and alternative hypotheses in standard-error units; a larger ncp means the test statistic tends to land further into the rejection region when the effect is real.
For the minimum sample size, the calculator solves for the smallest n1 whose noncentrality parameter reaches z_crit + z_target_power, then rounds up to the next whole participant so the target power is never missed by a fraction.
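The two steps above can be sketched in Python with only the standard library. This is a minimal illustration, not the calculator's own source: the function names `critical_z`, `power_two_sample`, and `min_n1` are hypothetical, and the sketch assumes the same equal-variance z-approximation, with n2 = n1 * ratio and the factor (1 + 1/ratio) that comes from solving ncp = z_crit + z_power for n1.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_z(alpha: float, tails: int = 2) -> float:
    """Critical z-value; a two-tailed test splits alpha across both tails."""
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


def power_two_sample(d: float, n1: float, n2: float,
                     alpha: float = 0.05, tails: int = 2) -> float:
    """Approximate power of a two-sample z-test of means (equal variances)."""
    ncp = abs(d) * sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    z_crit = critical_z(alpha, tails)
    power = STD_NORMAL.cdf(ncp - z_crit)      # rejections in the expected direction
    if tails == 2:
        power += STD_NORMAL.cdf(-ncp - z_crit)  # opposite tail, usually negligible
    return power


def min_n1(d: float, alpha: float = 0.05, target_power: float = 0.80,
           ratio: float = 1.0, tails: int = 2) -> int:
    """Smallest whole n1 whose ncp reaches z_crit + z_power, with n2 = n1 * ratio."""
    required_ncp = critical_z(alpha, tails) + STD_NORMAL.inv_cdf(target_power)
    n1 = required_ncp ** 2 * (1 + 1 / ratio) / d ** 2
    return ceil(n1)  # round up so the target power is never missed by a fraction


print(f"power at n1 = n2 = 64: {power_two_sample(0.5, 64, 64):.4f}")
print(f"minimum n1 for 80% power: {min_n1(0.5)}")
```

Calling `min_n1(0.5, tails=1)` shows the one-tailed saving, and passing `ratio=2` shows the cost of a 1:2 split. The opposite-tail term in `power_two_sample` is included for completeness; at these sample sizes it is negligible.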
Balanced design at Cohen's benchmark
Cohen's d = 0.5, alpha = 0.05 two-tailed, n1 = 64 per group, ratio = 1, target power = 0.80.
Noncentrality = 0.5 * sqrt(64 / 2) = 2.828. Critical z = 1.96. Power = Phi(2.828 - 1.96) = 0.807.
Power is therefore about 80.7%.
A balanced two-sample design with 64 participants per group and a medium effect reaches 80% power with a small margin. The minimum sample size that reaches at least 80% power is 63 per group.
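Worked by hand with rounded z-values (1.960 for alpha = 0.05 two-tailed, 0.842 for 80% power), the balanced-design formula shows where the 63 comes from; with ratio = 1 the factor (1 + 1/ratio) equals 2:

$$
n_1 = \frac{2\,(z_{\text{crit}} + z_{\text{power}})^2}{d^2} = \frac{2\,(1.960 + 0.842)^2}{0.5^2} \approx \frac{2 \times 7.85}{0.25} \approx 62.8 \;\Rightarrow\; n_1 = 63
$$

As a check, 63 per group gives Phi(0.5 * sqrt(31.5) - 1.96) = Phi(0.846), or about 0.801, just above the target.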
According to the NIST/SEMATECH e-Handbook of Statistical Methods, the critical z-value for a two-tailed test at alpha = 0.05 is approximately 1.96, and the z-score corresponding to 80% power is approximately 0.842.
Once the power analysis shows that your study is large enough, the T-Test Calculator runs the actual two-sample test on the collected data.
Key Concepts Explained
These four ideas sit behind every power analysis result. Understand them once and the numbers will make sense every time.
Statistical power
Long-run probability that a test rejects the null hypothesis when the alternative is true. Power = 1 - beta, where beta is the Type II error rate.
Cohen's d
Standardized mean difference between two groups, equal to the raw mean difference divided by the pooled standard deviation. Benchmarks: 0.2 (small), 0.5 (medium), 0.8 (large).
Significance level (alpha)
Probability of a Type I error, set before data are collected. Most two-sample studies use 0.05 two-tailed, a 5% chance of declaring a difference when none exists.
Noncentrality parameter
Expected value of the test statistic under the alternative. In the two-sample z-approximation, ncp = |d| * sqrt(n1 * n2 / (n1 + n2)), the bridge between effect size, sample sizes, and power.
Type I and Type II errors are not the same: alpha is the chance of a false positive when there is no real effect; beta (1 - power) is the chance of a false negative when there is one. Both matter for honest study design.
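To make the two error rates concrete, the following Monte Carlo sketch simulates many studies and counts how often a two-tailed z-test rejects the null. It assumes normally distributed data with a known common standard deviation of 1, the idealized setting of the z-approximation; the function name `rejection_rate`, the 2,000 replications, and the seed are arbitrary illustrative choices. With d = 0 the rejection rate estimates alpha (the Type I error rate); with d = 0.5 and 64 per group it estimates power, so one minus it estimates beta.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_d: float, n1: int, n2: int, alpha: float = 0.05,
                   reps: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated studies in which a two-tailed z-test rejects H0.

    Assumes normal data with a known common SD of 1, so true_d is both the
    raw mean difference and Cohen's d.
    """
    rng = random.Random(seed)                  # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(1 / n1 + 1 / n2)                 # SE of the mean difference when SD = 1
    rejections = 0
    for _ in range(reps):
        group1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        group2 = [rng.gauss(true_d, 1.0) for _ in range(n2)]
        z = (fmean(group2) - fmean(group1)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


type_i_rate = rejection_rate(0.0, 64, 64)  # no real effect: estimates alpha
power = rejection_rate(0.5, 64, 64)        # real effect d = 0.5: estimates power
print(f"Type I rate ~ {type_i_rate:.3f}")
print(f"power ~ {power:.3f}, Type II rate ~ {1 - power:.3f}")
```

Expect the simulated rates to land near 0.05 and 0.807 rather than exactly on them; increasing `reps` narrows the simulation noise.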
When you already have two group means and standard deviations from a pilot study, the Cohen's d Calculator converts those raw numbers into the Cohen's d that the power analysis takes as input.
How to Use This Calculator
Enter the four study-design quantities you already know, then read the power and the minimum sample size together.
1. Enter the expected effect size: Type the standardized mean difference you expect to detect. Use 0.5 if you have no prior estimate and want a Cohen-medium benchmark.
2. Set the significance level: Use 0.05 two-tailed for most comparison studies. Lower alpha (such as 0.01) demands more participants for the same power.
3. Enter the planned sample size for group 1: Type n1; with an enrollment ratio of 1 or more, this is the smaller group. Group 2 will be n1 * enrollment ratio.
4. Set the enrollment ratio: Use 1 for equal groups, or your planned ratio for unbalanced designs. Total N will reflect the imbalance.
5. Set the target power: Use 0.80 (80%) as the conventional target. Raise it to 0.90 if a missed effect would be especially costly.
6. Read power and minimum sample size: Compare the reported power with the target power, and check the minimum sample size against your planned enrollment.
A team plans a randomized trial with a medium effect (d = 0.5) at alpha = 0.05 two-tailed. With n1 = 64 per group, ratio 1, and target power 0.80, the calculator reports power of about 80.7%, a minimum sample size of 63 per group, noncentrality 2.828, critical z 1.960, and a Type II error rate of about 19.3%. The plan clears 80% with a small margin.
For a conversion-rate or proportion-style A/B comparison rather than a continuous mean, the AB Test Calculator runs the same sample-size reasoning against binary outcomes.
Benefits of Using This Calculator
The power analysis calculator turns a vague study-design conversation into a number you can defend in a protocol or grant.
- Convert assumptions into enrollment numbers: Effect size, alpha, and target power reduce to a per-group sample size you can hand to a study coordinator or budget.
- Audit published findings: Plug in the published effect size and actual sample size to check whether the test had a reasonable chance of finding that effect.
- Test the cost of unequal groups: Change the enrollment ratio to see how unbalanced enrollment affects the total number of participants you need.
- Compare one-tailed and two-tailed plans: Switch the tail setting to see when a one-tailed test reaches the same power with fewer participants, and when the gain is too small to justify the assumption.
- Plan for a smaller or larger effect: Move effect size from 0.5 to 0.3 or 0.8 to see how the required sample size scales with the effect you realistically expect.
When alpha and effect size are fixed by the field (such as alpha = 0.05 and a benchmark d of 0.5), the only knob left is the sample size, and power analysis makes that knob explicit.
After the study is run, the P-Value Calculator turns the test statistic and degrees of freedom into the p-value that the alpha threshold was designed to control.
Factors That Affect Your Results
Several inputs and assumptions move the result, sometimes by an order of magnitude.
Effect size
The required sample size scales with 1 / d^2. A medium effect (d = 0.5) needs about a quarter of the participants that an effect half that size (d = 0.25) needs for the same power.
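A short sketch makes the 1 / d^2 scaling visible. It assumes a balanced design, alpha = 0.05 two-tailed, and 80% power; `balanced_n_per_group` is a hypothetical helper using the closed form n = 2 (z_crit + z_power)^2 / d^2, rounded up.

```python
from math import ceil
from statistics import NormalDist


def balanced_n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for equal groups: ceil(2 * (z_crit + z_power)^2 / d^2), two-tailed."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)


for d in (0.2, 0.25, 0.5, 0.8):
    print(f"d = {d}: {balanced_n_per_group(d)} per group")
```

Under these assumptions it prints 393, 252, 63, and 25 per group for d = 0.2, 0.25, 0.5, and 0.8; halving d from 0.5 to 0.25 multiplies the requirement by four (252 / 63).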
Significance level (alpha)
Lower alpha raises the critical z-value, which raises the required noncentrality parameter and therefore the sample size. At 80% power, going from 0.05 to 0.01 two-tailed increases the sample size by roughly half, to about 1.5 times the original.
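With d and the enrollment ratio held fixed, the required sample size is proportional to (z_crit + z_power)^2, so the alpha effect at 80% power can be checked directly from standard normal quantiles (2.576 for alpha = 0.01 two-tailed, 1.960 for alpha = 0.05, 0.842 for 80% power):

$$
\frac{n_{\alpha = 0.01}}{n_{\alpha = 0.05}} = \left(\frac{2.576 + 0.842}{1.960 + 0.842}\right)^2 = \left(\frac{3.418}{2.802}\right)^2 \approx 1.49
$$

For d = 0.5 with balanced groups, a plan needing 63 per group at alpha = 0.05 needs about 94 per group at alpha = 0.01.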
Target power
Higher target power needs a larger z_power, which adds to the required noncentrality parameter. Going from 80% to 90% increases the sample size by roughly a third.
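The same proportionality gives the power effect, holding alpha = 0.05 two-tailed and d fixed (z_power = 1.282 for 90% power versus 0.842 for 80%):

$$
\frac{n_{90\%}}{n_{80\%}} = \left(\frac{1.960 + 1.282}{1.960 + 0.842}\right)^2 = \left(\frac{3.242}{2.802}\right)^2 \approx 1.34
$$

For d = 0.5 with balanced groups, this moves the requirement from 63 to 85 per group.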
Enrollment ratio
The most efficient design is balanced enrollment. As the ratio moves away from 1, the same total sample size gives less power and you need more participants overall.
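The cost of imbalance can be sketched two ways: the power lost when a fixed total of 128 participants is split unevenly, and the total enrollment needed to restore 80% power. This assumes d = 0.5, alpha = 0.05 two-tailed, and the convention n2 = n1 * ratio; the helper names `power_for_total` and `min_total` are hypothetical, and the power function ignores the negligible opposite-tail term.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def power_for_total(d: float, total: float, ratio: float, alpha: float = 0.05) -> float:
    """Two-tailed power when `total` participants are split n1 : n2 = 1 : ratio.

    Fractional group sizes are allowed for illustration; the opposite-tail
    term is ignored because it is negligible here.
    """
    n1 = total / (1 + ratio)
    n2 = total - n1
    ncp = abs(d) * sqrt(n1 * n2 / total)
    return Z.cdf(ncp - Z.inv_cdf(1 - alpha / 2))


def min_total(d: float, ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total enrollment for the target power, with n2 = n1 * ratio rounded up."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    n1 = ceil(z_sum ** 2 * (1 + 1 / ratio) / d ** 2)
    n2 = ceil(n1 * ratio)
    return n1 + n2


for ratio in (1, 2, 3):
    print(f"ratio {ratio}: power with N = 128 is {power_for_total(0.5, 128, ratio):.3f}; "
          f"N needed for 80% power is {min_total(0.5, ratio)}")
```

Under these assumptions, a 1:2 split needs 144 participants and a 1:3 split needs 168, against 126 for equal groups.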
One-tailed vs two-tailed test
A one-tailed test puts all of alpha in one tail, so the same alpha gives a smaller critical value (1.645 instead of 1.960 at alpha = 0.05) and reaches the target power with fewer participants. It is only valid when the direction of the effect is fixed in advance.
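With the rounded critical values 1.645 (one-tailed) and 1.960 (two-tailed) at alpha = 0.05, plus 0.842 for 80% power, the balanced formula for d = 0.5 gives:

$$
n_1^{\text{one-tailed}} = \frac{2\,(1.645 + 0.842)^2}{0.5^2} \approx 49.5 \;\Rightarrow\; 50
\qquad
n_1^{\text{two-tailed}} = \frac{2\,(1.960 + 0.842)^2}{0.5^2} \approx 62.8 \;\Rightarrow\; 63
$$

That is roughly a fifth fewer participants per group, bought at the cost of ignoring effects in the untested direction.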
- The calculator uses the z-approximation, which assumes large samples and approximately normal data. For very small samples, skewed distributions, or unequal variances, a tool based on the t-distribution is more accurate.
- The result is only as good as the effect size you enter. Cohen's benchmarks are a starting point, not a substitute for a pilot estimate, a published prior, or a domain-specific minimum clinically important difference.
If you have multiple primary endpoints, the chance of finding at least one false positive grows with the number of tests, so adjust alpha or the sample size when you plan more than one comparison.
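As a hedged illustration, the sketch below computes the familywise false-positive rate for several independent tests and then applies a Bonferroni correction (dividing alpha by the number of endpoints), which is one common adjustment rather than a method this page prescribes. It assumes three independent primary endpoints, d = 0.5, 80% power, balanced groups, and the two-tailed z-approximation; the helper names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def balanced_n_per_group(d: float, alpha: float, power: float = 0.80) -> int:
    """Per-group n for equal groups under the two-tailed z-approximation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)


endpoints = 3                        # hypothetical number of primary endpoints
bonferroni_alpha = 0.05 / endpoints  # one common adjustment, not the only one
print(f"unadjusted familywise error: {familywise_error(0.05, endpoints):.3f}")
print(f"n per group at alpha = 0.05: {balanced_n_per_group(0.5, 0.05)}")
print(f"n per group at alpha = {bonferroni_alpha:.4f}: {balanced_n_per_group(0.5, bonferroni_alpha)}")
```

With three endpoints, the unadjusted familywise rate is about 14%, and the Bonferroni-adjusted plan needs 84 per group instead of 63.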
The NIST/SEMATECH e-Handbook of Statistical Methods gives the one-sample known-sigma sample-size formula using z critical values for alpha and beta; the calculator extends that normal-approximation form to two-sample tests using d = delta / sigma and a harmonic-mean factor (1 + 1/ratio).
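That extension can be written out in this page's notation, with r denoting the enrollment ratio (so n2 = r n1). Substituting into the noncentrality parameter and requiring it to reach z_crit + z_power gives the per-group formula; with r = 1 it reduces to 2 (z_crit + z_power)^2 / d^2:

$$
\text{ncp} = |d|\sqrt{\frac{n_1 n_2}{n_1 + n_2}} = |d|\sqrt{\frac{r\,n_1}{1 + r}}
\qquad\Longrightarrow\qquad
|d|\sqrt{\frac{r\,n_1}{1 + r}} \ge z_{\text{crit}} + z_{\text{power}}
\;\iff\;
n_1 \ge \frac{(z_{\text{crit}} + z_{\text{power}})^2\,(1 + 1/r)}{d^2}
$$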
The American Psychological Association Dictionary of Psychology gives Cohen's d benchmarks as 0.2 (small), 0.5 (medium), and 0.8 (large), attributing those conventions to Cohen's Statistical Power Analysis for the Behavioral Sciences.
To translate the same effect size and sample size into the precision of the estimated difference, the Confidence Interval Calculator reports the width of the confidence interval at your chosen confidence level.
Frequently Asked Questions
Q: What is a power analysis calculator used for?
A: A power analysis calculator estimates the statistical power of a planned test or the minimum sample size needed to reach a target power. It links four quantities: the standardized effect size, the significance level (alpha), the sample size, and the probability of correctly rejecting the null hypothesis when the effect is real.
Q: How do I calculate statistical power for a two-sample test?
A: Compute the noncentrality parameter ncp = |d| * sqrt(n1 * n2 / (n1 + n2)), find the critical z-value z_crit for your alpha and tail direction, and read power as the probability that a normal variable with mean ncp lands beyond z_crit on the rejection side. The calculator runs these steps from effect size, alpha, sample size, enrollment ratio, and tail selection.
Q: How many participants do I need for 80% power?
A: For a medium effect (Cohen's d = 0.5), alpha = 0.05 two-tailed, and balanced groups, the z-approximation gives about 63 participants per group (126 total) for 80% power. Cohen's published tables list 64 per group because the t-based calculation used there is slightly more conservative. Smaller effects or stricter alpha need more participants; larger effects or one-tailed tests need fewer.
Q: What is a good effect size for a power analysis?
A: Use a domain-specific estimate from prior data, a pilot study, or a published minimum clinically important difference. When no estimate is available, Cohen's benchmarks are 0.2 for a small effect, 0.5 for a medium effect, and 0.8 for a large effect, but these are coarse defaults rather than field-specific values.
Q: What is the difference between alpha and power?
A: Alpha is the probability of a Type I error, the chance of declaring an effect that is actually noise. Power is the probability of correctly detecting an effect that is real, equal to 1 - beta, where beta is the Type II error rate. Alpha is set by the researcher; power falls out of the effect size, alpha, and sample size.
Q: Does power change when the two groups have different sample sizes?
A: Yes. For a fixed total number of participants, balanced groups give the highest power, so an unbalanced design needs more total participants to reach the same power.