Enter visitors and conversions for each variant to get statistical significance, p-value, confidence level, and a clear win/loss verdict.
Statistical significance in A/B testing means that the difference in conversion rates between your control and variant is unlikely to be due to random chance. When a result is statistically significant (typically at p < 0.05 or 95% confidence), you can be reasonably confident that the observed difference reflects a real effect — not just noise in the data.
95% confidence (p < 0.05) is the standard for most A/B tests: if there is truly no difference between the variants, there is only a 5% chance of a false positive. For low-risk tests, such as copy or color changes, 90% confidence may be acceptable to move faster. For high-stakes changes such as checkout redesigns or pricing pages, consider 99% confidence to minimize the risk of acting on a false positive.
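The required sample size depends on three factors: your baseline conversion rate, the minimum detectable effect (MDE) you care about, and your target statistical power (typically 80%), all evaluated at your chosen confidence level. As a rule of thumb, detecting a 10% relative improvement on a 3% conversion rate (3.0% to 3.3%) requires roughly 53,000 visitors per variant with a two-sided test at 95% confidence and 80% power. Smaller effects need far more traffic, because the required sample grows with the inverse square of the difference you want to detect. This calculator shows the estimated minimum sample size when your current data is not yet significant.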
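The sample size estimate can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard normal-approximation formula for a two-sided two-proportion test; it is not necessarily the exact method this calculator uses, and the function name `sample_size_per_variant` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test).

    Assumption: normal approximation with unpooled variances.
    """
    p1 = baseline_rate                        # control conversion rate
    p2 = baseline_rate * (1 + relative_mde)   # variant rate at the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the confidence level
    z_power = NormalDist().inv_cdf(power)          # critical value for the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# Example: 3% baseline, 10% relative MDE, 95% confidence, 80% power
print(sample_size_per_variant(0.03, 0.10))
```

Halving the MDE roughly quadruples the required sample, which is why small expected lifts need long-running tests.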
The p-value is the probability of observing a difference at least as large as the one you measured, assuming there is actually no difference between the two variants. A p-value of 0.05 means that, if the variants truly performed the same, a difference this large would show up in about 5% of tests. It is not the probability that your result is due to chance. Lower p-values indicate stronger evidence that the difference is real. A p-value below 0.05 is the conventional threshold for declaring a statistically significant result.
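To make the definition concrete, here is a minimal sketch of how a p-value can be computed from visitors and conversions. It assumes a two-sided, pooled two-proportion z-test, which is a common choice for A/B tests but may differ from what this calculator does internally. The function name `two_proportion_p_value` and the example numbers are illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided p-value for the difference between two conversion rates.

    Assumption: pooled two-proportion z-test (normal approximation).
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the best estimate of the shared rate if there is no real difference
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        return 1.0  # no conversions (or all conversions) in both groups: no evidence either way
    z = (rate_b - rate_a) / std_error
    # Probability of a |z| at least this large under the no-difference assumption
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Example (made-up data): control 300/10,000 vs. variant 360/10,000
p = two_proportion_p_value(10_000, 300, 10_000, 360)
print(f"p-value: {p:.4f}, significant at 95%: {p < 0.05}")
```

The normal approximation works best when each variant has at least a few dozen conversions; with very small counts, an exact test is the safer choice.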
Statistical significance tells you whether a difference is unlikely to be due to chance. Practical significance (also called effect size or business significance) tells you whether the difference is large enough to matter for your business goals. A test with a very large sample can be statistically significant for a relative uplift as small as 0.1%, but that uplift might not be worth the engineering effort to ship. Always evaluate the relative uplift alongside the p-value before making a decision.
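One way to combine the two checks is sketched below. The `min_worthwhile_uplift` threshold is a hypothetical business parameter that you would set yourself, and the function names and verdict labels are illustrative rather than taken from this calculator.

```python
def relative_uplift(control_rate, variant_rate):
    """Relative change of the variant over the control (0.10 means +10%)."""
    return (variant_rate - control_rate) / control_rate


def verdict(p_value, uplift, alpha=0.05, min_worthwhile_uplift=0.02):
    """Combine statistical and practical significance into one label.

    Assumption: min_worthwhile_uplift is the smallest relative uplift
    worth shipping, chosen by the business rather than by statistics.
    """
    if p_value >= alpha:
        return "not significant"
    if abs(uplift) < min_worthwhile_uplift:
        return "significant, but too small to matter"
    return "win" if uplift > 0 else "loss"


# Example (made-up numbers): 3.0% control vs. 3.3% variant, p = 0.03
uplift = relative_uplift(0.030, 0.033)
print(f"{uplift:.1%} uplift -> {verdict(0.03, uplift)}")
```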