Test Types and Distributions
One-tailed vs two-tailed tests and understanding the normal distribution.
Choosing Your Question
When you run an A/B test, you need to choose what kind of question you're asking: "Is there a difference?" (two-tailed) or "Is B better?" (one-tailed). This choice affects how you interpret results and calculate significance.
Most teams use two-tailed tests, but understanding the difference will help you interpret results correctly.
The Normal Distribution
Before diving into test types, let's quickly cover the normal distribution (bell curve). This is the foundation of most A/B testing statistics.
Why It Matters
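When you run an A/B test with enough users, the sampling distribution of your metric's estimate (for example, the observed conversion rate or average revenue per user, and the difference between variants) is approximately normal thanks to the Central Limit Theorem. This holds even when individual user values are not normal: each conversion is 0 or 1, and revenue is usually heavily skewed. This lets us use well-understood statistical tests.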
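The Central Limit Theorem claim can be checked with a quick simulation. This is a minimal sketch under made-up assumptions (a hypothetical true conversion rate of 10%, 1,000 users per experiment, 1,000 repeated experiments). Each user is a 0/1 outcome, yet the observed conversion rates across experiments cluster into a bell shape whose spread matches the theoretical standard error.

```python
import math
import random
import statistics

# Hypothetical scenario: every user converts independently with probability 10%.
TRUE_RATE = 0.10
USERS_PER_EXPERIMENT = 1_000
EXPERIMENTS = 1_000


def observed_rates(seed: int = 7) -> list[float]:
    """Conversion rate observed in each simulated experiment."""
    rng = random.Random(seed)
    rates = []
    for _ in range(EXPERIMENTS):
        # Each user is a 0/1 outcome; the experiment reports their average.
        conversions = sum(rng.random() < TRUE_RATE for _ in range(USERS_PER_EXPERIMENT))
        rates.append(conversions / USERS_PER_EXPERIMENT)
    return rates


rates = observed_rates()
# Standard error predicted by theory for a proportion: sqrt(p(1 - p) / n).
theoretical_se = math.sqrt(TRUE_RATE * (1 - TRUE_RATE) / USERS_PER_EXPERIMENT)
within = sum(abs(r - TRUE_RATE) <= 1.96 * theoretical_se for r in rates) / len(rates)

print(f"mean of observed rates: {statistics.mean(rates):.4f}")
print(f"spread (stdev): {statistics.stdev(rates):.4f} vs theory {theoretical_se:.4f}")
print(f"share within 1.96 SE of the true rate: {within:.3f}")
```

If the bell-curve approximation fits, the last line lands close to 0.95.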
Key property: In a normal distribution, about 95% of values fall within 2 standard deviations of the mean (exactly 95% fall within 1.96). That is where the 1.96 cutoff for the conventional 95% confidence level comes from.
- 68% of data falls within 1 standard deviation
- 95% of data falls within 2 standard deviations
- 99.7% of data falls within 3 standard deviations
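These percentages come straight from the normal cumulative distribution function. A small sketch using Python's standard-library `statistics.NormalDist` (the helper name `share_within` is just for illustration) reproduces the rule and shows that exactly 95% corresponds to 1.96 standard deviations, not 2:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a normal value lands within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 1.96 SD: 0.9500
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```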
One-Tailed vs Two-Tailed Tests
Two-Tailed Test (Most Common)
Tests if variant B is different from control (better OR worse). Used when you don't know the direction of change.
[Figure: two-tailed test. Central 95% region: fail to reject H₀. Rejection regions: 2.5% in each tail (5% total).]
Example Question:
"Is variant B's conversion rate different from control?"
Critical Regions:
z ≤ -1.96 or z ≥ 1.96 (extreme values on either side)
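A minimal sketch of the two-tailed decision for conversion rates. It assumes a pooled two-proportion z-test, which is one common choice; the page itself doesn't specify the test statistic, and the function names and counts are hypothetical. The example deliberately uses a variant that performs *worse*, to show that a two-tailed test flags harm as well as improvement.

```python
import math
from statistics import NormalDist


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic; positive means B converted more than control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one conversion rate, so pool them for the standard error.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def two_tailed_test(z: float, alpha: float = 0.05) -> tuple[float, bool]:
    """p-value and decision for 'Is B different from control?' (either direction)."""
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value, p_value <= alpha


# Hypothetical counts: control 500/10,000 conversions, variant B 430/10,000.
z = two_proportion_z(500, 10_000, 430, 10_000)
p_value, reject = two_tailed_test(z)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Here z is about -2.35, beyond -1.96, so H₀ is rejected: B is significantly different (worse).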
One-Tailed Test (Less Common)
Tests whether variant B is better than control (right-tailed) or worse than control (left-tailed), but not both. Used when only one direction matters for your decision.
Note: This plot shows a right-tailed test (B better). A left-tailed test just mirrors the shaded region.
[Figure: right-tailed test. Lower 95% region: fail to reject H₀. Rejection region: 5% in the right tail only.]
Example Question:
"Is variant B's conversion rate higher than control?"
Critical Region:
z ≥ 1.645 (extreme values in one direction)
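The one-tailed version only changes how the p-value is computed from z. This is a sketch assuming the same large-sample z statistic as a two-tailed test; `one_tailed_test` and its `direction` parameter are illustrative names, not a library API. Note what happens when B is much worse but the test was set up as right-tailed.

```python
from statistics import NormalDist


def one_tailed_test(z: float, alpha: float = 0.05, direction: str = "greater") -> tuple[float, bool]:
    """p-value and decision for a one-sided question chosen before the test."""
    if direction == "greater":    # right-tailed: "Is B higher than control?"
        p_value = 1 - NormalDist().cdf(z)
    elif direction == "less":     # left-tailed mirror image: "Is B lower than control?"
        p_value = NormalDist().cdf(z)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return p_value, p_value <= alpha


print(one_tailed_test(1.80))   # p ≈ 0.036, rejects H0 because 1.80 ≥ 1.645
print(one_tailed_test(-3.00))  # p ≈ 0.999, no rejection even though B looks much worse
```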
Which Should You Use?
| Aspect | Two-Tailed | One-Tailed |
|---|---|---|
| Question | "Is there a difference?" | "Is B better/worse?" |
| Use When | Exploring any change | Only care about one direction |
| Power | Lower (splits α across both tails) | Higher in the chosen direction only (full α in one tail) |
| Risk | Detects effects in either direction | Blind to effects in the opposite direction |
| Recommendation | Use this (default) | Rare, justified cases only |
⚠️ Warning: One-tailed tests are controversial. They require pre-commitment (decided before seeing data) and ignore the possibility of harm. Most teams use two-tailed tests as the safe default. Only use one-tailed if you have a strong business reason and can justify it upfront.
The Math Behind It
The key difference is where you put your significance level (α = 0.05):
Two-Tailed Test
α = 0.05 is split: 0.025 in each tail
Critical value: ±1.96 standard deviations
Interpretation: Reject H₀ if the z statistic falls in either extreme tail (top 2.5% or bottom 2.5%)
One-Tailed Test
α = 0.05 is all in one tail
Critical value: +1.645 standard deviations (right tail)
Interpretation: Reject H₀ only if the z statistic falls in the top 5% (the bottom tail is ignored)
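Both critical values follow from the inverse of the normal CDF: put α/2 in each tail for a two-tailed test, or all of α in one tail. A short sketch with the standard library (`critical_value` is an illustrative helper name):

```python
from statistics import NormalDist


def critical_value(alpha: float = 0.05, tails: int = 2) -> float:
    """z cutoff that leaves alpha / tails of probability in each rejection tail."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


print(round(critical_value(0.05, tails=2), 3))  # 1.96
print(round(critical_value(0.05, tails=1), 3))  # 1.645
```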
One-Tailed Tests Are More Powerful
Because the entire α = 0.05 goes into one tail, the critical value (1.645) is lower than the two-tailed one (1.96), so a one-tailed test reaches significance with smaller effects in the chosen direction. This makes it more sensitive but also riskier: it cannot detect an effect in the opposite direction, no matter how large.
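To see the trade-off in numbers, here is a sketch of statistical power under a simplifying assumption: the true effect shifts the z statistic to a normal distribution with mean `shift` (the effect measured in standard errors) and standard deviation 1. The helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_TWO = STD_NORMAL.inv_cdf(0.975)  # two-tailed cutoff, about 1.96
Z_ONE = STD_NORMAL.inv_cdf(0.95)   # one-tailed cutoff, about 1.645


def power_two_tailed(shift: float) -> float:
    """Chance of rejecting H0 in either tail when z is centred on `shift` instead of 0."""
    return (1 - STD_NORMAL.cdf(Z_TWO - shift)) + STD_NORMAL.cdf(-Z_TWO - shift)


def power_one_tailed(shift: float) -> float:
    """Right-tailed test: only large positive z values lead to rejection."""
    return 1 - STD_NORMAL.cdf(Z_ONE - shift)


for shift in (2.5, -2.5):
    print(f"true shift {shift:+}: two-tailed {power_two_tailed(shift):.2f}, "
          f"one-tailed {power_one_tailed(shift):.2f}")
```

For a positive shift of 2.5 standard errors, the one-tailed test detects the effect roughly 80% of the time versus roughly 70% for two-tailed. For the same effect in the negative direction, the two-tailed power is unchanged while the right-tailed test almost never rejects.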
Real-World Examples
✓ Use Two-Tailed
1. New checkout flow: You don't know if it'll help or hurt conversions. Test both directions.
2. Pricing change: Could increase or decrease revenue. Check both.
3. UI redesign: Might delight users or confuse them. Stay open to either outcome.
⚠️ Rarely Use One-Tailed
1. Safety features: Testing whether a new algorithm reduces fraud. Only an improvement would change the decision; if fraud stays flat or rises, you keep the current system.
2. Cost reduction: Testing whether a server optimization lowers costs. The change is designed not to raise costs, and only a reduction would change the decision.
3. Non-inferiority: Proving a new system is "not worse" than the old one (specialized use case).
Common Pitfalls
❌ Don't Switch After Seeing Data
Wrong: "Results show B is better, so let's use a one-tailed test to get significance faster."
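This is p-hacking and invalidates your test. If you pick the tail after seeing which way the data points, you effectively reject whenever |z| ≥ 1.645, which doubles the false-positive rate from 5% to 10%. You must choose one-tailed or two-tailed before running the test and stick with it.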
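A quick simulation makes the cost of switching concrete. This sketch assumes a large-sample setting where, in an A/A test (no real difference), the z statistic is standard normal; it compares a pre-registered two-tailed rule against a "peek at the sign, then run the matching one-tailed test" rule. Trial count and seed are arbitrary.

```python
import random
from statistics import NormalDist


def false_positive_rates(trials: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Share of A/A tests wrongly declared significant under two decision rules."""
    rng = random.Random(seed)
    z_two = NormalDist().inv_cdf(0.975)  # about 1.96
    z_one = NormalDist().inv_cdf(0.95)   # about 1.645
    honest = switched = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # under H0 the z statistic is standard normal
        # Rule 1: two-tailed test chosen before looking at the data.
        honest += abs(z) >= z_two
        # Rule 2: look at the sign first, then run the one-tailed test that matches it.
        if z > 0:
            switched += z >= z_one
        else:
            switched += z <= -z_one
    return honest / trials, switched / trials


honest_rate, switched_rate = false_positive_rates()
print(f"pre-registered two-tailed: {honest_rate:.3f}")
print(f"switch after peeking:      {switched_rate:.3f}")
```

The pre-registered rule lands near 0.05, while the switching rule lands near 0.10, even though no real effect exists in any of the simulated tests.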
Key Takeaways
- Normal distribution (bell curve) is the foundation of A/B testing statistics.
- Two-tailed tests check for any difference (better OR worse). Use this as default.
- One-tailed tests only check one direction (better OR worse, not both).
- One-tailed tests are more powerful but riskier: they ignore effects in the opposite direction.
- Choose the test type before running the experiment, not after seeing data.
- When in doubt, use two-tailed; it's the safer, more conservative choice.