Hypothesis Testing: The Scientific Method
Learn null hypothesis, alternative hypothesis, p-values, and statistical significance.
The Scientific Method for Business
Every A/B test follows the same fundamental process: hypothesis testing. This is the scientific method applied to business decisions. You make a claim, collect evidence, and determine if the evidence is strong enough to support your claim.
Understanding how hypothesis testing works will help you interpret p-values, confidence intervals, and statistical significance correctly.
The Two Hypotheses
Null Hypothesis
The "nothing changed" hypothesis. Assumes no difference between variants.
Example: "The new checkout flow has the same conversion rate as the old flow."
Alternative Hypothesis
The "something changed" hypothesis. What you're trying to find evidence for.
Example: "The new checkout flow has a different conversion rate than the old flow."
The Testing Process
Assume H₀ is True
Start by assuming there's no difference between variants.
Collect Data
Run your test and observe the actual results.
Calculate P-Value
"If H₀ is true, how likely is data at least this extreme?" The answer is the p-value.
Make Decision
If the p-value is below the significance threshold (conventionally 0.05), reject H₀ and accept H₁. Otherwise, fail to reject H₀.
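The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method any particular A/B testing tool uses: it assumes a pooled two-proportion z-test, and the function name `two_proportion_z_test` and the visitor counts (2,000 per variant) are hypothetical values chosen for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Step 1: assume H0 is true. Both variants then share one true rate,
    # so we pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Step 3: two-sided p-value, P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

ALPHA = 0.05  # significance threshold from the text

# Step 2: collected data (hypothetical counts: 10% vs 12% conversion).
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=240, n_b=2000)

# Step 4: make the decision.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these assumed counts the p-value lands a little above 0.04, so the sketch rejects H₀ at the 0.05 threshold.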
Decision Outcomes
Reject H₀
p-value < 0.05
Result: Statistically significant. Accept H₁.
Fail to Reject H₀
p-value ≥ 0.05
Result: Not significant. Can't conclude a difference exists.
Common Mistake
DON'T say "accept H₀"
Why: Lack of evidence ≠ evidence of no effect.
Why Start with H₀?
Innocent Until Proven Guilty
In frequentist statistics, we assume the null hypothesis is true (no difference) until we have strong evidence otherwise. This is like a court trial: the defendant is innocent until proven guilty. We don't "prove" the variant is better; we show that "no difference" is unlikely given our data.
This might feel backward at first. You're testing a new checkout flow because you think it'll improve conversions, yet you start by assuming it doesn't change anything. Why?
Because a precise claim is easier to disprove than a vague one is to prove. "No difference" is precise enough to calculate with: we can ask "how unlikely is this data if there's no difference?" If the answer is "very unlikely," we reject the idea of no difference and conclude there is a difference.
What Is a P-Value?
The p-value is the cornerstone of hypothesis testing. It answers this question:
"If H₀ is true (no difference), how likely is it that we would see data at least this extreme?"
Example: Checkout Test
Scenario: You test a new checkout flow. Control converts at 10%, variant at 12%.
Null hypothesis: "Both variants have the same true conversion rate."
P-value calculation: "If both variants truly convert at the same rate, what's the probability we'd see a 2 percentage point difference (or larger) just by chance?"
Result: p-value = 0.04 (4%). This means that if there were no real effect, we would see a difference at least this large about 4% of the time.
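The question in the p-value calculation can also be answered by brute force: simulate many tests in which both variants share one true rate, and count how often chance alone produces a gap at least as large as the observed one. The sketch below does this with a simple Monte Carlo simulation. The scenario does not state sample sizes, so the 2,000 visitors per variant are an assumption, as are the function name `simulated_p_value` and the number of simulations.

```python
import random

def simulated_p_value(conv_a, n_a, conv_b, n_b, n_sims=1000, seed=0):
    """Estimate a two-sided p-value by simulating tests where H0 is true."""
    rng = random.Random(seed)
    # Under H0 both variants convert at the same rate; estimate it by pooling.
    pooled_rate = (conv_a + conv_b) / (n_a + n_b)
    # Compare rate differences via integer cross-multiplication to avoid
    # floating-point ties: |b/n_b - a/n_a| >= d  <=>  |b*n_a - a*n_b| >= d*n_a*n_b
    observed_gap = abs(conv_b * n_a - conv_a * n_b)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        sim_a = sum(rng.random() < pooled_rate for _ in range(n_a))
        sim_b = sum(rng.random() < pooled_rate for _ in range(n_b))
        if abs(sim_b * n_a - sim_a * n_b) >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Hypothetical counts: 2,000 visitors per variant, 10% vs 12% conversion.
p_estimate = simulated_p_value(conv_a=200, n_a=2000, conv_b=240, n_b=2000)
print(f"Simulated p-value: {p_estimate:.3f}")
```

The estimate varies with the seed and the number of simulations, but under these assumptions it should land in the neighborhood of the 0.04 quoted above.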
Interpreting P-Values
p < 0.05 (Significant)
"This data would be very unlikely if H₀ is true. Reject H₀ and conclude there's a real difference."
p ≥ 0.05 (Not Significant)
"This data is reasonably consistent with chance alone. Can't reject H₀. Not enough evidence to conclude there's a difference."
Common Misunderstandings
What P-Value Is NOT
- ❌ NOT the probability that H₀ is true
- ❌ NOT the probability that H₁ is true
- ❌ NOT the probability the result is due to chance
- ❌ NOT the probability you're making a mistake
What P-Value IS
✓ The probability of seeing data at least this extreme if H₀ is true (no real difference).
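One way to make this definition concrete is to simulate many A/A tests, where both groups share the same true rate and H₀ is therefore true by construction. If the p-value behaves as defined, roughly 5% of those tests should still come out below 0.05. The sketch below assumes a pooled two-proportion z-test; the 10% rate, 500 visitors per group, and 1,000 repetitions are hypothetical values.

```python
import math
import random

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # identical all-or-nothing outcomes: no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(42)
TRUE_RATE = 0.10  # assumed: both groups convert at this rate, so H0 is true
VISITORS = 500    # assumed visitors per group
N_TESTS = 1000    # number of simulated A/A tests

false_positives = 0
for _ in range(N_TESTS):
    conv_a = sum(rng.random() < TRUE_RATE for _ in range(VISITORS))
    conv_b = sum(rng.random() < TRUE_RATE for _ in range(VISITORS))
    if two_sided_p_value(conv_a, VISITORS, conv_b, VISITORS) < 0.05:
        false_positives += 1

false_positive_rate = false_positives / N_TESTS
print(f"Share of A/A tests with p < 0.05: {false_positive_rate:.3f}")
```

The share describes how the data behaves when H₀ holds. It says nothing about how likely H₀ is in any single test, which is why the statements in the "NOT" list above do not follow from it.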
Language Matters
✓ Say This
- "We reject the null hypothesis"
- "The result is statistically significant"
- "We have evidence of a difference"
- "We fail to reject H₀"
× Don't Say This
- "We accept the null hypothesis"
- "We proved variant B is better"
- "There's a 95% chance B is better"
- "The result is due to chance"
Why "fail to reject" instead of "accept"? Because absence of evidence isn't evidence of absence. Not finding a difference doesn't prove that no difference exists. It might just mean your test wasn't sensitive enough (sample size too small, effect too small to detect).
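To illustrate the sensitivity point, the sketch below runs the same 10% vs 12% comparison at three sample sizes. It assumes a pooled two-proportion z-test, and the visitor counts are hypothetical. The observed rates are identical in every case; only the amount of data changes.

```python
import math

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))

p_values = {}
for visitors in (500, 2000, 10000):  # hypothetical visitors per variant
    conv_a = visitors * 10 // 100    # control converts at 10%
    conv_b = visitors * 12 // 100    # variant converts at 12%
    p_values[visitors] = two_sided_p_value(conv_a, visitors, conv_b, visitors)
    verdict = "reject H0" if p_values[visitors] < 0.05 else "fail to reject H0"
    print(f"{visitors:>6} visitors per variant: p = {p_values[visitors]:.4f} ({verdict})")
```

Under these assumptions the smallest test fails to reject H₀ even though the observed gap is the same 2 percentage points, which is why a non-significant result should not be read as proof of no difference.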
Key Takeaways
- ✓ H₀ (null hypothesis): Assumes no difference between variants.
- ✓ H₁ (alternative hypothesis): What you're trying to find evidence for (there is a difference).
- ✓ P-value: Probability of seeing data at least this extreme if H₀ is true.
- ✓ p < 0.05: Reject H₀, result is statistically significant.
- ✓ p ≥ 0.05: Fail to reject H₀, not enough evidence of a difference.
- ✓ Never say "accept H₀". Use "fail to reject H₀" instead.