Reality Check: Speed vs Velocity
Understanding the experiment lifecycle and the difference between shipping fast and learning fast.
You're convinced that A/B testing is valuable in theory. But then someone on your team says: "Testing makes us slower. We should just ship and iterate."
This is one of the most common pushbacks against an experimentation culture. It deserves a reality check, and a deeper look at what "fast" actually means.
Speed vs. Velocity: The Crucial Difference
Here's the key insight that changes everything:
Speed
How fast you execute
- Lines of code shipped per week
- Features launched per quarter
- Time from idea to production
Velocity
Speed + Direction
- Progress toward business goals
- Net positive impact on metrics
- Value delivered to users
You can have high speed but low velocity if you're shipping lots of features that don't move the needle, or worse, hurt your metrics.
A/B testing might slightly slow down execution, but it dramatically increases velocity by ensuring you're moving in the right direction.
Real Example: A team ships 20 features per quarter without testing. Only 5 actually improve metrics; 10 have no effect; 5 actively hurt the business. Their speed is high, but their velocity is near zero (or negative).
Another team ships 12 features per quarter, all tested. 10 improve metrics; 2 are neutral. They ship fewer features but have much higher velocity.
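To make that arithmetic concrete, here is a minimal sketch that treats velocity as net metric impact rather than feature count. The average lift and harm per feature (2%) are illustrative assumptions, not measurements from the example teams:

```python
# Minimal sketch: velocity measured as net metric impact, not features shipped.
# The average lift/harm per feature (2%) is an illustrative assumption.

def net_velocity(improved, neutral, harmful, avg_lift=0.02, avg_harm=0.02):
    """Net relative impact on the key metric across a quarter's releases."""
    return improved * avg_lift + neutral * 0.0 - harmful * avg_harm

# Team A: 20 untested features -- 5 improve, 10 do nothing, 5 hurt
team_a = net_velocity(improved=5, neutral=10, harmful=5)

# Team B: 12 tested features -- 10 improve, 2 are neutral, 0 hurt
team_b = net_velocity(improved=10, neutral=2, harmful=0)

print(f"Team A net impact: {team_a:+.1%}")  # +0.0% -- high speed, ~zero velocity
print(f"Team B net impact: {team_b:+.1%}")  # +20.0% -- lower speed, higher velocity
```

Counting features rewards Team A; counting net impact makes clear that Team B is the one actually moving.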
When Intuition Fails: Real-World Examples
Even experienced product teams get surprised by test results. Here are some common "obvious" changes that failed in A/B tests:
The Postcode Checker
An e-commerce site added a postcode checker to tell users if items could be delivered to their area before they started shopping. Seemed helpful, right? The test showed conversions dropped 12%. Why? Many users saw "not available in your area" and left immediately, when they might have found alternatives if they'd browsed first.
The Flexible Calendar
A travel booking site made their calendar "smarter" by showing flexible dates (±3 days) to help users find cheaper flights. Brilliant idea! Except bookings fell 8%. Users got overwhelmed by too many options and choice paralysis set in. The "helpful" feature created friction.
The Bigger Button
Making a CTA button 50% larger should obviously increase clicks, right? In multiple tests across different sites, larger buttons often performed the same or worse. Why? Because context matters more than size: a larger button can look spammy or break the visual hierarchy.
The lesson? Your intuition is informed by your own experience, which may not match your users' actual behavior. Testing replaces assumptions with evidence.
The Three Phases of the Experiment Lifecycle
Understanding that testing "takes time" is only half the story. It's more useful to break down where that time goes. Every experiment moves through three distinct phases:
The Experiment Lifecycle
Three distinct phases from planning to decision
Pre-test
Days to weeks
Key Question:
What are we testing and why?
Activities:
- Define hypothesis and success metrics
- Calculate minimum sample size
- Design test variants
- Write comprehensive test plan
- Set significance level (α) and power
- Document expected outcomes
Live
Days to weeks
Key Question:
Is everything running correctly?
Activities:
- Launch test to production
- Monitor for technical issues (SRM, crashes)
- Track exposure events
- Check data collection
- Wait for sample size to accumulate
- Resist temptation to peek early
Post-test
Hours to days
Key Question:
What did we learn and what should we do?
Activities:
- Run statistical analysis
- Check statistical significance
- Evaluate practical significance
- Analyze guardrail metrics
- Make ship/no-ship decision
- Document learnings and share
Common Mistake: Many teams rush through Pre-test planning and jump straight to Live. This leads to poorly designed tests, unclear hypotheses, and inconclusive results. Invest time upfront in the Pre-test phase to avoid wasting weeks of runtime.
Why Each Phase Matters
Pre-test is where most teams cut corners, and where they pay for it later. A poorly designed test leads to inconclusive results, wasted runtime, and no learnings. Investing time here prevents weeks of wasted execution.
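Much of the Pre-test work boils down to one number: how many users you need. Here is a minimal sketch of that calculation for a conversion-rate test, using the standard two-sided z-test approximation; the 5% baseline rate and the one-point minimum detectable effect are illustrative assumptions:

```python
# Minimal sketch of a pre-test sample size calculation for a conversion-rate
# test (two-sided z-test on proportions). The baseline rate and minimum
# detectable effect (MDE) are illustrative assumptions.
from scipy.stats import norm

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Approximate users per variant needed to detect an absolute lift of `mde`."""
    p1, p2 = baseline, baseline + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for significance level
    z_beta = norm.ppf(power)            # critical value for statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / mde ** 2) + 1

# Example: 5% baseline conversion, aiming to detect a +1 point absolute lift
n = sample_size_per_variant(baseline=0.05, mde=0.01)
print(f"~{n:,} users per variant")  # roughly 8,000 users per variant

# Dividing n by expected daily traffic per variant gives the runtime estimate
# that tells you how long the Live phase needs to run.
```

Running this before launch is what turns "days to weeks" from a guess into a plan.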
Live is where patience is tested. Teams want to peek at results early, but doing so invalidates the statistical analysis. The key is monitoring for technical issues (crashes, Sample Ratio Mismatch) without looking at metric results.
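A Sample Ratio Mismatch check is a good example of monitoring that does not require peeking at metrics: it only compares assignment counts against the planned split. A minimal sketch, with illustrative counts:

```python
# Minimal sketch of a Sample Ratio Mismatch (SRM) check: compare observed
# assignment counts against the planned 50/50 split with a chi-square test.
# The counts below are illustrative.
from scipy.stats import chisquare

control_users, treatment_users = 50_421, 49_178
total = control_users + treatment_users
expected = [total / 2, total / 2]  # planned 50/50 allocation

stat, p_value = chisquare([control_users, treatment_users], f_exp=expected)

# A very small p-value (p < 0.001 is a common SRM alarm threshold) suggests the
# split itself is broken -- an assignment, logging, or redirect bug -- and the
# metric results should not be trusted until it is fixed.
if p_value < 0.001:
    print(f"SRM detected (p = {p_value:.1e}): fix the split before reading metrics")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```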
Post-test is where you extract value. A good analysis doesn't just say "ship" or "don't ship"; it explains why the result occurred, which user segments were affected, and what to test next. This builds institutional knowledge.
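As a sketch of what that analysis can look like, the example below runs a two-proportion z-test and then checks practical significance against a minimum lift worth shipping. The conversion counts and the 0.5-point threshold are illustrative assumptions:

```python
# Minimal sketch of a post-test readout: statistical significance via a
# two-proportion z-test, plus a practical-significance check against the
# minimum lift worth shipping. Counts and the 0.5-point threshold are
# illustrative assumptions.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

conversions = np.array([4_310, 4_605])   # control, treatment
users = np.array([80_000, 80_000])

_, p_value = proportions_ztest(conversions, users)

p_control, p_treatment = conversions / users
abs_lift = p_treatment - p_control
# 95% confidence interval for the difference (normal approximation)
se = np.sqrt(p_control * (1 - p_control) / users[0]
             + p_treatment * (1 - p_treatment) / users[1])
ci_low, ci_high = abs_lift - 1.96 * se, abs_lift + 1.96 * se

practical_threshold = 0.005  # assumed minimum absolute lift worth shipping

print(f"p-value: {p_value:.4f}")
print(f"Absolute lift: {abs_lift:+.2%} (95% CI {ci_low:+.2%} to {ci_high:+.2%})")
if p_value < 0.05 and ci_low > 0 and abs_lift >= practical_threshold:
    print("Statistically and practically significant: candidate to ship")
else:
    print("No clear ship: the effect is too small or uncertain to justify the change")
```

In this illustration the lift is statistically significant but falls short of the practical threshold, which is exactly the distinction the Post-test activities above ask you to evaluate.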
Key Takeaways
- ✓ Speed ≠ Velocity. Testing may slow execution slightly but dramatically increases progress toward goals.
- ✓ Intuition often fails. Even experienced teams get surprised by test results because user behavior is complex.
- ✓ The experiment lifecycle has three phases: Pre-test (design), Live (execution), Post-test (analysis).
- ✓ Pre-test planning is crucial. Cutting corners here leads to inconclusive results and wasted time.
- ✓ Post-test analysis builds knowledge. Extract learnings beyond just ship/no-ship decisions.