Beyond Superiority: Maximising Insights with Non-Inferiority Tests
Non-inferiority tests are often overlooked in digital A/B testing. Learn how to use them to your advantage.
Non-inferiority tests, often overlooked even by seasoned experimenters, are valuable not just in drug testing but also within the digital world.
While much online literature focuses on hypotheses that aim to demonstrate the variant's superiority over control, superiority is not always what needs to be shown in the digital product space.
My appreciation for non-inferiority test designs started when I realized that some of the changes I tested did not need a B>A result to be deployed at full scale.
Indeed, some product changes follow a less strict path to production: even if a change isn't clearly superior, factors such as compliance, operational efficiency, or senior leadership's input can push for its implementation.
This is precisely where non-inferiority tests shine, offering a more fitting test design than simple superiority tests.
What is a Non-Inferiority Test
A non-inferiority experiment tries to show that the change(s) included in the variant are not unacceptably worse than the control.
The "unacceptably" term here introduces a very important parameter that needs to be set in a non-inferiority test design: the non-inferiority margin.
The non-inferiority margin represents the maximum difference that we're willing to tolerate between the control and variant groups while still considering the variant to be non-inferior.
Key Concept
The goal is to demonstrate that the variant is not worse than the control by more than a clinically or practically meaningful amount (the non-inferiority margin).
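In symbols, one common way to write this goal is shown below. The notation is an assumption for illustration and does not come from the article: Δ is the true difference (variant minus control) on a metric where higher is better, and δ > 0 is the non-inferiority margin.

```latex
H_0:\ \Delta \le -\delta
\qquad \text{vs.} \qquad
H_1:\ \Delta > -\delta,
\qquad \text{where } \Delta = \mu_{\text{variant}} - \mu_{\text{control}},\ \delta > 0
```

Rejecting the null hypothesis supports the claim that the variant is worse than the control by less than δ, if it is worse at all.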
Scenario: Website Redesign
Consider a common practical scenario: the business has decided to migrate its website to a new framework (e.g. Vue.js). Engineering work has started, and a few UX changes are being introduced alongside the migration. The primary KPI is sales.
The business assumes the new website will also bring more sales, but we know, as experimenters, that is unfortunately not always the case. (Just ask Marks & Spencer.)
A non-inferiority test design helps the business set the right expectations, define a clear go-live plan, and understand the potential impact before full-scale deployment.
Example: Setting the margin
It could be agreed beforehand that sales shouldn't dip by more than 2% compared to the previous design. This 2% non-inferiority margin represents the maximum acceptable reduction in sales attributed to the new website.
It's common to question why we would tolerate any loss rather than reverting to the previous design. However, sometimes rolling back isn't feasible, and the benefits of the new design outweigh the decrease in sales. The redesigned site might offer lower maintenance costs, streamline workflows, or reduce the time required for future product updates.
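To make the 2% margin concrete, it can help to translate the relative figure into an absolute difference on the metric being tested. The sketch below assumes the KPI is tracked as a conversion rate and uses a hypothetical 5% baseline; the function name and the baseline are illustrative assumptions, not values from the scenario.

```python
def absolute_margin(baseline_rate: float, relative_margin: float) -> float:
    """Convert a relative non-inferiority margin into an absolute one.

    baseline_rate   -- control conversion rate, e.g. 0.05 (hypothetical)
    relative_margin -- tolerated relative drop, e.g. 0.02 for 2%
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_margin <= 0:
        raise ValueError("relative_margin must be positive")
    return baseline_rate * relative_margin


# A 2% relative margin on a hypothetical 5% baseline conversion rate
margin = absolute_margin(0.05, 0.02)
print(f"Absolute margin: {margin:.4f}")  # 0.0010, i.e. 0.1 percentage points
```

Under these assumptions, the variant would be acceptable as long as its conversion rate is credibly no lower than 4.9%.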
Relationship Between Non-Inferiority and Confidence Intervals
Confidence intervals play a crucial role in non-inferiority testing. A confidence interval provides a range of values likely to contain the true difference between the variant and control. How this interval relates to the non-inferiority margin determines the outcome.
Non-inferior
The entire CI lies above the negative non-inferiority margin. We can be confident that the variant is not unacceptably worse than the control.
Inconclusive
The CI contains the negative non-inferiority margin: its lower bound falls below it. We cannot rule out that the true difference is worse than what we are willing to accept.
Non-inferior + Superior
The CI lies above both the negative non-inferiority margin and zero. The variant is not only non-inferior but demonstrably better than the control.
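The three outcomes above can be sketched as a small decision rule. This is a minimal illustration, assuming a conversion-rate metric and a simple Wald interval for the difference in proportions; the function names and counts are hypothetical, and the extra "inferior" label (CI entirely below the negative margin) is our addition rather than one of the article's outcomes.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_c, n_c, conv_v, n_v, confidence=0.95):
    """Two-sided Wald CI for (variant rate - control rate)."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    # Unpooled standard error of the difference in proportions
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_v - p_c
    return diff - z * se, diff + z * se


def classify(lower, upper, margin):
    """Map a CI onto the outcomes described above (margin is a positive number)."""
    if lower > 0:
        return "non-inferior and superior"
    if lower > -margin:
        return "non-inferior"
    if upper < -margin:
        return "inferior"  # our addition: CI entirely below the negative margin
    return "inconclusive"


# Hypothetical counts: control 5,000 / 100,000, variant 5,050 / 100,000
lower, upper = diff_confidence_interval(5000, 100_000, 5050, 100_000)
print(f"CI: [{lower:.4f}, {upper:.4f}]")
print(classify(lower, upper, margin=0.002))
```

Note that checking the lower bound of a two-sided 95% interval corresponds to a one-sided test at the 2.5% level.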
Visual Interpretation Guide
Non-Inferiority Region
The non-inferiority region is a range of values within which the difference between the variant and control is considered acceptable. This region is defined by the non-inferiority margin.
For example, if we set a non-inferiority margin of 2%, the non-inferiority region would be all values higher than -2%. If the true difference lies within this region, the variant is, by definition, no more than 2% worse than the control.
No Non-Inferiority
Even if the measured difference is above the negative non-inferiority margin, we cannot conclude non-inferiority unless the entire confidence interval lies above it.
Non-Inferiority Demonstrated
When the confidence interval lies entirely above the negative non-inferiority margin but still includes zero, we can conclude non-inferiority, though not superiority.
Non-Inferiority and Superiority
When the confidence interval lies entirely above both the negative non-inferiority margin and zero, we can conclude both non-inferiority and superiority.
Important
The non-inferiority margin must be defined before the experiment begins based on business or clinical considerations, not after seeing the data. Setting the margin after seeing results invalidates the test.
Benefits of Non-Inferiority Design
Flexibility
Non-inferiority tests use a one-sided hypothesis, which can require a smaller sample than a two-sided superiority test at the same significance level, provided the margin is sufficiently large.
Cost-effectiveness
When the margin is larger than the effect a superiority test would have to detect, a non-inferiority test can reach a conclusion faster. The key is agreeing on a meaningful margin during planning.
Easier decision-making
Even if the variant does not demonstrate superiority, it may still be acceptable if it meets the non-inferiority criteria. This is valuable when superiority is hard to achieve or simply not required.
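The sample-size claims above can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions and assumes the true difference is zero for the non-inferiority case and that both groups share the baseline variance. The function name, baseline, and margin values are hypothetical, and this is not necessarily the formula any particular calculator uses.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(baseline, delta, alpha=0.05, power=0.80, one_sided=True):
    """Approximate sample size per group for a two-proportion test.

    delta is the absolute non-inferiority margin (one-sided case) or the
    absolute minimum detectable effect (two-sided superiority case).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    variance = 2 * baseline * (1 - baseline)  # same variance assumed in both groups
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical inputs: 5% baseline, 0.5 percentage point margin / effect
n_non_inferiority = n_per_group(0.05, 0.005, one_sided=True)
n_superiority = n_per_group(0.05, 0.005, one_sided=False)
print(n_non_inferiority, n_superiority)
```

With the same nominal alpha, the one-sided design needs fewer users per group, and the requirement shrinks further as the margin grows. If the one-sided test is run at half the two-sided alpha, as some conventions require, this particular saving disappears.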
How to Calculate Sample Size and Significance
ABTestResult.com provides powerful calculators to help you determine the minimum sample size required for a non-inferiority test and perform the test analysis.
For the power analysis, you can set all the necessary parameters: non-inferiority margin, significance level, power, and baseline conversion rate. The calculator then returns the minimum sample size needed to reach the chosen power.
To analyse the test, enter the test data and the calculator returns the confidence interval and the statistical significance of the result.
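For readers who want to see what such an analysis involves, here is a minimal sketch of a one-sided z-test for non-inferiority on conversion rates. It is an illustration under stated assumptions (normal approximation, unpooled standard error, absolute margin), not a description of how the ABTestResult.com calculator works; the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def non_inferiority_p_value(conv_c, n_c, conv_v, n_v, margin):
    """One-sided p-value for H0: (variant - control) <= -margin.

    margin is a positive absolute difference in conversion rate.
    """
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    # Shift the observed difference by the margin before standardising
    z = (p_v - p_c + margin) / se
    return 1 - NormalDist().cdf(z)


# Hypothetical counts: identical observed rates, 1 percentage point margin
p_value = non_inferiority_p_value(500, 10_000, 500, 10_000, margin=0.01)
print(f"p-value: {p_value:.4f}")
```

A p-value below the chosen significance level leads to the same conclusion as a confidence interval whose lower bound sits above the negative margin, provided the interval and the test use matching levels.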