Beyond Superiority: Maximising Insights with Non-Inferiority Tests

By Andrea Corvi
Last updated: 28 February 2024

Unveiling the Power of Non-Inferiority Tests

Non-inferiority tests, often overlooked even by seasoned experimenters, are valuable not just in drug testing but also in the digital world.

While much of the online literature focuses on hypotheses aiming to demonstrate the variant's superiority over the control, superiority is not always the goal in the digital product space.

My appreciation for non-inferiority test designs started when I realized that certain tests I conducted didn't need the expected B > A outcome to proceed to full-scale deployment.

Indeed, some product changes follow a less strict path to production: even if a change isn't clearly superior, factors such as compliance, operational efficiency, or senior leadership's input can push for its implementation.

This is precisely where non-inferiority tests shine, offering a more fitting test design than simple superiority tests.

What is a Non-Inferiority Test?

A non-inferiority experiment tries to show that the change(s) included in the variant are not unacceptably worse than the control.

The term “unacceptably” here introduces a very important parameter that needs to be set in a non-inferiority test design: the non-inferiority margin.

The non-inferiority margin represents the maximum difference that we're willing to tolerate between the control and variant groups while still considering the variant to be non-inferior.
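
In formal terms, if we write δ for the non-inferiority margin and μ_V, μ_C for the variant and control means (or rates), the one-sided hypotheses can be sketched as follows:

```latex
H_0 : \mu_V - \mu_C \le -\delta \quad \text{(the variant is unacceptably worse)}
H_1 : \mu_V - \mu_C > -\delta \quad \text{(the variant is non-inferior)}
```

Rejecting H_0 is what allows us to claim non-inferiority.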

Scenario: Website Redesign

Consider a common practical scenario: suppose that the business has made the decision to change the framework used to build a website (e.g. migrating to Vue.js).

The decision has already been taken and engineering work has started: given the significant investment, a few changes to the user experience are also being introduced as part of the work.

Let's assume that the business operates an e-commerce website and the primary KPI is sales.

The business assumes that the new website will also bring more sales, but we know, as experimenters, that this is unfortunately not always the case; just ask Marks & Spencer (“Shareholders attack Marks & Spencer as website revamp loses customers”).

In this hypothetical example, a non-inferiority test design might help the business to set the right expectations, define a clear go-live plan (with the associated maximum hit in sales the business is willing to accept) and understand the potential impact of the new website on sales before full-scale deployment.

For instance, it could be agreed beforehand, during pre-test analysis, that sales shouldn't dip by more than 2% compared to the previous design.

Here, the 2% non-inferiority margin represents the maximum acceptable reduction in sales attributed to the new website.

It's common to question why we would tolerate a 2% loss rather than reverting to the previous design. However, sometimes rolling back isn't feasible, and the benefits of the new design outweigh the decrease in sales. For example, the redesigned site might offer lower maintenance costs, streamline workflows, or reduce the time required for future product updates.
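
One practical detail worth pinning down during pre-test analysis is whether the 2% is meant as a relative dip or as absolute percentage points, since the two give very different floors. A minimal sketch in Python, using a made-up 4% baseline conversion rate:

```python
baseline_cr = 0.04        # hypothetical baseline conversion rate (4%)
margin = 0.02             # the agreed "no more than 2% worse"

# Interpreted as a relative dip: 2% of the baseline -> floor at 3.92%
floor_relative = baseline_cr * (1 - margin)    # 0.0392

# Interpreted as absolute percentage points -> floor at 2.00%
floor_absolute = baseline_cr - margin          # 0.02

print(f"relative floor: {floor_relative:.2%}, absolute floor: {floor_absolute:.2%}")
```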

Relationship Between Non-Inferiority and Confidence Intervals

In a non-inferiority test, we aim to demonstrate that the variant (change) is not worse than the control by more than a pre-specified margin. This margin, as we have seen before, is referred to as the non-inferiority margin.

Confidence intervals play a crucial role in this process. A confidence interval provides a range of values, derived from the statistical data, which is likely to contain the true value of the measure of interest. In the context of a non-inferiority test, we are interested in the difference between the variant and control.

If the entire confidence interval for the difference between the variant and control lies above the negative non-inferiority margin (e.g. above −2%), we can conclude that the variant is not inferior to the control. This is because we can be confident that the true difference is not worse than the non-inferiority margin allows.

However, if the confidence interval overlaps with the non-inferiority margin (i.e., it includes values worse than the non-inferiority margin), we cannot conclude non-inferiority. This is because we cannot rule out the possibility that the true difference is worse than what we're willing to accept.

If the confidence interval not only lies above the negative non-inferiority margin but also entirely above zero, we can conclude that the variant is also superior to the control.
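
As an illustration, here is a minimal sketch (not necessarily the exact method any given calculator implements) that builds a normal-approximation confidence interval for the difference in conversion rates and compares it with the margin; all figures are made up:

```python
import math
from statistics import NormalDist

# Hypothetical test data
conv_c, n_c = 4000, 100_000    # control: conversions, sample size
conv_v, n_v = 3950, 100_000    # variant: conversions, sample size
margin_abs = 0.0008            # 2% of the 4% baseline, as an absolute difference
alpha = 0.05                   # one-sided significance level

p_c, p_v = conv_c / n_c, conv_v / n_v
diff = p_v - p_c
se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)

# A one-sided test at alpha corresponds to a (1 - 2*alpha) two-sided CI
z = NormalDist().inv_cdf(1 - alpha)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"CI for the difference: [{ci_low:.4%}, {ci_high:.4%}]")
print("non-inferior" if ci_low > -margin_abs else "cannot conclude non-inferiority")
```

With these made-up numbers the lower bound falls below −0.08 percentage points, so non-inferiority can't be concluded: this is exactly the “No Non-Inferiority” case pictured below. Larger samples would tighten the interval.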

Non-Inferiority Region

In the context of non-inferiority tests, the non-inferiority region is a range of values within which the difference between the variant and control is considered acceptable.

This region is defined by the non-inferiority margin, which is the maximum acceptable difference between the control and the variant. If the true difference lies within the non-inferiority region, we can conclude that the variant is not inferior to the control.

For example, if we set a non-inferiority margin of 2%, the non-inferiority region would be all values higher than -2%. If the true difference lies within this region, we can conclude that the new version (variant) is not more than 2% worse than the control, and therefore it is not inferior.

It's important to note that the choice of the non-inferiority margin (and therefore the non-inferiority region) should be based on both statistical considerations and subject-matter knowledge. It should represent a meaningful and acceptable difference in the specific context of the test.

[Figure: Non-Inferiority Region]

No Non-Inferiority

Even if the measured difference itself sits above the negative non-inferiority margin, if the confidence interval of the difference between variant and control doesn't lie entirely above the margin, we cannot conclude non-inferiority.

[Figure: No Non-Inferiority]

Non-Inferiority

If the confidence interval of the difference between variant and control lies entirely above the negative non-inferiority margin but still includes zero, we can conclude non-inferiority (but not superiority).

[Figure: Non-Inferiority]

Non-Inferiority and Superiority

If the confidence interval of the difference between variant and control lies entirely above the negative non-inferiority margin and entirely above zero, we can conclude both non-inferiority and superiority for the variant.

[Figure: Non-Inferiority and Superiority]
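
Putting the three cases together, here is a minimal sketch of the decision rule in Python (assuming the margin and the interval are expressed on the same scale):

```python
def classify(ci_low: float, margin: float) -> str:
    """Classify the outcome from the lower bound of the CI of (variant - control).

    The one-sided decision depends only on the interval's lower bound;
    margin is the positive non-inferiority margin, e.g. 0.02 for 2%.
    """
    if ci_low <= -margin:
        return "no non-inferiority (the CI reaches below the margin)"
    if ci_low > 0:
        return "non-inferiority and superiority (the CI lies entirely above zero)"
    return "non-inferiority only (above the margin, but not entirely above zero)"

print(classify(-0.030, 0.02))  # no non-inferiority
print(classify(-0.010, 0.02))  # non-inferiority only
print(classify(0.005, 0.02))   # non-inferiority and superiority
```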

Benefits of Non-Inferiority Design

Efficiency: Non-inferiority tests are more efficient in terms of sample size requirements: the same test typically needs fewer samples than it would with a superiority design. Faster results = more tests run in the same period.

Cost-effectiveness: With potentially smaller sample sizes and faster results, non-inferiority tests can be more cost-effective compared to superiority tests.

Easier decision-making: Non-inferiority tests allow for a more flexible interpretation of study results. Even if the variant does not demonstrate superiority over the existing one, it may still be considered acceptable if it meets the predetermined non-inferiority criteria. This flexibility can be valuable in situations where achieving superiority may be challenging or unnecessary.

Non-Inferiority Tests: How to Calculate Minimum Sample Size and Significance

ABTestResult.com provides powerful calculators to help you determine the minimum sample size required for a non-inferiority test and perform the test analysis.

For power analysis, you can set all the necessary parameters: non-inferiority margin, significance level, power, and baseline conversion rate. The calculator will then provide the minimum sample size required for the test to be valid.
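
To make those inputs concrete, here is a rough sketch of a standard normal-approximation sample-size formula for a non-inferiority test on two proportions, assuming no true difference between the groups (illustrative, and not necessarily the exact method the calculator implements):

```python
import math
from statistics import NormalDist

def ni_sample_size(baseline: float, margin_abs: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a non-inferiority test on two proportions.

    Assumes the true difference between the groups is zero and that the
    margin is an absolute difference in rates:
    n = (z_[1-alpha] + z_[power])^2 * 2 * p * (1 - p) / margin^2
    """
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided significance level
    z_b = NormalDist().inv_cdf(power)
    p = baseline
    return math.ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / margin_abs ** 2)

# 4% baseline, 2% relative margin (0.08 percentage points absolute)
print(ni_sample_size(0.04, 0.0008))   # ~742,000 per group: tight margins are costly
```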

To analyse the test, input the test data and the calculator will provide the confidence interval and the statistical significance for the test.
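
For the analysis side, a minimal sketch of the matching significance calculation: a one-sided z-test against the shifted null H0: diff ≤ −δ (again illustrative, not necessarily how the calculator computes significance):

```python
import math
from statistics import NormalDist

def ni_p_value(conv_c: int, n_c: int, conv_v: int, n_v: int,
               margin_abs: float) -> float:
    """One-sided p-value for H0: (p_v - p_c) <= -margin_abs."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z = ((p_v - p_c) + margin_abs) / se   # distance from the shifted null
    return 1 - NormalDist().cdf(z)        # small p -> evidence of non-inferiority

# Made-up data with a generous 0.5 percentage-point absolute margin
print(ni_p_value(4000, 100_000, 3980, 100_000, 0.005))  # well below 0.05 here
```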
