Non-inferiority tests, often overlooked even by seasoned experimenters, are valuable not just in drug trials but also in the digital world.
While much of the online literature focuses on hypotheses that aim to demonstrate the variant's superiority over the control, superiority is not always the goal in the digital product space.
My appreciation for non-inferiority test designs started when I realized that some of the tests I conducted didn't need the expected B > A outcome to justify full-scale deployment.
Indeed, some product changes follow a less strict path to production: even if a change isn't clearly superior, factors such as compliance, operational efficiency, or senior leadership's input can push for its implementation.
This is precisely where non-inferiority tests shine, offering a more fitting test design than simple superiority tests.
A non-inferiority experiment tries to show that the change(s) included in the variant are not unacceptably worse than the control.
The word “unacceptably” introduces a very important parameter that needs to be set in a non-inferiority test design: the non-inferiority margin.
The non-inferiority margin represents the maximum difference that we're willing to tolerate between the control and variant groups while still considering the variant to be non-inferior.
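In formal terms (a standard formulation, with the difference defined as variant minus control and the margin δ expressed as a positive number), the hypotheses of a non-inferiority test are:

```latex
H_0:\ \mu_{\text{variant}} - \mu_{\text{control}} \le -\delta \quad \text{(the variant is unacceptably worse)}
H_1:\ \mu_{\text{variant}} - \mu_{\text{control}} > -\delta \quad \text{(the variant is non-inferior)}
```

Rejecting H0 means concluding that the variant is, at worst, acceptably close to the control.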
Consider a common practical scenario: suppose that the business has made the decision to change the framework used to build a website (e.g. migrating to Vue.JS).
The decision has already been made and engineering work has started; given the significant investment, a few changes to the user experience are also being introduced as part of the work.
Let's assume that the business operates an e-commerce website and the primary KPI is sales.
The business assumes that the new website will also bring more sales but we know, as experimenters, that this is unfortunately not always the case, right, Marks & Spencer? (See: “Shareholders attack Marks & Spencer as website revamp loses customers”.)
In this hypothetical example, a non-inferiority test design can help the business set the right expectations, define a clear go-live plan (including the maximum hit to sales it is willing to accept), and understand the potential impact of the new website on sales before full-scale deployment.
For instance, it could be agreed beforehand, during the pre-test analysis, that sales shouldn't drop by more than 2% compared to the previous design.
Here, the 2% non-inferiority margin represents the maximum acceptable reduction in sales attributed to the new website.
It's common to question why we would tolerate a 2% loss rather than reverting to the previous design. However, sometimes rolling back isn't feasible, and the benefits of the new design outweigh the decrease in sales. For example, the redesigned site might offer lower maintenance costs, streamline workflows, or reduce the time required for future product updates.
In a non-inferiority test, we aim to demonstrate that the variant (change) is not worse than the control by more than a pre-specified margin: the non-inferiority margin introduced above.
Confidence intervals play a crucial role in this process. A confidence interval provides a range of values, derived from the sample data, that is likely to contain the true value of the measure of interest. In the context of a non-inferiority test, the measure of interest is the difference between the variant and control.
If the entire confidence interval for the difference between the variant and control lies above the negative of the non-inferiority margin (e.g. above −2% for a 2% margin), we can conclude that the variant is not inferior to the control. This is because we can be confident that the true difference is not worse than the non-inferiority margin.
However, if the confidence interval overlaps with the non-inferiority margin (i.e., it includes values worse than the non-inferiority margin), we cannot conclude non-inferiority. This is because we cannot rule out the possibility that the true difference is worse than what we're willing to accept.
If the confidence interval lies not only entirely above the non-inferiority margin but also entirely above zero, we can conclude that the variant is also superior to the control.
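To make this decision rule concrete, here is a minimal Python sketch (an illustration of the standard normal-approximation approach, not the code behind any particular tool) that computes a confidence interval for the difference in conversion rates and checks it against a hypothetical 2% margin:

```python
import math
from scipy.stats import norm

def diff_confidence_interval(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Normal-approximation confidence interval for the difference
    in conversion rates (variant minus control)."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z = norm.ppf(1 - alpha / 2)  # 95% two-sided interval by default
    return diff - z * se, diff + z * se

# Hypothetical numbers: 10,000 users per group, 5.0% vs 4.9% conversion
lower, upper = diff_confidence_interval(500, 10_000, 490, 10_000)
margin = 0.02  # 2% non-inferiority margin, as an absolute difference
print(f"CI for the difference: [{lower:.4f}, {upper:.4f}]")
print("Non-inferior" if lower > -margin else "Cannot conclude non-inferiority")
```

With these made-up numbers the interval is roughly [−0.7%, +0.5%]: it sits entirely above −2%, so the variant would be declared non-inferior even though its observed conversion rate is slightly lower.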
In the context of non-inferiority tests, the non-inferiority region is a range of values within which the difference between the variant and control is considered acceptable.
This region is defined by the non-inferiority margin, which is the maximum acceptable difference between the control and the variant. If the true difference lies within the non-inferiority region, we can conclude that the variant is not inferior to the control.
For example, if we set a non-inferiority margin of 2%, the non-inferiority region would be all values higher than -2%. If the true difference lies within this region, we can conclude that the new version (variant) is not more than 2% worse than the control, and therefore it is not inferior.
It's important to note that the choice of the non-inferiority margin (and therefore the non-inferiority region) should be based on both statistical considerations and subject-matter knowledge. It should represent a meaningful and acceptable difference in the specific context of the test.
Even if the measured difference itself is better than the non-inferiority margin, we cannot conclude non-inferiority unless the confidence interval of the difference between variant and control lies entirely above the margin.
If the confidence interval of the difference between variant and control lies entirely above the non-inferiority margin but still includes 0, we can conclude non-inferiority (but not superiority).
If the confidence interval of the difference between variant and control lies entirely above the non-inferiority margin and entirely above 0, we can conclude both non-inferiority and superiority of the variant.
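These three cases can be captured in a small helper. A minimal sketch, assuming the difference is expressed as variant minus control and the margin is given as a positive number (the function name and example values are my own):

```python
def classify(ci_lower, ci_upper, margin):
    """Classify a non-inferiority outcome from the CI of (variant - control).

    margin is the non-inferiority margin as a positive number, e.g. 0.02 for 2%.
    """
    if ci_lower <= -margin:
        return "inconclusive: the CI includes differences worse than the margin"
    if ci_lower > 0:
        return "non-inferior and superior: the whole CI lies above zero"
    return "non-inferior: the whole CI lies above -margin (but not above zero)"

print(classify(-0.007, 0.005, 0.02))  # -> non-inferior
print(classify(-0.025, 0.001, 0.02))  # -> inconclusive
print(classify(0.003, 0.012, 0.02))   # -> non-inferior and superior
```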
Efficiency: Non-inferiority tests are more efficient in terms of sample size: they typically require fewer samples than the same test run with a superiority design. Faster results = more tests run in the same period.
Cost-effectiveness: With potentially smaller sample sizes and faster results, non-inferiority tests can be more cost-effective than superiority tests.
Easier decision-making: Non-inferiority tests allow for a more flexible interpretation of results. Even if the variant does not demonstrate superiority over the control, it may still be considered acceptable if it meets the predetermined non-inferiority criteria. This flexibility is valuable in situations where achieving superiority is challenging or unnecessary.
ABTestResult.com provides powerful calculators to help you determine the minimum sample size required for a non-inferiority test and perform the test analysis.
For the power analysis you will be able to set all the necessary parameters: non-inferiority margin, significance level, power, and baseline conversion rate. The calculator will then provide you with the minimum sample size required for an adequately powered test.
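For reference, the power analysis behind such a calculator can be sketched as follows, assuming a one-sided two-proportion z-test and a true difference of zero (a common planning assumption; the exact formula a given tool uses may differ):

```python
import math
from scipy.stats import norm

def noninferiority_sample_size(baseline_rate, margin, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a non-inferiority test on
    conversion rates, assuming the true difference is zero.

    baseline_rate: expected conversion rate in both groups (e.g. 0.05)
    margin:        non-inferiority margin as an absolute difference (e.g. 0.02)
    alpha:         one-sided significance level
    power:         desired statistical power (1 - beta)
    """
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    p = baseline_rate
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / margin ** 2
    return math.ceil(n)

# Hypothetical example: 5% baseline conversion, 2% absolute margin
print(noninferiority_sample_size(0.05, 0.02))  # ~1,469 users per group
```

Because the test is one-sided and the margin is often wider than the minimum detectable effect of an equivalent superiority test, the required sample size tends to be smaller, which is the efficiency benefit mentioned above.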
To analyse the test, you will be able to input the test data and the calculator will provide you with the confidence interval and the statistical significance for the test.