Glossary

Welcome to our comprehensive A/B testing glossary, your go-to resource for mastering the language of experimentation and optimisation. Whether you're a seasoned expert or just starting your journey into the world of data-driven decision-making, this glossary serves as a valuable reference guide. Explore clear and concise definitions for a wide range of terms, from foundational concepts to advanced statistical techniques, all tailored specifically for the realm of A/B testing and conversion rate optimisation. Empower yourself with the knowledge to navigate this field confidently and make informed choices that drive meaningful results for your business.

Definitions

A

A/A Test

An A/A test is a type of experiment in which the control and treatment groups receive the same experience. It is used to validate the testing framework and ensure that there are no systemic issues or biases.

A/B Test

An A/B test is a controlled experiment that compares two variants (A and B) to determine which one performs better for a specific goal or metric.

A/B Testing

A/B testing, also known as split testing or bucket testing, is a method of comparing two versions of a web page, app, or other experience to determine which one performs better.

A/B/n Test

An A/B/n test is a variation of A/B testing where multiple variants (n) are tested against a control (A) to determine the best-performing variation.

Analytics

The systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data.

ARPU

See Average Revenue Per User.

Absolute Difference

The absolute difference is the magnitude of the difference between two values, ignoring the sign.

Alternative Hypothesis

The alternative hypothesis (H1) is the hypothesis that the researcher hopes to support with evidence from the experiment. It represents the possibility of a difference or effect between the control and treatment groups.

Average

The average, also known as the mean, is a measure of central tendency that represents the sum of all values divided by the total number of values.

Average Order Value

Average Order Value (AOV) is a metric that calculates the average amount spent per order or transaction.

Average Revenue Per User

Average Revenue Per User (ARPU) is a metric that calculates the average revenue generated per unique user over a given period.

Alpha

Alpha (α) is the significance level or the probability of a Type I error, which is the probability of rejecting the null hypothesis when it is true.

B

Bayesian Inference

Bayesian inference is a statistical method that uses prior knowledge or beliefs to update the probability of an event occurring based on new data or evidence.

Before and After

A type of experiment that compares the performance of a metric before and after a change or treatment is introduced.

Binomial Metric

A binomial metric is a metric that has only two possible outcomes, such as success or failure, conversion or non-conversion.

Bonferroni Correction

The Bonferroni correction is a statistical adjustment used to reduce the chances of obtaining false-positive results (Type I errors) when conducting multiple hypothesis tests.
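
As a minimal sketch (the helper name is ours, not a library function), the Bonferroni-adjusted per-comparison significance level is simply the overall alpha divided by the number of tests:

```python
def bonferroni_alpha(alpha, num_tests):
    # Divide the family-wise significance level equally across all tests
    return alpha / num_tests

# With alpha = 0.05 and 4 variants compared against the control,
# each individual comparison is judged at 0.05 / 4 = 0.0125.
per_test_alpha = bonferroni_alpha(0.05, 4)
```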

Bounce Rate

Bounce rate is a metric that measures the percentage of visitors who leave a website after viewing only a single page.

Beta

Beta (β) is the probability of a Type II error, which is the probability of failing to reject the null hypothesis when the alternative hypothesis is true.

Bucketing

The process of randomly assigning users or traffic to different groups or variations in an A/B test.

C

Causal Inference

Causal inference is the process of drawing conclusions about causal connections from data, using statistical methods and reasoning to determine whether a cause-and-effect relationship exists between variables.

Chi-Square Test

The chi-square test is a statistical test used to determine if there is a significant difference between the observed and expected frequencies of a categorical variable.

Click-Through

The process of clicking through an online advertisement to the advertiser's destination.

Click-Through Rate

Click-Through Rate (CTR) is a metric that measures the ratio of clicks on a specific link or advertisement to the number of impressions or views.

Cohort Analysis

A subset of behavioral analytics that, rather than looking at all users as one unit, breaks them into related groups (cohorts) and analyses each group separately.

Confidence Interval

A confidence interval is a range of values that is likely to contain the true population parameter with a certain level of confidence.

Conversion Funnel Optimisation

Conversion funnel optimisation is the process of improving the flow of users or customers through a multi-step process, such as a website's checkout or registration flow, by identifying and addressing bottlenecks or points of high drop-off.

Conversion Rate

Conversion rate is a metric that measures the percentage of visitors who complete a desired action, such as making a purchase or filling out a form.

Control Group

The control group is the baseline or unchanged version of an experience against which the treatment group is compared.

Correlation

Correlation is a statistical measure that indicates the strength and direction of the relationship between two variables.

CTA

Call To Action (CTA) is a prompt on a website that tells the user to take some specified action, such as "Sign Up", "Buy Now", or "Click Here".

D

Data Distribution

Data distribution refers to the pattern or shape of the data points in a dataset, which can be visualised using histograms, box plots, or other graphical representations.

Data-Driven Decision Making

The process of making organisational decisions based on actual data rather than intuition or observation alone.

Decision Trees

Decision trees are a type of machine learning model that uses a tree-like structure to represent decisions and their potential consequences. They are used for tasks such as classification, regression, and making predictions based on a set of input features.

Delta

The difference or change in a metric between the control group and a treatment group in an A/B test.

Dependent Variable

A dependent variable is the variable being measured or observed in an experiment, and its value depends on the independent variable.

Descriptive Statistics

Descriptive statistics are summary measures that describe the characteristics of a dataset, such as mean, median, mode, standard deviation, and quartiles.

Difference-in-Differences

Difference-in-differences is a statistical technique used to estimate the effect of a specific intervention or treatment by comparing the changes in outcomes over time between a treated group and a control group.

E

Effect Size

Effect size is a quantitative measure of the magnitude or strength of a relationship or difference between two groups or variables.

Engagement Rate

A metric that measures the level of engagement that a piece of created content is receiving from an audience.

Evaluation

The process of determining which variation a user should see, based on the targeting rules of the running experiments.

Experiment

An experiment is a controlled study or test conducted to investigate the effect of one or more independent variables on a dependent variable.

Experiment Duration

Experiment duration is the length of time an A/B test or experiment is run to collect data and observe the effects of the variations.

Exposure

Exposure is the event of a user being shown the treatment or variation under test in an A/B experiment; such a user is said to be exposed.

Exposure Ratio

The exposure ratio is the ratio of users exposed to the treatment group compared to the control group in an A/B test.

F

False Discovery Rate

The false discovery rate (FDR) is a statistical method used to control for multiple comparisons and reduce the chance of false positives in hypothesis testing.

False Negative

A false negative is an incorrect result in which a test fails to identify a condition or effect that is actually present.

False Positive

A false positive is an incorrect result in which a test identifies a condition or effect that is not actually present.

Feature Flag

A technique used in software development to enable or disable features or functionalities for specific users or groups.

Fisher's Exact Test

Fisher's exact test is a statistical test used to analyse contingency tables, particularly when sample sizes are small or when the data violates the assumptions of the chi-square test.

Frequentist Inference

A philosophical approach to statistics that treats probability as a long-run frequency and relies on sample data to make inferences about populations.

Funnel Analysis

Funnel analysis is a technique used to analyse the conversion rate at each step of a multi-step process, such as a checkout or registration flow.

G

Gamification

Gamification is the application of game design elements and principles, such as points, badges, leaderboards, and challenges, in non-game contexts to encourage engagement, motivation, and desired behaviors.

Geometric Mean

The geometric mean is a type of average that is calculated by taking the nth root of the product of n numbers.
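
A small Python sketch, computing the product via logarithms to avoid overflow on long lists (the helper name is ours):

```python
import math

def geometric_mean(values):
    # nth root of the product of n values, computed via logs for stability
    return math.exp(sum(math.log(v) for v in values) / len(values))

geometric_mean([1, 4, 16])  # the cube root of 64, i.e. 4.0
```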

Goodness of Fit Test

A goodness of fit test is a statistical test used to determine whether a sample of data follows a hypothesized distribution or not.

Granular Metrics

Granular metrics are metrics that are broken down into smaller, more specific sub-metrics or segments, providing more detailed insights into user behavior or performance.

Group Sequential Testing

A form of sequential testing in which the data is analysed at a small number of predefined interim points, and the experiment is stopped or continued based on predefined stopping rules or criteria.

Growth Hacking

Growth hacking is a process of rapid experimentation and data-driven strategies to identify and implement the most effective and efficient ways to grow a business or user base.

Guardrail Metric

A secondary metric or KPI that is monitored during an A/B test to ensure that the variations do not have unintended negative consequences.

H

Heatmap

A graphical representation of data where the individual values contained in a matrix are represented as colors.

Hero

The hero is a large banner image prominently placed on a web page, generally front and center. It often includes a CTA and is used to draw visitors' attention to a primary goal.

Hypothesis

A hypothesis is a proposed explanation or assumption about a phenomenon that is tested through experimentation or observation.

Hypothesis Testing

Hypothesis testing is a statistical method used to evaluate the evidence against a null hypothesis and determine whether it should be rejected or not.

Holdout Group

A holdout group is a subset of users who are not exposed to any variation in an A/B test, serving as a control group for future experiments.

I

Impressions

The number of times a post, ad, or webpage is viewed.

Inconclusive

The result of an A/B test where there is not enough evidence to declare a winner or loser, often due to insufficient statistical power.

Independent Variable

An independent variable is a factor or condition that is manipulated or varied in an experiment to observe its effect on the dependent variable.

Inferiority Test

A statistical test used to determine whether a new treatment or intervention is worse than an existing standard or control.

Innovation

Innovation is the process of introducing new ideas, methods, or products that create value and drive progress. It involves identifying opportunities, generating creative solutions, and implementing them successfully.

Intent-to-Treat Analysis

Intent-to-treat analysis is a principle in A/B testing where all participants are included in the analysis regardless of whether they completed the experiment or not.

Interaction Effect

An interaction effect occurs when the effect of one independent variable on the dependent variable varies depending on the level of another independent variable.

K

Kruskal-Wallis Test

The Kruskal-Wallis test is a non-parametric statistical test used to compare three or more independent groups based on ranked data, testing whether the groups come from the same distribution.

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is a non-parametric statistical test used to determine if two samples are drawn from the same continuous distribution.

L

Landing Page

A standalone web page, created specifically for a marketing or advertising campaign.

Lift

Lift is a metric that measures the relative increase or decrease in a target metric (e.g., conversion rate) between the control and treatment groups in an A/B test.
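
As a one-line sketch (the function name is ours), relative lift is the change in the treatment metric expressed as a fraction of the control metric:

```python
def relative_lift(control_rate, treatment_rate):
    # Relative change of the treatment metric versus the control metric
    return (treatment_rate - control_rate) / control_rate

lift = relative_lift(0.040, 0.046)  # 0.15, i.e. a 15% relative lift
```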

Likelihood Ratio Test

The likelihood ratio test is a statistical test used to compare the fit of two nested models, one of which is a special case of the other.

Logistic Regression

Logistic regression is a statistical model used for binary classification problems, where the goal is to predict the probability of an event occurring or not, based on one or more independent variables.

Loser

The variation or treatment group that performs worse than the control group in an A/B test, based on the defined success metric.

M

Mann-Whitney U Test

The Mann-Whitney U test is a non-parametric statistical test used to compare the distributions of two independent groups or samples.
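
A minimal example using SciPy with illustrative made-up samples (e.g. session durations), making no normality assumption:

```python
from scipy import stats

# Two small, illustrative samples of a skew-prone metric
group_a = [12, 15, 9, 20, 14, 11, 16]
group_b = [22, 25, 19, 30, 24, 21, 27]

statistic, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
# A small p-value suggests the two distributions differ.
```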

Margin of Error

A measure of the uncertainty or potential inaccuracy in a statistical estimate or result, often expressed as a range of values around the calculated value.

Maximum Likelihood Estimation

Maximum likelihood estimation is a method of estimating the parameters of a statistical model by finding the parameter values that maximize the likelihood of observing the data.

Median

The median is the middle value in a sorted dataset, dividing the data into two equal halves.

Metric

A metric is a quantifiable measure used to track and assess the performance of a product, process, or system.

Minimum Detectable Effect

The minimum detectable effect (MDE) is the smallest effect size or difference between the control and treatment groups that an experiment has a reasonable chance of detecting as statistically significant.

Minimum Effect of Interest

The smallest effect size or difference between the control and treatment groups that would be considered practically or commercially meaningful; smaller effects are not worth acting on even if they are statistically significant.

Mode

The mode is the value or values that occur most frequently in a dataset.

Mu (μ)

In statistics, the Greek letter Mu (μ) is used to denote the population mean, which is the average of all the values in a population. It is a parameter of the population and is unknown in most cases, but it can be estimated from a sample.

Multi-Armed Bandit

A multi-armed bandit is a problem in which a decision-maker must choose between multiple options or "arms" to maximize a reward or minimize a cost, while simultaneously learning from the outcomes of previous choices.

Multiple Comparisons Problem

The issue that arises when conducting multiple statistical tests simultaneously, increasing the probability of obtaining false-positive results (Type I errors).

Multivariate Analysis

A statistical technique used to analyse data that arises from more than one variable.

MVT

Multivariate Testing (MVT) is a technique that tests multiple variations of multiple components simultaneously to determine the best combination.

N

Negative Binomial Distribution

The negative binomial distribution is a probability distribution that models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures occur.

Non-Inferiority Test

A non-inferiority test is a statistical test used to determine whether a new treatment or intervention is not worse than an existing standard by more than a pre-specified non-inferiority margin. If you want to know more, we have a dedicated article that goes in depth on non-inferiority tests.

Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric and bell-shaped, widely used in statistics and probability theory.

Null Hypothesis

The null hypothesis (H0) is the default or baseline hypothesis in statistical testing, which assumes that there is no significant difference or effect between the groups or variables being studied.

O

Observational Study

An observational study is a non-experimental study in which researchers observe and measure variables without intervening or manipulating the system.

OEC

The Overall Evaluation Criterion (OEC) is a comprehensive metric or set of criteria used to evaluate the overall success or failure of an A/B test, often combining multiple metrics or Key Performance Indicators (KPIs) into a single measure.

One-Tailed Test

A statistical test that considers only one direction or tail of a distribution, used when the alternative hypothesis specifies the direction of the effect. It can test for superiority or inferiority, depending on the direction of interest.

Odds Ratio

The odds ratio is a statistical measure used to quantify the association between an exposure and an outcome, often used in logistic regression and case-control studies.

Overpowered Experiment

An overpowered experiment is one with an excessively large sample size, resulting in the ability to detect even trivial or unimportant effects as statistically significant.

P

Paired T-Test

A statistical test used to compare the means of two related or paired samples, often used when the same subjects or items are measured under different conditions.

P-Value

The p-value is a measure of the strength of evidence against the null hypothesis, representing the probability of observing results as extreme as or more extreme than the observed results, assuming the null hypothesis is true.
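
As an illustration, a two-sided p-value for a two-proportion z-test can be sketched with the standard library alone (the function name is ours; a real analysis would typically use a statistics package):

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    # Two-sided p-value for a two-proportion z-test (normal approximation)
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 5.0% vs 5.8% conversion on 10,000 users each: p is roughly 0.01
p = two_proportion_p_value(500, 10000, 580, 10000)
```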

Pareto Principle

The Pareto principle, also known as the 80/20 rule, states that roughly 80% of the effects come from 20% of the causes.

Peeking

The practice of prematurely inspecting the results of an ongoing A/B test, which can lead to biased decisions and increased risk of false-positive findings.

Permutation Test

A permutation test is a non-parametric statistical test that involves rearranging or permuting the observed data to estimate the probability of obtaining a particular result under the null hypothesis.

Personalisation

Personalisation is the process of tailoring products, services, or experiences to individual users or customers based on their preferences, behaviors, and characteristics, with the aim of providing a more relevant and engaging experience.

Placebo Effect

The placebo effect is a phenomenon in which a person experiences a perceived benefit or improvement in their condition after receiving an inert or sham treatment, due to psychological factors rather than the treatment itself.

Poisson Distribution

The Poisson distribution is a discrete probability distribution that models the number of events occurring in a fixed interval of time or space, given a known average rate and independently of the time since the last event.

Population

The entire group or set of individuals, objects, or observations that a sample is intended to represent or make inferences about.

Power

See Statistical Power.

Power Analysis

Power analysis is a statistical technique used to determine the minimum sample size required to detect a specified effect size with a desired level of statistical power.
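
A rough sample-size sketch for a two-proportion test, using only the standard library (the helper name and the approximation are ours; dedicated tools give more exact answers):

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    # Approximate users needed per group for a two-proportion z-test.
    # mde_abs is the minimum detectable effect as an absolute difference
    # (e.g. 0.01 for a one-percentage-point change).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p = baseline_rate
    variance = p * (1 - p) + (p + mde_abs) * (1 - p - mde_abs)
    n = (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2
    return math.ceil(n)

# Detecting a 1-point lift on a 5% baseline needs roughly 8,000+ users per group
n = sample_size_per_group(0.05, 0.01)
```

Note how halving the minimum detectable effect roughly quadruples the required sample size.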

Practical Significance

Practical significance refers to the real-world or practical implications of a statistically significant result, considering factors such as effect size, cost, and relevance.

Propensity Analysis

Propensity analysis is a statistical technique used to estimate the likelihood or propensity of an individual or group to exhibit a particular behavior or characteristic, based on various factors or covariates.

Propensity Score Matching

Propensity score matching is a statistical technique used in causal inference to account for confounding factors and estimate the effect of a treatment or intervention by matching treated and control units based on their estimated propensity scores.

Q

Q-Q Plot

A Q-Q (quantile-quantile) plot is a graphical method used to compare the distributions of two datasets or to assess whether a dataset follows a specified theoretical distribution.

Quantile

A quantile is a value that divides a distribution into equal groups or intervals, such as quartiles, deciles, or percentiles.

Quasi-Experiment

A quasi-experiment is a type of study design that lacks random assignment of participants to treatment and control groups, but still aims to establish a cause-and-effect relationship between variables.

R

Randomization

Randomization is the process of randomly assigning participants or subjects to different treatment groups in an experiment, minimizing the potential for systematic bias.

Random Sample

A random sample is a subset of a population that is chosen in such a way that each member of the population has an equal chance of being selected.

Regression Analysis

Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables.

Relative Delta

The difference or change in a metric between the control group and a treatment group in an A/B test, expressed as a percentage of the control group value.

Relative Difference

The relative difference is a measure of the difference between two values, expressed as a proportion or percentage of the larger or reference value.

Resampling

Resampling is a statistical technique that involves repeatedly drawing samples from a dataset and analysing them to estimate the properties of a population or to test hypotheses.

Response Rate

The response rate is the percentage of users who complete a desired action or conversion in an experiment or campaign.

Risk Ratio

The risk ratio is a measure of the relative risk or probability of an event occurring in one group compared to another group, often used in epidemiological studies.

Return on Investment (ROI)

A performance measure used to evaluate the efficiency or profitability of an investment or compare the efficiency of a number of different investments.

S

Sample Ratio Mismatch (SRM)

A situation in an A/B test where the ratio of users or traffic allocated to the control and treatment groups deviates from the intended or planned ratio.
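
A stdlib-only sketch of the usual SRM check, a chi-square goodness-of-fit test against the planned split (the function name is ours; for one degree of freedom the chi-square tail equals a two-sided normal tail):

```python
import math

def srm_p_value(observed_a, observed_b, expected_ratio=0.5):
    # Chi-square goodness-of-fit p-value for a 50/50 (by default) split.
    # A very small p-value (commonly < 0.001) signals a sample ratio mismatch.
    total = observed_a + observed_b
    expected_a = total * expected_ratio
    expected_b = total * (1 - expected_ratio)
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom: P(X > chi2) = 2 * (1 - Phi(sqrt(chi2)))
    return 2 * (1 - 0.5 * (1 + math.erf(math.sqrt(chi2) / math.sqrt(2))))

p = srm_p_value(5000, 5400)  # a noticeably uneven split; p falls below 0.001
```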

Sample Size

Sample size is the number of observations or data points included in a study or experiment.

Sampling Bias

Sampling bias occurs when the sample selected for a study or experiment is not representative of the target population, leading to inaccurate or biased results.

Scorecard

A scorecard is a visual tool used to measure and compare the performance of a project, campaign, or strategy against predefined metrics or goals.

Segmentation

The process of dividing a broad consumer or business market, normally consisting of existing and potential customers, into sub-groups of consumers (known as segments).

Sensitivity Analysis

A sensitivity analysis is a technique used to evaluate the impact of changes in input variables or assumptions on the output or results of a model or analysis.

Sequential Testing

Sequential testing is a method of conducting an experiment in which data is analysed periodically, and the experiment is stopped or continued based on predefined stopping rules or criteria.

Session

A series of interactions one user takes within a given time frame on your website.

Shapiro-Wilk Test

The Shapiro-Wilk test is a statistical test used to determine if a sample of data follows a normal distribution.

Šidák Correction

A statistical adjustment method used to control for multiple comparisons and reduce the probability of false-positive results (Type I errors) when conducting multiple hypothesis tests.
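
As a sketch (helper name ours), the Šidák-adjusted per-comparison level is slightly less conservative than Bonferroni's simple division:

```python
def sidak_alpha(alpha, num_tests):
    # Šidák-adjusted per-comparison significance level
    return 1 - (1 - alpha) ** (1 / num_tests)

# For alpha = 0.05 and 4 tests: about 0.0127, versus Bonferroni's 0.0125
adjusted = sidak_alpha(0.05, 4)
```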

Signal-to-Noise Ratio

The signal-to-noise ratio (SNR) is a measure of the strength of a desired signal relative to the level of background noise or interference.

Simpson's Paradox

Simpson's paradox is a phenomenon in which a trend or pattern observed in different groups is reversed or contradicted when the groups are combined.

Split

The process of dividing traffic or users into different groups for an A/B test.

Standard Deviation

The standard deviation is a measure of the dispersion or spread of a dataset around its mean, indicating the typical distance between data points and the mean.

Standard Error

The standard error is a measure of the accuracy or precision of a statistic or estimate, representing the expected deviation of the statistic from the true population parameter.
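
For the mean, the standard error is the sample standard deviation divided by the square root of the sample size; a stdlib sketch (function name ours):

```python
import math
import statistics

def standard_error(sample):
    # Standard error of the mean: sample standard deviation / sqrt(n)
    return statistics.stdev(sample) / math.sqrt(len(sample))

se = standard_error([4, 8, 6, 5, 7])
```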

Statistical Power

Statistical power is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true, or the ability of a test to detect an effect if it exists.

Statistical Significance

Statistical significance refers to the likelihood that an observed result or effect is not due to chance alone, but rather represents a real or meaningful difference or relationship.

Student's t-Test

The Student's t-test is a statistical test used to determine if the means of two groups are significantly different from each other.
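
A minimal SciPy example with illustrative made-up revenue-per-user samples:

```python
from scipy import stats

# Illustrative revenue-per-user samples for two variants
control = [20.1, 22.3, 19.8, 23.4, 21.0, 20.7, 22.9, 21.5]
treatment = [23.0, 24.1, 22.8, 25.6, 23.9, 24.4, 22.5, 25.0]

t_stat, p_value = stats.ttest_ind(control, treatment)
# Pass equal_var=False to run Welch's t-test when variances may differ.
```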

Success Metric

The primary metric or Key Performance Indicator (KPI) used to evaluate the success of a variation in an A/B test.

Superiority Test

A statistical test used to determine whether a new treatment or intervention is superior or better than an existing standard or control.

Survivorship Bias

Survivorship bias is a logical error that occurs when analysis or conclusions are based only on the remaining or surviving entities, ignoring those that did not survive or were eliminated.

T

Test Metric

The specific metric or Key Performance Indicator (KPI) that is being measured and evaluated in an A/B test or experiment.

Test Plan

A detailed document outlining the objectives, hypotheses, metrics, sample sizes, duration, and other aspects of an A/B test or experiment.

Traffic Percentage

The proportion of traffic or users allocated to each variation in an A/B test.

Treatment Group

The group of participants or subjects who receive the experimental or new condition, treatment, or variation in an experiment.

True Negative

A correct result in which a test correctly identifies the absence of a condition or effect.

True Positive

A correct result in which a test correctly identifies the presence of a condition or effect.

T-Test for Proportions

A statistical test used to determine if the difference between two proportions or percentages is statistically significant.

Two-Tailed Test

A statistical test that considers both tails or directions of a distribution, used when the alternative hypothesis does not specify the direction of the effect.

U

Underpowered Experiment

An experiment that does not have enough statistical power to detect a meaningful effect, often due to an insufficient sample size.

Unique Visitor

A visitor to a website who is counted only once during a specified time period, regardless of how many times they visit the site.

User Experience (UX)

A person's emotions and attitudes about using a particular product, system or service.

V

Variance

A measure of the spread or dispersion of a set of data points around the mean value.

Variation

A different version or treatment that is tested against a control group in an A/B test or experiment.

Visitor

A person or consumer who visits a website.

W

Welch's T-Test

A variation of the Student's t-test that is used when the two samples have unequal variances or unequal sample sizes.

Wilcoxon Signed-Rank Test

A non-parametric statistical test used to compare two related or paired samples and determine if there is a significant difference between their median values.

Winner

The variation or treatment group that performs better than the control group in an A/B test, based on the defined success metric.

Y

Year-over-Year (YoY)

A method of evaluating performance by comparing data from one period to the same period in the previous year.

Z

Z-Score

A measure of how many standard deviations a data point is from the mean of a distribution.
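
As a one-line sketch (function name ours):

```python
def z_score(x, mean, std_dev):
    # Number of standard deviations x lies from the mean
    return (x - mean) / std_dev

z = z_score(130, 100, 15)  # 2.0: two standard deviations above the mean
```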

Z-Test

A statistical test used to determine if the means of two populations are significantly different, based on the assumption of a normal distribution.

Sections

Whether you're here to learn, do pre-test analysis or post-test analysis, the cards below link to the most relevant page.
Test Analysis
Use the power of statistics to analyse your A/B test. In a few clicks, you'll be able to find out about statistical significance, confidence intervals, sample ratio mismatch and much more...
Sample Sizing
Pre-test analysis: find out how many samples you need to run your experiment. It allows for different power and confidence levels, multiple variations (with Šidák correction) and multiple test setups.
Utilities
In this section you'll find a few useful standalone calculators: standard deviation from a group of observations, a normality checker for test data, sample ratio mismatch...
Resources
Interested in learning more about the theory behind A/B testing? Here you can find interesting reads from some of the most authoritative sources, plus documentation to upskill yourself on this topic.