Experimentation Team Models
Five ways to structure your experimentation team, from startup to enterprise. Find the model that fits your organisation.
How you structure your experimentation team shapes everything: how many tests you run, how reliable the results are, how fast teams learn, and whether experimentation actually changes how decisions get made.
There is no universal answer. A three-person startup and a 10,000-person enterprise need fundamentally different setups. What works at one stage often breaks at the next.
This guide covers the five main models companies use, the trade-offs behind each, and practical guidance for choosing the right one. We also look at how organisations typically evolve between models as they grow, and the key roles you will need along the way.
At a Glance: Comparing the Five Models
| Model | Velocity | Quality | Best For | Main Risk |
|---|---|---|---|---|
| Centralized | Low to moderate | High | Small/mid-size, early programs | Bottleneck at scale |
| Decentralized | Very high | Variable | Large, mature orgs | Inconsistency and chaos |
| Center of Excellence | High | High | Medium to large, scaling up | Falling back to centralized |
| Hub-and-Spoke | High | High | Multi-product enterprises | Communication gaps |
| Embedded | Very high per team | Variable | Strong product-led orgs | Knowledge silos |
1. Centralized Experimentation Team
A single, dedicated team of specialists runs all experiments for the entire organisation. They own the full workflow: strategy, research, hypothesis creation, test design, development, deployment, analysis, and reporting. No one else in the company touches experiments.
This team typically reports to Marketing, Product, or a Digital/Growth function. It usually consists of 3 to 10 people depending on company size: a mix of data scientists, analysts, UX researchers, developers, and a program manager.
Other departments submit ideas to the central team, which prioritises and executes them. The team spends 100% of its time on experimentation and CRO.
Advantages
- Deep specialisation and concentrated expertise
- Consistent methodology and quality across all tests
- Centralised knowledge base prevents duplicated work
- Easier to get started with a small, focused team
- Strong quality control and statistical rigour
Disadvantages
- Becomes a bottleneck as demand grows (breaks at 200+ tests/year)
- Territorial conflicts with product teams who own the same areas
- Product teams feel disempowered and excluded from testing
- Experimentation culture stays confined to one team
- Central team may lack deep context on specific product areas
Best for
Early-stage experimentation programs still proving their value. Small to mid-size companies with limited resources. Organisations running fewer than 200 experiments per year. RS Group (FTSE 100) initially used a centralised team but found it could only test 5% of shipped features before restructuring.
2. Decentralized Experimentation
Experimentation responsibility is distributed across the organisation. There is no central experimentation team. Each product, marketing, or engineering team owns and runs their own experiments with their own analysts or data scientists. Teams independently ideate, design, build, launch, and analyse their tests.
This is how the most experiment-heavy companies in the world operate. At Booking.com, any employee can launch an experiment on millions of customers without management permission. They run over 1,000 experiments simultaneously. Amazon's two-pizza teams (5 to 8 people) function as autonomous mini-startups, each owning end-to-end experimentation.
Jeff Bezos famously said: "Our success is a function of the number of experiments we run per year, per month, per day."
Advantages
- Maximum speed with no queues or dependencies on a central team
- Deep domain context leads to better hypotheses
- Product teams feel ownership over experiment outcomes
- Scales to thousands of experiments per year
- Creates a company-wide experimentation culture
Disadvantages
- No centralised oversight makes quality control difficult
- Some teams go rogue with poorly designed experiments
- Knowledge silos and duplicated work across teams
- Experiment interference when teams test overlapping user groups
- Statistical rigour risks without shared standards
Best for
Large, mature organisations with strong existing data cultures. Companies that already operate in a decentralised manner across other functions. Requires strong shared tooling and platform infrastructure. Spotify runs tens of thousands of experiments annually across 300+ teams using this model.
3. Center of Excellence (CoE)
The Center of Excellence combines the best parts of centralized and decentralized models. A dedicated central team sets standards, builds shared tools, manages governance, shares best practices, and measures overall program impact. Meanwhile, individual product teams own their own testing roadmaps and execute experiments relevant to their domain.
The central team's mission is enablement, not execution. Think "teach someone to fish." The CoE owns governance: everyone uses the same tools, technologies, and processes, and every experiment is recorded in a central system of record. But the CoE typically does not come up with test ideas, build tests, or ship features.
This is the model adopted by many of the most mature experimentation organisations. As Harvard Business School professor Stefan Thomke observed when studying Booking.com: "The centralising of our experimentation infrastructure is what makes our organisational decentralisation possible."
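That central system of record can start as nothing more than a shared schema every team logs experiments against. A hypothetical minimal record, with field names that are illustrative rather than any particular tool's:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    """One entry in a CoE-governed experiment registry."""
    name: str
    team: str
    hypothesis: str
    primary_metric: str
    status: str = "draft"  # draft -> running -> analysed -> archived
    learnings: list[str] = field(default_factory=list)

# The registry itself, keyed by name so results are discoverable org-wide.
registry: dict[str, ExperimentRecord] = {}

def log_experiment(record: ExperimentRecord) -> None:
    """Register an experiment; duplicate names are rejected so every
    test has exactly one canonical entry."""
    if record.name in registry:
        raise ValueError(f"duplicate experiment name: {record.name}")
    registry[record.name] = record
```

The point is less the data structure than the contract: one required schema, one place to look, no experiment that exists outside it.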
Advantages
- Combines central expertise with decentralised speed
- Scales without the central team becoming a bottleneck
- Systematic knowledge sharing across the organisation
- Consistent tooling and methodology everywhere
- Actively drives cultural transformation toward experimentation
Disadvantages
- Requires significant investment in both central and distributed teams
- Tends to fall back into centralized bottleneck if not managed well
- Without strong leadership support, can be seen as overhead
- More moving parts and coordination complexity
- Product teams need baseline experimentation knowledge to benefit
A mature CoE rests on five operating principles:
- **Alignment:** experimentation roadmaps ladder up to company-wide growth goals and KPIs.
- **Governance:** clear ownership, accountability, and strategic alignment for every test.
- **Democratisation:** anyone in the organisation can submit ideas and contribute hypotheses.
- **Transparency:** all experiment results and learnings are accessible to everyone centrally.
- **Recognition:** experimentation contributions are rewarded and publicly acknowledged.
Best for
Medium to large enterprises scaling from a centralized model. Organisations with leadership buy-in for broader investment in experimentation. Companies running 200 to 1,000+ experiments per year. FARFETCH, RS Group, and Optimizely all use variations of this model.
4. Hub-and-Spoke Model
The hub-and-spoke model has a central "hub" that owns the experimentation platform, tooling, process standards, and governance. Distributed "spoke" teams are embedded within specific business domains or product areas. Each spoke specialises in its domain, works with local stakeholders, and drives experiments relevant to their area.
The key structural difference from a standard CoE is the reporting relationship. Spokes report directly into their business unit, with a dotted line back to the hub. The hub gathers standardised metric reporting from each spoke to monitor overall program health.
Netflix uses this approach: a central experimentation platform team builds infrastructure, while data science teams across the company design and analyse experiments independently. Discover Financial operates with six spoke teams, five embedded in lines of business, each with a dotted line back to the hub.
Advantages
- New spoke teams scale without requiring hub growth
- Spokes develop deep domain expertise in their business area
- Central standards with local flexibility
- Clear career paths between spokes and the hub
Disadvantages
- Cross-spoke learning requires deliberate mechanisms
- Dual reporting creates priority ambiguity
- Without strong hub governance, spokes can drift apart
- Requires a mature, well-resourced hub to support multiple spokes
5. Embedded Specialists
Experimentation specialists (data scientists, analysts, experimentation engineers) are placed directly within product teams. They sit with the team, attend standups, share the same objectives, and run experiments as part of the normal product development workflow.
The distinction from the pure decentralized model is that these are dedicated experimentation specialists embedded within product teams, not generalists wearing an experimentation hat on the side.
Airbnb was an early adopter. It brought on its first data scientist, Riley Newman, as one of its earliest employees, and today data scientists are allocated to specific customer journeys to solve business problems alongside PMs and engineers. Spotify's squads operate as mini-startups with embedded data practitioners, backed by a central experimentation platform.
Advantages
- Deep product context leads to better experiment design
- No handoffs or queues; the specialist moves at the team's pace
- Strong alignment between experiments and product goals
- Experimentation becomes part of daily workflow, not a separate process
Disadvantages
- Specialists become isolated from their peers, which stunts their professional growth
- Variable quality across teams with no shared standards
- Expensive: requires hiring specialists for every team
- Learnings stay locked within individual teams
- Nobody sees the big picture of experimentation across the organisation
How Companies Evolve Between Models
Most organisations follow a predictable path. They don't jump straight to a decentralized model. They earn their way there through progressive stages of maturity, and each stage has a natural team structure.
Centralized
1 to 25 experiments per year
A small centralized team is formed. Focus is on proving value, establishing processes, and building credibility with leadership. One person may play multiple roles.
CoE forming
25 to 60 experiments per year
The central team starts training others, creating documentation, and building self-service tools. Product teams begin participating. A champion network forms.
Full CoE
60 to 600 experiments per year
The CoE governs standards while multiple product teams run their own tests. Velocity scales to 5 to 10 tests per month per team. Central team focuses on platform, training, and governance.
Decentralized
600 to 10,000+ experiments per year
Experimentation is embedded in every team's workflow. The central function shifts to pure platform and infrastructure. Experimentation is simply how work gets done.
When to transition
Centralisation typically breaks down at around 200 to 500 experiments per year. The triggers for transition are: the central team is a bottleneck, product teams are demanding testing autonomy, leadership is committed to scaling experimentation, and you have enough experimentation-capable people across the organisation to distribute ownership.
Key Roles in Experimentation Teams
At the earliest stages, one person may fill several of these roles. As you scale, they become distinct positions. Here are the core functions you will need.
Experimentation Program Manager
Owns the overall program strategy, KPIs, and OKRs. Sets governance standards, manages stakeholder communication, synthesises cross-team learnings, and evangelises experimentation culture.
Data Scientist
Designs experiments with proper statistical methodology. Analyses results rigorously, guards against common pitfalls like peeking and underpowered tests, and advances the team's methodology.
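Guarding against underpowered tests starts with a sample-size calculation before launch. A sketch of the standard two-proportion approximation, using only the Python standard library; a real power analysis would also account for multiple metrics and sequential looks:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per variant needed to detect an absolute lift
    of `mde` over a baseline conversion rate with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = baseline + mde / 2               # pooled-rate approximation
    variance = 2 * p_bar * (1 - p_bar)
    return math.ceil(variance * (z_alpha + z_power) ** 2 / mde ** 2)
```

Detecting a one-point absolute lift on a 5% baseline needs roughly 8,000 users per arm, and halving the MDE roughly quadruples the requirement, which is why most underpowered tests were doomed at design time, not at analysis time.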
Experimentation Engineer
Implements tests in code through feature flags and A/B test configurations. Builds and maintains the experimentation platform. Handles QA, tracking, and experiment conflict management.
UX Researcher
Conducts qualitative research to generate hypotheses. Creates mockups for test variations. Ensures experiments align with user experience principles and defines success criteria.
Data Analyst
Analyses experiment data and presents findings in actionable form. Creates reports and dashboards for stakeholders. Monitors ongoing experiments and flags anomalies.
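One anomaly worth automating is sample ratio mismatch (SRM): if the observed traffic split deviates from the configured split, the results cannot be trusted regardless of the lift they show. A minimal check, sketched as a two-sided z-test on the observed proportion:

```python
import math
from statistics import NormalDist

def srm_p_value(control: int, treatment: int,
                expected_control: float = 0.5) -> float:
    """P-value for 'the observed split matches the configured split'.
    Very small values (e.g. < 0.001) suggest a broken assignment or
    logging pipeline rather than a real treatment effect."""
    total = control + treatment
    observed = control / total
    se = math.sqrt(expected_control * (1 - expected_control) / total)
    z = (observed - expected_control) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Running this on every live experiment and alerting below a strict threshold catches instrumentation bugs before anyone acts on a contaminated result.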
Experimentation Ambassador
A non-dedicated role: someone within a product team who advocates for experimentation. Connects their team to the CoE for best practices and identifies pain points. Essential for the CoE model.
Common Pitfalls to Avoid
Regardless of which model you choose, these are the mistakes that sink experimentation programs most often.
Treating experiments as validation
The number one mistake. If teams only run tests to confirm decisions already made, the program will stall. Experimentation should be about learning, not proving you were right.
Running too few experiments
Programs die from inactivity. If you are only running a handful of tests per quarter, you will never build enough momentum to demonstrate value or learn at a meaningful rate.
Overcomplicating the process
If running an experiment requires 15 approval steps and three weeks of setup, teams will avoid it. Make the process as frictionless as possible. You will learn more from ten imperfect tests than one perfect test that never launches.
Lack of leadership buy-in
Without executive support, experimentation becomes a box-ticking exercise. Leaders need to set the example by consulting experiment data before making decisions.
Not trusting the data
If nobody trusts the data, nobody will trust the experiments. Data quality and instrumentation must be rock solid before you can expect teams to act on results.
CoE falling back to centralized
The most common CoE failure. Distributed teams start relying on central experts instead of running tests themselves. You end up with a centralized team that just has a fancier name.
Which Model Should You Choose?
The answer depends on where you are today, not where you want to be in three years. Here are the signals to look for.
Go centralized
You are just getting started with experimentation. You need to prove value to leadership before investing more. You have limited budget and can only afford one small team. You are running fewer than 100 experiments per year.
Build a Center of Excellence
Your centralized team has become a bottleneck. You have leadership buy-in for broader investment. Multiple product teams are requesting experimentation capability. You are running 200 to 500+ experiments per year and need to scale.
Go decentralized or embedded
Your organisation already operates autonomously across other functions. You have strong data literacy everywhere. Teams have a mature understanding of experimentation. You are targeting 1,000+ experiments per year and need maximum velocity.
Use hub-and-spoke
You have multiple distinct business units or product lines. You need domain-specific experimentation with central governance. You want spokes to scale independently without scaling the hub every time.
Velocity Benchmarks by Maturity Level
| Maturity | Team Size | Experiments/Month | Experiments/Year | Typical Model |
|---|---|---|---|---|
| Crawl | 1 to 3 | 1 to 2 | ~12 to 25 | Centralized |
| Walk | 3 to 5 | 2 to 5 | ~25 to 60 | Centralized |
| Run | 5 to 10+ | 5 to 15 | ~60 to 180 | CoE / Hub-and-Spoke |
| Fly | 10+ across teams | 50 to 200+ | ~600 to 2,500+ | Decentralized / CoE |
| Elite | Hundreds of teams | 1,000s | 10,000s+ | Decentralized + platform |