Experimentation Team Models
Five ways to structure your experimentation team, from startup to enterprise. Find the model that fits your organisation.
How you structure your experimentation team shapes everything: how many tests you run, how reliable the results are, how fast teams learn, and whether experimentation actually changes how decisions get made.
There is no universal answer. A three-person startup and a 10,000-person enterprise need fundamentally different setups. What works at one stage often breaks at the next.
This guide covers the five main models companies use, the trade-offs behind each, and practical guidance for choosing the right one. We also look at how organisations typically evolve between models as they grow, and the key roles you will need along the way.
At a Glance: Comparing the Five Models
| Model | Velocity | Quality | Best For | Main Risk |
|---|---|---|---|---|
| Centralized | Low to moderate | High | Small/mid-size, early programs | Bottleneck at scale |
| Decentralized | Very high | Variable | Large, mature orgs | Inconsistency and chaos |
| Center of Excellence | High | High | Medium to large, scaling up | Falling back to centralized |
| Hub-and-Spoke | High | High | Multi-product enterprises | Communication gaps |
| Embedded | Very high per team | Variable | Strong product-led orgs | Knowledge silos |
1. Centralized Experimentation Team
A single, dedicated team of specialists runs all experiments for the entire organisation. They own the full workflow: strategy, research, hypothesis creation, test design, development, deployment, analysis, and reporting. No one else in the company touches experiments.
This team typically reports to Marketing, Product, or a Digital/Growth function. It usually consists of 3 to 10 people depending on company size: a mix of data scientists, analysts, UX researchers, developers, and a program manager.
Other departments submit ideas to the central team, which prioritises and executes them. The team spends 100% of its time on experimentation and CRO.
Advantages
- Deep specialisation and concentrated expertise
- Consistent methodology and quality across all tests
- Centralised knowledge base prevents duplicated work
- Easier to get started with a small, focused team
- Strong quality control and statistical rigour
Disadvantages
- Becomes a bottleneck as demand grows (breaks at 200+ tests/year)
- Territorial conflicts with product teams who own the same areas
- Product teams feel disempowered and excluded from testing
- Experimentation culture stays confined to one team
- Central team may lack deep context on specific product areas
Best for
Early-stage experimentation programs still proving their value. Small to mid-size companies with limited resources. Organisations running fewer than 200 experiments per year. RS Group (FTSE 100) initially used a centralised team but found it could only test 5% of shipped features before restructuring.
2. Decentralized Experimentation
Experimentation responsibility is distributed across the organisation. There is no central experimentation team. Each product, marketing, or engineering team owns and runs their own experiments with their own analysts or data scientists. Teams independently ideate, design, build, launch, and analyse their tests.
This is how the most experiment-heavy companies in the world operate. At Booking.com, any employee can launch an experiment on millions of customers without management permission. They run over 1,000 experiments simultaneously. Amazon's two-pizza teams (5 to 8 people) function as autonomous mini-startups, each owning end-to-end experimentation.
Jeff Bezos famously said: "Our success is a function of the number of experiments we run per year, per month, per day."
Advantages
- Maximum speed with no queues or dependencies on a central team
- Deep domain context leads to better hypotheses
- Product teams feel ownership over experiment outcomes
- Scales to thousands of experiments per year
- Creates a company-wide experimentation culture
Disadvantages
- No centralised oversight makes quality control difficult
- Some teams go rogue with poorly designed experiments
- Knowledge silos and duplicated work across teams
- Experiment interference when teams test overlapping user groups
- Statistical rigour risks without shared standards
Best for
Large, mature organisations with strong existing data cultures. Companies that already operate in a decentralised manner across other functions. Requires strong shared tooling and platform infrastructure. Spotify runs tens of thousands of experiments annually across 300+ teams using this model.
3. Center of Excellence (CoE)
The Center of Excellence combines the best parts of centralized and decentralized models. A dedicated central team sets standards, builds shared tools, manages governance, shares best practices, and measures overall program impact. Meanwhile, individual product teams own their own testing roadmaps and execute experiments relevant to their domain.
The central team's mission is enablement, not execution. Think "teach someone to fish." The CoE owns governance: everyone uses the same tools, technologies, and processes, and every experiment is recorded in a central system of record. But the CoE typically does not come up with test ideas, build tests, or ship features.
This is the model adopted by many of the most mature experimentation organisations. As Harvard Business School professor Stefan Thomke observed when studying Booking.com: "The centralising of our experimentation infrastructure is what makes our organisational decentralisation possible."
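That central system of record can start as nothing more than a shared schema every team logs experiments against. A hypothetical minimal record, with field names that are illustrative rather than any particular tool's:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    """One entry in a CoE-governed experiment registry."""
    name: str
    team: str
    hypothesis: str
    primary_metric: str
    status: str = "draft"  # draft -> running -> analysed -> archived
    learnings: list[str] = field(default_factory=list)

# The registry itself, keyed by name so results are discoverable org-wide.
registry: dict[str, ExperimentRecord] = {}

def log_experiment(record: ExperimentRecord) -> None:
    """Register an experiment; duplicate names are rejected so every
    test has exactly one canonical entry."""
    if record.name in registry:
        raise ValueError(f"duplicate experiment name: {record.name}")
    registry[record.name] = record
```

The point is less the data structure than the contract: one required schema, one place to look, no experiment that exists outside it.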
Advantages
- Combines central expertise with decentralised speed
- Scales without the central team becoming a bottleneck
- Systematic knowledge sharing across the organisation
- Consistent tooling and methodology everywhere
- Actively drives cultural transformation toward experimentation
Disadvantages
- Requires significant investment in both central and distributed teams
- Tends to fall back into centralized bottleneck if not managed well
- Without strong leadership support, can be seen as overhead
- More moving parts and coordination complexity
- Product teams need baseline experimentation knowledge to benefit
A mature CoE rests on five operating principles:
- **Alignment:** experimentation roadmaps ladder up to company-wide growth goals and KPIs.
- **Governance:** clear ownership, accountability, and strategic alignment for every test.
- **Democratisation:** anyone in the organisation can submit ideas and contribute hypotheses.
- **Transparency:** all experiment results and learnings are accessible to everyone centrally.
- **Recognition:** experimentation contributions are rewarded and publicly acknowledged.
Best for
Medium to large enterprises scaling from a centralized model. Organisations with leadership buy-in for broader investment in experimentation. Companies running 200 to 1,000+ experiments per year. FARFETCH, RS Group, and Optimizely all use variations of this model.
4. Hub-and-Spoke Model
The hub-and-spoke model has a central "hub" that owns the experimentation platform, tooling, process standards, and governance. Distributed "spoke" teams are embedded within specific business domains or product areas. Each spoke specialises in its domain, works with local stakeholders, and drives experiments relevant to their area.
The key structural difference from a standard CoE is the reporting relationship. Spokes report directly into their business unit, with a dotted line back to the hub. The hub gathers standardised metric reporting from each spoke to monitor overall program health.
Netflix uses this approach: a central experimentation platform team builds infrastructure, while data science teams across the company design and analyse experiments independently. Discover Financial operates with six spoke teams, five embedded in lines of business, each with a dotted line back to the hub.
Advantages
- New spoke teams scale without requiring hub growth
- Spokes develop deep domain expertise in their business area
- Central standards with local flexibility
- Clear career paths between spokes and the hub
Disadvantages
- Cross-spoke learning requires deliberate mechanisms
- Dual reporting creates priority ambiguity
- Without strong hub governance, spokes can drift apart
- Requires a mature, well-resourced hub to support multiple spokes
5. Embedded Specialists
Experimentation specialists (data scientists, analysts, experimentation engineers) are placed directly within product teams. They sit with the team, attend standups, share the same objectives, and run experiments as part of the normal product development workflow.
The distinction from the pure decentralized model is that these are dedicated experimentation specialists embedded within product teams, not generalists wearing an experimentation hat on the side.
Airbnb was an early adopter. It brought on its first data scientist, Riley Newman, as one of its earliest employees, and today data scientists are allocated to specific customer journeys to solve business problems alongside PMs and engineers. Spotify's squads operate as mini-startups with embedded data practitioners, backed by a central experimentation platform.
Advantages
- Deep product context leads to better experiment design
- No handoffs or queues; the specialist moves at the team's pace
- Strong alignment between experiments and product goals
- Experimentation becomes part of daily workflow, not a separate process
Disadvantages
- Specialists become isolated from their peers, which stunts their professional growth
- Variable quality across teams with no shared standards
- Expensive: requires hiring specialists for every team
- Learnings stay locked within individual teams
- Nobody sees the big picture of experimentation across the organisation
How Companies Evolve Between Models
Most organisations follow a predictable path. They don't jump straight to a decentralized model. They earn their way there through progressive stages of maturity, and each stage has a natural team structure.
Centralized
1 to 25 experiments per year
A small centralized team is formed. Focus is on proving value, establishing processes, and building credibility with leadership. One person may play multiple roles.
CoE forming
25 to 60 experiments per year
The central team starts training others, creating documentation, and building self-service tools. Product teams begin participating. A champion network forms.
Full CoE
60 to 600 experiments per year
The CoE governs standards while multiple product teams run their own tests. Velocity scales to 5 to 10 tests per month per team. Central team focuses on platform, training, and governance.
Decentralized
600 to 10,000+ experiments per year
Experimentation is embedded in every team's workflow. The central function shifts to pure platform and infrastructure. Experimentation is simply how work gets done.
When to transition
Centralisation typically breaks down at around 200 to 500 experiments per year. The triggers for transition are: the central team is a bottleneck, product teams are demanding testing autonomy, leadership is committed to scaling experimentation, and you have enough experimentation-capable people across the organisation to distribute ownership.
Key Roles in Experimentation Teams
At the earliest stages, one person may fill several of these roles. As you scale, they become distinct positions. Here are the core functions you will need.
Experimentation Program Manager
Owns the overall program strategy, KPIs, and OKRs. Sets governance standards, manages stakeholder communication, synthesises cross-team learnings, and evangelises experimentation culture.
Data Scientist
Designs experiments with proper statistical methodology. Analyses results rigorously, guards against common pitfalls like peeking and underpowered tests, and advances the team's methodology.
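Guarding against underpowered tests starts with a sample-size calculation before launch. A sketch of the standard two-proportion approximation, using only the Python standard library; a real power analysis would also account for multiple metrics and sequential looks:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per variant needed to detect an absolute lift
    of `mde` over a baseline conversion rate with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = baseline + mde / 2               # pooled-rate approximation
    variance = 2 * p_bar * (1 - p_bar)
    return math.ceil(variance * (z_alpha + z_power) ** 2 / mde ** 2)
```

Detecting a one-point absolute lift on a 5% baseline needs roughly 8,000 users per arm, and halving the MDE roughly quadruples the requirement, which is why most underpowered tests were doomed at design time, not at analysis time.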
Experimentation Engineer
Implements tests in code through feature flags and A/B test configurations. Builds and maintains the experimentation platform. Handles QA, tracking, and experiment conflict management.
UX Researcher
Conducts qualitative research to generate hypotheses. Creates mockups for test variations. Ensures experiments align with user experience principles and defines success criteria.
Data Analyst
Analyses experiment data and presents findings in actionable form. Creates reports and dashboards for stakeholders. Monitors ongoing experiments and flags anomalies.
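One anomaly worth automating is sample ratio mismatch (SRM): if the observed traffic split deviates from the configured split, the results cannot be trusted regardless of the lift they show. A minimal check, sketched as a two-sided z-test on the observed proportion:

```python
import math
from statistics import NormalDist

def srm_p_value(control: int, treatment: int,
                expected_control: float = 0.5) -> float:
    """P-value for 'the observed split matches the configured split'.
    Very small values (e.g. < 0.001) suggest a broken assignment or
    logging pipeline rather than a real treatment effect."""
    total = control + treatment
    observed = control / total
    se = math.sqrt(expected_control * (1 - expected_control) / total)
    z = (observed - expected_control) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Running this on every live experiment and alerting below a strict threshold catches instrumentation bugs before anyone acts on a contaminated result.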
Experimentation Ambassador
A non-dedicated role: someone within a product team who advocates for experimentation. Connects their team to the CoE for best practices and identifies pain points. Essential for the CoE model.
Common Pitfalls to Avoid
Regardless of which model you choose, these are the mistakes that sink experimentation programs most often.
Treating experiments as validation
The number one mistake. If teams only run tests to confirm decisions already made, the program will stall. Experimentation should be about learning, not proving you were right.
Running too few experiments
Programs die from inactivity. If you are only running a handful of tests per quarter, you will never build enough momentum to demonstrate value or learn at a meaningful rate.
Overcomplicating the process
If running an experiment requires 15 approval steps and three weeks of setup, teams will avoid it. Make the process as frictionless as possible. You will learn more from ten imperfect tests than one perfect test that never launches.
Lack of leadership buy-in
Without executive support, experimentation becomes a box-ticking exercise. Leaders need to set the example by consulting experiment data before making decisions.
Not trusting the data
If nobody trusts the data, nobody will trust the experiments. Data quality and instrumentation must be rock solid before you can expect teams to act on results.
CoE falling back to centralized
The most common CoE failure. Distributed teams start relying on central experts instead of running tests themselves. You end up with a centralized team that just has a fancier name.
Which Model Should You Choose?
The answer depends on where you are today, not where you want to be in three years. Here are the signals to look for.
Go centralized
You are just getting started with experimentation. You need to prove value to leadership before investing more. You have limited budget and can only afford one small team. You are running fewer than 100 experiments per year.
Build a Center of Excellence
Your centralized team has become a bottleneck. You have leadership buy-in for broader investment. Multiple product teams are requesting experimentation capability. You are running 200 to 500+ experiments per year and need to scale.
Go decentralized or embedded
Your organisation already operates autonomously across other functions. You have strong data literacy everywhere. Teams have a mature understanding of experimentation. You are targeting 1,000+ experiments per year and need maximum velocity.
Use hub-and-spoke
You have multiple distinct business units or product lines. You need domain-specific experimentation with central governance. You want spokes to scale independently without scaling the hub every time.
Velocity Benchmarks by Maturity Level
| Maturity | Team Size | Experiments/Month | Experiments/Year | Typical Model |
|---|---|---|---|---|
| Crawl | 1 to 3 | 1 to 2 | ~12 to 25 | Centralized |
| Walk | 3 to 5 | 2 to 5 | ~25 to 60 | Centralized |
| Run | 5 to 10+ | 5 to 15 | ~60 to 180 | CoE / Hub-and-Spoke |
| Fly | 10+ across teams | 50 to 200+ | ~600 to 2,500+ | Decentralized / CoE |
| Elite | Hundreds of teams | 1,000s | 10,000s+ | Decentralized + platform |