The Model Context Protocol (MCP) is an open standard that lets AI assistants use external tools. Instead of copy-pasting data into a chat, your AI can call ABTestResult's statistical tools directly and give you instant, accurate analysis.
“Is my test significant?” — your AI calls the right tool with the right parameters.
All calculations run on your machine. No data leaves your computer. No API keys needed.
Identical statistical methods to our web calculators. Trusted by thousands of experimenters.
Just describe what you need — your AI handles the rest.
Get instant significance testing with p-values, confidence intervals, and lift calculations.
“My A/B test had 5,000 users per group. Control converted at 5%, variant at 6%. Is it significant?”
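Under the hood this is a standard two-proportion Z-test. Here is a minimal, stdlib-only Python sketch of the same calculation (the function name is illustrative, not the tool's API; the tool's exact output may differ in rounding):

```python
import math

def two_proportion_z_test(n1, x1, n2, x2):
    """Two-sided Z-test for the difference between two conversion rates."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # two-sided tail probability
    return z, p_value

# 5,000 users per group, 5% vs 6% conversion
z, p = two_proportion_z_test(5000, 250, 5000, 300)
print(round(p, 4))  # ≈ 0.028 → significant at 95% confidence
```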
Calculate how many users you need before launching a test.
“How many users do I need to detect a 10% lift on a 3% conversion rate with 80% power?”
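The arithmetic behind that prompt is the standard two-sample power calculation. A simplified sketch, assuming a 50/50 split and no multiple-comparison correction, so the result can differ slightly from the tool's Sidak-corrected output:

```python
import math

Z_ALPHA = 1.959964  # two-sided 95% confidence
Z_BETA = 0.841621   # 80% power

def sample_size_per_group(baseline_rate, relative_mde):
    """Users needed per group to detect a relative lift on a baseline rate."""
    p1 = baseline_rate
    p2 = p1 * (1 + relative_mde)
    delta = p2 - p1
    p_bar = (p1 + p2) / 2                                # average rate across groups
    variance = 2 * p_bar * (1 - p_bar)
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 * variance / delta ** 2)

# 3% baseline, 10% relative lift, 80% power, 95% confidence
n = sample_size_per_group(0.03, 0.10)
```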
Get probability to be best, expected loss, and credible intervals.
“Run a Bayesian analysis: control 10K users / 500 conversions, variant 10K / 580 conversions”
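The probability-to-be-best number comes from sampling the Beta posteriors of each variant. A stdlib-only Monte Carlo sketch, assuming a uniform Beta(1, 1) prior (the tool may use different priors or more simulations):

```python
import random

def probability_to_be_best(control, variant, sims=20000, seed=42):
    """Share of Monte Carlo draws in which the variant's rate beats control's."""
    rng = random.Random(seed)
    c_users, c_conv = control
    v_users, v_conv = variant
    wins = 0
    for _ in range(sims):
        p_c = rng.betavariate(1 + c_conv, 1 + c_users - c_conv)
        p_v = rng.betavariate(1 + v_conv, 1 + v_users - v_conv)
        wins += p_v > p_c
    return wins / sims

# Control: 10,000 users / 500 conversions; Variant: 10,000 / 580
prob = probability_to_be_best((10000, 500), (10000, 580))
```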
Find the smallest effect you can detect with your available traffic.
“What's the smallest effect I can detect with 50,000 total users and a 5% baseline?”
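This is the sample-size formula inverted: fix the traffic, solve for the effect. A simplified sketch assuming a 50/50 split, 95% confidence, and 80% power:

```python
import math

Z_ALPHA = 1.959964  # two-sided 95% confidence
Z_BETA = 0.841621   # 80% power

def minimum_detectable_effect(total_traffic, baseline_rate):
    """Smallest absolute rate change detectable, plus the relative lift it implies."""
    n_per_group = total_traffic / 2
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    mde_abs = (Z_ALPHA + Z_BETA) * se
    return mde_abs, mde_abs / baseline_rate

# 50,000 total users, 5% baseline
mde_abs, mde_rel = minimum_detectable_effect(50000, 0.05)
```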
Detect Sample Ratio Mismatch before it invalidates your results.
“Check if my traffic split is valid: control 10,234 users, variant 9,766 users”
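The SRM check is a chi-square goodness-of-fit test against the expected split. A minimal sketch (one degree of freedom, assuming a two-group experiment):

```python
import math

def srm_check(control_users, variant_users, expected_ratio=0.5):
    """Chi-square goodness-of-fit test (1 df) for an experiment's traffic split."""
    total = control_users + variant_users
    expected_control = total * expected_ratio
    expected_variant = total * (1 - expected_ratio)
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (variant_users - expected_variant) ** 2 / expected_variant)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value

chi2, p = srm_check(10234, 9766)
# p < 0.001 → this split is very unlikely under a fair 50/50; investigate before trusting results
```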
Analyze before/after or matched-pair data with automatic test selection.
“Run a paired test on these before [10, 12, 8, 15] and after [12, 14, 9, 18] measurements”
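For the paired case the test statistic is the mean of the per-pair differences divided by its standard error. A sketch using the prompt's data; since the Python stdlib has no t-distribution CDF, it compares against the hard-coded two-sided 95% critical value for 3 degrees of freedom instead of computing an exact p-value:

```python
import math

def paired_t_statistic(before, after):
    """t statistic for the mean of paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

t = paired_t_statistic([10, 12, 8, 15], [12, 14, 9, 18])
T_CRIT_975_DF3 = 3.182  # two-sided 95% critical value, 3 degrees of freedom
significant = abs(t) > T_CRIT_975_DF3
```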
Add ABTestResult to your AI assistant in under a minute.
```json
{
  "mcpServers": {
    "abtestresult": {
      "command": "npx",
      "args": ["-y", "abtestresult-mcp"]
    }
  }
}
```

That's it. No API keys, no authentication, no server to run. The MCP server starts automatically when your AI needs it. Just ask a question:
My test ran for 2 weeks. Control: 12,000 visitors with 480 conversions. Variant: 12,000 visitors with 552 conversions. Is this significant?
8 statistical tools your AI assistant can call. Same calculations as our web tools.
`analyze_ab_test`
Analyze an A/B test with rate/proportion metrics (conversion rate, CTR). Uses a Z-test for proportions.
- `control_users` (number) — Users in the control group
- `control_conversions` (number) — Conversions in control
- `variant_users` (number) — Users in the variant group
- `variant_conversions` (number) — Conversions in variant
- `confidence_level` (number, optional) — Default: 0.95
- `test_type` (string, optional) — "two-sided" or "one-sided". Default: "two-sided"

Your AI will return something like:

```json
{
  "significant": true,
  "p_value": 0.028295,
  "control_rate": "5%",
  "variant_rate": "6%",
  "relative_lift": "20%",
  "confidence_interval_diff": { "lower": "0.11%", "upper": "1.89%" }
}
```

`analyze_ab_test_average`
Analyze an A/B test with continuous metrics (revenue per user, time on page). Uses a pooled T-test.
- `control_users` (number) — Users in control
- `control_mean` (number) — Mean value for control
- `control_std_dev` (number) — Standard deviation for control
- `variant_users` (number) — Users in variant
- `variant_mean` (number) — Mean value for variant
- `variant_std_dev` (number) — Standard deviation for variant
- `confidence_level` (number, optional) — Default: 0.95

Your AI will return something like:

```json
{
  "significant": true,
  "p_value": 0.000142,
  "control_mean": 45.2,
  "variant_mean": 48.8,
  "absolute_lift": 3.6,
  "relative_lift": "7.96%",
  "confidence_interval_diff": { "lower": 1.78, "upper": 5.42 }
}
```

`calculate_sample_size`
Calculate the required sample size for an A/B test. Supports rate and average metrics with Sidak correction.
- `metric_type` (string) — "rate" or "average"
- `baseline` (number) — Baseline value (e.g. 5 for a 5% conversion rate)
- `mde` (number) — Minimum detectable effect as a relative % (e.g. 10 for 10%)
- `confidence_level` (number, optional) — Default: 0.95
- `power` (number, optional) — Default: 0.80
- `daily_traffic` (number, optional) — To estimate duration in days

Your AI will return something like:

```json
{
  "samples_per_group": 14748,
  "total_samples": 29496,
  "estimated_days": 6,
  "effect_size": 0.0165,
  "absolute_mde": 0.005
}
```

`calculate_mde`
Calculate the minimum detectable effect given a fixed sample size. Answers: "What's the smallest lift I can detect?"
- `total_traffic` (number) — Total traffic across all groups
- `baseline` (number) — Baseline rate (%) or mean
- `metric_type` (string, optional) — Default: "rate"
- `confidence_level` (number, optional) — Default: 0.95
- `power` (number, optional) — Default: 0.80

Your AI will return something like:

```json
{
  "mde_absolute": 0.004123,
  "mde_relative_percent": "8.25%",
  "interpretation": "You can detect an absolute change of 0.41% (8.25% relative lift) from the 5% baseline."
}
```

`bayesian_ab_test`
Bayesian A/B test analysis. Returns probability to be best, expected loss, and credible intervals. Supports multiple variants.
- `metric_type` (string) — "rate" or "average"
- `variants` (array) — Array of {name, users, conversions} or {name, users, mean, std_dev}
- `simulations` (number, optional) — Monte Carlo samples. Default: 100,000
- `credibility` (number, optional) — Credible interval level. Default: 0.95

Your AI will return something like:

```json
{
  "variants": [
    { "name": "Control", "probability_to_be_best": "2.3%", "expected_loss": 0.001 },
    { "name": "Variant B", "probability_to_be_best": "97.7%", "expected_loss": 0.00008 }
  ],
  "relative_lift_distribution": {
    "mean": "20.1%",
    "credible_interval": { "lower": "5.8%", "upper": "35.2%" }
  }
}
```

`check_srm`
Check for Sample Ratio Mismatch — detects bugs in experiment traffic splitting.
- `control_users` (number) — Users in control
- `variant_users` (number) — Users in variant

Your AI will return something like:

```json
{
  "has_mismatch": true,
  "p_value": 0.000038,
  "summary": "SRM DETECTED — The traffic split is 51.8%/48.2% instead of 50/50. DO NOT trust the results."
}
```

`paired_test`
Before/after paired analysis. Auto-selects a Paired T-Test or Wilcoxon Signed-Rank test based on data normality.
- `sample_before` (number[]) — Measurements before treatment
- `sample_after` (number[]) — Measurements after treatment (same length)
- `confidence_level` (number, optional) — Default: 0.95

Your AI will return something like:

```json
{
  "test_used": "Paired T-Test",
  "significant": true,
  "p_value": 0.003,
  "mean_before": 11.2,
  "mean_after": 13.4,
  "mean_difference": 2.2,
  "relative_difference": "19.6%"
}
```

`survey_sample_size`
Calculate the required sample size for surveys using Cochran's formula with finite population correction.
- `population` (number) — Total population size
- `margin_of_error` (number) — Margin of error as % (e.g. 5 for ±5%)
- `confidence_level` (number, optional) — Default: 0.95

Your AI will return something like:

```json
{
  "required_sample_size": 370,
  "population": 10000,
  "margin_of_error": "±5%",
  "response_rate_needed": "3.7%"
}
```

The same statistical methods available in our web tools.
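As a cross-check, Cochran's formula with finite population correction reproduces the survey example above. A stdlib-only sketch, assuming the worst-case proportion p = 0.5 (the tool's implementation may differ in detail):

```python
import math

Z = 1.959964  # two-sided 95% confidence

def survey_sample_size(population, margin_of_error, p=0.5):
    """Cochran's formula with finite population correction (p = 0.5 is worst case)."""
    n0 = Z ** 2 * p * (1 - p) / margin_of_error ** 2     # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return math.ceil(n)

n = survey_sample_size(10000, 0.05)  # → 370, matching the example response above
```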
Use these calculators directly in your browser — no setup required.