Compare multiple AI agent configurations to find the best performer by splitting lead traffic across variants with measurable results.
A/B testing in Naturalead lets you run controlled experiments across your AI agent configurations. Instead of guessing which system prompt, conversation stages, or qualification criteria work best, you can split incoming lead traffic between two or more agent variants and measure which one qualifies more leads, generates better conversations, or achieves higher reply rates.

This guide walks you through creating an A/B test, running it, and interpreting the results.
When an A/B test is running, Naturalead automatically routes new conversations to different agent configurations based on the weights you assign. Each variant gets a proportional share of traffic, and all conversations are tracked independently so you can compare performance metrics side by side.
A/B tests operate at the conversation level. When a new conversation starts with a lead, the system selects a variant based on the configured weights and uses that variant’s agent configuration for the entire conversation.
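Conceptually, weighted selection behaves like a weighted random draw made once per new conversation. Here is a minimal sketch of that idea (not Naturalead's actual implementation; the variant IDs are illustrative):

```python
import random

def pick_variant(variants):
    """Return an agentConfigId with probability proportional to its weight."""
    total = sum(v["weight"] for v in variants)
    r = random.uniform(0, total)
    for v in variants:
        r -= v["weight"]
        if r <= 0:
            return v["agentConfigId"]
    return variants[-1]["agentConfigId"]

random.seed(42)  # deterministic for this demo
variants = [
    {"agentConfigId": "variant-a", "weight": 70},
    {"agentConfigId": "variant-b", "weight": 30},
]
picks = [pick_variant(variants) for _ in range(10_000)]
share_a = picks.count("variant-a") / len(picks)
print(f"variant-a share: {share_a:.2f}")  # close to 0.70
```

Because the draw happens only when a conversation starts, a lead never switches variants mid-conversation.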
Agent Configurations
At least two agent configurations with different system prompts, stages, or qualification criteria. Create them via the Bot page or the Agent Config API.
Active Lead Flow
Incoming leads or an active campaign so the test variants receive conversations to compare. A/B tests need sufficient sample size to produce meaningful results.
A/B tests require a meaningful volume of conversations to produce statistically significant results. Running a test with fewer than 50 conversations per variant may lead to unreliable conclusions.
First, set up the agent configurations you want to compare. Each variant should differ in one or two dimensions so you can isolate what drives performance.
For example, you might compare a concise system prompt against a detailed one:
curl
```bash
# Variant A: Concise prompt
curl -X POST "${API_URL}/api/agent-config" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Agent - Concise",
    "systemPrompt": "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
    "goal": "Qualify leads efficiently",
    "stages": [
      { "name": "intro", "description": "Brief greeting" },
      { "name": "qualify", "description": "Ask key qualification questions" },
      { "name": "close", "description": "Summarize and next steps" }
    ]
  }'

# Variant B: Consultative prompt
curl -X POST "${API_URL}/api/agent-config" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Agent - Consultative",
    "systemPrompt": "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
    "goal": "Build trust and qualify leads through consultative conversation",
    "stages": [
      { "name": "rapport", "description": "Build rapport and understand context" },
      { "name": "discovery", "description": "Deep dive into pain points and needs" },
      { "name": "solution", "description": "Present relevant solutions" },
      { "name": "qualify", "description": "Assess fit and next steps" }
    ]
  }'
```
Python
```python
import os

import requests

API_URL = "http://localhost:3001"
API_KEY = os.environ["API_KEY"]

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}

# Variant A: Concise prompt
variant_a = requests.post(
    f"{API_URL}/api/agent-config",
    headers=headers,
    json={
        "name": "Sales Agent - Concise",
        "systemPrompt": "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
        "goal": "Qualify leads efficiently",
        "stages": [
            {"name": "intro", "description": "Brief greeting"},
            {"name": "qualify", "description": "Ask key qualification questions"},
            {"name": "close", "description": "Summarize and next steps"},
        ],
    },
).json()

# Variant B: Consultative prompt
variant_b = requests.post(
    f"{API_URL}/api/agent-config",
    headers=headers,
    json={
        "name": "Sales Agent - Consultative",
        "systemPrompt": "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
        "goal": "Build trust and qualify leads through consultative conversation",
        "stages": [
            {"name": "rapport", "description": "Build rapport and understand context"},
            {"name": "discovery", "description": "Deep dive into pain points and needs"},
            {"name": "solution", "description": "Present relevant solutions"},
            {"name": "qualify", "description": "Assess fit and next steps"},
        ],
    },
).json()

variant_a_id = variant_a["_id"]
variant_b_id = variant_b["_id"]
```
Node.js
```javascript
const API_URL = "http://localhost:3001";
const API_KEY = process.env.API_KEY;

const headers = {
  "X-API-Key": API_KEY,
  "Content-Type": "application/json",
};

// Variant A: Concise prompt
const variantAResponse = await fetch(`${API_URL}/api/agent-config`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Sales Agent - Concise",
    systemPrompt:
      "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
    goal: "Qualify leads efficiently",
    stages: [
      { name: "intro", description: "Brief greeting" },
      { name: "qualify", description: "Ask key qualification questions" },
      { name: "close", description: "Summarize and next steps" },
    ],
  }),
});
const variantA = await variantAResponse.json();

// Variant B: Consultative prompt
const variantBResponse = await fetch(`${API_URL}/api/agent-config`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Sales Agent - Consultative",
    systemPrompt:
      "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
    goal: "Build trust and qualify leads through consultative conversation",
    stages: [
      { name: "rapport", description: "Build rapport and understand context" },
      { name: "discovery", description: "Deep dive into pain points and needs" },
      { name: "solution", description: "Present relevant solutions" },
      { name: "qualify", description: "Assess fit and next steps" },
    ],
  }),
});
const variantB = await variantBResponse.json();

const variantAId = variantA._id;
const variantBId = variantB._id;
```
Create the A/B test
Create a test with your two variants. Assign weights to control traffic distribution. The test starts in draft status.
curl
```bash
curl -X POST "${API_URL}/api/ab-tests" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Concise vs Consultative - Q2 2026",
    "description": "Testing whether a concise direct approach or a consultative approach yields higher qualification rates.",
    "variants": [
      { "agentConfigId": "'"${VARIANT_A_ID}"'", "weight": 50 },
      { "agentConfigId": "'"${VARIANT_B_ID}"'", "weight": 50 }
    ]
  }'
```
Python
```python
test = requests.post(
    f"{API_URL}/api/ab-tests",
    headers=headers,
    json={
        "name": "Concise vs Consultative - Q2 2026",
        "description": "Testing whether a concise direct approach or a consultative approach yields higher qualification rates.",
        "variants": [
            {"agentConfigId": variant_a_id, "weight": 50},
            {"agentConfigId": variant_b_id, "weight": 50},
        ],
    },
).json()

test_id = test["_id"]
print(f"Created A/B test: {test_id}")
print(f"Status: {test['status']}")  # "draft"
```
Weights do not need to add up to 100. They represent relative proportions. A test with weights 50/50 splits traffic evenly, while 70/30 sends 70% of conversations to the first variant.
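Since weights are relative, each variant's expected traffic share is its weight divided by the sum of all weights. A small illustration:

```python
def traffic_share(weights):
    """Convert relative weights into expected traffic fractions."""
    total = sum(weights)
    return [w / total for w in weights]

print(traffic_share([50, 50]))   # even split: [0.5, 0.5]
print(traffic_share([70, 30]))   # [0.7, 0.3]
print(traffic_share([2, 1, 1]))  # weights need not sum to 100: [0.5, 0.25, 0.25]
```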
Start the test
Transition the test from draft to running to begin routing conversations to variants.
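The same status endpoint used below for pausing and completing handles this transition. A small Python helper, continuing the example above (`test_id` comes from the create step):

```python
import requests

API_URL = "http://localhost:3001"

def set_test_status(test_id, status, api_key):
    """PATCH the A/B test status endpoint, e.g. draft -> running."""
    resp = requests.patch(
        f"{API_URL}/api/ab-tests/{test_id}/status",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={"status": status},
    )
    resp.raise_for_status()
    return resp.json()

# Start routing conversations to the variants:
# set_test_status(test_id, "running", API_KEY)
```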
Check the test results to see how each variant is performing. The results endpoint provides per-variant metrics including qualification rates, average messages, and conversation duration.
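A sketch of fetching those metrics in Python. Note that the `/results` path is an assumption modeled on the other `/api/ab-tests` routes; confirm the exact route in the API reference:

```python
import requests

API_URL = "http://localhost:3001"

def get_test_results(test_id, api_key):
    """Fetch per-variant metrics. NOTE: '/results' is a hypothetical path
    for the results endpoint described above."""
    resp = requests.get(
        f"{API_URL}/api/ab-tests/{test_id}/results",
        headers={"X-API-Key": api_key},
    )
    resp.raise_for_status()
    return resp.json()
```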
Python
```python
# Pause the test
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "paused"},
)

# Resume later
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "running"},
)

# Complete the test when you have enough data
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "completed"},
)
```
Node.js
```javascript
// Pause the test
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "paused" }),
});

// Resume later
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "running" }),
});

// Complete the test when you have enough data
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "completed" }),
});
```
| Status | Description |
| --- | --- |
| draft | Initial state. Variants can be modified. Not routing traffic. |
| running | Actively routing conversations to variants based on weights. |
| paused | Temporarily halted. Can be resumed or completed. |
| completed | Test is finished. Results are final. Can be deleted. |
Valid status transitions:
| From | To |
| --- | --- |
| draft | running |
| running | paused |
| running | completed |
| paused | running |
| paused | completed |
Variants can only be modified while the test is in draft status. Once a test has started running, changing variants would invalidate the results. If you need different variants, create a new test.
Test one variable at a time
Modify only one dimension per test (e.g., system prompt tone, number of stages, or qualification criteria). Changing multiple variables makes it impossible to attribute performance differences.
Use equal weights
Start with 50/50 splits for the clearest comparison. Use unequal weights only when you want to limit exposure to an experimental variant.
Wait for significance
Let the test run until you have at least 50-100 conversations per variant. The results endpoint includes a confidence level indicator to help you decide when to conclude.
Apply the winner
Once you identify the best-performing variant, update your campaigns and default agent configuration to use the winning agent config. Then set up a new test to continue optimizing.
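As a rough complement to the built-in confidence indicator, you can check whether a difference in qualification rates is statistically meaningful with a two-proportion z-test. A standard-library-only sketch (the counts are illustrative, not real results):

```python
import math

def two_proportion_z(qualified_a, total_a, qualified_b, total_b):
    """Approximate z-score for the difference between two qualification rates."""
    p_a = qualified_a / total_a
    p_b = qualified_b / total_b
    pooled = (qualified_a + qualified_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Illustrative counts: 42/100 leads qualified for variant A vs 28/100 for B
z = two_proportion_z(42, 100, 28, 100)
print(round(z, 2))  # |z| > 1.96 suggests significance at roughly the 95% level
```

If |z| stays below ~1.96, keep the test running and collect more conversations before declaring a winner.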