A/B testing in Naturalead lets you run controlled experiments across your AI agent configurations. Instead of guessing which system prompt, conversation stages, or qualification criteria work best, you can split incoming lead traffic between two or more agent variants and measure which one qualifies more leads, generates better conversations, or achieves higher reply rates. This guide walks you through creating an A/B test, running it, and interpreting the results.

How A/B testing works

When an A/B test is running, Naturalead automatically routes new conversations to different agent configurations based on the weights you assign. Each variant gets a proportional share of traffic, and all conversations are tracked independently so you can compare performance metrics side by side.
A/B tests operate at the conversation level. When a new conversation starts with a lead, the system selects a variant based on the configured weights and uses that variant’s agent configuration for the entire conversation.
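Conceptually, the weighted pick works like the following sketch (illustrative Python only, not Naturalead's internal code; the variant IDs are placeholders):

```python
import random

# Illustrative variants: each weight is a relative share of traffic.
variants = [
    {"agentConfigId": "variant-a", "weight": 50},
    {"agentConfigId": "variant-b", "weight": 50},
]

def pick_variant(variants):
    """Pick one variant with probability proportional to its weight."""
    weights = [v["weight"] for v in variants]
    return random.choices(variants, weights=weights, k=1)[0]

# The chosen variant's agent config is then used for the whole conversation.
chosen = pick_variant(variants)
print(chosen["agentConfigId"])
```

With 50/50 weights, each variant receives roughly half of new conversations over time.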

Prerequisites

Before setting up an A/B test, you need:

Multiple Agent Configs

At least two agent configurations with different system prompts, stages, or qualification criteria. Create them via the Bot page or the Agent Config API.

Active Lead Flow

Incoming leads or an active campaign so the test variants receive conversations to compare. A/B tests need a meaningful volume of conversations to produce statistically significant results; with fewer than about 50 conversations per variant, conclusions may be unreliable.

Step-by-step walkthrough

1. Create agent configuration variants
First, set up the agent configurations you want to compare. Each variant should differ in one or two dimensions so you can isolate what drives performance.
For example, you might compare a concise system prompt against a detailed one:
curl
# Variant A: Concise prompt
curl -X POST "${API_URL}/api/agent-config" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Agent - Concise",
    "systemPrompt": "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
    "goal": "Qualify leads efficiently",
    "stages": [
      { "name": "intro", "description": "Brief greeting" },
      { "name": "qualify", "description": "Ask key qualification questions" },
      { "name": "close", "description": "Summarize and next steps" }
    ]
  }'

# Variant B: Consultative prompt
curl -X POST "${API_URL}/api/agent-config" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Agent - Consultative",
    "systemPrompt": "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
    "goal": "Build trust and qualify leads through consultative conversation",
    "stages": [
      { "name": "rapport", "description": "Build rapport and understand context" },
      { "name": "discovery", "description": "Deep dive into pain points and needs" },
      { "name": "solution", "description": "Present relevant solutions" },
      { "name": "qualify", "description": "Assess fit and next steps" }
    ]
  }'
# Each response includes the created config's "_id". Save these values as
# VARIANT_A_ID and VARIANT_B_ID for the next step.
Python
import os
import requests

API_URL = "http://localhost:3001"
# Env var name is illustrative; supply your actual API key however you prefer.
API_KEY = os.environ.get("NATURALEAD_API_KEY", "your-api-key")

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}

# Variant A: Concise prompt
variant_a = requests.post(
    f"{API_URL}/api/agent-config",
    headers=headers,
    json={
        "name": "Sales Agent - Concise",
        "systemPrompt": "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
        "goal": "Qualify leads efficiently",
        "stages": [
            {"name": "intro", "description": "Brief greeting"},
            {"name": "qualify", "description": "Ask key qualification questions"},
            {"name": "close", "description": "Summarize and next steps"},
        ],
    },
).json()

# Variant B: Consultative prompt
variant_b = requests.post(
    f"{API_URL}/api/agent-config",
    headers=headers,
    json={
        "name": "Sales Agent - Consultative",
        "systemPrompt": "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
        "goal": "Build trust and qualify leads through consultative conversation",
        "stages": [
            {"name": "rapport", "description": "Build rapport and understand context"},
            {"name": "discovery", "description": "Deep dive into pain points and needs"},
            {"name": "solution", "description": "Present relevant solutions"},
            {"name": "qualify", "description": "Assess fit and next steps"},
        ],
    },
).json()

variant_a_id = variant_a["_id"]
variant_b_id = variant_b["_id"]
Node.js
const API_URL = "http://localhost:3001";
// Env var name is illustrative; supply your actual API key however you prefer.
const API_KEY = process.env.NATURALEAD_API_KEY;

const headers = {
  "X-API-Key": API_KEY,
  "Content-Type": "application/json",
};

// Variant A: Concise prompt
const variantAResponse = await fetch(`${API_URL}/api/agent-config`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Sales Agent - Concise",
    systemPrompt:
      "You are a sales assistant. Qualify leads quickly by asking about budget, timeline, and decision authority.",
    goal: "Qualify leads efficiently",
    stages: [
      { name: "intro", description: "Brief greeting" },
      { name: "qualify", description: "Ask key qualification questions" },
      { name: "close", description: "Summarize and next steps" },
    ],
  }),
});
const variantA = await variantAResponse.json();

// Variant B: Consultative prompt
const variantBResponse = await fetch(`${API_URL}/api/agent-config`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Sales Agent - Consultative",
    systemPrompt:
      "You are a consultative sales advisor. Build rapport, understand pain points deeply, and guide leads toward a solution that fits their needs.",
    goal: "Build trust and qualify leads through consultative conversation",
    stages: [
      { name: "rapport", description: "Build rapport and understand context" },
      { name: "discovery", description: "Deep dive into pain points and needs" },
      { name: "solution", description: "Present relevant solutions" },
      { name: "qualify", description: "Assess fit and next steps" },
    ],
  }),
});
const variantB = await variantBResponse.json();

const variantAId = variantA._id;
const variantBId = variantB._id;
2. Create the A/B test
Create a test with your two variants. Assign weights to control traffic distribution. The test starts in draft status.
curl
curl -X POST "${API_URL}/api/ab-tests" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Concise vs Consultative - Q2 2026",
    "description": "Testing whether a concise direct approach or a consultative approach yields higher qualification rates.",
    "variants": [
      { "agentConfigId": "'"${VARIANT_A_ID}"'", "weight": 50 },
      { "agentConfigId": "'"${VARIANT_B_ID}"'", "weight": 50 }
    ]
  }'
Python
test = requests.post(
    f"{API_URL}/api/ab-tests",
    headers=headers,
    json={
        "name": "Concise vs Consultative - Q2 2026",
        "description": "Testing whether a concise direct approach or a consultative approach yields higher qualification rates.",
        "variants": [
            {"agentConfigId": variant_a_id, "weight": 50},
            {"agentConfigId": variant_b_id, "weight": 50},
        ],
    },
).json()

test_id = test["_id"]
print(f"Created A/B test: {test_id}")
print(f"Status: {test['status']}")  # "draft"
Node.js
const testResponse = await fetch(`${API_URL}/api/ab-tests`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Concise vs Consultative - Q2 2026",
    description:
      "Testing whether a concise direct approach or a consultative approach yields higher qualification rates.",
    variants: [
      { agentConfigId: variantAId, weight: 50 },
      { agentConfigId: variantBId, weight: 50 },
    ],
  }),
});

const test = await testResponse.json();
const testId = test._id;
console.log(`Created A/B test: ${testId}`);
console.log(`Status: ${test.status}`); // "draft"
Weights do not need to add up to 100. They represent relative proportions. A test with weights 50/50 splits traffic evenly, while 70/30 sends 70% of conversations to the first variant.
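To see how relative weights translate into traffic shares, a quick sketch:

```python
def traffic_share(weights):
    """Convert relative weights to fractional traffic shares."""
    total = sum(weights)
    return [w / total for w in weights]

print(traffic_share([50, 50]))   # even split: [0.5, 0.5]
print(traffic_share([70, 30]))   # [0.7, 0.3]
print(traffic_share([2, 1, 1]))  # weights need not sum to 100: [0.5, 0.25, 0.25]
```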
3. Start the test
Transition the test from draft to running to begin routing conversations to variants.
curl
curl -X PATCH "${API_URL}/api/ab-tests/${TEST_ID}/status" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "status": "running" }'
Python
running_test = requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "running"},
).json()

print(f"Status: {running_test['status']}")  # "running"
Node.js
const startResponse = await fetch(
  `${API_URL}/api/ab-tests/${testId}/status`,
  {
    method: "PATCH",
    headers,
    body: JSON.stringify({ status: "running" }),
  }
);

const runningTest = await startResponse.json();
console.log(`Status: ${runningTest.status}`); // "running"
4. Monitor results
Check the test results to see how each variant is performing. The results endpoint provides per-variant metrics including qualification rates, average messages, and conversation duration.
curl
curl "${API_URL}/api/ab-tests/${TEST_ID}/results" \
  -H "X-API-Key: ${API_KEY}" \
  | jq '.variants[] | {agentConfigName, totalConversations, qualificationRate}'
Python
results = requests.get(
    f"{API_URL}/api/ab-tests/{test_id}/results",
    headers={"X-API-Key": API_KEY},
).json()

print(f"Total conversations: {results['totalConversations']}")
print(f"Winner: {results.get('winner', 'Not yet determined')}")
print(f"Confidence: {results.get('confidenceLevel', 'N/A')}")

for variant in results["variants"]:
    print(f"\n--- {variant.get('agentConfigName', variant['agentConfigId'])} ---")
    print(f"  Conversations: {variant['totalConversations']}")
    print(f"  Qualified: {variant['qualified']}")
    print(f"  Qualification rate: {variant['qualificationRate']:.1f}%")
    print(f"  Avg messages: {variant['avgMessagesPerConversation']:.1f}")
Node.js
const resultsResponse = await fetch(
  `${API_URL}/api/ab-tests/${testId}/results`,
  {
    headers: { "X-API-Key": API_KEY },
  }
);

const results = await resultsResponse.json();
console.log(`Total conversations: ${results.totalConversations}`);
console.log(`Winner: ${results.winner ?? "Not yet determined"}`);
console.log(`Confidence: ${results.confidenceLevel ?? "N/A"}`);

for (const variant of results.variants) {
  const name = variant.agentConfigName ?? variant.agentConfigId;
  console.log(`\n--- ${name} ---`);
  console.log(`  Conversations: ${variant.totalConversations}`);
  console.log(`  Qualified: ${variant.qualified}`);
  console.log(`  Qualification rate: ${variant.qualificationRate.toFixed(1)}%`);
  console.log(`  Avg messages: ${variant.avgMessagesPerConversation.toFixed(1)}`);
}
5. Pause or complete the test
You can pause a running test to temporarily stop routing conversations to variants, or complete it when you have enough data.
curl
# Pause the test
curl -X PATCH "${API_URL}/api/ab-tests/${TEST_ID}/status" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "status": "paused" }'

# Complete the test
curl -X PATCH "${API_URL}/api/ab-tests/${TEST_ID}/status" \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "status": "completed" }'
Python
# Pause the test
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "paused"},
)

# Resume later
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "running"},
)

# Complete the test when you have enough data
requests.patch(
    f"{API_URL}/api/ab-tests/{test_id}/status",
    headers=headers,
    json={"status": "completed"},
)
Node.js
// Pause the test
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "paused" }),
});

// Resume later
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "running" }),
});

// Complete the test when you have enough data
await fetch(`${API_URL}/api/ab-tests/${testId}/status`, {
  method: "PATCH",
  headers,
  body: JSON.stringify({ status: "completed" }),
});

A/B test lifecycle

An A/B test moves through these statuses:
Status      Description
draft       Initial state. Variants can be modified. Not routing traffic.
running     Actively routing conversations to variants based on weights.
paused      Temporarily halted. Can be resumed or completed.
completed   Test is finished. Results are final. Can be deleted.
Valid status transitions:
From      To
draft     running
running   paused
running   completed
paused    running
paused    completed
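The transition rules above can be expressed as a simple lookup table (illustrative sketch, not Naturalead's implementation):

```python
# Allowed status transitions, mirroring the lifecycle table above.
VALID_TRANSITIONS = {
    "draft": {"running"},
    "running": {"paused", "completed"},
    "paused": {"running", "completed"},
    "completed": set(),  # terminal state: no further transitions
}

def can_transition(current, target):
    """Return True if moving from `current` to `target` is allowed."""
    return target in VALID_TRANSITIONS.get(current, set())

print(can_transition("draft", "running"))    # True
print(can_transition("draft", "completed"))  # False: must run first
```

Note that a draft test cannot jump straight to completed, and a completed test cannot be reopened.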
Variants can only be modified while the test is in draft status. Once a test has started running, changing variants would invalidate the results. If you need different variants, create a new test.

Tips for effective A/B testing

Change one variable

Modify only one dimension per test (e.g., system prompt tone, number of stages, or qualification criteria). Changing multiple variables makes it impossible to attribute performance differences.

Use equal weights

Start with 50/50 splits for the clearest comparison. Use unequal weights only when you want to limit exposure to an experimental variant.

Wait for significance

Let the test run until you have at least 50-100 conversations per variant. The results endpoint includes a confidence level indicator to help you decide when to conclude.
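If you want to sanity-check significance yourself before concluding, a standard two-proportion z-test is one option (illustrative only; the platform's confidence indicator may use a different method):

```python
import math

def two_proportion_z(qualified_a, total_a, qualified_b, total_b):
    """Two-proportion z-test for a difference in qualification rates."""
    p_a = qualified_a / total_a
    p_b = qualified_b / total_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (qualified_a + qualified_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Hypothetical numbers: 40/100 qualified for A vs 25/100 for B.
z = two_proportion_z(40, 100, 25, 100)
print(f"z = {z:.2f}")  # |z| > 1.96 is roughly significant at the 95% level
```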

Apply the winner

Once you identify the best-performing variant, update your campaigns and default agent configuration to use the winning agent config. Then set up a new test to continue optimizing.