Knowledge bases give your AI agents the context they need to have informed conversations with leads. Instead of relying solely on the system prompt, agents can retrieve relevant information from your knowledge base during each conversation turn using RAG (Retrieval-Augmented Generation). This guide covers every way to build and maintain a knowledge base: manual document uploads, automated website crawling, and webhook sources for live-updating content.

Understanding knowledge base types

Naturalead supports two types of knowledge bases:

Internal

For proprietary content such as product documentation, pricing sheets, FAQs, and sales playbooks. Content is uploaded manually or via webhook sources.

External

For publicly available content such as website pages, blog posts, and help center articles. Content is typically ingested via sitemap crawling.
The type is informational and does not affect how the AI retrieves content. Choose the type that best describes the source of your content so your team can manage knowledge bases effectively.

Prerequisites

Before starting, make sure you have:
  • A Naturalead account with an API key or dashboard access
  • The knowledge:upload permission (granted to owner, integrator, and ai_architect roles)
  • Content to ingest (text documents, a website URL, or an external API endpoint)

Step 1: Create a knowledge base

Create the knowledge base

Every document, crawl job, and webhook source belongs to a knowledge base. Start by creating one.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Documentation",
    "description": "Official product docs and FAQs",
    "type": "internal"
  }'
Python
import requests

response = requests.post(
    "https://your-api.naturalead.com/api/knowledge-bases",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "name": "Product Documentation",
        "description": "Official product docs and FAQs",
        "type": "internal",
    },
)

kb = response.json()
print(f"Created KB: {kb['_id']}")
Node.js
const response = await fetch(
  "https://your-api.naturalead.com/api/knowledge-bases",
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Product Documentation",
      description: "Official product docs and FAQs",
      type: "internal",
    }),
  }
);

const kb = await response.json();
console.log(`Created KB: ${kb._id}`);
Save the returned _id — you will need it for all subsequent operations.

Step 2: Upload documents manually

For content you already have as text (FAQs, product specs, sales scripts), upload documents directly.
Upload a document

Each document is chunked and synced to the vector store so the AI can retrieve relevant sections during conversations.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Pricing FAQ",
    "content": "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
    "sourceType": "text"
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/documents",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "title": "Pricing FAQ",
        "content": "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
        "sourceType": "text",
    },
)

doc = response.json()
print(f"Uploaded document: {doc['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/documents`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      title: "Pricing FAQ",
      content:
        "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
      sourceType: "text",
    }),
  }
);

const doc = await response.json();
console.log(`Uploaded document: ${doc._id}`);
Verify your documents

List all documents in the knowledge base to confirm the upload.
curl
curl https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.get(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/documents",
    headers={"X-API-Key": "nl_live_your_key"},
)

documents = response.json()
print(f"Total documents: {len(documents)}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/documents`,
  {
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const documents = await response.json();
console.log(`Total documents: ${documents.length}`);
Documents are automatically chunked and indexed in the vector store. There is no separate indexing step required.
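Because the server handles chunking, no preprocessing is required. For very large files, though, you may still prefer to split the text into several titled documents client-side so each upload request stays small and each part gets a descriptive title. A minimal sketch; the helper name and the 4,000-character threshold are illustrative, not part of the API:

```python
def split_into_documents(title, content, max_chars=4000):
    """Split long text into multiple titled documents at paragraph
    boundaries, so no single upload exceeds max_chars. A single
    paragraph longer than max_chars is kept whole."""
    paragraphs = content.split("\n\n")
    parts, current = [], ""
    for p in paragraphs:
        candidate = f"{current}\n\n{p}" if current else p
        if len(candidate) > max_chars and current:
            parts.append(current)
            current = p
        else:
            current = candidate
    if current:
        parts.append(current)
    return [
        {"title": f"{title} (part {i + 1})", "content": part, "sourceType": "text"}
        for i, part in enumerate(parts)
    ]
```

Each returned dict matches the document body shape shown above and can be POSTed to the /documents endpoint in a loop.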

Step 3: Crawl a website via sitemap

For external knowledge bases, you can automatically ingest content from a website by crawling its sitemap.
Scan the sitemap first

Before starting a crawl, scan the sitemap to see which URLs are available. This lets you review the pages before committing to a full crawl.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/sitemap-scan \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "sitemapUrl": "https://example.com/sitemap.xml" }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/sitemap-scan",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={"sitemapUrl": "https://example.com/sitemap.xml"},
)

result = response.json()
print(f"Found {result['total']} URLs")
for url in result["urls"][:5]:
    print(f"  - {url}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/sitemap-scan`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      sitemapUrl: "https://example.com/sitemap.xml",
    }),
  }
);

const result = await response.json();
console.log(`Found ${result.total} URLs`);
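Since the scan response includes the full urls list, you can narrow it to the sections worth ingesting (for example, only documentation pages) and crawl just that selection via the urls parameter the crawl endpoint accepts. A sketch; the helper and its filter rules are illustrative:

```python
def select_urls(urls, include_prefixes, exclude_substrings=()):
    """Keep only URLs containing one of the include prefixes, dropping
    any that contain an excluded substring (e.g. tag or pagination pages)."""
    selected = []
    for url in urls:
        if not any(p in url for p in include_prefixes):
            continue
        if any(s in url for s in exclude_substrings):
            continue
        selected.append(url)
    return selected
```

Pass the filtered list as the urls array when starting the crawl in the next step.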
Start the crawl

Launch a crawl job to fetch and ingest the pages. You can control the scope with maxPages and optionally enable LLM distillation to extract clean content from HTML.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-sitemap \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "sitemapUrl": "https://example.com/sitemap.xml",
    "maxPages": 200,
    "useLlmDistill": true
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/crawl-sitemap",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "sitemapUrl": "https://example.com/sitemap.xml",
        "maxPages": 200,
        "useLlmDistill": True,
    },
)

job = response.json()
print(f"Crawl job started: {job['jobId']} (status: {job['status']})")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/crawl-sitemap`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      sitemapUrl: "https://example.com/sitemap.xml",
      maxPages: 200,
      useLlmDistill: true,
    }),
  }
);

const job = await response.json();
console.log(`Crawl job started: ${job.jobId} (status: ${job.status})`);
You can also pass a urls array instead of sitemapUrl to crawl specific pages without needing a sitemap.

Monitor crawl progress

Check the crawl job status to track how many pages have been processed.
curl
curl https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-jobs/JOB_ID \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.get(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/crawl-jobs/{job_id}",
    headers={"X-API-Key": "nl_live_your_key"},
)

job = response.json()
print(f"Status: {job['status']}")
print(f"Pages processed: {job.get('pagesProcessed', 0)}")
print(f"Pages failed: {job.get('pagesFailed', 0)}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/crawl-jobs/${jobId}`,
  {
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const job = await response.json();
console.log(`Status: ${job.status}`);
console.log(`Pages processed: ${job.pagesProcessed ?? 0}`);
console.log(`Pages failed: ${job.pagesFailed ?? 0}`);
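Instead of checking by hand, you can poll the status endpoint until the job settles. A sketch, assuming the job's status field eventually reaches a value like completed or failed; the exact terminal status strings are an assumption, so verify them against the job payloads your API returns:

```python
import time

# Assumed terminal status values; confirm against your crawl-job payloads.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_crawl(get_job, poll_seconds=10, timeout_seconds=1800):
    """Poll a crawl job until it reaches a terminal status or we time out.

    get_job is any zero-argument callable returning the job dict, e.g. a
    wrapper around the GET /crawl-jobs/{job_id} request shown above."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl still {job.get('status')} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Taking the fetch logic as a callable keeps the loop testable and lets you reuse it with whichever HTTP client you prefer.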

Step 4: Manage crawl jobs

You can pause, resume, or cancel crawl jobs at any time. Pages already processed remain in the knowledge base regardless of the action taken.

Pause

Temporarily stop a running crawl. Useful if you need to reduce API load or review intermediate results.

Resume

Continue a paused crawl from where it left off. No pages are re-processed.

Cancel

Permanently stop a crawl. The job cannot be resumed after cancellation.
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-jobs/JOB_ID/pause \
  -H "X-API-Key: nl_live_your_key"
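The three lifecycle actions can share one small helper. Note that only the pause route is documented above; the idea that resume and cancel live at sibling .../resume and .../cancel routes is my assumption, so check your API reference before relying on it:

```python
BASE_URL = "https://your-api.naturalead.com/api"

def crawl_job_action_url(kb_id, job_id, action):
    """Build the URL for a crawl-job lifecycle action. 'pause' is
    documented above; 'resume' and 'cancel' are assumed to follow the
    same route pattern."""
    if action not in ("pause", "resume", "cancel"):
        raise ValueError(f"unknown crawl-job action: {action}")
    return f"{BASE_URL}/knowledge-bases/{kb_id}/crawl-jobs/{job_id}/{action}"
```

POST to the returned URL with the X-API-Key header, exactly as in the pause example above.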

Step 5: Set up webhook sources for auto-updating content

Webhook sources let you automatically pull content from external APIs on a schedule. This is useful for keeping your knowledge base in sync with content that changes frequently, such as a CMS, helpdesk, or product catalog.
Create a webhook source

A webhook source can be a simple URL fetch or a multi-step flow. At minimum, provide a name and either a url or steps array.
curl (simple URL)
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Catalog Sync",
    "url": "https://api.example.com/products/export",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer ext_api_token"
    },
    "intervalMinutes": 60,
    "timeoutMs": 30000
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "name": "Product Catalog Sync",
        "url": "https://api.example.com/products/export",
        "method": "GET",
        "headers": {"Authorization": "Bearer ext_api_token"},
        "intervalMinutes": 60,
        "timeoutMs": 30000,
    },
)

webhook = response.json()
print(f"Webhook source created: {webhook['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Product Catalog Sync",
      url: "https://api.example.com/products/export",
      method: "GET",
      headers: { Authorization: "Bearer ext_api_token" },
      intervalMinutes: 60,
      timeoutMs: 30000,
    }),
  }
);

const webhook = await response.json();
console.log(`Webhook source created: ${webhook._id}`);
Trigger a webhook manually

Test your webhook source by triggering it immediately, without waiting for the scheduled interval.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks/WH_ID/trigger \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{}'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks/{wh_id}/trigger",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={},
)

print(f"Triggered: {response.json()['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks/${whId}/trigger`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}),
  }
);

const result = await response.json();
console.log(`Triggered: ${result._id}`);
Toggle a webhook source on or off

Disable a webhook source to stop scheduled fetches without deleting its configuration.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks/WH_ID/toggle \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks/{wh_id}/toggle",
    headers={"X-API-Key": "nl_live_your_key"},
)

webhook = response.json()
print(f"Webhook enabled: {webhook['enabled']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks/${whId}/toggle`,
  {
    method: "POST",
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const webhook = await response.json();
console.log(`Webhook enabled: ${webhook.enabled}`);
Webhook sources store authentication credentials (headers, auth config) in the database. Use dedicated service accounts with minimal permissions for external API access, and rotate credentials regularly.
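One way to follow that advice in practice is to keep the external token out of your source code and inject it from the environment when you build the request body. A sketch; the CATALOG_API_TOKEN variable name is illustrative:

```python
import os

def webhook_source_payload(name, url, interval_minutes=60, timeout_ms=30000):
    """Build the webhook-source request body, pulling the external API
    token from the environment so it never lands in source control."""
    token = os.environ["CATALOG_API_TOKEN"]  # illustrative variable name
    return {
        "name": name,
        "url": url,
        "method": "GET",
        "headers": {"Authorization": f"Bearer {token}"},
        "intervalMinutes": interval_minutes,
        "timeoutMs": timeout_ms,
    }
```

The resulting dict can be sent as the JSON body of the webhook-creation request shown earlier, and rotating the credential then only requires updating the environment variable and re-saving the source.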

Step 6: Attach a knowledge base to an agent

Once your knowledge base has content, attach it to an AI agent configuration so conversations can use it for RAG retrieval.
curl -X PUT https://your-api.naturalead.com/api/agent-config \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "knowledgeBaseId": "KB_ID"
  }'
When a conversation is active, the AI agent queries the attached knowledge base on each inbound message to retrieve relevant context before generating a response.

Cleanup and maintenance

Delete a document

Remove a single document from a knowledge base. The document is also removed from the vector store.
curl -X DELETE https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents/DOC_ID \
  -H "X-API-Key: nl_live_your_key"

Delete a knowledge base

Deleting a knowledge base removes all associated documents, crawl jobs, and webhook sources.
This action is irreversible. All documents, crawl jobs, and webhook source configurations within the knowledge base will be permanently deleted.
curl -X DELETE https://your-api.naturalead.com/api/knowledge-bases/KB_ID \
  -H "X-API-Key: nl_live_your_key"

Summary

| Method | Best for | Update frequency |
| --- | --- | --- |
| Manual upload | Small, static content (FAQs, scripts) | As needed |
| Sitemap crawl | Website content, help centers, blogs | One-time or periodic re-crawl |
| Webhook source | Dynamic content from APIs, CMS, databases | Scheduled (configurable interval) |
Combine all three methods within a single knowledge base to give your AI agent the most comprehensive context for lead qualification conversations.