Knowledge bases give your AI agents the context they need to have informed conversations with leads. Instead of relying solely on the system prompt, agents can retrieve relevant information from your knowledge base during each conversation turn using RAG (Retrieval-Augmented Generation). This guide covers every way to build and maintain a knowledge base: manual document uploads, automated website crawling, and webhook sources for live-updating content.

Understanding knowledge base types

Naturalead supports two types of knowledge bases:

Internal

For proprietary content such as product documentation, pricing sheets, FAQs, and sales playbooks. Content is uploaded manually or via webhook sources.

External

For publicly available content such as website pages, blog posts, and help center articles. Content is typically ingested via sitemap crawling.
The type is informational and does not affect how the AI retrieves content. Choose the type that best describes the source of your content so your team can manage knowledge bases effectively.

Prerequisites

Before starting, make sure you have:
  • A Naturalead account with an API key or dashboard access
  • The knowledge:upload permission (granted to owner, integrator, and ai_architect roles)
  • Content to ingest (text documents, a website URL, or an external API endpoint)

Step 1: Create a knowledge base

Create the knowledge base

Every document, crawl job, and webhook source belongs to a knowledge base. Start by creating one.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Documentation",
    "description": "Official product docs and FAQs",
    "type": "internal"
  }'
Python
import requests

response = requests.post(
    "https://your-api.naturalead.com/api/knowledge-bases",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "name": "Product Documentation",
        "description": "Official product docs and FAQs",
        "type": "internal",
    },
)

kb = response.json()
print(f"Created KB: {kb['_id']}")
Node.js
const response = await fetch(
  "https://your-api.naturalead.com/api/knowledge-bases",
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Product Documentation",
      description: "Official product docs and FAQs",
      type: "internal",
    }),
  }
);

const kb = await response.json();
console.log(`Created KB: ${kb._id}`);
Save the returned _id — you will need it for all subsequent operations.

Step 2: Upload documents manually

For content you already have as text (FAQs, product specs, sales scripts), upload documents directly.
Upload a document

Each document is chunked and synced to the vector store so the AI can retrieve relevant sections during conversations.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Pricing FAQ",
    "content": "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
    "sourceType": "text"
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/documents",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "title": "Pricing FAQ",
        "content": "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
        "sourceType": "text",
    },
)

doc = response.json()
print(f"Uploaded document: {doc['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/documents`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      title: "Pricing FAQ",
      content:
        "Q: What plans do you offer?\nA: We offer Starter ($29/mo), Pro ($99/mo), and Enterprise (custom pricing).\n\nQ: Is there a free trial?\nA: Yes, all plans include a 14-day free trial with no credit card required.",
      sourceType: "text",
    }),
  }
);

const doc = await response.json();
console.log(`Uploaded document: ${doc._id}`);
Verify your documents

List all documents in the knowledge base to confirm the upload.
curl
curl https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.get(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/documents",
    headers={"X-API-Key": "nl_live_your_key"},
)

documents = response.json()
print(f"Total documents: {len(documents)}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/documents`,
  {
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const documents = await response.json();
console.log(`Total documents: ${documents.length}`);
Documents are automatically chunked and indexed in the vector store. There is no separate indexing step required.
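Because the server handles chunking, no preprocessing is required. For very large files, though, you may still prefer to split the text into several titled documents client-side so each upload request stays small and each part gets a descriptive title. A minimal sketch; the helper name and the 4,000-character threshold are illustrative, not part of the API:

```python
def split_into_documents(title, content, max_chars=4000):
    """Split long text into multiple titled documents at paragraph
    boundaries, so no single upload exceeds max_chars. A single
    paragraph longer than max_chars is kept whole."""
    paragraphs = content.split("\n\n")
    parts, current = [], ""
    for p in paragraphs:
        candidate = f"{current}\n\n{p}" if current else p
        if len(candidate) > max_chars and current:
            parts.append(current)
            current = p
        else:
            current = candidate
    if current:
        parts.append(current)
    return [
        {"title": f"{title} (part {i + 1})", "content": part, "sourceType": "text"}
        for i, part in enumerate(parts)
    ]
```

Each returned dict matches the document body shape shown above and can be POSTed to the /documents endpoint in a loop.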

Step 3: Crawl a website via sitemap

For external knowledge bases, you can automatically ingest content from a website by crawling its sitemap.
Scan the sitemap first

Before starting a crawl, scan the sitemap to see which URLs are available. This lets you review the pages before committing to a full crawl.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/sitemap-scan \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "sitemapUrl": "https://example.com/sitemap.xml" }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/sitemap-scan",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={"sitemapUrl": "https://example.com/sitemap.xml"},
)

result = response.json()
print(f"Found {result['total']} URLs")
for url in result["urls"][:5]:
    print(f"  - {url}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/sitemap-scan`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      sitemapUrl: "https://example.com/sitemap.xml",
    }),
  }
);

const result = await response.json();
console.log(`Found ${result.total} URLs`);
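Since the scan response includes the full urls list, you can narrow it to the sections worth ingesting (for example, only documentation pages) and crawl just that selection via the urls parameter the crawl endpoint accepts. A sketch; the helper and its filter rules are illustrative:

```python
def select_urls(urls, include_prefixes, exclude_substrings=()):
    """Keep only URLs containing one of the include prefixes, dropping
    any that contain an excluded substring (e.g. tag or pagination pages)."""
    selected = []
    for url in urls:
        if not any(p in url for p in include_prefixes):
            continue
        if any(s in url for s in exclude_substrings):
            continue
        selected.append(url)
    return selected
```

Pass the filtered list as the urls array when starting the crawl in the next step.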
Start the crawl

Launch a crawl job to fetch and ingest the pages. You can control the scope with maxPages and optionally enable LLM distillation to extract clean content from HTML.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-sitemap \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "sitemapUrl": "https://example.com/sitemap.xml",
    "maxPages": 200,
    "useLlmDistill": true
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/crawl-sitemap",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "sitemapUrl": "https://example.com/sitemap.xml",
        "maxPages": 200,
        "useLlmDistill": True,
    },
)

job = response.json()
print(f"Crawl job started: {job['jobId']} (status: {job['status']})")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/crawl-sitemap`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      sitemapUrl: "https://example.com/sitemap.xml",
      maxPages: 200,
      useLlmDistill: true,
    }),
  }
);

const job = await response.json();
console.log(`Crawl job started: ${job.jobId} (status: ${job.status})`);
You can also pass a urls array instead of sitemapUrl to crawl specific pages without needing a sitemap.

Monitor crawl progress

Check the crawl job status to track how many pages have been processed.
curl
curl https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-jobs/JOB_ID \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.get(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/crawl-jobs/{job_id}",
    headers={"X-API-Key": "nl_live_your_key"},
)

job = response.json()
print(f"Status: {job['status']}")
print(f"Pages processed: {job.get('pagesProcessed', 0)}")
print(f"Pages failed: {job.get('pagesFailed', 0)}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/crawl-jobs/${jobId}`,
  {
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const job = await response.json();
console.log(`Status: ${job.status}`);
console.log(`Pages processed: ${job.pagesProcessed ?? 0}`);
console.log(`Pages failed: ${job.pagesFailed ?? 0}`);
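Instead of checking by hand, you can poll the status endpoint until the job settles. A sketch, assuming the job's status field eventually reaches a value like completed or failed; the exact terminal status strings are an assumption, so verify them against the job payloads your API returns:

```python
import time

# Assumed terminal status values; confirm against your crawl-job payloads.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_crawl(get_job, poll_seconds=10, timeout_seconds=1800):
    """Poll a crawl job until it reaches a terminal status or we time out.

    get_job is any zero-argument callable returning the job dict, e.g. a
    wrapper around the GET /crawl-jobs/{job_id} request shown above."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl still {job.get('status')} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Taking the fetch logic as a callable keeps the loop testable and lets you reuse it with whichever HTTP client you prefer.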

Step 4: Manage crawl jobs

You can pause, resume, or cancel crawl jobs at any time. Pages already processed remain in the knowledge base regardless of the action taken.

Pause

Temporarily stop a running crawl. Useful if you need to reduce API load or review intermediate results.

Resume

Continue a paused crawl from where it left off. No pages are re-processed.

Cancel

Permanently stop a crawl. The job cannot be resumed after cancellation.
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/crawl-jobs/JOB_ID/pause \
  -H "X-API-Key: nl_live_your_key"
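The three lifecycle actions can share one small helper. Note that only the pause route is documented above; the idea that resume and cancel live at sibling .../resume and .../cancel routes is my assumption, so check your API reference before relying on it:

```python
BASE_URL = "https://your-api.naturalead.com/api"

def crawl_job_action_url(kb_id, job_id, action):
    """Build the URL for a crawl-job lifecycle action. 'pause' is
    documented above; 'resume' and 'cancel' are assumed to follow the
    same route pattern."""
    if action not in ("pause", "resume", "cancel"):
        raise ValueError(f"unknown crawl-job action: {action}")
    return f"{BASE_URL}/knowledge-bases/{kb_id}/crawl-jobs/{job_id}/{action}"
```

POST to the returned URL with the X-API-Key header, exactly as in the pause example above.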

Step 5: Set up webhook sources for auto-updating content

Webhook sources let you automatically pull content from external APIs on a schedule. This is useful for keeping your knowledge base in sync with content that changes frequently, such as a CMS, helpdesk, or product catalog.
Create a webhook source

A webhook source can be a simple URL fetch or a multi-step flow. At minimum, provide a name and either a url or steps array.
curl (simple URL)
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Catalog Sync",
    "url": "https://api.example.com/products/export",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer ext_api_token"
    },
    "intervalMinutes": 60,
    "timeoutMs": 30000
  }'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={
        "name": "Product Catalog Sync",
        "url": "https://api.example.com/products/export",
        "method": "GET",
        "headers": {"Authorization": "Bearer ext_api_token"},
        "intervalMinutes": 60,
        "timeoutMs": 30000,
    },
)

webhook = response.json()
print(f"Webhook source created: {webhook['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Product Catalog Sync",
      url: "https://api.example.com/products/export",
      method: "GET",
      headers: { Authorization: "Bearer ext_api_token" },
      intervalMinutes: 60,
      timeoutMs: 30000,
    }),
  }
);

const webhook = await response.json();
console.log(`Webhook source created: ${webhook._id}`);
Trigger a webhook manually

Test your webhook source by triggering it immediately, without waiting for the scheduled interval.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks/WH_ID/trigger \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{}'
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks/{wh_id}/trigger",
    headers={
        "X-API-Key": "nl_live_your_key",
        "Content-Type": "application/json",
    },
    json={},
)

print(f"Triggered: {response.json()['_id']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks/${whId}/trigger`,
  {
    method: "POST",
    headers: {
      "X-API-Key": "nl_live_your_key",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}),
  }
);

const result = await response.json();
console.log(`Triggered: ${result._id}`);
Toggle a webhook source on or off

Disable a webhook source to stop scheduled fetches without deleting its configuration.
curl
curl -X POST https://your-api.naturalead.com/api/knowledge-bases/KB_ID/webhooks/WH_ID/toggle \
  -H "X-API-Key: nl_live_your_key"
Python
response = requests.post(
    f"https://your-api.naturalead.com/api/knowledge-bases/{kb_id}/webhooks/{wh_id}/toggle",
    headers={"X-API-Key": "nl_live_your_key"},
)

webhook = response.json()
print(f"Webhook enabled: {webhook['enabled']}")
Node.js
const response = await fetch(
  `https://your-api.naturalead.com/api/knowledge-bases/${kbId}/webhooks/${whId}/toggle`,
  {
    method: "POST",
    headers: { "X-API-Key": "nl_live_your_key" },
  }
);

const webhook = await response.json();
console.log(`Webhook enabled: ${webhook.enabled}`);
Webhook sources store authentication credentials (headers, auth config) in the database. Use dedicated service accounts with minimal permissions for external API access, and rotate credentials regularly.
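One way to follow that advice in practice is to keep the external token out of your source code and inject it from the environment when you build the request body. A sketch; the CATALOG_API_TOKEN variable name is illustrative:

```python
import os

def webhook_source_payload(name, url, interval_minutes=60, timeout_ms=30000):
    """Build the webhook-source request body, pulling the external API
    token from the environment so it never lands in source control."""
    token = os.environ["CATALOG_API_TOKEN"]  # illustrative variable name
    return {
        "name": name,
        "url": url,
        "method": "GET",
        "headers": {"Authorization": f"Bearer {token}"},
        "intervalMinutes": interval_minutes,
        "timeoutMs": timeout_ms,
    }
```

The resulting dict can be sent as the JSON body of the webhook-creation request shown earlier, and rotating the credential then only requires updating the environment variable and re-saving the source.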

Step 6: Attach a knowledge base to an agent

Once your knowledge base has content, attach it to an AI agent configuration so conversations can use it for RAG retrieval.
curl -X PUT https://your-api.naturalead.com/api/agent-config \
  -H "X-API-Key: nl_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "knowledgeBaseId": "KB_ID"
  }'
When a conversation is active, the AI agent queries the attached knowledge base on each inbound message to retrieve relevant context before generating a response.

Cleanup and maintenance

Delete a document

Remove a single document from a knowledge base. The document is also removed from the vector store.
curl -X DELETE https://your-api.naturalead.com/api/knowledge-bases/KB_ID/documents/DOC_ID \
  -H "X-API-Key: nl_live_your_key"

Delete a knowledge base

Deleting a knowledge base removes all associated documents, crawl jobs, and webhook sources.
This action is irreversible. All documents, crawl jobs, and webhook source configurations within the knowledge base will be permanently deleted.
curl -X DELETE https://your-api.naturalead.com/api/knowledge-bases/KB_ID \
  -H "X-API-Key: nl_live_your_key"

Summary

| Method | Best for | Update frequency |
| --- | --- | --- |
| Manual upload | Small, static content (FAQs, scripts) | As needed |
| Sitemap crawl | Website content, help centers, blogs | One-time or periodic re-crawl |
| Webhook source | Dynamic content from APIs, CMS, databases | Scheduled (configurable interval) |
Combine all three methods within a single knowledge base to give your AI agent the most comprehensive context for lead qualification conversations.