This documentation explains how to train your replicas using the Sensay API. Training is essential for creating personalized replicas that can provide accurate and relevant responses based on your specific content.
What is a knowledge base?
A knowledge base is a collection of information that your replica uses to answer questions. It's the foundation of your replica's ability to provide accurate and contextually relevant responses. All training in Sensay relies on knowledge base entries.
Knowledge base workflow
Knowledge base entries follow different processing paths depending on their type (text, file, website, or YouTube). Each entry progresses through a series of status stages as it's processed:
Processing stages
A high-level view of the system is represented in this diagram. For each specific entry type, refer to the more detailed state descriptions below.
--- displayMode: compact --- stateDiagram-v2 direction LR classDef badBadEvent font-family:"Consolas, monaco, monospace",fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow classDef greenEvent font-family:"Consolas, monaco, monospace",fill:#0f0,color:black,font-weight:bold,stroke-width:2px,stroke:yellow classDef ms font-family:"Consolas, monaco, monospace"; [*] --> NEW:::ms NEW --> FILE_UPLOADED:::ms: File upload FILE_UPLOADED --> RAW_TEXT:::ms note left of RAW_TEXT Content extracted end note NEW --> RAW_TEXT:::ms: Crawler fetch RAW_TEXT --> PROCESSED_TEXT:::ms note left of PROCESSED_TEXT Content cleaned and optimized end note PROCESSED_TEXT --> VECTOR_CREATED:::greenEvent note left of VECTOR_CREATED Knowledge base updated and Replica is ready to use end note VECTOR_CREATED --> READY:::greenEvent note left of READY Knowledge base optimised and Replica is ready to use end note %% Error handling paths NEW --> UNPROCESSABLE:::badBadEvent FILE_UPLOADED --> UNPROCESSABLE:::badBadEvent RAW_TEXT --> UNPROCESSABLE:::badBadEvent PROCESSED_TEXT --> UNPROCESSABLE:::badBadEvent note left of UNPROCESSABLE The request cannot be handled and recovery is not possible end note
The processing pipeline automatically moves entries through these stages. Normally it will take about 5 minutes to process a new entry from NEW to VECTOR_CREATED, depending on the content size and type, but it might take longer for larger files, complex websites, or during peak hours.
It might take up to 24 hours for the system to process an entry from VECTOR_CREATED to READY.
An entry will be marked as UNPROCESSABLE only if processing is fundamentally not possible (e.g., corrupted files, URLs requiring authorization, private YouTube videos).
For temporary processing errors that might succeed on retry, the entry status remains unchanged but an error message is associated with the entry.
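The status rules above can be sketched as a small shell helper that sorts a status string into "usable", "failed", or "pending". The function name `kb_status_class` and the three labels are illustrative assumptions, not part of the Sensay API; only the status names come from this guide.

```shell
# Classify a knowledge base entry status (names taken from the diagrams above).
# Prints one of: usable, failed, pending, unknown.
# kb_status_class is a hypothetical helper, not an API feature.
kb_status_class() {
  case "$1" in
    VECTOR_CREATED|READY)
      echo "usable" ;;    # the replica can already answer from this entry
    UNPROCESSABLE)
      echo "failed" ;;    # recovery is not possible for this entry
    NEW|FILE_UPLOADED|RAW_TEXT|PROCESSED_TEXT)
      echo "pending" ;;   # still moving through the pipeline
    *)
      echo "unknown" ;;   # a status this sketch does not know about
  esac
}
```

For example, `kb_status_class RAW_TEXT` prints `pending`. Because temporary errors leave the status unchanged, a "pending" entry may still carry an error message worth inspecting.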
Processing paths by entry type
Different types of knowledge base entries follow different processing paths:
Text entries
stateDiagram-v2 direction LR classDef badBadEvent font-family:"Consolas, monaco, monospace",fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow classDef greenEvent font-family:"Consolas, monaco, monospace",fill:#0f0,color:black,font-weight:bold,stroke-width:2px,stroke:yellow classDef ms font-family:"Consolas, monaco, monospace"; [*] --> RAW_TEXT:::ms RAW_TEXT --> PROCESSED_TEXT:::ms note left of PROCESSED_TEXT Content cleaned and optimized end note PROCESSED_TEXT --> VECTOR_CREATED:::greenEvent note left of VECTOR_CREATED Knowledge base updated and Replica is ready to use end note VECTOR_CREATED --> READY:::greenEvent note left of READY Knowledge base optimised and Replica is ready to use end note %% Error handling paths RAW_TEXT --> UNPROCESSABLE:::badBadEvent: Error PROCESSED_TEXT --> UNPROCESSABLE:::badBadEvent: Error note right of UNPROCESSABLE The request cannot be handled and recovery is not possible end note
File entries
stateDiagram-v2 direction LR classDef badBadEvent font-family:"Consolas, monaco, monospace",fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow classDef greenEvent font-family:"Consolas, monaco, monospace",fill:#0f0,color:black,font-weight:bold,stroke-width:2px,stroke:yellow classDef ms font-family:"Consolas, monaco, monospace"; [*] --> NEW:::ms note right of NEW File waiting to be uploaded end note NEW --> FILE_UPLOADED:::ms: File upload FILE_UPLOADED --> RAW_TEXT:::ms note left of RAW_TEXT Text extracted from the file end note RAW_TEXT --> PROCESSED_TEXT:::ms note left of PROCESSED_TEXT Content cleaned and optimized end note PROCESSED_TEXT --> VECTOR_CREATED:::greenEvent note left of VECTOR_CREATED Knowledge base updated and Replica is ready to use end note VECTOR_CREATED --> READY:::greenEvent note left of READY Knowledge base optimised and Replica is ready to use end note %% Error handling paths NEW --> UNPROCESSABLE:::badBadEvent: Upload expired FILE_UPLOADED --> UNPROCESSABLE:::badBadEvent: Error (e.g. file is empty) RAW_TEXT --> UNPROCESSABLE:::badBadEvent: Error PROCESSED_TEXT --> UNPROCESSABLE:::badBadEvent: Error note right of UNPROCESSABLE The request cannot be handled and recovery is not possible end note
Website entries
stateDiagram-v2 direction LR classDef badBadEvent font-family:"Consolas, monaco, monospace",fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow classDef greenEvent font-family:"Consolas, monaco, monospace",fill:#0f0,color:black,font-weight:bold,stroke-width:2px,stroke:yellow classDef ms font-family:"Consolas, monaco, monospace"; [*] --> NEW:::ms note right of NEW Waiting for the crawler to fetch the content of the website end note NEW --> RAW_TEXT:::ms note left of RAW_TEXT Content extracted from the website end note RAW_TEXT --> PROCESSED_TEXT:::ms note left of PROCESSED_TEXT Content cleaned and optimized end note PROCESSED_TEXT --> VECTOR_CREATED:::greenEvent note left of VECTOR_CREATED Knowledge base updated and Replica is ready to use end note VECTOR_CREATED --> READY:::greenEvent note left of READY Knowledge base optimised and Replica is ready to use end note %% Error handling paths NEW --> UNPROCESSABLE:::badBadEvent: Error (e.g. URL cannot be accessed) RAW_TEXT --> UNPROCESSABLE:::badBadEvent: Error PROCESSED_TEXT --> UNPROCESSABLE:::badBadEvent: Error note right of UNPROCESSABLE The request cannot be handled and recovery is not possible end note
YouTube entries
stateDiagram-v2 direction LR classDef badBadEvent font-family:"Consolas, monaco, monospace",fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow classDef greenEvent font-family:"Consolas, monaco, monospace",fill:#0f0,color:black,font-weight:bold,stroke-width:2px,stroke:yellow classDef ms font-family:"Consolas, monaco, monospace"; [*] --> NEW:::ms note right of NEW Waiting for the crawler to fetch the content of the video end note NEW --> RAW_TEXT:::ms note left of RAW_TEXT Content extracted from the video end note RAW_TEXT --> PROCESSED_TEXT:::ms note left of PROCESSED_TEXT Content cleaned and optimized end note PROCESSED_TEXT --> VECTOR_CREATED:::greenEvent note left of VECTOR_CREATED Knowledge base updated and Replica is ready to use end note VECTOR_CREATED --> READY:::greenEvent note left of READY Knowledge base optimised and Replica is ready to use end note %% Error handling paths NEW --> UNPROCESSABLE:::badBadEvent: Error (e.g. Video is private) RAW_TEXT --> UNPROCESSABLE:::badBadEvent: Error PROCESSED_TEXT --> UNPROCESSABLE:::badBadEvent: Error note right of UNPROCESSABLE The request cannot be handled and recovery is not possible end note
Adding content to the knowledge base
There are four methods to add content to your replica's knowledge base:
Adding text content
Create a knowledge base entry with text content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json" \
-d '{
"text": "The way to the stars is written in starlight."
}'
Example response:
{
"success": true,
"results": [
{
"type": "TEXT",
"enqueued": true,
"knowledgeBaseID": 12345
}
]
}
Export the ID into a variable: export KNOWLEDGE_BASE_ID=
This creates a new knowledge base entry with your text content and automatically starts processing it.
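The export step can be scripted instead of copying the ID by hand. The following is a minimal sketch, assuming the response has the shape shown in the example above; `extract_kb_id` is a hypothetical helper name, and it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require. A JSON-aware tool such as `jq` would be more robust if it is available.

```shell
# Read a create-entry JSON response on stdin and print the first knowledgeBaseID.
# extract_kb_id is a hypothetical helper; it pattern-matches the field rather
# than parsing JSON, so it assumes the response shape shown in this guide.
extract_kb_id() {
  grep -o '"knowledgeBaseID"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Usage sketch (performs a real API call, so it is left commented out):
# export KNOWLEDGE_BASE_ID=$(curl -s -X POST \
#   "https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base" \
#   -H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
#   -H "Content-Type: application/json" \
#   -d '{"text": "The way to the stars is written in starlight."}' | extract_kb_id)
```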
Wait for the content to be trained
You will need to wait for the status to be either VECTOR_CREATED or READY. You can check the training status via polling.
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success":true,
"id":177030,
"replicaUUID":"db2cc1de-cbe9-46bf-a428-1144145b7311",
"type":"text",
"status":"VECTOR_CREATED"
}
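Polling can be wrapped in a loop so a script waits until the entry is usable. This is a sketch under stated assumptions: `wait_for_kb_entry`, its default of 60 attempts, and the 10-second delay are illustrative choices, not values recommended by Sensay, and the status is extracted by pattern matching rather than by a JSON parser. It expects `REPLICA_UUID`, `KNOWLEDGE_BASE_ID`, and `ORGANIZATION_SECRET` to be exported as in the commands above.

```shell
# Poll a knowledge base entry until it is usable, has failed, or attempts run out.
# Usage: wait_for_kb_entry [max_attempts] [delay_seconds]
# Returns 0 on VECTOR_CREATED/READY, 1 on UNPROCESSABLE, 2 on timeout.
# The function name and its defaults are assumptions made for this sketch.
wait_for_kb_entry() {
  kb_max_attempts="${1:-60}"
  kb_delay="${2:-10}"
  kb_attempt=1
  while [ "$kb_attempt" -le "$kb_max_attempts" ]; do
    # Fetch the entry and pull out the value of the "status" field.
    kb_status=$(curl -s -X GET \
      "https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID" \
      -H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
      | grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Z_]*"' | head -n 1 \
      | sed 's/.*"\([A-Z_]*\)"$/\1/')
    case "$kb_status" in
      VECTOR_CREATED|READY) echo "$kb_status"; return 0 ;;
      UNPROCESSABLE)        echo "$kb_status" >&2; return 1 ;;
    esac
    kb_attempt=$((kb_attempt + 1))
    sleep "$kb_delay"
  done
  echo "timed out waiting for entry $KNOWLEDGE_BASE_ID" >&2
  return 2
}
```

Calling `wait_for_kb_entry 30 10` would check every 10 seconds for up to about 5 minutes, which matches the typical NEW to VECTOR_CREATED processing time mentioned earlier; larger inputs may need a longer budget.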
You can now chat with the replica using the new content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Version: $API_VERSION" \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "X-USER-ID: $USER_ID" \
-d '{"content":"What is the way to the stars written in?"}'
Example response:
{
"success":true,
"content":"The way to the stars is metaphorically \"written in starlight.\""
}
Uploading text-based files, documents or media files
- Create a knowledge base entry for file upload
Create a new knowledge base entry, making sure that the file extension matches the content of the file:
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json" \
-d '{
"filename": "your_file.txt"
}'
Example response:
{
"success": true,
"results": [
{
"type": "FILE",
"enqueued": true,
"knowledgeBaseID": 12345,
"signedURL": "https://storage.googleapis.com/..."
}
]
}
Export the ID into a variable: export KNOWLEDGE_BASE_ID=
Export the Signed URL into a variable: export SIGNED_URL=
This creates a knowledge base entry for the file upload and returns a special URL where you can upload your file, along with the knowledge base ID for tracking. Files up to 50MB are supported. You can check the list of supported file types here.
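Since uploads are limited to 50MB, it can help to check the file size locally before requesting a signed URL. The sketch below assumes "50MB" means 50 * 1024 * 1024 bytes; the API's exact cutoff may differ. `kb_file_size_ok` and `KB_MAX_UPLOAD_BYTES` are hypothetical names introduced for illustration.

```shell
# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes; the real cutoff may differ.
KB_MAX_UPLOAD_BYTES=$((50 * 1024 * 1024))

# Return 0 if the file exists and is within the limit,
# 1 if it is too large, 2 if it is not a regular file.
kb_file_size_ok() {
  [ -f "$1" ] || { echo "not a file: $1" >&2; return 2; }
  # wc -c may pad its output with spaces; arithmetic expansion normalizes it.
  kb_size=$(($(wc -c < "$1")))
  if [ "$kb_size" -gt "$KB_MAX_UPLOAD_BYTES" ]; then
    echo "$1 is $kb_size bytes, over the $KB_MAX_UPLOAD_BYTES byte limit" >&2
    return 1
  fi
  return 0
}
```

A script could then run `kb_file_size_ok your_file.txt || exit 1` before creating the entry.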
- Upload the file to the signed URL
Upload your file, making sure that the MIME type matches the content of the file:
echo "The way to earth is written in dust. The content of a plain file needs to be at least 50 characters." >> your_file.txt
curl -X PUT "$SIGNED_URL" \
-H "Content-Type: text/plain" \
--data-binary @your_file.txt
You can check the list of supported MIME types here.
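To keep the Content-Type header consistent with the file extension, a script can derive one from the other. The mapping below is purely illustrative: the MIME types are standard registered values, but only text/plain appears in this guide, and whether Sensay accepts each of the other types is an assumption to verify against the supported list. `kb_mime_for` is a hypothetical helper name.

```shell
# Print a MIME type for a filename based on its extension.
# Illustrative mapping only; check each type against the supported MIME list.
kb_mime_for() {
  # Take the text after the last dot and lowercase it.
  kb_ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$kb_ext" in
    txt)  echo "text/plain" ;;
    md)   echo "text/markdown" ;;
    csv)  echo "text/csv" ;;
    json) echo "application/json" ;;
    pdf)  echo "application/pdf" ;;
    mp3)  echo "audio/mpeg" ;;
    mp4)  echo "video/mp4" ;;
    *)    echo "application/octet-stream" ;;  # generic fallback for unknown extensions
  esac
}

# Usage sketch (performs a real upload, so it is left commented out):
# curl -X PUT "$SIGNED_URL" \
#   -H "Content-Type: $(kb_mime_for your_file.txt)" \
#   --data-binary @your_file.txt
```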
Wait for the content to be trained
You will need to wait for the status to be either VECTOR_CREATED or READY. You can check the training status via polling.
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success":true,
"id":177030,
"replicaUUID":"db2cc1de-cbe9-46bf-a428-1144145b7311",
"type":"text",
"status":"VECTOR_CREATED"
}
You can now chat with the replica using the new content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Version: $API_VERSION" \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "X-USER-ID: $USER_ID" \
-d '{"content":"What is the way to earth written in?"}'
Example response:
{
"success":true,
"content":"The way to Earth is metaphorically \"written in dust.\" This phrase serves as a foundational idea or metaphor, suggesting a profound or inherent truth about the path to Earth. It implies a journey of introspection and understanding, where the path unfolds as one connects with the core of their aspirations."
}
Adding website content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json" \
-d '{
"url": "https://en.wikipedia.org/wiki/National_Guard_of_Georgia",
"autoRefresh": false
}'
The autoRefresh parameter (optional, defaults to false) allows the system to automatically update the content when the source changes. The refresh interval is automatically determined and cannot be customized.
Export the ID into a variable: export KNOWLEDGE_BASE_ID=
Wait for the content to be trained
You will need to wait for the status to be either VECTOR_CREATED or READY. You can check the training status via polling.
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success":true,
"id":177030,
"replicaUUID":"db2cc1de-cbe9-46bf-a428-1144145b7311",
"type":"text",
"status":"VECTOR_CREATED"
}
You can now chat with the replica using the new content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Version: $API_VERSION" \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "X-USER-ID: $USER_ID" \
-d '{"content":"What is the GNG?"}'
Example response:
{
"success":true,
"content":"The National Guard of Georgia (GNG) is a branch of the Defense Forces of Georgia, serving as a gendarmerie, guard of honour, and military reserve force. It was established on December 20, 1990, making it the first national military formation in then-Soviet Georgia. The GNG plays a multifaceted role, including responsibilities in civil affairs, internal security, natural disaster response, and support for military operations. It also has a significant historical role, having participated in major conflicts such as the Georgian Civil War and the Georgian-Ossetian and Georgian-Abkhaz conflicts in the early 1990s."
}
Adding YouTube content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ&list=RDdQw4w9WgXcQ"
}'
Supported YouTube URL formats:
- Single videos: https://www.youtube.com/watch?v=VIDEO_ID
- YouTube Shorts: https://www.youtube.com/shorts/SHORT_VIDEO_ID
- Playlists: https://www.youtube.com/playlist?list=PLAYLIST_ID
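A URL can be checked against these three formats before it is submitted. The sketch below only recognizes the URL shapes listed above; other forms (for example shortened links) are reported as unsupported here, which is an assumption of this sketch and not a statement about what the API rejects. `kb_youtube_url_kind` is a hypothetical helper name.

```shell
# Print which documented YouTube URL format a URL matches.
# Returns 1 for anything outside the three formats listed in this guide.
kb_youtube_url_kind() {
  case "$1" in
    # \? matches a literal question mark; ?* requires at least one more character.
    https://www.youtube.com/watch\?v=?*)       echo "video" ;;
    https://www.youtube.com/shorts/?*)         echo "short" ;;
    https://www.youtube.com/playlist\?list=?*) echo "playlist" ;;
    *) echo "unsupported"; return 1 ;;
  esac
}
```

The example URL used in the request above, which carries an extra `&list=` parameter after the video ID, is classified as `video` by this check.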
Export the ID into a variable: export KNOWLEDGE_BASE_ID=
Wait for the content to be trained
You will need to wait for the status to be either VECTOR_CREATED or READY. You can check the training status via polling.
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success":true,
"id":177030,
"replicaUUID":"db2cc1de-cbe9-46bf-a428-1144145b7311",
"type":"text",
"status":"VECTOR_CREATED"
}
You can now chat with the replica using the new content
curl -X POST https://api.sensay.io/v1/replicas/$REPLICA_UUID/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Version: $API_VERSION" \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "X-USER-ID: $USER_ID" \
-d '{"content":"What are you never gonna do?"}'
Example response:
{
"success":true,
"content":"There are several things that are promised to never be done, such as letting someone down, running around and deserting them, making them cry, saying goodbye, and telling a lie that would hurt them. These promises highlight a commitment to positive and supportive behavior."
}
Managing knowledge base entries
List all knowledge base entries
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success": true,
"items": [
{
"id": 12345,
"replicaUUID": "12345678-1234-1234-1234-123456789abc",
"type": "TEXT",
"status": "READY",
"rawText": "Our company was founded in 2020...",
"createdAt": "2025-04-15T08:11:00.093761+00:00",
"updatedAt": "2025-04-15T08:11:05.299349+00:00",
"title": "Company Information",
"summary": "Basic company details and policies"
}
],
"total": 1
}
You can filter results using query parameters:
- status: Filter by processing status (e.g., READY, PROCESSING)
- type: Filter by entry type (e.g., TEXT, FILE, WEBSITE)
This endpoint also supports pagination.
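As a sketch of how the filters might be applied, the helper below builds the list URL with optional status and type query parameters. The parameter names come from the list above; `kb_list_url` is a hypothetical helper, and pagination parameters are left out because their names are not given here.

```shell
# Build the list URL with optional filters.
# Usage: kb_list_url [status] [type]   (pass "" to skip a filter)
kb_list_url() {
  kb_url="https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base"
  kb_query=""
  if [ -n "$1" ]; then
    kb_query="status=$1"
  fi
  if [ -n "$2" ]; then
    # Join with & only when a status filter is already present.
    kb_query="${kb_query:+$kb_query&}type=$2"
  fi
  echo "$kb_url${kb_query:+?$kb_query}"
}

# Usage sketch (performs a real API call, so it is left commented out):
# curl -X GET "$(kb_list_url READY TEXT)" \
#   -H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET"
```

Quoting the URL in the curl call matters here, because an unquoted `&` would be interpreted by the shell.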
Get a specific knowledge base entry
curl -X GET https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"id": 12345,
"replicaUUID": "12345678-1234-1234-1234-123456789abc",
"type": "TEXT",
"status": "READY",
"rawText": "Your training text content...",
"createdAt": "2025-04-15T08:11:00.093761+00:00",
"updatedAt": "2025-04-15T08:11:05.299349+00:00",
"title": "Company Information",
"summary": "Basic company details including founding date, business focus, and operating hours."
}
Update a knowledge base entry
You may want to update a knowledge base entry when information becomes outdated, needs corrections, or requires expansion. Common scenarios include updating product information, revising company policies, or correcting errors in previously uploaded content.
When you update an entry's content (like rawText), the system will reprocess it through the processing pipeline only if the status changes. For example, updating rawText will set the status to RAW_TEXT and trigger reprocessing through the full pipeline (RAW_TEXT → PROCESSED_TEXT → VECTOR_CREATED → READY). This ensures your replica uses the most current information when responding to questions.
curl -X PATCH https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json" \
-d '{
"rawText": "Updated text content for your knowledge base entry.",
"title": "Updated Title"
}'
Example response:
{
"success": true
}
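When the updated text comes from a shell variable rather than a literal, it needs JSON escaping before it goes into the request body. The following is a minimal sketch that handles only backslashes and double quotes in single-line text; multi-line text or control characters would need a JSON-aware tool such as `jq`. `kb_json_escape` and `kb_update_payload` are hypothetical helper names; the field names rawText and title come from the request above.

```shell
# Escape backslashes and double quotes so a string can sit inside a JSON string.
# Limitation of this sketch: newlines and other control characters are not handled.
kb_json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the PATCH body. Usage: kb_update_payload "new raw text" "new title"
kb_update_payload() {
  printf '{"rawText":"%s","title":"%s"}\n' \
    "$(kb_json_escape "$1")" "$(kb_json_escape "$2")"
}

# Usage sketch (performs a real API call, so it is left commented out):
# curl -X PATCH "https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID" \
#   -H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
#   -H "Content-Type: application/json" \
#   -d "$(kb_update_payload "$NEW_TEXT" "$NEW_TITLE")"
```

Since updating rawText moves the entry back to RAW_TEXT, the entry should be polled again until it reaches VECTOR_CREATED or READY before the new content is relied on.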
Delete a knowledge base entry
curl -X DELETE https://api.sensay.io/v1/replicas/$REPLICA_UUID/knowledge-base/$KNOWLEDGE_BASE_ID \
-H "X-ORGANIZATION-SECRET: $ORGANIZATION_SECRET" \
-H "Content-Type: application/json"
Example response:
{
"success": true
}