FAQs for Knowledge Base

Use a knowledge base to add corporate knowledge to your automations.

What is a RAG knowledge base and how does it work?

A retrieval-augmented generation (RAG) knowledge base is a collection of documents that a large language model (LLM) can search to find relevant information.

To create a knowledge base, you start with the recipe Oracle Integration — OpenSearch | Build and search your knowledge base. This recipe is a project that includes ready-to-use integrations, connections, an AI agent and an agentic AI tool that each have specific guidelines for knowledge bases, and an agentic AI pattern optimized for knowledge bases.

Here's how knowledge bases work in Oracle Integration:

Ingestion - getting your documents into the knowledge base
  1. You upload PDF or text documents into an Oracle Cloud Infrastructure Object Storage bucket.
  2. You run the ObjectStoreKnowledgeBaseIngestion integration to ingest documents from the OCI Object Storage bucket into the OCI Search with OpenSearch knowledge base.
  3. Documents are split into chunks, and the selected embedding model is used to compute vector embeddings for each chunk before the chunks are uploaded into the OCI Search with OpenSearch knowledge base.
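The chunk-and-embed step in the ingestion flow above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the fixed-size character windows, the `chunk_size` and `overlap` values, the record layout, and especially `toy_embedding`, which is a hashed bag-of-words stand-in for whichever embedding model the knowledge base is configured with. It is not the logic of the ObjectStoreKnowledgeBaseIngestion integration.

```python
import hashlib
import math
from typing import Dict, List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap (assumed strategy)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def toy_embedding(chunk: str, dims: int = 8) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vector = [0.0] * dims
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def prepare_for_upload(document_name: str, text: str) -> List[Dict]:
    """Build one record per chunk, each carrying its own vector embedding."""
    return [
        {"document": document_name, "chunk_id": i, "text": chunk,
         "embedding": toy_embedding(chunk)}
        for i, chunk in enumerate(split_into_chunks(text))
    ]
```

The point of the sketch is the order of operations: the embedding is computed per chunk, before upload, so every stored chunk is tied to the model that produced its vector.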
Search - getting answers from your knowledge base
  1. To perform a search, you run the KnowledgeBaseSearch integration and specify your query.

    Types of searches:

    • Text search: supports both BM25 keyword search and phrase search for exact matches. You could use this for finding specific order numbers, person names, or any other precise terminology.
    • Semantic search: searches based on meaning.
    • Hybrid search: combines exact-match text search with meaning-based semantic search.
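To make the difference between the three search types concrete, here is a small sketch. It is illustrative only: `keyword_score` is a crude term-overlap stand-in (real BM25 also accounts for term frequency and document length), the embeddings are supplied by hand, and the `weight` blend is an assumed way of combining scores, not a description of how OCI Search with OpenSearch ranks hybrid results.

```python
import math
from typing import Dict, Sequence


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms found verbatim in the text (simplified, not BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query: str, query_vec: Sequence[float], chunk: Dict,
                 weight: float = 0.5) -> float:
    """Blend text and semantic scores; weight=1.0 is text only, 0.0 is semantic only."""
    text_part = keyword_score(query, chunk["text"])
    semantic_part = cosine(query_vec, chunk["embedding"])
    return weight * text_part + (1.0 - weight) * semantic_part
```

With a query such as an order number, the text part rewards the chunk that contains the exact number, while the semantic part rewards chunks whose vectors lie close to the query vector even when no words match.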

What determines whether a document is ingested into the knowledge base?

By default, a document is identified by its file name. To detect whether the contents of a document have changed, the hash of the document is compared.
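As a rough illustration of content comparison by hash, the sketch below fingerprints a document's bytes and checks them against a previously stored value. The hash algorithm used by the product is not stated here, so SHA-256 and the function names are assumptions.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint the document bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def has_changed(stored_hash: str, new_data: bytes) -> bool:
    """True when a file with the same name now has different contents."""
    return content_hash(new_data) != stored_hash
```

Two uploads with the same file name and identical bytes produce the same hash, so only a real change in content shows up as a difference.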

The document ingestion strategy selected in the RAG ingest action in the ObjectStoreKnowledgeBaseIngestion integration determines whether a document is ingested:

  • Replace: This is the default behavior. Only new or changed documents in the OCI Object Storage bucket are ingested. If a document has already been ingested, it is not ingested again unless it has been updated. If the document has been updated, it's replaced in OCI Search with OpenSearch.

    Tip:

    A simple way to manage versions is to add the version number to the document file name, such as mydocument_v1 and mydocument_v2. This lets you keep multiple versions of a document in the knowledge base.

  • Append: This strategy is primarily designed for documents pushed through a REST trigger, such as incident reports sent with attachments, rather than for documents from OCI Object Storage. Every time the integration runs, the documents are ingested as new entries. If this strategy is used with OCI Object Storage, every file in the bucket is ingested every time the integration runs. As long as a file remains in the bucket, it is re-ingested on each run, which creates duplicate entries with the same name in the OCI Search with OpenSearch knowledge base.

  • Replace versioned: Requires the document version to be mapped in the RAG ingest action of the ObjectStoreKnowledgeBaseIngestion integration.
    • If no version is provided for a document, the behavior is the same as Replace.
    • If a version is provided for a document:
      • If the document name, version, and hash values are the same as those of a document in the knowledge base, the document is not ingested.
      • If the document name and version are the same but the hash values are different, delete the existing document from the knowledge base and ingest the new document.
      • If the document name is the same but the specified version is different, ingest the new document into the knowledge base. This allows for multiple versions of the same document to be ingested.
    • If no version was provided for a document when it was ingested, and a version is later provided for the same document, the two are treated as different documents.
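The Replace versioned rules above can be summarized as a small decision function. This is a sketch of the rules as written, not the product's implementation: the `Doc` type, the field names, and the return labels are hypothetical. A version of `None` stands for "no version mapped". Treating an unversioned incoming document as distinct from a stored versioned one with the same name is an assumption that mirrors the last rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    name: str
    content_hash: str
    version: Optional[str] = None  # None means no version was mapped


def decide_replace_versioned(incoming: Doc, stored: List[Doc]) -> str:
    """Return 'skip', 'replace', or 'ingest' for the incoming document."""
    # A stored document matches only if both name and version are equal.
    match = next(
        (d for d in stored
         if d.name == incoming.name and d.version == incoming.version),
        None,
    )
    if match is None:
        return "ingest"   # new name, or same name with a different version
    if match.content_hash == incoming.content_hash:
        return "skip"     # identical name, version, and hash
    return "replace"      # same name and version, contents changed
```

For example, with `Doc("guide.pdf", "aaa", "1")` already stored, the same name and version with hash `"bbb"` yields `"replace"`, while version `"2"` yields `"ingest"`, so both versions end up in the knowledge base.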

How many knowledge bases can I have per project?

A project typically has one knowledge base, but you can add as many knowledge bases as you need.

For simplicity, it's recommended to have a single knowledge base per project. More than one knowledge base requires multiple integrations and tools, which makes mapping errors more likely.

To have multiple knowledge bases in the same project:
  1. Create your project with the recipe Oracle Integration — OpenSearch | Build and search your knowledge base and add your first knowledge base.
  2. For additional knowledge bases, create a different OCI Object Storage bucket. You'll need a different OCI Object Storage bucket for each knowledge base.
  3. Use the same OCI Search with OpenSearch connection. When you add a knowledge base, that knowledge base is assigned a different index in OCI Search with OpenSearch, so you can create multiple independent knowledge bases in the same OCI Search with OpenSearch cluster.
  4. Add the additional knowledge base to the project.
  5. Clone the ObjectStoreKnowledgeBaseIngestion integration and edit the OCI Object Storage actions to specify a different OCI Object Storage bucket.
  6. Clone the KnowledgeBaseSearch integration and edit the RAG search action to specify your new knowledge base.
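Because each additional knowledge base needs its own bucket and its own cloned integrations, it can help to write the mapping down and check it for accidental reuse. The sketch below is a planning aid only. The `KnowledgeBaseSetup` type and all names in it are hypothetical, and nothing here calls Oracle Integration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KnowledgeBaseSetup:
    knowledge_base: str          # name of the knowledge base in the project
    bucket: str                  # OCI Object Storage bucket used for uploads
    ingestion_integration: str   # clone of ObjectStoreKnowledgeBaseIngestion
    search_integration: str      # clone of KnowledgeBaseSearch


def find_mapping_errors(setups: List[KnowledgeBaseSetup]) -> List[str]:
    """Report any value that two knowledge bases share but should not."""
    errors = []
    fields = ("knowledge_base", "bucket",
              "ingestion_integration", "search_integration")
    for field in fields:
        seen = set()
        for setup in setups:
            value = getattr(setup, field)
            if value in seen:
                errors.append(
                    f"{field} '{value}' is used by more than one knowledge base")
            seen.add(value)
    return errors
```

The OpenSearch connection is deliberately absent from the check, since the steps above reuse one connection for every knowledge base.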

What types of storage are supported for the knowledge base?

For temporary storage, where you upload your documents for ingestion, OCI Object Storage is supported.

For the knowledge base database, OCI Search with OpenSearch version 2.19 or higher is supported.

Why is the embedding model grayed out when I try to change it for my knowledge base?

If your knowledge base contains documents, the embedding model is grayed out and cannot be changed. This is because all existing document chunks were vectorized using that specific model. Switching to a different model creates a mathematical mismatch: query vectors produced by the new model cannot be meaningfully compared against the stored vectors produced by the old one.

To change models, delete all documents in the knowledge base, then change the embedding model. If you want to compare how different models perform, create separate knowledge bases.
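The mismatch between embedding models can be seen in a toy example. Both "models" below are hypothetical and deliberately simplistic. They produce vectors of different lengths, so a similarity computation across them fails outright. Real models with equal dimensions would not raise an error, but their coordinates still would not share a meaning, so the resulting scores would be unreliable.

```python
import math
from typing import List, Sequence


def model_a(text: str) -> List[float]:
    """Hypothetical 3-dimensional model: counts of the letters a, e, and o."""
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "aeo"]


def model_b(text: str) -> List[float]:
    """Hypothetical 4-dimensional model: word, digit, uppercase, and space counts."""
    return [
        float(len(text.split())),
        float(sum(ch.isdigit() for ch in text)),
        float(sum(ch.isupper() for ch in text)),
        float(text.count(" ")),
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; refuses vectors that come from different spaces."""
    if len(a) != len(b):
        raise ValueError("vectors come from different embedding spaces")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Comparing `model_a("reset password")` with itself gives a similarity of about 1.0, while comparing it with `model_b("reset password")` raises `ValueError`. This is why the stored chunks must be removed before a different model is selected.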