Semantic Search in OpenSearch
OCI Search with OpenSearch supports semantic search starting with OpenSearch version 2.11.
With semantic search, search engines use the context and content of search queries to better understand the meaning of a query, compared to keyword search, where search results are based on content matching keywords in a query. OpenSearch implements semantic search using neural search, which is a technique that uses large language models to understand the relationships between terms. For more information about neural search in OpenSearch, see Neural search tutorial.
Using Neural Search in OCI Search with OpenSearch
To use neural search for semantic search, complete the following steps:
- Register and deploy your choice of model to the cluster.
- Create an index and set up an ingestion pipeline using the deployed model. Use the ingestion pipeline to ingest documents into the index.
- Run semantic search queries on the index using either hybrid search or neural search.
Prerequisites
To use semantic search, the OpenSearch version for the cluster must be 2.11 or newer. By default, new clusters use version 2.11. See Creating an OpenSearch Cluster.
For existing clusters configured for version 2.3, you can perform an inline upgrade to version 2.11. For more information, see OpenSearch Cluster Software Upgrades.
To upgrade existing clusters configured for version 1.2.3 to 2.11, you need to use the upgrade process described in OpenSearch Cluster Software Upgrades.
Before you start setting up the model for semantic search, you need to complete the prerequisites, which include specifying the applicable IAM policy if required, and configuring the recommended cluster settings.
IAM Policy for Custom Models and Generative AI Connectors
If you're using one of the pretrained models that are hosted within OCI Search with OpenSearch, you don't need to configure permissions and can skip to the next prerequisite, Cluster Settings for Semantic Search. See also Semantic Search Walkthrough.
Otherwise, you need to create a policy to grant the required access.
If you're new to policies, see Getting Started with Policies and Common Policies.
IAM Policy for Custom Models
If you're using a custom model, you need to grant OCI Search with OpenSearch access to the Object Storage bucket that contains the model.
The following policy example includes the required permission:
ALLOW ANY-USER to manage object-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
IAM Policy for Generative AI Connectors
If you're using a Generative AI connector, you need to grant OCI Search with OpenSearch access to Generative AI resources.
The following policy example includes the required permission:
ALLOW ANY-USER to manage generative-ai-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
Regions for Generative AI Connectors
To use OCI Generative AI, your tenancy must be subscribed to the US Midwest (Chicago) region or the Germany Central (Frankfurt) region. You don't need to create the cluster in either of those regions. You only need to ensure that your tenancy is subscribed to one of them.
Cluster Settings for Semantic Search
Use the settings operation of the Cluster APIs to configure the recommended cluster settings for semantic search. The following example includes the recommended settings:
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "rag_pipeline_feature_enabled": "true",
        "memory_feature_enabled": "true",
        "allow_registering_model_via_local_file": "true",
        "allow_registering_model_via_url": "true",
        "model_auto_redeploy.enable":"true",
        "model_auto_redeploy.lifetime_retry_times": 10
      }
    }
  }
}
Setting up a Model
The first step when configuring neural search is setting up the large language model you want to use. The model is used to generate vector embeddings from text fields.
Register a Model Group
Model groups enable you to manage access to specific models. Registering a model group is optional. However, if you don't register one, ML Commons registers a new model group for you, so we recommend that you register the model group yourself.
Register a model group using the register operation in the Model Group APIs, as shown in the following example:
POST /_plugins/_ml/model_groups/_register
{
  "name": "new_model_group",
  "description": "A model group for local models"
}
Make note of the model_group_id returned in the response:
{
  "model_group_id": "<model_group_ID>",
  "status": "CREATED"
}
Register the Model to the Model Group
Register the model using the register operation from the Model APIs. The content of the POST request to the register operation depends on the type of model you're using.
- Option 1: Built-in OpenSearch pretrained models
  Several pretrained sentence transformer models are available for you to register and deploy directly to a cluster, without needing to download them and then upload them manually into a private storage bucket as the custom models option requires. With this option, when you register a pretrained model, you only need the model's model_group_id, name, version, and model_format. See Using an OpenSearch Pretrained Model for how to use a pretrained model.
- Option 2: Custom models
  You need to pass the Object Storage URL in the actions section of the register operation, for example:
  POST /_plugins/_ml/models/_register
  {
  .....
          "actions": [
              {
                  "method": "GET",
                  "action_type": "DOWNLOAD",
                  "url": "<object_storage_URL_path>"
              }
          ]
      }
  }
  For a complete example of a register operation, see Custom Models - 2: Register the Model.
- Option 3: Generative AI connector
  You can also register and deploy a remote GenAI embedding model, such as cohere.embed-english-v3.0, into your cluster using our GenAI Connector. You must create a connector first, and then register and deploy the model using the connector ID as described in the following steps.
  Note
  If you're using the ON-DEMAND model, stay current with the model deprecation notifications from the GenAI service and update your connector when necessary to avoid potential service interruptions. See Pretrained Foundational Models in Generative AI for the list of supported embedding models to select from. If you're using the DEDICATED model, change the servingType parameter in the following payload example from ON_DEMAND to DEDICATED.
  Create a connector to the Cohere embedding model:
  POST /_plugins/_ml/connectors/_create
  {
    "name": "<connector_name>",
    "description": "<connector_description>",
    "version": "2",
    "protocol": "oci_sigv1",
    "parameters": {
      "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
      "auth_type": "resource_principal",
      "model": "<embedding_model>",
      "input_type": "search_document",
      "truncate": "END"
    },
    "credential": {
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "https://${parameters.endpoint}/20231130/actions/embedText",
        "request_body": "{ \"inputs\":[\"${parameters.passage_text}\"], \"truncate\": \"${parameters.truncate}\" ,\"compartmentId\": \"<compartment_ocid>\", \"servingMode\": { \"modelId\": \"${parameters.model}\", \"servingType\": \"ON_DEMAND\" } }",
        "pre_process_function": "return '{\"parameters\": {\"passage_text\": \"' + params.text_docs[0] + '\"}}';",
        "post_process_function": "connector.post_process.cohere.embedding"
      }
    ]
  }
  The following example shows a payload to create a connector for the cohere.embed-english-v3.0 model:
  POST /_plugins/_ml/connectors/_create
  {
    "name": "OCI GenAI Chat Connector to cohere.embed-english-v3 model",
    "description": "The connector to public Cohere model service for embed",
    "version": "2",
    "protocol": "oci_sigv1",
    "parameters": {
      "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
      "auth_type": "resource_principal",
      "model": "cohere.embed-english-v3.0",
      "input_type": "search_document",
      "truncate": "END"
    },
    "credential": {
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "https://${parameters.endpoint}/20231130/actions/embedText",
        "request_body": "{ \"inputs\":[\"${parameters.passage_text}\"], \"truncate\": \"${parameters.truncate}\" ,\"compartmentId\": \"ocid1.compartment.oc1..aaaaaaaa..........5bynxlkea\", \"servingMode\": { \"modelId\": \"${parameters.model}\", \"servingType\": \"ON_DEMAND\" } }",
        "pre_process_function": "return '{\"parameters\": {\"passage_text\": \"' + params.text_docs[0] + '\"}}';",
        "post_process_function": "connector.post_process.cohere.embedding"
      }
    ]
  }
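For Option 1, the register request body carries only the four fields named above. The following Python sketch shows one way to assemble that body before sending it to POST /_plugins/_ml/models/_register. The function name and the placeholder values are hypothetical, and the set of accepted model_format values is an assumption based on the formats ML Commons documents (TORCH_SCRIPT and ONNX), so check it against the pretrained model you choose.

```python
import json

# Assumed set of model formats accepted by the register operation.
ALLOWED_FORMATS = {"TORCH_SCRIPT", "ONNX"}


def build_pretrained_register_body(model_group_id, name, version, model_format):
    """Return the JSON body for registering a pretrained model (Option 1)."""
    if model_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported model_format: {model_format!r}")
    return {
        "name": name,
        "version": version,
        "model_group_id": model_group_id,
        "model_format": model_format,
    }


if __name__ == "__main__":
    body = build_pretrained_register_body(
        model_group_id="<model_group_ID>",  # from the model group response
        name="<pretrained_model_name>",     # placeholder, not a real model name
        version="<model_version>",          # placeholder
        model_format="TORCH_SCRIPT",
    )
    # Print the body so it can be pasted after the POST line of the request.
    print(json.dumps(body, indent=2))
```

Building the body in code rather than by hand keeps the model_group_id from the previous step and the model details in one place, which helps when you register more than one model.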
After you make the register request, you can check the status of the operation. Use the task_id with the Get operation of the Tasks APIs, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the register operation is complete, the state value in the response to the Get operation is COMPLETED, as shown in the following example:
{
  "model_id": "<embedding_model_ID>",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "3qSqVfK2RvGJv1URKfS1bw"
  ],
  "create_time": 1706829732915,
  "last_update_time": 1706829780094,
  "is_async": true
}
Make note of the model_id value returned in the response to use when you deploy the model.
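Because registration runs asynchronously, a script usually needs to repeat the Get request until the task finishes. The following Python sketch shows one possible polling loop. The get_task argument is a hypothetical caller-supplied function that performs GET /_plugins/_ml/tasks/<task_ID> and returns the parsed JSON response. The FAILED state, the polling interval, and the timeout are assumptions that don't come from this page.

```python
import time


def wait_for_task(get_task, task_id, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll a task until its state is COMPLETED and return the final response.

    get_task is a caller-supplied function (hypothetical) that fetches
    GET /_plugins/_ml/tasks/<task_ID> and returns the parsed JSON as a dict.
    """
    waited = 0.0
    while True:
        task = get_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state == "FAILED":  # assumed failure state, not shown in the text above
            raise RuntimeError(f"task {task_id} failed: {task}")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still {state!r} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

The model_id field of the returned dictionary is the value to pass to the deploy operation. Passing the HTTP call in as a function keeps the loop independent of any particular client library.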
Deploy the Model
After the register operation is completed for the model, you can deploy the model to the cluster using the deploy operation of the Model APIs, passing the model_id from the Get operation response in the previous step, as shown in the following example: 
POST /_plugins/_ml/models/<embedding_model_ID>/_deploy
Make note of the task_id returned in the response. You can use the task_id to check the status of the operation.
For example, the deploy operation returns a response like the following:
{
  "task_id": "<task_ID>",
  "task_type": "DEPLOY_MODEL",
  "status": "CREATED"
}
To check the status of the deploy operation, use the task_id with the Get operation of the Tasks APIs, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the deploy operation is complete, the state value in the response to the Get operation is COMPLETED.
Ingest Data
After the model is deployed, create an ingestion pipeline that uses the model, create an index that uses the pipeline, and then ingest documents into the index.
Create Ingestion Pipeline
Using the model ID of the deployed model, create an ingestion pipeline as shown in the following example:
PUT _ingest/pipeline/test-nlp-pipeline
{
  "description": "An example neural search pipeline",
  "processors" : [
    {
      "text_embedding": {
        "model_id": "<model_ID>",
        "field_map": {
           "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
The ingestion pipeline defines a processor and the field mappings (in this case, "passage_text" → "passage_embedding"). This means that if you use this pipeline on a specific index to ingest data, the pipeline automatically finds the "passage_text" field, uses the pipeline model to generate the corresponding embeddings, and maps them to the "passage_embedding" field before indexing.
Remember that the field names "passage_text" and "passage_embedding" are user defined and can be anything you want. Ensure that you use the same names when you create the index where you plan to use the pipeline, because the pipeline processor needs to be able to map the fields as described.
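The naming consistency between the pipeline and the index can be checked before either request is sent. The following Python sketch compares the field_map of each text_embedding processor against the mappings of the index body you plan to use. The function name is hypothetical, and treating an undeclared source field as a problem is a strictness choice made for this sketch, not a rule stated on this page.

```python
def check_field_map(pipeline_body, index_body):
    """Return a list of naming problems between a pipeline and an index body."""
    properties = index_body.get("mappings", {}).get("properties", {})
    problems = []
    for processor in pipeline_body.get("processors", []):
        embedding = processor.get("text_embedding")
        if embedding is None:
            continue  # only text_embedding processors are checked
        for source, target in embedding.get("field_map", {}).items():
            # Strictness choice: the source text field should be declared.
            if source not in properties:
                problems.append(f"source field {source!r} is not declared in the index")
            # The target field must exist and hold vectors.
            if properties.get(target, {}).get("type") != "knn_vector":
                problems.append(f"target field {target!r} is not a knn_vector in the index")
    return problems
```

An empty list means that every source and target name in the pipeline has a matching field in the index mappings.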
Create an Index
During the index creation, you can specify the pipeline you want to use to ingest documents into the index.
The following API call example shows how to create an index using the test-nlp-pipeline pipeline created in the previous step.
PUT /test-index
{
    "settings": {
        "index.knn": true,
        "default_pipeline": "test-nlp-pipeline"
    },
    "mappings": {
        "properties": {
            "passage_embedding": {
                "type": "knn_vector",
                "dimension": <model_dimension>,
                "method": {
                    "name":"hnsw",
                    "engine":"lucene",
                    "space_type": "l2",
                    "parameters":{
                        "m":512,
                        "ef_construction": 245
                    }
                }
            },
            "passage_text": {
                "type": "text"
            }
        }
    }
}
When creating the index, you also need to specify which library implementation of approximate nearest neighbor (ANN) you want to use. OCI Search with OpenSearch supports the NMSLIB, Faiss, and Lucene libraries. For more information, see Search Engines.
The preceding index creation example uses the Lucene engine.
Ingest Data into Index
After successfully creating an index, you can now ingest data into the index as shown in the following example:
POST /test-index/_doc/1
{
  "passage_text": "there are many sharks in the ocean"
}
 
POST /test-index/_doc/2
{
  "passage_text": "fishes must love swimming"
}
 
POST /test-index/_doc/3
{
  "passage_text": "summers are usually very hot"
}
 
POST /test-index/_doc/4
{
  "passage_text": "florida has a nice weather all year round"
}
Use a GET request to verify that the documents are ingested correctly and that embeddings are generated automatically during ingestion:
GET /test-index/_doc/3
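The verification can also be scripted. The following Python sketch inspects the parsed JSON of a GET _doc response and returns the length of the generated embedding, which should match the dimension configured for the index. The function name is hypothetical, and the sketch assumes the usual document response shape with found and _source fields.

```python
def embedding_length(doc_response, field="passage_embedding"):
    """Return the embedding length in a GET _doc response, or raise if it is missing."""
    if not doc_response.get("found", False):
        raise LookupError("document not found")
    vector = doc_response.get("_source", {}).get(field)
    if not isinstance(vector, list) or not vector:
        raise ValueError(f"field {field!r} has no embedding")
    return len(vector)


if __name__ == "__main__":
    # Shortened, made-up response for illustration; real embeddings are much longer.
    sample = {
        "_index": "test-index",
        "_id": "3",
        "found": True,
        "_source": {
            "passage_text": "summers are usually very hot",
            "passage_embedding": [0.12, -0.03, 0.48],
        },
    }
    print(embedding_length(sample))
```

If the function raises ValueError, the document was indexed without passing through the pipeline, which usually points to a missing default_pipeline setting or a field name mismatch.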