LLM Services
Your first task in enabling your skill to use a Large Language Model (LLM) is creating a service that accesses the LLM provider's endpoint from Oracle Digital Assistant.
You can create an LLM service manually or by importing an YAML definition. You can
also convert an existing REST service into an LLM service by clicking Convert to
LLM in the REST Services tab.
Note:
If your skill calls the Cohere models via Oracle Generative AI Service, then there are a few tasks that you'll need to perform to allow your Oracle Digital Assistant instance access to translation, text generation, text summarization, and embedding resources. Among these tasks is creating tenant resource policies which may require assistance from Oracle Support.Create an LLM Service
To create the service manually:
- Select > Settings
> API Services in the side menu.
- Open the LLM Services tab. Click +Add LLM Service.
- Complete the dialog by entering a name for the service, its
endpoint, an optional description, and its methods. Then click
Create.
- For Cohere's Command model, enter the endpoint to the
Co.Generate
endpoint:https://api.cohere.ai/v1/generate
- For Azure OpenAI,
specify a
completions
operation to enable the multiple text completions needed for multi-turn refinements. For example:https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-version={api-version}
- For the Cohere command, command-light, and Llama models via
Oracle Cloud Infrastructure (OCI) Generative
AI:
https://generativeai.aiservice.us-chicago-1.oci.oraclecloud.com/20231130/actions/generateText
- For the Cohere summarization model via Oracle Cloud
Infrastructure (OCI) Generative
AI:
https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/summarizeText
- For Cohere's Command model, enter the endpoint to the
- Enter the authentication type. The authentication type required for the endpoint depends on the provider and the model. Some require that an API key be passed as header, but others, like Cohere, require a bearer token. For the Oracle Generative AI Cohere models, choose OCI Resource Principal.
- Specify the headers (if applicable).
- For the request content type, choose application/json as then content type, then add the provider-specific POST request payload, and if needed, the static response (for dialog flow testing), and error payload samples.
- Check for a 200 response code by clicking Test Request.
Import an LLM Service
If you're importing the service:
- click Import LLM Services (or choose Import LLM Services from the More menu).
- Browse to, and select, a YAML file with LLM service definition. The
YAML file looks something like
this:
exportedRestServices: - endpoint: "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/generateText" name: "genAI_cohere" authType: "resourcePrincipal" restServiceMethods: - restServiceMethodType: "POST" contentType: "application/json" statusCode: 200 methodIncrementId: 0 requestBody: "{\n \"compartmentId\": \"ocid1.compartment.oc1..aaaaaaaaexampleuniqueID\"\ ,\n \"servingMode\": {\n \"servingType\": \"ON_DEMAND\",\n \ \ \"modelId\": \"cohere.command\"\n },\n \"inferenceRequest\": {\n \ \ \"runtimeType\": \"COHERE\",\n \"prompt\": \"Tell me a joke\",\n\ \ \"maxTokens\": 1000,\n \"isStream\": false,\n \"frequencyPenalty\"\ : 1,\n \"topP\": 0.75,\n \"temperature\": 0\n }\n}" mockResponsePayload: "{\n \"modelId\": \"cohere.command\",\n \"modelVersion\"\ : \"15.6\",\n \"inferenceResponse\": {\n \"generatedTexts\": [\n \ \ {\n \"id\": \"6fd60b7d-3001-4c99-9ad5-28b207a03c86\"\ ,\n \"text\": \" Why was the computer cold?\\n\\nBecause it left\ \ its Windows open!\\n\\nThat joke may be dated, but I hope you found it amusing\ \ nonetheless. If you'd like to hear another one, just let me know. \\n\\nWould\ \ you like to hear another joke? \"\n }\n ],\n \"timeCreated\"\ : \"2024-02-08T11:12:04.252Z\",\n \"runtimeType\": \"COHERE\"\n }\n\ }" restServiceParams: [] - endpoint: "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/generateText" name: "genAI_cohere_light" authType: "resourcePrincipal" restServiceMethods: - restServiceMethodType: "POST" contentType: "application/json" statusCode: 200 methodIncrementId: 0 requestBody: "{\n \"compartmentId\": \"ocid1.compartment.oc1..aaaaaaaaexampleuniqueID\"\ ,\n \"servingMode\": {\n \"servingType\": \"ON_DEMAND\",\n \ \ \"modelId\": \"cohere.command-light\"\n },\n \"inferenceRequest\": {\n\ \ \"runtimeType\": \"COHERE\",\n \"prompt\": \"Tell me a joke\"\ ,\n \"maxTokens\": 1000,\n \"isStream\": false,\n \"frequencyPenalty\"\ : 1,\n \"topP\": 0.75,\n \"temperature\": 0\n }\n}" mockResponsePayload: "{\n \"modelId\": \"cohere.command-light\",\n \"modelVersion\"\ : \"15.6\",\n \"inferenceResponse\": {\n \"generatedTexts\": [\n \ \ {\n \"id\": \"dfa27232-90ea-43a1-8a46-ef8920cc3c37\"\ ,\n \"text\": \" Why don't scientists trust atoms?\\n\\nBecause\ \ they make up everything!\\n\\nI hope you found that joke to be a little amusing.\ \ Would you like me to tell you another joke or explain a little more about\ \ the purpose of jokes and humor? \"\n }\n ],\n \"\ timeCreated\": \"2024-02-08T11:15:38.156Z\",\n \"runtimeType\": \"COHERE\"\ \n }\n}" restServiceParams: [] - endpoint: "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/generateText" name: "genAI_llama" authType: "resourcePrincipal" restServiceMethods: - restServiceMethodType: "POST" contentType: "application/json" statusCode: 200 methodIncrementId: 0 requestBody: "{\n \"compartmentId\": \"ocid1.compartment.oc1..aaaaaaaaexampleuniqueID\"\ ,\n \"servingMode\": {\n \"servingType\": \"ON_DEMAND\",\n \ \ \"modelId\": \"meta.llama-2-70b-chat\"\n },\n \"inferenceRequest\":\ \ {\n \"runtimeType\": \"LLAMA\",\n \"prompt\": \"Tell me a joke\"\ ,\n \"maxTokens\": 1000,\n \"isStream\": false,\n \"frequencyPenalty\"\ : 1,\n \"topP\": 0.75,\n \"temperature\": 0\n }\n}" mockResponsePayload: "{\n \"modelId\": \"meta.llama-2-70b-chat\",\n \"modelVersion\"\ : \"1.0\",\n \"inferenceResponse\": {\n \"created\": \"2024-02-08T11:16:18.810Z\"\ ,\n \"runtimeType\": \"LLAMA\",\n \"choices\": [\n \ \ {\n \"finishReason\": \"stop\",\n \"index\"\ : 0,\n \"text\": \".\\n\\nI'm not able to generate jokes or humor\ \ as it is subjective and can be offensive. I am programmed to provide informative\ \ and helpful responses that are appropriate for all audiences. Is there anything\ \ else I can help you with?\"\n }\n ]\n }\n}" restServiceParams: []
- Confirm that the request returns a 200 response by clicking
Test Request.
Tip:
If the imported service displays in the REST Services tab instead of the LLM Services tab, select the service in the REST Services tab, then click Convert to LLM.
Generative AI Service
Before you create an LLM service that accesses the Cohere summarization and text
generation models through Oracle Cloud Infrastructure (OCI)
Generative AI, you need the following:
- A dedicated AI cluster for the Generative AI resource and Language service.
- Tenancy policy statements for accessing both the Language and Generative AI services.
These policy statements, which are written by you (or your tenancy
administrator), use aggregate resource types for the various Language and
Generative AI resources. For the Language translation resource, the aggregate
resource type is
ai-service-language-family
. For the Generative AI resources (which includes thegenerative-ai-text-generation
andgenerative-ai-text-summarization
resources) it'sgenerative-ai-family
. The policy syntax varies according to the subscription type (single tenancy versus paired instance).- Individual (Single Tenancy) – If Oracle Digital Assistant resides on a single tenancy, an Allow statement grants access
to the Language and Generative AI resources. This statement has the
following
syntax:
Allow any-user to use ai-service-language-family in tenancy where request.principal.id='<oda-instance-ocid>' Allow any-user to use generative-ai-family in tenancy where request.principal.id='<oda-instance-ocid>'
- Paired Instance – Oracle Digital Assistant instances paired with subscriptions to Oracle Fusion Cloud
Applications require destination policies that combine Define and
Admit statements. Together, these statements allow
cross-tenancy sharing of the Language and Generate AI resources. The
Define statement names the OCID (Oracle Cloud Identifier) of
the source tenancy that has predefined policies that can allow resource
access to a single instance on a tenancy, a specific tenancy, or to all
tenancies.
Note:
Because the source tenancy OCID is not noted on your Oracle Cloud Infrastructure Console, you must file a Service Request (SR) with Oracle Support to obtain this OCID.The Admit statement controls the scope of the access within the tenancy. The syntax used for this statement is specific to how the resources have been organized on the tenant. Here's the syntax for a policy statement that restricts access to the Languages resources to a specific compartment.
Here's the syntax for a policy statement that allows tenancy-wide access to the Language resources.Define SourceTenancy as ocid1.tenancy.oc1..<unique_ID> Admit any-user of tenant SourceTenancy to use ai-service-language-family in compartment <compartment-name> where request.principal.id in ('<ODA instance OCID 1>', '<ODA instance OCID 2>', ...)
Define SourceTenancy as ocid1.tenancy.oc1..<unique_ID> Admit any-user of tenant SourceTenancy to use ai-service-language-family in tenancy where request.principal.id in ('<ODA instance OCID 1>', '<ODA instance OCID 2>', ...)
These destination policies correspond to the Define and/or Endorse statements that have already been created for the source tenancy. The syntax used in these policies is specific to the scope of the access granted to the tenancies.Scope of Access Source Tenancy Policy Statements All tenancies Endorse any-user to use ai-service-language-family in any-tenancy where request.principal.type='odainstance'
A specific tenancy Define TargetTenancy as <target-tenancy-OCID> Endorse any-user to use ai-service-language-family in tenancy TargetTenancy where request.principal.type='odainstance'
Specific Oracle Digital Assistant instances on a specific tenancy Define TargetTenancy as <target-tenancy-OCID> Endorse any-user to use ai-service-language-family in tenancy TargetTenancy where request.principal.id in ('<ODA instance OCID 1>', '<ODA instance OCID 2>', ...)
- Individual (Single Tenancy) – If Oracle Digital Assistant resides on a single tenancy, an Allow statement grants access
to the Language and Generative AI resources. This statement has the
following
syntax:
- Endpoints for the Oracle Generative AI model and the Language API
Sample Payloads
Open AI and Azure Open AI
Method | Transformer Payload |
---|---|
POST Request |
|
Response (Non-Streaming) |
|
Error (Maximum Content Length Exceeded) |
|
Cohere (Command Model)
This payload supports the
/generate
API and the associated
Cohere.command
model, not the /chat
API that's
used for the cohere.command.R
model.
Method | Payload |
---|---|
POST Request |
|
Cohere via Oracle Generative AI Service
Method | Payload |
---|---|
POST Request | Note:
Contact Oracle Support for the compartmentID
OCID.
|
Response |
|
Cohere Command - Light
Method | Payload |
---|---|
POST Request | Note:
Contact Oracle Support for the compartmentID
OCID.
|
Response |
|
Llama
Method | Payload |
---|---|
POST Request | Note:
Contact Oracle Support for the compartmentID
OCID.
|
Response |
|
Summarize Payloads
Method | Payload |
---|---|
POST Request | Note:
Contact Oracle Support for the compartmentID
OCID.
|
Response |
|