Migrate to the Chat Models
The cohere.command, cohere.command-light, and meta.llama-2-70b-chat models that are used by the now-deprecated Oracle Cloud Infrastructure (OCI) Generative AI /generateText and /summarizeText endpoints have been retired. You can continue using these models only through a dedicated AI cluster, which uses the dedicated serving mode. However, we recommend that you instead stay with the on-demand serving mode by using the /chat endpoint with the (new) chat models. Here's an example of this endpoint:
https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat
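If you want to verify a request body before updating your skill, you can send it to this endpoint directly. Note that OCI REST calls must be signed, so a plain curl call won't authenticate; the sketch below uses the OCI CLI's raw-request command as one way to send a signed request. The file name chat-request.json is just a placeholder for a request body such as the examples that follow.
oci raw-request --http-method POST \
  --target-uri https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat \
  --request-body file://chat-request.json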
To migrate to these models, you can either create a new LLM Service or modify the existing one so that it uses the /chat endpoint and targets one of the later models, such as cohere.command-r-08-2024 or meta.llama-3.2-90b-vision-instruct. Here is an example of an updated request body for cohere.command-r-08-2024:
{
"compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
"servingMode": {
"servingType": "ON_DEMAND",
"modelId": "cohere.command-r-08-2024"
},
"chatRequest": {
"apiFormat": "COHERE",
"message": "what you want to ask goes here …",
"maxTokens": 1000,
"isStream": true,
"frequencyPenalty": 0,
"topP": 0.75,
"temperature": 1
}
}
Here's an example request body for meta.llama-3.2-90b-vision-instruct:
{
"compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
"servingMode": {
"servingType": "ON_DEMAND",
"modelId": "meta.llama-3.2-90b-vision-instruct"
},
"chatRequest": {
"messages": [
{
"role": "USER",
"content": [
{
"type": "TEXT",
"text": "what you want to ask goes here …"
}
]
}
],
"apiFormat": "GENERIC",
"maxTokens": 600,
"isStream": false,
"numGenerations": 1,
"frequencyPenalty": 0,
"presencePenalty": 0
}
}
In addition, you will have to manually update the handler code that's generated from the Generative AI-specific transformation templates to declare the new chat model. For cohere.command-r-08-2024, the declaration looks like this:
let modelId = "cohere.command-r-08-2024"
let apiFormat = "COHERE";
For meta.llama-3.2-90b-vision-instruct (or other non-Cohere models), the model declaration is as follows. Note that the apiFormat variable is set to GENERIC, the value used for non-Cohere models.
let modelId = "meta.llama-3.2-90b-vision-instruct"
let apiFormat = "GENERIC";
The handler code will also need to include chat-specific parameters, such as the message parameter, which replaces the prompt parameter.
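As a rough before-and-after sketch of that change (the exact field names in your existing handler may differ depending on the template version you started from), the retired /generateText payload wrapped the prompt in an inferenceRequest object, whereas the /chat payload uses a chatRequest object with a message field:
// Retired /generateText style (sketch of the old request body fragment):
"inferenceRequest": {
    "runtimeType": "COHERE",
    "prompt": prompt,
    "maxTokens": 1000
}
// /chat style used by the updated handler code:
"chatRequest": {
    "apiFormat": "COHERE",
    "message": prompt,
    "maxTokens": 1000
}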
Note
If you continue with the Command models, which are retired, then, in addition to creating the dedicated AI cluster, you'll have to change the servingType parameter in both the request payload and the Gen AI template's transformRequestPayload handler code from ON_DEMAND to DEDICATED.
"servingMode": {
    "modelId": "cohere.command",
    "servingType": "DEDICATED"
},
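In the handler, this corresponds to the servingMode object returned by transformRequestPayload (shown in full later in this topic). A minimal sketch of the changed fragment, assuming the rest of the generated handler stays as is (depending on how your dedicated AI cluster is set up, the serving mode may need to reference the cluster's endpoint OCID rather than a model name):
"servingMode": {
    // DEDICATED replaces ON_DEMAND when the retired Command models are served
    // from a dedicated AI cluster; the rest of the returned request body is unchanged.
    "servingType": "DEDICATED",
    "modelId": modelId
},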
For other models, such as meta.llama-3.2-90b-vision-instruct, set servingType in the request payload and the transformRequestPayload handler to ON_DEMAND:
"servingMode": {
"servingType": "ON_DEMAND",
"modelId": "meta.llama-3.2-90b-vision-instruct"
},
Lastly, the handler code functions will need to be modified to support the updated payloads. In the streaming case, the handling of the event.payload.responseItems (items) must check the finishReason attribute, which prevents a duplicate message in the response array. Here is the finishReason check in the updated handler code for Cohere:
if (item.text) {
let finshReasonVar = item.finishReason;
if (finshReasonVar != 'COMPLETE') {
// check for only the stream items and not the 'complete' message (e.g. the last message returned by the API)
llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
}
}
Here is the corresponding check in the updated Llama template code:
let finshReasonVar = item.finishReason;
if (finshReasonVar != 'stop') {
    let msgcontent = item.message.content[0];
    let text = msgcontent.text;
    if (text !== "") {
        llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
    }
}
Note
You do not have to update the template code for the non-streaming case. In the following code snippets, the value of "isStream": streamResVar is set by the invokeLLM component's Use Streaming property.
Here's an example of the handler code modified for cohere.command-r-08-2024:
/**
* Handler to transform the request payload
* @param {TransformPayloadEvent} event - event object contains the following properties:
* - payload: the request payload object
* @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
* @returns {object} the transformed request payload
*/
transformRequestPayload: async (event, context) => {
// Cohere doesn't support chat completions, so we first print the system prompt, and if there
// are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
let prompt = event.payload.messages[0].content;
let streamResVar = event.payload.streamResponse;
if (event.payload.messages.length > 1) {
let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
}
// using Cohere new OCI gen-ai /chat endpoint
let modelId = "cohere.command-r-08-2024"
let apiFormat = "COHERE";
return {
"compartmentId": event.compartmentId,
"servingMode": {
"servingType": "ON_DEMAND",
"modelId": modelId
},
"chatRequest": {
"apiFormat": apiFormat,
"message": prompt,
"maxTokens": 4000,
"isStream": streamResVar,
"frequencyPenalty": 0,
"topP": 0.75,
"temperature": 0
}
};
},
/**
* Handler to transform the response payload
* @param {TransformPayloadEvent} event - event object contains the following properties:
* - payload: the response payload object
* @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
* @returns {object} the transformed response payload
*/
transformResponsePayload: async (event, context) => {
let llmPayload = {};
if (event.payload.responseItems) {
// streaming case
llmPayload.responseItems = [];
event.payload.responseItems.forEach(item => {
// only grab the text items, since last item in the responseItems[] is the finished reason not part of the sentence
if (item.text) {
let finshReasonVar = item.finishReason;
if (finshReasonVar != 'COMPLETE') {
// check for only the stream items and not the 'complete' message (e.g., the last message returned by the API)
llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
}
}
});
} else {
llmPayload.candidates = [{ "content" : event.payload.chatResponse.text || "" }];
}
return llmPayload;
},
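For reference, here is a rough sketch, inferred from the handler code above rather than captured from a live call, of the non-streaming /chat response fields that this Cohere handler reads:
{
    "chatResponse": {
        "text": "the model's answer goes here …"
    }
}
and of the payload it returns to the skill:
{
    "candidates": [ { "content": "the model's answer goes here …" } ]
}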
Here's an example of the handler code for meta.llama-3.2-90b-vision-instruct:
/**
* Handler to transform the request payload
* @param {TransformPayloadEvent} event - event object contains the following properties:
* - payload: the request payload object
* @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
* @returns {object} the transformed request payload
*/
transformRequestPayload: async (event, context) => {
// If there are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
let prompt = event.payload.messages[0].content;
let streamResVar = event.payload.streamResponse;
if (event.payload.messages.length > 1) {
let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
}
let modelId = "meta.llama-3.2-90b-vision-instruct"
let apiFormat = "GENERIC";
return {
"compartmentId": event.compartmentId,
"servingMode": {
"servingType": "ON_DEMAND",
"modelId": modelId
},
"chatRequest": {
"messages": [
{
"role": "USER",
"content": [
{
"type": "TEXT",
"text": prompt
}
]
}
],
"apiFormat": apiFormat,
"maxTokens": 4000,
"isStream": streamResVar,
"numGenerations": 1,
"frequencyPenalty": 0,
"presencePenalty": 0,
"temperature": 1,
"topP": 1,
"topK": 1
}
};
},
/**
* Handler to transform the response payload
* @param {TransformPayloadEvent} event - event object contains the following properties:
* - payload: the response payload object
* @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
* @returns {object} the transformed response payload
*/
transformResponsePayload: async (event, context) => {
let llmPayload = {};
if (event.payload.responseItems) {
// streaming case
llmPayload.responseItems = [];
event.payload.responseItems.forEach(item => {
let finshReasonVar = item.finishReason;
if (finshReasonVar != 'stop') {
let msgcontent = item.message.content[0];
let text = msgcontent.text;
if (text !== "") {
llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
}
}
});
} else {
event.payload.chatResponse.choices.forEach(item => {
let msgcontent = item.message.content[0];
let text = msgcontent.text;
llmPayload.candidates = [{ "content" : text || "" }];
});
}
return llmPayload;
},
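As with the Cohere handler, the shapes this Llama handler works with can be inferred from the code itself. Here is a rough sketch (not captured from a live call) of the non-streaming /chat response fields that it reads:
{
    "chatResponse": {
        "choices": [
            {
                "message": {
                    "content": [ { "text": "the model's answer goes here …" } ]
                }
            }
        ]
    }
}
and of the payload it returns to the skill:
{
    "candidates": [ { "content": "the model's answer goes here …" } ]
}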
Tip:
To find out what changes your code needs, we recommend that you debug it locally. Because a transformation event handler is similar to a custom component, you can use the same debugging technique.
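One lightweight complement to that is a small Node.js script that calls your transformRequestPayload function with a mock event so you can inspect the generated /chat request body. The script below is only a sketch: the module path ./myHandlerExports is hypothetical, and how you import the function depends on how your generated template exports its handlers.
// local-check.js -- minimal sketch; wire it to your own handler file.
// The module path below is a placeholder, not part of the generated template.
const { transformRequestPayload } = require('./myHandlerExports');

async function main() {
  // Mock event using the same properties the handlers in this topic read:
  // event.compartmentId, event.payload.streamResponse, and event.payload.messages.
  const event = {
    compartmentId: 'ocid1.tenancy.oc1..XXXXXXX', // placeholder OCID
    payload: {
      streamResponse: false,
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello' }
      ]
    }
  };
  // The handlers shown in this topic don't use the context argument,
  // so an empty object is enough for this check.
  const body = await transformRequestPayload(event, {});
  console.log(JSON.stringify(body, null, 2));
}

main();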