Migrate to Chat Models

The cohere.command, cohere.command-light, and meta.llama-2-70b-chat models used by the now-deprecated Oracle Cloud Infrastructure (OCI) Generative AI /generateText and /summarizeText endpoints have been retired. You can continue to use these models only through a dedicated AI cluster, which uses the dedicated serving mode. However, we recommend that you stay with the on-demand serving mode by switching to the /chat endpoint with the new chat models. Here's an example of this endpoint:

https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat
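If your service runs in a region other than us-chicago-1, the URL can be built from the region identifier. The sketch below assumes that other regions follow the same host pattern as the example above; the function name chatEndpoint is hypothetical, so confirm the host against the OCI endpoint list for your region.

```javascript
// Build the OCI Generative AI /chat inference URL for a region.
// Assumption: every region uses the same host pattern as the
// us-chicago-1 example (inference.generativeai.<region>.oci.oraclecloud.com).
function chatEndpoint(region) {
  // Region identifiers look like "us-chicago-1" or "us-gov-ashburn-1".
  if (!/^[a-z]+(-[a-z]+)+-\d+$/.test(region)) {
    throw new Error(`Unexpected region identifier: ${region}`);
  }
  return `https://inference.generativeai.${region}.oci.oraclecloud.com/20231130/actions/chat`;
}

console.log(chatEndpoint("us-chicago-1"));
```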

To migrate to these models, either create a new LLM service or modify your existing one so that it uses the /chat endpoint and targets one of the newer models, such as cohere.command-r-08-2024 or meta.llama-3.2-90b-vision-instruct. Here's an example of an updated request body for cohere.command-r-08-2024:

{
   "compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
   "servingMode": {
       "servingType": "ON_DEMAND",
       "modelId": "cohere.command-r-08-2024"
   },
   "chatRequest": {
       "apiFormat": "COHERE",
       "message": "what you want to ask goes here …",
       "maxTokens": 1000,
       "isStream": true,
       "frequencyPenalty": 0,
       "topP": 0.75,
       "temperature": 1
   }
}
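When the request body is assembled in code rather than written by hand, a small builder keeps the Cohere-specific fields in one place. This is a minimal sketch: the function buildCohereChatRequest and its options are hypothetical, and the default values simply mirror the example body above.

```javascript
// Hypothetical builder for a cohere.command-r-08-2024 /chat request body
// in on-demand mode. Defaults mirror the example request body.
function buildCohereChatRequest(compartmentId, message, { isStream = false, maxTokens = 1000 } = {}) {
  return {
    compartmentId,
    servingMode: {
      servingType: "ON_DEMAND",
      modelId: "cohere.command-r-08-2024",
    },
    chatRequest: {
      apiFormat: "COHERE",
      message, // Cohere chat requests carry a single "message" string
      maxTokens,
      isStream,
      frequencyPenalty: 0,
      topP: 0.75,
      temperature: 1,
    },
  };
}

const cohereBody = buildCohereChatRequest(
  "ocid1.tenancy.oc1..XXXXXXX",
  "Summarize our refund policy.",
  { isStream: true }
);
console.log(JSON.stringify(cohereBody, null, 2));
```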

Here's an example request body for meta.llama-3.2-90b-vision-instruct:

{
    "compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
    "servingMode": {
        "servingType": "ON_DEMAND",
        "modelId": "meta.llama-3.2-90b-vision-instruct"
    },
    "chatRequest": {
        "messages": [
            {
                "role": "USER",
                "content": [
                    {
                        "type": "TEXT",
                        "text": "what you want to ask goes here …"
                    }
                ]
            }
        ],
        "apiFormat": "GENERIC",
        "maxTokens": 600,
        "isStream": false,
        "numGenerations": 1,
        "frequencyPenalty": 0,
        "presencePenalty": 0
    }
}
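Unlike the Cohere format, the GENERIC format wraps the user text in a messages array of typed content parts. The sketch below builds that nested structure; the function buildGenericChatRequest and its options are hypothetical, and the defaults mirror the example body above.

```javascript
// Hypothetical builder for a GENERIC-format /chat request body,
// defaulting to meta.llama-3.2-90b-vision-instruct in on-demand mode.
function buildGenericChatRequest(
  compartmentId,
  text,
  { modelId = "meta.llama-3.2-90b-vision-instruct", isStream = false, maxTokens = 600 } = {}
) {
  return {
    compartmentId,
    servingMode: { servingType: "ON_DEMAND", modelId },
    chatRequest: {
      // One USER message whose content is a list of typed parts.
      messages: [{ role: "USER", content: [{ type: "TEXT", text }] }],
      apiFormat: "GENERIC",
      maxTokens,
      isStream,
      numGenerations: 1,
      frequencyPenalty: 0,
      presencePenalty: 0,
    },
  };
}

const llamaBody = buildGenericChatRequest("ocid1.tenancy.oc1..XXXXXXX", "Which regions offer OCI Generative AI?");
console.log(JSON.stringify(llamaBody, null, 2));
```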

You'll also need to manually update the handler code generated from the Generative AI-specific transformation templates so that it declares the new chat model. For cohere.command-r-08-2024, the declaration looks like this:

let modelId = "cohere.command-r-08-2024";
let apiFormat = "COHERE";

For meta.llama-3.2-90b-vision-instruct (or other non-Cohere models), the model declaration is as follows. Note that the apiFormat variable is set to GENERIC, which is the value used for non-Cohere models.

let modelId = "meta.llama-3.2-90b-vision-instruct";
let apiFormat = "GENERIC";
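Because the apiFormat value depends only on whether the model is a Cohere model, it can be derived from the model ID instead of being hard-coded next to it. This is a sketch under one assumption: that every model whose ID starts with "cohere." uses COHERE and every other model uses GENERIC, as described above. The function name apiFormatFor is hypothetical.

```javascript
// Derive apiFormat from the model ID.
// Assumption: Cohere models use "COHERE"; all other models use "GENERIC".
function apiFormatFor(modelId) {
  return modelId.startsWith("cohere.") ? "COHERE" : "GENERIC";
}

for (const id of ["cohere.command-r-08-2024", "meta.llama-3.2-90b-vision-instruct"]) {
  console.log(`${id} -> ${apiFormatFor(id)}`);
}
```

Keeping the two declarations in sync this way avoids sending a Cohere-style request to a non-Cohere model after the modelId line is edited.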

The handler code will also need to include chat-specific parameters, such as the message parameter, which replaces the prompt parameter.

Note

If you continue with the retiring Command models, then in addition to creating the dedicated AI cluster, you'll need to change the servingType parameter from ON_DEMAND to DEDICATED in both the request payload and the transformRequestPayload handler code of the Generative AI template:

"servingMode": {
    "modelId": "cohere.command",
    "servingType": "DEDICATED"
},

For other models, such as meta.llama-3.2-90b-vision-instruct, set servingType to ON_DEMAND in both the request payload and the transformRequestPayload handler:

 "servingMode": {
        "servingType": "ON_DEMAND",
        "modelId": "meta.llama-3.2-90b-vision-instruct"
    },
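The serving-mode rule above can be captured in one place so the request payload and the handler always agree. The sketch below treats the three retired models named at the top of this page as dedicated-only and everything else as on-demand. The names DEDICATED_ONLY_MODELS and servingModeFor are hypothetical, and the object mirrors the servingMode shape shown above; confirm the fields your dedicated AI cluster expects.

```javascript
// Retired models that remain usable only through a dedicated AI cluster.
const DEDICATED_ONLY_MODELS = new Set([
  "cohere.command",
  "cohere.command-light",
  "meta.llama-2-70b-chat",
]);

// Return a servingMode object shaped like the examples above.
function servingModeFor(modelId) {
  return {
    modelId,
    servingType: DEDICATED_ONLY_MODELS.has(modelId) ? "DEDICATED" : "ON_DEMAND",
  };
}

console.log(servingModeFor("cohere.command"));
console.log(servingModeFor("meta.llama-3.2-90b-vision-instruct"));
```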

Finally, you'll need to modify the handler code functions to support the updated payloads. In the streaming case, the handling of event.payload.responseItems (items) must check the finishReason attribute, which prevents a duplicate message in the response array. Here's the finishReason check in the updated handler code for Cohere:

        if (item.text) {
          let finishReasonVar = item.finishReason;
          if (finishReasonVar != 'COMPLETE') {
            // check for only the stream items and not the 'complete' message (i.e., the last message returned by the API)
            llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
          }
        }
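To see how the finishReason check removes the duplicate, the Cohere filter can be pulled out into a standalone function and run on sample stream items. This is a minimal sketch: the function collectCohereStreamItems and the sample items are hypothetical, the item shape ({ text, finishReason }) is inferred from the snippet above, and the final COMPLETE item is assumed to repeat the full text.

```javascript
// Keep only incremental stream chunks; drop the final COMPLETE item,
// which (by assumption) repeats the whole response text.
function collectCohereStreamItems(items) {
  const responseItems = [];
  for (const item of items) {
    if (item.text && item.finishReason !== "COMPLETE") {
      responseItems.push({ candidates: [{ content: item.text }] });
    }
  }
  return responseItems;
}

// Hypothetical stream: two chunks, then the closing item.
const cohereItems = [
  { text: "Hello" },
  { text: ", world" },
  { text: "Hello, world", finishReason: "COMPLETE" },
];
console.log(JSON.stringify(collectCohereStreamItems(cohereItems)));
```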

Here's the same check in the updated Llama template code:

        let finishReasonVar = item.finishReason;
        if (finishReasonVar != 'stop') {
          let msgcontent = item.message.content[0];
          let text = msgcontent.text;
          if (text !== "") {
            llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
          }
        }
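In the GENERIC format, each stream item carries its text inside message.content, and the closing item is marked with a finishReason of stop. The sketch below isolates the Llama filter so it can be tried on sample data. The function collectLlamaStreamItems and the sample items are hypothetical; the item shape is inferred from the snippet above, and the extra guard against a missing message is an addition for safety.

```javascript
// Keep incremental Llama stream chunks and skip the final 'stop' item.
function collectLlamaStreamItems(items) {
  const responseItems = [];
  for (const item of items) {
    if (item.finishReason === "stop") continue; // closing item, not new text
    const parts = item.message && item.message.content;
    const text = parts && parts.length > 0 ? parts[0].text : "";
    if (text) {
      responseItems.push({ candidates: [{ content: text }] });
    }
  }
  return responseItems;
}

// Hypothetical stream: two chunks, then the closing item.
const llamaItems = [
  { message: { role: "ASSISTANT", content: [{ type: "TEXT", text: "Hi" }] } },
  { message: { role: "ASSISTANT", content: [{ type: "TEXT", text: " there" }] } },
  { finishReason: "stop", message: { role: "ASSISTANT", content: [{ type: "TEXT", text: "" }] } },
];
console.log(JSON.stringify(collectLlamaStreamItems(llamaItems)));
```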

Note

You don't need to update the template code for the non-streaming case. In the following code snippets, "isStream": streamResVar is set by the Use Streaming property of the invokeLLM component.

Here's an example of the modified handler code for cohere.command-r-08-2024:

/**
    * Handler to transform the request payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the request payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed request payload
    */
    transformRequestPayload: async (event, context) => {
      // Cohere doesn't support chat completions, so we first print the system prompt, and if there
      // are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      let streamResVar = event.payload.streamResponse;
      if (event.payload.messages.length > 1) {
        let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
        prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
      }
      // using Cohere new OCI gen-ai /chat endpoint
      let modelId = "cohere.command-r-08-2024";
      let apiFormat = "COHERE";
      return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
          "modelId": modelId
        },

        "chatRequest": {
          "apiFormat": apiFormat,
          "message": prompt,
          "maxTokens": 4000,
          "isStream": streamResVar,
          "frequencyPenalty": 0,
          "topP": 0.75,
          "temperature": 0
        }
      };
    },

    /**
    * Handler to transform the response payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the response payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed response payload
    */

    transformResponsePayload: async (event, context) => {
      let llmPayload = {};
      if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          // only grab the text items, since the last item in responseItems[] carries the finish reason and is not part of the sentence
          if (item.text) {
            let finishReasonVar = item.finishReason;
            if (finishReasonVar != 'COMPLETE') {
              // check for only the stream items and not the 'complete' message (i.e., the last message returned by the API)
              llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
            }
          }
        });
      } else {
        // non-streaming case
        llmPayload.candidates = [{ "content" : event.payload.chatResponse.text || "" }];
      }
      return llmPayload;
    },
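The comment at the top of transformRequestPayload explains that the Cohere request takes a single message, so the conversation is flattened into one prompt under a CONVERSATION HISTORY heading. The sketch below isolates that step so its output can be inspected. The function buildPromptWithHistory and the sample messages are hypothetical; messages are assumed to be { role, content } objects with string content, as the handler reads them.

```javascript
// Flatten the invokeLLM messages into a single prompt string, using the
// same reduce as transformRequestPayload above.
function buildPromptWithHistory(messages) {
  let prompt = messages[0].content; // first entry acts as the system prompt
  if (messages.length > 1) {
    const history = messages
      .slice(1)
      .reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, "");
    prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
  }
  return prompt;
}

const flattenedPrompt = buildPromptWithHistory([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is OCI?" },
  { role: "assistant", content: "Oracle Cloud Infrastructure." },
  { role: "user", content: "Which regions?" },
]);
console.log(flattenedPrompt);
```

The trailing "assistant:" line cues the model to answer as the next turn in the flattened transcript.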

Here's an example of the handler code for meta.llama-3.2-90b-vision-instruct:

    /**
    * Handler to transform the request payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the request payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed request payload
    */

    transformRequestPayload: async (event, context) => {
      // We first take the system prompt, and if there are additional chat entries,
      // we add these to the prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      let streamResVar = event.payload.streamResponse;
      if (event.payload.messages.length > 1) {
        let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
        prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
      }
      let modelId = "meta.llama-3.2-90b-vision-instruct";
      let apiFormat = "GENERIC";
      return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
          "modelId": modelId
        },
        "chatRequest": {
          "messages": [
            {
              "role": "USER",
              "content": [
                {
                  "type": "TEXT",
                  "text": prompt
                }
              ]
            }
          ],
          "apiFormat": apiFormat,
          "maxTokens": 4000,
          "isStream": streamResVar,
          "numGenerations": 1,
          "frequencyPenalty": 0,
          "presencePenalty": 0,
          "temperature": 1,
          "topP": 1,
          "topK": 1
        }
      };
    },
 

    /**
    * Handler to transform the response payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the response payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed response payload
    */

    transformResponsePayload: async (event, context) => {
      let llmPayload = {};
      if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          let finishReasonVar = item.finishReason;
          // skip the final 'stop' item so that the response isn't duplicated
          if (finishReasonVar != 'stop') {
            let msgcontent = item.message.content[0];
            let text = msgcontent.text;
            if (text !== "") {
              llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
            }
          }
        });
      } else {
        // non-streaming case: with numGenerations set to 1 there is a single choice
        event.payload.chatResponse.choices.forEach(item => {
          let msgcontent = item.message.content[0];
          let text = msgcontent.text;
          llmPayload.candidates = [{ "content" : text || "" }];
        });
      }
      return llmPayload;
    },
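In the non-streaming branch, the text sits in the first content part of each choice's message. The sketch below extracts it from a sample response. The function extractLlamaCandidates and the sample chatResponse are hypothetical; the response shape is inferred from the handler above. It uses map rather than forEach, so with numGenerations set to 1 it yields the same single candidate as the handler, and with more generations it keeps one candidate per choice instead of only the last.

```javascript
// Build candidates from a GENERIC-format chatResponse (shape assumed).
function extractLlamaCandidates(chatResponse) {
  return chatResponse.choices.map((choice) => {
    const parts = choice.message.content || [];
    return { content: (parts[0] && parts[0].text) || "" };
  });
}

// Hypothetical non-streaming response with one generation.
const sampleChatResponse = {
  apiFormat: "GENERIC",
  choices: [
    {
      index: 0,
      message: { role: "ASSISTANT", content: [{ type: "TEXT", text: "Chicago is in Illinois." }] },
      finishReason: "stop",
    },
  ],
};
console.log(JSON.stringify(extractLlamaCandidates(sampleChatResponse)));
```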

Tip

To find out what your code needs, we recommend that you debug it locally. Because a transformation event handler is similar to a custom component, you can use the same debugging technique.
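One lightweight way to start is to call a handler directly in Node.js with a mock event and print the transformed payload. This is a sketch, not the SDK's debugging workflow: the handlers object below is a trimmed-down, hypothetical stand-in for the Cohere transformRequestPayload handler, the mock event only fills in the fields the handlers on this page read (compartmentId, payload.messages, payload.streamResponse), and the empty context object is an assumption that works only because this stand-in never uses it.

```javascript
// Trimmed-down stand-in for a transformation handler (hypothetical).
const handlers = {
  transformRequestPayload: async (event, context) => ({
    compartmentId: event.compartmentId,
    servingMode: { servingType: "ON_DEMAND", modelId: "cohere.command-r-08-2024" },
    chatRequest: {
      apiFormat: "COHERE",
      message: event.payload.messages[0].content,
      isStream: event.payload.streamResponse,
    },
  }),
};

// Run the handler with a mock event and print what it would send.
async function runLocally() {
  const mockEvent = {
    compartmentId: "ocid1.compartment.oc1..example", // placeholder OCID
    payload: {
      messages: [{ role: "system", content: "Say hello." }],
      streamResponse: false,
    },
  };
  const result = await handlers.transformRequestPayload(mockEvent, {});
  console.log(JSON.stringify(result, null, 2));
  return result;
}

runLocally().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Swapping the stand-in for your real handler module lets you compare the printed payload with the request bodies shown earlier on this page.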
