Migrating to Chat Models

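The cohere.command, cohere.command-light, and meta.llama-2-70b-chat models, which are used with the now-deprecated Oracle Cloud Infrastructure (OCI) Generative AI /generateText and /summarizeText endpoints, have been retired. You can continue to use these models only through a dedicated AI cluster, which uses the dedicated serving mode. However, we recommend that you instead stay on the on-demand serving mode by using the /chat endpoint with the (new) chat models. Here's an example of this endpoint: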

https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/chat
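The endpoint URL embeds the region (us-chicago-1) and the API version (20231130). The following minimal sketch builds the /chat URL for a given region. `chatEndpointForRegion` is a hypothetical helper, and it assumes that other regions use the same host pattern as the us-chicago-1 example above; check the regional endpoint list before relying on it.

```javascript
// Hypothetical helper: builds the OCI Generative AI /chat endpoint URL for a region.
// Assumption: every region follows the same host pattern as the us-chicago-1 example.
const API_VERSION = "20231130";

function chatEndpointForRegion(region) {
  // Basic sanity check on the region identifier (e.g. "us-chicago-1")
  if (!/^[a-z]+(-[a-z]+)+-\d+$/.test(region)) {
    throw new Error(`Unexpected region identifier: ${region}`);
  }
  return `https://inference.generativeai.${region}.oci.oraclecloud.com/${API_VERSION}/actions/chat`;
}

console.log(chatEndpointForRegion("us-chicago-1"));
```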

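To migrate to the chat models, you can either create a new LLM service or modify an existing one so that it uses the /chat endpoint and targets one of the successor models, such as cohere.command-r-08-2024 or meta.llama-3.2-90b-vision-instruct. Here's an example of an updated request body for cohere.command-r-08-2024: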

{
   "compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
   "servingMode": {
       "servingType": "ON_DEMAND",
       "modelId": "cohere.command-r-08-2024"
   },
   "chatRequest": {
       "apiFormat": "COHERE",
       "message": "what you want to ask goes here …",
       "maxTokens": 1000,
       "isStream": true,
       "frequencyPenalty": 0,
       "topP": 0.75,
       "temperature": 1
   }
}
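If you build this body in code rather than by hand, a small builder keeps the COHERE-format fields in one place. This is a minimal sketch; `buildCohereChatRequest` and its `options` argument are hypothetical, and the defaults simply mirror the example body above.

```javascript
// Hypothetical helper: builds a /chat request body for a Cohere chat model.
// Field names and default values mirror the example request body above.
function buildCohereChatRequest(compartmentId, message, options = {}) {
  return {
    compartmentId,
    servingMode: {
      servingType: "ON_DEMAND",
      modelId: options.modelId || "cohere.command-r-08-2024",
    },
    chatRequest: {
      apiFormat: "COHERE",
      message, // COHERE format: a single string, not a messages array
      maxTokens: options.maxTokens ?? 1000,
      isStream: options.isStream ?? true,
      frequencyPenalty: 0,
      topP: options.topP ?? 0.75,
      temperature: options.temperature ?? 1,
    },
  };
}

const body = buildCohereChatRequest("ocid1.tenancy.oc1..XXXXXXX", "Hello");
console.log(JSON.stringify(body, null, 2));
```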

Here's an example of a request body for meta.llama-3.2-90b-vision-instruct:

{
    "compartmentId": "ocid1.tenancy.oc1..XXXXXXX",
    "servingMode": {
        "servingType": "ON_DEMAND",
        "modelId": "meta.llama-3.2-90b-vision-instruct"
    },
    "chatRequest": {
        "messages": [
            {
                "role": "USER",
                "content": [
                    {
                        "type": "TEXT",
                        "text": "what you want to ask goes here …"
                    }
                ]
            }
        ],
        "apiFormat": "GENERIC",
        "maxTokens": 600,
        "isStream": false,
        "numGenerations": 1,
        "frequencyPenalty": 0,
        "presencePenalty": 0
    }
}
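The main structural difference from the COHERE format is that the text goes into `messages[].content[]` as typed parts instead of a top-level `message` string. A minimal sketch of a builder for this GENERIC body; `buildGenericChatRequest` and its `options` argument are hypothetical, and the defaults mirror the example above.

```javascript
// Hypothetical helper: builds a GENERIC-format /chat request body
// (for example, for meta.llama-3.2-90b-vision-instruct).
// The text is wrapped in one USER message whose content is an array of typed parts.
function buildGenericChatRequest(compartmentId, text, options = {}) {
  return {
    compartmentId,
    servingMode: {
      servingType: "ON_DEMAND",
      modelId: options.modelId || "meta.llama-3.2-90b-vision-instruct",
    },
    chatRequest: {
      messages: [{ role: "USER", content: [{ type: "TEXT", text }] }],
      apiFormat: "GENERIC",
      maxTokens: options.maxTokens ?? 600,
      isStream: options.isStream ?? false,
      numGenerations: 1,
      frequencyPenalty: 0,
      presencePenalty: 0,
    },
  };
}

console.log(JSON.stringify(buildGenericChatRequest("ocid1.tenancy.oc1..XXXXXXX", "Hi"), null, 2));
```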

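In addition, you must manually update the handler code generated from the Generative AI-specific transformation template to declare the new chat model. For cohere.command-r-08-2024, the declaration looks like this: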

let modelId = "cohere.command-r-08-2024";
let apiFormat = "COHERE";

For meta.llama-3.2-90b-vision-instruct (or any other non-Cohere model), the model declaration looks like this. The apiFormat variable is set to GENERIC, which is the value used for non-Cohere models.

let modelId = "meta.llama-3.2-90b-vision-instruct";
let apiFormat = "GENERIC";

The handler code must also include the chat-specific parameters (for example, the message parameter, which replaces the prompt parameter).
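The declarations above follow a simple rule: COHERE for Cohere models, GENERIC for everything else, with the text carried in `message` or `messages` respectively. A minimal sketch of that rule; `chatFormatFor` is a hypothetical helper, not part of the template, and it assumes every Cohere model ID starts with `cohere.`.

```javascript
// Hypothetical helper (not part of the template): derives the apiFormat value and
// the chatRequest field that carries the user text from a model ID.
// Assumption: all Cohere model IDs start with "cohere.".
function chatFormatFor(modelId) {
  if (modelId.startsWith("cohere.")) {
    // COHERE format: a single "message" string (replaces the old "prompt")
    return { apiFormat: "COHERE", textField: "message" };
  }
  // GENERIC format (Llama and other non-Cohere models): a "messages" array
  return { apiFormat: "GENERIC", textField: "messages" };
}

for (const id of ["cohere.command-r-08-2024", "meta.llama-3.2-90b-vision-instruct"]) {
  console.log(id, chatFormatFor(id));
}
```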

Note

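If you want to continue using the (retired) Command models, then in addition to creating a dedicated AI cluster, you must change the servingType parameter from ON_DEMAND to DEDICATED in both the request payload and the transformRequestPayload handler code of the Generative AI template:

"servingMode": {
    "modelId": "cohere.command",
    "servingType": "DEDICATED"
},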

For other models, such as meta.llama-3.2-90b-vision-instruct, set servingType to ON_DEMAND in both the request payload and the transformRequestPayload handler:

"servingMode": {
    "servingType": "ON_DEMAND",
    "modelId": "meta.llama-3.2-90b-vision-instruct"
},
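The servingType choice described in this note can be expressed as a small lookup. A minimal sketch, assuming the retired models are exactly the three named at the start of this section; `servingModeFor` and `RETIRED_MODELS` are hypothetical names. It only builds the servingMode object; creating the dedicated AI cluster is still a separate step.

```javascript
// Hypothetical lookup: retired models need DEDICATED serving (backed by a
// dedicated AI cluster); successor chat models use ON_DEMAND.
// Assumption: this set lists exactly the retired models named in this section.
const RETIRED_MODELS = new Set([
  "cohere.command",
  "cohere.command-light",
  "meta.llama-2-70b-chat",
]);

function servingModeFor(modelId) {
  return {
    modelId,
    servingType: RETIRED_MODELS.has(modelId) ? "DEDICATED" : "ON_DEMAND",
  };
}

console.log(servingModeFor("cohere.command"));
console.log(servingModeFor("meta.llama-3.2-90b-vision-instruct"));
```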

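Finally, you must modify the handler code functions to support the updated payload. In the streaming case for event.payload.responseItems (items), the handler must check the finishReason attribute so that messages are not duplicated in the response array. Here's the finishReason check in the updated Cohere handler code: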

if (item.text) {
    let finishReasonVar = item.finishReason;
    if (finishReasonVar !== 'COMPLETE') {
        // keep only the stream items, not the 'COMPLETE' message (i.e., the last message returned by the API)
        llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
    }
}

Here's the corresponding finishReason check in the updated Llama template code:

let finishReasonVar = item.finishReason;
if (finishReasonVar !== 'stop') {
    // skip the final 'stop' item; push only non-empty text chunks
    let msgcontent = item.message.content[0];
    let text = msgcontent.text;
    if (text !== "") {
        llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
    }
}
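Both checks can be exercised outside the template as a pure function over the stream items. This is a minimal sketch; `collectStreamText` is a hypothetical name, the mock items below are assumptions whose shapes are taken from the handler code in this section, and the optional chaining on GENERIC items is an added guard, not part of the template.

```javascript
// Sketch of the finishReason filtering above, as a pure function.
// COHERE items: text in item.text; the final item has finishReason "COMPLETE".
// GENERIC items: text in item.message.content[0].text; the final item has finishReason "stop".
function collectStreamText(items, apiFormat) {
  const chunks = [];
  for (const item of items) {
    if (apiFormat === "COHERE") {
      if (item.text && item.finishReason !== "COMPLETE") {
        chunks.push(item.text);
      }
    } else if (item.finishReason !== "stop") {
      // Added guard (assumption): tolerate items without message content
      const text = item.message?.content?.[0]?.text;
      if (text) {
        chunks.push(text);
      }
    }
  }
  return chunks;
}

// Mock stream items (hypothetical data)
const cohereItems = [{ text: "Hel" }, { text: "lo" }, { text: "Hello", finishReason: "COMPLETE" }];
const genericItems = [
  { message: { content: [{ type: "TEXT", text: "Hi" }] } },
  { message: { content: [{ type: "TEXT", text: " there" }] } },
  { finishReason: "stop", message: { content: [{ type: "TEXT", text: "" }] } },
];
console.log(collectStreamText(cohereItems, "COHERE"), collectStreamText(genericItems, "GENERIC"));
```

Without the finishReason check, the Cohere case would also collect the final "Hello", repeating text that the earlier chunks already delivered.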

Note

You don't need to update the template code for the non-streaming case. In the following code snippets, "isStream": streamResVar is set by the Use Streaming property of the invokeLLM component.

Here's an example of handler code modified for cohere.command-r-08-2024:

    /**
    * Handler to transform the request payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the request payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed request payload
    */
    transformRequestPayload: async (event, context) => {
      // This request sends a single COHERE-format message string, so we start with the system prompt and, if there
      // are additional chat entries, append them to it under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      let streamResVar = event.payload.streamResponse;
      if (event.payload.messages.length > 1) {
        let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
        prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
      }
      // use the Cohere model through the new OCI Generative AI /chat endpoint
      let modelId = "cohere.command-r-08-2024";
      let apiFormat = "COHERE";
      return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
          "modelId": modelId
        },
        "chatRequest": {
          "apiFormat": apiFormat,
          "message": prompt,
          "maxTokens": 4000,
          "isStream": streamResVar,
          "frequencyPenalty": 0,
          "topP": 0.75,
          "temperature": 0
        }
      };
    },

    /**
    * Handler to transform the response payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the response payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed response payload
    */
    transformResponsePayload: async (event, context) => {
      let llmPayload = {};
      if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          // only grab the text items; the last item in responseItems[] carries the finish reason and is not part of the sentence
          if (item.text) {
            let finishReasonVar = item.finishReason;
            if (finishReasonVar !== 'COMPLETE') {
              // keep only the stream items, not the 'COMPLETE' message (i.e., the last message returned by the API)
              llmPayload.responseItems.push({ "candidates": [{ "content" : item.text || "" }] });
            }
          }
        });
      } else {
        // non-streaming case
        llmPayload.candidates = [{ "content" : event.payload.chatResponse.text || "" }];
      }
      return llmPayload;
    },
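Both transformRequestPayload handlers in this section flatten the conversation into one prompt string the same way. Pulling that logic into a pure function makes it easy to check what the model actually receives. A minimal sketch; `flattenConversation` is a hypothetical name, and the mock messages (including the lowercase role values) are assumptions about what event.payload.messages contains.

```javascript
// Sketch of the prompt flattening done in both transformRequestPayload handlers:
// the first message (the system prompt) is kept as-is, and any later messages are
// appended as "role: content" lines under CONVERSATION HISTORY.
function flattenConversation(messages) {
  let prompt = messages[0].content;
  if (messages.length > 1) {
    const history = messages
      .slice(1)
      .reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, "");
    prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
  }
  return prompt;
}

// Mock messages (hypothetical data)
const prompt = flattenConversation([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "What is OCI?" },
]);
console.log(prompt);
```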

Here's an example of handler code for meta.llama-3.2-90b-vision-instruct:

    /**
    * Handler to transform the request payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the request payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed request payload
    */
    transformRequestPayload: async (event, context) => {
      // We start with the system prompt and, if there
      // are additional chat entries, append them to it under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      let streamResVar = event.payload.streamResponse;
      if (event.payload.messages.length > 1) {
        let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}`, '');
        prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`;
      }
      let modelId = "meta.llama-3.2-90b-vision-instruct";
      let apiFormat = "GENERIC";
      return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
          "modelId": modelId
        },
        "chatRequest": {
          "messages": [
            {
              "role": "USER",
              "content": [
                {
                  "type": "TEXT",
                  "text": prompt
                }
              ]
            }
          ],
          "apiFormat": apiFormat,
          "maxTokens": 4000,
          "isStream": streamResVar,
          "numGenerations": 1,
          "frequencyPenalty": 0,
          "presencePenalty": 0,
          "temperature": 1,
          "topP": 1,
          "topK": 1
        }
      };
    },
 

    /**
    * Handler to transform the response payload
    * @param {TransformPayloadEvent} event - event object contains the following properties:
    * - payload: the response payload object
    * @param {LlmTransformationContext} context - see https://oracle.github.io/bots-node-sdk/LlmTransformationContext.html
    * @returns {object} the transformed response payload
    */
    transformResponsePayload: async (event, context) => {
      let llmPayload = {};
      if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          let finishReasonVar = item.finishReason;
          if (finishReasonVar !== 'stop') {
            // skip the final 'stop' item; push only non-empty text chunks
            let msgcontent = item.message.content[0];
            let text = msgcontent.text;
            if (text !== "") {
              llmPayload.responseItems.push({ "candidates": [{ "content" : text || "" }] });
            }
          }
        });
      } else {
        // non-streaming case
        event.payload.chatResponse.choices.forEach(item => {
          let msgcontent = item.message.content[0];
          let text = msgcontent.text;
          llmPayload.candidates = [{ "content" : text || "" }];
        });
      }
      return llmPayload;
    },
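The non-streaming branch above reads the text from `chatResponse.choices[].message.content[0].text`. Note that its forEach loop overwrites `candidates` on each iteration, so only the last choice survives; with `"numGenerations": 1` there is only one choice, so this makes no difference. A minimal sketch of an alternative that maps every choice to a candidate; `genericCandidates` is a hypothetical name, and the mock response shape is inferred from the handler code, not taken from an API reference.

```javascript
// Alternative sketch of the non-streaming GENERIC branch: one candidate per choice.
// With numGenerations: 1 this yields the same single candidate as the forEach loop.
function genericCandidates(chatResponse) {
  return chatResponse.choices.map((choice) => ({
    content: choice.message.content[0].text || "",
  }));
}

// Mock chatResponse (hypothetical data; shape inferred from the handler above)
const mockChatResponse = {
  choices: [
    { message: { role: "ASSISTANT", content: [{ type: "TEXT", text: "OCI is Oracle's cloud." }] } },
  ],
};
console.log(genericCandidates(mockChatResponse));
```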

Tip:

We recommend that you debug locally to find out what your code needs. Because transformation event handlers are similar to custom components, you can use the same debugging techniques.
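One simple way to start is to call a handler directly with a mock event and print what it returns, before deploying anything. This is a minimal sketch, not the documented custom-component debugging workflow: the handler below is a trimmed but equivalent copy of the Cohere transformResponsePayload (filter/map instead of forEach), `runLocally` is a hypothetical helper, and the mock event and the empty context object are assumptions (a real LlmTransformationContext provides more).

```javascript
// Trimmed copy of the Cohere transformResponsePayload handler, for local experiments.
const handlers = {
  transformResponsePayload: async (event, context) => {
    const llmPayload = {};
    if (event.payload.responseItems) {
      // streaming case: drop the final 'COMPLETE' item to avoid duplicated text
      llmPayload.responseItems = event.payload.responseItems
        .filter((item) => item.text && item.finishReason !== "COMPLETE")
        .map((item) => ({ candidates: [{ content: item.text }] }));
    } else {
      // non-streaming case
      llmPayload.candidates = [{ content: event.payload.chatResponse.text || "" }];
    }
    return llmPayload;
  },
};

// Hypothetical harness: invoke a handler by name with a mock event and log the result.
async function runLocally(handlerName, event) {
  const context = {}; // stand-in for LlmTransformationContext (assumption)
  const result = await handlers[handlerName](event, context);
  console.log(JSON.stringify(result, null, 2));
  return result;
}

runLocally("transformResponsePayload", {
  payload: {
    responseItems: [{ text: "Hel" }, { text: "lo" }, { text: "Hello", finishReason: "COMPLETE" }],
  },
});
```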
