LLM Transformation Handlers

Each provider has its own format for its request, response, and error payloads, so the LLM provider and Oracle Digital Assistant can't communicate directly. To facilitate the exchange between the skill and its LLM provider, you need to transform these payloads into Oracle Digital Assistant's Common LLM Interface (CLMI) format and back again.

You enable this transformation by creating an LLM transformation handler, a script whose transformRequestPayload, transformResponsePayload, and transformErrorResponsePayload methods execute the payload transformations. Each of these methods takes two parameters (a minimal handler skeleton follows this list):
  • event: The properties of this object depend on the event type (transformRequestPayload, transformResponsePayload, or transformErrorResponsePayload).
  • context: References the LlmTransformationContext class, which provides access to convenience methods you can use to create your event handler logic.
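
For orientation, here is a minimal sketch of a complete transformation handler module. It assumes the standard Oracle Digital Assistant event-handler module layout (module.exports with metadata and handlers sections) and an eventHandlerType value of LlmTransformation; the component name and exact metadata come from the template that's generated for your service, so treat the details below as illustrative.

module.exports = {
  metadata: {
    // Hypothetical name; use the component name you entered in the Create Service dialog.
    name: 'myLlmTransformationHandler',
    // Assumed handler type for LLM transformation handlers; confirm against the generated template.
    eventHandlerType: 'LlmTransformation'
  },
  handlers: {
    // Converts the CLMI request body into the provider-specific request payload.
    transformRequestPayload: async (event, context) => {
      return event.payload;
    },
    // Converts the provider-specific response into the CLMI success response body.
    transformResponsePayload: async (event, context) => {
      return event.payload;
    },
    // Converts the provider-specific error payload into the CLMI error response body.
    transformErrorResponsePayload: async (event, context) => {
      return event.payload;
    }
  }
};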

Create an LLM Transformation Handler

To create an LLM Transformation Event Handler:
  1. Click Components in the left navbar.
  2. Click +New Service.
  3. Complete the Create Service dialog:
    • Name: Enter the service name.
    • Service Type: Embedded Container
    • Component Service Package Type: New Component
    • Component Type: LLM Transformation
    • Component Name: Enter an easily identifiable name for the event handler. You will reference this name when you create the LLM service for the skill.
    • Template: We provide templates for skills that call Cohere directly or through the Oracle Generative AI service. You don't have to edit these templates. If your skill calls a non-Cohere/Oracle Generative AI model, such as Azure OpenAI, you'll need to add the appropriate code.

      The templates for the Oracle Generative AI Cohere (text generation and summarization) and Llama (text summarization) models are listed under Generative AI in the Template menu. The template for accessing Cohere directly is located under Other. To access the template that contains the starter code for other models, choose Custom (also located under Other).

  4. Click Create to generate the event handler code.
  5. After deployment completes, expand the service and then select the transformation handler to open its properties page, which lists the three LLM provider-CLMI transformation methods (transformRequestPayload, transformResponsePayload, and transformErrorResponsePayload).
  6. If you're creating a Cohere or Oracle Generative AI service, the handler code is complete.
    If you're using the Custom template, click Edit to open the Edit Component dialog, then update the following placeholder code with the provider-specific code:
    Method: transformRequestPayload (lines 23-25 in the Custom template). Placeholder code:
        transformRequestPayload: async (event, context) => {
          return event.payload;
        },
    Method: transformResponsePayload (lines 33-35 in the Custom template). Placeholder code:
        transformResponsePayload: async (event, context) => {
          return event.payload;
        },
    Method: transformErrorResponsePayload (lines 44-46 in the Custom template). Placeholder code:
        transformErrorResponsePayload: async (event, context) => {
          return event.payload;
        }
  7. Verify the syntax of your updates by clicking Validate. Fix any validation errors, click Save, and then click Close.

LLM Provider Transformation Code Samples

Azure OpenAI

Method Event Handler Transformation Code
Request
transformRequestPayload: async (event, context) => {
  let payload = { "model": "gpt-4-0314",
                  "messages": event.payload.messages.map(m => { return {"role": m.role, "content": m.content}; }),
                  "max_tokens": event.payload.maxTokens,
                  "temperature": event.payload.temperature,
                  "stream": event.payload.streamResponse
                };
  return payload;
},
Response (Streaming and Non-streaming)
transformResponsePayload: async (event, context) => {
     let llmPayload = {};      
     if (event.payload.responseItems) {
       // streaming case
       llmPayload.responseItems = [];
       event.payload.responseItems
           .filter(item => item.choices.length > 0)
           .forEach(item => {
         llmPayload.responseItems.push({"candidates": item.choices.map( c => {return {"content": c.delta.content || "" };})});
       });
     } else {
        // non-streaming case
        llmPayload.candidates = event.payload.choices.map( c => {return {"content": c.message.content || "" };});
     } 
     return llmPayload;
   }
When streaming is enabled, the response transformation event handler is called in batches of 20 streamed messages. This batched array of streamed responses is stored under the responseItems key.
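For illustration only, and assuming the chunk shape implied by the handler above (Azure OpenAI streaming chunks with choices[].delta.content), a batched streaming payload and the CLMI result it is transformed into might look like this:

// Hypothetical event.payload for the streaming case (two batched chunks shown):
// { "responseItems": [
//     { "choices": [ { "delta": { "content": "Hello" } } ] },
//     { "choices": [ { "delta": { "content": " there" } } ] }
// ] }
//
// CLMI payload produced by the handler above:
// { "responseItems": [
//     { "candidates": [ { "content": "Hello" } ] },
//     { "candidates": [ { "content": " there" } ] }
// ] }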
Error
transformErrorResponsePayload: async (event, context) => {
  let errorCode = 'unknown';
  if (event.payload.error) {
    if ( 'context_length_exceeded' === event.payload.error.code) {
      errorCode = 'modelLengthExceeded';
    }  else if ('content_filter' === event.payload.error.code) {
      errorCode = 'flagged'; 
    } 
    return {"errorCode" : errorCode, "errorMessage": event.payload.error.message};
  } else {
    return {"errorCode" : errorCode, "errorMessage": JSON.stringify(event.payload)};
  }   
}

Oracle Generative AI Service – Cohere

Method Event Handler Code
Request
transformRequestPayload: async (event, context) => {
      // Cohere doesn't support chat completions, so we first print the system prompt, and if there
      // are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      if (event.payload.messages.length > 1) {
         let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}` , '');
         prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
      }
      // using Cohere
      let modelId = "cohere.command"
      let runtimeType = "COHERE";
       return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
            "modelId": modelId
        },
        "inferenceRequest": {
          "runtimeType": runtimeType,
          "prompt": prompt,
          "isStream": event.payload.streamResponse,
          "maxTokens": event.payload.maxTokens,
          "temperature": event.payload.temperature,
          // parameters set to default values
          "frequencyPenalty": 0,
          "isEcho": false,
          "numGenerations": 1,
          "presencePenalty": 0,
          "returnLikelihoods": "NONE",
          "topK": 0,
          "topP": 0.75,
          "truncate": "NONE"
        }
      };
}
Response
transformResponsePayload: async (event, context) => {      
    let llmPayload = {};
    if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          llmPayload.responseItems.push({"candidates": [{"content": item.text || "" }]});
        });
      } else {
        // non-streaming
        llmPayload.candidates = event.payload.inferenceResponse.generatedTexts.map( item => {return {"content": item.text || "" };});
      }
      return llmPayload;
 }
Error
transformErrorResponsePayload: async (event, context) => {      
      const error = event.payload.message || 'unknown error';
      if (error.startsWith('invalid request: total number of tokens')) {
        // returning modelLengthExceeded error code will cause a retry with reduced chat history
        return {"errorCode" : "modelLengthExceeded", "errorMessage": error};
      } else {
        return {"errorCode" : "unknown", "errorMessage": error};
      }
}

Oracle Generative AI – Llama

Method Event Handler Code
Request
transformRequestPayload: async (event, context) => {
      // The model is called through a text-generation endpoint rather than a chat endpoint, so we first print the system prompt, and if there
      // are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      if (event.payload.messages.length > 1) {
         let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}` , '');
         prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
      }
      // using Llama
      let modelId = "meta.llama-2-70b-chat"
      let runtimeType = "LLAMA";
       return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
            "modelId": modelId
        },
        "inferenceRequest": {
          "runtimeType": runtimeType,
          "prompt": prompt,
          "isStream": event.payload.streamResponse,
          "maxTokens": event.payload.maxTokens,
          "temperature": event.payload.temperature,
          // parameters set to default values
          "frequencyPenalty": 0,
          "isEcho": false,
          "numGenerations": 1,
          "presencePenalty": 0,
          "returnLikelihoods": "NONE",
          "topK": 0,
          "topP": 0.75,
          "truncate": "NONE"
        }
      };
}
Response
transformResponsePayload: async (event, context) => {      
    let llmPayload = {};
    if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          llmPayload.responseItems.push({"candidates": [{"content": item.text || "" }]});
        });
      } else {
        // non-streaming
        llmPayload.candidates = event.payload.inferenceResponse.choices.map( item => {return {"content": item.text || "" };});
      }
      return llmPayload;
 }
Error
transformErrorResponsePayload: async (event, context) => {      
      const error = event.payload.message || 'unknown error';
      if (error.startsWith('invalid request: total number of tokens')) {
        // returning modelLengthExceeded error code will cause a retry with reduced chat history
        return {"errorCode" : "modelLengthExceeded", "errorMessage": error};
      } else {
        return {"errorCode" : "unknown", "errorMessage": error};
      }
}

Oracle Generative AI – Summarization

Method Event Handler Code
Request
transformRequestPayload: async (event, context) => {
      // Cohere doesn't support chat completions, so we first print the system prompt, and if there
      // are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      if (event.payload.messages.length > 1) {
         let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}` , '');
         prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
      }
      let modelId = "cohere.command"
      return {
        "compartmentId": event.compartmentId,
        "servingMode": {
          "servingType": "ON_DEMAND",
          "modelId": modelId
        },
        "input" : prompt,
        "temperature": event.payload.temperature,
        // parameters set to default values
        "length": "AUTO",
        "extractiveness": "AUTO",
        "format": "PARAGRAPH",
        // natural language instructions
        "additionalCommand": "write in a conversational style"
      };
}
Response
transformResponsePayload: async (event, context) => {
      let llmPayload = {};
      // non-streaming only: streaming is not supported
      llmPayload.candidates = [{"content": event.payload.summary}];
      return llmPayload;
}
Error
transformErrorResponsePayload: async (event, context) => {
      const error = event.payload.message || 'unknown error';
      if (error.startsWith('invalid request: total number of tokens')) {
        // returning modelLengthExceeded error code will cause a retry with reduced chat history
        return {"errorCode": "modelLengthExceeded", "errorMessage": error};
      } else {
        return {"errorCode": "unknown", "errorMessage": error};
      }
}

Cohere (Command Model) – Direct Access to Cohere

The handlers in this transformation code support the /generate API and the associated cohere.command model, not the /chat API that's used for the cohere.command.R model. If you wish to use the /chat endpoint instead, then you will need to manually update the request and response payloads in the generated code template; a rough request sketch appears at the end of this section.
Method Event Handler Code
Request
transformRequestPayload: async (event, context) => {            
      // Cohere doesn't support chat completions, so we first print the system prompt, and if there
      // are additional chat entries, we add these to the system prompt under the heading CONVERSATION HISTORY
      let prompt = event.payload.messages[0].content;
      if (event.payload.messages.length > 1) {
         let history = event.payload.messages.slice(1).reduce((acc, cur) => `${acc}\n${cur.role}: ${cur.content}` , '');
         prompt += `\n\nCONVERSATION HISTORY:${history}\nassistant:`
      }
      return {
        "max_tokens": event.payload.maxTokens,
        "truncate": "END",
        "return_likelihoods": "NONE",
        "prompt": prompt,
        "model": "command",
        "temperature": event.payload.temperature,
        "stream": event.payload.streamResponse
      };
 }
This handler manages the conversation history to maintain the conversation context.
Response
transformResponsePayload: async (event, context) => {
  let llmPayload = {};      
  if (event.payload.responseItems) {
        // streaming case
        llmPayload.responseItems = [];
        event.payload.responseItems.forEach(item => {
          llmPayload.responseItems.push({"candidates": [{"content": item.text || "" }]});
        });
      } else {
        // non-streaming
        llmPayload.candidates = event.payload.generations.map( item => {return {"content": item.text || "" };});
      }
   return llmPayload;
}
Error
transformErrorResponsePayload: async (event, context) => {      
    // NOTE: Depending on the Cohere version, this code might need to be updated
      const error = event.payload.message || 'unknown error';
      if (error.startsWith('invalid request: total number of tokens')) {
        // returning modelLengthExceeded error code will cause a retry with reduced chat history
        return {"errorCode" : "modelLengthExceeded", "errorMessage": error};
      } else {
        return {"errorCode" : "unknown", "errorMessage": error};
      }
}
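
If you do switch to the /chat endpoint, the request transformation might look roughly like the sketch below. The field names (message, chat_history, preamble) and the role values (SYSTEM, USER, CHATBOT) are assumptions based on Cohere's chat API rather than part of the generated template, so verify them against the Cohere version and model you're calling; the response and error handlers would need corresponding changes.

transformRequestPayload: async (event, context) => {
  // Assumed mapping to Cohere /chat: the system prompt becomes the preamble,
  // the latest user message becomes "message", and any earlier turns become chat_history.
  // This sketch assumes at least one user message follows the system prompt.
  const messages = event.payload.messages;
  const roleMap = { "system": "SYSTEM", "user": "USER", "assistant": "CHATBOT" };
  return {
    "model": "command-r",   // hypothetical model name
    "preamble": messages[0].content,
    "message": messages[messages.length - 1].content,
    "chat_history": messages.slice(1, -1).map(m => { return { "role": roleMap[m.role], "message": m.content }; }),
    "max_tokens": event.payload.maxTokens,
    "temperature": event.payload.temperature,
    "stream": event.payload.streamResponse
  };
}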

The Common LLM Interface

Each LLM provider has its own format for its request and response payloads. The Common LLM Interface, or CLMI, enables the invokeLLM component to handle these proprietary request and response payloads.

The CLMI consists of the following:
  • A request body specification.
  • A success response body specification, applicable when the LLM REST call returns an HTTP 200 status.
  • An error response body specification, applicable when the LLM REST call returns an HTTP status other than 200 but the invocation of the LLM service was still successful.

    Note:

    For unsuccessful invocations, the invokeLLM component handles the 401 (not authorized) or 500 (internal server error) errors.

CLMI Request Body Specification

The JSON CLMI request body contains the following properties:
Property Type Default Description Required?
messages An array of message objects N/A A list of messages. The first message is the prompt with the role property set to system. If the LLM supports a multi-turn conversation so that the LLM response can be refined or enhanced, the subsequent messages will be pairs of messages from the user and assistant roles. The user message contains the follow-up instructions or question for the LLM. The assistant message contains the LLM response to the user message (the completion). If the LLM does not support multi-turn conversations, then the messages array will only contain a single system message holding the prompt. Yes
streamResponse boolean false Determines whether the LLM's response is streamed back to the LLM component. Setting this property to true enhances the user experience, because streaming enables the LLM component to send partial response messages back to users so that they don't have to wait for the LLM to complete the response. Set streamResponse to false when response validation is used: because the entire message is required before validation can take place, the message may otherwise be rendered for users multiple times (first streamed, then validated, then streamed again). No
maxTokens integer 1024 The model generates tokens for the words in its results. Tokens can be thought of as pieces of words. 100 tokens equals about 75 words in English, for example. This property limits the size of the content generated by the model by setting the maximum number of tokens that it generates for the response. No
temperature number 0 The model uses temperature to gauge the randomness, and thus the creativity, of its responses. You set this as a value ranging from 0 (predictable results) to 1 (more randomized results). 0 means that the model returns the same results for a given prompt. 1 means that the model's results for a given prompt can vary widely. Use an event handler to apply a multiplier if the LLM provider supports a range other than 0-1 (see the sketch after this table). No
user string N/A A unique identifier representing your end user, which can be used for monitoring and detecting abusive language. No
providerExtension object N/A Enables LLM provider-specific configuration options that are not defined as part of CLMI. No
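
Because CLMI defines temperature on a 0-1 scale, providers that accept a wider range need the value scaled inside transformRequestPayload. A minimal sketch, assuming a hypothetical provider whose temperature parameter ranges from 0 to 2:

transformRequestPayload: async (event, context) => {
  return {
    // ...other provider-specific properties...
    // CLMI sends temperature in the 0-1 range; this hypothetical provider expects 0-2,
    // so apply a multiplier before forwarding the value.
    "temperature": event.payload.temperature * 2
  };
}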

The Message Object Structure

Property Type Description Required?
role string The message creator. The values are system, user, and assistant. Yes
content string The message content Yes
turn integer A number that indicates the current refinement turn of the chat messages exchange. When the first prompt is sent to the LLM, the turn is 1. Yes
retry boolean A flag that indicates whether the message is sent to the LLM to correct an error in the response. No (defaults to false)
tag string A custom tag that marks a specific prompt. If you're improving the LLM response using Recursive Criticism and Improvement (RCI), you can enable the custom logic in the validateResponsePayload handler to detect the current step of the RCI process by setting the tag to "criticize" or "improve". No
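
Putting the request body and message object properties together, a CLMI request for a second refinement turn might look like the following (all content values, and the turn numbering shown, are illustrative):

{
  "messages": [
    { "role": "system", "content": "Draft an email that answers the customer's question.", "turn": 1 },
    { "role": "assistant", "content": "Dear customer, thank you for reaching out. Here is the information you requested.", "turn": 1 },
    { "role": "user", "content": "Make the reply shorter and more formal.", "turn": 2 }
  ],
  "streamResponse": true,
  "maxTokens": 1024,
  "temperature": 0
}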

Success Response Body Specification

Property Type Description Required?
candidates An array of candidate objects A list of candidate messages returned by the LLM Yes

Candidate Objects

Each candidate object contains the following properties:
Property Type Description Required?
content string The message content Yes
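
For example, a successful non-streaming CLMI response body with a single candidate might look like this (the content is illustrative):

{
  "candidates": [
    { "content": "Dear customer, please find the requested information below." }
  ]
}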

Error Response Body Specification

The JSON CLMI error response body contains the following properties:
Property Type Description Required?
errorCode String One of the following values (Required: Yes):
  • notAuthorized: Indicates that the LLM request does not have the proper authorization key.
  • modelLengthExceeded: Indicates that the combination of request messages (the system prompt along with the user and assistant refinement messages) and the maximum number of tokens exceeds the model's token limit.
  • requestFlagged: Indicates that the LLM cannot fulfill the request because it violates the LLM provider's moderation policies. For example, requests that contain racist or sexually abusive content would be flagged.
  • responseFlagged: Indicates that the LLM response violates moderation policies of the LLM provider. For example, responses that contain toxic content, such as racist or sexually abusive language, would be flagged.
  • requestInvalid: Indicates that the request to the LLM is invalid. For example, the request is not valid because it failed some of the validation rules set in an event handler, or it's in a format that's not recognized by the LLM.
  • responseInvalid: Indicates that the LLM-generated response is invalid. For example, the response is not valid because it failed some of the validation rules defined in a validation event handler.
  • unknown: When other errors occur.
errorMessage String The LLM provider-specific error message. This might be a stringified JSON object. Yes
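
For example, an error response body produced by one of the transformErrorResponsePayload handlers shown earlier might look like this (the message text is illustrative and provider-specific):

{
  "errorCode": "modelLengthExceeded",
  "errorMessage": "invalid request: total number of tokens exceeds the model's maximum context length"
}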