Add a call transcript, show suggestions and summarizations in Twilio

You can either use the following steps to perform your setup, or you can quickly try it out by downloading a compressed file that contains the code for adding the transcripts and showing the suggestions and summarizations described in this topic. See How to start your OJET application from a compressed file for more information.

You can configure your Twilio setup to send the live raw audio streams of phone calls to a specific destination. From there, you can use any real-time speech SDK to convert the audio stream into text. This text is then rendered in Fusion.

Here's a high-level view of the architecture:

INSERT GRAPHIC

Here's an overview of the diagram:

  1. Configure your Twilio App to enable the streams
  2. Initialize the real-time transcription service
  3. Pass the real-time audio stream to the real-time transcription service
  4. Call the feedLiveTranscript API to render the transcripts in Fusion

Configure your Twilio App to enable the streams

Streams should be enabled from the quick start application you've built in Introduction to Gen AI and CTI.

You can enable the streams in the voiceResponse function of the handler.js file by adding the following code after the twiml response object is created:

const start = twiml.start();
start.stream({
  name: 'Example Audio Stream',
  url: 'wss://twilio-node-voice-stream.onrender.com/',
  track: 'both_tracks'
});

For more information on Twilio streams, see https://www.twilio.com/docs/voice/twiml/stream.
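
Once the stream is enabled, Twilio sends JSON messages over the WebSocket for each phase of the call. The following is a rough sketch of the message shapes that the rest of this topic relies on, limited to the fields used here. The interface names and the parseTwilioStreamMessage helper are hypothetical and aren't part of the quick start application; check the Twilio documentation for the full field list.

```typescript
// Approximate shapes of the stream messages, reduced to the fields used in this topic.
interface TwilioConnectedMessage {
  event: "connected";
}

interface TwilioStartMessage {
  event: "start";
  streamSid: string;
  start: { streamSid: string; accountSid: string; callSid: string };
}

interface TwilioMediaMessage {
  event: "media";
  streamSid: string;
  // payload holds the base64-encoded audio chunk for the given track
  media: { track: "inbound" | "outbound"; payload: string };
}

interface TwilioStopMessage {
  event: "stop";
  streamSid: string;
  stop: { accountSid: string; callSid: string };
}

type TwilioStreamMessage =
  | TwilioConnectedMessage
  | TwilioStartMessage
  | TwilioMediaMessage
  | TwilioStopMessage;

// Hypothetical helper: parse a raw text frame and return it only when the
// event name is one of the four stream events listed above.
function parseTwilioStreamMessage(raw: string): TwilioStreamMessage | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return undefined;
  }
  const event = (parsed as { event?: unknown }).event;
  switch (event) {
    case "connected":
    case "start":
    case "media":
    case "stop":
      return parsed as TwilioStreamMessage;
    default:
      return undefined; // for example, the custom registerClient message
  }
}
```

The helper only checks the event name, so it doesn't validate the remaining fields. The registerClient message used later in this topic is sent by the media toolbar application, not by Twilio, which is why it isn't part of this union.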

You also need to add the logic to forward the audio stream from your quick start application to the media toolbar application. This is done using WebSockets.

To do this, initialize a WebSocket server in the index.js file of your quick start application, as shown in the following example:

//...
const WebSocket = require('ws');
//...
// const server = http.createServer(app);
const wss = new WebSocket.Server({server});
var callSidWebSocketMap = new Map();
var streamSidCallSidMap = new Map();
 
//...
wss.on("connection", function connection(ws) {
  console.log("New Connection Initiated");
 
  ws.on("message", function incoming(message) {
    const msg = JSON.parse(message);
    switch (msg.event) {
      case "registerClient":
          console.log('New client registered: ' + JSON.stringify(msg));
          callSidWebSocketMap.set(msg.accountSid, ws);
          break;
      case "connected":
        console.log(`A new call has connected.`);
        break;
      case "start":
          console.log('New stream started: ' + JSON.stringify(msg));
          streamSidCallSidMap.set(msg.start.streamSid, msg.start.accountSid);
          break;
      case "media":
          var twilioData;
          if (msg?.media?.track === 'outbound') {
              twilioData = "{\"event\":\"media\",\"streamId\":\""+msg?.streamSid+"\",\"from\":\"agent\",\"payload\":\"" + msg?.media?.payload + "\"}";
          } else {
              twilioData = "{\"event\":\"media\",\"streamId\":\""+msg?.streamSid+"\",\"payload\":\"" + msg?.media?.payload + "\"}";
          }
          const clientWebSocket = callSidWebSocketMap.get(streamSidCallSidMap.get(msg.streamSid));
          if (clientWebSocket) {
              console.log(`Sending media payload to clientWebSocket`);
              clientWebSocket.send(twilioData);
          } else {
              console.log(`Client not found`);
          }
          console.log(twilioData);
          break;
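      case "stop":
          console.log('Call has ended: ' + JSON.stringify(msg));
          // Close the client socket and drop the map entries for the finished stream
          callSidWebSocketMap.get(msg?.stop?.accountSid)?.close();
          callSidWebSocketMap.delete(msg?.stop?.accountSid);
          streamSidCallSidMap.delete(msg?.streamSid);
          break;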
    }
  });
});
//...
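
The media case above builds the relayed message by concatenating strings. As an alternative, here's a minimal sketch that builds the same message as an object and serializes it with JSON.stringify, which avoids manual quote escaping. The RelayedMediaMessage interface and the buildRelayedMediaMessage function are hypothetical names used only for illustration.

```typescript
// Shape of the message the quick start application relays to the media toolbar.
interface RelayedMediaMessage {
  event: "media";
  streamId: string;
  from?: "agent"; // present only for the agent (outbound) track
  payload: string;
}

// Hypothetical helper: convert a Twilio media message into the relayed format.
function buildRelayedMediaMessage(msg: {
  streamSid: string;
  media: { track: string; payload: string };
}): string {
  const relayed: RelayedMediaMessage = {
    event: "media",
    streamId: msg.streamSid,
    payload: msg.media.payload,
  };
  if (msg.media.track === "outbound") {
    relayed.from = "agent"; // mark audio that comes from the agent side
  }
  return JSON.stringify(relayed);
}
```

With a helper like this, the media case could pass the result straight to clientWebSocket.send. The field names match the ones that the media toolbar code reads later in this topic (streamId, from, and payload).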

Initialize the real-time transcription service

  1. Download the real-time speech transcription app, and extract the contents to a directory.
  2. In the index.ts file, set the compartmentId variable to the ID of the compartment that has the OCI Speech service.
  3. Update the config.js file with your OCI configuration.
  4. Open the oci-speech directory in your terminal and run the npm install command.
  5. Build the project with the npm run build command.
  6. Start the service by running the npm run start command.

    The WebSocket for the oci-speech service is now running. Next, send the audio stream coming from Twilio to this WebSocket to get the transcription results.

Pass the real-time audio stream to the real-time transcription service

In your vendorHandler.ts file, initialize the following class variables:

export class VendorHandler implements ICtiVendorHandler {
    // ...
    private static messageIds: string[] = [];
    private static TWILIO_SERVICE: string = 'https://twilio-node-voice-stream.onrender.com'; // Your quick-start application URL
    private static RT_SPEECH_SERVICE: string = 'wss://phoenix339284.appsdev.fusionappsdphx1.oraclevcn.com:8004/'; // Your real-time speech transcription service URL
    private static TWILIO_WS_URL: string = 'wss://twilio-node-voice-stream.onrender.com/'; // Your quick-start application WebSocket URL
    private static transcriptionServerWsForAgent: WebSocket;
    // ...
}
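
TWILIO_SERVICE and TWILIO_WS_URL point to the same quick start application, so they need to be kept in sync. As one possible way to avoid maintaining both, the following sketch derives the WebSocket URL from the HTTP URL. The toWebSocketUrl function is a hypothetical helper and isn't part of the sample application.

```typescript
// Hypothetical helper: derive a WebSocket URL from an HTTP(S) URL.
// "http://" becomes "ws://" and "https://" becomes "wss://".
function toWebSocketUrl(httpUrl: string): string {
  if (!/^https?:\/\//.test(httpUrl)) {
    throw new Error(`Expected an http(s) URL, got: ${httpUrl}`);
  }
  return httpUrl.replace(/^http/, "ws");
}
```

For example, calling toWebSocketUrl with the TWILIO_SERVICE value shown above returns wss://twilio-node-voice-stream.onrender.com, which connects to the same server as the TWILIO_WS_URL value.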

Define the following functions:

// This function is the entry point for transcription
private initTranscription(accountSid: string): void {
    VendorHandler.transcriptionServerWsForAgent = new WebSocket(VendorHandler.RT_SPEECH_SERVICE); // WebSocket connection to real-time speech transcription service
    let self = this;
    // 1. Initialize WebSocket connection to real-time speech transcription service
    VendorHandler.transcriptionServerWsForAgent.addEventListener("open", (event) => {
        // 2. Once the WebSocket connection to the real-time speech transcription service is open,
        // initialize the WebSocket connection to Twilio to get the audio stream
        this.initializeWebsocketConnectionToTwilio(accountSid);
    });
 
    // 3. Handle the transcription results returned by the real-time speech transcription service
    VendorHandler.transcriptionServerWsForAgent.addEventListener("message", async (event) => {
        await this.handleTranscriptResponseFromSpeechService(event, self);
    });
}
 
// This function initializes the WebSocket connection to Twilio
private initializeWebsocketConnectionToTwilio(accountSid: string): void {
    let twilioServerWs: WebSocket = new WebSocket(VendorHandler.TWILIO_WS_URL);
    // 2.1. Initialize WebSocket connection to your Twilio quick-start application
    twilioServerWs.addEventListener("open", (event) => {
        // 2.2. Once the WebSocket connection to Twilio is open,
        // send a registerClient message to register this client for transcription
        twilioServerWs.send(JSON.stringify({"event":"registerClient", "accountSid": accountSid}));
    });
    twilioServerWs.addEventListener("message", async (event) => {
        // 2.3. Here, you will receive the audio stream payload and you need to pass this to
        // your real-time speech service websocket for getting the transcript results.
        this.handleAudioStreamFromTwilio(event);
    });
    twilioServerWs.addEventListener("error", (err) => {
        console.error("WebSocket error from the Twilio quick-start application", err);
    });
}
 
// This function forwards the audio stream to real-time transcript service
private handleAudioStreamFromTwilio(event: any): void {
    const msg = JSON.parse(event.data);
    switch (msg.event) {
    case "media":
        // 2.3.1. Send the audio stream to your real-time transcript service in a specific format as returned from generatePayloadForTranscriptServer function
        VendorHandler.transcriptionServerWsForAgent.send(JSON.stringify(this.generatePayloadForTranscriptServer(msg)));
        break;
    }
}
 
// This function generates the payload to transcription function in a specific format
private generatePayloadForTranscriptServer(message: any): any {
    return {
        "callId": message.streamId,
        "role": message.from === 'agent' ? 'AGENT' : 'END_USER',
        "message": message.payload
    }
}
 
// This function handles the results generated from the real-time speech transcript service
private async handleTranscriptResponseFromSpeechService(event: any, self: any): Promise<void> {
    let state = "STARTED";
    const responseFromServer = JSON.parse(event.data);
    const role: string = responseFromServer.role === 'AGENT' ? 'AGENT' : 'END_USER';
    if (responseFromServer.final) {
        state = "CLOSED"
    } else {
        if (VendorHandler.messageIds.includes(responseFromServer?.messageId))  {
            state = "INPROGRESS";
        } else {
            VendorHandler.messageIds.push(responseFromServer?.messageId)
        }
    }
    // Invoke UEF API to add the transcript message to the engagement panel.
    await self.integrationEventsHandler.addRealTimeTranscript(responseFromServer?.messageId, responseFromServer?.transcript, role, state);
}
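
The state passed to addRealTimeTranscript follows a simple rule: the first partial result for a message ID is STARTED, later partial results for the same ID are INPROGRESS, and a final result is CLOSED. Here's a minimal sketch of that rule in isolation. The TranscriptStateTracker class is a hypothetical name, and unlike the code above it uses a Set and forgets a message ID once the message is closed, an assumption made here to keep the collection from growing for the lifetime of the page.

```typescript
type TranscriptState = "STARTED" | "INPROGRESS" | "CLOSED";

// Hypothetical helper that tracks which message IDs have already been seen.
class TranscriptStateTracker {
  private readonly seenMessageIds = new Set<string>();

  // Return the state for a transcript result with the given ID and final flag.
  resolve(messageId: string, isFinal: boolean): TranscriptState {
    if (isFinal) {
      // Assumption: a final result ends the message, so its ID can be dropped.
      this.seenMessageIds.delete(messageId);
      return "CLOSED";
    }
    if (this.seenMessageIds.has(messageId)) {
      return "INPROGRESS"; // a later partial result for a known message
    }
    this.seenMessageIds.add(messageId);
    return "STARTED"; // the first partial result for this message
  }
}
```

One consequence of dropping the ID is that a partial result arriving after the final result for the same ID is treated as STARTED again, whereas the array-based code above would report INPROGRESS.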

Invoke the initTranscription function from the incoming and outgoing call event handlers:

public incomingCallCallback = (call: Call) => {
    this.initTranscription(call.parameters.AccountSid);
    //...
}
 
public async makeOutboundCall(phoneNumber: string, eventId: string) {
    //...
    // if (this.device) {
        // this.call = await this.device.connect({ params });
        this.initTranscription(this.call.parameters.AccountSid);
        //...
    // }
}

Complete Code

Here's the complete code of the vendorHandler.ts file for accepting an incoming call:

import { ICtiVendorHandler } from './ICtiVendorHandler';
import { Device, Call } from '@twilio/voice-sdk';
import {IntegrationEventsHandler} from "../integrationEventsHandler";
 
export class VendorHandler implements ICtiVendorHandler {
    private twilio: any;
    private device: Device | null;
    private integrationEventsHandler: IntegrationEventsHandler;
    private call: Call | null;
    public idAndToken: any;
    private static messageIds: string[] = [];
    private static TWILIO_SERVICE: string = 'https://twilio-node-voice-stream.onrender.com';
    private static RT_SPEECH_SERVICE: string = 'wss://phoenix339284.appsdev.fusionappsdphx1.oraclevcn.com:8004/';
    private static TWILIO_WS_URL: string = 'wss://twilio-node-voice-stream.onrender.com/';
    private static transcriptionServerWsForAgent: WebSocket;
 
    constructor(integrationEventsHandler: IntegrationEventsHandler) {
        this.twilio = (window as any).Twilio;
        this.device = null;
        this.idAndToken = null;
        this.integrationEventsHandler = integrationEventsHandler;
        this.call = null;
    }
 
    public async makeAgentAvailable(): Promise<void> {
        this.idAndToken = await this.getIdAndToken();
        this.device = new this.twilio.Device(this.idAndToken.token, {
            logLevel: 1,
            codecPreferences: ["opus", "pcmu"],
            enableRingingState: true
        });
        let resolve: Function;
        let reject: Function;
        if (this.device) {
            this.device.on("registered", () => {
                console.log("Registration completed ...")
                resolve();
            });
            this.device.on("error", (deviceError: any) => {
                console.error("Registration Failed ...", deviceError);
                reject();
            });
            this.device.on("incoming", this.incomingCallCallback);
            this.device.register();
        }
        return new Promise((res: Function, rej: Function) => {
            resolve = res;
            reject = rej;
        });
    }
    public async makeAgentUnavailable() {
        throw new Error('Method not implemented.');
    }
    public async makeOutboundCall(phoneNumber: string, eventId: string) {
        const params = {
            To: phoneNumber,
        };
        if (this.device) {
            this.call = await this.device.connect({ params });
            this.initTranscription(this.call.parameters.AccountSid);
            this.call.on("accept", () => { this.integrationEventsHandler.outboundCallAcceptedHandler(eventId) });
            this.call.on("disconnect", () => { this.integrationEventsHandler.callHangupHandler(eventId) });
            this.call.on("cancel", () => { this.integrationEventsHandler.callRejectedHandler(eventId) });
        }
    }
    public async acceptCall() {
        if (this.call) {
            this.call.accept();
        }
    }
    public async rejectCall() {
        if (this.call) {
            this.call.reject();
        }
    }
    public async hangupCall() {
        if (this.call) {
            this.call.disconnect();
        }
    }
 
    private initializeWebsocketConnectionToTwilio(accountSid: string): void {
        let twilioServerWs: WebSocket = new WebSocket(VendorHandler.TWILIO_WS_URL);
        twilioServerWs.addEventListener("open", (event) => {
            twilioServerWs.send(JSON.stringify({"event":"registerClient", "accountSid": accountSid}));
        });
        twilioServerWs.addEventListener("message", async (event) => {
            this.handleAudioStreamFromTwilio(event);
        });
        twilioServerWs.addEventListener("error", (err) => {
            console.error("WebSocket error from the Twilio quick-start application", err);
        });
    }
 
    private handleAudioStreamFromTwilio(event: any): void {
        const msg = JSON.parse(event.data);
        switch (msg.event) {
            case "media":
                VendorHandler.transcriptionServerWsForAgent.send(JSON.stringify(this.generatePayloadForTranscriptServer(msg)));
                break;
        }
    }
 
    private generatePayloadForTranscriptServer(message: any): any {
        return {
            "callId": message.streamId,
            "role": message.from === 'agent' ? 'AGENT' : 'END_USER',
            "message": message.payload
        }
    }
 
    private async handleTranscriptResponseFromSpeechService(event: any, self: any): Promise<void> {
        let state = "STARTED";
        const responseFromServer = JSON.parse(event.data);
        const role: string = responseFromServer.role === 'AGENT' ? 'AGENT' : 'END_USER';
        if (responseFromServer.final) {
            state = "CLOSED"
        } else {
            if (VendorHandler.messageIds.includes(responseFromServer?.messageId))  {
                state = "INPROGRESS";
            } else {
                VendorHandler.messageIds.push(responseFromServer?.messageId)
            }
        }
        await self.integrationEventsHandler.addRealTimeTranscript(responseFromServer?.messageId, responseFromServer?.transcript, role, state);
    }
 
 
    private initTranscription(accountSid: string): void {
        VendorHandler.transcriptionServerWsForAgent = new WebSocket(VendorHandler.RT_SPEECH_SERVICE);
        let self = this;
        VendorHandler.transcriptionServerWsForAgent.addEventListener("open", (event) => {
            this.initializeWebsocketConnectionToTwilio(accountSid);
        });
 
        VendorHandler.transcriptionServerWsForAgent.addEventListener("message", async (event) => {
            await this.handleTranscriptResponseFromSpeechService(event, self);
        });
    }
 
    private async getIdAndToken(): Promise<any> {
        const headers: Headers = (new Headers()) as Headers;
        const url: string = `${VendorHandler.TWILIO_SERVICE}/token`; // Replace this url with the url of the deployed node app
        headers.set('Accept', 'application/json');
        const request: Request = new Request(url, {
            method: 'GET',
            headers: headers
        }) as Request;
        const idAndToken: Response = await fetch(request);
        this.idAndToken = await idAndToken.json();
        return this.idAndToken;
    }
 
    public incomingCallCallback = (call: Call) => {
        this.initTranscription(call.parameters.AccountSid);
        this.integrationEventsHandler.incomingCallHandler(call.parameters.From, call.parameters.CallSid);
        this.call = call;
        this.call.on("cancel", () => { this.integrationEventsHandler.callRejectedHandler(call.parameters.CallSid) });
        this.call.on("disconnect", () => { this.integrationEventsHandler.callHangupHandler(call.parameters.CallSid) });
        this.call.on("reject", () => { this.integrationEventsHandler.callRejectedHandler(call.parameters.CallSid) });
    }
    public async sendTextMessage(suggestionData: IMcaOnToolbarInteractionCommandData, resolveRef: Function): Promise<void> {
        var myHeaders = new Headers();
        myHeaders.append("Authorization", 'Basic <your-base64-encoded-credentials>'); // Add your authorization header here; don't commit real credentials
        myHeaders.append("Content-Type", "application/x-www-form-urlencoded");
 
        var urlencoded = new URLSearchParams();
        urlencoded.append("To", this.call?.parameters.From || '');
        urlencoded.append("From", "+13087374071");// Your TWILIO Number
        urlencoded.append("Body", suggestionData.inData.metadata.originalSuggestionText + " Please refer: " + suggestionData.inData.metadata.externalUrl);
 
        var requestOptions: any = {
            method: 'POST',
            headers: myHeaders,
            body: urlencoded,
            redirect: 'follow'
        };
 
        fetch(`${VendorHandler.TWILIO_SERVICE}`, requestOptions)
            .then(response => response.text())
            .then(result => console.log(result))
            .catch(error => console.log('error', error));
    }
}

Verify your progress

After you complete these steps, use ojet serve to start your application and sign in to your Fusion application. Open the media toolbar and click the agent availability button to make your agent available for calls. Now, place a call to your customer care number. You'll receive the incoming call notification in your media toolbar application and in your Fusion window, and you can accept the call from either one. Once the conversation starts, the real-time transcripts are rendered in the Fusion engagement panel.