Add a call transcript, show suggestions and summarizations in Twilio
You can either follow the steps below to perform your setup, or you can quickly try it out by downloading a compressed file that contains the code for adding the transcripts and showing the suggestions and summarizations described in this topic. See How to start your OJET application from a compressed file for more information.
You can configure your Twilio setup to send the live raw audio streams of phone calls to a specific destination. From there, you can use any real-time speech SDK to convert the audio stream into text. This text is then rendered in Fusion.
Here's a high-level view of the architecture:
INSERT GRAPHIC
Here's an overview of the diagram:
- Configure your Twilio App to enable the streams
- Initialize the realtime-transcription service for real-time transcription
- Pass the real-time audio stream to the realtime-transcription service
- Call the feedLiveTranscript API to render the transcripts in Fusion
Configure your Twilio App to enable the streams
Streams should be enabled from the quick start application you've built in Introduction to Gen AI and CTI.
You can enable the streams from the voiceResponse function in the handler.js file by adding the following code after initializing the voiceResponse function:
const start = twiml.start();
start.stream({
name: 'Example Audio Stream',
url: 'wss://twilio-node-voice-stream.onrender.com/',
track: 'both_tracks'
});
For more information on Twilio streams, see https://www.twilio.com/docs/voice/twiml/stream.
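For orientation, the start.stream call above adds a Start element containing a Stream element to the TwiML response. The following sketch hand-builds an equivalent XML fragment so you can see roughly what Twilio receives. The buildStreamTwiml and escapeXmlAttr helpers are hypothetical and are not part of the Twilio SDK, and the attribute order produced by the SDK may differ. The wss:// check reflects the expectation that the stream URL is a secure WebSocket endpoint.

```typescript
// Hypothetical helper: escapes characters that are unsafe inside an XML attribute.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Options mirroring the object passed to start.stream() in handler.js.
interface StreamTwimlOptions {
  name: string;
  url: string;
  track: "inbound_track" | "outbound_track" | "both_tracks";
}

// Hypothetical helper: renders the <Start><Stream/></Start> fragment by hand.
function buildStreamTwiml(options: StreamTwimlOptions): string {
  if (!options.url.startsWith("wss://")) {
    throw new Error("The stream URL is expected to be a wss:// endpoint");
  }
  return (
    `<Start><Stream name="${escapeXmlAttr(options.name)}"` +
    ` url="${escapeXmlAttr(options.url)}"` +
    ` track="${options.track}"/></Start>`
  );
}
```

Calling buildStreamTwiml with the values from the example above yields a single Stream element that requests both tracks, which is what lets the quick start application tell agent audio apart from customer audio later on.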
You also need to add logic to forward the audio stream from your quick start application to the media toolbar application. This is done through WebSockets.
To do this, in the index.js file of your quick start application, initialize a WebSocket server as shown in the following example:
//...
const WebSocket = require('ws');
//...
// const server = http.createServer(app);
const wss = new WebSocket.Server({server});
var callSidWebSocketMap = new Map();
var streamSidCallSidMap = new Map();
//...
wss.on("connection", function connection(ws) {
console.log("New Connection Initiated");
ws.on("message", function incoming(message) {
const msg = JSON.parse(message);
switch (msg.event) {
case "registerClient":
console.log('New client registered: ' + JSON.stringify(msg));
callSidWebSocketMap.set(msg.accountSid, ws);
break;
case "connected":
console.log(`A new call has connected.`);
break;
case "start":
console.log('New stream started: ' + JSON.stringify(msg));
streamSidCallSidMap.set(msg.start.streamSid, msg.start.accountSid)
break;
case "media":
// Tag frames from the outbound track as agent audio; JSON.stringify handles the escaping.
const twilioData = JSON.stringify({
event: "media",
streamId: msg?.streamSid,
from: msg?.media?.track === 'outbound' ? "agent" : undefined,
payload: msg?.media?.payload
});
const clientWebSocket = callSidWebSocketMap.get(streamSidCallSidMap.get(msg.streamSid));
if (clientWebSocket) {
console.log(`Sending media payload to clientWebSocket`);
clientWebSocket.send(twilioData);
} else {
console.log(`Client not found`);
}
console.log(twilioData);
break;
case "stop":
console.log('Call Has Ended' + JSON.stringify(msg));
callSidWebSocketMap.get(msg?.stop?.accountSid)?.close();
break;
}
});
});
//...
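The routing in the handler above relies on two lookups: stream SID to account SID, and account SID to client WebSocket. The following sketch isolates that logic in a small class so it can be exercised without a running server. StreamRouter, ClientSocket, and TwilioMediaFrame are hypothetical names introduced for illustration. Unlike the example above, this sketch also removes the map entries when a stream stops, which is an assumption about the cleanup you might want rather than something the quick start application does.

```typescript
// Minimal socket surface the router needs; a `ws` WebSocket satisfies it.
interface ClientSocket {
  send(data: string): void;
  close(): void;
}

// Subset of a Twilio "media" message used by the router.
interface TwilioMediaFrame {
  streamSid: string;
  media: { track: string; payload: string };
}

class StreamRouter {
  private clientsByAccountSid = new Map<string, ClientSocket>();
  private accountSidByStreamSid = new Map<string, string>();

  // Mirrors the "registerClient" case.
  registerClient(accountSid: string, socket: ClientSocket): void {
    this.clientsByAccountSid.set(accountSid, socket);
  }

  // Mirrors the "start" case.
  startStream(streamSid: string, accountSid: string): void {
    this.accountSidByStreamSid.set(streamSid, accountSid);
  }

  // Mirrors the "media" case; returns false when no client is registered.
  routeMedia(frame: TwilioMediaFrame): boolean {
    const accountSid = this.accountSidByStreamSid.get(frame.streamSid);
    const client = accountSid ? this.clientsByAccountSid.get(accountSid) : undefined;
    if (!client) {
      return false;
    }
    client.send(JSON.stringify({
      event: "media",
      streamId: frame.streamSid,
      // Only outbound (agent) frames carry a "from" field; undefined is omitted.
      from: frame.media.track === "outbound" ? "agent" : undefined,
      payload: frame.media.payload,
    }));
    return true;
  }

  // Mirrors the "stop" case, and additionally forgets the stale entries.
  stop(accountSid: string): void {
    this.clientsByAccountSid.get(accountSid)?.close();
    this.clientsByAccountSid.delete(accountSid);
    this.accountSidByStreamSid.forEach((mappedAccountSid, streamSid) => {
      if (mappedAccountSid === accountSid) {
        this.accountSidByStreamSid.delete(streamSid);
      }
    });
  }
}
```

Because clients are keyed by account SID, a second client that registers with the same account replaces the first one. If you expect several agents to share one Twilio account, you would likely need a finer-grained key, such as the call SID.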
Initialize the realtime-transcription service for real-time transcription
- Download the real-time speech transcription app, and extract the contents to a directory.
- In the index.ts file, update the compartmentId variable with the ID of the compartment that has the OCI Speech service.
- Update the config.js file with your OCI configuration.
- Open the oci-speech directory in your terminal and run the npm install command.
- Build the project with the npm run build command.
- Start the service by running the npm run start command.
The websocket for the oci-speech service is now running. Now you just need to send the audio stream coming from Twilio to this websocket to get the transcription results.
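Before wiring the service into the media toolbar, it can help to validate what comes back over the websocket. The following sketch parses one message defensively. The field names (messageId, transcript, role, final) are taken from the handler code later in this topic; the exact response format of the service is otherwise an assumption, and parseTranscriptResult and ParsedTranscriptResult are hypothetical names.

```typescript
// Shape inferred from the handler code in this topic; treat it as an assumption.
interface ParsedTranscriptResult {
  messageId: string;
  transcript: string;
  role: "AGENT" | "END_USER";
  final: boolean;
}

// Returns null instead of throwing when the message is not a usable result.
function parseTranscriptResult(raw: string): ParsedTranscriptResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // Not JSON at all.
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const record = data as Record<string, unknown>;
  if (typeof record.messageId !== "string" || typeof record.transcript !== "string") {
    return null;
  }
  return {
    messageId: record.messageId,
    transcript: record.transcript,
    // Anything other than 'AGENT' is treated as the customer, as in the handler.
    role: record.role === "AGENT" ? "AGENT" : "END_USER",
    final: Boolean(record.final),
  };
}
```

A null return gives the caller a chance to log and skip a malformed message rather than letting JSON.parse throw inside a WebSocket event listener.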
Pass the real-time audio stream to the realtime-transcription service
In your vendorHandler.ts file, initialize the following class variables:
export class VendorHandler implements ICtiVendorHandler {
// ...
private static messageIds: string[] = [];
private static TWILIO_SERVICE: string = 'https://twilio-node-voice-stream.onrender.com'; // Your quick-start application URL
private static RT_SPEECH_SERVICE: string = 'wss://phoenix339284.appsdev.fusionappsdphx1.oraclevcn.com:8004/'; // Your real-time speech transcription service URL
private static TWILIO_WS_URL: string = 'wss://twilio-node-voice-stream.onrender.com/'; // Your quick-start application WebSocket URL
private static transcriptionServerWsForAgent: WebSocket;
// ...
}
Define the following functions:
// This function is the entry point for transcription
private initTranscription(accountSid: string): void {
// 1. Initialize WebSocket connection to real-time speech transcription service
VendorHandler.transcriptionServerWsForAgent = new WebSocket(VendorHandler.RT_SPEECH_SERVICE);
let self = this;
VendorHandler.transcriptionServerWsForAgent.addEventListener("open", (event) => {
// 2. Once the WebSocket connection to the real-time speech transcription service succeeds,
// initialize the WebSocket connection to Twilio to get the audio stream
this.initializeWebsocketConnectionToTwilio(accountSid);
});
// 3. The transcription results from the real-time speech transcription service arrive as messages on this socket
VendorHandler.transcriptionServerWsForAgent.addEventListener("message", async (event) => {
await this.handleTranscriptResponseFromSpeechService(event, self);
});
}
// This function initializes the WebSocket connection to Twilio
private initializeWebsocketConnectionToTwilio(accountSid: string): void {
// 2.1. Initialize WebSocket connection to your Twilio quick-start application
let twilioServerWs: WebSocket = new WebSocket(VendorHandler.TWILIO_WS_URL);
twilioServerWs.addEventListener("open", (event) => {
// 2.2. Once the WebSocket connection to Twilio succeeds,
// send a registerClient message to Twilio to register the client for transcription
twilioServerWs.send(JSON.stringify({"event":"registerClient", "accountSid": accountSid}));
});
twilioServerWs.addEventListener("message", async (event) => {
// 2.3. Here, you will receive the audio stream payload and you need to pass this to
// your real-time speech service websocket for getting the transcript results.
this.handleAudioStreamFromTwilio(event);
});
twilioServerWs.addEventListener("error", (err) => {
console.error("WebSocket error from Twilio quick-start application ", err);
});
}
// This function forwards the audio stream to real-time transcript service
private handleAudioStreamFromTwilio(event: any): void {
const msg = JSON.parse(event.data);
switch (msg.event) {
case "media":
// 2.3.1. Send the audio stream to your real-time transcript service in a specific format as returned from generatePayloadForTranscriptServer function
VendorHandler.transcriptionServerWsForAgent.send(JSON.stringify(this.generatePayloadForTranscriptServer(msg)));
break;
}
}
// This function generates the payload to transcription function in a specific format
private generatePayloadForTranscriptServer(message: any): any {
return {
"callId": message.streamId,
"role": message.from === 'agent' ? 'AGENT' : 'END_USER',
"message": message.payload
}
}
// This function handles the results generated from the real-time speech transcript service
private async handleTranscriptResponseFromSpeechService(event: any, self: any): Promise<void> {
let state = "STARTED";
const responseFromServer = JSON.parse(event.data);
const role: string = responseFromServer.role == 'AGENT' ? 'AGENT' : 'END_USER'
if (responseFromServer.final) {
state = "CLOSED"
} else {
if (VendorHandler.messageIds.includes(responseFromServer?.messageId)) {
state = "INPROGRESS";
} else {
VendorHandler.messageIds.push(responseFromServer?.messageId)
}
}
// Invoke UEF API to add the transcript message to the engagement panel.
await self.integrationEventsHandler.addRealTimeTranscript(responseFromServer?.messageId, responseFromServer?.transcript, role, state);
}
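The state passed to addRealTimeTranscript follows a small lifecycle: the first partial result for a message ID is STARTED, later partial results are INPROGRESS, and a final result is CLOSED. The following sketch isolates that decision so it can be checked on its own. TranscriptStateTracker is a hypothetical helper; it uses a Set instead of the messageIds array and forgets an ID once its final result arrives, which is an assumption intended to keep the collection from growing over a long session.

```typescript
type TranscriptState = "STARTED" | "INPROGRESS" | "CLOSED";

// Only the fields the state decision depends on.
interface TranscriptUpdate {
  messageId: string;
  final?: boolean;
}

class TranscriptStateTracker {
  // IDs of messages that have produced at least one partial result.
  private openMessageIds = new Set<string>();

  next(update: TranscriptUpdate): TranscriptState {
    if (update.final) {
      // The message is complete; its ID no longer needs to be remembered.
      this.openMessageIds.delete(update.messageId);
      return "CLOSED";
    }
    if (this.openMessageIds.has(update.messageId)) {
      return "INPROGRESS";
    }
    this.openMessageIds.add(update.messageId);
    return "STARTED";
  }
}
```

In handleTranscriptResponseFromSpeechService, the if/else block could then be reduced to a single call such as tracker.next(responseFromServer), with the tracker held in a class field.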
Invoke the initTranscription function from the incoming and outgoing call event handlers:
public incomingCallCallback = (call: Call) => {
this.initTranscription(call.parameters.AccountSid);
//...
}
public async makeOutboundCall(phoneNumber: string, eventId: string) {
//...
// if (this.device) {
// this.call = await this.device.connect({ params });
this.initTranscription(this.call.parameters.AccountSid);
//...
// }
}
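Because initTranscription opens a fresh speech-service socket for every call, audio frames could in principle reach handleAudioStreamFromTwilio while that socket is closing or already closed. One defensive option, sketched here under that assumption, is to check the socket's readyState before sending. sendIfOpen and SendableSocket are hypothetical names; the value 1 corresponds to WebSocket.OPEN in the standard WebSocket API.

```typescript
// Numeric value of WebSocket.OPEN in the standard WebSocket API.
const WS_READY_STATE_OPEN = 1;

// Minimal surface needed from a WebSocket; the browser WebSocket satisfies it.
interface SendableSocket {
  readyState: number;
  send(data: string): void;
}

// Serializes and sends the payload only when the socket is open.
// Returns true if the payload was handed to the socket, false if it was dropped.
function sendIfOpen(socket: SendableSocket | undefined, payload: unknown): boolean {
  if (!socket || socket.readyState !== WS_READY_STATE_OPEN) {
    return false;
  }
  socket.send(JSON.stringify(payload));
  return true;
}
```

In the media case of handleAudioStreamFromTwilio, you could call sendIfOpen(VendorHandler.transcriptionServerWsForAgent, this.generatePayloadForTranscriptServer(msg)) and decide whether a dropped frame is worth logging.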
Complete Code
Here's the complete code of the vendorHandler.ts file for accepting an incoming call:
import { ICtiVendorHandler } from './ICtiVendorHandler';
import { Device, Call } from '@twilio/voice-sdk';
import {IntegrationEventsHandler} from "../integrationEventsHandler";
export class VendorHandler implements ICtiVendorHandler {
private twilio: any;
private device: Device | null;
private integrationEventsHandler: IntegrationEventsHandler;
private call: Call | null;
public idAndToken: any;
private static messageIds: string[] = [];
private static TWILIO_SERVICE: string = 'https://twilio-node-voice-stream.onrender.com';
private static RT_SPEECH_SERVICE: string = 'wss://phoenix339284.appsdev.fusionappsdphx1.oraclevcn.com:8004/';
private static TWILIO_WS_URL: string = 'wss://twilio-node-voice-stream.onrender.com/';
private static transcriptionServerWsForAgent: WebSocket;
constructor(integrationEventsHandler: IntegrationEventsHandler) {
this.twilio = (window as any).Twilio;
this.device = null;
this.idAndToken = null;
this.integrationEventsHandler = integrationEventsHandler;
this.call = null;
}
public async makeAgentAvailable(): Promise<void> {
this.idAndToken = await this.getIdAndToken();
this.device = new this.twilio.Device(this.idAndToken.token, {
logLevel: 1,
codecPreferences: ["opus", "pcmu"],
enableRingingState: true
});
let resolve: Function;
let reject: Function;
if (this.device) {
this.device.on("registered", () => {
console.log("Registration completed ...")
resolve();
});
this.device.on("error", (deviceError: any) => {
console.error("Registration Failed ...", deviceError);
reject();
});
this.device.on("incoming", this.incomingCallCallback);
this.device.register();
}
return new Promise((res: Function, rej: Function) => {
resolve = res;
reject = rej;
});
}
public async makeAgentUnavailable() {
throw new Error('Method not implemented.');
}
public async makeOutboundCall(phoneNumber: string, eventId: string) {
const params = {
To: phoneNumber,
};
if (this.device) {
this.call = await this.device.connect({ params });
this.initTranscription(this.call.parameters.AccountSid);
this.call.on("accept", () => { this.integrationEventsHandler.outboundCallAcceptedHandler(eventId) });
this.call.on("disconnect", () => { this.integrationEventsHandler.callHangupHandler(eventId) });
this.call.on("cancel", () => { this.integrationEventsHandler.callRejectedHandler(eventId) });
}
}
public async acceptCall() {
if (this.call) {
this.call.accept();
}
}
public async rejectCall() {
if (this.call) {
this.call.reject();
}
}
public async hangupCall() {
if (this.call) {
this.call.disconnect();
}
}
private initializeWebsocketConnectionToTwilio(accountSid: string): void {
let twilioServerWs: WebSocket = new WebSocket(VendorHandler.TWILIO_WS_URL);
twilioServerWs.addEventListener("open", (event) => {
twilioServerWs.send(JSON.stringify({"event":"registerClient", "accountSid": accountSid}));
});
twilioServerWs.addEventListener("message", async (event) => {
this.handleAudioStreamFromTwilio(event);
});
twilioServerWs.addEventListener("error", (err) => {
console.error("WebSocket error from Twilio quick-start application ", err);
});
}
private handleAudioStreamFromTwilio(event: any): void {
const msg = JSON.parse(event.data);
switch (msg.event) {
case "media":
VendorHandler.transcriptionServerWsForAgent.send(JSON.stringify(this.generatePayloadForTranscriptServer(msg)));
break;
}
}
private generatePayloadForTranscriptServer(message: any): any {
return {
"callId": message.streamId,
"role": message.from === 'agent' ? 'AGENT' : 'END_USER',
"message": message.payload
}
}
private async handleTranscriptResponseFromSpeechService(event: any, self: any): Promise<void> {
let state = "STARTED";
const responseFromServer = JSON.parse(event.data);
const role: string = responseFromServer.role == 'AGENT' ? 'AGENT' : 'END_USER'
if (responseFromServer.final) {
state = "CLOSED"
} else {
if (VendorHandler.messageIds.includes(responseFromServer?.messageId)) {
state = "INPROGRESS";
} else {
VendorHandler.messageIds.push(responseFromServer?.messageId)
}
}
await self.integrationEventsHandler.addRealTimeTranscript(responseFromServer?.messageId, responseFromServer?.transcript, role, state);
}
private initTranscription(accountSid: string): void {
VendorHandler.transcriptionServerWsForAgent = new WebSocket(VendorHandler.RT_SPEECH_SERVICE);
let self = this;
VendorHandler.transcriptionServerWsForAgent.addEventListener("open", (event) => {
this.initializeWebsocketConnectionToTwilio(accountSid);
});
VendorHandler.transcriptionServerWsForAgent.addEventListener("message", async (event) => {
await this.handleTranscriptResponseFromSpeechService(event, self);
});
}
private async getIdAndToken(): Promise<any> {
const headers: Headers = (new Headers()) as Headers;
const url: string = `${VendorHandler.TWILIO_SERVICE}/token`; // Replace this url with the url of the deployed node app
headers.set('Accept', 'application/json');
const request: Request = new Request(url, {
method: 'GET',
headers: headers
}) as Request;
const idAndToken: Response = await fetch(request);
this.idAndToken = await idAndToken.json();
return this.idAndToken;
}
public incomingCallCallback = (call: Call) => {
this.initTranscription(call.parameters.AccountSid);
this.integrationEventsHandler.incomingCallHandler(call.parameters.From, call.parameters.CallSid);
this.call = call;
this.call.on("cancel", () => { this.integrationEventsHandler.callRejectedHandler(call.parameters.CallSid) });
this.call.on("disconnect", () => { this.integrationEventsHandler.callHangupHandler(call.parameters.CallSid) });
this.call.on("reject", () => { this.integrationEventsHandler.callRejectedHandler(call.parameters.CallSid) });
}
public async sendTextMessage(suggestionData: IMcaOnToolbarInteractionCommandData, resolveRef: Function): Promise<void> {
var myHeaders = new Headers();
myHeaders.append("Authorization", 'Basic <your-base64-encoded-credentials>'); // Add your authorization header here; do not hard-code real credentials in source
myHeaders.append("Content-Type", "application/x-www-form-urlencoded");
var urlencoded = new URLSearchParams();
urlencoded.append("To", this.call?.parameters.From || '');
urlencoded.append("From", "+13087374071");// Your TWILIO Number
urlencoded.append("Body", suggestionData.inData.metadata.originalSuggestionText + " Please refer: " + suggestionData.inData.metadata.externalUrl);
var requestOptions: any = {
method: 'POST',
headers: myHeaders,
body: urlencoded,
redirect: 'follow'
};
fetch(`${VendorHandler.TWILIO_SERVICE}`, requestOptions)
.then(response => response.text())
.then(result => console.log(result))
.catch(error => console.log('error', error));
}
}
Verify your progress
Once you complete these steps, use OJET serve to start your application and sign in to your Fusion application. Open the media toolbar and make your agent available for calls by clicking the agent availability button. Now, start a call to your customer care number. You'll receive the incoming call notification in your media toolbar application and in your Fusion window. You can accept the call from your media toolbar application or from your Fusion application. Once the conversation starts, the real-time transcripts are rendered in the Fusion engagement panel.