Implement a web-based user interface for interacting with Oracle Cloud Infrastructure Generative AI Agents

Use Oracle Cloud Infrastructure Generative AI (OCI Generative AI) Agents to implement an interactive web interface that allows users to hold real-time conversations with an AI agent.

OCI Generative AI does not provide a user interface outside the OCI Console, yet users want to consume its API to integrate it into their own web projects. This reference architecture showcases a web application that consumes OCI Generative AI Agents and also integrates real-time speech-to-text and text-to-speech, providing a complete experience without the client's data leaving their tenancy. The solution uses a virtual machine (VM) that connects the web application and the OCI Speech service through WebSockets.
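The bridge on the VM relays audio from the browser to OCI Speech and relays transcripts back. The following is a minimal sketch of that two-way relay, assuming asyncio queues stand in for the browser WebSocket connection and the OCI Speech real-time session. The names `pump_audio`, `pump_transcripts`, `bridge`, and `fake_speech` are hypothetical; a real bridge would replace the queues with WebSocket and OCI Python SDK calls.

```python
import asyncio

# Hypothetical stand-ins: in a real bridge these queues would be a browser
# WebSocket connection and an OCI Speech real-time session.
SENTINEL = None  # marks the end of a stream


async def pump_audio(browser_audio: asyncio.Queue, speech_in: asyncio.Queue) -> None:
    """Forward audio chunks from the browser to the speech session until the stream ends."""
    while True:
        chunk = await browser_audio.get()
        await speech_in.put(chunk)
        if chunk is SENTINEL:
            return


async def pump_transcripts(speech_out: asyncio.Queue, browser_text: asyncio.Queue) -> None:
    """Forward transcripts from the speech session back to the browser."""
    while True:
        text = await speech_out.get()
        await browser_text.put(text)
        if text is SENTINEL:
            return


async def bridge(browser_audio, browser_text, speech_in, speech_out) -> None:
    """Run both directions concurrently, as the WebSocket bridge on the VM would."""
    await asyncio.gather(
        pump_audio(browser_audio, speech_in),
        pump_transcripts(speech_out, browser_text),
    )


async def fake_speech(speech_in: asyncio.Queue, speech_out: asyncio.Queue) -> None:
    """Toy recognizer for the demo: decodes each chunk as UTF-8 text."""
    while True:
        chunk = await speech_in.get()
        if chunk is SENTINEL:
            await speech_out.put(SENTINEL)
            return
        await speech_out.put(chunk.decode("utf-8"))


async def demo() -> list:
    browser_audio, browser_text = asyncio.Queue(), asyncio.Queue()
    speech_in, speech_out = asyncio.Queue(), asyncio.Queue()
    for chunk in (b"hello", b"agent", SENTINEL):
        browser_audio.put_nowait(chunk)
    await asyncio.gather(
        bridge(browser_audio, browser_text, speech_in, speech_out),
        fake_speech(speech_in, speech_out),
    )
    received = []
    while not browser_text.empty():
        received.append(browser_text.get_nowait())
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['hello', 'agent', None]
```

Running the two directions as independent tasks keeps audio upload from blocking on transcript delivery, which matters when the user keeps talking while earlier speech is still being recognized.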

With OCI Speech real-time transcription, everything the user says is instantly converted into text and processed by the AI agent. The AI agent then generates a response, which is not only displayed on the screen but also spoken back to the user through the OCI Speech text-to-speech capability. This creates a fully immersive, natural, and dynamic interaction, ideal for customer service, virtual assistants, and conversational AI solutions.
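One conversational turn can be sketched as follows, assuming the real-time service reports each transcription result with a flag that says whether it is final. Only final segments are sent to the agent, because partial results can still change while the user is speaking. `TranscriptResult`, `run_turn`, and the injected `ask_agent`, `speak`, and `display` callables are hypothetical placeholders for the agent REST call, the text-to-speech request, and the UI update.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TranscriptResult:
    """Hypothetical shape of one real-time transcription event."""
    text: str
    is_final: bool


def run_turn(
    results: Iterable[TranscriptResult],
    ask_agent: Callable[[str], str],
    speak: Callable[[str], bytes],
    display: Callable[[str], None],
) -> bytes:
    """Send only final transcripts to the agent, show the reply, and return its audio."""
    # Partial results may be revised as the user keeps talking; keep finalized segments only.
    utterance = " ".join(r.text.strip() for r in results if r.is_final and r.text.strip())
    if not utterance:
        return b""
    reply = ask_agent(utterance)  # placeholder for the REST call to the agent
    display(reply)                # placeholder for rendering the reply in the UI
    return speak(reply)           # placeholder for the text-to-speech request


# Usage with stub services:
shown: List[str] = []
audio = run_turn(
    [TranscriptResult("what is", False), TranscriptResult("What is OCI Speech?", True)],
    ask_agent=lambda q: f"You asked: {q}",
    speak=lambda text: text.encode("utf-8"),
    display=shown.append,
)
print(shown)  # ['You asked: What is OCI Speech?']
print(audio)  # b'You asked: What is OCI Speech?'
```

Injecting the three services as callables keeps the turn logic independent of any particular SDK, so the same function can be exercised with stubs before wiring in real endpoints.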

Architecture

This reference architecture is built around Oracle Visual Builder as the front-end interface, which seamlessly integrates with OCI Generative AI Agents and OCI Speech.

  1. Oracle Visual Builder sends the user's voice input to OCI Speech through a bridge running on the VM, which uses the OCI Python SDK.
  2. The text-to-speech feature in OCI Speech lets you synthesize human-like speech from text across applications. This feature enables customer conversations, multi-language voice translations, and improved accessibility. Choose from a variety of voices to enhance interactions.
  3. Oracle Visual Builder handles user interactions, sending user inputs to the OCI Generative AI Agents via REST APIs and displaying the agent's responses in real time.

    OCI Generative AI Agents uses OCI Generative AI behind the scenes, which provides access to pretrained foundation models from Cohere and Meta. OCI Generative AI supports dedicated AI clusters with private GPUs for stable, high-performance production workloads, including hosting and fine-tuning.

    The Chat API and Playground provide an interactive chat experience with Cohere and Meta models via the OCI console or API. LangChain integration allows flexible development of OCI Generative AI applications, while LlamaIndex integration enables building RAG solutions with custom data sources. For operations, OCI Generative AI includes content moderation controls and will soon support model endpoint swapping with zero downtime, as well as activation and deactivation features. It also provides analytics on model usage, including call statistics, tokens processed, and error counts.

  4. For voice output, OCI Speech Text-to-Speech (TTS) service converts the agent's responses into spoken audio, enhancing the user experience.
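Step 3 relies on REST calls to the agent. The sketch below only builds the request URL and JSON body and parses a reply; it does not sign or send the request (OCI REST calls must be signed with an OCI API signing key, which the OCI SDKs handle). The host pattern, the `20240531` API version, the `/actions/chat` path, and the `userMessage`, `sessionId`, `shouldStream`, and `message.content.text` field names are assumptions to check against the Agents runtime API reference. The helper names, the OCIDs, and the sample reply are made-up placeholders. Reusing one session ID across turns is assumed here to be how the agent keeps conversation context.

```python
import json
from typing import Optional

# Assumptions (verify against the OCI Generative AI Agents runtime API reference):
# host pattern, API version, /actions/chat path, and request/response field names.
API_VERSION = "20240531"


def chat_url(region: str, agent_endpoint_id: str) -> str:
    """Build the assumed chat URL for an agent endpoint in a region."""
    host = f"https://agent-runtime.generativeai.{region}.oci.oraclecloud.com"
    return f"{host}/{API_VERSION}/agentEndpoints/{agent_endpoint_id}/actions/chat"


def chat_body(user_message: str, session_id: Optional[str]) -> str:
    """Build the assumed JSON body for one user message."""
    body = {"userMessage": user_message, "shouldStream": False}
    if session_id:
        # Sending the same session ID on every turn keeps earlier turns in context.
        body["sessionId"] = session_id
    return json.dumps(body)


def reply_text(response_json: str) -> str:
    """Extract the agent's answer text from a chat response (assumed shape)."""
    data = json.loads(response_json)
    return data["message"]["content"]["text"]


# Usage with placeholder values (not real OCIDs or real service output):
url = chat_url("us-chicago-1", "ocid1.genaiagentendpoint.oc1..example")
body = chat_body("Hello", session_id="ocid1.genaiagentsession.oc1..example")
sample = '{"message": {"role": "AGENT", "content": {"text": "Hello from the agent."}}}'
print(reply_text(sample))  # Hello from the agent.
```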

The following diagram illustrates this reference architecture.



oci-genai-speech-arch-oracle.zip

The architecture has the following components:

  • Compute

    With Oracle Cloud Infrastructure Compute, you can provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • OCI Speech

    OCI Speech is one of several cloud-native AI services. You can use the OCI Speech service to convert audio files to readable text that is stored in JSON format.

    OCI Speech harnesses the power of spoken language by allowing you to easily convert audio files containing human speech into highly accurate text transcriptions. The service is an OCI native application that you can access using a web application, REST API, SDK, CLI, or Console.

    OCI Speech uses automatic speech recognition (ASR) technology to provide grammatically correct transcriptions of video and audio files. It handles low-fidelity audio recordings and transcribes challenging recordings such as meetings or call center calls. Using OCI Speech, you can turn files stored in OCI Object Storage or a data asset into accurate, normalized, timestamped, and profanity-filtered text. The resulting transcriptions can feed downstream services. For example, you could use additional services such as language and forecasting to analyze call sentiment, target content for advertising, or index your media folders and create a media search engine using Oracle Cloud Infrastructure Lakehouse.

  • OCI Generative AI Agents

    OCI Generative AI Agents is a fully managed service that combines large language models (LLMs) with an intelligent retrieval system to create contextually relevant answers by searching your knowledge base.

    OCI Generative AI Agents supports several ways to onboard your data, after which you and your customers can interact with that data through a chat interface or API.

    • Supports several data on-boarding methods and interaction channels (chat interface or API).
    • Creates contextually relevant answers by searching your knowledge base.
    • Provides source attribution for every answer.
    • Offers hybrid search capabilities (lexical and semantic).
    • Includes content moderation options for input and output.
    • Supports multi-turn conversations, where users can ask follow-up questions and receive answers that consider the context of previous questions and answers.
    • Can interpret data from two-axis charts and reference tables in a PDF, without needing explicit descriptions of the visual elements.
    • All hyperlinks present in PDF documents are extracted and displayed as hyperlinks in the chat response.

  • Oracle Visual Builder

    Oracle Visual Builder is an intuitive development experience on top of a development and hosting platform that empowers you to create engaging responsive applications. Focusing on ease of use and a visual development approach, it provides an easy way for you to create applications that are hosted in Oracle’s secure and scalable cloud platform.

    Visual Development Experience

    Oracle Visual Builder provides simple but powerful visual development tools to create responsive apps, all without the need to install any additional software. This rich set of visual tools helps you quickly design your app by dragging and dropping UI components and customizing their attributes to define behavior. While these tools lend themselves to low-code developers, experienced developers can just as easily access the underlying source code and even extend it using standard HTML5, JavaScript, and CSS techniques for complex needs.

    Easy Access to Data

    Oracle Visual Builder makes it easy to access your app’s data through REST-based services. You can create reusable business objects to implement your app’s business logic and store its data, and then manage them through REST endpoints that Oracle Visual Builder generates for you. Alternatively, you can pick data objects exposed by Oracle SaaS or Oracle Integration applications from an integrated catalog of REST services. You can also access data from any external REST service with just a few clicks.

    Development and Hosting Platform

    Oracle Visual Builder is a complete development tool as well as a hosting platform, which means you can manage your application’s lifecycle from development through testing to final publishing. Version management and data migration are built into an app’s lifecycle, making it easy for you to stage and publish your app and manage its data in every phase.
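The OCI Speech description notes that transcriptions are stored as timestamped JSON. A minimal sketch for reading such a file follows, assuming an output shape with a top-level `transcriptions` list whose entries carry `transcription` text and `tokens` with `token`, `startTime`, and `endTime` values such as `"0.40s"`. Verify these field names against a real job's output before relying on them; `read_transcript` is a hypothetical helper and the sample data is invented.

```python
import json
from typing import List, Tuple

# Assumed output shape (not confirmed by this document): a top-level
# "transcriptions" list with "transcription" text and per-token timestamps.


def _seconds(value: str) -> float:
    """Convert a duration string such as "0.40s" to a float number of seconds."""
    return float(value.rstrip("s"))


def read_transcript(raw: str) -> Tuple[str, List[Tuple[str, float, float]]]:
    """Return the full transcript text and (word, start, end) tuples."""
    data = json.loads(raw)
    texts: List[str] = []
    words: List[Tuple[str, float, float]] = []
    for item in data.get("transcriptions", []):
        texts.append(item.get("transcription", ""))
        for tok in item.get("tokens", []):
            words.append((tok["token"], _seconds(tok["startTime"]), _seconds(tok["endTime"])))
    return " ".join(texts), words


# Usage with an invented sample:
sample = json.dumps({
    "transcriptions": [{
        "transcription": "Hello agent.",
        "tokens": [
            {"token": "Hello", "startTime": "0.40s", "endTime": "0.80s"},
            {"token": "agent", "startTime": "0.80s", "endTime": "1.30s"},
        ],
    }]
})
text, words = read_transcript(sample)
print(text)   # Hello agent.
print(words)  # [('Hello', 0.4, 0.8), ('agent', 0.8, 1.3)]
```

Keeping per-word timestamps alongside the text is what makes downstream uses such as media search possible, since a match can be mapped back to a position in the recording.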

Acknowledgments

  • Author: Jesus Brasero Jimenez
  • Contributor: Anupama Pundpal