Use OCI Speech to Transcribe Natural Language

Oracle Cloud Infrastructure (OCI) Speech is one of several cloud-native AI services in OCI. Use the Speech service to convert audio files to readable text that is stored in JSON format.

Speech converts audio files containing human speech into highly accurate text transcriptions. The service is native to OCI, and you can access it from a web application or through the REST API, SDK, CLI, or Console.

Speech uses automatic speech recognition (ASR) technology to provide a grammatically correct transcription of video and audio files. Speech handles low-fidelity audio recordings and transcribes challenging recordings such as meetings or call center calls. Using Speech, you can turn files stored in OCI Object Storage or a data asset into accurate, normalized, timestamped, and profanity-filtered text. Downstream services can then consume the transcripts. For example, you could use additional services such as Language and Forecasting to analyze call sentiment or target content for advertising, or you could index your media folders and create a media search engine using Oracle Cloud Infrastructure Lakehouse.

Architecture

This architecture demonstrates the relationship among the various components in a typical system that has OCI Speech at its core.

This architecture can apply to many types of applications. For example, a web application can record a help desk representative's conversation with a customer who is reporting a problem. The audio file of the conversation is saved to OCI Object Storage, which emits an event for each new audio file. OCI Events triggers an OCI Functions app that creates a transcription request using a REST API call to the OCI Speech service. Speech takes the job, retrieves the audio file from OCI Object Storage, feeds the file into pretrained acoustic and language models, and writes the output to a JSON text file. The JSON file is stored in OCI Object Storage. OCI Object Storage detects the new text file and emits an event. OCI Events triggers OCI Functions to pull the text file and upload the text and metadata to a MySQL database. OCI Events also triggers OCI Notifications to publish a message when the transcript is ready, which notifies the subscribed web application. The web application displays the transcript in the ticket that the help desk representative created.
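
The first function in this flow can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it only maps an Object Storage event to a transcription request body and leaves authenticating and sending the REST call to whichever SDK or HTTP client you use. The event fields follow the Object Storage event layout, and the request fields reflect the Speech create-transcription-job request body as understood here, so verify both against the current API reference. The output bucket name, prefix, audio suffix list, and compartment OCID are hypothetical.

```python
from typing import Optional

# Event type that Object Storage emits when an object is created.
CREATE_OBJECT = "com.oraclecloud.objectstorage.createobject"

# Hypothetical values for illustration only.
OUTPUT_BUCKET = "helpdesk-transcripts"
OUTPUT_PREFIX = "transcripts/"
AUDIO_SUFFIXES = (".wav", ".mp3")


def build_transcription_request(event: dict, compartment_id: str) -> Optional[dict]:
    """Map an Object Storage event to a transcription job request body.

    Returns None for events that should not start a transcription job.
    """
    if event.get("eventType") != CREATE_OBJECT:
        return None
    data = event.get("data", {})
    details = data.get("additionalDetails", {})
    object_name = data.get("resourceName", "")
    # Skip non-audio objects (for example, transcript JSON files) so that
    # the function does not react to the output of an earlier job.
    if not object_name.lower().endswith(AUDIO_SUFFIXES):
        return None
    namespace = details["namespace"]
    return {
        "compartmentId": compartment_id,
        "inputLocation": {
            "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
            "objectLocations": [
                {
                    "namespaceName": namespace,
                    "bucketName": details["bucketName"],
                    "objectNames": [object_name],
                }
            ],
        },
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": OUTPUT_BUCKET,
            "prefix": OUTPUT_PREFIX,
        },
    }


# Trimmed example event; all values are made up.
sample_event = {
    "eventType": CREATE_OBJECT,
    "source": "ObjectStorage",
    "data": {
        "resourceName": "calls/ticket-1234.wav",
        "additionalDetails": {
            "namespace": "examplenamespace",
            "bucketName": "helpdesk-audio",
        },
    },
}

request = build_transcription_request(sample_event, "ocid1.compartment.oc1..example")
```

Filtering on the object name is one way to keep audio uploads and transcript uploads apart when both live in Object Storage; using separate buckets with separate event rules is another.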

Optionally, the web application can retrieve audio file metadata, such as the audio file's duration, size, and start date and time, and save that metadata in the help desk ticket.
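
The function that uploads the text and metadata needs to reduce the transcript JSON to a few fields first. The sketch below shows one way to do that. The transcript layout used here (a `transcriptions` list whose entries carry a `transcription` string and timestamped `tokens`, with times written like `"1.1s"`) is an assumption about the Speech output format, and `flatten_transcript` and the returned field names are hypothetical. Inspect a real transcript file before relying on any of these keys.

```python
import json

# Trimmed transcript in the ASSUMED output layout; all values are made up.
sample_transcript = json.loads("""
{
  "status": "SUCCESS",
  "transcriptions": [
    {
      "transcription": "Hello, my laptop.",
      "confidence": "0.95",
      "tokens": [
        {"token": "Hello,", "startTime": "0.0s", "endTime": "0.4s", "type": "WORD"},
        {"token": "my", "startTime": "0.5s", "endTime": "0.6s", "type": "WORD"},
        {"token": "laptop.", "startTime": "0.6s", "endTime": "1.1s", "type": "WORD"}
      ]
    }
  ]
}
""")


def flatten_transcript(doc: dict) -> dict:
    """Return the text and simple metadata to store with a help desk ticket."""
    parts = doc.get("transcriptions", [])
    # Join the text of every transcription entry into one string.
    text = " ".join(p.get("transcription", "") for p in parts).strip()
    tokens = [t for p in parts for t in p.get("tokens", [])]
    # The end time of the last token approximates the spoken duration.
    end_seconds = max(
        (float(t["endTime"].rstrip("s")) for t in tokens), default=0.0
    )
    return {
        "text": text,
        "token_count": len(tokens),
        "last_token_end_seconds": end_seconds,
    }


row = flatten_transcript(sample_transcript)
```

The resulting dictionary maps naturally onto the columns of a ticket-transcript table; the database insert itself is omitted because it depends on the driver and schema you choose.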

The audio and text files stored in OCI Object Storage can be fed into downstream analytic tools using Oracle Cloud Infrastructure Lakehouse (not shown in the diagram).

The following diagram illustrates this reference architecture.

[Diagram: architecture-ai-speech.png]

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain shouldn't affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Load balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Identity and Access Management (IAM)

    Oracle Cloud Infrastructure Identity and Access Management (IAM) is the access control plane for Oracle Cloud Infrastructure (OCI) and Oracle Cloud Applications. The IAM API and the user interface enable you to manage identity domains and the resources within the identity domain. Each OCI IAM identity domain represents a standalone identity and access management solution or a different user population.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Functions

    Oracle Cloud Infrastructure Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. Functions enables you to deploy your code, and either call it directly or trigger it in response to events. Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

  • Events

    Oracle Cloud Infrastructure services emit events, which are structured messages that describe the changes in resources. Events are emitted for create, read, update, or delete (CRUD) operations, resource lifecycle state changes, and system events that affect cloud resources.

  • Monitoring

    The Oracle Cloud Infrastructure Monitoring service actively and passively monitors your cloud resources. It uses metrics to track resources and alarms to notify you when those metrics meet alarm-specified triggers.

  • Audit

    The Oracle Cloud Infrastructure Audit service automatically records calls to all supported Oracle Cloud Infrastructure public application programming interface (API) endpoints as log events. Currently, all services support logging by Oracle Cloud Infrastructure Audit.

  • Notifications

    The Oracle Cloud Infrastructure Notifications service broadcasts messages to distributed components through a publish-subscribe pattern, delivering secure, highly reliable, low latency, and durable messages for applications hosted on Oracle Cloud Infrastructure.

  • Oracle Cloud Infrastructure Speech

    Oracle Cloud Infrastructure Speech is a new AI service that uses Automatic Speech Recognition (ASR) to convert speech to text.

  • Oracle MySQL Database Service

    Oracle MySQL Database Service is a fully managed Oracle Cloud Infrastructure (OCI) database service that lets developers quickly develop and deploy secure, cloud native applications. Optimized for and exclusively available in OCI, Oracle MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.

    Oracle MySQL Database Service has an integrated, high-performance analytics engine (HeatWave) to run sophisticated real-time analytics directly against an operational MySQL database.

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use Oracle Cloud Guard to proactively monitor and maintain the security of your resources in Oracle Cloud Infrastructure. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Cloud Guard

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify which types of security violations generate a warning and which actions can be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Load Balancer Bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom flexible shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
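
The VCN recommendation above (private address space, no overlap with connected networks) can be checked before you create anything. The following is a minimal sketch that uses only the Python standard library. The function name `check_cidr_plan` and the example ranges are hypothetical, "standard private IP address space" is interpreted here as the RFC 1918 ranges, and the sketch handles IPv4 only.

```python
import ipaddress
from itertools import combinations

# RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(c)
    for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def check_cidr_plan(vcn_cidrs, other_networks):
    """Return a list of problems found in a planned set of VCN CIDR blocks.

    vcn_cidrs: CIDR blocks planned for the VCN.
    other_networks: CIDR blocks of networks you intend to connect privately
    (other VCNs, an on-premises data center, another cloud provider).
    """
    problems = []
    vcn = [ipaddress.ip_network(c) for c in vcn_cidrs]
    others = [ipaddress.ip_network(c) for c in other_networks]

    # Each block should sit inside one of the private ranges.
    for net in vcn:
        if not any(net.subnet_of(private) for private in RFC1918):
            problems.append(f"{net} is outside the private address space")

    # Blocks within the same VCN must not overlap each other.
    for a, b in combinations(vcn, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b} within the VCN")

    # Blocks must not overlap any network reached over a private connection.
    for net in vcn:
        for other in others:
            if net.overlaps(other):
                problems.append(f"{net} overlaps connected network {other}")

    return problems


# Hypothetical plan: two VCN blocks and one on-premises range.
plan_problems = check_cidr_plan(
    ["10.0.0.0/16", "192.168.0.0/24"], ["172.16.0.0/16"]
)
```

An empty list means the plan passed all three checks. Because CIDR blocks can be changed after the VCN is created, the same check can be rerun whenever a block is added.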

Considerations

  • Performance

    Use Oracle cloud native services (Events, Functions, Notifications, and Speech) to deploy serverless applications that scale automatically based on workload. The services are managed by Oracle.

    Speech processes jobs in strict first-in, first-out order. The job queue can hold a maximum of 10,000 tasks at the tenancy level. If you submit a job that would exceed the maximum number of tasks, that job fails. Jobs are retained for 90 days.

  • Access

    OCI Speech supports access through the OCI Console, the REST API, the Java and Python SDK clients, and the OCI CLI. For testing, use the CLI or the Console.

  • Availability

    In this example, the database is not highly available. For critical applications, consider running MySQL Database Service in high availability (HA) mode, which uses three MySQL instances.

  • Cost

    Use Oracle cloud native services (Events, Functions, Notifications, and Speech) to deploy a serverless application that incurs no fixed cost. You pay only for the service requests that you make.
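
Because a job that pushes the queue past the 10,000-task limit fails, a caller that submits many files at once may want to check capacity first. The sketch below illustrates that arithmetic only. It assumes one task per audio file, and it assumes that the caller already knows how many tasks are queued; `split_for_capacity` and its parameters are hypothetical names, not part of any OCI SDK.

```python
MAX_QUEUED_TASKS = 10_000  # tenancy-level limit stated in the Performance item


def split_for_capacity(object_names, queued_tasks, limit=MAX_QUEUED_TASKS):
    """Split files into (submit_now, hold_back) to stay within the task limit.

    Assumes each audio file becomes one task in the queue.
    """
    capacity = max(limit - queued_tasks, 0)
    return object_names[:capacity], object_names[capacity:]


# Hypothetical batch of five recordings with 9,998 tasks already queued.
files = [f"calls/call-{i}.wav" for i in range(5)]
submit_now, hold_back = split_for_capacity(files, queued_tasks=9_998)
```

Files in `hold_back` can be submitted in a later job once earlier tasks finish. Since jobs are processed first-in, first-out, the order of submission is also the order of processing.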

Acknowledgments

  • Authors: Wei Han, Zaid Al Qaddoumi
  • Contributors: Sreya Dutta

Change Log

This log lists significant changes: