Use OCI Speech to Transcribe Natural Language

Oracle Cloud Infrastructure (OCI) Speech is one of several cloud-native AI services in OCI. You can use the Speech service to convert audio files into readable text, which is stored in JSON format.

Speech converts audio files that contain human speech into highly accurate text transcriptions. The service is native to OCI, and you can access it from a web application or through the REST API, SDKs, CLI, or Console.

Speech uses automatic speech recognition (ASR) technology to produce grammatically correct transcriptions of audio and video files. It handles low-fidelity audio and transcribes challenging recordings such as meetings and call center calls. With Speech, you can turn files stored in Object Storage or a data asset into accurate, normalized, timestamped, and profanity-filtered text. You can then pass that text to downstream services. For example, you could use additional services such as Language and Forecasting to analyze call sentiment and target content for advertising, or index your media folders and build a media search engine using Data Lakehouse.
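
Downstream consumers need to pull the text and timestamps out of the JSON output. The output schema is not described here, so the sketch below assumes a top-level `transcriptions` array whose entries carry a `transcription` string and a `tokens` list with `token`, `startTime`, and `endTime` fields, where times are strings such as `"1.25s"`. Check these field names against a real Speech output file before relying on them.

```python
import json
from typing import List, Tuple


def parse_seconds(value: str) -> float:
    """Convert an assumed duration string such as '1.25s' to seconds."""
    return float(value.rstrip("s"))


def extract_transcript(raw_json: str) -> Tuple[str, List[Tuple[str, float, float]]]:
    """Return the full text and (token, start, end) triples.

    ASSUMPTION: the output file has the shape
    {"transcriptions": [{"transcription": str,
                         "tokens": [{"token": str, "startTime": "0.5s", "endTime": "0.9s"}]}]}
    """
    doc = json.loads(raw_json)
    texts: List[str] = []
    tokens: List[Tuple[str, float, float]] = []
    for item in doc.get("transcriptions", []):
        texts.append(item.get("transcription", ""))
        for tok in item.get("tokens", []):
            tokens.append((tok["token"],
                           parse_seconds(tok["startTime"]),
                           parse_seconds(tok["endTime"])))
    return " ".join(texts), tokens


# Hypothetical sample in the assumed format.
sample = json.dumps({
    "transcriptions": [{
        "transcription": "Hello world.",
        "tokens": [
            {"token": "Hello", "startTime": "0.68s", "endTime": "1.12s"},
            {"token": "world", "startTime": "1.12s", "endTime": "1.60s"},
        ],
    }]
})

text, timed_tokens = extract_transcript(sample)
print(text)          # Hello world.
print(timed_tokens)  # [('Hello', 0.68, 1.12), ('world', 1.12, 1.6)]
```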

Architecture

This architecture demonstrates the relationship among the various components in a typical system that has OCI Speech at its core.

This architecture can apply to many types of applications. For example, a web application can record a help desk representative's conversation with a customer who is reporting a problem. The audio file of the conversation is saved to Object Storage, which emits an event for each new audio file. The Events service triggers a Functions application that creates a transcription request by calling the Speech REST API. Speech takes the job, retrieves the audio file from Object Storage, feeds it into pretrained acoustic and language models, and outputs a JSON text file, which is stored in Object Storage. Object Storage detects the new text file and emits an event. Events then triggers a function that pulls the text file and uploads the text and metadata to a MySQL database. Events also triggers the Notifications service to publish a message, which the web application can subscribe to, when the transcript is ready. The web application displays the transcript in the ticket that the help desk representative created.
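
In this flow, both new audio files and new transcript files arrive as Object Storage events, so the function has to tell them apart. The sketch below assumes the OCI Events payload carries `eventType` set to `com.oraclecloud.objectstorage.createobject`, the object name in `data.resourceName`, and the bucket and namespace under `data.additionalDetails`. The audio-suffix filter, the `ObjectRef` type, the action names, and the bucket names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed OCI event type emitted when an object is created in a bucket.
CREATE_OBJECT = "com.oraclecloud.objectstorage.createobject"
# Hypothetical filter: object names with these suffixes are treated as audio uploads.
AUDIO_SUFFIXES = (".wav", ".mp3", ".flac", ".ogg")


@dataclass
class ObjectRef:
    namespace: str
    bucket: str
    name: str


def classify_event(raw_event: str) -> Tuple[str, Optional[ObjectRef]]:
    """Decide which pipeline step an Object Storage event should trigger."""
    event = json.loads(raw_event)
    if event.get("eventType") != CREATE_OBJECT:
        return "ignore", None
    data = event["data"]
    details = data["additionalDetails"]  # assumed location of bucket and namespace
    ref = ObjectRef(details["namespace"], details["bucketName"], data["resourceName"])
    name = ref.name.lower()
    if name.endswith(AUDIO_SUFFIXES):
        return "transcribe", ref        # new audio file: create a Speech job
    if name.endswith(".json"):
        return "load_transcript", ref   # Speech output: load text into MySQL
    return "ignore", ref


event = json.dumps({
    "eventType": CREATE_OBJECT,
    "data": {
        "resourceName": "calls/ticket-1234.wav",
        "additionalDetails": {"namespace": "mytenancy", "bucketName": "helpdesk-audio"},
    },
})
action, ref = classify_event(event)
print(action, ref.bucket, ref.name)  # transcribe helpdesk-audio calls/ticket-1234.wav
```

Routing both kinds of object through one handler keeps the sketch short. Separate buckets, or Events rule conditions on the bucket name, are an alternative that keeps transcripts from re-triggering transcription.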

Optionally, the web application can retrieve audio file metadata, such as the file's duration, size, and start date and time, and save it in the help desk ticket.
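
For WAV recordings, duration and size can be read locally without any OCI call. The following minimal sketch uses Python's standard `wave` module. It handles only uncompressed PCM WAV, and the keys in the returned dictionary are illustrative rather than a Speech or ticket schema.

```python
import io
import wave


def wav_metadata(data: bytes) -> dict:
    """Return illustrative metadata for an in-memory PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "size_bytes": len(data),
            "duration_seconds": frames / rate,
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
        }


# Build a 2-second, 16 kHz, mono, 16-bit silent WAV in memory for demonstration.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000 * 2)

meta = wav_metadata(buf.getvalue())
print(meta["duration_seconds"])  # 2.0
```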

The audio and text files stored in Object Storage can be fed into downstream analytics tools using Data Lakehouse (not shown in the diagram).

The following diagram illustrates this reference architecture.

Illustration: architecture-ai-speech.png (editable diagram: architecture-ai-speech-oracle.zip)

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Load Balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.

    The load balancer provides access to different applications.

  • Security List

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Object Storage

    Object Storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store data and then retrieve it directly from the internet or from within the cloud platform. You can scale storage seamlessly without any degradation in performance or service reliability. Use standard storage for "hot" data that you need to access quickly, immediately, and frequently. Use archive storage for "cold" data that you retain for long periods of time and rarely access.

  • Oracle Functions

    This architecture uses a function to call the OCI Speech REST API with a specific audio file and then store the transcribed text file and metadata in Object Storage. The function can be built using the Java or Python SDK.

  • Oracle Cloud Infrastructure Events

    In this architecture, the Events service is configured to listen for object-creation events in Object Storage. When an object is uploaded to Object Storage, Events invokes the function for processing.

  • Oracle Cloud Infrastructure Speech

    Oracle Cloud Infrastructure Speech is an AI service that uses automatic speech recognition (ASR) to convert speech to text.

  • Oracle Cloud Infrastructure Notifications

    The Notifications service broadcasts messages to distributed components through a publish-subscribe pattern, delivering secure, highly reliable, low-latency, and durable messages for applications hosted on Oracle Cloud Infrastructure and externally.
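
The Functions component's main job is to turn an uploaded audio object into a transcription request. The sketch below only builds the JSON request body. It does not sign or send the request, so it runs without OCI credentials. The field names (`compartmentId`, `inputLocation` with `locationType` set to `OBJECT_LIST_INLINE_INPUT_LOCATION`, `objectLocations`, `outputLocation`, and `modelDetails.languageCode`) reflect our understanding of the Speech `CreateTranscriptionJob` API and should be checked against the current API reference. The OCID, namespace, bucket names, and output prefix are placeholders. In a deployed function, the body would typically be sent through an SDK client such as the OCI Python SDK's `AIServiceSpeechClient`, using resource-principal authentication.

```python
import json


def build_transcription_job(compartment_id: str, namespace: str, audio_bucket: str,
                            object_name: str, output_bucket: str,
                            language_code: str = "en-US") -> dict:
    """Build an assumed CreateTranscriptionJob request body for one audio object."""
    return {
        "compartmentId": compartment_id,
        "displayName": f"transcribe-{object_name}",
        "inputLocation": {
            "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
            "objectLocations": [{
                "namespaceName": namespace,
                "bucketName": audio_bucket,
                "objectNames": [object_name],
            }],
        },
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": output_bucket,
            "prefix": "transcripts/",  # hypothetical output prefix
        },
        "modelDetails": {"languageCode": language_code},
    }


body = build_transcription_job(
    compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
    namespace="mytenancy",
    audio_bucket="helpdesk-audio",
    object_name="calls/ticket-1234.wav",
    output_bucket="helpdesk-transcripts",
)
print(json.dumps(body, indent=2))
```

Writing transcripts to a separate bucket (here the hypothetical `helpdesk-transcripts`) means that the Events rule that watches for audio uploads does not fire on Speech's own output.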

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Cloud Guard

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify what type of security violations generate a warning and what actions are allowed to be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Load Balancer Bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom flexible shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
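
The VCN recommendations above can be checked mechanically before you create the VCN. The following minimal sketch uses Python's standard `ipaddress` module. It assumes IPv4 only and treats "standard private IP address space" as the RFC 1918 ranges. The example ranges, including the on-premises network, are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import List

# RFC 1918 private IPv4 ranges, taken here as the standard private address space.
RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def check_vcn_plan(vcn_cidrs: List[str], subnet_cidrs: List[str],
                   connected_networks: List[str]) -> List[str]:
    """Return human-readable problems in a proposed IPv4 VCN address plan."""
    vcns = [ipaddress.ip_network(c) for c in vcn_cidrs]
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    connected = [ipaddress.ip_network(c) for c in connected_networks]
    problems: List[str] = []

    for net in vcns:
        if not any(net.subnet_of(r) for r in RFC1918):
            problems.append(f"{net} is outside the RFC 1918 private address space")
        for other in connected:
            if net.overlaps(other):
                problems.append(f"{net} overlaps connected network {other}")
    # A VCN's own CIDR blocks must not overlap one another.
    for a, b in combinations(vcns, 2):
        if a.overlaps(b):
            problems.append(f"VCN blocks {a} and {b} overlap")
    # Every subnet must sit inside a VCN block, and subnets must not overlap.
    for sub in subnets:
        if not any(sub.subnet_of(v) for v in vcns):
            problems.append(f"subnet {sub} is not inside any VCN CIDR block")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"subnets {a} and {b} overlap")
    return problems


plan = check_vcn_plan(
    vcn_cidrs=["10.0.0.0/16"],
    subnet_cidrs=["10.0.1.0/24", "10.0.2.0/24", "10.0.2.128/25"],
    connected_networks=["192.168.0.0/16"],  # hypothetical on-premises range
)
for p in plan:
    print(p)  # subnets 10.0.2.0/24 and 10.0.2.128/25 overlap
```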

Considerations

  • Performance

    Use the Oracle cloud-native services (Events, Functions, Notifications, and Speech) to deploy serverless applications that scale automatically with the workload. Oracle manages these services.

    Speech processes transcription jobs in strict first-in, first-out (FIFO) order. Each tenancy can queue a maximum of 10,000 tasks. If submitting a job would exceed that maximum, the job fails. Jobs are retained for 90 days.

  • Access

    OCI Speech supports access through the OCI Console, the Java and Python SDK clients, and the OCI CLI. For testing, use the CLI or the Console.

  • Availability

    In this example, the database is not highly available. For critical applications, consider running MySQL Database Service in high-availability (HA) mode with three replicas.

  • Cost

    Use the Oracle cloud-native services (Events, Functions, Notifications, and Speech) to deploy a serverless application that incurs no fixed cost. You pay only for the service requests that you make.
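
The queue limits under Performance can be modeled on the client side. For example, a submitter can hold back a batch that would push the tenancy past the task cap. The following sketch is a hypothetical model of the stated rules and is not the service's implementation. It assumes that each audio file in a job counts as one task, that a queue holding exactly 10,000 tasks is still within the limit, and that a job expires once it is older than 90 days. All class and function names are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque

MAX_QUEUED_TASKS = 10_000          # tenancy-wide limit stated in this architecture
RETENTION = timedelta(days=90)     # jobs are retained for 90 days


@dataclass
class Job:
    name: str
    task_count: int      # assumption: one task per audio file in the job
    submitted: datetime


class JobQueueModel:
    """Hypothetical client-side model of the Speech queue limits."""

    def __init__(self) -> None:
        self.pending: Deque[Job] = deque()   # FIFO: oldest job is processed first

    def queued_tasks(self) -> int:
        return sum(job.task_count for job in self.pending)

    def submit(self, job: Job) -> bool:
        """Accept the job only if the queue stays within the task limit."""
        if self.queued_tasks() + job.task_count > MAX_QUEUED_TASKS:
            return False   # per the stated rule, the service would fail this job
        self.pending.append(job)
        return True

    def next_job(self) -> Job:
        return self.pending.popleft()


def is_expired(job: Job, now: datetime) -> bool:
    """True once a job is older than the 90-day retention window."""
    return now - job.submitted > RETENTION


t0 = datetime(2024, 1, 1)
q = JobQueueModel()
print(q.submit(Job("batch-a", 9_500, t0)))  # True
print(q.submit(Job("batch-b", 600, t0)))    # False: 10,100 tasks would exceed the limit
print(q.submit(Job("batch-c", 500, t0)))    # True: exactly 10,000 tasks
print(q.next_job().name)                    # batch-a
print(is_expired(Job("old", 1, t0), t0 + timedelta(days=91)))  # True
```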

Acknowledgements

  • Author: Wei Han