Transcribe Speech to Text and Synthesize Text to Speech with a Speech Action

Capabilities

OCI Speech is an AI service that can transcribe speech to text and synthesize text to speech.

OCI Speech lets you convert audio files containing human speech into highly accurate text transcriptions. It uses automatic speech recognition (ASR) technology to provide a grammatically correct transcription. It can handle low-fidelity media recordings and transcribe challenging recordings such as meetings or call center calls.

OCI Speech lets you synthesize text to human-like speech across applications. This feature enables customer conversations, multi-language voice translations, and improved accessibility. Based on neural network deep learning technology, this feature learns from a vast data set of human speech, capturing subtleties such as intonation, emotion, and rhythm to generate speech that closely mimics natural human expression. For example, a pause is added to the generated voice after a complete sentence that ends with a period. You can use this feature in various ways, including improving accessibility for users with visual impairments, enhancing the experience in gaming, and accelerating the creation of educational content.

Oracle Integration supports using OCI Speech in an integration with the speech action.

Prerequisites

See Prerequisites for information on the prerequisites you must satisfy in the Oracle Cloud Console.

Invoke Oracle Cloud Infrastructure Speech from an Integration

Add a Speech action to an integration in either of the following ways:
- On the side of the canvas, click Actions and drag the OCI Speech action to the appropriate location.
- Click at the location where you want to add the OCI Speech action, then select OCI Speech.
Enter a name and optional description.

Select the following information.

Element	Description
Select categories	Select one of the following categories:
- Speech to text
- Text to speech
Action	If you selected the Speech to text category, select the transcription job action to perform. You can create, update, or delete a transcription job. You can also get information about a transcription job or list the transcription jobs that are available in a compartment.
- Create transcription job: Oracle Integration accepts a request payload containing details such as compartment ID, model, input location (of the speech to be transcribed to text), and output location to create a transcription job. The compartment ID can also be specified in the Compartment field when performing Step 5.
- Get transcription job: Oracle Integration accepts the transcription job ID as a path parameter to retrieve the transcription job.
- List transcription jobs: Oracle Integration accepts the compartment ID (of the compartment that contains the transcription jobs) as a query parameter to return a list of transcription jobs available in the compartment. The compartment ID can also be specified in the Compartment field when performing Step 5.
- Update transcription job: Oracle Integration accepts the transcription job ID (of the transcription job you want to update) as a path parameter and a request payload with the details to modify, such as display name and description. This action updates the specified transcription job with the new details you provide.
- Delete transcription job: Oracle Integration accepts the transcription job ID (of the transcription job you want to delete) as a path parameter. This action deletes the specified transcription job, but it does not delete the output transcription file that is stored in the output location bucket in the object store.
If you selected the Text to speech category, select the Synthesize Speech action. Oracle Integration accepts a request payload with details such as compartment ID, model name, text (the text to be synthesized to speech), and output format. It returns a stream reference that you can download. See Using Text to Speech.
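
As a quick reference, the kind of input each action takes can be summarized in a small lookup structure. This is only a sketch of the table above: the `ActionInput` class and the parameter names (`transcriptionJobId`, `compartmentId`) are hypothetical labels chosen for illustration, not identifiers confirmed by Oracle Integration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ActionInput:
    """Summarizes how one Speech action receives its input (illustrative only)."""
    category: str               # "Speech to text" or "Text to speech"
    path_param: Optional[str]   # hypothetical path parameter name, if any
    query_param: Optional[str]  # hypothetical query parameter name, if any
    has_payload: bool           # True when the action takes a request payload


# Only the *kind* of input comes from the table; the parameter names are assumed.
ACTION_INPUTS = {
    "Create transcription job": ActionInput("Speech to text", None, None, True),
    "Get transcription job": ActionInput("Speech to text", "transcriptionJobId", None, False),
    "List transcription jobs": ActionInput("Speech to text", None, "compartmentId", False),
    "Update transcription job": ActionInput("Speech to text", "transcriptionJobId", None, True),
    "Delete transcription job": ActionInput("Speech to text", "transcriptionJobId", None, False),
    "Synthesize Speech": ActionInput("Text to speech", None, None, True),
}


def actions_for(category: str) -> List[str]:
    """Return the action names offered for a category, in table order."""
    return [name for name, spec in ACTION_INPUTS.items() if spec.category == category]
```

For example, `actions_for("Text to speech")` yields only the Synthesize Speech action, which matches the single choice offered for that category.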

Click Continue.

Select the following information, then click Continue.

Element	Description
Compartment	This field is available only when you select the Create transcription job, List transcription jobs, or Synthesize Speech action in Step 3. Select the Oracle Cloud Infrastructure compartment in which your Oracle Integration instance is installed.
Output bucket	This field is available only when you select the Create transcription job action in Step 3. Select a bucket to store the text output generated by the OCI Speech action.
Speaker	This field is available only when you select the Synthesize Speech action in Step 3. Select a speaker (predefined voice) from the drop-down list.

On the Summary page, click Finish.
Open the mapper and define the mappings between the source and target elements as needed for the action you selected in Step 3.

Note:
You can optionally specify Compartment Id and Bucket Name in the mapper to override the values you selected in Step 5 for Compartment and Output bucket, respectively.
1. Perform the following source-to-target mappings for the Create transcription job action:
  - Map the source Is Punctuation Enabled to the target Is Punctuation Enabled.
  - Map the source Compartment Id to the target Compartment Id.
  - Map the source Display Name to the target Display Name.
  - Map the source Description to the target Description.
  - Map the source Domain to the target Domain.
  - Map the source Language Code to the target Language Code.
  - Map the source Model Type to the target Model Type.
  - Map the source Is Diarization Enabled to the target Is Diarization Enabled.
  - Map the source Location Type to the target Location Type.
  - Map the source Object Locations to the target Object Locations.
  - Map the source Namespace Name to the target Namespace Name.
  - Map the source Bucket Name to the target Bucket Name.
  - Map the source Prefix to the target Prefix.
2. Perform the following source-to-target mapping for the Get transcription job action:
  - Map the source Transcription Job Id to the target Transcription Job Id.
3. Perform the following source-to-target mapping for the List transcription jobs action:
  - Map the source Compartment Id to the target Compartment Id.
  You can optionally configure target elements such as Lifecycle State, Display Name, Id, Limit, Page, Sort Order, and Sort By.
4. Perform the following source-to-target mappings for the Update transcription job action:
  - Map the source Transcription Job Id to the target Transcription Job Id.
  - Map the source Display Name to the target Display Name.
  - Map the source Description to the target Description.
5. Perform the following source-to-target mapping for the Delete transcription job action:
  - Map the source Transcription Job Id to the target Transcription Job Id.
6. Perform the following source-to-target mappings for the Synthesize Speech action:
  - Map the source Is Punctuation Enabled to the target Is Punctuation Enabled.
  - Map the source Config Type to the target Config Type.
  - Map the source Compartment Id to the target Compartment Id.
  - Map the source Model Name to the target Model Name.
  - Map the source Voice Id to the target Voice Id.
  - Map the source Model Family to the target Model Family.
  - Map the source Output Format to the target Output Format.
  - Map the source Sample Rate In Hz to the target Sample Rate In Hz.
  - Map the source Speech Mark Types to the target Speech Mark Types.
  - Map the source Text Type to the target Text Type.
  - Map the source Is Stream Enabled to the target Is Stream Enabled.
  - Map the source Text to the target Text.
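
For the List transcription jobs mapping, the optional target elements act as filters and paging controls on top of the required compartment ID. The sketch below shows one way such a query string could be assembled. The camelCase parameter names are assumptions derived from the mapper element names, and `build_list_jobs_query` is a hypothetical helper, not part of Oracle Integration.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_jobs_query(
    compartment_id: str,
    lifecycle_state: Optional[str] = None,
    display_name: Optional[str] = None,
    job_id: Optional[str] = None,
    limit: Optional[int] = None,
    page: Optional[str] = None,
    sort_order: Optional[str] = None,
    sort_by: Optional[str] = None,
) -> str:
    """Build a query string; only the compartment ID is required."""
    # Parameter names are assumed from the mapper elements (Lifecycle State -> lifecycleState, ...).
    params = {
        "compartmentId": compartment_id,
        "lifecycleState": lifecycle_state,
        "displayName": display_name,
        "id": job_id,
        "limit": limit,
        "page": page,
        "sortOrder": sort_order,
        "sortBy": sort_by,
    }
    # Leave out every optional element that was not mapped.
    return urlencode({key: value for key, value in params.items() if value is not None})


query = build_list_jobs_query("ocid1.compartment.example", limit=10, sort_order="DESC")
```

Unmapped optional elements are simply omitted, so the minimal query contains only the compartment ID.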
Exit the mapper.
The speech action is now configured.
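
To make the Create transcription job mappings more concrete, here is a minimal sketch of a payload that carries the mapped elements. The key names, the nesting, and the sample values (such as `GENERIC`, `ORACLE`, and `OBJECT_LIST_INLINE_INPUT_LOCATION`) are assumptions inferred from the mapper element names; check the schema shown in the mapper before relying on them. The hypothetical `build_create_job_payload` helper also illustrates the override rule from the note: a Compartment Id or Bucket Name mapped in the mapper takes precedence over the value selected in Step 5.

```python
from typing import Any, Dict, List, Optional


def build_create_job_payload(
    configured_compartment_id: str,
    configured_output_bucket: str,
    namespace: str,
    input_bucket: str,
    object_names: List[str],
    mapped_compartment_id: Optional[str] = None,
    mapped_bucket_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an illustrative Create transcription job payload."""
    # Values mapped in the mapper override the selections made in Step 5.
    compartment_id = mapped_compartment_id or configured_compartment_id
    output_bucket = mapped_bucket_name or configured_output_bucket
    return {
        "compartmentId": compartment_id,
        "displayName": "example-transcription-job",
        "description": "Illustrative job created from an integration",
        "normalization": {"isPunctuationEnabled": True},
        "modelDetails": {
            "domain": "GENERIC",        # assumed sample value
            "languageCode": "en-US",
            "modelType": "ORACLE",      # assumed sample value
            "transcriptionSettings": {
                "diarization": {"isDiarizationEnabled": False}
            },
        },
        "inputLocation": {
            "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",  # assumed sample value
            "objectLocations": [
                {
                    "namespaceName": namespace,
                    "bucketName": input_bucket,
                    "objectNames": object_names,
                }
            ],
        },
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": output_bucket,
            "prefix": "transcripts/",
        },
    }


default_payload = build_create_job_payload(
    "ocid1.compartment.example", "output-bucket", "example-namespace",
    "input-bucket", ["call-recording.wav"],
)
overridden_payload = build_create_job_payload(
    "ocid1.compartment.example", "output-bucket", "example-namespace",
    "input-bucket", ["call-recording.wav"],
    mapped_compartment_id="ocid1.compartment.other",
    mapped_bucket_name="other-bucket",
)
```

In `default_payload` the Step 5 selections are used as-is, while `overridden_payload` carries the values supplied through the mapper.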
Here's what happens when you activate and run the integration based on the action you selected in Step 3:
- Create transcription job: Converts the speech you provided to text and stores the text output in the output bucket you selected.
- Get transcription job: Retrieves the transcription job with the specified ID.
- List transcription jobs: Returns a list of transcription jobs available in the specified compartment.
- Update transcription job: Updates the specified transcription job with the new details you provided.
- Delete transcription job: Deletes the specified transcription job (but it will not delete the output transcription file that is stored in the output location bucket in the object store).
- Synthesize Speech: Converts the text you provided to speech. You can download the output file from the response.
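
The Synthesize Speech request can be pictured in the same way. The following sketch groups the mapped elements into one payload. The key names, the nesting, and the sample values (`TTS_2_NATURAL`, `BASE_AUDIO_CONFIG`, `MP3`, the sample rate, and the voice name) are assumptions based on the mapper element names, and `build_synthesize_payload` is a hypothetical helper. Use the values offered in the mapper and the Speaker drop-down list for a real integration.

```python
from typing import Any, Dict


def build_synthesize_payload(text: str, compartment_id: str, voice_id: str) -> Dict[str, Any]:
    """Assemble an illustrative Synthesize Speech payload."""
    if not text.strip():
        # There is nothing to synthesize without text.
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "isStreamEnabled": True,
        "compartmentId": compartment_id,
        "configuration": {
            "modelFamily": "ORACLE",                 # assumed sample value
            "modelDetails": {
                "modelName": "TTS_2_NATURAL",        # assumed sample value
                "voiceId": voice_id,
            },
            "speechSettings": {
                "textType": "TEXT",
                "sampleRateInHz": 22050,             # assumed sample value
                "outputFormat": "MP3",
                "speechMarkTypes": ["WORD"],
            },
        },
        "audioConfig": {"configType": "BASE_AUDIO_CONFIG"},  # assumed sample value
    }


payload = build_synthesize_payload(
    "Your order has shipped.", "ocid1.compartment.example", "ExampleVoice"
)
```

Rejecting empty text up front is a design choice of this sketch, so an integration fails early instead of sending a request that has nothing to convert.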