Creating a Transcription Job

Create and submit a Speech job to transcribe one or more media files to text files.

Before you begin:

Store the media files that you want to transcribe in an Object Storage bucket in a tenancy.

The Whisper model is trained on a large corpus of multilingual data collected from the web and supports file-based voice-to-text transcription for 50+ languages. It uses the same service endpoints and the same API and SDK interfaces as the Oracle speech model, which gives you the most flexibility and compatibility. In addition, the Whisper model uses diarization to label individual speakers in the recording.

To compare the Whisper and Oracle ASR models before you create a transcription job, see Comparing Whisper and Oracle ASR Models.

Comparing Whisper and Oracle ASR Models

Compare the Whisper model and the Oracle ASR model for creating transcription jobs.

Use the following comparison table of the Whisper model vs the Oracle ASR model to choose the correct model when creating a transcription job.

Feature | Oracle ASR model | Whisper model in Oracle Speech service
Real-time transcription | Supported | Not supported
Large file size | Up to 2 GB | Up to 2 GB
Word-level timestamps | Supported | Supported
Multilingual support | English, Spanish, French, German, Italian, Portuguese, and Hindi | Same as Oracle ASR model plus 50 other languages*
Diarization | Supported | Supported

* OpenAI Whisper FAQ
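
The comparison table can be read as a small decision rule. The following Python sketch encodes it under stated assumptions: the function name choose_model and the returned labels are hypothetical and aren't Speech service API values, and preferring the Oracle ASR model when both models cover a language is an arbitrary choice made for this example.

```python
# Languages that the comparison table lists for the Oracle ASR model.
ORACLE_ASR_LANGUAGES = {
    "English", "Spanish", "French", "German",
    "Italian", "Portuguese", "Hindi",
}


def choose_model(language: str, needs_real_time: bool = False) -> str:
    """Return "Oracle ASR" or "Whisper" based on the comparison table.

    The returned labels are illustrative only, not API values.
    """
    if needs_real_time:
        # Per the table, only the Oracle ASR model supports real-time use.
        if language not in ORACLE_ASR_LANGUAGES:
            raise ValueError(
                f"Real-time transcription isn't available for {language!r}"
            )
        return "Oracle ASR"
    if language in ORACLE_ASR_LANGUAGES:
        # Both models cover these languages; this sketch picks Oracle ASR.
        return "Oracle ASR"
    # Any other language is a Whisper candidate. This sketch doesn't check
    # whether the language is among the additional Whisper languages.
    return "Whisper"
```

For example, choose_model("Hindi", needs_real_time=True) selects the Oracle ASR model, while a language outside the Oracle ASR list falls through to Whisper.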

  • Store the media file that you want to transcribe.
    1. Open the navigation menu and click Analytics & AI. Under AI Services, click Speech.
    2. Under List Scope, select the compartment that you want to work in.
    3. In the left-side navigation menu, click Jobs.
    4. Click Create job.
    5. On the basic information page, enter a unique name (255 character limit) for the job. The name must include one or more alphanumeric characters, dashes, or underscores in any order. If you don't provide a name, a name is automatically generated for you.

    6. (Optional) Enter a description (400 character limit) for the job.
    7. Under List Scope, select the compartment that you want to work in.
    8. Under Input, select a data input bucket that contains the media file that you want to transcribe.

      If the bucket that you want isn't in the selected compartment, change the compartment.

    9. Under Output, select whether to store the output files in the input bucket or in a different bucket. To use a different bucket, select it.
    10. (Optional) Enter an output prefix to separate and sort the files in the bucket.

      For example, you could enter call_ctr for call center media files.

      You can also create an output folder in your bucket by using a slash (/). For example, MyResults/ stores all the transcribed files in a MyResults folder in the bucket.

    11. Select the model type of the job you're creating.

      See Comparing Whisper and Oracle ASR Models to determine the correct model type.
    12. If you selected a Whisper model in the previous step, select the model subtype. Otherwise, proceed to the next step.
    13. Choose the language of the media file.

      You can search for the appropriate language by language name or language code. US English is the default.

    14. (Optional) To include both the SRT and JSON formats in the transcription, select Get SRT transcription format.
    15. If you don't want your transcription punctuated, clear Enable punctuation.

      Enable punctuation is selected for Whisper models and can't be disabled.
    16. (Optional) To identify the speakers in the input file, select Enable diarization.

      You can let the Speech service automatically detect the number of unique speakers in the input file or you can enter a number. The minimum number of speakers is two and the maximum is sixteen.


      Using diarization increases the transcription task latency, which is why this option is disabled by default.

    17. To add filters to change the way the output file is generated, click Add filter.
      1. Select a filter type. Profanity is the default.
      2. Select the filter mode:

        For example, the profanity filter offers these modes:

        • Mask: Any detected profanity is masked in the transcription with asterisks, except for the first letter.

        • Remove: Any detected profanity is replaced with one asterisk in the transcription.

        • Tag: Detected profanity isn't masked or removed. Instead, it's marked as TYPE: "Profanity" in the transcription.

    18. (Optional) Click Show advanced options to assign tags to the job. Tags help you locate and track resources. Select a tag namespace, then enter the key and value. To add more than one tag, click Add tags.

      Tagging describes the various tags that you can use to organize and find resources, including cost-tracking tags.

    19. Click Next to choose your files for the job.
    20. Select the check boxes for the media files that you want to transcribe or select them all by selecting the check box next to Name.

      • The maximum file size is 2 GB.

      • File duration is a maximum of 4 hours.

    21. Click Submit to start the job.

      A job can run in seconds or hours depending on the size and number of files you select. While running, the job is in an in-progress state that changes to succeeded or failed when it finishes. You can select a job to go to its details page.

      • Each job can have up to 100 tasks.

      • Jobs are retained for 90 days.

  • Use the oci speech transcription-job create command and required parameters to create a transcription job:

    oci speech transcription-job create --compartment-id compartment_id --input-location file://path/to/file --output-location file://path/to/file ... [OPTIONS]

    Avoid entering confidential information.

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
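
The --input-location and --output-location parameters each point to a JSON file. As a rough sketch, the following Python code writes two such files and assembles the command as an argument list without running it. The function build_create_command is hypothetical, and the JSON field names (locationType, objectLocations, namespaceName, bucketName, objectNames, prefix) are assumptions that you should verify against the CLI Command Reference.

```python
import json
import shlex
import tempfile
from pathlib import Path


def build_create_command(compartment_id, namespace, bucket, object_names,
                         prefix, workdir):
    """Write the location JSON files and return the CLI argument list."""
    workdir = Path(workdir)
    # Field names below are assumptions; check them in the CLI reference.
    input_location = {
        "locationType": "OBJECT_LIST_INLINE_INPUT_LOCATION",
        "objectLocations": [{
            "namespaceName": namespace,
            "bucketName": bucket,
            "objectNames": list(object_names),
        }],
    }
    output_location = {
        "namespaceName": namespace,
        "bucketName": bucket,
        "prefix": prefix,
    }
    input_path = workdir / "input_location.json"
    output_path = workdir / "output_location.json"
    input_path.write_text(json.dumps(input_location, indent=2))
    output_path.write_text(json.dumps(output_location, indent=2))
    return [
        "oci", "speech", "transcription-job", "create",
        "--compartment-id", compartment_id,
        "--input-location", f"file://{input_path}",
        "--output-location", f"file://{output_path}",
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        argv = build_create_command(
            "ocid1.compartment.oc1..example",  # placeholder OCID
            "my-namespace", "my-bucket", ["call_001.wav"], "MyResults/", tmp,
        )
        # Print the command for review instead of running it.
        print(shlex.join(argv))
```

Building the command as a list keeps each argument separate, so values that contain spaces don't need manual quoting.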

  • Use the CreateTranscriptionJob operation to create a job. To move an existing job to a different compartment, use the ChangeTranscriptionJobCompartment operation.
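
To show how the Console options might map to a CreateTranscriptionJob request, the following Python sketch assembles a request body as a plain dictionary. It doesn't call the service. The function build_job_body is hypothetical, and every JSON field name and enum string in it (for example, modelDetails, normalization, isPunctuationEnabled, additionalTranscriptionFormats, and the MASK, REMOVE, and TAG modes) is an assumption to verify against the API reference.

```python
def build_job_body(compartment_id, input_location, output_location,
                   name=None, language_code="en-US", srt=False,
                   speakers=None, profanity_mode=None):
    """Assemble a request body for CreateTranscriptionJob as a plain dict.

    All JSON field names and enum strings are assumptions to verify
    against the API reference.
    """
    body = {
        "compartmentId": compartment_id,
        "inputLocation": input_location,
        "outputLocation": output_location,
        "modelDetails": {"languageCode": language_code},
        "normalization": {"isPunctuationEnabled": True, "filters": []},
    }
    if name:
        # Omit the name to let the service generate one.
        body["displayName"] = name
    if srt:
        # Request SRT output in addition to the default JSON output.
        body["additionalTranscriptionFormats"] = ["SRT"]
    if speakers is not None:
        # The Console steps give a range of 2 to 16 speakers.
        if not 2 <= speakers <= 16:
            raise ValueError("speakers must be between 2 and 16")
        body["modelDetails"]["transcriptionSettings"] = {
            "diarization": {
                "isDiarizationEnabled": True,
                "numberOfSpeakers": speakers,
            }
        }
    if profanity_mode is not None:
        if profanity_mode not in {"MASK", "REMOVE", "TAG"}:
            raise ValueError("profanity_mode must be MASK, REMOVE, or TAG")
        body["normalization"]["filters"].append(
            {"type": "PROFANITY", "mode": profanity_mode}
        )
    return body
```

The input_location and output_location arguments are passed through unchanged, so the Object Storage details can be built separately from the transcription options.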