Document Translation

The document translation model translates text into a chosen language.

OCI Document Translation is a cloud-based service that enables seamless and efficient language translation within documents. OCI Document Translation uses Oracle pre-trained Machine Translation models to perform language translation and other language-related operations.

Document Translation translates a variety of document types. Word, Excel, and PowerPoint files can be translated while keeping their original formatting. Plain text, HTML, and JSON are also supported, which is ideal for translating online content or integrating translation into your global applications. Additionally, formats for closed captions and subtitling are supported, improving the accessibility of your video content.

Supported Document Types

Document Type                              Extensions
Microsoft Office                           .docx, .pptx, .xlsx
HTML                                       .html
JSON                                       .json
Text                                       .txt
CSV (comma-separated values)               .csv
TSV (tab-separated values)                 .tsv
SRT (SubRip Subtitle file)                 .srt
WebVTT (Web Video Text Tracks Format)      .vtt

Supported Languages

For a list of supported languages, see Text Translation.

Use Cases

Streamlined approach to overcoming language barriers
  • Translate user guides, blogs and knowledge base articles to reach a wider audience.
  • Improve internal communications and knowledge sharing across global teams.
  • Expand the reach of your sales and marketing campaigns by providing presentations and marketing assets in multiple languages.
  • Make your training content more inclusive to non-native speakers by adding subtitles to recorded video content.
  • Develop multi-lingual support for products and services, including expanding your machine learning models to be used with non-English input content.

Known Issues and Limitations

Oracle Translate provides good quality and reliable translation for a wide range of business enterprise and generic content. Translated documents can be utilized without further modification if some imperfections are tolerated. In cases where precision is essential, post-editing by native language speakers can be commissioned to rectify and enhance machine translation output.

To enhance the quality of your translations, consider these potential limitations and adopt the recommendations listed below:

Limitation: Quality of source content. This can impact the quality of the translation. The key areas to consider that may reduce translation quality are:

  • Source containing abbreviations, misspellings, or lack of punctuation
  • Long strings (>20 words) and very short strings (1-word or 2-word strings)

Recommendation: Check the source content for spelling mistakes, punctuation errors and string length.

Limitation: Context- and domain-specific terminology or named entities. If your content contains specific terminology or named entities (person names, company names, brand names), our generic translation model may not translate this content to meet the needs of your use case.

Recommendation: We recommend using our glossary feature to control your translation and consistently translate terms that are specific to your use case.

Note

Oracle Brand protection is in place to ensure registered Oracle terms such as product names are not translated. This is expected behaviour and is based on the Oracle Official Names List.
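
The string-length guidance in the limitations above can be checked before you submit content. The following is a minimal sketch, not part of the service: the function name and thresholds are illustrative assumptions derived from the ">20 words" and "1-word or 2-word" figures, and words are counted by splitting on whitespace.

```python
MAX_WORDS = 20  # assumption: strings with more than 20 words count as "long"
MIN_WORDS = 3   # assumption: 1-word and 2-word strings count as "very short"


def flag_source_strings(strings):
    """Return (index, word_count, reason) for strings outside the suggested range."""
    flagged = []
    for index, text in enumerate(strings):
        word_count = len(text.split())  # whitespace-separated words
        if word_count > MAX_WORDS:
            flagged.append((index, word_count, "long"))
        elif word_count < MIN_WORDS:
            flagged.append((index, word_count, "short"))
    return flagged


samples = ["Save", "Click the Save button to continue."]
for index, word_count, reason in flag_source_strings(samples):
    print(f"string {index}: {word_count} word(s), flagged as {reason}")
```

Flagged strings are candidates for manual review only. Spelling and punctuation checks are not covered by this sketch.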

Controlling Translation Features

Document Translation allows you to control and customize translation through Advanced Properties, either by using a glossary file or specific file properties.

A glossary is a list of user-supplied terms that can be used within the Document Translation service to control your translation. By using a glossary, you can specify how to translate or not translate certain terminology.

The main use cases for glossaries include:

  • Ensuring your context- and domain-specific terminology is translated consistently throughout your content.
  • Restricting certain terms or words from translation. For example, brand or product names that you don't want to translate.

File-type-specific properties allow you to optionally control which elements of a file are translated. For example, the columns to translate in a CSV file or the elements to translate in a JSON file.

Advanced Property Description
Glossaries

You can specify custom terminology per job, so that certain words are translated differently. Supply the glossary as a comma-separated values (CSV) file with no header.

Sample value for advanced properties:

{"translation":{"glossary": {"type": "bucket","bucketDetails": {"bucketName": "source-bucket", "namespace": "idngwwc5ajp5","prefix": "glossary_text.csv"}}}}

Sample glossary CSV file content (for example, glossary_text.csv):

India,India
Oracle,Oracle
Oracle Cloud Infrastructure,Oracle Cloud Infrastructure
Oracle NetSuite,Oracle NetSuite
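
The glossary file and its advanced-properties value can also be generated programmatically. The sketch below is one possible approach using only the Python standard library. The helper names are hypothetical, and the bucket name, namespace, and prefix are the placeholder values from the sample above.

```python
import csv
import io
import json


def glossary_csv(terms):
    """Serialize (source_term, target_term) pairs as CSV text with no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(terms)
    return buffer.getvalue()


def glossary_properties(bucket_name, namespace, prefix):
    """Build the advanced-properties JSON string that points at a glossary object."""
    return json.dumps({
        "translation": {
            "glossary": {
                "type": "bucket",
                "bucketDetails": {
                    "bucketName": bucket_name,
                    "namespace": namespace,
                    "prefix": prefix,
                },
            }
        }
    })


terms = [("India", "India"), ("Oracle", "Oracle")]
print(glossary_csv(terms), end="")
print(glossary_properties("source-bucket", "idngwwc5ajp5", "glossary_text.csv"))
```

Mapping a term to itself, as in the sample file, keeps that term untranslated. The generated CSV text still has to be uploaded to the bucket separately.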
csv

Specify whether headers are present and which columns to translate.

  • columnsToTranslate: indexes (starting from 1) of the columns to translate.
  • hasHeaders: specifies whether the CSV file has headers. If true, the first row is left untranslated.

Example:

{"translation":{"csv":{"columnsToTranslate":[2],"hasHeaders":false}}}
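
To make the effect of these two properties concrete, the sketch below applies them locally to a small CSV string. It only illustrates the selection rules as described above (1-based column indexes, header row skipped). It is not how the service is implemented, and the `translate` argument is a stand-in function supplied by the caller.

```python
import csv
import io
import json


def apply_csv_properties(csv_text, properties, translate):
    """Apply `translate` to the cells selected by the csv advanced properties."""
    options = json.loads(properties)["translation"]["csv"]
    # columnsToTranslate is 1-based; convert to 0-based positions.
    columns = {index - 1 for index in options["columnsToTranslate"]}
    rows = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if options["hasHeaders"] and row_number == 0:
            rows.append(row)  # header row is left untouched
            continue
        rows.append([translate(cell) if position in columns else cell
                     for position, cell in enumerate(row)])
    return rows


properties = '{"translation":{"csv":{"columnsToTranslate":[2],"hasHeaders":true}}}'
# str.upper is only a placeholder for a real translation step.
print(apply_csv_properties("id,text\n1,hello\n", properties, str.upper))
```

With these properties, only the second column of the data rows is passed to the stand-in function. The first column and the header row are returned unchanged.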
json

Specify the elements to translate.

Example:

{"translation":{"json":{"filter":"path","pathsToTranslate":["jsonData.title","jsonData.existingSkills","jsonData.structured.experience[*].role"]}}}
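
The path strings in the example above can be read as dotted keys, with `[*]` matching every item of a list. The following sketch resolves such paths against a sample document to show which values they would select. The interpretation of the path syntax is an assumption inferred from the example, the `select` helper is hypothetical, and the service may support more (or different) syntax.

```python
def select(document, path):
    """Return the values matched by a dotted path; `[*]` expands every list item."""
    current = [document]
    for segment in path.split("."):
        wildcard = segment.endswith("[*]")
        key = segment[:-3] if wildcard else segment
        matched = []
        for node in current:
            if not (isinstance(node, dict) and key in node):
                continue  # path does not exist under this node
            value = node[key]
            if wildcard:
                if isinstance(value, list):
                    matched.extend(value)
            else:
                matched.append(value)
        current = matched
    return current


sample = {
    "jsonData": {
        "title": "Engineer",
        "structured": {"experience": [{"role": "Developer", "years": 2},
                                      {"role": "Team lead", "years": 3}]},
    }
}
print(select(sample, "jsonData.title"))
print(select(sample, "jsonData.structured.experience[*].role"))
```

In this reading, `jsonData.structured.experience[*].role` selects the `role` of every experience entry, while sibling fields such as `years` are not selected.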
  • Upload the document to a bucket. For more information, see Upload Dataset.
    1. Open the navigation menu and click Analytics & AI. Under AI Services, click Language.
    2. In the left-side navigation menu, click Jobs.
    3. Click Create Job, and then enter a name and compartment.
    4. Select the Pretrained language translation.
    5. Select the source language.
    6. Select the target languages.
    7. Click Next.
    8. Enter the Data type.
    9. Enter the bucket where the document is located.
    10. Enter the datafile name.
    11. Enter the text column name of the column that has the text to be processed.
    12. Enter the row ID column. This is the column that uniquely identifies the row.
    13. (Optional) Enter the columns to be copied to output.
    14. (Optional) Enter the Job output data.
    15. To review details, click Next.
    16. Click Create job.
  • Use the oci ai language batch-language-translation command and required parameters to translate one or more files:

    oci ai language batch-language-translation --documents [<list-of-documents>] ... [OPTIONS]

    Example:

    oci ai language batch-language-translation --documents '[{"key": "1","languageCode": "en","text": "hello world"}]' \
    --target-language-code es

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.

  • Run the BatchLanguageTranslation operation to translate one or more files.
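
Quoting the JSON value of `--documents` by hand is error-prone, so the CLI example above can be assembled programmatically. This is a minimal sketch that uses only the flags shown in that example. The helper name is hypothetical, and the sketch builds and prints the command without running it.

```python
import json
import shlex


def build_translation_command(documents, target_language_code):
    """Return the argument list for the batch-language-translation CLI call."""
    return [
        "oci", "ai", "language", "batch-language-translation",
        "--documents", json.dumps(documents),
        "--target-language-code", target_language_code,
    ]


documents = [{"key": "1", "languageCode": "en", "text": "hello world"}]
argv = build_translation_command(documents, "es")
# shlex.join adds the shell quoting needed for the JSON argument.
print(shlex.join(argv))
```

To execute the command, the list can be passed to `subprocess.run(argv, check=True)`, which assumes the OCI CLI is installed and configured on the machine.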