Using Pretrained Models

Learn about the Language service pretrained models.

Limitations Common to All Models

  • If your text doesn't follow English grammar, the model performance could degrade.

  • Spelling isn't checked or corrected so the results might not be as you expect when spelling mistakes exist.

Single Requests

  • A record can be up to 1,000 characters. We encourage you to use batch requests, which support records of up to 5,000 characters and more than one record in a single request.

  • There is no minimum number of characters that must be provided, but the output quality is highly dependent on the amount of information provided to the models.

Batch Requests

  • A batch can have up to 100 records.

  • A record can be up to 5,000 characters long.

  • The total number of characters to process in a request can be up to 20,000 characters.
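The batch limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming that lengths are measured with Python's len() (Unicode code points), which might not match how the service counts characters. The function name and constants are illustrative and aren't part of the Language API.

```python
MAX_RECORDS = 100          # records per batch
MAX_RECORD_CHARS = 5_000   # characters per record
MAX_TOTAL_CHARS = 20_000   # characters per request


def check_batch(texts):
    """Return a list of limit violations; the list is empty if the batch fits."""
    problems = []
    if len(texts) > MAX_RECORDS:
        problems.append(f"{len(texts)} records exceeds the limit of {MAX_RECORDS}")
    for i, text in enumerate(texts):
        # Assumption: one Python character counts as one service character.
        if len(text) > MAX_RECORD_CHARS:
            problems.append(f"record {i} has {len(text)} characters")
    total = sum(len(text) for text in texts)
    if total > MAX_TOTAL_CHARS:
        problems.append(f"total of {total} characters exceeds {MAX_TOTAL_CHARS}")
    return problems
```

A batch of five 5,000-character records passes the per-record check but fails the total check, so both limits need to be tested.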

About Language Detection

The language detection model identifies which natural language the input text is in.

For example, language detection can help make customer support interactions more personable and quicker. Customer service chatbots can detect the language of a customer's input text and respond in that language. If a customer needs help with a product, the chatbot can serve the product manual in the customer's language, or transfer the customer to the call center for that language.

Supported Languages

  • Afrikaans

  • Albanian

  • Arabic

  • Armenian

  • Azerbaijani

  • Basque

  • Belarusian

  • Bengali

  • Bosnian

  • Brazilian Portuguese

  • Bulgarian

  • Burmese

  • Cantonese

  • Catalan

  • Cebuano

  • Chinese

  • Croatian

  • Czech

  • Danish

  • Dutch

  • Eastern Punjabi

  • Egyptian Arabic

  • English

  • Esperanto

  • Estonian

  • Finnish

  • French

  • Georgian

  • German

  • Greek

  • Hebrew

  • Hindi

  • Hungarian

  • Icelandic

  • Indonesian

  • Irish

  • Italian

  • Japanese

  • Javanese

  • Kannada

  • Kazakh

  • Korean

  • Kurdish (Sorani)

  • Latin

  • Latvian

  • Lithuanian

  • Macedonian

  • Malay

  • Malayalam

  • Marathi

  • Minangkabau

  • Nepali

  • Norwegian (Bokmal)

  • Norwegian (Nynorsk)

  • Persian

  • Polish

  • Portuguese

  • Romanian

  • Russian

  • Serbian

  • Serbo-Croatian

  • Slovak

  • Slovene

  • Spanish

  • Swahili

  • Swedish

  • Tagalog

  • Tamil

  • Telugu

  • Thai

  • Turkish

  • Ukrainian

  • Urdu

  • Uzbek

  • Vietnamese

  • Welsh

Examples

Input Text Language and Scores

OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.

English 0.9999

«Нос» - сатирический рассказ Николая Гоголя, написанный во время его жизни в Санкт-Петербурге. В это время в творчестве Гоголя основное внимание уделялось сюрреализму и гротеску с романтическим оттенком; Предлагаемый здесь рассказ «Нос» является примером. Рассказ Николая Гоголя «Нос» был написан между 1832 и 1833 годами и завершен в 1834 году, подвергался различным пересмотрам и модификациям Н. Гоголем, в основном из-за непрерывного вмешательства цензуры.

Russian 0.9999

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectDominantLanguage
API Request format:
{
  "documents": [
    {
      "key": "1",
      "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip."
    }
  ]
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "languages": [
                {
                    "code": "en",
                    "name": "English",
                    "score": 0.9999840921009815
                }
            ]
        }
    ],
    "errors": []
}
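A response of this shape can be reduced to one language per document key. The sketch below assumes that the response body has already been received as JSON text. The helper name dominant_languages is hypothetical and isn't part of any SDK.

```python
import json

# Response body with the same shape as the example above (score copied from it).
RESPONSE_JSON = """
{
    "documents": [
        {
            "key": "1",
            "languages": [
                {"code": "en", "name": "English", "score": 0.9999840921009815}
            ]
        }
    ],
    "errors": []
}
"""


def dominant_languages(response):
    """Map each document key to the (code, score) of its first listed language."""
    result = {}
    for doc in response["documents"]:
        languages = doc.get("languages") or []
        if languages:
            result[doc["key"]] = (languages[0]["code"], languages[0]["score"])
    return result


detected = dominant_languages(json.loads(RESPONSE_JSON))
```

Documents that come back without a languages entry are skipped rather than raising an error, which is a design choice of this sketch.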

Limitations

  • Only one language is returned. If the input is multilingual, the dominant language is returned.

About Text Classification

Text classification analyzes the text and identifies categories for the content with a confidence score.

Text classification uses natural language processing (NLP) with deep learning techniques to find insights from textual data. It returns a category from a set of predefined categories.

For example, you could classify a collection of financial documents into categories such as Credit and Lending, Insurance, Investing, Banking, and so on. The results are given in category and subcategory format.

Supported Features

  • Text category

  • Confidence scores

  • Requests support single record and multi-record batches.

Supported Languages for Input Text

  • English

Examples

Input Text Categories and Scores

Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.

Sports and Games/Motor Sports 1.0000

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectLanguageTextClassification
API Request format:
{
    "documents": [
        {
            "key": "1",
            "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
        }
    ]
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "textClassification": [
                {
                    "label": "Sports and Games/Motor Sports",
                    "score": 1
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}

Supported Categories

Category | Domain | Sub-Domain
Animals and Plants/Pets | Animals and Plants | Pets
Arts and Culture/Audio and Music | Arts and Culture | Audio and Music
Arts and Culture/Performing Arts | Arts and Culture | Performing Arts
Arts and Culture/Visual Art and Design | Arts and Culture | Visual Art and Design
Autos and Vehicles/Bicycles and Accessories | Autos and Vehicles | Bicycles and Accessories
Autos and Vehicles/Boats and Watercraft | Autos and Vehicles | Boats and Watercraft
Autos and Vehicles/Motor Vehicles | Autos and Vehicles | Motor Vehicles
Autos and Vehicles/Vehicle Parts and Services | Autos and Vehicles | Vehicle Parts and Services
Autos and Vehicles/Vehicle Shopping and Renting | Autos and Vehicles | Vehicle Shopping and Renting
Books and Literature/Children'S Literature | Books and Literature | Children'S Literature
Books and Literature/E-Books | Books and Literature | E-Books
Books and Literature/Poetry | Books and Literature | Poetry
Business and Industry/Advertising and Marketing | Business and Industry | Advertising and Marketing
Business and Industry/Agriculture and Forestry | Business and Industry | Agriculture and Forestry
Business and Industry/Business Operations | Business and Industry | Business Operations
Business and Industry/Business Services | Business and Industry | Business Services
Business and Industry/Construction and Maintenance | Business and Industry | Construction and Maintenance
Business and Industry/Defence | Business and Industry | Defence
Business and Industry/Energy and Utilities | Business and Industry | Energy and Utilities
Business and Industry/Manufacturing | Business and Industry | Manufacturing
Business and Industry/Metals and Mining | Business and Industry | Metals and Mining
Business and Industry/Printing and Publishing | Business and Industry | Printing and Publishing
Business and Industry/Textiles and Nonwovens | Business and Industry | Textiles and Nonwovens
Business and Industry/Transportation and Logistics | Business and Industry | Transportation and Logistics
Computer and Electronics/Computer Hardware | Computer and Electronics | Computer Hardware
Computer and Electronics/Computer Security | Computer and Electronics | Computer Security
Computer and Electronics/Consumer Electronics | Computer and Electronics | Consumer Electronics
Computer and Electronics/Electronics and Electrical | Computer and Electronics | Electronics and Electrical
Computer and Electronics/Enterprise Software | Computer and Electronics | Enterprise Software
Computer and Electronics/Programming | Computer and Electronics | Programming
Computer and Electronics/Software | Computer and Electronics | Software
Education and Occupation/Education | Education and Occupation | Education
Education and Occupation/Jobs | Education and Occupation | Jobs
Entertainment/Celebrities and Famous Personalities | Entertainment | Celebrities and Famous Personalities
Entertainment/Comics and Animation | Entertainment | Comics and Animation
Entertainment/Events and Listings | Entertainment | Events and Listings
Entertainment/Film Industry | Entertainment | Film Industry
Entertainment/Humor | Entertainment | Humor
Entertainment/Movies | Entertainment | Movies
Finance/Accounting and Auditing | Finance | Accounting and Auditing
Finance/Banking | Finance | Banking
Finance/Credit and Lending | Finance | Credit and Lending
Finance/Financial Planning and Management | Finance | Financial Planning and Management
Finance/Grants, Scholarships and Financial Aid | Finance | Grants, Scholarships and Financial Aid
Finance/Investing | Finance | Investing
Fitness and Beauty/Fashion and Style | Fitness and Beauty | Fashion and Style
Fitness and Beauty/Weight Loss | Fitness and Beauty | Weight Loss
Food and Grocery/Cooking and Recipes | Food and Grocery | Cooking and Recipes
Food and Grocery/Drinks and Beverages | Food and Grocery | Drinks and Beverages
Food and Grocery/Food | Food and Grocery | Food
Food and Grocery/Grocery Retailers | Food and Grocery | Grocery Retailers
Food and Grocery/Restaurants | Food and Grocery | Restaurants
Government and Laws/Government | Government and Laws | Government
Government and Laws/Legal | Government and Laws | Legal
Government and Laws/Military | Government and Laws | Military
Government and Laws/Public Safety | Government and Laws | Public Safety
Groups and Communities/Social Networks | Groups and Communities | Social Networks
Health and Medical/Aging and Geriatrics | Health and Medical | Aging and Geriatrics
Health and Medical/Conditions and Disease | Health and Medical | Conditions and Disease
Health and Medical/Medical Facilities and Services | Health and Medical | Medical Facilities and Services
Health and Medical/Mental Health | Health and Medical | Mental Health
Health and Medical/Nursing | Health and Medical | Nursing
Health and Medical/Nutrition | Health and Medical | Nutrition
Health and Medical/Oral and Dental Care | Health and Medical | Oral and Dental Care
Health and Medical/Pharmacy | Health and Medical | Pharmacy
Health and Medical/Women'S Health | Health and Medical | Women'S Health
Hobbies and Leisure Activities/Crafts | Hobbies and Leisure Activities | Crafts
Hobbies and Leisure Activities/Outdoors | Hobbies and Leisure Activities | Outdoors
Hobbies and Leisure Activities/Water Activities | Hobbies and Leisure Activities | Water Activities
Home and Decor/Bed and Bath | Home and Decor | Bed and Bath
Home and Decor/Gardening and Landscaping | Home and Decor | Gardening and Landscaping
Home and Decor/Home Furnishings | Home and Decor | Home Furnishings
Home and Decor/Home Improvement | Home and Decor | Home Improvement
Home and Decor/Kitchen and Dining | Home and Decor | Kitchen and Dining
Home and Decor/Yard and Patio | Home and Decor | Yard and Patio
Hotel and Travel/Hotels and Accommodations | Hotel and Travel | Hotels and Accommodations
Hotel and Travel/Personal Travel | Hotel and Travel | Personal Travel
Internet and Communications/Communications Equipment | Internet and Communications | Communications Equipment
Internet and Communications/Email and Messaging | Internet and Communications | Email and Messaging
Internet and Communications/Service Providers | Internet and Communications | Service Providers
Internet and Communications/Web Services | Internet and Communications | Web Services
Real Estate and Properties/Real Estate Listings | Real Estate and Properties | Real Estate Listings
Retailing and Shopping/Consumer Resources | Retailing and Shopping | Consumer Resources
Retailing and Shopping/Department Stores | Retailing and Shopping | Department Stores
Retailing and Shopping/Gifts and Special Event Items | Retailing and Shopping | Gifts and Special Event Items
Science and Technology/Biology | Science and Technology | Biology
Science and Technology/Chemistry | Science and Technology | Chemistry
Science and Technology/Computer Science | Science and Technology | Computer Science
Science and Technology/Earth Sciences | Science and Technology | Earth Sciences
Science and Technology/Ecology and Environment | Science and Technology | Ecology and Environment
Science and Technology/Engineering | Science and Technology | Engineering
Science and Technology/Mathematics and Statistics | Science and Technology | Mathematics and Statistics
Science and Technology/Physics | Science and Technology | Physics
Science and Technology/Social Science | Science and Technology | Social Science
Society and State/Family and Relationships | Society and State | Family and Relationships
Society and State/Kids and Teens | Society and State | Kids and Teens
Society and State/Religion and Belief | Society and State | Religion and Belief
Society and State/Social Issues and Advocacy | Society and State | Social Issues and Advocacy
Sports and Games/Animal Sports | Sports and Games | Animal Sports
Sports and Games/Board Games | Sports and Games | Board Games
Sports and Games/Card Games | Sports and Games | Card Games
Sports and Games/Combat Sports | Sports and Games | Combat Sports
Sports and Games/Computer and Video Games | Sports and Games | Computer and Video Games
Sports and Games/Gambling | Sports and Games | Gambling
Sports and Games/Individual Sports | Sports and Games | Individual Sports
Sports and Games/International Sports Competitions | Sports and Games | International Sports Competitions
Sports and Games/Motor Sports | Sports and Games | Motor Sports
Sports and Games/Table Games | Sports and Games | Table Games
Sports and Games/Team Sports | Sports and Games | Team Sports
Sports and Games/Water Sports | Sports and Games | Water Sports
Sports and Games/Winter Sports | Sports and Games | Winter Sports
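Because each category label has the form Domain/Sub-Domain, a label can be split into its two parts on the first slash. The sketch below is illustrative only. The function names are hypothetical, and the approach assumes that no domain name contains a slash, which holds for the categories listed above.

```python
from collections import Counter


def split_label(label):
    """Split 'Domain/Sub-Domain' into a (domain, sub_domain) tuple."""
    domain, _, sub_domain = label.partition("/")
    return domain, sub_domain


def count_by_domain(labels):
    """Count how many labels fall under each domain."""
    return Counter(split_label(label)[0] for label in labels)


labels = [
    "Sports and Games/Motor Sports",
    "Sports and Games/Team Sports",
    "Finance/Grants, Scholarships and Financial Aid",
]
counts = count_by_domain(labels)
```

Counting by domain is one way to summarize the classification results for a large collection of documents.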

Limitations

  • If your text is about more than one category, the major category of the text is identified and could differ from human interpretation of the text.

About Named Entity Recognition

Named Entity Recognition (NER) detects named entities in text.

The NER model uses natural language processing to find a variety of named entities. For each extracted entity, NER also returns its location in the text (offset and length) and a confidence score between 0 and 1.

Supported Languages for Input Text

  • English
  • Spanish

Use Cases

You could use the NER endpoint effectively in these scenarios:

Classifying content for news providers

It can be difficult to classify and categorize news article content. The NER model can automatically scan articles to identify the major people, organizations, and places in them. The extracted entities can be saved as tags with the related articles. Knowing the relevant tags for each article helps with automatically categorizing the articles in defined hierarchies, and enables content discovery.

Customer support

Recognizing relevant entities in customer complaints and feedback, such as product specifications, department details, or company branch details, helps to classify the feedback appropriately. The feedback can then be forwarded to the person responsible for the identified product.

Similarly, you can categorize feedback tweets based on the locations and products that they mention.

Efficient search algorithms

You could use NER to extract entities that are then searched against the query, instead of searching for a query across the millions of articles and websites online. When run on articles, all the relevant entities associated with each article are extracted and stored separately. This separation could speed up the search process considerably. The search term is only matched with a small list of entities in each article, leading to quick and efficient searches.

It can be used for searching content from millions of research papers, Wikipedia articles, blogs, articles, and so on.
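The idea of matching a search term against a small list of entities per article can be sketched as an inverted index. This is an illustration under stated assumptions, not how the service works internally: the article IDs and entity lists are made up, and the entities are assumed to have been extracted by NER and stored beforehand.

```python
from collections import defaultdict


def build_entity_index(article_entities):
    """Map each lowercased entity to the set of article IDs that mention it."""
    index = defaultdict(set)
    for article_id, entities in article_entities.items():
        for entity in entities:
            index[entity.lower()].add(article_id)
    return index


def search(index, term):
    """Return the sorted IDs of the articles whose entities match the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical articles with entities already extracted.
index = build_entity_index({
    "article-1": ["Oracle Cloud Infrastructure", "OCI"],
    "article-2": ["OCI", "Honda"],
})
```

A lookup such as search(index, "oci") touches only the stored entities and never scans the article text.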

Content recommendations

With NER, you can extract the entities from a particular article and recommend other articles that mention the most similar entities. For example, it can be used effectively to develop content recommendations for a media industry client. It enables the extraction of the entities associated with historical content or previous activities. These entities can then be compared with the labels assigned to other unseen content to filter relevant content.
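One simple way to compare articles by their entities is Jaccard similarity, the size of the intersection of two entity sets divided by the size of their union. The source doesn't prescribe a similarity measure, so Jaccard is an assumption chosen for illustration, and the article IDs and entities are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two entity collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target_entities, candidates, top_n=3):
    """Rank candidate articles by entity overlap with the target article."""
    scored = [(jaccard(target_entities, ents), article_id)
              for article_id, ents in candidates.items()]
    # Highest similarity first; ties are broken by article ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for score, article_id in scored[:top_n] if score > 0]


candidates = {
    "b": ["OCI", "Oracle"],
    "c": ["Honda"],
    "d": ["OCI", "Oracle", "Honda"],
}
ranked = recommend(["OCI", "Oracle"], candidates)
```

Articles that share no entities with the target score zero and are left out of the recommendations.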

Automatically summarizing job candidates

The NER model could facilitate the evaluation of job candidates by reducing the effort required to shortlist candidates when there are numerous applications. Recruiters could filter and categorize applications based on identified entities like location, college degrees, employers, skills, designations, certifications, and patents.

Supported Entities

The following table describes the different entities that NER can extract. The entity type and subtype depend on the API that you call (detectLanguageEntities or batchDetectLanguageEntities).

Note

To maintain backward compatibility, the detectLanguageEntities endpoint wasn't modified when we introduced the concept of subtype. We recommend that you use the batchDetectLanguageEntities endpoint because the service uses types and subtypes. The isPii property was dropped when the batch API was introduced, but you can compute it from the supported entity types in the following table.

Entity (Full Name) | Entity Type (In Prediction) | Entity Subtype (In Prediction) | Single Record API / Batch API (if blank, both APIs are consistent) | Is PII | Description
DATE | DATE | | Single record | X | Absolute or relative dates, periods, and date range. Examples: “10th of June”, “third Friday in August”, “the first week of March”
 | DATETIME | DATE | Batch | |
EMAIL | EMAIL | | | |
EVENT | EVENT | | | X | Named hurricanes, sports events, and so on.
FACILITY | FACILITY | | Single record | X | Buildings, airports, highways, bridges, and so on.
 | LOCATION | FACILITY | Batch | |
GEOPOLITICAL ENTITY | GPE | | Single record | X | Countries, cities, and states.
 | LOCATION | GPE | Batch | |
IP ADDRESS | IPADDRESS | | | | IP address according to IPv4 and IPv6 standards.
LANGUAGE | LANGUAGE | | | X | Any named language.
LOCATION | LOCATION | | | X | Non-GPE locations, mountain ranges, bodies of water.
CURRENCY | MONEY | | Single record | X | Monetary values, including the unit.
 | QUANTITY | CURRENCY | Batch | |
NATIONALITIES, RELIGIOUS and POLITICAL GROUPS | NORP | | | X | Nationalities, religious or political groups.
ORGANIZATION | ORG | | | X | Companies, agencies, institutions, and so on.
PERCENTAGE | PERCENT | | Single record | X | Percentage.
 | QUANTITY | PERCENTAGE | Batch | |
PERSON | PERSON | | | | People, including fictional characters.
PHONENUMBER | PHONE_NUMBER | | | | Supported phone numbers: GB (United Kingdom), AU (Australia), NZ (New Zealand), SG (Singapore), IN (India), US (United States)
PRODUCT | PRODUCT | | | X | Vehicles, tools, foods, and so on (not services).
NUMBER | QUANTITY | | Single record | X | Measurements, as weight or distance.
 | QUANTITY | NUMBER | Batch | X |
TIME | TIME | | Single record | X | Anything less than 24 hours (time, duration, and so on).
 | DATETIME | TIME | Batch | |
URL | URL | | | | URL.
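Because the batch API no longer returns isPii, a client can derive it from the entity type. The sketch below rests on a loud assumption: an X in the Is PII column is read as "not PII", so the types without an X (EMAIL, IPADDRESS, PERSON, PHONE_NUMBER, and URL) are treated as PII. Verify this reading against the service before relying on it. The set and function names are hypothetical.

```python
# ASSUMPTION: types whose "Is PII" cell is not marked X in the table are PII.
ASSUMED_PII_TYPES = {"EMAIL", "IPADDRESS", "PERSON", "PHONE_NUMBER", "URL"}


def is_pii(entity):
    """Return True if the entity's type is in the assumed PII set."""
    return entity.get("type") in ASSUMED_PII_TYPES


def pii_entities(entities):
    """Keep only the entities that are treated as PII under the assumption."""
    return [entity for entity in entities if is_pii(entity)]
```

Keeping the PII types in one set makes the assumption easy to correct if the table is read differently.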

Examples

Input Text Entities and Scores

Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.

Red Bull Racing Honda [ORG] 1.0000
four-time [QUANTITY/NUMBER] 1.0000
Formula-1 World [EVENT] 0.9705
Oracle Cloud Infrastructure (OCI [ORG] 0.9811

OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.

OCI [ORG] 1.0000
SOC [ORG] 1.0000
HIPAA [ORG] 1.0000
ISO [ORG] 1.0000
Australian Prudential Regulation Authority [ORG] 1.0000
Central Bank of Brazil [ORG] 0.9998
OCI [ORG] 1.0000

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectLanguageEntities
API Request format:
{
    "documents": [
        {
            "key": "1",
            "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
        }
    ]
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "entities": [
                {
                    "offset": 0,
                    "length": 15,
                    "text": "Red Bull Racing",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9914557933807373,
                    "metaInfo": null
                },
                {
                    "offset": 16,
                    "length": 5,
                    "text": "Honda",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.6515499353408813,
                    "metaInfo": null
                },
                {
                    "offset": 27,
                    "length": 9,
                    "text": "four-time",
                    "type": "QUANTITY",
                    "subType": null,
                    "score": 0.9998091459274292,
                    "metaInfo": [
                        {
                            "offset": 27,
                            "length": 9,
                            "text": "four-time",
                            "subType": "UNIT",
                            "score": 0.9998091459274292
                        }
                    ]
                },
                {
                    "offset": 47,
                    "length": 5,
                    "text": "World",
                    "type": "LOCATION",
                    "subType": "NON_GPE",
                    "score": 0.5825434327125549,
                    "metaInfo": null
                },
                {
                    "offset": 79,
                    "length": 27,
                    "text": "Oracle Cloud Infrastructure",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.998045802116394,
                    "metaInfo": null
                },
                {
                    "offset": 108,
                    "length": 3,
                    "text": "OCI",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9986366033554077,
                    "metaInfo": null
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}
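The offset and length of each entity identify a span of the input text. The sketch below slices the input with three of the offsets from the response above and checks that each span equals the returned text. It assumes that offsets count characters from zero in the same way that Python indexes strings, which holds for this ASCII example but isn't confirmed for all input. The function name is hypothetical.

```python
TEXT = (
    "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
    "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
)

# A subset of the entities from the sample response.
ENTITIES = [
    {"offset": 0, "length": 15, "text": "Red Bull Racing"},
    {"offset": 79, "length": 27, "text": "Oracle Cloud Infrastructure"},
    {"offset": 108, "length": 3, "text": "OCI"},
]


def entity_span(text, entity):
    """Return the substring of text that the entity's offset and length cover."""
    start = entity["offset"]
    return text[start:start + entity["length"]]


spans = [entity_span(TEXT, entity) for entity in ENTITIES]
```

The same slicing can be used to highlight entities in the original text or to replace them with placeholders.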

Limitations

  • Sometimes, entities might not be separated or combined as you expect.

  • NER uses the context of the sentence to identify entities. If the context is not present in the text processed, entities might not be extracted as you expect.

  • Malformed text (structure and semantics) might reduce the performance.

  • Age isn't a separate entity so age-related periods might be identified as a date entity.

About Key Phrase Extraction

Key phrase extraction is the automated process of extracting the most relevant words and expressions from the input text. It helps summarize the content and recognize the main topics.

The key phrase extraction model uses NLP and ML to find insights related to the main points of the text. It processes the unstructured input text and returns keywords and key phrases (KPs).

The KPs consist of the subjects and objects that are being talked about in the document. Any modifiers, like adjectives associated with these subjects and objects, are also included in the output. Each key phrase is returned with a confidence score, a value from 0 to 1, that signifies how confident the model is about the KP.

Use Cases

Some business use cases are:

  • Brand monitoring

  • Monitoring market research

  • Competitive market analysis

  • Customer support tickets

  • Employee feedback analysis

  • Customer reviews

  • Email analysis

Supported Features

  • Key phrases

  • Confidence scores

  • Requests support single record and multi-record batches.

Supported Languages for Input Text

  • English
  • Spanish

Examples

Input Text Key Phrases

Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.

Red Bull Racing Honda 0.9997
Oracle Cloud Infrastructure 0.9583
infrastructure partner 0.9583
oci 0.9979

OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.

OCI 0.9999
new services 0.9998
existing compliance program 0.9998
including SOC 0.9998
use cases 0.9998
new white papers 0.9998
guidance documents 0.9998
Object Storage 0.9998
Australian Prudential Regulation Authority 0.9998
Central Bank of Brazil 0.9998
regulated customers 0.9998
industry-specific compliance requirements 0.9998
number of compliance offerings 0.9998
regulatory alignments 0.9998
faster rate 0.9998
ISO 0.9992
customers 0.9992
apra 0.9992
resources 0.9992
services 0.8186
HIPAA 0.9979
regions 0.9147

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectLanguageKeyPhrases
API Request format:
{
  "documents": [
    {
      "key": "1",
      "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
    }
  ]
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "keyPhrases": [
                {
                    "text": "red bull racing honda",
                    "score": 0.9997546563973576
                },
                {
                    "text": "oracle cloud infrastructure",
                    "score": 0.9997546563973576
                },
                {
                    "text": "infrastructure partner",
                    "score": 0.9997546563973576
                },
                {
                    "text": "oci",
                    "score": 0.9979336625058923
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}
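The key phrases in the response above are returned in lowercase. To show them with their original casing, a client can search for each phrase in the input text without regard to case. This is a minimal sketch with a hypothetical function name. It assumes that lowercasing doesn't change the length of the text, which is true for this ASCII example but not for every Unicode string.

```python
TEXT = (
    "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
    "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
)

# Key phrases as returned in the sample response.
KEY_PHRASES = [
    "red bull racing honda",
    "oracle cloud infrastructure",
    "infrastructure partner",
    "oci",
]


def original_casing(text, phrase):
    """Return the first case-insensitive match of phrase as it appears in text."""
    start = text.lower().find(phrase.lower())
    if start < 0:
        return None  # The phrase doesn't occur in the text.
    return text[start:start + len(phrase)]


restored = [original_casing(TEXT, phrase) for phrase in KEY_PHRASES]
```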

Limitations

  • Only noun phrases, with any adjective modifiers, are identified as key phrases, so words that don't meet this criterion could be ignored.

  • This model is case insensitive.

  • Text that contains multiple punctuation marks between words might be flagged as a key phrase.

  • URLs that are well formed (begin with http, https, or www) are identified.

About Sentiment Analysis

Sentiment analysis can be used to gauge the mood or the tone of text.

Sentiment analysis analyzes the subjective information in an expression, such as opinions, appraisals, emotions, or attitudes toward a topic, person, or entity. Expressions are classified, with a confidence score, as positive, negative, mixed, or neutral.

The Language service sentiment analysis uses natural language processing (NLP). The service analyzes the text and returns a positive, neutral, mixed, or negative sentiment with a confidence score. It supports both sentence-level and aspect-based sentiment analysis.

Aspect-Based Sentiment Analysis

Aspect-Based Sentiment Analysis (ABSA) extracts the individual aspects in the input document and classifies each of the aspects into one of the polarity classes: positive, negative, mixed, and neutral. With the predicted sentiment for each aspect, the Language API also provides a confidence score for each of the classes, and their corresponding offsets in the input.

Confidence scores closer to 1 indicate a higher confidence in the label's classification, while lower scores indicate lower confidence. The confidence score for each class ranges from 0 to 1, and the scores of the four classes sum to 1.
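The relationship between the four class scores can be checked in a few lines. The sketch below uses the documentScores values from the sample response in this section as an example of four class scores. The function name is hypothetical, and the tolerance is an assumption that allows for rounding in the returned values.

```python
import math

# Class scores copied from the documentScores in the sample response.
scores = {
    "Neutral": 0.44763687,
    "Positive": 0.46578798,
    "Mixed": 0.064058214,
    "Negative": 0.022516921,
}


def top_label(class_scores):
    """Return the class with the highest confidence score."""
    return max(class_scores, key=class_scores.get)


label = top_label(scores)
total = sum(scores.values())
# The scores sum to 1 up to rounding, so compare with a small tolerance.
sums_to_one = math.isclose(total, 1.0, abs_tol=1e-6)
```

Here the Positive and Neutral scores are close, so a client might treat the result as low confidence even though Positive is the top label.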

For example, a restaurant review that says "Food is good, but the service is so bad." contains positive sentiment toward the food aspect and strong negative sentiment toward the service aspect. Classifying the overall sentiment as negative would neglect the fact that the food was good. ABSA addresses this problem by treating an aspect as an attribute (or component) of an entity, such as the screen of a phone or the picture quality of a camera.

If the input data is "I had a good day at work today", then a single aspect, "day", is identified with 100% positive, 0% neutral, 0% mixed, and 0% negative sentiments.

Sentence Level Sentiment Analysis

The Language service also provides sentence-level sentiment with confidence scores for each sentence in the text. Based on the use case, you can select sentence or document sentiment, ABSA, or both. For example, in a customer feedback analytics scenario, you might want to identify sentences that need human review for further action.
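Selecting sentences for human review could look like the following sketch. The review rule is an assumption made for illustration: a sentence is flagged when its sentiment is negative or mixed, or when its confidence score is below a chosen threshold. The dictionary fields are hypothetical and don't mirror the API response, while the labels and scores are taken from the sentence-level example in this section.

```python
REVIEW_LABELS = {"NEGATIVE", "MIXED"}  # assumption: these always need review
MIN_SCORE = 0.5                        # assumption: hypothetical threshold


def needs_review(sentence):
    """Flag a sentence with a concerning label or a low confidence score."""
    return (sentence["sentiment"].upper() in REVIEW_LABELS
            or sentence["score"] < MIN_SCORE)


sentences = [
    {"index": 1, "sentiment": "NEUTRAL", "score": 0.5512},
    {"index": 2, "sentiment": "NEUTRAL", "score": 0.5512},
    {"index": 3, "sentiment": "POSITIVE", "score": 0.6763},
    {"index": 4, "sentiment": "POSITIVE", "score": 0.4658},
]
flagged = [s["index"] for s in sentences if needs_review(s)]
```

With these values, only the fourth sentence is flagged because its score is below the threshold.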

Use Cases

Some business use cases are:

  • Brand monitoring

  • Monitoring market research

  • Employee feedback analysis

  • Customer reviews and emails analysis

  • Product surveys

For example, customer and employee raw survey responses can be processed using the sentiment analysis model. The results can then be aggregated for analysis and follow up, and to facilitate engagements.

Sentiment analysis can be combined with social media monitoring to track shifts in the overall mood of customers, for example, when a new product is launched or competitive market research is conducted.

Supported Features

  • Analysis level: sentence and aspect

  • English language

  • Requests support single record and multi-record batches.

Supported Languages for Input Text

  • English
  • Spanish

Aspect-Based Sentiment Analysis Example

Input Text:

OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.

Sentiment:

  • services [Positive]

  • OCI [Positive]

  • resources [Positive]

  • regions [Positive]

Polarity Score:

1. OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. NEUTRAL 0.5512
2. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. NEUTRAL 0.5512
3. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. POSITIVE 0.6763
4. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate. POSITIVE 0.4658

Sample Request:

API Request format:
POST https://<region-url>/20210101/actions/batchDetectLanguageSentiments?level=ASPECT 
Input JSON
{
    "documents": [
        {
            "key": "1",
            "text": "OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate."
        }
    ]
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "documentSentiment": "Positive",
            "documentScores": {
                "Neutral": 0.44763687,
                "Positive": 0.46578798,
                "Mixed": 0.064058214,
                "Negative": 0.022516921
            },
            "sentences": [
                {
                    "offset": 0,
                    "length": 147,
                    "text": "OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases.",
                    "sentiment": "Neutral",
                    "scores": {
                        "Negative": 0.0154264,
                        "Mixed": 0,
                        "Neutral": 0.98231775,
                        "Positive": 0.0022558598
                    }
                },
                {
                    "offset": 148,
                    "length": 170,
                    "text": "We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil.",
                    "sentiment": "Neutral",
                    "scores": {
                        "Mixed": 0,
                        "Neutral": 0.97296304,
                        "Negative": 0.007886417,
                        "Positive": 0.019150572
                    }
                },
                {
                    "offset": 319,
                    "length": 137,
                    "text": "These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements.",
                    "sentiment": "Neutral",
                    "scores": {
                        "Neutral": 0.5864549,
                        "Positive": 0.35583654,
                        "Mixed": 0.02932497,
                        "Negative": 0.028383587
                    }
                },
                {
                    "offset": 457,
                    "length": 145,
                    "text": "Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.",
                    "sentiment": "Positive",
                    "scores": {
                        "Negative": 0.022516921,
                        "Positive": 0.46578798,
                        "Mixed": 0.064058214,
                        "Neutral": 0.44763687
                    }
                }
            ],
            "aspects": [
                {
                    "offset": 325,
                    "length": 9,
                    "text": "resources",
                    "sentiment": "Positive",
                    "scores": {
                        "Positive": 0.9841423768960832,
                        "Negative": 0.01398839404953044,
                        "Neutral": 0,
                        "Mixed": 0.0018692290543864747
                    }
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}
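
The `offset` and `length` fields in the response above can be used to tie each aspect back to the sentence that contains it. The following is a minimal sketch, assuming the response has been parsed into a Python dictionary. The function name `aspects_by_sentence` is hypothetical, and the sample data keeps only the fields used here.

```python
def aspects_by_sentence(document):
    """Pair each aspect with the index of the sentence whose span contains it."""
    pairs = []
    for aspect in document.get("aspects", []):
        for index, sentence in enumerate(document.get("sentences", [])):
            start = sentence["offset"]
            end = start + sentence["length"]
            # An aspect belongs to a sentence when its offset falls inside the span.
            if start <= aspect["offset"] < end:
                pairs.append((aspect["text"], aspect["sentiment"], index))
                break
    return pairs


# Offsets, lengths, and labels copied from the sample response; other fields omitted.
document = {
    "sentences": [
        {"offset": 0, "length": 147, "sentiment": "Neutral"},
        {"offset": 148, "length": 170, "sentiment": "Neutral"},
        {"offset": 319, "length": 137, "sentiment": "Neutral"},
        {"offset": 457, "length": 145, "sentiment": "Positive"},
    ],
    "aspects": [
        {"offset": 325, "length": 9, "text": "resources", "sentiment": "Positive"},
    ],
}

print(aspects_by_sentence(document))  # [('resources', 'Positive', 2)]
```

In this sample, the aspect "resources" is labeled Positive while its containing sentence (index 2, the third sentence) is labeled Neutral, which illustrates why aspect-level and sentence-level results are reported separately.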

Sentence Level Sentiment Analysis Example

Input Text:

I was impressed with the griddle as it kept an even heat throughout the surface. My only concern is that the cord is too short, I really wish it was at least 16 inches long so I do not have to buy an extension cord. Overall, I think it is OK for the price.

Sentiment:

  • Sentence 1 [Positive]

  • Sentence 2 [Negative]

  • Sentence 3 [Neutral]

Polarity Score:

Positive: 0.9997686743736267, Negative: 0.00023133518698159605, Neutral: 0

Positive: 0.0043107867240906, Negative: 0.9956892132759094, Neutral: 0

Positive: 0.9908866882324219, Negative: 0, Neutral: 0.009113257750868797

Positive: 0.0933981895446777, Negative: 0, Neutral: 0.906601857487112284

Sample Request:

API Request format:
POST https://<region-url>/20210101/actions/batchDetectLanguageSentiments?level=SENTENCE
Input JSON

{
    "documents": [
        {
            "key": "doc1",
            "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip."
        }
    ]
}
Response JSON:
{
    "documents": [
        {
            "key": "doc1",
            "documentSentiment": "positive",
            "documentScores": {
                "positive": 0.6763582,
                "mixed": 0.08708387,
                "neutral": 0.12376911,
                "negative": 0.11278882
            },
            "sentences": [
                {
                    "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases.",
                    "sentiment": "neutral",
                    "scores": {
                        "positive": 0.15475821,
                        "neutral": 0.5567636,
                        "mixed": 0.09907853,
                        "negative": 0.18939966
                    }
                },
                {
                    "text": "We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil.",
                    "sentiment": "neutral",
                    "scores": {
                        "mixed": 0.07148028,
                        "negative": 0.12318015,
                        "positive": 0.11138679,
                        "neutral": 0.6939528
                    }
                },
                {
                    "text": "These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements.",
                    "sentiment": "positive",
                    "scores": {
                        "negative": 0.11278882,
                        "neutral": 0.12376911,
                        "mixed": 0.08708387,
                        "positive": 0.6763582
                    }
                },
                {
                    "text": "Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip.",
                    "sentiment": "neutral",
                    "scores": {
                        "mixed": 0.0973028,
                        "positive": 0.18745653,
                        "negative": 0.1592837,
                        "neutral": 0.55595696
                    }
                }
            ],
            "aspects": [],
            "languageCode": "en"
        }
    ],
    "errors": []
}
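
A sentence-level response like the one above can be filtered to find sentences that might need human review. The following is a minimal sketch, not a recommended policy: the function name `sentences_for_review` and the 0.6 confidence threshold are assumptions chosen for illustration.

```python
def sentences_for_review(document, min_confidence=0.6):
    """Return sentences that are negative or mixed, or whose top score is low."""
    flagged = []
    for sentence in document["sentences"]:
        label = sentence["sentiment"].lower()
        top_score = max(sentence["scores"].values())
        if label in ("negative", "mixed") or top_score < min_confidence:
            flagged.append((sentence["text"], label, top_score))
    return flagged


# Scores copied from the sample response; text fields are shortened for brevity.
document = {
    "sentences": [
        {"text": "OCI recently added new services ...", "sentiment": "neutral",
         "scores": {"positive": 0.15475821, "neutral": 0.5567636,
                    "mixed": 0.09907853, "negative": 0.18939966}},
        {"text": "We also released new white papers ...", "sentiment": "neutral",
         "scores": {"mixed": 0.07148028, "negative": 0.12318015,
                    "positive": 0.11138679, "neutral": 0.6939528}},
        {"text": "These resources help regulated customers ...", "sentiment": "positive",
         "scores": {"negative": 0.11278882, "neutral": 0.12376911,
                    "mixed": 0.08708387, "positive": 0.6763582}},
        {"text": "Not only are we expanding ...", "sentiment": "neutral",
         "scores": {"mixed": 0.0973028, "positive": 0.18745653,
                    "negative": 0.1592837, "neutral": 0.55595696}},
    ]
}

for text, label, score in sentences_for_review(document):
    print(f"{label} ({score:.2f}): {text}")
```

With the assumed threshold, the first and fourth sentences are flagged because their highest score is below 0.6. A suitable threshold depends on the data and on how much review capacity is available.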

The actual values and the input and output structure might vary by model version. See the SDK documentation.

Limitations

  • The identified aspects might be partial matches or split aspects.

  • When sentences are semantically or structurally incorrect, the aspects could differ from your expectations.

  • Pronouns are not considered aspects.

  • Sarcasm is not recognized.

About Personal Identifiable Information

Language detects, classifies and provides options to de-identify personal identifiable information (PII) in unstructured text.

The following entities are supported:

  • Person name

  • Address

  • Age

  • Date or time

  • Social security number or taxpayer ID (US)

  • Email

  • Passport number (US)

  • Telephone or fax (US)

  • Driver identification number (US)

  • Bank account number (US)

  • Bank routing number

  • Bank account (SWIFT)

  • Credit or debit card number

  • IP address

  • MAC address

  • Secret types

    • COOKIE

    • XSRF_TOKEN

    • AUTH_BASIC

    • AUTH_BEARER

    • JSON_WEB_TOKEN

    • PRIVATE_KEY

    • PUBLIC_KEY

  • OCI account details

    • OCI_OCID_USER

    • OCI_OCID_TENANCY

    • OCI_SMTP_USERNAME

    • OCI_OCID_REFERENCE

    • OCI_FINGERPRINT

    • OCI_CREDENTIAL

    • OCI_PRE_AUTH_REQUEST

    • OCI_STORAGE_SIGNED_URL

    • OCI_CUSTOMER_SECRET_KEY

    • OCI_ACCESS_KEY

Use Cases

Detecting and curating private information in user feedback

Many organizations collect user feedback through various channels such as product reviews, return requests, support tickets, and feedback forums. You can use the Language PII detection service to automatically detect PII entities in posted feedback. Automatic detection lets you proactively warn users about sharing private data, and lets applications apply measures such as anonymizing or masking the data before storing it.

Scanning object storage for presence of sensitive data

Cloud storage solutions such as OCI Object Storage are widely used by employees to store business documents in locations that are either locally controlled or shared by multiple teams. Ensuring that such shared locations do not store private information such as employee names, demographics, and payroll information requires automatic scanning of all the documents for the presence of PII. The OCI Language PII service provides a batch API to process multiple text documents at scale.
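
When scanning many documents, the records need to be grouped so that each request stays within the limits stated under Batch Requests earlier in this document (up to 100 records, up to 5,000 characters per record, and up to 20,000 characters per request). The following is a minimal sketch of such grouping. The function name `make_batches` is hypothetical, and the sketch assumes that Python's `len()` matches how the service counts characters and that oversized records should be rejected rather than split.

```python
def make_batches(records, max_records=100, max_record_chars=5000,
                 max_total_chars=20000):
    """Group (key, text) pairs into batches that respect the stated limits."""
    batches, current, current_chars = [], [], 0
    for key, text in records:
        if len(text) > max_record_chars:
            # Assumption: reject oversized records instead of splitting them.
            raise ValueError(f"record {key!r} exceeds {max_record_chars} characters")
        batch_is_full = (len(current) == max_records
                         or current_chars + len(text) > max_total_chars)
        if current and batch_is_full:
            batches.append(current)
            current, current_chars = [], 0
        current.append({"key": key, "text": text})
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


# Hypothetical input: twelve records of 4,000 characters each.
records = [(f"doc{i}", "x" * 4000) for i in range(12)]
print([len(batch) for batch in make_batches(records)])  # [5, 5, 2]
```

Each returned batch can be used as the `documents` array of one request body.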

Examples

Input Text:

Hello Support Team,

I am reaching out to seek help with my credit card number 1234 5678 9873 2345 expiring on 11/23. There was a suspicious transaction on 12-Aug-2022 which I reported by calling from my mobile number +1 (423) 111-9999 also I emailed from my email id sarah.jones1234@hotmail.com. Would you please let me know the refund status?

Regards,

Sarah

Output Text Masked with "*":

Hello Support Team, I am reaching out to seek help with my credit card number ******************* expiring on *****. There was a suspicious transaction on *********** which I reported by calling from my mobile number ** ************** also I emailed from my email id ***************************. Would you please let me know the refund status? Regards, *****

The JSON for the example is:

Sample Request:

API Request format:
POST https://<region-url>/20210101/actions/batchDetectLanguagePiiEntities
Input JSON
{
  "documents": [
    {
      "languageCode": "en",
      "key": "1",
      "text": "Hello Support Team, I am reaching out to seek help with my credit card number 1234 5678 9873 2345 expiring on 11/23. There was a suspicious transaction on 12-Aug-2022 which I reported by calling from my mobile number +1 (423) 111-9999 also I emailed from my email id sarah.jones1234@hotmail.com. Would you please let me know the refund status? Regards, Sarah"
    }
  ],
  "compartmentId": "ocid1.tenancy.oc1..aaaaaaaadany3y6wdh3u3jcodcmm42ehsdno525pzyavtjbpy72eyxcu5f7q",
  "masking": {
    "ALL": {
      "mode": "MASK",
      "isUnmaskedFromEnd": true,
      "leaveCharactersUnmasked": 4
    }
  }
}
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "entities": [
                {
                    "offset": 79,
                    "length": 19,
                    "type": "CREDIT_DEBIT_NUMBER",
                    "text": "1234 5678 9873 2345",
                    "score": 0.75,
                    "isCustom": false
                },
                {
                    "offset": 111,
                    "length": 5,
                    "type": "DATE_TIME",
                    "text": "11/23",
                    "score": 0.9992455840110779,
                    "isCustom": false
                },
                {
                    "offset": 156,
                    "length": 11,
                    "type": "DATE_TIME",
                    "text": "12-Aug-2022",
                    "score": 0.998766303062439,
                    "isCustom": false
                },
                {
                    "offset": 218,
                    "length": 2,
                    "type": "TELEPHONE_NUMBER",
                    "text": "+1",
                    "score": 0.6941494941711426,
                    "isCustom": false
                },
                {
                    "offset": 221,
                    "length": 14,
                    "type": "TELEPHONE_NUMBER",
                    "text": "(423) 111-9999",
                    "score": 0.9527066349983215,
                    "isCustom": false
                },
                {
                    "offset": 268,
                    "length": 27,
                    "type": "EMAIL",
                    "text": "sarah.jones1234@hotmail.com",
                    "score": 0.95,
                    "isCustom": false
                },
                {
                    "offset": 354,
                    "length": 5,
                    "type": "PERSON",
                    "text": "Sarah",
                    "score": 0.9918518662452698,
                    "isCustom": false
                }
            ],
            "languageCode": "en",
            "maskedText": "Hello Support Team, \nI am reaching out to seek help with my credit card number ***************2345 expiring on *1/23. There was a suspicious transaction on *******2022 which I reported by calling from my mobile number +1 **********9999 also I emailed from my email id ***********************.com. Would you please let me know the refund status?\nRegards,\n*arah"
        }
    ],
    "errors": []
}
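
The `maskedText` in the response above is consistent with masking each detected entity except for its last four characters, which matches the `isUnmaskedFromEnd` and `leaveCharactersUnmasked` settings in the request. The following sketch reproduces that behavior locally from `offset` and `length`. It is inferred from this one sample rather than taken from a documented algorithm, and the function names `mask_entity` and `apply_masks` are hypothetical.

```python
def mask_entity(text, leave_unmasked=4, from_end=True, mask_char="*"):
    """Mask an entity's text, leaving `leave_unmasked` characters visible."""
    if len(text) <= leave_unmasked:
        # Assumption: short entities stay unchanged, as "+1" does in the sample.
        return text
    hidden = mask_char * (len(text) - leave_unmasked)
    if from_end:
        return hidden + text[len(text) - leave_unmasked:]
    return text[:leave_unmasked] + hidden


def apply_masks(text, entities, **options):
    """Replace each entity span in `text` with its masked form."""
    # Process right to left so that earlier offsets are never disturbed.
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        text = text[:start] + mask_entity(text[start:end], **options) + text[end:]
    return text


# Shortened, hypothetical input using two values from the sample.
sample = "card 1234 5678 9873 2345 exp 11/23"
entities = [{"offset": 5, "length": 19}, {"offset": 29, "length": 5}]
print(apply_masks(sample, entities))  # card ***************2345 exp *1/23
```

Because masking preserves string length, the entity offsets returned by the service remain valid for the masked text.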