Using Pretrained Models
Learn about the Language service pretrained models.
Limitations Common to all Models
-
If your text doesn't follow English grammar, the model performance could degrade.
-
Spelling isn't checked or corrected so the results might not be as you expect when spelling mistakes exist.
Single Requests
-
A record can be up to 1,000 characters. We encourage you to use Batch Requests, which support records of up to 5,000 characters and more than one record in a single request.
-
There is no minimum number of characters that must be provided, but the output quality is highly dependent on the amount of information provided to the models.
Batch Requests
-
A batch can have up to 100 records.
-
A record can be up to 5,000 characters long.
-
The total number of characters to process in a request can be up to 20,000 characters.
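The batch limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the function name `check_batch` is hypothetical, and it assumes that characters are counted the way Python's `len` counts them, which might differ from how the service counts.

```python
# Limits taken from the Batch Requests list above.
MAX_RECORDS_PER_BATCH = 100
MAX_CHARS_PER_RECORD = 5_000
MAX_CHARS_PER_REQUEST = 20_000


def check_batch(texts):
    """Return a list of problems; an empty list means the batch fits the limits."""
    problems = []
    if len(texts) > MAX_RECORDS_PER_BATCH:
        problems.append(f"{len(texts)} records exceeds {MAX_RECORDS_PER_BATCH}")
    for i, text in enumerate(texts):
        # Assumption: one Python character equals one service character.
        if len(text) > MAX_CHARS_PER_RECORD:
            problems.append(f"record {i} has {len(text)} characters")
    total = sum(len(text) for text in texts)
    if total > MAX_CHARS_PER_REQUEST:
        problems.append(f"request has {total} characters in total")
    return problems


print(check_batch(["A short record.", "Another short record."]))
```

A batch that reports problems can be split into smaller batches before it is submitted.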
About Language Detection
The language detection model identifies which natural language the input text is in.
For example, language detection can help make customer support interactions faster and more personal. Customer service chatbots can detect the language of a customer's input text and respond accordingly. If a customer needs help with a product, the chatbot server can provide the product manual in the corresponding language, or transfer the customer to the call center for that language.
Supported Languages
-
Afrikaans
-
Albanian
-
Arabic
-
Armenian
-
Azerbaijani
-
Basque
-
Belarusian
-
Bengali
-
Bosnian
-
Brazilian Portuguese
-
Bulgarian
-
Burmese
-
Cantonese
-
Catalan
-
Cebuano
-
Chinese
-
Croatian
-
Czech
-
Danish
-
Dutch
-
Eastern Punjabi
-
Egyptian Arabic
-
English
-
Esperanto
-
Estonian
-
Finnish
-
French
-
Georgian
-
German
-
Greek
-
Hebrew
-
Hindi
-
Hungarian
-
Icelandic
-
Indonesian
-
Irish
-
Italian
-
Japanese
-
Javanese
-
Kannada
-
Kazakh
-
Korean
-
Kurdish (Sorani)
-
Latin
-
Latvian
-
Lithuanian
-
Macedonian
-
Malay
-
Malayalam
-
Marathi
-
Minangkabau
-
Nepali
-
Norwegian (Bokmal)
-
Norwegian (Nynorsk)
-
Persian
-
Polish
-
Portuguese
-
Romanian
-
Russian
-
Serbian
-
Serbo-Croatian
-
Slovak
-
Slovene
-
Spanish
-
Swahili
-
Swedish
-
Tagalog
-
Tamil
-
Telugu
-
Thai
-
Turkish
-
Ukrainian
-
Urdu
-
Uzbek
-
Vietnamese
-
Welsh
Examples
Input Text | Language and Scores |
---|---|
OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate. |
English 0.9999 |
«Нос» - сатирический рассказ Николая Гоголя, написанный во время его жизни в Санкт-Петербурге. В это время в творчестве Гоголя основное внимание уделялось сюрреализму и гротеску с романтическим оттенком; Предлагаемый здесь рассказ «Нос» является примером. Рассказ Николая Гоголя «Нос» был написан между 1832 и 1833 годами и завершен в 1834 году, подвергался различным пересмотрам и модификациям Н. Гоголем, в основном из-за непрерывного вмешательства цензуры. |
Russian 0.9999 |
The JSON for the first example is:
- Sample Request
-
POST https://<region-url>/20210101/actions/batchDetectDominantLanguage
- API Request format:
-
{ "documents": [ { "key": "doc1", "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip." } ] }
- Response JSON:
-
{ "documents": [ { "key": "1", "languages": [ { "code": "en", "name": "English", "score": 0.9999840921009815 } ] } ], "errors": [] }
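As a sketch of how the request and response shapes above could be handled in client code, the following builds a request body and reads the top-scoring language for each document. The helper names `build_request` and `dominant_languages` are hypothetical, and sending the request (including authentication) is out of scope here.

```python
import json


def build_request(texts):
    """Build a request body with one document per input text."""
    return {
        "documents": [
            {"key": f"doc{i}", "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }


def dominant_languages(response):
    """Map each document key to (language code, score) of its top language."""
    result = {}
    for doc in response["documents"]:
        best = max(doc["languages"], key=lambda lang: lang["score"])
        result[doc["key"]] = (best["code"], best["score"])
    return result


# Response copied from the example above.
sample_response = json.loads(
    '{ "documents": [ { "key": "1", "languages": [ { "code": "en", '
    '"name": "English", "score": 0.9999840921009815 } ] } ], "errors": [] }'
)
print(dominant_languages(sample_response))
```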
Limitations
-
Only one language is returned. If the input is multilingual, the dominant language is returned.
About Text Classification
Text classification analyzes the text and identifies categories for the content with a confidence score.
Text classification uses natural language processing (NLP) with deep learning techniques to find insights in textual data. It returns a category from a set of predefined categories.
For example, you could analyze a collection of financial documents and classify them as Credit and Lending, Insurance, Investing, Banking, and so on. The results are given in category and subcategory format.
Supported Features
-
Text category
-
Confidence scores
-
Requests support single record and multi-record batches.
Supported Languages for Input Text
- English
Examples
Input Text | Categories and Scores |
---|---|
Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner. |
Sports and Games/Motor Sports 1.0000 |
The JSON for the first example is:
- Sample Request
-
POST https://<region-url>/20210101/actions/batchDetectLanguageTextClassification
- API Request format:
-
{ "documents": [ { "key": "doc1", "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner." } ] }
- Response JSON:
-
{ "documents": [ { "key": "1", "textClassification": [ { "label": "Sports and Games/Motor Sports", "score": 1 } ], "languageCode": "en" } ], "errors": [] }
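The category label in the response combines the domain and sub-domain with a slash. A minimal sketch of splitting it on the client follows; the helper names `split_label` and `top_category` are hypothetical.

```python
def split_label(label):
    """Split 'Domain/Sub-Domain' into its two parts."""
    domain, _, sub_domain = label.partition("/")
    return domain, sub_domain


def top_category(document):
    """Return the highest-scoring category of one response document."""
    best = max(document["textClassification"], key=lambda c: c["score"])
    domain, sub_domain = split_label(best["label"])
    return {"domain": domain, "sub_domain": sub_domain, "score": best["score"]}


# Document copied from the response example above.
sample_document = {
    "key": "1",
    "textClassification": [{"label": "Sports and Games/Motor Sports", "score": 1}],
    "languageCode": "en",
}
print(top_category(sample_document))
```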
Supported Categories
Category | Domain | Sub-Domain |
---|---|---|
Animals and Plants/Pets | Animals and Plants | Pets |
Arts and Culture/Audio and Music | Arts and Culture | Audio and Music |
Arts and Culture/Performing Arts | Arts and Culture | Performing Arts |
Arts and Culture/Visual Art and Design | Arts and Culture | Visual Art and Design |
Autos and Vehicles/Bicycles and Accessories | Autos and Vehicles | Bicycles and Accessories |
Autos and Vehicles/Boats and Watercraft | Autos and Vehicles | Boats and Watercraft |
Autos and Vehicles/Motor Vehicles | Autos and Vehicles | Motor Vehicles |
Autos and Vehicles/Vehicle Parts and Services | Autos and Vehicles | Vehicle Parts and Services |
Autos and Vehicles/Vehicle Shopping and Renting | Autos and Vehicles | Vehicle Shopping and Renting |
Books and Literature/Children'S Literature | Books and Literature | Children'S Literature |
Books and Literature/E-Books | Books and Literature | E-Books |
Books and Literature/Poetry | Books and Literature | Poetry |
Business and Industry/Advertising and Marketing | Business and Industry | Advertising and Marketing |
Business and Industry/Agriculture and Forestry | Business and Industry | Agriculture and Forestry |
Business and Industry/Business Operations | Business and Industry | Business Operations |
Business and Industry/Business Services | Business and Industry | Business Services |
Business and Industry/Construction and Maintenance | Business and Industry | Construction and Maintenance |
Business and Industry/Defence | Business and Industry | Defence |
Business and Industry/Energy and Utilities | Business and Industry | Energy and Utilities |
Business and Industry/Manufacturing | Business and Industry | Manufacturing |
Business and Industry/Metals and Mining | Business and Industry | Metals and Mining |
Business and Industry/Printing and Publishing | Business and Industry | Printing and Publishing |
Business and Industry/Textiles and Nonwovens | Business and Industry | Textiles and Nonwovens |
Business and Industry/Transportation and Logistics | Business and Industry | Transportation and Logistics |
Computer and Electronics/Computer Hardware | Computer and Electronics | Computer Hardware |
Computer and Electronics/Computer Security | Computer and Electronics | Computer Security |
Computer and Electronics/Consumer Electronics | Computer and Electronics | Consumer Electronics |
Computer and Electronics/Electronics and Electrical | Computer and Electronics | Electronics and Electrical |
Computer and Electronics/Enterprise Software | Computer and Electronics | Enterprise Software |
Computer and Electronics/Programming | Computer and Electronics | Programming |
Computer and Electronics/Software | Computer and Electronics | Software |
Education and Occupation/Education | Education and Occupation | Education |
Education and Occupation/Jobs | Education and Occupation | Jobs |
Entertainment/Celebrities and Famous Personalities | Entertainment | Celebrities and Famous Personalities |
Entertainment/Comics and Animation | Entertainment | Comics and Animation |
Entertainment/Events and Listings | Entertainment | Events and Listings |
Entertainment/Film Industry | Entertainment | Film Industry |
Entertainment/Humor | Entertainment | Humor |
Entertainment/Movies | Entertainment | Movies |
Finance/Accounting and Auditing | Finance | Accounting and Auditing |
Finance/Banking | Finance | Banking |
Finance/Credit and Lending | Finance | Credit and Lending |
Finance/Financial Planning and Management | Finance | Financial Planning and Management |
Finance/Grants, Scholarships and Financial Aid | Finance | Grants, Scholarships and Financial Aid |
Finance/Investing | Finance | Investing |
Fitness and Beauty/Fashion and Style | Fitness and Beauty | Fashion and Style |
Fitness and Beauty/Weight Loss | Fitness and Beauty | Weight Loss |
Food and Grocery/Cooking and Recipes | Food and Grocery | Cooking and Recipes |
Food and Grocery/Drinks and Beverages | Food and Grocery | Drinks and Beverages |
Food and Grocery/Food | Food and Grocery | Food |
Food and Grocery/Grocery Retailers | Food and Grocery | Grocery Retailers |
Food and Grocery/Restaurants | Food and Grocery | Restaurants |
Government and Laws/Government | Government and Laws | Government |
Government and Laws/Legal | Government and Laws | Legal |
Government and Laws/Military | Government and Laws | Military |
Government and Laws/Public Safety | Government and Laws | Public Safety |
Groups and Communities/Social Networks | Groups and Communities | Social Networks |
Health and Medical/Aging and Geriatrics | Health and Medical | Aging and Geriatrics |
Health and Medical/Conditions and Disease | Health and Medical | Conditions and Disease |
Health and Medical/Medical Facilities and Services | Health and Medical | Medical Facilities and Services |
Health and Medical/Mental Health | Health and Medical | Mental Health |
Health and Medical/Nursing | Health and Medical | Nursing |
Health and Medical/Nutrition | Health and Medical | Nutrition |
Health and Medical/Oral and Dental Care | Health and Medical | Oral and Dental Care |
Health and Medical/Pharmacy | Health and Medical | Pharmacy |
Health and Medical/Women'S Health | Health and Medical | Women'S Health |
Hobbies and Leisure Activities/Crafts | Hobbies and Leisure Activities | Crafts |
Hobbies and Leisure Activities/Outdoors | Hobbies and Leisure Activities | Outdoors |
Hobbies and Leisure Activities/Water Activities | Hobbies and Leisure Activities | Water Activities |
Home and Decor/Bed and Bath | Home and Decor | Bed and Bath |
Home and Decor/Gardening and Landscaping | Home and Decor | Gardening and Landscaping |
Home and Decor/Home Furnishings | Home and Decor | Home Furnishings |
Home and Decor/Home Improvement | Home and Decor | Home Improvement |
Home and Decor/Kitchen and Dining | Home and Decor | Kitchen and Dining |
Home and Decor/Yard and Patio | Home and Decor | Yard and Patio |
Hotel and Travel/Hotels and Accommodations | Hotel and Travel | Hotels and Accommodations |
Hotel and Travel/Personal Travel | Hotel and Travel | Personal Travel |
Internet and Communications/Communications Equipment | Internet and Communications | Communications Equipment |
Internet and Communications/Email and Messaging | Internet and Communications | Email and Messaging |
Internet and Communications/Service Providers | Internet and Communications | Service Providers |
Internet and Communications/Web Services | Internet and Communications | Web Services |
Real Estate and Properties/Real Estate Listings | Real Estate and Properties | Real Estate Listings |
Retailing and Shopping/Consumer Resources | Retailing and Shopping | Consumer Resources |
Retailing and Shopping/Department Stores | Retailing and Shopping | Department Stores |
Retailing and Shopping/Gifts and Special Event Items | Retailing and Shopping | Gifts and Special Event Items |
Science and Technology/Biology | Science and Technology | Biology |
Science and Technology/Chemistry | Science and Technology | Chemistry |
Science and Technology/Computer Science | Science and Technology | Computer Science |
Science and Technology/Earth Sciences | Science and Technology | Earth Sciences |
Science and Technology/Ecology and Environment | Science and Technology | Ecology and Environment |
Science and Technology/Engineering | Science and Technology | Engineering |
Science and Technology/Mathematics and Statistics | Science and Technology | Mathematics and Statistics |
Science and Technology/Physics | Science and Technology | Physics |
Science and Technology/Social Science | Science and Technology | Social Science |
Society and State/Family and Relationships | Society and State | Family and Relationships |
Society and State/Kids and Teens | Society and State | Kids and Teens |
Society and State/Religion and Belief | Society and State | Religion and Belief |
Society and State/Social Issues and Advocacy | Society and State | Social Issues and Advocacy |
Sports and Games/Animal Sports | Sports and Games | Animal Sports |
Sports and Games/Board Games | Sports and Games | Board Games |
Sports and Games/Card Games | Sports and Games | Card Games |
Sports and Games/Combat Sports | Sports and Games | Combat Sports |
Sports and Games/Computer and Video Games | Sports and Games | Computer and Video Games |
Sports and Games/Gambling | Sports and Games | Gambling |
Sports and Games/Individual Sports | Sports and Games | Individual Sports |
Sports and Games/International Sports Competitions | Sports and Games | International Sports Competitions |
Sports and Games/Motor Sports | Sports and Games | Motor Sports |
Sports and Games/Table Games | Sports and Games | Table Games |
Sports and Games/Team Sports | Sports and Games | Team Sports |
Sports and Games/Water Sports | Sports and Games | Water Sports |
Sports and Games/Winter Sports | Sports and Games | Winter Sports |
Limitations
-
If your text is about more than one category, the major category of the text is identified and could differ from human interpretation of the text.
About Named Entity Recognition
Named Entity Recognition (NER) detects named entities in text.
The NER model uses natural language processing to find a variety of named entities. For each extracted entity, NER also returns the location of the entity in the text (offset and length) and a confidence score from 0 to 1.
Supported Languages for Input Text
- English
- Spanish
Use Cases
You could use the NER endpoint effectively in these scenarios:
- Classifying content for news providers
-
It can be difficult to classify and categorize news article content. The NER model can automatically scan articles to identify the major people, organizations, and places in them. The extracted entities can be saved as tags with the related articles. Knowing the relevant tags for each article helps with automatically categorizing the articles in defined hierarchies, and enables content discovery.
- Customer support
-
Recognizing relevant entities in customer complaints and feedback, such as product specifications, department details, or company branch details, helps to classify the feedback appropriately. The feedback can then be forwarded to the person responsible for the identified product.
Similarly, feedback tweets can be categorized by the locations and products that they mention.
- Efficient search algorithms
-
You could use NER to extract entities that are then searched against the query, instead of searching for a query across the millions of articles and websites online. When run on articles, all the relevant entities associated with each article are extracted and stored separately. This separation could speed up the search process considerably. The search term is only matched with a small list of entities in each article, leading to quick and efficient searches.
It can be used for searching content from millions of research papers, Wikipedia articles, blogs, articles, and so on.
- Content recommendations
-
With NER, you can extract the entities from a particular article and recommend other articles that mention the most similar entities. For example, it can be used effectively to develop content recommendations for a media industry client. It enables the extraction of the entities associated with historical content or previous activities. These entities are then compared with the labels assigned to other unseen content to filter relevant entities.
- Automatically summarizing job candidates
-
The NER model could facilitate the evaluation of job candidates, by simplifying the effort required to shortlist candidates with numerous applications. Recruiters could filter and categorize them based on identified entities like location, college degrees, employers, skills, designations, certifications, and patents.
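The search use case above can be sketched as an inverted index from entity text to the articles that mention it. This is an illustration only: the function names are hypothetical, and the entity lists are assumed to have been extracted by NER beforehand.

```python
from collections import defaultdict


def build_entity_index(articles):
    """Build an index from lowercased entity text to the set of article IDs.

    `articles` maps an article ID to the list of entity strings found in it.
    """
    index = defaultdict(set)
    for article_id, entities in articles.items():
        for entity in entities:
            index[entity.lower()].add(article_id)
    return index


def search(index, term):
    """Return the sorted IDs of the articles whose entities match the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical articles and entities, for illustration.
articles = {
    "article-1": ["Red Bull Racing", "Honda", "Oracle Cloud Infrastructure"],
    "article-2": ["Oracle Cloud Infrastructure", "Central Bank of Brazil"],
}
index = build_entity_index(articles)
print(search(index, "oracle cloud infrastructure"))
```

The search term is matched only against the short entity list of each article, not against the full article text.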
Supported Entities
The following table describes the different entities that NER can extract. The entity type and subtype depend on the API that you call (detectDominantLanguageEntities or batchDetectDominantLanguageEntities). To maintain backward compatibility, detectDominantLanguageEntities wasn't modified when we introduced the concept of subtype. We recommend that you use the batchDetectDominantLanguageEntities endpoint because the service uses types and subtypes. The isPii property was dropped when the batching API was introduced, but you can compute it from the entity type by using the Is PII column in the following table.
Entity (Full Name) | Entity Type (In Prediction) | Entity Subtype (In prediction) | Single Record API / Batch API (if blank, both APIs are consistent) | Is PII | Description |
---|---|---|---|---|---|
DATE | DATE | | Single record | X | Absolute or relative dates, periods, and date range. Examples: “10th of June”, “third Friday in August”, “the first week of March” |
| | DATETIME | DATE | Batch | | |
EMAIL | EMAIL | | | √ | |
EVENT | EVENT | | | X | Named hurricanes, sports events, and so on. |
FACILITY | FACILITY | | Single record | X | Buildings, airports, highways, bridges, and so on. |
| | LOCATION | FACILITY | Batch | | |
GEOPOLITICAL ENTITY | GPE | | Single record | X | Countries, cities, and states. |
| | LOCATION | GPE | Batch | | |
IP ADDRESS | IPADDRESS | | | √ | IP address according to IPv4 and IPv6 standards. |
LANGUAGE | LANGUAGE | | | X | Any named language. |
LOCATION | LOCATION | | | X | Non-GPE locations, mountain ranges, bodies of water. |
CURRENCY | MONEY | | Single record | X | Monetary values, including the unit. |
| | QUANTITY | CURRENCY | Batch | | |
| | NORP | | | X | Nationalities, religious or political groups. |
ORGANIZATION | ORG | | | X | Companies, agencies, institutions, and so on. |
PERCENTAGE | PERCENT | | Single record | X | Percentage. |
| | QUANTITY | PERCENTAGE | Batch | | |
PERSON | PERSON | | | √ | People, including fictional characters. |
PHONENUMBER | PHONE_NUMBER | | | √ | Supported phone numbers: |
PRODUCT | PRODUCT | | | X | Vehicles, tools, foods, and so on (not services). |
NUMBER | QUANTITY | | Single record | X | Measurements, as weight or distance. |
| | QUANTITY | NUMBER | Batch | X | |
TIME | TIME | | Single record | X | Anything less than 24 hours (time, duration, and so on). |
| | DATETIME | TIME | Batch | | |
URL | URL | | | √ | URL. |
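Because the batch API doesn't return an isPii property, it can be derived on the client from the entity type. The following is a minimal sketch that assumes the PII types are exactly those marked √ in the table above; the names `PII_TYPES` and `with_is_pii` are hypothetical.

```python
# Entity types marked as PII in the Supported Entities table.
PII_TYPES = frozenset({"EMAIL", "IPADDRESS", "PERSON", "PHONE_NUMBER", "URL"})


def with_is_pii(entities):
    """Return copies of the entities with an added 'isPii' flag."""
    return [{**entity, "isPii": entity["type"] in PII_TYPES} for entity in entities]


# Hypothetical entities, for illustration.
entities = [
    {"text": "jane.doe@example.com", "type": "EMAIL"},
    {"text": "Oracle Cloud Infrastructure", "type": "ORGANIZATION"},
]
for entity in with_is_pii(entities):
    print(entity["text"], entity["isPii"])
```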
Examples
Input Text | Entities and Scores |
---|---|
Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner. |
|
OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate. |
|
The JSON for the first example is:
- Sample Request
-
POST https://<region-url>/20210101/actions/batchDetectLanguageEntities
- API Request format:
-
{ "documents": [ { "key": "doc1", "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner." } ] }
- Response JSON:
-
{ "documents": [ { "key": "1", "entities": [ { "offset": 0, "length": 15, "text": "Red Bull Racing", "type": "ORGANIZATION", "subType": null, "score": 0.9914557933807373, "metaInfo": null }, { "offset": 16, "length": 5, "text": "Honda", "type": "ORGANIZATION", "subType": null, "score": 0.6515499353408813, "metaInfo": null }, { "offset": 27, "length": 9, "text": "four-time", "type": "QUANTITY", "subType": null, "score": 0.9998091459274292, "metaInfo": [ { "offset": 27, "length": 9, "text": "four-time", "subType": "UNIT", "score": 0.9998091459274292 } ] }, { "offset": 47, "length": 5, "text": "World", "type": "LOCATION", "subType": "NON_GPE", "score": 0.5825434327125549, "metaInfo": null }, { "offset": 79, "length": 27, "text": "Oracle Cloud Infrastructure", "type": "ORGANIZATION", "subType": null, "score": 0.998045802116394, "metaInfo": null }, { "offset": 108, "length": 3, "text": "OCI", "type": "ORGANIZATION", "subType": null, "score": 0.9986366033554077, "metaInfo": null } ], "languageCode": "en" } ], "errors": [] }
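The offset and length of each entity can be used to locate it in the input text. The following sketch filters entities by score and checks that each slice matches the reported text. It uses a subset of the entities in the response above, the function name `extract_entities` is hypothetical, and it assumes that offsets count characters the way Python string indexes do, which holds for this ASCII sample.

```python
def extract_entities(text, entities, min_score=0.0):
    """Return (surface text, type, subType) for entities at or above min_score."""
    results = []
    for entity in entities:
        if entity["score"] < min_score:
            continue
        start = entity["offset"]
        end = start + entity["length"]
        surface = text[start:end]
        # The slice is expected to reproduce the text reported by the service.
        if surface != entity["text"]:
            raise ValueError(f"offset mismatch for {entity['text']!r}")
        results.append((surface, entity["type"], entity["subType"]))
    return results


text = (
    "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
    "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
)
# Subset of the entities in the response above.
entities = [
    {"offset": 0, "length": 15, "text": "Red Bull Racing",
     "type": "ORGANIZATION", "subType": None, "score": 0.9914557933807373},
    {"offset": 16, "length": 5, "text": "Honda",
     "type": "ORGANIZATION", "subType": None, "score": 0.6515499353408813},
    {"offset": 47, "length": 5, "text": "World",
     "type": "LOCATION", "subType": "NON_GPE", "score": 0.5825434327125549},
    {"offset": 108, "length": 3, "text": "OCI",
     "type": "ORGANIZATION", "subType": None, "score": 0.9986366033554077},
]
print(extract_entities(text, entities, min_score=0.9))
```

Raising the `min_score` threshold drops low-confidence entities such as "World" in this example.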
Limitations
-
Sometimes, entities might not be separated or combined as you expect.
-
NER uses the context of the sentence to identify entities. If the context is not present in the text processed, entities might not be extracted as you expect.
-
Malformed text (structure and semantics) might reduce the performance.
-
Age isn't a separate entity so age-related periods might be identified as a date entity.
About Key Phrase Extraction
Key phrase extraction is the automated process of extracting the most relevant words and expressions from the input text. It helps summarize the content and recognize the main topics.
The key phrase extraction model uses NLP and ML to find insights related to the main points of the text. It understands the unstructured input text, and returns key words and key phrases (KPs).
The KPs consist of the subjects and objects that are discussed in the document. Any modifiers, like adjectives associated with these subjects and objects, are also included in the output. Each key phrase is returned with a confidence score from 0 to 1 that indicates how confident the model is about the KP.
Use Cases
Some business use cases are:
-
Brand monitoring
-
Monitoring market research
-
Competitive market analysis
-
Customer support tickets
-
Employee feedback analysis
-
Customer reviews
-
Email analysis
Supported Features
-
Key phrases
-
Confidence scores
-
Requests support single record and multi-record batches.
Supported Languages for Input Text
- English
- Spanish
Examples
Input Text | Key Phrases |
---|---|
Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner. |
|
OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate. |
|
The JSON for the first example is:
- Sample Request
-
POST https://<region-url>/20210101/actions/batchDetectLanguageKeyPhrases
- API Request format:
-
{ "documents": [ { "key": "doc1", "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner." } ] }
- Response JSON:
-
{ "documents": [ { "key": "1", "keyPhrases": [ { "text": "red bull racing honda", "score": 0.9997546563973576 }, { "text": "oracle cloud infrastructure", "score": 0.9997546563973576 }, { "text": "infrastructure partner", "score": 0.9997546563973576 }, { "text": "oci", "score": 0.9979336625058923 } ], "languageCode": "en" } ], "errors": [] }
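The key phrases in the response above are lowercased, so matching them back to the input text needs a case-insensitive search. The following is a minimal sketch with a hypothetical helper name, `locate_key_phrases`. It assumes that lowercasing doesn't change the length of the text, which is true for ASCII input but not for every Unicode character.

```python
def locate_key_phrases(text, key_phrases, min_score=0.0):
    """Return (phrase, start index or -1, score) for each phrase at or above min_score."""
    lowered = text.lower()
    located = []
    for phrase in key_phrases:
        if phrase["score"] < min_score:
            continue
        # str.find returns -1 when the phrase isn't present in the text.
        start = lowered.find(phrase["text"].lower())
        located.append((phrase["text"], start, phrase["score"]))
    return located


text = (
    "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
    "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
)
# Key phrases copied from the response above.
key_phrases = [
    {"text": "red bull racing honda", "score": 0.9997546563973576},
    {"text": "oracle cloud infrastructure", "score": 0.9997546563973576},
    {"text": "infrastructure partner", "score": 0.9997546563973576},
    {"text": "oci", "score": 0.9979336625058923},
]
for phrase, start, score in locate_key_phrases(text, key_phrases):
    print(phrase, start)
```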
Limitations
-
Only noun phrases, with their adjective modifiers, are identified as key phrases, so words that don't meet these criteria could be ignored.
-
This model is case insensitive.
-
Text that contains multiple punctuation marks between words might be flagged as a key phrase.
-
URLs that are well formed (begin with http, https, or www) are identified.
About Sentiment Analysis
Sentiment analysis can be used to gauge the mood or the tone of text.
Sentiment analysis analyzes the subjective information in an expression, such as opinions, appraisals, emotions, or attitudes toward a topic, person, or entity. Expressions are classified, with a confidence score, as positive, negative, mixed, or neutral.
The Language service sentiment analysis uses natural language processing (NLP). The service understands the text, returns positive, neutral, mixed, and negative sentiments, and a confidence score. It supports both sentence and aspect-based sentiment analysis.
Aspect-Based Sentiment Analysis
Aspect-Based Sentiment Analysis (ABSA) extracts the individual aspects in the input document and classifies each of the aspects into one of the polarity classes: positive, negative, mixed, and neutral. With the predicted sentiment for each aspect, the Language API also provides a confidence score for each of the classes, and their corresponding offsets in the input.
Confidence scores closer to 1 indicate higher confidence in the label's classification, while lower scores indicate lower confidence. The confidence score for each class ranges from 0 to 1, and the scores of the four classes sum to 1.
For example, a restaurant review that says "Food is good, but the service is so bad." contains positive sentiment toward the food aspect and strong negative sentiment toward the service aspect. Classifying the overall sentiment as negative would neglect the fact that the food was good. ABSA addresses this problem by treating an aspect as an attribute (or component) of an entity, such as the screen of a phone or the picture quality of a camera.
If the input data is "I had a good day at work today", then a single aspect, "day", is identified with 100% positive, 0% neutral, 0% mixed, and 0% negative sentiment.
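Because the four class scores sum to 1, the predicted label is the class with the highest score. The following is a minimal sketch with a hypothetical helper name, `top_sentiment`. The tolerance is an assumption that allows for rounding in the returned scores.

```python
import math


def top_sentiment(scores, tolerance=1e-4):
    """Return (label, score) of the highest-scoring class.

    Raises ValueError if the class scores don't sum to approximately 1.
    """
    total = sum(scores.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"scores sum to {total}, expected 1")
    label = max(scores, key=scores.get)
    return label, scores[label]


# Scores for the "I had a good day at work today" example.
day_scores = {"Positive": 1.0, "Neutral": 0.0, "Mixed": 0.0, "Negative": 0.0}
print(top_sentiment(day_scores))
```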
Sentence Level Sentiment Analysis
The Language service also provides sentence level sentiment with confidence scores for each sentence in the text. Depending on the use case, you can use sentence or document sentiment, ABSA, or both. For example, in a customer feedback analytics scenario, you might want to identify sentences that need human review for further action.
Use Cases
Some business use cases are:
-
Brand monitoring
-
Monitoring market research
-
Employee feedback analysis
-
Customer reviews and emails analysis
-
Product surveys
For example, customer and employee raw survey responses can be processed using the sentiment analysis model. The results can then be aggregated for analysis and follow up, and to facilitate engagements.
Social media monitoring can be combined with sentiment analysis to capture shifts in the overall mood of customers, for example, when a new product is launched or competitive market research is conducted.
Supported Features
-
Analysis level: sentence and aspect
-
English language
-
Requests support single record and multi-record batches.
Supported Languages for Input Text
- English
- Spanish
Aspect-Based Sentiment Analysis Example
Input Text | Sentiment | Polarity Score |
---|---|---|
OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate. |
services [Positive] OCI [Positive] resources [Positive] regions [Positive] |
|
Sample Request:
- API Request format:
-
POST https://<region-url>/20210101/actions/batchDetectLanguageSentiments?level=ASPECT
- Input JSON
-
{ "documents": [ { "key" : "doc1", "text" : "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip." } ] }
- Response JSON:
-
{ "documents": [ { "key": "1", "documentSentiment": "Positive", "documentScores": { "Neutral": 0.44763687, "Positive": 0.46578798, "Mixed": 0.064058214, "Negative": 0.022516921 }, "sentences": [ { "offset": 0, "length": 147, "text": "OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases.", "sentiment": "Neutral", "scores": { "Negative": 0.0154264, "Mixed": 0, "Neutral": 0.98231775, "Positive": 0.0022558598 } }, { "offset": 148, "length": 170, "text": "We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil.", "sentiment": "Neutral", "scores": { "Mixed": 0, "Neutral": 0.97296304, "Negative": 0.007886417, "Positive": 0.019150572 } }, { "offset": 319, "length": 137, "text": "These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements.", "sentiment": "Neutral", "scores": { "Neutral": 0.5864549, "Positive": 0.35583654, "Mixed": 0.02932497, "Negative": 0.028383587 } }, { "offset": 457, "length": 145, "text": "Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.", "sentiment": "Positive", "scores": { "Negative": 0.022516921, "Positive": 0.46578798, "Mixed": 0.064058214, "Neutral": 0.44763687 } } ], "aspects": [ { "offset": 325, "length": 9, "text": "resources", "sentiment": "Positive", "scores": { "Positive": 0.9841423768960832, "Negative": 0.01398839404953044, "Neutral": 0, "Mixed": 0.0018692290543864747 } } ], "languageCode": "en" } ], "errors": [] }
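As a sketch of reading the aspect-level results, the following collects each aspect with its predicted sentiment and the score of that sentiment. The function name `summarize_aspects` is hypothetical, and the sample document keeps only the fields of the response above that the function reads.

```python
def summarize_aspects(document):
    """Return (aspect text, sentiment, score of that sentiment) for each aspect."""
    summary = []
    for aspect in document.get("aspects", []):
        label = aspect["sentiment"]
        summary.append((aspect["text"], label, aspect["scores"][label]))
    return summary


# Trimmed copy of the response document above.
sample_document = {
    "key": "1",
    "documentSentiment": "Positive",
    "aspects": [
        {
            "offset": 325,
            "length": 9,
            "text": "resources",
            "sentiment": "Positive",
            "scores": {
                "Positive": 0.9841423768960832,
                "Negative": 0.01398839404953044,
                "Neutral": 0,
                "Mixed": 0.0018692290543864747,
            },
        }
    ],
}
for text, sentiment, score in summarize_aspects(sample_document):
    print(text, sentiment, round(score, 4))
```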
Sentence Level Sentiment Analysis Example
Input Text | Sentiment | Polarity Score |
---|---|---|
I was impressed with the griddle as it kept an even heat throughout the surface. My only concern is that the cord is too short, I really wish it was at least 16 inches long so I do not have to buy an extension cord. Overall, I think it is OK for the price. | Sentence 1 [Positive] Sentence 2 [Negative] Sentence 3 [Neutral] | |
Sample Request:
- API Request format:
-
POST https://<region-url>/20210101/actions/batchDetectLanguageSentiments?level=SENTENCE
- Input JSON:
-
{ "documents": [ { "key": "doc1", "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases. We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip." } ] }
- Response JSON:
-
{ "documents": [ { "key": "doc1", "documentSentiment": "positive", "documentScores": { "positive": 0.6763582, "mixed": 0.08708387, "neutral": 0.12376911, "negative": 0.11278882 }, "sentences": [ { "text": "OCI recently added new services to existing compliance program including SOC, HIPAA, and ISO to enable our customers to solve their use cases.", "sentiment": "neutral", "scores": { "positive": 0.15475821, "neutral": 0.5567636, "mixed": 0.09907853, "negative": 0.18939966 } }, { "text": "We also released new white papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil.", "sentiment": "neutral", "scores": { "mixed": 0.07148028, "negative": 0.12318015, "positive": 0.11138679, "neutral": 0.6939528 } }, { "text": "These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements.", "sentiment": "positive", "scores": { "negative": 0.11278882, "neutral": 0.12376911, "mixed": 0.08708387, "positive": 0.6763582 } }, { "text": "Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster clip.", "sentiment": "neutral", "scores": { "mixed": 0.0973028, "positive": 0.18745653, "negative": 0.1592837, "neutral": 0.55595696 } } ], "aspects": [], "languageCode": "en" } ], "errors": [] }
The actual values and the input and output structure might vary by model version. See the SDK documentation for details.
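As a rough illustration of reading such a response, the following Python sketch picks the highest-scoring label for each sentence. The `response` dictionary is a trimmed copy of the sentence-level response above, and `top_label` and `summarize` are hypothetical helper names, not part of any SDK. Label names are lowercased because the two sample responses in this section use different casing.

```python
# Trimmed copy of the sentence-level sample response (two sentences only).
response = {
    "documents": [
        {
            "key": "doc1",
            "documentSentiment": "positive",
            "sentences": [
                {
                    "text": "OCI recently added new services ...",
                    "sentiment": "neutral",
                    "scores": {"positive": 0.15475821, "neutral": 0.5567636,
                               "mixed": 0.09907853, "negative": 0.18939966},
                },
                {
                    "text": "These resources help regulated customers ...",
                    "sentiment": "positive",
                    "scores": {"negative": 0.11278882, "neutral": 0.12376911,
                               "mixed": 0.08708387, "positive": 0.6763582},
                },
            ],
        }
    ],
    "errors": [],
}


def top_label(scores):
    """Return (lowercased label, score) for the highest-scoring entry."""
    label = max(scores, key=scores.get)
    return label.lower(), scores[label]


def summarize(resp):
    """Return one (document key, label, score) tuple per sentence."""
    rows = []
    for doc in resp.get("documents", []):
        for sentence in doc.get("sentences", []):
            label, score = top_label(sentence["scores"])
            rows.append((doc["key"], label, score))
    return rows


for key, label, score in summarize(response):
    print(f"{key}: {label} ({score:.3f})")
```

Because the structure might vary by model version, the sketch uses `.get()` with defaults for the top-level lists rather than assuming they are always present.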
Limitations
-
The identified aspects might be partial matches or split aspects.
-
When sentences are semantically or structurally incorrect, the aspects could differ from your expectations.
-
Pronouns are not considered aspects.
-
Sarcasm is not recognized.
About Personal Identifiable Information
Language detects, classifies, and provides options to de-identify personal identifiable information (PII) in unstructured text.
The following entities are supported:
-
Person name
-
Address
-
Age
-
Date or time
-
Social security number or taxpayer ID (US)
-
Email
-
Passport number (US)
-
Telephone or fax (US)
-
Driver identification number (US)
-
Bank account number (US)
-
Bank routing number
-
Bank account (SWIFT)
-
Credit or debit card number
-
IP address
-
MAC address
-
Secret types
-
COOKIE
-
XSRF_TOKEN
-
AUTH_BASIC
-
AUTH_BEARER
-
JSON_WEB_TOKEN
-
PRIVATE_KEY
-
PUBLIC_KEY
-
OCI account details
-
OCI_OCID_USER
-
OCI_OCID_TENANCY
-
OCI_SMTP_USERNAME
-
OCI_OCID_REFERENCE
-
OCI_FINGERPRINT
-
OCI_CREDENTIAL
-
OCI_PRE_AUTH_REQUEST
-
OCI_STORAGE_SIGNED_URL
-
OCI_CUSTOMER_SECRET_KEY
-
OCI_ACCESS_KEY
Use Cases
- Detecting and curating private information in user feedback
-
Many organizations collect user feedback through various channels such as product reviews, return requests, support tickets, and feedback forums. You can use the Language PII detection service to detect PII entities automatically. This lets you proactively warn users about sharing private data, and lets applications anonymize posted feedback before storing it, for example by storing masked data.
- Scanning object storage for presence of sensitive data
-
Cloud storage solutions such as OCI Object Storage are widely used by employees to store business documents in locations that are either locally controlled or shared by multiple teams. Ensuring that such shared locations don't store private information such as employee names, demographics, and payroll information requires automatic scanning of all the documents for the presence of PII. The OCI Language PII service provides a batch API to process multiple text documents at scale.
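When scanning many documents, the texts have to be grouped into batch requests. The following Python sketch groups texts under the batch limits stated earlier in this document (up to 100 records per batch, 5,000 characters per record, 20,000 characters per request). The `make_batches` function, the generated `doc<N>` keys, and the fixed `en` language code are assumptions for illustration only; the sketch builds request bodies and doesn't call the service.

```python
MAX_RECORDS = 100         # records per batch
MAX_RECORD_CHARS = 5000   # characters per record
MAX_BATCH_CHARS = 20000   # total characters per request


def make_batches(texts):
    """Group texts into request bodies that respect the batch limits."""
    batches = []
    current, current_chars = [], 0
    for index, text in enumerate(texts):
        if len(text) > MAX_RECORD_CHARS:
            raise ValueError(f"record {index} exceeds {MAX_RECORD_CHARS} characters")
        batch_full = len(current) == MAX_RECORDS
        too_many_chars = current_chars + len(text) > MAX_BATCH_CHARS
        # Start a new batch when adding this record would break a limit.
        if current and (batch_full or too_many_chars):
            batches.append({"documents": current})
            current, current_chars = [], 0
        # Hypothetical key scheme; any unique string per record would do.
        current.append({"key": f"doc{index}", "text": text, "languageCode": "en"})
        current_chars += len(text)
    if current:
        batches.append({"documents": current})
    return batches


batches = make_batches(["short note", "another note", "a" * 5000])
print(len(batches), [len(b["documents"]) for b in batches])
```

Records longer than the per-record limit raise an error here rather than being truncated, so the caller decides how to split them.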
Examples
Input Text | Output Text Masked with "*" |
---|---|
Hello Support Team, I am reaching out to seek help with my credit card number 1234 5678 9873 2345 expiring on 11/23. There was a suspicious transaction on 12-Aug-2022 which I reported by calling from my mobile number +1 (423) 111-9999 also I emailed from my email id sarah.jones1234@hotmail.com. Would you please let me know the refund status? Regards, Sarah | Hello Support Team, I am reaching out to seek help with my credit card number ******************* expiring on ***** . There was a suspicious transaction on *********** which I reported by calling from my mobile number ** ************** also I emailed from my email id *************************** . Would you please let me know the refund status? Regards, ***** |
The JSON for the example is:
- API Request format:
-
POST https://<region-url>/20210101/actions/batchDetectLanguagePiiEntities
- Input JSON:
-
{ "documents": [ { "languageCode": "en", "key": "1", "text": "Hello Support Team, I am reaching out to seek help with my credit card number 1234 5678 9873 2345 expiring on 11/23. There was a suspicious transaction on 12-Aug-2022 which I reported by calling from my mobile number +1 (423) 111-9999 also I emailed from my email id sarah.jones1234@hotmail.com. Would you please let me know the refund status? Regards, Sarah" } ], "compartmentId": "ocid1.tenancy.oc1..aaaaaaaadany3y6wdh3u3jcodcmm42ehsdno525pzyavtjbpy72eyxcu5f7q", "masking": { "ALL": { "mode": "MASK", "isUnmaskedFromEnd": true, "leaveCharactersUnmasked": 4 } } }
- Response JSON:
-
{ "documents": [ { "key": "1", "entities": [ { "offset": 79, "length": 19, "type": "CREDIT_DEBIT_NUMBER", "text": "1234 5678 9873 2345", "score": 0.75, "isCustom": false }, { "offset": 111, "length": 5, "type": "DATE_TIME", "text": "11/23", "score": 0.9992455840110779, "isCustom": false }, { "offset": 156, "length": 11, "type": "DATE_TIME", "text": "12-Aug-2022", "score": 0.998766303062439, "isCustom": false }, { "offset": 218, "length": 2, "type": "TELEPHONE_NUMBER", "text": "+1", "score": 0.6941494941711426, "isCustom": false }, { "offset": 221, "length": 14, "type": "TELEPHONE_NUMBER", "text": "(423) 111-9999", "score": 0.9527066349983215, "isCustom": false }, { "offset": 268, "length": 27, "type": "EMAIL", "text": "sarah.jones1234@hotmail.com", "score": 0.95, "isCustom": false }, { "offset": 354, "length": 5, "type": "PERSON", "text": "Sarah", "score": 0.9918518662452698, "isCustom": false } ], "languageCode": "en", "maskedText": "Hello Support Team, \nI am reaching out to seek help with my credit card number ***************2345 expiring on *1/23. There was a suspicious transaction on *******2022 which I reported by calling from my mobile number +1 **********9999 also I emailed from my email id ***********************.com. Would you please let me know the refund status?\nRegards,\n*arah" } ], "errors": [] }
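The relationship between the `offset` and `length` fields and `maskedText` can be sketched locally. The following Python example is a minimal sketch, not the service's implementation: it assumes that offsets count characters, and it mimics the request's `"leaveCharactersUnmasked": 4` with `"isUnmaskedFromEnd": true`. The `mask_entities` function and the shortened sample text are hypothetical.

```python
def mask_entities(text, entities, keep_last=4, mask_char="*"):
    """Mask each entity span, leaving its last `keep_last` characters visible."""
    chars = list(text)
    for entity in entities:
        start = entity["offset"]
        end = start + entity["length"]
        # Mask the span except its final `keep_last` characters.
        # Spans no longer than `keep_last` are left unchanged.
        for i in range(start, max(start, end - keep_last)):
            chars[i] = mask_char
    return "".join(chars)


# Shortened, hypothetical text; offsets are computed locally with str.index.
text = "Card 1234 5678 9873 2345 expiring on 11/23. Regards, Sarah"
entities = [
    {"offset": text.index("1234"), "length": 19, "type": "CREDIT_DEBIT_NUMBER"},
    {"offset": text.index("11/23"), "length": 5, "type": "DATE_TIME"},
    {"offset": text.index("Sarah"), "length": 5, "type": "PERSON"},
]

print(mask_entities(text, entities))
```

In this sketch, an entity with four or fewer characters is left unchanged, which is consistent with the `+1` entity in the sample response above.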