Vision can detect and recognize text in a document.
Language classification identifies the language of a document; OCR then draws bounding boxes
around the printed or handwritten text it finds in an image and digitizes that text.
If you have a PDF that contains text, Vision finds and extracts the text in
that document, and then provides bounding boxes for the identified text.
Text Detection can be used with Document AI or Image Analysis models.
Vision provides a confidence score for each text grouping.
The confidence score is a decimal number in the range 0 to 1. Scores closer to 1 indicate
higher confidence in the extracted text, while lower scores indicate lower confidence.
Note
OCR support is limited to English. If you know that the text in
the images is in English, set the language to Eng.
Supported features are:
Word extraction
Text line extraction
Confidence score
Bounding polygons
Single request
Batch request
Limitations are:
Although Language classification identifies several languages, OCR is limited to
English.
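As an illustration, the following is a minimal single-request text detection sketch using the OCI Python SDK. The ai_vision class and response field names reflect one version of the SDK and should be treated as assumptions; the compartment OCID and input file name are placeholders:

    # Sketch: single-request text detection with the OCI Python SDK.
    # Class and field names are assumptions and can differ between SDK versions.
    import base64
    import oci

    config = oci.config.from_file()  # reads ~/.oci/config
    client = oci.ai_vision.AIServiceVisionClient(config)

    with open("invoice.pdf", "rb") as f:  # placeholder local file
        encoded = base64.b64encode(f.read()).decode("utf-8")

    details = oci.ai_vision.models.AnalyzeDocumentDetails(
        compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
        document=oci.ai_vision.models.InlineDocumentDetails(data=encoded),
        features=[oci.ai_vision.models.DocumentTextDetectionFeature()],
    )
    result = client.analyze_document(details).data

    # Each page carries lines and words; every grouping has its text, a 0 to 1
    # confidence score, and a bounding polygon of normalized vertices.
    for page in result.pages:
        for line in page.lines:
            print(line.text, line.confidence, line.bounding_polygon.normalized_vertices)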
Document Classification can be used to classify a document.
Vision provides a list of possible document types for the analyzed document. Each document
type has a confidence score, which is a decimal number in the range 0 to 1. Scores closer to 1
indicate higher confidence in the predicted document type, while lower scores indicate lower
confidence.
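The following is a hedged sketch of a classification request against a document stored in Object Storage, under the same OCI Python SDK naming assumptions as above; the namespace, bucket, and object names are placeholders:

    # Sketch: document classification for a file stored in Object Storage.
    # Class and field names are assumptions and can differ between SDK versions.
    import oci

    client = oci.ai_vision.AIServiceVisionClient(oci.config.from_file())

    details = oci.ai_vision.models.AnalyzeDocumentDetails(
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            namespace_name="my-namespace",   # placeholder values
            bucket_name="my-bucket",
            object_name="statement.pdf",
        ),
        features=[oci.ai_vision.models.DocumentClassificationFeature()],
    )
    result = client.analyze_document(details).data

    # Candidate document types, each with a confidence score between 0 and 1.
    for candidate in result.detected_document_types:
        print(candidate.document_type, candidate.confidence)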
Table extraction can be used to identify tables in a document and extract their
contents. For example, if a PDF receipt contains a table that includes the taxes and total
amount, Vision identifies the table and extracts its structure.
Vision provides the number of rows and columns in the table and the contents of each table
cell. Each cell has a confidence score, which is a decimal number in the range 0 to 1. Scores
closer to 1 indicate higher confidence in the extracted text, while lower scores indicate
lower confidence.
Supported features are:
Table extraction for tables with and without borders
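The sketch below shows how the extracted table structure might be read with the OCI Python SDK; the class and response field names (tables, body_rows, cells, and so on) are assumptions about one SDK version, and the Object Storage location values are placeholders:

    # Sketch: table extraction; walk the rows and cells of each detected table.
    # Class and field names are assumptions and can differ between SDK versions.
    import oci

    client = oci.ai_vision.AIServiceVisionClient(oci.config.from_file())

    details = oci.ai_vision.models.AnalyzeDocumentDetails(
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            namespace_name="my-namespace",   # placeholder values
            bucket_name="my-bucket",
            object_name="receipt.pdf",
        ),
        features=[oci.ai_vision.models.DocumentTableDetectionFeature()],
    )
    result = client.analyze_document(details).data

    for page in result.pages:
        for table in page.tables:
            print(f"{table.row_count} rows x {table.column_count} columns")
            for row in table.body_rows:
                for cell in row.cells:
                    # Each cell has its text and a confidence score from 0 to 1.
                    print(cell.row_index, cell.column_index, cell.text, cell.confidence)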
Key value extraction can be used to identify values for predefined keys in a receipt.
For example, if a receipt includes a merchant name, merchant address, or merchant phone number,
Vision can identify these values and return them as key value pairs (a sketch follows the
supported fields list below).
Supported features are:
Extract values for predefined key value pairs
Bounding polygons
Single request
Batch request
Limitations:
Supports receipts in English only.
Supported fields are:
MerchantName
The name of the merchant issuing the receipt.
MerchantPhoneNumber
The telephone number of the merchant.
MerchantAddress
The address of the merchant.
TransactionDate
The date the receipt was issued.
TransactionTime
The time the receipt was issued.
Total
The total amount of the receipt, after all charges and taxes have been applied.
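The sketch referenced above, again under assumed OCI Python SDK class and response field names, reads the returned fields and matches them against the predefined key names listed here (MerchantName, Total, and so on); the Object Storage location values are placeholders:

    # Sketch: key value extraction from a receipt stored in Object Storage.
    # Class and field names are assumptions and can differ between SDK versions.
    import oci

    client = oci.ai_vision.AIServiceVisionClient(oci.config.from_file())

    details = oci.ai_vision.models.AnalyzeDocumentDetails(
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            namespace_name="my-namespace",   # placeholder values
            bucket_name="my-bucket",
            object_name="receipt.jpg",
        ),
        features=[oci.ai_vision.models.DocumentKeyValueDetectionFeature()],
    )
    result = client.analyze_document(details).data

    for page in result.pages:
        for field in page.document_fields:
            if field.field_type == "KEY_VALUE":
                # field_label.name is one of the predefined keys (MerchantName,
                # Total, and so on); field_value.text holds the extracted value.
                print(field.field_label.name, "=", field.field_value.text)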
OCR PDF generates a searchable PDF file in your Object Storage. For example, Vision can take a
PDF file containing text and images, and return a PDF file in which you can search for that text.
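The following batch-request sketch asks for a searchable PDF to be written to an Object Storage output location. The job classes and especially the generate_searchable_pdf flag are assumptions about the OCI Python SDK surface and may differ in your version; all names and OCIDs are placeholders:

    # Sketch: batch document job that writes a searchable PDF to Object Storage.
    # The job classes and the generate_searchable_pdf flag are assumptions about
    # the OCI Python SDK surface and can differ between versions.
    import oci

    models = oci.ai_vision.models
    client = oci.ai_vision.AIServiceVisionClient(oci.config.from_file())

    job = client.create_document_job(
        models.CreateDocumentJobDetails(
            compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
            input_location=models.ObjectListInlineInputLocation(
                object_locations=[
                    models.ObjectLocation(
                        namespace_name="my-namespace",  # placeholder values
                        bucket_name="input-bucket",
                        object_name="scanned-contract.pdf",
                    )
                ]
            ),
            features=[
                models.DocumentTextDetectionFeature(generate_searchable_pdf=True)
            ],
            output_location=models.OutputLocation(
                namespace_name="my-namespace",
                bucket_name="output-bucket",
                prefix="searchable-pdfs",
            ),
        )
    ).data

    print(job.id, job.lifecycle_state)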
Vision provides pretrained models that let customers extract insights from their documents
without needing data scientists.
You need the following before using a pretrained model:
A paid tenancy account in Oracle Cloud Infrastructure.
Familiarity with Oracle Cloud Infrastructure Object Storage.
You can call the pretrained Document AI models as a single request using the Console,
REST APIs, SDK, or CLI, or as a batch request using the REST APIs, SDK, or CLI.
See the Limits section for information on what is allowed in batch requests.
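Batch requests run asynchronously. The short sketch below polls a batch job until it completes, assuming the ai_vision client exposes a get_document_job call as in the SDK version used above; the job OCID is a placeholder:

    # Sketch: poll an asynchronous Document AI batch job until it finishes.
    # Method and field names are assumptions and can differ between SDK versions.
    import time
    import oci

    client = oci.ai_vision.AIServiceVisionClient(oci.config.from_file())
    job_id = "ocid1.aivisiondocumentjob.oc1..example"  # placeholder job OCID

    while True:
        job = client.get_document_job(job_id).data
        print(job.lifecycle_state)
        if job.lifecycle_state in ("SUCCEEDED", "FAILED", "CANCELED"):
            break
        time.sleep(10)
    # On success, the JSON results (and any searchable PDF) are written under
    # the job's Object Storage output location.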