Vision provides pretrained document AI models that let
you organize and extract text and structure from business documents.
Pretrained models let you use AI with no data science experience. Provide an image-based
document to the Vision service and get back information about your document without needing to
create your own model.
Important
The AnalyzeDocument and
DocumentJob capabilities in Vision are moving to a new
service, Document Understanding. The following features are
impacted:
Table detection
Document classification
Receipt key-value extraction
Document OCR
These features are available in Vision until January 1, 2024.
After that date, they are available only in Document Understanding.
Use Cases
Pretrained document AI models let you automate back-office operations and process
receipts more accurately.
Intelligent search
Enrich image-based files with metadata, including document type and key fields, for
easier retrieval.
Expense reporting
Extract the required information from receipts to automate business workflows. For
example, employee expense reporting, spending compliance, and reimbursement.
Downstream Natural Language Processing (NLP)
Extract text from PDF files and organize it as the input for NLP, either in tables or
in words and lines.
Loyalty points capture
Automate loyalty points calculations from receipts, based on the number of items or
the total amount paid.
Supported Formats
Vision supports several document formats.
Documents can be uploaded either from a local file or Oracle Cloud Infrastructure Object Storage. They can be in the following formats:
Vision can detect and recognize text in a document.
Language classification identifies the language of a document. OCR then draws bounding boxes
around the printed or handwritten text it finds in an image, and digitizes the
text.
If you have a PDF with text, Vision finds and extracts the text in
that document, then provides bounding boxes for the identified text.
Text Detection can be used with Document AI or Image Analysis models.
Vision provides a confidence score for each text grouping.
The confidence score is a decimal number from 0 to 1. Scores closer to 1 indicate higher
confidence in the extracted text, while scores closer to 0 indicate lower confidence.
Note
OCR support is limited to English. If you know that the text in
the images is in English, set the language to Eng.
Supported features are:
Word extraction
Text line extraction
Confidence score
Bounding polygons
Single request
Batch request
Limitations are:
Although Language classification identifies several languages, OCR is limited to
English.
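The confidence scores described above are typically used to decide which extracted text to trust. The following is a minimal sketch of that idea, not the Vision SDK itself: the `words` list is a hypothetical, simplified stand-in for OCR output, and the `text` and `confidence` keys and the `filter_by_confidence` helper are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for OCR output: one dict per detected word.
# The key names are assumptions, not the actual response schema.
words: List[Dict] = [
    {"text": "Total", "confidence": 0.98},
    {"text": "12.50", "confidence": 0.91},
    {"text": "T0ta1", "confidence": 0.42},
]


def filter_by_confidence(items: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Keep only the items whose confidence score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [item for item in items if item["confidence"] >= threshold]


confident_words = filter_by_confidence(words, threshold=0.8)
print(" ".join(word["text"] for word in confident_words))
```

The threshold of 0.8 is arbitrary. A suitable value depends on how costly a misread word is in your workflow.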
Document Classification can be used to classify a document.
Vision provides a list of possible document types for
the analyzed document. Each document type has a confidence score. The confidence score is a
decimal number from 0 to 1. Scores closer to 1 indicate higher confidence in the
classification, while scores closer to 0 indicate lower confidence.
The list of possible document types is:
Table extraction can be used to identify tables in a document and extract their
contents. For example, if a PDF receipt contains a table that includes the taxes and total
amount, Vision identifies the table and extracts the table
structure.
Vision provides the number of rows and columns for the
table and the contents of each table cell. Each cell has a confidence score. The confidence
score is a decimal number from 0 to 1. Scores closer to 1 indicate higher confidence in the
extracted text, while scores closer to 0 indicate lower confidence.
Supported features are:
Table extraction for tables with and without borders
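Because table extraction reports the row count, the column count, and the contents of each cell, the result can be reassembled into a two-dimensional grid. The following is a minimal sketch under stated assumptions: the `table` dictionary and its key names (`row_count`, `column_count`, `cells`, `row_index`, `column_index`, `text`) are hypothetical and do not reproduce the actual response schema.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for one extracted table.
# Key names are assumptions made for illustration.
table: Dict = {
    "row_count": 3,
    "column_count": 2,
    "cells": [
        {"row_index": 0, "column_index": 0, "text": "Item", "confidence": 0.99},
        {"row_index": 0, "column_index": 1, "text": "Amount", "confidence": 0.97},
        {"row_index": 1, "column_index": 0, "text": "Tax", "confidence": 0.95},
        {"row_index": 1, "column_index": 1, "text": "1.20", "confidence": 0.93},
        {"row_index": 2, "column_index": 0, "text": "Total", "confidence": 0.98},
        {"row_index": 2, "column_index": 1, "text": "13.70", "confidence": 0.96},
    ],
}


def build_grid(extracted: Dict) -> List[List[str]]:
    """Place each cell's text at its (row, column) position; missing cells stay empty."""
    grid = [[""] * extracted["column_count"] for _ in range(extracted["row_count"])]
    for cell in extracted["cells"]:
        grid[cell["row_index"]][cell["column_index"]] = cell["text"]
    return grid


for row in build_grid(table):
    print(" | ".join(row))
```

A grid in this form can be written out as CSV or passed to downstream processing. Cells with low confidence scores could be flagged for review before that step.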
Key-value extraction can be used to identify values for predefined keys in a receipt.
For example, if a receipt includes a merchant name, merchant address, or merchant phone number,
Vision can identify these values and return them as
key-value pairs.
Supported features are:
Extract values for predefined key-value pairs
Bounding polygons
Single request
Batch request
Limitations:
Supports receipts in English only.
Supported fields are:
MerchantName
The name of the merchant issuing the receipt.
MerchantPhoneNumber
The telephone number of the merchant.
MerchantAddress
The address of the merchant.
TransactionDate
The date the receipt was issued.
TransactionTime
The time the receipt was issued.
Total
The total amount of the receipt, after all charges and taxes have been applied.
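The supported fields listed above lend themselves to a simple lookup structure for workflows such as expense reporting. The following is a minimal sketch, not the Vision SDK: the `pairs` list and its `key`, `value`, and `confidence` entries are a hypothetical, simplified representation of the extracted key-value pairs, and `receipt_to_record` is a helper invented for illustration. Only the field names themselves come from the list above.

```python
from decimal import Decimal
from typing import Dict, List

# Field names taken from the supported fields listed above.
SUPPORTED_FIELDS = {
    "MerchantName",
    "MerchantPhoneNumber",
    "MerchantAddress",
    "TransactionDate",
    "TransactionTime",
    "Total",
}

# Hypothetical, simplified stand-in for extracted key-value pairs.
pairs: List[Dict] = [
    {"key": "MerchantName", "value": "Example Cafe", "confidence": 0.97},
    {"key": "TransactionDate", "value": "2023-05-14", "confidence": 0.94},
    {"key": "Total", "value": "13.70", "confidence": 0.96},
]


def receipt_to_record(extracted: List[Dict]) -> Dict[str, str]:
    """Collect supported fields into a dict; fields absent from the receipt are omitted."""
    return {p["key"]: p["value"] for p in extracted if p["key"] in SUPPORTED_FIELDS}


record = receipt_to_record(pairs)
total = Decimal(record["Total"])  # Decimal avoids float rounding on currency amounts
print(record["MerchantName"], total)
```

Not every receipt contains every field, so code that consumes the record should handle missing keys.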
OCR PDF generates a searchable PDF file in your Object Storage. For example, Vision can take a PDF file with text and images, and return a PDF
file in which the text is searchable.
Vision provides pretrained models that let customers
extract insights about their documents without needing data scientists.
You need the following before using a pretrained model:
A paid tenancy account in Oracle Cloud Infrastructure.
Familiarity with Oracle Cloud Infrastructure Object Storage.
You can call the pretrained Document AI models as a batch request using the REST APIs,
SDK, or CLI. You can call the pretrained Document AI models as a single request using
the Console, REST APIs, SDK, or CLI.
See the Limits section for information on what is allowed in batch
requests.