Perform Document Classification and Key Value Extraction

Use prebuilt OCI Document Understanding models to build document classification and key value extraction into your applications without machine learning (ML) or artificial intelligence (AI) expertise. For example, you might use document classification to identify passports, driver licenses, receipts, and invoices.

If you have fewer than 10,000 documents, you can process them in a single data flow. If you have more than 10,000 documents, create a separate data flow for each bucket (that is, use a separate dataset for each bucket), and use a Sequence to run the data flows one after another. See Process Data Using a Sequence of Data Flows.
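As a rough illustration of the sizing guidance above, the following Python sketch splits a list of document names into batches, one per bucket. It is not part of Oracle Analytics. The function name `split_into_batches` and the sample file names are hypothetical, and the sketch assumes that each bucket should hold at most 10,000 documents and that exactly 10,000 documents still fit in one data flow, neither of which the text above states explicitly.

```python
from typing import List

# Threshold taken from the guidance above; treating it as a per-bucket
# maximum is an assumption made for this sketch.
MAX_DOCS_PER_DATA_FLOW = 10_000


def split_into_batches(documents: List[str],
                       limit: int = MAX_DOCS_PER_DATA_FLOW) -> List[List[str]]:
    """Split document names into consecutive batches of at most `limit` items.

    Each batch would correspond to one bucket, one dataset, and one data flow.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return [documents[i:i + limit] for i in range(0, len(documents), limit)]


# Hypothetical example: 25,000 scanned documents.
docs = [f"scan_{n:05d}.pdf" for n in range(25_000)]
batches = split_into_batches(docs)
batch_sizes = [len(batch) for batch in batches]
```

Here `batch_sizes` shows how many documents land in each bucket, and `len(batches)` is the number of data flows you would add to the sequence.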
Prerequisites:
  1. On the Oracle Analytics Home page, click Create, and then click Data Flow.
  2. Select the dataset that links to the documents you want to analyze, then click Add.

  3. In the Data Flow editor, click Add a step (+).
  4. From the Data Flow Steps pane, double-click Apply AI Model, and then select the model to use.
    For example, you might select Pretrained Document Classification to identify passports.
  5. In Apply AI Model, go to the Inputs section, and configure the Input Column and Input Type parameters.
    • If you're referencing your source documents by bucket, in Input Column select URL, and in Input Type select Buckets.

    • If you're referencing your source documents individually, in Input Column select File Location, and in Input Type select Documents.
  6. In the Data Flow editor, click Add a step (+) and select Save Data.
  7. In Name, enter a name for the output dataset.
    For example, you might call the dataset 'Passport Identification Analysis Results'.
  8. In the Save data to field, specify the location for the output dataset.
  9. Click Save, enter a name for the data flow, and click OK.
  10. Click Run Data Flow.
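The Input Column and Input Type choices in step 5 can be summarized as a small lookup. This Python sketch is only a quick reference. The names `InputSettings`, `INPUT_SETTINGS`, and `input_settings_for`, and the keys "bucket" and "individual", are hypothetical. The column and type values are the ones listed in step 5.

```python
from typing import NamedTuple


class InputSettings(NamedTuple):
    """Values to choose in the Inputs section of Apply AI Model."""
    input_column: str
    input_type: str


# Keys are hypothetical labels for the two ways of referencing documents;
# the values come from step 5 of the procedure.
INPUT_SETTINGS = {
    "bucket": InputSettings(input_column="URL", input_type="Buckets"),
    "individual": InputSettings(input_column="File Location",
                                input_type="Documents"),
}


def input_settings_for(reference_mode: str) -> InputSettings:
    """Return the step 5 settings for 'bucket' or 'individual' references."""
    try:
        return INPUT_SETTINGS[reference_mode]
    except KeyError:
        raise ValueError(
            f"unknown reference mode: {reference_mode!r}; "
            "expected 'bucket' or 'individual'"
        ) from None


bucket_settings = input_settings_for("bucket")
individual_settings = input_settings_for("individual")
```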
When the data flow completes the analysis, open the dataset that you specified in Step 7.

To locate the generated dataset, from the Oracle Analytics Home page, navigate to Data, then Datasets.
[Illustration: oci_du_files13.png]

For more detail about the generated results, see Output Data Generated for OCI Document Understanding Models.