Prepare Documents to Analyze with an OCI Document Understanding Model

You use buckets in OCI Object Storage to store the documents that you want to analyze, then create a dataset to access these documents in Oracle Analytics.

You typically store input documents and AI models in the same Oracle Cloud account (tenancy), which makes setup in Oracle Analytics easier.

If your input documents and AI models are stored in different tenancies:
  • Make sure that the visibility of the storage bucket containing your input documents is public. See Change the visibility of a bucket.
  • Populate the input dataset for the data flow with individual document URLs instead of a single URL for the OCI bucket where documents are stored.

Data flows in Oracle Analytics can process up to 10,000 documents in one run. If you have more than 10,000 documents, create multiple buckets in OCI Object Storage & Archive Storage, each containing no more than 10,000 documents. Then create a separate dataset and data flow for each bucket, and use a sequence to run the data flows one after another.
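
As a rough illustration of the 10,000-document limit, the following Python sketch plans how a list of document names could be divided across buckets before you upload them. It is only a planning aid under stated assumptions: the function names and the `docs-batch-NNN` bucket naming pattern are hypothetical, and the sketch does not call any OCI or Oracle Analytics API.

```python
from typing import Dict, Iterator, List

# Per-run limit for a data flow, as stated in this topic.
MAX_DOCS_PER_BUCKET = 10_000


def split_into_batches(
    names: List[str], size: int = MAX_DOCS_PER_BUCKET
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` document names."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


def plan_buckets(names: List[str], prefix: str = "docs-batch") -> Dict[str, List[str]]:
    """Map a hypothetical bucket name to the documents planned for it."""
    return {
        f"{prefix}-{index:03d}": batch
        for index, batch in enumerate(split_into_batches(names), start=1)
    }


if __name__ == "__main__":
    # Placeholder document names, for illustration only.
    documents = [f"doc{i}.pdf" for i in range(25_000)]
    for bucket, batch in plan_buckets(documents).items():
        print(bucket, len(batch))
```

Each planned bucket then gets its own dataset and data flow, as described above.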

You can use a private or public bucket that is accessible by the OCI user and that complies with OCI's generic limits on documents. See OCI documentation.

  1. In OCI Console, navigate to Object Storage & Archive Storage, and create a bucket to store your documents.

  2. In the Object Storage & Archive Storage area, click a bucket name, then in the Objects region of the page, click Upload and upload your documents.
    Make sure that the bucket contains no extraneous files that you don't want to process. Oracle Analytics processes every file in the bucket.
  3. For each bucket, add the bucket URL to a comma-separated values (CSV) file.
    1. In Object Storage, select the bucket to display the documents in the Objects dialog.
    2. Copy the URL from the browser's URL bar.
    3. Create a CSV file with fields for ID, Bucket Name, and Bucket URL.
    4. Paste the bucket URL into the CSV file as the Bucket URL value.
      Alternatively, if your input documents and AI models are stored in different tenancies, add the documents individually to the CSV file.
      Create a CSV file with fields for ID, Document Name, and Document URL. For each document in Object Storage, click the ellipsis icon, select View Object Details, and copy the Name value and the URL Path (URI) value.

      Paste the Name value as Document Name, and paste the URL Path (URI) value as Document URL.

  4. In Oracle Analytics, for each bucket that you're using to store your documents, click Create, then Dataset.
  5. Upload the CSV file that you created in Step 3, and save the dataset.
    Repeat steps 4 and 5 for each bucket. If you have more than 10,000 documents, create multiple buckets, each containing up to 10,000 documents, and create a separate dataset for each bucket.
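
If you prefer to generate the CSV files from Step 3 with a script rather than by hand, a minimal Python sketch might look like the following. The column headings are the field names given in Step 3. The function names and the `example.invalid` URLs are placeholders for illustration, so substitute the bucket URL you copied from the browser or the URL Path (URI) values you copied from View Object Details.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_bucket_csv(buckets: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one row per bucket: ID, Bucket Name, Bucket URL."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Bucket Name", "Bucket URL"])
    for row_id, (name, url) in enumerate(buckets, start=1):
        writer.writerow([row_id, name, url])


def write_document_csv(documents: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one row per document: ID, Document Name, Document URL."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "Document Name", "Document URL"])
    for row_id, (name, url) in enumerate(documents, start=1):
        writer.writerow([row_id, name, url])


if __name__ == "__main__":
    # Placeholder values only; paste in the URLs you copied from OCI Console.
    buffer = io.StringIO()
    write_bucket_csv([("invoices", "https://example.invalid/bucket-url")], buffer)
    print(buffer.getvalue())

    # To write a file instead, open it with newline="" as the csv module recommends.
    with open("documents.csv", "w", newline="", encoding="utf-8") as handle:
        write_document_csv(
            [("invoice-001.pdf", "https://example.invalid/invoice-001.pdf")],
            handle,
        )
```

Use the bucket form when documents and models share a tenancy, and the per-document form when they are in different tenancies. The `csv` module quotes any name that contains a comma, so the file stays valid when you upload it as a dataset.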