Building a Custom Model

Document Understanding provides an option to build custom models to extract insights from images without needing data scientists.

You need the following before building a custom model:

  • A paid tenancy account in Oracle Cloud Infrastructure.
  • Familiarity with Oracle Cloud Infrastructure Object Storage.
  • The correct policies set up.

Train the model using one of Document Understanding's custom model training modes. The training modes are:

  • Recommended training: Document Understanding automatically selects the training duration to create the best model. The training might take up to 24 hours.
  • Custom duration: This option lets you set the maximum training duration.

The best training duration depends on the complexity of the detection problem, the typical number of labels in a document, the document resolution, and other factors. Consider these factors, and allocate more time as the training complexity increases. The minimum recommended training time is 30 minutes. Longer training generally improves accuracy, but with diminishing returns. Use the recommended mode to get a baseline optimized model. If you want a better result, increase the training time.
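
The choice between the two training modes can be sketched as a small helper. This is only an illustration: the function name `plan_training`, the mode strings, and the returned dictionary keys are hypothetical and not part of Document Understanding. The only value taken from the text above is the 30-minute recommended minimum.

```python
MIN_RECOMMENDED_HOURS = 0.5  # the 30-minute recommended minimum stated above


def plan_training(mode, max_hours=None):
    """Return a hypothetical settings dict for a training run.

    mode: "recommended" or "custom" (names chosen for this sketch only).
    max_hours: maximum training duration in hours; required for "custom".
    """
    if mode == "recommended":
        # The service selects the duration itself, so no cap is supplied.
        return {"mode": "recommended", "max_hours": None, "warnings": []}
    if mode != "custom":
        raise ValueError(f"unknown training mode: {mode!r}")
    if max_hours is None or max_hours <= 0:
        raise ValueError("custom mode needs a positive max_hours")
    warnings = []
    if max_hours < MIN_RECOMMENDED_HOURS:
        # Allowed in this sketch, but flagged because it is under the minimum.
        warnings.append("below the recommended 30-minute minimum")
    return {"mode": "custom", "max_hours": float(max_hours), "warnings": warnings}


if __name__ == "__main__":
    print(plan_training("recommended"))
    print(plan_training("custom", max_hours=0.25))
```

A caller might start with `plan_training("recommended")` to get a baseline model and switch to the custom mode with a larger `max_hours` only if the baseline is not accurate enough.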

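You can use the Console, CLI, or API to build a custom model. The numbered steps that follow use the Console; the CLI commands and API operations are listed after them.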

    1. Create a project.
      1. Open the navigation menu and click Analytics & AI. Under AI Services, click Document Understanding.
      2. From the Document Understanding home page, under Custom models, click Projects.
      3. Click Create project.
      4. Select the compartment to create the project in.
      5. Enter a Name for the project. Don't enter confidential information.
      6. Give the project a description to help you find it.
      7. Click Create project.
    2. Click the name of the project you created in the previous step.
    3. Click Create Model.
    4. Select the model type to train: either Document classification or Key value extraction.
      For a description of these types, see About Custom Models.
    5. Select the training data.
      • If you don't have any annotated documents, choose to Create a New Dataset. You're taken to Oracle Cloud Infrastructure Data Labeling where you can easily add labels to the document content. For more information on annotating documents in Data Labeling, see the section on Labeling Documents.
      • If you do have annotated documents, select Choose an existing dataset.
        • If you annotated the dataset in Data Labeling, click Data Labeling Service.
        • If you annotated the images using a third-party tool, click Object Storage.
    6. Click Next.
    7. Enter a name for the custom model.
    8. (Optional) Give the model a description to help you find it.
    9. Select the training duration:
      • Recommended training: Document Understanding automatically selects the training duration to create the best model. The training might take up to 24 hours.
      • Custom: This option lets you set the maximum training duration (in hours).
    10. Click Next.
    11. Review the information you provided in the previous steps. To make changes, click Previous.
    12. When you want to start training the custom model, click Create and train.
  • CLI: Use the create command and the required parameters to create a project:

    oci ai-document project create [OPTIONS]

    Use the create command and required parameters to create a model:

    oci ai-document model create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
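
As a rough illustration of scripting the two commands above, the following Python sketch assembles their argument lists without running them. The flag names (`--compartment-id`, `--display-name`, `--project-id`, `--model-type`, `--training-dataset`, `--max-training-time-in-hours`), the model type value, the dataset JSON keys, and the placeholder OCIDs are assumptions made for this example; confirm them against the CLI Command Reference before use.

```python
import json
import shlex


def project_create_cmd(compartment_id, display_name):
    """Argument list for `oci ai-document project create` (flags assumed)."""
    return ["oci", "ai-document", "project", "create",
            "--compartment-id", compartment_id,
            "--display-name", display_name]


def model_create_cmd(compartment_id, project_id, model_type,
                     training_dataset, display_name, max_hours=None):
    """Argument list for `oci ai-document model create` (flags assumed)."""
    cmd = ["oci", "ai-document", "model", "create",
           "--compartment-id", compartment_id,
           "--project-id", project_id,
           "--model-type", model_type,
           "--display-name", display_name,
           # Complex parameters are passed to the CLI as a JSON string.
           "--training-dataset", json.dumps(training_dataset)]
    if max_hours is not None:
        # Only cap the duration when a custom training duration is wanted.
        cmd += ["--max-training-time-in-hours", str(max_hours)]
    return cmd


if __name__ == "__main__":
    # Placeholder identifiers; replace them with real OCIDs.
    compartment = "ocid1.compartment.oc1..example"
    dataset = {"datasetType": "DATA_SCIENCE_LABELING",
               "datasetId": "ocid1.datalabelingdataset.oc1..example"}
    print(shlex.join(project_create_cmd(compartment, "invoices-project")))
    print(shlex.join(model_create_cmd(compartment, "ocid1.aidocumentproject.oc1..example",
                                      "KEY_VALUE_EXTRACTION", dataset,
                                      "invoice-kv-model", max_hours=2)))
```

The sketch prints the commands instead of executing them, so they can be reviewed first; passing a list to `subprocess.run` would execute one once the flags are verified.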

  • API: Run the CreateProject operation to create a project.

    Run the CreateModel operation to create a model.
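
To make the two operations more concrete, here is a minimal Python sketch that builds plausible JSON request bodies for CreateProject and CreateModel without sending them. The field names (`compartmentId`, `displayName`, `projectId`, `modelType`, `trainingDataset`, `maxTrainingTimeInHours`), the `datasetType` values, and the model type strings are assumptions made for this example, and the helper functions are hypothetical; check them against the API reference. The two dataset helpers mirror the Data Labeling and Object Storage choices in the Console steps.

```python
import json

# Assumed enum values for the two model types named in the Console steps.
MODEL_TYPES = {"DOCUMENT_CLASSIFICATION", "KEY_VALUE_EXTRACTION"}


def labeling_dataset(dataset_id):
    """Dataset annotated in Data Labeling (field names assumed)."""
    return {"datasetType": "DATA_SCIENCE_LABELING", "datasetId": dataset_id}


def object_storage_dataset(namespace, bucket, object_name):
    """Dataset annotated with a third-party tool and stored in Object Storage."""
    return {"datasetType": "OBJECT_STORAGE", "namespaceName": namespace,
            "bucketName": bucket, "objectName": object_name}


def create_project_body(compartment_id, display_name, description=None):
    body = {"compartmentId": compartment_id, "displayName": display_name}
    if description:
        body["description"] = description
    return body


def create_model_body(compartment_id, project_id, model_type, training_dataset,
                      display_name, max_hours=None):
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type!r}")
    body = {"compartmentId": compartment_id, "projectId": project_id,
            "modelType": model_type, "displayName": display_name,
            "trainingDataset": training_dataset}
    if max_hours is not None:
        # Omitted unless a custom maximum training duration is wanted.
        body["maxTrainingTimeInHours"] = float(max_hours)
    return body


if __name__ == "__main__":
    # Placeholder identifiers; replace them with real values.
    project = create_project_body("ocid1.compartment.oc1..example", "invoices-project")
    model = create_model_body("ocid1.compartment.oc1..example",
                              "ocid1.aidocumentproject.oc1..example",
                              "DOCUMENT_CLASSIFICATION",
                              object_storage_dataset("my-namespace", "my-bucket", "manifest.jsonl"),
                              "invoice-classifier", max_hours=1)
    print(json.dumps(project, indent=2))
    print(json.dumps(model, indent=2))
```

Building the bodies as plain dictionaries keeps the sketch independent of any SDK; an SDK or signed HTTP client would still be needed to send them.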