6 Auto Populate Catalog
This chapter contains information about creating and managing automated extractors to pull data into your catalogs.
About Auto Populate
You can automate the process of extracting metadata from sources directly to your data catalogs.
Manually creating schemas, tables, and partitions from your data sources is time-consuming and complicated. Oracle AI Data Platform can automatically extract metadata from data sources and create entities in the catalogs that you specify in the metadata extractor.
You automatically populate this metadata in your catalog by creating a metadata extractor. As part of creating the extractor, you specify the target catalog to extract metadata to and the source for the metadata. You can choose to have the extractor create tables in a specified schema, or let the system suggest where the tables are created if no schema is specified or detected.
Auto populate can extract metadata from the following file types:
- CSV
- JSON
- Avro
- ORC
- Parquet
- Delta Lake
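As an illustration only, the following sketch shows how you might pre-check object names in a bucket against the supported formats before running an extractor. It does not describe how Oracle AI Data Platform detects formats. The function name `detect_format` and the extension mapping are assumptions made for this example; the one external fact relied on is that a Delta Lake table stores its transaction log in a `_delta_log` directory.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from file extension to the formats listed above.
EXTENSION_TO_FORMAT = {
    ".csv": "CSV",
    ".json": "JSON",
    ".avro": "Avro",
    ".orc": "ORC",
    ".parquet": "Parquet",
}


def detect_format(object_name: str) -> Optional[str]:
    """Guess the format of one object from its name (hypothetical helper).

    Returns None when the name matches none of the supported formats.
    """
    path = PurePosixPath(object_name)
    # Objects under a _delta_log directory belong to a Delta Lake table.
    # This check is per object: the Parquet data files of the same table
    # are still reported as "Parquet" unless the caller inspects the folder.
    if "_delta_log" in path.parts:
        return "Delta Lake"
    # Compare extensions case-insensitively.
    return EXTENSION_TO_FORMAT.get(path.suffix.lower())


if __name__ == "__main__":
    for name in ["sales/2024/orders.csv", "docs/README.md"]:
        print(name, "->", detect_format(name))
```

Objects for which the helper returns `None` are ones this sketch does not recognize as a supported format.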
You can either review extracted entities manually or let the system create the entities automatically from the extracted metadata. During extraction, entities that cause errors are recorded in the log. You can view the log to see which entities encountered errors and take corrective action.
Manual review lets you approve or reject entities individually. You can view entities that have already been approved or rejected in the Reviewed Entities tab.
Extractors display their status to let you know what stage they are currently at and if user intervention is required.
Extractor Status | Description |
---|---|
Not Started | The extractor has not started. Start the extractor to begin. |
Running | The extractor is in progress. |
Ready for review | The extractor has run and you have chosen manual approval. Extracted entities must be reviewed and either approved or rejected. |
Reviewing | The extractor has run and you have chosen manual approval. Some entities have been approved or rejected by a user, but entities remain that require review. |
Completed | The extractor has run and entities have either been approved automatically or reviewed manually by a user. |
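The status descriptions can be read as a small decision procedure. The following is a minimal sketch of that reading, not the platform's implementation: the `ExtractorStatus` enum, the `extractor_status` function, and its parameters are hypothetical names introduced for illustration.

```python
from enum import Enum


class ExtractorStatus(Enum):
    NOT_STARTED = "Not Started"
    RUNNING = "Running"
    READY_FOR_REVIEW = "Ready for review"
    REVIEWING = "Reviewing"
    COMPLETED = "Completed"


def extractor_status(started: bool, finished: bool, manual_approval: bool,
                     total_entities: int, reviewed_entities: int) -> ExtractorStatus:
    """Derive a status from run state and review progress (hypothetical model)."""
    if not started:
        return ExtractorStatus.NOT_STARTED
    if not finished:
        return ExtractorStatus.RUNNING
    # With automatic approval, no user action is needed after the run.
    if not manual_approval:
        return ExtractorStatus.COMPLETED
    # With manual approval, the status depends on how many entities
    # a user has already approved or rejected.
    if reviewed_entities >= total_entities:
        return ExtractorStatus.COMPLETED
    if reviewed_entities == 0:
        return ExtractorStatus.READY_FOR_REVIEW
    return ExtractorStatus.REVIEWING


if __name__ == "__main__":
    # Manual approval, 2 of 5 entities reviewed so far.
    print(extractor_status(True, True, True, 5, 2).value)
```

In this model, only Ready for review and Reviewing call for user intervention.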
You can view and use metadata extractors created by other users if you have the requisite permissions.
Create Metadata Extractor
You can create metadata extractors to automate extracting entities such as schemas and tables to your catalogs.
- On the Home page, click Auto populate catalog.
- Click Create Metadata Extractor.
- Enter a name for the metadata extractor.
- Select the target catalog from the Catalog dropdown.
- Select the appropriate source type from the Source Type dropdown.
- Next to Compute, click Browse and choose the cluster the extractor should use. Click Select.
- Next to Compartment, click Browse and choose the compartment to extract your metadata to. Click Select.
- Next to Bucket, click Browse and choose the bucket within the compartment to extract your metadata to. Click Select.
- Optional: Next to Folder, click Browse and choose the folder within the bucket to extract your metadata to. Click Select.
- Select whether entities are created with manual approval or automatically approved by the system.
- Optional: Select the schema where external tables are created. If no schema is specified, the system creates tables in a schema based on the folder structure, or in the default schema if no schema is detected.
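The schema fallback in the last step can be sketched as a three-level lookup. The exact rule the system uses to detect a schema from the folder structure is not described here, so this example assumes that the first folder below the selected folder names the schema. The function `resolve_target_schema`, its parameters, and the placeholder name `default` are all hypothetical.

```python
from typing import Optional

DEFAULT_SCHEMA = "default"  # assumption: placeholder for the catalog's default schema


def resolve_target_schema(specified_schema: Optional[str], object_path: str,
                          root_folder: str = "") -> str:
    """Pick a target schema using the fallback order described above (sketch)."""
    # 1. A schema selected in the extractor always wins.
    if specified_schema:
        return specified_schema

    # 2. Otherwise try to detect one from the folder structure.
    #    Assumed rule: the first folder below root_folder is the schema.
    path = object_path.strip("/")
    prefix = root_folder.strip("/")
    if prefix and (path == prefix or path.startswith(prefix + "/")):
        path = path[len(prefix):].strip("/")
    parts = [p for p in path.split("/") if p]
    # At least "schema_folder/something" is needed to detect a schema.
    if len(parts) >= 2:
        return parts[0]

    # 3. Nothing specified and nothing detected: use the default schema.
    return DEFAULT_SCHEMA


if __name__ == "__main__":
    print(resolve_target_schema(None, "landing/finance/invoices/a.csv", "landing"))
```

Under these assumptions, an object stored directly in the selected folder has no enclosing schema folder, so it falls through to the default schema.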
Manually Review Extracted Metadata Entities
When you choose the manual method of creating entities in a metadata extractor, you need to review the extracted entities and approve or reject adding them to your catalog.
- On the Home page, click Auto populate catalog.
- Click the name of the metadata extractor.
- Click the Entities awaiting review tab.
- For each entity, select Approve or Reject.
- Optional: Select Approve All or Reject All to set all entities under review to the selected status.
- Click Submit.
View Reviewed Entities
You can see entities that have been manually or automatically reviewed as part of metadata extraction and see log details, table details, or column schema for that entity.
- On the Home page, click Auto populate catalog.
- Click the name of the metadata extractor.
- Click the Reviewed entities tab.
- Next to an entity, click Actions.
- Click View table details to see the table details for the selected entity.
- Click View column schema to see the column schema for the selected entity.
- Click View logs to see the metadata extractor logs for the selected entity.