4 Manage with Master Catalog

This chapter helps you understand and use the master catalog, standard and external catalogs, schemas, tables, and volumes.

Master Catalog

Master Catalog in AI Data Platform Workbench is the top-level entity that enables you to manage your data and metadata by providing a centralized view.

Master Catalog is a container for both standard and external catalogs. You create catalogs with their data assets in Oracle Autonomous AI Lakehouse, OCI Object Storage, and Kafka. Master Catalog allows you to enforce permissions on its child objects.

Standard and external catalogs have different functions and use cases:

  • Standard catalog: A standard catalog is a logical container for schemas (databases). Users can create tables, views, and volumes in a schema. A standard catalog manages the metadata lifecycle of all its child objects.
  • External catalog: An external catalog is backed by an external data source, such as Oracle Autonomous AI Lakehouse or Kafka. The metadata is synced from the external source, and users can query the data in that source using a three-part name: catalog_name.schema_name.table_name. The external source manages the metadata lifecycle, and the Master Catalog keeps a copy of the metadata.
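
As a minimal sketch of the three-part naming described above, the following Python helper joins a catalog, schema, and table name and places the result in a query string. The helper name qualified_name and the names sales_ext, finance, and invoices are hypothetical and used only for illustration. The sketch builds text only and does not connect to AI Data Platform Workbench.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join catalog, schema, and table into a three-part name."""
    parts = [catalog, schema, table]
    for part in parts:
        # Reject empty parts and parts that would add extra dots to the name.
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return ".".join(parts)


# Hypothetical catalog, schema, and table names.
name = qualified_name("sales_ext", "finance", "invoices")
query = f"SELECT * FROM {name} LIMIT 10"
print(query)
```

The same three-part form applies to tables in a standard catalog, so a helper like this can be reused for both catalog types.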

Use Cases for Master Catalog

You can use the master catalog for data preparation and analysis, storing unstructured data, and more.

Query and Analyze Data Using SQL Syntax

Create managed or external tables in a standard catalog to query and analyze data using familiar SQL-like syntax, making it easier to explore and understand the data stored in AI Data Platform.

Data Preparation

Use the structured format of data stored in managed or external tables to prepare data for machine learning models. Structured tables make it easier to clean and transform data, engineer features, and access data efficiently during model training.

Time Travel

Open table formats support schema evolution, so the structure of the data can change over time without rewriting the entire dataset. These tables can also be versioned, and you can run time travel queries against historical versions of the data to support retrospective analysis and data recovery.
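
The following Python sketch shows what a time travel query might look like. It assumes the table uses the Delta Lake format and that the SQL engine accepts Delta Lake's VERSION AS OF and TIMESTAMP AS OF clauses. This chapter does not confirm the exact syntax that AI Data Platform Workbench accepts, so treat it as illustrative. The function names and the table name lakehouse.sales.orders are hypothetical.

```python
def version_query(table: str, version: int) -> str:
    """Build a query that reads one historical version of a table."""
    if version < 0:
        raise ValueError("version must be zero or greater")
    # Assumption: Delta Lake style syntax for reading a table version.
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def timestamp_query(table: str, timestamp: str) -> str:
    """Build a query that reads the table as it was at a point in time."""
    # Assumption: Delta Lake style syntax for reading by timestamp.
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical three-part table name.
print(version_query("lakehouse.sales.orders", 3))
print(timestamp_query("lakehouse.sales.orders", "2024-01-31 00:00:00"))
```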

ACID Transaction Support

Open table formats support full Create, Read, Update, and Delete (CRUD) operations, ensuring data consistency and enabling data updates. Tables can be used to store and manage transactional data, enabling applications to track changes to data.

Efficiently Read and Write Data

Tables in AI Data Platform Workbench can be partitioned, allowing for efficient data access and processing, especially for large datasets.
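
To illustrate why partitioning helps, here is a toy Python model of partition pruning. It is not how AI Data Platform Workbench stores data. It only shows that when rows are grouped by a partition key, a filter on that key needs to touch a single group instead of scanning every row. The sample rows and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; "region" plays the role of the partition column.
rows = [
    {"region": "emea", "amount": 10},
    {"region": "apac", "amount": 7},
    {"region": "emea", "amount": 5},
]


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of one column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def read_partition(partitions, value):
    """Return only the rows in the matching partition."""
    return partitions.get(value, [])


partitions = partition_by(rows, "region")
emea_total = sum(row["amount"] for row in read_partition(partitions, "emea"))
print(emea_total)  # 15
```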

Store and Process Unstructured Data

Create managed or external volumes to store unstructured data so that it can be processed using Apache Spark.

Cross-Tenancy External Tables and Volumes

Cross-tenancy external tables and volumes allow you to securely access and query data stored in disparate tenancies without the need for complex ETL pipelines or manual data movement.

AI Data Platform Workbench enables users to create cross-tenancy external tables and volumes, a powerful capability designed to eliminate data silos and streamline collaboration.

The benefits of cross-tenancy are:
  • Zero Data Duplication: You access live data where it resides, saving on storage costs and ensuring "single source of truth" integrity.
  • Simplified Governance: You manage permissions across boundaries using IAM policies and AI Data Platform Workbench access controls.

Cross-Tenancy Access Requirements

Setting up cross-tenancy access for external tables and volumes requires specific IAM policies configured in a provider tenancy and a consumer tenancy.

In the provider tenancy, you need to create an IAM Dynamic Group in the Oracle Cloud Infrastructure (OCI) console that includes your specific AI Data Platform Workbench resource as a member. For more information, see Managing Dynamic Groups.

After you create the IAM Dynamic Group, you need to configure IAM policies in the provider tenancy:
  • Define resources in IAM for the consumer tenancy, user group, and dynamic group
  • Write admit IAM policies for the consumer tenancy resources

define tenancy <consumer_tenancy_name1> as <consumer tenancy OCID>
define group <group_name1> as <consumer user group OCID>
define dynamic-group <dynamic_group_name1> as <consumer dynamic group OCID>

admit dynamic-group <dynamic_group_name1> of tenancy <consumer_tenancy_name1> to manage object-family in tenancy
admit dynamic-group <dynamic_group_name1> of tenancy <consumer_tenancy_name1> to { OBJECTSTORAGE_NAMESPACE_READ } in tenancy
admit group <group_name1> of tenancy <consumer_tenancy_name1> to manage object-family in tenancy

After configuring the provider tenancy IAM policies, you need to configure your consumer tenancy IAM policies:
  • Define the resource in IAM for the provider tenancy
  • Write endorse IAM policies for the local consumer tenancy resources

define tenancy <provider_tenancy_name1> as <provider tenancy OCID>

endorse dynamic-group <dynamic_group_name> to manage object-family in tenancy <provider_tenancy_name1>
endorse dynamic-group <dynamic_group_name> to { OBJECTSTORAGE_NAMESPACE_READ } in tenancy <provider_tenancy_name1>
endorse group <group_name> to manage object-family in tenancy <provider_tenancy_name1>
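
Because the provider and consumer statements must refer to the same tenancies and groups, it can help to generate both sets from one place. The following Python sketch fills in the policy templates shown above. The function names, the alias names, and the OCID values are hypothetical placeholders, and the sketch assumes that the group identifier is an OCID. It only produces text and does not call any OCI API.

```python
def provider_policies(consumer_alias, consumer_ocid, group, group_id,
                      dyn_group, dyn_group_ocid):
    """Render the define and admit statements for the provider tenancy."""
    return [
        f"define tenancy {consumer_alias} as {consumer_ocid}",
        f"define group {group} as {group_id}",
        f"define dynamic-group {dyn_group} as {dyn_group_ocid}",
        f"admit dynamic-group {dyn_group} of tenancy {consumer_alias} "
        "to manage object-family in tenancy",
        f"admit dynamic-group {dyn_group} of tenancy {consumer_alias} "
        "to { OBJECTSTORAGE_NAMESPACE_READ } in tenancy",
        f"admit group {group} of tenancy {consumer_alias} "
        "to manage object-family in tenancy",
    ]


def consumer_policies(provider_alias, provider_ocid, group, dyn_group):
    """Render the define and endorse statements for the consumer tenancy."""
    return [
        f"define tenancy {provider_alias} as {provider_ocid}",
        f"endorse dynamic-group {dyn_group} "
        f"to manage object-family in tenancy {provider_alias}",
        f"endorse dynamic-group {dyn_group} "
        f"to {{ OBJECTSTORAGE_NAMESPACE_READ }} in tenancy {provider_alias}",
        f"endorse group {group} "
        f"to manage object-family in tenancy {provider_alias}",
    ]


# Hypothetical aliases and OCIDs; replace them with real values.
provider = provider_policies(
    "consumer_tn", "ocid1.tenancy.oc1..exampleconsumer",
    "consumer_users", "ocid1.group.oc1..exampleusers",
    "consumer_aidp", "ocid1.dynamicgroup.oc1..exampleaidp",
)
consumer = consumer_policies(
    "provider_tn", "ocid1.tenancy.oc1..exampleprovider",
    "local_users", "local_aidp",
)
print("\n".join(provider))
print("\n".join(consumer))
```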

Once both provider and consumer tenancy IAM policies are configured, you can create cross-tenancy external tables and volumes using SQL grammar. For more information, see SQL Grammar.

Example: Create a Cross-Tenancy Table with SQL

CREATE EXTERNAL TABLE [IF NOT EXISTS] <catalog-name>.<schema-name>.<table-name>
[ ( <column1-name> <column1-type> [COMMENT <column1-comment>], ... ) ]
USING [HIVE | DELTA | CSV | TXT | ORC | JDBC | PARQUET | etc.]
LOCATION 'oci://my-bucket@mytenancynamespace/my-folder/'
[TBLPROPERTIES ( DESCRIPTION = 'some-description', '<property-name>'='<property-value>'[, ...]) ]
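
As a concrete instance of the grammar above, the following Python sketch fills in the template for a Parquet table. All names (analytics, sales, orders, my-bucket, providernamespace) and the column list are hypothetical. The sketch assumes that the namespace in the location URI is the Object Storage namespace of the tenancy that owns the bucket. It only builds the statement text and does not run it.

```python
def create_external_table_sql(catalog, schema, table, columns, fmt,
                              bucket, namespace, folder):
    """Render a CREATE EXTERNAL TABLE statement from the documented template."""
    # Each column is a (name, type) pair.
    column_list = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    # Location follows the oci://<bucket>@<namespace>/<folder>/ pattern.
    location = f"oci://{bucket}@{namespace}/{folder.strip('/')}/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {catalog}.{schema}.{table}\n"
        f"({column_list})\n"
        f"USING {fmt}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical catalog, schema, table, bucket, and namespace names.
statement = create_external_table_sql(
    "analytics", "sales", "orders",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    "PARQUET", "my-bucket", "providernamespace", "orders",
)
print(statement)
```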

Limitation

AI Data Platform Workbench does not support creating cross-tenancy external tables or external volumes from the UI.