4 Manage with Master Catalog

This chapter helps you understand and use the master catalog, standard and external catalogs, schemas, tables, and volumes.

Master Catalog

Master Catalog in AI Data Platform is the top-level entity that gives you a centralized view for managing your data and metadata.

Master Catalog is a container for both standard and external catalogs. You create catalogs with their data assets in OCI Object Storage, Autonomous Data Warehouse (ADW), and Kafka. Master Catalog allows you to enforce permissions on its child objects.

Standard and external catalogs have different functions and use cases:

  • Standard catalog: A standard catalog is a logical container for schemas (databases). Users can create tables, views, and volumes in a schema. A standard catalog manages the metadata lifecycle of all its child objects.
  • External catalog: An external catalog is backed by an external data source such as Autonomous Data Warehouse or Kafka. The metadata is synced from the external source, and users can query data in that source using a three-part name of the form catalog_name.schema_name.table_name. The external source manages the metadata lifecycle, and Master Catalog keeps a copy of the metadata.
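The three-part naming convention can be illustrated with a small Python sketch. This is not part of AI Data Platform. The helper names (TableName, parse_table_name) and the catalog, schema, and table names are hypothetical, and the parser assumes plain identifiers that do not contain dots or quoting.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified name: catalog_name.schema_name.table_name."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Join the three parts back into the dotted form used in queries.
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(name: str) -> TableName:
    """Split a dotted name into its three parts, rejecting anything else."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableName(*parts)


# Hypothetical names, for illustration only.
name = parse_table_name("sales_adw.finance.invoices")
query = f"SELECT * FROM {name.qualified()} LIMIT 10"
```

Here name.catalog identifies the external catalog, and the remaining two parts locate the table inside the external source.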

Use Cases for Master Catalog

You can use Master Catalog for data preparation and analysis, storing unstructured data, and more.

Query and Analyze Data Using SQL Syntax

Create managed or external tables in a standard catalog to query and analyze data using familiar SQL-like syntax, making it easier to explore and understand the data stored in AI Data Platform.

Data Preparation

Use the structured format of data stored in managed or external tables to prepare data for machine learning models. The structure makes it easier to clean and transform data and to engineer features, and it supports efficient data access and processing during model training.

Time Travel

Open table formats support schema evolution, so the structure of the data can change over time without rewriting the entire dataset. Tables can also be versioned, and you can run time travel queries against historical versions of the data, which facilitates retrospective analysis and data recovery.
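The idea behind versioned tables and time travel can be shown with a toy in-memory model in Python. It is a conceptual sketch only, and it does not reflect how any open table format or AI Data Platform stores data. The class name VersionedTable and its methods are hypothetical.

```python
class VersionedTable:
    """Toy table that keeps one immutable snapshot per commit."""

    def __init__(self):
        # Version 0 is the empty table that exists before any commit.
        self._snapshots = [()]

    @property
    def latest_version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, rows) -> int:
        """Store the full table contents as a new snapshot; return its version."""
        self._snapshots.append(tuple(rows))
        return self.latest_version

    def read(self, version=None) -> list:
        """Return the rows as of `version` (the latest version when omitted)."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"unknown version {version}")
        return list(self._snapshots[version])


orders = VersionedTable()
v1 = orders.commit([("order-1", "new")])
v2 = orders.commit([("order-1", "shipped")])

current = orders.read()      # latest state of the table
as_of_v1 = orders.read(v1)   # "time travel" back to the first commit
```

Because earlier snapshots are never modified, reading as of v1 still returns the row in its original state, which is what makes retrospective analysis and recovery possible.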

ACID Transaction Support

Open table formats support full Create, Read, Update, and Delete (CRUD) operations, ensuring data consistency and enabling data updates. Tables can be used to store and manage transactional data, enabling applications to track changes to data.

Efficiently Read and Write Data

Tables in AI Data Platform can be partitioned, allowing for efficient data access and processing, especially for large datasets.
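Why partitioning helps can be sketched in a few lines of Python. This is a simplified illustration, not the platform's storage layout. The function names, the event_date column, and the sample rows are hypothetical.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of one column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def read_partition(partitions, value):
    """Return only the partition matching `value`; other partitions are not scanned."""
    return partitions.get(value, [])


# Hypothetical sample data, for illustration only.
events = [
    {"event_date": "2024-01-01", "user": "a"},
    {"event_date": "2024-01-01", "user": "b"},
    {"event_date": "2024-01-02", "user": "c"},
]

by_date = partition_by(events, "event_date")
jan_first = read_partition(by_date, "2024-01-01")
```

A query that filters on the partition column reads a single group instead of the whole dataset, and the saving grows with the size of the table.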

Store and Process Unstructured Data

Create managed or external volumes to store unstructured data so that it can be processed using Apache Spark.