Data Catalog Metastore

As an OCI Data Flow user, you can access the Data Catalog Metastore to securely store and retrieve schema definitions for objects in unstructured and semi-structured data assets such as Object Storage.

For integration with OCI Data Flow, the Metastore provides an invocation endpoint that exposes the Hive Metastore interface. Apache Hive is a data warehousing framework for reading, writing, and managing large datasets that reside in distributed systems. A Hive Metastore is the central repository of metadata for a Hive cluster. It stores metadata for data structures such as databases, tables, and partitions in a relational database, backed by files maintained in Object Storage. Apache Spark SQL uses a Hive Metastore for this purpose.

Required IAM Policies

You must add policies that allow the metastore resource principal to access the storage locations.

As a prerequisite, create a dynamic group that includes the metastore. In the following policy statements, its OCID is represented by <dg-metastore-ocid>:
ALLOW dynamic-group <dg-metastore-ocid> to read buckets in tenancy where any {all {target.bucket.name='<managed-table-location-bucket>', request.region='<managed-table-location-bucket-region>'}, all {target.bucket.name='<external-table-location-bucket>', request.region='<external-table-location-bucket-region>'}}
ALLOW dynamic-group <dg-metastore-ocid> to manage objects in tenancy where all {target.bucket.name='<managed-table-location-bucket>', request.region='<managed-table-location-bucket-region>'}
ALLOW dynamic-group <dg-metastore-ocid> to read objects in tenancy where all {target.bucket.name='<external-table-location-bucket>', request.region='<external-table-location-bucket-region>'}
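
The three statements above differ only in the dynamic group OCID and in the bucket names and regions, so they can be generated from those values. The following Python sketch is one way to do that. The function name metastore_policy_statements and the sample values are hypothetical; review the output before you paste it into an IAM policy.

```python
def metastore_policy_statements(dg_ocid, managed_bucket, managed_region,
                                external_bucket, external_region):
    """Render the three metastore policy statements for one pair of buckets."""
    # Condition that matches the managed-table bucket in its region.
    managed = (f"all {{target.bucket.name='{managed_bucket}', "
               f"request.region='{managed_region}'}}")
    # Condition that matches the external-table bucket in its region.
    external = (f"all {{target.bucket.name='{external_bucket}', "
                f"request.region='{external_region}'}}")
    subject = f"ALLOW dynamic-group {dg_ocid}"
    return [
        # Read access to both buckets.
        f"{subject} to read buckets in tenancy where any {{{managed}, {external}}}",
        # Full object access in the managed location only.
        f"{subject} to manage objects in tenancy where {managed}",
        # Read-only object access in the external location.
        f"{subject} to read objects in tenancy where {external}",
    ]


# Placeholder values; substitute your own dynamic group, buckets, and regions.
for statement in metastore_policy_statements(
        "<dg-metastore-ocid>",
        "<managed-table-location-bucket>", "<managed-table-location-bucket-region>",
        "<external-table-location-bucket>", "<external-table-location-bucket-region>"):
    print(statement)
```

Called with the placeholders shown, the sketch prints the same three statements that are listed above.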

For information about the policies for Data Flow users, see Hive Metastore Policies.

Prerequisites

Before you create a metastore, you must create buckets in Oracle Object Storage to contain the Managed and External tables.
  • Managed Table: In a managed table, the metastore manages both the table data and the table schema.
  • External Table: In an external table, the metastore manages only the table schema.

When you create a metastore in Data Catalog, you provide the URLs of the buckets in which the managed and external tables are located.

Note

We recommend that you do not use the same location for managed and external tables. If both tables are in the same directory, deleting data from the managed table can also delete data from the external table.

For more information, see HDFS connector.
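
The recommendation to keep managed and external locations separate can be checked before you create the metastore. The following Python sketch compares two location URLs and reports whether one is the same as, or nested inside, the other. It assumes locations written in the form oci://<bucket>@<namespace>/<path>; that URL form, the function names, and the sample values are assumptions for illustration only.

```python
from urllib.parse import urlparse


def _normalize(url):
    """Split a location URL into (scheme, authority, path ending in '/')."""
    parts = urlparse(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return parts.scheme, parts.netloc, path


def locations_overlap(managed_url, external_url):
    """Return True if the two locations are identical or one contains the other."""
    m_scheme, m_auth, m_path = _normalize(managed_url)
    e_scheme, e_auth, e_path = _normalize(external_url)
    # Different buckets or namespaces can never share a directory.
    if (m_scheme, m_auth) != (e_scheme, e_auth):
        return False
    # The trailing '/' keeps '/managed/' from matching '/managed2/'.
    return m_path.startswith(e_path) or e_path.startswith(m_path)


# Hypothetical locations in the same bucket but in separate directories.
print(locations_overlap("oci://tables@mynamespace/managed",
                        "oci://tables@mynamespace/external"))
```

The check treats a bucket root such as oci://tables@mynamespace/ as overlapping with every directory inside that bucket, because deleting data at the root would affect everything beneath it.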

Creating a Metastore

You can create only one metastore in a compartment in your tenancy.

Here's how you create a metastore:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Data Catalog.
  2. From the Data Catalog service page, click Metastores.
  3. Click Create Metastore. The Create Metastore panel opens.
  4. In the panel, do the following:
    1. In Create in Compartment, select the compartment where you want to create the metastore.
    2. In the Name field, enter a name for the metastore.
    3. In the Default Managed Table Location field, enter the Object Store URL of the location of the managed table. For more information, see Prerequisites.
      Note

      We recommend that you do not use the same location for managed and external tables. If both tables are in the same directory, deleting data from the managed table can also delete data from the external table.
    4. In the Default External Table Location field, enter the Object Store URL of the location of the external table. For more information, see Prerequisites.
    5. (Optional) In the Tags section, add tags that can help to identify the metastore.
    6. Click Create.
The metastore is created and you can view it by clicking Metastores from the Data Catalog service page.
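
The console steps above collect a compartment, a name, two table locations, and optional tags. A scripted workflow would need the same inputs. The following Python sketch only assembles those inputs into a request body and does not call any service. The function name and the field names (compartmentId, displayName, defaultManagedTableLocation, defaultExternalTableLocation, freeformTags) are assumptions modeled on the console fields, so check them against the Data Catalog API reference before relying on them.

```python
import json
import warnings


def build_create_metastore_request(compartment_id, name, managed_location,
                                   external_location, tags=None):
    """Assemble the values entered in the Create Metastore panel into a dict."""
    if not name.strip():
        raise ValueError("The metastore name must not be empty.")
    # Mirrors the note in the panel: sharing one location is discouraged.
    if managed_location.rstrip("/") == external_location.rstrip("/"):
        warnings.warn("Managed and external tables share the same location.")
    body = {
        "compartmentId": compartment_id,                     # assumed field name
        "displayName": name,                                 # assumed field name
        "defaultManagedTableLocation": managed_location,     # assumed field name
        "defaultExternalTableLocation": external_location,   # assumed field name
    }
    if tags:
        body["freeformTags"] = dict(tags)                    # assumed field name
    return body


# Hypothetical values for illustration only.
request = build_create_metastore_request(
    "<compartment-ocid>",
    "my-metastore",
    "oci://managed-tables@mynamespace/",
    "oci://external-tables@mynamespace/",
    tags={"team": "analytics"},
)
print(json.dumps(request, indent=2))
```

Because only one metastore can exist in a compartment, a script built on this sketch would typically check for an existing metastore before sending a create request.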

Using the Metastore in OCI Data Flow

After you create a metastore in Data Catalog, you can create, manage, and run applications in OCI Data Flow that read from or write to the Hive Metastore. You can create the following types of applications in OCI Data Flow:

The Data Flow application now starts using the Metastore to store and retrieve the metadata of objects.

Viewing the Details of a Metastore

Here's how you can view the details of the metastore that you created:

  1. On the Data Catalog service page, click Metastores.
  2. Click the name of the metastore. Alternatively, click the Actions icon (three dots) for the metastore and select View Details. The metastore details page appears.
    The metastore details page displays the following details:
    • Metastore Information tab: This tab provides the following information:
      • Name of the metastore
      • OCID of the metastore - Click Show to view the OCID and click Copy to copy it.
        Note

        You can also copy the OCID from the Metastores main page. On the Metastores page, click the Actions icon (three dots) for the metastore and select Copy OCID.
      • The compartment in which the metastore is created
      • The URL of the default external table location - Click Show to view the URL and click Copy to copy it.
      • The Compartment OCID - Click Show to view the OCID and click Copy to copy it.
      • The date and time the metastore was created.
      • The date and time the metastore was last updated.
      • The URL of the default managed table location - Click Show to view the URL and click Copy to copy it.
    • Tags tab: This tab provides information about the tags that you defined while creating the metastore.

Editing the Metastore

You can edit only the name of the metastore.
Note

Before you edit the name, review its usage in other services. The new metastore name must be available to the service accessing this metastore.

Here's how you edit the name of a metastore:

  1. On the metastore details page, click Edit. The Edit Metastore panel opens.
    Alternatively, from the Data Catalog service page, click Metastores. On the Metastores page, click the Actions icon (three dots) for the metastore and select Edit.
  2. In the Name field, enter the new name for the metastore.
  3. Click Save Changes.
    The name of the metastore is updated.

Moving a Metastore

You can move the metastore to a different compartment.
Note

Before moving the metastore, review its usage in other services. The new compartment name must be available to the service accessing this metastore.

Here's how you move the metastore to a different compartment:

  1. On the metastore details page, click Move Resource. The Move Resource to a Different Compartment panel opens.
    Alternatively, from the Data Catalog service page, click Metastores. On the Metastores page, click the Actions icon (three dots) for the metastore and select Move Resource.
  2. From the Choose New Compartment dropdown, select the compartment to which you want to move the metastore.
  3. Click Move Resource.
    The metastore is moved to the selected compartment.

Deleting a Metastore

Here's how you delete a metastore:

  1. On the metastore details page, click Delete. The Delete Metastore panel opens.
    Alternatively, from the Data Catalog service page, click Metastores. On the Metastores page, click the Actions icon (three dots) for the metastore and select Delete.
  2. Type Delete to confirm.
    Note

    Before deleting the metastore, review its usage in other services. When you delete the metastore, services can no longer access it.
  3. Click Delete.
    The metastore is deleted.