Setting up Data Catalog Metastore

Data Flow is integrated with the Data Catalog Metastore, where the schema definitions for unstructured and semi-structured data are stored.

You can only create one metastore per tenancy. This constraint ensures a single source of truth for metadata. When creating a Data Catalog metastore, you specify both the managed-table-bucket location and the external-table-bucket location in Object Storage. As a best practice, keep these two locations different. The metastore assumes that it owns the data for managed tables. For external tables, the Hive-compatible metastore doesn't manage or own the underlying data. As a result, an operation such as DROP TABLE deletes both the data and the metadata for a managed table, but only the metadata for an external table.
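The difference in ownership can be summarized in a few lines of Python. This is only an illustrative model of the behavior described above, not metastore code; the names TableType and drop_table_removes are hypothetical and don't correspond to any Data Catalog or Spark API.

```python
from enum import Enum
from typing import FrozenSet


class TableType(Enum):
    """The two table kinds described above (hypothetical model)."""
    MANAGED = "managed"
    EXTERNAL = "external"


def drop_table_removes(table_type: TableType) -> FrozenSet[str]:
    """Return what a DROP TABLE removes for the given table type."""
    if table_type is TableType.MANAGED:
        # The metastore owns managed-table data, so the data goes too.
        return frozenset({"metadata", "data"})
    # External data isn't owned by the metastore and is left in place.
    return frozenset({"metadata"})


for table_type in TableType:
    print(table_type.value, sorted(drop_table_removes(table_type)))
```

This is also why keeping the two bucket locations separate is useful: dropping a managed table can never touch objects that belong to an external table.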

If you don't have a metastore, create one for use with Data Flow.
    1. From the Console navigation menu, select Data Catalog.
    2. On the Data Catalog page, select Metastores.
    3. Select Create Metastore.
    4. For Create in compartment, select dataflow-compartment.
    5. Enter a Name that's suitable for all users in your tenancy, as only one metastore is allowed per region.
    6. For Default Managed Table Location, enter the path to managed-table-bucket, using the format oci://managed-table-bucket@<your_objectstore_namespace>.
      For example, if the namespace is bigdatasciencelarge, enter oci://managed-table-bucket@bigdatasciencelarge.
    7. For Default External Table Location, enter the path to external-table-bucket, using the format oci://external-table-bucket@<your_objectstore_namespace>.
      For example, if the namespace is bigdatasciencelarge, enter oci://external-table-bucket@bigdatasciencelarge.
    8. Select Create.
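The two location values entered in steps 6 and 7 follow the same pattern, so they can be generated and sanity-checked before they're pasted into the Console. The following is a minimal sketch; the helper names bucket_uri and table_locations are hypothetical, and the bucket and namespace values are the ones used in the examples above.

```python
from typing import Tuple


def bucket_uri(bucket: str, namespace: str) -> str:
    """Build a location in the form oci://<bucket>@<namespace>."""
    if not bucket or not namespace:
        raise ValueError("bucket and namespace must both be non-empty")
    return f"oci://{bucket}@{namespace}"


def table_locations(
    namespace: str,
    managed_bucket: str = "managed-table-bucket",
    external_bucket: str = "external-table-bucket",
) -> Tuple[str, str]:
    """Return (managed, external) locations, refusing identical buckets."""
    if managed_bucket == external_bucket:
        # Best practice from this topic: keep the two locations different.
        raise ValueError("managed and external locations must differ")
    return (
        bucket_uri(managed_bucket, namespace),
        bucket_uri(external_bucket, namespace),
    )


managed, external = table_locations("bigdatasciencelarge")
print(managed)   # oci://managed-table-bucket@bigdatasciencelarge
print(external)  # oci://external-table-bucket@bigdatasciencelarge
```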
  • Use the create command and required parameters to create a metastore for use with Data Flow.

    oci data-catalog metastore create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.

  • Run the CreateMetastore operation to create a metastore to use with Data Flow.
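As a rough illustration of the CLI route, the sketch below assembles an argument list for oci data-catalog metastore create without running it. The flag names (--compartment-id, --default-managed-table-location, --default-external-table-location, --display-name) are assumptions based on the Console fields above and should be checked against the CLI Command Reference. The function name, the compartment OCID placeholder, and the display name are hypothetical.

```python
import shlex
from typing import List, Optional


def metastore_create_command(
    compartment_id: str,
    managed_location: str,
    external_location: str,
    display_name: Optional[str] = None,
) -> List[str]:
    """Assemble (but don't run) a metastore create command line."""
    argv = [
        "oci", "data-catalog", "metastore", "create",
        # Flag names below are assumed; verify them in the CLI reference.
        "--compartment-id", compartment_id,
        "--default-managed-table-location", managed_location,
        "--default-external-table-location", external_location,
    ]
    if display_name:
        argv += ["--display-name", display_name]
    return argv


command = metastore_create_command(
    compartment_id="<compartment_ocid>",  # placeholder, not a real OCID
    managed_location="oci://managed-table-bucket@bigdatasciencelarge",
    external_location="oci://external-table-bucket@bigdatasciencelarge",
    display_name="example-metastore",  # hypothetical name
)
# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the arguments as a list rather than a single string avoids quoting mistakes if the list is later passed to subprocess.run.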

Coarse-Grained Access Control in Data Catalog Metastore

The Data Catalog Metastore provides coarse-grained access control through the Identity and Access Management service, to prevent users from accidentally accessing or modifying resources created by another user. As an administrator, you can grant access to resources such as catalogs, databases, and tables using the predefined policies listed in the Resources List on the metastore details page. For more information, see the Data Catalog Metastore documentation.

Note: This feature isn't supported with Spark 2.4.4.