Configuring Spark to Access Data Catalog Metastore
- Access Apache Ambari.
- From the side toolbar, under Services, select Spark3.
- Select the Configs tab, and then expand the Advanced spark3-defaults section.
- Add or update the spark.sql.hive.metastore.jars key with the following value: /usr/lib/oci-dcat-metastore-client/lib/integration/*:/usr/lib/oci-dcat-metastore-client/lib/*:/usr/lib/hive/lib/*:{{spark_home}}/jars/*
- Add or update the spark.sql.warehouse.dir key with the Object Storage path for the managed table. Example: oci://bucket-name@tenancy-name-of-bucket/path/to/managed/table/directory
- Expand the Custom spark3-defaults section.
- Add or update the spark.driver.extraJavaOptions key with the following value: -Doracle.dcat.metastore.client.show_provider_details=true -Doracle.dcat.metastore.client.custom.authentication_provider=com.oracle.pic.dcat.metastore.commons.auth.provider.UserPrincipalsCustomAuthenticationDetailsProvider -DOCI_TENANT_METADATA=ocid1.tenancy.oc1.<unique_ID> -DOCI_REGION_METADATA=<region-identifier> -DOCI_USER_METADATA=ocid1.user.oc1.<unique_ID> -DOCI_FINGERPRINT_METADATA=<user-fingerprint> -DOCI_PVT_KEY_FILE_PATH=<private-key-file-path.pem> -DOCI_PASSPHRASE_METADATA="<passphrase-of-the-key>" -Doci.metastore.uris=https://datacatalog.<region-identifier>.oci.oraclecloud.com:443 -Doracle.dcat.metastore.id=ocid1.datacatalogmetastore.oc1.<unique_ID>
- Add or update the spark.hadoop.fs.AbstractFileSystem.oci.impl key with the value com.oracle.bmc.hdfs.Bmc.
- Add or update the spark.hadoop.fs.oci.client.hostname key with the Object Storage URL. Example: https://objectstorage.<region-identifier>.oraclecloud.com
- Expand the Custom spark3-hive-site-override section.
- Add or update the hive.metastore.uris key with the URL of the metastore. Example: https://datacatalog.<region-identifier>.oci.oraclecloud.com:443
- Add or update the hive.metastore.warehouse.dir key with the Object Storage path for the managed table. Example: oci://bucket-name@tenancy-name-of-bucket/path/to/managed/table/directory
- Add or update the hive.metastore.warehouse.external.dir key with the Object Storage path for the external table. Example: oci://bucket-name@tenancy-name-of-bucket/path/to/external/table/directory
- Expand the Advanced spark3-thrift-sparkconf section.
- Add or update the spark.sql.hive.metastore.jars key with the following value: /usr/lib/oci-dcat-metastore-client/lib/integration/*:/usr/lib/oci-dcat-metastore-client/lib/*:/usr/lib/hive/lib/*:{{spark_home}}/jars/*
  Note: Ensure the value doesn't include {{hadoop_home}}/lib/*.
- Expand the Custom spark3-thrift-sparkconf section.
- Add or update the spark.driver.extraJavaOptions key with the following value: -Doracle.dcat.metastore.client.show_provider_details=true -Doracle.dcat.metastore.client.custom.authentication_provider=com.oracle.pic.dcat.metastore.commons.auth.provider.UserPrincipalsCustomAuthenticationDetailsProvider -DOCI_TENANT_METADATA=ocid1.tenancy.oc1.<unique_ID> -DOCI_REGION_METADATA=<region-identifier> -DOCI_USER_METADATA=ocid1.user.oc1.<unique_ID> -DOCI_FINGERPRINT_METADATA=<user-fingerprint> -DOCI_PVT_KEY_FILE_PATH=<private-key-file-path.pem> -DOCI_PASSPHRASE_METADATA="<passphrase-of-the-key>" -Doci.metastore.uris=https://datacatalog.<region-identifier>.oci.oraclecloud.com:443 -Doracle.dcat.metastore.id=ocid1.datacatalogmetastore.oc1.<unique_ID>
- Select Restart to restart the Spark service in the Big Data Service cluster. After the restart completes, you can verify the connection from a Spark session, as in the sketches below.
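
Once the service is back up, a quick way to confirm that Spark is reading metadata from the Data Catalog Metastore is to open a Hive-enabled session and list databases. Below is a minimal PySpark sketch, assuming the Ambari configuration above has been applied and Spark3 restarted; the table name sample_check is a hypothetical example.

```python
from pyspark.sql import SparkSession

# Assumes the Ambari settings above are already in spark-defaults and the
# Spark3 service has been restarted, so no extra .config() calls are needed.
spark = (
    SparkSession.builder
    .appName("dcat-metastore-check")
    .enableHiveSupport()
    .getOrCreate()
)

# Databases should now be served by the Data Catalog Metastore rather than a
# cluster-local Hive metastore.
spark.sql("SHOW DATABASES").show()

# Creating a managed table should place its data under the oci:// path set in
# spark.sql.warehouse.dir / hive.metastore.warehouse.dir above.
spark.sql("CREATE TABLE IF NOT EXISTS sample_check (id INT, name STRING)")

# The Location row of this output should point at the managed-table directory.
spark.sql("DESCRIBE FORMATTED sample_check").show(truncate=False)
```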
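
External tables work the same way, except that their location must sit under the path configured in hive.metastore.warehouse.external.dir. The sketch below makes the same assumptions as the previous one; the table name and the sample_external subdirectory are illustrative, and the oci:// path must match the external directory you configured.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dcat-external-check")
    .enableHiveSupport()
    .getOrCreate()
)

# A hypothetical external table. The LOCATION must resolve under the
# hive.metastore.warehouse.external.dir path configured above; dropping the
# table later removes only the metadata, not the files in Object Storage.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sample_external (id INT, name STRING)
    LOCATION 'oci://bucket-name@tenancy-name-of-bucket/path/to/external/table/directory/sample_external'
""")

# Both the managed and the external table should now be visible.
spark.sql("SHOW TABLES").show()
```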