Configuring Spark to Access Data Catalog Metastore

  1. Access Apache Ambari.
  2. From the side toolbar, under Services, select Spark3.
  3. Select the Configs tab, and then expand the Advanced spark3-defaults section.
  4. Add or update the spark.sql.hive.metastore.jars key with the following value:
    /usr/lib/oci-dcat-metastore-client/lib/integration/*:/usr/lib/oci-dcat-metastore-client/lib/*:/usr/lib/hive/lib/*:{{spark_home}}/jars/*
  5. Add or update the spark.sql.warehouse.dir key with the Object Storage path for managed tables. Example: oci://bucket-name@namespace-of-bucket/path/to/managed/table/directory.
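    Steps 4 and 5 amount to the following spark3-defaults fragment (a sketch; the bucket and path placeholders are illustrative, not real values):

    ```
    spark.sql.hive.metastore.jars=/usr/lib/oci-dcat-metastore-client/lib/integration/*:/usr/lib/oci-dcat-metastore-client/lib/*:/usr/lib/hive/lib/*:{{spark_home}}/jars/*
    spark.sql.warehouse.dir=oci://bucket-name@namespace/path/to/managed/table/directory
    ```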
  6. Expand the Custom spark3-defaults section.
  7. Add or update the spark.driver.extraJavaOptions key with the following value:
    -Doracle.dcat.metastore.client.show_provider_details=true -Doracle.dcat.metastore.client.custom.authentication_provider=com.oracle.pic.dcat.metastore.commons.auth.provider.UserPrincipalsCustomAuthenticationDetailsProvider -DOCI_TENANT_METADATA=ocid1.tenancy.oc1.<unique_ID> -DOCI_REGION_METADATA=<region-identifier> -DOCI_USER_METADATA=ocid1.user.oc1.<unique_ID> -DOCI_FINGERPRINT_METADATA=<user-fingerprint> -DOCI_PVT_KEY_FILE_PATH=<private-key-file-path.pem> -DOCI_PASSPHRASE_METADATA="<passphrase-of-the-key>" -Doci.metastore.uris=https://datacatalog.<region-identifier>.oci.oraclecloud.com:443 -Doracle.dcat.metastore.id=ocid1.datacatalogmetastore.oc1.<unique_ID>
  8. Add or update the spark.hadoop.fs.AbstractFileSystem.oci.impl key with the value com.oracle.bmc.hdfs.Bmc.
  9. Add or update the spark.hadoop.fs.oci.client.hostname key with the Object Storage URL. Example: https://objectstorage.<region-identifier>.oraclecloud.com.
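    Taken together, steps 7 through 9 correspond to these Custom spark3-defaults properties (a sketch; OCIDs, region, and key paths are placeholders, and the extraJavaOptions value is abbreviated to its first flag — use the full -D list from step 7):

    ```
    spark.driver.extraJavaOptions=-Doracle.dcat.metastore.client.show_provider_details=true ...
    spark.hadoop.fs.AbstractFileSystem.oci.impl=com.oracle.bmc.hdfs.Bmc
    spark.hadoop.fs.oci.client.hostname=https://objectstorage.<region-identifier>.oraclecloud.com
    ```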
  10. Expand the Custom spark3-hive-site-override section.
  11. Add or update the hive.metastore.uris key with the URL of the metastore. Example: https://datacatalog.<region-identifier>.oci.oraclecloud.com:443.
  12. Add or update the hive.metastore.warehouse.dir key with the Object Storage path for managed tables. Example: oci://bucket-name@namespace-of-bucket/path/to/managed/table/directory.
  13. Add or update the hive.metastore.warehouse.external.dir key with the Object Storage path for external tables. Example: oci://bucket-name@namespace-of-bucket/path/to/external/table/directory.
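    Steps 11 through 13 can be sketched as the following Custom spark3-hive-site-override key/value pairs (placeholders are illustrative):

    ```
    hive.metastore.uris=https://datacatalog.<region-identifier>.oci.oraclecloud.com:443
    hive.metastore.warehouse.dir=oci://bucket-name@namespace/path/to/managed/table/directory
    hive.metastore.warehouse.external.dir=oci://bucket-name@namespace/path/to/external/table/directory
    ```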
  14. Expand the Advanced spark3-thrift-sparkconf section.
  15. Add or update the spark.sql.hive.metastore.jars key with the following value:
    /usr/lib/oci-dcat-metastore-client/lib/integration/*:/usr/lib/oci-dcat-metastore-client/lib/*:/usr/lib/hive/lib/*:{{spark_home}}/jars/*
    Note: Ensure the value does not include :{{hadoop_home}}/lib/*.
  16. Expand the Custom spark3-thrift-sparkconf section.
  17. Add or update the spark.driver.extraJavaOptions key with the following value:
    -Doracle.dcat.metastore.client.show_provider_details=true -Doracle.dcat.metastore.client.custom.authentication_provider=com.oracle.pic.dcat.metastore.commons.auth.provider.UserPrincipalsCustomAuthenticationDetailsProvider -DOCI_TENANT_METADATA=ocid1.tenancy.oc1.<unique_ID> -DOCI_REGION_METADATA=<region-identifier> -DOCI_USER_METADATA=ocid1.user.oc1.<unique_ID> -DOCI_FINGERPRINT_METADATA=<user-fingerprint> -DOCI_PVT_KEY_FILE_PATH=<private-key-file-path.pem> -DOCI_PASSPHRASE_METADATA="<passphrase-of-the-key>" -Doci.metastore.uris=https://datacatalog.<region-identifier>.oci.oraclecloud.com:443 -Doracle.dcat.metastore.id=ocid1.datacatalogmetastore.oc1.<unique_ID>
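    Because the spark.driver.extraJavaOptions value is long and easy to mistype, a quick local sanity check can help before restarting the service. The following is a minimal sketch (a hypothetical helper, not part of the metastore client) that parses the -D option string and reports any required property that is missing:

    ```python
    # Hypothetical helper: verify a spark.driver.extraJavaOptions string
    # defines every -D property the Data Catalog metastore client expects.
    # Note: splitting on whitespace assumes no -D value contains spaces
    # (e.g. an unquoted passphrase with spaces would break this parse).

    REQUIRED = {
        "oracle.dcat.metastore.client.custom.authentication_provider",
        "OCI_TENANT_METADATA",
        "OCI_REGION_METADATA",
        "OCI_USER_METADATA",
        "OCI_FINGERPRINT_METADATA",
        "OCI_PVT_KEY_FILE_PATH",
        "oci.metastore.uris",
        "oracle.dcat.metastore.id",
    }

    def parse_java_options(opts: str) -> dict:
        """Split a '-Dkey=value ...' string into a {key: value} dict."""
        props = {}
        for token in opts.split():
            if token.startswith("-D") and "=" in token:
                key, _, value = token[2:].partition("=")
                props[key] = value
        return props

    def missing_properties(opts: str) -> set:
        """Return the required property names absent from the options string."""
        return REQUIRED - parse_java_options(opts).keys()
    ```

    Running missing_properties against the value you paste into Ambari should return an empty set; any names it returns point at flags to add before restarting Spark.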
  18. Select Restart to restart the Spark service in the Big Data Service cluster.