Query External Data on Oracle Compute Cloud at Customer

Describes packages and tools to allow the Autonomous Database deployments on Exadata Cloud@Customer to load, query, and manage unstructured data by building external tables and external partitioned tables from objects in Data Lakes.

You can use the DBMS_CLOUD package to configure Autonomous Database deployments on Exadata Cloud@Customer to access the unstructured data stored on the Oracle Compute Cloud at Customer (C3) Object Storage Bucket.

Overview

You require a data platform that can extract business insights from structured and unstructured data. The combination of structured data available in a data warehouse and unstructured data available in a Data Lake is known as Data Lakehouse.

An Autonomous Data Lakehouse allows data analytics on both structured and unstructured data. The structured data is stored in an Autonomous Data Warehouse, while the unstructured data is stored in a Data Lake configured on C3 Object Storage Buckets. The Autonomous Data Warehouse can query, load, and manage objects in the Data Lake by using a database package called DBMS_CLOUD. This package allows the Autonomous Database to create external and external partitioned tables on data lake objects such as csv, txt, Avro, and Parquet files. You can build Autonomous Data Lakehouses by combining the capabilities of Autonomous Database deployments on Exadata Cloud@Customer and Oracle C3.

An Autonomous Data Lakehouse has two main components: An Autonomous Data Warehouse and a Data Lake. The Autonomous Data Warehouse is built using Autonomous Database deployments on Exadata Cloud@Customer, while the Data Lake is built using C3 Object Storage Buckets. The Autonomous Database hosts the structured relational data, while the Data Lake hosts collections of unstructured data in the form of txt, csv, Avro, Parquet, and other file types. You can query the Autonomous Database, and the database determines where to find the results, whether in the structured data, the unstructured data, or a combination of the two. You can use the DBMS_CLOUD package to configure Autonomous Database deployments on Exadata Cloud@Customer to load, query, and manage unstructured data by building external and external partitioned tables from objects in Data Lakes.

Prerequisites

Follow the steps below to configure C3 Object Storage Buckets and to provision various Autonomous Database components. You need to finish these steps before configuring Autonomous Database to communicate with C3 Object Storage Buckets.
  • Create a C3 Object Storage Bucket.
  • Create a user and group. Add the user to the group and configure the required policies to allow the user to manage files (add/read/remove) to the C3 Object Storage Bucket.
    Example:
    Allow group <group_name> to manage objects in tenancy where
    target.bucket.name='<new_bucket_name_created>'
  • Create an API-Key for the user created. From the API-key generation process, note the following information: user_ocid, tenancy_ocid, fingerprint, region. In addition, save the Public and Private PEM key files created during the API-key generation process. This information is required to configure the DBMS_CLOUD package later.
  • Download the https certificate authority chain as shown below. This certificate authority chain allows client applications to authenticate the C3 Object Storage Bucket via https protocol. The https certificates are usually saved on a crt file, for example c3_ca.crt. Obtain the certificate authority chain by pointing a browser to the following URL: https://iaas.domainname/cachain where domainname is the name of the DNS domain configured during the C3 infrastructure deployment.
  • Create a new Compute VM. Once the new VM starts, copy the Private PEM key file and the C3 https certificate file created in the above steps to the VM.
  • Create an Exadata Infrastructure Resource.
  • Create an Autonomous VM Cluster.
  • Create an Autonomous Container Database (ACD).
  • Create an Autonomous Database.

Configuring Autonomous Database to communicate with the C3 Object Storage Bucket

You need to complete the following two tasks for the Autonomous Database to communicate with the C3 Object Storage Bucket:
  • Add the C3 Object Storage https certificate to the Autonomous VM Cluster Grid Infrastructure TCPS Wallet. For example, the wallet location is /var/opt/oracle/dbaas_acfs/grid/tcps_wallets . This is needed so that the Autonomous Database can authenticate the https connection to the C3 Object Storage.
  • Add an entry to the C##CLOUD$SERVICE_dbms_cloud_store table on the ACD created above. This is to tell the DBMS_CLOUD package that the C3 Object Storage bucket is a valid cloud store. A sample SQL command to do this is:
    SQL>INSERT INTO C##CLOUD$SERVICE.dbms_cloud_store VALUES('ORACLE_BMC','<objectstorage.mydomain.com>',null,1);
In Autonomous Databases, you cannot complete the above two tasks as a regular user. Oracle Cloud Operations needs to run these two tasks on behalf of the regular user. You need to log in to My Oracle Support (MOS) to create a new Support Request for Oracle Cloud Operations to perform the two tasks. You need to add the following information to the ticket. You will get these values after you complete the steps in .
  • OCID of Exadata Infrastructure resource
  • OCID of the Autonomous VM Cluster
  • OCID of the ACD
  • C3 Object Storage URL
  • C3 Object Storage https certificate file

After Cloud Operations completes the above tasks, log in to the Autonomous Database. Use the DBMS_CLOUD.CREATE_CREDENTIAL procedure to provide the Autonomous Database with the authentication information required to connect to the C3 Object Storage Bucket. See CREATE_CREDENTIAL for more details.