Query External Partitioned Data with Hive Format Source File Organization

Use DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE to create an external partitioned table and generate the partitioning information from the Cloud Object Store file path.

Consider the following sample source files in Object Store:

custsales/month=2019-01/custsales-2019-01.csv

custsales/month=2019-02/custsales-2019-02.csv

custsales/month=2019-03/custsales-2019-03.csv

With this naming, the values for month are captured within the object name.

To create a partitioned external table with data stored in this sample Hive format, do the following:

  1. Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL (
        credential_name => 'DEF_CRED_NAME',
        username => 'adb_user@example.com',
        password => 'password' );
    END;
    /

    Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.

    See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.

  2. Create an external partitioned table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE.

    The procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE supports external partitioned files in the supported cloud object storage services. The credential is a table level property; therefore, the external files must all be on the same cloud object store.

    For example:

    BEGIN
        DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
            TABLE_NAME => 'sales_sample',
            CREDENTIAL_NAME => 'DEF_CRED_NAME',
            FILE_URI_LIST => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_landing/o/sales_sample/*.parquet',
            FORMAT => '{"type":"parquet", "schema": "first","partition_columns":[{"name":"month","type":"varchar2(100)"}]}');
    END;
    /

    The parameters are:

    • table_name: is the external table name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma-delimited list of source file URIs. There are two options for this list:

      • Specify a comma-delimited list of individual file URIs without wildcarding.

      • Specify a single file URI with wildcards, where the wildcards can only be after the last slash "/". The character "*" can be used as the wildcard for multiple characters, the character "?" can be used as the wildcard for a single character.

    • column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path).

      The column_list is not required when the data files are structured files (Parquet, Avro, or ORC).

    • format: defines the options you can specify to describe the format of the source file. The partition_columns format parameter specifies the names of the partition columns.

      If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.

      See DBMS_CLOUD Package Format Options for more information.

    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.

    The DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE call would result in the following table definition:

    CREATE TABLE "ADMIN"."SALES_SAMPLE"
       ( "DAY_ID" TIMESTAMP (6),
          "GENRE_ID" NUMBER(19,0),
          "MOVIE_ID" NUMBER(19,0),
          "CUST_ID" NUMBER(19,0),
          "APP" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
          "DEVICE" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
          "OS" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
          "PAYMENT_METHOD" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
          "LIST_PRICE" BINARY_DOUBLE,
          "DISCOUNT_TYPE" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
          "DISCOUNT_PERCENT" BINARY_DOUBLE,
          "ACTUAL_PRICE" BINARY_DOUBLE,
          "MONTH" VARCHAR2(100 BYTE) COLLATE "USING_NLS_COMP"
       ) DEFAULT COLLATION "USING_NLS_COMP"
       ORGANIZATION EXTERNAL
        ( TYPE ORACLE_BIGDATA
          DEFAULT DIRECTORY "DATA_PUMP_DIR"
          ACCESS PARAMETERS
          ( com.oracle.bigdata.fileformat=parquet
    com.oracle.bigdata.filename.columns=["month"]
    com.oracle.bigdata.file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_landing/o/sales_sample/*.parquet"
    com.oracle.bigdata.credential.schema="ADMIN"
    com.oracle.bigdata.credential.name=CRED_OCI
    com.oracle.bigdata.trimspaces=notrim
      )
        )
       REJECT LIMIT 0
      PARTITION BY LIST ("MONTH")
     (PARTITION "P1" VALUES (('2019-01'))
          LOCATION
           ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_landing/o/sales_sample/month=2019-01/*.parquet'
           ),
     PARTITION "P2" VALUES (('2019-02'))
          LOCATION
           ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_landing/o/sales_sample/month=2019-02/*.parquet'
           ))
      PARALLEL ;

    See CREATE_EXTERNAL_PART_TABLE Procedure for detailed information about the parameters.

    See DBMS_CLOUD URI Formats for more information on the supported cloud object storage services.

  3. You can now run queries on the external partitioned table you created in the previous step.

    Your Autonomous Database takes advantage of the partitioning information of your external partitioned table, ensuring that the query only accesses the relevant data files in the Object Store.

    For example:

    SELECT movie_id, month FROM sales WHERE month='2019-02'

    The external partitioned tables you create with DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE include two invisible columns file$path and file$name. These columns help identify which file a record is coming from. See External Table Metadata Columns for more information.

If there are any rows in the source files that do not match the format options you specified, the query reports an error. You can use DBMS_CLOUD parameters, like rejectlimit, to suppress these errors. As an alternative, you can also validate the external partitioned table you created to see the error messages and the rejected rows so that you can change your format options accordingly. See Validate External Data and Validate External Partitioned Data for more information.