74 DBMS_HADOOP

The DBMS_HADOOP package provides a PL/SQL procedure called CREATE_EXTDDL_FOR_HIVE(), that creates an Oracle external table for a given hive table.

Big Data SQL needs to be correctly set up for DBMS_HADOOP to work.

This chapter contains the following topics:

74.1 DBMS_HADOOP Overview

The DBMS_HADOOP package provides two procedures for creating an Oracle external table and for synchronizing the Oracle external table partitions.

These procedures are:

  • CREATE_EXTDDL_FOR_HIVE() — creates an Oracle external table for a given hive table

  • SYNCHRONIZE_PARTITIONS_FOR_HIVE() — helps to synchronize the Oracle external table partitions with those in the corresponding hive table

74.2 DBMS_HADOOP Security Model

Users must have ALTER privileges on the table for modifying external table parameters. In addition the ALTER privileges, users must have READ privileges for the directory object that contains the external data source and WRITE privileges for the directory objects containing bad, log, and discard files.

Similar privileges are applicable to the partitioned external tables.

74.3 Summary of DBMS_HADOOP Subprograms

DBMS_HADOOP includes the CREATE_EXTDDL_FOR_HIVE procedure and SYNC_PARTITIONS_FOR_HIVE procedure subprograms.

Table 74-1 DBMS_HADOOP Subprograms

Subprogram Description

CREATE_EXTDDL_FOR_HIVE Procedure

Given a hive table name, creates a text of DDL that can be executed to create an external table corresponding to the hive table

SYNC_PARTITIONS_FOR_HIVE Procedure

Synchronizes the existing partitioning definition of a hive table in Oracle catalog

74.3.1 CREATE_EXTDDL_FOR_HIVE Procedure

This procedure creates an Oracle external table for a given hive table.

Syntax

DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE (
   cluster_id         IN  VARCHAR2,
   db_name            IN  VARCHAR2 := NULL,
   hive_table_name    IN  VARCHAR2,
   hive_partition     IN  BOOLEAN,
   table_name         IN  VARCHAR2 := NULL,
   perform_DDL        IN  BOOLEAN := FALSE,
   text_of_DDL        OUT CLOB);

Parameters

Table 74-2 CREATE_EXTDDL_FOR_HIVE Procedure Parameters

Parameter Description

cluster_id

Hadoop cluster ID

hive_db_name

Database where the hive table is located

hive_table_name

Name of the hive table

hive_partition

If this argument is TRUE and if the original hive table is partitioned, the corresponding Oracle external table will also be partitioned using the same partition key(s).

If the original hive table is not partitioned, hive_partition=TRUE has no effect. If hive_partition=FALSE, the corresponding Oracle external table will not be partitioned even if the original hive table is partitioned.

table_name

Name of the Oracle external table

perform_DDL

If this argument is TRUE, the external table will be automatically created. Otherwise, only the textual representation of the DDL statement will be generated and returned in text_of_DDL .

text_of_DDL

If the argument perform_DDL is FALSE, only the textual representation of the DDL statement will be generated and returned in text_of_DDL .

74.3.2 SYNCHRONIZE_PARTITIONS_FOR_HIVE Procedure

This procedure synchronizes the Oracle external table partitions with those in the corresponding hive table.

Syntax

DBMS_HADOOP.SYNCHRONIZE_PARTITIONS_FOR_HIVE (
   table_name     IN  VARCHAR2,
   table_owner    IN  VARCHAR2);

Parameters

Table 74-3 SYNCHRONIZE_PARTITIONS_FOR_HIVE Procedure Parameters

Parameter Description

table_name

Oracle external table

table_owner

Schema name