9.2.11 Apache Hive

Integrating with Hive

The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) release does not include a dedicated Hive storage handler because the HDFS Handler provides all of the necessary Hive functionality.

The HDFS Handler can integrate with Hive to create tables and to update table definitions when DDL events occur. This integration is limited to data formatted in the Avro Object Container File format. For more information, see Writing in HDFS in Avro Object Container File Format and HDFS Handler Configuration.
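As an illustrative sketch only (the handler name hdfs, the paths, and the JDBC URL below are assumptions; verify exact property names against the HDFS Handler Configuration reference for your GG for DAA release), a handler configuration that writes Avro OCF data suitable for the Hive integration might resemble:

```
# Sketch of HDFS Handler properties for Avro OCF output (verify against your release).
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
# Avro Object Container File formatting is required for the Hive integration.
gg.handler.hdfs.format=avro_row_ocf
# Root HDFS directory where .avro files are written (illustrative path).
gg.handler.hdfs.rootFilePath=/ogg
# If your release supports direct Hive integration over JDBC, a HiveServer2
# connection property similar to the following may also apply (illustrative URL).
gg.handler.hdfs.hiveJdbcUrl=jdbc:hive2://localhost:10000/default
```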

For Hive to consume sequence files, the Hive table must be created with the STORED AS SEQUENCEFILE clause in its DDL. The following is a sample CREATE TABLE script:

CREATE EXTERNAL TABLE table_name (
  col1 string,
  ...
  ...
  col2 string)
ROW FORMAT DELIMITED
STORED AS SEQUENCEFILE
LOCATION '/path/to/hdfs/file';

Note:

If files are intended to be consumed by Hive, then the gg.handler.name.partitionByTable property should be set to true.
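For example, a minimal sketch (assuming a handler named hdfs; property names should be verified against your release):

```
# Write sequence files, partitioned into one HDFS directory per source table
# so that each Hive table maps to a single directory.
gg.handler.hdfs.format=sequencefile
gg.handler.hdfs.partitionByTable=true
```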

Apache Hive Table Creation for Avro OCF

Use the following Apache Hive DDL to register external tables over the HDFS location that contains Avro OCF data:

CREATE EXTERNAL TABLE my_avro_ocf_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hive/schema/'
TBLPROPERTIES (
  'avro.schema.url' = 'hdfs:///user/hive/schema/QASOURCE.TCUSTMER.avsc'
);

Note:

  • LOCATION: Points to the HDFS directory where Oracle GoldenGate writes .avro files.

  • avro.schema.url: Path to the Avro schema (.avsc) file that describes the structure of the records written by Oracle GoldenGate.
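Once the external table is registered, it can be queried directly in Hive; for example (using the table name from the sample DDL above):

```
SELECT * FROM my_avro_ocf_table LIMIT 10;
```

Because the table is EXTERNAL, dropping it in Hive removes only the metadata; the .avro files written by Oracle GoldenGate remain in HDFS.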