15 Using the Metadata Providers

Learn how to use the Metadata Providers, which can replicate from a source to a target using a Replicat parameter file.

Topics:

15.1 About the Metadata Providers

Metadata Providers work only if handlers are configured to run with a Replicat process.

The Replicat process maps source tables to target tables and source columns to target columns using syntax in the Replicat configuration file. The source metadata definitions are included in the Oracle GoldenGate trail file (or provided by source definitions files in Oracle GoldenGate releases prior to 12.2). When the replication target is a database, the Replicat process obtains the target metadata definitions from the target database. However, when pushing data to Big Data applications, or during Java delivery in general, the target typically provides no metadata, so Replicat mapping is not possible. The metadata providers exist to address this deficiency. You can use a metadata provider to define target metadata using Avro, Hive, or JDBC, which enables Replicat mapping of source table to target table and source column to target column.

The use of a metadata provider is optional and is enabled if the gg.mdp.type property is specified in the Java Adapter properties file. If the metadata included in the source Oracle GoldenGate trail file is acceptable for output, then do not use a metadata provider. Use a metadata provider in the following cases:

  • You need to map source table names into target table names that do not match.

  • You need to map source column names into target column names that do not match.

  • You need to include certain columns from the source trail file and omit other columns.

A limitation of Replicat mapping is that the mapping defined in the Replicat configuration file is static. Oracle GoldenGate provides functionality for DDL propagation when using an Oracle database as the source. The proper handling of schema evolution can be problematic when the Metadata Provider and Replicat mapping are used. Consider your use cases for schema evolution and plan for how you want to update the Metadata Provider and the Replicat mapping syntax for required changes.

For every table mapped in Replicat using COLMAP, the metadata is retrieved from the configured metadata provider, and the retrieved metadata is then used by Replicat for column mapping.

The Avro, Hive, and JDBC Metadata Providers are supported, and you must choose one of them for your metadata provider implementation.

Scenarios - When to use a metadata provider

  1. The following scenarios do not require a metadata provider to be configured:

    A mapping in which the source schema named GG is mapped to the target schema named GGADP

    A mapping in which both the schema and table name change, whereby the source table GG.TCUSTMER is mapped to the target table GG_ADP.TCUSTMER_NEW

    MAP GG.*, TARGET GGADP.*;
    (OR)
    MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW;
    
  2. The following scenario requires a metadata provider to be configured:

    A mapping in which the source column name does not match the target column name. For example, a source column of CUST_CODE mapped to a target column of CUST_CODE_NEW.

    MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW, COLMAP(USEDEFAULTS, CUST_CODE_NEW=CUST_CODE, CITY2=CITY);
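
    If this mapping is resolved with the Avro Metadata Provider (see Avro Metadata Provider), the target metadata would come from a schema file named GG_ADP.TCUSTMER_NEW.mdp.avsc. A minimal sketch follows, assuming string columns and CUST_CODE_NEW as the primary key; the column list is illustrative:

    {"namespace": "GG_ADP",
    "type": "record",
    "name": "TCUSTMER_NEW",
    "fields": [
         {"name": "CUST_CODE_NEW", "type": "string", "primary_key":true},
         {"name": "CITY2", "type": "string"}
    ]
    }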
    

15.2 Avro Metadata Provider

The Avro Metadata Provider is used to retrieve the table metadata from Avro schema files. For every table mapped in Replicat using COLMAP, the metadata is retrieved from the corresponding Avro schema, and the retrieved metadata is then used by Replicat for column mapping.

Topics:

15.2.1 Detailed Functionality

The Avro Metadata Provider uses Avro schema definition files to retrieve metadata. Avro schemas are defined using JSON. For each table mapped in the process_name.prm file, you must create a corresponding Avro schema definition file. For information about how to define Avro schemas, see Defining a Schema.

Avro Metadata Provider Schema Definition Syntax

{"namespace": "[$catalogname.]$schemaname",
"type": "record",
"name": "$tablename",
"fields": [
     {"name": "$col1", "type": "$datatype"},
     {"name": "$col2 ",  "type": "$datatype ", "primary_key":true}, 
     {"name": "$col3", "type": "$datatype ", "primary_key":true}, 
     {"name": "$col4", "type": ["$datatype","null"]}   
   ]
}
 
namespace            - name of the catalog/schema being mapped
name                 - name of the table being mapped
fields.name          - name of each column (the fields array holds one entry per column)
fields.type          - data type of the column
fields.primary_key   - indicates that the column is part of the primary key

Representing nullable and not nullable columns:

"type":"$datatype" - indicates the column is not nullable, where "$datatype" is the actual datatype.
"type": ["$datatype","null"] - indicates the column is nullable, where "$datatype" is the actual datatype

The names of schema files that are accessed by the Avro Metadata Provider must be in the following format:

[$catalogname.]$schemaname.$tablename.mdp.avsc
 
$catalogname    - name of the catalog, if one exists
$schemaname     - name of the schema
$tablename      - name of the table
.mdp.avsc       - constant suffix that must always be appended
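
For example, the schema file for a target table GG_AVRO.TCUSTMER_AVRO (with no catalog) would be named:

GG_AVRO.TCUSTMER_AVRO.mdp.avsc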

Supported Avro Data Types

  • boolean

  • bytes

  • double

  • float

  • int

  • long

  • string

See https://avro.apache.org/docs/1.7.5/spec.html#schema_primitive.

15.2.2 Runtime Prerequisites

Before you start the Replicat process, create Avro schema definitions for all tables mapped in Replicat's parameter file.

15.2.3 Classpath Configuration

The Avro Metadata Provider requires no additional classpath setting.

15.2.4 Avro Metadata Provider Configuration

gg.mdp.type
    Required. Legal value: avro. Default: none.
    Selects the Avro Metadata Provider.

gg.mdp.schemaFilesPath
    Required. Legal value: a valid directory path (for example, /home/user/ggadp/avroschema/). Default: none.
    The path to the directory containing the Avro schema files.

gg.mdp.charset
    Optional. Legal value: a valid character set. Default: UTF-8.
    Specifies the character set of columns with a character data type. Used to convert the source data from the trail file to the correct target character set.

gg.mdp.nationalCharset
    Optional. Legal value: a valid character set. Default: UTF-8.
    Specifies the character set of columns with a national character data type, such as NCHAR and NVARCHAR columns in an Oracle database. Used to convert the source data from the trail file to the correct target character set.

15.2.5 Review a Sample Configuration

This is an example for configuring the Avro Metadata Provider. Consider a source that includes the following table:

TABLE GG.TCUSTMER {
     CUST_CODE VARCHAR(4) PRIMARY KEY,
     NAME VARCHAR(100),
     CITY VARCHAR(200),
     STATE VARCHAR(200)
}

This example maps the column CUST_CODE (GG.TCUSTMER) in the source to CUST_CODE2 (GG_AVRO.TCUSTMER_AVRO) on the target and the column CITY (GG.TCUSTMER) in the source to CITY2 (GG_AVRO.TCUSTMER_AVRO) on the target. Therefore, the mapping in the process_name.prm file is:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY);
 

In this example the mapping definition is as follows:

  • Source schema GG is mapped to target schema GG_AVRO.

  • Source column CUST_CODE is mapped to target column CUST_CODE2.

  • Source column CITY is mapped to target column CITY2.

  • USEDEFAULTS specifies that the rest of the column names are the same on both source and target (the NAME and STATE columns).

This example uses the following Avro schema definition file:

File path: /home/ggadp/avromdp/GG_AVRO.TCUSTMER_AVRO.mdp.avsc

{"namespace": "GG_AVRO",
"type": "record",
"name": "TCUSTMER_AVRO",
"fields": [
     {"name": "NAME", "type": "string"},
    {"name": "CUST_CODE2",  "type": "string", "primary_key":true},
     {"name": "CITY2", "type": "string"},
     {"name": "STATE", "type": ["string","null"]}
]
}

The configuration in the Java Adapter properties file includes the following:

gg.mdp.type = avro
gg.mdp.schemaFilesPath = /home/ggadp/avromdp

The following sample output uses a delimited text formatter with a semi-colon as the delimiter:

I;GG_AVRO.TCUSTMER_AVRO;2013-06-02 22:14:36.000000;NAME;BG SOFTWARE CO;CUST_CODE2;WILL;CITY2;SEATTLE;STATE;WA

Oracle GoldenGate for Big Data includes a sample Replicat configuration file, a sample Java Adapter properties file, and sample Avro schemas at the following location:

GoldenGate_install_directory/AdapterExamples/big-data/metadata_provider/avro

15.2.6 Metadata Change Events

If the DDL changes in the source database tables, you may need to modify the Avro schema definitions and the mappings in the Replicat configuration file. You may also want to stop or suspend the Replicat process in the case of a metadata change event. You can stop the Replicat process by adding the following line to the Replicat configuration file (process_name.prm):

DDL INCLUDE ALL, EVENTACTIONS (ABORT)

Alternatively, you can suspend the Replicat process by adding the following line to the Replicat configuration file:

DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)

15.2.7 Limitations

The Avro bytes data type cannot be used as a primary key.

The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later support DDL propagation and source schema evolution for Oracle databases as the replication source. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.

15.2.8 Troubleshooting

This section contains information about how to troubleshoot the following issues:

Topics:

15.2.8.1 Invalid Schema Files Location

The Avro schema files directory specified in the gg.mdp.schemaFilesPath configuration property must be a valid directory. If the path is not valid, you encounter the following exception:

oracle.goldengate.util.ConfigException: Error initializing Avro metadata provider
Specified schema location does not exist. {/path/to/schema/files/dir}
15.2.8.2 Invalid Schema File Name

For every table that is mapped in the process_name.prm file, you must create a corresponding Avro schema file in the directory that is specified in gg.mdp.schemaFilesPath.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2=cust_code, CITY2 = CITY);
 

Property:

gg.mdp.schemaFilesPath=/home/usr/avro/

In this scenario, you must create a file called GG_AVRO.TCUSTMER_AVRO.mdp.avsc in the /home/usr/avro/ directory.

If you do not create the /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc file, you encounter the following exception:

java.io.FileNotFoundException: /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc
15.2.8.3 Invalid Namespace in Schema File

The target schema name specified in the Replicat mapping must be the same as the namespace in the Avro schema definition file.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);
 
Avro Schema Definition:
 
{
"namespace": "GG_AVRO",
..
}

In this scenario, Replicat abends with the following exception:

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO
Mapped [catalogname.]schemaname (GG_AVRO) does not match with the schema namespace {schema namespace}
15.2.8.4 Invalid Table Name in Schema File

The target table name that is specified in the Replicat mapping must be the same as the name in the Avro schema definition file.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);

Avro Schema Definition:

{
"namespace": "GG_AVRO",
"name": "TCUSTMER_AVRO",
..
}

In this scenario, if the target table name specified in the Replicat mapping does not match the Avro schema name, then Replicat abends with the following exception:

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO
Mapped table name (TCUSTMER_AVRO) does not match with the schema table name {table name}

15.3 Java Database Connectivity Metadata Provider

The Java Database Connectivity (JDBC) Metadata Provider is used to retrieve the table metadata from any target database that supports a JDBC connection and has a database schema. The JDBC Metadata Provider is the preferred metadata provider for any target database that is an RDBMS, although various other non-RDBMS targets also provide a JDBC driver.

Topics:

15.3.1 JDBC Detailed Functionality

The JDBC Metadata Provider uses the JDBC driver that is provided with your target database. The JDBC driver retrieves the metadata for every target table that is mapped in the Replicat properties file. Replicat processes use the retrieved target metadata to map columns.

You can enable this functionality for the JDBC Handler by configuring the REPERROR property in your Replicat parameter file. In addition, you must define the error codes specific to your RDBMS JDBC target in the JDBC Handler properties file, as follows:

Table 15-1 JDBC REPERROR Codes

gg.error.duplicateErrorCodes
    Optional. Value: comma-separated integer error codes that indicate duplicate errors.

gg.error.notFoundErrorCodes
    Optional. Value: comma-separated integer error codes that indicate not-found errors.

gg.error.deadlockErrorCodes
    Optional. Value: comma-separated integer error codes that indicate deadlock errors.

For example:

#ErrorCode
gg.error.duplicateErrorCodes=1062,1088,1092,1291,1330,1331,1332,1333
gg.error.notFoundErrorCodes=0
gg.error.deadlockErrorCodes=1213
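
In the Replicat parameter file, the REPERROR parameter then maps these database error numbers to error-handling actions. A minimal sketch, reusing the MySQL error codes above; the chosen actions are illustrative, so see the REPERROR parameter documentation for the full set of options:

REPERROR (DEFAULT, ABEND)
REPERROR (1062, DISCARD)
REPERROR (1213, RETRYOP MAXRETRIES 3)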

To understand how the various JDBC types are mapped to database-specific SQL types, see https://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html#table1.

15.3.2 Java Classpath

The JDBC Java Driver location must be included in the class path of the handler using the gg.classpath property.

For example, the configuration for a MySQL database might be:

gg.classpath=/path/to/jdbc/driver/jar/mysql-connector-java-5.1.39-bin.jar

15.3.3 JDBC Metadata Provider Configuration

The following are the configurable values for the JDBC Metadata Provider. These properties are located in the Java Adapter properties file (not in the Replicat properties file).

Table 15-2 JDBC Metadata Provider Properties

gg.mdp.type
    Required. Legal value: jdbc. Default: none.
    Selects the JDBC Metadata Provider.

gg.mdp.ConnectionUrl
    Required. Legal value: jdbc:subprotocol:subname. Default: none.
    The JDBC connection URL of the target database.

gg.mdp.DriverClassName
    Required. Legal value: the Java class name of the JDBC driver. Default: none.
    The fully qualified Java class name of the JDBC driver.

gg.mdp.userName
    Optional. Legal value: a legal username string. Default: none.
    The user name for the JDBC connection. Alternatively, you can provide the user name using the ConnectionUrl property.

gg.mdp.password
    Optional. Legal value: a legal password string. Default: none.
    The password for the JDBC connection. Alternatively, you can provide the password using the ConnectionUrl property.

15.3.4 Review a Sample Configuration

Oracle Thin Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:thin:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password

Netezza Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:netezza://hostname:port/databaseName
gg.mdp.DriverClassName=org.netezza.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Oracle OCI Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:oracle:oci:@myhost:1521:orcl
gg.mdp.DriverClassName=oracle.jdbc.driver.OracleDriver
gg.mdp.UserName=username
gg.mdp.Password=password

Teradata Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:teradata://10.111.11.111/USER=username,PASSWORD=password
gg.mdp.DriverClassName=com.teradata.jdbc.TeraDriver
gg.mdp.UserName=username
gg.mdp.Password=password

MySQL Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:mysql://localhost/databaseName?user=username&password=password
gg.mdp.DriverClassName=com.mysql.jdbc.Driver
gg.mdp.UserName=username
gg.mdp.Password=password

Redshift Driver Configuration

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:redshift://hostname:port/databaseName
gg.mdp.DriverClassName=com.amazon.redshift.jdbc42.Driver
gg.mdp.UserName=username
gg.mdp.Password=password
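
PostgreSQL Driver Configuration (a hypothetical example following the same pattern; verify the URL format and driver class name against your PostgreSQL JDBC driver version)

gg.mdp.type=jdbc
gg.mdp.ConnectionUrl=jdbc:postgresql://hostname:port/databaseName
gg.mdp.DriverClassName=org.postgresql.Driver
gg.mdp.UserName=username
gg.mdp.Password=password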

15.4 Hive Metadata Provider

The Hive Metadata Provider is used to retrieve the table metadata from a Hive metastore. The metadata is retrieved from Hive for every target table that is mapped in the Replicat properties file using the COLMAP parameter. The retrieved target metadata is used by Replicat for the column mapping functionality.

Topics:

15.4.1 Detailed Functionality

The Hive Metadata Provider uses both Hive JDBC and HCatalog interfaces to retrieve metadata from the Hive metastore. For each table mapped in the process_name.prm file, a corresponding table is created in Hive.

The default Hive configuration starts an embedded, local metastore Derby database. Because Apache Derby is designed to be an embedded database, it allows only a single connection, which means it cannot function with the Hive Metadata Provider. To work around this limitation, you must configure Hive with a remote metastore database. For more information about how to configure Hive with a remote metastore database, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration.

Hive does not support Primary Key semantics, so the metadata retrieved from Hive metastore does not include a primary key definition. When you use the Hive Metadata Provider, use the Replicat KEYCOLS parameter to define primary keys.

KEYCOLS

Use the KEYCOLS parameter to define primary keys in the target schema. The Oracle GoldenGate HBase Handler requires primary keys; therefore, you must set primary keys in the target schema when you use Replicat mapping with HBase as the target.

The output of the Avro formatters includes an Array field to hold the primary column names. If you use Replicat mapping with the Avro formatters, consider using KEYCOLS to identify the primary key columns.

For example configurations of KEYCOLS, see Review a Sample Configuration.

Supported Hive Data types

  • BIGINT

  • BINARY

  • BOOLEAN

  • CHAR

  • DATE

  • DECIMAL

  • DOUBLE

  • FLOAT

  • INT

  • SMALLINT

  • STRING

  • TIMESTAMP

  • TINYINT

  • VARCHAR

See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.

15.4.2 Configuring Hive with a Remote Metastore Database

A list of supported databases that you can use to configure a remote Hive metastore is available at https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-SupportedBackendDatabasesforMetastore.

The following example shows how a MySQL database is configured as the Hive metastore using properties in the ${HIVE_HOME}/conf/hive-site.xml Hive configuration file.

Note:

The ConnectionURL and driver class used in this example are specific to a MySQL database. If you use a database other than MySQL, then change the values to fit your configuration.

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://MYSQL_DB_IP:MYSQL_DB_PORT/DB_NAME?createDatabaseIfNotExist=false</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>MYSQL_CONNECTION_USERNAME</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>MYSQL_CONNECTION_PASSWORD</value>
</property>

To see a list of parameters to configure in the hive-site.xml file for a remote metastore, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase.

Note:

To use MySQL as the Hive metastore database:

  1. Add the MySQL JDBC connector JAR to the Hive classpath by copying it into the HIVE_HOME/lib/ directory. Replace DB_NAME in the connection URL with a valid database name created in MySQL.

  2. Start the Hive Server:

    HIVE_HOME/bin/hiveserver2

  3. Start the Hive Remote Metastore Server:

    HIVE_HOME/bin/hive --service metastore

15.4.3 Classpath Configuration

For the Hive Metadata Provider to connect to Hive, you must include the hive-site.xml file and the Hive and HDFS client JARs in the gg.classpath variable. The client JARs must match the version of Hive to which the Hive Metadata Provider is connecting.

For example, if the hive-site.xml file is created in the /home/user/oggadp/dirprm directory, then the gg.classpath entry is gg.classpath=/home/user/oggadp/dirprm/

  1. Create a hive-site.xml file that has the following properties:

    <configuration>
    <!-- Mandatory Property --> 
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://HIVE_SERVER_HOST_IP:9083</value>
    </property>
     
    <!-- Optional Property. Default value is 5 -->
    <property>
    <name>hive.metastore.connect.retries</name>
    <value>3</value>
    </property>
     
    <!-- Optional Property. Default value is 1 -->
    <property>
    <name>hive.metastore.client.connect.retry.delay</name>
    <value>10</value>
    </property>
     
    <!-- Optional Property. Default value is 600 seconds -->
    <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>50</value>
    </property>
    
     </configuration>
  2. By default, the following directories contain the Hive and HDFS client jars:

    HIVE_HOME/hcatalog/share/hcatalog/*
    HIVE_HOME/lib/*
    HIVE_HOME/hcatalog/share/webhcat/java-client/*
    HADOOP_HOME/share/hadoop/common/*
    HADOOP_HOME/share/hadoop/common/lib/*
    HADOOP_HOME/share/hadoop/mapreduce/*
    

    Configure the gg.classpath to include both the directory containing the hive-site.xml file and the client JAR directories, as shown in the sketch after this step. The path to the hive-site.xml file must be the directory path with no wildcard appended. If you include the * wildcard in the path to the hive-site.xml file, the file cannot be located. The path to the dependency JARs must include the * wildcard character to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.
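
    A sketch of the resulting gg.classpath entry, combining the hive-site.xml directory from the earlier example with the default client JAR directories (substitute the actual HIVE_HOME and HADOOP_HOME paths; the entry is a single line):

    gg.classpath=/home/user/oggadp/dirprm/:HIVE_HOME/hcatalog/share/hcatalog/*:HIVE_HOME/lib/*:HIVE_HOME/hcatalog/share/webhcat/java-client/*:HADOOP_HOME/share/hadoop/common/*:HADOOP_HOME/share/hadoop/common/lib/*:HADOOP_HOME/share/hadoop/mapreduce/*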

15.4.4 Hive Metadata Provider Configuration Properties

gg.mdp.type
    Required. Legal value: hive. Default: none.
    Selects the Hive Metadata Provider.

gg.mdp.connectionUrl
    Required. Default: none.
    The JDBC connection URL of the Hive server.
    Format without Kerberos authentication:
    jdbc:hive2://HIVE_SERVER_IP:HIVE_JDBC_PORT/HIVE_DB
    Format with Kerberos authentication:
    jdbc:hive2://HIVE_SERVER_IP:HIVE_JDBC_PORT/HIVE_DB;principal=user/FQDN@MY.REALM

gg.mdp.driverClassName
    Required. Legal value: org.apache.hive.jdbc.HiveDriver. Default: none.
    The fully qualified Hive JDBC driver class name.

gg.mdp.userName
    Optional. Legal value: a valid username. Default: "".
    The user name for connecting to the Hive database. The userName property is not required when Kerberos authentication is used; in that case, specify the Kerberos principal in the connection URL, as shown in the connectionUrl legal values.

gg.mdp.password
    Optional. Legal value: a valid password. Default: "".
    The password for connecting to the Hive database.

gg.mdp.charset
    Optional. Legal value: a valid character set. Default: UTF-8.
    The character set of columns with a character data type. Used to convert the source data from the trail file to the correct target character set.

gg.mdp.nationalCharset
    Optional. Legal value: a valid character set. Default: UTF-8.
    The character set of columns with a national character data type, such as NCHAR and NVARCHAR columns in an Oracle database. Used to convert the source data from the trail file to the correct target character set.

gg.mdp.authType
    Optional. Legal value: kerberos. Default: none.
    Allows you to designate Kerberos authentication to Hive.

gg.mdp.kerberosKeytabFile
    Optional (required if authType=kerberos). Legal value: a relative or absolute path to a Kerberos keytab file. Default: none.
    The keytab file allows Hive to access a password to perform the kinit operation for Kerberos security.

gg.mdp.kerberosPrincipal
    Optional (required if authType=kerberos). Legal value: a legal Kerberos principal name (user/FQDN@MY.REALM). Default: none.
    The Kerberos principal name for Kerberos authentication.
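
For example, a connection to a Kerberos-secured Hive server might be configured as follows. This is a sketch assembled from the property descriptions above; replace the host, port, database, principal, and keytab path with your own values:

gg.mdp.type=hive
gg.mdp.connectionUrl=jdbc:hive2://HIVE_SERVER_IP:HIVE_JDBC_PORT/HIVE_DB;principal=user/FQDN@MY.REALM
gg.mdp.driverClassName=org.apache.hive.jdbc.HiveDriver
gg.mdp.authType=kerberos
gg.mdp.kerberosKeytabFile=/path/to/the.keytab
gg.mdp.kerberosPrincipal=user/FQDN@MY.REALM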

15.4.5 Review a Sample Configuration

This is an example for configuring the Hive Metadata Provider. Consider a source with the following table:

TABLE GG.TCUSTMER {
     CUST_CODE VARCHAR(4)   PRIMARY KEY,
     NAME VARCHAR(100),
     CITY VARCHAR(200),
     STATE VARCHAR(200)}

The example maps the column CUST_CODE (GG.TCUSTMER) in the source to CUST_CODE2 (GG_HIVE.TCUSTMER_HIVE) on the target and the column CITY (GG.TCUSTMER) in the source to CITY2 (GG_HIVE.TCUSTMER_HIVE) on the target.

The mapping configuration in the process_name.prm file includes the following:

MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2); 

In this example:

  • The source schema GG is mapped to the target schema GG_HIVE.

  • The source column CUST_CODE is mapped to the target column CUST_CODE2.

  • The source column CITY is mapped to the target column CITY2.

  • USEDEFAULTS specifies that the rest of the column names are the same on both source and target (the NAME and STATE columns).

  • KEYCOLS is used to specify that CUST_CODE2 should be treated as the primary key.

Because primary keys cannot be specified in the Hive DDL, the KEYCOLS parameter is used to specify the primary keys.

Note:

You can choose any schema name; you are not restricted to the gg_hive schema name. The Hive schema can be pre-existing or newly created. If you change the schema name, update the connection URL (gg.mdp.connectionUrl) in the Java Adapter properties file and the mapping configuration in the Replicat .prm file.

You can create the schema and table for this example in Hive by using the following commands. To start the Hive CLI, use the following command:

HIVE_HOME/bin/hive

To create the GG_HIVE schema, in Hive, use the following command:

hive> create schema gg_hive;
OK
Time taken: 0.02 seconds

To create the TCUSTMER_HIVE table in the GG_HIVE database, use the following command:

hive> CREATE EXTERNAL TABLE `TCUSTMER_HIVE`(
    >   `CUST_CODE2` VARCHAR(4),
    >   `NAME` VARCHAR(30),
    >   `CITY2` VARCHAR(20),
    >   `STATE` STRING);
OK
Time taken: 0.056 seconds

Configure the .properties file in a way that resembles the following:

gg.mdp.type=hive
gg.mdp.connectionUrl=jdbc:hive2://HIVE_SERVER_IP:10000/gg_hive
gg.mdp.driverClassName=org.apache.hive.jdbc.HiveDriver

The following sample output uses the delimited text formatter, with a semi-colon as the delimiter:

I;GG_HIVE.TCUSTMER_HIVE;2015-10-07T04:50:47.519000;cust_code2;WILL;name;BG SOFTWARE CO;city2;SEATTLE;state;WA

A sample Replicat configuration file, Java Adapter properties file, and Hive create table SQL script are included with the installation at the following location:

GoldenGate_install_directory/AdapterExamples/big-data/metadata_provider/hive

15.4.6 Security

You can secure the Hive server using Kerberos authentication. For information about how to secure the Hive server, see the Hive documentation for the specific Hive release. The Hive Metadata Provider can connect to a Kerberos secured Hive server.

Make sure that the paths to the HDFS core-site.xml file and the hive-site.xml file are in the handler's classpath.

Enable the following properties in the core-site.xml file:

<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value> 
</property>
 
<property> 
<name>hadoop.security.authorization</name> 
<value>true</value> 
</property>

Enable the following properties in the hive-site.xml file:

<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
 
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/path/to/keytab</value> <!-- Change this value -->
</property>
 
<property>
<name>hive.metastore.kerberos.principal</name>
<value>Kerberos Principal</value> <!-- Change this value -->
</property>
 
<property>
   <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
</property>
 
<property>
   <name>hive.server2.authentication.kerberos.principal</name>
    <value>Kerberos Principal</value> <!-- Change this value -->
</property>
 
<property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/path/to/keytab</value> <!-- Change this value -->
</property>

15.4.7 Metadata Change Event

Tables in the Hive metastore must be updated, altered, or created manually if the source database tables change. In the case of a metadata change event, you may want to terminate or suspend the Replicat process. You can terminate the Replicat process by adding the following to the Replicat configuration file (process_name.prm):

DDL INCLUDE ALL, EVENTACTIONS (ABORT)

You can suspend the Replicat process by adding the following to the Replicat configuration file:

DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)

15.4.8 Limitations

Columns with a binary data type cannot be used as primary keys.

The source-to-target mapping that is defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 and later versions support DDL propagation and source schema evolution for Oracle databases as replication sources. If you use DDL propagation and source schema evolution, you lose the ability to seamlessly handle changes to the source metadata.

15.4.9 Additional Considerations

The most common problems encountered are Java classpath issues. The Hive Metadata Provider requires certain Hive and HDFS client libraries to be resolved in its classpath.

The required client JAR directories are listed in Classpath Configuration. Hive and HDFS client JARs do not ship with Oracle GoldenGate for Big Data. The client JARs should be of the same version as the Hive version to which the Hive Metadata Provider is connecting.

To establish a connection to the Hive server, the hive-site.xml file must be in the classpath.

15.4.10 Troubleshooting

If the mapped target table is not present in Hive, the Replicat process terminates with a "Table metadata resolution exception".

For example, consider the following mapping:

MAP GG.TCUSTMER, TARGET GG_HIVE.TCUSTMER_HIVE, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY) KEYCOLS(CUST_CODE2);

This mapping requires a table called TCUSTMER_HIVE to be created in the schema GG_HIVE in the Hive metastore. If this table is not present in Hive, then the following exception occurs:

ERROR [main] - Table Metadata Resolution Exception
Unable to retrieve table matadata. Table : GG_HIVE.TCUSTMER_HIVE
NoSuchObjectException(message:GG_HIVE.TCUSTMER_HIVE table not found)