7 Using the Metadata Provider

This chapter explains the Metadata Provider functionality, different types of Metadata Providers and examples that can be used to understand the functionality.

This chapter contains the following:

Valid only if handlers are configured to run with Replicat process.

The Replicat process provides functionality to perform source table to target table and source column to target column mapping using syntax in the Replicat configuration file. This mapping syntax is documented as part of the Oracle GoldenGate Replicat documentation. The source metadata definitions are included in the Oracle GoldenGate trail file (or by source definitions files for Oracle GoldenGate versions older than 12.2). When the replication target is a database, the Replicat process obtains the target metadata definitions from the target database. However, this is a shortcoming when pushing data to Big Data applications or Java Delivery in general. Big Data applications generally provide no target metadata therefore the Replicat mapping is not possible. The Metadata Provider exists to address this deficiency. The Metadata Provider can be used to define target metadata using either Avro or Hive which in turn enables source table to target table and source column to target column Replicat mapping. The use of the Metadata Provider is not required. If the metadata included in the source Oracle GoldenGate trail file is acceptable for the output, then do not use the Metadata Provider. The Metadata Provider should be used in the following cases:

  • The requirement is for mapping source table names into target table names that do not match.

  • The requirement is for mapping of source column names into target column name that do not match.

  • The requirement is for the inclusion of certain columns from the source trail file and omitting other columns.

Replicat mapping has a general limitation in that the mapping defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 provides functionality for DDL propagation when using an Oracle Database as the source. Therefore the proper handling of schema evolution can be problematic when the Metadata Provider and Replicat mapping are being used. Users will need to consider their particular use cases for schema evolution and plan for how they will update the Metadata Provider and the Replicat mapping syntax for required changes.

For every table mapped in REPLICAT using COLMAP, the metadata will be retrieved from a configured metadata provider and retrieved metadata will then be used by REPLICAT for column mapping functionality.

Users have the choice of configuring one Metadata Provider implementation. Currently Hive and Avro Metadata Providers are supported. Details on configuring Hive and Avro Metadata Providers will be discussed in the following sections.

The Metadata Provider is optional and is enabled if the gg.mdp.type property is specified in the Java Adapter Properties file.

Scenarios - When to use Metadata Provider

  1. The following scenarios do not require the Metadata Provider to be configured:

    The mapping of schema name whereby the source schema named GG is mapped to the target schema named GGADP.*

    The mapping of schema and table name whereby the schema GG.TCUSTMER is mapped to the table name GGADP.TCUSTMER_NEW

    MAP GG.*, TARGET GGADP.*;
    (OR)
    MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW;
    
  2. The following scenario requires Metadata Provider to be configured:

    The mapping of column names whereby the source column name does not match the target column name. For example source column CUST_CODE mapped to target column CUST_CODE_NEW

    MAP GG.TCUSTMER, TARGET GG_ADP.TCUSTMER_NEW, COLMAP(USEDEFAULTS, CUST_CODE_NEW=CUST_CODE, CITY2=CITY);
    

7.1 Avro Metadata Provider

The Avro Metadata Provider is used to retrieve the table metadata from Avro Schema files. For every table mapped in REPLICAT using COLMAP, the metadata will be retrieved from Avro Schema and retrieved metadata will then be used by REPLICAT for column mapping.

This section contains the following:

7.1.1 Detailed Functionality

The Avro Metadata Provider uses Avro schema definition files to retrieve metadata. The Avro schemas are defined using the JSON. For each table mapped in Replicat.prm file, a corresponding The Avro schema definition file should be created. More information on defining Avro schemas can be found at:

http://avro.apache.org/docs/current/gettingstartedjava.html#Defining+a+schema

Avro Metadata Provider Schema definition syntax:

{"namespace": "[$catalogname.]$schemaname",
"type": "record",
"name": "$tablename",
"fields": [
     {"name": "$col1", "type": "$datatype"},
     {"name": "$col2 ",  "type": "$datatype ", "primary_key":true}, 
     {"name": "$col3", "type": "$datatype ", "primary_key":true}, 
     {"name": "$col4", "type": ["$datatype","null"]}   
   ]
}
 
namespace            - name of catalog/schema being mapped
name                 - name of the table being mapped
fields.name          - array of column names
fields.type          - datatype of the column
fields.primary_key   - indicates the column is part of primary key.

Representing nullable and not nullable columns:

"type":"$datatype" - indicates the column is not nullable, where "$datatype" is the actual datatype.
"type": ["$datatype","null"] - indicates the column is nullable, where "$datatype" is the actual datatype

The file naming convention for Avro schema files accessed by the Avro Metadata Provider must be in the following format:

[$catalogname.]$schemaname.$tablename.mdp.avsc
 
$catalogname    - name of the catalog if exists
$schemaname   - name of the schema
$tablename        - name of the table
.mdp.avsc           -  constant, which should be appended always

Supported Avro Data Types:

  • boolean

  • bytes

  • double

  • float

  • int

  • long

  • string

For more information on Avro data types, see https://avro.apache.org/docs/1.7.5/spec.html#schema_primitive.

7.1.2 Runtime Prerequisites

The Avro schema definitions should be created for all tables mapped in Replicat's parameter file before starting the Replicat process.

7.1.3 Classpath Configuration

There is no additional classpath setting required for Avro Metadata Provider.

7.1.4 Avro Metadata Provider Configuration

The configuration properties of Oracle GoldenGate Avro Metadata Provider are detailed in this section


Property Required/Optional Legal Values Default Explanation

gg.mdp.type

Required

avro

-

Selects Avro Metadata Provider

gg.mdp.schemaFilesPath

Required

Example for a legal value could be /home/user/ggadp/avroschema/

-

Path to Avro schema files directory

gg.mdp.charset

Optional

Valid character set

UTF-8

Specifies the character set of the column with character data type. This is used to convert the source data from trail file to the correct target charset.

gg.mdp.nationalCharset

Optional

Valid character set

UTF-8

Specifies the character set of the column with character data type. This is used to convert the source data from trail file to the correct target charset.

Example: Indicates character set of columns like NCHAR, NVARCHAR in case of Oracle Database.


7.1.5 Sample Configuration

This section provides an example for configuring the Avro Metadata Provider. Consider a source with following table:

TABLE GG.TCUSTMER {
     CUST_CODE VARCHAR(4) PRIMARY KEY,
     NAME VARCHAR(100),
     CITY VARCHAR(200),
     STATE VARCHAR(200)
}

Map column CUST_CODE (GG.TCUSTMER) in source to CUST_CODE2 (GG_AVRO.TCUSTMER_AVRO) on target and column CITY (GG.TCUSTMER) in source to CITY2 (GG_AVRO.TCUSTMER_AVRO) on target.

Mapping in Replicat .prm file:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, CUST_CODE2=CUST_CODE, CITY2=CITY);
 
Mapping definition in this example:
1.Source schema GG is mapped to target schema GG_AVRO
2.Source column CUST_CODE is mapped to target column CUST_CODE2
3.Source column CITY is mapped to target column CITY2
4.USEDEFAULTS specifies that rest of the columns names are same on both source and target (NAME and STATE columns).

The Avro schema definition file for the preceding example:

File path: /home/ggadp/avromdpGG_AVRO.TCUSTMER_AVRO.mdp.avsc

{"namespace": "GG_AVRO",
"type": "record",
"name": "TCUSTMER_AVRO",
"fields": [
     {"name": "NAME", "type": "string"},
    {"name": "CUST_CODE2",  "type": "string", "primary_key":true},
     {"name": "CITY2", "type": "string"},
     {"name": "STATE", "type": ["string","null"]}
]
}

The configuration in the Java Adapter Properties file will include the following:

gg.mdp.type = avro
gg.mdp.schemaFilesPath = /home/ggadp/avromdp

Following is the sample output using delimited text formatter with ";" as delimiter for the preceding example.

I;GG_AVRO.TCUSTMER_AVRO;2013-06-02 22:14:36.000000;NAME;BG SOFTWARE CO;CUST_CODE2;WILL;CITY2;SEATTLE;STATE;WA

The Oracle GoldenGate for Big Data installation include a sample Replicat configuration file, a sample Java Adapter properties file, and sample Avro schemas at:

GoldenGate_install_directory/AdapterExamples/big-data/metadata_provider/avro

7.1.6 Metadata Change Event

The Avro schema definitions and the mappings in the Replicat configuration file may need to be modified if there is a DDL change in the source database tables. You may wish to abort or suspend the Replicat process in the case of a metadata change event. The Replicat process can be aborted by adding the following to the Replicat configuration file (.prm file).

DDL INCLUDE ALL, EVENTACTIONS (ABORT)

Alternatively, the Replicat process can be suspended by adding the following to the Replication configuration file (.prm file).

DDL INCLUDE ALL, EVENTACTIONS (SUSPEND)

7.1.7 Limitations

Avro bytes data type cannot be used as primary key.

The source to target mapping defined in the Replicat configuration file is static. Oracle GoldenGate 12.2 supports DDL propagation and source schema evolution for Oracle Databases as the replication source. However, evolution of the source schemas may be problematic the static mapping configured in the Replicat configuration file.

7.1.8 Troubleshooting

This section contains the following:

7.1.8.1 Invalid Schema Files Location

The Avro schema files directory location specified by the configuration property gg.mdp.schemaFilesPath should be a valid directory. Failure to configure a valid directory in gg.mdp.schemaFilesPath property leads to following exception:

oracle.goldengate.util.ConfigException: Error initializing Avro metadata provider
Specified schema location does not exist. {/path/to/schema/files/dir}

7.1.8.2 Invalid Schema File Name

For every table mapped in Replicat.prm file, a corresponding Avro schema file must be created in the directory specified in gg.mdp.schemaFilesPath.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2=cust_code, CITY2 = CITY);
 

Property:

gg.mdp.schemaFilesPath=/home/usr/avro/

A file called GG_AVRO.TCUSTMER_AVRO.mdp.avsc must be created in the /home/usr/avro/ directory. that is, /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc

Failing to create the /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc file results in the following exception:

java.io.FileNotFoundException: /home/usr/avro/GG_AVRO.TCUSTMER_AVRO.mdp.avsc

7.1.8.3 Invalid Namespace in Schema File

The target schema name specified in REPLICAT mapping must be same as namespace in the Avro schema definition file.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);
 
Avro Schema Definition:
 
{
"namespace": "GG_AVRO",
..
}

In this scenario, REPLICAT abends with following exception if the target schema name specified in Replicat mapping does not match with Avro schema namespace:

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO
Mapped [catalogname.]schemaname (GG_AVRO) does not match with the schema namespace {schema namespace}

7.1.8.4 Invalid Table Name in Schema File

The target table name specified in Replicat mapping must be same as name in the Avro schema definition file.

For example, consider the following scenario:

Mapping:

MAP GG.TCUSTMER, TARGET GG_AVRO.TCUSTMER_AVRO, COLMAP(USEDEFAULTS, cust_code2 = cust_code, CITY2 = CITY);
 
Avro Schema Definition:
 
{
"namespace": "GG_AVRO",
"name": "TCUSTMER_AVRO",
..
}

In this scenario, REPLICAT abends with following exception if the target table name specified in Replicat mapping does not match with Avro schema name.

Unable to retrieve table matadata. Table : GG_AVRO.TCUSTMER_AVRO
Mapped table name (TCUSTMER_AVRO) does not match with the schema table name {table name}