6 Using the Pluggable Formatters

Formatters provide the functionality to convert operations from the Oracle GoldenGate trail file into formatted messages that can then be sent to Big Data targets by one of the Oracle GoldenGate for Big Data Handlers. The Oracle GoldenGate for Big Data release ships with the following five pluggable formatters:

  • Delimited Text

  • JSON

  • XML

  • Avro Row

  • Avro Operation

This chapter contains the following sections:

6.1 Operation versus Row Based Formatting

The Oracle GoldenGate for Big Data formatters fall into two categories: operation based formatters and row based formatters. Operations represent the individual insert, update, and delete events that occur on table data in the source database. Insert operations provide only after change data (or images), since a new row is being added to the source database. Update operations provide both before and after change data, which shows how existing row data is modified. Delete operations provide only before change data, to identify the row being deleted. The operation based formatters model the operation as it exists in the source trail file, and their formats include fields for both the before and after images. The row based formatters model the row data as it exists after the operation data is applied, and contain only a single image of the data. The following sections describe the data that each type of formatter outputs.

6.1.1 Operation Formatters

The formatters that support operation based formatting are JSON, Avro Operation, and XML. The output of the operation based formatters is as follows:

  • Insert Operation - Before image data is NULL. After image data is output.

  • Update Operation - Both before and after image data is output.

  • Delete Operation - Before image data is output. After image data is NULL.

  • Truncate Operation - Both before and after image data is NULL.

6.1.2 Row Formatters

The formatters that support row based formatting are Delimited Text and Avro Row. Row based formatters output the following information for each operation type:

  • Insert Operation - After image data only.

  • Update Operation - After image data only. Primary key updates are a special case which will be discussed in individual sections for the specific formatters.

  • Delete Operation - Before image data only.

  • Truncate Operation - Table name is provided but both before and after image data are NULL. Truncate table is a DDL operation and may not be supported by all database implementations. Refer to the Oracle GoldenGate documentation for your database implementation.

6.1.3 Table Row or Column Value States

In an RDBMS, table data for a specific row and column can have only one of two states: either the row/column has a value, or the row/column value is NULL. However, when data is transferred to the Oracle GoldenGate trail file by the Oracle GoldenGate capture process, this expands to three possible states: the row/column has a value, the row/column value is NULL, or the row/column value is missing.

For an insert operation it is reasonable to expect that the after image contains data for all columns, whether a given column has a value or is NULL. However, the data included for update and delete operations may not contain complete data for all columns. When replicating data to an RDBMS, the only data required for an update operation is the primary key values and the values of the columns that changed. Additionally, a delete operation requires only the primary key values to delete the row from the target database. Therefore, even though a row/column has a value in the source database, the value may be missing in the source trail file. Because row/column data in the source trail file can have three states, the Big Data Formatters must also be able to represent data in these three states.

The row/column data that is available in the Oracle GoldenGate trail file has an impact on Big Data integrations, so it is important for you to understand what data is required. You typically have control over what data is included for operations in the Oracle GoldenGate trail file. For Oracle databases, this is controlled by the supplemental logging level. Refer to the Oracle GoldenGate documentation for your specific source database implementation to understand how to control the row/column values that are included in the Oracle GoldenGate trail file.

6.2 Delimited Text Formatter

The Delimited Text Formatter is a row based formatter. It formats database operations from the source trail file into delimited text output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual delimited message. Delimited text output consists of a fixed number of fields for each table, separated by a field delimiter and terminated by a line delimiter. The fields are positionally relevant. Many Big Data analytical tools, including Hive, work well with HDFS files containing delimited text.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is NULL, or the column value is missing. By default the Delimited Text Formatter maps these column value states into the delimited text output as follows:

  • Column has a value - The column value is output.

  • Column value is NULL - The default output value is NULL. The output for the case of a NULL column value is configurable.

  • Column value is missing - The default output value is "". The output for the case of a missing column value is configurable.
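The mapping above can be sketched as follows; the sentinel, function name, and defaults are illustrative only, not the actual Oracle GoldenGate implementation:

```python
# Sketch of how the three column value states map to delimited output;
# the MISSING sentinel, format_field function, and defaults are
# illustrative only.
MISSING = object()   # stand-in for "column absent from the trail record"

def format_field(value, null_rep="NULL", missing_rep=""):
    if value is MISSING:
        return missing_rep   # column value missing from the trail file
    if value is None:
        return null_rep      # column explicitly NULL in the source
    return str(value)        # column has a value

row = {"CUST_CODE": "BILL", "CITY": None, "STATE": MISSING}
record = "|".join(format_field(v) for v in row.values())   # "BILL|NULL|"
```

As the sections below note, both the NULL and missing representations are configurable.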

6.2.1 Message Formatting Details

The default format for output of data is the following:

First is the row metadata:

<OPERATION TYPE><FIELD DELIMITER><FULLY QUALIFIED TABLE NAME><FIELD DELIMITER><OPERATION TIMESTAMP><FIELD DELIMITER><CURRENT TIMESTAMP><FIELD DELIMITER><TRAIL POSITION><FIELD DELIMITER><TOKENS><FIELD DELIMITER>

Next is the row data:

<COLUMN 1 VALUE><FIELD DELIMITER><COLUMN N VALUE><LINE DELIMITER>

Optionally, the column name can be included before each column value, which changes the output format for the row data:

<COLUMN 1 NAME><FIELD DELIMITER><COLUMN 1 VALUE><FIELD DELIMITER><COLUMN N NAME><FIELD DELIMITER><COLUMN N VALUE><LINE DELIMITER>

Operation Type - Operation type is the indicator of the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, T for truncate. Output of this field is suppressible.

Fully Qualified Table Name - The fully qualified table name is the source database table, including the catalog name and schema name. The format of the fully qualified table name is CATALOG NAME.SCHEMA NAME.TABLE NAME. Output of this field is suppressible.

Operation Timestamp - The operation timestamp is the commit record timestamp from the source system. Therefore, all operations in a transaction (an unbatched transaction) should have the same operation timestamp. This timestamp is fixed: replaying the trail file results in the same operation timestamp for the same operation. Output of this field is suppressible.

Current Timestamp - The current timestamp is a timestamp of the current time when the Delimited Text Formatter processes the operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation. Output of this field is suppressible.

Trail Position - This is the concatenated sequence number and RBA number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. Output of this field is suppressible.

Tokens - The tokens are the token key value pairs from the source trail file. The output of this field in the delimited text output is suppressed if the includeTokens configuration property on the corresponding handler is not explicitly set to true.
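Parsing a delimited message back into its metadata and column fields can be sketched as follows; the field names are descriptive labels chosen for this sketch, not configuration keys, and a naive split assumes the delimiter never occurs in the data:

```python
# Illustrative parse of the default delimited layout described above,
# using a pipe field delimiter as in the samples that follow.
METADATA_FIELDS = ["op_type", "table", "op_ts", "current_ts", "pos", "tokens"]

def parse_record(line, field_delim="|"):
    parts = line.split(field_delim)
    meta = dict(zip(METADATA_FIELDS, parts[:len(METADATA_FIELDS)]))
    columns = parts[len(METADATA_FIELDS):]   # remaining fields are column values
    return meta, columns

meta, cols = parse_record(
    "I|GG.TCUSTORD|2013-06-02 22:14:36.000000|2015-09-18T13:23:01.612001"
    "|00000000000000001444|R=AADPkvAAEAAEqL2AAA"
    "|WILL|1994-09-30:15:33:00|CAR|144|17520.00|3|100"
)
```

Note that splitting on the delimiter only works if delimiter characters never occur in the column data, which is the concern addressed in the Common Pitfalls section.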

6.2.2 Sample Formatted Messages

The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe ("|") to more clearly display the message.

6.2.2.1 Sample Insert Message

I|GG.TCUSTORD|2013-06-02 22:14:36.000000|2015-09-18T13:23:01.612001|00000000000000001444|R=AADPkvAAEAAEqL2AAA|WILL|1994-09-30:15:33:00|CAR|144|17520.00|3|100

6.2.2.2 Sample Update Message

U|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:01.987000|00000000000000002891|R=AADPkvAAEAAEqLzAAA|BILL|1995-12-31:15:00:00|CAR|765|14000.00|3|100

6.2.2.3 Sample Delete Message

D|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:02.000000|00000000000000004338|L=206080450,6=9.0.80330,R=AADPkvAAEAAEqLzAAC|DAVE|1993-11-03:07:51:35|PLANE|600|||

6.2.2.4 Sample Truncate Message

T|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:02.001000|00000000000000004515|R=AADPkvAAEAAEqL2AAB|||||||

6.2.3 Common Pitfalls

Exercise care when choosing field and line delimiters. It is important to choose delimiter values that will not occur in the content of the data.

The Java Adapter configuration functionality trims leading and trailing characters that are determined to be whitespace. When configuration values contain leading or trailing characters that are considered whitespace, wrap the value in a CDATA[] wrapper to preserve them. For example, a configuration value of \n should be configured as CDATA[\n].

You can search column values using regular expressions and replace matches with a specified value. This search-and-replace functionality can be used in conjunction with the Delimited Text Formatter to ensure that there are no collisions between column value contents and the field and line delimiters. For more information, see Using Regular Expression Search and Replace.
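Such a sanitization step might look like the following sketch, assuming a pipe field delimiter and newline line delimiter; the pattern and replacement are arbitrary examples, not product defaults:

```python
import re

# Sketch of a search-and-replace sanitization pass that strips delimiter
# characters out of column values before they are written; the pattern
# and replacement are arbitrary examples.
def sanitize(value, pattern=r"[|\n]", replacement=" "):
    return re.sub(pattern, replacement, value)

clean = sanitize("BILL|S\nCARS")   # "BILL S CARS"
```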

6.2.4 Logging of Output Format Summary

The Java log4j logging logs a summary of the delimited text output format if INFO level logging is enabled. A summary of the delimited fields is logged for each source table encountered, when the first operation for that table is received by the Delimited Text Formatter. You may find this detailed explanation of the fields of the delimited text output useful when performing an initial setup. In the case of a metadata change event, the summary of the delimited fields is regenerated and logged again at the first operation for that table after the metadata change event.

6.2.5 Delimited Text Format Configuration

Table 6-1 Configuration Options

Parameters Optional / Required Legal Values Default Explanation
gg.handler.name.format.includeColumnNames

Optional

true | false

false

Controls whether the column names are written as a delimited field preceding each column value. If true, the output is similar to:

COL1_Name|COL1_Value|Col2_Name|Col2_Value

If false, the output is similar to:

COL1_Value|Col2_Value

gg.handler.name.format.includeOpTimestamp

Optional

true | false

true

A false value suppresses the output of the operation timestamp from the source trail file in the output.

gg.handler.name.format.includeCurrentTimestamp

Optional

true | false

true

A false value suppresses the output of the current timestamp in the output.

gg.handler.name.format.includeOpType

Optional

true | false

true

A false value suppresses the output of the operation type in the output.

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding

Optional

Any encoding name or alias supported by Java.

The native system encoding of the machine hosting the Oracle GoldenGate process.

Determines the encoding of the output delimited text.

gg.handler.name.format.fieldDelimiter

Optional

Any String

ASCII 001 (the default Hive delimiter)

The delimiter used between delimited fields. This value supports CDATA[] wrapping.

gg.handler.name.format.lineDelimiter

Optional

Any String

newline (the default Hive delimiter)

The delimiter used between records. This value supports CDATA[] wrapping.

gg.handler.name.format.includeTableName

Optional

true | false

true

Use false to suppress the output of the table name in the output delimited data.

gg.handler.name.format.keyValueDelimiter

Optional

Any string

=

Provides a delimiter between keys and values in a map. Key1=value1. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.keyValuePairDelimiter

Optional

Any string

,

Provides a delimiter between key value pairs in a map. Key1=Value1,Key2=Value2. Tokens are mapped values. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.pkUpdateHandling

Optional

abend | update | delete-insert

abend

Provides configuration for how the formatter handles update operations that change a primary key. Primary key updates can be problematic for the text formatter and require special consideration.

  • abend - indicates the process will abend

  • update - indicates the process will treat this as a normal update

  • delete-insert - indicates the process will treat this as a delete and an insert. Full supplemental logging needs to be enabled for this to work. Without full before and after row images the insert data will be incomplete.

gg.handler.name.format.nullValueRepresentation

Optional

Any string

NULL

Allows you to configure what will be included in the delimited output in the case of a NULL value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.missingValueRepresentation

Optional

Any string

"" (no value)

Allows you to configure what will be included in the delimited text output in the case of a missing value. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.includePosition

Optional

true | false

true

Allows you to suppress the output of the operation position from the source trail file.

gg.handler.name.format.iso8601Format

Optional

true | false

true

Controls the format of the current timestamp. The default is the ISO 8601 format. Setting it to false removes the “T” between the date and time in the current timestamp and outputs a space (“ ”) instead.

6.2.6 Sample Configuration

The following is the sample configuration for the Delimited Text formatter from the Java Adapter configuration file:

gg.handler.hdfs.format.includeColumnNames=false
gg.handler.hdfs.format.includeOpTimestamp=true
gg.handler.hdfs.format.includeCurrentTimestamp=true
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=UTF-8
gg.handler.hdfs.format.fieldDelimiter=CDATA[\u0001]
gg.handler.hdfs.format.lineDelimiter=CDATA[\n]
gg.handler.hdfs.format.includeTableName=true
gg.handler.hdfs.format.keyValueDelimiter=CDATA[=]
gg.handler.hdfs.format.keyValuePairDelimiter=CDATA[,]
gg.handler.hdfs.format.pkUpdateHandling=abend
gg.handler.hdfs.format.nullValueRepresentation=NULL
gg.handler.hdfs.format.missingValueRepresentation=CDATA[]
gg.handler.hdfs.format.includePosition=true
gg.handler.hdfs.format=delimitedtext

6.2.7 Metadata Change Events

Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter simply changes the output format to accommodate the change and continues running.

However, it is important to understand that a metadata change may impact downstream applications. Delimited text formats comprise a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields and potentially in the positional relevance of some fields. Adding a column or columns is probably the least impactful metadata change event, assuming the new columns are added to the end. You should consider the impact of a metadata change event before executing the event. In scenarios where metadata change events are frequent, it is recommended that you consider a more flexible and self-describing format, such as JSON or XML.

6.2.8 Special Considerations

Big Data applications differ from RDBMSs in how data is stored. Update and delete operations in an RDBMS result in a change to the existing data. In contrast, data is not changed in Big Data applications but simply appended to existing data. Therefore, the current state of a given row becomes a consolidation of all of the existing operations for that row in the HDFS system. This leads to some special scenarios.

6.2.8.1 Primary Key Updates

Primary key update operations require special consideration and planning for Big Data integrations. Primary key updates are update operations that modify one or more of the primary key columns of the given row in the source database. Since data is simply appended in Big Data applications, a primary key update operation looks more like a new insert than an update unless it receives special handling. The Delimited Text Formatter provides specialized handling for primary key updates that you can configure. These are the configurable behaviors:

Table 6-2 Configurable Behavior

Value Description

abend

The default behavior is that the delimited text formatter will abend in the case of a primary key update.

update

With this configuration the primary key update will be treated just like any other update operation. This configuration alternative should only be selected if you can guarantee that the primary key that is being changed is not being used as the selection criteria when selecting row data from a Big Data system.

delete-insert

Using this configuration the primary key update is treated as a special case of a delete using the before image data and an insert using the after image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application.
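The three behaviors above can be sketched as follows; the function and the operation tuples are hypothetical illustrations, not the formatter's internal API:

```python
# Sketch of the three configurable pkUpdateHandling behaviors; the
# function and (op_type, image) tuples are illustrative only.
def handle_pk_update(mode, before, after):
    if mode == "abend":
        raise RuntimeError("primary key update encountered; abending")
    if mode == "update":
        return [("U", after)]                 # treat as a normal update
    if mode == "delete-insert":
        # requires full before/after images (full supplemental logging);
        # without them the insert image would be incomplete
        return [("D", before), ("I", after)]
    raise ValueError("unknown pkUpdateHandling mode: " + mode)

ops = handle_pk_update("delete-insert",
                       {"ORDER_ID": "765", "CUST_CODE": "BILL"},
                       {"ORDER_ID": "900", "CUST_CODE": "BILL"})
```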

6.2.8.2 Data Consolidation

As previously stated, Big Data applications simply append data to the underlying storage. Analytic tools generally spawn MapReduce programs that traverse the data files and consolidate all of the operations for a given row into a single output. Therefore, it is important to have an indicator of the order of operations. The Delimited Text Formatter provides a number of metadata fields to fulfill this need. The operation timestamp may be sufficient; however, two update operations may have the same operation timestamp, especially if they share a common transaction. The trail position can serve as a tie-breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of the order of operations in Big Data.
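The ordering logic described above can be sketched as follows, using hypothetical operation tuples keyed by primary key; the zero-padded trail position strings compare correctly as strings and break ties within a shared operation timestamp:

```python
# Sketch of consolidating appended operations into current row state.
# Operations are hypothetical (op_type, key, row, op_ts, pos) tuples;
# sorting by (op_ts, pos) lets the zero-padded trail position break
# ties between operations that share a commit timestamp.
def consolidate(ops):
    state = {}
    for op_type, key, row, op_ts, pos in sorted(ops, key=lambda o: (o[3], o[4])):
        if op_type == "D":
            state.pop(key, None)      # delete removes the row
        else:
            state[key] = row          # insert/update: latest image wins
    return state

ops = [
    ("U", "765", {"PRODUCT_PRICE": "14000.00"}, "2013-06-02 22:14:41.000000", "00000000000000002891"),
    ("I", "765", {"PRODUCT_PRICE": "15000.00"}, "2013-06-02 22:14:36.000000", "00000000000000001444"),
    ("D", "600", {}, "2013-06-02 22:14:41.000000", "00000000000000004338"),
    ("I", "600", {"PRODUCT_CODE": "PLANE"}, "2013-06-02 22:14:36.000000", "00000000000000001500"),
]
state = consolidate(ops)
```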

6.3 JSON Formatter

The JSON Formatter is an operation based formatter. It formats operation data from the source trail file into a JSON object. Each individual insert, update, delete and truncate operation will be formatted into an individual JSON message.

6.3.1 Message Formatting Details

The following two subsections detail the contents of generated JSON messages. The first section details the operation metadata and the second section details the before and after image column data values.

6.3.1.1 Operation Metadata

JSON objects generated by the JSON Formatter contain the following metadata fields at the beginning of each message:

Table 6-3 JSON metadata

Value Description

table

Contains the fully qualified table name. The format of the fully qualified table name is: CATALOG NAME.SCHEMA NAME.TABLE NAME

op_type

Contains the operation type that is the indicator of the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, T for truncate.

op_ts

The operation timestamp is the timestamp of the operation from the source trail file. Since this timestamp is from the source trail it is fixed. Replaying the trail file will result in the same timestamp for the same operation.

current_ts

The current timestamp is a timestamp of the current time when the JSON Formatter processes the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation.

pos

This is the trail file position, which is the concatenated sequence number and RBA number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file.

primary_keys

An array variable holding the column names of the primary keys of the source table. The primary_keys field is only included in the JSON output if the includePrimaryKeys configuration property is set to true.

tokens

This member is an object whose members are the token key value pairs from the source trail file.

6.3.1.2 Operation Data

The data following the operation metadata is the operation data. This data is represented by before and after members that are objects. The objects contain members with the keys being the column names and the values being the column values.

Column values for an operation from the source trail file can have one of three states: column has a value, column value is NULL, or column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:

  • Column has a value - The column value is output. In the following example the member STATE has a value.

        "after":{        "CUST_CODE":"BILL",        "NAME":"BILL'S USED CARS",        "CITY":"DENVER",        "STATE":"CO"    }
    
  • Column value is NULL - The default output value is a JSON NULL. In the following example the member STATE is NULL.

        "after":{        "CUST_CODE":"BILL",        "NAME":"BILL'S USED CARS",        "CITY":"DENVER",        "STATE":null    }
    
  • Column value is missing - The JSON will contain no element for a missing column value. In the following example the member STATE is missing.

        "after":{        "CUST_CODE":"BILL",        "NAME":"BILL'S USED CARS",        "CITY":"DENVER",    }
    

The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. Because JSON supports few data types, this functionality largely results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can alternatively be configured to treat all data as strings.
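Under these rules, a consumer can recover all three column states from a parsed image; the following is a small sketch (not part of the product) using Python's standard library:

```python
import json

# Small sketch of recovering the three column states from a parsed
# before/after image: present with a value, present as JSON null,
# or absent entirely.
def column_state(image, column):
    if column not in image:
        return "missing"   # no element at all for the column
    return "null" if image[column] is None else "value"

after = json.loads('{"CUST_CODE":"BILL","CITY":"DENVER","STATE":null}')
```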

6.3.2 Sample JSON Messages

The following are sample JSON messages created by the JSON Formatter for insert, update, delete, and truncate operations.

6.3.2.1 Sample Insert Message

{
    "table":"GG.TCUSTORD",
    "op_type":"I",
    "op_ts":"2013-06-02 22:14:36.000000",
    "current_ts":"2015-09-18T13:39:35.447000",
    "pos":"00000000000000001444",
    "tokens":{
        "R":"AADPkvAAEAAEqL2AAA"
    },
    "after":{
        "CUST_CODE":"WILL",
        "ORDER_DATE":"1994-09-30:15:33:00",
        "PRODUCT_CODE":"CAR",
        "ORDER_ID":"144",
        "PRODUCT_PRICE":17520.00,
        "PRODUCT_AMOUNT":3,
        "TRANSACTION_ID":"100"
    }
}

6.3.2.2 Sample Update Message

{
    "table":"GG.TCUSTORD",
    "op_type":"U",
    "op_ts":"2013-06-02 22:14:41.000000",
    "current_ts":"2015-09-18T13:39:35.748000",
    "pos":"00000000000000002891",
    "tokens":{
        "R":"AADPkvAAEAAEqLzAAA"
    },
    "before":{
        "CUST_CODE":"BILL",
        "ORDER_DATE":"1995-12-31:15:00:00",
        "PRODUCT_CODE":"CAR",
        "ORDER_ID":"765",
        "PRODUCT_PRICE":15000.00,
        "PRODUCT_AMOUNT":3,
        "TRANSACTION_ID":"100"
    },
    "after":{
        "CUST_CODE":"BILL",
        "ORDER_DATE":"1995-12-31:15:00:00",
        "PRODUCT_CODE":"CAR",
        "ORDER_ID":"765",
        "PRODUCT_PRICE":14000.00,
        "PRODUCT_AMOUNT":3,
        "TRANSACTION_ID":"100"
    }
}
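A consumer of such messages can derive which columns changed by comparing the before and after images; a minimal sketch with values mirroring the sample update above:

```python
# Minimal sketch deriving changed columns from an update's before and
# after images; the column names and values mirror the sample update.
def changed_columns(before, after):
    return {c: (before.get(c), v) for c, v in after.items() if before.get(c) != v}

diff = changed_columns(
    {"CUST_CODE": "BILL", "PRODUCT_PRICE": 15000.00},
    {"CUST_CODE": "BILL", "PRODUCT_PRICE": 14000.00},
)
```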

6.3.2.3 Sample Delete Message

{
    "table":"GG.TCUSTORD",
    "op_type":"D",
    "op_ts":"2013-06-02 22:14:41.000000",
    "current_ts":"2015-09-18T13:39:35.766000",
    "pos":"00000000000000004338",
    "tokens":{
        "L":"206080450",
        "6":"9.0.80330",
        "R":"AADPkvAAEAAEqLzAAC"
    },
    "before":{
        "CUST_CODE":"DAVE",
        "ORDER_DATE":"1993-11-03:07:51:35",
        "PRODUCT_CODE":"PLANE",
        "ORDER_ID":"600"
    }
}

6.3.2.4 Sample Truncate Message

{
    "table":"GG.TCUSTORD",
    "op_type":"T",
    "op_ts":"2013-06-02 22:14:41.000000",
    "current_ts":"2015-09-18T13:39:35.767000",
    "pos":"00000000000000004515",
    "tokens":{
        "R":"AADPkvAAEAAEqL2AAB"
    }
}

6.3.3 JSON Schemas

By default, JSON schemas are generated for each source table encountered. JSON schemas are generated on a just-in-time basis when an operation for a table is first encountered. A JSON schema is not required to parse a JSON object; however, many JSON parsers can use a JSON schema to perform a validating parse of a JSON object. Alternatively, you can review the JSON schemas to understand the layout of the output JSON objects. By default, the JSON schemas are created in the GoldenGate_Home/dirdef directory and are named by the following convention: FULLY_QUALIFIED_TABLE_NAME.schema.json. The generation of the JSON schemas is suppressible. The following is an example of a JSON schema for the JSON objects listed in the previous section.

{
    "$schema":"http://json-schema.org/draft-04/schema#",
    "title":"GG.TCUSTORD",
    "description":"JSON schema for table GG.TCUSTORD",
    "definitions":{
        "row":{
            "type":"object",
            "properties":{
                "CUST_CODE":{
                    "type":[
                        "string",
                        "null"
                    ]
                },
                "ORDER_DATE":{
                    "type":[
                        "string",
                        "null"
                    ]
                },
                "PRODUCT_CODE":{
                    "type":[
                        "string",
                        "null"
                    ]
                },
                "ORDER_ID":{
                    "type":[
                        "string",
                        "null"
                    ]
                },
                "PRODUCT_PRICE":{
                    "type":[
                        "number",
                        "null"
                    ]
                },
                "PRODUCT_AMOUNT":{
                    "type":[
                        "number",
                        "null"
                    ]
                },
                "TRANSACTION_ID":{
                    "type":[
                        "string",
                        "null"
                    ]
                }
            },
            "additionalProperties":false
        },
        "tokens":{
            "type":"object",
            "description":"Token keys and values are free form key value pairs.",
            "properties":{
            },
            "additionalProperties":true
        }
    },
    "type":"object",
    "properties":{
        "table":{
            "description":"The fully qualified table name",
            "type":"string"
        },
        "op_type":{
            "description":"The operation type",
            "type":"string"
        },
        "op_ts":{
            "description":"The operation timestamp",
            "type":"string"
        },
        "current_ts":{
            "description":"The current processing timestamp",
            "type":"string"
        },
        "pos":{
            "description":"The position of the operation in the data source",
            "type":"string"
        },
        "tokens":{
            "$ref":"#/definitions/tokens"
        },
        "before":{
            "$ref":"#/definitions/row"
        },
        "after":{
            "$ref":"#/definitions/row"
        }
    },
    "required":[
        "table",
        "op_type",
        "op_ts",
        "current_ts",
        "pos"
    ],
    "additionalProperties":false
}
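As noted above, the generated schema can support a validating parse. The following minimal sketch checks only the schema's "required" list using the Python standard library; a full validating parse would use a dedicated JSON Schema implementation instead:

```python
import json

# Minimal standard-library sketch: verify that a message carries every
# field listed in the generated schema's "required" array. A full
# validating parse would use a dedicated JSON Schema library.
def check_required(schema, message):
    return [f for f in schema.get("required", []) if f not in message]

schema = {"required": ["table", "op_type", "op_ts", "current_ts", "pos"]}
msg = json.loads('{"table":"GG.TCUSTORD","op_type":"T","op_ts":"ts1",'
                 '"current_ts":"ts2","pos":"00000000000000004515"}')
missing = check_required(schema, msg)   # [] when all required fields present
```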

6.3.4 JSON Schema Configuration

Table 6-4 JSON Schema Configuration Parameters

Parameters Required/ Optional Legal Values Default Explanation

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.prettyPrint

Optional

true | false

false

Controls the output format of the JSON data. Set to true for pretty-printed output, formatted with white space so it is more easily read by humans. Set to false for compact output, which is smaller but more difficult for humans to read.

gg.handler.name.format.jsonDelimiter

Optional

Any string

"" (no value)

Allows you to insert an optional delimiter between generated JSONs to allow them to be more easily parsed out of a continuous stream of data. Configuration value supports CDATA[] wrapping.

gg.handler.name.format.generateSchema

Optional

true | false

true

Controls the generation of JSON schemas for the generated JSON documents. JSON schemas are generated on a table by table basis. A JSON schema is not required to parse a JSON document. However, a JSON schema can provide you an indication of what the JSON documents will look like and can be used for a validating JSON parse.

gg.handler.name.format.schemaDirectory

Optional

Any legal, existing file system path

./dirdef

Controls the output location of generated JSON schemas.

gg.handler.name.format.treatAllColumnsAsStrings

Optional

true | false

false

Controls the output typing of generated JSON documents. If set to false then the formatter will attempt to map Oracle GoldenGate types to the corresponding JSON type. If set to true then all data will be treated as Strings in the generated JSONs and JSON schemas.

gg.handler.name.format.encoding

Optional

Any legal encoding name or alias supported by Java.

UTF-8 (the JSON default)

Controls the output encoding of generated JSON schemas and documents.

gg.handler.name.format.versionSchemas

Optional

true | false

false

Controls the version of created schemas. Schema versioning causes a schema with a timestamp to be created in the schema directory on the local file system every time a new schema is created. True enables schema versioning. False disables schema versioning.

gg.handler.name.format.iso8601Format

Optional

true | false

true

Controls the format of the current timestamp. The default is the ISO 8601 format. Setting this property to false removes the “T” between the date and time in the current timestamp and replaces it with a space (“ ”).

gg.handler.name.format.includePrimaryKeys

Optional

true | false

false

Set this configuration property to true to include an array of the primary key column names from the source table in the JSON output.

6.3.5 Sample Configuration

The following is sample configuration for the JSON Formatter from the Java Adapter configuration file:

gg.handler.hdfs.format=json
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.prettyPrint=false
gg.handler.hdfs.format.jsonDelimiter=CDATA[]
gg.handler.hdfs.format.generateSchema=true
gg.handler.hdfs.format.schemaDirectory=dirdef
gg.handler.hdfs.format.treatAllColumnsAsStrings=false
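
As a sketch of how a delimiter helps downstream consumers, the following assumes messages were produced with a hypothetical gg.handler.hdfs.format.jsonDelimiter=CDATA[\n] setting (a newline delimiter, unlike the empty delimiter in the sample above), so each JSON document in the stream can be split out and parsed:

```python
import json

# Hypothetical stream of two JSON messages separated by the configured
# newline delimiter; field values are modeled on the samples in this chapter.
stream = (
    '{"table":"GG.TCUSTORD","op_type":"I","ORDER_ID":"144"}\n'
    '{"table":"GG.TCUSTORD","op_type":"U","ORDER_ID":"765"}\n'
)

# Split on the delimiter and parse each non-empty chunk as one JSON document.
messages = [json.loads(chunk) for chunk in stream.split("\n") if chunk]
```

Any unique delimiter that cannot occur inside a generated JSON document works the same way; the split-then-parse pattern is the reason to configure one at all.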

6.3.6 Metadata Change Events

Metadata change events are now handled at runtime. A metadata change event for a given table will result in the regeneration of the JSON schema the next time an operation for that table is encountered. The content of created JSON messages will be changed to reflect the metadata change. For example, if the metadata change is to add an additional column, the new column will be included in created JSON messages after the metadata change event.

6.3.7 Primary Key Updates

Because the JSON Formatter models the operation data, primary key updates require no special treatment and are treated just like any other update. The before and after values will reflect the change in the primary key.

6.4 Avro Row Formatter

Apache Avro is an open source data serialization/deserialization framework known for its flexibility, compactness of serialized data, and good serialization/deserialization performance. Apache Avro is commonly used in Big Data applications.

The Avro Row Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete and truncate operation will be formatted into an individual Avro message. The source trail file will contain the before and after images of the operation data. The Avro Row Formatter takes that before and after image data and formats the data into an Avro binary representation of the operation data.

The Avro Row Formatter formats operations from the source trail file into a format that represents the row data. This format is more compact than the output from the Avro Operation Formatter, because the Avro Row messages model the row data rather than the full change data operation.

The Avro Row Formatter may be a good choice when streaming Avro data to HDFS. Hive supports data files in HDFS in an Avro format.

6.4.1 Message Formatting Details

The following two subsections detail the contents of generated Avro row messages. The first section details the operation metadata and the second section details the column values data.

6.4.1.1 Operation Metadata

Avro messages generated by the Avro Row Formatter contain the following seven metadata fields that begin the message:

Table 6-5 Avro Formatter Metadata

Value Description

table

The fully qualified table name. The format of the fully qualified table name is: CATALOG_NAME.SCHEMA_NAME.TABLE_NAME

op_type

The operation type that is the indicator of the type of database operation from the source trail file. Default values are I for insert, U for update, D for delete, T for truncate.

op_ts

The operation timestamp is the timestamp of the operation from the source trail file. Since this timestamp is from the source trail it is fixed. Replaying the trail file will result in the same timestamp for the same operation.

current_ts

The current timestamp is the current time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will NOT result in the same timestamp for the same operation.

pos

The trail file position is the concatenated sequence number and rba number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The rba number is the offset in the trail file.

primary_keys

An array variable holding the column names of the primary keys of the source table.

tokens

A map variable holding the token key value pairs from the source trail file.

6.4.1.2 Operation Data

The data following the operation metadata is the operation data. This data is represented as individual fields identified by the column names.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is NULL, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is NULL. Missing column values are treated the same as NULL values. It is highly recommended that, when using the Avro Row Formatter, you configure the Oracle GoldenGate capture process to provide full image data for all columns in the source trail file. Refer to the Oracle GoldenGate documentation for your specific RDBMS for instructions on enabling this functionality.

The default setting of the Avro Row Formatter is to map the data types from the source trail file to the associated Avro data type. Avro supports a limited set of data types, so this functionality largely results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping is configurable to alternatively treat all data as strings.

6.4.2 Sample Avro Row Messages

Avro messages are binary and therefore not human readable. The following sample messages are displayed as their JSON representations.

6.4.2.1 Sample Insert Message

{"table": "GG.TCUSTORD", 
"op_type": "I", 
"op_ts": "2013-06-02 22:14:36.000000", 
"current_ts": "2015-09-18T10:13:11.172000", 
"pos": "00000000000000001444", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqL2AAA"}, 
"CUST_CODE": "WILL", 
"ORDER_DATE": "1994-09-30:15:33:00", 
"PRODUCT_CODE": "CAR", 
"ORDER_ID": "144", 
"PRODUCT_PRICE": 17520.0, 
"PRODUCT_AMOUNT": 3.0, 
"TRANSACTION_ID": "100"}

6.4.2.2 Sample Update Message

{"table": "GG.TCUSTORD", 
"op_type": "U", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:13:11.492000", 
"pos": "00000000000000002891", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqLzAAA"}, 
"CUST_CODE": "BILL", 
"ORDER_DATE": "1995-12-31:15:00:00", 
"PRODUCT_CODE": "CAR", 
"ORDER_ID": "765", 
"PRODUCT_PRICE": 14000.0, 
"PRODUCT_AMOUNT": 3.0, 
"TRANSACTION_ID": "100"}

6.4.2.3 Sample Delete Message

{"table": "GG.TCUSTORD",
"op_type": "D", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:13:11.512000", 
"pos": "00000000000000004338", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, 
"CUST_CODE": "DAVE", 
"ORDER_DATE": "1993-11-03:07:51:35", 
"PRODUCT_CODE": "PLANE", 
"ORDER_ID": "600", 
"PRODUCT_PRICE": null, 
"PRODUCT_AMOUNT": null, 
"TRANSACTION_ID": null}

6.4.2.4 Sample Truncate Message

{"table": "GG.TCUSTORD", 
"op_type": "T", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:13:11.514000", 
"pos": "00000000000000004515", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqL2AAB"}, 
"CUST_CODE": null, 
"ORDER_DATE": null, 
"PRODUCT_CODE": null, 
"ORDER_ID": null, 
"PRODUCT_PRICE": null, 
"PRODUCT_AMOUNT": null, 
"TRANSACTION_ID": null}

6.4.3 Avro Schemas

Avro uses JSONs to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. Schemas are generated on a JIT (just in time) basis when the first operation for a table is encountered. Generated Avro schemas are specific to a table definition, which means that a separate Avro schema is generated for every table encountered for processed operations. Avro schemas are by default written to the GoldenGate_Home/dirdef directory although the write location is configurable. Avro schema file names adhere to the following naming convention: FullyQualifiedTableName.avsc.

The following is a sample Avro schema for the Avro Row Format for the previously referenced examples:

{
  "type" : "record",
  "name" : "TCUSTORD",
  "namespace" : "GG",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
    "name" : "op_type",
    "type" : "string"
  }, {
    "name" : "op_ts",
    "type" : "string"
  }, {
    "name" : "current_ts",
    "type" : "string"
  }, {
    "name" : "pos",
    "type" : "string"
  }, {
    "name" : "primary_keys",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  }, {
    "name" : "tokens",
    "type" : {
      "type" : "map",
      "values" : "string"
    },
    "default" : { }
  }, {
    "name" : "CUST_CODE",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "ORDER_DATE",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "PRODUCT_CODE",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "ORDER_ID",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "PRODUCT_PRICE",
    "type" : [ "null", "double" ],
    "default" : null
  }, {
    "name" : "PRODUCT_AMOUNT",
    "type" : [ "null", "double" ],
    "default" : null
  }, {
    "name" : "TRANSACTION_ID",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}

6.4.4 Avro Row Configuration

Table 6-6 Avro Row Configuration Options

Parameter Optional/Required Legal Values Default Explanation

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding

Optional

Any legal encoding name or alias supported by Java.

UTF-8 (the JSON default)

Controls the output encoding of the generated Avro schema, which is a JSON document. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings

Optional

true | false

false

Controls the output typing of generated Avro messages. If set to false then the formatter will attempt to map Oracle GoldenGate types to the corresponding AVRO type. If set to true then all data will be treated as Strings in the generated Avro messages and schemas.

gg.handler.name.format.pkUpdateHandling

Optional

abend | update | delete-insert

abend

Provides configuration for how the formatter handles update operations that change a primary key. Primary key updates can be problematic for the Avro Row Formatter and require special consideration.

  • abend - indicates the process will abend.

  • update - indicates the process will treat this as a normal update.

  • delete-insert - indicates the process will treat this as a delete and an insert. Full supplemental logging needs to be enabled for this to work. Without full before and after row images the insert data will be incomplete.

gg.handler.name.format.lineDelimiter

Optional

Any string

no value

Optionally allows a user to insert a delimiter after each Avro message. This is not considered the best practice but in certain use cases customers may wish to parse a stream of data and extract individual Avro messages from the stream. This property allows the customer that option. Select a unique delimiter that cannot occur in any Avro message. This property supports CDATA[] wrapping.

gg.handler.name.format.versionSchemas

Optional

true|false

false

The created Avro schemas always follow the convention {fully qualified table name}.avsc. Setting this property to true creates an additional Avro schema in the schema directory named {fully qualified table name}_{current timestamp}.avsc. The additional Avro schema does not get destroyed or removed and thereby provides a history of schema evolution.

gg.handler.name.format.wrapMessageInGenericAvroMessage

Optional

true|false

false

Provides functionality to wrap the Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality.

gg.handler.name.format.schemaDirectory

Optional

Any legal, existing file system path.

./dirdef

Controls the output location of generated Avro schemas.

gg.handler.name.schemaFilePath

Optional

Any legal, existing HDFS path.

./dirdef

Set this property to a directory in HDFS where you want the generated schemas to be written. A metadata change event causes the schema to be overwritten when the next operation for the associated table is encountered. Schemas follow the same naming convention as schemas written to the local file system: catalog.schema.table.avsc.

gg.handler.name.format.iso8601Format

Optional

true | false

true

Controls the format of the current timestamp. The default is the ISO 8601 format. Setting this property to false removes the “T” between the date and time in the current timestamp and replaces it with a space (“ ”).

6.4.5 Sample Configuration

The following is sample configuration for the Avro Row Formatter from the Java Adapter properties file:

gg.handler.hdfs.format=avro_row
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=UTF-8
gg.handler.hdfs.format.pkUpdateHandling=abend
gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false

6.4.6 Metadata Change Events

The Avro Row Formatter is capable of taking action in the case of a metadata change event. This assumes that the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events. Metadata change events are of particular importance when formatting using Avro due to the tight dependency of Avro messages to its corresponding schema.

Metadata change events are handled seamlessly by the Avro Row Formatter and an updated Avro schema will be generated upon the first encounter of an operation of that table after the metadata change event. You should understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema.

Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message. Consult the Apache Avro documentation for more details.

6.4.7 Special Considerations

This section describes special considerations:

6.4.7.1 Troubleshooting

Avro is a binary format and therefore not human readable, which makes Avro messages difficult to debug. The Avro Row Formatter provides a special feature to mitigate this issue. When the log4j Java logging level is set to TRACE, the created Avro messages are deserialized and displayed in the log file as JSON objects. This allows you to view the structure and contents of the created Avro messages. TRACE should never be enabled in a production environment as it has a substantial negative impact on performance. Alternatively, consider switching to a formatter that produces human readable content for troubleshooting. The XML and JSON formatters both produce content in a human readable format that may facilitate troubleshooting.

6.4.7.2 Primary Key Updates

Primary key update operations require special consideration and planning for Big Data integrations. Primary key updates are update operations that modify one or more of the primary keys for the given row from the source database. Because data is simply appended in Big Data applications, a primary key update operation looks more like a new insert than an update unless there is special handling. The Avro Row Formatter provides specialized handling for primary key updates that is configurable by you. These are the configurable behaviors:

Table 6-7 Configurable behavior

Value Description

abend

The default behavior is that the Avro Row Formatter abends in the case of a primary key update.

update

With this configuration the primary key update will be treated just like any other update operation. This configuration alternative should only be selected if you can guarantee that the primary key that is being changed is not being used as the selection criteria when selecting row data from a Big Data system.

delete-insert

Using this configuration the primary key update is treated as a special case of a delete, using the before image data, and an insert, using the after image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected it is important to have full supplemental logging enabled at the source database. Without full supplemental logging the delete operation will be correct; however, the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application.

6.4.7.3 Generic Wrapper Functionality

Avro messages are not self describing, which means that the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be especially troublesome when messages are interlaced into a single stream of data, such as a Kafka topic.

The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. This functionality is enabled by setting the following configuration property:

gg.handler.name.format.wrapMessageInGenericAvroMessage=true

The generic message is an Avro message, wrapping the Avro payload message, that is common to all Avro messages that are output. The schema for the generic message is named generic_wrapper.avsc and is written to the output schema directory. This message has the following three fields:

  • table_name - The fully qualified source table name.

  • schema_hash - The hash code of the Avro schema generating the message.

  • payload - The wrapped Avro message.

The following is the Avro Formatter generic wrapper schema.

{
  "type" : "record",
  "name" : "generic_wrapper",
  "namespace" : "oracle.goldengate",
  "fields" : [ {
    "name" : "table_name",
    "type" : "string"
  }, {
    "name" : "schema_hash",
    "type" : "int"
  }, {
    "name" : "payload",
    "type" : "bytes"
  } ]
}

6.5 Avro Operation Formatter

Apache Avro is an open source data serialization/deserialization framework known for its flexibility, compactness of serialized data, and good serialization/deserialization performance. Apache Avro is commonly used in Big Data applications.

The Avro Operation Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete and truncate operation will be formatted into an individual Avro message. The source trail file will contain the before and after images of the operation data. The Avro Operation Formatter takes that before and after image data and formats the data into an Avro binary representation of the operation data.

The Avro Operation Formatter formats operations from the source trail file into a format that represents the operation data. This format is more verbose than the output from the Avro Row Formatter, for which the Avro messages model only the row data.

6.5.1 Message Formatting Details

This section contains the following topics:

6.5.1.1 Operation Metadata

Avro messages generated by the Avro Operation Formatter contain the following metadata fields that begin the message:

Table 6-8 Avro Messages and its Metadata

Fields Description

table

The fully qualified table name. The format of the fully qualified table name is: CATALOG_NAME.SCHEMA_NAME.TABLE_NAME

op_type

The operation type that is the indicator of the type of database operation from the source trail file. Default values are "I" for insert, "U" for update, "D" for delete, "T" for truncate.

op_ts

The operation timestamp is the timestamp of the operation from the source trail file. Since this timestamp is from the source trail it is fixed. Replaying the trail file will result in the same timestamp for the same operation.

current_ts

The current timestamp is the current time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will NOT result in the same timestamp for the same operation.

pos

The trail file position is the concatenated sequence number and rba number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The rba number is the offset in the trail file.

primary_keys

An array variable holding the column names of the primary keys of the source table.

tokens

A map variable holding the token key value pairs from the source trail file.

6.5.1.2 Operation Data

The operation data is represented as individual fields identified by the column names.

Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is NULL, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is NULL. The Avro Operation Formatter contains an additional Boolean field for each column as a special indicator of whether the column value is missing. This special Boolean field is named COLUMN_NAME_isMissing. Using the combination of the COLUMN_NAME field and the COLUMN_NAME_isMissing field, all three states can be defined.

  • State 1: Column has a value

    COLUMN_NAME field has a value

    COLUMN_NAME_isMissing field is false

  • State 2: Column value is NULL

    COLUMN_NAME field value is NULL

    COLUMN_NAME_isMissing field is false

  • State 3: Column value is missing

    COLUMN_NAME field value is NULL

    COLUMN_NAME_isMissing field is true
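
The three states above can be recovered with a few lines of consumer-side logic. The following sketch uses plain Python dictionaries in place of decoded Avro records; the COMMENTS column is hypothetical, added only to illustrate the NULL state:

```python
# A small helper showing how a consumer can recover all three column states
# from a decoded before or after record; the field names follow the
# COLUMN_NAME / COLUMN_NAME_isMissing convention described above.
def column_state(image, column):
    if image[column + "_isMissing"]:
        return "missing"
    if image[column] is None:
        return "null"
    return "has value"

# Values modeled on the sample delete message's before image, plus a
# hypothetical NULL-valued COMMENTS column for illustration.
before = {
    "CUST_CODE": "DAVE", "CUST_CODE_isMissing": False,
    "PRODUCT_PRICE": None, "PRODUCT_PRICE_isMissing": True,
    "COMMENTS": None, "COMMENTS_isMissing": False,
}

states = {col: column_state(before, col)
          for col in ("CUST_CODE", "PRODUCT_PRICE", "COMMENTS")}
```

Note that checking the isMissing flag first is what distinguishes a genuinely NULL column from one that was simply absent from the trail record.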

The default setting of the Avro Operation Formatter is to map the data types from the source trail file to the associated Avro data type. Avro supports a limited set of data types, so this functionality largely results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping is configurable to alternatively treat all data as strings.

6.5.2 Sample Avro Operation Messages

Avro messages are binary and therefore not human readable. The following sample messages are displayed as their JSON representations.

6.5.2.1 Sample Insert Message

{"table": "GG.TCUSTORD",
"op_type": "I", 
"op_ts": "2013-06-02 22:14:36.000000", 
"current_ts": "2015-09-18T10:17:49.570000", 
"pos": "00000000000000001444", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqL2AAA"}, 
"before": null, 
"after": {
"CUST_CODE": "WILL", 
"CUST_CODE_isMissing": false, 
"ORDER_DATE": "1994-09-30:15:33:00", 
"ORDER_DATE_isMissing": false, 
"PRODUCT_CODE": "CAR", 
"PRODUCT_CODE_isMissing": false, 
"ORDER_ID": "144", 
"ORDER_ID_isMissing": false, 
"PRODUCT_PRICE": 17520.0, 
"PRODUCT_PRICE_isMissing": false, 
"PRODUCT_AMOUNT": 3.0, 
"PRODUCT_AMOUNT_isMissing": false, 
"TRANSACTION_ID": "100", 
"TRANSACTION_ID_isMissing": false}}

6.5.2.2 Sample Update Message

{"table": "GG.TCUSTORD", 
"op_type": "U", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:17:49.880000", 
"pos": "00000000000000002891", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqLzAAA"}, 
"before": {
"CUST_CODE": "BILL", 
"CUST_CODE_isMissing": false, 
"ORDER_DATE": "1995-12-31:15:00:00", 
"ORDER_DATE_isMissing": false, 
"PRODUCT_CODE": "CAR", 
"PRODUCT_CODE_isMissing": false, 
"ORDER_ID": "765", 
"ORDER_ID_isMissing": false, 
"PRODUCT_PRICE": 15000.0, 
"PRODUCT_PRICE_isMissing": false, 
"PRODUCT_AMOUNT": 3.0, 
"PRODUCT_AMOUNT_isMissing": false, 
"TRANSACTION_ID": "100", 
"TRANSACTION_ID_isMissing": false}, 
"after": {
"CUST_CODE": "BILL", 
"CUST_CODE_isMissing": false, 
"ORDER_DATE": "1995-12-31:15:00:00", 
"ORDER_DATE_isMissing": false, 
"PRODUCT_CODE": "CAR", 
"PRODUCT_CODE_isMissing": false, 
"ORDER_ID": "765", 
"ORDER_ID_isMissing": false, 
"PRODUCT_PRICE": 14000.0, 
"PRODUCT_PRICE_isMissing": false, 
"PRODUCT_AMOUNT": 3.0, 
"PRODUCT_AMOUNT_isMissing": false, 
"TRANSACTION_ID": "100", 
"TRANSACTION_ID_isMissing": false}}

6.5.2.3 Sample Delete Message

{"table": "GG.TCUSTORD", 
"op_type": "D", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:17:49.899000", 
"pos": "00000000000000004338", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, 
"before": {
"CUST_CODE": "DAVE", 
"CUST_CODE_isMissing": false, 
"ORDER_DATE": "1993-11-03:07:51:35", 
"ORDER_DATE_isMissing": false, 
"PRODUCT_CODE": "PLANE", 
"PRODUCT_CODE_isMissing": false, 
"ORDER_ID": "600", 
"ORDER_ID_isMissing": false, 
"PRODUCT_PRICE": null, 
"PRODUCT_PRICE_isMissing": true, 
"PRODUCT_AMOUNT": null, 
"PRODUCT_AMOUNT_isMissing": true, 
"TRANSACTION_ID": null, 
"TRANSACTION_ID_isMissing": true}, 
"after": null}

6.5.2.4 Sample Truncate Message

{"table": "GG.TCUSTORD", 
"op_type": "T", 
"op_ts": "2013-06-02 22:14:41.000000", 
"current_ts": "2015-09-18T10:17:49.900000", 
"pos": "00000000000000004515", 
"primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], 
"tokens": {"R": "AADPkvAAEAAEqL2AAB"}, 
"before": null, 
"after": null}

6.5.3 Avro Schema

Avro schemas are represented as JSONs. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. Avro schemas are generated on a JIT (just in time) basis when the first operation for a table is encountered. Avro schemas are specific to a table definition, which means that a separate Avro schema is generated for every table encountered for processed operations. Avro schemas are by default written to the GoldenGate_Home/dirdef directory although the write location is configurable. Avro schema file names adhere to the following naming convention: FullyQualifiedTableName.avsc.

The following is a sample Avro schema for the Avro Operation Format for the previously referenced examples:

{
  "type" : "record",
  "name" : "TCUSTORD",
  "namespace" : "GG",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
    "name" : "op_type",
    "type" : "string"
  }, {
    "name" : "op_ts",
    "type" : "string"
  }, {
    "name" : "current_ts",
    "type" : "string"
  }, {
    "name" : "pos",
    "type" : "string"
  }, {
    "name" : "primary_keys",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  }, {
    "name" : "tokens",
    "type" : {
      "type" : "map",
      "values" : "string"
    },
    "default" : { }
  }, {
    "name" : "before",
    "type" : [ "null", {
      "type" : "record",
      "name" : "columns",
      "fields" : [ {
        "name" : "CUST_CODE",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "CUST_CODE_isMissing",
        "type" : "boolean"
      }, {
        "name" : "ORDER_DATE",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "ORDER_DATE_isMissing",
        "type" : "boolean"
      }, {
        "name" : "PRODUCT_CODE",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "PRODUCT_CODE_isMissing",
        "type" : "boolean"
      }, {
        "name" : "ORDER_ID",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "ORDER_ID_isMissing",
        "type" : "boolean"
      }, {
        "name" : "PRODUCT_PRICE",
        "type" : [ "null", "double" ],
        "default" : null
      }, {
        "name" : "PRODUCT_PRICE_isMissing",
        "type" : "boolean"
      }, {
        "name" : "PRODUCT_AMOUNT",
        "type" : [ "null", "double" ],
        "default" : null
      }, {
        "name" : "PRODUCT_AMOUNT_isMissing",
        "type" : "boolean"
      }, {
        "name" : "TRANSACTION_ID",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "TRANSACTION_ID_isMissing",
        "type" : "boolean"
      } ]
    } ],
    "default" : null
  }, {
    "name" : "after",
    "type" : [ "null", "columns" ],
    "default" : null
  } ]
}

6.5.4 Avro Operation Formatter Configuration

Table 6-9 Configuration Options

Properties Optional/Required Legal Values Default Explanation

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding

Optional

Any legal encoding name or alias supported by Java

UTF-8 (the JSON default)

Controls the output encoding of the generated Avro schema, which is a JSON document. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings

Optional

true | false

false

Controls the output typing of generated Avro messages. If set to false then the formatter will attempt to map Oracle GoldenGate types to the corresponding Avro type. If set to true then all data will be treated as Strings in the generated Avro messages and schemas.

gg.handler.name.format.lineDelimiter

Optional

Any string

no value

Optionally allows a user to insert a delimiter after each Avro message. This is not considered the best practice but in certain use cases customers may wish to parse a stream of data and extract individual Avro messages from the stream. This property allows the customer that option. Select a unique delimiter that cannot occur in any Avro message. This property supports CDATA[] wrapping.

gg.handler.name.format.schemaDirectory

Optional

Any legal, existing file system path.

./dirdef

Controls the output location of generated Avro schemas.

gg.handler.name.format.wrapMessageInGenericAvroMessage

Optional

true|false

false

Provides functionality to wrap the Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality.

gg.handler.name.format.iso8601Format

Optional

true | false

true

Controls the format of the current timestamp. The default is the ISO 8601 format. Set to false to remove the "T" between the date and time in the current timestamp and output a space instead.

6.5.5 Sample Configuration

The following is a sample configuration for the Avro Operation Formatter from the Java Adapter properties file:

gg.handler.hdfs.format=avro_op
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=UTF-8
gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false

6.5.6 Metadata Change Events

The Avro Operation Formatter is capable of taking action in the case of a metadata change event. This assumes that the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events. Metadata change events are of particular importance when formatting using Avro because of the tight dependency of Avro messages on their corresponding schemas. Metadata change events are handled seamlessly by the Avro Operation Formatter: an updated Avro schema is generated upon the first encounter of an operation on that table after the metadata change event. You should understand the impact of a metadata change event and point downstream targets at the new Avro schema. The tight dependency of Avro messages on Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be deserializable with the newly generated Avro schema; conversely, Avro messages generated after the schema change may not be deserializable with the previous Avro schema. It is a best practice to deserialize an Avro message with the same version of the Avro schema that was used to generate it. Consult the Apache Avro documentation for more details.

6.5.7 Special Considerations

This section describes special considerations for using the Avro Operation Formatter:

6.5.7.1 Troubleshooting

Avro is a binary format and therefore not human readable, which can make issues difficult to debug. When the log4j Java logging level is set to TRACE, the created Avro messages are deserialized and displayed in the log file as JSON objects, allowing you to view the structure and contents of the created Avro messages. TRACE should never be enabled in a production environment because it has a substantial impact on performance.
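Assuming the adapter's standard logging properties are in use, TRACE might be enabled for a debugging session with settings such as the following. These lines are illustrative only; consult your installation's logging configuration for the actual property names, and remove TRACE before running in production.

```
# Illustrative debugging settings -- do not use in production
gg.log=log4j
gg.log.level=TRACE
```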

6.5.7.2 Primary Key Updates

The Avro Operation Formatter creates messages with complete data of before and after images for update operations. Therefore, the Avro Operation Formatter requires no special treatment for primary key updates.

6.5.7.3 Generic Wrapper Message

Avro messages are not self describing, which means that the receiver of a message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be especially troublesome when messages are interlaced into a single stream of data, such as a Kafka topic.

The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. This functionality is enabled by setting the following configuration parameter:

gg.handler.name.format.wrapMessageInGenericAvroMessage=true

The generic message is an Avro message, wrapping the Avro payload message, that is common to all Avro messages that are output. The schema for the generic message is named generic_wrapper.avsc and is written to the output schema directory. This message has the following three fields:

  • table_name - The fully qualified source table name.

  • schema_hash - The hash code of the Avro schema generating the message.

  • payload - The wrapped Avro message.

The following is the Avro Formatter generic wrapper schema:

{
  "type" : "record",
  "name" : "generic_wrapper",
  "namespace" : "oracle.goldengate",
  "fields" : [ {
    "name" : "table_name",
    "type" : "string"
  }, {
    "name" : "schema_hash",
    "type" : "int"
  }, {
    "name" : "payload",
    "type" : "bytes"
  } ]
}

6.6 Avro Object Container File Formatter

Oracle GoldenGate for Big Data can write to HDFS in Avro Object Container File (OCF) format. Avro OCF is a good choice for formatting data into HDFS because it handles schema evolution more efficiently than other formats. Compression and decompression are also supported in the Avro OCF Formatter to allow more efficient use of disk space.

The HDFS Handler integrates with the Avro formatters to write files to HDFS in Avro OCF format; this is a specialized use case of the HDFS Handler. The Avro OCF format is required for Hive to be able to read Avro data in HDFS. The Avro OCF format is detailed in the Avro specification:

http://avro.apache.org/docs/current/spec.html#Object+Container+Files

Another important feature is that the HDFS Handler can be configured to stream data in Avro OCF format, generate table definitions in Hive, and update table definitions in Hive in the case of a metadata change event.

6.6.1 Avro OCF Formatter Configuration

Parameter Optional / Required Legal Values Default Explanation

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.encoding

Optional

Any legal encoding name or alias supported by Java.

UTF-8

Controls the output encoding of the generated Avro schema, which is a JSON document. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding.

gg.handler.name.format.treatAllColumnsAsStrings

Optional

true | false

false

Controls the output typing of generated Avro messages. If set to false, then the formatter attempts to map Oracle GoldenGate types to the corresponding Avro type. If set to true, then all data is treated as strings in the generated Avro messages and schemas.

gg.handler.name.format.pkUpdateHandling

Optional

abend | update | delete-insert

abend

Controls how the formatter handles update operations that change a primary key. Primary key update operations can be problematic for the formatter and require special consideration:

  • abend - indicates the process will abend

  • update - indicates the process will treat this as a normal update

  • delete-insert - indicates the process will treat this as a delete and an insert. The full before image is required for this feature to work properly. This can be achieved by using full supplemental logging in Oracle. Without full before and after row images the insert data will be incomplete.

gg.handler.name.format.generateSchema

Optional

true | false

true

Schemas must be generated for Avro serialization; this property can be set to false to suppress the writing of the generated schemas to the local file system.

gg.handler.name.format.schemaDirectory

Optional

Any legal, existing file system path

./dirdef

Controls the output location of generated Avro schemas to the local file system. This property does not control where the Avro schema is written to in HDFS; that is controlled by an HDFS Handler property.

gg.handler.name.format.iso8601Format

Optional

true | false

true

The default format for the current timestamp is ISO8601. Set to false to remove the T between the date and time in the current timestamp and output a space instead.

gg.handler.name.format.versionSchemas

Optional

true | false

false

If set to true, an Avro schema is created in the schema directory and versioned by a time stamp. The format of the schema is the following:

{fully qualified table name}_{time stamp}.avsc

6.7 XML Formatter

The XML Formatter formats operation data from the source trail file into XML documents. The XML Formatter takes the before and after image data and formats it into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in previous releases of the Oracle GoldenGate Java Adapter product.

6.7.1 Message Formatting Details

The XML formatted messages contain the following information:

Table 6-10 XML formatting details

Value Description

table

The fully qualified table name.

type

The operation type.

current_ts

The current timestamp is the time when the formatter processed the current operation record. This timestamp follows the ISO 8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation.

pos

The position from the source trail file.

numCols

The total number of columns in the source table.

col

The col element is a repeating element that contains the before and after images of operation data.

tokens

The tokens element contains the token values from the source trail file.

6.7.2 Sample XML Messages

This section provides sample XML messages.

6.7.2.1 Sample Insert Message

<?xml version='1.0' encoding='UTF-8'?>
<operation table='GG.TCUSTORD' type='I' ts='2013-06-02 22:14:36.000000' current_ts='2015-10-06T12:21:50.100001' pos='00000000000000001444' numCols='7'>
 <col name='CUST_CODE' index='0'>
   <before missing='true'/>
   <after><![CDATA[WILL]]></after>
 </col>
 <col name='ORDER_DATE' index='1'>
   <before missing='true'/>
   <after><![CDATA[1994-09-30:15:33:00]]></after>
 </col>
 <col name='PRODUCT_CODE' index='2'>
   <before missing='true'/>
   <after><![CDATA[CAR]]></after>
 </col>
 <col name='ORDER_ID' index='3'>
   <before missing='true'/>
   <after><![CDATA[144]]></after>
 </col>
 <col name='PRODUCT_PRICE' index='4'>
   <before missing='true'/>
   <after><![CDATA[17520.00]]></after>
 </col>
 <col name='PRODUCT_AMOUNT' index='5'>
   <before missing='true'/>
   <after><![CDATA[3]]></after>
 </col>
 <col name='TRANSACTION_ID' index='6'>
   <before missing='true'/>
   <after><![CDATA[100]]></after>
 </col>
 <tokens>
   <token>
     <Name><![CDATA[R]]></Name>
     <Value><![CDATA[AADPkvAAEAAEqL2AAA]]></Value>
   </token>
 </tokens>
</operation>
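A consumer of these documents can recover the row image with any XML parser. The following is a minimal Python sketch using the standard library; the function name is illustrative, and it simply collects the after-image values from col elements, skipping columns whose after element is marked missing.

```python
import xml.etree.ElementTree as ET


def after_image(op_xml: str) -> dict:
    """Extract the after-image column values from an operation XML document."""
    operation = ET.fromstring(op_xml)
    row = {}
    for col in operation.findall("col"):
        after = col.find("after")
        # <after missing='true'/> carries no value; skip those columns.
        if after is not None and after.get("missing") != "true":
            row[col.get("name")] = after.text
    return row
```

Applied to the sample insert message above, this would yield a dictionary mapping CUST_CODE to WILL, ORDER_DATE to 1994-09-30:15:33:00, and so on.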

6.7.2.2 Sample Update Message

<?xml version='1.0' encoding='UTF-8'?>
<operation table='GG.TCUSTORD' type='U' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.413000' pos='00000000000000002891' numCols='7'>
 <col name='CUST_CODE' index='0'>
   <before><![CDATA[BILL]]></before>
   <after><![CDATA[BILL]]></after>
 </col>
 <col name='ORDER_DATE' index='1'>
   <before><![CDATA[1995-12-31:15:00:00]]></before>
   <after><![CDATA[1995-12-31:15:00:00]]></after>
 </col>
 <col name='PRODUCT_CODE' index='2'>
   <before><![CDATA[CAR]]></before>
   <after><![CDATA[CAR]]></after>
 </col>
 <col name='ORDER_ID' index='3'>
   <before><![CDATA[765]]></before>
   <after><![CDATA[765]]></after>
 </col>
 <col name='PRODUCT_PRICE' index='4'>
   <before><![CDATA[15000.00]]></before>
   <after><![CDATA[14000.00]]></after>
 </col>
 <col name='PRODUCT_AMOUNT' index='5'>
   <before><![CDATA[3]]></before>
   <after><![CDATA[3]]></after>
 </col>
 <col name='TRANSACTION_ID' index='6'>
   <before><![CDATA[100]]></before>
   <after><![CDATA[100]]></after>
 </col>
 <tokens>
   <token>
     <Name><![CDATA[R]]></Name>
     <Value><![CDATA[AADPkvAAEAAEqLzAAA]]></Value>
   </token>
 </tokens>
</operation>

6.7.2.3 Sample Delete Message

<?xml version='1.0' encoding='UTF-8'?>
<operation table='GG.TCUSTORD' type='D' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415000' pos='00000000000000004338' numCols='7'>
 <col name='CUST_CODE' index='0'>
   <before><![CDATA[DAVE]]></before>
   <after missing='true'/>
 </col>
 <col name='ORDER_DATE' index='1'>
   <before><![CDATA[1993-11-03:07:51:35]]></before>
   <after missing='true'/>
 </col>
 <col name='PRODUCT_CODE' index='2'>
   <before><![CDATA[PLANE]]></before>
   <after missing='true'/>
 </col>
 <col name='ORDER_ID' index='3'>
   <before><![CDATA[600]]></before>
   <after missing='true'/>
 </col>
 <col name='PRODUCT_PRICE' index='4'>
  <missing/>
 </col>
 <col name='PRODUCT_AMOUNT' index='5'>
  <missing/>
 </col>
 <col name='TRANSACTION_ID' index='6'>
  <missing/>
 </col>
 <tokens>
   <token>
     <Name><![CDATA[L]]></Name>
     <Value><![CDATA[206080450]]></Value>
   </token>
   <token>
     <Name><![CDATA[6]]></Name>
     <Value><![CDATA[9.0.80330]]></Value>
   </token>
   <token>
     <Name><![CDATA[R]]></Name>
     <Value><![CDATA[AADPkvAAEAAEqLzAAC]]></Value>
   </token>
 </tokens>
</operation>

6.7.2.4 Sample Truncate Message

<?xml version='1.0' encoding='UTF-8'?>
<operation table='GG.TCUSTORD' type='T' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415001' pos='00000000000000004515' numCols='7'>
 <col name='CUST_CODE' index='0'>
   <missing/> 
 </col>
 <col name='ORDER_DATE' index='1'>
   <missing/> 
 </col>
 <col name='PRODUCT_CODE' index='2'>
   <missing/> 
 </col>
 <col name='ORDER_ID' index='3'>
   <missing/> 
 </col>
 <col name='PRODUCT_PRICE' index='4'>
  <missing/>
 </col>
 <col name='PRODUCT_AMOUNT' index='5'>
  <missing/>
 </col>
 <col name='TRANSACTION_ID' index='6'>
  <missing/>
 </col>
 <tokens>
   <token>
     <Name><![CDATA[R]]></Name>
     <Value><![CDATA[AADPkvAAEAAEqL2AAB]]></Value>
   </token>
 </tokens>
</operation>

6.7.3 XML Schema

An XML schema (XSD) is not generated as part of the XML Formatter functionality. The following XSD is generic to any and all messages generated by the XML Formatter and defines the structure of the output XML documents:

<xs:schema attributeFormDefault="unqualified" 
elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="operation">
     <xs:complexType>
       <xs:sequence>
          <xs:element name="col" maxOccurs="unbounded" minOccurs="0">
           <xs:complexType>
             <xs:sequence>
               <xs:element name="before" minOccurs="0">
                 <xs:complexType>
                   <xs:simpleContent>
                     <xs:extension base="xs:string">
                       <xs:attribute type="xs:string" name="missing" 
use="optional"/>
                     </xs:extension>
                   </xs:simpleContent>
                 </xs:complexType>
               </xs:element>
               <xs:element name="after" minOccurs="0">
                 <xs:complexType>
                   <xs:simpleContent>
                     <xs:extension base="xs:string">
                       <xs:attribute type="xs:string" name="missing" 
use="optional"/>
                     </xs:extension>
                   </xs:simpleContent>
                 </xs:complexType>
               </xs:element>
               <xs:element type="xs:string" name="missing" minOccurs="0"/>
             </xs:sequence>
             <xs:attribute type="xs:string" name="name"/>
             <xs:attribute type="xs:short" name="index"/>
           </xs:complexType>
         </xs:element>
         <xs:element name="tokens" minOccurs="0">
           <xs:complexType>
             <xs:sequence>
               <xs:element name="token" maxOccurs="unbounded" minOccurs="0">
                 <xs:complexType>
                   <xs:sequence>
                     <xs:element type="xs:string" name="Name"/>
                     <xs:element type="xs:string" name="Value"/>
                   </xs:sequence>
                 </xs:complexType>
               </xs:element>
             </xs:sequence>
           </xs:complexType>
         </xs:element>
       </xs:sequence>
       <xs:attribute type="xs:string" name="table"/>
       <xs:attribute type="xs:string" name="type"/>
       <xs:attribute type="xs:string" name="ts"/>
       <xs:attribute type="xs:dateTime" name="current_ts"/>
       <xs:attribute type="xs:long" name="pos"/>
       <xs:attribute type="xs:short" name="numCols"/>
     </xs:complexType>
   </xs:element>
</xs:schema>

6.7.4 XML Configuration

Table 6-11 Configuration Options

Parameter Optional / Required Legal Values Default Explanation

gg.handler.name.format.insertOpKey

Optional

Any string

I

Indicator to be inserted into the output record to indicate an insert operation.

gg.handler.name.format.updateOpKey

Optional

Any string

U

Indicator to be inserted into the output record to indicate an update operation.

gg.handler.name.format.deleteOpKey

Optional

Any string

D

Indicator to be inserted into the output record to indicate a delete operation.

gg.handler.name.format.truncateOpKey

Optional

Any string

T

Indicator to be inserted into the output record to indicate a truncate operation.

gg.handler.name.format.encoding

Optional

Any legal encoding name or alias supported by Java.

UTF-8 (the XML default)

Controls the output encoding of generated XML documents.

gg.handler.name.format.includeProlog

Optional

true | false

false

Controls the output of an XML prolog on generated XML documents. An XML prolog is optional for well-formed XML. A sample XML prolog looks like this: <?xml version='1.0' encoding='UTF-8'?>

gg.handler.name.format.iso8601Format

Optional

true | false

true

Controls the format of the current timestamp in the XML message. The default is the ISO 8601 format. Set to false to suppress the "T" between the date and time and instead include a blank space.

6.7.5 Sample Configuration

The following is sample configuration for the XML Formatter from the Java Adapter properties file:

gg.handler.hdfs.format=xml
gg.handler.hdfs.format.insertOpKey=I
gg.handler.hdfs.format.updateOpKey=U
gg.handler.hdfs.format.deleteOpKey=D
gg.handler.hdfs.format.truncateOpKey=T
gg.handler.hdfs.format.encoding=ISO-8859-1
gg.handler.hdfs.format.includeProlog=false

6.7.6 Metadata Change Events

The XML Formatter seamlessly handles metadata change events. This assumes that the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events. The XML schema is designed to be generic so that the same schema represents the data of any operation from any table; a metadata change event therefore does not result in a change to the XML schema. The resulting changes in the metadata are reflected in messages after the metadata change event. For example, in the case of adding a column, the new column and its data begin showing up in XML messages for that table after the metadata change event.

6.7.7 Primary Key Updates

Updates to a primary key require no special handling by the XML Formatter. The XML Formatter creates messages that model the database operations. For update operations, this includes before and after images of column values. A primary key change is simply represented in this format as a change to a column value, just like a change to any other column value.
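For example, an update that changes the value of a primary key column would appear as a col element whose before and after images differ, exactly like any other column change. The following fragment is illustrative; the values are hypothetical:

```xml
<col name='CUST_CODE' index='0'>
  <before><![CDATA[BILL]]></before>
  <after><![CDATA[WILLIAM]]></after>
</col>
```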