26 Using the Pluggable Formatters
The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to Big Data targets using one of the Oracle GoldenGate for Big Data Handlers.
This chapter describes how to use the pluggable formatters.
- Using the Avro Formatter
Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Big Data applications. - Using the Delimited Text Formatter
- Using the JSON Formatter
- Using the Length Delimited Value Formatter
The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message. - Using Operation-Based versus Row-Based Formatting
The Oracle GoldenGate for Big Data formatters include operation-based and row-based formatters. - Using the XML Formatter
The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.
26.1 Using the Avro Formatter
Apache Avro is an open source data serialization and deserialization framework known for its flexibility, compactness of serialized data, and good serialization and deserialization performance. Apache Avro is commonly used in Big Data applications.
- Avro Row Formatter
- The Avro Operation Formatter
- Avro Object Container File Formatter
- Setting Metacolumn Output
Parent topic: Using the Pluggable Formatters
26.1.1 Avro Row Formatter
The Avro Row Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Row Formatter takes the before-image and after-image data and formats it into an Avro binary representation of the operation data.
The Avro Row Formatter formats operations from the source trail file into a format that represents the row data. This format is more compact than the output from the Avro Operation Formatter for the Avro messages model the change data operation.
The Avro Row Formatter may be a good choice when streaming Avro data to HDFS. Hive supports data files in HDFS in an Avro format.
This section contains the following topics:
- Operation Metadata Formatting Details
- Operation Data Formatting Details
- Sample Avro Row Messages
- Avro Schemas
Avro uses JSONs to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages. - Avro Row Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Special Considerations
Parent topic: Using the Avro Formatter
26.1.1.1 Operation Metadata Formatting Details
To output the metacolumns configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information see the configuration property:
gg.handler.name.format.metaColumnTemplate
Table 26-1 Avro Formatter Metadata
Value | Description |
---|---|
|
The fully qualified table in the format is: |
|
The type of database operation from the source trail file. Default values are |
|
The timestamp of the operation from the source trail file. Since this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation. |
|
The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation. |
|
The concatenated sequence number and the RBA number from the source trail file. This trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. |
|
An array variable that holds the column names of the primary keys of the source table. |
|
A map variable that holds the token key value pairs from the source trail file. |
Parent topic: Avro Row Formatter
26.1.1.2 Operation Data Formatting Details
The operation data follows the operation metadata. This data is represented as individual fields identified by the column names.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states, the column has a value or the column value is null. Missing column values are handled the same as null values. Oracle recommends that when you use the Avro Row Formatter, you configure the Oracle GoldenGate capture process to provide full image data for all columns in the source trail file.
By default, the setting of the Avro Row Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro provides limited support for data types, source columns map into Avro long, double, float, binary, or string data types. You can also configure data type mapping to handle all data as strings.
Parent topic: Avro Row Formatter
26.1.1.3 Sample Avro Row Messages
Because Avro messages are binary, they are not human readable. The following sample messages show the JSON representation of the messages.
Parent topic: Avro Row Formatter
26.1.1.3.1 Sample Insert Message
{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:13:11.172000", "pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "CUST_CODE": "WILL", "ORDER_DATE": "1994-09-30:15:33:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "144", "PRODUCT_PRICE": 17520.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}
Parent topic: Sample Avro Row Messages
26.1.1.3.2 Sample Update Message
{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.492000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "CUST_CODE": "BILL", "ORDER_DATE": "1995-12-31:15:00:00", "PRODUCT_CODE": "CAR", "ORDER_ID": "765", "PRODUCT_PRICE": 14000.0, "PRODUCT_AMOUNT": 3.0, "TRANSACTION_ID": "100"}
Parent topic: Sample Avro Row Messages
26.1.1.3.3 Sample Delete Message
{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.512000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "CUST_CODE": "DAVE", "ORDER_DATE": "1993-11-03:07:51:35", "PRODUCT_CODE": "PLANE", "ORDER_ID": "600", "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}
Parent topic: Sample Avro Row Messages
26.1.1.3.4 Sample Truncate Message
{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:13:11.514000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "CUST_CODE": null, "ORDER_DATE": null, "PRODUCT_CODE": null, "ORDER_ID": null, "PRODUCT_PRICE": null, "PRODUCT_AMOUNT": null, "TRANSACTION_ID": null}
Parent topic: Sample Avro Row Messages
26.1.1.4 Avro Schemas
Avro uses JSONs to represent schemas. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages.
Schemas are generated on a just-in-time basis when the first operation for a table
is encountered. Newer schemas are generated when there is a change in the metadata. The
generated Avro schemas are specific to a table definition, and therefore, a separate Avro
schema is generated for every table encountered for processed operations. By default, Avro
schemas are written to the GoldenGate_Home
/dirdef
directory, although the write location is configurable. Avro schema file names adhere to the
following naming convention:
Fully_Qualified_Table_Name
.avsc
.
The following is a sample Avro schema for the Avro Row Format for the references examples in the previous section:
{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null } ] }
Parent topic: Avro Row Formatter
26.1.1.5 Avro Row Configuration Properties
Table 26-2 Avro Row Configuration Properties
Properties | Optional/ Required | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
OGAA-2201
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
UTF-8 (the JSON default) |
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
gg.handler.name.format.treatAllColumnsAsStrings |
Optional |
|
|
Controls the output typing of generated Avro messages. If set to false then the formatter will attempt to map Oracle GoldenGate types to the corresponding AVRO type. If set to true then all data will be treated as Strings in the generated Avro messages and schemas. |
gg.handler.name.format.pkUpdateHandling |
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations for the Avro Row formatter require special consideration.
|
|
Optional |
Any string |
no value |
Inserts a delimiter after each Avro message. This is not
a best practice, but in certain cases you may want to parse a stream
of data and extract individual Avro messages from the stream. Select
a unique delimiter that cannot occur in any Avro message. This
property supports |
|
Optional |
|
|
Avro schemas always follow
the |
|
Optional |
|
|
Wraps the Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality. |
|
Optional |
Any legal, existing file system path. |
|
The output location of generated Avro schemas. |
|
Optional |
Any legal encoding name or alias supported by Java. |
|
The directory in the HDFS where schemas are output. A
metadata change overwrites the schema during the next operation for
the associated table. Schemas follow the same naming convention as
schemas written to the local file
system: |
|
Optional |
|
|
The format of the current timestamp. The default is the
ISO 8601 format. A setting of false removes the
T between the date and time in the current
timestamp, which outputs a space instead.
|
|
Optional |
|
|
Set to true to include a
{column_name}_isMissing boolean field
for each source field. This field allows downstream applications to
differentiate if a null value is null in the source trail file (value is
false ) or is missing in the source trail file
(value is true ).
|
|
Optional |
|
|
Enables the use of Avro |
|
Optional |
Any integer value from 0 to 38. |
None |
Allows you to set the scale on the Avro
|
|
Optional |
|
|
Set to |
gg.handler.name.format.mapLargeNumbersAsStrings |
Optional | true | false |
false |
Oracle GoldenGate supports the floating point and
integer source datatypes. Some of these datatypes may not fit into the
Avro primitive double or long datatypes. Set this property to
true to map the fields that do not fit into the
Avro primitive double or long datatypes to Avro string.
|
Parent topic: Avro Row Formatter
26.1.1.6 Review a Sample Configuration
The following is a sample configuration for the Avro Row Formatter in the Java Adapter properties file:
gg.handler.hdfs.format=avro_row gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.encoding=UTF-8 gg.handler.hdfs.format.pkUpdateHandling=abend gg.handler.hafs.format.wrapMessageInGenericAvroMessage=false
Parent topic: Avro Row Formatter
26.1.1.7 Metadata Change Events
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Row Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting.
An updated Avro schema is generated as soon as a table operation occurs after a metadata change event. You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema.
Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message. For more information, consult the Apache Avro documentation.
Parent topic: Avro Row Formatter
26.1.1.8 Special Considerations
This sections describes these special considerations:
26.1.1.8.1 Troubleshooting
Because Avro is a binary format, it is not human readable. Since Avro messages are in binary format, it is difficult to debug any issue, the Avro Row Formatter provides a special feature to help debug issues. When the log4j
Java logging level is set to TRACE
, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE
in a production environment as it has substantial negative impact on performance. To troubleshoot content, you may want to consider switching to use a formatter that produces human-readable content. The XML or JSON formatters both produce content in human-readable format.
Parent topic: Special Considerations
26.1.1.8.2 Primary Key Updates
In Big Data integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys of a given row in the source database. Because data is appended in Big Data applications, a primary key update operation looks more like a new insert than like an update without special handling. You can use the following properties to configure the Avro Row Formatter to handle primary keys:
Table 26-3 Configurable behavior
Value | Description |
---|---|
|
The formatter terminates. This behavior is the default behavior. |
|
With this configuration the primary key update is treated like any other update operation. Use this configuration only if you can guarantee that the primary key is not used as selection criteria row data from a Big Data system. |
|
The primary key update is treated as a special case of a delete, using the before image data and an insert using the after-image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected, it is important to have full supplemental logging enabled on Replication at the source database. Without full supplemental logging the delete operation will be correct, but insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application. |
Parent topic: Special Considerations
26.1.1.8.3 Generic Wrapper Functionality
Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka.
The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property.
gg.handler.name.format.wrapMessageInGenericAvroMessage=true
The generic message is Avro message wrapping the Avro payload message that is common to all Avro messages that are output. The schema for the generic message is name generic_wrapper.avsc
and is written to the output schema directory. This message has the following three fields:
-
table_name
:The fully qualified source table name. -
schema_fingerprint
: The fingerprint of the Avro schema of the wrapped message. The fingerprint is generated using the AvroSchemaNormalization.parsingFingerprint64(schema)
call. -
payload
: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema.
{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint", "type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }
Parent topic: Special Considerations
26.1.2 The Avro Operation Formatter
The Avro Operation Formatter formats operation data from the source trail file into messages in an Avro binary array format. Each individual insert, update, delete, and truncate operation is formatted into an individual Avro message. The source trail file contains the before and after images of the operation data. The Avro Operation Formatter formats this data into an Avro binary representation of the operation data.
This format is more verbose than the output of the Avro Row Formatter for which the Avro messages model the row data.
- Operation Metadata Formatting Details
- Operation Data Formatting Details
- Sample Avro Operation Messages
- Avro Schema
- Avro Operation Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Special Considerations
Parent topic: Using the Avro Formatter
26.1.2.1 Operation Metadata Formatting Details
To output the metacolumns configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information see the configuration property:
gg.handler.name.format.metaColumnsTemplate
Table 26-4 Avro Messages and its Metadata
Fields | Description |
---|---|
|
The fully qualified table name, in the
format:
|
|
The type of database operation from the source trail
file. Default values are |
|
The timestamp of the operation from the source trail file. Since this timestamp is from the source trail, it is fixed. Replaying the trail file results in the same timestamp for the same operation. |
|
The time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file will not result in the same timestamp for the same operation. |
|
The concatenated sequence number and rba number from the source trail file. The trail position provides traceability of the operation back to the source trail file. The sequence number is the source trail file number. The rba number is the offset in the trail file. |
|
An array variable that holds the column names of the primary keys of the source table. |
|
A map variable that holds the token key value pairs from the source trail file. |
Parent topic: The Avro Operation Formatter
26.1.2.2 Operation Data Formatting Details
The operation data is represented as individual fields identified by the column names.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. Avro attributes only support two states: the column has a value or the column value is null. The Avro Operation Formatter contains an additional Boolean field COLUMN_NAME
_isMissing
for each column to indicate whether the column value is missing or not. Using COLUMN_NAME
field together with the COLUMN_NAME
_isMissing
field, all three states can be defined.
-
State 1: The column has a value
COLUMN_NAME
field has a valueCOLUMN_NAME
_isMissing
field is false -
State 2: The column value is null
COLUMN_NAME
field value is nullCOLUMN_NAME
_isMissing
field is false -
State 3: The column value is missing
COLUMN_NAME
field value is nullCOLUMN_NAME
_isMissing
field is true
By default the Avro Row Formatter maps the data types from the source trail file to the associated Avro data type. Because Avro supports few data types, this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. You can also configure this data type mapping to handle all data as strings.
Parent topic: The Avro Operation Formatter
26.1.2.3 Sample Avro Operation Messages
Because Avro messages are binary, they are not human readable. The following topics show example Avro messages in JSON format:
Parent topic: The Avro Operation Formatter
26.1.2.3.1 Sample Insert Message
{"table": "GG.TCUSTORD", "op_type": "I", "op_ts": "2013-06-02 22:14:36.000000", "current_ts": "2015-09-18T10:17:49.570000", "pos": "00000000000000001444", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAA"}, "before": null, "after": { "CUST_CODE": "WILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1994-09-30:15:33:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "144", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 17520.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}
Parent topic: Sample Avro Operation Messages
26.1.2.3.2 Sample Update Message
{"table": "GG.TCUSTORD", "op_type": "U", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.880000", "pos": "00000000000000002891", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqLzAAA"}, "before": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 15000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}, "after": { "CUST_CODE": "BILL", "CUST_CODE_isMissing": false, "ORDER_DATE": "1995-12-31:15:00:00", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "CAR", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "765", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": 14000.0, "PRODUCT_PRICE_isMissing": false, "PRODUCT_AMOUNT": 3.0, "PRODUCT_AMOUNT_isMissing": false, "TRANSACTION_ID": "100", "TRANSACTION_ID_isMissing": false}}
Parent topic: Sample Avro Operation Messages
26.1.2.3.3 Sample Delete Message
{"table": "GG.TCUSTORD", "op_type": "D", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.899000", "pos": "00000000000000004338", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"L": "206080450", "6": "9.0.80330", "R": "AADPkvAAEAAEqLzAAC"}, "before": { "CUST_CODE": "DAVE", "CUST_CODE_isMissing": false, "ORDER_DATE": "1993-11-03:07:51:35", "ORDER_DATE_isMissing": false, "PRODUCT_CODE": "PLANE", "PRODUCT_CODE_isMissing": false, "ORDER_ID": "600", "ORDER_ID_isMissing": false, "PRODUCT_PRICE": null, "PRODUCT_PRICE_isMissing": true, "PRODUCT_AMOUNT": null, "PRODUCT_AMOUNT_isMissing": true, "TRANSACTION_ID": null, "TRANSACTION_ID_isMissing": true}, "after": null}
Parent topic: Sample Avro Operation Messages
26.1.2.3.4 Sample Truncate Message
{"table": "GG.TCUSTORD", "op_type": "T", "op_ts": "2013-06-02 22:14:41.000000", "current_ts": "2015-09-18T10:17:49.900000", "pos": "00000000000000004515", "primary_keys": ["CUST_CODE", "ORDER_DATE", "PRODUCT_CODE", "ORDER_ID"], "tokens": {"R": "AADPkvAAEAAEqL2AAB"}, "before": null, "after": null}
Parent topic: Sample Avro Operation Messages
26.1.2.4 Avro Schema
Avro schemas are represented as JSONs. Avro schemas define the format of generated Avro messages and are required to serialize and deserialize Avro messages.Avro schemas are generated on a just-in-time basis when the first operation for a table is encountered. Because Avro schemas are specific to a table definition, a separate Avro schema is generated for every table encountered for processed operations. By default, Avro schemas are written to the GoldenGate_Home
/dirdef
directory, although the write location is configurable. Avro schema file names adhere to the following naming convention: Fully_Qualified_Table_Name.avsc
.
The following is a sample Avro schema for the Avro Operation Format for the samples in the preceding sections:
{ "type" : "record", "name" : "TCUSTORD", "namespace" : "GG", "fields" : [ { "name" : "table", "type" : "string" }, { "name" : "op_type", "type" : "string" }, { "name" : "op_ts", "type" : "string" }, { "name" : "current_ts", "type" : "string" }, { "name" : "pos", "type" : "string" }, { "name" : "primary_keys", "type" : { "type" : "array", "items" : "string" } }, { "name" : "tokens", "type" : { "type" : "map", "values" : "string" }, "default" : { } }, { "name" : "before", "type" : [ "null", { "type" : "record", "name" : "columns", "fields" : [ { "name" : "CUST_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "CUST_CODE_isMissing", "type" : "boolean" }, { "name" : "ORDER_DATE", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_DATE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_CODE", "type" : [ "null", "string" ], "default" : null }, { "name" : "PRODUCT_CODE_isMissing", "type" : "boolean" }, { "name" : "ORDER_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "ORDER_ID_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_PRICE", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_PRICE_isMissing", "type" : "boolean" }, { "name" : "PRODUCT_AMOUNT", "type" : [ "null", "double" ], "default" : null }, { "name" : "PRODUCT_AMOUNT_isMissing", "type" : "boolean" }, { "name" : "TRANSACTION_ID", "type" : [ "null", "string" ], "default" : null }, { "name" : "TRANSACTION_ID_isMissing", "type" : "boolean" } ] } ], "default" : null }, { "name" : "after", "type" : [ "null", "columns" ], "default" : null } ] }
Parent topic: The Avro Operation Formatter
26.1.2.5 Avro Operation Formatter Configuration Properties
Table 26-5 Configuration Properties
Properties | Optional Y/N | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java |
UTF-8 (the JSON default) |
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
|
Optional |
|
|
Controls the output typing of generated Avro messages. If set to |
|
Optional |
Any string |
no value |
Inserts delimiter after each Avro message. This is not a best practice, but in certain cases you may want to parse a stream of data and extract individual Avro messages from the stream, use this property to help. Select a unique delimiter that cannot occur in any Avro message. This property supports |
|
Optional |
Any legal, existing file system path. |
|
The output location of generated Avro schemas. |
|
Optional |
|
|
Wraps Avro messages for operations from the source trail file in a generic Avro wrapper message. For more information, see Generic Wrapper Functionality. |
|
Optional |
|
|
The format of the current timestamp. By default the ISO 8601 is set to false , removes the T between the date and time in the current timestamp, which outputs a space instead.
|
|
Optional |
|
|
Set to |
|
Optional |
Any integer value from 0 to 38. |
None |
Allows you to set the scale on the Avro |
gg.handler.name.format.mapOracleNumbersAsStrings |
Optional |
|
|
This property is only applicable if decimal logical
types are enabled via the property
gg.handler.name.format.enableDecimalLogialType=true .
Oracle numbers are especially problematic because they have a large
precision (168) and floating scale of up to 38. Some analytical tools,
such as Spark cannot read numbers that large. This property allows you
to map those Oracle numbers as strings while still mapping the smaller
numbers as decimal logical types.
|
|
Optional |
|
|
Set to |
|
Optional |
|
|
Enables the use of Avro |
gg.handler.name.format.mapLargeNumbersAsStrings |
Optional |
|
false
|
Oracle GoldenGate supports the floating point and
integer source datatypes. Some of these datatypes may not fit into the
Avro primitive double or long datatypes. Set this property to
true to map the fields that do not fit into the
Avro primitive double or long datatypes to Avro string.
|
gg.handler.name.format.metaColumnsTemplate |
Optional | See unresolvable-reference.html#GUID-7231D03B-5470-4E46-9852-C61273D7EEEA | None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey |
truncateOpKey | includeTableName | includeOpTimestamp |
includeOpType | includePosition | includeCurrentTimestamp,
useIso8601Format It is a comma-delimited string consisting of one or more templated values that represent the template. For more information about the Metacolumn keywords, see unresolvable-reference.html#GUID-7231D03B-5470-4E46-9852-C61273D7EEEA. |
Parent topic: The Avro Operation Formatter
26.1.2.6 Review a Sample Configuration
The following is a sample configuration for the Avro Operation Formatter in the Java Adapter properg.handlerties
file:
gg.handler.hdfs.format=avro_op gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.encoding=UTF-8 gg.handler.hdfs.format.wrapMessageInGenericAvroMessage=false
Parent topic: The Avro Operation Formatter
26.1.2.7 Metadata Change Events
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the Avro Operation Formatter can take action when metadata changes. Because Avro messages depend closely on their corresponding schema, metadata changes are important when you use Avro formatting.
An updated Avro schema is generated as soon as a table operation occurs after a metadata change event.
You must understand the impact of a metadata change event and change downstream targets to the new Avro schema. The tight dependency of Avro messages to Avro schemas may result in compatibility issues. Avro messages generated before the schema change may not be able to be deserialized with the newly generated Avro schema. Conversely, Avro messages generated after the schema change may not be able to be deserialized with the previous Avro schema. It is a best practice to use the same version of the Avro schema that was used to generate the message
For more information, consult the Apache Avro documentation.
Parent topic: The Avro Operation Formatter
26.1.2.8 Special Considerations
This section describes these special considerations:
Parent topic: The Avro Operation Formatter
26.1.2.8.1 Troubleshooting
Because Avro is a binary format, it is not human readable. However, when the log4j
Java logging level is set to TRACE
, Avro messages are deserialized and displayed in the log file as a JSON object, letting you view the structure and contents of the created Avro messages. Do not enable TRACE
in a production environment, as it has a substantial impact on performance.
Parent topic: Special Considerations
26.1.2.8.2 Primary Key Updates
The Avro Operation Formatter creates messages with complete data of before-image and after-images for update operations. Therefore, the Avro Operation Formatter requires no special treatment for primary key updates.
Parent topic: Special Considerations
26.1.2.8.3 Generic Wrapper Message
Because Avro messages are not self describing, the receiver of the message must know the schema associated with the message before the message can be deserialized. Avro messages are binary and provide no consistent or reliable way to inspect the message contents in order to ascertain the message type. Therefore, Avro can be troublesome when messages are interlaced into a single stream of data such as Kafka.
The Avro formatter provides a special feature to wrap the Avro message in a generic Avro message. You can enable this functionality by setting the following configuration property:
gg.handler.name.format.wrapMessageInGenericAvroMessage=true
The generic message is Avro message wrapping the Avro payload message that is common to all Avro messages that are output. The schema for the generic message is name generic_wrapper.avsc
and is written to the output schema directory. This message has the following three fields:
-
table_name
: The fully qualified source table name. -
schema_fingerprint
: The fingerprint of the of the Avro schema generating the messages. The fingerprint is generated using theparsingFingerprint64(Schema s)
method on theorg.apache.avro.SchemaNormalization
class. -
payload
: The wrapped Avro message.
The following is the Avro Formatter generic wrapper schema:
{ "type" : "record", "name" : "generic_wrapper", "namespace" : "oracle.goldengate", "fields" : [ { "name" : "table_name", "type" : "string" }, { "name" : "schema_fingerprint", "type" : "long" }, { "name" : "payload", "type" : "bytes" } ] }
Parent topic: Special Considerations
26.1.3 Avro Object Container File Formatter
Oracle GoldenGate for Big Data can write to HDFS in Avro Object Container File (OCF) format. Avro OCF handles schema evolution more efficiently than other formats. The Avro OCF Formatter also supports compression and decompression to allow more efficient use of disk space.
The HDFS Handler integrates with the Avro formatters to write files to HDFS in Avro OCF format. The Avro OCF format is required for Hive to read Avro data in HDFS. The Avro OCF format is detailed in the Avro specification, see http://avro.apache.org/docs/current/spec.html#Object+Container+Files.
You can configure the HDFS Handler to stream data in Avro OCF format, generate table definitions in Hive, and update table definitions in Hive in the case of a metadata change event.
26.1.3.1 Avro OCF Formatter Configuration Properties
Properties | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be truncated into the output record to indicate a truncate operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
|
Controls the output encoding of generated JSON Avro schema. The JSON default is UTF-8. Avro messages are binary and support their own internal representation of encoding. |
|
Optional |
|
|
Controls the output typing of generated Avro messages. When the setting is |
|
Optional |
|
|
Controls how the formatter should handle update operations that change a primary key. Primary key operations can be problematic for the Avro Row formatter and require special consideration by you.
|
|
Optional |
|
|
Because schemas must be generated for Avro serialization to |
|
Optional |
Any legal, existing file system path |
|
The directory where generated Avro schemas are saved to the local file system. This property does not control where the Avro schema is written to in HDFS; that is controlled by an HDFS Handler property. |
|
Optional |
|
|
By default, the value of this property is true, and the format for the current timestamp is ISO8601. Set to |
|
Optional |
|
|
If set to
|
Parent topic: Avro Object Container File Formatter
26.1.4 Setting Metacolumn Output
The following are the configurable values for the Avro formatter metacolumns template property that controls metacolumn output:
Table 26-6 Metacolumns Template Property
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.metaColumnsTemplate |
Optional |
|
|
None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format It is a comma-delimited string consisting of one or more templated values that represent the template. |
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
Explanation of the Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
-
${keyword[fieldName].argument}
-
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
The argument is required to resolve the metacolumn value.
-
${alltokens}
-
All of the Oracle GoldenGate tokens.
-
${token}
-
The value of a specific Oracle GoldenGate token. The token key should follow token key should follow the token using the period (
.
) operator. For example:${token.MYTOKEN}
-
${token.MYTOKEN}
-
${sys}
A system environmental variable. The variable name should follow sys using the period (.) operator.
-
${sys.MYVAR}
-
${sys.MYVAR}
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.) operator. -
${env}
-
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.
) operator. For example:${env.someVariable}
-
${javaprop}
-
A Java JVM variable. The variable name should follow
javaprop
using the period (.
) operator. For example:${javaprop.MYVAR}
-
${optype}
-
Operation type.
-
${position}
-
Record position.
-
${timestamp}
-
Record timestamp.
-
${catalog}
-
Catalog name.
-
${schema}
-
Schema name.
-
${table}
-
Table name.
-
${objectname}
-
The fully qualified table name.
-
${csn}
-
Source Commit Sequence Number.
-
${xid}
-
Source transaction ID.
-
${currenttimestamp}
-
Current timestamp.
-
${currenttimestampiso8601}
-
Current timestamp in ISO 8601 format.
-
${opseqno}
-
Record sequence number within the transaction.
-
${timestampmicro}
-
Record timestamp in microseconds after epoch.
-
${currenttimestampmicro}
-
Current timestamp in microseconds after epoch.
-
${txind}
-
The is the transactional indicator from the source trail file. The values of a transaction are
B
for the first operation,M
for the middle operations,E
for the last operation, orW
for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field. -
${primarykeycolumns}
-
Use to inject a field with a list of the primary key column names.
-
${static}
-
Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is
abc
, then the syntax would be${static.abc}
or${static[FieldName].abc}
. -
${seqno}
-
Use to inject a field with the trail file sequence into the output.
-
${rba}
-
Use to inject a field with the rba of the operation into the output.
Sample Configuration:
gg.handlerlist=kafkarestproxy
#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx
#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properites
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234
#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password
#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}
Parent topic: Using the Avro Formatter
26.2 Using the Delimited Text Formatter
-
Column has a value: The column value is output.
-
Column value is null: The default output value is
NULL
. The output for the case of a null column value is configurable. -
Column value is missing: The default output value is an empty string (""). The output for the case of a missing column value is configurable.
- Using the Delimited Text Row Formatter
The Delimited Text Row Formatter is the Delimited Text Formatter that was included a release prior to the Oracle GoldeGate for Big Data 19.1.0.0 release. It writes the after change data for inserts and updates, and before change data for deletes. - Delimited Text Operation Formatter
The Delimited Text Operation Formatter is new functionality in the Oracle GoldenGate for Big Data 19.1.0.0.0 release. It outputs both before and after change data for insert, update and delete operations.
Parent topic: Using the Pluggable Formatters
26.2.1 Using the Delimited Text Row Formatter
The Delimited Text Row Formatter is the Delimited Text Formatter that was included a release prior to the Oracle GoldeGate for Big Data 19.1.0.0 release. It writes the after change data for inserts and updates, and before change data for deletes.
- Message Formatting Details
- Sample Formatted Messages
- Output Format Summary Log
- Delimited Text Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Setting Metacolumn Output
- Additional Considerations
Parent topic: Using the Delimited Text Formatter
26.2.1.1 Message Formatting Details
The default output format uses a semicolon as the delimiter and resembles the following:
First is the row metadata:
operation_type;fully_qualified_table_name;operation_timestamp;current_timestamp;trail_position;tokens;
Next is the row data:
column_1_value;column_n_value_then_line_delimeter
Optionally, the column name may be included before each column value that changes the output format for the row data:
column_1_name;column_1_value;column_n_name;column_n_value_then_line_delimeter
Formatting details:
-
Operation Type : Indicates the type of database operation from the source trail file. Default values are
I
for insert,U
for update,D
for delete,T
for truncate. Output of this field is suppressible. -
Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name, and the schema name. The format of the fully qualified table name is
catalog_name.schema_name.table_name
. The output of this field is suppressible. -
Operation Timestamp : The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
-
Current Timestamp : The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
-
Trail Position :The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.
-
Tokens : The token key value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the
includeTokens
configuration property on the corresponding handler is explicitly set totrue
.
Parent topic: Using the Delimited Text Row Formatter
26.2.1.2 Sample Formatted Messages
The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character, |
, to more clearly display the message.
Parent topic: Using the Delimited Text Row Formatter
26.2.1.2.1 Sample Insert Message
I|GG.TCUSTORD|2013-06-02 22:14:36.000000|2015-09-18T13:23:01.612001|00000000000000001444|R=AADPkvAAEAAEqL2A AA|WILL|1994-09-30:15:33:00|CAR|144|17520.00|3|100
Parent topic: Sample Formatted Messages
26.2.1.2.2 Sample Update Message
U|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:01.987000|00000000000000002891|R=AADPkvAAEAAEqLzA AA|BILL|1995-12-31:15:00:00|CAR|765|14000.00|3|100
Parent topic: Sample Formatted Messages
26.2.1.2.3 Sample Delete Message
D,GG.TCUSTORD,2013-06-02 22:14:41.000000,2015-09-18T13:23:02.000000,00000000000000004338,L=206080450,6=9.0. 80330,R=AADPkvAAEAAEqLzAAC,DAVE,1993-11-03:07:51:35,PLANE,600,,,
Parent topic: Sample Formatted Messages
26.2.1.2.4 Sample Truncate Message
T|GG.TCUSTORD|2013-06-02 22:14:41.000000|2015-09-18T13:23:02.001000|00000000000000004515|R=AADPkvAAEAAEqL2A AB|||||||
Parent topic: Sample Formatted Messages
26.2.1.3 Output Format Summary Log
If INFO
level logging is enabled, the Java log4j logging logs a
summary of the delimited text output format . A summary of the delimited fields is
logged for each source table encountered and occurs when the first operation for
that table is received by the Delimited Text formatter. This detailed explanation of
the fields of the delimited text output may be useful when you perform an initial
setup. When a metadata change event occurs, the summary of the delimited fields is
regenerated and logged again at the first subsequent operation for that table.
Parent topic: Using the Delimited Text Row Formatter
26.2.1.4 Delimited Text Formatter Configuration Properties
Table 26-7 Delimited Text Formatter Configuration Properties
Properties | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format |
Required |
delimitedtext |
None |
Selects the Delimited Text Row formatter as the formatter. |
gg.handler.name.format.includeColumnNames |
Optional |
|
false |
Controls the output of writing the column names as a delimited field preceding the column value. When true, the output resembles: COL1_Name|COL1_Value|COL2_Name|COL2_Value When
|
|
Optional |
|
|
A |
|
Optional |
|
|
A |
|
Optional |
|
|
A |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any encoding name or alias supported by Java. |
The native system encoding of the machine hosting the Oracle GoldenGate process. |
Determines the encoding of the output delimited text. |
|
Optional |
Any String |
|
The delimiter used between delimited fields. This value
supports |
|
Optional |
Any String |
Newline (the default Hive delimiter) |
The delimiter used between records. This value supports
|
gg.handler.name.format.disableEscaping |
Optional | true |
false |
false |
Set to true to disable the
escaping of characters which conflict with the configured delimiters.
Must be set to true if
gg.handler.name.format.fieldDelimiter is set to a
value of multiple characters.
|
|
Optional |
|
|
Use |
|
Optional |
Any string |
|
Specifies a delimiter between keys and values in a map.
Key1=value1. Tokens are mapped values. Configuration value supports
|
|
Optional |
Any string |
|
Specifies a delimiter between key value pairs in a map.
|
|
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the text formatter and require special consideration by you.
|
|
Optional |
Any string |
NULL |
Specifies what is included in the delimited output in the
case of a NULL value. Configuration value supports
|
|
Optional |
Any string |
|
Specifies what is included in the delimited text output
in the case of a missing value. Configuration value supports
|
|
Optional |
|
|
When true, suppresses the output of the operation position from the source trail file. |
|
Optional |
|
|
Controls the format of the current timestamp. The
default is the ISO 8601 format. When |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
gg.handler.name.format.includeGroupCols |
Optional | true | false |
false |
If set to true , the columns are grouped
into sets of all names, all before values, and all after values
U,QASOURCE.TCUSTMER,2015-11-05 18:45:39.000000,2019-04-17T05:19:30.556000,00000000000000005100,R=AAKifQAAKAAAFDHAAE,CUST_CODE,NAME,CITY,STATE,ANN,ANN'S BOATS,SEATTLE,WA,ANN,,NEW YORK,NY |
Parent topic: Using the Delimited Text Row Formatter
26.2.1.5 Review a Sample Configuration
The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:
gg.handler.name.format.includeColumnNames=false gg.handler.name.format.includeOpTimestamp=true gg.handler.name.format.includeCurrentTimestamp=true gg.handler.name.format.insertOpKey=I gg.handler.name.format.updateOpKey=U gg.handler.name.format.deleteOpKey=D gg.handler.name.format.truncateOpKey=T gg.handler.name.format.encoding=UTF-8 gg.handler.name.format.fieldDelimiter=CDATA[\u0001] gg.handler.name.format.lineDelimiter=CDATA[\n] gg.handler.name.format.includeTableName=true gg.handler.name.format.keyValueDelimiter=CDATA[=] gg.handler.name.format.kevValuePairDelimiter=CDATA[,] gg.handler.name.format.pkUpdateHandling=abend gg.handler.name.format.nullValueRepresentation=NULL gg.handler.name.format.missingValueRepresentation=CDATA[] gg.handler.name.format.includePosition=true gg.handler.name.format.includeGroupCols=false gg.handler.name.format=delimitedtext
Parent topic: Using the Delimited Text Row Formatter
26.2.1.6 Metadata Change Events
Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change and the Delimited Text Formatter continue running.
Note:
A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.Parent topic: Using the Delimited Text Row Formatter
26.2.1.7 Setting Metacolumn Output
The following are the configurable values for the Delimiter text formatter metacolumn property that controls metacolumn output:
Table 26-8 Metacolumns Template Property
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.metaColumnsTemplate |
Optional |
|
|
None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format It is a comma-delimited string consisting of one or more templated values that represent the template. |
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
Explanation of the Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
-
${keyword[fieldName].argument}
-
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
The argument is required to resolve the metacolumn value.
-
${alltokens}
-
All of the Oracle GoldenGate tokens.
-
${token}
-
The value of a specific Oracle GoldenGate token. The token key should follow token key should follow the token using the period (
.
) operator. For example:${token.MYTOKEN}
-
${token.MYTOKEN}
-
${sys}
A system environmental variable. The variable name should follow sys using the period (.) operator.
-
${sys.MYVAR}
-
${sys.MYVAR}
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.) operator. -
${env}
-
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.
) operator. For example:${env.someVariable}
-
${javaprop}
-
A Java JVM variable. The variable name should follow
javaprop
using the period (.
) operator. For example:${javaprop.MYVAR}
-
${optype}
-
Operation type.
-
${position}
-
Record position.
-
${timestamp}
-
Record timestamp.
-
${catalog}
-
Catalog name.
-
${schema}
-
Schema name.
-
${table}
-
Table name.
-
${objectname}
-
The fully qualified table name.
-
${csn}
-
Source Commit Sequence Number.
-
${xid}
-
Source transaction ID.
-
${currenttimestamp}
-
Current timestamp.
-
${currenttimestampiso8601}
-
Current timestamp in ISO 8601 format.
-
${opseqno}
-
Record sequence number within the transaction.
-
${timestampmicro}
-
Record timestamp in microseconds after epoch.
-
${currenttimestampmicro}
-
Current timestamp in microseconds after epoch.
-
${txind}
-
The is the transactional indicator from the source trail file. The values of a transaction are
B
for the first operation,M
for the middle operations,E
for the last operation, orW
for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field. -
${primarykeycolumns}
-
Use to inject a field with a list of the primary key column names.
-
${static}
-
Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is
abc
, then the syntax would be${static.abc}
or${static[FieldName].abc}
. -
${seqno}
-
Use to inject a field with the trail file sequence into the output.
-
${rba}
-
Use to inject a field with the rba of the operation into the output.
Sample Configuration:
gg.handlerlist=kafkarestproxy
#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx
#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properites
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234
#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password
#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}
Parent topic: Using the Delimited Text Row Formatter
26.2.1.8 Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that will not occur in the content of the data.
The Java Adapter configuration trims leading and trailing characters from configuration values when they are determined to be whitespace. However, you may want to choose field delimiters, line delimiters, null value representations, and missing value representations that include or are fully considered to be whitespace . In these cases, you must employ specialized syntax in the Java Adapter configuration file to preserve the whitespace. To preserve the whitespace, when your configuration values contain leading or trailing characters that are considered whitespace, wrap the configuration value in a CDATA[]
wrapper. For example, a configuration value of \n
should be configured as CDATA[\n]
.
You can use regular expressions to search column values then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.
Big Data applications sore data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in Big Data applications, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system. This leads to some special scenarios as described in the following sections.
26.2.1.8.1 Primary Key Updates
In Big Data integrations, primary key update operations require special consideration and planning. Primary key updates modify one or more of the primary keys for the given row from the source database. Because data is appended in Big Data applications, a primary key update operation looks more like an insert than an update without any special handling. You can configure how the Delimited Text formatter handles primary key updates. These are the configurable behaviors:
Table 26-9 Configurable Behavior
Value | Description |
---|---|
|
By default the delimited text formatter terminates in the case of a primary key update. |
|
The primary key update is treated like any other update operation. Use this configuration alternative only if you can guarantee that the primary key is not used as selection criteria to select row data from a Big Data system. |
|
The primary key update is treated as a special case of a delete, using the before-image data and an insert using the after-image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation will not contain all of the data for all of the columns for a full representation of the row data in the Big Data application. |
Parent topic: Additional Considerations
26.2.1.8.2 Data Consolidation
Big Data applications append data to the underlying storage. Analytic tools generally spawn MapReduce programs that traverse the data files and consolidate all the operations for a given row into a single output. Therefore, it is important to specify the order of operations. The Delimited Text formatter provides a number of metadata fields to do this. The operation timestamp may be sufficient to fulfill this requirement. Alternatively, the current timestamp may be the best indicator of the order of operations. In this situation, the trail position can provide a tie-breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of order of operations in Big Data.
Parent topic: Additional Considerations
26.2.2 Delimited Text Operation Formatter
The Delimited Text Operation Formatter is new functionality in the Oracle GoldenGate for Big Data 19.1.0.0.0 release. It outputs both before and after change data for insert, update and delete operations.
- Message Formatting Details
- Sample Formatted Messages
- Output Format Summary Log
- Delimited Text Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change and the Delimited Text Formatter continue running. - Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data.
Parent topic: Using the Delimited Text Formatter
26.2.2.1 Message Formatting Details
The default output format uses a semicolon as the delimiter and resembles the following:
First is the row metadata:
operation_type;fully_qualified_table_name;operation_timestamp;current_timestamp;trail_position;tokens;
Next is the row data:
column_1_before_value;column_1_after_value;column_n_before_value_then_line_delimeter;column_n_after_value_then_line_delimeter
Optionally, the column name may be included before each column value that changes the output format for the row data:
column_1_name;column_1_before_value;column_1_after_value;column_n_name;column_n_before_value_then_line_delimeter;column_n_after_value_then_line_delimeter
Formatting details:
-
Operation Type :Indicates the type of database operation from the source trail file. Default values are
I
for insert,U
for update,D
for delete,T
for truncate. Output of this field is suppressible. -
Fully Qualified Table Name: The fully qualified table name is the source database table including the catalog name, and the schema name. The format of the fully qualified table name is catalog_name.schema_name.table_name. The output of this field is suppressible.
-
Operation Timestamp : The commit record timestamp from the source system. All operations in a transaction (unbatched transaction) will have the same operation timestamp. This timestamp is fixed, and the operation timestamp is the same if the trail file is replayed. The output of this field is suppressible.
-
Current Timestamp : The timestamp of the current time when the delimited text formatter processes the current operation record. This timestamp follows the ISO-8601 format and includes microsecond precision. Replaying the trail file does not result in the same timestamp for the same operation. The output of this field is suppressible.
-
Trail Position :The concatenated sequence number and RBA number from the source trail file. The trail position lets you trace the operation back to the source trail file. The sequence number is the source trail file number. The RBA number is the offset in the trail file. The output of this field is suppressible.
-
Tokens : The token key value pairs from the source trail file. The output of this field in the delimited text output is suppressed unless the
includeTokens
configuration property on the corresponding handler is explicitly set totrue
.
Parent topic: Delimited Text Operation Formatter
26.2.2.2 Sample Formatted Messages
The following sections contain sample messages from the Delimited Text Formatter. The default field delimiter has been changed to a pipe character, |
, to more clearly display the message.
Parent topic: Delimited Text Operation Formatter
26.2.2.2.1 Sample Insert Message
I|GG.TCUSTMER|2015-11-05
18:45:36.000000|2019-04-17T04:49:00.156000|00000000000000001956|R=AAKifQAAKAAAFDHAAA,t=,L=7824137832,6=2.3.228025||WILL||BG
SOFTWARE CO.||SEATTLE||WA
Parent topic: Sample Formatted Messages
26.2.2.2.2 Sample Update Message
U|QASOURCE.TCUSTMER|2015-11-05 18:45:39.000000|2019-07-16T11:54:06.008002|00000000000000005100|R=AAKifQAAKAAAFDHAAE|ANN|ANN|ANN'S BOATS||SEATTLE|NEW YORK|WA|NY
Parent topic: Sample Formatted Messages
26.2.2.2.3 Sample Delete Message
D|QASOURCE.TCUSTORD|2015-11-05
18:45:39.000000|2019-07-16T11:54:06.009000|00000000000000005272|L=7824137921,R=AAKifSAAKAAAMZHAAE,6=9.9.479055|DAVE||1993-11-03
07:51:35||PLANE||600||135000.00||2||200|
Parent topic: Sample Formatted Messages
26.2.2.2.4 Sample Truncate Message
T|QASOURCE.TCUSTMER|2015-11-05
18:45:39.000000|2019-07-16T11:54:06.004002|00000000000000003600|R=AAKifQAAKAAAFDHAAE||||||||
Parent topic: Sample Formatted Messages
26.2.2.3 Output Format Summary Log
If INFO
level logging is enabled, the Java log4j logging logs a
summary of the delimited text output format . A
summary of the delimited fields is logged for each
source table encountered and occurs when the first
operation for that table is received by the
Delimited Text formatter. This detailed
explanation of the fields of the delimited text
output may be useful when you perform an initial
setup. When a metadata change event occurs, the
summary of the delimited fields is regenerated and
logged again at the first subsequent operation for
that table.
Parent topic: Delimited Text Operation Formatter
26.2.2.4 Delimited Text Formatter Configuration Properties
Table 26-10 Delimited Text Formatter Configuration Properties
Properties | Optional / Required | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format |
Required |
delimitedtext_op |
None |
Selects the Delimited Text Operation Formatter as the formatter. |
gg.handler.name.format.includeColumnNames |
Optional |
|
false |
Controls the output of
writing the column names as a delimited field preceding the column
value. When COL1_Name|COL1_Before_Value|COL1_After_Value|COL2_Name|COL2_Before_Value|COL2_After_Value When
|
gg.handler.name.format.includeOpTimestamp |
Optional |
|
|
: A |
|
Optional |
|
|
A |
|
Optional |
|
|
A |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any encoding name or alias supported by Java. |
The native system encoding of the machine hosting the Oracle GoldenGate process. |
Determines the encoding of the output delimited text. |
|
Optional |
Any String |
|
The delimiter used between delimited fields. This value
supports |
|
Optional |
Any String |
Newline (the default Hive delimiter) |
The delimiter used between records. This value supports
|
gg.handler.name.format.disableEscaping |
Optional | true|false |
false |
Set to true to disable the
escaping of characters which conflict with the configured delimiters.
Must be set to true if
gg.handler.name.format.fieldDelimiter is set to a
value of multiple characters.
|
|
Optional |
|
|
Use |
|
Optional |
Any string |
|
Specifies a delimiter between keys and values in a map.
Key1=value1. Tokens are mapped values. Configuration value supports
|
|
Optional |
Any string |
|
Specifies a delimiter between key value pairs in a map.
|
|
Optional |
Any string |
NULL |
Specifies what is included in the delimited output in the
case of a NULL value. Configuration value supports
|
|
Optional |
Any string |
|
Specifies what is included in the delimited text output
in the case of a missing value. Configuration value supports
|
|
Optional |
|
|
When true, suppresses the output of the operation position from the source trail file. |
|
Optional |
|
|
Controls the format of the current timestamp. The
default is the ISO 8601 format. When |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
gg.handler.name.format.includeGroupCols |
Optional | true | false |
false |
If set to true , the columns are grouped
into sets of all names, all before values, and all after values
U,QASOURCE.TCUSTMER,2015-11-05 18:45:39.000000,2019-04-17T05:19:30.556000,00000000000000005100,R=AAKifQAAKAAAFDHAAE,CUST_CODE,NAME,CITY,STATE,ANN,ANN'S BOATS,SEATTLE,WA,ANN,,NEW YORK,NY |
Parent topic: Delimited Text Operation Formatter
26.2.2.5 Review a Sample Configuration
The following is a sample configuration for the Delimited Text formatter in the Java Adapter configuration file:
gg.handler.name.format.includeColumnNames=false gg.handler.name.format.includeOpTimestamp=true gg.handler.name.format.includeCurrentTimestamp=true gg.handler.name.format.insertOpKey=I gg.handler.name.format.updateOpKey=U gg.handler.name.format.deleteOpKey=D gg.handler.name.format.truncateOpKey=T gg.handler.name.format.encoding=UTF-8 gg.handler.name.format.fieldDelimiter=CDATA[\u0001] gg.handler.name.format.lineDelimiter=CDATA[\n] gg.handler.name.format.includeTableName=true gg.handler.name.format.keyValueDelimiter=CDATA[=] gg.handler.name.format.kevValuePairDelimiter=CDATA[,] gg.handler.name.format.nullValueRepresentation=NULL gg.handler.name.format.missingValueRepresentation=CDATA[] gg.handler.name.format.includePosition=true gg.handler.name.format.includeGroupCols=false gg.handler.name.format=delimitedtext_op
Parent topic: Delimited Text Operation Formatter
26.2.2.6 Metadata Change Events
Oracle GoldenGate for Big Data now handles metadata change events at runtime. This assumes that the replicated database and upstream replication processes are propagating metadata change events. The Delimited Text Formatter changes the output format to accommodate the change and the Delimited Text Formatter continue running.
Note:
A metadata change may affect downstream applications. Delimited text formats include a fixed number of fields that are positionally relevant. Deleting a column in the source table can be handled seamlessly during Oracle GoldenGate runtime, but results in a change in the total number of fields, and potentially changes the positional relevance of some fields. Adding an additional column or columns is probably the least impactful metadata change event, assuming that the new column is added to the end. Consider the impact of a metadata change event before executing the event. When metadata change events are frequent, Oracle recommends that you consider a more flexible and self-describing format, such as JSON or XML.
Parent topic: Delimited Text Operation Formatter
26.2.2.7 Additional Considerations
Exercise care when you choose field and line delimiters. It is important to choose delimiter values that do not occur in the content of the data.
The Java Adapter configuration trims leading and trailing characters from
configuration values when they are determined to be whitespace. However, you may
want to choose field delimiters, line delimiters, null value representations, and
missing value representations that include or are fully considered to be whitespace
. In these cases, you must employ specialized syntax in the Java Adapter
configuration file to preserve the whitespace. To preserve the whitespace, when your
configuration values contain leading or trailing characters that are considered
whitespace, wrap the configuration value in a CDATA[]
wrapper. For
example, a configuration value of \n
should be configured as
CDATA[\n]
.
You can use regular expressions to search column values then replace matches with a specified value. You can use this search and replace functionality together with the Delimited Text Formatter to ensure that there are no collisions between column value contents and field and line delimiters. For more information, see Using Regular Expression Search and Replace.
Big Data applications sore data differently from RDBMSs. Update and delete operations in an RDBMS result in a change to the existing data. However, in Big Data applications, data is appended instead of changed. Therefore, the current state of a given row consolidates all of the existing operations for that row in the HDFS system. This leads to some special scenarios as described in the following sections.
Parent topic: Delimited Text Operation Formatter
26.3 Using the JSON Formatter
The JavaScript Object Notation (JSON) formatter can output operations from the source trail file in either row-based format or operation-based format. It formats operation data from the source trail file into a JSON objects. Each insert, update, delete, and truncate operation is formatted into an individual JSON message.
- Operation Metadata Formatting Details
- Operation Data Formatting Details
- Row Data Formatting Details
- Sample JSON Messages
- JSON Schemas
- JSON Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Setting Metacolumn Output
- JSON Primary Key Updates
- Integrating Oracle Stream Analytics
Parent topic: Using the Pluggable Formatters
26.3.1 Operation Metadata Formatting Details
To output the metacolumns configure the following:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]}
To also include the primary key columns and the tokens configure as follows:
gg.handler.name.format.metaColumnsTemplate=${objectname[table]},${optype[op_type]},${timestamp[op_ts]},${currenttimestamp[current_ts]},${position[pos]},${primarykeycolumns[primary_keys]},${alltokens[tokens]}
For more information see the configuration property:
gg.handler.name.format.metaColumnsTemplate
.
Parent topic: Using the JSON Formatter
26.3.2 Operation Data Formatting Details
JSON messages begin with the operation metadata fields, which are followed by the operation data fields. This data is represented by before
and after
members that are objects. These objects contain members whose keys are the column names and whose values are the column values.
Operation data is modeled as follows:
-
Inserts: Includes the after-image data.
-
Updates: Includes both the before-image and the after-image data.
-
Deletes: Includes the before-image data.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
-
The column has a value: The column value is output. In the following example, the member
STATE
has a value."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
-
The column value is null: The default output value is a JSON NULL. In the following example, the member
STATE
is null."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
-
The column value is missing: The JSON contains no element for a missing column value. In the following example, the member
STATE
is missing."after":{ "CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", }
The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured treat all data as strings.
Parent topic: Using the JSON Formatter
26.3.3 Row Data Formatting Details
JSON messages begin with the operation metadata fields, which are followed by the operation data fields. For row data formatting, this are the source column names and source column values as JSON key value pairs. This data is represented by before
and after
members that are objects. These objects contain members whose keys are the column names and whose values are the column values.
Row data is modeled as follows:
-
Inserts: Includes the after-image data.
-
Updates: Includes the after-image data.
-
Deletes: Includes the before-image data.
Column values for an operation from the source trail file can have one of three states: the column has a value, the column value is null, or the column value is missing. The JSON Formatter maps these column value states into the created JSON objects as follows:
-
The column has a value: The column value is output. In the following example, the member
STATE
has a value."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":"CO" }
-
The column value is null :The default output value is a JSON NULL. In the following example, the member
STATE
is null."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", "STATE":null }
-
The column value is missing: The JSON contains no element for a missing column value. In the following example, the member
STATE
is missing."CUST_CODE":"BILL", "NAME":"BILL'S USED CARS", "CITY":"DENVER", }
The default setting of the JSON Formatter is to map the data types from the source trail file to the associated JSON data type. JSON supports few data types, so this functionality usually results in the mapping of numeric fields from the source trail file to members typed as numbers. This data type mapping can be configured to treat all data as strings.
Parent topic: Using the JSON Formatter
26.3.4 Sample JSON Messages
The following topics are sample JSON messages created by the JSON Formatter for insert, update, delete, and truncate operations.
- Sample Operation Modeled JSON Messages
- Sample Flattened Operation Modeled JSON Messages
- Sample Row Modeled JSON Messages
- Sample Primary Key Output JSON Message
Parent topic: Using the JSON Formatter
26.3.4.1 Sample Operation Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T10:15:51.267000",
"pos":"00000000000000002928",
"after":{
"CUST_CODE":"WILL",
"ORDER_DATE":"1994-09-30:15:33:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":144,
"PRODUCT_PRICE":17520.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
}
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.310002",
"pos":"00000000000000004300",
"before":{
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":15000.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
},
"after":{
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":14000.00
}
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.312000",
"pos":"00000000000000005272",
"before":{
"CUST_CODE":"DAVE",
"ORDER_DATE":"1993-11-03:07:51:35",
"PRODUCT_CODE":"PLANE",
"ORDER_ID":600,
"PRODUCT_PRICE":135000.00,
"PRODUCT_AMOUNT":2,
"TRANSACTION_ID":200
}
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"T",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:15:51.312001",
"pos":"00000000000000005480",
}
Parent topic: Sample JSON Messages
26.3.4.2 Sample Flattened Operation Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T10:34:47.956000",
"pos":"00000000000000002928",
"after.CUST_CODE":"WILL",
"after.ORDER_DATE":"1994-09-30:15:33:00",
"after.PRODUCT_CODE":"CAR",
"after.ORDER_ID":144,
"after.PRODUCT_PRICE":17520.00,
"after.PRODUCT_AMOUNT":3,
"after.TRANSACTION_ID":100
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.192000",
"pos":"00000000000000004300",
"before.CUST_CODE":"BILL",
"before.ORDER_DATE":"1995-12-31:15:00:00",
"before.PRODUCT_CODE":"CAR",
"before.ORDER_ID":765,
"before.PRODUCT_PRICE":15000.00,
"before.PRODUCT_AMOUNT":3,
"before.TRANSACTION_ID":100,
"after.CUST_CODE":"BILL",
"after.ORDER_DATE":"1995-12-31:15:00:00",
"after.PRODUCT_CODE":"CAR",
"after.ORDER_ID":765,
"after.PRODUCT_PRICE":14000.00
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.193000",
"pos":"00000000000000005272",
"before.CUST_CODE":"DAVE",
"before.ORDER_DATE":"1993-11-03:07:51:35",
"before.PRODUCT_CODE":"PLANE",
"before.ORDER_ID":600,
"before.PRODUCT_PRICE":135000.00,
"before.PRODUCT_AMOUNT":2,
"before.TRANSACTION_ID":200
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T10:34:48.193001",
"pos":"00000000000000005480",
"before.CUST_CODE":"JANE",
"before.ORDER_DATE":"1995-11-11:13:52:00",
"before.PRODUCT_CODE":"PLANE",
"before.ORDER_ID":256,
"before.PRODUCT_PRICE":133300.00,
"before.PRODUCT_AMOUNT":1,
"before.TRANSACTION_ID":100
}
Parent topic: Sample JSON Messages
26.3.4.3 Sample Row Modeled JSON Messages
Insert
{
"table":"QASOURCE.TCUSTORD",
"op_type":"I",
"op_ts":"2015-11-05 18:45:36.000000",
"current_ts":"2016-10-05T11:10:42.294000",
"pos":"00000000000000002928",
"CUST_CODE":"WILL",
"ORDER_DATE":"1994-09-30:15:33:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":144,
"PRODUCT_PRICE":17520.00,
"PRODUCT_AMOUNT":3,
"TRANSACTION_ID":100
}
Update
{
"table":"QASOURCE.TCUSTORD",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.350005",
"pos":"00000000000000004300",
"CUST_CODE":"BILL",
"ORDER_DATE":"1995-12-31:15:00:00",
"PRODUCT_CODE":"CAR",
"ORDER_ID":765,
"PRODUCT_PRICE":14000.00
}
Delete
{
"table":"QASOURCE.TCUSTORD",
"op_type":"D",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.351002",
"pos":"00000000000000005272",
"CUST_CODE":"DAVE",
"ORDER_DATE":"1993-11-03:07:51:35",
"PRODUCT_CODE":"PLANE",
"ORDER_ID":600,
"PRODUCT_PRICE":135000.00,
"PRODUCT_AMOUNT":2,
"TRANSACTION_ID":200
}
Truncate
{
"table":"QASOURCE.TCUSTORD",
"op_type":"T",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-10-05T11:10:42.351003",
"pos":"00000000000000005480",
}
Parent topic: Sample JSON Messages
26.3.4.4 Sample Primary Key Output JSON Message
{ "table":"DDL_OGGSRC.TCUSTMER", "op_type":"I", "op_ts":"2015-10-26 03:00:06.000000", "current_ts":"2016-04-05T08:59:23.001000", "pos":"00000000000000006605", "primary_keys":[ "CUST_CODE" ], "after":{ "CUST_CODE":"WILL", "NAME":"BG SOFTWARE CO.", "CITY":"SEATTLE", "STATE":"WA" } }
Parent topic: Sample JSON Messages
26.3.5 JSON Schemas
By default, JSON schemas are generated for each source table encountered. JSON
schemas are generated on a just in time basis when an operation for that table is
first encountered. Newer schemas are generated when there is a change in the
metadata. A JSON schema is not required to parse a JSON object. However, many JSON
parsers can use a JSON schema to perform a validating parse of a JSON object.
Alternatively, you can review the JSON schemas to understand the layout of output
JSON objects. By default, the JSON schemas are created in the
GoldenGate_Home
/dirdef
directory
and are named by the following convention:
FULLY_QUALIFIED_TABLE_NAME
.schema.json
The generation of the JSON schemas is suppressible.
- The following JSON schema example is for the JSON object listed in Sample Operation Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "row":{ "type":"object", "properties":{ "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null" ] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "additionalProperties":false }, "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "before":{ "$ref":"#/definitions/row" }, "after":{ "$ref":"#/definitions/row" } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
- The following JSON schema example is for the JSON object listed in Sample Flattened Operation Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "before.CUST_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_DATE":{ "type":[ "string", "null" ] }, "before.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "before.ORDER_ID":{ "type":[ "number", "null" ] }, "before.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "before.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "before.TRANSACTION_ID":{ "type":[ "number", "null" ] }, "after.CUST_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_DATE":{ "type":[ "string", "null" ] }, "after.PRODUCT_CODE":{ "type":[ "string", "null" ] }, "after.ORDER_ID":{ "type":[ "number", "null" ] }, "after.PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "after.PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "after.TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
- The following JSON schema example is for the JSON object listed in Sample Row Modeled JSON Messages.
{ "$schema":"http://json-schema.org/draft-04/schema#", "title":"QASOURCE.TCUSTORD", "description":"JSON schema for table QASOURCE.TCUSTORD", "definitions":{ "tokens":{ "type":"object", "description":"Token keys and values are free form key value pairs.", "properties":{ }, "additionalProperties":true } }, "type":"object", "properties":{ "table":{ "description":"The fully qualified table name", "type":"string" }, "op_type":{ "description":"The operation type", "type":"string" }, "op_ts":{ "description":"The operation timestamp", "type":"string" }, "current_ts":{ "description":"The current processing timestamp", "type":"string" }, "pos":{ "description":"The position of the operation in the data source", "type":"string" }, "primary_keys":{ "description":"Array of the primary key column names.", "type":"array", "items":{ "type":"string" }, "minItems":0, "uniqueItems":true }, "tokens":{ "$ref":"#/definitions/tokens" }, "CUST_CODE":{ "type":[ "string", "null" ] }, "ORDER_DATE":{ "type":[ "string", "null" ] }, "PRODUCT_CODE":{ "type":[ "string", "null" ] }, "ORDER_ID":{ "type":[ "number", "null" ] }, "PRODUCT_PRICE":{ "type":[ "number", "null" ] }, "PRODUCT_AMOUNT":{ "type":[ "integer", "null" ] }, "TRANSACTION_ID":{ "type":[ "number", "null" ] } }, "required":[ "table", "op_type", "op_ts", "current_ts", "pos" ], "additionalProperties":false }
Parent topic: Using the JSON Formatter
26.3.6 JSON Formatter Configuration Properties
Table 26-11 JSON Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
|
None |
Controls whether the generated JSON output messages are operation modeled or row modeled. Set to |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
|
|
Controls the output format of the JSON data. True formats the data with white space for easy reading. False generates more compact output that is difficult to read.. |
|
Optional |
Any string |
|
Inserts a delimiter between generated JSONs so that they can be more easily parsed in a continuous stream of data. Configuration value supports |
|
Optional |
|
|
Controls the generation of JSON schemas for the generated JSON documents. JSON schemas are generated on a table-by-table basis. A JSON schema is not required to parse a JSON document. However, a JSON schemahelp indicate what the JSON documents look like and can be used for a validating JSON parse. |
|
Optional |
Any legal, existing file system path |
|
Controls the output location of generated JSON schemas. |
|
Optional |
|
|
Controls the output typing of generated JSON documents. When false, the formatter attempts to map Oracle GoldenGate types to the corresponding JSON type. When true, all data is treated as strings in the generated JSONs and JSON schemas. |
|
Optional |
Any legal encoding name or alias supported by Java. |
|
Controls the output encoding of generated JSON schemas and documents. |
|
Optional |
|
|
Controls the version of created schemas. Schema versioning creates a schema with a
timestamp in the schema directory on the local file system every
time a new schema is created. |
|
Optional |
|
|
Controls the format of the current timestamp. The default is the ISO 8601 format. A
setting of |
|
Optional |
|
|
Controls sending flattened JSON formatted data to the target entity. Must be set to This property is applicable only to Operation Formatted JSON ( |
|
Optional |
Any legal character or character string for a JSON field name. |
|
Controls the delimiter for concatenated JSON element names. This property supports |
|
Optional |
Any legal character or character string for a JSON field name. |
Any legal JSON attribute name. |
Allows you to set whether the JSON element-before, that contains the change column values, can be renamed. This property is only applicable to Operation Formatted JSON ( |
|
Optional |
Any legal character or character string for a JSON field name. |
Any legal JSON attribute name. |
Allows you to set whether the JSON element, that contains the after-change column values, can be renamed. This property is only applicable to Operation Formatted JSON ( |
|
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the JSON formatter and you need to speacially consider it. You can only use this property in conjunction with the row modeled JSON output messages. This property is only applicable to Row Formatted JSON (
|
gg.handler.name.format.omitNullValues |
Optional |
|
|
Set to |
Parent topic: Using the JSON Formatter
26.3.7 Review a Sample Configuration
The following is a sample configuration for the JSON Formatter in the Java Adapter configuration file:
gg.handler.hdfs.format=json gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.prettyPrint=false gg.handler.hdfs.format.jsonDelimiter=CDATA[] gg.handler.hdfs.format.generateSchema=true gg.handler.hdfs.format.schemaDirectory=dirdef gg.handler.hdfs.format.treatAllColumnsAsStrings=false
Parent topic: Using the JSON Formatter
26.3.8 Metadata Change Events
Metadata change events are handled at runtime. When metadata is changed in a table, the JSON schema is regenerated the next time an operation for the table is encountered. The content of created JSON messages changes to reflect the metadata change. For example, if an additional column is added, the new column is included in created JSON messages after the metadata change event.
Parent topic: Using the JSON Formatter
26.3.9 Setting Metacolumn Output
The following are the configurable values for the JSON formatter metacolumns template property that controls metacolumn output:
Table 26-12 Metacolumns Template Property
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.metaColumnsTemplate |
Optional |
|
|
None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format It is a comma-delimited string consisting of one or more templated values that represent the template. |
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
Explanation of the Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
-
${keyword[fieldName].argument}
-
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
The argument is required to resolve the metacolumn value.
-
${alltokens}
-
All of the Oracle GoldenGate tokens.
-
${token}
-
The value of a specific Oracle GoldenGate token. The token key should follow token key should follow the token using the period (
.
) operator. For example:${token.MYTOKEN}
-
${token.MYTOKEN}
-
${sys}
A system environmental variable. The variable name should follow sys using the period (.) operator.
-
${sys.MYVAR}
-
${sys.MYVAR}
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.) operator. -
${env}
-
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.
) operator. For example:${env.someVariable}
-
${javaprop}
-
A Java JVM variable. The variable name should follow
javaprop
using the period (.
) operator. For example:${javaprop.MYVAR}
-
${optype}
-
Operation type.
-
${position}
-
Record position.
-
${timestamp}
-
Record timestamp.
-
${catalog}
-
Catalog name.
-
${schema}
-
Schema name.
-
${table}
-
Table name.
-
${objectname}
-
The fully qualified table name.
-
${csn}
-
Source Commit Sequence Number.
-
${xid}
-
Source transaction ID.
-
${currenttimestamp}
-
Current timestamp.
-
${currenttimestampiso8601}
-
Current timestamp in ISO 8601 format.
-
${opseqno}
-
Record sequence number within the transaction.
-
${timestampmicro}
-
Record timestamp in microseconds after epoch.
-
${currenttimestampmicro}
-
Current timestamp in microseconds after epoch.
-
${txind}
-
The is the transactional indicator from the source trail file. The values of a transaction are
B
for the first operation,M
for the middle operations,E
for the last operation, orW
for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field. -
${primarykeycolumns}
-
Use to inject a field with a list of the primary key column names.
-
${static}
-
Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is
abc
, then the syntax would be${static.abc}
or${static[FieldName].abc}
. -
${seqno}
-
Use to inject a field with the trail file sequence into the output.
-
${rba}
-
Use to inject a field with the rba of the operation into the output.
Sample Configuration:
gg.handlerlist=kafkarestproxy
#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx
#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properites
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234
#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password
#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}
Parent topic: Using the JSON Formatter
26.3.10 JSON Primary Key Updates
When the JSON formatter is configured to model operation data, primary key updates require no special treatment and are treated like any other update. The before and after values reflect the change in the primary key.
When the JSON formatter is configured to model row data, primary key updates must be specially handled. The default behavior is to abend. However, by using thegg.handler.name.format.pkUpdateHandling
configuration property, you can configure the JSON formatter to model row data to treat primary key updates as either a regular update or as delete and then insert operations. When you configure the formatter to handle primary key updates as delete and insert operations, Oracle recommends that you configure your replication stream to contain the complete before-image and after-image data for updates. Otherwise, the generated insert operation for a primary key update will be missing data for fields that did not change.
Parent topic: Using the JSON Formatter
26.3.11 Integrating Oracle Stream Analytics
You can integrate Oracle GoldenGate for Big Data with Oracle Stream Analytics (OSA) by sending operation-modeled JSON messages to the Kafka Handler. This works only when the JSON formatter is configured to output operation-modeled JSON messages.
Because OSA requires flattened JSON objects, a new feature in the JSON formatter generates flattened JSONs. To use this feature, set the gg.handler.name.format.flatten=false
to true
. (The default setting is false). The following is an example of a flattened JSON file:
{
"table":"QASOURCE.TCUSTMER",
"op_type":"U",
"op_ts":"2015-11-05 18:45:39.000000",
"current_ts":"2016-06-22T13:38:45.335001",
"pos":"00000000000000005100",
"before.CUST_CODE":"ANN",
"before.NAME":"ANN'S BOATS",
"before.CITY":"SEATTLE",
"before.STATE":"WA",
"after.CUST_CODE":"ANN",
"after.CITY":"NEW YORK",
"after.STATE":"NY"
}
Parent topic: Using the JSON Formatter
26.4 Using the Length Delimited Value Formatter
The Length Delimited Value (LDV) Formatter is a row-based formatter. It formats database operations from the source trail file into a length delimited value output. Each insert, update, delete, or truncate operation from the source trail is formatted into an individual length delimited message.
With the length delimited, there are no field delimiters. The fields are variable in size based on the data.
By default, the length delimited maps these column value states into the length delimited value output. Column values for an operation from the source trail file can have one of three states:
-
Column has a value —The column value is output with the prefix indicator
P
. -
Column value is NULL —The default output value is
N
. The output for the case of aNULL
column value is configurable. -
Column value is missing - The default output value is
M
. The output for the case of a missing column value is configurable.
- Formatting Message Details
- Sample Formatted Messages
- LDV Formatter Configuration Properties
- Additional Considerations
Parent topic: Using the Pluggable Formatters
26.4.1 Formatting Message Details
The default format for output of data is the following:
- First is the row Length followed by metadata:
<ROW LENGTH><PRESENT INDICATOR><FIELD LENGTH><OPERATION TYPE><PRESENT INDICATOR><FIELD LENGTH><FULLY QUALIFIED TABLE NAME><PRESENT INDICATOR><FIELD LENGTH><OPERATION TIMESTAMP><PRESENT INDICATOR><FIELD LENGTH><CURRENT TIMESTAMP><PRESENT INDICATOR><FIELD LENGTH><TRAIL POSITION><PRESENT INDICATOR><FIELD LENGTH><TOKENS>
Or
<ROW LENGTH><FIELD LENGTH><FULLY QUALIFIED TABLE NAME><FIELD LENGTH><OPERATION TIMESTAMP><FIELD LENGTH><CURRENT TIMESTAMP><FIELD LENGTH><TRAIL POSITION><FIELD LENGTH><TOKENS>
- Next is the row data:
<PRESENT INDICATOR><FIELD LENGTH><COLUMN 1 VALUE><PRESENT INDICATOR><FIELD LENGTH><COLUMN N VALUE>
Parent topic: Using the Length Delimited Value Formatter
26.4.2 Sample Formatted Messages
- Insert Message:
0133P01IP161446749136000000P161529311765024000P262015-11-05 18:45:36.000000P04WILLP191994-09-30 15:33:00P03CARP03144P0817520.00P013P03100
- Update Message
0133P01UP161446749139000000P161529311765035000P262015-11-05 18:45:39.000000P04BILLP191995-12-31 15:00:00P03CARP03765P0814000.00P013P03100
- Delete Message
0136P01DP161446749139000000P161529311765038000P262015-11-05 18:45:39.000000P04DAVEP191993-11-03 07:51:35P05PLANEP03600P09135000.00P012P03200
Parent topic: Using the Length Delimited Value Formatter
26.4.3 LDV Formatter Configuration Properties
Table 26-13 LDV Formatter Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.binaryLengthMode |
Optional |
|
|
The output can be controlled to display the field or record length in either binary or ASCII format. If set to |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Use to configure the |
|
Optional |
Any string |
|
Use to configure what is included in the output when a column value is present. This value supports CDATA[] wrapping. |
|
Optional |
Any string |
|
Use to configure what is included in the output when a missing value is present. This value supports CDATA[] wrapping. |
|
Optional |
Any string |
|
Use to configure what is included in the output when a NULL value is present. This value supports CDATA[] wrapping. |
|
Optional |
|
None |
Use to configure the current meta column information
in a simple manner and removes the explicit need of
A comma-delimited string consisting of one or more templated values represents the template. This example produces a list of meta columns: ${optype},
${token.ROWID},${sys.username},${currenttimestamp} |
|
Optional |
|
|
Specifies how the formatter handles update operations that change a primary key. Primary key operations can be problematic for the text formatter and require special consideration by you.
|
|
Optional |
Any encoding name or alias supported by Java. |
The native system encoding of the machine hosting the Oracle GoldenGate process. |
Use to set the output encoding for character data and columns. |
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
Explanation of the Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
-
${keyword[fieldName].argument}
-
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
The argument is required to resolve the metacolumn value.
-
${alltokens}
-
All of the Oracle GoldenGate tokens.
-
${token}
-
The value of a specific Oracle GoldenGate token. The token key should follow token key should follow the token using the period (
.
) operator. For example:${token.MYTOKEN}
-
${token.MYTOKEN}
-
${sys}
A system environmental variable. The variable name should follow sys using the period (.) operator.
-
${sys.MYVAR}
-
${sys.MYVAR}
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.) operator. -
${env}
-
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.
) operator. For example:${env.someVariable}
-
${javaprop}
-
A Java JVM variable. The variable name should follow
javaprop
using the period (.
) operator. For example:${javaprop.MYVAR}
-
${optype}
-
Operation type.
-
${position}
-
Record position.
-
${timestamp}
-
Record timestamp.
-
${catalog}
-
Catalog name.
-
${schema}
-
Schema name.
-
${table}
-
Table name.
-
${objectname}
-
The fully qualified table name.
-
${csn}
-
Source Commit Sequence Number.
-
${xid}
-
Source transaction ID.
-
${currenttimestamp}
-
Current timestamp.
-
${currenttimestampiso8601}
-
Current timestamp in ISO 8601 format.
-
${opseqno}
-
Record sequence number within the transaction.
-
${timestampmicro}
-
Record timestamp in microseconds after epoch.
-
${currenttimestampmicro}
-
Current timestamp in microseconds after epoch.
-
${txind}
-
The is the transactional indicator from the source trail file. The values of a transaction are
B
for the first operation,M
for the middle operations,E
for the last operation, orW
for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field. -
${primarykeycolumns}
-
Use to inject a field with a list of the primary key column names.
-
${static}
-
Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is
abc
, then the syntax would be${static.abc}
or${static[FieldName].abc}
. -
${seqno}
-
Use to inject a field with the trail file sequence into the output.
-
${rba}
-
Use to inject a field with the rba of the operation into the output.
Review a Sample Configuration
#The LDV Handler gg.handler.filewriter.format=binary gg.handler.filewriter.format.binaryLengthMode=false gg.handler.filewriter.format.recordLength=4 gg.handler.filewriter.format.fieldLength=2 gg.handler.filewriter.format.legacyFormat=false gg.handler.filewriter.format.presentValue=CDATA[P] gg.handler.filewriter.format.missingValue=CDATA[M] gg.handler.filewriter.format.nullValue=CDATA[N] gg.handler.filewriter.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro},${timestamp} gg.handler.filewriter.format.pkUpdateHandling=abend
Parent topic: Using the Length Delimited Value Formatter
26.4.4 Additional Considerations
Big Data applications differ from RDBMSs in how data is stored. Update and delete operations in an RDBMS result in a change to the existing data. Data is not changed in Big Data applications, it is simply appended to existing data. The current state of a given row becomes a consolidation of all of the existing operations for that row in the HDFS system.
Primary Key Updates
Primary key update operations require special consideration and planning for Big Data integrations. Primary key updates are update operations that modify one or more of the primary keys for the given row from the source database. Since data is simply appended in Big Data applications, a primary key update operation looks more like a new insert than an update without any special handling. The Length Delimited Value Formatter provides specialized handling for primary keys that is configurable to you. These are the configurable behaviors:Table 26-14 Primary Key Update Behaviors
Value | Description |
---|---|
Abend |
The default behavior is that the length delimited value formatter will abend in the case of a primary key update. |
Update |
With this configuration the primary key update will be treated just like any other update operation. This configuration alternative should only be selected if you can guarantee that the primary key that is being changed is not being used as the selection criteria when selecting row data from a Big Data system. |
Delete-Insert |
Using this configuration the primary key update is treated as a special case of a delete using the before image data and an insert using the after image data. This configuration may more accurately model the effect of a primary key update in a Big Data application. However, if this configuration is selected it is important to have full supplemental logging enabled on replication at the source database. Without full supplemental logging, the delete operation will be correct, but the insert operation do not contain all of the data for all of the columns for a full representation of the row data in the Big Data application. |
Consolidating Data
Big Data applications simply append data to the underlying storage. Typically, analytic tools spawn map reduce programs that traverse the data files and consolidate all the operations for a given row into a single output. It is important to have an indicator of the order of operations. The Length Delimited Value Formatter provides a number of metadata fields to fulfill this need. The operation timestamp may be sufficient to fulfill this requirement. However, two update operations may have the same operation timestamp especially if they share a common transaction. The trail position can provide a tie breaking field on the operation timestamp. Lastly, the current timestamp may provide the best indicator of order of operations in Big Data.
Parent topic: Using the Length Delimited Value Formatter
26.5 Using Operation-Based versus Row-Based Formatting
The Oracle GoldenGate for Big Data formatters include operation-based and row-based formatters.
The operation-based formatters represent the individual insert, update, and delete events that occur on table data in the source database. Insert operations only provide after-change data (or images), because a new row is being added to the source database. Update operations provide both before-change and after-change data that shows how existing row data is modified. Delete operations only provide before-change data to identify the row being deleted. The operation-based formatters model the operation as it is exists in the source trail file. Operation-based formats include fields for the before-change and after-change images.
The row-based formatters model the row data as it exists after the operation data is applied. Row-based formatters contain only a single image of the data. The following sections describe what data is displayed for both the operation-based and the row-based formatters.
Parent topic: Using the Pluggable Formatters
26.5.1 Operation Formatters
The formatters that support operation-based formatting are JSON, Avro Operation, and XML. The output of operation-based formatters are as follows:
-
Insert operation: Before-image data is null. After image data is output.
-
Update operation: Both before-image and after-image data is output.
-
Delete operation: Before-image data is output. After-image data is null.
-
Truncate operation: Both before-image and after-image data is null.
Parent topic: Using Operation-Based versus Row-Based Formatting
26.5.2 Row Formatters
The formatters that support row-based formatting are Delimited Text and Avro Row. Row-based formatters output the following information for the following operations:
-
Insert operation: After-image data only.
-
Update operation: After-image data only. Primary key updates are a special case which will be discussed in individual sections for the specific formatters.
-
Delete operation: Before-image data only.
-
Truncate operation: The table name is provided, but both before-image and after-image data are null. Truncate table is a DDL operation, and it may not support different database implementations. Refer to the Oracle GoldenGate documentation for your database implementation.
Parent topic: Using Operation-Based versus Row-Based Formatting
26.5.3 Table Row or Column Value States
In an RDBMS, table data for a specific row and column can only have one of two states: either the data has a value, or it is null. However; when data is transferred to the Oracle GoldenGate trail file by the Oracle GoldenGate capture process, the data can have three possible states: it can have a value, it can be null, or it can be missing.
For an insert operation, the after-image contains data for all column values regardless of whether the data is null.. However, the data included for update and delete operations may not always contain complete data for all columns. When replicating data to an RDBMS for an update operation only the primary key values and the values of the columns that changed are required to modify the data in the target database. In addition, only the primary key values are required to delete the row from the target database. Therefore, even though values are present in the source database, the values may be missing in the source trail file. Because data in the source trail file may have three states, the Plugable Formatters must also be able to represent data in all three states.
Because the row and column data in the Oracle GoldenGate trail file has an important effect on a Big Data integration, it is important to understand the data that is required. Typically, you can control the data that is included for operations in the Oracle GoldenGate trail file. In an Oracle database, this data is controlled by the supplemental logging level. To understand how to control the row and column values that are included in the Oracle GoldenGate trail file, see the Oracle GoldenGate documentation for your source database implementation..
Parent topic: Using Operation-Based versus Row-Based Formatting
26.6 Using the XML Formatter
The XML Formatter formats before-image and after-image data from the source trail file into an XML document representation of the operation data. The format of the XML document is effectively the same as the XML format in the previous releases of the Oracle GoldenGate Java Adapter.
- Message Formatting Details
- Sample XML Messages
- XML Schema
- XML Formatter Configuration Properties
- Review a Sample Configuration
- Metadata Change Events
- Setting Metacolumn Output
- Primary Key Updates
Parent topic: Using the Pluggable Formatters
26.6.1 Message Formatting Details
The XML formatted messages contain the following information:
Table 26-15 XML formatting details
Value | Description |
---|---|
|
The fully qualified table name. |
|
The operation type. |
|
The current timestamp is the time when the formatter processed the current operation record. This timestamp follows the ISO-8601 format and includes micro second precision. Replaying the trail file does not result in the same timestamp for the same operation. |
|
The position from the source trail file. |
|
The total number of columns in the source table. |
|
The |
|
The |
Parent topic: Using the XML Formatter
26.6.2 Sample XML Messages
The following sections provide sample XML messages.
Parent topic: Using the XML Formatter
26.6.2.1 Sample Insert Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='I' ts='2013-06-02 22:14:36.000000' current_ts='2015-10-06T12:21:50.100001' pos='00000000000000001444' numCols='7'> <col name='CUST_CODE' index='0'> <before missing='true'/> <after><![CDATA[WILL]]></after> </col> <col name='ORDER_DATE' index='1'> <before missing='true'/> <after><![CDATA[1994-09-30:15:33:00]]></after> </col> <col name='PRODUCT_CODE' index='2'> <before missing='true'/> <after><![CDATA[CAR]]></after> </col> <col name='ORDER_ID' index='3'> <before missing='true'/> <after><![CDATA[144]]></after> </col> <col name='PRODUCT_PRICE' index='4'> <before missing='true'/> <after><![CDATA[17520.00]]></after> </col> <col name='PRODUCT_AMOUNT' index='5'> <before missing='true'/> <after><![CDATA[3]]></after> </col> <col name='TRANSACTION_ID' index='6'> <before missing='true'/> <after><![CDATA[100]]></after> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqL2AAA]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
26.6.2.2 Sample Update Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='U' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.413000' pos='00000000000000002891' numCols='7'> <col name='CUST_CODE' index='0'> <before><![CDATA[BILL]]></before> <after><![CDATA[BILL]]></after> </col> <col name='ORDER_DATE' index='1'> <before><![CDATA[1995-12-31:15:00:00]]></before> <after><![CDATA[1995-12-31:15:00:00]]></after> </col> <col name='PRODUCT_CODE' index='2'> <before><![CDATA[CAR]]></before> <after><![CDATA[CAR]]></after> </col> <col name='ORDER_ID' index='3'> <before><![CDATA[765]]></before> <after><![CDATA[765]]></after> </col> <col name='PRODUCT_PRICE' index='4'> <before><![CDATA[15000.00]]></before> <after><![CDATA[14000.00]]></after> </col> <col name='PRODUCT_AMOUNT' index='5'> <before><![CDATA[3]]></before> <after><![CDATA[3]]></after> </col> <col name='TRANSACTION_ID' index='6'> <before><![CDATA[100]]></before> <after><![CDATA[100]]></after> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqLzAAA]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
26.6.2.3 Sample Delete Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='D' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415000' pos='00000000000000004338' numCols='7'> <col name='CUST_CODE' index='0'> <before><![CDATA[DAVE]]></before> <after missing='true'/> </col> <col name='ORDER_DATE' index='1'> <before><![CDATA[1993-11-03:07:51:35]]></before> <after missing='true'/> </col> <col name='PRODUCT_CODE' index='2'> <before><![CDATA[PLANE]]></before> <after missing='true'/> </col> <col name='ORDER_ID' index='3'> <before><![CDATA[600]]></before> <after missing='true'/> </col> <col name='PRODUCT_PRICE' index='4'> <missing/> </col> <col name='PRODUCT_AMOUNT' index='5'> <missing/> </col> <col name='TRANSACTION_ID' index='6'> <missing/> </col> <tokens> <token> <Name><![CDATA[L]]></Name> <Value><![CDATA[206080450]]></Value> </token> <token> <Name><![CDATA[6]]></Name> <Value><![CDATA[9.0.80330]]></Value> </token> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqLzAAC]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
26.6.2.4 Sample Truncate Message
<?xml version='1.0' encoding='UTF-8'?> <operation table='GG.TCUSTORD' type='T' ts='2013-06-02 22:14:41.000000' current_ts='2015-10-06T12:21:50.415001' pos='00000000000000004515' numCols='7'> <col name='CUST_CODE' index='0'> <missing/> </col> <col name='ORDER_DATE' index='1'> <missing/> </col> <col name='PRODUCT_CODE' index='2'> <missing/> </col> <col name='ORDER_ID' index='3'> <missing/> </col> <col name='PRODUCT_PRICE' index='4'> <missing/> </col> <col name='PRODUCT_AMOUNT' index='5'> <missing/> </col> <col name='TRANSACTION_ID' index='6'> <missing/> </col> <tokens> <token> <Name><![CDATA[R]]></Name> <Value><![CDATA[AADPkvAAEAAEqL2AAB]]></Value> </token> </tokens> </operation>
Parent topic: Sample XML Messages
26.6.3 XML Schema
The XML Formatter does not generate an XML schema (XSD). The XSD applies to all messages generated by the XML Formatter. The following XSD defines the structure of the XML documents that are generated by the XML Formatter.
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="operation"> <xs:complexType> <xs:sequence> <xs:element name="col" maxOccurs="unbounded" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element name="before" minOccurs="0"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:string" name="missing" use="optional"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name="after" minOccurs="0"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:string" name="missing" use="optional"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element type="xs:string" name="missing" minOccurs="0"/> </xs:sequence> <xs:attribute type="xs:string" name="name"/> <xs:attribute type="xs:short" name="index"/> </xs:complexType> </xs:element> <xs:element name="tokens" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element name="token" maxOccurs="unbounded" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element type="xs:string" name="Name"/> <xs:element type="xs:string" name="Value"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute type="xs:string" name="table"/> <xs:attribute type="xs:string" name="type"/> <xs:attribute type="xs:string" name="ts"/> <xs:attribute type="xs:dateTime" name="current_ts"/> <xs:attribute type="xs:long" name="pos"/> <xs:attribute type="xs:short" name="numCols"/> </xs:complexType> </xs:element> </xs:schema>
Parent topic: Using the XML Formatter
26.6.4 XML Formatter Configuration Properties
Table 26-16 XML Formatter Configuration Properties
Properties | Optional Y/N | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an insert operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate an update operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a delete operation. |
|
Optional |
Any string |
|
Indicator to be inserted into the output record to indicate a truncate operation. |
|
Optional |
Any legal encoding name or alias supported by Java. |
UTF-8 (the XML default) |
The output encoding of generated XML documents. |
|
Optional |
|
|
Determines whether an XML prolog is included in generated XML documents. An XML prolog is optional for well-formed XML. An XML prolog resembles the following: |
|
Optional |
|
|
Controls the format of the current timestamp in the XML message. The default adds a |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
|
Optional |
|
|
Set to |
Parent topic: Using the XML Formatter
26.6.5 Review a Sample Configuration
The following is a sample configuration for the XML Formatter in the Java Adapter properties file:
gg.handler.hdfs.format=xml gg.handler.hdfs.format.insertOpKey=I gg.handler.hdfs.format.updateOpKey=U gg.handler.hdfs.format.deleteOpKey=D gg.handler.hdfs.format.truncateOpKey=T gg.handler.hdfs.format.encoding=ISO-8859-1 gg.handler.hdfs.format.includeProlog=false
Parent topic: Using the XML Formatter
26.6.6 Metadata Change Events
The XML Formatter seamlessly handles metadata change events. A metadata change event does not result in a change to the XML schema. The XML schema is designed to be generic so that the same schema represents the data of any operation from any table.
If the replicated database and upstream Oracle GoldenGate replication process can propagate metadata change events, the XML Formatter can take action when metadata changes. Changes in the metadata are reflected in messages after the change. For example, when a column is added, the new column data appears in XML messages for the table.
Parent topic: Using the XML Formatter
26.6.7 Setting Metacolumn Output
The following are the configurable values for the XML metacolumns template property that controls metacolumn output:
Table 26-17 Metacolumns Template Property
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handler.name.format.metaColumnsTemplate |
Optional |
|
|
None |
The current meta column information can be configured in a simple manner and removes the explicit need to use: insertOpKey | updateOpKey | deleteOpKey | truncateOpKey | includeTableName | includeOpTimestamp | includeOpType | includePosition | includeCurrentTimestamp, useIso8601Format It is a comma-delimited string consisting of one or more templated values that represent the template. |
${optype}, ${token.ROWID}, ${sys.username}, ${currenttimestamp}
Explanation of the Metacolumn Keywords
The metacolumns functionality allows you to select the metadata fields that you want to see in the generated output messages. The format of the metacolumn syntax is:
-
${keyword[fieldName].argument}
-
The keyword is fixed based on the metacolumn syntax. Optionally, you can provide a field name between the square brackets. If a field name is not provided, then the default field name is used.
The argument is required to resolve the metacolumn value.
-
${alltokens}
-
All of the Oracle GoldenGate tokens.
-
${token}
-
The value of a specific Oracle GoldenGate token. The token key should follow token key should follow the token using the period (
.
) operator. For example:${token.MYTOKEN}
-
${token.MYTOKEN}
-
${sys}
A system environmental variable. The variable name should follow sys using the period (.) operator.
-
${sys.MYVAR}
-
${sys.MYVAR}
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.) operator. -
${env}
-
An Oracle GoldenGate environment variable. The variable name should follow
env
using the period (.
) operator. For example:${env.someVariable}
-
${javaprop}
-
A Java JVM variable. The variable name should follow
javaprop
using the period (.
) operator. For example:${javaprop.MYVAR}
-
${optype}
-
Operation type.
-
${position}
-
Record position.
-
${timestamp}
-
Record timestamp.
-
${catalog}
-
Catalog name.
-
${schema}
-
Schema name.
-
${table}
-
Table name.
-
${objectname}
-
The fully qualified table name.
-
${csn}
-
Source Commit Sequence Number.
-
${xid}
-
Source transaction ID.
-
${currenttimestamp}
-
Current timestamp.
-
${currenttimestampiso8601}
-
Current timestamp in ISO 8601 format.
-
${opseqno}
-
Record sequence number within the transaction.
-
${timestampmicro}
-
Record timestamp in microseconds after epoch.
-
${currenttimestampmicro}
-
Current timestamp in microseconds after epoch.
-
${txind}
-
The is the transactional indicator from the source trail file. The values of a transaction are
B
for the first operation,M
for the middle operations,E
for the last operation, orW
for whole if there is only one operation. Filtering operations or the use of coordinated apply negate the usefulness of this field. -
${primarykeycolumns}
-
Use to inject a field with a list of the primary key column names.
-
${static}
-
Use to inject a field with a static value into the output. The value desired should be the argument. If the desired value is
abc
, then the syntax would be${static.abc}
or${static[FieldName].abc}
. -
${seqno}
-
Use to inject a field with the trail file sequence into the output.
-
${rba}
-
Use to inject a field with the rba of the operation into the output.
Sample Configuration:
gg.handlerlist=kafkarestproxy
#The handler properties
gg.handler.kafkarestproxy.type=kafkarestproxy
#The following selects the topic name based on the fully qualified table name
gg.handler.kafkarestproxy.topicMappingTemplate=${fullyQualifiedTableName}
#The following selects the message key using the concatenated primary keys
gg.handler.kafkarestproxy.keyMappingTemplate=${primaryKeys}
gg.handler.kafkarestproxy.postDataUrl=http://localhost:8083
gg.handler.kafkarestproxy.apiVersion=v1
gg.handler.kafkarestproxy.format=json
gg.handler.kafkarestproxy.payloadsize=1
gg.handler.kafkarestproxy.mode=tx
#Server auth properties
#gg.handler.kafkarestproxy.trustStore=/keys/truststore.jks
#gg.handler.kafkarestproxy.trustStorePassword=test1234
#Client auth properites
#gg.handler.kafkarestproxy.keyStore=/keys/keystore.jks
#gg.handler.kafkarestproxy.keyStorePassword=test1234
#Proxy properties
#gg.handler.kafkarestproxy.proxy=http://proxyurl:80
#gg.handler.kafkarestproxy.proxyUserName=username
#gg.handler.kafkarestproxy.proxyPassword=password
#The MetaColumnTemplate formatter properties
gg.handler.kafkarestproxy.format.metaColumnsTemplate=${optype},${timestampmicro},${currenttimestampmicro}
Parent topic: Using the XML Formatter
26.6.8 Primary Key Updates
Updates to a primary key require no special handling by the XML formatter. The XML formatter creates messages that model database operations. For update operations, this includes before and after images of column values. Primary key changes are represented in this format as a change to a column value just like a change to any other column value.
Parent topic: Using the XML Formatter