8 Configuring the ORC Event Handler

You configure the ORC Event Handler operation using properties in the Java Adapter properties file (not in the Replicat properties file).

The ORC Event Handler works only in conjunction with the File Writer Handler.

To select the ORC Event Handler, set the handler type by specifying gg.eventhandler.name.type=orc, where name is the name you give the event handler. Then set the other ORC properties as follows:
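The name segment in each property key is a name you choose for the event handler. The bloom filter examples later in this table use orc. The following Java sketch is a hypothetical helper, not part of Oracle GoldenGate. It loads properties with java.util.Properties and lists the names of event handlers whose type is orc. Matching the type value case-insensitively and skipping names that contain a dot are assumptions of the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: lists every event handler name whose type is "orc". */
final class OrcHandlerNames {
    private static final String PREFIX = "gg.eventhandler.";
    private static final String SUFFIX = ".type";

    static List<String> find(Properties props) {
        List<String> names = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX) || !key.endsWith(SUFFIX)
                    || key.length() <= PREFIX.length() + SUFFIX.length()) {
                continue;
            }
            // The handler name is the segment between "gg.eventhandler." and ".type".
            String name = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
            // Assumption: names are single segments, so keys like a.b.type are skipped.
            if (!name.contains(".") && "orc".equalsIgnoreCase(props.getProperty(key).trim())) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "gg.eventhandler.orc.type=orc\n"
              + "gg.eventhandler.orc.pathMappingTemplate=/ogg/data/${groupName}/${fullyQualifiedTableName}\n"));
        System.out.println(find(props)); // prints [orc]
    }
}
```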

Table 8-1 ORC Event Handler Configuration Properties

Property | Required/Optional | Legal Values | Default | Explanation

gg.eventhandler.name.type

Required

ORC

None

Selects the ORC Event Handler.

gg.eventhandler.name.writeToHDFS

Optional

true | false

false

The ORC framework allows direct writing to HDFS. Set to false to write to the local file system. Set to true to write directly to HDFS.

gg.eventhandler.name.pathMappingTemplate

Required

A string with resolvable keywords and constants used to dynamically generate the path to which the ORC file is written.

None

Use keywords interlaced with constants to dynamically generate unique ORC path names at runtime. Typically, path names follow the format /ogg/data/${groupName}/${fullyQualifiedTableName}.
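To illustrate how keywords interlaced with constants resolve into a path, here is a simplified, hypothetical resolver. It handles only the ${keyword} form shown in the typical format. It takes keyword values from a map and fails on any keyword it cannot resolve. The handler itself defines its own set of keywords and resolution rules, and the group name ORCREP is a made-up value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resolver for ${keyword} placeholders in a path mapping template. */
final class PathTemplate {
    private static final Pattern KEYWORD = Pattern.compile("\\$\\{([A-Za-z]+)\\}");

    static String resolve(String template, Map<String, String> values) {
        Matcher m = KEYWORD.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved keyword: " + m.group());
            }
            // Copy the constant text before the keyword, then the resolved value.
            out.append(template, last, m.start()).append(value);
            last = m.end();
        }
        out.append(template.substring(last)); // trailing constant text
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("groupName", "ORCREP"); // hypothetical Replicat group name
        values.put("fullyQualifiedTableName", "QASOURCE.TCUSTMER");
        System.out.println(resolve("/ogg/data/${groupName}/${fullyQualifiedTableName}", values));
        // prints /ogg/data/ORCREP/QASOURCE.TCUSTMER
    }
}
```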

gg.eventhandler.name.fileMappingTemplate

Optional

A string with resolvable keywords and constants used to dynamically generate the ORC file name at runtime.

None

Use resolvable keywords and constants to dynamically generate the ORC data file name at runtime. If not set, the upstream file name is used.

gg.eventhandler.name.compressionCodec

Optional

LZ4 | LZO | NONE | SNAPPY | ZLIB

NONE

Sets the compression codec of the generated ORC file.
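A property with a fixed set of legal values maps naturally onto a Java enum. The sketch below is hypothetical, because this section does not describe the handler's own parsing. It returns the documented default NONE when compressionCodec is unset and rejects values outside the legal list. Accepting lowercase input is an assumption of the sketch.

```java
import java.util.Locale;

/** Legal values of gg.eventhandler.name.compressionCodec, as listed in Table 8-1. */
enum OrcCompressionCodec {
    LZ4, LZO, NONE, SNAPPY, ZLIB;

    /** Returns the documented default (NONE) when the property is unset or blank. */
    static OrcCompressionCodec fromProperty(String value) {
        if (value == null || value.trim().isEmpty()) {
            return NONE;
        }
        // Assumption: matching is case-insensitive. valueOf throws
        // IllegalArgumentException for anything outside the legal values.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```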

gg.eventhandler.name.finalizeAction

Optional

none | delete

none

Set to none to leave the ORC data file in place on the finalize action. Set to delete to delete the ORC data file on the finalize action.

gg.eventhandler.name.kerberosPrincipal

Optional

The Kerberos principal name.

None

Sets the Kerberos principal when writing directly to HDFS and Kerberos authentication is enabled.

gg.eventhandler.name.kerberosKeytabFile

Optional

The path to the Kerberos keytab file.

None

Sets the path to the Kerberos keytab file when writing directly to HDFS and Kerberos authentication is enabled.

gg.eventhandler.name.blockPadding

Optional

true | false

true

Set to true to enable block padding in generated ORC files, or to false to disable it.

gg.eventhandler.name.blockSize

Optional

long

The ORC default.

Sets the block size of generated ORC files.

gg.eventhandler.name.bufferSize

Optional

integer

The ORC default.

Sets the buffer size of generated ORC files.

gg.eventhandler.name.encodingStrategy

Optional

COMPRESSION | SPEED

The ORC default.

Sets whether the ORC encoding strategy is optimized for compression or for speed.

gg.eventhandler.name.paddingTolerance

Optional

A percentage represented as a floating point number.

The ORC default.

Sets the percentage for padding tolerance of generated ORC files.

gg.eventhandler.name.rowIndexStride

Optional

integer

The ORC default.

Sets the row index stride of generated ORC files.

gg.eventhandler.name.stripeSize

Optional

integer

The ORC default.

Sets the stripe size of generated ORC files.
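For the numeric tuning properties above, a default of "The ORC default" means that leaving the property unset defers to Apache ORC's own value. The hypothetical reader below models that with Java's OptionalLong, OptionalInt, and OptionalDouble, where an empty result means the property was not set. The type used for each property follows the Legal Values column, and the sample value is arbitrary, not a recommendation.

```java
import java.util.OptionalDouble;
import java.util.OptionalInt;
import java.util.OptionalLong;
import java.util.Properties;

/** Hypothetical reader: an empty result means "not set, so the ORC default applies". */
final class OrcTuning {
    private static String raw(Properties p, String key) {
        String v = p.getProperty(key);
        return (v == null || v.trim().isEmpty()) ? null : v.trim();
    }

    // blockSize (legal value: long)
    static OptionalLong readLong(Properties p, String key) {
        String v = raw(p, key);
        return v == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(v));
    }

    // bufferSize, rowIndexStride, stripeSize (legal value: integer)
    static OptionalInt readInt(Properties p, String key) {
        String v = raw(p, key);
        return v == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(v));
    }

    // paddingTolerance (a percentage represented as a floating point number)
    static OptionalDouble readDouble(Properties p, String key) {
        String v = raw(p, key);
        return v == null ? OptionalDouble.empty() : OptionalDouble.of(Double.parseDouble(v));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("gg.eventhandler.orc.rowIndexStride", "10000"); // arbitrary example value
        System.out.println(readInt(p, "gg.eventhandler.orc.rowIndexStride")); // OptionalInt[10000]
        System.out.println(readLong(p, "gg.eventhandler.orc.blockSize"));     // OptionalLong.empty
    }
}
```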

gg.eventhandler.name.eventHandler

Optional

A unique string identifier cross-referencing a child event handler.

No event handler configured.

The event handler that is invoked on the file roll event. Event handlers can perform file roll event actions, such as loading files to S3 or HDFS.

gg.eventhandler.name.bloomFilterFpp

Optional

A floating point number greater than zero and less than one. For example, .25 and .75 are both legal values, but 0 and 1 are not.

The Apache ORC default.

Sets the false positive probability of the bloom filter index: the probability that a query of the index indicates that the value being searched for is in the block when the value is actually not in the block.

You must also specify the tables, and the columns within those tables, on which to set bloom filters. Use the following configuration syntax:

gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTMER=CUST_CODE
gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTORD=CUST_CODE,ORDER_DATE

QASOURCE.TCUSTMER and QASOURCE.TCUSTORD are the fully qualified names of the source tables, and orc is the event handler name. Each configured value is one or more column names on which to configure bloom filters, delimited by commas.
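As a sketch of how the bloom filter settings fit together, the hypothetical Java class below groups the gg.eventhandler.orc.bloomFilter.* entries by fully qualified table name. It also checks a bloomFilterFpp value against the documented range, strictly between 0 and 1. Trimming whitespace around column names is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical parser for gg.eventhandler.<name>.bloomFilter.<table>=<columns> entries. */
final class OrcBloomFilterConfig {

    /** Maps each fully qualified table name to its bloom filter columns. */
    static Map<String, List<String>> columnsByTable(Properties props, String handlerName) {
        // The trailing dot keeps bloomFilterFpp and bloomFilterVersion from matching.
        String prefix = "gg.eventhandler." + handlerName + ".bloomFilter.";
        Map<String, List<String>> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(prefix) || key.length() == prefix.length()) {
                continue;
            }
            String table = key.substring(prefix.length()); // e.g. QASOURCE.TCUSTORD
            List<String> columns = new ArrayList<>();
            for (String col : props.getProperty(key).split(",")) {
                if (!col.trim().isEmpty()) {
                    columns.add(col.trim()); // assumption: surrounding spaces are ignored
                }
            }
            result.put(table, columns);
        }
        return result;
    }

    /** Enforces the documented range: greater than 0 and less than 1 (also rejects NaN). */
    static double validateFpp(double fpp) {
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("bloomFilterFpp must be in (0, 1): " + fpp);
        }
        return fpp;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTMER", "CUST_CODE");
        props.setProperty("gg.eventhandler.orc.bloomFilter.QASOURCE.TCUSTORD", "CUST_CODE,ORDER_DATE");
        props.setProperty("gg.eventhandler.orc.bloomFilterFpp", ".25");
        System.out.println(columnsByTable(props, "orc"));
        // {QASOURCE.TCUSTMER=[CUST_CODE], QASOURCE.TCUSTORD=[CUST_CODE, ORDER_DATE]}
        System.out.println(validateFpp(Double.parseDouble(props.getProperty("gg.eventhandler.orc.bloomFilterFpp"))));
        // 0.25
    }
}
```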

gg.eventhandler.name.bloomFilterVersion

Optional

ORIGINAL | UTF8

ORIGINAL

Sets the version of the ORC bloom filter.