6 Using the HDFS Event Handler
The HDFS Event Handler is used to load files generated by the File Writer Handler into HDFS.
This topic describes how to use the HDFS Event Handler. See Using the File Writer Handler.
6.1 Detailing the Functionality
Parent topic: Using the HDFS Event Handler
6.1.1 Configuring the Handler
The HDFS Event Handler can upload data files to HDFS. These additional configuration steps are required:

- The HDFS Event Handler dependencies and considerations are the same as those of the HDFS Handler; see HDFS Additional Considerations.
- Ensure that gg.classpath includes the HDFS client libraries.
- Ensure that the directory containing the HDFS core-site.xml file is in gg.classpath so that the core-site.xml file can be read at runtime and the connectivity information to HDFS can be resolved. For example:
gg.classpath=/{HDFSinstallDirectory}/etc/hadoop
- If Kerberos authentication is enabled on the HDFS cluster, you must configure the Kerberos principal and the location of the keytab file so that the password can be resolved at runtime:
gg.eventHandler.name.kerberosPrincipal=principal
gg.eventHandler.name.kerberosKeytabFile=pathToTheKeytabFile
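Combining the classpath requirements above, the resulting gg.classpath entry might look like the following sketch. The Hadoop subdirectory layout shown is an assumption based on a typical Hadoop distribution; adjust the paths for your installation:

```
gg.classpath=/{HDFSinstallDirectory}/share/hadoop/common/*:/{HDFSinstallDirectory}/share/hadoop/common/lib/*:/{HDFSinstallDirectory}/share/hadoop/hdfs/*:/{HDFSinstallDirectory}/etc/hadoop
```

Note that the final entry is the directory containing core-site.xml (not a JAR wildcard), so the configuration file itself is visible on the classpath at runtime.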
Parent topic: Detailing the Functionality
6.1.2 Configuring the HDFS Event Handler
You configure the HDFS Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the HDFS Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=hdfs and the other HDFS Event Handler properties as follows:
Table 6-1 HDFS Event Handler Configuration Properties

Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.name.type | Required | hdfs | None | Selects the HDFS Event Handler for use. |
gg.eventhandler.name.pathMappingTemplate | Required | A string with resolvable keywords and constants used to dynamically generate the path in HDFS to write data files. | None | Use keywords interlaced with constants to dynamically generate unique path names at runtime. Path names typically follow the format /ogg/data/${fullyQualifiedTableName}. |
gg.eventhandler.name.fileNameMappingTemplate | Optional | A string with resolvable keywords and constants used to dynamically generate the HDFS file name at runtime. | None | Use keywords interlaced with constants to dynamically generate unique file names at runtime. If not set, the upstream file name is used. |
gg.eventhandler.name.finalizeAction | Optional | none or delete | none | Indicates what the File Writer Handler should do at the finalize action. Set to none to leave the data file in place, or to delete to delete the data file (for example, if the data file has been converted to another format or loaded to a different target). |
gg.eventhandler.name.kerberosPrincipal | Optional | The Kerberos principal name. | None | Set to the Kerberos principal when HDFS Kerberos authentication is enabled. |
gg.eventhandler.name.kerberosKeytabFile | Optional | The path to the Kerberos keytab file. | None | Set to the path to the Kerberos keytab file when HDFS Kerberos authentication is enabled. |
gg.eventhandler.name.eventHandler | Optional | A unique string identifier cross referencing a child event handler. | No event handler configured. | The child event handler is invoked on the file roll event. Event handlers can perform file roll event actions such as loading files to S3, converting to Parquet or ORC format, or loading files to HDFS. |
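Putting these properties together, a Java Adapter properties fragment for an HDFS Event Handler might look like the following sketch. The handler name hdfs, the path template, and the Kerberos values are illustrative assumptions; substitute values for your environment:

```
gg.eventhandler.hdfs.type=hdfs
gg.eventhandler.hdfs.pathMappingTemplate=/ogg/data/${fullyQualifiedTableName}
gg.eventhandler.hdfs.finalizeAction=none
gg.eventhandler.hdfs.kerberosPrincipal=ggprincipal@EXAMPLE.COM
gg.eventhandler.hdfs.kerberosKeytabFile=/etc/security/keytabs/gg.keytab
gg.classpath=/{HDFSinstallDirectory}/etc/hadoop
```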
Parent topic: Detailing the Functionality
6.1.3 Using Templated Strings
Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The HDFS Event Handler makes extensive use of templated strings to generate the HDFS directory names, data file names, and HDFS bucket names. This gives you the flexibility to select where to write data files and the names of those data files.
Supported Templated Strings
Keyword | Description |
---|---|
${fullyQualifiedTableName} | The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE. |
${catalogName} | The individual source catalog name. For example, MYCATALOG. |
${schemaName} | The individual source schema name. For example, MYSCHEMA. |
${tableName} | The individual source table name. For example, MYTABLE. |
${groupName} | The name of the Replicat process (with the thread number appended if you’re using coordinated apply). |
${emptyString} | Evaluates to an empty string. For example, "". |
${operationCount} | The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
${insertCount} | The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
${updateCount} | The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
${deleteCount} | The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
${truncateCount} | The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0). |
${currentTimestamp} | The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. This format uses the syntax defined in the Java java.text.SimpleDateFormat class. |
${toUpperCase[]} | Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}. |
${toLowerCase[]} | Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}. |
Templated-string configuration can use a mix of keywords and static strings to assemble path and data file names at runtime.
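To make the resolution behavior concrete, the following is a minimal Python sketch of how a templated string could be expanded at runtime. The keyword names match the table above, but this resolver and its sample values are illustrative assumptions, not the product's actual implementation:

```python
import re

def resolve_template(template: str, values: dict) -> str:
    # Substitute simple ${keyword} tokens first; unknown keywords are left as-is.
    resolved = re.sub(r"\$\{(\w+)\}",
                      lambda m: str(values.get(m.group(1), m.group(0))),
                      template)
    # Then apply the case-conversion keywords to their bracketed contents.
    resolved = re.sub(r"\$\{toUpperCase\[([^\]]*)\]\}",
                      lambda m: m.group(1).upper(), resolved)
    resolved = re.sub(r"\$\{toLowerCase\[([^\]]*)\]\}",
                      lambda m: m.group(1).lower(), resolved)
    return resolved

# Hypothetical runtime values for one source table and Replicat group.
values = {
    "fullyQualifiedTableName": "MYCATALOG.MYSCHEMA.MYTABLE",
    "schemaName": "MYSCHEMA",
    "tableName": "MYTABLE",
    "groupName": "HDFS001",
}
print(resolve_template("/ogg/data/${groupName}/${toLowerCase[${tableName}]}", values))
# → /ogg/data/HDFS001/mytable
```

The inner ${tableName} keyword resolves first, and ${toLowerCase[...]} is then applied to the result, which is why keywords can be nested inside the case-conversion brackets.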
Parent topic: Detailing the Functionality