4 Using the Elasticsearch Handler
The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.
This chapter describes how to use the Elasticsearch handler.
4.1 Overview
Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features.
The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and send data to it, see https://www.elastic.co.
4.2 Detailing the Functionality
- About the Elasticsearch Version Property
- About the Index and Type
- About the Document
- About the Primary Key Update
- About the Data Types
- Operation Mode
- Operation Processing Support
- About the Connection
4.2.1 About the Elasticsearch Version Property
The Elasticsearch Handler supports two different clients to communicate with the Elasticsearch cluster: The Elasticsearch transport client and the Elasticsearch High Level REST client.
- Set the gg.handler.name.version configuration value to 5.x, 6.x, or 7.x to connect to the Elasticsearch cluster using the transport client of the respective version.
- Set the gg.handler.name.version configuration value to REST7.0 to connect to the Elasticsearch cluster using the Elasticsearch High Level REST client. The REST client supports Elasticsearch versions 7.x.
4.2.2 About the Index and Type
An Elasticsearch index is a collection of documents with similar characteristics. An index name must be lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields.
The Elasticsearch Handler constructs the index name by concatenating the source trail schema name with the source trail table name. For three-part table names in the source trail, the index name is constructed by concatenating the source catalog, schema, and table name.
The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.
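Table 4-1 Elasticsearch Mapping
Source Trail | Elasticsearch Index | Elasticsearch Type |
---|---|---|
schema.tablename | schema_tablename | tablename |
catalog.schema.tablename | catalog_schema_tablename | tablename |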
If an index does not already exist in the Elasticsearch cluster, a new index is created when the Elasticsearch Handler receives data from an INSERT or UPDATE operation in the source trail.
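The naming rules in this section can be sketched in a few lines of Python. This is an illustration only, not the handler's code: the function name to_index_and_type is hypothetical, and the underscore separator is an assumption inferred from the index name that appears in the bulk error log in the Troubleshooting section (index cs2cat_s1sch_n1tab, type N1TAB).

```python
from typing import Tuple


def to_index_and_type(qualified_table_name: str) -> Tuple[str, str]:
    """Derive an (index, type) pair from a two- or three-part table name.

    Assumption: name parts are joined with "_" and the index is lowercased,
    while the type keeps the case of the source table name.
    """
    parts = qualified_table_name.split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected schema.table or catalog.schema.table: {qualified_table_name!r}")
    index = "_".join(parts).lower()  # index names must be lowercase
    doc_type = parts[-1]             # type name is case-sensitive
    return index, doc_type


if __name__ == "__main__":
    print(to_index_and_type("CS2CAT.S1SCH.N1TAB"))
```

Because the index is lowercased and the type is not, two source tables whose names differ only by case would share an index but map to different types under this sketch.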
4.2.3 About the Document
An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.
The Elasticsearch Handler maps the source trail primary key column value as the document identifier.
4.2.4 About the Primary Key Update
The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified. The Elasticsearch Handler processes a source primary key update operation by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required. For the INSERT operation to contain all the fields in the source table, configure the Extract process to capture the full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns.
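The effect of missing before images can be illustrated with a small in-memory model. The sketch below is not the handler's implementation: the index is a plain dictionary keyed by document identifier, and the function name apply_pk_update and the column names are hypothetical.

```python
def apply_pk_update(index, old_id, new_id, before_image, changed_columns):
    """Model a primary key update as a DELETE followed by an INSERT.

    index           -- dict mapping document id to document (a dict of fields)
    before_image    -- column values captured before the update (may be partial)
    changed_columns -- column values carried by the update itself
    """
    index.pop(old_id, None)          # DELETE the document under the old key
    new_doc = dict(before_image)     # start from whatever the trail recorded
    new_doc.update(changed_columns)  # overlay the changed values
    index[new_id] = new_doc          # INSERT under the new key
    return new_doc


if __name__ == "__main__":
    full = {"ID": 83, "NAME": "a", "CITY": "x"}
    print(apply_pk_update({"83": dict(full)}, "83", "84", full, {"ID": 84}))
    print(apply_pk_update({"83": dict(full)}, "83", "84", {}, {"ID": 84}))
```

With a full before image the new document keeps every field. With an empty before image it contains only the changed key column, which is the "fewer fields than required" case described above.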
4.2.5 About the Data Types
Elasticsearch supports the following data types:
- 32-bit integer
- 64-bit integer
- Double
- Date
- String
- Binary
4.2.6 Operation Mode
The Elasticsearch Handler uses the operation mode for better performance. The gg.handler.name.mode property is not used by the handler.
4.2.7 Operation Processing Support
The Elasticsearch Handler processes source trail operations as follows:
- INSERT: The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document.
- UPDATE: If an Elasticsearch index or document exists, the document is updated. If an Elasticsearch index or document does not exist, a new index is created and the column values in the UPDATE operation are inserted as a new document.
- DELETE: If an Elasticsearch index or document exists, the document is deleted. If an Elasticsearch index or document does not exist, a new index is created with zero fields.
The TRUNCATE operation is not supported.
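The operation semantics listed above can be modeled with an in-memory stand-in for the cluster. This is a rough sketch under stated assumptions, not the handler's code: the class name InMemoryCluster is hypothetical, and raising ValueError for TRUNCATE stands in for the GGException the handler reports.

```python
class InMemoryCluster:
    """Toy model: index name -> {document id -> document fields}."""

    def __init__(self):
        self.indices = {}

    def apply(self, op, index, doc_id, columns=None):
        if op == "TRUNCATE":
            # Stand-in for the handler's unsupported-operation error.
            raise ValueError("Elasticsearch Handler does not support the operation: TRUNCATE")
        if op not in ("INSERT", "UPDATE", "DELETE"):
            raise ValueError(f"unknown operation: {op}")
        # Every supported operation creates the index if it is missing.
        docs = self.indices.setdefault(index, {})
        if op == "INSERT":
            docs[doc_id] = dict(columns or {})
        elif op == "UPDATE":
            # Update in place, or insert the update's columns as a new document.
            docs.setdefault(doc_id, {}).update(columns or {})
        else:  # DELETE
            docs.pop(doc_id, None)


if __name__ == "__main__":
    cluster = InMemoryCluster()
    cluster.apply("UPDATE", "s1sch_n1tab", "83", {"NAME": "a"})
    cluster.apply("DELETE", "s1sch_other", "1")
    print(cluster.indices)
```

Note that a DELETE against a missing index leaves behind an empty index in this model, mirroring the "new index is created with zero fields" behavior.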
4.2.8 About the Connection
A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.
A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.
The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.
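As a rough illustration of the value this property carries, the sketch below splits a comma-separated list into host and port pairs. The host:port entry format is an assumption based on the localhost:9300 value in the sample properties file, and the function name parse_server_address_list and the second host name are hypothetical.

```python
from typing import List, Tuple


def parse_server_address_list(value: str) -> List[Tuple[str, int]]:
    """Split "host:port,host:port" into (host, port) tuples."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


if __name__ == "__main__":
    print(parse_server_address_list("localhost:9300, node2.example.com:9300"))
```

A quick check like this before starting Replicat can catch a missing port or a stray separator, which would otherwise surface later as a connection error.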
4.3 Setting Up and Running the Elasticsearch Handler
You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.
Set the Classpath
The property gg.classpath must include all the JAR files required by the Java transport client. For a listing of the required client JAR files by version, see Elasticsearch Handler Transport Client Dependencies. For a listing of the required client JAR files for the Elasticsearch High Level REST client, see Elasticsearch High Level REST Client Dependencies.
Default location of 5.x JARs:
Elasticsearch_Home/lib/*
Elasticsearch_Home/plugins/x-pack/*
Elasticsearch_Home/modules/transport-netty3/*
Elasticsearch_Home/modules/transport-netty4/*
Elasticsearch_Home/modules/reindex/*
The path can include the * wildcard character to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.
The following is an example of the correctly configured classpath:
gg.classpath=Elasticsearch_Home/lib/*
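The wildcard rule above is easy to get wrong when a classpath grows long. The following sketch flags entries that end in *.jar. It is a hypothetical helper, not part of the product: the function name find_jar_wildcards is invented, and the ":" separator is an assumption taken from the sample classpaths in this chapter.

```python
from typing import List


def find_jar_wildcards(classpath: str, separator: str = ":") -> List[str]:
    """Return the classpath entries that use the disallowed "*.jar" form."""
    return [entry for entry in classpath.split(separator) if entry.endswith("*.jar")]


if __name__ == "__main__":
    bad = find_jar_wildcards("/path/to/elastic/lib/*:/path/to/elastic/modules/reindex/*.jar")
    print(bad)
```

An empty result means every wildcard entry uses the bare * form that this section requires.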
4.3.1 Configuring the Elasticsearch Handler
The following are the configurable values for the Elasticsearch Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the Elasticsearch Handler, you must first configure the handler type by specifying gg.handler.name.type=elasticsearch and the other Elasticsearch properties as follows:
Table 4-2 Elasticsearch Handler Configuration Properties
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handlerlist | Required | Name (any name of your choice) | None | The list of handlers to be used. |
gg.handler.name.type | Required | elasticsearch | None | Type of handler to use. For example, Elasticsearch, Kafka, Flume, or HDFS. |
gg.handler.name.ServerAddressList | Optional | | | Comma separated list of contact points of the nodes to connect to the Elasticsearch cluster. |
gg.handler.name.clientSettingsFile | Required | Transport client properties file. | None | The filename in classpath that holds Elasticsearch transport client properties used by the Elasticsearch Handler. |
gg.handler.name.version | Optional | | | The legal values 5.x, 6.x, and 7.x indicate using the Elasticsearch transport client to communicate with the Elasticsearch cluster. REST indicates using the Elasticsearch High Level REST client to communicate with the Elasticsearch cluster. |
gg.handler.name.bulkWrite | Optional | | | When this property is |
gg.handler.name.numberAsString | Optional | | | When this property is |
gg.handler.name.routingKeyMappingTemplate | Optional | A string made up of constant values and templating keywords so that a value for the routing key can be resolved at runtime. | None | Set a template to dynamically resolve the routing key at runtime to control the shard in Elasticsearch to which the message is sent. The default is to use the id that is used by Elasticsearch as the routing key. |
gg.handler.elasticsearch.upsert | Optional | | | When this property is |
| | Optional | ${columnValue[table1=column1,table2=column2,…] | None | N/A |
gg.handler.name.authType | Optional | none, basic, or ssl | None | Controls the authentication type for the Elasticsearch REST client. |
gg.handler.name.basicAuthUsername | Optional | A valid username | None | The username for the server to authenticate the Elasticsearch REST client. Must be provided for auth types basic and ssl. |
gg.handler.name.basicAuthPassword | Optional | A valid password | None | The password for the server to authenticate the Elasticsearch REST client. Must be provided for auth types basic and ssl. |
gg.handler.name.trustStore | Optional | The path and name of the truststore file. | None | The truststore for the Elasticsearch client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl. Valid only for the Elasticsearch REST client. |
gg.handler.name.trustStorePassword | Optional | The password to access the truststore. | None | The password for the truststore for the Elasticsearch REST client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl. |
gg.handler.name.maxConnectTimeout | Optional | Positive integer | The default value of the Apache HTTP Components framework. | Set the maximum wait period for a connection to be established from the Elasticsearch REST client to the Elasticsearch server. Valid only for the Elasticsearch REST client. |
gg.handler.name.maxSocketTimeout | Optional | Positive integer | The default value of the Apache HTTP Components framework. | Sets the maximum wait period in milliseconds to wait for a response from the service after issuing a request. May need to be increased when pushing large data volumes. Valid only for the Elasticsearch REST client. |
gg.handler.name.proxyUsername | Optional | The proxy server username. | None | If the connectivity to the Elasticsearch uses the REST client and routing through a proxy server, then this property sets the username of your proxy server. Most proxy servers do not require credentials. |
gg.handler.name.proxyPassword | Optional | The proxy server password. | None | If the connectivity to the Elasticsearch uses the REST client and routing through a proxy server, then this property sets the password of your proxy server. Most proxy servers do not require credentials. |
gg.handler.name.proxyProtocol | Optional | http or https | http | If the connectivity to the Elasticsearch uses the REST client and routing through a proxy server, then this property sets the protocol of your proxy server. |
gg.handler.name.proxyPort | Optional | The port number of your proxy server. | None | If the connectivity to the Elasticsearch uses the REST client and routing through a proxy server, then this property sets the port number of your proxy server. |
gg.handler.name.proxyServer | Optional | The host name of your proxy server. | None | If the connectivity to the Elasticsearch uses the REST client and routing through a proxy server, then this property sets the host name of your proxy server. |
Example 4-1 Sample Handler Properties file:
For 5.x Elasticsearch cluster:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*
For 5.x Elasticsearch cluster with x-pack:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/plugins/x-pack/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*
Sample Replicat configuration and a Java Adapter Properties files can be found at the following directory:
GoldenGate_install_directory/AdapterExamples/big-data/elasticsearch
For the Elasticsearch REST handler:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*
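In the property table, name is a placeholder for whatever handler name appears in gg.handlerlist; the samples use elasticsearch. The sketch below shows how the two relate by grouping gg.handler.name.* settings under each listed name. It is an illustration only: the parser handles just simple key=value lines, treating gg.handlerlist as comma-separated is an assumption, and the function names are hypothetical.

```python
SAMPLE = """\
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
"""


def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def handler_settings(props):
    """Group gg.handler.<name>.* settings by each name in gg.handlerlist."""
    settings = {}
    for name in props.get("gg.handlerlist", "").split(","):
        name = name.strip()
        if not name:
            continue
        prefix = f"gg.handler.{name}."
        settings[name] = {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}
    return settings


if __name__ == "__main__":
    print(handler_settings(parse_properties(SAMPLE)))
```

If the name in gg.handlerlist and the name used in the gg.handler.* keys differ, the grouped settings come back empty, which is a quick way to spot a mismatched handler name.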
4.3.2 About the Transport Client Settings Properties File
The Elasticsearch Handler uses a Java Transport client to interact with the Elasticsearch cluster. The Elasticsearch cluster may have additional plug-ins like Shield or X-Pack, which may require additional configuration.
The gg.handler.name.clientSettingsFile property should point to a file that has additional client settings based on the version of the Elasticsearch cluster. The Elasticsearch Handler attempts to locate and load the client settings file using the Java classpath. The Java classpath must include the directory containing the properties file.
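The lookup described above can be approximated as a search through the classpath directories. The sketch below is an assumption-laden model, not the handler's loader: the function name locate_client_settings is hypothetical, the ":" separator follows the sample classpaths, and wildcard entries are skipped because a Java classpath entry ending in * contributes only JAR files, not other files in that directory.

```python
import os
from typing import Optional


def locate_client_settings(classpath: str, filename: str, separator: str = ":") -> Optional[str]:
    """Return the first classpath directory entry that contains filename."""
    for entry in classpath.split(separator):
        if not entry or entry.endswith("*"):
            continue  # wildcard entries add JAR files only
        candidate = os.path.join(entry, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical paths; prints None unless the file exists there.
    print(locate_client_settings("/path/to/elastic/lib/*:/path/to/config", "client.properties"))
```

This is why a classpath made up solely of wildcard entries does not find the settings file even when the file sits next to the JARs: the directory itself has to be listed as a plain entry.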
The client properties file for Elasticsearch (without any plug-in) is:
cluster.name=Elasticsearch_cluster_name
The Shield plug-in also supports additional capabilities like SSL and IP filtering. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html.
The client.properties file for Elasticsearch 5.x with the X-Pack plug-in is:
cluster.name=Elasticsearch_cluster_name
xpack.security.user=x-pack_username:x-pack-password
The X-Pack plug-in also supports additional capabilities. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.1/transport-client.html and https://www.elastic.co/guide/en/x-pack/current/java-clients.html.
4.4 Performance Consideration
The Elasticsearch Handler gg.handler.name.bulkWrite property determines whether the source trail records are pushed to the Elasticsearch cluster one at a time or in bulk using the bulk write API. When this property is true, the source trail operations are pushed to the Elasticsearch cluster in batches whose size can be controlled by the MAXTRANSOPS parameter in the generic Replicat parameter file. Using the bulk write API provides better performance.
Elasticsearch uses different thread pools to improve how the memory consumption of threads is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.
For bulk operations, the default queue size is 50 (in version 5.2) and 200 (in version 5.3).
To avoid bulk API errors, you must set the Replicat MAXTRANSOPS size to match the bulk thread pool queue size at a minimum. The thread_pool.bulk.queue_size configuration property can be modified in the elasticsearch.yml file.
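To make the role of the batch size concrete, the sketch below splits a run of operations into groups of at most MAXTRANSOPS. It only models the grouping; it is not how Replicat or the handler is implemented, and the function name split_into_batches is hypothetical.

```python
from typing import List, Sequence


def split_into_batches(operations: Sequence, max_trans_ops: int) -> List[list]:
    """Group operations into consecutive batches of at most max_trans_ops."""
    if max_trans_ops < 1:
        raise ValueError("max_trans_ops must be a positive integer")
    return [list(operations[i:i + max_trans_ops])
            for i in range(0, len(operations), max_trans_ops)]


if __name__ == "__main__":
    batches = split_into_batches(list(range(250)), 100)
    print([len(batch) for batch in batches])
```

Each batch corresponds to one bulk request in this model, so the MAXTRANSOPS value is what ties the size of a bulk request to the queue size configured on the Elasticsearch side.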
4.5 About the Shield Plug-In Support
Elasticsearch version 5.x supports a Shield plug-in, which provides basic authentication, SSL, and IP filtering. Similar capabilities exist in the X-Pack plug-in for Elasticsearch 6.x and 7.x. The additional transport client settings can be configured in the Elasticsearch Handler using the gg.handler.name.clientSettingsFile property.
4.6 About DDL Handling
The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in the automatic creation of an index or type in the Elasticsearch cluster.
4.7 Troubleshooting
This section contains information to help you troubleshoot various issues.
- Incorrect Java Classpath
- Elasticsearch Version Mismatch
- Transport Client Properties File Not Found
- Cluster Connection Problem
- Unsupported Truncate Operation
- Bulk Execute Errors
4.7.1 Incorrect Java Classpath
The most common initial error is a classpath that does not include all the required client libraries, which results in a ClassNotFound exception in the log4j log file.
The error can also occur when the classpath cannot be resolved because of a typographic error in the gg.classpath variable.
The Elasticsearch transport client libraries do not ship with the Oracle GoldenGate for Big Data product. You should properly configure the gg.classpath property in the Java Adapter Properties file to correctly resolve the client libraries, see Setting Up and Running the Elasticsearch Handler.
4.7.2 Elasticsearch Version Mismatch
The Elasticsearch Handler gg.handler.name.version property must be set to one of the following values: 5.x, 6.x, 7.x, or REST to match the major version number of the Elasticsearch cluster. For example, gg.handler.name.version=7.x.
The following errors may occur when there is a wrong version configuration:
Error: NoNodeAvailableException[None of the configured nodes are available:]
ERROR 2017-01-30 22:35:07,240 [main] Unable to establish connection. Check handler properties and client settings configuration.
java.lang.IllegalArgumentException: unknown setting [shield.user]
Ensure that all required plug-ins are installed and review documentation changes for any removed settings.
4.7.3 Transport Client Properties File Not Found
To resolve this exception:
ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.
Verify that the gg.handler.name.clientSettingsFile configuration property is set to the correct Elasticsearch transport client settings file name. Verify that the gg.classpath variable includes the path to the directory containing that file and that the path to the properties file does not contain an asterisk (*) wildcard at the end.
4.7.4 Cluster Connection Problem
This error occurs when the Elasticsearch Handler is unable to connect to the Elasticsearch cluster:
Error: NoNodeAvailableException[None of the configured nodes are available:]
Use the following steps to debug the issue:
- Ensure that the Elasticsearch server process is running.
- Validate the cluster.name property in the client properties configuration file.
- Validate the authentication credentials for the X-Pack or Shield plug-in in the client properties file.
- Validate the gg.handler.name.ServerAddressList handler property.
4.7.5 Unsupported Truncate Operation
The following error occurs when the Elasticsearch Handler finds a TRUNCATE operation in the source trail:
oracle.goldengate.util.GGException: Elasticsearch Handler does not support the operation: TRUNCATE
This exception message is written to the handler log file before the Replicat process abends. Removing the GETTRUNCATES parameter from the Replicat parameter file resolves this error.
4.7.6 Bulk Execute Errors
DEBUG [main] (ElasticSearch5DOTX.java:130) - Bulk execute status: failures:[true] buildFailureMessage:[failure in bulk execution: [0]: index [cs2cat_s1sch_n1tab], type [N1TAB], id [83], message [RemoteTransportException[[UOvac8l][127.0.0.1:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@43eddfb2 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 84]]];]
This error may occur when Elasticsearch runs out of resources to process the operation. You can limit the Replicat batch size using MAXTRANSOPS to match the value of the thread_pool.bulk.queue_size Elasticsearch configuration parameter.
Note:
Changes to the Elasticsearch parameter thread_pool.bulk.queue_size are effective only after the Elasticsearch node is restarted.
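When tuning these two values, it can help to read the effective queue capacity straight out of the rejection message. The sketch below extracts it with a regular expression matched against the message format shown in this section. The function name bulk_queue_capacity is hypothetical, and the pattern is an assumption that holds only for messages shaped like the sample above.

```python
import re
from typing import Optional

# Matches the "queue capacity = N" field of the bulk thread pool in the sample message.
_QUEUE_CAPACITY = re.compile(r"EsThreadPoolExecutor\[bulk, queue capacity = (\d+)")


def bulk_queue_capacity(log_line: str) -> Optional[int]:
    """Return the bulk queue capacity reported in a rejection message, if any."""
    match = _QUEUE_CAPACITY.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    excerpt = ("EsRejectedExecutionException[rejected execution of "
               "org.elasticsearch.transport.TransportService$7@43eddfb2 on "
               "EsThreadPoolExecutor[bulk, queue capacity = 50, ...")
    print(bulk_queue_capacity(excerpt))
```

The extracted number is the value to compare against the MAXTRANSOPS setting in the Replicat parameter file.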
4.8 Logging
The following log messages appear in the handler log file on successful connection:
Connection to a 5.x Elasticsearch cluster:
INFO [main] (Elasticsearch5DOTX.java:38) - **BEGIN Elasticsearch client settings**
INFO [main] (Elasticsearch5DOTX.java:39) - {xpack.security.user=user1:user1_kibana, cluster.name=elasticsearch-user1-myhost, request.headers.X-Found-Cluster=elasticsearch-user1-myhost}
INFO [main] (Elasticsearch5DOTX.java:52) - Connecting to Server[myhost.us.example.com] Port[9300]
INFO [main] (Elasticsearch5DOTX.java:64) - Client node name: _client_
INFO [main] (Elasticsearch5DOTX.java:65) - Connected nodes: [{node-myhost}{w9N25BrOSZeGsnUsogFn1A}{bIiIultVRjm0Ze57I3KChg}{myhost}{198.51.100.1:9300}]
INFO [main] (Elasticsearch5DOTX.java:66) - Filtered nodes: []
INFO [main] (Elasticsearch5DOTX.java:68) - **END Elasticsearch client settings**
4.9 Known Issues in the Elasticsearch Handler
Elasticsearch: Trying to input very large number
Very large numbers, for example 9223372036854775807 or -9223372036854775808, result in inaccurate values in the Elasticsearch document. This is an issue with the Elasticsearch server and not a limitation of the Elasticsearch Handler.
The workaround for this issue is to ingest all the number values as strings using the gg.handler.name.numberAsString=true property.
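The source attributes the inaccuracy to the Elasticsearch server without naming a cause. One plausible cause, assumed here purely for illustration, is that a 64-bit integer passes through a double-precision floating point value, which cannot represent every integer of that size. The sketch shows that loss and how sending numbers as strings avoids it; both function names are hypothetical.

```python
def survives_double_round_trip(n: int) -> bool:
    """True if n is unchanged after conversion to a double and back."""
    return int(float(n)) == n


def numbers_as_strings(columns: dict) -> dict:
    """Model of the workaround: render numeric column values as strings."""
    return {
        key: str(value) if isinstance(value, (int, float)) and not isinstance(value, bool) else value
        for key, value in columns.items()
    }


if __name__ == "__main__":
    big = 9223372036854775807
    print(survives_double_round_trip(big))
    print(numbers_as_strings({"ID": big, "NAME": "a"}))
```

A string value carries every digit as text, so it is not subject to numeric rounding on the way in, at the cost of the field no longer being stored as a number.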
Elasticsearch: Issue with index
The Elasticsearch Handler cannot write data into the same index if more than one table has similar column names with different column data types.
Index names are always lowercase, even though the catalog/schema/tablename in the trail may be case-sensitive.