Elasticsearch

8.2.16 Elasticsearch

Elasticsearch with Elasticsearch 7x and 6x
The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.
Elasticsearch 8x
The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.

8.2.16.1 Elasticsearch with Elasticsearch 7x and 6x

The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.

This article describes how to use the Elasticsearch handler.

Note:

This section on the Elasticsearch Handler pertains to Oracle GoldenGate for Big Data versions 21.9.0.0.0 and before. Starting with Oracle GoldenGate for Big Data 21.10.0.0.0, the Elasticsearch client was changed in order to support Elasticsearch 8.x.

Overview
Detailing the Functionality
Setting Up and Running the Elasticsearch Handler
Troubleshooting
Performance Consideration
About the Shield Plug-In Support
About DDL Handling
Known Issues in the Elasticsearch Handler
Elasticsearch Handler Transport Client Dependencies
Elasticsearch High Level REST Client Dependencies

8.2.16.1.1 Overview

Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features.

The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and write data into it; see https://www.elastic.co.

Note:

This section on the Elasticsearch Handler pertains to Oracle GoldenGate for Big Data versions 21.9.0.0.0 and before. Starting with Oracle GoldenGate for Big Data 21.10.0.0.0, the Elasticsearch client was changed in order to support Elasticsearch 8.x.

8.2.16.1.2 Detailing the Functionality

This topic details the Elasticsearch Handler functionality.

8.2.16.1.2.1 About the Elasticsearch Version Property

The Elasticsearch Handler supports two different clients to communicate with the Elasticsearch cluster: The Elasticsearch transport client and the Elasticsearch High Level REST client.

The Elasticsearch Handler is configured for one of the two supported clients by setting the version property in the Elasticsearch Handler properties file. The older version of Elasticsearch (6.x) supports only the Transport client, which you select by setting the version property to 6.x. For Elasticsearch 7.x, both the Transport client and the High Level REST client are supported: set the version property to 7.x to use the Transport client, or to REST7.x to use the High Level REST client.

The configurable parameters for each of them are as follows:

Set the gg.handler.name.version configuration value to 6.x or 7.x to connect to the Elasticsearch cluster using the transport client for the respective version.
Set the gg.handler.name.version configuration value to REST7.x to connect to the Elasticsearch cluster using the Elasticsearch High Level REST client. The REST client supports Elasticsearch version 7.x.
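The selection rule above can be sketched in a few lines of Python. This is only an illustration of the documented mapping, not the handler's implementation: the function name is hypothetical, the default ports are taken from the ServerAddressList description in the properties table, and treating the value as case-insensitive is an assumption based on the samples using both REST7.x and rest7.x.

```python
from typing import Tuple

# Hypothetical helper that mirrors the documented rule; not part of the product.
TRANSPORT_VERSIONS = {"6.x", "7.x"}
REST_VERSIONS = {"rest7.x"}


def client_for_version(version: str) -> Tuple[str, int]:
    """Return (client kind, default port) for a gg.handler.name.version value."""
    normalized = version.strip().lower()  # assumption: the value is case-insensitive
    if normalized in TRANSPORT_VERSIONS:
        return ("transport", 9300)
    if normalized in REST_VERSIONS:
        return ("rest", 9200)
    raise ValueError(f"unsupported version value: {version!r}")


print(client_for_version("7.x"))
print(client_for_version("REST7.x"))
```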

8.2.16.1.2.2 About the Index and Type

An Elasticsearch index is a collection of documents with similar characteristics. Index names must be lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields.

The Elasticsearch Handler maps the source trail schema concatenated with the source trail table name to construct the index. For three-part table names in the source trail, the index is constructed by concatenating the source catalog, schema, and table name.

The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.

Note:

Elasticsearch field names are case-sensitive. If the field names in the data to be inserted or updated are in uppercase and the existing fields in the Elasticsearch server are in lowercase, then they are treated as new fields and the existing fields are not updated. To work around this, set the parameter gg.schema.normalize=lowercase, which converts the field names to lowercase.

Table 8-14 Elasticsearch Mapping

Source Trail	Elasticsearch Index	Elasticsearch Type
`schema.tablename`	`schema_tablename`	`tablename`
`catalog.schema.tablename`	`catalog_schema_tablename`	`tablename`

If an index does not already exist in the Elasticsearch cluster, a new index is created when the Elasticsearch Handler receives data (an INSERT or UPDATE operation in the source trail).
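The mapping in the table can be expressed as a small Python sketch. The function name is hypothetical and the logic is inferred from the table and the lowercase rule for index names; it is not taken from the handler's source. The sample input matches the index and type names that appear in the bulk execute log message in the Troubleshooting section.

```python
from typing import Tuple


def index_and_type(source_table: str) -> Tuple[str, str]:
    """Map a two- or three-part source table name to (index, type)."""
    parts = source_table.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected schema.table or catalog.schema.table: {source_table!r}")
    index = "_".join(parts).lower()  # index names are always lowercase
    doc_type = parts[-1]             # the type is the table name, case preserved
    return index, doc_type


print(index_and_type("CS2CAT.S1SCH.N1TAB"))
print(index_and_type("schema.tablename"))
```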

8.2.16.1.2.3 About the Document

An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.

The Elasticsearch Handler maps the source trail primary key column value as the document identifier.

8.2.16.1.2.4 About the Primary Key Update

The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified. The Elasticsearch Handler processes a source primary key update operation by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required. For the INSERT operation to contain all the fields in the source table, enable the trail Extract to capture the full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns.
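The following Python sketch models why a primary key update can produce a document with fewer fields. It uses a plain dictionary in place of an Elasticsearch index, and the way the before image and after image are merged is an assumption made for illustration; the function and variable names are hypothetical.

```python
def apply_pk_update(index, old_id, new_id, before_image, after_image):
    """Model a primary key update as a DELETE of the old document plus an INSERT."""
    index.pop(old_id, None)                    # DELETE the document stored under the old key
    new_doc = {**before_image, **after_image}  # assumption: after-image values override before-image values
    index[new_id] = new_doc                    # INSERT the document under the new key
    return new_doc


# Without before images, only the changed key column is available for the INSERT.
sparse = {"1": {"id": "1", "name": "Ann", "city": "Oslo"}}
apply_pk_update(sparse, "1", "2", before_image={}, after_image={"id": "2"})
print(sparse)

# With full before images, the re-inserted document keeps every field.
full = {"1": {"id": "1", "name": "Ann", "city": "Oslo"}}
apply_pk_update(full, "1", "2",
                before_image={"id": "1", "name": "Ann", "city": "Oslo"},
                after_image={"id": "2"})
print(full)
```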

8.2.16.1.2.5 About the Data Types

Elasticsearch supports the following data types:

32-bit integer
64-bit integer
Double
Date
String
Binary

8.2.16.1.2.6 Operation Mode

The Elasticsearch Handler uses the operation mode for better performance. The gg.handler.name.mode property is not used by the handler.

8.2.16.1.2.7 Operation Processing Support

The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.

For three-part table names in source trail, the index is constructed by concatenating source catalog, schema, and table name.

INSERT: The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document.
UPDATE: If an Elasticsearch index or document exists, the document is updated. If an Elasticsearch index or document does not exist, a new index is created and the column values in the UPDATE operation are inserted as a new document.
DELETE: If an Elasticsearch index or document exists, the document is deleted. If an Elasticsearch index or document does not exist, a new index is created with zero fields.

The TRUNCATE operation is not supported.
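The operation semantics listed above can be modeled with an in-memory stand-in for the cluster. This is a sketch for reasoning about the described behavior only: the class and method names are hypothetical, no Elasticsearch client is involved, and the update path assumes the default upsert behavior.

```python
class InMemoryCluster:
    """Toy model: maps index name -> {document id -> fields}."""

    def __init__(self):
        self.indices = {}

    def insert(self, index, doc_id, fields):
        # INSERT: create the index if needed, then store a new document.
        self.indices.setdefault(index, {})[doc_id] = dict(fields)

    def update(self, index, doc_id, fields):
        # UPDATE: update the document, or insert it if the index or document is missing.
        docs = self.indices.setdefault(index, {})
        docs.setdefault(doc_id, {}).update(fields)

    def delete(self, index, doc_id):
        # DELETE: remove the document; a missing index is created empty.
        docs = self.indices.setdefault(index, {})
        docs.pop(doc_id, None)

    def truncate(self, index):
        # TRUNCATE is not supported.
        raise NotImplementedError("TRUNCATE is not supported")


cluster = InMemoryCluster()
cluster.delete("schema_tablename", "1")           # creates an empty index
cluster.update("schema_tablename", "1", {"a": 1})  # behaves like an insert
cluster.update("schema_tablename", "1", {"b": 2})  # updates the existing document
print(cluster.indices)
```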

8.2.16.1.2.8 About the Connection

A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.

A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.

The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.
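As an illustration of the Server:Port[, Server:Port …] format, the following Python sketch splits a ServerAddressList value into host and port pairs. The function name is hypothetical, and rejecting entries without a port is a choice made for this example rather than documented handler behavior.

```python
from typing import List, Tuple


def parse_server_address_list(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port, host:port' into a list of (host, port) tuples."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected Server:Port, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


print(parse_server_address_list("localhost:9300"))
print(parse_server_address_list("es-node1:9200, es-node2:9200"))
```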

8.2.16.1.3 Setting Up and Running the Elasticsearch Handler

You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running; see https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.

Set the Classpath

The property gg.classpath must include all the JAR files required by the Java transport client. For a listing of the required client JAR files by version, see Elasticsearch Handler Transport Client Dependencies. For a listing of the required client JAR files for the Elasticsearch High Level REST client, see Elasticsearch High Level REST Client Dependencies.

The path can include the * wildcard character in order to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.

The following is an example of the correctly configured classpath:

gg.classpath=Elasticsearch_Home/lib/*
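A quick pre-flight check for the wildcard rule can be sketched in Python. The helper below is hypothetical and only flags entries that end in *.jar; it assumes a colon-separated classpath as used in the samples in this section.

```python
from typing import List


def find_jar_wildcards(classpath: str, separator: str = ":") -> List[str]:
    """Return the classpath entries that use the unsupported *.jar wildcard."""
    return [
        entry.strip()
        for entry in classpath.split(separator)
        if entry.strip().endswith("*.jar")
    ]


print(find_jar_wildcards("Elasticsearch_Home/lib/*"))
print(find_jar_wildcards("/path/to/elastic/lib/*.jar:/path/to/elastic/modules/reindex/*"))
```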

Configuring the Elasticsearch Handler

8.2.16.1.3.1 Configuring the Elasticsearch Handler

The Elasticsearch Handler can be configured for different versions of Elasticsearch. For version 7.x, two types of clients are supported: the Transport client and the High Level REST client. When the version property is set to 6.x or 7.x, the handler uses the Elasticsearch Transport client to connect to the Elasticsearch cluster and to perform all other handler operations. When the version property is set to rest7.x, the handler uses the Elasticsearch High Level REST client to connect to an Elasticsearch 7.x cluster and to perform all other handler operations. The configurable parameters for each client are given separately below:

Table 8-15 Common Configurable Properties

Properties	Required/ Optional	Legal Values	Default	Explanation
`gg.handlerlist`	Required	Name (Any name of your choice for handler)	None	The list of handlers to be used.
`gg.handler.<name>.type`	Required	elasticsearch	None	Type of handler to use. For example, Elasticsearch, Kafka, or Flume.
`gg.handler.name.ServerAddressList`	Optional	`Server:Port[, Server:Port …]`	localhost:9300 (for Transport Client) localhost:9200 (for High-Level REST Client)	Comma separated list of contact points of the nodes. The allowed port for version REST7.x is 9200. For other version, it is 9300.
`gg.handler.name.version`	Required	`5.x\|6.x\|7.x\|REST7.x`	7.x	The version values 5.x, 6.x, and 7.x indicate using the Elasticsearch Transport client to communicate with Elasticsearch version 5.x, 6.x and 7.x respectively. The version REST7.x indicates using the Elasticsearch High Level REST client to communicate with Elasticsearch version 7.x.
`gg.handler.name.bulkWrite`	Optional	`true \| false`	`false`	When this property is `true`, the Elasticsearch Handler uses the bulk write API to ingest data into the Elasticsearch cluster. The batch size of bulk write can be controlled using the `MAXTRANSOPS` Replicat parameter.
`gg.handler.name.numberAsString`	Optional	`true \| false`	`false`	When this property is `true`, the Elasticsearch Handler writes all numeric column values (Long, Integer, or Double) in the source trail to the Elasticsearch cluster as strings.
`gg.handler.elasticsearch.upsert`	Optional	`true \| false`	`true`	When this property is `true`, a new document is inserted if the document does not already exist when performing an `UPDATE` operation.

Example 8-1 Sample Handler Properties file:

A sample Replicat configuration file and a sample Java Adapter properties file can be found in the following directory:

GoldenGate_install_directory/AdapterExamples/big-data/elasticsearch

For Elasticsearch REST handler

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9200
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*

8.2.16.1.3.1.1 Common Configurable Properties

The common configurable properties, which apply to all supported versions of Elasticsearch and to both the Transport client and the High Level REST client, are shown in the following table:

Table 8-16 Common Configurable Properties

Properties	Required/ Optional	Legal Values	Default	Explanation
`gg.handlerlist`	Required	Name (Any name of your choice for handler)	None	The list of handlers to be used.
`gg.handler.<name>.type`	Required	elasticsearch	None	Type of handler to use. For example, Elasticsearch, Kafka, or Flume.
`gg.handler.name.ServerAddressList`	Optional	`Server:Port[, Server:Port …]`	localhost:9300 (for Transport Client) localhost:9200 (for High-Level REST Client)	Comma separated list of contact points of the nodes. The allowed port for version REST7.x is 9200. For other version, it is 9300.
`gg.handler.name.version`	Required	`6.x\|7.x\|REST7.x`	7.x	The version values 6.x, and 7.x indicate using the Elasticsearch Transport client to communicate with Elasticsearch version 6.x and 7.x respectively. The version REST7.x indicates using the Elasticsearch High Level REST client to communicate with Elasticsearch version 7.x.
`gg.handler.name.bulkWrite`	Optional	`true \| false`	`false`	When this property is `true`, the Elasticsearch Handler uses the bulk write API to ingest data into the Elasticsearch cluster. The batch size of bulk write can be controlled using the `MAXTRANSOPS` Replicat parameter.
`gg.handler.name.numberAsString`	Optional	`true \| false`	`false`	When this property is `true`, the Elasticsearch Handler writes all numeric column values (Long, Integer, or Double) in the source trail to the Elasticsearch cluster as strings.
`gg.handler.elasticsearch.upsert`	Optional	`true \| false`	`true`	When this property is `true`, a new document is inserted if the document does not already exist when performing an `UPDATE` operation.

8.2.16.1.3.1.2 Transport Client Configurable Properties

When the version property is set to 6.x or 7.x, the handler uses the Transport client to communicate with the corresponding version of the Elasticsearch cluster. The following configurable properties apply only when using the Transport client:

Table 8-17 Transport Client Configurable Properties

Properties	Required/ Optional	Legal Values	Default	Explanation
`gg.handler.name.clientSettingsFile`	Required	Transport client properties file.	None	The filename in classpath that holds Elasticsearch transport client properties used by the Elasticsearch Handler.

Sample Properties file for Elasticsearch Handler with Transport Client (with x-pack plugin)

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=[6.x | 7.x]
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*:/path/to/elastic/plugins/x-pack/*

8.2.16.1.3.1.3 Transport Client Setting Properties File

The Elasticsearch Handler uses a Java Transport client to interact with the Elasticsearch cluster. The Elasticsearch cluster may have additional plug-ins like Shield or X-Pack, which may require additional configuration.

The gg.handler.name.clientSettingsFile property should point to a file that has additional client settings based on the version of the Elasticsearch cluster.

The Elasticsearch Handler attempts to locate and load the client settings file using the Java classpath. The Java classpath must include the directory containing the properties file. The client properties file for Elasticsearch (without any plug-in) is: cluster.name=Elasticsearch_cluster_name.

The Shield plug-in also supports additional capabilities like SSL and IP filtering. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html.

Example of client.properties file for Elasticsearch Handler with X-Pack plug-in:

cluster.name=Elasticsearch_cluster_name
xpack.security.user=x-pack_username:x-pack-password

The X-Pack plug-in also supports additional capabilities. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.1/transport-client.html and https://www.elastic.co/guide/en/x-pack/current/java-clients.html

8.2.16.1.3.1.4 Classpath Settings for Transport Client

The gg.classpath setting for the Elasticsearch Handler with the Transport client should contain the paths to the JAR files in the library (lib) folder and the modules (transport-netty4 and reindex) folders inside the Elasticsearch installation directory. If the X-Pack plug-in is used for authentication, then the classpath should also include the JAR files in the plugins (x-pack) folder inside the Elasticsearch installation directory. The paths to the JAR files are as follows:

1.	[/path/to/elastic/lib/*]
2.	[/path/to/elastic/modules/transport-netty4/*]
3.	[/path/to/elastic/modules/reindex/*]
4.	[/path/to/elastic/plugins/x-pack/*]  This needs to be added only if x-pack plugin is configured in Elasticsearch

8.2.16.1.3.1.5 REST Client Configurable Properties

When the version property is set to rest7.x, the handler uses the Elasticsearch High Level REST client to connect to an Elasticsearch 7.x cluster. The following configurable properties are supported only for the REST client:

Properties	Required/ Optional	Legal Values	Default	Explanation
`gg.handler.elasticsearch.routingTemplate`	Optional	`${columnValue[table1=column1,table2=column2,…]`	None	The template to be used for deciding the routing algorithm.
`gg.handler.name.authType`	Optional	`none \| basic \| ssl`	None	Controls the authentication type for the Elasticsearch REST client. `none` - No authentication. `basic` - Client authentication using username and password without message encryption. `ssl` - Mutual authentication. The client authenticates the server using a trust-store. The server authenticates the client using username and password. Messages are encrypted.
`gg.handler.name.basicAuthUsername`	Required (for auth-type basic.)	A valid username	None	The username for the server to authenticate the Elasticsearch REST client. Must be provided for auth type `basic`.
`gg.handler.name.basicAuthPassword`	Required (for auth-type basic.)	A valid password	None	The password for the server to authenticate the Elasticsearch REST client. Must be provided for auth type `basic`.
`gg.handler.name.trustStore`	Required (for auth-type SSL)	The fully qualified name (path + name) of trust-store file	None	The truststore for the Elasticsearch client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to `ssl`. Valid only for the Elasticsearch REST client
`gg.handler.name.trustStorePassword`	Required (for auth-type SSL)	A valid trust-store Password	None	The password for the truststore for the Elasticsearch REST client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl.
`gg.handler.name.maxConnectTimeout`	Optional	Positive integer	Default value of Apache HTTP Components framework.	Sets the maximum wait period for a connection to be established from the Elasticsearch REST client to the Elasticsearch server. Valid only for the Elasticsearch REST client.
`gg.handler.name.maxSocketTimeout`	Optional	Positive Integer	Default value of Apache HTTP Components framework.	Sets the maximum wait period in milliseconds to wait for a response from the service after issuing a request. May need to be increased when pushing large data volumes. Valid only for the Elasticsearch REST client.
`gg.handler.name.proxyUsername`	Optional	The proxy server username	None	If connectivity to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the username of your proxy server. Most proxy servers do not require credentials.
`gg.handler.name.proxyPassword`	Optional	The proxy server password	None	If connectivity to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the password of your proxy server. Most proxy servers do not require credentials.
`gg.handler.name.proxyProtocol`	Optional	`http \| https`	None	If connectivity to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the protocol of your proxy server.
`gg.handler.name.proxyPort`	Optional	The port number of your proxy server.	None	If connectivity to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the port number of your proxy server.
`gg.handler.name.proxyServer`	Optional	The host name of your proxy server.	None	If connectivity to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the host name of your proxy server.

Sample Properties for Elasticsearch Handler using REST Client

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9200
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*

8.2.16.1.3.1.6 Authentication for REST Client

Setting the auth type property (gg.handler.name.authType) to ssl configures the SSL authentication mechanism for communicating with the Elasticsearch cluster. This setting can also be used to configure basic authentication with SSL by providing the basic authentication username and password properties along with the trust-store properties.
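The required-property rules from the REST client properties table can be checked before starting Replicat. The Python sketch below is a hypothetical helper, not part of the product: it works on properties already loaded into a dictionary, and it assumes that an unset authType behaves like none.

```python
from typing import Dict, List

# Properties the table marks as required for each auth type.
REQUIRED_BY_AUTH_TYPE = {
    "none": [],
    "basic": ["basicAuthUsername", "basicAuthPassword"],
    "ssl": ["trustStore", "trustStorePassword"],
}


def missing_auth_properties(props: Dict[str, str], handler: str = "elasticsearch") -> List[str]:
    """Return the fully qualified names of required auth properties that are not set."""
    prefix = f"gg.handler.{handler}."
    auth_type = props.get(prefix + "authType", "none").lower()  # assumption: unset means none
    if auth_type not in REQUIRED_BY_AUTH_TYPE:
        raise ValueError(f"unknown authType: {auth_type!r}")
    return [
        prefix + name
        for name in REQUIRED_BY_AUTH_TYPE[auth_type]
        if not props.get(prefix + name)
    ]


example = {
    "gg.handler.elasticsearch.authType": "ssl",
    "gg.handler.elasticsearch.trustStore": "/path/to/truststore.jks",  # placeholder path
}
print(missing_auth_properties(example))
```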

8.2.16.1.3.1.7 Classpath Settings for REST Client

The classpath for the High Level REST client must contain the JAR files from the library (lib) folder and the modules folders (reindex, lang-mustache, and rank-eval) inside the Elasticsearch installation directory. The REST client depends on these libraries, so they must be included in gg.classpath for the handler to work. The dependencies are as follows:

1.	[/path/to/elasticsearch/lib/*]
2.	[/path/to/elasticsearch/modules/reindex/*]
3.	[/path/to/elasticsearch/modules/lang-mustache/*]
4.	[/path/to/elasticsearch/modules/rank-eval/*]

8.2.16.1.4 Troubleshooting

This section contains information to help you troubleshoot various issues.

Transport Client Properties File Not Found

This is applicable for Transport Client only when the property version is set to 6.x or 7.x.

Error:

ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.

To resolve this exception, verify that the gg.handler.name.clientSettingsFile configuration property correctly specifies the Elasticsearch transport client settings file name. Verify that the gg.classpath variable includes the path to the directory containing that file and that the path to the properties file does not contain an asterisk (*) wildcard at the end.

8.2.16.1.4.1 Incorrect Java Classpath

The most common initial error is a classpath that does not include all the required client libraries, which results in a ClassNotFound exception in the log4j log file.

The same exception can also occur when the classpath cannot be resolved because of a typographic error in the gg.classpath variable.

The Elasticsearch transport client libraries do not ship with the Oracle GoldenGate for Big Data product. You should properly configure the gg.classpath property in the Java Adapter Properties file to correctly resolve the client libraries, see Setting Up and Running the Elasticsearch Handler.

8.2.16.1.4.2 Elasticsearch Version Mismatch

The Elasticsearch Handler gg.handler.name.version property must be set to one of the following values: 6.x, 7.x, or REST7.x to match the major version number of the Elasticsearch cluster. For example, gg.handler.name.version=7.x.

The following errors may occur when there is a wrong version configuration:

Error: NoNodeAvailableException[None of the configured nodes are available:]

ERROR 2017-01-30 22:35:07,240 [main] Unable to establish connection. Check handler properties and client settings configuration.

java.lang.IllegalArgumentException: unknown setting [shield.user]

Ensure that all required plug-ins are installed and review documentation changes for any removed settings.

8.2.16.1.4.3 Transport Client Properties File Not Found

To resolve this exception:

ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.

Verify that the gg.handler.name.clientSettingsFile configuration property correctly specifies the Elasticsearch transport client settings file name. Verify that the gg.classpath variable includes the path to the directory containing that file and that the path to the properties file does not contain an asterisk (*) wildcard at the end.

8.2.16.1.4.4 Cluster Connection Problem

This error occurs when the Elasticsearch Handler is unable to connect to the Elasticsearch cluster:

Error: NoNodeAvailableException[None of the configured nodes are available:]

Use the following steps to debug the issue:

Ensure that the Elasticsearch server process is running.
Validate the cluster.name property in the client properties configuration file.
Validate the authentication credentials for the X-Pack or Shield plug-in in the client properties file.
Validate the gg.handler.name.ServerAddressList handler property.

8.2.16.1.4.5 Unsupported Truncate Operation

The following error occurs when the Elasticsearch Handler finds a TRUNCATE operation in the source trail:

oracle.goldengate.util.GGException: Elasticsearch Handler does not support the operation: TRUNCATE

This error message is written to the handler log file before the Replicat process abends. Removing the GETTRUNCATES parameter from the Replicat parameter file resolves this error.

8.2.16.1.4.6 Bulk Execute Errors

DEBUG [main] (ElasticSearch5DOTX.java:130) - Bulk execute status: failures:[true] buildFailureMessage:[failure in bulk execution: [0]: index [cs2cat_s1sch_n1tab], type [N1TAB], id [83], message [RemoteTransportException[[UOvac8l][127.0.0.1:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@43eddfb2 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 84]]];]

This error may occur when Elasticsearch runs out of resources to process the operation. You can limit the Replicat batch size using MAXTRANSOPS to match the value of the thread_pool.bulk.queue_size Elasticsearch configuration parameter.

Note:

Changes to the Elasticsearch parameter, thread_pool.bulk.queue_size, are effective only after the Elasticsearch node is restarted.

8.2.16.1.5 Performance Consideration

The Elasticsearch Handler gg.handler.name.bulkWrite property is used to determine whether the source trail records should be pushed to the Elasticsearch cluster one at a time or in bulk using the bulk write API. When this property is true, the source trail operations are pushed to the Elasticsearch cluster in batches whose size can be controlled by the MAXTRANSOPS parameter in the generic Replicat parameter file. Using the bulk write API provides better performance.

Elasticsearch uses different thread pools to improve how the memory consumption of threads is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.

For bulk operations, the default queue size is 50 (in version 5.2) and 200 (in version 5.3).

To avoid bulk API errors, you must set the Replicat MAXTRANSOPS size to match the bulk thread pool queue size at a minimum. The thread_pool.bulk.queue_size configuration property can be modified in the elasticsearch.yml file.

8.2.16.1.6 About the Shield Plug-In Support

Elasticsearch versions 6.x and 7.x (X-Pack plug-in for Elasticsearch 6.x and 7.x) support a Shield plug-in which provides basic authentication, SSL and IP filtering. Similar capabilities exist in the X-Pack plug-in for Elasticsearch 6.x and 7.x. The additional transport client settings can be configured in the Elasticsearch Handler using the gg.handler.name.clientSettingsFile property.

8.2.16.1.7 About DDL Handling

The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in auto-creation of the index or type in the Elasticsearch cluster.

8.2.16.1.8 Known Issues in the Elasticsearch Handler

Elasticsearch: Trying to input very large number

Very large numbers result in inaccurate values in the Elasticsearch document, for example, 9223372036854775807 and -9223372036854775808. This is an issue with the Elasticsearch server and not a limitation of the Elasticsearch Handler.

The workaround for this issue is to ingest all the number values as strings using the gg.handler.name.numberAsString=true property.
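The following Python sketch shows one way such inaccuracy can arise: a 64-bit integer that passes through a double-precision floating-point value loses its low-order digits, whereas the same value carried as a string is preserved. Whether this is the exact mechanism inside the Elasticsearch server is an assumption made for illustration; the example does not contact Elasticsearch.

```python
import json

big = 9223372036854775807  # largest signed 64-bit integer

# Passing the value through a double rounds it to the nearest representable number.
as_double = float(big)
print(int(as_double) == big)

# Carrying the value as a string, as numberAsString=true does, keeps every digit.
document = json.dumps({"id": str(big)})
restored = json.loads(document)["id"]
print(restored == str(big))
```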

Elasticsearch: Issue with index

The Elasticsearch Handler is not able to input data into the same index if more than one table has similar column names but different column data types.

Index names are always lowercase though the catalog/schema/tablename in the trail may be case-sensitive.

8.2.16.1.9 Elasticsearch Handler Transport Client Dependencies

What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases?

The Maven Central repository artifacts for Elasticsearch databases are:

Maven groupId: org.elasticsearch.client

Maven artifactId: transport

Maven groupId: org.elasticsearch.client

Maven artifactId: x-pack-transport

8.2.16.1.10 Elasticsearch High Level REST Client Dependencies

The maven coordinates for the Elasticsearch High Level REST client are:

Maven groupId: org.elasticsearch.client

Maven artifactId: elasticsearch-rest-high-level-client

Maven version: 7.13.3

Note:

Do not mix versions in the JAR file dependency stack for the Elasticsearch High Level REST Client. Mixing versions results in dependency conflicts.

8.2.16.2 Elasticsearch 8x

The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.

This article describes how to use the Elasticsearch Handler (starting with Oracle GoldenGate for Big Data 21.10.0.0.0). In Oracle GoldenGate for Big Data version 21.10.0.0.0, the Elasticsearch Handler was modified to support a new Elasticsearch client. The new client supports Elasticsearch 8.x.

Overview
Detailing the Functionality
About the Index
About the Document
About the Data Types
About the Connection
About Supported Operation
About DDL Handling
About the Primary Key Update
About UPSERT
About Bulk Write
About Routing
About Request Headers
About Java API Client
Setting Up the Elasticsearch Handler
Elasticsearch Handler Configuration
Enabling Security for Elasticsearch
In a production environment, the Elasticsearch cluster must be accessed in a secure manner. Security features must first be enabled in the Elasticsearch cluster, and those security configurations must then be added to the Elasticsearch Handler properties file.
Security Configuration for Elasticsearch Cluster
The latest version of Elasticsearch auto-configures security when it is installed and started, and prints the security details of the auto-configured cluster in its logs.
Security Configuration for Elasticsearch Handler
Troubleshooting
Elasticsearch Handler Client Dependencies
What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases?

Parent topic: Elasticsearch

8.2.16.2.1 Overview

The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and write data to it. See https://www.elastic.co.

Parent topic: Elasticsearch 8x

8.2.16.2.2 Detailing the Functionality

This topic details the Elasticsearch Handler functionality.

Parent topic: Elasticsearch 8x

8.2.16.2.3 About the Index

For three-part table names in the source trail, the index is constructed by concatenating the source catalog, schema, and table name. When the source table has no catalog, the Elasticsearch Handler constructs the index by concatenating the source trail schema with the source trail table name.

Table 8-18 Elasticsearch Mapping

Source Trail	Elasticsearch Index
`schema.tablename`	`schema_tablename`
`catalog.schema.tablename`	`catalog_schema_tablename`
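The mapping in the table can be sketched as a small function. This is an illustration under stated assumptions, not the handler's code: the function name is hypothetical, and the lowercasing step is an assumption based on Elasticsearch requiring lowercase index names.

```python
def index_name(source_table: str) -> str:
    """Build an index name from 'schema.table' or 'catalog.schema.table'."""
    parts = source_table.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected a two- or three-part name: {source_table!r}")
    # Join the parts with underscores, as in the mapping table.
    # Lowercasing is an assumption: Elasticsearch index names must be lowercase.
    return "_".join(parts).lower()


print(index_name("schema.tablename"))
print(index_name("catalog.schema.tablename"))
```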

If an index does not already exist in the Elasticsearch cluster, a new index is created when the Elasticsearch Handler receives data from an INSERT or UPDATE operation in the source trail.

If the handler receives a DELETE operation in the source trail but the index does not exist in the Elasticsearch cluster, then the handler will ABEND.

Parent topic: Elasticsearch 8x

8.2.16.2.4 About the Document

If the handler receives a DELETE operation in the source trail but the index does not exist in the Elasticsearch cluster, then the handler will ABEND.

Parent topic: Elasticsearch 8x

8.2.16.2.5 About the Data Types

Elasticsearch supports the following data types:

32-bit integer
64-bit integer
Double
Date
String
Binary

Parent topic: Elasticsearch 8x

8.2.16.2.6 About the Connection

A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.

A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.

The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.

The Elasticsearch Handler uses the Java API client to connect over the http or https protocol to the Elasticsearch cluster nodes configured in this property, even though the cluster nodes communicate with each other internally using the transport layer protocol.

For a connection through the Elasticsearch client, the handler property must therefore specify the http/https port rather than the transport port.
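As an illustration of the address format, the following sketch turns a ServerAddressList value into the http/https URLs that a client would contact. The comma-separated host:port format follows the configuration table in this document; the function name and the validation rules are assumptions made for this example.

```python
from typing import List


def node_urls(server_address_list: str, scheme: str = "http") -> List[str]:
    """Convert 'host:port, host:port' into a list of base URLs."""
    urls = []
    for entry in server_address_list.split(","):
        # Split on the last ':' so the port is always the final component.
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        urls.append(f"{scheme}://{host}:{port}")
    return urls


print(node_urls("127.0.0.1:9200, es2.example.com:9200", scheme="https"))
```

The port in each entry is the REST (http/https) port, commonly 9200, not the transport port used between nodes.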

Parent topic: Elasticsearch 8x

8.2.16.2.7 About Supported Operation

The Elasticsearch Handler supports the following operations for replication to the target Elasticsearch cluster.

INSERT: The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document. If the _id is already present, the existing record is overwritten (replaced) by the new record with the same _id.
UPDATE: If the Elasticsearch index and document exist, the document is updated. If the Elasticsearch index or document does not exist, then a new index is created if needed and the column values in the UPDATE operation are inserted as a new document.
DELETE: If the Elasticsearch index and the _id of the document exist, then the document is deleted. If the _id of the document does not exist, the handler continues without doing anything. If the Elasticsearch index is missing, the handler will ABEND.

The TRUNCATE operation is not supported.
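The semantics of the three operations can be modeled with an in-memory dictionary. This is a simplified sketch, not the handler's implementation: the cluster is a plain dict of index name to documents, `MissingIndexError` is a hypothetical stand-in for the ABEND, and UPDATE is modeled with upsert behavior enabled.

```python
class MissingIndexError(Exception):
    """Stands in for the handler ABEND in this sketch."""


def apply_operation(cluster, op, index, doc_id, fields=None):
    """Apply one operation to `cluster`, a dict of index -> {_id: document}."""
    if op == "INSERT":
        # Creates the index if needed; an existing _id is replaced.
        cluster.setdefault(index, {})[doc_id] = dict(fields)
    elif op == "UPDATE":
        # A missing index or document results in a new document.
        cluster.setdefault(index, {}).setdefault(doc_id, {}).update(fields)
    elif op == "DELETE":
        if index not in cluster:
            raise MissingIndexError(index)
        cluster[index].pop(doc_id, None)  # a missing _id is ignored
    else:
        raise ValueError(f"unsupported operation: {op}")  # for example TRUNCATE


cluster = {}
apply_operation(cluster, "INSERT", "schema_tablename", "1", {"a": 1, "b": 2})
apply_operation(cluster, "UPDATE", "schema_tablename", "1", {"b": 3})
print(cluster)
```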

Parent topic: Elasticsearch 8x

8.2.16.2.8 About DDL Handling

Parent topic: Elasticsearch 8x

8.2.16.2.9 About the Primary Key Update

The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified.

The Elasticsearch Handler processes an update to a source primary key by performing a DELETE followed by an INSERT. Because the update operation may carry only the changed columns, the newly inserted document may contain fewer fields than required.

For the INSERT operation to contain all the fields in the source table, configure the Extract that writes the trail to capture full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns.
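The effect of before images on a primary key update can be sketched as follows. This is a model under assumptions, not the handler's code: documents are plain dicts, the function name is hypothetical, and filling missing columns from the before image is this sketch's reading of why full before images are recommended.

```python
def apply_pk_update(index_docs, old_id, new_id, after_image, before_image=None):
    """Model a primary key update as DELETE(old_id) followed by INSERT(new_id)."""
    index_docs.pop(old_id, None)
    # Columns absent from the after image can only be filled in when the
    # trail carries a before image for them.
    document = dict(before_image or {})
    document.update(after_image)
    index_docs[new_id] = document
    return document


docs = {"1": {"id": 1, "name": "a", "city": "x"}}
# Without a before image, the new document holds only the changed column.
print(apply_pk_update(docs, "1", "2", {"id": 2}))

docs = {"1": {"id": 1, "name": "a", "city": "x"}}
# With a full before image, the remaining columns are carried over.
print(apply_pk_update(docs, "1", "2", {"id": 2},
                      before_image={"id": 1, "name": "a", "city": "x"}))
```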

Parent topic: Elasticsearch 8x

8.2.16.2.10 About UPSERT

The Elasticsearch Handler supports UPSERT mode for UPDATE operations. To enable this mode, set the Elasticsearch Handler property gg.handler.name.upsert to true. It is enabled by default.

In UPSERT mode, if an UPDATE operation arrives from the source trail and the index or the _id of the document is missing from the Elasticsearch cluster, the handler creates the index and converts the operation to an INSERT that adds the data as a new document.

When UPSERT is false, the Elasticsearch Handler will ABEND in the same scenario.

In future releases, this mechanism will be enhanced to be in line with the HANDLECOLLISIONS mode of Oracle GoldenGate, where:

An insert collision should result in a duplicate error.
A missing update or delete should result in a not found error.

The corresponding error codes will be returned to Replicat and handled by it according to the Oracle GoldenGate collision-handling strategy.

Parent topic: Elasticsearch 8x

8.2.16.2.11 About Bulk Write

The Elasticsearch Handler supports a bulk operation mode in which multiple operations are grouped into a batch and the whole batch is applied to the target Elasticsearch cluster in a single request. This improves performance.

To enable bulk mode, set the Elasticsearch Handler property gg.handler.name.bulkWrite to true. It is disabled by default.

Bulk mode has a few limitations. If any operation in the batch fails (throws an exception), the data at the target can become inconsistent. For example, a delete operation whose index is missing from the target Elasticsearch cluster results in an exception. If such an operation is part of a batch in bulk mode, the batch is not applied after the failure of that operation, resulting in inconsistency.

To avoid bulk API errors, you must set the handler MAXTRANSOPS size to match the bulk thread pool queue size at a minimum.

The thread_pool.bulk.queue_size property can be modified in the elasticsearch.yml file.
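The inconsistency described for bulk mode can be sketched with a simple model. This is one reading of the limitation, not the handler's implementation: operations are applied in order, a DELETE against a missing index is the failure, and everything after the failing operation is skipped. The function name and the tuple layout are assumptions.

```python
def apply_batch(cluster, batch):
    """Apply (op, index, doc_id, fields) tuples in order; return count applied."""
    applied = 0
    for op, index, doc_id, fields in batch:
        if op == "DELETE":
            if index not in cluster:
                # Modeled as the failure that stops the rest of the batch.
                break
            cluster[index].pop(doc_id, None)
        else:  # INSERT or UPDATE
            cluster.setdefault(index, {})[doc_id] = dict(fields)
        applied += 1
    return applied


cluster = {}
batch = [
    ("INSERT", "a", "1", {"x": 1}),
    ("DELETE", "missing", "9", None),  # fails: index does not exist
    ("INSERT", "a", "2", {"x": 2}),    # never applied in this model
]
print(apply_batch(cluster, batch), cluster)
```

In this model the first insert is already at the target while the last one is lost, which is the kind of partial state the limitation warns about.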

Parent topic: Elasticsearch 8x

8.2.16.2.12 About Routing

A document is routed to a particular shard in an index using the _routing value. The default _routing value is the document’s _id field. Custom routing patterns can be implemented by specifying a custom routing value per document.

The Elasticsearch Handler supports custom routing. To use it, specify the mapping field key in the gg.handler.name.routingKeyMappingTemplate property of the Elasticsearch Handler properties file.
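The idea of routing can be sketched as hashing the routing value and reducing it to a shard number. This is a simplified illustration with loud assumptions: `zlib.crc32` is a stand-in hash chosen for the example (Elasticsearch uses its own hash function and formula, so real shard numbers will differ), and the function names and the fallback to _id when the field is absent are hypothetical.

```python
import zlib


def routing_value(document: dict, doc_id: str, routing_field=None) -> str:
    """Use the custom routing field when configured, otherwise the _id."""
    if routing_field is not None and routing_field in document:
        return str(document[routing_field])
    return doc_id


def shard_for(value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash, see lead-in)."""
    return zlib.crc32(value.encode("utf-8")) % num_primary_shards


doc_a = {"order_id": 1, "region": "emea"}
doc_b = {"order_id": 2, "region": "emea"}
# Documents that share a routing value land on the same shard.
print(shard_for(routing_value(doc_a, "1", "region"), 5))
print(shard_for(routing_value(doc_b, "2", "region"), 5))
```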

Parent topic: Elasticsearch 8x

8.2.16.2.13 About Request Headers

Elasticsearch allows additional request headers (header name and value pairs) to be sent along with the HTTP requests of REST calls. The Elasticsearch Handler supports sending additional headers: specify the header name and value pairs in the Elasticsearch Handler property gg.handler.name.headers in the properties file.
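As an illustration of the name and value pair format, the following sketch parses a headers value of the form key:value,key:value (the format shown in the configuration table of this document) into a dictionary. The function name, the example header, and the parsing rules are assumptions; this is not how the handler itself parses the property.

```python
def parse_headers(headers_property: str) -> dict:
    """Parse 'name:value, name:value' into a dict of header names to values."""
    headers = {}
    for pair in headers_property.split(","):
        # Split on the first ':' only, so values may themselves contain ':'.
        name, sep, value = pair.strip().partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected name:value, got {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


# X-Opaque-Id is used here purely as an example header name.
print(parse_headers("X-Opaque-Id:ogg-replicat, Accept:application/json"))
```

A value that contains a comma cannot be expressed in this simple format, which is a limitation of the sketch.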

Parent topic: Elasticsearch 8x

8.2.16.2.14 About Java API Client

The Elasticsearch Handler now uses the Java API Client to connect to the Elasticsearch cluster and perform all replication operations. Internally, the Java API Client uses the Elasticsearch Rest Client and Transport Client to perform the operations. The older clients, such as the Rest High-Level Client and the Transport Client, are deprecated and have therefore been removed.

Supported Versions of Elasticsearch Cluster

To configure this handler, an Elasticsearch cluster of version 7.16.x or above must be configured and running. To configure an Elasticsearch cluster, see Get Elasticsearch up and running.

Parent topic: Elasticsearch 8x

8.2.16.2.15 Setting Up the Elasticsearch Handler

You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running. The supported versions of the Elasticsearch cluster are 7.16.x and above. See https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.
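A version check against the 7.16.x minimum can be sketched as below. The sketch assumes that the version string is taken from the `version.number` field of the JSON that the cluster's root endpoint returns; the function names and the trimmed sample body are illustrative only.

```python
import json


def version_from_root_response(body: str) -> str:
    """Extract the version number from the cluster's root endpoint response."""
    return json.loads(body)["version"]["number"]


def is_supported(version: str) -> bool:
    """Return True for Elasticsearch 7.16.x and above."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (7, 16)


# Trimmed sample of a root endpoint response, for illustration only.
sample_body = '{"version": {"number": "8.7.0"}}'
print(is_supported(version_from_root_response(sample_body)))
```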

Parent topic: Elasticsearch 8x

8.2.16.2.16 Elasticsearch Handler Configuration

To configure the Elasticsearch Handler, the parameter file (res.prm) and the properties file (elasticsearch.props) must be configured with valid values.

Parameter File:

The parameter file must point to the correct properties file for the Elasticsearch Handler.

The following parameters are mandatory in the parameter file (res.prm) for running the Elasticsearch Handler:

-	REPLICAT replicat-name  
-	TARGETDB LIBFILE libggjava.so SET property=dirprm/elasticsearch.props 
-	MAP schema-name.table-name, TARGET schema-name.table-name

Properties File:

The following properties are mandatory in the properties file (elasticsearch.props) for running the Elasticsearch Handler:

-	gg.handlerlist=elasticsearch
-	gg.handler.elasticsearch.type=elasticsearch
-	gg.handler.elasticsearch.ServerAddressList=127.0.0.1:9200
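A quick pre-flight check for the mandatory properties can be sketched as follows. This is a hypothetical helper, not part of Oracle GoldenGate: the parser handles only simple key=value lines and # comments (Java properties files support more syntax), and property names are matched with the exact case shown in the list above, which is an assumption.

```python
def load_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_mandatory(props: dict) -> list:
    """List the mandatory properties that are absent for each handler."""
    names = [n.strip() for n in props.get("gg.handlerlist", "").split(",") if n.strip()]
    if not names:
        return ["gg.handlerlist"]
    missing = []
    for name in names:
        for suffix in ("type", "ServerAddressList"):
            key = f"gg.handler.{name}.{suffix}"
            if key not in props:
                missing.append(key)
    return missing


sample = """
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
"""
print(missing_mandatory(load_properties(sample)))
```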

Table 8-19 Elasticsearch Handler Configuration Properties

Property Name	Required (Yes/No)	Legal Values (Default value)	Explanation
`gg.handler.name.ServerAddressList`	`Yes`	`[<Hostname\|ip>:<port>, <Hostname\|ip>:<port>, …]` `[localhost:9200]`	Comma-separated list of the Elasticsearch cluster nodes, each given as a valid hostname (or IP address) and port number separated by ‘:’.
`gg.handler.name.BulkWrite`	`No`	`[true \| false]` `Default [false]`	If bulk write mode is enabled (set to true), the operations of a transaction are stored in a batch and applied to the target ES cluster in a single request per batch (transaction), depending on the batch size.
`gg.handler.name.Upsert`	`No`	`[true \| false]` `[true]`	If upsert mode is enabled (set to true), an update operation is inserted as a new document when the document is missing on the target ES cluster.
`gg.handler.name.NumberAsString`	`No`	`[true \| false]` `[false]`	Set to true to store numbers as strings.
`gg.handler.name.ProxyServer`	`No`	`[Proxy-Hostname \| Proxy-IP]`	Proxy server hostname (or IP) to connect to Elasticsearch cluster.
`gg.handler.name.ProxyPort`	`No`	`[Port number]`	Port number of proxy server. Required if proxy is configured.
`gg.handler.name.ProxyProtocol`	`No`	`[http \| https]` `[http]`	Protocol for Proxy server connection.
`gg.handler.name.ProxyUsername`	`No`	[Username of proxy server]	Username for connecting to Proxy server.
`gg.handler.name.ProxyPassword`	`No`	[Password of proxy server]	Password for connecting to Proxy server. This can be encrypted using `ORACLEWALLET`.
`gg.handler.name.AuthType`	`No`	`[basic \| ssl \| none]` `[none]`	Authentication type to be used for connecting to Elasticsearch cluster.
`gg.handler.name.BasicAuthUsername`	`No`	[username of ES cluster]	Username credential for basic authentication to connect ES server. This can be encrypted using `ORACLEWALLET`.
`gg.handler.name.BasicAuthPassword`	`No`	[password of ES cluster]	Password credential for basic authentication to connect ES server. This can be encrypted using `ORACLEWALLET`.
`gg.handler.name.Fingerprint`	`No`	[fingerprint hash code]	It is the hash of a certificate calculated on all certificate's data and its signature. Applicable for authentication type SSL. This can be encrypted using `ORACLEWALLET`.
`gg.handler.name.CertFilePath`	`No`	`[/path/to/CA_certificate_file.crt]`	CA certificate file (.`crt`) for SSL/TLS authentication.
`gg.handler.name.TrustStore`	`No`	`[/Path/to/trust-store-file]`	Path to Trust-store file in server for SSL / TLS server authentication. Applicable for authentication type SSL.
`gg.handler.name.TrustStorePassword`	`No`	`[trust-store password]`	Password for Trust-store file for SSL/TLS authentication. Applicable for authentication type SSL. This can be encrypted using `ORACLEWALLET`.
`gg.handler.name.TrustStoreType`	`No`	`[jks \| pkcs12]` `[jks]`	The trust-store type for SSL/TLS authentication. Applicable if authentication type is SSL.
`gg.handler.name.RoutingKeyMappingTemplate`	`No`	[Routing field-name]	This defines the field-name whose value will be mapped for routing to particular shard in an index of ES cluster.
`gg.handler.name.Headers`	`No`	`[<key>:<value>,` `<key>:<value>, …]`	List of name and value pair of headers to be sent with REST calls.
`gg.handler.name.MaxConnectTimeout`	`No`	Time in seconds	Time in seconds that a request waits to connect to the Elasticsearch server.
`gg.handler.name.MaxSocketTimeout`	`No`	Time in seconds	Time in seconds that a request waits for a response from the Elasticsearch server.
`gg.handler.name.IOThreadCount`	`No`	Count	Number of threads that handle IO requests.
`gg.handler.name.NodeSelector`	`No`	`ANY \| SKIP_DEDICATED_MASTERS` \| [Fully qualified name of node selector class] `[ANY]`	Predefined strategy `ANY` or `SKIP_DEDICATED_MASTERS`, or the fully qualified name of a class that implements a custom strategy (by implementing the `NodeSelector` interface).

Set the Classpath

The Elasticsearch Handler property gg.classpath must include all the dependency JAR files required by the Java API client. To list and download the required client JAR files, run the Dependency Downloader script elasticsearch_java.sh in the OGG_HOME/DependencyDownloader directory and pass the version 8.7.0 as an argument. For more information about Elasticsearch client dependencies, see Elasticsearch Handler Client Dependencies.

The script creates the directory OGG_HOME/DependencyDownloader/dependencies/elasticsearch_rest_8.7.0 and downloads all the dependency JAR files into it. The client library version 8.7.0 can be used for all supported Elasticsearch clusters.

This location can be configured in the classpath as: gg.classpath=/path/to/OGG_HOME/DependencyDownloader/dependencies/elasticsearch_rest_8.7.0/*

The * wildcard character at the end of the path includes all of the JAR files in that directory in the classpath. Do not use *.jar.

Sample Configuration of Elasticsearch Handler:

For reference, a sample parameter file (res.prm) and a sample properties file (elasticsearch.props) for configuring the Elasticsearch Handler are available in the following directory:

OGG_HOME/AdapterExamples/big-data/elasticsearch

Parent topic: Elasticsearch 8x

8.2.16.2.17 Enabling Security for Elasticsearch

In a production environment, the Elasticsearch cluster must be accessed in a secure manner. Security features must first be enabled in the Elasticsearch cluster, and those security configurations must then be added to the Elasticsearch Handler properties file.

Parent topic: Elasticsearch 8x

8.2.16.2.18 Security Configuration for Elasticsearch Cluster

The latest version of Elasticsearch auto-configures security when it is installed and started. The logs print the security details of the auto-configured cluster as follows:

- Elasticsearch security features have been automatically configured!
-	Authentication is enabled and cluster connections are encrypted.
-	Password for the elastic user (reset with `bin/elasticsearch-reset-password -u elastic`): nnh0LWKZMLkw_QD5jxhE
-	HTTP CA certificate SHA-256 fingerprint: 862e3f117c386a63f8f43db88760d463900e4c814590b8920e1c0e25f6db4df4
-	Configure Kibana to use this cluster:
-	Run Kibana and click the configuration link in the terminal when Kibana starts.
-	Copy the following enrollment token and paste it into Kibana in your browser (valid for the next 30 minutes): eyJ2ZXIiOiI4LjYuMiIsImFkciI6WyIxMDAuNzAuOTguNzM6OTIwMCJdLCJmZ3IiOiI4NjJlM2YxMTdjMzg2YTYzZjhmNDNkYjg4NzYwZDQ2MzkwMGU0YzgxNDU5MGI4OTIwZTFjMGUyNWY2ZGI0ZGY0Iiwia2V5IjoiUTVCVF9vWUJ2TnZDVXBSSkNTWEM6NkJNc3ZXanBUYWUwa0l6V1pDU1JPQSJ9

Note these security parameter values and use them to configure the Elasticsearch Handler. All the auto-generated certificates are created inside the ElasticSearch-install-directory/config/certs folder.
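If the fingerprint from the startup log is no longer available, it can be recomputed from the CA certificate. The sketch below assumes that the logged value is the SHA-256 digest of the DER-encoded certificate and that the certificate is a single PEM block; the function names are hypothetical.

```python
import hashlib
import ssl


def fingerprint_from_der(der_bytes: bytes) -> str:
    """Return the SHA-256 fingerprint of DER-encoded certificate bytes."""
    return hashlib.sha256(der_bytes).hexdigest()


def fingerprint_from_pem(pem_text: str) -> str:
    """Convert a single PEM certificate to DER and fingerprint it."""
    return fingerprint_from_der(ssl.PEM_cert_to_DER_cert(pem_text))


def fingerprint_from_file(path: str) -> str:
    """Read a PEM certificate file, such as the HTTP CA certificate."""
    with open(path, "r", encoding="ascii") as handle:
        return fingerprint_from_pem(handle.read())
```

For example, `fingerprint_from_file("/path/to/ESHome/config/certs/http_ca.crt")` would return a 64-character lowercase hexadecimal string to compare with the value printed in the log.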

If security is not auto-configured, as with older versions of Elasticsearch, you must manually enable the security features, such as basic and encrypted (SSL) authentication, in the following configuration file of the Elasticsearch cluster before running it.

Elasticsearch-installation-directory/config/elasticsearch.yml

To enable the security features, add the following parameters to the elasticsearch.yml file and restart the Elasticsearch cluster.


#----------------------- BEGIN SECURITY AUTO CONFIGURATION ----------------
# The following settings, TLS certificates and keys have been 
# configured for SSL/TLS authentication.
# -----------------------------------------------------------------------
# Enable security features
xpack.security.enabled: true
xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
cluster.initial_master_nodes: ["cluster-host-name"]

# Allow HTTP API connections from anywhere
# Connections are encrypted and require user authentication
http.host: 0.0.0.0
#----------------------- END SECURITY AUTO CONFIGURATION --------------

For more information about the security setting of Elasticsearch cluster, see https://www.elastic.co/guide/en/elasticsearch/reference/current/manually-configure-security.html

Parent topic: Elasticsearch 8x

8.2.16.2.19 Security Configuration for Elasticsearch Handler

The Elasticsearch Handler supports three modes of security configuration, which are selected using the Elasticsearch Handler property gg.handler.name.authType with the following values:

None: This mode is used when no security feature is enabled in the Elasticsearch stack. No other configuration is required for this mode, and Elasticsearch can be accessed directly using the http protocol.
Basic: This mode is used when only the basic security feature is enabled, by setting a username and password for a user. The basic authentication username and password properties must be provided in the properties file in order to access the Elasticsearch cluster.
```
gg.handler.name.authType=basic
gg.handler.name.basicAuthUsername=elastic
gg.handler.name.basicAuthPassword=changeme
```
SSL: This mode is used when SSL/TLS authentication is configured for encryption in the Elasticsearch stack. You must provide one of the following for the handler to be able to connect to the Elasticsearch cluster: the CA fingerprint hash, the path to the CA certificate file (.crt), or the path to the trust-store file (along with the trust-store type and trust-store password). This mode also supports a combination of SSL/TLS authentication and basic authentication configured in the Elasticsearch stack. If both are configured in the Elasticsearch cluster, you must configure both the basic authentication properties (username and password) and the SSL-related properties (fingerprint, certificate file, or trust-store).
```
gg.handler.name.authType=ssl

# If basic authentication is also configured, set the username and password.
gg.handler.name.basicAuthUsername=username
gg.handler.name.basicAuthPassword=password

# For SSL, one of the following three options must be configured.
# Option 1: CA certificate file
gg.handler.name.certFilePath=/path/to/ESHome/config/certs/http_ca.crt
# Option 2: CA certificate fingerprint
#gg.handler.name.fingerprint=862e3f117c386a63f8f43db88760d463900e4c814590b8920e1c0e25f6db4df4
# Option 3: trust-store file with its type and password
#gg.handler.name.trustStore=/path/to/http.p12
#gg.handler.name.trustStoreType=pkcs12
#gg.handler.name.trustStorePassword=pass
```

All of the security-related properties above that contain confidential information can be configured to use Oracle Wallet, so that their confidential values are encrypted in the properties file.
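The rules for the three modes can be expressed as a small consistency check. This is a hypothetical helper written for illustration, not part of the handler: it takes already-parsed properties as a dict, uses the property names with the case shown in the examples above, and treats a trust-store as configured when its path is set.

```python
def security_problems(props: dict, name: str) -> list:
    """Return a list of problems found in the security settings of a handler."""
    prefix = f"gg.handler.{name}."

    def get(key):
        return props.get(prefix + key, "").strip()

    auth_type = (get("authType") or "none").lower()
    problems = []
    if auth_type not in ("none", "basic", "ssl"):
        problems.append(f"unknown authType: {auth_type}")
    has_basic = bool(get("basicAuthUsername") and get("basicAuthPassword"))
    if auth_type == "basic" and not has_basic:
        problems.append("basic mode needs basicAuthUsername and basicAuthPassword")
    if auth_type == "ssl":
        # One of: certificate file, fingerprint, or trust-store.
        if not (get("certFilePath") or get("fingerprint") or get("trustStore")):
            problems.append("ssl mode needs certFilePath, fingerprint, or trustStore")
    return problems


props = {"gg.handler.elasticsearch.authType": "ssl"}
print(security_problems(props, "elasticsearch"))
```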

Parent topic: Elasticsearch 8x

8.2.16.2.20 Troubleshooting

Error: org.elasticsearch.ElasticsearchException[Index [index-name] is not found] - This exception occurs when a delete operation is received and the corresponding index is not present in the Elasticsearch cluster. It can also occur for an update operation if upsert=false and the index is missing.
Error: javax.net.ssl.SSLHandshakeException:[ Connection failed ] - This can happen when the properties for enabling authentication in the elasticsearch.yml file, described in Security Configuration for Elasticsearch Cluster, are missing for authentication type SSL.
Error: javax.net.ssl.SSLException: [Received fatal alert: bad_certificate] - This issue occurs when host validation fails. Check that the certificates generated using cert-utils in Elasticsearch contain the host information.

Parent topic: Elasticsearch 8x

8.2.16.2.21 Elasticsearch Handler Client Dependencies

What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases?

The Maven Central repository artifacts for Elasticsearch databases are:

Maven groupId: co.elastic.clients

Maven artifactId: elasticsearch-java

Version: 8.7.0

Elasticsearch 8.7.0

Parent topic: Elasticsearch 8x

8.2.16.2.21.1 Elasticsearch 8.7.0

commons-codec-1.15.jar
commons-logging-1.2.jar
elasticsearch-java-8.7.0.jar
elasticsearch-rest-client-8.7.0.jar
httpasyncclient-4.1.5.jar
httpclient-4.5.13.jar
httpcore-4.4.13.jar
httpcore-nio-4.4.13.jar
jakarta.json-api-2.0.1.jar
jsr305-3.0.2.jar
parsson-1.0.0.jar

Parent topic: Elasticsearch Handler Client Dependencies