8.2.18 Elasticsearch

8.2.18.1 Elasticsearch with Elasticsearch 7x and 6x

The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.

This article describes how to use the Elasticsearch handler.

Note:

This section on the Elasticsearch Handler pertains to Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) versions 21.9.0.0.0 and before. Starting with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 21.10.0.0.0, the Elasticsearch client was changed in order to support Elasticsearch 8.x.

8.2.18.1.1 Overview

Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features.

The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and write data to it, see https://www.elastic.co.

8.2.18.1.2 Detailing the Functionality

This topic details the Elasticsearch Handler functionality.

8.2.18.1.2.1 About the Elasticsearch Version Property

The Elasticsearch Handler supports two clients for communicating with the Elasticsearch cluster: the Elasticsearch Transport client and the Elasticsearch High Level REST client.

You select the client by setting the version property in the Elasticsearch Handler properties file. For Elasticsearch 6.x, the handler supports only the Transport client, so set the version property to 6.x. For Elasticsearch 7.x, the handler supports both the Transport client and the High Level REST client: set the version property to 7.x to use the Transport client, or to REST7.x to use the High Level REST client.

The configuration for each client is as follows:

  1. Set the gg.handler.name.version configuration value to 6.x or 7.x to connect to the Elasticsearch cluster using the Transport client for the respective version.
  2. Set the gg.handler.name.version configuration value to REST7.x to connect to the Elasticsearch cluster using the Elasticsearch High Level REST client. The REST client supports Elasticsearch version 7.x.
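
As an illustration, the client selection described above might look like the following in the handler properties file. This is a sketch that assumes the handler is named elasticsearch, as in the sample files in this section. Only one version line should be active at a time.

```properties
# Transport client against an Elasticsearch 7.x cluster
gg.handler.elasticsearch.version=7.x

# Alternative: Transport client against an Elasticsearch 6.x cluster
#gg.handler.elasticsearch.version=6.x

# Alternative: High Level REST client against an Elasticsearch 7.x cluster
#gg.handler.elasticsearch.version=REST7.x
```
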
8.2.18.1.2.2 About the Index and Type

An Elasticsearch index is a collection of documents with similar characteristics. Index names must be lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields.

The Elasticsearch Handler maps the source trail schema concatenated with source trail table name to construct the index. For three-part table names in source trail, the index is constructed by concatenating source catalog, schema, and table name.

The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.

Note:

Elasticsearch field names are case-sensitive. If the field names in the data being inserted or updated are in uppercase and the existing fields in the Elasticsearch server are in lowercase, then they are treated as new fields and the existing fields are not updated. To work around this, set gg.schema.normalize=lowercase, which converts the field names to lowercase.
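
The workaround in the note above could be applied in the Java Adapter properties file roughly as follows. This is a minimal sketch: the handler name elasticsearch and the surrounding lines are taken from the samples in this section, and only the last line is the workaround itself.

```properties
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch

# Convert field names to lowercase so that they match
# existing lowercase fields in Elasticsearch
gg.schema.normalize=lowercase
```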

Table 8-16 Elasticsearch Mapping

Source Trail | Elasticsearch Index | Elasticsearch Type
schema.tablename | schema_tablename | tablename
catalog.schema.tablename | catalog_schema_tablename | tablename

If an index does not already exist in the Elasticsearch cluster, the Elasticsearch Handler creates it when it receives an INSERT or UPDATE operation from the source trail.

8.2.18.1.2.3 About the Document

An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.

The Elasticsearch Handler uses the source trail primary key column value as the document identifier.

8.2.18.1.2.4 About the Primary Key Update

The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified. The Elasticsearch Handler processes an update to a source primary key by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required. For the INSERT operation to contain all the fields in the source table, configure the Extract to capture full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns.

8.2.18.1.2.5 About the Data Types

Elasticsearch supports the following data types:

  • 32-bit integer

  • 64-bit integer

  • Double

  • Date

  • String

  • Binary

8.2.18.1.2.6 Operation Mode

The Elasticsearch Handler uses the operation mode for better performance. The gg.handler.name.mode property is not used by the handler.

8.2.18.1.2.7 Operation Processing Support

INSERT

The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document.

UPDATE

If an Elasticsearch index or document exists, the document is updated. If an Elasticsearch index or document does not exist, a new index is created and the column values in the UPDATE operation are inserted as a new document.

DELETE

If an Elasticsearch index or document exists, the document is deleted. If the Elasticsearch index or document does not exist, a new index is created with zero fields.

The TRUNCATE operation is not supported.

8.2.18.1.2.8 About the Connection

A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.

A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.

The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.
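
For example, a cluster with two nodes might be addressed as shown in the following sketch. The host names are hypothetical placeholders, the handler is assumed to be named elasticsearch, and port 9300 assumes the Transport client.

```properties
# Comma-separated list of Server:Port contact points (host names are placeholders)
gg.handler.elasticsearch.ServerAddressList=es-node1.example.com:9300,es-node2.example.com:9300
```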

8.2.18.1.3 Setting Up and Running the Elasticsearch Handler

You must ensure that the Elasticsearch cluster is set up correctly and is up and running, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.

Set the Classpath

The property gg.classpath must include all the JAR files required by the Java transport client. For a listing of the required client JAR files by version, see Elasticsearch Handler Transport Client Dependencies. For a listing of the required client JAR files for the Elasticsearch High Level REST client, see Elasticsearch High Level REST Client Dependencies.

The path can include the * wildcard character to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.

The following is an example of the correctly configured classpath:

gg.classpath=Elasticsearch_Home/lib/*
8.2.18.1.3.1 Configuring the Elasticsearch Handler

The Elasticsearch Handler can be configured for different versions of Elasticsearch. For the latest version (7.x), two types of clients are supported: the Transport client and the High Level REST client. When the version property is set to 6.x or 7.x, the handler uses the Elasticsearch Transport client to connect to the Elasticsearch cluster and to perform all other handler operations. When the version property is set to rest7.x, the handler uses the Elasticsearch High Level REST client to connect to the Elasticsearch 7.x cluster and to perform all other handler operations. The configurable properties for each client are described in the following sections.

Example 8-1 Sample Handler Properties file:

Sample Replicat configuration and Java Adapter properties files can be found in the following directory:

GoldenGate_install_directory/AdapterExamples/big-data/elasticsearch

For Elasticsearch REST handler

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9200
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*
8.2.18.1.3.1.1 Common Configurable Properties

The following properties apply to all supported versions of Elasticsearch and to both the Transport client and the High Level REST client:

Table 8-18 Common Configurable Properties

gg.handlerlist
  Required/Optional: Required
  Legal Values: Name (any name of your choice for the handler)
  Default: None
  Explanation: The list of handlers to be used.

gg.handler.<name>.type
  Required/Optional: Required
  Legal Values: elasticsearch
  Default: None
  Explanation: Type of handler to use. For example, Elasticsearch, Kafka, or Flume.

gg.handler.name.ServerAddressList
  Required/Optional: Optional
  Legal Values: Server:Port[, Server:Port …]
  Default:
    • localhost:9300 (for Transport Client)
    • localhost:9200 (for High-Level REST Client)
  Explanation: Comma-separated list of contact points of the nodes. The allowed port for version REST7.x is 9200. For other versions, it is 9300.

gg.handler.name.version
  Required/Optional: Required
  Legal Values: 6.x|7.x|REST7.x
  Default: 7.x
  Explanation: The version values 6.x and 7.x indicate using the Elasticsearch Transport client to communicate with Elasticsearch version 6.x and 7.x respectively. The version REST7.x indicates using the Elasticsearch High Level REST client to communicate with Elasticsearch version 7.x.

gg.handler.name.bulkWrite
  Required/Optional: Optional
  Legal Values: true | false
  Default: false
  Explanation: When this property is true, the Elasticsearch Handler uses the bulk write API to ingest data into the Elasticsearch cluster. The batch size of the bulk write can be controlled using the MAXTRANSOPS Replicat parameter.

gg.handler.name.numberAsString
  Required/Optional: Optional
  Legal Values: true | false
  Default: false
  Explanation: When this property is true, the Elasticsearch Handler writes all numeric column values (Long, Integer, or Double) from the source trail to the Elasticsearch cluster as strings.

gg.handler.elasticsearch.upsert
  Required/Optional: Optional
  Legal Values: true | false
  Default: true
  Explanation: When this property is true, a new document is inserted if the document does not already exist when performing an UPDATE operation.

8.2.18.1.3.1.2 Transport Client Configurable Properties

When the version property is set to 6.x or 7.x, the handler uses the Transport client to communicate with the corresponding version of the Elasticsearch cluster. The following property applies only when using the Transport client:

Table 8-19 Transport Client Configurable Properties

gg.handler.name.clientSettingsFile
  Required/Optional: Required
  Legal Values: Transport client properties file.
  Default: None
  Explanation: The filename in the classpath that holds the Elasticsearch transport client properties used by the Elasticsearch Handler.

Sample Properties file for Elasticsearch Handler with Transport Client (with x-pack plugin)

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=[6.x | 7.x]
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*:/path/to/elastic/plugins/x-pack/*

8.2.18.1.3.1.3 Transport Client Setting Properties File

The Elasticsearch Handler uses a Java Transport client to interact with Elasticsearch cluster. The Elasticsearch cluster may have additional plug-ins like shield or x-pack, which may require additional configuration.

The gg.handler.name.clientSettingsFile property should point to a file that has additional client settings based on the version of Elasticsearch cluster.

The Elasticsearch Handler attempts to locate and load the client settings file using the Java classpath. The Java classpath must include the directory containing the properties file.

The client properties file for Elasticsearch (without any plug-in) is:

cluster.name=Elasticsearch_cluster_name

The Shield plug-in also supports additional capabilities like SSL and IP filtering. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html.

Example of client.properties file for Elasticsearch Handler with X-Pack plug-in:

cluster.name=Elasticsearch_cluster_name
xpack.security.user=x-pack_username:x-pack-password

The X-Pack plug-in also supports additional capabilities. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.1/transport-client.html and https://www.elastic.co/guide/en/x-pack/current/java-clients.html

8.2.18.1.3.1.4 Classpath Settings for Transport Client

The gg.classpath setting for the Elasticsearch Handler with the Transport client should contain the paths to the JAR files in the library (lib) folder and the modules folders (transport-netty4 and reindex) inside the Elasticsearch installation directory. If the X-Pack plug-in is used for authentication, then the classpath should also include the JAR files in the plugins (x-pack) folder inside the Elasticsearch installation directory. The paths are as follows:

1.	[/path/to/elastic/lib/*]
2.	[/path/to/elastic/modules/transport-netty4/*]
3.	[/path/to/elastic/modules/reindex/*]
4.	[/path/to/elastic/plugins/x-pack/*] (required only if the X-Pack plug-in is configured in Elasticsearch)

8.2.18.1.3.1.5 REST Client Configurable Properties

When the version property is set to rest7.x, the handler uses the Elasticsearch High Level REST client to connect to the Elasticsearch 7.x cluster. The following properties are supported for the REST client only:

gg.handler.elasticsearch.routingTemplate
  Required/Optional: Optional
  Legal Values: ${columnValue[table1=column1,table2=column2,…]
  Default: None
  Explanation: The template to be used for deciding the routing algorithm.

gg.handler.name.authType
  Required/Optional: Optional
  Legal Values: none | basic | ssl
  Default: None
  Explanation: Controls the authentication type for the Elasticsearch REST client.
    • none - No authentication.
    • basic - Client authentication using username and password without message encryption.
    • ssl - Mutual authentication. The client authenticates the server using a trust-store. The server authenticates the client using username and password. Messages are encrypted.

gg.handler.name.basicAuthUsername
  Required/Optional: Required (for auth-type basic)
  Legal Values: A valid username
  Default: None
  Explanation: The username for the server to authenticate the Elasticsearch REST client. Must be provided for auth type basic.

gg.handler.name.basicAuthPassword
  Required/Optional: Required (for auth-type basic)
  Legal Values: A valid password
  Default: None
  Explanation: The password for the server to authenticate the Elasticsearch REST client. Must be provided for auth type basic.

gg.handler.name.trustStore
  Required/Optional: Required (for auth-type SSL)
  Legal Values: The fully qualified name (path + name) of the trust-store file
  Default: None
  Explanation: The truststore for the Elasticsearch client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl. Valid only for the Elasticsearch REST client.

gg.handler.name.trustStorePassword
  Required/Optional: Required (for auth-type SSL)
  Legal Values: A valid trust-store password
  Default: None
  Explanation: The password for the truststore for the Elasticsearch REST client to validate the certificate received from the Elasticsearch server. Must be provided if the auth type is set to ssl.

gg.handler.name.maxConnectTimeout
  Required/Optional: Optional
  Legal Values: Positive integer
  Default: Default value of Apache HTTP Components framework.
  Explanation: Sets the maximum wait period for a connection to be established from the Elasticsearch REST client to the Elasticsearch server. Valid only for the Elasticsearch REST client.

gg.handler.name.maxSocketTimeout
  Required/Optional: Optional
  Legal Values: Positive integer
  Default: Default value of Apache HTTP Components framework.
  Explanation: Sets the maximum wait period in milliseconds to wait for a response from the service after issuing a request. May need to be increased when pushing large data volumes. Valid only for the Elasticsearch REST client.

gg.handler.name.proxyUsername
  Required/Optional: Optional
  Legal Values: The proxy server username
  Default: None
  Explanation: If the connection to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the username of your proxy server. Most proxy servers do not require credentials.

gg.handler.name.proxyPassword
  Required/Optional: Optional
  Legal Values: The proxy server password
  Default: None
  Explanation: If the connection to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the password of your proxy server. Most proxy servers do not require credentials.

gg.handler.name.proxyProtocol
  Required/Optional: Optional
  Legal Values: http | https
  Default: None
  Explanation: If the connection to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the protocol of your proxy server.

gg.handler.name.proxyPort
  Required/Optional: Optional
  Legal Values: The port number of your proxy server.
  Default: None
  Explanation: If the connection to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the port number of your proxy server.

gg.handler.name.proxyServer
  Required/Optional: Optional
  Legal Values: The host name of your proxy server.
  Default: None
  Explanation: If the connection to Elasticsearch uses the REST client and is routed through a proxy server, then this property sets the host name of your proxy server.

Sample Properties for Elasticsearch Handler using REST Client

gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9200
gg.handler.elasticsearch.version=rest7.x
gg.classpath=/path/to/elasticsearch/lib/*:/path/to/elasticsearch/modules/reindex/*:/path/to/elasticsearch/modules/lang-mustache/*:/path/to/elasticsearch/modules/rank-eval/*
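
If the REST client has to reach the cluster through a proxy server, the proxy and timeout properties listed in this section could be added to the sample above. The following is a sketch only: the proxy host name, port, and timeout value are hypothetical placeholders, not recommended settings.

```properties
# Route REST client traffic through a proxy (host and port are placeholders)
gg.handler.elasticsearch.proxyServer=proxy.example.com
gg.handler.elasticsearch.proxyPort=8080
gg.handler.elasticsearch.proxyProtocol=http

# Set these only if the proxy server requires credentials
#gg.handler.elasticsearch.proxyUsername=proxy_user
#gg.handler.elasticsearch.proxyPassword=proxy_password

# Wait up to 60000 milliseconds for a response when pushing large data volumes
gg.handler.elasticsearch.maxSocketTimeout=60000
```
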
8.2.18.1.3.1.6 Authentication for REST Client

Set the gg.handler.name.authType property to ssl to use the SSL authentication mechanism for communicating with the Elasticsearch cluster. To use basic authentication together with SSL, also provide the basic authentication username and password properties along with the trust-store properties.
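
A configuration that combines SSL with basic authentication might look like the following sketch. The user name, passwords, and trust-store path are hypothetical placeholders, and the handler is assumed to be named elasticsearch as in the samples in this section.

```properties
gg.handler.elasticsearch.version=rest7.x
gg.handler.elasticsearch.ServerAddressList=localhost:9200

# Encrypt messages and validate the server certificate against the trust-store
gg.handler.elasticsearch.authType=ssl
gg.handler.elasticsearch.trustStore=/path/to/truststore.jks
gg.handler.elasticsearch.trustStorePassword=truststore_password

# Credentials the server uses to authenticate the REST client (placeholders)
gg.handler.elasticsearch.basicAuthUsername=es_user
gg.handler.elasticsearch.basicAuthPassword=es_password
```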

8.2.18.1.3.1.7 Classpath Settings for REST Client

The classpath for the High Level REST client must contain the JAR files from the library (lib) folder and the modules folders (reindex, lang-mustache, and rank-eval) inside the Elasticsearch installation directory. The REST client depends on these libraries, so they must be included in gg.classpath for the handler to work. The dependencies are as follows:

1.	[/path/to/elasticsearch/lib/*]
2.	[/path/to/elasticsearch/modules/reindex/*]
3.	[/path/to/elasticsearch/modules/lang-mustache/*]
4.	[/path/to/elasticsearch/modules/rank-eval/*]

8.2.18.1.4 Troubleshooting

This section contains information to help you troubleshoot various issues.

8.2.18.1.4.1 Incorrect Java Classpath

The most common initial error is a classpath that does not include all the required client libraries, which causes a ClassNotFound exception in the log4j log file.

The error can also be caused by a typographic error in the gg.classpath variable, which prevents the classpath from being resolved.

The Elasticsearch transport client libraries do not ship with the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) product. You should properly configure the gg.classpath property in the Java Adapter Properties file to correctly resolve the client libraries, see Setting Up and Running the Elasticsearch Handler.

8.2.18.1.4.2 Elasticsearch Version Mismatch

The Elasticsearch Handler gg.handler.name.version property must be set to one of the following values: 6.x, 7.x, or REST7.x, to match the major version number of the Elasticsearch cluster. For example, gg.handler.name.version=7.x.

The following errors may occur when there is a wrong version configuration:

Error: NoNodeAvailableException[None of the configured nodes are available:]

ERROR 2017-01-30 22:35:07,240 [main] Unable to establish connection. Check handler properties and client settings configuration.

java.lang.IllegalArgumentException: unknown setting [shield.user] 

Ensure that all required plug-ins are installed and review documentation changes for any removed settings.

8.2.18.1.4.3 Transport Client Properties File Not Found

This error applies only to the Transport client, that is, when the version property is set to 6.x or 7.x:

ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.

To resolve this error, verify that the gg.handler.name.clientSettingsFile configuration property specifies the correct Elasticsearch transport client settings file name. Also verify that the gg.classpath variable includes the directory that contains the properties file and that this path does not end with an asterisk (*) wildcard.

8.2.18.1.4.4 Cluster Connection Problem

This error occurs when the Elasticsearch Handler is unable to connect to the Elasticsearch cluster:

Error: NoNodeAvailableException[None of the configured nodes are available:]

Use the following steps to debug the issue:

  1. Ensure that the Elasticsearch server process is running.

  2. Validate the cluster.name property in the client properties configuration file.

  3. Validate the authentication credentials for the X-Pack or Shield plug-in in the client properties file.

  4. Validate the gg.handler.name.ServerAddressList handler property.

8.2.18.1.4.5 Unsupported Truncate Operation

The following error occurs when the Elasticsearch Handler finds a TRUNCATE operation in the source trail:

oracle.goldengate.util.GGException: Elasticsearch Handler does not support the operation: TRUNCATE

This error message is written to the handler log file before the Replicat process abends. Removing the GETTRUNCATES parameter from the Replicat parameter file resolves this error.

8.2.18.1.4.6 Bulk Execute Errors

The following message may be written to the log when a bulk write fails:

DEBUG [main] (ElasticSearch5DOTX.java:130) - Bulk execute status: failures:[true] buildFailureMessage:[failure in bulk execution: [0]: index [cs2cat_s1sch_n1tab], type [N1TAB], id [83], message [RemoteTransportException[[UOvac8l][127.0.0.1:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@43eddfb2 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 84]]];]

This error may occur when Elasticsearch runs out of resources to process the operation. You can limit the Replicat batch size using MAXTRANSOPS to match the value of the thread_pool.bulk.queue_size Elasticsearch configuration parameter.

Note:

Changes to the Elasticsearch parameter, thread_pool.bulk.queue_size, are effective only after the Elasticsearch node is restarted.

8.2.18.1.5 Performance Consideration

The Elasticsearch Handler gg.handler.name.bulkWrite property is used to determine whether the source trail records should be pushed to the Elasticsearch cluster one at a time or in bulk using the bulk write API. When this property is true, the source trail operations are pushed to the Elasticsearch cluster in batches whose size can be controlled by the MAXTRANSOPS parameter in the generic Replicat parameter file. Using the bulk write API provides better performance.
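
Enabling the bulk write API is a one-line change in the handler properties file, as in the following sketch, which assumes a handler named elasticsearch. The batch size itself is not set here: MAXTRANSOPS is a Replicat parameter and belongs in the Replicat parameter file.

```properties
# Push source trail operations to Elasticsearch in batches using the bulk write API.
# The batch size is governed by MAXTRANSOPS in the Replicat parameter file.
gg.handler.elasticsearch.bulkWrite=true
```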

Elasticsearch uses different thread pools to improve how the memory consumption of threads is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.

For bulk operations, the default queue size is 50 (in version 5.2) and 200 (in version 5.3).

To avoid bulk API errors, you must set the Replicat MAXTRANSOPS size to match the bulk thread pool queue size at a minimum. The thread_pool.bulk.queue_size property can be modified in the elasticsearch.yml file.

8.2.18.1.6 About the Shield Plug-In Support

Elasticsearch versions 6.x and 7.x (X-Pack plug-in for Elasticsearch 6.x and 7.x) support a Shield plug-in which provides basic authentication, SSL and IP filtering. Similar capabilities exist in the X-Pack plug-in for Elasticsearch 6.x and 7.x. The additional transport client settings can be configured in the Elasticsearch Handler using the gg.handler.name.clientSettingsFile property.

8.2.18.1.7 About DDL Handling

The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in auto-creation of an index or type in the Elasticsearch cluster.

8.2.18.1.8 Known Issues in the Elasticsearch Handler

Elasticsearch: Trying to input very large number

Very large numbers result in inaccurate values with Elasticsearch document. For example, 9223372036854775807, -9223372036854775808. This is an issue with the Elasticsearch server and not a limitation of the Elasticsearch Handler.

The workaround for this issue is to ingest all the number values as strings using the gg.handler.name.numberAsString=true property.
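
In the handler properties file, the workaround might look like the following sketch, assuming a handler named elasticsearch. With this setting, a value such as 9223372036854775807 is sent to Elasticsearch as a string rather than as a number.

```properties
# Send all numeric column values (Long, Integer, or Double) as strings
# to avoid inaccurate values for very large numbers
gg.handler.elasticsearch.numberAsString=true
```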

Elasticsearch: Issue with index

The Elasticsearch Handler cannot write data into the same index if more than one table has similar column names but different column data types.

Index names are always lowercase though the catalog/schema/tablename in the trail may be case-sensitive.

8.2.18.1.9 Elasticsearch Handler Transport Client Dependencies

What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases?

The Maven Central repository artifacts for Elasticsearch databases are:

Maven groupId: org.elasticsearch.client

Maven artifactId: transport

Maven groupId: org.elasticsearch.client

Maven artifactId: x-pack-transport

8.2.18.1.10 Elasticsearch High Level REST Client Dependencies

The maven coordinates for the Elasticsearch High Level REST client are:

Maven groupId: org.elasticsearch.client

Maven artifactId: elasticsearch-rest-high-level-client

Maven version: 7.13.3

Note:

Do not mix versions of the JAR files in the dependency stack for the Elasticsearch High Level REST Client. Mixing versions results in dependency conflicts.

8.2.18.2 Elasticsearch 8x

The Elasticsearch Handler allows you to store, search, and analyze large volumes of data quickly and in near real time.

This article describes how to use the Elasticsearch Handler starting with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) 21.10.0.0.0. In GG for DAA version 21.10.0.0.0, the Elasticsearch Handler was modified to support a new Elasticsearch client. The new client supports Elasticsearch 8.x.

8.2.18.2.1 Overview

Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features.

The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and write data to it, see https://www.elastic.co.

8.2.18.2.2 Detailing the Functionality

This topic details the Elasticsearch Handler functionality.

8.2.18.2.3 About the Index

An Elasticsearch index is a collection of documents with similar characteristics. Index names must be lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields. An index in Elasticsearch is equivalent to a table in an RDBMS.

For three-part table names in source trail, the index is constructed by concatenating source catalog, schema, and table name. The Elasticsearch Handler maps the source trail schema concatenated with source trail table name to construct the index when there is no catalog in source table.

Table 8-20 Elasticsearch Mapping

Source Trail | Elasticsearch Index
schema.tablename | schema_tablename
catalog.schema.tablename | catalog_schema_tablename

If an index does not already exist in the Elasticsearch cluster, the Elasticsearch Handler creates it when it receives an INSERT or UPDATE operation from the source trail.

If the handler receives a DELETE operation from the source trail but the index does not exist in the Elasticsearch cluster, then the handler abends.

8.2.18.2.4 About the Document

An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.

8.2.18.2.5 About the Data Types

Elasticsearch supports the following data types:

  • 32-bit integer

  • 64-bit integer

  • Double

  • Date

  • String

  • Binary

8.2.18.2.6 About the Connection

A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.

A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.

The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.

The Elasticsearch Handler uses the Java API client to connect over HTTP or HTTPS to the cluster nodes configured in this property, even though the cluster nodes communicate with each other internally using the transport layer protocol.

Because the connection is made through the Elasticsearch client, the HTTP/HTTPS port (not the transport port) must be configured in the handler property.
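As an illustration of the host:port format used by gg.handler.name.ServerAddressList, the following sketch splits a value into (host, port) pairs. The function name and the validation rules are assumptions made for this example; the handler's own parsing may differ.

```python
from typing import List, Tuple


def parse_server_address_list(value: str) -> List[Tuple[str, int]]:
    """Split 'host1:port1, host2:port2' into a list of (host, port) pairs."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ':' so that the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <hostname|ip>:<port>, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


# 9200 is the HTTP port used in this document's examples, not the transport port.
print(parse_server_address_list("127.0.0.1:9200, es-node-2:9200"))
```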

8.2.18.2.7 About Supported Operations

The Elasticsearch Handler supports the following operations for replication to the target Elasticsearch cluster:

INSERT

The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document. If a document with the same _id is already present, the new record replaces it.

UPDATE

If the Elasticsearch index and document exist, the document is updated. If the index or the document does not exist, then the index is created if necessary and the column values in the UPDATE operation are inserted as a new document.

DELETE

If the Elasticsearch index and the _id of the document exist, then the document is deleted. If the _id of the document does not exist, then the handler continues without doing anything. If the Elasticsearch index is missing, then the handler will ABEND.

The TRUNCATE operation is not supported.
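The operation semantics described in this section can be modelled with an in-memory dictionary. This is a simplified sketch, not the handler's code: `HandlerAbend` and `apply_operation` are hypothetical names, the cluster is represented as a plain dict of index name to documents, and merging the updated fields into an existing document is an assumption. The `upsert` flag mirrors the UPSERT mode that this document describes as enabled by default.

```python
class HandlerAbend(Exception):
    """Hypothetical stand-in for the condition in which the handler ABENDs."""


def apply_operation(cluster, op, index, doc_id, fields=None, upsert=True):
    """Apply one source operation to `cluster`, a dict of index -> {_id: document}."""
    fields = dict(fields or {})
    if op == "INSERT":
        # Create the index if needed; a document with the same _id is replaced.
        cluster.setdefault(index, {})[doc_id] = fields
    elif op == "UPDATE":
        missing = index not in cluster or doc_id not in cluster[index]
        if missing and not upsert:
            raise HandlerAbend(f"UPDATE target missing: {index}/{doc_id}")
        # With upsert enabled, a missing index or document is created from the update.
        cluster.setdefault(index, {}).setdefault(doc_id, {}).update(fields)
    elif op == "DELETE":
        if index not in cluster:
            raise HandlerAbend(f"DELETE on missing index: {index}")
        # A missing _id is ignored.
        cluster[index].pop(doc_id, None)
    else:
        # TRUNCATE and any other operation are not supported.
        raise ValueError(f"unsupported operation: {op}")
    return cluster


cluster = {}
apply_operation(cluster, "INSERT", "hr_emp", "1", {"name": "Ann"})
apply_operation(cluster, "UPDATE", "hr_emp", "2", {"name": "Bo"})
apply_operation(cluster, "DELETE", "hr_emp", "99")
print(cluster)
```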

8.2.18.2.8 About DDL Handling

The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in the automatic creation of an index or type in the Elasticsearch cluster.

8.2.18.2.9 About the Primary Key Update

The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified.

The Elasticsearch handler processes an update to a source primary key by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required.

For the INSERT operation to contain all the fields in the source table, configure Extract to capture the full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns to the trail.
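The following sketch shows why the inserted document can end up with fewer fields. It models a primary key update as a DELETE followed by an INSERT built only from the columns present in the trail record. The function name and the sample data are hypothetical.

```python
def apply_pk_update(index_docs, old_id, new_id, record_columns):
    """Model a primary key update as DELETE(old_id) followed by INSERT(new_id)."""
    # DELETE the document stored under the old key.
    index_docs.pop(old_id, None)
    # INSERT a document built only from the columns available in the trail record.
    index_docs[new_id] = dict(record_columns)
    return index_docs


# Only the key column was captured: the new document loses name and city.
docs = {"1": {"id": 1, "name": "Ann", "city": "Oslo"}}
print(apply_pk_update(docs, "1", "2", {"id": 2}))

# All columns were captured: the new document is complete.
docs = {"1": {"id": 1, "name": "Ann", "city": "Oslo"}}
print(apply_pk_update(docs, "1", "2", {"id": 2, "name": "Ann", "city": "Oslo"}))
```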

8.2.18.2.10 About UPSERT

The Elasticsearch handler supports UPSERT mode for UPDATE operations. This mode is enabled by setting the Elasticsearch handler property gg.handler.name.upsert to true. It is enabled by default.

In UPSERT mode, if an UPDATE operation from the source trail refers to an index or document _id that is missing from the Elasticsearch cluster, the handler creates the index and converts the operation to an INSERT so that the record is added as a new document.

When UPSERT is false, the Elasticsearch Handler will ABEND in the same scenario.

In future releases, this mechanism will be enhanced to be in line with the HANDLECOLLISIONS mode of Oracle GoldenGate, where:
  • An insert collision should result in a duplicate error.
  • A missing update or delete should result in a not found error.
The corresponding error codes will be returned to Replicat and handled by it according to the Oracle GoldenGate collision handling strategy.

8.2.18.2.11 About Bulk Write

The Elasticsearch handler supports a bulk operation mode in which multiple operations are grouped into a batch and the whole batch is applied to the target Elasticsearch cluster in a single request. This improves performance.

Bulk mode is enabled by setting the Elasticsearch handler property gg.handler.name.bulkWrite to true. It is disabled by default.

Bulk mode has a few limitations. If any operation in a batch fails (throws an exception), the data at the target can become inconsistent. For example, a delete operation whose index is missing from the target Elasticsearch cluster results in an exception. If such an operation is part of a batch in bulk mode, then the remainder of the batch is not applied after that operation fails, resulting in inconsistency.

To avoid bulk API errors, you must set the handler MAXTRANSOPS size to match the bulk thread pool queue size at a minimum.

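The thread_pool.bulk.queue_size property can be modified in the elasticsearch.yml file. In Elasticsearch 6.3 and later, the bulk thread pool is named write, and the corresponding setting is thread_pool.write.queue_size.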
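The batching behavior and its failure limitation can be illustrated with a short model. This is a sketch of the behavior as described above, not of the Elasticsearch Bulk API or the handler internals: `make_batches` and `apply_batch` are hypothetical helpers, and the batch size stands in for the MAXTRANSOPS value.

```python
from typing import Callable, List, Sequence


def make_batches(ops: Sequence, batch_size: int) -> List[list]:
    """Group operations into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(ops[i:i + batch_size]) for i in range(0, len(ops), batch_size)]


def apply_batch(batch: Sequence, apply_one: Callable) -> int:
    """Apply operations in order and stop at the first failure.

    Returns the number of operations applied; the rest of the batch is skipped.
    """
    applied = 0
    for op in batch:
        try:
            apply_one(op)
        except Exception:
            break
        applied += 1
    return applied


target = []


def apply_one(op):
    # Hypothetical failure: a delete against an index that does not exist.
    if op == "delete-missing-index":
        raise RuntimeError("index not found")
    target.append(op)


batch = ["insert-1", "delete-missing-index", "insert-2"]
print(apply_batch(batch, apply_one), target)
```

In this model, insert-2 is never applied, which is the kind of inconsistency the limitation refers to.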

8.2.18.2.12 About Routing

A document is routed to a particular shard in an index using the _routing value. The default _routing value is the document’s _id field. Custom routing patterns can be implemented by specifying a custom routing value per document.

The Elasticsearch Handler supports custom routing: specify the mapping field key in the gg.handler.name.routingKeyMappingTemplate property of the Elasticsearch handler properties file.
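The effect of a routing value can be sketched as follows. This is a simplified model: Elasticsearch computes the shard from a hash of the _routing value, but the CRC32 hash and the plain modulo used here are stand-ins chosen for the example, and the `customer` field is hypothetical.

```python
import zlib
from typing import Optional


def routing_value(doc: dict, doc_id: str, routing_field: Optional[str] = None) -> str:
    """Use the custom routing field when configured, else fall back to the _id."""
    if routing_field and routing_field in doc:
        return str(doc[routing_field])
    return doc_id


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard from the routing value (stand-in hash, simplified formula)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


order_a = {"order": 1, "customer": "c42"}
order_b = {"order": 2, "customer": "c42"}

# With custom routing on "customer", both orders land on the same shard.
print(shard_for(routing_value(order_a, "1", "customer"), 5))
print(shard_for(routing_value(order_b, "2", "customer"), 5))
```

Routing related documents by a shared field keeps them on one shard, which is the usual reason to override the default _id-based routing.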

8.2.18.2.13 About Request Headers

Elasticsearch allows additional request headers (header name and value pairs) to be sent along with the HTTP requests of REST calls. The Elasticsearch Handler supports sending additional headers: specify the header name and value pairs in the Elasticsearch Handler property gg.handler.name.headers in the properties file.

8.2.18.2.14 About Java API Client

The Elasticsearch Handler now uses the Java API Client to connect to the Elasticsearch cluster and perform all replication operations. The Java API Client internally uses the Elasticsearch Rest Client and Transport Client to perform all the operations. The older clients, such as the Rest High-Level Client and the Transport Client, are deprecated and have therefore been removed.

Supported Versions of Elasticsearch Cluster

To configure this handler, an Elasticsearch cluster of version 7.16.x or above must be configured and running. To configure an Elasticsearch cluster, see Get Elasticsearch up and running.

8.2.18.2.15 Setting Up the Elasticsearch Handler

You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running. Supported versions of Elasticsearch cluster are 7.16.x and above. See https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html. Alternatively, you can use Kibana to verify the setup.

8.2.18.2.16 Elasticsearch Handler Configuration

To configure the Elasticsearch Handler, the parameter file (res.prm) and the properties file (elasticsearch.props) must be configured with valid values.

Parameter File:

The parameter file should point to the correct properties file for the Elasticsearch Handler.

The following are the mandatory parameters in the parameter file (res.prm) for running the Elasticsearch Handler:
-	REPLICAT replicat-name  
-	TARGETDB LIBFILE libggjava.so SET property=dirprm/elasticsearch.props 
-	MAP schema-name.table-name, TARGET schema-name.table-name

Properties File:

The following are the mandatory properties in the properties file (elasticsearch.props) for running the Elasticsearch handler:

-	gg.handlerlist=elasticsearch
-	gg.handler.elasticsearch.type=elasticsearch
-	gg.handler.elasticsearch.ServerAddressList=127.0.0.1:9200
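A quick pre-flight check for the mandatory properties can be sketched as below. This is an illustrative helper, not part of the product: the function names are hypothetical, property names are compared case-sensitively as an assumption, and only the simple key=value form shown in this document is parsed.

```python
def load_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_mandatory(props: dict, handler: str = "elasticsearch") -> list:
    """Return the mandatory property names that are absent or empty."""
    required = [
        "gg.handlerlist",
        f"gg.handler.{handler}.type",
        f"gg.handler.{handler}.ServerAddressList",
    ]
    return [key for key in required if not props.get(key)]


sample = """
# elasticsearch.props
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
"""
print(missing_mandatory(load_properties(sample)))
```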

Table 8-21 Elasticsearch Handler Configuration Properties

| Property Name | Required (Yes/No) | Legal Values (Default value) | Explanation |
| --- | --- | --- | --- |
| gg.handler.name.ServerAddressList | Yes | <hostname or IP>:<port>, <hostname or IP>:<port>, … (default: localhost:9200) | List of the cluster nodes of the Elasticsearch cluster, each given as a valid hostname (or IP address) and port number separated by ':'. |
| gg.handler.name.BulkWrite | No | true, false (default: false) | If bulk write mode is enabled (set to true), the operations of a transaction are stored in a batch and applied to the target Elasticsearch cluster in a single request per batch (transaction), depending on the batch size. |
| gg.handler.name.Upsert | No | true, false (default: true) | If upsert mode is enabled (set to true), an update operation is inserted as a new document when the document is missing on the target Elasticsearch cluster. |
| gg.handler.name.NumberAsString | No | true, false (default: false) | Set to true to store numbers as strings. |
| gg.handler.name.ProxyServer | No | Proxy hostname or proxy IP | Proxy server hostname (or IP) used to connect to the Elasticsearch cluster. |
| gg.handler.name.ProxyPort | No | Port number | Port number of the proxy server. Required if a proxy is configured. |
| gg.handler.name.ProxyProtocol | No | http, https (default: http) | Protocol for the proxy server connection. |
| gg.handler.name.ProxyUsername | No | Username of proxy server | Username for connecting to the proxy server. |
| gg.handler.name.ProxyPassword | No | Password of proxy server | Password for connecting to the proxy server. This can be encrypted using ORACLEWALLET. |
| gg.handler.name.AuthType | No | basic, ssl, none (default: none) | Authentication type to be used for connecting to the Elasticsearch cluster. |
| gg.handler.name.BasicAuthUsername | No | Username of Elasticsearch cluster | Username credential for basic authentication to connect to the Elasticsearch server. This can be encrypted using ORACLEWALLET. |
| gg.handler.name.BasicAuthPassword | No | Password of Elasticsearch cluster | Password credential for basic authentication to connect to the Elasticsearch server. This can be encrypted using ORACLEWALLET. |
| gg.handler.name.Fingerprint | No | Fingerprint hash code | The hash of a certificate, calculated over all of the certificate's data and its signature. Applicable for authentication type SSL. This can be encrypted using ORACLEWALLET. |
| gg.handler.name.CertFilePath | No | /path/to/CA_certificate_file.crt | CA certificate file (.crt) for SSL/TLS authentication. |
| gg.handler.name.TrustStore | No | /path/to/trust-store-file | Path to the trust-store file on the server for SSL/TLS server authentication. Applicable for authentication type SSL. |
| gg.handler.name.TrustStorePassword | No | Trust-store password | Password for the trust-store file for SSL/TLS authentication. Applicable for authentication type SSL. This can be encrypted using ORACLEWALLET. |
| gg.handler.name.TrustStoreType | No | jks, pkcs12 (default: jks) | The key-store type for SSL/TLS authentication. Applicable if the authentication type is SSL. |
| gg.handler.name.RoutingKeyMappingTemplate | No | Routing field name | Defines the field name whose value is mapped for routing to a particular shard in an index of the Elasticsearch cluster. |
| gg.handler.name.Headers | No | <key>:<value>, <key>:<value>, … | List of header name and value pairs to be sent with REST calls. |
| gg.handler.name.MaxConnectTimeout | No | Time in seconds | Time in seconds that a request waits to connect to the Elasticsearch server. |
| gg.handler.name.MaxSocketTimeout | No | Time in seconds | Time in seconds that a request waits for a response from the Elasticsearch server. |
| gg.handler.name.IOThreadCount | No | Count | Number of threads that handle I/O requests. |
| gg.handler.name.NodeSelector | No | ANY, SKIP_DEDICATED_MASTERS, or the fully qualified name of a node selector class (default: ANY) | Predefined strategy ANY or SKIP_DEDICATED_MASTERS, or the fully qualified name of a class that implements a custom strategy (by implementing the NodeSelector.java interface). |

Set the Classpath

The Elasticsearch handler property gg.classpath must include all the dependency JAR files required by the Java API client. To list and download the required client JAR files, use the Dependency Downloader script elasticsearch_java.sh in the OGG_HOME/DependencyDownloader directory and pass the version 8.7.0 as an argument. For more information about Elasticsearch client dependencies, see Elasticsearch Handler Client Dependencies.

The script creates the directory OGG_HOME/DependencyDownloader/dependencies/elasticsearch_rest_8.7.0 and downloads all the dependency JAR files into it. The client library version 8.7.0 can be used for all supported Elasticsearch clusters.

This location can be configured in the classpath as: gg.classpath=/path/to/OGG_HOME/DependencyDownloader/dependencies/elasticsearch_rest_8.7.0/*

The * wildcard character at the end of the path includes all of the JAR files in that directory in the associated classpath. Do not use *.jar.

Sample Configuration of Elasticsearch Handler:

For reference, a sample parameter file (res.prm) and a sample properties file (elasticsearch.props) for the Elasticsearch handler are available in the directory:

OGG_HOME/AdapterExamples/big-data/elasticsearch

8.2.18.2.17 Enabling Security for Elasticsearch

The Elasticsearch cluster must be accessed in a secured manner in production environments. Security features must first be enabled in the Elasticsearch cluster, and the corresponding security configuration must then be added to the Elasticsearch handler properties file.

8.2.18.2.18 Security Configuration for Elasticsearch Cluster

The latest version of Elasticsearch configures security automatically when it is installed and started. The logs print the security details for the auto-configured cluster as follows:

- Elasticsearch security features have been automatically configured!
-	Authentication is enabled and cluster connections are encrypted.
-	Password for the elastic user (reset with `bin/elasticsearch-reset-password -u elastic`): nnh0LWKZMLkw_QD5jxhE
-	HTTP CA certificate SHA-256 fingerprint: 862e3f117c386a63f8f43db88760d463900e4c814590b8920e1c0e25f6db4df4
-	Configure Kibana to use this cluster:
-	Run Kibana and click the configuration link in the terminal when Kibana starts.
-	Copy the following enrollment token and paste it into Kibana in your browser (valid for the next 30 minutes): eyJ2ZXIiOiI4LjYuMiIsImFkciI6WyIxMDAuNzAuOTguNzM6OTIwMCJdLCJmZ3IiOiI4NjJlM2YxMTdjMzg2YTYzZjhmNDNkYjg4NzYwZDQ2MzkwMGU0YzgxNDU5MGI4OTIwZTFjMGUyNWY2ZGI0ZGY0Iiwia2V5IjoiUTVCVF9vWUJ2TnZDVXBSSkNTWEM6NkJNc3ZXanBUYWUwa0l6V1pDU1JPQSJ9

These security parameter values must be noted and used to configure the Elasticsearch handler. All the auto-generated certificates are created inside the Elasticsearch-installation-directory/config/certs folder.

If security is not auto-configured, as in older versions of Elasticsearch, you must manually enable security features such as basic and encrypted (SSL) authentication in the following configuration file of the Elasticsearch cluster before running it:

Elasticsearch-installation-directory/config/elasticsearch.yml

To enable the security features, add the following parameters to the elasticsearch.yml file and restart the Elasticsearch cluster.

#----------------------- BEGIN SECURITY AUTO CONFIGURATION ----------------
# The following settings, TLS certificates and keys have been 
# configured for SSL/TLS authentication.
# -----------------------------------------------------------------------
# Enable security features
xpack.security.enabled: true
xpack.security.enrollment.enabled: true

# Enable encryption for HTTP API client connections
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
cluster.initial_master_nodes: ["cluster-host-name"]

# Allow HTTP API connections from anywhere
# Connections are encrypted and require user authentication
http.host: 0.0.0.0
#----------------------- END SECURITY AUTO CONFIGURATION --------------
For more information about the security setting of Elasticsearch cluster, see https://www.elastic.co/guide/en/elasticsearch/reference/current/manually-configure-security.html

8.2.18.2.19 Security Configuration for Elasticsearch Handler

The Elasticsearch handler supports three modes of security configuration, which are selected by setting the Elasticsearch Handler property gg.handler.name.authType to one of the following values:
  1. None: This mode is used when no security feature is enabled in the Elasticsearch stack. No other configuration is required for this mode, and Elasticsearch can be accessed directly using the HTTP protocol.
  2. Basic: This mode is used when only the basic security feature is enabled, with a username and password set for a user. The basic authentication username and password properties must be provided in the properties file in order to access the Elasticsearch cluster.
    gg.handler.name.authType=basic
    gg.handler.name.basicAuthUsername=elastic
    gg.handler.name.basicAuthPassword=changeme
    
  3. SSL: This mode is used when SSL/TLS authentication is configured for encryption in the Elasticsearch stack. You must provide one of the following for the handler to be able to connect to the Elasticsearch cluster: the CA fingerprint hash, the path to the CA certificate file (.crt), or the path to the trust-store file (along with the trust-store type and trust-store password). This mode also supports a combination of SSL/TLS authentication and basic authentication configured in the Elasticsearch stack. If both are configured in the Elasticsearch cluster, you must configure both the basic authentication properties (username and password) and the SSL-related properties (fingerprint, certificate file, or trust-store).
    gg.handler.name.authType=ssl
    
    # if basic authentication username and password is configured. 
    gg.handler.name.basicAuthUsername=username
    gg.handler.name.basicAuthPassword=password
    
    # For SSL, configure one of the following three options.
    # Option 1: CA certificate file
    gg.handler.name.certFilePath=/path/to/ESHome/config/certs/http_ca.crt
    # Option 2: CA certificate fingerprint
    gg.handler.name.fingerprint=862e3f117c386a63f8f43db88760d463900e4c814590b8920e1c0e25f6db4df4
    # Option 3: trust-store file with its type and password
    gg.handler.name.trustStore=/path/to/http.p12
    gg.handler.name.trustStoreType=pkcs12
    gg.handler.name.trustStorePassword=pass
    

All of the above security-related properties that contain confidential information can be configured to use Oracle Wallet to encrypt their values in the properties file.
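If only the CA certificate file is at hand, the value for the fingerprint property can be derived from it. The sketch below assumes that the fingerprint is the SHA-256 digest of the DER-encoded certificate, which matches the "SHA-256 fingerprint" wording and the 64-character hexadecimal value shown in the startup log earlier in this section. The function name is hypothetical, and the path in the usage comment is the placeholder used in this document.

```python
import hashlib
import ssl


def certificate_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of a PEM-encoded certificate."""
    # Convert the PEM text to the raw DER bytes, then hash those bytes.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der_bytes).hexdigest()


# Example usage (path is a placeholder):
# with open("/path/to/ESHome/config/certs/http_ca.crt") as f:
#     print(certificate_fingerprint(f.read()))
```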

8.2.18.2.20 Troubleshooting

  1. Error: org.elasticsearch.ElasticsearchException[Index [index-name] is not found] - This exception occurs when there is a delete operation and the corresponding index is not present in the Elasticsearch cluster. It can also occur for an update operation if upsert=false and the index is missing.
  2. Error: javax.net.ssl.SSLHandshakeException:[ Connection failed ] - This can happen when the authentication type is SSL and the properties for enabling authentication in the elasticsearch.yml file, described above, are missing.
  3. Error: javax.net.ssl.SSLException: [Received fatal alert: bad_certificate] - This issue occurs when host validation fails. Check that the certificates generated using cert-utils in Elasticsearch contain the host information.

8.2.18.2.21 Elasticsearch Handler Client Dependencies

What are the dependencies for the Elasticsearch Handler to connect to Elasticsearch databases?

The Maven Central repository artifacts for Elasticsearch databases are:

Maven groupId: co.elastic.clients

Maven artifactId: elasticsearch-java

Version: 8.7.0

8.2.18.2.21.1 Elasticsearch 8.7.0
commons-codec-1.15.jar
commons-logging-1.2.jar
elasticsearch-java-8.7.0.jar
elasticsearch-rest-client-8.7.0.jar
httpasyncclient-4.1.5.jar
httpclient-4.5.13.jar
httpcore-4.4.13.jar
httpcore-nio-4.4.13.jar
jakarta.json-api-2.0.1.jar
jsr305-3.0.2.jar
parsson-1.0.0.jar

8.2.18.3 Support for Vector Data

The Elasticsearch handler supports replication of numeric vector or array data in a record by mapping it to a dense vector type field in Elasticsearch. Dense vector is a new data type introduced in Elasticsearch version 8.11.0 that stores a numeric array of any dimension and is used primarily for k-nearest neighbor (kNN) search.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html

Note:

When Elasticsearch creates an index automatically on insertion of records, it does not map the vector or array data to a dense vector type field. To map vector data to a dense vector field, you must explicitly create the index with a dense vector type field.

The dense_vector field type in Elasticsearch does not support vector data with varying dimensions; it supports only vector or array data of a fixed dimension. The dimension can be specified explicitly when creating the index. If it is not specified, the field takes the dimension of the first record inserted into it.
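As a sketch of creating such an index explicitly, the snippet below builds the request body for an index with a dense_vector field of fixed dimension. The field name `embedding` and the dimension 3 are hypothetical, and the helper names are made up for this example. The body would be sent to the cluster in an index creation request (PUT on the index name), using an index name that matches the one the handler derives from the source table.

```python
import json


def dense_vector_index_body(field: str, dims: int) -> dict:
    """Build an index creation body with one dense_vector field of fixed size."""
    if dims < 1:
        raise ValueError("dims must be a positive integer")
    return {
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dims}
            }
        }
    }


def fits_dimension(vector, dims: int) -> bool:
    """Check that a vector has exactly the dimension declared for the field."""
    return len(vector) == dims


body = dense_vector_index_body("embedding", 3)
print(json.dumps(body, indent=2))
print(fits_dimension([0.1, 0.2, 0.3], 3), fits_dimension([0.1, 0.2], 3))
```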