6 Setting Up a Cluster

You can change the default out-of-box cluster settings to set up and configure unique Coherence clusters for your solution.

Overview of Setting Up Clusters

Coherence provides a default out-of-box cluster configuration that is used for demonstration purposes. It allows clusters to be created quickly and often requires little or no configuration. However, beyond testing and demonstration, the default setup should not be used. Set up unique clusters based on the network environment in which they run and on the requirements of the applications that use them. A cluster that runs in single-server mode can be configured for unit testing and basic development.

Setting up a cluster includes defining the cluster's name. If multicast is undesirable or unavailable in an environment, then setting up the Well Known Addresses (WKA) feature is required. The remaining tasks in this chapter are typical when setting up a cluster and are needed only when the default settings must be changed.

Clusters are set up within an operational override file (tangosol-coherence-override.xml). Each cluster member uses an override file to specify unique values that override the default configuration that is defined in the operational deployment descriptor. See Specifying an Operational Configuration File and Operational Configuration Elements.

Specifying a Cluster's Name

A cluster name is a user-defined name that uniquely identifies a cluster from other clusters that run on the network.
Cluster members must specify the same cluster name to join and form a cluster. A cluster member does not start if the wrong name is specified when attempting to join an existing cluster.

Note:

If a name is not explicitly specified, then a cluster name is automatically generated based on the operating system user name. The recommended best practice is not to use the system-generated cluster name.

To specify a cluster name, edit the operational override file and add a <cluster-name> element, within the <member-identity> element, that includes the cluster name. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <member-identity>
         <cluster-name system-property="coherence.cluster">MyCluster
         </cluster-name>
      </member-identity>
   </cluster-config>
</coherence>

The coherence.cluster system property is used to specify the cluster name instead of using the operational override file. For example:

-Dcoherence.cluster=name

Specifying a Cluster Member's Identity

A set of identifiers is used to give a cluster member an identity within the cluster. The identity information is used to differentiate cluster members and conveys the members' role within the cluster. Some identifiers are also used by the cluster service when performing cluster tasks. Lastly, the identity information is valuable when displaying management information (for example, JMX) and facilitates interpreting log entries. The following list describes each of the identifiers:
  • Site Name – the name of the geographic site that hosts the cluster member. The server's domain name is used if no name is specified. For WAN clustering, this value identifies the datacenter where the member is located. The site name can be used as the basis for intelligent routing, load balancing, and disaster recovery planning (that is, the explicit backing up of data on separate geographic sites). The site name also helps determine where to back up data when using distributed caching and the default partition assignment strategy. Lastly, the name is useful for displaying management information (for example, JMX) and interpreting log entries.

  • Rack Name – the name of the location within a geographic site that the member is hosted at and is often a cage, rack, or bladeframe identifier. The rack name can be used as the basis for intelligent routing, load balancing, and disaster recovery planning (that is, the explicit backing up of data on separate bladeframes). The rack name also helps determine where to back up data when using distributed caching and the default partition assignment strategy. Lastly, the name is useful for displaying management information (for example, JMX) and interpreting log entries.

  • Machine Name – the name of the server that hosts the cluster member. The server's host name is used if no name is specified. The name is used as the basis for creating an ID. The cluster service uses the ID to ensure that data are backed up on different computers to prevent single points of failure.

  • Process Name – the name of the JVM process that hosts the cluster member. The JVM process number is used if no name is specified. The process name makes it possible to easily differentiate among multiple JVMs running on the same computer.

  • Member Name – the cluster member's unique name. The name makes it easy to differentiate cluster members especially when multiple members run on the same computer or within the same JVM. Always specify a member name (as a best practice) even though it is not required to do so.

  • Role Name – the cluster member's role in the cluster. The role name allows an application to organize cluster members into specialized roles, such as cache servers and cache clients. Default role names (CoherenceServer for cache servers and application_class_name for cache clients) are used if no role name is specified.

To specify member identity information, edit the operational override file and add the member identity elements within the <member-identity> element as demonstrated below:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <member-identity>
         <site-name system-property="coherence.site">pa-1</site-name>
         <rack-name system-property="coherence.rack">100A</rack-name>
         <machine-name system-property="coherence.machine">prod001
         </machine-name>
         <process-name system-property="coherence.process">JVM1
         </process-name>
         <member-name system-property="coherence.member">C1</member-name>
         <role-name system-property="coherence.role">Server</role-name>
      </member-identity>
   </cluster-config>
</coherence>

The following system properties are used to specify a cluster member's identity information instead of using the operational override file.

-Dcoherence.site=pa-1
-Dcoherence.rack=100A
-Dcoherence.machine=prod001
-Dcoherence.process=JVM1
-Dcoherence.member=C1
-Dcoherence.role=Server

Configuring Multicast Communication

Multicast communication is configured in an operational override file within the <multicast-listener> element node. Many system properties are also available to configure multicast communication when starting a cluster member.

Changing the Multicast Socket Interface

The multicast socket network interface (NIC) is automatically selected. For configurations that use multicast only for discovery, the default value is calculated using the <discovery-address> value that is specified as part of the <unicast-listener> configuration. For configurations that use multicast for both discovery and data transmission (that is, the <multicast-threshold-percent> is set to a value less than 100), the default value is the unicast listener interface. Using a different NIC for multicast is not a best practice and is strongly discouraged, as it can lead to partial failure of the cluster and can prolong failure detection and failover.

To change the default multicast network interface, edit the operational override file and add an <interface> element that specifies the IP address to which the multicast socket binds. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <multicast-listener>
         <interface>192.168.0.1</interface>
      </multicast-listener>
   </cluster-config>
</coherence>

Specifying a Cluster's Multicast Address and Port

A multicast address and port can be specified for a cluster member. Cluster members must use the same multicast address and port to join and form a cluster. The default multicast address is 239.192.0.0. The default cluster port is 7574.

Note:

  • The multicast cluster address and port may be safely shared by multiple Coherence clusters. However, clusters that are configured to use SSL cannot share a multicast address and port. In addition, all clusters must be configured to use the same IP protocol (for example, either IPv6 or IPv4).

  • The cluster port is also used by clusters that are configured to use Well Known Addresses (WKA) instead of multicast. See Using Well Known Addresses.

The Coherence default cluster port is registered with the Internet Assigned Numbers Authority (IANA) and, for most clusters, the port does not need to be changed. If a different port is required, then the recommended best practice is to select a value between 1024 and 8999; these values typically fall outside of most operating systems' ephemeral port range. Ephemeral ports can be randomly assigned to other processes, which can prevent Coherence from binding to the port and cause startup to fail. Refer to the documentation for your operating system to ensure that the selected port is not within the ephemeral port range.

To specify a cluster multicast address and port, edit the operational override file and add both an <address> and <port> element and specify the address and port to be used by the cluster member. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <multicast-listener>
         <address system-property="coherence.clusteraddress">224.7.2.9
         </address>
         <port system-property="coherence.clusterport">3206</port>
      </multicast-listener>
   </cluster-config>
</coherence>

The coherence.clusteraddress and coherence.clusterport system properties are used to specify the cluster multicast address and port instead of using the operational override file. For example:

-Dcoherence.clusteraddress=224.7.2.9
-Dcoherence.clusterport=3206

Specifying the Multicast Time-to-Live

The time-to-live value (TTL) setting designates how far multicast packets can travel on a network. The TTL is expressed in terms of how many hops a packet survives; each network interface, router, and managed switch is considered one hop.

The TTL value should be set to the lowest integer value that works. Setting the value too high can use unnecessary bandwidth on other LAN segments and can even cause the operating system or network devices to disable multicast traffic. Typically, setting the TTL value to 1 works on a simple switched backbone. A value of 2 or more may be required on an advanced backbone with intelligent switching. A value of 0 is used for single server clusters that are used for development and testing. See Enabling Single-Server Mode.

To specify the TTL, edit the operational override file and add a <time-to-live> element that includes the TTL value. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <multicast-listener>
         <time-to-live system-property="coherence.ttl">3</time-to-live>
      </multicast-listener>
   </cluster-config>
</coherence>

The coherence.ttl system property is used to specify the TTL value instead of using the operational override file. For example:

-Dcoherence.ttl=3
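
At the JDK level, the TTL maps onto the standard multicast socket option. The following standalone sketch (plain Java, no Coherence required; MulticastTtlDemo is a hypothetical class name) shows the effect of applying a TTL of 3 to a multicast socket:

```java
import java.io.IOException;
import java.net.MulticastSocket;

// Hypothetical demo class (not part of Coherence): shows the JDK-level
// effect of a <time-to-live> setting on a multicast socket.
public class MulticastTtlDemo {

    // Open a multicast socket on an ephemeral port, apply the TTL,
    // and return the TTL value the socket reports back.
    public static int applyTtl(int ttl) throws IOException {
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(ttl);  // hop limit for outgoing multicast packets
            return socket.getTimeToLive();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Effective TTL: " + applyTtl(3));
    }
}
```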

Specifying the Multicast Join Timeout

The multicast join timeout defines how much time a cluster member waits to join a cluster. If the timeout is reached and an existing cluster is not detected, then the cluster member starts its own cluster and elects itself as the senior cluster member. Generally, there is no need to change the default join timeout value. However, if a server starts a new cluster instead of joining an existing cluster, then the join timeout value can be increased to provide additional time for the server to join the cluster.

Note:

The first member of the cluster waits the full duration of the join timeout before it assumes the role of the senior member. If the cluster startup timeout is less than the join timeout, then the first member of the cluster fails during cluster startup. The cluster member timeout is specified using the packet publisher timeout (<timeout-milliseconds>). See packet-delivery.

To specify the join timeout, edit the operational override file and add a <join-timeout-milliseconds> element that includes the timeout value. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <multicast-listener>
         <join-timeout-milliseconds>6000</join-timeout-milliseconds>
      </multicast-listener>
   </cluster-config>
</coherence>

Note:

The <join-timeout-milliseconds> setting is used for both multicast and unicast communication.

Changing the Multicast Threshold

Cluster members use both multicast and unicast communication when sending cluster packets. The multicast threshold value is used to determine whether to use multicast for packet delivery or unicast. Setting the threshold higher or lower can force a cluster to favor one style of communication over the other. The threshold setting is not used if multicast communication is disabled.

The multicast threshold is a percentage value and is in the range of 1% to 100%. In a cluster of n members, a cluster member that is sending a packet to a set of destination nodes (not counting itself) of size d (in the range of 0 to n-1) sends a packet using multicast only if the following hold true:

  • The packet is being sent over the network to multiple nodes (d > 1).

  • The number of nodes is greater than the specified threshold (d > (n-1) * (threshold/100)).

    For example, in a 25 member cluster with a multicast threshold of 25%, a cluster member only uses multicast if the packet is destined for more than 6 members (24 * .25 = 6).

Setting this value to 1 allows the cluster to use multicast for essentially all multi-point traffic. Setting this value to 100 forces the cluster to use unicast for all multi-point traffic except for explicit broadcast traffic (for example, cluster heartbeat and discovery) because the 100% threshold is never exceeded. With the default setting of 25, a cluster member sends the packet using unicast if it is destined for one-fourth or fewer of all nodes, and sends the packet using multicast if it is destined for more than one-fourth of all nodes.
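
The threshold rule above can be expressed as a small standalone helper (a hypothetical illustration, not Coherence code):

```java
// Hypothetical helper (not Coherence internals): evaluates the multicast
// threshold rule for a packet destined for d nodes in an n-member cluster.
public class MulticastThreshold {

    // Multicast is used only for true multi-point sends (d > 1) whose
    // destination count exceeds the configured fraction of the cluster.
    public static boolean useMulticast(int n, int d, int thresholdPercent) {
        return d > 1 && d > (n - 1) * (thresholdPercent / 100.0);
    }

    public static void main(String[] args) {
        // 25-member cluster, default 25% threshold: 24 * 0.25 = 6,
        // so multicast is used only when more than 6 members are addressed.
        System.out.println(useMulticast(25, 6, 25));  // false
        System.out.println(useMulticast(25, 7, 25));  // true
    }
}
```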

To specify the multicast threshold, edit the operational override file and add a <multicast-threshold-percent> element that includes the threshold value. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <multicast-listener>
         <multicast-threshold-percent>40</multicast-threshold-percent>
      </multicast-listener>
   </cluster-config>
</coherence>

Disabling Multicast Communication

Multicast traffic may be undesirable or may be disallowed in some network environments. In this case, use the Well Known Addresses feature to prevent Coherence from using multicast. This disables multicast discovery; unicast (point-to-point) is used instead. Coherence is designed to use point-to-point communication as much as possible even when multicast is enabled, so most application profiles do not see a substantial performance impact. See Using Well Known Addresses.

Note:

Disabling multicast places a higher strain on the network. However, this becomes an issue only for large clusters with more than 100 members.

Specifying a Cluster Member's Unicast Address

Unicast communication is configured in an operational override file within the <unicast-listener> element node. System properties are also available to configure unicast communication when starting a cluster member.

Changing the Default Unicast Address

Cluster members attempt to obtain the IP to bind to using the java.net.InetAddress.getLocalHost() call. Coherence automatically selects a routable IP with the highest MTU for computers that have multiple IPs or NICs. If WKA is configured, then Coherence selects the IP which is routable to the IPs on the WKA list. If you do not want to use the selected NIC, then manual configuration is needed to override the default.

Note:

The multicast socket binds to the same interface as defined by the unicast address if multicast is used for both member discovery and data transmission. See Changing the Multicast Socket Interface.

Unicast addresses can be entered using Classless Inter-Domain Routing (CIDR) notation, which uses a subnet and mask pattern for a local IP address to bind to instead of specifying an exact IP address. CIDR simplifies configuration by allowing a single address configuration to be shared across computers on the same subnet. Each cluster member specifies the same CIDR address block and a local NIC on each computer is automatically found that matches the address pattern. For example, to specify a unicast address for multiple multi-NIC computers that are located on the same network and that will run a cluster on their 192.168.1.* address, specify an address such as 192.168.1.0/24 and each node finds a local NIC that matches the pattern. The /24 prefix size matches up to 256 available addresses: from 192.168.1.0 to 192.168.1.255. The <address> element also supports external NAT addresses that route to local addresses; however, both addresses must use the same port number.
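
The CIDR matching just described can be sketched as follows; CidrMatch is a hypothetical utility, not part of Coherence, that shows how an IPv4 address is tested against a block such as 192.168.1.0/24:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical utility (not Coherence code): tests whether an IPv4
// address falls inside a CIDR block, mirroring how a member matches
// a local NIC against the configured address pattern.
public class CidrMatch {

    public static boolean matches(String ip, String cidr) throws UnknownHostException {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        int addr = toInt(InetAddress.getByName(ip).getAddress());
        int net  = toInt(InetAddress.getByName(parts[0]).getAddress());
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);  // high 'prefix' bits set
        return (addr & mask) == (net & mask);
    }

    private static int toInt(byte[] b) {  // big-endian IPv4 bytes to int
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(matches("192.168.1.55", "192.168.1.0/24"));  // true
        System.out.println(matches("192.168.2.55", "192.168.1.0/24"));  // false
    }
}
```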

To specify a cluster member's unicast address, edit the operational override file and add an <address> element that includes the unicast address. For example:

<cluster-config>
   <unicast-listener>
      <address system-property="coherence.localhost">192.168.1.0/24
      </address>
   </unicast-listener>
</cluster-config>

The coherence.localhost system property is used to specify the unicast address instead of using the operational override file. For example:

-Dcoherence.localhost=192.168.1.0/24

Changing the Default Unicast Port

Cluster member unicast ports are automatically assigned from the operating system's available ephemeral port range. This ensures that Coherence cannot accidentally cause port conflicts with other applications. However, if a firewall is required between cluster members (an atypical configuration), then the port must be manually configured. See Configuring Firewalls for Cluster Members.

When manually configuring a unicast port, a single port is specified. If the port is not available, then the default behavior is to select the next available port. For example, if port 9000 is configured and is not available, then the next available port is automatically selected. Automatic port adjustment can be disabled; in this case, the specified port must be available. Automatic port adjustment can also be used to specify the upper limit of the port range.
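
The adjustment behavior can be sketched in plain Java (a hypothetical illustration of the walk-upward strategy, not Coherence internals):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch (not Coherence internals): try the configured
// port and walk upward until a free port is found or the upper
// limit of the range is passed.
public class PortAutoAdjust {

    public static int bindWithAdjust(int port, int upperLimit) throws IOException {
        for (int p = port; p <= upperLimit; p++) {
            try (ServerSocket socket = new ServerSocket(p)) {
                return socket.getLocalPort();  // port p was free
            } catch (IOException e) {
                // port in use; try the next one
            }
        }
        throw new IOException("No free port in range " + port + "-" + upperLimit);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Bound to port " + bindWithAdjust(9000, 9200));
    }
}
```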

To specify a cluster member's unicast ports, edit the operational override file and add a <port> element that includes a port value. For example:

<cluster-config>
   <unicast-listener>
      <port system-property="coherence.localport">9000</port>
   </unicast-listener>
</cluster-config>

To disable automatic port adjustment, add a <port-auto-adjust> element that includes the value false. Or, to specify a range of ports from which ports are selected, include a port value that represents the upper limit of the port range. The following example sets a port range from 9000 to 9200:

<cluster-config>
   <unicast-listener>
      <port system-property="coherence.localport">9000</port>
      <port-auto-adjust system-property="coherence.localport.adjust">9200
      </port-auto-adjust>
   </unicast-listener>
</cluster-config>

The coherence.localport and coherence.localport.adjust system properties are used to specify the unicast port and automatic port adjustment settings instead of using the operational override file. For example:

-Dcoherence.localport=9000 -Dcoherence.localport.adjust=9200

Using Well Known Addresses

The Well Known Addresses (WKA) feature is a mechanism that allows cluster members to discover and join a cluster using unicast instead of multicast. WKA is most often used when multicast networking is undesirable or unavailable in an environment, or when an environment is not properly configured to support multicast. All cluster multicast communication is disabled if WKA is enabled.

WKA is enabled by specifying a small subset of cluster member addresses that are able to start a cluster. The optimal number of WKA addresses varies based on the cluster size. Generally, WKA addresses should account for fewer than 10% of the cluster members. One or two WKA addresses for each switch is recommended.

WKA addresses are expected to remain available over the lifetime of the cluster but are not required to be simultaneously active at any point in time. Only one WKA address must be operational for cluster members to discover and join the cluster. In addition, after a cluster member has joined the cluster, it receives the addresses of all cluster members and then broadcasts are performed by individually sending messages to each cluster member. This allows a cluster to operate even if all WKA addresses are stopped. However, new cluster members are not able to join the cluster unless they themselves are hosted on a WKA address or until a cluster member that is on a WKA address is started. In this case, the senior-most member of the cluster polls the WKA address list and allows the WKA address to rejoin the existing cluster.

There are two ways to specify WKA addresses. The first method specifies a list of WKA addresses. The second method uses an address provider implementation to get a list of WKA addresses. Both methods are configured in an operational override file within the <well-known-addresses> subelement of the <unicast-listener> element.

Specifying WKA Addresses

WKA addresses (IP address or DNS name) are specified using the <address> element. Any number of WKA addresses can be specified and a unique id attribute must be included for each address. The list of WKA addresses should be the same for every cluster member to ensure that different cluster members do not operate independently from the rest of the cluster. If a cluster member specifies its own address, then it can start a cluster.

Note:

WKA uses the cluster port. See Specifying a Cluster's Multicast Address and Port.

The following example specifies two WKA addresses:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address id="1">Server1</address>
            <address id="2">Server2</address>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>

While either an IP address or DNS name can be used, DNS names have an additional advantage: any IP addresses that are associated with a DNS name are automatically resolved at runtime. This allows the list of WKA addresses to be stored in a DNS server and centrally managed and updated in real time. For example, if the WKA address list for a cluster that is named cluster1 is going to be 192.168.1.1, 192.168.1.2, 192.168.1.3, then a single DNS entry for hostname cluster1 can contain those addresses and a single address named cluster1 can be specified for the WKA address:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address>cluster1</address>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>
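
The runtime resolution described above relies on standard DNS lookups: every IP address registered for a name is returned, so a single WKA entry can expand to the full address list. The following standalone sketch uses localhost as a stand-in for a hypothetical cluster1 DNS entry:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrates runtime DNS resolution of a WKA name: all addresses
// registered for the name are returned by the resolver.
public class WkaDnsResolution {

    public static InetAddress[] resolve(String dnsName) throws UnknownHostException {
        return InetAddress.getAllByName(dnsName);  // every address for the name
    }

    public static void main(String[] args) throws UnknownHostException {
        // "localhost" stands in for a hypothetical cluster1 DNS entry.
        for (InetAddress address : resolve("localhost")) {
            System.out.println(address.getHostAddress());
        }
    }
}
```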

For networks that use Network Address Translation (NAT), you can build the WKA addresses list using external NAT addresses that route to local addresses. The Coherence network layer automatically discovers the WKA cluster members that map to the external NAT addresses and also discovers external NAT addresses for cluster members that are not on the WKA list. These learned addresses are then automatically used for other Coherence services. When using NAT addresses, the external addresses and local addresses must use the same port number.

Using WKA System Properties

A single WKA address can be specified using the coherence.wka system property instead of specifying the address in an operational override file. The system property is intended for demonstration and testing scenarios to quickly specify a single WKA address. For example:

-Dcoherence.wka=192.168.0.100

To create additional system properties that specify multiple WKA addresses, an operational override file must be used to define multiple WKA addresses, and a system-property attribute must be defined for each WKA address element. The attributes must include the system property names that override the elements. The following example defines two addresses, including system properties:

Note:

Defining additional system properties to specify a list of WKA addresses can be used during testing or in controlled production environments. However, the best practice is to exclusively use an operational override file to specify WKA addresses in production environments. This ensures that the same list of WKA addresses exists on each cluster member.

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address id="1" system-property="coherence.wka"></address>
            <address id="2" system-property="coherence.wka2"></address>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>

For the above example, the WKA addresses are specified using the system properties as follows:

-Dcoherence.wka=192.168.0.102 -Dcoherence.wka2=192.168.0.103

See Creating Custom System Properties.

Specifying a WKA Address Provider

A WKA address provider offers a programmatic way to define WKA addresses. A WKA address provider must implement the com.tangosol.net.AddressProvider interface. Implementations may be as simple as a static list or as complex as using dynamic discovery protocols. The address provider must return a terminating null address to indicate that all available addresses have been returned. The address provider implementation is called when the cluster member starts.

Note:

Implementations must exercise extreme caution: any delay in returning addresses, or any unhandled exception, causes a discovery delay and may cause a complete shutdown of the cluster service on the member. Implementations that involve more expensive operations (for example, a network fetch) may choose to perform them asynchronously by extending the com.tangosol.net.RefreshableAddressProvider class.
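
A minimal static-list provider can be sketched as follows. The AddressProvider interface below is a simplified local stand-in for com.tangosol.net.AddressProvider (the real interface also declares additional callback methods), included only so the example is self-contained; the key point is the terminating null:

```java
import java.net.InetSocketAddress;
import java.util.Iterator;
import java.util.List;

// Simplified local stand-in for com.tangosol.net.AddressProvider,
// shown only to illustrate the terminating-null contract.
interface AddressProvider {
    InetSocketAddress getNextAddress();
}

// Hypothetical static-list provider: returns each configured WKA
// address once, then null to signal that the list is exhausted.
public class StaticAddressProvider implements AddressProvider {

    private final Iterator<InetSocketAddress> iterator;

    public StaticAddressProvider(List<InetSocketAddress> addresses) {
        this.iterator = addresses.iterator();
    }

    @Override
    public InetSocketAddress getNextAddress() {
        return iterator.hasNext() ? iterator.next() : null;  // null terminates
    }

    public static void main(String[] args) {
        AddressProvider provider = new StaticAddressProvider(List.of(
                InetSocketAddress.createUnresolved("192.168.0.100", 7574),
                InetSocketAddress.createUnresolved("192.168.0.101", 7574)));
        InetSocketAddress next;
        while ((next = provider.getNextAddress()) != null) {
            System.out.println(next);
        }
    }
}
```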

To use a WKA address provider implementation, add an <address-provider> element and specify the fully qualified name of the implementation class within the <class-name> element. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address-provider>
               <class-name>package.MyAddressProvider</class-name>
            </address-provider>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>

As an alternative, the <address-provider> element supports the use of a <class-factory-name> element that is used to specify a factory class for creating AddressProvider instances, and a <method-name> element to specify the static factory method on the factory class that performs object instantiation. The following example gets an address provider instance using the getAddressProvider method on the MyAddressProviderFactory class.

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address-provider>
               <class-factory-name>package.MyAddressProviderFactory
               </class-factory-name>
               <method-name>getAddressProvider</method-name>
            </address-provider>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>

Any initialization parameters that are required for a class or class factory implementation can be specified using the <init-params> element. Initialization parameters are accessible by implementations that include a public constructor with a matching signature. The following example sets the iMaxTime parameter to 2000.

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <well-known-addresses>
            <address-provider>
               <class-name>package.MyAddressProvider</class-name>
               <init-params>
                  <init-param>
                     <param-name>iMaxTime</param-name>
                     <param-value>2000</param-value>
                  </init-param>
               </init-params>
            </address-provider>
         </well-known-addresses>
      </unicast-listener>
   </cluster-config>
</coherence>
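A sketch of a provider class whose constructor matches the parameter above. Only the constructor signature is dictated by the <init-param> configuration; the class body and accessor are hypothetical:

```java
// The single int constructor argument corresponds to the iMaxTime
// <init-param> above; Coherence would invoke new MyAddressProvider(2000).
class MyAddressProvider {
    private final int iMaxTime;

    public MyAddressProvider(int iMaxTime) {
        this.iMaxTime = iMaxTime;
    }

    // Hypothetical accessor, shown only to make the sketch observable.
    public int getMaxTime() {
        return iMaxTime;
    }
}
```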

Enabling Single-Server Mode

Single-server mode constrains a cluster to run on a single computer without accessing the network. Single-server mode offers a quick way to start and stop a cluster for development and unit testing.

To enable single-server mode, edit the operational override file and add a unicast <address> element that is set to an address that is routed to loopback. On most computers, setting the address to 127.0.0.1 works. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <address system-property="coherence.localhost">127.0.0.1
         </address>
      </unicast-listener>
   </cluster-config>
</coherence>

The coherence.localhost system property can also be used to enable single-server mode instead of using the operational override file. For example:

-Dcoherence.localhost=127.0.0.1
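Put together, a development cache server can be launched in single-server mode from the command line. This is a sketch that assumes coherence.jar is on the classpath; DefaultCacheServer is the standard standalone Coherence server class:

```shell
java -Dcoherence.localhost=127.0.0.1 \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer
```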

Configuring Death Detection

Death detection is a cluster mechanism that quickly detects when a cluster member has failed. Failed cluster members are removed from the cluster, and all other cluster members are notified about the departed member. Death detection allows the cluster to differentiate between actual member failure and an unresponsive member, such as when a JVM conducts a full garbage collection.

Death detection identifies both process failure (the TcpRing component) and hardware failure (the IpMonitor component). Process failure is detected using a ring of TCP connections opened on the same port that is used for cluster communication. Each cluster member issues a unicast heartbeat, and the most senior cluster member issues the cluster heartbeat, which is a broadcast message. Hardware failure is detected using the Java InetAddress.isReachable method, which issues either an ICMP ping or a pseudo ping over TCP port 7. Death detection is enabled by default and is configured within the <tcp-ring-listener> element.

This section includes the following topics:

Changing TCP-Ring Settings

Several settings are used to change the default behavior of the TCP-ring listener. These include changing the number of attempts and the amount of time before determining that a computer hosting cluster members has become unreachable. The default values are 3 attempts and 5 seconds, allowing for a network disconnect of up to 15 seconds. The TCP/IP server socket backlog queue can also be set; it defaults to the value used by the operating system.

To change the TCP-ring settings, edit the operational override file and add the following TCP-Ring elements:

Note:

The values of the <ip-timeout> and <ip-attempts> elements should be high enough to insulate against allowable temporary network outages.

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <tcp-ring-listener>
         <ip-timeout system-property="coherence.ipmonitor.pingtimeout">
            25s</ip-timeout>
         <ip-attempts>5</ip-attempts>
         <listen-backlog>10</listen-backlog>
      </tcp-ring-listener>
   </cluster-config>
</coherence>

The coherence.ipmonitor.pingtimeout system property can also be used to specify a timeout instead of using the operational override file. For example:

-Dcoherence.ipmonitor.pingtimeout=20s

Changing the Heartbeat Interval

The death detection heartbeat interval can be changed. A higher interval reduces network traffic slightly but also prolongs detection of failed members. The default heartbeat value is 1 second.

Note:

The heartbeat setting technically controls how often to evaluate whether a heartbeat must be emitted. The actual heartbeat may or may not be emitted within the specified interval, depending on the evaluation process.

To change the death detection heartbeat interval, edit the operational override file and add a <heartbeat-milliseconds> element that includes the heartbeat value. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <packet-publisher>
         <packet-delivery>
            <heartbeat-milliseconds>5000</heartbeat-milliseconds>
         </packet-delivery>
      </packet-publisher>
   </cluster-config>
</coherence>

Disabling Death Detection

Death detection is enabled by default and must be explicitly disabled. Disabling death detection saves only minimal network traffic and prolongs the detection of failed members. If death detection is disabled, a cluster member uses the packet publisher's resend timeout interval to determine that another member has stopped responding to packets. By default, the timeout interval is set to 5 minutes. See Changing the Packet Resend Timeout.

Note:

Using the packet publisher's resend timeout to detect a failed cluster member is error prone and can produce false positives due to long garbage collection pauses.

To disable death detection, edit the operational override file and add an <enabled> element, within the <tcp-ring-listener> element, that is set to false. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <tcp-ring-listener>
         <enabled>false</enabled>
      </tcp-ring-listener>
   </cluster-config>
</coherence>

Specifying Cluster Priorities

The cluster priority mechanism allows a priority value to be assigned to a cluster member and to different threads running within a member.

This section includes the following topics:

Specifying a Cluster Member's Priority

A cluster member's priority is used as the basis for breaking ties between members. If a condition occurs in which one of two members must be ejected from the cluster, and in the rare case that it is not possible to objectively determine which of the two is at fault, then the member with the lower priority is ejected.

To specify a cluster member's priority, edit the operational override file and add a <priority> element, within the <member-identity> node, that includes a priority value between 1 and 10 where 10 is the highest priority. For example:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <member-identity>
         <priority system-property="coherence.priority">1</priority>
      </member-identity>
   </cluster-config>
</coherence>

The coherence.priority system property can also be used to specify a cluster member's priority instead of using the operational override file. For example:

-Dcoherence.priority=1

Specifying Communication Thread Priorities

Multiple cluster components support thread priority. The priority is used as the basis for determining Java thread execution importance. The components include: the multicast listener, the unicast listener, the TCP ring listener, the packet speaker, the packet publisher, and the incoming message handler.

Thread priority is specified within each component's configuration element (<unicast-listener>, <multicast-listener>, <packet-speaker>, <packet-publisher>, <tcp-ring-listener>, and <incoming-message-handler> elements, respectively). For example, to specify a thread priority for the unicast listener, edit the operational override file and add a <priority> element, within the <unicast-listener> node, that includes a priority value between 1 and 10 where 10 is the highest priority:

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <unicast-listener>
         <priority>5</priority>
      </unicast-listener>
   </cluster-config>
</coherence>

Specifying Thread Priorities for Services

Cluster services support thread priority. The priority is used as the basis for determining Java thread execution importance and indicates which threads of a service are considered critical. There are three types of threads that can have a priority: service threads, event dispatcher threads, and worker threads. The default setup gives service and event dispatcher threads precedence followed by worker threads.

Thread priorities for services can be changed for all services in a cluster by overriding the <service> element in an operational override file. However, a better practice is to configure thread priorities for a service instance within a cache configuration file when defining a cache scheme. See Defining Cache Schemes. Use the <service-priority>, <event-dispatcher-priority>, and <worker-priority> subelements, respectively, and enter a value between 1 and 10, where 10 is the highest priority. For example:

...
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>MyDistributedService</service-name>
   <service-priority>10</service-priority>
   <event-dispatcher-priority>10</event-dispatcher-priority>
   <worker-priority>5</worker-priority>
   ...
</distributed-scheme>
...

Configuring Firewalls for Cluster Members

Firewalls are not typically set up between Coherence cluster members because it is assumed that all cluster members communicate behind the firewall in a secured environment. If a solution requires a firewall between cluster members, then multiple ports must be opened. Ensure the following:
  • The cluster port (7574 by default) is open for both UDP and TCP for both multicast and unicast configurations.

  • TCP port 7 is open for the Coherence TcpRing/IpMonitor death detection feature.

  • The unicast port range is open for both UDP and TCP traffic. Ensure that the unicast listen port range is explicitly set rather than relying upon a system-assigned ephemeral port. See Changing the Default Unicast Port.
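The port openings above can be sketched as firewalld commands on Linux. The unicast range 8088-8090 is an assumed example and must match the explicitly configured unicast listen ports:

```shell
# Cluster port for both multicast and unicast configurations
firewall-cmd --permanent --add-port=7574/udp
firewall-cmd --permanent --add-port=7574/tcp

# TCP port 7 for TcpRing/IpMonitor death detection
firewall-cmd --permanent --add-port=7/tcp

# Unicast listen port range (assumed example: 8088-8090)
firewall-cmd --permanent --add-port=8088-8090/udp
firewall-cmd --permanent --add-port=8088-8090/tcp

firewall-cmd --reload
```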