|Oracle® Coherence Administrator's Guide
Part Number E18679-01
This chapter provides a checklist of areas that should be planned for and considered before moving from a development environment to a production environment. Solutions and best practices are provided and should be implemented as required.
The following sections are included in this chapter:
Developers often use and test Coherence locally on their workstations by either:
Setting the multicast TTL to zero,
Using a "loopback", or
Using a different multicast address and port from all other developers.
If one of these approaches is not used, then multiple developers on the same network join existing clusters across different developers' locally running instances of the application. In fact, this happens relatively often and causes confusion when it is not understood by the developers.
Setting the TTL to zero on the command line is very simple: Add the following to the JVM startup parameters:
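For the Sun JVM, for example, this is done with a single system property on the launch command; the property name below is the standard Coherence setting, while the class path and main class are illustrative:

```shell
java -Dtangosol.coherence.ttl=0 -cp coherence.jar com.tangosol.net.DefaultCacheServer
```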
Setting the TTL to zero for all developers is also very simple. Edit the
tangosol-coherence-override-dev.xml in the
coherence.jar file, changing the TTL setting as follows:
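A minimal override, assuming the standard operational configuration structure, changes the <time-to-live> value under the <multicast-listener> element:

```xml
<coherence>
  <cluster-config>
    <multicast-listener>
      <!-- A TTL of zero keeps multicast traffic on the local machine -->
      <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>
```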
On some UNIX operating systems, including some versions of Linux and Mac OS X, setting the TTL to zero may not be enough to isolate a cluster to a single computer. To be safe, assign a different cluster name for each developer, for example using the developer's email address as the cluster name. If the cluster communication does go across the network to other developer computers, then the different cluster name causes an error on the node that is attempting to start.
To ensure that the clusters are completely isolated, select a different multicast IP address and port for each developer. In some organizations, a simple approach is to use the developer's phone extension number as part of the multicast address and as the port number (or some part of it). For information on configuring the multicast IP address and port, see the
After the POC or prototype stage is complete, and until load testing begins, it is not out of the ordinary for the application to be developed and tested by engineers in a non-clustered form. This is dangerous, as testing primarily in the non-clustered configuration can hide problems with the application architecture and implementation that appear later in staging or even production.
Make sure that the application is being tested in a clustered configuration as development proceeds. There are several ways for clustered testing to be a natural part of the development process; for example:
Developers can test with a locally clustered configuration (at least two instances running on their own computer). This works well with the
TTL=0 setting, since clustering on a single computer works with the
Unit and regression tests can be introduced that run in a test environment that is clustered. This may help automate certain types of clustered testing that an individual developer would not always remember (or have the time) to do.
Most production networks are based on gigabit Ethernet, with a few still built on slower 100Mb Ethernet or faster ten-gigabit Ethernet. It is important to understand the topology of the production network, and what devices are used to connect all of the servers that run Coherence. For example, if there are ten different switches being used to connect the servers, are they all the same type (make and model) of switch? Are they all the same speed? Do the servers support the network speeds that are available?
In general, all servers should share a reliable, fully switched network. This generally implies sharing a single switch (ideally, two parallel switches and two network cards per server for availability). There are two primary reasons for this. The first is that using multiple switches almost always results in a reduction in effective network capacity. The second is that multi-switch environments are more likely to have network "partitioning" events where a partial network failure results in two or more disconnected sets of servers. While partitioning events are rare, Coherence cache servers ideally should share a common switch.
To demonstrate the impact of multiple switches on bandwidth, consider several servers plugged into a single switch. As additional servers are added, each server receives dedicated bandwidth from the switch backplane. For example, on a fully switched gigabit backplane, each server receives a gigabit of inbound bandwidth and a gigabit of outbound bandwidth for a total of 2Gbps "full duplex" bandwidth. Four servers would have an aggregate of 8Gbps bandwidth. Eight servers would have an aggregate of 16Gbps. And so on up to the limit of the switch (in practice, usually in the range of 160-192Gbps for a gigabit switch). However, consider the case of two switches connected by a 4Gbps (8Gbps full duplex) link. In this case, as servers are added to each switch, they have full "mesh" bandwidth up to a limit of four servers on each switch (that is, all four servers on one switch can communicate at full speed with the four servers on the other switch). However, adding additional servers potentially creates a bottleneck on the inter-switch link. For example, if five servers on one switch send data to five servers on the other switch at 1Gbps per server, then the combined 5Gbps is restricted by the 4Gbps link. Note that the actual limit may be much higher depending on the traffic-per-server and also the portion of traffic that must actually move across the link. Also note that other factors such as network protocol overhead and uneven traffic patterns may make the usable limit much lower from an application perspective.
Avoid mixing and matching network speeds: Make sure that all servers can and do connect to the network at the same speed, and that all of the switches and routers between those servers run at that same speed or faster.
Oracle strongly suggests GigE or faster: Gigabit Ethernet is supported by most servers built since 2004, and Gigabit switches are economical, available and widely deployed.
Before deploying an application, run the datagram test utility to test the actual network speed and determine its capability for pushing large amounts of data. See Chapter 4, "Performing a Network Performance Test," for details. Furthermore, the datagram test utility must be run with an increasing ratio of publishers to consumers, since a network that appears fine with a single publisher and a single consumer may completely fall apart as the number of publishers increases, as occurs with the default configuration of Cisco 6500 series switches. See "Deploying to Cisco Switches" for more information.
The term "multicast" refers to the ability to send a packet of information from one server and to have that packet delivered in parallel by the network to many servers. Coherence supports both multicast and multicast-free clustering. Oracle suggests the use of multicast when possible because it is an efficient option for many servers to communicate. However, there are several common reasons why multicast cannot be used:
Some organizations disallow the use of multicast.
Multicast cannot operate over certain types of network equipment; for example, many WAN routers disallow or do not support multicast traffic.
Multicast is occasionally unavailable for technical reasons; for example, some switches do not support multicast traffic.
First, determine if the desired deployment configuration is to use multicast.
Before deploying an application that uses multicast, you must run the Multicast Test to verify that multicast is working and to determine the correct (the minimum) TTL value for the production environment. See Chapter 5, "Performing a Multicast Connectivity Test" for more information.
Applications that cannot use multicast for deployment must use the WKA configuration. See the
If either the datagram test and multicast test have failed or returned poor results, there may be configuration problems with the network devices in use. Even if the tests passed without incident and the results were perfect, it is still possible that there are lurking issues with the configuration of the network devices.
Review the suggestions in "Network Tuning".
The Coherence cluster protocol can detect and handle a wide variety of connectivity failures. The clustered services are able to identify the connectivity issue and force the offending cluster node to leave and re-join the cluster. In this way the cluster ensures a consistent shared state among its members.
See "Death Detection Recommendations" for more details. See also:
Most developers have relatively fast workstations. Combined with test cases that are typically non-clustered and tend to represent single-user access (that is, only the developer), the application may seem extraordinarily responsive.
Include as a requirement that realistic load tests be built that can be run with simulated concurrent user load.
Test routinely in a clustered configuration with simulated concurrent user load.
Coherence is compatible with all common workstation hardware. Most developers use PC or Apple hardware, including notebooks, desktops and workstations.
Developer systems should have a significant amount of RAM to run a modern IDE, debugger, application server, database and at least two cluster instances. Memory utilization varies widely, but to ensure productivity, the suggested minimum memory configuration for developer systems is 2GB. Desktop systems and workstations can often be configured with 4GB for minimal additional cost.
Developer systems should have two or more CPU cores to increase the quality of code related to multi-threading. Many bugs related to concurrent execution of multiple threads only appear on multi-CPU systems (systems that contain multiple processor sockets or CPU cores).
The short answer is that Oracle works to support the hardware that the customer has standardized on or otherwise selected for production deployment.
Oracle has customers running on virtually all major server hardware platforms. The majority of customers use "commodity x86" servers, with a significant number deploying Sun Sparc (including Niagara) and IBM Power servers.
Oracle continually tests Coherence on "commodity x86" servers, both Intel and AMD.
Intel, Apple and IBM provide hardware, tuning assistance and testing support to Oracle.
Oracle conducts internal Coherence certification on all IBM server platforms.
Oracle and Azul test Coherence regularly on Azul appliances, including the 48-core "Vega 2" chip.
If the server hardware purchase is still in the future, the following are suggested for Coherence:
The most cost-effective server hardware platform is "commodity x86", either Intel or AMD, with one to two processor sockets and two to four CPU cores per processor socket. If selecting an AMD Opteron system, it is strongly recommended that it be a two processor socket system, since memory capacity is usually halved in a single socket system. Intel "Woodcrest" and "Clovertown" Xeons are strongly recommended over the previous Intel Xeon CPUs due to significantly improved 64-bit support, much lower power consumption, much lower heat emission and far better performance. These new Xeons are currently the fastest commodity x86 CPUs, and can support a large memory capacity per server regardless of the processor socket count by using fully buffered memory called "FB-DIMMs".
It is strongly recommended that servers be configured with a minimum of 4GB of RAM. For applications that plan to store massive amounts of data in memory (tens or hundreds of gigabytes, or more), evaluate the cost-effectiveness of 16GB or even 32GB of RAM per server. Commodity x86 server RAM is readily available in a density of 2GB per DIMM, with higher densities available from only a few vendors and carrying a large price premium; so, a server with 8 memory slots only supports 16GB in a cost-effective manner. Also, note that a server with a very large amount of RAM likely must run more Coherence nodes (JVMs) per server to use that much memory, so having a larger number of CPU cores helps. Applications that are "data heavy" require a higher ratio of RAM to CPU, while applications that are "processing heavy" require a lower ratio. For example, it may be sufficient to have two dual-core Xeon CPUs in a 32GB server running 15 Coherence "Cache Server" nodes performing mostly identity-based operations (cache accesses and updates), but if an application makes frequent use of Coherence features such as indexing, parallel queries, entry processors and parallel aggregation, then it is more effective to have two quad-core Xeon CPUs in a 16GB server - a 4:1 increase in the CPU:RAM ratio.
A minimum of 1000Mbps for networking (for example, Gigabit Ethernet or better) is strongly recommended. NICs should be on a high bandwidth bus such as PCI-X or PCIe, and not on standard PCI. In the case of PCI-X, having the NIC on an isolated or otherwise lightly loaded 133MHz bus may significantly improve performance.
Coherence is primarily a scale-out technology. While Coherence can effectively scale-up on large servers by using multiple JVMs per server, the natural mode of operation is to span several small servers (for example, 2-socket or 4-socket commodity servers). Specifically, failover and failback are more efficient in larger configurations. And the impact of a server failure is lessened. As a rule of thumb, a cluster should contain at least four physical servers. In most WAN configurations, each data center has independent clusters (usually interconnected by Extend-TCP). This increases the total number of discrete servers (four servers per data center, multiplied by the number of data centers).
Coherence is quite often deployed on smaller clusters (one, two or three physical servers) but this practice has increased risk if a server failure occurs under heavy load. As discussed in the network section of this document, Coherence clusters are ideally confined to a single switch (for example, fewer than 96 physical servers). In some use cases, applications that are compute-bound or memory-bound (as opposed to network-bound) may run acceptably on larger clusters.
Also note that given the choice between a few large JVMs and a lot of small JVMs, the latter may be the better option. There are several production environments of Coherence that span hundreds of JVMs. Some care is required to properly prepare for clusters of this size, but smaller clusters of dozens of JVMs are readily achieved. Please note that disabling UDP multicast (by using WKA) or running on slower networks (for example, 100Mbps Ethernet) reduces network efficiency and makes scaling more difficult.
The following rules should be followed in determining how many servers are required for a reliable high availability configuration and how to configure the number of storage-enabled JVMs.
There must be more than two servers. A grid with only two servers stops being machine-safe as soon as the number of JVMs on one server differs from the number of JVMs on the other server; so, even when starting with two servers with an equal number of JVMs, losing one JVM forces the grid out of the machine-safe state. Four or more computers present the most stable topology, but deploying on just three servers would work if the other rules are adhered to.
For a server that has the largest number of JVMs in the cluster, that number of JVMs must not exceed the total number of JVMs on all the other servers in the cluster.
A server with the smallest number of JVMs should run at least half the number of JVMs as a server with the largest number of JVMs; this rule is particularly important for smaller clusters.
The margin of safety improves as the number of JVMs tends toward equality on all computers in the cluster; this is more of a "rule of thumb" than the preceding "hard" rules.
The top three operating systems for application development using Coherence are, in this order: Windows 2000/XP (~85%), Mac OS X (~10%) and Linux (~5%). The top four operating systems for production deployment are, in this order: Linux, Solaris, AIX and Windows. Thus, it is relatively unlikely that the development and deployment operating systems are the same.
Make sure that regular testing is occurring on the target operating system.
Oracle tests on and supports various Linux distributions (including customers that have custom Linux builds), Sun Solaris, IBM AIX, Windows Vista/2003/2000/XP, Apple Mac OS X, OS/400 and z/OS. Additionally, Oracle supports customers running HP-UX and various BSD UNIX distributions.
If the server operating system decision is still in the future, the following are suggested for Coherence:
For commodity x86 servers, Linux distributions based on the Linux 2.6 kernel are recommended. While it is expected that most 2.6-based Linux distributions provide a good environment for running Coherence, the following are recommended by Oracle: Oracle Unbreakable Linux supported Linux including Oracle Linux and Red Hat Enterprise Linux (version 4 or later) and Suse Linux Enterprise (version 10 or later). Oracle also routinely tests using distributions such as RedHat Fedora Core 5 and even Knoppix "Live CD".
Review and follow the instructions in Chapter 2, "Platform-Specific Deployment Considerations" for the operating system on which Coherence is deployed.
In a Coherence-based application, primary data management responsibilities (for example, Dedicated Cache Servers) are hosted by Java-based processes. Modern Java distributions do not work well with virtual memory. In particular, garbage collection (GC) operations may slow down by several orders of magnitude if memory is paged to disk. With modern commodity hardware and a modern JVM, a Java process with a reasonable heap size (512MB-2GB) typically performs a full garbage collection in a few seconds if all of the process memory is in RAM. However, this may grow to many minutes if the JVM is partially resident on disk. During garbage collection, the node appears unresponsive for an extended period, and the choice for the rest of the cluster is to either wait for the node (blocking a portion of application activity for a corresponding amount of time), or to mark the unresponsive node as "failed" and perform failover processing. Neither of these is a good option, and so it is important to avoid excessive pauses due to garbage collection. JVMs should be pinned into physical RAM, or at least configured so that the JVM is not paged to disk.
Note that periodic processes (such as daily backup programs) may cause memory usage spikes that could cause Coherence JVMs to be paged to disk.
During development, developers typically use the latest Sun JVM or a direct derivative such as the Mac OS X JVM.
The main issues related to using a different JVM in production are:
Command line differences, which may expose problems in shell scripts and batch files;
Logging and monitoring differences, which may mean that tools used to analyze logs and monitor live JVMs during development testing may not be available in production;
Significant differences in optimal GC configuration and approaches to GC tuning;
Differing behaviors in thread scheduling, garbage collection behavior and performance, and the performance of running code.
Make sure that regular testing is occurring on the JVM that is used in production.
JVM configuration options vary over versions and between vendors, but the following are generally suggested:
Using the -server option results in substantially better performance.
Using identical heap size values for both -Xms and -Xmx yields substantially better performance on Sun and JRockit JVMs, and "fail fast" memory allocation. See the specific Deployment Considerations for various JVMs below.
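Putting these suggestions together, a cache server launch line might look like the following; the 1GB heap matches the naive-tuning guidance below, and the class path and main class are illustrative:

```shell
java -server -Xms1g -Xmx1g -cp coherence.jar com.tangosol.net.DefaultCacheServer
```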
For naive tuning, a heap size of 1GB is a good compromise that balances per-JVM overhead and garbage collection performance.
Larger heap sizes are allowed and commonly used, but may require tuning to keep garbage collection pauses manageable.
JVMs that experience an
OutOfMemoryError can be left in an indeterminate state, which can have adverse effects on a cluster. We recommend configuring JVMs to exit upon encountering an
OutOfMemoryError instead of allowing the JVM to attempt recovery. See the specific Deployment Considerations below for instructions on configuring this on common JVMs.
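On the Sun HotSpot JVM, for example, the -XX:OnOutOfMemoryError flag runs a command when the error is thrown; killing the process lets the rest of the cluster fail over promptly. The flag is HotSpot-specific (other JVMs have their own mechanisms), and the heap settings shown are illustrative:

```shell
java -server -Xms1g -Xmx1g -XX:OnOutOfMemoryError="kill -9 %p" \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer
```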
In terms of Oracle Coherence versions:
Coherence is supported on Sun 1.6 update 23.
Often the choice of JVM is dictated by other software. For example:
IBM only supports IBM WebSphere running on IBM JVMs. Most of the time, this is the IBM "Sovereign" or "J9" JVM, but when WebSphere runs on Sun Solaris/Sparc, IBM builds a JVM using the Sun JVM source code instead of its own.
Oracle WebLogic typically includes a JVM which is intended to be used with it. On some platforms, this is the Oracle WebLogic JRockit JVM.
Apple Mac OS X, HP-UX, IBM AIX and other operating systems only have one JVM vendor (Apple, HP and IBM respectively).
Certain software libraries and frameworks have minimum Java version requirements because they take advantage of relatively new Java features.
On commodity x86 servers running Linux or Windows, the Sun JVM is recommended. Generally speaking, the recent update versions are recommended. For example:
Note: Oracle recommends testing and deploying using the latest supported Sun JVM based on your platform and Coherence version.
Basically, at some point before going to production, a JVM vendor and version should be selected and well tested, and absent any flaws appearing during testing and staging with that JVM, that should be the JVM that is used when going to production. For applications requiring continuous availability, a long-duration application load test (for example, at least two weeks) should be run with that JVM before signing off on it.
Review and follow the instructions in Chapter 2, "Platform-Specific Deployment Considerations" for the JVM on which Coherence is deployed.
No. Coherence is pure Java software and can run in clusters composed of any combination of JVM vendors and versions, and Oracle tests such configurations.
Note that it is possible for different JVMs to have slightly different serialization formats for Java objects, meaning that it is possible for an incompatibility to exist when objects are serialized by one JVM, passed over the wire, and a different JVM (vendor, version, or both) attempts to deserialize it. Fortunately, the Java serialization format has been very stable for several years, so this type of issue is extremely unlikely. However, it is highly recommended to test mixed configurations for consistent serialization before deploying in a production environment.
The minimum set of privileges required for Coherence to function are specified in the
security.policy file which is included as part of the Coherence installation. This file can be found in
coherence/lib/security/security.policy. If using the Java Security Manager these privileges must be granted in order for Coherence to function properly.
Some Java-based management and monitoring solutions use instrumentation (for example, bytecode-manipulation and
ClassLoader substitution). While there are no known open issues with the latest versions of the primary vendors, Oracle has observed issues before.
The Coherence download includes a fully functional Coherence product supporting all editions and modes. The default configuration is for Grid Edition in Development mode.
Coherence may be configured to operate in either development or production mode. These modes do not limit access to features, but instead alter some default configuration settings. For instance, development mode allows for faster cluster startup to ease the development process.
It is recommended to use the development mode for all pre-production activities, such as development and testing. This is an important safety feature, because Coherence automatically prevents these nodes from joining a production cluster. The production mode must be explicitly specified when using Coherence in a production environment.
Coherence may be configured to support a limited feature set, based on the customer license agreement. Only the edition and the number of licensed CPUs specified within the customer license agreement can be used in a production environment.
When operating outside of the production environment it is allowable to run any Coherence edition. However, it is recommended that only the edition specified within the customer license agreement be used. This protects the application from unknowingly making use of unlicensed features. See Oracle Fusion Middleware Licensing Information for a complete list of features and the editions on which they are supported.
All nodes within a cluster must use the same license edition and mode. Be sure to obtain enough licenses for all the cluster members in the production environment. The server hardware configuration (number or type of processor sockets, processor packages or CPU cores) may be verified using the
ProcessorInfo utility included with Coherence.
java -cp coherence.jar com.tangosol.license.ProcessorInfo
If the result of the
ProcessorInfo program differs from the licensed configuration, send the program's output and the actual configuration as a support issue.
There is a
<license-config> element configuration section in
tangosol-coherence.xml (located in
coherence.jar) for edition and mode related information.
<license-config>
  <edition-name system-property="tangosol.coherence.edition">GE</edition-name>
  <license-mode system-property="tangosol.coherence.mode">dev</license-mode>
</license-config>
In addition to preventing mixed mode clustering, the
license-mode also dictates the operational override file to use. When in
dev mode the
tangosol-coherence-override-dev.xml file is used; whereas, the
tangosol-coherence-override-prod.xml file is used when the
prod mode is specified. Because the mode controls which override file is used, the
<license-mode> configuration element is only usable in the base
tangosol-coherence.xml file and not within the override files.
These elements are defined by the corresponding
coherence-operational-config.xsd schema file that is located in
coherence.jar. It is possible to specify this edition on the command line using the command line override:
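For example, using the system property names shown in the <license-config> section above, an Enterprise Edition node in production mode might be started as follows (class path and main class are illustrative):

```shell
java -Dtangosol.coherence.edition=EE -Dtangosol.coherence.mode=prod \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer
```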
Valid values are listed in Table 7-1:
|Value||Coherence Edition||Compatible Editions|
GE, EE, SE
Note: Clusters running different editions may connect by using Coherence*Extend as a Data Client.
The RTC nodes can connect to clusters using either Coherence TCMP or Coherence Extend. If the intention is to connect over Extend it is advisable to disable TCMP on that node to ensure that it only connects by using Extend. TCMP may be disabled using the system property
tangosol.coherence.tcmp.enabled. See the
<enabled> subelement of the
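For example, a client that should connect only over Extend might be launched with TCMP disabled using the property named above; the application jar and main class here are hypothetical:

```shell
java -Dtangosol.coherence.tcmp.enabled=false \
     -cp coherence.jar:app.jar com.example.MyExtendClient
```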
Operational configuration relates to the configuration of Coherence at the cluster level and includes such things as:
Cluster and cluster member settings
The operational aspects are normally configured by using the
tangosol-coherence-override.xml file. See Oracle Coherence Developer's Guide for more information on specifying an operational override file.
The contents of this file typically differ between development and production. It is recommended that these variants be maintained independently due to the significant differences between these environments. The production operational configuration file should not be the responsibility of the application developers; instead, it should fall under the jurisdiction of the systems administrators, who are far more familiar with the workings of the production systems.
All cluster nodes should use the same operational configuration descriptor. A centralized configuration file may be maintained and accessed by specifying the file's location as a URL using the
tangosol.coherence.override system property. Any node specific values may be specified by using system properties. See Oracle Coherence Developer's Guide for more information on the properties.
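For example, each node could point at a centrally maintained override file while supplying node-specific values as additional system properties; the URL and role name below are hypothetical:

```shell
java -Dtangosol.coherence.override=http://config.example.com/tangosol-coherence-override.xml \
     -Dtangosol.coherence.role=CacheServer \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer
```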
The override file should contain only the subset of configuration elements which you want to customize. This not only makes your configuration more readable, but enables you to take advantage of updated defaults in future Coherence releases. All override elements should be copied exactly from the original tangosol-coherence.xml, including the id attribute of the element.
Member descriptors may be used to provide detailed identity information that is useful for defining the location and role of the cluster member. Specifying these items aids in the management of large clusters by making it easier to identify the role of a remote node if issues arise.
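A hypothetical member descriptor in the operational override might look like the following; the element and system property names follow the <member-identity> section of the operational configuration, while the values themselves are examples:

```xml
<coherence>
  <cluster-config>
    <member-identity>
      <site-name system-property="tangosol.coherence.site">datacenter-east</site-name>
      <rack-name system-property="tangosol.coherence.rack">rack-12</rack-name>
      <machine-name system-property="tangosol.coherence.machine">server-3</machine-name>
      <role-name system-property="tangosol.coherence.role">CacheServer</role-name>
    </member-identity>
  </cluster-config>
</coherence>
```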
Cache configuration relates to the configuration of Coherence at a per-cache level including such things as:
Cache topology (
<near-scheme>, and so on)
Cache capacities (see
<high-units> subelement of <local-scheme>)
Cache redundancy level (
<backup-count> subelement of
The cache configuration aspects are normally configured by using the
coherence-cache-config.xml file. See Oracle Coherence Developer's Guide for more information on specifying a cache configuration file.
coherence-cache-config.xml file included within
coherence.jar is intended only as an example and is not suitable for production use. It is suggested that you produce your own cache configuration file with definitions tailored to your application needs.
All cluster nodes should use the same cache configuration descriptor. A centralized configuration file may be maintained and accessed by specifying the file's location as a URL using the
tangosol.coherence.cacheconfig system property.
Choose the cache topology which is most appropriate for each cache's usage scenario.
It is important to size limit your caches based on the allocated JVM heap size. Even if you never expect to fully load the cache, having the limits in place helps protect your application from OutOfMemoryErrors if your expectations prove to be wrong.
For a 1GB heap, it is recommended that at most ¾ of the heap be allocated for cache storage. With the default one level of data redundancy, this implies a per-server cache limit of 375MB for primary data, and 375MB for backup data. The amount of memory allocated to cache storage should fit within the tenured heap space for the JVM. See Sun's GC tuning guide for details:
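To illustrate, the 375MB primary-data limit described above could be expressed as a size limit on the backing map; the BINARY unit calculator counts bytes, and the scheme name is illustrative:

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <backing-map-scheme>
    <local-scheme>
      <!-- 375MB expressed in bytes; eviction begins once the limit is reached -->
      <high-units>393216000</high-units>
      <unit-calculator>BINARY</unit-calculator>
    </local-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```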
It is important to note that when multiple cache schemes are defined for the same cache service name, the first to be loaded dictates the service level parameters. Specifically the
<thread-count> subelements of
<distributed-scheme> are shared by all caches of the same service. It is recommended that a single service be defined and inherited by the various cache schemes. If you want different values for these items on a cache-by-cache basis, then multiple services may be configured.
For partitioned caches, Coherence evenly distributes the storage responsibilities to all cache servers, regardless of their cache configuration or heap size. For this reason, it is recommended that all cache server processes be configured with the same heap size. For computers with additional resources, multiple cache servers may be used to make effective use of the computer's resources.
To ensure even storage responsibility across a partitioned cache the
<partition-count> subelement of
<distributed-scheme>, should be set to a prime number which is at least the square of the number of expected cache servers.
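As a worked example of this rule, a cluster expected to run 50 cache servers would need a partition count of at least 50² = 2500; 2503 is the next prime. The value below illustrates the rule rather than being a universal recommendation:

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <!-- 2503 is the smallest prime >= 50^2, for an expected 50 cache servers -->
  <partition-count>2503</partition-count>
  <autostart>true</autostart>
</distributed-scheme>
```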
For caches which are backed by a cache store it is recommended that the parent service be configured with a thread pool as requests to the cache store may block on I/O. The pool is enabled by using the
<thread-count> subelement of
<distributed-scheme> element. For non-CacheStore-based caches, more threads are unlikely to improve performance, and the thread pool should be left disabled.
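A hypothetical cache-store-backed service with a worker thread pool might be configured as follows; the pool size is illustrative and should be tuned for the cache store's I/O latency, and the CacheStore class name is a placeholder:

```xml
<distributed-scheme>
  <scheme-name>example-store-backed</scheme-name>
  <!-- Worker threads prevent cache store I/O from blocking the service thread -->
  <thread-count>10</thread-count>
  <backing-map-scheme>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <local-scheme/>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <!-- hypothetical CacheStore implementation -->
          <class-name>com.example.MyCacheStore</class-name>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```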
Unless explicitly specified, all cluster nodes are storage enabled; that is, they act as cache servers. It is important to control which nodes in your production environment are storage enabled and which are storage disabled. The
tangosol.coherence.distributed.localstorage system property may be used to control storage, setting it to either
true or
false. Generally, only dedicated cache servers should be storage enabled. All other cluster nodes should be configured as storage disabled. This is especially important for short-lived processes that may join the cluster, perform some work, and then exit the cluster; having these nodes storage enabled introduces unneeded re-partitioning. See the
<local-storage> subelement of
<distributed-scheme> for more information about the system property.
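The element can expose the system property directly, so that storage can be toggled per process at startup without editing the cache configuration. The following is a sketch; the default of true assumes the configuration is shared by storage-enabled cache servers:

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <!-- Storage enabled by default; short-lived or client processes can
       override with -Dtangosol.coherence.distributed.localstorage=false -->
  <local-storage system-property="tangosol.coherence.distributed.localstorage">true</local-storage>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```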
The general recommendation for the
<partition-count> subelement of
<distributed-scheme> is a prime number close to the square of the number of storage-enabled nodes. While this is a good suggestion for small to medium sized clusters, for large clusters it can add too much overhead. For clusters exceeding 128 storage-enabled JVMs, the partition count should be fixed at roughly 16,381.
Coherence clusters that consist of more than 400 TCMP nodes must increase the default maximum packet size that Coherence uses. The default of 1468 bytes should be increased relative to the size of the cluster; that is, a 600-node cluster would need the maximum packet size increased by 50%. A simple formula is to allow four bytes per node, that is,
maximum_packet_size >= maximum_cluster_size * 4B. The maximum packet size is configured as part of the Coherence operational configuration file; see the
<packet-size> element for details on changing this setting.
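Applying the formula above to a hypothetical 600-node cluster gives 600 × 4B = 2400 bytes. A sketch of the corresponding operational override (the value is illustrative):

```xml
<coherence>
  <cluster-config>
    <packet-publisher>
      <packet-size>
        <!-- 600 nodes x 4 bytes per node = 2400 bytes -->
        <maximum-length>2400</maximum-length>
      </packet-size>
    </packet-publisher>
  </cluster-config>
</coherence>
```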
For large clusters that have hundreds of JVMs, it is also recommended that
<multicast-listener> be enabled, as it allows for more efficient cluster-wide transmissions. These cluster-wide transmissions are rare, but when they do occur, multicast can provide noticeable benefits in large clusters.
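As an illustrative sketch, multicast is configured in the operational override by giving the cluster a multicast group address and port; the address, port, and TTL below are assumptions, not recommended values:

```xml
<coherence>
  <cluster-config>
    <multicast-listener>
      <!-- illustrative group address, port, and TTL -->
      <address>224.3.6.0</address>
      <port>9001</port>
      <time-to-live>4</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>
```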
The Coherence death detection algorithms are based on sustained loss of connectivity between two or more cluster nodes. When a node identifies that it has lost connectivity with any other node, it consults with other cluster nodes to determine what action should be taken.
In attempting to consult with others, the node may find that it cannot communicate with any other nodes and assume that it has been disconnected from the cluster. Such a condition could be triggered by physically unplugging a node's network adapter. In such an event, the isolated node restarts its clustered services and attempts to rejoin the cluster.
If connectivity with other cluster nodes remains unavailable, the node may (depending on well-known-address configuration) form a new isolated cluster or continue searching for the larger cluster. In either case, the previously isolated cluster nodes rejoin the running cluster when connectivity is restored. As part of rejoining the cluster, a node's former cluster state is discarded, including any cache data it may have held, as the remainder of the cluster has taken on ownership of that data (restoring it from backups).
It is obviously not possible for a node to identify the state of other nodes without connectivity. To a single node, a local network adapter failure and a network-wide switch failure look identical and are handled in the same way, as described above. The important difference is that for a switch failure all nodes are attempting to rejoin the cluster, which is the equivalent of a full cluster restart, and all prior state and data are dropped.
Dropping all data is not desirable and, to avoid this as part of a sustained switch failure, you must take additional precautions. Options include:
Extend allowable outage duration: The maximum time that nodes may be unresponsive before being removed from the cluster is configured by using the
<timeout-milliseconds> subelement of
<packet-delivery>, and defaults to one minute for production configurations. Increasing this value allows the cluster to wait longer for connectivity to return. The downside of increasing this value is that it may also take longer to handle the case where just a single node has lost connectivity.
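For example, to tolerate a five-minute outage (the value is illustrative, not a recommendation), the operational override might contain:

```xml
<coherence>
  <cluster-config>
    <packet-publisher>
      <packet-delivery>
        <!-- five minutes instead of the one-minute production default -->
        <timeout-milliseconds>300000</timeout-milliseconds>
      </packet-delivery>
    </packet-publisher>
  </cluster-config>
</coherence>
```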
Persist data to external storage: By using a Read-Write Backing Map, the cluster persists data to external storage and can retrieve it after a cluster restart. As long as write-behind is disabled (the
<write-delay> subelement of
<read-write-backing-map-scheme>), no data would be lost if a switch fails. The downside here is that synchronously writing through to external storage increases the latency of cache update operations, and the external storage may become a bottleneck.
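A sketch of such a write-through configuration, with write-behind disabled by a zero write delay (the CacheStore class name is hypothetical):

```xml
<read-write-backing-map-scheme>
  <internal-cache-scheme>
    <local-scheme/>
  </internal-cache-scheme>
  <cachestore-scheme>
    <class-scheme>
      <!-- hypothetical store that persists entries to external storage -->
      <class-name>com.example.DatabaseCacheStore</class-name>
    </class-scheme>
  </cachestore-scheme>
  <!-- zero disables write-behind: updates are written synchronously -->
  <write-delay>0s</write-delay>
</read-write-backing-map-scheme>
```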
Delay node restart: The cluster death detection action can be configured to delay the node restart until connectivity is restored. By delaying the restart until connectivity is restored, an isolated node is allowed to continue running with whatever data it had available at the time of disconnect. When connectivity is restored, the nodes detect each other and form a new cluster. In forming a new cluster, all but the most senior node are required to restart. This results in behavior that is nearly identical to the default behavior, because the majority of the nodes restart and drop their data. It may be beneficial for cases in which replicated caches are in use, as the most senior node's copy of the data survives the restart. To enable the delayed restart, the
tangosol.coherence.departure.threshold system property must be set to a value that is greater than the size of the cluster.
Note: To ensure that Windows does not disable a network adapter when it is disconnected, add the following Windows registry
DWORD, setting it to
1: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DisableDHCPMediaSense. This setting also affects static IPs despite the name.
Add network-level fault tolerance: Adding a redundant layer to the cluster's network infrastructure allows individual pieces of networking equipment to fail without disrupting connectivity. This is commonly achieved by using at least two network adapters per computer and having each adapter connected to a separate switch. This is not a feature of Coherence, but rather of the underlying operating system or network driver. The only change to Coherence is that it should be configured to bind to the virtual rather than the physical network adapter. This form of network redundancy goes by different names depending on the operating system; see Linux bonding, Solaris trunking, and Windows teaming for further details.