5 Introduction to Coherence Clusters

This chapter describes Coherence clusters and how cluster members use the Tangosol Cluster Management Protocol (TCMP) to communicate with each other. Clustered services are also described.

This chapter includes the following sections:

5.1 Cluster Overview

A Coherence cluster is a network of JVM processes that run Coherence. JVMs automatically join together to form a cluster and are called cluster members or cluster nodes. Cluster members communicate using the Tangosol Cluster Management Protocol (TCMP). Cluster members use TCMP for both multicast communication (broadcast) and unicast communication (point-to-point communication).

A cluster contains services that are shared by all cluster members. The services include connectivity services (such as the root Cluster service), cache services (such as the Distributed Cache service), and processing services (such as the Invocation service). Each cluster member can provide and consume such services. The first cluster member is referred to as the senior member and typically starts the core services that are required to create the cluster. If the senior member of the cluster is shutdown, another cluster member assumes the senior member role.

5.2 Understanding Clustered Services

Coherence functionality is based on the concept of services. Each cluster member can register, provide, and consume services. Multiple services can be running on a cluster member. A cluster member always contains a single root cluster service and can also contain any number of grid services. Grid services have a service name that uniquely identifies the service within the cluster and a service type that defines what the service can do. There may be multiple instances of each service type (other than the root cluster service).

The services are categorized below based on functionality. The categories are used for clarity and do not represent actual components or imply a relationship between services.

Connectivity Services

Cluster Service: This service is automatically started when a cluster node must join the cluster and is often referred to as the root cluster service; each cluster node always has exactly one service of this type running. This service is responsible for the detection of other cluster nodes, for detecting the failure of a cluster node, and for registering the availability of other services in the cluster.

Proxy Service: This service allows connections (using TCP) from clients that run outside the cluster. While many applications are configured so that all clients are also cluster members, there are many use cases where it is desirable to have clients running outside the cluster. Remote clients are especially useful in cases where there are hundreds or thousands of client processes, where the clients are not running on the Java platform, or where a greater degree of de-coupling is desired.

Processing Services

Invocation Service: This service provides clustered invocation and supports grid computing architectures. This services allows applications to invoke agents on any node in the cluster, or any group of nodes, or across the entire cluster. The agent invocations can be request/response, fire and forget, or an asynchronous user-definable model.

Data Services

Distributed Cache Service: This service allows cluster nodes to distribute (partition) data across the cluster so that each piece of data in the cache is managed (held) by only one cluster node. The Distributed Cache Service supports pessimistic locking. Additionally, to support failover without any data loss, the service can be configured so that each piece of data is backed up by one or more other cluster nodes. Lastly, some cluster nodes can be configured to hold no data at all; this is useful, for example, to limit the Java heap size of an application server process, by setting the application server processes to not hold any distributed data, and by running additional cache server JVMs to provide the distributed cache storage. For more information on distributed caches, see "Understanding Distributed Caches".
Federated Cache Service: This service is a version of the distributed cache service that replicates and synchronizes cached data across geographically dispersed clusters that are participants in a federation. Replication between clusters participants is controlled by the federation topology. Topologies include: active-active, active-passive, hub and spoke, and central replication. Custom topologies can also be created as required. For more information on federated caches, see Administering Oracle Coherence.

Replicated Cache Service: This is a synchronized replicated cache service that fully replicates all of its data to all cluster nodes that run the service. Replicated caches support pessimistic locking to ensure that all cluster members receive the update when data is modified. Replicated caches are often used to manage internal application metadata. For more information on replicated caches, see "Understanding Replicated Caches".

Optimistic Cache Service: This is an optimistic-concurrency version of the Replicated Cache Service that fully replicates all of its data to all cluster nodes and employs an optimization similar to optimistic database locking to maintain coherency. All servers end up with the same current value even if multiple updates occur at the same exact time from different servers. The Optimistic Cache Service does not support pessimistic locking; so, in general, it should only be used for caching most recently known values for read-only uses. This service is rarely used. For more information on optimistic caches, see "Understanding Optimistic Caches".

A clustered service can perform all tasks on the service thread, a caller's thread (if possible), or any number of daemon (worker) threads. Daemon threads are managed by a dynamic thread pool that provides the service with additional processing bandwidth. For example, the invocation service and the distributed cache service both support thread pooling to accelerate database load operations, parallel distributed queries, and agent invocations.

The above services are only the basic cluster services and not the full set of types of caches provided by Coherence. By combining clustered services with cache features, such as backing maps and overflow maps, Coherence provides an extremely flexible and configurable set of options for clustered applications.

Within a cache service, there exists any number of named caches. A named cache provides the standard JCache API, which is based on the Java collections API for key-value pairs, known as java.util.Map.

5.3 Understanding TCMP

TCMP is an IP-based protocol that is used to discover cluster members, manage the cluster, provision services, and transmit data.

Cluster Service Communication

TCMP for cluster service communication can be configured to use:

A combination of UDP/IP multicast and UDP/IP unicast. This is the default cluster protocol for cluster service communication.
UDP/IP unicast only (that is, no multicast). See "Disabling Multicast Communication". This configuration is used for network environments that do not support multicast or where multicast is not optimally configured.
TCP/IP only (no UDP/IP multicast or UDP/IP unicast). See "Using the TCP Socket Provider". This configuration is used for network environments that favor TCP.
SDP/IP only (no UDP/IP multicast or UDP/IP unicast). See "Using the SDP Socket Provider". This configuration is used for network environments that favor SDP.
SSL over TCP/IP or SDP/IP. See "Using the SSL Socket Provider". This configuration is used for network environments that require highly secure communication between cluster members.

Data Service Communication

TCMP for data service communication can be configured to use a reliable transport:

datagram – Specifies the use of the TCMP reliable UDP protocol.
tmb (default)– Specifies the TCP Message Bus (TMB) protocol. TMB provides support for TCP/IP.
tmbs – TCP/IP message bus protocol with SSL support. TMBS requires the use of an SSL socket provider. See"Using the SSL Socket Provider."
sdmb – Specifies the Sockets Direct Protocol Message Bus (SDMB).
sdmbs – SDP message bus with SSL support. SDMBS requires the use of an SSL socket provider. See"Using the SSL Socket Provider."
imb (default on Exalogic) – InfiniBand message bus (IMB). IMB is automatically used on Exalogic systems as long as TCMP has not been configured with SSL.

Use of Multicast

The cluster protocol makes very minimal and judicious use of multicast. Multicast is used as follows:

Cluster discovery: Multicast is used to discover if there is a cluster running that a new member can join. Only the two most senior cluster members join a multicast group. Additional cluster members do not need to join the multicast group to discover the cluster.
Cluster heartbeat: The most senior member in the cluster issues a periodic heartbeat through multicast; the rate can be configured and defaults to one per second. The cluster heart beat is used to detect member failure and helps avoid when a cluster splits into two independently operating clusters (referred to as split brain).

Use of Unicast

Cluster members use unicast for direct member-to-member (point-to-point) communication, which makes up the majority of communication on the cluster. Unicast is used as follows:

Data transfer: Unicast is used to transfer data between service members on a shared bus instance.
Message delivery: Unicast is used for communication such as asynchronous acknowledgments (ACKs), asynchronous negative acknowledgments (NACKs) and peer-to-peer heartbeats.
Under some circumstances, a message may be sent through unicast even if the message is directed to multiple members. This is done to shape traffic flow and to reduce CPU load in very large clusters.
All communication is sent using unicast if multicast communication is disabled.

Use of TCP

TCP is used as follows:

TCMP uses a shared TCP/IP Message Bus (TMB) for data transfers.
A TCP/IP ring is used as an additional death detection mechanism to differentiate between actual node failure and an unresponsive node (for example, when a JVM conducts a full GC).

Protocol Reliability

Bus-based transport protocols inherently support message reliability. For UDP transport, Coherence provides fully reliable and in-order delivery of all messages; Coherence uses a queued and fully asynchronous ACK- and NACK-based mechanism for reliable delivery of messages with unique integral identity for guaranteed ordering of messages.

Protocol Resource Utilization

The TCMP protocol (as configured by default) requires only three sockets (one multicast, two unicast) and six threads per JVM, regardless of the cluster size. This is a key element in the scalability of Coherence; regardless of the number of servers, each node in the cluster still communicates either point-to-point or with collections of cluster members without requiring additional network connections.

The optional TCP/IP ring uses a few additional TCP/IP sockets.

Note:

For TCMP/TMB, which is the default protocol for point-to-point data communication, each cluster member binds to a single port.

Protocol Tunability

The TCMP protocol is very tunable to take advantage of specific network topologies, or to add tolerance for low-bandwidth and high-latency segments in a geographically distributed cluster. Coherence comes with a pre-set configuration. Some TCMP attributes are dynamically self-configuring at run time, but can also be overridden and locked down for deployment purposes.