
 

Cluster Features and Infrastructure

 

This topic includes the following sections:

  - Overview

  - Server Communication in a Cluster

  - Cluster-Wide JNDI Naming Service

  - Load Balancing of Clustered Services

  - Failover Support for Clustered Services

Overview

The following sections describe the infrastructure that a WebLogic Server cluster uses to support clustered objects and HTTP session states. The sections also describe the common features, load balancing and failover, that are available to APIs and services running in a WebLogic Server cluster. Understanding these topics is important for planning and configuring a WebLogic Server cluster that meets the needs of your web application.

Server Communication in a Cluster

WebLogic Server instances in a cluster communicate with one another using two basic network technologies:

  - IP multicast, which server instances use to broadcast availability of services and heartbeats that indicate continued availability.

  - IP sockets, which are the conduits for peer-to-peer communication between clustered server instances.

The way in which WebLogic Server uses IP multicast and socket communication has direct implications for the way you plan and configure your cluster.

One-to-Many Communication Using IP Multicast

IP multicast is a simple broadcast technology that enables multiple applications to "subscribe" to a given IP address and port number and listen for messages. A multicast address is an IP address in the range from 224.0.0.0 to 239.255.255.255.
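To make the subscription model concrete, the following minimal Java sketch joins a multicast group and waits for a message. The group address and port shown are illustrative placeholders, not WebLogic defaults.

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal sketch: "subscribe" to a multicast group and listen for messages.
// The group address (237.0.0.1) and port (7001) are illustrative only.
public class MulticastListener {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("237.0.0.1");
        MulticastSocket socket = new MulticastSocket(7001);
        socket.joinGroup(group);                      // subscribe to the group
        byte[] buf = new byte[1024];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);                       // blocks until a message arrives
        System.out.println("Received " + packet.getLength() + " bytes");
        socket.leaveGroup(group);
        socket.close();
    }
}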

IP multicast provides a simple method to broadcast messages to applications, but it does not guarantee that messages are actually received. If an application's local multicast buffer is full, new multicast messages cannot be written to the buffer and the application is not notified when messages are "dropped." Because of this limitation, WebLogic Server instances account for the possibility that they may occasionally miss messages that were broadcast over IP multicast.

WebLogic Server uses IP multicast for all one-to-many communication among server instances in a cluster. This includes:

  - Cluster-wide JNDI updates. Each server instance announces the availability of clustered objects that are deployed or removed locally, and monitors these announcements to maintain its local copy of the cluster-wide JNDI tree.

  - Cluster heartbeats. Each server instance broadcasts regular "heartbeat" messages that advertise its availability; missed heartbeats are used to detect server failure.

Implications for Cluster Planning and Configuration

Because multicast controls critical functions related to detecting failures and maintaining the cluster-wide JNDI tree, it is important that neither the cluster configuration nor the basic network topology interfere with multicast communication. Always consider the rules described in the following sections when configuring or planning a WebLogic Server cluster.

Multicast Requirements for WAN Clustering

For most deployments, limiting clustered servers to a single subnet ensures that multicast messages are reliably transmitted. In special cases, however, you may want to distribute a WebLogic Server cluster across subnets in a Wide Area Network (WAN). This may be desirable to increase redundancy in a clustered deployment, or to distribute clustered instances over a larger geographical area.

If you choose to distribute a cluster over a WAN (or across multiple subnets), you must plan and configure your network topology to ensure that multicast messages are reliably transmitted to all servers in the cluster. Specifically, your network must meet the following requirements:

  - Full support of IP multicast packet propagation. All routers and other tunneling technologies between subnets must be configured to propagate multicast messages to clustered server instances.

  - Network latency low enough to ensure that most multicast messages reach their final destination without long delays.

  - A multicast Time-To-Live (TTL) value for the cluster high enough to ensure that routers do not discard multicast packets before they reach their final destination, as described next.

To configure the multicast TTL for a cluster, change the Multicast TTL value in the WebLogic Server administration console. This value sets the number of network hops a multicast packet can make before it is discarded. The config.xml excerpt below shows a cluster with a Multicast TTL value of three, which ensures that the cluster's multicast messages can pass through three routers before being discarded:

<Cluster
  Name="testcluster"
  ClusterAddress="wanclust"
  MulticastAddress="wanclust-multi"
  MulticastTTL="3"
/>

Firewalls Can Break Multicast Communication

Although it may be possible to tunnel multicast traffic through a firewall, this practice is not recommended for WebLogic Server clusters. Each WebLogic Server cluster should be treated as a logical unit that provides one or more distinct services to clients of a web application. Such a logical unit should not be split between different security zones. Furthermore, any technologies that can potentially delay or interrupt IP traffic can prove disruptive to a WebLogic Server cluster by generating false failures due to missed heartbeats.

Use an Exclusive Multicast Address for WebLogic Server Clusters

Although multiple WebLogic Server clusters can share a single IP multicast address and port number, other applications should not broadcast or subscribe to the same address. "Sharing" a multicast address with other applications forces clustered servers to process unnecessary messages, introducing overhead to the system.

Sharing a multicast address may also overload the IP multicast buffer and delay transmission of WebLogic Server heartbeat messages. Such delays can potentially result in a WebLogic Server instance being marked as failed, simply because its heartbeat messages were not received in a timely manner.

For these reasons, assign a dedicated multicast address for use by WebLogic Server clusters, and ensure that the address can support the broadcast traffic of all clusters that use the address.

If Multicast Storms Occur

If server instances in a cluster do not process incoming messages on a timely basis, increased network traffic, including NAK messages and heartbeat re-transmissions, can result. The repeated transmission of multicast packets on a network is referred to as a multicast storm, and can stress the network and attached stations, potentially causing end-stations to hang or fail. Increasing the size of the multicast buffers can improve the rate at which announcements are transmitted and received, and prevent multicast storms.

If multicast storms occur because server instances in a cluster are not processing incoming messages on a timely basis, increase the size of the multicast buffers as described below.

TCP/IP kernel parameters can be configured with the UNIX ndd utility. The udp_max_buf parameter controls the size of send and receive buffers (in bytes) for a UDP socket. The appropriate value for udp_max_buf varies from deployment to deployment. If you are experiencing multicast storms, increase the value of udp_max_buf by 32K, and evaluate the effect of this change.
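For example, on Solaris you might inspect and raise the parameter with ndd as shown below. The starting value of 262144 bytes is hypothetical; check your system's actual value before changing it.

ndd /dev/udp udp_max_buf              # display the current value (for example, 262144)
ndd -set /dev/udp udp_max_buf 294912  # raise it by 32K, then evaluate the effect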

Do not change udp_max_buf unless necessary. Before changing udp_max_buf, read the Sun warning in the "UDP Parameters with Additional Cautions" section in the "TCP/IP Tunable Parameters" chapter in Solaris Tunable Parameters Reference Manual at http://docs.sun.com/?p=/doc/806-6779/6jfmsfr7o&.

Peer-to-Peer Communication Using IP Sockets

While one-to-many communication among clustered servers takes place using multicast, peer-to-peer communication between WebLogic Server instances uses IP sockets. IP sockets provide a simple, high-performance mechanism for transferring messages and data between two applications. WebLogic Server instances in a cluster may use IP sockets for:

  - Accessing non-clustered objects that reside on a remote server instance in the cluster.

  - Replicating HTTP session states and stateful session EJB states between a primary and secondary server.

  - Accessing clustered objects that reside on a remote server instance. (This generally occurs only in a multi-tier cluster architecture, as described in Recommended Multi-tier Architecture.)

Proper socket configuration is crucial to the performance of a WebLogic Server cluster. Two factors determine the efficiency of socket communications in WebLogic Server:

  - Whether the server's host system uses a native or a pure-Java socket reader implementation.

  - For systems that use pure-Java socket readers, whether the server instance is configured with enough socket reader threads.

Pure-Java Versus Native Socket Reader Implementations

Although the pure-Java implementation of socket reader threads provides a reliable and portable method of peer-to-peer communication, it does not provide the best performance for heavy-duty socket usage in a WebLogic Server cluster. With pure-Java socket readers, threads must actively poll all opened sockets to determine if they contain data to read. In other words, socket reader threads are always "busy" polling sockets, even if the sockets have no data to read.

This problem is magnified when a server has more open sockets than it has socket reader threads. In this case, each reader thread must poll more than one open socket, waiting for a timeout condition to determine that the socket is inactive. After a timeout, the thread moves to another waiting socket, as shown below.

When the number of opened sockets outnumbers the available socket reader threads, active sockets may go unserviced until an available reader thread polls them.
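The following conceptual Java sketch (not WebLogic's actual implementation) shows why pure-Java polling is wasteful: a single reader thread cycles through its assigned sockets, paying a read timeout on every idle socket before it can service the next one.

import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Conceptual sketch only, not WebLogic's code: one pure-Java reader
// thread services several sockets by polling each in turn.
public class PollingReader implements Runnable {
    private final Socket[] assigned;   // sockets this thread must service

    public PollingReader(Socket[] assigned) { this.assigned = assigned; }

    public void run() {
        byte[] buf = new byte[8192];
        while (true) {
            for (int i = 0; i < assigned.length; i++) {
                try {
                    assigned[i].setSoTimeout(50);         // per-socket poll timeout
                    InputStream in = assigned[i].getInputStream();
                    int n = in.read(buf);                 // waits up to 50 ms even if idle
                    if (n > 0) {
                        // dispatch the message (omitted)
                    }
                } catch (SocketTimeoutException idle) {
                    // no data; the timeout was spent waiting on an idle socket
                } catch (Exception e) {
                    return;                               // socket closed or failed
                }
            }
        }
    }
}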

For best socket performance, always configure the WebLogic Server host machine to use the native socket reader implementation for your operating system, rather than the pure-Java implementation. Native socket readers use far more efficient techniques to determine if there is data to read on a socket. With a native socket reader implementation, reader threads do not need to poll inactive sockets; they service only active sockets, and they are immediately notified (via an interrupt) when a given socket becomes active.

Configuring Native Sockets

To use native socket reader threads with WebLogic Server:

  1. Open the Administration Console.

  2. Select the Servers node.

  3. Select the server to configure.

  4. Select the Tuning tab.

  5. Check the Enable Native IO box.

  6. Apply the changes.

    Note: Applets cannot make use of native socket reader implementations, and therefore have limited efficiency in socket communication.

Configuring Reader Threads for Java Socket Implementation

If you do use the pure-Java socket reader implementation, you can still improve the performance of socket communication by configuring the proper number of socket reader threads. For best performance, the number of socket reader threads in WebLogic Server should equal the potential maximum number of opened sockets. This avoids "sharing" a reader thread with more than one socket, and ensures that socket data is read immediately.

Determining Potential Socket Usage

Each WebLogic Server instance can potentially open a socket for every other server instance in the cluster. However, the actual maximum number of sockets used at a given time is determined by the configuration of your cluster. In practice, clustered systems generally do not open a socket for every other server instance, due to the way in which clustered services are deployed.

For example, if your cluster uses in-memory HTTP session state replication, and you deploy only clustered objects to all WebLogic Server instances, each server potentially opens a maximum of only two sockets, as shown below.

The two sockets in the above example are used to replicate HTTP session states between primary and secondary servers. Sockets are not required for accessing clustered objects, due to the collocation optimizations that WebLogic Server uses to access those objects. In this configuration, the default socket reader thread configuration is sufficient.

If you pin non-clustered RMI objects to particular servers, the potential maximum number of sockets increases, because server instances may need to open additional sockets to access the pinned object. (This potential is only realized if a remote server actually looks up the pinned object.) The figure below shows the potential effect of deploying a non-clustered RMI object to Server A.

In this example, each server can potentially open a maximum of three sockets at a given time, to accommodate HTTP session state replication and to access the pinned RMI object on Server A.

Note: Additional sockets may also be required for servlet clusters in a multi-tier cluster architecture, as described in Recommended Multi-tier Architecture.

Setting the Number of Reader Threads

By default, WebLogic Server creates three socket reader threads upon booting. If you determine that your cluster system may utilize more than three sockets during peak periods, increase the number of socket reader threads using these instructions:

  1. Open the Administration Console.

  2. Select the Servers node.

  3. Select the server to configure.

  4. Select the Tuning tab.

  5. Edit the percentage of Java reader threads in the Socket Readers attribute field. The number of Java socket readers is computed as a percentage of the number of total execute threads (as shown in the Execute Threads attribute field). For example, with 15 execute threads (a hypothetical value) and a Socket Readers percentage of 33, roughly five execute threads act as socket readers.

  6. Apply the changes.

Client Communication via Sockets

Java client applications in WebLogic Server version 6.0 can potentially open more IP sockets than clients of previous WebLogic Server versions, even when clients connect through a firewall. Whereas in versions 4.5 and 5.1 Java clients connecting to a cluster through a firewall utilized a single socket, WebLogic Server version 6.0 imposes no such restriction. If clients make requests of multiple server instances in a cluster (either explicitly or by accessing "pinned" objects), the client opens individual sockets to each server instance.

Because Java clients can potentially open multiple sockets to a WebLogic Server cluster, it is important to configure enough socket reader threads, as described in Configuring Reader Threads for Java Socket Implementation.

Note: Browser-based clients and Applets connecting to a WebLogic Server version 6.0 cluster use only a single IP socket.

Cluster-Wide JNDI Naming Service

Clients of an individual WebLogic Server access objects and services by using a JNDI-compliant naming service. The JNDI naming service contains a list of the public services that the server offers, organized in a "tree" structure. A WebLogic Server offers a new service by binding into the JNDI tree a name that represents the service. Clients obtain the service by connecting to the server and looking up the bound name of the service.
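For example, a client might perform a lookup as in the following minimal sketch. The initial context factory and t3 URL form are standard for WebLogic Server; the host, port, and service name myServiceName are hypothetical placeholders.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;

// Minimal sketch of a client-side JNDI lookup against a WebLogic Server.
// The host, port, and bound service name below are placeholders.
public class LookupClient {
    public static void main(String[] args) throws Exception {
        Hashtable env = new Hashtable();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "weblogic.jndi.WLInitialContextFactory");
        env.put(Context.PROVIDER_URL, "t3://myserver:7001");
        Context ctx = new InitialContext(env);
        Object service = ctx.lookup("myServiceName");  // obtain the bound service
        System.out.println("Looked up: " + service);
    }
}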

Server instances in a cluster utilize a cluster-wide JNDI tree. A cluster-wide JNDI tree is similar to a single server JNDI tree, insofar as the tree contains a list of available services. In addition to storing the names of local services, however, the cluster-wide JNDI tree stores the services offered by clustered objects (EJBs and RMI classes) from other servers in the cluster.

Each WebLogic Server instance in a cluster creates and maintains a local copy of the logical cluster-wide JNDI tree. By understanding how the cluster-wide naming tree is maintained, you can better diagnose naming conflicts that may occur in a clustered environment.

Creating the Cluster-Wide JNDI Tree

Each WebLogic Server in a cluster builds and maintains its own local copy of the cluster-wide JNDI tree, which lists the services offered by all members of the cluster. Creating a cluster-wide JNDI tree begins with the local JNDI tree bindings of each server instance. As a server boots (or as new services are dynamically deployed to a running server), the server first binds the implementations of those services to the local JNDI tree. The implementation is bound into the JNDI tree only if no other service of the same name exists.

Once the server successfully binds a service into the local JNDI tree, additional steps are taken for clustered objects that use replica-aware stubs. After binding a clustered object's implementation into the local JNDI tree, the server sends the object's stub to other members of the cluster. Other members of the cluster monitor the multicast address to detect when remote servers offer new services.

The example above shows a snapshot of the JNDI binding process. Server A has successfully bound an implementation of clustered Object X into its local JNDI tree. Because Object X is clustered, it offers this service to all other members of the cluster. Server C is still in the process of binding an implementation of Object X.

Other servers in the cluster listening to the multicast address observe that Server A offers a new service for clustered Object X. These servers update their local JNDI trees to include the new service.

Updating the local JNDI bindings occurs in one of two ways:

  - If the clustered service is not yet bound in the local JNDI tree, the server binds a new service into the tree using the object's replica-aware stub.

  - If the server already has a binding for the service, it updates its local JNDI binding to indicate that the new server also hosts the object.

In this manner, each server in the cluster creates its own copy of a cluster-wide JNDI tree. The same process would be used when Server C announces that Object X has been bound into its local JNDI tree. After all broadcast messages are received, each server in the cluster would have identical bindings indicating the availability of the object on Servers A and C, as shown below.

Note: In an actual cluster system, Object X would be deployed homogeneously, and an implementation would be available on all four servers.

Handling JNDI Naming Conflicts

Simple JNDI naming conflicts occur when a server attempts to bind a non-clustered service that uses the same name as a non-clustered service already bound in the JNDI tree. Cluster-level JNDI conflicts occur when a server attempts to bind a clustered object that uses the name of a non-clustered object already bound in the JNDI tree.

WebLogic Server detects simple naming conflicts (of non-clustered services) when those services are bound to the local JNDI tree. Cluster-level JNDI conflicts may occur when new services are advertised over multicast. For example, if you deploy a pinned RMI object on one server in the cluster, you cannot deploy a replica-aware version of the same object on another server instance.

If two servers in a cluster attempt to bind different clustered objects using the same name, both will succeed in binding the object locally. However, each server will refuse to bind the other server's replica-aware stub into the JNDI tree, due to the JNDI naming conflict. A conflict of this type would remain until one of the two servers was shut down, or until one of the servers undeployed the clustered object. This same conflict could also occur if both servers attempt to deploy a pinned object with the same name.

Homogeneous Deployment

To avoid cluster-level JNDI conflicts, you must deploy all replica-aware objects to all WebLogic Server instances in a cluster (homogeneous deployment). Having unbalanced deployments across WebLogic Server instances increases the chance of JNDI naming conflicts during startup or redeployment. It can also lead to unbalanced processing loads in the cluster.

If you must pin specific RMI objects or EJBs to individual servers, make sure you do not replicate the object's bindings across the cluster.

Updating the JNDI Tree

If a clustered object is removed (undeployed from a server), updates to the JNDI tree are handled similarly to the way in which new services are added. The WebLogic Server on which the service was undeployed broadcasts a message indicating that it no longer provides the service. Again, other servers in the cluster that observe the multicast message update their local copies of the JNDI tree to indicate that the service is no longer available on the server that undeployed the object.


Client Interaction with the Cluster-wide JNDI Tree

Clients that connect to a WebLogic Server cluster and look up a clustered object obtain a replica-aware stub for the object. This stub contains the list of available server instances that host implementations of the object. The stub also contains the load balancing logic for distributing the load among its host servers. (Understanding Object Clustering provides more details about replica-aware stubs for EJBs and RMI classes.)

Once the client has obtained a replica-aware stub, the server instances in the cluster may continue adding and removing host servers for the clustered objects, as described in Updating the JNDI Tree. As the information in the JNDI tree changes, the client's stub may also be updated. Subsequent RMI requests contain update information as necessary to ensure that the client stub remains up-to-date.

Load Balancing of Clustered Services

In order for a cluster to be scalable, it must ensure that each server is fully utilized. The standard technique for accomplishing this is load balancing. The basic idea behind load balancing is that by distributing the load proportionally among all the servers in the cluster, the servers can each run at full capacity. The trick to load balancing is coming up with a technique that is simple yet sufficient. If all servers in the cluster have the same processing power and offer the same services, it is possible to use a very simple algorithm that requires no knowledge of the servers. If the servers vary in power or in the kind of services they deploy, the algorithm must take these differences into account.

Load Balancing for HTTP Session States

Load balancing for servlet and JSP HTTP session states can be accomplished either by using separate load balancing hardware or by using the built-in load balancing capabilities of a WebLogic proxy plug-in.

For clusters that utilize a bank of web servers and WebLogic proxy plug-ins, the proxy plug-ins provide only a round-robin algorithm for distributing requests to servlets and JSPs in a cluster. This load balancing method is described below in Round-Robin (Default).

Clusters that utilize a hardware load balancing solution can utilize any load balancing algorithms supported by the hardware. These may include advanced load-based balancing strategies that monitor the utilization of individual machines.

Load Balancing for Clustered Objects

WebLogic Server clusters support several algorithms for load balancing clustered objects. The particular algorithm you choose is maintained within the replica-aware stub obtained for the clustered object. Configurable algorithms for load balancing clustered objects are:

  - Round-robin

  - Weight-based

  - Random

Round-Robin (Default)

WebLogic Server uses the round-robin algorithm as the default load balancing strategy for clustered object stubs when no algorithm is specified. Round-robin is the only load balancing strategy used by WebLogic proxy plug-ins for HTTP session state clustering.

The round-robin algorithm cycles through a list of WebLogic Server instances in order. For clustered objects, the server list consists of WebLogic Server instances that host the clustered object. For proxy plug-ins, the list consists of all WebLogic Servers that host the clustered servlet or JSP.
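As a conceptual sketch (not WebLogic's internal code), the algorithm amounts to advancing an index through the server list and wrapping around at the end:

// Conceptual sketch of round-robin selection; server names are placeholders.
public class RoundRobin {
    private final String[] servers;   // servers hosting the clustered object
    private int next = 0;

    public RoundRobin(String[] servers) { this.servers = servers; }

    public synchronized String pick() {
        String server = servers[next];
        next = (next + 1) % servers.length;  // wrap around to cycle in order
        return server;
    }
}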

The advantages of this algorithm are that it is simple, cheap, and very predictable. The primary disadvantage is that there is some chance of convoying. Convoying occurs when one server is significantly slower than the others. Because replica-aware stubs or proxy plug-ins access the servers in the same order, a slow server can cause requests to "synchronize" on that server and then proceed through the remaining servers in lock-step.

Weight-Based

The weight-based algorithm applies only to object clustering. The algorithm improves on the round-robin algorithm by taking into account a pre-assigned weight for each server. Each server in the cluster is assigned a weight in the range 1 to 100 using the Cluster Weight field in the WebLogic Server administration console. This is a declaration of what proportion of the load the server will bear relative to other servers. If all servers have either the default weight (100) or the same weight, they will each bear an equal proportion of the load. If one server has weight 50 and all other servers have weight 100, the 50-weight server will bear half as much load as any other server. This algorithm makes it possible to apply the advantages of the round-robin algorithm to clusters that are not homogeneous.
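In config.xml terms, the weight belongs to the server definition. The excerpt below is a sketch; the attribute name ClusterWeight and the server name are assumptions to verify against your release's configuration reference.

<Server
  Name="serverA"
  Cluster="testcluster"
  ClusterWeight="50"
/>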

If you use the weight-based algorithm, spend some time to accurately determine the relative weights to assign to each server instance. Factors that could affect a server's assigned weight include:

  - The processing capacity of the server's host machine (for example, the number and performance of its CPUs).

  - The number of non-clustered ("pinned") objects each server hosts.

If you change the specified weight of a server and reboot it, the new weighting information is propagated throughout the cluster via the replica-aware stubs. See Cluster-Wide JNDI Naming Service for more information.

Random

This algorithm applies only to object clustering. The algorithm chooses the next replica at random. This will tend to distribute calls evenly among the replicas. It is recommended only in clusters where each server has the same processing power and hosts the same services. The advantages are that it is simple and relatively cheap. The primary disadvantage is that there is a small cost to generating a random number on every request, and there is a slight probability that the load will not be evenly balanced over a small number of runs.

Using Parameter-based Routing for Clustered Objects

It is also possible to gain finer-grained control over load balancing. Any clustered object can be assigned a CallRouter. This is a class that is called before each invocation with the parameters of the call. The CallRouter is free to examine the parameters and return the name of the server to which the call should be routed. See The WebLogic Cluster API for information about creating custom CallRouter classes.
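The sketch below illustrates the idea, routing calls based on the first parameter. It assumes the CallRouter interface from the WebLogic Cluster API (taken here to be weblogic.rmi.cluster.CallRouter with a getServerList method; verify the package and signature against that document). Returning null falls back to the stub's default load balancing; the server names and routed method are placeholders.

import java.lang.reflect.Method;
import weblogic.rmi.cluster.CallRouter;

// Sketch of a parameter-based router: names beginning A-M go to server1,
// names beginning N-Z go to server2.
public class NameRouter implements CallRouter {
    private static final String[] FIRST_HALF  = { "server1" };
    private static final String[] SECOND_HALF = { "server2" };

    public String[] getServerList(Method m, Object[] params) {
        if (m.getName().equals("lookupByName")) {
            char c = Character.toUpperCase(((String) params[0]).charAt(0));
            return (c <= 'M') ? FIRST_HALF : SECOND_HALF;
        }
        return null;  // use the stub's default load balancing for other methods
    }
}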

Failover Support for Clustered Services

In order for a cluster to provide high availability, it must be able to recover from service failures. This section describes how WebLogic Server detects failures in a cluster, and provides an overview of how failover works for replicated HTTP session states and clustered objects.

How WebLogic Server Detects Failures

WebLogic Server instances in a cluster detect failures of their peer server instances by monitoring:

  - Socket connections to peer servers

  - Regular server "heartbeat" messages

Failure Detection Using IP Sockets

WebLogic Servers monitor the use of IP sockets between peer server instances as an immediate method of detecting failures. If a server connects to one of its peers in a cluster and begins transmitting data over a socket, an unexpected closure of that socket causes the peer server to be marked as "failed," and its associated services are removed from the JNDI naming tree.

The WebLogic Server "Heartbeat"

If clustered server instances do not have opened sockets for peer-to-peer communication, failed servers may also be detected via the WebLogic Server "heartbeat." All server instances in a cluster use multicast to broadcast regular server "heartbeat" messages to other members of the cluster. Each server heartbeat contains data that uniquely identifies the server that sends the message. Servers broadcast their heartbeat messages at regular intervals of 10 seconds. In turn, each server in a cluster monitors the multicast address to ensure that all peer servers' heartbeat messages are being sent.

If a server monitoring the multicast address misses three heartbeats from a peer server (i.e., if it does not receive a heartbeat from the server for 30 seconds or longer), the monitoring server marks the peer server as "failed." It then updates its local JNDI tree, if necessary, to retract the services that were hosted on the failed server.

In this way, servers can detect failures even if they have no sockets open for peer-to-peer communication.

Failover for Clustered Servlets and JSPs

For clusters that utilize web servers with WebLogic proxy plug-ins, the proxy plug-in handles failover transparently to the client. If a given server fails, the plug-in locates the replicated HTTP session state on a secondary server and redirects the client's request accordingly.

For clusters that use a supported hardware load balancing solution, the load balancing hardware simply redirects client requests to any available server in the WebLogic Server cluster. The cluster itself obtains the replica of the client's HTTP session state from a secondary server in the cluster.

Understanding HTTP Session State Replication describes the failover procedure for replicated HTTP session states in more detail.

Failover for Clustered Objects

For clustered objects, failover is accomplished using the object's replica-aware stub. When a client makes a call through a replica-aware stub to a service that fails, the stub detects the failure and retries the call on another replica.

Idempotent Objects

With clustered objects, automatic failover generally occurs only in cases where the object is idempotent. An object is idempotent if any method can be called multiple times with no different effect than calling the method once. This is always true for methods that have no permanent side effects. Methods that do have side effects have to be written specially with idempotence in mind.

Consider a shopping cart service call addItem() that adds an item to a shopping cart. Suppose client C invokes this call on a replica on server S1. After S1 receives the call, but before it successfully returns to C, S1 crashes. At this point the item has been added to the shopping cart, but the replica-aware stub has received an exception. If the stub were to retry the method on server S2, the item would be added a second time to the shopping cart. Because of this, replica-aware stubs will not, by default, attempt to retry a method that fails after the request is sent but before it returns. This behavior can be overridden by marking a service idempotent.
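For example, for a clustered stateless session EJB whose methods are safe to retry, idempotence can be declared in the bean's WebLogic deployment descriptor. The excerpt below is a sketch only: the element nesting is assumed from the weblogic-ejb-jar.xml DTD and the bean name is hypothetical; verify both against your release's EJB reference.

<weblogic-enterprise-bean>
  <ejb-name>InventoryServiceEJB</ejb-name>
  <stateless-session-descriptor>
    <stateless-clustering>
      <stateless-bean-methods-are-idempotent>true</stateless-bean-methods-are-idempotent>
    </stateless-clustering>
  </stateless-session-descriptor>
</weblogic-enterprise-bean>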

Other Failover Exceptions

Even if a clustered object is not idempotent, WebLogic Server performs automatic failover in the case of a ConnectException or MarshalException. Either of these exceptions indicates that the object could not have been modified, and therefore there is no danger of causing data inconsistency by failing over to another instance.