Using WebLogic Server Clusters

Failover and Replication in a Cluster

In order for a cluster to provide high availability it must be able to recover from service failures. The following sections describe how WebLogic Server detect failures in a cluster, and provides an overview of how failover is accomplished for different types of objects:

How WebLogic Server Detects Failures

WebLogic Server instances in a cluster detect failures of their peer server instances by monitoring:

Socket connections to a peer server
Regular server heartbeat messages

Failure Detection Using IP Sockets

WebLogic Server instances monitor the use of IP sockets between peer server instances as an immediate method of detecting failures. If a server connects to one of its peers in a cluster and begins transmitting data over a socket, an unexpected closure of that socket causes the peer server to be marked as "failed," and its associated services are removed from the JNDI naming tree.

The WebLogic Server "Heartbeat"

If clustered server instances do not have opened sockets for peer-to-peer communication, failed servers may also be detected via the WebLogic Server heartbeat. All server instances in a cluster use multicast to broadcast regular server heartbeat messages to other members of the cluster. Each heartbeat message contains data that uniquely identifies the server that sends the message. Servers broadcast their heartbeat messages at regular intervals of 10 seconds. In turn, each server in a cluster monitors the multicast address to ensure that all peer servers' heartbeat messages are being sent.

If a server monitoring the multicast address misses three heartbeats from a peer server (i.e., if it does not receive a heartbeat from the server for 30 seconds or longer), the monitoring server marks the peer server as "failed." It then updates its local JNDI tree, if necessary, to retract the services that were hosted on the failed server.

In this way, servers can detect failures even if they have no sockets open for peer-to-peer communication.

Note: For more information about how WebLogic Server uses IP sockets and multicast communications see WebLogic Server Communication in a Cluster.

Replication and Failover for Servlets and JSPs

To support automatic replication and failover for servlets and JSPs within a cluster, Weblogic Server supports two mechanisms for preserving HTTP session state:

hardware load balancers

For clusters that use a supported hardware load balancing solution, the load balancing hardware simply redirects client requests to any available server in the WebLogic Server cluster. The cluster itself obtains the replica of the client's HTTP session state from a secondary server in the cluster.

proxy plug-ins

In clusters that utilize Web servers with WebLogic proxy plug-ins, the proxy plug-in handles failover transparently to the client. If a server fails, the plug-in locates the replicated HTTP session state on a secondary server and redirects the client's request accordingly.

This section covers the following topics:

HTTP Session State Replication

Weblogic Server uses two methods for replicating HTTP session state across clusters:

in-memory replication

Using in-memory replication, WebLogic Server copies a session state from one server instance to another. The primary server creates a primary session state on the server to which the client first connects, and a secondary replica on another WebLogic Server instance in the cluster. The replica is kept up-to-date so that it may be used if the server that hosts the servlet fails.

JDBC-based persistence

In JDBC-based persistence, WebLogic Server maintains the HTTP session state of a servlet or JSP using file-based or JDBC-based persistence. For more information on these persistence mechanisms, see Configuring Session Persistence in Programming WebLogic HTTP Servlets.
JDBC-based persistence is also used for HTTP session state replication within a Wide Area Network (WAN). For more information, see WAN HTTP Session State Replication.

The following section describe session state replication using in-memory replication.

Requirements for HTTP Session State Replication

To utilize in-memory replication for HTTP session states, you must access the WebLogic Server cluster using either a collection of Web servers with identically configured WebLogic proxy plug-ins, or load balancing hardware.

Supported Server and Proxy Software

The WebLogic proxy plug-in maintains a list of WebLogic Server instances that host a clustered servlet or JSP, and forwards HTTP requests to those instances using a round-robin strategy. The plug-in also provides the logic necessary to locate the replica of a client's HTTP session state if a WebLogic Server instance should fail.

In-memory replication for HTTP session states is supported by the following Web servers and proxy software:

WebLogic Server with the HttpClusterServlet
Netscape Enterprise Server with the Netscape (proxy) plug-in
Apache with the Apache Server (proxy) plug-in
Microsoft Internet Information Server with the Microsoft-IIS (proxy) plug-in

For instructions on setting up proxy plug-ins, see Configure Proxy Plug-Ins.

Load Balancer Requirements

If you choose to use load balancing hardware instead of a proxy plug-in, it must support a compatible passive or active cookie persistence mechanism, and SSL persistence. For details on these requirements, see Load Balancer Configuration Requirements. For instructions on setting up a load balancer, see Configuring Load Balancers that Support Passive Cookie Persistence.

Programming Considerations for Clustered Servlets and JSPs

This section highlights key programming constraints and recommendations for servlets and JSPs that you will deploy in a clustered environment.

Session Data Must Be Serializable

To support in-memory replication of HTTP session states, all servlet and JSP session data must be serializable.

Note: Serialization is the process of converting a complex data structure, such as a parallel arrangement of data (in which a number of bits are transmitted at a time along parallel channels) into a serial form (in which one bit at a time is transmitted); a serial interface provides this conversion to enable data transmission.

Every field in an object must be serializable or transient in order for the object to be considered serializable. If the servlet or JSP uses a combination of serializable and non-serializable objects, WebLogic Server does not replicate the session state of the non-serializable objects.

Use setAttribute to Change Session State

In an HTTP servlet that implements javax.servlet.http.HttpSession, use HttpSession.setAttribute (which replaces the deprecated putValue) to change attributes in a session object. If you set attributes in a session object with setAttribute, the object and its attributes are replicated in a cluster using in-memory replication. If you use other set methods to change objects within a session, WebLogic Server does not replicate those changes. Every time a change is made to an object that is in the session, setAttribute() should be called to update that object across the cluster.
Likewise, use removeAttribute (which, in turn, replaces the deprecated removeValue) to remove an attribute from a session object.

Note: Use of the deprecated putValue and removeValue methods will also cause session attributes to be replicated.

Consider Serialization Overhead

Serializing session data introduces some overhead for replicating the session state. The overhead increases as the size of serialized objects grows. If you plan to create very large objects in the session, test the performance of your servlets to ensure that performance is acceptable.

Control Frame Access to Session Data

If you are designing a Web application that utilizes multiple frames, keep in mind that there is no synchronization of requests made by frames in a given frameset. For example, it is possible for multiple frames in a frameset to create multiple sessions on behalf of the client application, even though the client should logically create only a single session.
In a clustered environment, poor coordination of frame requests can cause unexpected application behavior. For example, multiple frame requests can "reset" the application's association with a clustered instance, because the proxy plug-in treats each request independently. It is also possible for an application to corrupt session data by modifying the same session attribute via multiple frames in a frameset.
To avoid unexpected application behavior, carefully plan how you access session data with frames. You can apply one of the following general rules to avoid common problems:

In a given frameset, ensure that only one frame creates and modifies session data.
Always create the session in a frame of the first frameset your application uses (for example, create the session in the first HTML page that is visited). After the session has been created, access the session data only in framesets other than the first frameset.

Using Replication Groups

By default, WebLogic Server attempts to create session state replicas on a different machine than the one that hosts the primary session state. You can further control where secondary states are placed using replication groups. A replication group is a preferred list of clustered servers to be used for storing session state replicas.

Using the WebLogic Server Console, you can define unique machine names that will host individual server instances. These machine names can be associated with new WebLogic Server instances to identify where the servers reside in your system.

Machine names are generally used to indicate servers that run on the same machine. For example, you would assign the same machine name to all server instances that run on the same machine, or the same server hardware.

If you do not run multiple WebLogic Server instances on a single machine, you do not need to specify WebLogic Server machine names. Servers without a machine name are treated as though they reside on separate machines. For detailed instructions on setting machine names, see Configure Machine Names.

When you configure a clustered server instance, you can assign the server to a replication group, and a preferred secondary replication group for hosting replicas of the primary HTTP session states created on the server.

When a client attaches to a server in the cluster and creates a primary session state, the server hosting the primary state ranks other servers in the cluster to determine which server should host the secondary. Server ranks are assigned using a combination of the server's location (whether or not it resides on the same machine as the primary server) and its participation in the primary server's preferred replication group. The following table shows the relative ranking of servers in a cluster.

Table 6-1 Ranking Server Instances for Session Replication

Server Rank	Server Resides on a Different Machine	Server is a Member of Preferred Replication Group
1	Yes	Yes
2	No	Yes
3	Yes	No
4	No	No

Using these rules, the primary WebLogic Server ranks other members of the cluster and chooses the highest-ranked server to host the secondary session state. For example, the following figure shows replication groups configured for different geographic locations.

Figure 6-1 Replication Groups for Different Geographic Locations

Replication Groups for Different Geographic Locations

In this example, Servers A, B, and C are members of the replication group "Headquarters" and use the preferred secondary replication group "Crosstown." Conversely, Servers X, Y, and Z are members of the "Crosstown" group and use the preferred secondary replication group "Headquarters." Servers A, B, and X reside on the same machine, "sardina."

If a client connects to Server A and creates an HTTP session state,

Servers Y and Z are most likely to host the replica of this state, since they reside on separate machines and are members of Server A's preferred secondary group.
Server X holds the next-highest ranking because it is also a member of the preferred replication group (even though it resides on the same machine as the primary.)
Server C holds the third-highest ranking since it resides on a separate machine but is not a member of the preferred secondary group.
Server B holds the lowest ranking, because it resides on the same machine as Server A (and could potentially fail along with A if there is a hardware failure) and it is not a member of the preferred secondary group.

To configure a server's membership in a replication group, or to assign a server's preferred secondary replication group, follow the instructions in Configure Replication Groups.

Accessing Clustered Servlets and JSPs Using a Proxy

This section describes the connection and failover processes for requests that are proxied to clustered servlets and JSPs. For instructions on setting up proxy plug-ins, see Configure Proxy Plug-Ins.

The following figure depicts a client accessing a servlet hosted in a cluster. This example uses a single WebLogic Server to serve static HTTP requests only; all servlet requests are forwarded to the WebLogic Server cluster via the HttpClusterServlet.

Figure 6-2 Accessing Servlets and JSPs using a Proxy

Accessing Servlets and JSPs using a Proxy

Note: The discussion that follows also applies if you use a third-party Web server and WebLogic proxy plug-in, rather than WebLogic Server and the HttpClusterServlet.

Proxy Connection Procedure

When the HTTP client requests the servlet, HttpClusterServlet proxies the request to the WebLogic Server cluster. HttpClusterServlet maintains the list of all servers in the cluster, and the load balancing logic to use when accessing the cluster. In the above example, HttpClusterServlet routes the client request to the servlet hosted on WebLogic Server A. WebLogic Server A becomes the primary server hosting the client's servlet session.

To provide failover services for the servlet, the primary server replicates the client's servlet session state to a secondary WebLogic Server in the cluster. This ensures that a replica of the session state exists even if the primary server fails (for example, due to a network failure). In the example above, Server B is selected as the secondary.

The servlet page is returned to the client through the HttpClusterServlet, and the client browser is instructed to write a cookie that lists the primary and secondary locations of the servlet session state. If the client browser does not support cookies, WebLogic Server can use URL rewriting instead.

Using URL Rewriting to Track Session Replicas

In its default configuration, WebLogic Server uses client-side cookies to keep track of the primary and secondary server that host the client's servlet session state. If client browsers have disabled cookie usage, WebLogic Server can also keep track of primary and secondary servers using URL rewriting. With URL rewriting, both locations of the client session state are embedded into the URLs passed between the client and proxy server. To support this feature, you must ensure that URL rewriting is enabled on the WebLogic Server cluster. For instructions on how to enable URL rewriting, see Using URL Rewriting, in Assembling and Configuring Web Applications.

Proxy Failover Procedure

Should the primary server fail, HttpClusterServlet uses the client's cookie information to determine the location of the secondary WebLogic Server that hosts the replica of the session state. HttpClusterServlet automatically redirects the client's next HTTP request to the secondary server, and failover is transparent to the client.

After the failure, WebLogic Server B becomes the primary server hosting the servlet session state, and a new secondary is created (Server C in the previous example). In the HTTP response, the proxy updates the client's cookie to reflect the new primary and secondary servers, to account for the possibility of subsequent failovers.

Note: Now WebLogic proxy plug-ins randomly pick up a secondary server after the failover.

In a two-server cluster, the client would transparently fail over to the server hosting the secondary session state. However, replication of the client's session state would not continue unless another WebLogic Server became available and joined the cluster. For example, if the original primary server was restarted or reconnected to the network, it would be used to host the secondary session state.

Accessing Clustered Servlets and JSPs with Load Balancing Hardware

To support direct client access via load balancing hardware, the WebLogic Server replication system allows clients to use secondary session states regardless of the server to which the client fails over. WebLogic Server uses client-side cookies or URL rewriting to record primary and secondary server locations. However, this information is used only as a history of the servlet session state location; when accessing a cluster via load balancing hardware, clients do not use the cookie information to actively locate a server after a failure.

The following sections describe the connection and failover procedure when using HTTP session state replication with load balancing hardware.

Connection with Load Balancing Hardware

The following figure illustrates the connection procedure for a client accessing a cluster through a load balancer.

Figure 6-3 Connection with Load Balancing Hardware

Connection with Load Balancing Hardware

When the client of a Web application requests a servlet using a public IP address:

The load balancer routes the client's connection request to a WebLogic Server cluster in accordance with its configured policies. It directs the request to WebLogic Server A.

WebLogic Server A acts as the primary host of the client's servlet session state. It uses the ranking system described in Using Replication Groups to select a server to host the replica of the session state. In the example above, WebLogic Server B is selected to host the replica.

The client is instructed to record the location of WebLogic Server instances A and B in a local cookie. If the client does not allow cookies, the record of the primary and secondary servers can be recorded in the URL returned to the client via URL rewriting.

Note: You must enable WebLogic Server URL rewriting capabilities to support clients that disallow cookies, as described in Using URL Rewriting to Track Session Replicas.

As the client makes additional requests to the cluster, the load balancer uses an identifier in the client-side cookie to ensure that those requests continue to go to WebLogic Server A (rather than being load-balanced to another server in the cluster). This ensures that the client remains associated with the server hosting the primary session object for the life of the session.

Failover with Load Balancing Hardware

Should Server A fail during the course of the client's session, the client's next connection request to Server A also fails, as illustrated in the following figure.

Figure 6-4 Failover with Load Balancing Hardware

Failover with Load Balancing Hardware

In response to the connection failure:

The load balancing hardware uses its configured policies to direct the request to an available WebLogic Server in the cluster. In the above example, assume that the load balancer routes the client's request to WebLogic Server C after WebLogic Server A fails.

When the client connects to WebLogic Server C, the server uses the information in the client's cookie (or the information in the HTTP request if URL rewriting is used) to acquire the session state replica on WebLogic Server B. The failover process remains completely transparent to the client.

WebLogic Server C becomes the new host for the client's primary session state, and WebLogic Server B continues to host the session state replica. This new information about the primary and secondary host is again updated in the client's cookie, or via URL rewriting.

Session State Replication Across Clusters

In addition to providing HTTP session state replication across servers within a cluster, WebLogic server provides the ability to replicate HTTP session state across multiple clusters. This improves high-availability and fault tolerance by allowing clusters to be spread across multiple geographic regions, power grids, and internet service providers. This section discusses the two mechanisms for cross-cluster replication supported by WebLogic Server:

For general information on HTTP session state replication, see HTTP Session State Replication. For more information on using hardware load balancers, see Accessing Clustered Servlets and JSPs with Load Balancing Hardware.

Network Requirements for Cross-cluster Replication

To perform cross-cluster replication with WebLogic Server, your network must include global and local hardware load balancers. Figure 6-5 shows how both types of load balancers interact within a multi-cluster environment to support cross-cluster replication. For general information on using load balancer within a WebLogic Server environment, see Connection with Load Balancing Hardware.

Figure 6-5 Load Balancer Requirements for Cross-cluster Replications

Load Balancer Requirements for Cross-cluster Replications

The following sections describe each of the components in this network configuration.

Global Load Balancer

In a network configuration that supports cross-cluster replication, the global load balancer is responsible for balancing HTTP requests across clusters. When a request comes in, the global load balancer determines which cluster to send it to based on the current number of requests being handled by each cluster. Then the request is passed to the local load balancer for the chosen cluster.

Local Load Balancer

The local load balancer receives HTTP requests from the global load balancer. The local load balancer is responsible for balancing HTTP requests across servers within the cluster.

Replication

In order to replicate session data from one cluster to another, a replication channel must be configured to communicate session state information from the primary to the secondary cluster. The specific method used to replicate session information depends on which type of cross-cluster replication you are implementing. For more information, see MAN HTTP Session State Replication or WAN HTTP Session State Replication.

Failover

When a server within a cluster fails, the local load balancer is responsible for transferring the request to other servers within a cluster. When the entire cluster fails, the local load balancer returns HTTP requests back to the global load balancer. The global load balancer then redirects this request to the other local load balancer.

Configuration Requirements for Cross-Cluster Replication

The following procedures outline the basic steps required to configure cross-cluster replication.

Install WebLogic server according to your network configuration and requirements. This includes installing a WebLogic Server instance on every physical machine that hosts a WLS instance.

Install and configure the hardware load balancers. For more information on load balancer requirements see Network Requirements for Cross-cluster Replication. For more information on installing and configuring load balancers, see the documentation for your load balancer.

Following are some general considerations when configuring hardware load balancers to support cross-cluster replications:

You must configure your load balancer to maintain session ids. If the load balancers do not maintain session id, subsequent requests will always be sent to a new server. For more information, see Connection with Load Balancing Hardware.
You should ensure that the cluster failover timeout value is not set to high. This value should be around 3-5 seconds. Some hardware load balancers have default values that are much longer.
You must configure your load balancer to know which backup cluster to use when a primary cluster or server fails.

Create and configure your domains according to your cluster requirements.

Note: Cross-cluster replication requires that each cluster be assigned to a different domain.

In addition to creating and configuring your domains, you should also create and configure your clusters and managed servers. For information on creating and configuring a domain, see Using WebLogic Tools to Configure a Domain, in Understanding Domain Configuration.
Following are some considerations when configuring domains to support cross-cluster replication:

Each domain should be set up and configured identically. In addition to identical domain, cluster and server configuration, the number of servers clusters, etc. should be identical.
Application deployment should be identical in each domain.
When setting up your domains, you must enable trust between both domains. For more information on enabling trust between domains, see Enabling Trust Between WebLogic Server Domains, in Securing WebLogic Server.

If you are using cross cluster replication in a WAN environment, you must create a datasource that is used to maintain session state. For more information, see Database Configuration for WAN Session State Replication.

After you have created and configured your domains, servers, and clusters you should verify the configuration elements specific to cross-cluster replication have been configured correctly. These parameters must be configured identically for both domains.

The following table lists the subelements of the cluster element in config.xml that are used to configure cross-cluster replication:

Element

Description

cluster-type

This setting must match the replication type you are using and must be consistent across both clusters.

The valid values are man or wan

remote-cluster-address

This is the address used to communicate replication information to the other cluster. This should be configured so that communications between clusters do not go through a load balancer.

replication-channel

This is the network channel used to communicate replication information to the other cluster.

Note: The named channel must exist on all members of the cluster and must be configured to use the same protocol. The selected channel may be configured to use a secure protocol.

data-source-for-session-persistence

This is the data source that is used to store session information when using JDBC-based session persistence.

This method of session state replication is used to perform cross-cluster replication within a WAN. For more information, see Database Configuration for WAN Session State Replication.

session-flush-interval

This is the interval, in seconds, the cluster waits to flush HTTP sessions to the backup cluster.

session-flush-threshold

If the number of HTTP sessions reaches the value of session-flush-threshold, the sessions are

inter-cluster-comm-link-
health-check-interval

This is the amount of time, in milliseconds, that the cluster waits to

Element	Description
cluster-type	This setting must match the replication type you are using and must be consistent across both clusters. The valid values are `man` or `wan`
remote-cluster-address	This is the address used to communicate replication information to the other cluster. This should be configured so that communications between clusters do not go through a load balancer.
replication-channel	This is the network channel used to communicate replication information to the other cluster. Note: The named channel must exist on all members of the cluster and must be configured to use the same protocol. The selected channel may be configured to use a secure protocol.
data-source-for-session-persistence	This is the data source that is used to store session information when using JDBC-based session persistence. This method of session state replication is used to perform cross-cluster replication within a WAN. For more information, see Database Configuration for WAN Session State Replication.
session-flush-interval	This is the interval, in seconds, the cluster waits to flush HTTP sessions to the backup cluster.
session-flush-threshold	If the number of HTTP sessions reaches the value of session-flush-threshold, the sessions are
inter-cluster-comm-link- health-check-interval	This is the amount of time, in milliseconds, that the cluster waits to

Configuring Session State Replication Across Clusters

You can use a third-party replication product to replicate state across clusters, or you can allow WebLogic Server to replicate session state across clusters. The following configuration considerations should be kept in mind depending on which method you use:

If you are using a third-party product, ensure that you have specified a value for jdbc-pool, and that backup-cluster-address is blank.
If you are using WebLogic Server to handle session state replication, you must configure both the jdbc-pool and the backup-cluster-address.

If backup-cluster-address is NULL, WebLogic Server assumes that you are using a third-party product to handle replication. In this case, session data is not persisted to the remote database, but is persisted locally.

MAN HTTP Session State Replication

Resources within a metropolitan area network (MAN) are often in physically separate locations, but are geographically close enough that network latency is not an issue. Network communication in a MAN generally has low latency and fast interconnect. Clusters within a MAN can be installed in physically separate locations which improves availability.

To provide failover within a MAN, WebLogic Server provides an in-memory mechanism that works between two separate clusters. This allows session state to be replicated synchronously from one cluster to another, provided that the network latency is a few milliseconds. The advantage of using a synchronous method is that reliability of in-memory replication is guaranteed.

Note: The performance of synchronous state replication is dependant on the network latency between clusters. You should use this method only if the network latency between the clusters is tolerable.

Replication Within a MAN

This section discusses possible failover scenarios across multiple clusters within a MAN. Figure 5-6 shows a typical multi-cluster environment within a MAN.

Figure 6-6 MAN Replication

MAN Replication

This figure demonstrates the following HTTP session state scenario:

A client makes a request which passes through the global load balancer.

The global load balancer passes the request to a local load balancer based on current system load. In this case, the session request is passed to Local Load Balancer 1.

The local load balancer in turn passes the request to a server within a cluster based on system load, in this case S1. Once the request reaches S1, this managed server becomes the primary server for this HTTP session. This server will handle subsequent requests assuming there are no failures.

After the server establishes the HTTP session, the current session state is replicated to the designated secondary server.

Failover Scenarios in a MAN

The following sections describe various failover scenarios based on the MAN configuration in Figure 5-6.

Failover Scenario 1

If all of the servers in Cluster 1 fail, the global load balancer will automatically fail all subsequent session requests to Cluster 2. All sessions that have been replicated to Cluster 2 will be recovered and the client will experience no data loss.

Failover Scenario 2

Assume that the primary server S1 is being hosted on Cluster 1, and the secondary server S6 is being hosted on Cluster 2. If S1 crashes, then any other server in Cluster 1 (S2 or S3) can pick up the request and retrieve the session data from server S6. S6 will continue to be the backup server.

Failover Scenario 3

Assume that the primary server S1 is being hosted on Cluster 1, and the secondary server S6 is being hosted on Cluster 2. If the secondary server S6 fails, then the primary server S1 will automatically select a new secondary server on Cluster 2. Upon receiving a client request, the session information will be backed up on the new secondary server.

Failover Scenario 4

If the communication between the two clusters fails, the primary server will automatically replicate session state to a new secondary server within the local cluster. Once the communication between the two clusters, any subsequent client requests will be replicated on the remote cluster.

MAN Replication, Load Balancers, and Session Stickiness

MAN replication relies on global load balancers to maintain cluster affinity and local load balancers to maintain server affinity. If a server within a cluster fails, the local load balancer is responsible for ensuring that session state is replicated to another server in the cluster. If all of the servers within a cluster have failed or are unavailable, the global load balancer is responsible for replicating session state to another cluster. This ensures that failover to another cluster does not occur unless the entire cluster fails.

Once a client establishes a connection through a load balancer to a cluster, the client must maintain stickiness to that cluster as long as it is healthy.

WAN HTTP Session State Replication

Resources in a wide area network (WAN) are frequently spread across separate geographical regions. In addition to requiring network traffic to cross long distances, these resources are often separated by multiple routers and other network bottle necks. Network communication in a WAN generally has higher latency and slower interconnect.

Slower network performance within a WAN makes it difficult to use a synchronous replication mechanism like the one used within a MAN. WebLogic Server provides failover across clusters in WAN by using an asynchronous data replication scheme.

Replication Within a WAN

This section discusses possible failover scenarios across multiple clusters within a WAN. Figure 5-7 shows a typical multi-cluster environment within a WAN.

Figure 6-7 WAN Replication

WAN Replication

This figure demonstrates the following HTTP session state scenario:

A client makes a request which passes through the global load balancer.

The global load balancer passes the request to a local load balancer based on current system load. In this case, the session request is passed to Local Load Balancer 1.

After the server establishes the HTTP session, the current session state is replicated to the designated secondary server.

Failover Scenarios Within a WAN

This section describes possible failover scenarios within a WAN environment.

Failover Scenario 1

If all of the servers in Cluster 1 fail, the global load balancer will automatically fail all subsequent session requests to Cluster 2. All sessions will be backed up according to the last know flush to the database.

Failover Scenario 2

Assume that the primary server S1 is being hosted on Cluster 1, and the secondary server S6 is being hosted on Cluster 2. If S1 crashes, then S6 will become the new primary server. The session state will be backed up on a new secondary sever.

Database Configuration for WAN Session State Replication

This section describes the data source configuration requirements for cross-cluster session state replication in a WAN. For more general information about setting up cross-cluster replication, see Configuration Requirements for Cross-Cluster Replication.

To enable cross-cluster replication within a WAN environment, you must create a JDBC data source that points to the database where session state information is stored. Perform the following procedures to setup and configure your database:

Install and configure your database server software according to your vendor's documentation.

Create a JDBC data source that references this database. For more information on creating a JDBC datasource, see Configuring JDBC Data Sources in Configuring and Managing WebLogic JDBC.

This data source can also be configured as a JDBC Multi Data Source. For more information on configuring a Multi Data Source, see Configuring JDBC Multi Data Sources in Configuring and Managing WebLogic JDBC.

Set the DataSourceForSessionPersistence for both the primary and secondary cluster to point to this data source.

Create a table called WLS_WAN_PERSISTENCE in your database according to the following schema:

CREATE TABLE WLS_WAN_PERSISTENCE_TABLE (
  WL_ID VARCHAR2(100) NOT NULL,
  WL_CONTEXT_PATH VARCHAR(50) NOT NULL,
  WL_CREATE_TIME NUMBER(20),
  WL_ACCESS_TIME NUMBER(20),
  WL_MAX_INACTIVE_INTERVAL NUMBER(38),
  WL_VERSION NUMBER(38) NOT NULL,
  WL_INTERNAL_ATTRIBUTE NUMBER(20),
  WL_SESSION_ATTRIBUTE_KEY NUMBER(20),
  WL_SESSION_ATTRIBUTE_VALUE LONG RAW,
  PRIMARY KEY(WL_ID, WL_CONTEXT_PATH,
  WL_VERSION));

The following table describes what each row of this table contains: Database Row Description wl_id Stores the HTTP session ID. wl_context_path Stores the context path to the web application that created the session. wl_create_time Stores the time the session state was created. wl_session_values Stores the session attributes. wl_access_time Stores the time of the last update to the session state. wl_max_inactive_interval Stores the MaxInactiveInterval of the session state. wl_version Stores the version of the session. Each update to a session has an associated version.

Database Row	Description
wl_id	Stores the HTTP session ID.
wl_context_path	Stores the context path to the web application that created the session.
wl_create_time	Stores the time the session state was created.
wl_session_values	Stores the session attributes.
wl_access_time	Stores the time of the last update to the session state.
wl_max_inactive_interval	Stores the `MaxInactiveInterval` of the session state.
wl_version	Stores the version of the session. Each update to a session has an associated version.

Replication and Failover for EJBs and RMIs

For clustered EJBs and RMIs, failover is accomplished using the object's replica-aware stub. When a client makes a call through a replica-aware stub to a service that fails, the stub detects the failure and retries the call on another replica.

With clustered objects, automatic failover generally occurs only in cases where the object is idempotent. An object is idempotent if any method can be called multiple times with no different effect than calling the method once. This is always true for methods that have no permanent side effects. Methods that do have side effects have to be written with idempotence in mind.

Consider a shopping cart service call addItem() that adds an item to a shopping cart. Suppose client C invokes this call on a replica on Server S1. After S1 receives the call, but before it successfully returns to C, S1 crashes. At this point the item has been added to the shopping cart, but the replica-aware stub has received an exception. If the stub were to retry the method on Server S2, the item would be added a second time to the shopping cart. Because of this, replica-aware stubs will not, by default, attempt to retry a method that fails after the request is sent but before it returns. This behavior can be overridden by marking a service idempotent.

Clustering Objects with Replica-Aware Stubs

If an EJB or RMI object is clustered, instances of the object are deployed on all WebLogic Server instances in the cluster. The client has a choice about which instance of the object to call. Each instance of the object is referred to as a replica.

The key technology that supports object clustering objects in WebLogic Server is the replica-aware stub. When you compile an EJB that supports clustering (as defined in its deployment descriptor), appc passes the EJB's interfaces through the rmic compiler to generate replica-aware stubs for the bean. For RMI objects, you generate replica-aware stubs explicitly using command-line options to rmic, as described in WebLogic RMI Compiler, in Programming WebLogic RMI.

A replica-aware stub appears to the caller as a normal RMI stub. Instead of representing a single object, however, the stub represents a collection of replicas. The replica-aware stub contains the logic required to locate an EJB or RMI class on any WebLogic Server instance on which the object is deployed. When you deploy a cluster-aware EJB or RMI object, its implementation is bound into the JNDI tree. As described in Cluster-Wide JNDI Naming Service, clustered WebLogic Server instances have the capability to update the JNDI tree to list all server instances on which the object is available. When a client accesses a clustered object, the implementation is replaced by a replica-aware stub, which is sent to the client.

The stub contains the load balancing algorithm (or the call routing class) used to load balance method calls to the object. On each call, the stub can employ its load algorithm to choose which replica to call. This provides load balancing across the cluster in a way that is transparent to the caller. To understand the load balancing algorithms available for RMI objects and EJBs, see Load Balancing for EJBs and RMI Objects. If a failure occurs during the call, the stub intercepts the exception and retries the call on another replica. This provides a failover that is also transparent to the caller.

Clustering Support for Different Types of EJBs

EJBs differ from plain RMI objects in that each EJB can potentially generate two different replica-aware stubs: one for the EJBHome interface and one for the EJBObject interface. This means that EJBs can potentially realize the benefits of load balancing and failover on two levels:

When a client looks up an EJB object using the EJBHome stub
When a client makes method calls against the EJB using the EJBObject stub

The following sections describe clustering support for different types of EJBs.

Clustered EJBHomes

All bean homes interfaces—used to find or create bean instances—can be clustered, by specifying the determined by the home-is-clusterable element in weblogic-ejb-jar.xml.

Note: Stateless session beans, stateful session beans, and entity beans have home interfaces. Message-driven beans do not.

When a bean is deployed to a cluster, each server binds the bean's home interface to its cluster JNDI tree under the same name. When a client requests the bean's home from the cluster, the server instance that does the look-up returns a EJBHome stub that has a reference to the home on each server.

When the client issues a create() or find() call, the stub routes selects a server from the replica list in accordance with the load balancing algorithm, and routes the call to the home interface on that server. The selected home interface receives the call, and creates a bean instance on that server instance and executes the call, creating an instance of the bean.

Note: WebLogic Server supports load balancing algorithms that provide server affinity for EJB home interfaces. To understand server affinity and how it affects load balancing and failover, see Round-Robin Affinity, Weight-Based Affinity, and Random-Affinity.

Clustered EJBObjects

An EJBObject stub tracks available replicas of an EJB in a cluster.

Stateless Session Beans

When a home creates a stateless bean, it returns a EJBObject stub that lists all of the servers in the cluster, to which the bean should be deployed. Because a stateless bean holds no state on behalf of the client, the stub is free to route any call to any server that hosts the bean. The stub can automatically fail over in the event of a failure. The stub does not automatically treat the bean as idempotent, so it will not recover automatically from all failures. If the bean has been written with idempotent methods, this can be noted in the deployment descriptor and automatic failover will be enabled in all cases.

Note: WebLogic Server supports load balancing options that provide server affinity for stateless EJB remote interfaces. To understand server affinity and how it affects load balancing and failover, see Round-Robin Affinity, Weight-Based Affinity, and Random-Affinity.

Stateful Session Beans

Method-level failover for a stateful service requires state replication. WebLogic Server satisfies this requirement by replicating the state of the primary bean instance to a secondary server instance, using a replication scheme similar to that used for HTTP session state.

When a home interface creates a stateless session bean instance, it selects a secondary instance to host the replicated state, using the same rules defined in Using Replication Groups. The home interface returns a EJBObject stub to the client that lists the location of the primary bean instance, and the location for the replicated bean state.

The following figure shows a client accessing a clustered stateful session EJB.

Figure 6-8 Client Accessing Stateful Session EJB

Client Accessing Stateful Session EJB

As the client makes changes to the state of the EJB, state differences are replicated to the secondary server instance. For EJBs that are involved in a transaction, replication occurs immediately after the transaction commits. For EJBs that are not involved in a transaction, replication occurs after each method invocation.

In both cases, only the actual changes to the EJB's state are replicated to the secondary server. This ensures that there is minimal overhead associated with the replication process.

Note: The actual state of a stateful EJB is non-transactional, as described in the EJB specification. Although it is unlikely, there is a possibility that the current state of the EJB can be lost. For example, if a client commits a transaction involving the EJB and there is a failure of the primary server before the state change is replicated, the client will fail over to the previously-stored state of the EJB. If it is critical to preserve the state of your EJB in all possible failover scenarios, use an entity EJB rather than a stateful session EJB.

Failover for Stateful Session EJBs

Should the primary server fail, the client's EJB stub automatically redirects further requests to the secondary WebLogic Server instance. At this point, the secondary server creates a new EJB instance using the replicated state data, and processing continues on the secondary server.

After a failover, WebLogic Server chooses a new secondary server to replicate EJB session states (if another server is available in the cluster). The location of the new primary and secondary server instances is automatically updated in the client's replica-aware stub on the next method invocation, as shown below.

Figure 6-9 Replica Aware Stubs are Updated after Failover

Replica Aware Stubs are Updated after Failover

Entity EJBs

There are two types of entity beans to consider: read-write entity beans and read-only entity beans.

Read-Write Entities

When a home finds or creates a read-write entity bean, it obtains an instance on the local server and returns a stub pinned to that server. Load balancing and failover occur only at the home level. Because it is possible for multiple instances of the entity bean to exist in the cluster, each instance must read from the database before each transaction and write on each commit.

Read-Only Entities

When a home finds or creates a read-only entity bean, it returns a replica-aware stub. This stub load balances on every call but does not automatically fail over in the event of a recoverable call failure. Read-only beans are also cached on every server to avoid database reads.

Failover for Entity Beans and EJB Handles

Failover for entity beans and EJB handles depends upon the existence of the cluster address. You can explicitly define the cluster address, or allow WebLogic Server to generate it automatically, as described in Cluster Address. If you explicitly define cluster address, you must specify it as a DNS name that maps to all server instances in the cluster and only server instances in the cluster. The cluster DNS name should not map to a server instance that is not a member of the cluster.

Clustering Support for RMI Objects

WebLogic RMI provides special extensions for building clustered remote objects. These are the extensions used to build the replica-aware stubs described in the EJB section. For more information about using RMI in clusters, see WebLogic RMI Features and Guidelines in Programming WebLogic RMI.

Object Deployment Requirements

If you are programming EJBs to be used in a WebLogic Server cluster, read the instructions in this section to understand the capabilities of different EJB types in a cluster. Then ensure that you enable clustering in the EJB's deployment descriptor. weblogic-ejb-jar.xml Deployment Descriptor Reference in Programming WebLogic Enterprise JavaBeans describes the XML deployment elements relevant for clustering.

If you are developing either EJBs or custom RMI objects, also refer to "Using WebLogic JNDI in a Clustered Environment in Programming WebLogic JNDI to understand the implications of binding clustered objects in the JNDI tree.

Other Failover Exceptions

Even if a clustered object is not idempotent, WebLogic Server performs automatic failover in the case of a ConnectException or MarshalException. Either of these exceptions indicates that the object could not have been modified, and therefore there is no danger of causing data inconsistency by failing over to another instance.

Server Migration

Note: Server Migration is not supported on all platforms. See Server Migration in Supported Configurations for WebLogic Server 9.1.

In a WebLogic Server cluster, most services are deployed homogeneously on all the server instances in the cluster, enabling transparent failover from one server to another. In contrast, "pinned services" such as JMS and the JTA transaction recovery system are targeted at individual server instances within a cluster—for these services, WebLogic Server supports failure recovery with migration, as opposed to failover.

In previous releases of WebLogic Server, JMS servers and the JTA transaction recovery system could be migrated manually upon failure of the hosting server instance. This feature is still supported, and is described in Service Migration.

WebLogic Server provides a feature for making JMS and the JTA transaction system highly available: migratable servers. Migratable servers provide for both automatic and manual migration at the server-level, rather than the service level.

Note: Server-level migration is an alternative to service-level migration. Service migration and server migration are not intended to be used in combination. If you migrate an individual service within your cluster, do not migrate an entire server instance.

Note: Server migration is only supported using the SSH version of Node Manager.

A migratable server is a clustered server instance that migrates in its entirety, along with all the services it hosts. Migratable servers are intended to host pinned services, such as JMS servers and the JTA transaction recovery servers, but they can also host clusterable services. All services that run on a migratable server are highly available.

When a migratable server becomes unavailable for any reason, for instance, if it hangs, loses network connectivity, or its host machine fails—migration is automatic. Upon failure, a migratable server is automatically restarted on the same machine if possible. If the migratable server cannot be restarted on the machine where it failed, it is migrated to another machine. In addition, an administrator can manually initiate migration of a server instance.

Setting Up Your Environment for Server Migration

The following considerations apply to setting up your server environment before configuring server migration:

Each managed server must be configured to use the same subnet mask. Multicast communications between servers require that each server be configured to use the same subnet. Server migration will not work without multicast communication.
All servers hosting migratble servers are time synched. Although migration will work when servers are not time synched, it is generally advised to have time synched servers in a clustered environment.
If you are using different operating system versions among migratable servers, you should ensure that all versions support identical functionality for ifconfig.
Interfaces names used by migratble servers should be the same. If your environment requires different interface names, then each migratable server should have its own local version of wlscontrol.sh configured.
You cannot create Channels/NetworkAccessPoints that have a different Listen Address on a migratable server.
There is no built-in mechanism for transferring files that a server depends on between machines. Using a disk that is accessible from all machines is the preferred way to ensure file availability. If you cannot share disks between servers, you must ensure that the contents of domain_dir/bin is copied to each machine. See Use High Availability Storage for State Data.
You must ensure that the Node Manager security files are copied to each machine using the nmEnroll() WLST command.

Configuring Server Migration

Before configuring server migration, ensure that your environment meets the requirements outlined in Setting Up Your Environment for Server Migration.

To configure server migration for a managed server within a cluster, perform the following tasks:

Obtain floating IP addresses for each managed server that will have migration enabled.

Each migratable server must be assigned a floating IP address which follows the server from one physical machine to another after migration. Any server that is assigned a floating IP address must also have AutoMigrationEnabled set to true.

Note: The migratable IP address should not present on the interface of any of the candidate machines before the migratable server is started.

Configure Node Manager. Node manager must be running and configured to allow server migration.

Note: Server migration is only supported using the SSH version of Node Manager.

For general information on using Node Manager in server migration, see Node Manager's Role in Server Migration.

Configure the database for server migration. Server migration requires you to have a database which is used to store leasing information that is used to determine whether or not a server is running or needs to be migrated.

Your database must be reliable. The server instances will only be as reliable as the database is. For experimental purposes, a normal database will suffice. For a production environment, only high-availability databases are recommended. If the database goes down, all the migratable servers will shut themselves down.
Create the leasing tables in the database using the following database schema below. These tables are used to store the machine-server associations used to enable server migration.

( SERVER VARCHAR2(50) NOT NULL, INSTANCE VARCHAR2(100) NOT NULL, DOMAINNAME VARCHAR2(50) NOT NULL, CLUSTERNAME VARCHAR2(50) NOT NULL, TIMEOUT DATE, PRIMARY KEY (SERVER, DOMAINNAME, CLUSTERNAME));
CREATE TABLE ACTIVE_MT ( SERVER VARCHAR2(50) NOT NULL, HOSTMACHINE VARCHAR2(200) NOT NULL, DOMAINNAME VARCHAR2(50) NOT NULL, CLUSTERNAME VARCHAR2(50) NOT NULL, PRIMARY KEY (SERVER, DOMAINNAME, CLUSTERNAME));
Note: The leasing tables should be stored in a highly available database. Migratable servers are only as reliable as the database used to store the leasing table.

Set up and configure a data source. This data source should point to the database configured in the previous procedure. You should set DataSourceForAutomaticMigration to this data source in each cluster configuration.

Note: XA data sources are not supported for server migration.

For more information on creating a JDBC data source, see Configuring JDBC Data Sources in Configuring and Managing WebLogic JDBC.

Grant superuser privileges to the wlsifconfig.sh script.

This script is used to transfer IP addresses from one machine to another during migration. It must be able to run ifconfig, which is generally only available to superusers. You can edit the script so that it is invoked using sudo.
This script is available in the $BEA_HOME/weblogic91/common/bin directory.

Ensure that wlsifconfig.sh, wlscontrol.sh and nodemanager.domains are included in your machines' PATH. These files are located in
$BEA_HOME/weblogic91/common/bin. nodemanager.domains is located in $BEA_HOME/weblogic91/common/nodemanager.

Depending on you default shell, you may need to edit the first line of these scripts.

The machines that host migratable servers must trust each other. For server migration to occur, it must be possible to get to a shell prompt using 'ssh/rsh machine_A' from machine_B and vice versa without having to explicitly enter a username/password. Also, each machine must be able to connect to itself using SSH in the same way.

Note: You should ensure that your login scripts (.cshrc, .profile, .login, etc.) only echo messages from your shell profile if the shell is interactive. WebLogic server uses an ssh command to login and echo the contents of the server.state file. Only the first line of this output is used to determine the server state.

Set the candidate machines for server migration.Each server can have a different set of Candidate machines, or they can all have the same set.

Edit wlscontrol.sh and set the Interface variable to the name of your network interface.

Use High Availability Storage for State Data

The server migration process migrates services, but not the state information associated with work in process at the time of failure.

To ensure high availability, it is critical that such state information remains available to the server instance and the services it hosts after migration. Otherwise, data about the work in process at the time of failure may be lost. State information maintained by a migratable server, such as the data contained in transaction logs, should be stored in a shared storage system that is accessible to any potential machine to which a failed migratable server might be migrated. For highest reliability, use a shared storage solution that is itself highly available—for example, a storage area network (SAN).

In addition, the lease table, described in the following sections, which is used to track the health and liveness of migratable servers should also stored in a high availability database.

Server Migration Processes and Communications

The server migration process involves the following WebLogic Server services and resources:

Migratable Servers—You configure one or more clustered Managed Servers that host pinned services as migratable servers.
Cluster Master—One server instance in a cluster that contains migratable servers acts a the cluster master and orchestrates the process of automatic server migration, in the event of failure. Any Managed Server in a cluster can server as the cluster master, whether it hosts pinned services or not.
Target Machines—A set of machines that are designated as allowable or preferred hosts for migratable servers.
Node Manager—Node Manager is used by the Administration Server or a stand-alone Node Manager client, to start and stop migratable servers, and is invoked by the cluster master to shutdown and restart migratable servers, as necessary.

For background information about Node Manager and how it fits into a WebLogic Server environment, see Using Node Manager to Control WebLogic Server in Configuring WebLogic Server Environments.

Lease table—You configure a database table in which migratable servers persist their state, and which the cluster master monitors to verify the health and liveness migratable servers.
Administration Server—Used to configure migratable servers and target machines, to obtain the runtime state of migratable servers, and to orchestrate the manual migration process.

The sections that follow describe key processes in a cluster that contains migratable servers:

Startup Process in a Cluster with Migratable Servers

Figure 6-10, Startup of Cluster With Migratable Servers, on page 6-37 illustrates the processing and communications that occur during startup of a cluster that contains migratable servers.

The example cluster contains two Managed Servers, both of which are migratable. The Administration Server and the two Managed Servers each run on different machines. A fourth machine is available as a backup—in the event that one of the migratable servers fails. Node Manager running on the backup machine and on each machine with a running migratable server.

Figure 6-10 Startup of Cluster With Migratable Servers

Startup of Cluster With Migratable Servers

These are the key steps that occur during startup of the cluster illustrated in Figure 6-10:

The administrator starts up the cluster.

The Administration Server invokes Node Manager on Machines B and C to start Managed Servers 1 and 2, respectively. For more information, see Administration Server's Role in Server Migration.

The Node Manager on each machine starts up the Managed Server that runs there. For more information, see Node Manager's Role in Server Migration.

Managed Servers 1 and 2 each start up, and contact the Administration Server for their configuration. For more information, see Migratable Server Behavior in a Cluster.

Managed Servers 1 and 2 cache the configuration they started up.

Managed Servers 1 and 2 each obtain a migratable server lease in the lease table. Because Managed Server 1 starts up first, it also obtains a cluster master lease. For more information, see Cluster Master's Role in Server Migration.

Managed Server 1 and 2 periodically renew their leases in the lease table, proving their health and liveness.

Automatic Migration Process

Figure 6-11, Automatic Migration of a Failed Server, on page 6-39 illustrates the automatic migration process after failure of the machine hosting Managed Server 2.

Figure 6-11 Automatic Migration of a Failed Server

Automatic Migration of a Failed Server

Machine C, which hosts Managed Server 2, fails.

Upon its next periodic review of the lease table, the cluster master detects that Managed Server 2's lease has expired. For more information, see Cluster Master's Role in Server Migration.

The cluster master tries to contact Node Manager on Machine C to restart Managed Server 2, but fails, because Machine C is unreachable.

Note: If the Managed Server 2's lease had expired because it was hung, and Machine C was reachable, the cluster master would use Node Manager to restart Managed Server 2 on Machine C.

The cluster master contacts Node Manager on Machine D, which is configured as an available host for migratable servers in the cluster.

Node Manager on Machine D starts Managed Server 2. For more information, see Node Manager's Role in Server Migration.

Managed Server 2 starts up and contacts the Administration Server to obtain its configuration.

Managed Server 2 caches the configuration it started up with.

Managed Server 2 obtains a migratable server lease.

During migration, the clients of the Managed Server that is migrating may experience a brief interruption in service; it may be necessary to reconnect. On Solaris and Linux operating systems, this can be done using ifconfig command. The clients of a migrated server do not need to know the particular machine to which it has migrated.

When a machine from a server that was migrated becomes available again, the reversal of the migration process—migrating the server instance back to its original host machine—is known as failback. WebLogic Server does not automate the process of failback. An administrator can accomplish failback by manually restoring the server instance to its original host.

Manual Migration Process

Figure 6-12, Manual Server Migration, on page 6-41 illustrates what happens when an administrator manually migrates a migratable server.

Figure 6-12 Manual Server Migration

Manual Server Migration

An administrator uses the Administration Console to initiate the migration of Managed Server 2 from Machine C to Machine B.

The Administration Server contacts Node Manager on Machine C. For more information, see Administration Server's Role in Server Migration.

Node Manager on Machine C stops Managed Server 2.

Managed Server 2 removes its row from the lease table.

The Administration Server invokes Node Manager on Machine B.

Node Manager on Machine B starts Managed Server 2.

Managed Server 2 obtains its configuration from the Administration Server.

Managed Server 2 caches the configuration it started up with.

Managed Server 2 adds a row to the lease table.

Administration Server's Role in Server Migration

In a cluster that contains migratable servers, the Administration Server:

Invokes Node Manager, on each machine that hosts cluster members, to start up the migratable serves. This is a prerequisite for server migratability—if a server instance was not initially started by Node Manager, it cannot be migrated.
Invokes Node Manager on each machine involved in a manual migration process to stop and start the migratable server.
Invokes Node Manager on each machine that hosts cluster members to stop server instances during a normal shutdown. This is a prerequisite for server migratability—if a server instance is shutdown directly, without using Node Manager, when the cluster master detects that the server instance is not running, it will call Node Manager to restart it.

In addition, the Administration Server provides its regular domain management functionality, persisting configuration updates issued by an administrator, and providing a run-time view of the domain, including the migratable servers it contains.

Migratable Server Behavior in a Cluster

A migratable server is a clustered Managed Server that has been configured as migratable. These are the key behaviors of a migratable server:

Upon startup and restart by Node Manager, a migratable server adds a row to the lease table. The row for a migratable server contains a timestamp, and the machine where it is running.

Note: There are two tables used in server migration. One table maintains leasing information, while the other keeps track of the machine-server association. For more information on the schema of these tables, see Configuring Server Migration.

When a migratable server adds a row to the database as a result of startup, it tries to take on the role of cluster master, and succeeds if it is the first server instance to join the cluster.
Periodically, the server renews its "lease" by updating the timestamp in the lease table.

By default a migratable server renews its lease every 30,000 milliseconds—the product of two configurable ServerMBean properties:

HealthCheckIntervalMillis, which by default is 10,000.
HealthCheckPeriodsUntilFencing, which by default is 3.

If a migratable server fails to reach the lease table and renew its lease before the lease expires, it terminates as quickly as possible using a Java System.exit—in this case, the lease table still contains a row for that server instance. For information about how this relates to automatic migration, see Cluster Master's Role in Server Migration.
Upon a suspend, a shutdown, or a force shutdown by its local Node Manager process, a migratable server removes the row that it created at startup in the lease table.
During operation, a migratable server listens for heartbeats from the cluster master. When it detects that cluster master is not sending heartbeats, it attempts to take over the role of cluster master, and succeeds if no other server instance has claimed that role.

Node Manager's Role in Server Migration

The use of Node Manager is required for server migration—it must run on each machine that hosts, or is intended to host.

Node Manager supports server migration in these ways:

Node Manager must be used for initial startup of migratable servers.

When you initiate the startup of a Managed Server from the Administration Console, the Administration Server uses Node Manager to start up the server instance. You can also invoke Node Manager to start the server instance using the stand-alone Node Manager client, however, the Administration Server must be available so that the Managed Server can obtain its configuration.

Note: Migration of a server instance that not initially started with Node Manager will fail.

Node Manager must be used for suspend, shutdown, or force shutdown of migratable servers.
Node Manager tries to restart a migratable server whose lease has expired on the machine where it was running at the time of failure.

Node Manager performs the steps in the server migrate process by running customizable shell scripts, provided with WebLogic Server, that start, restart and stop servers; migrate IP addresses; and mount and unmount disks. The scripts are available for Solaris and Linux.

In an automatic migration, the cluster master invokes Node Manager to perform the migration.
In a manual migration, the Administration Server invokes Node Manager to perform the migration.

Cluster Master's Role in Server Migration

In a cluster that contains migratable servers, one server instance acts as the cluster master—whose role is to orchestrate the server migration process. Any server instance in the cluster can serve as the cluster master. When you start a cluster that contains migratable servers, the first server to join the cluster becomes the cluster master and starts up the cluster manager service. If a cluster does not include at least one migratable server, it does not require a cluster manager, and the cluster master service does not start up. In the absence of a cluster master, migratable servers can continue to operate, but server migration is not possible. These are the key functions of the cluster master:

Issue periodic heartbeats to the other servers in the cluster.
Periodically read the lease table to verify that each migratable server has a current lease. An expired lease indicates to the Cluster Master that the migratable server should be restarted.
Upon determining that a migratable server's lease is expired, wait for period specified by the FencingGracePeriodMillis on the ClusterMBean, and then try to invoke the Node Manager process on the machine that hosts the migratable server whose lease is expired, to restart the migratable server.
If unable to restart a migratable server whose lease has expired on its current machine, the cluster master selects a target machine in this fashion:

If you have configured a list of preferred destination machines for the migratable server, the cluster master chooses a machine on that list, in the order the machines are listed.
Otherwise, the cluster master chooses a machine on the list of those configured as available for hosting migratable servers in the cluster.

A list of machines that can host migratable servers can be configured at two levels: for the cluster as a whole, and for an individual migratable server. You can define a machine list at both levels. You must define a machine list at least one level.

To accomplish the migration of a server instance to a new machine, the cluster master invokes the Node Manager process on the target machine to create a process for server instance,.

The time required to perform the migration depends on the server configuration and startup time.

The maximum time taken for cluster master to restart the migratable server is (HealthCheckPeriodsUntilFencing * HealthCheckIntervalMillis) + FencingGracePeriodMillis.
The total time before the server becomes available for client requests depends on the server startup time and the application deployment time.

Service Migration

WebLogic Server supports service-level migration for JMS servers and the JTA transaction recovery service. This document refers to these services as migratable services, because you can move them from one server to another within a cluster. Note that JMS also offers improved service continuity in the event of a single Weblogic Server failure by enabling you to configure multiple physical destinations (queues and topics) as part of a single distributed destination set.

WebLogic Server also supports migration at the server level—a complete server instance, and all of the services it hosts can be migrated to another machine, either automatically, or manually. This feature is described in Server Migration.

Note: The leasing tables should be stored in a highly available database. Migratable servers are only as reliable as the database used to store the leasing table.

In a WebLogic Server cluster, most services are deployed homogeneously on all the server instances in the cluster, enabling transparent failover from one server to another. In contrast, singleton services, such as JMS and the JTA transaction recovery system, run only on one server in the cluster at any given time.

WebLogic Server allows the administrator to migrate singleton services from one server to another in the cluster, either in response to a server failure or as part of regularly-scheduled maintenance. This capability improves the availability of singleton services in a cluster, because those services can be quickly restarted on a redundant server should the host server fail.

How Migration of Pinned Services Works

Clients access a migratable service in a cluster using a migration-aware RMI stub. The RMI stub keeps track of which server currently hosts the pinned service, and it directs client requests accordingly. For example, when a client first accesses a pinned service, the stub directs the client request to the server instance in the cluster that currently hosts the service. If the service migrates to a different WebLogic Server between subsequent client requests, the stub transparently redirects the request to the correct target server.

WebLogic Server implements a migration-aware RMI stub for JMS servers and the JTA transaction recovery service when those services reside in a cluster and are configured for migration.

Migrating a Service When Currently Active Host is Unavailable

There are special considerations when you migrate a service from a server instance that has crashed or is unavailable to the Administration Server. If the Administration Server cannot reach the previously active host of the service at the time you perform the migration, that Managed Server's local configuration information will not be updated to reflect that it is no longer the active host for the service. In this situation, you must purge the unreachable Managed Server's local configuration cache before starting it again. This prevents the previous active host from re-activating at startup a service that has been migrated to another Managed Server. For more information see Migrating When the Currently Active Host is Unavailable.

Defining Migratable Target Servers in a Cluster

By default, WebLogic Server can migrate the JTA transaction recovery service or a JMS server to any other server in the cluster. You can optionally configure a list of servers in the cluster that can potentially host a pinned service. This list of servers is referred to as a migratable target, and it controls the servers to which you can migrate a service. In the case of JMS, the migratable target also defines the list of servers to which you can deploy a JMS server.

For example, the following figure shows a cluster of four servers. Servers A and B are configured as the migratable target for a JMS server in the cluster.

Figure 6-13 Migratable Target in Cluster

Migratable Target in Cluster

In the above example, the migratable target allows the administrator to migrate the pinned JMS server only from Server A to Server B, or vice versa. Similarly, when deploying the JMS server to the cluster, the administrator selects either Server A or B as the deployment target to enable migration for the service. (If the administrator does not use a migratable target, the JMS server can be deployed or migrated to any available server in the cluster.)

WebLogic Server enables you to create separate migratable targets for the JTA transaction recovery service and JMS servers. This allows you to always keep each service running on a different server in the cluster, if necessary. Conversely, you can configure the same selection of servers as the migratable target for both JTA and JMS, to ensure that the services remain co-located on the same server in the cluster.

Failover and JDBC Connections

JDBC is a highly stateful client-DBMS protocol, in which the DBMS connection and transactional state are tied directly to the socket between the DBMS process and the client (driver). For this reason, failover of a connection is not supported. If a WebLogic Server instance dies, any JDBC connections that it managed will die, and the DBMS(s) will roll back any transactions that were under way. Any applications affected will have to restart their current transactions from the beginning. All JDBC objects associated with dead connections will also be defunct. Clustered JDBC eases the reconnection process: the cluster-aware nature of WebLogic data sources in external client applications allow a client to request another connection from them if the server instance that was hosting the previous connection fails.

If you have replicated, synchronized database instances, you can use a JDBC multi data source to support database failover. In such an environment, if a client cannot obtain a connection from one data source in the multi data source because the data source doesn't exist or because database connectivity from the data source is down, WebLogic Server will attempt to obtain a connection from the next data source in the list of data sources.

For instructions on clustering JDBC objects, see Configure Clustered JDBC.

Note: Any data source assigned to a multi data source must be configured to test its connections at reserve time. This is the only way a pool can verify it has a good connection, and the only way a multi data source can know when to fail over to the next pool on its list.