A clustered Oracle WebLogic Integration application provides scalability and high availability. A highly available deployment has recovery provisions in the event of hardware or network failures, and provides for the transfer of control to a backup component when a failure occurs.
For recommendations and database-specific requirements for configuring highly available Oracle WebLogic Integration applications, see Maintaining Availability.
For a cluster to provide high availability, it must be able to recover from service failures. Oracle WebLogic Server supports failover for replicated HTTP session states, clustered objects, and services pinned to servers in a clustered environment.
For information about how Oracle WebLogic Server handles such failover scenarios, see Communications in a Cluster in Using WebLogic Server Clusters.
The basic components of a highly available Oracle WebLogic Integration environment include the following:
Note: For information about availability and performance considerations associated with the various types of JDBC drivers, see Configure JDBC Data Sources in Oracle WebLogic Server Administration Console Online Help.
A full discussion of how to plan the network topology of your clustered system is beyond the scope of this section. For information about how to fully utilize load balancing and failover features for your Web application by organizing one or more Oracle WebLogic Server clusters in relation to load balancers, firewalls, and Web servers, see Cluster Architectures in Using WebLogic Server Clusters.
For a simplified view of a cluster, showing the HTTP load balancer, highly available database and multi-ported file system, see the following figure.
The default Oracle WebLogic Integration domain configuration uses a JDBC store for JMS servers. A file store can be used for JMS persistence in cases where a highly available multi-ported disk can be shared between managed servers, as shown in the preceding figure. A file store typically performs better than a JDBC store.
For information about configuring JMS file stores, see Oracle WebLogic Server Administration Console Online Help.
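As a rough sketch of such a configuration, a file store on the shared disk might be defined in the domain's config.xml along the following lines. All names, targets, and the directory path here are hypothetical examples, and exact element names can vary by WebLogic Server release; use the Administration Console or the product documentation for the authoritative settings.

```xml
<!-- Hypothetical config.xml fragment: store name, server name,
     target, and path are illustrative only. -->
<file-store>
  <name>WLIFileStore</name>
  <!-- Directory on the highly available, multi-ported shared disk -->
  <directory>/shared/wli/filestore</directory>
  <target>ManagedServer1</target>
</file-store>
<jms-server>
  <name>WLIJMSServer</name>
  <persistent-store>WLIFileStore</persistent-store>
  <target>ManagedServer1</target>
</jms-server>
```

Because the directory resides on a disk that both managed servers can reach, a migrated JMS server can reopen the same store and recover persistent messages.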
A server can fail due to either software or hardware problems. The following sections describe the processes that occur automatically in each case and the manual steps that must be taken in these situations.
If a software fault occurs, the Node Manager (if configured to do so) will restart the Oracle WebLogic Server. For information about the Node Manager, see Overview of Node Manager. For information about the steps to take to prepare for recovering a secure installation, see Avoiding and Recovering From Server Failure.
If a hardware fault occurs, the physical machine may need to be repaired and could be out of operation for an extended period. In this case, the following events occur:
In the case of a failure of extended duration, it may be necessary to migrate the failed server's services to another, operational managed server. When manually migrating a failed server to another managed server:
For detailed information regarding Oracle WebLogic Server migration, see the following topics in the Oracle WebLogic Server documentation set:
In addition to the high availability features of Oracle WebLogic Server, Oracle WebLogic Integration has failure and recovery characteristics that are based on the implementation and configuration of your Oracle WebLogic Integration solution.
For more information about Oracle WebLogic Integration failure and recovery topics, see WebLogic Integration Application Recovery in the WebLogic Integration Solutions Best Practices FAQ.
RosettaNet and ebXML handle failure and recovery differently because of differences in the business protocols. However, both protocols send messages that fail to be delivered after the configured number of retries to wli.b2b.failedmessage.queue. If you require additional processing of failed messages, you can implement custom message listeners for this queue.
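A custom listener for this queue would normally implement javax.jms.MessageListener and be registered against the queue via JNDI. To keep the sketch below self-contained and compilable without the WebLogic client libraries, the Message and MessageListener types are minimal stand-ins for the corresponding javax.jms interfaces, and the "ConversationId" property name is a hypothetical example; inspect the headers your installation actually places on failed messages.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for javax.jms.Message (illustration only).
interface Message {
    String getStringProperty(String name);
}

// Minimal stand-in for javax.jms.MessageListener (illustration only).
interface MessageListener {
    void onMessage(Message message);
}

/**
 * Example listener for wli.b2b.failedmessage.queue: logs each
 * undeliverable B2B message and records it for later reprocessing.
 */
class FailedMessageListener implements MessageListener {
    private final List<String> failedConversations = new ArrayList<>();

    @Override
    public void onMessage(Message message) {
        // "ConversationId" is a hypothetical property name used
        // here purely for illustration.
        String conversationId = message.getStringProperty("ConversationId");
        System.out.println("Delivery failed for conversation: " + conversationId);
        failedConversations.add(conversationId);
    }

    public List<String> getFailedConversations() {
        return failedConversations;
    }
}
```

In a real deployment you would attach such a listener to the queue through a JMS session (or package it as a message-driven bean) rather than invoking onMessage directly.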
When message delivery fails in the case of RosettaNet messages, the Oracle WebLogic Integration protocol layer does not retry messages. Instead, it returns an HTTP status code to the workflow layer. RosettaNet workflows are usually designed to handle retries.
The Oracle WebLogic Integration Administration Console enables you to specify retry intervals, retry counts, and process timeouts for various trading partners based on the PIP(s) being used. For example, RosettaNet typically supports three retries at two-hour intervals with an overall 24-hour limit on the life of the actual PIP exchange. For information about changing these settings, see “Viewing and Changing Bindings” in Trading Partner Management in Using the WebLogic Integration Administration Console.
If one instance of Oracle WebLogic Integration sends a message to another instance and the destination instance has failed, you may see one or more error messages followed by a stack trace in the server console.
You can specify ebXML message retries using the Oracle WebLogic Integration Administration Console, the Trading Partner Management Bulk Loader, or third-party Trading Partner Management message beans. If you set ebXML Delivery Semantics to OnceAndOnlyOnce or AtLeastOnce, messages are retried according to the values you specify for Retry Count and Retry Interval. For information about using the Oracle WebLogic Integration Administration Console to set ebXML message retries, see “Defining Protocol Bindings” in Trading Partner Management in Using the WebLogic Integration Administration Console.
For ebXML processes, set the action mode value to non-default to guarantee recovery and high availability. For information about setting the action mode, see “ebXML Business Processes” in Introducing ebXML Solutions in Introducing Trading Partner Integration.