Oracle® Application Server High Availability Guide
10g Release 2 (10.1.2)
B14003-05

3 Middle-tier High Availability

This chapter describes the solutions available to protect the Oracle Application Server middle tier from failures. It covers middle-tier redundancy, highly available configuration management, and backup and recovery considerations.

3.1 Redundancy

The Oracle Application Server middle tier can be configured to provide two types of redundancy: active-active (Section 3.1.1) and active-passive (Section 3.1.2).

3.1.1 Active-Active

An Oracle Application Server middle tier can be made redundant in an active-active configuration with OracleAS Cluster (Middle-Tier). An OracleAS Cluster (Middle-Tier) is a set of middle-tier instances configured to act in an active-active configuration to deliver greater scalability and availability than a single instance. Using OracleAS Cluster (Middle-Tier) removes the single point of failure that a single instance poses. While a single Oracle Application Server instance leverages the resources of a single host, a cluster can span multiple hosts, distributing application execution over a greater number of CPUs. A single Oracle Application Server instance is vulnerable to the failure of its host and operating system, but a cluster continues to function despite the loss of an operating system or a host, hiding any such failure from clients.

Figure 3-1 presents the various sub-tiers of the Oracle Application Server middle tier in a redundant active-active configuration. Each sub-tier is configured with redundant processes, so that the failure of any process is handled by the sub-tier above it and does not affect incoming requests from clients.

Figure 3-1 Overall Active-Active Architecture for Oracle Application Server Middle Tier


The following sub-sections describe features that characterize each sub-tier's active-active configuration:

3.1.1.1 OracleAS Web Cache

You can configure multiple instances of OracleAS Web Cache to run as independent caches, with no interaction with one another. However, to increase the availability and scalability of your Web cache, you can configure multiple OracleAS Web Cache instances to run as members of a cache cluster, called OracleAS Cluster (Web Cache). A cache cluster is a loosely coupled collection of cooperating OracleAS Web Cache instances working together to provide a single logical cache.

Physically, the cache can be distributed over several nodes. If one node fails, a remaining node in the same cluster can fulfill the requests serviced by the failed node. The failure is detected by the remaining nodes in the cluster, which take over ownership of the cacheable content of the failed member. The load balancing mechanism in front of the OracleAS Web Cache cluster (for example, a hardware load balancing appliance) redirects requests to the live OracleAS Web Cache nodes.

OracleAS Web Cache clusters also add to the availability of Oracle Application Server instances. By caching static and dynamic content in front of the Oracle Application Server instances, many requests can be serviced by OracleAS Web Cache, reducing the number of requests that must be fulfilled by Oracle Application Server instances, particularly by Oracle HTTP Server. The load and stress on the Oracle Application Server instances is reduced, thereby increasing the availability of the components in those instances.

OracleAS Web Cache can also perform a stateless or stateful load balancing role for Oracle HTTP Servers. Load balancing is done based on the percentage of the available capacity of each Oracle HTTP Server, or, in other words, the weighted available capacity of each Oracle HTTP Server. If the weighted available capacity is equal for several Oracle HTTP Servers, OracleAS Web Cache uses round robin to distribute the load. Refer to Oracle Application Server Web Cache Administrator's Guide for the formula to calculate weighted available capacity.

Table 3-1 provides a summary of the high availability characteristics of OracleAS Web Cache.

Table 3-1 High Availability Characteristics for OracleAS Web Cache in Active-Active Configurations

  • Protection from Node Failure: An OracleAS Web Cache cluster protects against a single point of failure. An external load balancer should be deployed in front of the cluster to route requests to live OracleAS Web Cache nodes.

  • Protection from Service Failure: In an OracleAS Web Cache cluster, each cluster member is pinged at a specific URL to ensure that the URL is still serviceable.

  • Protection from Process Failure: OPMN monitors OracleAS Web Cache processes and restarts them upon process failure.

  • Automatic Re-routing: OracleAS Web Cache members in a cluster ping each other to verify that peer members are alive or have failed. External load balancers provide failover capabilities for requests routed to OracleAS Web Cache components.

  • State Replication: OracleAS Web Cache clustering manages the cached content that needs to be transferred between OracleAS Web Cache nodes.

  • Configuration Cloning: An OracleAS Web Cache cluster maintains a uniform configuration across the cluster.


In the case of failure of an Oracle HTTP Server instance, OracleAS Web Cache redistributes the load to the remaining Oracle HTTP Servers and polls the failed server intermittently until it comes back online. Thereafter, OracleAS Web Cache recalculates the load distribution with the revived Oracle HTTP Server in scope.

3.1.1.2 Oracle HTTP Server

Oracle HTTP Server and OracleAS Web Cache handle HTTP and HTTPS requests. Each HTTP request is met by a response from Oracle HTTP Server or, if the requested content is cached, from OracleAS Web Cache.

Oracle HTTP Server routes a request to different plug-in modules depending on the type of request received. These modules in turn delegate the request to different types of processes. The most common modules are mod_oc4j for J2EE applications and mod_plsql for PL/SQL applications. mod_oc4j delegates requests to OC4J processes. mod_plsql delegates requests to database processes. For all these types of requests, no state is required to be maintained in the Oracle HTTP Server processes.

This section covers the following topics:

3.1.1.2.1 Oracle HTTP Server High Availability Summary

Table 3-2 summarizes some of the Oracle Application Server high availability features for Oracle HTTP Server.

Table 3-2 High Availability Characteristics for Oracle HTTP Server

  • Protection from Node Failure: OracleAS Cluster protects against a single point of failure. A load balancer should be deployed in front of the Oracle HTTP Server instances; this can be an external load balancer or OracleAS Web Cache.

  • Protection from Service Failure: The load balancer or OracleAS Web Cache in front of Oracle HTTP Server sends requests to another Oracle HTTP Server if the first one does not respond or is deemed failed through URL pings. The load balancer can be either OracleAS Web Cache or a hardware appliance.

  • Protection from Process Failure: OPMN monitors Oracle HTTP Server processes and restarts them upon process failure. Each Oracle HTTP Server is also notified by OPMN when another Oracle HTTP Server process in the OracleAS Cluster fails.

  • Automatic Re-routing: The load balancer or OracleAS Web Cache in front of Oracle HTTP Server automatically re-routes requests to another Oracle HTTP Server if the first does not respond.

  • State Replication: None.

  • Configuration Cloning: OracleAS Cluster allows configuration to be replicated to the other Oracle HTTP Servers in the cluster through DCM.


3.1.1.2.2 OC4J Load Balancing Using mod_oc4j

The mod_oc4j Oracle HTTP Server module provides routing for HTTP requests that are handled by OC4J. Whenever a request is received for a URL that matches one of the mount points specified in mod_oc4j.conf, the request is routed to one of the available destinations specified for that URL. A destination can be a single OC4J process, or a set of OC4J instances. If an OC4J process fails, OPMN detects the failure and mod_oc4j does not send requests to the failed OC4J process until the OC4J process is restarted.

Using mod_oc4j configuration options, you can specify different load balancing routing algorithms depending on the type and complexity of routing you need. Stateless requests are routed to any destination available based on the algorithm specified in mod_oc4j.conf. Stateful HTTP requests are forwarded to the OC4J process that served the previous request using session identifiers, unless mod_oc4j determines through communication with OPMN that the process is not available. In this case, mod_oc4j forwards the request to an available OC4J process following the specified load balancing protocol.
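For illustration, both the mount points and the routing algorithm are set in mod_oc4j.conf. The following is a minimal sketch; verify the Oc4jMount and Oc4jSelectMethod directive syntax against the mod_oc4j documentation for your release:

    # Route requests under /myapp to the OC4J instance named OC4J_home.
    Oc4jMount /myapp/* OC4J_home

    # Load balancing algorithm used to select among available OC4J processes.
    Oc4jSelectMethod roundrobin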

Table 3-3 summarizes the routing styles that mod_oc4j provides. For each routing style, Table 3-3 lists the different algorithms that you can configure to modify the routing behavior. These configuration options determine which OC4J process receives each incoming HTTP request.


Table 3-3 mod_oc4j Routing Algorithms Summary

  • Round Robin: Using the simple round robin configuration, all OC4J processes, remote and local to the application server instance running the Oracle HTTP Server, are placed in an ordered list. Oracle HTTP Server chooses an OC4J process at random for the first request; each subsequent request is forwarded to the next OC4J process in round robin style. The round robin configuration supports local affinity and weighted routing options.

  • Random: Using the simple random configuration, all OC4J processes, remote and local to the application server instance running the Oracle HTTP Server, are placed in an ordered list. For every request, Oracle HTTP Server chooses an OC4J process at random and forwards the request to that process. The random configuration supports local affinity and weighted routing options.

  • Metric-Based: Using the metric-based configuration, OC4J processes, remote and local to the application server instance running the Oracle HTTP Server, are placed in an ordered list. The OC4J processes then regularly communicate to Oracle HTTP Server how busy they are, and Oracle HTTP Server uses this information to send requests to the less busy processes. The load on each OC4J node is measured using the runtime performance metrics of its OC4J processes. When no local OC4J processes are available, mod_oc4j routes requests to OC4J processes on other hosts based solely on their performance metrics. The metric-based configuration supports a local affinity option.


OC4J Load Balancing Using Local Affinity and Weighted Routing Options

Using mod_oc4j options, you can select a routing method for routing OC4J requests. If you select either round robin or random routing, you can also use local affinity or weighted routing options. If you select metric-based routing, you can also use the local affinity option.

Using the weighted routing option, a routing weight is associated with the OC4J processes on each node, as configured in mod_oc4j. During request routing, mod_oc4j uses these weights to calculate which OC4J process to assign requests to, so OC4J processes running on different nodes can be assigned different shares of the load.

Using the local affinity option, mod_oc4j keeps two lists of available OC4J processes to handle requests: a local list and a remote list. If processes are available on the local list, requests are assigned locally using random selection or, for metric-based routing, using the metric-based method. If no processes are available on the local list, mod_oc4j selects processes from the remote list using the configured method: random, round robin, or metric-based.
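A hedged sketch of these options in mod_oc4j.conf (the directive names follow the mod_oc4j documentation; the node names and weights are illustrative):

    # Weighted round robin: OC4J processes on node1 receive four times
    # as many requests as those on node2.
    Oc4jSelectMethod roundrobin:weighted
    Oc4jRoutingWeight node1.mycompany.com 4
    Oc4jRoutingWeight node2.mycompany.com 1

    # Alternatively, prefer OC4J processes local to this Oracle HTTP Server:
    # Oc4jSelectMethod roundrobin:local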

Choosing a mod_oc4j Routing Algorithm

Table 3-3 summarizes the available routing options. To select a routing algorithm to configure with mod_oc4j, you need to consider the type of environment where Oracle HTTP Server runs. Use the following guidelines to help determine which configuration options to use with mod_oc4j:

  • For an Oracle Application Server cluster setup, with multiple identical machines running Oracle HTTP Server and OC4J on the same node, the round robin with local affinity algorithm is preferred. In this configuration, an external router distributes requests to the multiple machines running Oracle HTTP Server and OC4J. Oracle HTTP Server gains little by using mod_oc4j to route requests to other machines, except in the extreme case that all OC4J processes on the same machine are unavailable.

  • For a tiered deployment, where one tier of machines contains Oracle HTTP Server and another contains OC4J instances that handle requests, the preferred algorithms are simple round robin and simple metric-based. To determine which of the two is best in a specific setup, you may need to experiment with each and compare the results, because the results depend on system behavior and the distribution of incoming requests.

  • For a heterogeneous deployment, where the different application server instances run on nodes that have different characteristics, the weighted round robin algorithm is preferred. Tuning the number of OC4J processes running on each application server instance may allow you to achieve the maximum benefit. For example, a machine with a weight of 4 gets four times as many requests as a machine with a weight of 1, but the machine with a weight of 4 need not be running four times as many OC4J processes.

  • Metric-based load balancing is useful when only a few metrics, for example CPU usage or the number of database connections, dominate the performance of an application.




3.1.1.2.3 Database Load Balancing with mod_plsql

mod_plsql maintains a pool of connections to the database and reuses established database connections for subsequent requests. If there is no response from a database connection in a connection pool, mod_plsql detects this, discards the dead connection, and creates a fresh database connection for subsequent requests.

The dead database connection detection feature of mod_plsql eliminates the occurrence of errors when a database node or instance goes down. This feature is also extremely useful in high availability configurations like Real Application Clusters. If a node in a Real Application Clusters database fails, mod_plsql detects this and immediately starts servicing requests using the other Real Application Clusters nodes. mod_plsql provides different configuration options to satisfy maximum protection or maximum performance needs. By default, mod_plsql tests all pooled database connections which were created prior to the detection of a failure, but it also allows constant validation of all pooled database connections prior to issuing a request.
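For example, connection validation behavior is controlled in the DAD configuration for mod_plsql. The parameter name and values below are assumptions based on the mod_plsql configuration reference for this release; verify them before use:

    # Default: test only the pooled connections created before a detected failure.
    PlsqlConnectionValidation  Automatic

    # Maximum protection: validate every pooled connection before issuing a
    # request, at the cost of an extra round trip.
    # PlsqlConnectionValidation  AlwaysValidate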

3.1.1.3 OC4J

The OC4J tier consists of the Oracle Application Server implementation of the J2EE container. This section discusses how the various OC4J components can be made highly available and consists of the following topics:

3.1.1.3.1 OracleAS Cluster (OC4J)

Oracle Application Server provides several strategies for ensuring high availability with OC4J instances, both within an application server instance and across a cluster that includes multiple application server instances.

Besides the high availability features described in this section, other Oracle Application Server features enable OC4J processes to be highly available, including the load balancing feature in Oracle HTTP Server and the Oracle Process Manager and Notification Server system that automatically monitors and restarts processes.

The following sections explain the strategies for ensuring high availability for stateful applications in OC4J instances. Overall, there are two strategies:

Web Application Session State Replication with OracleAS Cluster (OC4J)

When a stateful Web application is deployed to OC4J, multiple HTTP requests from the same client may need to access the application. However, if the OC4J process running the application fails, the state associated with a client request may be lost. There are two ways to guard against such failures:

  • State-safe applications save their state in a database or other persistent storage system, avoiding the loss of state when the server goes down. There is, however, a performance cost for continually writing the application state to persistent storage.


    Note:

    Saving application state to persistent storage is the application developer's responsibility.

  • Stateful applications can use OC4J session state replication, with OracleAS Cluster (OC4J), to automatically replicate the session state across multiple processes in an application server instance and, in a cluster, across multiple application server instances, which may run on different nodes.
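As a minimal illustration of the second approach, a Web application that participates in session state replication is marked distributable in its standard web.xml deployment descriptor (island and multicast replication settings are configured separately for the OC4J instance):

    <!-- web.xml: mark the Web application as distributable so that its
         HTTP session state can be replicated across OC4J processes. -->
    <web-app>
      <distributable/>
      <!-- servlets, mappings, and other elements as usual -->
    </web-app>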

An OC4J instance is the entity to which J2EE applications are deployed and configured. An OC4J instance is characterized by a specific set of binaries and configuration files. Several OC4J processes can be started for each OC4J instance; the OC4J process is what executes the J2EE applications for the OC4J instance. Within the application server instance, you can configure multiple OC4J instances, each with its own number of OC4J processes. This structure simplifies configuration management and application deployment for separate OC4J processes in a cluster.

OC4J processes can be grouped into OracleAS Cluster (OC4J) to support session state replication for the high availability of Web applications. Using an OracleAS Cluster (OC4J) together with mod_oc4j request routing provides stateful failover in the event of a software or hardware problem. For example, if an OC4J process that is part of an OracleAS Cluster (OC4J) fails, mod_oc4j is notified of the failure by OPMN and routes requests to another OC4J process in the same cluster.

Each OC4J instance in a cluster has the following features:

  • The configuration of the OC4J instance is valid for one or more OC4J processes. This way, you can duplicate the configuration for multiple OC4J processes by managing them through the OC4J instance construct. When you modify the cluster-wide configuration within the OC4J instance, the modifications apply to all of its OC4J processes.

  • Each OC4J instance can be configured with one or more OC4J processes.

  • When you deploy an application to an OC4J instance, all OC4J processes share the same application properties and configuration defined in the OC4J instance. The OC4J instance is also responsible for replicating the state of its applications.

  • The number of OC4J processes is specific to each OC4J instance and must be configured for each application server instance in the cluster. This provides the flexibility to tune the configuration to the specific hardware capabilities of each host. By default, each OC4J instance is instantiated with a single OC4J process.

Web Application Session State Replication Protecting Against Software Problems

To guard against software problems, such as OC4J process failure or hang, you can configure an OC4J instance to run multiple OC4J processes in the same OracleAS Cluster (OC4J). The processes in the OracleAS Cluster (OC4J) communicate their session state with each other. This configuration provides failover and high availability by replicating state across multiple OC4J processes running on an application server instance.

In the event of a failure, Oracle HTTP Server forwards requests to an active (alive) OC4J process within the OracleAS Cluster (OC4J). The Web application state for the client is preserved and the client does not notice any loss of service.

Figure 3-2 shows this type of software failure within an application server instance and the failover to the surviving process.

Figure 3-2 Web Application Session State Failover Within an OracleAS Cluster (OC4J) in an OC4J instance


Web Application Session State Replication Protecting Against Hardware Problems

To guard against hardware problems, such as the failure of the node where an application server instance runs, you can configure an OracleAS Cluster (OC4J) across application server instances that run on more than one node in an OracleAS Cluster. By configuring an OracleAS Cluster (OC4J) that uses the same name across multiple application server instances, the OC4J processes can share session state information across the OracleAS Cluster (OC4J). When an application server instance fails or becomes unavailable, for example, when its node goes down, Oracle HTTP Server forwards requests to an OC4J process in an available application server instance. Thus, Oracle HTTP Server forwards requests only to active (alive) OC4J processes within the cluster.

In this case, the Web application state for the client is preserved and the client does not notice any irregularity.

Figure 3-3 depicts an OracleAS Cluster (OC4J) configured across two Oracle Application Server instances. This configuration allows for web application session state replication failover within an OracleAS Cluster (OC4J).

Figure 3-3 Web Application Session State Failover Within an OracleAS Cluster (OC4J)


Configuring OracleAS Cluster (OC4J) for Web Application Session State Replication

To protect against software or hardware failure while maintaining state with the least number of OC4J processes, you need to configure at least two OC4J processes in the same cluster. For example, if you have two application server instances, instance 1 and instance 2, you can configure two OC4J processes in the default_island on each application server instance (see the configuration sketch following this list). With this configuration, stateful session applications are protected against hardware and software failures, and the client maintains state if either of the following types of failures occurs:

  • If one of the OC4J processes fails, then the client request is redirected to the other OC4J process in the default_island on the same application server instance. State is preserved and the client does not notice any irregularity.

  • If application server instance 1 terminates abnormally, then the client is redirected to the OC4J process in the default_island on application server instance 2. The state is preserved and the client does not notice any irregularity.
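A minimal sketch of this example in opmn.xml, assuming an OC4J instance named OC4J_home on each application server instance (confirm the element layout against the opmn.xml of your release):

    <!-- On each of the two application server instances: two OC4J
         processes in the default_island of the OC4J_home instance. -->
    <process-type id="OC4J_home" module-id="OC4J">
      <process-set id="default_island" numprocs="2"/>
    </process-type>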

Stateful Session EJB State Replication with OracleAS Cluster (OC4J)

Stateful session EJBs can be configured to provide state replication across the OC4J processes associated with an application server instance or across an OracleAS Cluster. This EJB replication configuration provides high availability for stateful session EJBs by using multiple OC4J processes to run instances of the same stateful session EJB.


Note:

Use of EJB replication with OracleAS Cluster (OC4J-EJB) for high availability is independent of middle-tier OracleAS Clusters and can involve multiple application server instances installed across nodes, whether or not those nodes are part of middle-tier OracleAS Clusters.

OracleAS Clusters (OC4J-EJB) provide high availability for stateful session EJBs. They allow failover of these EJBs across multiple OC4J processes that communicate over the same multicast address. Thus, when stateful session EJBs use replication, they are protected against process and node failures, providing high availability for stateful session EJBs running on Oracle Application Server.

JNDI Namespace Replication

When EJB clustering is enabled, JNDI namespace replication is also enabled between the OC4J instances in a middle-tier OracleAS Cluster. New bindings to the JNDI namespace in one OC4J instance are propagated to the other OC4J instances in the middle-tier OracleAS Cluster. Re-bindings and unbindings are not replicated.

The replication is done outside the scope of each OracleAS Cluster (OC4J). In other words, multiple OracleAS Clusters (OC4J) in an OC4J instance have visibility into the same replicated JNDI namespace.

EJB Client Routing

In EJB client routing, EJB classes take on the routing functionality that mod_oc4j provides between Oracle HTTP Server and servlets/JSPs. Clients invoke EJBs using the Remote Method Invocation (RMI) protocol. The RMI protocol listener is set up in the RMI configuration file, rmi.xml, for each OC4J instance; it is separate from the Web site configuration. EJB clients and the OC4J tools access the OC4J server through a configured RMI port. OPMN designates a range of ports that the RMI listener can use.

When you use the "opmn:ormi://" prefix string in the EJB look up, the client retrieves the assigned RMI port automatically. The load balancing and client request routing is provided by OPMN selecting the different OC4J processes available. The algorithm used for this load balancing is the random algorithm. Multiple OPMN URLs separated by commas can be used for higher availability.


See Also:

The EJB primer section in Oracle Application Server Containers for J2EE User's Guide.

3.1.1.3.2 OC4J Distributed Caching Using Java Object Cache

Oracle Application Server Java Object Cache provides a distributed cache that can serve as a high availability solution for applications deployed to OC4J. The Java Object Cache is an in-process cache of Java objects that can be used on any Java platform by any Java application. It enables applications to share objects across requests and across users, and coordinates the life cycle of the objects across processes.

Java Object Cache enables data replication among OC4J processes even if they do not belong to the same OracleAS Cluster (OC4J), application server instance, or overall Oracle Application Server Cluster.

By using Java Object Cache, performance can be improved because shared Java objects are cached locally, regardless of which application produces them. This also improves availability: if the source for an object becomes unavailable, the locally cached version is still available.
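A brief sketch of the caching pattern described above. The CacheAccess calls shown (defineRegion, getAccess, put, get, close) are assumptions based on the Java Object Cache chapter referenced below; verify the exact signatures and exceptions for your release:

    import oracle.ias.cache.CacheAccess;

    public class ObjectCacheSketch {
        // Cache a shared object locally so later reads do not depend on
        // the original source remaining available.
        public static Object shareObject(Object priceList) throws Exception {
            CacheAccess.defineRegion("pricing");      // one-time region definition
            CacheAccess acc = CacheAccess.getAccess("pricing");
            try {
                acc.put("priceList", priceList);      // cached locally, coordinated across processes
                return acc.get("priceList");          // subsequent reads use the local copy
            } finally {
                acc.close();                          // release the access handle
            }
        }
    }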


See Also:

The Java Object Cache chapter in the Oracle Application Server Web Services Developer's Guide for complete information on using Java Object Cache

3.1.1.3.3 JMS High Availability

Two JMS providers are available with Oracle Application Server. Because their implementations differ, each provider achieves high availability in different ways, as the following sections describe.

Oracle Application Server JMS (OracleAS JMS) is implemented in OC4J. Hence, it utilizes OPMN for process monitoring and restart.

Oracle JMS (OJMS) is implemented through Oracle Streams Advanced Queuing (AQ). It requires the Oracle database and can have active-active availability through the Real Application Clusters database and Transparent Application Failover (TAF) features.

Table 3-4 provides an overview of high availability and configuration characteristics of the two JMS providers. The sections following the table discuss each provider in more detail.

Table 3-4 Comparing High Availability Characteristics of OJMS and OracleAS JMS

  • Process-level High Availability: OJMS uses OPMN (JMS application); OracleAS JMS uses OPMN.

  • Node-level High Availability: OJMS uses Real Application Clusters (AQ, TAF); OracleAS JMS uses OracleAS Cold Failover Cluster (Middle-Tier).

  • Configuration: OJMS requires Real Application Clusters configuration and resource provider configuration; OracleAS JMS requires a dedicated JMS server, jms.xml configuration, and opmn.xml configuration.

  • Message Store: OJMS stores messages in the Real Application Clusters database; OracleAS JMS stores messages in the dedicated JMS server's persistence files.

  • Failover: OJMS fails over to the same or a different machine, depending on the Real Application Clusters setup; OracleAS JMS fails over to the same or a different machine only in an active-passive configuration with OracleAS Cold Failover Cluster (Middle-Tier) (see Section 3.1.2.1, "OracleAS Cold Failover Cluster (Middle-Tier)").



Note:

The Oracle Application Server Containers for J2EE Services Guide provides detailed information and instructions on setting up OracleAS JMS and OJMS to be highly available. High availability for third-party JMS providers is not discussed as it is provider-specific.

The following sections provide details on how each JMS provider achieves high availability.

OracleAS JMS High Availability

High availability for OracleAS JMS can be achieved by grouping multiple instances of OC4J together in one cluster. This cluster is called OracleAS Cluster (OC4J-JMS). OPMN can be used to monitor and restart OC4J processes in the event of process failure.

OracleAS Cluster (OC4J-JMS) provides an environment in which deployed JMS applications can load balance JMS requests across multiple OC4J instances or processes. Redundancy is also achieved because the failure of an OC4J instance with a JMS server does not impact the availability of the JMS service, as long as at least one other OC4J instance with a JMS server is available.

Both the JMS client and the JMS server contain state about each other, which includes information about connections, sessions, and durable subscriptions. Application developers can configure their environment and use a few simple techniques when writing their applications to make them cluster-friendly.

OracleAS Cluster (OC4J-JMS) allows for two configurations:

  • OracleAS JMS Server Distributed Destinations

    This configuration requires multiple OC4J instances. Each instance contains a JMS server, queue destination, and application. There is no inter-process communication between JMS servers, queues, and applications in other OC4J instances. The sender and receiver of each application must be deployed together in an OC4J instance. A message enqueued to the JMS server in one OC4J process can be dequeued only from that OC4J process.

    This configuration has the following advantages:

    • High throughput is achieved because applications and JMS servers are executing within the same JVMs and no inter-process communication is required.

    • There is no single point of failure. As long as one OC4J process is running, requests can be processed.

    • Destination objects can be persistent or in-memory. Persistence is file-based.

    The disadvantage of this configuration is that there is no failover from one JMS server to another.

  • Dedicated OracleAS JMS Server

    In this configuration, only one OC4J instance in an OracleAS Cluster (OC4J-JMS) hosts the dedicated JMS server. The OC4J instance with the JMS server handles all messages. Message ordering is always maintained because there is a single JMS server. All JMS applications use this dedicated server to host their connection factories and destinations, and to service their enqueue and dequeue requests.

    Only one OC4J JVM acts as the dedicated JMS server for all JMS applications within the OracleAS Cluster (OC4J-JMS). The single JVM ensures that other JVMs do not attempt to use the same set of persistent files. The other OC4J instances execute only applications. The single JMS server can be configured by limiting the JMS port range in the opmn.xml file to only one port for the dedicated OC4J instance (see the sketch following this list). The single port value ensures that OPMN always assigns the same port value to the dedicated JMS server. This port value is used to define the connection factory in the jms.xml file that other OC4J instances in the OracleAS Cluster (OC4J-JMS) use to connect to the dedicated JMS server.

    Refer to the JMS chapter in the Oracle Application Server Containers for J2EE Services Guide for more information on how to modify the opmn.xml file for this dedicated JMS server configuration.


    See Also:

    The section "Abnormal Termination" in the Java Message Service chapter of the Oracle Application Server Containers for J2EE Services Guide. This section describes how to manage persistence files when an unexpected failure occurs.

Oracle JMS High Availability

High availability for Oracle JMS (OJMS) can be achieved using a Real Application Clusters database. AQ queues and topics should be available in the Real Application Clusters environment.

Each JMS application in Oracle Application Server uses OC4J resource providers to point to the backend Real Application Clusters database. JMS operations invoked on objects derived from these resource providers are directed to the database.

An OJMS application that uses a Real Application Clusters database must be able to handle database failover scenarios. Two failover scenarios are possible:

  • Real Application Clusters Network Failover

    In the event of the failure of a database instance, a standalone OJMS application running against a Real Application Clusters database must have code to obtain the connection again and to determine whether the connection object is invalid. The code must reestablish the connection if necessary. Use the API method com.evermind.sql.DbUtil.oracleFatalError() to determine whether a connection object is invalid. If it is, a good strategy is to aggressively roll back transactions and re-create the JMS state, such as connections, sessions, and messages, that was lost. Refer to the JMS chapter in the Oracle Application Server Containers for J2EE Services Guide for a code example; a simplified sketch also follows this list.

  • Transparent Application Failover

    For most cases when Transparent Application Failover (TAF) is configured, an OJMS application will not be aware of a failed database instance that it is connected to. Hence, the application code need not perform any tasks to handle the failure.

    However, in some cases, OC4J may throw an ORA error when a failure occurs. OJMS passes these errors to the application as a JMSException with a linked SQL exception. To handle these exceptions, the following can be done:

    • As in the previous point, "Real Application Clusters Network Failover", provide code that uses the method com.evermind.sql.DbUtil.oracleFatalError() to determine whether the error is fatal. If it is, follow the approach outlined in that point. If not, the client can recover by sleeping for a short period of time, then waking up and retrying the last operation.

    • Failback and transient errors caused by an incomplete failover can be recovered from by attempting to use the JMS connection after a short pause. Pausing allows the database failover to complete and the database to reinstate itself.

    • In the case of transaction exceptions, such as "Transaction must roll back" (ORA-25402) or "Transaction status unknown" (ORA-25405), the current operation must be rolled back and all operations past the last commit must be retried. The connection is not usable until the cause of the exception is dealt with. If the retry fails, close and re-create all connections and retry all uncommitted operations.
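The following is a simplified sketch of the recovery pattern described in this list. The exact signature of com.evermind.sql.DbUtil.oracleFatalError() is documented in the Services Guide; it is assumed here to take the linked SQLException and the underlying JDBC connection, and the helper methods are hypothetical placeholders for application-specific logic:

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.jms.JMSException;
    import com.evermind.sql.DbUtil;

    public class OjmsRecoverySketch {
        private Connection jdbcConnection; // JDBC connection underlying the JMS session

        void onJmsFailure(JMSException jmsEx) throws InterruptedException {
            Exception linked = jmsEx.getLinkedException();
            boolean fatal = (linked instanceof SQLException)
                    && DbUtil.oracleFatalError((SQLException) linked, jdbcConnection);
            if (fatal) {
                // Aggressively roll back and re-create lost JMS state:
                // connections, sessions, producers, and consumers.
                rollbackAndRecreateJmsState();
            } else {
                Thread.sleep(5000);      // transient error: back off briefly
                retryLastOperation();    // then retry the last JMS operation
            }
        }

        private void rollbackAndRecreateJmsState() { /* application-specific */ }
        private void retryLastOperation()          { /* application-specific */ }
    }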

Clustering Best Practices

The following are best practice guidelines for working with clustered JMS servers:

  • Minimize JMS client-side state.

    • Perform work in transacted sessions.

    • Save/checkpoint intermediate program state in JMS queues/topics for full recoverability.

    • Do not depend on J2EE application state to be serializable or recoverable across JVM boundaries. Always use transient member variables for JMS objects, and write passivate/activate and serialize/deserialize functions that save and recover JMS state appropriately.

  • Do not use nondurable subscriptions on topics.

    • Nondurable topic subscriptions duplicate messages per active subscriber. Clustering and load balancing creates multiple application instances. If the application creates a nondurable subscriber, it causes the duplication of each message published to the topic. This is either inefficient or semantically invalid.

    • Use only durable subscriptions for topics. Use queues whenever possible.

  • Do not keep durable subscriptions alive for extended periods of time.

    • Only one instance of a durable subscription can be active at any given time. Clustering and load-balancing creates multiple application instances. If the application creates a durable subscription, only one instance of the application in the cluster succeeds. All other instances fail with a JMSException.

    • Create, use, and close a durable subscription in small time/code windows, minimizing the duration when the subscription is active.

    • Write application code that accommodates failure to create a durable subscription due to clustering (when another instance of the application running in the cluster is currently in the same block of code), and program appropriate back-off strategies (see the sketch following this list). Do not always treat the failure to create a durable subscription as a fatal error.
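A minimal sketch of the back-off guideline above: failure to create a durable subscription is treated as contention from another instance in the cluster rather than as a fatal error. The subscription name and retry policy are illustrative:

    import javax.jms.JMSException;
    import javax.jms.Topic;
    import javax.jms.TopicSession;
    import javax.jms.TopicSubscriber;

    public class DurableSubscriberSketch {
        TopicSubscriber subscribeWithBackoff(TopicSession session, Topic topic)
                throws InterruptedException {
            for (int attempt = 1; attempt <= 5; attempt++) {
                try {
                    // Create, use, and close the subscription in a small window.
                    return session.createDurableSubscriber(topic, "orders-sub");
                } catch (JMSException contention) {
                    // Another instance probably holds the subscription;
                    // back off and retry instead of failing.
                    Thread.sleep(2000L * attempt);
                }
            }
            return null; // caller decides how to proceed after repeated contention
        }
    }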

3.1.2 Active-Passive

Active-passive high availability for the middle tier is achieved using a cold failover cluster. This is discussed in the following section.

3.1.2.1 OracleAS Cold Failover Cluster (Middle-Tier)

A two-node OracleAS Cold Failover Cluster (Middle-Tier) can be used to achieve active-passive availability for Oracle Application Server middle-tier components. In an OracleAS Cold Failover Cluster (Middle-Tier), one node is active while the other is passive, on standby. In the event that the active node fails, the standby node is activated, and the middle-tier components continue servicing clients from that node. All middle-tier components are failed over to the new active node. No middle-tier components run on the failed node after the failover.

In the OracleAS Cold Failover Cluster (Middle-Tier) solution, a virtual hostname and a virtual IP are shared between the two nodes (the virtual hostname maps to the virtual IP in their subnet). However, only one node, the active node, can use these virtual settings at any one time. When the active node fails and the standby node is made active, the virtual IP is moved to the new active node. All requests to the virtual IP are now serviced by the new active node.

The OracleAS Cold Failover Cluster (Middle-Tier) can use the same machines as the OracleAS Cold Failover Cluster (Infrastructure) solution. In this scenario, two pairs of virtual hostnames and virtual IPs are used, one pair for the middle-tier cold failover cluster and one pair for the OracleAS Cold Failover Cluster (Infrastructure) solution. Figure 9-10 illustrates such a scenario. In this setup, the middle-tier components can fail over independently from the OracleAS Infrastructure.

You can install the Oracle home for the middle tier on a shared storage (which would give you a single Oracle home) or on the local storage of each node (which would give you two separate Oracle homes). Some guidelines:

  • OracleAS Wireless is not supported on single Oracle home installations. If you need to run OracleAS Wireless, then you need to install the Oracle home locally on each node.

  • For OracleAS JMS file-based persistence, you need to set up a shared disk to store the persistence files. The Oracle home for the middle tier can be on the local storage or on the shared storage.

No shared storage is required for the middle-tier cold failover cluster unless you are using OracleAS JMS file-based persistence. However, for operational reasons, shared storage is highly recommended; otherwise, every administrative change needs to be applied twice, once for each Oracle home.

For local installations, each node must have an identical mount point for the middle-tier software. One installation of the middle-tier software must be done on each node, and both installations must have the same local Oracle home path.


Note:

For instructions on installing the OracleAS Cold Failover Cluster (Middle-Tier), see the Oracle Application Server Installation Guide. For instructions on managing it, see Section 4.5, "Managing OracleAS Cold Failover Cluster (Middle-Tier)".

3.1.2.1.1 Managing Failover

The typical deployment expected for the solution is a two-node hardware cluster with one node running the OracleAS Infrastructure and the other running the Oracle Application Server middle tier. If either node needs to be brought down for hardware or software maintenance, or crashes, the surviving node can be brought online and the OracleAS Infrastructure or Oracle Application Server middle-tier service can be started on it. However, because a typical middle-tier cold failover deployment does not require any shared storage (except when OracleAS JMS file persistence is used), alternate deployments may include two standalone machines on a subnet, each with a local installation of the middle-tier software, and a virtual IP that can fail over between them.

The overall steps for failing over to the standby node are as follows:

  1. Stop the middle-tier service on the current primary node (if the node is still available).

  2. Fail over the virtual IP to the new primary node.

  3. If OracleAS JMS file-based persistence is using a shared disk for the messages, fail over the shared disk to the new primary node.

  4. Start the middle-tier service on the new primary node.

For failover management, two approaches can be employed:

  • Automated failover using a cluster manager facility

    The cluster manager offers services that allow the development of packages to monitor the state of a service. If the service or the node is found to be down, the cluster manager automatically fails the service over from one node to the other. The package can be developed to try restarting the service on a given node before failing over.

  • Manual failover

    For this approach, the failover steps outlined above are executed manually. Because both the detection of the failure and the failover itself are manual, this method may result in a longer period of service unavailability.

3.1.2.1.2 OracleAS JMS in an OracleAS Cold Failover Cluster (Middle-Tier) Environment

OracleAS JMS can be deployed in an active-passive configuration by leveraging the two-node OracleAS Cold Failover Cluster (Middle-Tier) environment. In such an environment, the OC4J instances in the active node provide OracleAS JMS services, while OC4J instances in the passive node are inactive. OracleAS JMS file-based persistence data is stored in a shared disk.

Upon the failure of the active node, the entire middle-tier environment is failed over to the passive node, including the OracleAS JMS services and the shared disk used to persist messages. The OC4J instances in the passive node are started up together with the other processes needed for the middle-tier environment to run. This node is now the new active node, and OracleAS JMS requests are serviced by it from then on.

3.2 Highly Available Middle-tier Configuration Management Concepts

This section describes how configuration management can improve high availability for the middle tier, covering DCM-managed and manually managed OracleAS Clusters.

3.2.1 OracleAS Clusters Managed Using DCM

Distributed Configuration Management (DCM) is a management framework that enables you to manage the configurations of multiple Oracle Application Server instances. To administer an OracleAS Cluster that is managed by DCM, you can use either Application Server Control Console or dcmctl commands to manage and configure common configuration information on one Oracle Application Server instance. DCM then replicates the common configuration information across all Oracle Application Server instances within the OracleAS Cluster. The common configuration information for the cluster is called the cluster-wide configuration.


Note:

There is configuration information that can be configured individually, per Oracle Application Server instance within a cluster (these configuration options are also called instance-specific parameters).

This section covers the following:

3.2.1.1 What is a DCM-Managed OracleAS Cluster?

A DCM-Managed OracleAS Cluster provides distributed configuration information and enables you to configure multiple Oracle Application Server instances together.

The features of a DCM-Managed OracleAS Cluster include:

  • Synchronization of configuration across instances in the DCM-Managed OracleAS Cluster.

  • OC4J distributed application deployment – deploying to one OC4J triggers deployment to all OC4Js.

  • Distributed diagnostic logging – all members of a DCM-Managed OracleAS Cluster log to the same log file repository when the Log Loader is enabled.

  • A shared OC4J island is set up by default. Replication is not enabled automatically for the applications deployed in the cluster: each application needs to be marked as "distributable" in its web.xml file, and multicast replication needs to be enabled in the replication properties for the OC4J instance.

  • Load-balancing – Oracle HTTP Server is automatically configured to share load among all DCM-Managed OracleAS Cluster members.

  • Distributed process control – DCM-Managed OracleAS Cluster membership enables cluster-scoped opmnctl start, stop, and restart commands.

Each application server instance in a DCM-Managed OracleAS Cluster has the same base configuration. The base configuration contains the cluster-wide configuration and excludes instance-specific parameters.

For Oracle Application Server high availability, when a system in an Oracle Application Server cluster is down, DCM is not a single point of failure: it remains available on all the surviving nodes in the cluster.

Using DCM helps reduce deployment and configuration errors in a cluster; these errors could, without using DCM, be a significant cause of system downtime.

Application Server Control Console uses DCM commands to perform configuration and deployment. You can also issue DCM commands using the dcmctl command.

DCM provides the following configuration commands:

  • Create or remove a cluster

  • Add or remove application server instances to or from a cluster

  • Synchronize configuration changes across application server instances
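For illustration, a typical sequence with the dcmctl command-line tool follows. The command names are from the Distributed Configuration Management Administrator's Guide; treat the exact spellings and options as assumptions and verify them for your release:

    # Create a cluster and add the local instance to it.
    dcmctl createCluster -cl mycluster
    dcmctl joinCluster -cl mycluster

    # Synchronize a configuration change across all instances in the cluster.
    dcmctl updateConfig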

Note the following when making configuration changes to a cluster or deploying applications to a cluster:

  • If Application Server Control Console is up and managing the cluster, you can invoke the DCM command-line tool from any host where a clustered application server instance is running. The DCM daemon must be running on each node in the cluster.

  • If Application Server Control Console is not up and managing the cluster, and you want configuration changes to be applied dynamically across the cluster, the DCM daemon must be running on each node in the cluster. To start the DCM daemon, run the DCM command-line tool, dcmctl, on each Oracle Application Server instance in the cluster.

3.2.1.2 Oracle Application Server DCM Configuration Repository Types

Oracle Application Server supports two types of DCM configuration repository: Database-based and File-based. The DCM configuration repository stores configuration information and metadata related to the instances in an OracleAS Farm and, when the OracleAS Farm contains DCM-Managed OracleAS Clusters, stores both the cluster-wide configuration information and the instance-specific parameters for instances in those clusters.

  • An OracleAS Database-based Farm stores repository information and protects configuration information using an Oracle database.

  • An OracleAS File-based Farm stores repository information in the file system. Cluster-wide configuration information and related metadata are stored on the file system of the Oracle Application Server instance that acts as the repository host. Oracle Application Server instances that are part of an OracleAS File-based Farm depend on the repository host to store cluster-wide configuration information.


See Also:

Distributed Configuration Management Administrator's Guide

3.2.2 Manually Managed Oracle Application Server Clusters

In a Manually Managed OracleAS Cluster, it is your responsibility to synchronize the configuration of Oracle Application Server instances within the OracleAS Cluster. See Appendix B, "Manually Managed OracleAS Clusters" for details.

3.3 Middle-tier Backup and Recovery Considerations

If a failure occurs in your system, it is important to recover from that failure as quickly as possible. Depending on the type of failure, recovery of a middle-tier installation involves one or both of the following tasks: restoring the middle-tier files from backups, and updating the restored installation with new host information when recovering to a different node.


Note:

The Oracle Application Server Administrator's Guide contains all required backup and recovery strategies and procedures.

The restoration of middle-tier files can be done from backups made using procedures described in the "Backup Strategy and Procedures" chapter of the Oracle Application Server Administrator's Guide. The backups encompass both the middle-tier and OracleAS Infrastructure installations and are performed Oracle home by Oracle home. Thus, the restoration of the middle tier is also performed Oracle home by Oracle home. Each restoration can be done on the same node that the backup was taken from or on a new node. The "Recovery Strategies and Procedures" chapter in the Oracle Application Server Administrator's Guide provides details on using backups for recovery.

Restoration of a middle-tier installation on the same node restores the Oracle home, the Oracle Application Server configuration files, and the DCM repository. The backup of the Oracle home and configuration files is done when performing a complete Oracle Application Server environment backup, which is a backup of the entire Oracle Application Server system. Additionally, any time-stamped backups of the configuration files should be restored, if required.

Restoration of a middle-tier installation on a new node requires the restoration of the Oracle system files, the middle-tier Oracle home, and the configuration files. Because the host is new, the DCM-managed and non-DCM-managed components have to be updated with the new host information.