8 Scalability, High Availability, and Recovery

This chapter provides an overview of high availability and scalability solutions provided by Oracle Application Server. The topics include:

Scalability
High Availability
Recovery Solutions

Scalability

Scalability is the ability of a system to provide throughput in proportion to, and limited only by, available hardware resources. A scalable system is one that can handle increasing numbers of requests without adversely affecting response time and throughput.

The following sections describe Oracle Application Server components and features you can use to increase the scalability of your system.

Web Cache Clusters

You can configure multiple instances of OracleAS Web Cache to run as independent caches, with no interaction with one another. However, to increase the availability and scalability of your Web Cache, you can configure multiple instances of OracleAS Web Cache to run as members of a cache cluster. A cache cluster is a loosely related set of Web cache instances working together to provide a single logical cache.

Oracle Application Server Containers for J2EE (OC4J) Routing

The mod_oc4j module of Oracle HTTP Server (OHS) is an intelligent router that routes requests to OC4J processes. Its routing algorithms apply mainly to stateless requests, because a stateful request is forwarded to the OC4J process that served the previous request.

There are eight different intelligent routing algorithms that you can use, depending on the type and complexity of routing you need. Table 8-1 describes each type of routing and the algorithms available.

Table 8-1 Intelligent Routing Algorithms Matrix

Routing Style	Simple Algorithm	Algorithm with Local Affinity	Algorithm with Routing Weight
Round Robin	All OC4J processes (remote and local) are placed in an ordered list. OHS chooses an OC4J process at random for the first request. For each subsequent request, OHS forwards the request to the next OC4J process in the list, in round robin style.	There are two lists of processes, one for local and one for remote OC4J processes. OHS always chooses a local process, using the simple round robin mechanism, until the local process list is empty. Then it selects a process from the remote process list in simple round robin style.	Each machine is assigned an integer that represents its ability to service requests. OHS uses this routing weight to decide which machine to route a request to, and chooses an OC4J process on that machine in round robin style.
Random	All OC4J processes (remote and local) are placed in an ordered list. For every request, OHS chooses an OC4J process at random and forwards the request to that process.	There are two lists of processes, one for local and one for remote OC4J processes. OHS always chooses a local process, using the simple random mechanism, until the local process list is empty. Then it selects a process from the remote process list in simple random style.	Each machine is assigned an integer that represents its ability to service requests. OHS uses this routing weight to decide which machine to route a request to, and chooses an OC4J process at random from the OC4J processes on that machine.
Metric-based	All OC4J processes (remote and local) are placed in an ordered list. Through an asynchronous heartbeat, OC4J processes report to OHS how busy they are. OHS uses this information as a routing weight and sends more requests to the OC4J processes that are less busy.	There are two lists of processes, one for local and one for remote OC4J processes. OHS always chooses a local process, using the simple metric-based mechanism, until the local process list is empty. Then it selects a process from the remote process list using the simple metric-based algorithm.	NA
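The simple round robin and random algorithms, and the local-affinity variant, can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the mod_oc4j implementation: `Oc4jProcess`, `RoundRobinRouter`, `LocalAffinityRouter`, and `route_random` are hypothetical names, and skipping processes marked as not alive reflects the statement later in this chapter that mod_oc4j routes only to live processes. Metric-based routing is left out because it depends on heartbeat data reported by OC4J.

```python
import random
from dataclasses import dataclass


@dataclass
class Oc4jProcess:
    """Hypothetical stand-in for one OC4J process known to OHS."""
    name: str
    host: str
    alive: bool = True


class RoundRobinRouter:
    """Simple round robin: random starting point, then cycle through the list."""

    def __init__(self, processes, rng=random):
        if not processes:
            raise ValueError("at least one process is required")
        self.processes = list(processes)
        self.index = rng.randrange(len(self.processes))  # random first choice

    def route(self):
        # Visit each process at most once per call, skipping dead ones.
        for _ in range(len(self.processes)):
            proc = self.processes[self.index]
            self.index = (self.index + 1) % len(self.processes)
            if proc.alive:
                return proc
        raise RuntimeError("no live OC4J process")


def route_random(processes, rng=random):
    """Simple random: pick uniformly among live processes on every request."""
    live = [p for p in processes if p.alive]
    if not live:
        raise RuntimeError("no live OC4J process")
    return rng.choice(live)


class LocalAffinityRouter:
    """Round robin with local affinity: use local processes until none is left."""

    def __init__(self, processes, local_host, rng=random):
        local = [p for p in processes if p.host == local_host]
        remote = [p for p in processes if p.host != local_host]
        self.local = RoundRobinRouter(local, rng) if local else None
        self.remote = RoundRobinRouter(remote, rng) if remote else None

    def route(self):
        # Fall back to the remote list only when no local process is alive.
        if self.local and any(p.alive for p in self.local.processes):
            return self.local.route()
        if self.remote:
            return self.remote.route()
        raise RuntimeError("no live OC4J process")
```

In this model the local-affinity router never sends a request off the machine while at least one local process is alive, which is the behavior the table describes.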

Choosing an Intelligent Routing Algorithm

Each of these intelligent routing algorithms has advantages and drawbacks. Choosing which one to use will depend on the type of environment you are using. Use the following guidelines to help determine which algorithm is best for your enterprise:

For a loose cluster setup (multiple identical machines with OHS and OC4J), the round robin with local affinity algorithm is preferred. In this setup, an external router distributes requests to multiple machines running OHS and OC4J. In this case OHS gains little by routing to other machines, except in the extreme case where all OC4J processes on the local machine are dead.
For a tiered deployment (where one tier of machines contains OHS and another contains OC4J), the preferred algorithms are simple round robin and simple metric-based. To determine which of these two is best in a specific setup, you may need to experiment with each and compare the results, because the results depend on system behavior and incoming request distribution.
For a heterogeneous deployment (where the different OC4J machines vary in power), a weighted round robin algorithm is preferred. You may need to tune the number of OC4J processes to achieve the maximum benefit. For example, a machine with a weight of 4 gets 4 times as many requests as a machine with a weight of 1, but may not contain 4 times as many OC4J processes. If the metric computation provides an effective way of modelling the application load, you may also consider using the simple metric-based algorithm.
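The weighted guideline is easier to see with numbers. The following sketch, with hypothetical names, picks a machine in proportion to its routing weight and then rotates through that machine's OC4J processes. The machine choice uses smooth weighted round robin, a common way to turn integer weights into an evenly interleaved request sequence; it is an assumption for illustration, not the documented OHS algorithm.

```python
import itertools
from collections import Counter


class WeightedRouter:
    """Pick a machine in proportion to its routing weight, then round robin
    among that machine's OC4J processes (hypothetical model)."""

    def __init__(self, machines):
        # machines maps a host name to (routing weight, [OC4J process names])
        self.weights = {host: weight for host, (weight, _) in machines.items()}
        self.current = {host: 0 for host in machines}
        self.cycles = {host: itertools.cycle(procs)
                       for host, (_, procs) in machines.items()}
        self.total = sum(self.weights.values())

    def route(self):
        # Smooth weighted round robin: every machine earns its weight, the
        # machine with the most credit is chosen and pays back the total.
        for host, weight in self.weights.items():
            self.current[host] += weight
        host = max(self.current, key=self.current.get)
        self.current[host] -= self.total
        return host, next(self.cycles[host])


router = WeightedRouter({"big": (4, ["b1", "b2"]), "small": (1, ["s1"])})
picks = [router.route() for _ in range(50)]
print(Counter(host for host, _ in picks))  # Counter({'big': 40, 'small': 10})
print(Counter(proc for _, proc in picks))  # Counter({'b1': 20, 'b2': 20, 's1': 10})
```

With a weight of 4 but only two OC4J processes, each process on the larger machine receives twice as many requests as the single process on the smaller machine, which is why the number of OC4J processes may need tuning alongside the weight.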

Enterprise JavaBean (EJB) Client Routing

In EJB client routing, EJB classes take on the routing functionality that mod_oc4j provides for Oracle HTTP Server. Using the Active Components for Java (AC4J) architecture, EJBs can interact in a loosely coupled fashion. AC4J supports reliable asynchronous, disconnected, one-way request and response interactions, without the complexity of JMS programming. It automatically routes service requests to the appropriate service provider, and provides automatic security context propagation, authorization, and identity impersonation. It also provides automatic exception routing and handling, integrated into the EJB framework.

High Availability

The availability of a system or any component in that system is defined as the percentage of time that it works normally. For example, a system that works normally for twelve hours per day is 50% available. A system that has 99% availability is down 3.65 days per year on average. Critical systems may need to meet exceptionally high availability standards, with as little as four to five minutes of downtime per year.
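The availability figures above follow from simple arithmetic over a 525,600-minute year. The helper names below are illustrative, and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(pct):
    """Expected yearly downtime, in minutes, for an availability percentage."""
    return (1 - pct / 100) * MINUTES_PER_YEAR


def availability_pct(minutes):
    """Availability percentage implied by a yearly downtime budget in minutes."""
    return 100 * (1 - minutes / MINUTES_PER_YEAR)


# Working normally 12 hours per day is 50% availability.
print(availability_pct(12 * 60 * 365))                  # 50.0
# 99% availability leaves about 3.65 days of downtime per year.
print(round(downtime_minutes_per_year(99) / 1440, 2))   # 3.65
# Four to five minutes per year is roughly "five nines" (99.999%).
print(round(availability_pct(5), 4))                    # 99.999
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why the jump from 99% to 99.999% is so demanding.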

Oracle Application Server is designed to provide a wide variety of high availability solutions, ranging from load balancing and basic clustering to providing maximum system availability during catastrophic hardware and software failures.

The underlying high availability paradigm for Oracle Application Server is clustering. You can create different types of clusters depending on which Oracle Application Server components are involved.

High Availability for Planned Maintenance

Every system requires maintenance, and minimizing downtime during maintenance and upgrade operations is important for ensuring high availability. Oracle Application Server provides the following capabilities to minimize planned downtime for these activities.

Rolling Upgrade

Oracle Application Server supports the upgrade of Infrastructure and middle tier instances from Oracle9i Application Server Release 2 (9.0.2 or 9.0.3) to Oracle Application Server 10g (9.0.4) with minimal operational impact. A new archiving capability enables one system configuration to be captured and then reapplied to another system. This allows one system to be upgraded and tested before its configuration is cloned. Similarly, Web Cache clusters allow configuration updates to be applied incrementally to the clustered instances.

Cloning

You can clone an existing Oracle Application Server instance, creating an additional instance with the exact same configuration. With cloning, you reduce the possibility of introducing configuration errors during installation and setup of the new instance.

Application High Availability

Oracle Application Server provides several features for ensuring application-level high availability. You can design your applications to take advantage of these capabilities when deployed on Oracle Application Server. This section discusses session replication, session persistence, and distributed caching.

Session Replication

Session replication is a high availability solution that is implemented in Oracle Application Server using Oracle Application Server Containers for J2EE (OC4J) islands.

On the client side, different HTTP requests from the same client can be grouped together with a cookie. The cookie contains enough information to provide continuity between client requests. However, if the server experiences a failure, the state associated with a request or cookie may be lost. There are three ways to guard against this:

State safe applications save the state in a database or other persistent storage system, avoiding the loss of state when the server goes down. However, there is a performance cost to continually writing this information.
Stateless applications do not have a state that needs to be carried between requests, and so are not impacted by the server going down. Any other server is able to handle the request.
Stateful applications benefit from OC4J session state replication, which automatically replicates session state among the OC4J processes of an island across the cluster. Replication takes place whenever the session is updated.

During OC4J session state replication, if one OC4J process fails, another OC4J process that has the session state replicated to it takes over the application request. When an OC4J process fails during a stateful request, Oracle HTTP Server forwards the request in the following order:

If another OC4J process is active within the same application server instance, OHS forwards the request to this process.
Otherwise, OHS forwards the request to an OC4J process in another application server instance in the cluster.
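The forwarding order can be expressed as a small selection function. This is a hedged sketch: the `Oc4jProcess` model and its `sessions` set of replicated session IDs are hypothetical, and the function only captures the preference order described above, not how OHS actually tracks replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Oc4jProcess:
    """Hypothetical model of an OC4J process and the sessions replicated to it."""
    name: str
    instance: str                               # application server instance
    alive: bool = True
    sessions: set = field(default_factory=set)  # replicated session IDs


def failover_target(failed, processes, session_id):
    """Choose where a stateful request goes after `failed` dies.

    Only live processes holding a replica of the session qualify. A process
    in the same application server instance is preferred over one in another
    instance of the cluster.
    """
    candidates = [p for p in processes
                  if p is not failed and p.alive and session_id in p.sessions]
    for p in candidates:
        if p.instance == failed.instance:
            return p
    return candidates[0] if candidates else None
```

Returning `None` when no live process holds the session corresponds to the case where the session state is lost and the client must start over.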

Session Persistence

Working with EJBs, you can have session persistence, which means that the EJB’s state exists beyond the lifetime of the application or the server process. Entity beans, which represent business objects in a persistent storage mechanism, can have either bean-managed persistence or container-managed persistence. For more information on EJBs and persistence, see Chapter 2, "J2EE, Web Services, and Internet Applications".

Distributed Cache

Distributed cache is a high availability solution that is implemented using Oracle Application Server Java Object Cache. Java Object Cache is an "in-process" cache of Java objects that can be used on any Java platform by any Java application. It allows applications to share objects across requests and across users, and coordinates the life cycle of the objects across processes.

Java Object Cache in OC4J enables data replication among processes even if they do not share an island, instance, or cluster relationship. This type of replication enhances performance by caching large, shared Java objects, regardless of which application produced them. It also improves availability in the event that the sources of these objects become unavailable. Java Object Cache also supports object versioning, allowing different applications to have different versions of an object available.

Oracle Application Server Middle Tier High Availability

You can make Oracle Application Server middle tier instances highly available using a variety of solutions, each leveraging the availability features of single instances as well as the power of Oracle Application Server clustering.

Oracle Application Server Single Instance Availability Features

An Oracle Application Server Instance (also called an application server instance) is the set of processes required to run the configured components within an application server installation. There can be only one application server instance per application server installation. The terms installation and instance are sometimes used interchangeably; however, it is important to note that an installation is the set of files installed into an Oracle home, while an instance is a set of processes associated with those files.

A single instance is the minimum unit that can be used in any type of Oracle Application Server clustering. You can have a single instance associated with an infrastructure, a file-based repository, or neither.

Each Oracle Application Server instance has features that enhance availability and facilitate the implementation of other high availability solutions. Such features include:

Process death detection and restart: Oracle Process Management and Notification (OPMN) provides process monitoring, process death detection, and process restarting for monitored processes.
Configuration cloning: Distributed Configuration Management (DCM) uses a metadata repository for configuration information, allowing you to manage distributed Oracle Application Server instances.
Data replication: Oracle Application Server Containers for J2EE (OC4J) islands within an instance provide Web-application-level stateful session replication. An OC4J island is a group of OC4J processes within an OC4J cluster that replicate session state among one another. These processes must have the same configuration, which is enforced by having all of the processes be part of the same OC4J instance. You can also replicate data across processes within an Oracle Application Server instance using EJB session replication.
Intelligent routing: Oracle Application Server Web Cache and Oracle HTTP Server (through mod_oc4j) provide configurable and intelligent routing for incoming requests. By communicating with OPMN, mod_oc4j routes requests only to processes and components that it determines to be alive.
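Process death detection, restart, and intelligent routing work together: a monitor restarts processes that die, and the router is told which ones are currently alive. The Python sketch below models that loop with simulated processes. `ManagedProcess`, `Supervisor`, and the `max_restarts` cap are hypothetical names for illustration and do not reflect OPMN's actual interfaces or configuration.

```python
class ManagedProcess:
    """Hypothetical stand-in for a process under OPMN-style management."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.restarts = 0

    def crash(self):
        self.running = False

    def start(self):
        self.running = True
        self.restarts += 1


class Supervisor:
    """Detect dead processes, restart them, and report which ones are alive.

    max_restarts is a hypothetical cap that keeps a process that dies
    repeatedly from being restarted forever.
    """

    def __init__(self, processes, max_restarts=3):
        self.processes = {p.name: p for p in processes}
        self.max_restarts = max_restarts

    def alive(self):
        # The view a router such as mod_oc4j needs: route only to these.
        return sorted(name for name, p in self.processes.items() if p.running)

    def check(self):
        """One monitoring pass; returns the names of restarted processes."""
        restarted = []
        for p in self.processes.values():
            if not p.running and p.restarts < self.max_restarts:
                p.start()
                restarted.append(p.name)
        return restarted
```

Between a crash and the next monitoring pass, `alive()` already excludes the dead process, so routing avoids it even before it is restarted.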

Oracle Application Server Clusters

An Oracle Application Server cluster is a set of application server instances configured to act in concert to deliver greater scalability and availability than a single instance can provide. While a single application server instance can only leverage the operating resources of a single host, a cluster can span multiple hosts, distributing application execution over a greater number of CPUs. While a single application server instance is vulnerable to the failure of its host and operating system, a cluster continues to function despite the loss of an operating system or host, hiding any such failure from clients.

Clusters leverage the combined power and reliability of multiple application server instances while maintaining the simplicity of a single application server instance. For example, browser clients of applications running in a cluster interact with the application as if it were running on a single server. With appropriate front-end load balancing, any instance in an application server cluster can serve client requests. This simplifies configuration and deployment across multiple instances and enables fault tolerance among clustered instances.

Types of Oracle Application Server Clusters

Oracle Application Server clusters are able to propagate configuration information across all cluster instances instantly when you change the configuration of any member instance. The propagation is handled through the built-in distributed configuration management (DCM) system.

You can manually configure Oracle Application Server instances to act as a cluster. However, Oracle Application Server provides two ways to manage clusters more effectively.

Centrally-managed Oracle Application Server Clusters propagate configuration information across all application server instances within the cluster, which simplifies configuration and deployment. There are two ways you can choose to manage this type of cluster: using a database repository or using a file-based repository.

Oracle Application Server Clusters Managed Using a Database Repository

Oracle Application Server Clusters managed using a database repository store their metadata and configuration information in the database. This type of cluster requires the Oracle Application Server Infrastructure, since the metadata and configuration information is stored in a metadata repository that resides on an Infrastructure system.

Oracle Application Server Clusters Managed Using a File-Based Repository

Oracle Application Server Clusters managed using a file-based repository designate an application server instance as the repository host. The repository host uses its file system to store the metadata repository that contains the metadata and configuration information for the cluster.
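Both repository types follow the same pattern: configuration lives in one shared store, and a change made on any member is propagated to every instance. The sketch below is a deliberately simplified, in-memory model of that pattern; `ConfigRepository`, `ClusterInstance`, and the `session_timeout` key are hypothetical and do not represent the DCM API.

```python
class ConfigRepository:
    """Hypothetical central store (database- or file-based) for cluster config."""

    def __init__(self):
        self.config = {}
        self.members = []

    def join(self, member):
        self.members.append(member)
        member.config = dict(self.config)  # new members start from the cluster config

    def update(self, key, value):
        self.config[key] = value
        for member in self.members:        # push the change to every instance
            member.config[key] = value


class ClusterInstance:
    def __init__(self, name, repository):
        self.name = name
        self.config = {}
        self.repository = repository
        repository.join(self)

    def set_config(self, key, value):
        # A change made on any member is routed through the shared repository.
        self.repository.update(key, value)
```

Because every change passes through the repository, an instance that joins later receives the current cluster configuration instead of being configured by hand.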

Benefits of Oracle Application Server Clustering

There are three main benefits of using clusters: scalability, availability, and manageability. Oracle Application Server clustering provides scalability by unifying multiple application server instances spread over multiple hosts to collectively serve a single group of applications, making it possible to serve increasing numbers of concurrent users after the capacity of a single piece of hardware is exhausted.

Oracle Application Server clustering enables high availability by removing the single point of failure that a single host poses. It introduces redundancy and failover into the system, and protects against the loss of session state in case of process failure. It ensures that configuration information remains synchronized across instances, and enables intelligent routing with mod_oc4j and session replication through islands.

Oracle Application Server clusters can be managed using Oracle Enterprise Manager Application Server Control. Instances in a managed Oracle Application Server cluster synchronize their configurations automatically, relieving the administrator of the responsibility to update each instance manually. Oracle Application Server cluster management simplifies the tasks of creating and administering clusters and reduces the chance of human error corrupting the system.

Oracle Application Server also enables you to create manually configured Oracle Application Server clusters that do not require a metadata repository, and therefore have no database dependency. Manually configured clusters provide scalability and availability, but not manageability. In a manually configured cluster, the application server administrator has the responsibility for synchronizing the configuration of the application server instances.

Improving Availability with an External Load Balancer

You can use an external load balancer to improve the availability of both clustered and non-clustered Oracle Application Server instances.

Clients access the cluster through a load balancer, which hides the cluster configuration. The load balancer can send requests to any application server instance in the cluster, as any instance can service any request. An administrator can raise the capacity of the system by introducing additional application server instances to the cluster.

You can also use a load balancer to increase the availability of non-clusterable Oracle Application Server instances, such as Portal and Wireless. As long as the load balancer is configured to serve a set of instances, it will route requests accordingly.

Types of External Load Balancers

There are three main types of external load balancers you can use with Oracle Application Server instances: hardware load balancers, network load balancers, and Oracle Application Server Web Cache.

Hardware Load Balancer

Hardware load balancing involves placing a hardware load balancer (such as Big-IP or Alteon) in front of a group of Oracle Application Server instances. The hardware load balancer routes requests to the instances in a client-transparent fashion.

Windows Network Load Balancer

With some Windows operating systems, you can use the features of your operating system to perform network load balancing. For example, with Microsoft Advanced Server, the Network Load Balancing (NLB) functionality allows you to send requests to different machines that share the same virtual IP or MAC address. The servers themselves do not need to be clustered at the operating system level.

Leveraging Web Cache as an External Load Balancer

OracleAS Web Cache supports content-aware load balancing and failover detection. These features ensure that cache misses are directed to the most available, highest-performing application server. For more information on OracleAS Web Cache, see Chapter 9, "Performance and Caching".

Benefits of External Load Balancing

There are three main benefits of using external load balancing: scalability, availability, and manageability. Load balancing improves scalability by providing an access point through which requests are routed to one of many available instances. Instances can be added to the group that the load balancer serves to accommodate additional users.

Load balancing improves availability by routing requests to the most available instances. If one instance goes down, or is particularly busy, the load balancer can send requests to another active instance.

Load balancing improves manageability by routing application deployment and system configuration requests to the most available instances.
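The routing behavior behind these benefits can be sketched as a single selection step. This Python model assumes that the balancer runs health checks and uses a least-busy policy; real hardware and software load balancers offer several policies, and the class names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackendInstance:
    """Hypothetical view of one application server instance behind the balancer."""
    name: str
    healthy: bool = True       # result of the balancer's last health check
    active_requests: int = 0


class LoadBalancer:
    """Skip instances that fail health checks and send the request to the
    least busy of the rest."""

    def __init__(self, instances=()):
        self.pool = list(instances)

    def add(self, instance):
        self.pool.append(instance)   # scale out by adding an instance

    def pick(self):
        candidates = [i for i in self.pool if i.healthy]
        if not candidates:
            raise RuntimeError("no healthy instance")
        chosen = min(candidates, key=lambda i: i.active_requests)
        chosen.active_requests += 1
        return chosen
```

Scalability corresponds to `add`, and availability to the health filter in `pick`: clients keep using the same entry point while instances come and go behind it.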

Improving Availability with Operating System Clusters

This type of cluster involves installing Oracle Application Server on a hardware cluster created through the operating system or other clustering solutions, such as Veritas. Operating system clustering is now supported in Oracle Application Server 10g (9.0.4). However, using operating system clusters with Oracle Application Server does not have a significant impact on system availability. Leveraging operating system clusters is similar in effect to using a simple external load balancer across a system that does not use operating system clusters.

Oracle Application Server Infrastructure High Availability

This section discusses the Oracle Application Server Infrastructure and strategies for improving Infrastructure availability.

High Availability Requirements for Oracle Application Server Infrastructure

The metadata, security, and management services of the Oracle Application Server Infrastructure are provided by the following components, which must all be available to guarantee the availability of the Infrastructure:

Oracle Application Server Metadata Repository
Oracle Net listener
Oracle HTTP Server (OHS)
For Oracle Identity Management:
- Oracle Internet Directory and Oracle Internet Directory monitor
- Oracle Application Server Single Sign-On
- Oracle Application Server Certificate Authority
- Oracle HTTP Server for use by Single Sign-On
- OC4J instance for use by Oracle Delegated Administration Services
For Oracle Management Services:
- Oracle Enterprise Manager Application Server Control

For the Infrastructure to provide all essential services, all of these components must be available. Any high availability solution must be able to detect and recover from any software failures of any of the processes associated with the Infrastructure components. Solutions must also be able to detect and recover from any hardware failures on the hosts that are running the Infrastructure.

In Oracle Application Server 10g (9.0.4), all of the Infrastructure processes, except the database and its listener, are started, managed, and restarted by the Oracle Process Management and Notification (OPMN) framework. This means that any failure of an OPMN-managed process is handled internally by OPMN. However, database process failures or database listener failures are not handled by OPMN. Also, failure of any OPMN processes leaves the Infrastructure in a non-resilient mode if the failure is not detected and resolved.

See Also:

Chapter 7, "Oracle Application Server Infrastructure" for a detailed discussion of the Oracle Application Server Infrastructure

Oracle Application Server Cold Failover Clusters

Oracle Application Server Cold Failover Clusters are a high availability solution for Oracle Application Server Infrastructure that protects against system failure. All of the Infrastructure components are installed on a set of shared disks that can be mounted on any of the nodes in a hardware cluster. If the Infrastructure on the active node fails, another node in the cluster restarts the Infrastructure automatically.

In a typical Cold Failover Cluster scenario, instead of the middle tier pointing to a specific node in the cluster, it is configured to point to a virtual hostname. Each node in the cluster has its own private hostname as well, and the active node carries the virtual hostname for the entire cluster.

Oracle Application Server Infrastructure is only active on one node in the cluster at any time. When the Infrastructure on the active node goes down, the cluster software brings up the Infrastructure on one of the inactive nodes, with the same virtual hostname as the failed node. Although there will be some minimal downtime, this allows for faster recovery times on the middle tier, as it does not need to be reconfigured to point to a new Infrastructure. From the perspective of middle tier applications, the new active node in the Cold Failover Cluster is identical to the node that failed.
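The virtual hostname is what keeps the middle tier configuration stable across a failover. The toy model below, with hypothetical class and host names, shows the middle tier always connecting to the same virtual hostname while the node behind it changes. It does not model the cluster software itself, shared disk mounting, or restart time.

```python
class ColdFailoverCluster:
    """Toy model: one virtual hostname, carried by exactly one active node."""

    def __init__(self, virtual_host, nodes):
        self.virtual_host = virtual_host
        self.nodes = list(nodes)      # private hostnames of the cluster nodes
        self.up = set(self.nodes)
        self.active = self.nodes[0]   # Infrastructure runs on this node only

    def resolve(self, hostname):
        """Return the node the middle tier actually reaches for `hostname`."""
        if hostname != self.virtual_host:
            raise LookupError(hostname)
        return self.active

    def fail_node(self, node):
        self.up.discard(node)
        if node == self.active:
            survivors = [n for n in self.nodes if n in self.up]
            if not survivors:
                raise RuntimeError("no surviving node")
            # Cluster software restarts the Infrastructure here and moves
            # the virtual hostname with it.
            self.active = survivors[0]


cluster = ColdFailoverCluster("infra.example.com", ["node1", "node2"])
middle_tier_target = "infra.example.com"   # middle tier config never changes
print(cluster.resolve(middle_tier_target))  # node1
cluster.fail_node("node1")
print(cluster.resolve(middle_tier_target))  # node2
```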

Note that the inactive nodes in the hardware cluster are only inactive with respect to Oracle Application Server Infrastructure. The inactive nodes can be used for other purposes, such as running middle tier instances or other applications.

Oracle Application Server Middle Tiers in a Cold Failover Cluster Environment

Oracle Application Server 10g (9.0.4) middle tier instances can be installed on the local disks of each node in a hardware cluster. Requests across middle tiers are distributed using an external load balancer. Although this configuration does not provide middle-tier failover, it does increase availability because if a node fails, the middle tier instances on the remaining nodes will continue to function. The Infrastructure will still fail over as designed.

Oracle Application Server Active Failover Clusters

In the initial release of Oracle Application Server 10g (9.0.4), Active Failover Cluster is a Limited Release feature. Please check OracleMetaLink (http://metalink.oracle.com) for the most current certification status of this feature or consult your Sales Representative, before deploying this feature in a production environment.

Active Failover Cluster configurations are similar to Cold Failover Cluster solutions, except that with Active Failover Clusters the Infrastructure is active on all nodes in the cluster at the same time (also known as hot failover). Active Failover Clusters provide a robust cluster architecture for the Oracle Application Server Infrastructure, and are a more transparent high availability solution than Cold Failover Clusters. Because all of the nodes are active, failover from one node to another is quick and requires no manual intervention. This active-active setup also provides scalability to the Infrastructure that is deployed on it.

This configuration leverages the Real Application Cluster (RAC) feature of the Oracle database for running the Infrastructure database. Each node in the hardware cluster contains its own $ORACLE_HOME, which contains the configuration files and binaries needed to run the Infrastructure on that node. The Infrastructure installation across nodes is accomplished in one process. Additionally, all nodes access a set of shared files on the RAC database. The middle tier instances are configured to point to a load balancer in front of the hardware cluster.

Installing the Metadata Repository into an Existing RAC Database

During Infrastructure installation, you can choose to install the Metadata Repository into an existing Oracle database, rather than installing a new database. You can install the Metadata Repository into an existing RAC database to ensure that the Metadata Repository is highly available. However, this only provides high availability for the Metadata Repository; other Infrastructure components must be dealt with separately.

Recovery Solutions

In addition to failing over components to increase availability, once a failure has occurred in your system, it is important to restore the failed component or process as quickly as possible. There are three main types of recovery solutions that you can use, depending on the type and severity of the failure.

Restarting Processes

Recovering from almost all types of failures requires restarting one or more failed processes in your system. There are three process restart scenarios:

Automatic restart of processes: The failed processes are restarted automatically. No manual intervention is required.
Manual restart of an individual process: The process failure does not affect any other middle tier or infrastructure processes, so the failed process can be restarted on its own.
Manual restart of all processes.

Many types of failures in both the middle tier and the infrastructure only require a process restart solution. Such failures include the death of OPMN, a metadata repository instance failure, or an Oracle Enterprise Manager Application Server Control crash. For details about specific failure types and how to recover, see the Oracle Application Server 10g High Availability Guide.

Cold Backup and Restore

Some failures require more involved recovery scenarios than simply restarting processes. In some cases, you will have to perform restoration operations based on backup procedures you have previously implemented. Cold backup and restore operations can be done for both the middle tier and the infrastructure.

Middle tier cold backup and restore: Restoration of the entire Oracle Application Server middle tier, including the Oracle Home, configuration files, and database files, which were backed up after completing a clean and normal shutdown of all Oracle Application Server infrastructure processes and the metadata repository.
Infrastructure cold backup and restore: Restoration of the entire Oracle Application Server infrastructure instance, including the Oracle Home, configuration files, and database files, which were backed up after completing a clean and normal shutdown of all Oracle Application Server infrastructure processes and the metadata repository.

Failures that require the cold backup and restore solution for recovery include node failure where the node needs to be completely replaced, and the deletion or corruption of Oracle software or binary files. Failures that require this type of recovery solution also require the manual restart of all processes. For details about specific failure types and how to recover, see the Oracle Application Server 10g Administrator's Guide.

Online Backup and Restore

Depending on the type of failure your system is experiencing, you may need to restore your system from an online backup. There are four types of online backup and restore scenarios:

Middle tier online backup and restore: Restoration of the Oracle Application Server configuration files, which were backed up while processes were up and running on the middle tier. This also includes restoring a stamped image, which may require additional steps to complete the restoration.
Infrastructure online backup and restore: Restoration of the Oracle Application Server infrastructure configuration files, which were backed up after completing a proper online backup of the Oracle Application Server instance and metadata repository.
Infrastructure metadata repository online backup and restore: Restoration of the Oracle Application Server infrastructure metadata repository taken from a proper online backup. Complete recovery is required of the database component.
Infrastructure configuration files online backup and restore: Restoration of the Oracle Application Server infrastructure configuration files taken from an online backup.

Failures that require online backup and restore solutions include data failure in the metadata repository and the deletion or corruption of Oracle Application Server component runtime configuration files. After this type of recovery, you must restart one or more processes. For details about specific failure types and how to recover from them, see the Oracle Application Server 10g Administrator's Guide.
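The failure types named in the cold and online backup descriptions can be summarized as a small decision table. The Python sketch below is only a restatement of those paragraphs: the enum, the constant names, and `plan_recovery` are hypothetical, and the table is not exhaustive. The Administrator's Guide remains the authority for any specific failure.

```python
from enum import Enum


class Failure(Enum):
    # Failure types named in this section; the list is not exhaustive.
    NODE_REPLACEMENT = "node failure where the node must be replaced"
    SOFTWARE_CORRUPTION = "deleted or corrupted Oracle software or binaries"
    REPOSITORY_DATA_FAILURE = "data failure in the metadata repository"
    CONFIG_FILE_CORRUPTION = "deleted or corrupted runtime configuration files"


COLD = "cold backup and restore"
ONLINE = "online backup and restore"

# failure -> (recovery solution, restart work required afterward)
RECOVERY_PLAN = {
    Failure.NODE_REPLACEMENT: (COLD, "restart all processes manually"),
    Failure.SOFTWARE_CORRUPTION: (COLD, "restart all processes manually"),
    Failure.REPOSITORY_DATA_FAILURE: (ONLINE, "restart one or more processes"),
    Failure.CONFIG_FILE_CORRUPTION: (ONLINE, "restart one or more processes"),
}


def plan_recovery(failure: Failure) -> tuple[str, str]:
    """Look up the recovery solution and follow-up work for a failure."""
    return RECOVERY_PLAN[failure]
```

The pattern the table makes visible: failures that destroy software or an entire node fall back to a full cold image, while failures confined to data or configuration can be repaired from an online backup with less disruption.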

Disaster Recovery

Disaster recovery refers to how a system recovers after a catastrophic site failure. Catastrophic failures include earthquakes, tornadoes, floods, and fires. At the most basic level, disaster recovery involves replicating an entire site, not just pieces of hardware or subcomponents.

The service-level requirements for disaster recovery depend on the business applications you are running. Some applications may not have any disaster recovery requirements. Others may simply have backup data tapes from which they would rebuild a new working site over a period of time. Still others may need to resume operations within a few days or hours after the disaster. The most stringent requirement is to keep services running despite the disaster.

The Oracle Application Server disaster recovery solution consists of two identically configured sites. The sites may be geographically dispersed, in which case they are connected by a network. When the primary site becomes unavailable due to a disaster, the secondary site can become operational within a reasonable amount of time.

Client requests are always routed to the site in the production role. After a failover or switchover operation occurs due to an outage, client requests are routed to the other site, which assumes the production role. Each site contains the same middle tier servers, configured identically at both sites.

The site that is in the production role contains a production backend customer database and a production Oracle Application Server Metadata Repository, configured using the Cold Failover Cluster Infrastructure high availability solution to protect against host failure. The site in the standby role contains a physical standby of the Oracle Application Server Metadata Repository managed by Oracle Data Guard. Database switchover and failover functions allow the sites to trade roles.

The Oracle Application Server Cold Failover Cluster high availability solution and Oracle Data Guard provide the basis of the Oracle Application Server Disaster Recovery solution. Using an Oracle Data Guard physical standby database provides disaster recovery and protection against user error and data corruption. In the Oracle Application Server Disaster Recovery solution, Oracle Data Guard provides automatic log transport services, managed recovery, and role management between the production and physical standby databases containing the OracleAS Metadata Repository.
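The division of labor between log transport and managed recovery can be illustrated with a toy model: the production database assigns each change a sequence number, transport delivers those records to the standby (possibly out of order), and managed recovery applies them strictly in sequence. The Python sketch below models only that idea. It is not Oracle Data Guard's interface; the record format, the classes, and `ship` are hypothetical.

```python
from dataclasses import dataclass, field

Record = tuple[int, str]  # (sequence number, change); hypothetical format


@dataclass
class Production:
    redo: list[Record] = field(default_factory=list)

    def commit(self, change: str) -> int:
        # Every committed change gets the next redo sequence number.
        seq = len(self.redo) + 1
        self.redo.append((seq, change))
        return seq


@dataclass
class PhysicalStandby:
    received: list[Record] = field(default_factory=list)
    applied: list[str] = field(default_factory=list)
    last_applied: int = 0

    def receive(self, records: list[Record]) -> None:
        # Log transport delivers records; they may arrive out of order.
        self.received.extend(records)

    def apply(self) -> None:
        # Managed recovery: apply strictly in sequence; nothing past a gap
        # is applied until the missing record arrives.
        for seq, change in sorted(self.received):
            if seq == self.last_applied + 1:
                self.applied.append(change)
                self.last_applied = seq
        # Keep only records still waiting behind a gap.
        self.received = [r for r in self.received if r[0] > self.last_applied]


def ship(source: Production, standby: PhysicalStandby, shipped_through: int) -> int:
    """Send every record newer than `shipped_through`; return the new mark."""
    pending = [r for r in source.redo if r[0] > shipped_through]
    standby.receive(pending)
    return pending[-1][0] if pending else shipped_through
```

Applying in strict sequence is what keeps the standby a consistent, if possibly slightly older, copy of the production repository, which is the state it must be in if it is ever asked to take over the production role.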

The Oracle Application Server Disaster Recovery solution is restricted to identical site configurations so that processes and procedures are the same at both sites, which makes operational tasks easier to maintain and execute. Identical site configurations also make it more likely that manually synchronizing Oracle Application Server component files between the sites will succeed.

The sites are configured in an active-passive configuration. This configuration has one primary site used for production and one secondary site that is initially passive. The secondary site is made active only after an application fails over or switches over to it. Because the sites are symmetric, after an application fails over or switches over, it does not necessarily need to switch back to the original production site.
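The role behavior described above (requests always go to whichever site holds the production role, a switchover trades roles between two healthy sites, a failover promotes the standby when the production site is lost, and no switch-back is required) can be sketched as a small state model. The Python below is a minimal illustration under those assumptions only; `Site`, `DisasterRecoveryPair`, and their methods are hypothetical names, not part of any Oracle product.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    available: bool = True
    role: str = "standby"  # "production" or "standby"


class DisasterRecoveryPair:
    """Two symmetric sites in an active-passive configuration (hypothetical)."""

    def __init__(self, primary: Site, secondary: Site) -> None:
        primary.role, secondary.role = "production", "standby"
        self.sites = (primary, secondary)

    def production(self) -> Site:
        return next(s for s in self.sites if s.role == "production")

    def standby(self) -> Site:
        return next(s for s in self.sites if s.role == "standby")

    def route(self, request: str) -> str:
        # All client requests go to whichever site holds the production role.
        return f"{request} -> {self.production().name}"

    def switchover(self) -> None:
        # Planned role reversal: both sites are healthy and trade roles.
        prod, stby = self.production(), self.standby()
        prod.role, stby.role = "standby", "production"

    def failover(self) -> None:
        # Unplanned: the production site is lost and the standby takes over.
        # The failed site is marked standby here, but in practice it must be
        # repaired and resynchronized before it can serve in that role.
        prod, stby = self.production(), self.standby()
        if prod.available:
            raise RuntimeError("production site is available; use switchover")
        prod.role, stby.role = "standby", "production"
```

Because the two sites are interchangeable in this model, nothing forces a return to the original primary: after a switchover, the former standby simply remains production until the next role change.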

Distributed Configuration Management Archiving Feature

The Distributed Configuration Management (DCM) archiving and recovery feature allows you to take a snapshot of your system configuration. If you take the snapshot when everything is working properly and optimally, you can restore the system to that configuration in the event of a failure. In response to a catastrophic failure, the snapshot can be restored to a system in a remote location.
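A snapshot is only useful if it is an independent copy: later changes to the live configuration must not alter it, and restoring it must not hand out the stored copy itself. The following minimal Python sketch models just that idea. The `ConfigArchive` class, its methods, and the representation of a configuration as a nested dictionary are hypothetical and do not reflect DCM's actual interface or storage format.

```python
import copy
from datetime import datetime, timezone


class ConfigArchive:
    """Hypothetical in-memory store of named configuration snapshots."""

    def __init__(self) -> None:
        self._archives: dict[str, tuple[datetime, dict]] = {}

    def create(self, name: str, config: dict) -> None:
        # Deep-copy so later edits to the live configuration cannot
        # leak into the stored snapshot.
        if name in self._archives:
            raise ValueError(f"archive {name!r} already exists")
        taken_at = datetime.now(timezone.utc)
        self._archives[name] = (taken_at, copy.deepcopy(config))

    def restore(self, name: str) -> dict:
        # Return a fresh copy so the snapshot stays reusable, for example
        # for a second restore onto a system at a remote site.
        _, snapshot = self._archives[name]
        return copy.deepcopy(snapshot)

    def names(self) -> list[str]:
        return sorted(self._archives)
```

Taking the snapshot while the system is known to be healthy, as the text recommends, is what gives the restored configuration its value: the archive records a state that has already been shown to work.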