Oracle® Fail Safe Concepts and Administration Guide
Release 3.3.1 for Windows
Part No. A96684-01
Oracle Fail Safe has a number of configuration options to satisfy whatever architecture or failover requirements you have.
This chapter discusses the following topics:
Customizing Your Configuration (Section 3.1)
Disaster-Tolerant High Availability (Section 3.2)
Integrating Clients and Applications (Section 3.3)
There are four main ways to deploy highly available solutions:
Active/passive configurations
Active/active configurations
Partitioned workload configurations
Multitiered configurations
While each of these configurations differs in the way work is allocated among the cluster nodes, all share the following features:
One or more Oracle homes are created on a private disk (usually the system disk) on each node.
All necessary Oracle product executables are installed in the Oracle homes on each node.
All data files, configuration files, log files, html files, and so on that are required by the application being made highly available are placed on cluster disks, so that they can be accessed by each cluster node.
The Oracle Services for MSCS software automatically runs as needed on one or more cluster nodes to ensure proper configuration and failover.
Figure 1-4 shows the software and hardware components in a cluster configured with Oracle Fail Safe.
The simplest configuration is an active/passive configuration, in which one or more nodes host the entire cluster workload (such as Oracle database servers, Forms Servers, Reports Servers, or Oracle HTTP Servers), but one node remains idle (as a standby server), ready to take over processing in case a node running an application fails. This solution guarantees that the performance for the fail-safe workload will be the same before and after failover.
Figure 3-1 shows a two-node configuration with Oracle Services for MSCS, an Oracle Forms Server, an Oracle Reports Server, and an Oracle database server running on Node 1, and with Node 2 as a standby node. Currently, nothing is running on Node 2. Node 2 will take over the workload of Node 1 in the event of a failover.
Figure 3-1 Active/Passive (Standby) Two-Node Configuration
Figure 3-2 shows a four-node configuration with Oracle Services for MSCS and an Oracle database server running on Node 1, an Oracle Reports Server and an Oracle database server running on Node 2, and an Oracle Forms Server and an Oracle database server running on Node 3. Node 4 is the standby node. Currently, nothing is running on Node 4. In the event of a failover, Node 4 will take over the failover workload.
Figure 3-2 Active/Passive (Standby) Four-Node Configuration
The active/active configuration is more cost-effective than the active/passive configuration because each node shares the application processing tasks, while also backing up other nodes in the event of a failure. If one node fails, another node must be capable of running its own applications and services as well as those that fail over from the failed node. This configuration provides a flexible architecture that allows you to divide the workload to best meet your business needs.
Figure 3-3 shows a two-node active/active configuration with an Oracle database server running on each cluster node. In addition, an Oracle HTTP Server and an Oracle Forms Server are running on Node 1, and Oracle Services for MSCS, an Oracle HTTP Server, and a Reports Server are running on Node 2. In Figure 3-3, an Oracle database server is used for marketing on Node 1, and an Oracle database server is used for sales on Node 2. The cluster disks owned by Node 1 store the marketing files, and the cluster disks owned by Node 2 store the sales files.
Figure 3-3 Active/Active Configuration
In the active/active configuration, all nodes are actively processing applications during normal operations. This configuration provides better performance (higher throughput) when all nodes are operating, but slower failover and possibly reduced performance when a node fails. Also, the client connections are distributed over all nodes.
Balancing workload means making trade-offs concerning the size of the normal workload on each system. If all systems run at nearly full capacity, then few resources are available to handle the load of another system during an outage, and client systems will experience significantly slower response during and after a failover. If you have the resources to quickly repair or replace a failed system, then the temporary period during which one cluster node serves both workloads will be short; a short period of slow response is tolerated better than a long one. In fact, some businesses prefer having applications run more slowly than usual to having a period of downtime.
Alternatively, running all systems at 50% to 75% of capacity (depending on the number of nodes in the cluster) ensures that clients experience no loss of response time after a failover, but the equivalent of an entire system remains idle under normal conditions, much like an active/passive configuration.
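The capacity figures above follow from a simple headroom calculation: with N active nodes, the workload of a failed node is spread over the remaining N - 1 nodes, so each node must normally run at no more than (N - 1)/N of its capacity. A minimal sketch:

```python
def max_safe_utilization(num_nodes: int) -> float:
    """Maximum per-node utilization that still leaves enough headroom
    for the surviving nodes to absorb one failed node's workload.

    With N active nodes, a failed node's work is spread over the
    remaining N - 1 nodes, so each node must run at no more than
    (N - 1) / N of its capacity.
    """
    if num_nodes < 2:
        raise ValueError("a cluster needs at least two nodes for failover")
    return (num_nodes - 1) / num_nodes

# A two-node cluster must stay at or below 50% per-node utilization,
# a four-node cluster at or below 75%.
print(f"2 nodes: {max_safe_utilization(2):.0%}")  # 50%
print(f"4 nodes: {max_safe_utilization(4):.0%}")  # 75%
```

This is why the safe utilization ceiling rises as the cluster grows: a larger cluster dilutes the failed node's workload over more survivors.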
Oracle Fail Safe can be configured to avoid some of the performance problems with this type of configuration. For example, you can:
Enable failover only for your mission-critical applications
Use different database parameter files on each node so that fewer system resources are used after a failover
Configure each component (Oracle database server, Oracle Forms Server, Oracle Reports Server, and so on) into a separate group with its own failover and failback policies
This is possible because Oracle Fail Safe allows you to configure each cluster node to host several virtual servers.
Combine the scripting support of Oracle Fail Safe (using the FSCMD command described in Chapter 5) with a system monitoring tool (such as Oracle Enterprise Manager) to automate the movement of groups for load-balancing purposes
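The FSCMD-based load balancing in the last item above could be driven from a monitoring tool's alert action. The sketch below composes an FSCMD MOVEGROUP command line (the MOVEGROUP operation and its qualifiers are described in Chapter 5; the cluster, group, and node names here are invented placeholders):

```python
import subprocess

def move_group_command(group: str, cluster: str, node: str) -> list:
    """Build an FSCMD command line that moves a group to another node.

    The MOVEGROUP operation and its /CLUSTER and /NODE qualifiers are
    described in Chapter 5; the names passed in are placeholders.
    """
    return ["FSCMD", "MOVEGROUP", group,
            "/CLUSTER=" + cluster, "/NODE=" + node]

cmd = move_group_command("SalesGroup", "NTCLU", "NODE2")
print(" ".join(cmd))
# To actually move the group (requires the Oracle Fail Safe client
# tools on the path), uncomment:
# subprocess.run(cmd, check=True)
```

A monitoring tool would typically invoke a script like this when, for example, CPU utilization on one node exceeds a threshold for a sustained period.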
Although the nodes do not need to be physically identical, it is wise to select servers with enough power, memory, disk host adapters, and disk drives to support an adequate level of service should failover occur at a busy time of the day.
The partitioned workload configuration is a variation of the active/active configuration described in Section 3.1.2. In a partitioned workload, there might be a database on one node and Oracle applications or third-party applications on another node. If one node fails, another node takes over the workload of the failed node in addition to its own. The advantages and disadvantages are the same as described for the active/active configuration.
Figure 3-4 shows a two-node partitioned workload configuration in which the application and database workloads have been divided so that the database work is served by one node and the application work is served by the other node. This configuration is used to ensure high availability for enterprise applications such as Oracle applications. Also, having independent workloads running on different nodes can maximize throughput. If one node fails, the other node in the cluster serves both the database and application workloads.
Figure 3-4 Active/Active Partitioned Workload Configuration
In Figure 3-4, an Oracle HTTP Server and an Oracle Forms Server are running on Node 1, and Oracle Services for MSCS and an Oracle database server are running on Node 2. The cluster disks owned by Node 1 store the Web pages and Oracle Forms files, and the cluster disks owned by Node 2 store the database files.
If the private interconnect (heartbeat) has high bandwidth, then the Forms Server might be able to optimize database transaction processing by using the private network, rather than the public network, to communicate with the database instance. Because the bandwidth requirement for internode communication is small, the Oracle HTTP Server can take advantage of what is effectively a dedicated network link to the database instance.
In a multitiered configuration, application tier and database tier components are configured to run on separate cluster nodes or to scale across multiple clusters and systems. For example, you might configure a single back-end database driving a number of Forms and Reports Servers running on multiple systems. The following list suggests how this might be accomplished with Oracle Forms Load Balancer Servers and master Oracle Reports Servers:
Oracle Forms Load Balancer Servers
An Oracle Forms Load Balancer Server is responsible for dividing the workload among the Oracle Forms Servers. In a multitiered configuration, both the Oracle Forms Load Balancer Server and the Web server represent single points of failure that would normally require user intervention to correct. You can eliminate both points of failure by configuring the Oracle Forms Load Balancer Server and the Web server for high availability with Oracle Fail Safe.
Availability can be improved further by adding the database that the Forms Servers access to a group with Oracle Fail Safe.
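Conceptually, a load balancer server such as the one described above hands each incoming request to the next available back-end server in turn. The following round-robin sketch illustrates the idea only; it is not the Oracle Forms Load Balancer implementation, and the server names are invented:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin dispatcher, illustrating how a load
    balancer server divides work among back-end servers."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one back-end server")
        self._servers = cycle(servers)

    def next_server(self):
        """Return the back-end server that should take the next request."""
        return next(self._servers)

lb = RoundRobinBalancer(["forms1", "forms2", "forms3"])
print([lb.next_server() for _ in range(4)])
# -> ['forms1', 'forms2', 'forms3', 'forms1']
```

Because the balancer itself is a single process, it is a single point of failure, which is exactly why the text recommends protecting it with Oracle Fail Safe.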
Master Oracle Reports Servers
One of the Oracle Reports Servers is designated the master and is responsible for dividing the workload among the remaining servers. In a multitiered configuration, both the Oracle Reports Master Server and the Web server represent single points of failure that would normally require user intervention to correct. You can eliminate both points of failure by configuring the Oracle Reports Master Server and Web server for high availability with Oracle Fail Safe.
Availability can be improved further by adding the database that the Reports Servers access to a group with Oracle Fail Safe.
Figure 3-5 shows a three-tiered configuration in which an Oracle Reports Master Server has been made highly available with Oracle Fail Safe. On the database tier, Oracle database server and Oracle Services for MSCS are installed on each cluster node. On the application tier, Oracle Services for MSCS, an Oracle HTTP Server, and the Master Reports Server are installed on each cluster node. The client tier consists of the client applications and Web browsers.
Figure 3-5 Multitiered Oracle Reports Server Configuration
The multitiered configuration also allows for the following:
A flexible architecture that allows multiple clusters and platforms to work together. For example:
Oracle Forms and Oracle Reports Servers that you make highly available on a Windows platform with Oracle Fail Safe can access a back-end Oracle Real Application Clusters database running on a UNIX or Windows NT system.
Servers within the application tier can run on different platforms. Thus, to make an Oracle Reports Master Server (and thus the entire Reports tier) highly available, you need only add a single Microsoft cluster running Oracle Fail Safe to an existing Reports tier, even if the other Oracle Reports Servers run on UNIX systems.
Incremental deployment of high availability into your business solution. This is possible by first adding high availability to the less reliable middle-tier application servers without modifying legacy back-end database systems.
While Oracle Fail Safe provides high availability to single-instance Oracle databases, Oracle Data Guard provides disaster tolerance. For example, Oracle Fail Safe can ensure nearly continuous high availability for a given system, but does not protect against a disaster that incapacitates the site where that system resides. Similarly, while Oracle Data Guard provides excellent disaster recovery features, the time required to switch operations from the primary site to a physically separate site can range from several minutes to hours. By combining Oracle Fail Safe with Oracle Data Guard, your databases can be highly available and disaster tolerant.
A sample configuration with two databases deployed on separate Windows clusters (one in Boston and the other in Dallas) is shown in Figure 3-6. Oracle Data Guard is used to configure one database as the primary database and the other as a physical standby database. Each database is then configured for high availability with Oracle Fail Safe. The primary database in Boston is configured for high availability on a cluster. Its online redo log files are archived locally, and remotely (using Oracle Net) to a physical standby database, which is also configured for high availability, on a cluster in Dallas. (A logical standby database can be used, if desired.)
Many production environments employ multiple standby databases with a combination of synchronous and asynchronous log shipping. For example, a local standby database might be kept current with the primary database through synchronous log shipping, while several remote standby databases use asynchronous log shipping and are maintained at different time delays from the primary database.
For detailed information on using Oracle Data Guard with Oracle Fail Safe, see the document entitled "Disaster-Tolerant High Availability: Oracle9i Data Guard with Oracle Fail Safe." This document is available on the Oracle Technology Network Web site. See the Oracle Fail Safe Release Notes for the URL of the Oracle Technology Network.
Figure 3-6 Database Configuration Using Oracle Data Guard for Disaster Tolerance
To operate in an Oracle Fail Safe environment, client applications do not require any special programming or changes. Client applications that work with an Oracle resource on a single node will continue to function correctly in an Oracle Fail Safe environment, without recoding, recompiling, or relinking. This is because clients access the application through a virtual server rather than a physical node; the virtual server address moves with the group when it fails over.
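Because clients address the virtual server name, a client that loses its connection during failover can simply reconnect to the same name once the group is online on the surviving node. A generic retry sketch (the connect callable and service name are placeholders, not an Oracle API):

```python
import time

def connect_with_retry(connect, service="sales.example.com",
                       attempts=5, delay=2.0):
    """Retry a connection to a virtual server name during failover.

    `connect` is any callable that opens a session or raises
    ConnectionError on failure; the service name is a placeholder.
    Because the virtual server name moves with the group, no
    client-side reconfiguration is needed between attempts.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return connect(service)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)  # wait for the group to come online
    raise last_error
```

During a failover, the first attempt or two typically fail while the group moves between nodes; a later attempt succeeds once the virtual address is online again.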
Chapters 7 through 14 each contain a section that describes how to integrate clients and applications with the resource type covered in that chapter. Chapter 7 describes how to make your clients and applications fail over transparently when a database fails over to another node in the cluster.