Oracle® Fail Safe Concepts and Administration Guide
Release 3.3.1 for Windows
Part No. A96684-01
Increasingly, businesses expect products and services to be available 24 hours a day, 365 days a year. While no solution can ensure 100% availability, Oracle Fail Safe minimizes the downtime of Oracle databases and other applications running on Microsoft clusters and configured with Microsoft Cluster Server (MSCS).
This chapter discusses the following topics:
| What Is Oracle Fail Safe? | Section 1.1 |
| Benefits of Oracle Fail Safe | Section 1.2 |
| A Typical Oracle Fail Safe Configuration | Section 1.3 |
| Deploying Oracle Fail Safe Solutions | Section 1.4 |
Oracle Fail Safe is an easy-to-use software option that works with Microsoft Cluster Server (MSCS) to provide highly available business solutions on Microsoft clusters. A cluster is a configuration of two or more Windows systems that makes them appear to network users as a single, highly available system. Each system in a cluster is referred to as a cluster node.
Oracle Fail Safe works with MSCS cluster software to provide high availability for applications and single-instance databases running on a cluster. When a cluster node fails, the cluster software moves its workload to a surviving node based on parameters that you configure using Oracle Fail Safe. This operation is called a failover.
With Oracle Fail Safe, you can reduce downtime for single-instance Oracle databases, Oracle Forms and Reports Servers, Oracle Applications, and almost any application that can be configured as a Microsoft Windows service.
Oracle Fail Safe consists of two components: Oracle Services for MSCS and Oracle Fail Safe Manager. Oracle Services for MSCS works with the MSCS software to configure fast, automatic failover during planned and unplanned outages for resources that you have configured for high availability. These resources can be the Oracle database server, Oracle Forms Server, Oracle Reports Server, Oracle MTS Service, a Web server, or other Windows services (as well as the software and hardware upon which these items depend). Oracle Services for MSCS can also attempt to restart a failed software resource, so that a failover from one cluster node to another may not be required.
Note: Oracle Services for MSCS was referred to as Oracle Fail Safe Server in previous releases of Oracle Fail Safe.
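The restart-before-failover behavior of Oracle Services for MSCS can be pictured as a small decision rule. The following is a minimal sketch under stated assumptions, not the actual Oracle Fail Safe or MSCS logic: a failed resource is restarted on its current node up to max_restarts times within a sliding time window, and once that limit is exceeded its group fails over. The class names and the max_restarts and window_seconds parameters are hypothetical and are not Oracle Fail Safe or MSCS property names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartPolicy:
    # Hypothetical knobs; not Oracle Fail Safe or MSCS property names.
    max_restarts: int = 3           # restarts allowed on the current node...
    window_seconds: float = 900.0   # ...within this sliding time window


@dataclass
class ResourceMonitor:
    policy: RestartPolicy
    failure_times: List[float] = field(default_factory=list)

    def on_failure(self, now: float) -> str:
        """Return "restart" to retry on the current node, or "failover"
        to move the resource's group to another cluster node."""
        # Only failures inside the window count toward the limit.
        self.failure_times = [t for t in self.failure_times
                              if now - t < self.policy.window_seconds]
        self.failure_times.append(now)
        if len(self.failure_times) <= self.policy.max_restarts:
            return "restart"
        return "failover"
```

With the default values, the first three failures within 15 minutes lead to local restarts and the fourth leads to a failover; a failure long after the window has passed starts the count again.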
Oracle Fail Safe Manager provides an easy-to-use interface and wizards that help you configure and manage cluster resources, as well as troubleshooting tools that help you diagnose problems.
Together, these components enable rapid deployment of highly available database, application, and Internet business solutions.
Oracle Fail Safe offers the following benefits:
Highly available databases and applications
Easy to use
Easy to integrate with applications
Oracle Fail Safe works with MSCS to configure both hardware and software resources for high availability. Once configured, the multiple nodes in the cluster appear to end users and clients as a single virtual server; end users and client applications connect to a single, fixed network address, called a virtual address, without requiring any knowledge of the underlying cluster. Then, if one node in the cluster becomes unavailable, MSCS moves the workload of the failed node (and client requests) to another node.
For example, the left side of Figure 1-1 shows a two-node cluster configuration where both nodes are available and actively processing transactions. On the surface, this configuration might seem no different from setting up two independent servers, except that the storage subsystem is configured so that the disks are connected physically to both nodes by a shared storage interconnect. Although both nodes are physically connected to the same disks, MSCS ensures that each disk can be owned and accessed by only one node at a time.
The right side of Figure 1-1 shows how, when hardware or software becomes unavailable on one node, its workload automatically moves (fails over) to the surviving node and is restarted, without administrator intervention. During the failover, ownership of the cluster disks is released from the failed server (Node A) and acquired by the surviving server (Node B). If a single-instance Oracle database was running on Node A, Oracle Fail Safe restarts the database instance on Node B. Clients can then access the database through Node B using the same virtual address that they used when the database was hosted by Node A.
Figure 1-1 Failover with Oracle Fail Safe in a Microsoft Cluster
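The failover in Figure 1-1 can be modeled in a few lines of Python to make three of its properties concrete: a cluster disk has at most one owner at a time, the whole group moves to the surviving node, and clients keep using the same virtual address. This is a minimal sketch, not how MSCS or Oracle Fail Safe is implemented. The ClusterDisk and Group classes, the resource names, and the address 10.0.0.50 are hypothetical. Resources are brought online so that each one starts after the resources it depends on, using Python's graphlib.TopologicalSorter (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set


@dataclass
class ClusterDisk:
    name: str
    owner: Optional[str] = None  # at most one node may own a disk at a time

    def acquire(self, node: str) -> None:
        if self.owner not in (None, node):
            raise RuntimeError(f"{self.name} is owned by {self.owner}")
        self.owner = node

    def release(self, node: str) -> None:
        if self.owner == node:
            self.owner = None


@dataclass
class Group:
    virtual_address: str               # clients always connect here
    disks: List[ClusterDisk]
    depends_on: Dict[str, Set[str]]    # resource -> resources it needs first
    node: str = ""                     # node currently hosting the group
    online: List[str] = field(default_factory=list)

    def fail_over(self, target: str) -> None:
        """Move the whole group from its current node to target."""
        self.online.clear()            # resources stop on the failed node
        for disk in self.disks:
            disk.release(self.node)    # ownership is taken from the failed node
        for disk in self.disks:
            disk.acquire(target)       # and given to the surviving node
        # Start every dependency before the resources that depend on it.
        self.online = list(TopologicalSorter(self.depends_on).static_order())
        self.node = target
        # virtual_address is deliberately left unchanged.
```

For example, a group hosted on "NodeA" with one disk, an IP address, a network name, and a database can be moved with group.fail_over("NodeB"): the disk's owner becomes "NodeB", the disk, IP address, and network name come online before the database, and the virtual address stays the same, so clients need no reconfiguration.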
Because of the numerous hardware and software components involved, configuring software and all of its dependent components (for example, disks, IP addresses, and the network) to work in a cluster can be a complex process. Oracle Fail Safe is designed to be easy to install, administer, and use, and it simplifies the configuration of software in a cluster.
You can install Oracle Fail Safe either interactively using Oracle Universal Installer, or in silent mode. With the silent mode installation method, you install software by supplying input to Oracle Universal Installer with a response file. Also, you can perform rolling upgrades of both the operating system and application software. Rolling upgrades minimize downtime by allowing one cluster node to continue hosting the cluster workload while the other system is being upgraded. See the Oracle Fail Safe Installation Guide for more information.
Oracle Fail Safe Manager provides an easy-to-use interface to set up, configure, and manage applications and databases on the cluster. Oracle Fail Safe Manager provides wizards that automate the configuration process and ensure that the configuration is replicated consistently across cluster nodes.
Oracle Fail Safe Manager includes:
Wizards that automate and simplify resource configuration, and drag-and-drop capabilities that help you quickly perform routine system maintenance, such as moving resources across nodes to balance the workload
Online documentation, including a quick tour, a tutorial, help, and manuals available in HTML and PDF formats
Figure 1-2 shows an Oracle Fail Safe Manager window. The left pane displays a tree view showing multiple views (and the current state) of clusters and cluster resources. The right pane displays a property page that lists all groups on the cluster selected in the tree view, along with the current state of those groups. The contents of the right pane change depending on the object chosen in the tree view: when you select a particular cluster, node, group, or resource, the property sheet for that cluster, node, group, or resource is displayed.
Figure 1-2 Oracle Fail Safe Manager
Figure 1-3 shows the Oracle Fail Safe Manager menus and the items within each menu.
Figure 1-3 Oracle Fail Safe Manager Menus and Contents
If you want to configure an existing application to access databases or other applications configured with Oracle Fail Safe, few or no changes are required. Because applications always access cluster resources at the same virtual address, applications treat failover as a quick node reboot.
After a failover occurs, database clients or users must reconnect and replay any transactions that were left undone (such as database transactions that were rolled back during instance recovery). Applications developed with OCI (including ODBC clients that use the Oracle ODBC driver) can take advantage of automatic reconnection after failover. See Section 7.10 for more information.
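For applications that handle reconnection themselves rather than relying on OCI automatic reconnection, the reconnect-and-replay step can follow a simple retry pattern. The following is a minimal sketch under stated assumptions: connect and transaction are caller-supplied placeholders, ConnectionLost is a hypothetical exception standing in for whatever error your client library raises, and none of these names belong to a real Oracle driver API. The sketch also assumes the transaction is safe to replay, meaning that it was rolled back (not committed) when the connection was lost.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLost(Exception):
    """Hypothetical stand-in for the error a client library raises when the
    node hosting the database goes away during a failover."""


def run_with_reconnect(connect: Callable[[], object],
                       transaction: Callable[[object], T],
                       retries: int = 5,
                       delay: float = 2.0) -> T:
    """Run transaction against a fresh connection to the virtual address,
    reconnecting and replaying it from the start if the connection is lost."""
    for attempt in range(retries + 1):
        try:
            conn = connect()           # same virtual address on every attempt
            return transaction(conn)   # replay the whole unit of work
        except ConnectionLost:
            if attempt == retries:
                raise                  # give up and surface the error
            time.sleep(delay)          # allow time for the instance to restart
    raise ValueError("retries must be non-negative")
```

Because the virtual address does not change, the connect function needs no knowledge of which node currently hosts the database; the delay simply gives the surviving node time to complete instance recovery.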
Oracle Fail Safe solutions can be deployed on any Windows cluster certified by Microsoft for configuration with MSCS.
A typical cluster configuration includes the following hardware and software:
Microsoft cluster nodes, each with one or more local (private) disks where executable application files are installed.
NTFS formatted disks on the shared storage interconnect (SCSI or Fibre Channel). All data files, log files, and other files that need to fail over from one node to another are located on these cluster disks.
Note: See the documentation for your cluster hardware for information on using redundant hardware, such as RAID, to further ensure high availability.
Additional redundant components (UPS, network cards, disk controllers, and so on).
Oracle Services for MSCS
Oracle Fail Safe Manager (installed on one or more cluster nodes, one or more client workstations, or both)
One or more resources that you want to make highly available, such as the following:
Oracle single-instance database servers
Oracle Forms Load Balancer Servers
Oracle Forms Servers
Oracle Reports Servers
Oracle HTTP Servers
Oracle Applications concurrent managers
Oracle MTS Services (for releases of the Oracle database server prior to 9.0.1)
Oracle applications or third-party applications that can be configured as Windows generic services
See the Oracle Fail Safe Release Notes for information about the supported releases of these components.
Figure 1-4 shows the hardware and software components in a two-node cluster configured with Oracle Fail Safe. Note that the executable application files are installed on a private disk on each cluster node and the application data and log files reside on a shared cluster disk.
Figure 1-4 Hardware and Software Components Configured with Oracle Fail Safe
Oracle Fail Safe works with MSCS to configure resources running on a cluster, to provide fast failover, and to minimize downtime during planned (system upgrades) and unplanned (hardware or software failure) outages.
Clusters provide high availability because they are designed to manage:
Unplanned group failover
Clusters manage unplanned group failovers (failure of hardware or software components) in a way that is transparent to users. When one node in the cluster becomes unavailable, another node temporarily serves both its own workload and the workload of the failed node. When a resource fails and cannot be restarted on the current node, another node takes ownership of that resource (and any other resources upon which it depends) and attempts to restart it.
Planned group failover
Clusters manage planned group failovers (those that you initiate intentionally, such as when you upgrade software on the cluster). You can fail over the resources to another node, perform a software or hardware upgrade, and then return the resources to the original node. (This is called failing back the resources.) Then, perform the same upgrade process on the other nodes in the cluster.
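The rolling upgrade procedure for planned failovers can be sketched as a loop over the cluster nodes: move each group off the node, upgrade the node, then fail the groups back. This is a hypothetical model: placement (a mapping from group name to hosting node) and the upgrade callback are illustrative assumptions, and in practice you move groups with Oracle Fail Safe Manager rather than by editing a dictionary. The sketch assumes at least two nodes and naively picks the first other node as the temporary host.

```python
from typing import Callable, Dict, List


def rolling_upgrade(placement: Dict[str, str], nodes: List[str],
                    upgrade: Callable[[str], None]) -> Dict[str, str]:
    """Upgrade one node at a time while the remaining nodes host every group.

    placement maps group name -> node hosting it and is updated in place.
    Requires at least two nodes.
    """
    home = dict(placement)  # each group's original node, for failing back
    for node in nodes:
        # Naive choice of temporary host; a real plan would consider capacity.
        spare = next(n for n in nodes if n != node)
        for group, host in list(placement.items()):
            if host == node:
                placement[group] = spare  # planned failover off this node
        upgrade(node)  # the node now hosts no groups
        for group, original in home.items():
            if original == node:
                placement[group] = node  # fail back to the original node
    return placement
```

At every point in the loop, each group is hosted by some node, which is what keeps downtime to the brief interval of each planned failover.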
Oracle Fail Safe also allows you to efficiently use resources in the cluster environment by managing the following:
The cluster nodes can serve separate workloads. For example, one node can host an Oracle database, and the others can host applications.
You can balance resources across the cluster nodes. For example, a database can be moved from a node that is heavily loaded to one that has spare capacity.
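Deciding which group to move when balancing the workload is left to the administrator. One simple heuristic is to move a single group from the busiest node to the idlest node, choosing the group that narrows the load gap the most. The following is a minimal sketch of that heuristic, assuming you have a per-group load estimate; the function name and the load figures are hypothetical and are not part of Oracle Fail Safe. It considers only the busiest and idlest nodes, so in clusters with more than two nodes another node may become the busiest after the move.

```python
from typing import Dict, List, Optional, Tuple


def suggest_move(nodes: List[str],
                 groups: Dict[str, Tuple[str, float]]) -> Optional[Tuple[str, str]]:
    """groups maps group name -> (hosting node, estimated load).

    Returns (group, target node) for the single move from the busiest node
    to the idlest node that narrows the load gap the most, or None.
    """
    load = {n: 0.0 for n in nodes}
    for node, cost in groups.values():
        load[node] += cost
    busiest = max(nodes, key=load.__getitem__)
    idlest = min(nodes, key=load.__getitem__)
    gap = load[busiest] - load[idlest]

    best: Optional[str] = None
    for name, (node, cost) in groups.items():
        # Moving a group with load c leaves a gap of |gap - 2c|, which is
        # smaller than the current gap only when 0 < c < gap.
        if node != busiest or not 0 < cost < gap:
            continue
        if best is None or abs(gap - 2 * cost) < abs(gap - 2 * groups[best][1]):
            best = name
    return None if best is None else (best, idlest)
```

For example, if Node A hosts a database (load 60) and an application (load 30) while Node B hosts a Web server (load 10), moving the application to Node B narrows the gap from 80 to 20, whereas moving the database would leave a gap of 40.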
Oracle Fail Safe has a variety of deployment options to satisfy a wide range of failover requirements. Chapter 3 explains how to configure an Oracle Fail Safe solution for your business needs, including active/passive solutions, active/active solutions, partitioned workload solutions, and multitiered solutions.