1 Introduction to Oracle Fail Safe

Increasingly, businesses expect products and services to be available 24 hours a day, 365 days a year. While no solution can ensure 100% availability, Oracle Fail Safe minimizes the downtime of Oracle Databases and other applications running on Microsoft clusters and configured with Microsoft Windows Failover Clusters.

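This chapter discusses the following topics:

  • Overview of Oracle Fail Safe

  • Benefits of Oracle Fail Safe

  • Overview of a Typical Oracle Fail Safe Configuration

  • Deploying Oracle Fail Safe Solutions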

1.1 Overview of Oracle Fail Safe

Oracle Fail Safe is user-friendly software that works with Microsoft Windows Failover Clusters to provide highly available business solutions on Microsoft clusters. A cluster is a configuration of two or more Microsoft Windows systems that makes them appear to network users as a single, highly available system. Each system in a cluster is referred to as a cluster node.

Oracle Fail Safe works with Microsoft Windows Failover Clusters software to provide high availability for applications and single-instance databases running on a cluster. When a cluster node fails, the cluster software moves its workload to the surviving node based on parameters that have been configured using the Microsoft Windows Failover Cluster Manager. This operation is called a failover.

With Oracle Fail Safe, you can reduce downtime for single-instance Oracle Databases and almost any application that can be configured as a Microsoft Windows service.

Oracle Fail Safe consists of Oracle Fail Safe Server and Oracle Fail Safe Manager:

  • Oracle Fail Safe Server works with Microsoft Windows Failover Clusters to configure fast, automatic failover during planned and unplanned outages for resources configured for high availability. These resources can be Oracle Databases or other Microsoft Windows services, as well as the software and hardware upon which these items depend. Oracle Fail Safe can also attempt to restart a failed software resource on its current node, so that a failover from one cluster node to another may not be required.

    Note:

    Oracle Fail Safe Server was referred to as Oracle Services for MSCS in previous releases.
  • Oracle Fail Safe Manager provides a user-friendly interface and wizards that help configure and manage cluster resources, and troubleshooting tools that help diagnose problems.

Together, these components enable rapid deployment of highly available database, application, and Internet business solutions.

1.2 Benefits of Oracle Fail Safe

Oracle Fail Safe provides the key benefits discussed in the following sections:

1.2.1 Highly Available Resources and Applications

Oracle Fail Safe works with Microsoft Windows Failover Clusters to configure both hardware and software resources for high availability. Once configured, the multiple nodes in the cluster appear to end users and clients as a single virtual server; end users and client applications connect to a single, fixed network address, called a virtual address, without requiring any knowledge of the underlying cluster. If one node in the cluster becomes unavailable, then Microsoft Windows Failover Clusters moves the workload of the failed node (and client requests) to another node.

For example, the left side of Figure 1-1 shows a two-node cluster configuration where both nodes are available and actively processing transactions. On the surface, this configuration may seem no different from setting up two independent servers, except that the storage subsystem is configured so that the disks are connected physically to both nodes by a shared storage interconnect. Although both nodes are physically connected to the same disks, Microsoft Windows Failover Clusters ensures that each disk can be owned and accessed by only one node at a time.

The right side of Figure 1-1 shows how, when hardware or software becomes unavailable on one node, its workload automatically moves (fails over) to the surviving node and is restarted, without administrator intervention. During the failover, ownership of the cluster disks is released from the failed server (Node A) and acquired by the surviving server (Node B). If a single-instance Oracle Database was running on Node A, then Oracle Fail Safe restarts the database instance on Node B. Clients can then access the database through Node B using the same virtual address that they used to access the database when it was hosted by Node A.
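The failover described above can be illustrated with a small toy model. This is only a sketch for intuition, not how Microsoft Windows Failover Clusters or Oracle Fail Safe is implemented; the `Group` and `Cluster` classes, the node names, and the address `sales-db.example.com` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical failover unit: a virtual address plus the disks behind it."""
    virtual_address: str
    disks: list
    owner: str  # name of the node that currently hosts the group


@dataclass
class Cluster:
    nodes: dict  # node name -> True if the node is up
    groups: list = field(default_factory=list)

    def fail_node(self, failed):
        """Mark a node as down and move its groups to a surviving node."""
        self.nodes[failed] = False
        survivors = [name for name, up in self.nodes.items() if up]
        if not survivors:
            raise RuntimeError("no surviving node to fail over to")
        for group in self.groups:
            if group.owner == failed:
                # The group's disks move with it, so each disk still has
                # exactly one owning node after the failover.
                group.owner = survivors[0]

    def resolve(self, virtual_address):
        """Return the node that currently serves a virtual address."""
        for group in self.groups:
            if group.virtual_address == virtual_address:
                return group.owner
        raise KeyError(virtual_address)


cluster = Cluster(nodes={"NodeA": True, "NodeB": True})
cluster.groups.append(Group("sales-db.example.com", ["Disk1"], owner="NodeA"))

before = cluster.resolve("sales-db.example.com")
cluster.fail_node("NodeA")
after = cluster.resolve("sales-db.example.com")
print(before, "->", after)  # NodeA -> NodeB
```

The point of the model is that clients only ever use the virtual address; which physical node answers is a detail that changes during failover.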

Figure 1-1 Failover with Oracle Fail Safe in a Microsoft Cluster


1.2.2 Ease of Use

Because of the numerous hardware and software components involved, configuring software and all of its dependent components (for example, disks, IP addresses, and networks) to work in a cluster can be a complex process. In contrast, Oracle Fail Safe is designed to be easy to install, administer, and use, and it simplifies the configuration of software in a cluster.

Installation: You can install Oracle Fail Safe with Oracle Universal Installer either interactively or in silent mode. In a silent mode installation, you supply input to Oracle Universal Installer in a response file. You can also perform rolling upgrades of both the operating system and application software. Rolling upgrades minimize downtime by allowing one cluster node to continue hosting the cluster workload while the other system is being upgraded. See Oracle Fail Safe Installation Guide for Microsoft Windows for more information.

Administration and Use: Oracle Fail Safe Manager provides a user-friendly interface to set up, configure, and manage applications and databases on the cluster. Oracle Fail Safe Manager provides wizards that automate the configuration process and ensure that the configuration is replicated consistently across cluster nodes.

Oracle Fail Safe Manager includes:

  • A tree view of objects that displays multiple views of the same data to help you find information efficiently

  • Wizards that automate and simplify resource configuration, such as moving resources across nodes to balance the workload

  • An integrated family of verification tools that automatically diagnose and fix common configuration problems both before and after configuration

  • Online documentation, including a tutorial, help, and manuals available in HTML and PDF formats

  • A command-line interface (PowerShell) for managing the cluster through batch programs or scripts

Figure 1-2 shows an Oracle Fail Safe Manager window. The left pane displays a tree view showing the Microsoft Windows Failover Cluster Manager and the Oracle Fail Safe Manager. Under Oracle Fail Safe Manager is a cluster that includes the Oracle resources and a group. (A group is sometimes referred to as a "service or application" or a "clustered role.") The right pane displays the actions associated with Oracle Resources and Available Oracle Resources. The actions listed at the top of the Actions menu are relevant to the item currently selected in the tree view in the left pane, while the actions listed at the bottom of the Actions menu relate to the selected list item (if any) in the middle pane.

Figure 1-2 Oracle Fail Safe Manager


Figure 1-3 shows the Oracle Fail Safe menus and the items within each menu.

Figure 1-3 Oracle Fail Safe Manager Menus and Contents


1.2.3 Product Accessibility

Oracle Fail Safe has two user interfaces: a command-line interface based on PowerShell cmdlets and the Oracle Fail Safe Manager GUI. Of the two, the Oracle Fail Safe Manager GUI is more widely used. It presents the following three panels:

  • A navigation tree in the left panel

  • The middle panel representing the selected tree view item

  • The right panel showing actions for the selected tree view at the top of the Actions menu list, and actions for the selected list item (if any) chosen from the middle panel, at the bottom of the Actions menu list.

Wizard pages are displayed when the user selects an action that requires multiple steps, such as adding a resource to a group.

Refer to the Microsoft Management Console (MMC) help topic titled "Accessibility for MMC 3.0" for more information regarding the accessibility features of MMC.

1.2.4 Ease of Integration with Applications

To configure an existing application to access databases or other applications configured with Oracle Fail Safe, few or no changes are required. Because applications always access cluster resources at the same virtual address, applications treat failover as a quick node restart.

After a failover occurs, database clients or users must reconnect and replay any transactions that were left undone (such as database transactions that were rolled back during instance recovery). Applications developed with OCI (including ODBC clients that use the Oracle ODBC driver) can take advantage of automatic reconnection after failover. See Section 8.9 for more information.
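For clients that do not get automatic reconnection, the reconnect-and-replay step can be handled in application code. The following is a minimal sketch under stated assumptions: `ConnectionLost`, `run_with_replay`, and the simulated `fake_connect` and `fake_transaction` functions are hypothetical and stand in for a real database driver. It is not the OCI reconnection mechanism itself.

```python
import time


class ConnectionLost(Exception):
    """Raised by the hypothetical client when the database connection drops."""


def run_with_replay(connect, transaction, attempts=5, delay_seconds=0.0):
    """Run transaction(conn); on a lost connection, reconnect and replay it."""
    last_error = None
    for _ in range(attempts):
        try:
            conn = connect()          # always targets the same virtual address
            return transaction(conn)  # replayed from the start on each attempt
        except ConnectionLost as error:
            last_error = error
            time.sleep(delay_seconds)  # give the failover time to complete
    raise last_error


# Simulated usage: the first attempt is interrupted by a failover and the
# second attempt, against the surviving node, succeeds.
calls = {"count": 0}


def fake_connect():
    return "connection to virtual address"


def fake_transaction(conn):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionLost("node failed mid-transaction")
    return "committed"


result = run_with_replay(fake_connect, fake_transaction)
print(result, "after", calls["count"], "attempts")  # committed after 2 attempts
```

Replaying the whole transaction is safe here because uncommitted work is rolled back during instance recovery; a real application would also need to make sure that a transaction which did commit is not submitted twice.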

1.3 Overview of a Typical Oracle Fail Safe Configuration

Oracle Fail Safe solutions can be deployed on any Windows cluster certified by Microsoft for configuration with Microsoft Windows Failover Clusters.

Most clusters are configured similarly, differing only in the choice of storage interconnect (Fibre Channel or SAN) and in the way applications are deployed across the cluster nodes.

A typical cluster configuration includes the following hardware and software:

  • Hardware

    • Microsoft cluster nodes, each with one or more local (private) disks where executable application files are installed.

    • Private (heartbeat) interconnect between the nodes for intracluster communications.

    • Public interconnect (Internet, Intranet, or both) to the local area network (LAN) or wide area network (WAN).

    • NTFS-formatted disks on the shared storage interconnect (Fibre Channel or SAN). All data files, log files, and other files that must fail over from one node to another are located on these cluster disks.

      Note:

      See the documentation for your cluster hardware for information about using redundant hardware, such as RAID, to further ensure high availability.
    • Additional redundant components (UPS, network cards, disk controllers, and so on).

  • Software (installed on each node)

    • Microsoft Windows

    • Oracle Fail Safe

    • Oracle Fail Safe Manager (installed on one or more cluster nodes, one or more client workstations, or both)

    • One or more of the following resources configured for high availability:

      • Oracle single-instance databases

      • Oracle Management Agent

      • Oracle applications or third-party applications that can be configured as Windows generic services

See Oracle Fail Safe Release Notes for information about the supported releases of these components.

Figure 1-4 shows the hardware and software components in a two-node cluster configured with Oracle Fail Safe. Note that the executable application files are installed on a private disk on each cluster node and the application data and log files reside on a shared cluster disk.

Figure 1-4 Hardware and Software Components Configured with Oracle Fail Safe


1.4 Deploying Oracle Fail Safe Solutions

Oracle Fail Safe works with Microsoft Windows Failover Clusters to configure resources running on a cluster, to provide fast failover, and to minimize downtime during planned (system upgrades) and unplanned (hardware or software failure) outages.

Clusters provide high availability by managing:

  • Unplanned group failover

    Clusters manage unplanned group failovers (failure of hardware or software components) in a way that is transparent to users. When one node on the cluster becomes unavailable, another node temporarily serves both its own workload and the workload from the failed node. When a resource fails and cannot be restarted on the current node, another node takes ownership of that resource (and any other resources upon which it depends) and attempts to restart it.

  • Planned failover

    Clusters manage planned group failovers (those that you intentionally start, such as when you upgrade software on the cluster). You can fail over the resources to another node, perform a software or hardware upgrade on the original node, and then return the resources to the original node. (This is called failing back the resources.) Then, perform the same upgrade process on the other nodes in the cluster.
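The planned failover sequence (fail over, upgrade, fail back, repeat on the next node) can be written out as an ordered plan. This is an illustrative sketch only; the `rolling_upgrade` function and the node and group names are hypothetical, and it produces a list of steps rather than calling any cluster API.

```python
def rolling_upgrade(nodes, home_node):
    """Return ordered steps for upgrading every node while groups stay online.

    nodes: list of node names (at least two).
    home_node: dict mapping each group name to the node that normally hosts it.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")
    steps = []
    for node in nodes:
        # Pick any other node to host the workload during the upgrade.
        standby = next(n for n in nodes if n != node)
        moved = [g for g, owner in home_node.items() if owner == node]
        for group in moved:
            steps.append(f"fail over {group} from {node} to {standby}")
        steps.append(f"upgrade {node}")
        for group in moved:
            steps.append(f"fail back {group} from {standby} to {node}")
    return steps


plan = rolling_upgrade(["NodeA", "NodeB"], {"SalesDB": "NodeA"})
for step in plan:
    print(step)
# fail over SalesDB from NodeA to NodeB
# upgrade NodeA
# fail back SalesDB from NodeB to NodeA
# upgrade NodeB
```

At every step, at least one node is hosting the group, which is why the workload stays available for the duration of the upgrade apart from the brief failover and failback.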

Oracle Fail Safe also ensures efficient use of resources in the cluster environment by managing the following:

  • Independent workloads

    The cluster nodes can serve separate workloads. For example, one node can host an Oracle Database, and the others can host applications.

  • Load balancing

    You can balance resources across the cluster nodes. For example, a database can be moved from a node that is heavily loaded to one that has spare capacity.
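As a rough illustration of the load-balancing decision, the sketch below picks which group to move from the busiest node to the least busy one. The `suggest_move` function, the group names, and the load figures are hypothetical; in practice an administrator makes this choice and moves the group with the management tools.

```python
def suggest_move(groups, nodes):
    """Suggest one group to move from the busiest node to the least busy node.

    groups: dict mapping group name -> (hosting node, load in arbitrary units).
    nodes: list of node names.
    Returns (group, source, target), or None if no move narrows the gap.
    """
    load = {node: 0 for node in nodes}
    for node, cost in groups.values():
        load[node] += cost
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    gap = load[busiest] - load[idlest]
    best = None
    for name, (node, cost) in groups.items():
        if node != busiest:
            continue
        # Load gap between the two nodes if this group were moved.
        new_gap = abs((load[busiest] - cost) - (load[idlest] + cost))
        if new_gap < gap:
            best, gap = (name, busiest, idlest), new_gap
    return best


groups = {
    "SalesDB": ("NodeA", 60),
    "ReportsDB": ("NodeA", 30),
    "WebApp": ("NodeB", 20),
}
move = suggest_move(groups, ["NodeA", "NodeB"])
print(move)  # ('ReportsDB', 'NodeA', 'NodeB')
```

Here NodeA carries 90 units and NodeB carries 20. Moving ReportsDB leaves the nodes at 60 and 50, which is a smaller gap than moving SalesDB would leave (30 and 80).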

Oracle Fail Safe has a variety of deployment options to satisfy a wide range of failover requirements. Chapter 3 explains how to configure an Oracle Fail Safe solution for your business needs, including active/passive solutions and active/active solutions.