4 Management for High Availability

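The unique advantage offered by Oracle Fail Safe is its ability to help you easily configure resources in a Windows cluster environment. This chapter discusses the following topics:

  • What Does It Mean to Configure Failover?

  • How Does Oracle Fail Safe Use the Wizard Input?

  • Managing Cluster Security

  • Discovering Standalone Resources

  • Renaming Resources

  • Using Oracle Fail Safe in a Multiple Oracle Homes Environment

  • Configurations Using Multiple Virtual Addresses

  • Adding a Node to an Existing Cluster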

For step-by-step procedures to configure standalone resources into groups, and for information about managing those resources once they are in groups, refer to the chapters in this manual that cover each resource type, and to the Oracle Fail Safe Tutorial and online help.

4.1 What Does It Mean to Configure Failover?

Using the Oracle Fail Safe Manager wizards, you can configure failover largely automatically, with minimal work required of a network manager. Oracle Fail Safe Manager helps you configure resources into groups so that when one node in a cluster fails, another cluster node immediately takes over the resources in the failed node's groups.

The wizards minimize the risk of introducing configuration problems during implementation and also reduce the level of expertise required to configure resources for high availability. Most policies that you set with the wizards can be modified later with Oracle Fail Safe Manager.

The following list summarizes the basic tasks to perform to implement failover for resources. Except for the first task, you must perform all of these tasks using Oracle Fail Safe Manager.

  1. Ensure that the products that you want to configure with Oracle Fail Safe are properly installed. (This is described in the Oracle Fail Safe Installation Guide.)

  2. Start Oracle Fail Safe Manager.

  3. Verify the cluster.

  4. Create a group.

  5. Add one or more virtual addresses to the group.

  6. If you are adding a standalone Oracle database, then use the Verify Standalone Database tool to verify the database.

  7. Add resources to the group.

  8. Verify the group.

  9. Update any Oracle Net file (such as the tnsnames.ora file) on client systems.

Note:

Depending on the type of resource you are configuring, there may be additional steps or considerations.

Refer to the tutorial and online help in Oracle Fail Safe Manager for step-by-step guidance on using the Oracle Fail Safe Manager wizards.

4.2 How Does Oracle Fail Safe Use the Wizard Input?

Once the wizard collects all the required information, Oracle Fail Safe Manager interacts with Oracle Services for MSCS (which in turn interacts with MSCS) to facilitate a high-availability environment.

Based on the information that you provide with the wizards, Oracle Fail Safe derives any additional information it requires to configure the environment.

Oracle Fail Safe configures most resources using a similar series of steps. For example, Oracle Fail Safe performs the following steps to configure a highly available Oracle database:

  1. Configures access to the database using a virtual address:

    1. Configures Oracle Net to use the virtual address or addresses associated with the database on all nodes listed in the possible owner nodes list for the database. (On a two-node cluster, this is both cluster nodes. On clusters that consist of more than two nodes, you are asked to specify the possible owner nodes for a resource as a step in the Add Resource to Group Wizard.)

    2. Duplicates the network configuration information on all nodes in the possible owner nodes list.

  2. Configures the database resource as follows:

    1. Verifies that all data files used by the database resource are on cluster disks and are not currently used by applications in other groups. If the cluster disks are in another group, but not used by applications in that group, then Oracle Fail Safe moves the disks into the same group with the database resource.

    2. Creates the failback policy for the database resources based on choices you made in the wizard.

    3. Populates the group with these resources:

      • Each disk resource used by the cluster group

      • Oracle database

      • Oracle Net listener

  3. Performs the following steps on each of the possible owner nodes for the group to which the database has been added, one at a time:

    1. Creates an Oracle instance with the same name on the node.

    2. Verifies that the node can bring the database online and offline by failing it over to the node to ensure that the failover policy works.

  4. Shuts down Oracle database after testing failover on all nodes in the possible owner nodes list. If the preferred owner node list is empty, then the group remains on the last node to which it was failed over as part of the configuration process.

By performing these steps, Oracle Fail Safe ensures that the resource is correctly configured and capable of failing over and failing back to all possible owner nodes of the group to which it has been added.
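The per-node portion of this sequence can be pictured with a small simulation. The following Python sketch is illustrative only: the names Group and configure_database are hypothetical and do not correspond to any Oracle Fail Safe interface. It models step 3 (visiting each possible owner node in turn) and the placement rule in step 4 for an empty preferred owner list. Where the group is placed when a preferred owner is specified is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """Toy model of a cluster group that contains one database."""
    name: str
    possible_owners: list
    preferred_owners: list = field(default_factory=list)
    current_node: Optional[str] = None
    instances: set = field(default_factory=set)    # nodes that have an instance
    verified: list = field(default_factory=list)   # nodes tested, in order


def configure_database(group: Group) -> Group:
    """Walk the possible owner nodes one at a time, as in step 3."""
    for node in group.possible_owners:
        # Step 3.1: an instance with the same name is created on the node.
        group.instances.add(node)
        # Step 3.2: the group is failed over to the node to test it there.
        group.current_node = node
        group.verified.append(node)
    # Step 4: with no preferred owner, the group stays on the last node tested.
    # ASSUMPTION (not stated in the text): otherwise this sketch places the
    # group on the first preferred owner.
    if group.preferred_owners:
        group.current_node = group.preferred_owners[0]
    return group
```

For a two-node cluster with no preferred owner, the group in this model ends on the second node, because that is the last node to which it was failed over.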

Figure 4-1 shows a two-node active/active cluster configuration in which each node hosts a group with a database.

Figure 4-1 Virtual Servers and Addressing in an Oracle Fail Safe Environment


The virtual servers (A and B) and their network addresses are known by all clients and cluster nodes. The listener.ora file on each cluster node and the tnsnames.ora file on each client workstation contain the network name and address information for each virtual server.

For failover to work properly, the host name (virtual address), database instance, SID entry, and protocol information in the tnsnames.ora and listener.ora files must match on each server node that is a possible owner of the resources in the group, and on each client system.
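One way to picture this matching requirement is to extract the same fields from both files and compare them. The following Python sketch is a simplified illustration, not an Oracle tool. The file contents, alias, listener name, host name, port, and SID are all hypothetical, and the regular expressions assume a single address and a single SID in each file.

```python
import re

# Hypothetical client-side tnsnames.ora entry (illustrative values only).
TNSNAMES_ORA = """
SALESDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = virtual-server-a)(PORT = 1521))
    (CONNECT_DATA = (SID = SALESDB))
  )
"""

# Hypothetical cluster-node listener.ora entries (illustrative values only).
LISTENER_ORA = """
FSL_VIRTUAL_SERVER_A =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = virtual-server-a)(PORT = 1521))
  )
SID_LIST_FSL_VIRTUAL_SERVER_A =
  (SID_LIST =
    (SID_DESC = (SID_NAME = SALESDB))
  )
"""

PATTERNS = {
    "protocol": r"\(\s*PROTOCOL\s*=\s*([^)\s]+)\s*\)",
    "host": r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)",
    "port": r"\(\s*PORT\s*=\s*([^)\s]+)\s*\)",
    "sid": r"\(\s*SID(?:_NAME)?\s*=\s*([^)\s]+)\s*\)",
}


def connection_facts(text):
    """Return the first protocol, host, port, and SID found in the text."""
    facts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            facts[key] = match.group(1).lower()
    return facts


def mismatches(client_text, node_text):
    """List the fields whose values differ between the two files."""
    client = connection_facts(client_text)
    node = connection_facts(node_text)
    return sorted(k for k in client.keys() | node.keys()
                  if client.get(k) != node.get(k))
```

Calling mismatches(TNSNAMES_ORA, LISTENER_ORA) returns an empty list for the sample text. Changing the port in only one of the two files makes the function report the port field.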

For example, during normal operations, Virtual Server A is active on Node A. Node B is the failover node for Virtual Server A. The cluster disks are connected to both nodes so that resources can run on either node in the cluster, but service for the resources in each group is provided by only one cluster node at a time.

If a system failure occurs on Node A, then Group 1 becomes active on Node B using the same virtual address and port number as it had on Node A. Node B takes over the workload from Node A transparently to clients, which continue to access Group 1 using Virtual Server A and Group 2 using Virtual Server B. Clients continue to access the resources in a group using the same virtual server name and address, without considering the physical node that is serving the group.
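The scenario in Figure 4-1 can be modeled in a few lines. The following Python sketch is a toy model under stated assumptions: the Cluster class and its methods are hypothetical, and the choice of surviving node is trivial because the model has only two nodes. It shows that the virtual server a client uses for a group does not change when the group moves to another physical node.

```python
class Cluster:
    """Toy cluster: groups move between nodes, virtual servers stay fixed."""

    def __init__(self, nodes, placement, virtual_servers):
        self.up = set(nodes)                          # nodes currently running
        self.placement = dict(placement)              # group -> hosting node
        self.virtual_servers = dict(virtual_servers)  # group -> virtual server

    def fail_node(self, node):
        """Move every group hosted on the failed node to a surviving node."""
        self.up.discard(node)
        if not self.up:
            raise RuntimeError("no surviving node to take over the groups")
        survivor = sorted(self.up)[0]
        for group, host in self.placement.items():
            if host == node:
                self.placement[group] = survivor

    def connect(self, group):
        """Return (virtual server the client uses, node actually serving)."""
        return self.virtual_servers[group], self.placement[group]


cluster = Cluster(
    nodes=["Node A", "Node B"],
    placement={"Group 1": "Node A", "Group 2": "Node B"},
    virtual_servers={"Group 1": "Virtual Server A",
                     "Group 2": "Virtual Server B"},
)
```

After cluster.fail_node("Node A"), a call to cluster.connect("Group 1") still reports Virtual Server A, now served by Node B.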

4.3 Managing Cluster Security

To accomplish administrative tasks associated with Oracle Fail Safe, you need the appropriate privileges to manage Oracle resources and applications and to perform operations through Oracle Fail Safe Manager.

Table 4-1 provides a quick reference for the privileges required for the services you use in an Oracle Fail Safe environment. For more information, refer to the sections listed in the last column.

Table 4-1 Permissions and Privileges

Service                   Required Privileges                                  Reference
------------------------  ---------------------------------------------------  -------------
Oracle Services for MSCS  Domain user account that has Administrator           Section 4.3.1
                          privileges on all cluster nodes
Oracle Fail Safe Manager  Domain user account that has Administrator           Section 4.3.2
                          privileges on all cluster nodes
Oracle database           Database administrator account with SYSDBA           Section 7.5
                          privileges
Generic services          By default, a generic service runs under the local   Section 8.4
                          system account. If you specify that the generic
                          service must run under a user account, then it
                          must have the "Log on as a service" privilege.


4.3.1 Oracle Services for MSCS

Oracle Fail Safe accesses database resources from two different Windows services: the Cluster Service service and the OracleMSCSServices service. The Cluster Service service implements the database resource DLL functions, that is, the common resource functions that start and stop the database resource, and determine if the database resource is functioning properly by issuing simple database queries against the database ("Is Alive" polling). The OracleMSCSServices service issues database queries that are related to configuration of database resources, such as verifying that all database files are on shared cluster disks, the database starts and stops successfully, and so on.

Each of these services executes in the context of the Log On As user specified for the particular service. The OracleMSCSServices service executes under the account provided to the Oracle Services for MSCS Security Setup tool during the installation of Oracle Fail Safe. Prior to Windows Server 2008, the Cluster Service service executed under the cluster account specified when the cluster was configured. In Windows Server 2008 and later, the Cluster Service service executes as user Local System.

All database connections must be properly authenticated, so Oracle Fail Safe must execute from a context that is authorized to connect to a database. If operating system authentication is being used to access a database (the database parameter REMOTE_LOGIN_PASSWORDFILE is set to NONE), then Oracle Database authenticates the access from the Windows service using the account name for that service. For the OracleMSCSServices service, authentication is done using the Log On As account specified for the OracleMSCSServices service. For the Cluster Service service, on installations that use a Windows Server version older than 2008, the cluster account is used. In Windows Server 2008 and later, the Oracle Fail Safe database resource DLL impersonates the OracleMSCSServices account when connecting to the database. Thus, in Windows Server 2008 and later, even though the Cluster Service service executes as Local System, database access authentication is done using the OracleMSCSServices account.

Prior to Windows Server 2008, it was possible for Oracle Fail Safe to access databases from two different user accounts: the account specified for the Cluster Service service and the account specified for the OracleMSCSServices service. On systems using Windows Server 2008 and later, when using operating system authentication, Oracle Fail Safe only attempts to authenticate database access using the account specified for the OracleMSCSServices service. See Section 7.3.3.4, "Database Authentication" for more information regarding database authentication.
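The rules in this section reduce to a small decision table. The following Python sketch restates them for operating system authentication only. The function name and the returned labels are hypothetical and exist purely for illustration. The function does not query Windows or Oracle Database.

```python
def authenticating_account(service, windows_2008_or_later):
    """Return the account that Oracle Database sees when operating system
    authentication is used, according to the rules described in this section.

    service: "OracleMSCSServices" or "Cluster Service" (labels assumed here).
    """
    if service == "OracleMSCSServices":
        # Always the Log On As account of the OracleMSCSServices service.
        return "OracleMSCSServices Log On As account"
    if service == "Cluster Service":
        if windows_2008_or_later:
            # The service runs as Local System, but the database resource
            # DLL impersonates the OracleMSCSServices account.
            return "OracleMSCSServices Log On As account"
        # Before Windows Server 2008, the cluster account is used.
        return "cluster account"
    raise ValueError(f"unknown service: {service!r}")
```

On Windows Server 2008 and later, both services resolve to the same account in this model, which matches the statement that Oracle Fail Safe authenticates with only one account on those systems.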

4.3.1.1 Account Updates Using the Oracle Fail Safe Security Setup Tool

Oracle Fail Safe provides a Security Setup tool that you can use to update the information for the account under which Oracle Services for MSCS runs. The Oracle Services for MSCS Security Setup tool is installed when you install Oracle Services for MSCS.

On a cluster node, you can access the Oracle Services for MSCS Security Setup tool from the Windows taskbar. To do so, from the Windows Start menu, select Programs (or All Programs), then Oracle_Home, and finally, Oracle Services for MSCS Security Setup.

Note:

Be sure that you use the Oracle Services for MSCS Security Setup tool to update the security information on all cluster nodes, and that you use the same account on all cluster nodes.

Figure 4-2 shows the setup for user account Administrator in the domain EXAMPLE\Admin.

Figure 4-2 Windows User Account Settings for the Oracle Services for MSCS


4.3.2 Oracle Fail Safe Manager

The account you use to log in to Oracle Fail Safe Manager must be a domain user account (not a local account) that has Administrator privileges on all cluster nodes.

4.4 Discovering Standalone Resources

Oracle Services for MSCS automatically discovers (locates) and displays standalone resources in the Oracle Fail Safe Manager tree view when you select the Standalone Resources folder from the tree view. Chapter 7, Chapter 9, and Chapter 10 contain information about how Oracle Fail Safe discovers each type of component that you can configure for high availability with Oracle Fail Safe.

4.5 Renaming Resources

Once a resource is added to a group, you must not change the resource name. If the resource name must be changed, then use Oracle Fail Safe Manager to remove the resource from the group, and then add it back to the group using the new name.

4.6 Using Oracle Fail Safe in a Multiple Oracle Homes Environment

Oracle Fail Safe supports the multiple Oracle homes feature. The following list describes the requirements for using Oracle Fail Safe in a multiple Oracle homes environment:

  • Install Oracle Services for MSCS in any one Oracle home on all cluster nodes. Only one version of Oracle Services for MSCS can be installed and running on a node.

  • Use the latest release of Oracle Fail Safe Manager to manage multiple clusters. See Oracle Fail Safe Release Notes for information about the compatibility of various versions of Oracle Fail Safe Manager and the Oracle Fail Safe server component.

    Note:

    You can install multiple versions of Oracle Fail Safe Manager on a system, but each version must be installed in a different Oracle home, and the latest release of Oracle Fail Safe Manager must be installed last.

  • Each resource to be configured for high availability must be installed in the same Oracle home on all cluster nodes that are possible owners. The Verify Cluster operation validates this symmetry. See Section 6.1.1 for information about the Verify Cluster operation.

  • All databases and listeners in a group must come from the same Oracle home.

    When you add a database to a group, an Oracle Net listener resource is also added to the group. Optionally, you can add an Oracle Management Agent resource to the group. See Section 9.2 for more information.

    The listener is created in the same Oracle home where the database resides.
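Two of the requirements above lend themselves to a mechanical check: each resource must use the same Oracle home on every possible owner node, and all databases and listeners in a group must share one Oracle home. The following Python sketch illustrates such a check on hand-entered data. The Resource class, the check_group function, and the home names are hypothetical. The sketch is not the Verify Cluster operation and does not inspect a real system.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str             # for example "database" or "listener"
    homes_by_node: dict   # possible owner node -> Oracle home on that node


def check_group(resources):
    """Return a list of problems found in one group's resources."""
    problems = []
    # Requirement: same Oracle home on all possible owner nodes.
    for res in resources:
        if len(set(res.homes_by_node.values())) > 1:
            problems.append(
                f"{res.name}: Oracle home differs between possible owner nodes")
    # Requirement: databases and listeners in a group share one Oracle home.
    shared = {
        home
        for res in resources
        if res.kind in ("database", "listener")
        for home in res.homes_by_node.values()
    }
    if len(shared) > 1:
        problems.append(
            "databases and listeners in the group span more than one Oracle home")
    return problems
```

An empty result means that the two modeled requirements hold for the data supplied. The other requirements in the list are not covered by this sketch.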

4.7 Configurations Using Multiple Virtual Addresses

Before any resources, other than generic services, can be added to a group using Oracle Fail Safe Manager, one or more virtual addresses must be added to the group. Client applications connect to the resources in a group using one of the virtual addresses in the group.

You can add up to 32 virtual addresses to a group, prior to adding resources, by invoking the Add Resource to Group Wizard. To invoke the wizard in Oracle Fail Safe Manager, select Add to Group from the Resources menu.

Note the following restrictions:

  • At least one virtual address must be added to a group before you can add another resource to the group. Only generic services can be added to a group that does not already contain a virtual address.

  • If the group contains one or more Oracle databases, then:

    • All virtual addresses that you plan to configure with one or more databases in a group must be added to the group before you can add any databases to the group.

    • All databases in a group must use the same set of virtual addresses that you specify for the first database that you add to the group. (The set of virtual addresses can contain as few as one address.)

    See Section 7.3.3.2 for more information about configuring multiple virtual addresses with Oracle databases.
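The restrictions above can be expressed as checks on a simple model of a group. The following Python sketch is a minimal illustration. The GroupConfig class, its method names, and the resource kind labels are hypothetical, and the sketch enforces only the rules stated in this section (the limit of 32 addresses, the need for a virtual address before non-generic resources, and a common address set for all databases in the group).

```python
MAX_VIRTUAL_ADDRESSES = 32


class GroupConfig:
    """Toy model of a group that enforces the virtual address restrictions."""

    def __init__(self):
        self.virtual_addresses = []
        self.database_addresses = None   # fixed by the first database added
        self.resources = []

    def add_virtual_address(self, address):
        if len(self.virtual_addresses) >= MAX_VIRTUAL_ADDRESSES:
            raise ValueError("a group can hold at most 32 virtual addresses")
        self.virtual_addresses.append(address)

    def add_resource(self, name, kind, addresses=()):
        # Only generic services may join a group with no virtual address.
        if kind != "generic service" and not self.virtual_addresses:
            raise ValueError("add a virtual address to the group first")
        if kind == "database":
            chosen = frozenset(addresses)
            # Addresses used by a database must already be in the group.
            if not chosen or not chosen <= set(self.virtual_addresses):
                raise ValueError("database addresses must already be in the group")
            if self.database_addresses is None:
                self.database_addresses = chosen
            elif chosen != self.database_addresses:
                raise ValueError("all databases must use the same address set")
        self.resources.append((name, kind))
```

In this model, adding a second database with a different set of addresses raises an error, which mirrors the rule that the first database fixes the address set for the group.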

When you add a virtual address to a group, the group is accessible by clients at the same network address, regardless of which cluster node is hosting the group.

Multiple virtual addresses in a group provide flexible configuration options. For example, users can access a database over the public network while you perform a database backup operation over the private network. Or you can allocate different virtual addresses on different network segments to control security, with administrators accessing the database on one segment, while users access the database on another segment.

When you add more than one virtual address to a group, Oracle Fail Safe Manager asks you to specify the address that clients can use to access the resources in that group. If you add more than one resource to a group (for example, a database and an Oracle Application Server), then you can dedicate one virtual address for users to access the database directly and another for users to access the Oracle Application Server. Alternatively, if there are many database users, then you can have some users access the database using one virtual address and the others use the other virtual address, to balance the network traffic.

See the online help in Oracle Fail Safe Manager for information about adding a virtual address to a group.

4.8 Adding a Node to an Existing Cluster

The Oracle Fail Safe Installation Guide describes how to install the software to add a new node to an existing cluster. Once that task is completed, there is one final step: you must run the Verify Group command on each group on the cluster for which the new node is a possible owner.

Assume you add a new node to the cluster and install Oracle Fail Safe on that node along with the DLLs for the resources you intend to run on that node. The new node becomes a possible owner for these resources. If these resources have not yet been configured to run on the new node, then they cannot be restarted on that node when the group or groups containing them fail over to it.

However, if you run the Verify Group command, then Oracle Fail Safe checks that the resources in the verified groups are configured to run on each node that is a possible owner for the group. If it finds a possible owner node where the resources in the group are not configured to run, then Oracle Fail Safe configures them for you.

Therefore, Oracle strongly recommends that you run the Verify Group operation for each group for which the new node is listed as a possible owner. Section 6.1.2 describes the Verify Group operation. You can also verify groups using the FSCMD command, as described in Chapter 5.
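The effect of running Verify Group after adding a node can be modeled as follows. This Python sketch is a simplified illustration only. The function names are hypothetical, the sketch does not invoke Oracle Fail Safe or FSCMD, and it reduces "configured to run on a node" to membership in a set.

```python
def can_restart_after_failover(node, configured_nodes):
    """Resources can restart on a node only if they are configured there."""
    return node in configured_nodes


def verify_group(possible_owners, configured_nodes):
    """Configure the group's resources on every possible owner node that
    lacks them, and return the nodes that were newly configured."""
    newly_configured = [n for n in possible_owners if n not in configured_nodes]
    configured_nodes.update(newly_configured)
    return newly_configured


# A third node joins an existing two-node cluster as a possible owner.
possible_owners = ["NodeA", "NodeB", "NodeC"]
configured_nodes = {"NodeA", "NodeB"}
```

Before verification, can_restart_after_failover("NodeC", configured_nodes) is false in this model. After verify_group(possible_owners, configured_nodes) runs, it is true, and a second verification finds nothing left to configure.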