8 Configuring Generic Services for High Availability

A generic service is a Windows service that is supported by the generic service resource DLL provided with Microsoft Cluster Server (MSCS). Oracle Fail Safe support for configuring generic services for high availability lets you:

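The following topics are discussed in this chapter:

  • Introduction

  • Discovering Standalone Generic Services

  • Adding Generic Services to a Group

  • Security Requirements for Generic Services

  • Troubleshooting Problems with Generic Services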

8.1 Introduction

The difference between configuring a resource that Oracle Fail Safe specifically supports and configuring a generic service lies in the level of assistance that the Add Resource to Group Wizard provides. For resources that are specifically supported, the Add Resource to Group Wizard in Oracle Fail Safe Manager requests configuration information that is targeted at that specific resource. For generic services, the wizard cannot know what configuration information is needed, so the data it requests is less specific. Therefore, you must be more aware of the resources upon which your generic service depends, the Windows registry entries it requires, and so on.

8.1.1 Advantages of Using Oracle Fail Safe

This section lists some advantages of using Oracle Fail Safe instead of MSCS for configuring generic services:

  • Oracle Fail Safe can configure an existing service for high availability or it can create and configure a generic service for you as part of the Add Resource to Group operation; MSCS can configure only existing generic services for high availability.

  • The Oracle Fail Safe Add Resource to Group Wizard asks more questions to help you configure a generic resource for high availability. For example, it lets you specify the disks required by the generic resource, the Windows registry entries that need to be replicated across the cluster, and so on. The MSCS wizard does not ask as many questions to guide you.

  • As part of the Add Resource to Group operation, Oracle Fail Safe tests your configuration (as it does for all types of resources it configures for high availability). Oracle Fail Safe tests the network, failover and failback, and ensures that the resource can be started on all cluster nodes that are possible owners of the resource.

  • Oracle Fail Safe sets the startup type for the resource on each cluster node to manual, as is required in a cluster environment. MSCS only sets the startup type on the node that currently owns the resource; you must remember to set the startup type to manual on the other cluster nodes that are possible owners.
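
To illustrate the last point: when a service is configured through MSCS alone, the startup type on the nodes that do not currently own the resource has to be checked by hand. The following Python sketch assumes a hypothetical inventory that maps node names to the startup type reported for one service, and lists the nodes that still need to be changed. It also builds an `sc.exe` argument list for each such node. The node names and the service name `ExampleSvc` are invented for illustration.

```python
# Hypothetical inventory: node name -> startup type of the service on that node.
startup_types = {
    "NODE_1": "Manual",     # current owner, already set by MSCS
    "NODE_2": "Automatic",  # possible owner, not yet adjusted
    "NODE_3": "Manual",
}


def nodes_needing_manual(startup_by_node):
    """Return, in sorted order, the nodes whose startup type is not Manual."""
    return sorted(
        node
        for node, startup in startup_by_node.items()
        if startup.strip().lower() != "manual"
    )


def sc_config_command(node, service_name):
    """Build an sc.exe argument list that sets the startup type to manual.

    sc.exe calls the manual startup type "demand".
    """
    return ["sc.exe", rf"\\{node}", "config", service_name, "start=", "demand"]


pending = nodes_needing_manual(startup_types)
commands = [sc_config_command(node, "ExampleSvc") for node in pending]
```

The sketch only builds the argument lists and does not run them. When you use Oracle Fail Safe, this step is unnecessary because the startup type is set on each node for you.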

Use the generic service resource type in the Add Resource to Group Wizard to configure a service for high availability if Oracle Fail Safe does not provide a customized wizard for that service. You can determine the services for which Oracle Fail Safe provides customized wizards by selecting the cluster in the Oracle Fail Safe Manager tree view and then clicking the Resources tab. Resources for which Oracle provides a customized wizard are listed on this property page.

8.1.2 Generic Resources That Must Not Be Configured for High Availability

When you consider configuring a generic service for high availability, keep in mind that after you do so, the service runs on only one cluster node at a time. Services that must run on several cluster nodes concurrently must not be configured for high availability. For example, consider the Windows Event Log service. The Windows Event Log is a file to which all services on a given system can write informational messages, error messages, and so on. It is a means for services to communicate conditions to the administrator.

If you make the Windows Event Log service highly available, then the service would run on only one cluster node at a time. Services on the other cluster nodes would not have access to the Event Log on the cluster node running the Event Log service, and so could not record their messages. Therefore, it would be unwise to configure the Windows Event Log as a cluster resource.

8.2 Discovering Standalone Generic Services

Oracle Services for MSCS discovers generic resources by searching for them in the Windows service manager. During the discovery process, Oracle Fail Safe locates services in the Windows service manager on each node in the cluster and then displays the newly discovered services in the Oracle Fail Safe Manager Add Resource to Group Wizard.

To ensure that the properties of standalone and cluster resources are discovered and displayed correctly by Oracle Fail Safe Manager and Oracle Enterprise Manager, each resource must have a unique name within the cluster. It may be necessary to specify names that are different from default values or to change the default names of resources.
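
Before running the wizard, you can check a list of resource names for clashes. The following Python sketch assumes a hypothetical inventory of distinct resources, each given as a name and a location, and reports names that are used more than once. Comparing names without regard to case is an assumption made here for safety and is not stated by this guide.

```python
from collections import defaultdict


def find_name_conflicts(resources):
    """Group (name, location) pairs by lowercased name.

    Returns only the names that are used by more than one resource.
    """
    by_name = defaultdict(list)
    for name, location in resources:
        by_name[name.strip().lower()].append((name, location))
    return {key: entries for key, entries in by_name.items() if len(entries) > 1}


# Hypothetical inventory of distinct standalone and cluster resources.
inventory = [
    ("Report Service", "NODE_1 (standalone)"),
    ("report service", "Group_A"),
    ("Billing Service", "Group_B"),
]
conflicts = find_name_conflicts(inventory)
```

Any name that appears in `conflicts` is a candidate for renaming before you add the resource to a group.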

8.3 Adding Generic Services to a Group

To configure a generic service for high availability, you must add it to a group using the Oracle Fail Safe Manager Add Resource to Group Wizard. You can either add an existing generic service to a group, or you can have Oracle Fail Safe create the generic service. The following sections describe the configuration steps and the data that is needed to complete the Add Resource to Group Wizard for a generic service.

8.3.1 Configuration Steps

Table 8-1 provides a quick reference to the tasks needed to configure a generic service for high availability. For step-by-step instructions about any particular task, refer to the Oracle Fail Safe online help. From the Oracle Fail Safe Manager menu bar, select Help, then Search for Help on.

Table 8-1  Steps for Configuring a Generic Service

Step 1
  Procedure: Ensure that the generic service executable file is installed on a private disk on each cluster node that will be a possible owner for the generic service.
  Comments: This is not required, but is strongly recommended. Typically, several service instances use the same executable file. If the executable file is installed on a shared cluster disk, then all services that use that executable file must run on the cluster node that currently hosts that disk.

Step 2
  Procedure: Copy files required by the generic service to a cluster disk.
  Comments: If data files are required by the generic service, then they must be located on the cluster disks on the shared storage interconnect.

Step 3
  Procedure: Start Oracle Fail Safe Manager.
  Comments: From the Windows Start menu, select Oracle - Oracle_Home, then Oracle Fail Safe Manager.

Step 4
  Procedure: Verify the cluster.
  Comments: Select Troubleshooting, then Verify Cluster to run a procedure that validates the cluster hardware and software configurations.

Step 5
  Procedure: Create a group.
  Comments: Select Groups, then Create to run the Create Group Wizard. The wizard helps you to set up failover and failback policies and automatically opens the Add Resource to Group Wizard to let you add a virtual address to the group. Oracle Fail Safe does not require you to add a virtual address to a group before you add a generic service. However, the resources on which the generic service depends may require a virtual address. See Section 8.3.2.5.2 for details.

Step 6
  Procedure: If needed, add one or more virtual addresses to the group.
  Comments: Select Resources, then Add to Group to run the Add Resource to Group Wizard. The wizard helps you to create and configure the virtual server address for high availability.

Step 7
  Procedure: Add resources upon which the generic service depends.
  Comments: Select Resources, then Add to Group to open the Add Resource to Group Wizard.

Step 8
  Procedure: Add the generic service to the group.
  Comments: Select Resources, then Add to Group to open the Add Resource to Group Wizard. The wizard helps you configure the generic service into a group. You can create a new generic service or specify an existing generic service.

Step 9
  Procedure: Verify the group.
  Comments: Select Troubleshooting, then Verify Group to check for and fix any problems with the group, virtual addresses, resources, or the failover configuration.


8.3.2 Configuration Data for Generic Services

To configure a generic service for high availability, you add it to a group. Oracle Fail Safe can create and add a new generic service to a group, or you can add an existing generic service to a group. In either case, when you use the Oracle Fail Safe Manager Add Resource to Group Wizard, you need the following data:

  • Possible owner nodes for the generic service, if the cluster consists of more than two nodes, or if one node is not available in a two-node cluster

  • Identity (node name, display name, service name, and image path) of the generic service

  • Account under which the generic service is run and its startup parameters

  • Disks, if any are used by the generic service

  • Other resources upon which the generic resource depends

  • Windows registry key values that the generic service uses

Unlike most other resources that you configure for high availability, you are not required to add a virtual address to a group before adding a generic service. You must determine, based on the use of the generic service, if a virtual address is needed. The following sections examine the issues you must consider to determine whether you must add a virtual address to the group and the configuration information needed to add a generic resource to a group.

8.3.2.1 Choose Nodes

If you are adding a generic service to a group and the cluster consists of more than two nodes, then you are asked to specify the nodes which must be possible owners for the generic service by specifying a list of selected nodes, as shown in Figure 8-1. To specify that a particular node must not be a possible owner for the generic service, select the node from the Selected Nodes list and click the left arrow.

Section 2.6.7 describes in detail the concept of the possible owner nodes list.

Figure 8-1 Choose Nodes Wizard Page When All Nodes Are Available

If you are adding a generic service to a group and the cluster consists of two or more nodes, but one or more nodes are unavailable, then you are also asked to specify which nodes must be possible owners for the generic service. In this case, the wizard page displays the nodes that are unavailable and why, as shown in Figure 8-2.

Figure 8-2 Choose Nodes Wizard Page When Any Node Is Unavailable
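
The conditions under which the Choose Nodes page appears can be summarized in a small Python sketch. The function names and node names are hypothetical. The sketch models only the rule described in this section, not the wizard itself.

```python
def choose_nodes_page_shown(cluster_nodes, unavailable_nodes):
    """Return True when the wizard asks for the possible owner nodes.

    The page appears for clusters of more than two nodes, or when at
    least one node is unavailable (including in a two-node cluster).
    """
    return len(cluster_nodes) > 2 or len(unavailable_nodes) > 0


def possible_owners(cluster_nodes, deselected):
    """Start with every node selected and drop the ones moved out of Selected Nodes."""
    removed = set(deselected)
    return [node for node in cluster_nodes if node not in removed]


nodes = ["NODE_1", "NODE_2", "NODE_3"]
owners = possible_owners(nodes, deselected=["NODE_3"])
```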

8.3.2.2 Generic Service Identity

When you configure a generic service for high availability, you must provide some basic information that Oracle Fail Safe can use to uniquely identify and locate the executable files for the generic service. In particular, the Add Resource to Group Wizard requests the following about the generic service identity:

  • Node name

    For an existing generic service, Oracle Fail Safe must know on which cluster node the generic service currently exists. If it exists on multiple nodes, then specify any one of them in the Add Resource to Group Wizard. If the service does not already exist, then select any node that is a possible owner for the generic service.

  • Display name

    The display name describes the service in more detail than the service name. It can contain spaces and can be up to 256 characters long. The display name is shown in the Windows Services dialog box.

    The display name is also the name used by Oracle Fail Safe to refer to the service in the Oracle Fail Safe Manager tree view.

  • Service name

    The service name, sometimes referred to as a short name, labels the Windows registry subkey that contains the configuration information for the service. It must not contain spaces and is typically shorter than the display name.

  • Image name

    This is the path and file name for the generic service executable file. The executable file for a generic service must be installed on the same private disk and directory on all cluster nodes that are possible owners of the generic service. This ensures that if the generic service fails over, the executable files upon which it depends are available on the other cluster nodes.

    Oracle recommends that you do not install the generic service executable file on a shared cluster disk. Typically, several service instances use the same executable file. If the executable file is installed on a shared cluster disk, then all services that use that executable file must run on the cluster node that currently hosts that disk.

    When you install the executable file at the same location on each cluster node, each cluster node can host different service instances that access that same executable file. For example, suppose you have two services, Service_A and Service_B, which use the same executable file. If the executable file is installed at the same location on a private disk on each cluster node, then Service_A can belong to Group_A, whose primary node is Node_1, and Service_B can belong to Group_B, whose primary node is Node_2. If you install the executable file on a shared cluster disk that belongs to Group_C, then both services can run only on the cluster node that is currently hosting Group_C.
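
The naming rules above can be checked before you open the wizard. The following Python sketch uses a hypothetical `ServiceIdentity` record and checks only the rules stated in this section: the display name is at most 256 characters long, and the service name contains no spaces. The sample values are invented.

```python
from dataclasses import dataclass

MAX_DISPLAY_NAME_LENGTH = 256  # limit stated in this section


@dataclass
class ServiceIdentity:
    node_name: str
    display_name: str
    service_name: str
    image_name: str


def identity_problems(identity):
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if not identity.node_name:
        problems.append("node name is empty")
    if not identity.display_name:
        problems.append("display name is empty")
    elif len(identity.display_name) > MAX_DISPLAY_NAME_LENGTH:
        problems.append("display name is longer than 256 characters")
    if not identity.service_name:
        problems.append("service name is empty")
    elif any(ch.isspace() for ch in identity.service_name):
        problems.append("service name contains spaces")
    if not identity.image_name:
        problems.append("image name is empty")
    return problems


good = ServiceIdentity("NODE_1", "Example Report Service", "ExampleSvc",
                       r"C:\svc\example.exe")
bad = ServiceIdentity("NODE_1", "Example Report Service", "Example Svc",
                      r"C:\svc\example.exe")
```

Here `identity_problems(good)` returns an empty list, and `identity_problems(bad)` reports the space in the service name.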

Figure 8-3 shows the page in the Add Resource to Group Wizard on which you specify the generic service identity. If you enter an existing service in the Service Name box, then the Status box displays the status of the service. If you enter a new service in the Service Name box, then the Status box is empty. Oracle Fail Safe Manager presents the status for your information. You can add an existing generic resource to a group regardless of whether it is running or stopped.

Figure 8-3 Generic Service Identity Wizard Page

8.3.2.3 Generic Service Startup Parameters

The Add Resource to Group Wizard asks for the following details about how the generic service must be started:

  • Startup parameters

    You specify startup parameters that you want Oracle Fail Safe to pass to the Windows Service Control Manager. These parameters are the same as those you would specify if you were using the Windows Services dialog box, for example, -t. Oracle Fail Safe passes parameters unchanged to the Service Control Manager.

  • Log on as system account or user account

    You specify the account under which you want the service to run: the system account or a user account. Log on as System Account is selected by default. To log on as a user account, select This Account under "Log on as:". The account under which the service runs defines the security context for the generic service. When the service logs on as a system account (LocalSystem), it has access to all files on the local system, but no access to files across the network. When the service logs on as a user account, it can have access both to files on the local system and to files across the network, depending on the privileges of that account. For example, Oracle Fail Safe itself runs under a user account (which you specify when you install Oracle Fail Safe) because it must be able to access files on all cluster nodes.

    The Add Resource to Group Wizard does not allow you to change the account under which an existing service runs. To change the account under which an existing generic service runs, use the Windows Services dialog box to change the account before attempting to add it to a group. (See Section 4.3.1 for information about changing the account under which Oracle Services for MSCS runs.)

    Oracle Fail Safe does not request startup type information (Automatic, Manual, or Disabled) for the generic service. The startup type for all resources configured for high availability using Oracle Fail Safe is set to Manual. In a cluster environment, the service must run on only one node at a time. By setting the startup type to Manual, Oracle Fail Safe ensures that the resource runs on one node at a time and is started only by MSCS.

Figure 8-4 shows the page in the Add Resource to Group Wizard on which you specify the generic service startup parameters and account.

Figure 8-4 Generic Service Account Wizard Page

8.3.2.4 Disks Used by a Generic Service

Oracle Fail Safe requires that data files needed by a highly available generic service be available on the cluster node currently running the service. This is accomplished in one of two ways:

  • You place the data files required by the service on a shared cluster disk that is included in the same group as the resource.

    In a failover, the disk fails over with the service so that the files are still available to the service.

  • You place the same file on the same private disk and directory on all cluster nodes that are possible owners of the generic service.

    In a failover, the service uses the same path to find the file on the private disk. Because the path to the file is the same on each cluster node, the resource can locate it, regardless of which cluster node is hosting the resource.

Typically, the service executable file is installed on a private disk on each cluster node, and the data files are placed on shared cluster disks. See Section 8.3.2.2 for more information about the placement of the executable file.

You may decide to place the data files on the same private disk and directory on each cluster node if you specifically want the contents of the files to be different depending upon the cluster node on which the generic service is running. For example, suppose Node_1 has twice the CPU and memory that Node_2 has. If your generic service uses a file to specify the maximum number of users that can access it concurrently, then you may want to set that number to 100 on Node_1 and 50 on Node_2.

However, Oracle recommends that data files be placed on shared cluster disks whenever possible. If you intend to follow this recommendation, then you must move any data file that the generic resource uses to a shared cluster disk prior to running the Add Resource to Group Wizard.
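
The two placement rules in this section can be expressed as a simple check. The following Python sketch assumes hypothetical inputs: the drive letters of the shared cluster disks in the group, and an inventory of file paths present on each node's private disks. It returns True when a data file satisfies either rule.

```python
from pathlib import PureWindowsPath


def placement_ok(data_file, group_shared_disks, files_by_node, possible_owners):
    """Return True if the data file is reachable after a failover.

    Rule 1: the file is on a shared cluster disk that belongs to the group.
    Rule 2: the file exists at the same path on every possible owner node.
    """
    path = PureWindowsPath(data_file)
    shared = {disk.upper() for disk in group_shared_disks}
    if path.drive.upper() in shared:
        return True
    # PureWindowsPath compares paths without regard to case.
    return bool(possible_owners) and all(
        path in {PureWindowsPath(f) for f in files_by_node.get(node, ())}
        for node in possible_owners
    )


# Hypothetical layout: H: is a shared disk in the group, C: is private.
owners = ["NODE_1", "NODE_2"]
private_files = {
    "NODE_1": [r"C:\svc\limits.cfg"],
    "NODE_2": [r"c:\SVC\limits.cfg"],
}
on_shared = placement_ok(r"H:\data\orders.dat", ["H:"], private_files, owners)
on_private = placement_ok(r"C:\svc\limits.cfg", ["H:"], private_files, owners)
missing = placement_ok(r"C:\svc\other.cfg", ["H:"], private_files, owners)
```

In this layout, the first two files pass the check and the third does not, because `other.cfg` is absent from the private disks of the possible owner nodes.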

Figure 8-5 shows the page in the Add Resource to Group Wizard on which you specify the disk dependency.

Figure 8-5 Generic Service Disks Wizard Page

8.3.2.5 Generic Service Dependencies

Because you are configuring a resource about which neither Oracle Fail Safe Manager nor MSCS has detailed information, the process for configuring a generic service for high availability is less automated than for resource types about which Oracle Fail Safe does have detailed information (such as an Oracle database). For example, when you use Oracle Fail Safe to configure an Oracle database for high availability, Oracle Fail Safe Manager includes the resources upon which the database depends and determines the order in which those resources need to be brought online. When you configure a generic resource for high availability, you need to provide this dependency information.

8.3.2.5.1 Specifying Generic Service Dependencies

You provide dependency information for a generic resource by the order in which you add resources to a group. For example, suppose you want to make Service_A highly available, but in order for Service_A to come online, Service_B and Service_C must be online already. In other words, Service_A has a dependency on Service_B and Service_C. Furthermore, in order for Service_B to come online successfully, Service_C must be online already. This chain of dependencies can be illustrated by a tree, as shown in Figure 8-6.

Figure 8-6 Dependency Tree

In this scenario, you add Service_C to the group first. Then you add Service_B to the group and specify Service_C as a dependency. Finally, you add Service_A to the group and specify Service_B as a dependency. In effect, you build the dependency tree one resource at a time. Each time you add a resource to a group, you can specify only one dependency level. Therefore, if the dependency tree is two levels deep (or more), then the order in which you add the resources to the group is important.
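
The order in which to add resources can be derived from the dependency information with a topological sort. The following Python sketch uses the standard `graphlib` module on the Service_A, Service_B, and Service_C example. The helper names are hypothetical. It also works out which single level of dependencies to specify for each resource, by dropping any dependency that is already implied by another one.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources that must be online before it.
DEPENDS_ON = {
    "Service_A": {"Service_B", "Service_C"},
    "Service_B": {"Service_C"},
    "Service_C": set(),
}


def add_order(depends_on):
    """Return resources so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def reachable(depends_on, start):
    """Return every resource that `start` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on[node])
    return seen


def direct_dependencies(depends_on, resource):
    """Drop dependencies that are already implied by another dependency."""
    deps = depends_on[resource]
    implied = set()
    for dep in deps:
        implied |= reachable(depends_on, dep)
    return deps - implied


order = add_order(DEPENDS_ON)
plan = [(name, sorted(direct_dependencies(DEPENDS_ON, name))) for name in order]
```

For this example, `order` lists Service_C, then Service_B, then Service_A, and `plan` pairs Service_B with Service_C and Service_A with Service_B, which matches the sequence described above.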

Figure 8-7 shows the page in the Add Resource to Group Wizard on which you specify resource dependencies.

Figure 8-7 Generic Service Dependencies Wizard Page

8.3.2.5.2 Generic Services and Virtual Address Dependencies

Oracle Fail Safe does not require you to add a virtual address to a group before you add a generic service to the group. A virtual address specifies the network address at which a resource can be found by clients or other services. If neither clients nor other services attach to the generic service, then you do not need to add a virtual address to the group before you add the generic service.

You may need to add a virtual address to the group, however, if the generic service is accessed by clients or other services.

8.3.2.6 Generic Service Registry Keys

If your generic service uses Windows registry entries to store information, then you can specify these entries in the Add Resource to Group Wizard. By specifying them in the wizard, you ensure that the Windows registry entries for your service are consistent across the cluster nodes that are possible owner nodes for the generic resource. This is important so that in a failover, your service can run correctly on any cluster node that is a possible owner of the generic resource.

For example, if you were to manually configure an Oracle Forms Server as a generic service, then you would specify the FORMS60_PATH registry variable.

The root for the registry keys you specify is assumed to be HKEY_LOCAL_MACHINE. This is discussed in more detail in the online help for the Add Resource to Group Wizard.
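
Because the root is assumed, a key copied from a registry editor may need its root removed before you enter it. The following Python sketch shows one way to do that. It assumes that the wizard expects a path relative to HKEY_LOCAL_MACHINE, so consult the online help for the exact format. The helper name and the example key are hypothetical.

```python
HKLM_NAMES = ("HKEY_LOCAL_MACHINE", "HKLM")
OTHER_HIVES = (
    "HKEY_CURRENT_USER", "HKEY_USERS", "HKEY_CLASSES_ROOT", "HKEY_CURRENT_CONFIG",
    "HKCU", "HKU", "HKCR", "HKCC",
)


def relative_to_hklm(key_path):
    """Return the key path with any HKEY_LOCAL_MACHINE root removed.

    Raises ValueError for keys under another hive or for a bare root.
    """
    cleaned = key_path.strip().strip("\\")
    root, _, rest = cleaned.partition("\\")
    if root.upper() in HKLM_NAMES:
        if not rest:
            raise ValueError("no subkey given under HKEY_LOCAL_MACHINE")
        return rest
    if root.upper() in OTHER_HIVES:
        raise ValueError(f"key is not under HKEY_LOCAL_MACHINE: {key_path}")
    return cleaned  # already relative


full_key = "HKEY_LOCAL_MACHINE\\SOFTWARE\\ExampleVendor\\ExampleSvc"
relative_key = relative_to_hklm(full_key)
```

A key under another root, such as HKEY_CURRENT_USER, is rejected instead of being silently reinterpreted.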

Figure 8-8 shows the page in the Add Resource to Group Wizard on which you specify Windows registry keys.

Figure 8-8 Generic Service Registry Wizard Page

8.4 Security Requirements for Generic Services

By default, a generic service runs under the local system account. If you specify that the generic service must run under a user account, then it must have the "Log on as a service" privilege. When you add a generic service to a group, Oracle Fail Safe checks to see if the account under which the generic service is running has this privilege; if it does not, then Oracle Fail Safe grants it to the specified user account.

In addition, Oracle Fail Safe checks that the user account and password you specified in the Add Resource to Group Wizard are valid. If not, then Oracle Fail Safe returns an error.

8.5 Troubleshooting Problems with Generic Services

You can run the Verify Group operation at any time. However, you must run it when any of the following occurs:

  • A group or resource in a group does not come online.

  • Failover or failback does not perform as you expect.

  • A new node is added to the cluster.

  • You accidentally delete a generic service from a cluster node.

    When you run the Verify Group operation, it automatically re-creates the same generic service on all cluster nodes that are possible owners of the service.

General information about the Oracle Fail Safe troubleshooting tools (Verify Cluster and Verify Group) is in Chapter 6.