7 Validating Actions

This chapter provides general information about the validation process in Oracle Fail Safe Manager. The following topics are discussed in this chapter:

  • Validating Operations

  • Dumping Cluster

  • Finding Additional Troubleshooting Information

Note that Oracle Fail Safe provides a centralized message facility. When an action results in an error, the system locates the message associated with the error and displays it. For more information about these messages, see the Oracle Fail Safe Error Messages for Microsoft Windows manual.

7.1 Validating Operations

Oracle Fail Safe provides a family of tools that validate cluster components and the cluster environment, including the status of nodes, groups, and resources. If a discrepancy or problem is found, then the validate operation takes the appropriate action to fix the potential or actual problem.

Use the validate commands at any time to validate your cluster, a group, or a standalone database. If problems are found during validation, then Oracle Fail Safe either prompts you for permission to fix them or returns an error message that describes the problem in more detail.

If a validate command returns errors, fix the errors and rerun the command. Repeat this process until the validate operation completes without errors.
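The validate, fix, and rerun cycle described above can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions: `validate` and `fix` are hypothetical callables standing in for running a validate command and for your own corrections; they are not Oracle Fail Safe interfaces, and the `max_rounds` cap is an illustrative safeguard rather than documented behavior.

```python
from typing import Callable, List


def validate_until_clean(
    validate: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 5,
) -> int:
    """Rerun a validation until it reports no errors.

    `validate` and `fix` are hypothetical stand-ins, not Oracle Fail Safe APIs.
    Returns the number of validation runs needed to get a clean result.
    """
    for run in range(1, max_rounds + 1):
        errors = validate()
        if not errors:
            return run
        fix(errors)  # resolve the reported errors before the next run
    raise RuntimeError(f"validation still failing after {max_rounds} runs")


if __name__ == "__main__":
    # Fake environment with two outstanding problems; each fix resolves one.
    outstanding = ["Oracle home name mismatch", "release mismatch"]
    runs = validate_until_clean(
        lambda: list(outstanding), lambda errs: outstanding.remove(errs[0])
    )
    print(runs)  # 3
```

The cap on rounds reflects the point of the procedure: if the same errors keep coming back, rerunning alone will not help, and the underlying problem needs a different fix.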

7.1.1 Validating Cluster

The Validate cluster action validates the installation and network configuration of the cluster. To run it, select the cluster you want to validate from the list, then select Validate from the Actions menu in the Cluster view.

The first time you connect to a cluster after installing or upgrading the Oracle Fail Safe software, you are prompted to run Validate. You can run the Validate action at any time; however, you must run it whenever the cluster configuration changes. The Validate action verifies that:

  • Each Oracle home name into which Oracle software is installed is the same on all cluster nodes

    If, for example, OFS is the Oracle home name for the Oracle Fail Safe software on one cluster node, then OFS must be the Oracle home name on all nodes in the cluster where Oracle Fail Safe is installed. Similarly, if OfsDb is the Oracle home name for the Oracle Database software on one cluster node, then it must be the Oracle home name on all nodes in the cluster where the Oracle Database software is installed.

  • The Oracle Fail Safe release is identical on all nodes

  • The resource providers (components) are configured identically on at least two of the nodes that are possible owners for each resource
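The three checks in the list above amount to a consistency test across per-node inventories. The following Python sketch is illustrative only: `NodeInventory`, `validate_cluster`, and the shape of their inputs are hypothetical and do not reflect how Oracle Fail Safe gathers or stores this information.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class NodeInventory:
    """Hypothetical per-node inventory (not an Oracle Fail Safe data structure)."""

    name: str
    fail_safe_release: str
    # Product name -> Oracle home name the product is installed into on this node.
    oracle_homes: Dict[str, str] = field(default_factory=dict)
    # Resource provider (component) -> its configuration/version string on this node.
    providers: Dict[str, str] = field(default_factory=dict)


def validate_cluster(
    nodes: Sequence[NodeInventory],
    resources: Dict[str, Tuple[str, List[str]]],
) -> List[str]:
    """Return the problems found; an empty list means every check passed.

    `resources` maps a resource name to (provider, possible-owner node names).
    """
    problems: List[str] = []
    by_name = {n.name: n for n in nodes}

    # Check 1: each product's Oracle home name matches on every node where it is installed.
    for product in sorted({p for n in nodes for p in n.oracle_homes}):
        homes = {n.oracle_homes[product] for n in nodes if product in n.oracle_homes}
        if len(homes) > 1:
            problems.append(f"{product}: Oracle home names differ {sorted(homes)}")

    # Check 2: the Oracle Fail Safe release is identical on all nodes.
    releases = {n.fail_safe_release for n in nodes}
    if len(releases) > 1:
        problems.append(f"Oracle Fail Safe releases differ {sorted(releases)}")

    # Check 3: the provider is configured identically on at least two possible owners.
    for resource, (provider, owners) in sorted(resources.items()):
        configs = Counter(
            by_name[o].providers[provider]
            for o in owners
            if o in by_name and provider in by_name[o].providers
        )
        if not configs or max(configs.values()) < 2:
            problems.append(
                f"{resource}: {provider} not configured identically on two possible owners"
            )
    return problems
```

Check 1 only compares nodes where a product is actually installed, matching the rule that the home name must be the same on all nodes "where Oracle Fail Safe is installed" or "where the Oracle Database software is installed".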

Validate also registers Oracle resource DLLs with Microsoft Windows Failover Clusters. In addition, whenever the cluster configuration changes, Oracle recommends that you run the Validate Cluster wizard in Microsoft Windows Failover Cluster Manager to verify that the cluster configuration is still valid.

Figure 7-1 shows the output from a typical Validate action.

Figure 7-1 Verifying Cluster Progress Window


If the Validate operation does not complete successfully, then the failure may indicate one or more of the following problems:

  • A problem exists in the configuration of the hardware, network, or the Microsoft Windows Failover Clusters.

  • A problem exists in the symmetry of the Oracle homes and versions.

  • A problem exists with the Oracle Fail Safe installation (for example, with the symmetry of the resource providers).

If the operation completes successfully but you still have problems with Oracle Fail Safe, then the problem lies in the Oracle Fail Safe configuration.

7.1.2 Validating the Configuration of Oracle Resources

When you validate a group, the Validate action does the following to ensure that the group performs correctly:

  • Checks all resources in a group and confirms that they have been configured correctly on all nodes that are possible owners for the group.

  • Updates the dependencies among resources in the group.

  • Repairs a misconfigured group after prompting you for confirmation.
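The configuration check and prompted repair in this list can be sketched as follows. This is a minimal Python sketch, assuming a group is described by its resource names, its possible owner nodes, and a hypothetical map of which resources are configured on which node; `find_misconfigured`, `validate_group`, and the `confirm` callback are illustrative names, not Oracle Fail Safe interfaces, and the sketch does not model dependency updates.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Missing = List[Tuple[str, str]]  # (resource, node) pairs


def find_misconfigured(
    group_resources: Iterable[str],
    possible_owners: Iterable[str],
    configured_on: Dict[str, Set[str]],
) -> Missing:
    """List each (resource, node) where a group resource is missing on a possible owner."""
    owners = list(possible_owners)
    return [
        (resource, node)
        for resource in group_resources
        for node in owners
        if resource not in configured_on.get(node, set())
    ]


def validate_group(
    group_resources: Iterable[str],
    possible_owners: Iterable[str],
    configured_on: Dict[str, Set[str]],
    confirm: Callable[[Missing], bool],
) -> Missing:
    """Report misconfigurations and repair them only if `confirm` approves."""
    missing = find_misconfigured(group_resources, possible_owners, configured_on)
    if missing and confirm(missing):
        for resource, node in missing:
            # Stand-in for a real repair: record the resource as configured on the node.
            configured_on.setdefault(node, set()).add(resource)
    return missing
```

Passing `confirm` in as a callback keeps the "repair only after prompting" rule explicit: the check always runs, but nothing changes unless the caller approves.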

You can run the Validate operation at any time. However, you must run it when any of the following occurs:

  • A group or resource in a group does not come online.

  • Failover or failback does not behave as you expected.

  • You add a node to the cluster.

Select a group, then select Validate from the Actions menu in the Cluster view.

Alternatively, you can run the Validate action with the Test-OracleClusterGroup PowerShell cmdlet (see Chapter 6). Because it is a cmdlet, you can also call Test-OracleClusterGroup from scripts that run as batch jobs.

You can watch the progress of the Validate action and view the status of the individual resources in the group as Oracle Fail Safe verifies the group.

Figure 7-2 shows the output from a Validate action.

Figure 7-2 Verifying Group Progress Window


7.1.3 Validating Standalone Database

You can validate a standalone database at any time. Select the database from the Available Oracle Resources list and then run the Validate action.

The Validate operation performs validation checks to ensure that the standalone database is configured correctly on the node where it resides and to remove any references to the database that may exist on other cluster nodes. (References to the database may exist on other cluster nodes if the database was once added to a group and then later removed.) This ensures that the database can be made highly available using Oracle Fail Safe.
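The two parts of this check, confirming the database on its home node and removing references elsewhere, can be sketched as below. The Python sketch assumes a hypothetical map from node name to the set of databases referenced on that node; where Oracle Fail Safe actually keeps such references is not modeled here, and `clean_stale_references` is an illustrative name.

```python
from typing import Dict, List, Set


def clean_stale_references(
    db_name: str, home_node: str, references: Dict[str, Set[str]]
) -> List[str]:
    """Remove references to `db_name` from every node except `home_node`.

    `references` is a hypothetical map of node name -> databases referenced there.
    Returns the sorted names of the nodes that were cleaned.
    """
    if db_name not in references.get(home_node, set()):
        raise ValueError(f"{db_name} is not configured on its home node {home_node}")
    cleaned = []
    for node, dbs in references.items():
        if node != home_node and db_name in dbs:
            dbs.discard(db_name)  # leftover from an earlier add-to-group, then remove
            cleaned.append(node)
    return sorted(cleaned)
```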

Oracle recommends that you use the Validate command on a standalone database before you add it to a group. You can also use it whenever you have trouble accessing a standalone database. Note, however, that if the database is not open, Oracle Fail Safe may stop and restart it during the validate operation, after asking your permission.

For example, you may perform a verification:

  • If a failure occurs when you try to add a database to a group.

  • If you used an administration tool other than Oracle Fail Safe Manager to perform an operation on the database and the database is now inaccessible.

  • If you removed or deinstalled the Microsoft Windows Failover Clusters from the cluster nodes without first removing the Oracle Fail Safe software (for example, during a software upgrade). This is described in more detail in the Oracle Fail Safe Installation Guide for Microsoft Windows.

Figure 7-3 shows the output from a typical Validate operation in a Clusterwide Operation window.

Figure 7-3 Verifying Standalone Database Progress Window


To verify a standalone database, perform the following steps:

  • Select Oracle Resources from the tree view in the left panel of the window.

  • Select a resource from the Available Oracle Resources list.

  • Select Validate from the Actions menu in the right panel of the window.

  • The Verifying Standalone Database progress window opens. This window lists the tests run against the standalone database and displays a message for any errors. You must resolve these errors before you add the database to a cluster group. The Oracle Fail Safe Server may be able to resolve some issues, but it asks for your confirmation before making any changes.

Oracle Fail Safe uses the results of these tests to:

  • Fix clusterwide problems with Oracle Net

  • Check that the standalone database is on a cluster disk

  • Ensure that Oracle Fail Safe can attach to the database
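The cluster disk check in this list amounts to confirming that every database file lives on a shared cluster disk rather than on a disk local to one node. A minimal Python sketch, assuming you already have the database file paths and the drive letters of the cluster disks (both hypothetical inputs here); it compares drive letters only and does not handle volumes mounted into folders or UNC paths.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def files_off_cluster_disks(db_files: Iterable[str], cluster_drives: Iterable[str]) -> List[str]:
    """Return database files whose drive letter is not one of the cluster disks."""
    # Normalize entries such as "s:\\" to "S:" so they compare with PureWindowsPath.drive.
    drives = {d.rstrip("\\").upper() for d in cluster_drives}
    return [f for f in db_files if PureWindowsPath(f).drive.upper() not in drives]
```

`PureWindowsPath` parses Windows paths without touching the file system, so the sketch behaves the same on any platform.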

If a standalone database is open and you select the Validate action, then the action does not restart the database.

If a standalone database is not open or if the database is stopped, then Oracle Fail Safe asks your permission to stop and restart the database instance. It then opens the database for access.
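The restart decision described for open and not-open databases can be summarized as a small function. This Python sketch is illustrative: `DbState`, `restart_plan`, and the `ask_permission` callback are hypothetical names, and the behavior when you decline permission is not described in this chapter, so raising `PermissionError` in that case is an assumption of the sketch.

```python
from enum import Enum
from typing import Callable, List


class DbState(Enum):
    OPEN = "open"
    NOT_OPEN = "not open"
    STOPPED = "stopped"


def restart_plan(state: DbState, ask_permission: Callable[[], bool]) -> List[str]:
    """Return the instance actions Validate takes for a database in `state`.

    Mirrors the documented behavior; the PermissionError on refusal is an assumption.
    """
    if state is DbState.OPEN:
        return []  # an open database is not restarted
    # Not open or stopped: permission is required before touching the instance.
    if not ask_permission():
        raise PermissionError("permission to restart the database instance was declined")
    return ["stop instance", "restart instance", "open database"]
```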

If any problems are found during verification, then the Validate action prompts you before it attempts to fix them. For example, suppose that adding a database to a group fails because of an Oracle Net problem. Run the Validate action to fix the network problem, and then add the database to the group.

7.2 Dumping Cluster

The Dump cluster action directs Oracle Fail Safe to display cluster data (such as the number of cluster nodes, resource types, network information, Oracle homes, restart actions, and so on) in a window, from which you can save the data to a file. You can run this action periodically (and save the output) to maintain a record of changes made to the cluster over time, or run it at the request of customer support to provide a snapshot of the cluster environment.

Data displayed when you select the Dump cluster action includes:

  • Information related to the operating system (including the location of the quorum disk)

  • Public and private network information

  • Resources registered with the cluster

  • Group failover and failback policies

You can optionally save the Dump Cluster data to a file by clicking Save As.

To run the Dump cluster action, select the cluster you want to dump from the list, then select Dump from the Actions menu in the Cluster view.
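When dumps are saved periodically, comparing two of them shows what changed in the cluster between the two dates. A minimal Python sketch using the standard `difflib` module; it assumes each dump was saved as a text file (UTF-8 by default), which may not match what Save As actually writes, so adjust `encoding` if needed. The function names are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def _lines(text: str) -> List[str]:
    # Ensure a trailing newline so the last line joins cleanly into the diff.
    if text and not text.endswith("\n"):
        text += "\n"
    return text.splitlines(keepends=True)


def diff_dump_text(old: str, new: str, old_label: str = "old", new_label: str = "new") -> str:
    """Return a unified diff of two saved Dump cluster outputs ("" if identical)."""
    return "".join(
        difflib.unified_diff(_lines(old), _lines(new), fromfile=old_label, tofile=new_label)
    )


def diff_dump_files(old_path: str, new_path: str, encoding: str = "utf-8") -> str:
    """Compare two dump files saved with Save As; the encoding is an assumption."""
    return diff_dump_text(
        Path(old_path).read_text(encoding=encoding),
        Path(new_path).read_text(encoding=encoding),
        old_path,
        new_path,
    )
```

Lines prefixed with `-` appear only in the older dump and lines prefixed with `+` only in the newer one, which makes added nodes, changed failover policies, or new resources easy to spot.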

Figure 7-4 shows the portion of the Dump cluster output that provides information about the cluster named cluster-2 and some of its resources.

Figure 7-4 Dumping Cluster Information Progress Window


7.3 Finding Additional Troubleshooting Information

This chapter describes how to validate clusters, groups, and resources with Oracle Fail Safe Manager. Additional information is available as follows:

  • Information about troubleshooting a specific component can be found in the chapters that describe how to configure each particular component for high availability.

  • Because Oracle Fail Safe is layered upon Microsoft Windows Failover Clusters software, you may need to refer to the Microsoft Windows Failover Clusters documentation to troubleshoot problems with the cluster service, interconnect, and hardware configuration.

  • If you are unable to start Oracle Fail Safe, then start the Windows Event Viewer and look at the application log. Oracle Fail Safe usually logs an event identifying the problem.