Oracle® Fail Safe Concepts and Administration Guide
Release 3.3.3 for Windows
Part No. B12070-01
This chapter provides general information about the troubleshooting tools provided with Oracle Fail Safe Manager: the verify commands (Verify Cluster, Verify Group, and Verify Standalone Database) and the Dump Cluster command. It ends with pointers to additional troubleshooting information.
Note that Oracle Fail Safe provides a centralized message facility. When you perform an action that results in an error, the system locates the message associated with the error and displays it. You can find more information about these messages in the Oracle Fail Safe Error Messages manual.
Oracle Fail Safe provides a family of verify commands that check cluster components and the cluster environment and validate the status of nodes, groups, and resources. If a verify operation finds a discrepancy or a problem, it can take action to fix the potential or actual problem.
Figure 6-1 shows the verify commands in the Troubleshooting menu.
Figure 6-1 Troubleshooting Menu and Verify Commands
Table 6-1 describes the verify commands. Each command is described in more detail later in this chapter.
Table 6-1 Verify Commands for Troubleshooting
|Command|Description|
|---|---|
|Verify Cluster|Validates the Oracle Fail Safe installation, the Oracle product installation (including Oracle homes and product version numbers), cluster network configuration, and cluster resource DLL registration.|
|Verify Group|Validates that the group resources and their dependencies are configured correctly.|
|Verify Standalone Database|Validates the standalone database instance and removes any old configuration information that might remain on another node.|
You can use the verify commands at any time to validate your cluster, group, or standalone database. If problems are found during verification, Oracle Fail Safe prompts you to fix them or returns an error message that further describes the problem.
If errors are returned when you run one of the verify commands, fix the errors and then rerun the verify command. Repeat this process until the verify operation runs without errors.
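The fix-and-rerun cycle described above amounts to a loop. The following Python sketch is illustrative only: `verify_until_clean` and the `verify` and `fix` callables are hypothetical stand-ins for running a verify command and correcting what it reports; they are not part of Oracle Fail Safe. The pass limit is an added assumption so that a script never loops forever on an error that cannot be fixed.

```python
from typing import Callable, List

# Hypothetical types: a verify step returns a list of error messages
# (empty when the verify operation runs without errors), and a fix step
# stands in for whatever corrective action the administrator takes.
VerifyFn = Callable[[], List[str]]
FixFn = Callable[[List[str]], None]


def verify_until_clean(verify: VerifyFn, fix: FixFn, max_passes: int = 5) -> int:
    """Run verify, fix reported errors, and rerun until no errors remain.

    Returns the number of verify passes needed. Raises RuntimeError if
    errors persist after max_passes passes.
    """
    for attempt in range(1, max_passes + 1):
        errors = verify()
        if not errors:
            return attempt  # clean run: stop here
        fix(errors)  # correct the reported problems, then rerun
    raise RuntimeError(f"verify still failing after {max_passes} passes")
```

In this sketch a first run that reports two errors, each fixed in turn, needs three passes: two that still report errors and a final clean one.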
The Verify Cluster operation validates the installation and network configuration of the cluster. To run it, from the Oracle Fail Safe Manager menu bar, choose Troubleshooting, then Verify Cluster.
The first time you connect to a cluster after installing or upgrading the Oracle Fail Safe software, you are prompted to run Verify Cluster. You can run Verify Cluster at any time, however, and you should run it whenever the cluster configuration changes. The Verify Cluster operation verifies that:
Each Oracle home has the same name on all cluster nodes where the software it contains is installed
For example, if OFS is the Oracle home name for the Oracle Fail Safe software on one cluster node, then OFS must be the Oracle home name on all nodes in the cluster where Oracle Fail Safe is installed. Similarly, if OfsDb is the Oracle home name for the Oracle database software on one cluster node, then it must be the Oracle home name on all nodes in the cluster where the Oracle database software is installed.
The Oracle Services for MSCS release is identical on all nodes
The resource providers (components) are configured identically on at least two of the nodes that are possible owners for each resource
If the mapping of network adapters is inconsistent across the cluster nodes, the Verify Cluster command returns errors indicating that the order of network adapters might be incorrect. See Appendix A for details.
Verify Cluster also registers Oracle resource DLLs with Microsoft Cluster Server (MSCS).
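The Oracle home and release symmetry rules that Verify Cluster checks can be expressed as simple comparisons across nodes. The following Python sketch is a hypothetical illustration of those rules, not Oracle Fail Safe's implementation: the inventory dictionaries, node names, and release strings are assumed inputs that you would have to gather yourself.

```python
from typing import Dict, List

# Hypothetical inventory: node name -> {product: Oracle home name}.
HomeInventory = Dict[str, Dict[str, str]]


def find_home_name_mismatches(inventory: HomeInventory) -> List[str]:
    """Report products whose Oracle home name differs between nodes.

    Only nodes where a product is installed are compared, matching the
    rule that a product's Oracle home name must be the same on every
    node where that product is installed.
    """
    names_by_product: Dict[str, Dict[str, str]] = {}
    for node, homes in inventory.items():
        for product, home_name in homes.items():
            names_by_product.setdefault(product, {})[node] = home_name

    problems = []
    for product in sorted(names_by_product):
        per_node = names_by_product[product]
        if len(set(per_node.values())) > 1:
            detail = ", ".join(f"{n}={h}" for n, h in sorted(per_node.items()))
            problems.append(f"{product}: inconsistent Oracle home names ({detail})")
    return problems


def find_release_mismatch(releases: Dict[str, str]) -> List[str]:
    """Report an error if the Oracle Services for MSCS release differs across nodes."""
    if len(set(releases.values())) > 1:
        detail = ", ".join(f"{n}={r}" for n, r in sorted(releases.items()))
        return [f"Oracle Services for MSCS release differs ({detail})"]
    return []
```

For example, with OfsDb on NODE1 and OraDb on NODE2 for the database software, the first function reports one mismatch, while a node that has no database software installed is not compared at all.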
Figure 6-2 shows the output from a typical Verify Cluster operation.
Figure 6-2 Clusterwide Operation Window for Verify Cluster
If you run the Verify Cluster operation and it does not complete successfully, it might indicate one or more of the following problems:
A problem exists in the configuration of the hardware, network, or the MSCS software.
A problem exists in the symmetry of the Oracle homes and versions.
A problem exists with the Oracle Fail Safe installation (for example, with the symmetry of the resource providers).
If the operation completes successfully but you are still having problems with Oracle Fail Safe, the problem most likely lies in the Oracle Fail Safe configuration rather than in the cluster or product installation.
The Verify Group operation:
Checks all resources in a group and confirms that they have been configured correctly on all nodes that are possible owners for the group.
Updates the dependencies among resources in the group.
After prompting you, repairs a group that is misconfigured.
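The first check in the list above, that every resource in a group is configured on every possible owner node of the group, can be sketched as a cross-product comparison. This Python example is hypothetical: the resource names, node names, and the `configured` set of (resource, node) pairs are assumed inputs, and the function only reports gaps; it does not repair anything.

```python
from typing import Iterable, List, Set, Tuple


def missing_resource_configurations(
    resources: Iterable[str],
    possible_owners: Iterable[str],
    configured: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the (resource, node) pairs that lack a configuration.

    Each resource in the group is checked on every node that is a
    possible owner of the group. Results are sorted so that reports
    are stable from run to run.
    """
    owners = sorted(possible_owners)
    return [
        (resource, node)
        for resource in sorted(resources)
        for node in owners
        if (resource, node) not in configured
    ]
```

A group whose network name resource is configured only on NODE1, although NODE2 is also a possible owner, would be reported as missing that resource on NODE2.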
You can run the Verify Group operation at any time. However, you should run it when any of the following occurs:
A group or resource in a group does not come online.
You add a node to the cluster.
You can also run a Verify Group operation from the command line with the FSCMD command VERIFYGROUP (see Chapter 5). FSCMD also provides a VERIFYALLGROUPS command, which verifies all groups configured by Oracle Fail Safe on a given cluster. You can run the VERIFYGROUP and VERIFYALLGROUPS commands in scripts as batch jobs.
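As one way to run these commands from a batch job, the following Python sketch wraps an FSCMD call with the standard `subprocess` module. It rests on assumptions: that FSCMD is on the PATH and that it reports failure through a nonzero exit status. It deliberately hard-codes no qualifiers; the group name, cluster name, credentials, and any other arguments documented in Chapter 5 are passed through unchanged in `extra_args`. The names `build_fscmd` and `run_verify_job` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner has the same calling convention as subprocess.run; tests can
# substitute a fake one so that no cluster is needed.
Runner = Callable[..., subprocess.CompletedProcess]


def build_fscmd(command: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build an FSCMD command line for one of the verify commands."""
    if command not in ("VERIFYGROUP", "VERIFYALLGROUPS"):
        raise ValueError(f"unsupported command: {command}")
    # Arguments (see Chapter 5 for their syntax) are passed through as-is.
    return ["FSCMD", command, *extra_args]


def run_verify_job(command: str, extra_args: Sequence[str] = (),
                   runner: Runner = subprocess.run) -> bool:
    """Run one verify command; return True if FSCMD exited with status 0."""
    result = runner(build_fscmd(command, extra_args),
                    capture_output=True, text=True)
    if result.returncode != 0:
        # Keep FSCMD's own messages in the batch job's log.
        print(result.stdout)
        print(result.stderr)
    return result.returncode == 0
```

Injecting the runner keeps the wrapper testable without a cluster; in a real job the default `subprocess.run` is used.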
You can watch the progress of the Verify Group operation and view the status of the individual resources in the group as Oracle Fail Safe verifies the group.
Figure 6-3 shows the output from a Verify Group operation.
Figure 6-3 Clusterwide Operation Window for Verify Group
You can validate a standalone database at any time using the Verify Standalone Database operation. To issue the Verify Standalone Database command, select the database from the Oracle Fail Safe Manager tree view, and then from the Oracle Fail Safe Manager menu bar, choose Troubleshooting, then Verify Standalone Database.
The Verify Standalone Database operation performs validation checks to ensure that the standalone database is configured correctly on the node where it resides and to remove any references to the database that might exist on other cluster nodes. (References to the database might exist on other cluster nodes if the database was once added to a group and then later removed.) This ensures that the database can be made highly available using Oracle Fail Safe.
Oracle recommends that you run the Verify Standalone Database command on a standalone database before you add it to a group. You can also run it whenever you have trouble accessing a standalone database. Note, however, that Oracle Fail Safe might stop and restart the database during the verify operation.
For example, you might perform a verification:
In response to a failure when you try to add a database to a group.
If you used an administration tool other than Oracle Fail Safe Manager to perform an operation on the database, and the database is now inaccessible.
If you removed or deinstalled the MSCS software from the cluster nodes without first removing the Oracle Fail Safe software (for example, during a software upgrade). This is described in more detail in the Oracle Fail Safe Installation Guide.
Figure 6-4 shows the Verify Standalone Database dialog box in which you enter valid database information and account information for a standalone database.
Figure 6-4 Verify Standalone Database Dialog Box
To use the Verify Standalone Database dialog box, you specify:
The service name of the standalone database, in the Service Name field
The instance name of the standalone database, in the Instance Name field
The database name of the standalone database, in the Database Name field
The disk, path, and file name of the initialization parameter file for the standalone database, in the Parameter File field
The account that Oracle Fail Safe should use to attach to the database, in the Account area
Oracle Fail Safe uses this information to:
Fix clusterwide problems with Oracle Net
Check that the standalone database is on a cluster disk
Make sure that Oracle Fail Safe can attach to the database
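The cluster disk check in the list above can be pictured as a drive comparison. The following Python sketch is a hypothetical illustration, not Oracle Fail Safe's implementation: it assumes you already have a list of database file paths (for example, the parameter file from the Parameter File field) and the drive letters of the cluster disks, and it uses the standard `pathlib.PureWindowsPath` class to extract each drive.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def files_not_on_cluster_disks(paths: Iterable[str],
                               cluster_disks: Iterable[str]) -> List[str]:
    """Return the paths whose drive is not one of the cluster disks.

    Cluster disks may be given as "H:" or "H:\\". Drive letters are
    compared case-insensitively, as Windows does.
    """
    disks = {d.rstrip("\\").upper() for d in cluster_disks}
    return [p for p in paths if PureWindowsPath(p).drive.upper() not in disks]
```

A file on a local disk such as C: would be reported, because a database with files on a local disk cannot fail over with its group.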
If the standalone database is open when you run a Verify Standalone Database operation, the operation does not restart the database.
If the standalone database is not open, or if the database is stopped, Oracle Fail Safe asks your permission to stop and restart the database instance, and then opens the database for access.
Figure 6-5 shows the output from a typical Verify Standalone Database operation in a Clusterwide Operation window.
If any problems are found during verification, the Verify Standalone Database operation prompts you before it attempts to fix them. For example, suppose that an attempt to add a database to a group fails because of an Oracle Net problem. You can run the Verify Standalone Database command to fix the network problem and then add the database to the group.
Oracle Fail Safe provides the Dump Cluster command to display Oracle Fail Safe Manager cluster data in a window. You might issue this command periodically (and save the output) to keep a record of changes made to the cluster over time, or at the request of customer support to provide a snapshot of the cluster environment.
Data presented when you issue the Dump Cluster command includes:
You can optionally save the Dump Cluster data to a file by clicking Save As.
To issue the Dump Cluster command, select the cluster from the Oracle Fail Safe Manager tree view, and then from the Oracle Fail Safe Manager menu bar, choose Troubleshooting, then Dump Cluster.
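If you save the Dump Cluster output periodically, comparing two saved files shows what changed in the cluster between them. The following Python sketch uses the standard `difflib.unified_diff` function for the comparison. It assumes the saved output is plain text readable with the default encoding; undecodable bytes are replaced rather than raising an error. The function names are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def diff_dumps(old_text: str, new_text: str,
               old_label: str = "previous", new_label: str = "current") -> List[str]:
    """Return unified-diff lines showing what changed between two dumps."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


def diff_dump_files(old_path: str, new_path: str) -> List[str]:
    """Compare two Dump Cluster outputs previously saved with Save As."""
    old_text = Path(old_path).read_text(errors="replace")
    new_text = Path(new_path).read_text(errors="replace")
    return diff_dumps(old_text, new_text, old_path, new_path)
```

An empty result means the two snapshots are identical; otherwise, lines that start with `+` or `-` show what was added or removed.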
Figure 6-6 shows the portion of the Dump Cluster output that provides information about the FS-170 cluster and some of its resources.
Figure 6-6 Dump Cluster Clusterwide Operation
This chapter described how to use the Oracle Fail Safe Manager family of troubleshooting tools. Additional information is available as follows:
Information about troubleshooting a specific component can be found in Chapters 7 through 9, each of which describes how to configure a particular component for high availability.
Information about troubleshooting network configuration problems is provided in Appendix A.
Because Oracle Fail Safe is layered upon Microsoft Cluster Server software, you might need to refer to the MSCS documentation to troubleshoot problems with the cluster service, interconnect, and hardware configuration.