Oracle® Real Application Clusters Guard Concepts and Administration Guide
Release 3.3.1 for Windows
Part No. A96687-01
This chapter provides general information on the troubleshooting tools provided with Oracle Real Application Clusters Guard Manager. The following table shows the information provided in this chapter:
|Verify Operations||Section 6.1|
|Dump Cluster||Section 6.2|
|Handling Errors and Troubleshooting Problems with Configured Databases||Section 6.3|
|Finding Additional Troubleshooting Information||Section 6.4|
Oracle Real Application Clusters Guard provides a centralized message facility. When you perform an action that results in an error, the system locates the message associated with the error and displays it. You can find more information about these messages in the Oracle Services for MSCS Error Messages manual.
Oracle Real Application Clusters Guard provides a family of tools to help you to validate the status of nodes, groups, resources, and the cluster environment. If a discrepancy or a problem is found, the verify operation takes the appropriate action to fix any potential or actual problems.
Figure 6-1 shows the commands in the Troubleshooting menu.
Figure 6-1 Troubleshooting Menu and Verify Commands
Table 6-1 describes the verify commands and provides references for more information.
Table 6-1 Verify Commands for Troubleshooting
|Verify Cluster||Validates the Oracle Real Application Clusters Guard installation, the Oracle Real Application Clusters product installation (including Oracle homes and product version numbers), cluster network configuration, and cluster resource DLL registration||Section 6.1.1|
|Verify Group||Validates that the group resources and their dependencies are configured correctly||Section 6.1.2|
|Verify Real Application Clusters Database||Validates the Oracle Real Application Clusters database instances||Section 6.1.3|
You can use the verify commands at any time to validate your cluster, group, or configured Oracle Real Application Clusters database. If problems are found during verification, Oracle Real Application Clusters Guard prompts you to fix them or returns an error message that further describes the problem. Oracle Real Application Clusters Guard does not make any changes without first prompting you.
Troubleshooting -> Verify Cluster
The first time you connect to a cluster after installing or upgrading the Oracle Real Application Clusters Guard software, you are prompted to run Verify Cluster. You can run Verify Cluster at any time, however, and you should run it whenever the cluster configuration changes. The Verify Cluster operation verifies that:
If, for example, RealAppCluGuard is the Oracle home name for the Oracle Real Application Clusters Guard software on one cluster node, then RealAppCluGuard must be the Oracle home name on all nodes in the cluster where Oracle Real Application Clusters Guard is installed. Similarly, if OfsDb is the Oracle home name for the Oracle Real Application Clusters software on one cluster node, then it must be the Oracle home name on all nodes in the cluster where the Oracle Real Application Clusters software is installed.
The Oracle Services for MSCS release is identical on all nodes
The resource providers (components) are configured identically on at least two of the nodes that are possible owners for each resource
If there is a problem with inconsistent mapping, the Verify Cluster command returns errors indicating that the order of network adapters may be incorrect. See Appendix A for details.
The Verify Cluster operation also registers Oracle resource DLLs with Microsoft Cluster Server (MSCS).
Figure 6-2 shows the output from a typical Verify Cluster operation.
Figure 6-2 Clusterwide Operation Window for Verify Cluster
If you run the Verify Cluster operation and it does not complete successfully, it might indicate one or more of the following problems:
A problem exists in the configuration of the hardware, network, or the MSCS software.
A problem exists in the symmetry of the Oracle homes and versions.
A problem exists with the Oracle Real Application Clusters Guard installation (for example, with the symmetry of the resource providers).
If the operation completes successfully, but you are still having problems with Oracle Real Application Clusters Guard, the problem most likely lies in the Oracle Real Application Clusters configuration.
Checks all resources in a group and confirms that they have been configured correctly on all nodes that are possible owners for the group.
Updates the dependencies among resources in the group.
After prompting you, repairs a group that is misconfigured.
You can run the Verify Group operation at any time. However, you should run it when any of the following occurs:
A group or resource in a group does not come online when it should.
Troubleshooting -> Verify Group
Alternatively, you can run a Verify Group operation using the ORACGCMD command VERIFYGROUP (see Chapter 5). ORACGCMD also provides a VERIFYALLGROUPS command that verifies all groups configured by Oracle Real Application Clusters Guard on a given cluster. You can run the VERIFYGROUP and VERIFYALLGROUPS commands in scripts as batch jobs.
You can watch the progress of the Verify Group operation and view the status of the individual resources in the group as Oracle Real Application Clusters Guard verifies the group.
Figure 6-3 shows the output from a Verify Group operation.
Figure 6-3 Clusterwide Operation Window for Verify Group
You can validate a configured Oracle Real Application Clusters database at any time using the Verify Real Application Clusters Database operation. To issue the Verify Real Application Clusters Database command, select the database from the Oracle Real Application Clusters Guard Manager tree view, and then choose:
Troubleshooting -> Verify Real Application Clusters Database
The Verify Real Application Clusters Database operation verifies that all database instances are configured and then calls the verify group operation to verify each of the instance groups. The verify database operation returns warning messages if it finds that not all database instances are configured or if the cluster metadata indicates there are more instances configured for the database than are actually associated with the database. (Use the Configure Additional Instances command or Unconfigure command to remedy these situations. See Section 3.4 or Section 3.5 for information on these commands.)
Figure 6-4 shows the output from a typical Verify Real Application Clusters Database operation in a Clusterwide Operation window.
If any problems are found during verification, the Verify Real Application Clusters Database operation prompts you before it attempts to fix them.
Oracle Real Application Clusters Guard provides the Dump Cluster command to display Oracle Real Application Clusters Guard cluster data in a window. You might issue this command periodically (and save the output) to maintain a record of changes made to the cluster over time, or you might issue it at the request of customer support to provide a snapshot of the cluster environment.
Data presented when you issue the Dump Cluster command includes:
You can optionally save the Dump Cluster data to a file by clicking the Save As button.
To issue the Dump Cluster command, choose Dump Cluster from the Troubleshooting menu.
Figure 6-5 shows a portion of the Dump Cluster output.
Figure 6-5 Dump Cluster Clusterwide Operation
The following sections describe how to troubleshoot specific problems that you may encounter with configured Oracle Real Application Clusters databases. For general information about troubleshooting Oracle Real Application Clusters databases, see the Oracle Real Application Clusters documentation.
In most cases, the first step in troubleshooting a problem is to issue the Verify Cluster, Verify Group, or Verify Real Application Clusters Database command.
If there is a problem placing a group online, try the following:
Oracle Net logs an entry to the listener log file every time an error is encountered or an instance is accessed through the listener. Check for errors in the log file that might help you to identify the problem. The log files reside in the <Oracle_Home>\Network\Log directory.
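The following is a minimal sketch of one way to summarize the errors in a listener log file. It assumes that errors appear in the log as five-digit codes prefixed with TNS- or ORA- (for example, TNS-12541); the function names are hypothetical and are not part of any Oracle tool.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: errors appear in the log as codes such as TNS-12541 or ORA-12500.
ERROR_CODE = re.compile(r"\b(?:TNS|ORA)-\d{5}\b")


def count_error_codes(lines: Iterable[str]) -> Counter:
    """Count each TNS-/ORA- error code found in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(ERROR_CODE.findall(line))
    return counts


def summarize_listener_log(path: str) -> Counter:
    """Read a listener log file and count the error codes it contains."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, "r", errors="replace") as log:
        return count_error_codes(log)
```

Calling summarize_listener_log with the path to a log file in the <Oracle_Home>\Network\Log directory returns a count for each code, which can help show whether one error dominates.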
The Oracle Real Application Clusters Guard database instance resource DLL accesses each database in a group at the Is Alive interval. It uses the database connection information to access each database instance. If the database access information has changed, then Oracle Real Application Clusters Guard will fail to access the database instances. Hence, MSCS will not consider the database resource to be alive.
Check the Oracle Net configuration data.
Set the Pending Timeout value to specify the length of time you want the cluster software to allow for the database instance to be brought online (or taken offline) before considering the operation to have failed. Set the value high enough to prevent a cluster system from mistaking slow response time for unavailability, yet low enough to minimize the restart response time when a failure does occur.
You can set the Pending Timeout value by modifying the database properties, as follows:
In the Oracle Real Application Clusters Guard Manager tree view, select the database name.
Click the Policies tab.
In the Pending Timeout box, modify the Pending Timeout value.
If database password files are not being used, ensure that the REMOTE_LOGIN_PASSWORDFILE initialization parameter is set to NONE in the database initialization parameter file.
You can determine whether or not database password files are being used by selecting the database from the tree view, then clicking the Authentication tab. If a password file is being used, the Account information contains entries; if it is not being used, then the Account information is empty and dimmed.
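As a rough illustration, the check on the initialization parameter file could be scripted as follows. This sketch assumes a plain-text parameter file with name = value entries and # comments, and it assumes that the last entry wins if a parameter appears more than once; the function names are hypothetical.

```python
from typing import Optional


def get_init_parameter(text: str, name: str) -> Optional[str]:
    """Return the last value assigned to `name` in parameter-file text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip().lower() == name.lower():  # parameter names are not case sensitive
            value = val.strip().strip("'\"")
    return value


def password_file_explicitly_disabled(text: str) -> bool:
    """True only when REMOTE_LOGIN_PASSWORDFILE is present and set to NONE."""
    value = get_init_parameter(text, "REMOTE_LOGIN_PASSWORDFILE")
    return value is not None and value.upper() == "NONE"
```

The second function returns False when the parameter is absent, because the sketch makes no assumption about the default value.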
If the password for the account through which Oracle Real Application Clusters Guard accesses a database instance changes and you do not update the information through Oracle Real Application Clusters Guard Manager, the attempts at polling the database will fail. See Section 3.9 for information on how to update database password changes for Oracle Real Application Clusters Guard.
Sometimes, processing-intensive operations (such as an Import operation) can cause Is Alive polling to fail and may result in an undesired group failover. In such cases, you can disable Is Alive polling for the instance by issuing the ORACGCMD DISABLEISALIVE command. However, be aware that when you disable Is Alive polling, Oracle Real Application Clusters Guard suspends monitoring the instance until Is Alive polling is reenabled. You reenable Is Alive polling with the ORACGCMD ENABLEISALIVE command.
Oracle Corporation recommends that you issue these ORACGCMD commands from within a script so that you can ensure that Is Alive polling is reenabled when the processing-intensive operation completes.
For information on the ORACGCMD commands, see Chapter 5.
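One way to guarantee that Is Alive polling is reenabled, even if the processing-intensive operation fails, is to wrap the operation in a try/finally construct. The sketch below is an illustration only: the helper names are hypothetical, and the command-line arguments for ORACGCMD DISABLEISALIVE and ENABLEISALIVE are deliberately supplied by the caller, because their exact syntax is documented in Chapter 5 and is not assumed here.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Run a command line and raise CalledProcessError if it exits nonzero."""
    subprocess.run(list(argv), check=True)


@contextmanager
def is_alive_polling_suspended(
    disable_argv: Sequence[str],
    enable_argv: Sequence[str],
    runner: Runner = run_command,
) -> Iterator[None]:
    """Disable Is Alive polling for the duration of the with-block.

    disable_argv and enable_argv are the full command lines for the
    DISABLEISALIVE and ENABLEISALIVE commands (caller-supplied; see Chapter 5).
    """
    runner(disable_argv)  # if this fails, polling was never disabled
    try:
        yield
    finally:
        runner(enable_argv)  # always runs, even if the wrapped work raises
```

A script would place the Import operation (or other heavy work) inside the with-block, so that monitoring resumes whether the work succeeds or fails.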
An entry is logged to the listener.log file every time a connection is made to a database. Because Oracle Real Application Clusters Guard connects to the database each time it performs Is Alive polling, the listener.log file can grow large very quickly. When it becomes very large (2 MB or more), it can drain system resources, which can lead to the ORA-12500 error being returned.
To avoid this error, you can do any one of the following:
Disable the use of the Listener Control Utility for Is Alive polling.
In Oracle Real Application Clusters Guard Manager, select the listener from the tree view, then click the Parameters tab in the properties sheet. Clear the box for the "Use the Listener Control Utility for Is Alive polling" option.
Periodically delete the listener.log file. However, be aware that to delete the listener.log file, you must first stop the listener.
Use the LOGGING_<listener-name> = OFF parameter entry in the listener.ora file to stop logging. Make sure that the listener-name you specify as part of the parameter is the Oracle Real Application Clusters Guard listener.
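As a small illustration of the options above, the following sketch checks whether a listener.log file has reached the size at which problems are described (taken here as 2 MB) and builds the listener.ora entry that turns logging off. The function names and the default threshold are assumptions made for this example.

```python
import os

# 2 MB, the size at which the text above says problems can begin.
THRESHOLD_BYTES = 2 * 1024 * 1024


def log_exceeds_threshold(path: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """Return True when the file at `path` is at least `threshold` bytes."""
    return os.path.getsize(path) >= threshold


def logging_off_entry(listener_name: str) -> str:
    """Build the listener.ora entry that stops logging for one listener."""
    return f"LOGGING_{listener_name} = OFF"
```

For a listener named fslnode, logging_off_entry returns the entry LOGGING_fslnode = OFF.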
Each listener has its own output file that is named using the listener name and the .out extension. (In the example, the listener name is fslnode.) If you experience difficulties when creating a new listener, you can use the output file to help you diagnose the problem.
Whenever Oracle Real Application Clusters Guard makes changes in the listener.ora or tnsnames.ora files, the original version of the file is archived. If you need to reference an Oracle Net net service name definition or a listener definition as it was before Oracle Real Application Clusters Guard changed the definition, you can look at the archived versions of the configuration files.
Oracle Real Application Clusters Guard keeps up to two archived versions of configuration files. The file name of the archived version has a format of <filename>_000.ora and <filename>_001.ora. Note that <filename>_000.ora is the most recent file.
Whenever Oracle Real Application Clusters Guard encounters an error during an operation after Oracle Net configuration files have been changed, the updated version of the file is saved as <filename>_rlb.ora. Then, the original version of the file is restored.
The rollback version of the file may be useful for problem diagnosis.
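The naming scheme for archived and rollback files can be sketched as follows. This example assumes that the archived and rollback copies are written to the same directory as the original configuration file, which the text above does not state; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def archived_versions(config_path: str) -> List[Path]:
    """Return existing archived copies of a config file, most recent first."""
    path = Path(config_path)
    # <filename>_000.ora is the most recent archive, <filename>_001.ora the older one.
    candidates = [path.with_name(f"{path.stem}_{n:03d}.ora") for n in (0, 1)]
    return [c for c in candidates if c.exists()]


def rollback_version(config_path: str) -> Path:
    """Return the path where a rollback copy (<filename>_rlb.ora) would be saved."""
    path = Path(config_path)
    return path.with_name(f"{path.stem}_rlb.ora")
```

For listener.ora, the candidate names are listener_000.ora, listener_001.ora, and listener_rlb.ora.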
If users and client applications are unable to access a database that is configured into an MSCS cluster, perform the following steps to fix the problem:
This chapter described how to use the Oracle Real Application Clusters Guard Manager family of troubleshooting tools and how to troubleshoot problems with configured Oracle Real Application Clusters databases. Additional information is available as follows:
Appendix A describes how to troubleshoot network configuration problems.
Because Oracle Real Application Clusters Guard is layered upon Microsoft Cluster Server software, you might need to refer to the MSCS documentation to troubleshoot problems with the cluster service, interconnect, and hardware configuration.
If you are unable to start Oracle Real Application Clusters Guard, invoke the Microsoft Windows Event Viewer and look at the application log. Oracle Real Application Clusters Guard Server usually logs an event identifying the problem.