Oracle® Clusterware Administration and Deployment Guide
11g Release 1 (11.1)

B28255-07

F Troubleshooting Oracle Clusterware

This appendix introduces monitoring the Oracle Clusterware environment and explains how you can enable dynamic debugging to troubleshoot Oracle Clusterware processing, and enable debugging and tracing for specific components and specific Oracle Clusterware resources to focus your troubleshooting efforts.

This appendix contains the following topics:

Monitoring Oracle Clusterware

You can use Oracle Enterprise Manager to monitor the Oracle Clusterware environment. When you log in to Oracle Enterprise Manager using a client browser, the Cluster Database Home page appears, where you can monitor the status of the Oracle Clusterware environment. Monitoring can include such things as:

The Cluster Database Home page is similar to a single-instance Database Home page. However, on the Cluster Database Home page, Oracle Enterprise Manager displays the system state and availability. This includes a summary about alert messages and job activity, as well as links to all the database and Automatic Storage Management (ASM) instances. For example, you can track problems with services on the cluster including when a service is not running on all of the preferred instances or when a service response time threshold is not being met.

You can use the Oracle Enterprise Manager Interconnects page to monitor the Oracle Clusterware environment. The Interconnects page shows the public and private interfaces on the cluster, the overall throughput on the private interconnect, individual throughput on each of the network interfaces, error rates (if any) and the load contributed by database instances on the interconnect, including:

All of this information also is available as collections that have a historic view. This is useful in conjunction with cluster cache coherency, such as when diagnosing problems related to cluster wait events. You can access the Interconnects page by clicking the Interconnect tab on the Cluster Database home page.

Also, the Oracle Enterprise Manager Cluster Database Performance page provides a quick glimpse of the performance statistics for a database. Statistics are rolled up across all the instances in the cluster database in charts. Using the links next to the charts, you can get more specific information and perform any of the following tasks:

The charts on the Cluster Database Performance page include the following:

In addition, the Top Activity drilldown menu on the Cluster Database Performance page enables you to see the activity by wait events, services, and instances. You can also see details about SQL and sessions at a prior point in time by moving the slider on the chart.

Dynamic Debugging

You can use crsctl commands as the root user to enable dynamic debugging for Oracle Clusterware, the Event Manager (EVM), and the clusterware subcomponents. You can dynamically change debugging levels using crsctl commands. Debugging information remains in the Oracle Cluster Registry (OCR) for use during the next startup. You can also enable debugging for resources.

The crsctl syntax to enable debugging for Oracle Clusterware is:

crsctl debug log crs "CRSRTI:1,CRSCOMM:2" 

The crsctl syntax to enable debugging for EVM is:

crsctl debug log evm "EVMCOMM:1" 

The crsctl syntax to enable debugging for resources is:

crsctl debug log res "resname:1"

Component Level Debugging

You can use crsctl commands as the root user to enable dynamic debugging for the Oracle Clusterware Cluster Ready Services (CRS), Oracle Cluster Registry (OCR), Cluster Synchronization Services (CSS), and the Event Manager (EVM).

This section contains the following topics:

Enabling Debugging for CRS, OCR, CSS, and EVM Modules

You can enable debugging for the CRS, OCR, CSS, and EVM modules and their components by setting environment variables or by issuing crsctl debug commands using the following syntax:

crsctl debug log module_name component:debugging_level

You must issue the crsctl debug command as the root user, and supply the following information:

  • module_name—The name of the module: CRS, EVM, or CSS.

  • component—The name of a component for the CRS, OCR, EVM, or CSS module. See Table F-1 for a list of all of the components.

  • debugging_level—A number from 1 to 5 to indicate the level of detail you want the debug command to return, where 1 is the least amount of debugging output and 5 provides the most detailed debugging output.

    You can dynamically change the debugging level in the crsctl command, or you can configure an init file for changing the debugging level as described in "Creating an Initialization File to Contain the Debugging Level".

The following commands show examples of how to enable debugging for the various modules:

  • To enable debugging for Oracle Clusterware:

    crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
    
  • To enable debugging for OCR:

    crsctl debug log crs "CRSRTI:1,CRSCOMM:2,OCRSRV:4"
    
  • To enable debugging for EVM:

    crsctl debug log evm "EVMCOMM:1"
    
  • To enable debugging for resources:

    crsctl debug log res "resname:1"
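The command forms above can be assembled with a small helper. This is a minimal sketch, not part of the Oracle tooling: the function name build_debug_cmd is hypothetical, and it only prints the command so that you can review it before running it as the root user. It assumes the module names shown in this section (crs, evm, css, res) and rejects levels outside the documented range of 1 to 5.

```shell
# Hypothetical helper: build_debug_cmd MODULE NAME:LEVEL [NAME:LEVEL...]
# Prints a crsctl debug command; it does not run crsctl.
build_debug_cmd() {
    module=$1
    shift
    case "$module" in
        crs|evm|css|res) ;;
        *) echo "unknown module: $module" >&2; return 1 ;;
    esac
    if [ "$#" -eq 0 ]; then
        echo "no name:level pairs given" >&2
        return 1
    fi
    spec=""
    for pair in "$@"; do
        # Each pair must end in a colon followed by a single digit from 1 to 5.
        case "$pair" in
            ?*:[1-5]) ;;
            *) echo "level must be 1-5: $pair" >&2; return 1 ;;
        esac
        # Join the pairs with commas, as in the examples above.
        spec="${spec:+$spec,}$pair"
    done
    printf 'crsctl debug log %s "%s"\n' "$module" "$spec"
}

build_debug_cmd crs CRSRTI:1 CRSCOMM:2
```

Printing rather than executing keeps the sketch safe to try on a system without Oracle Clusterware installed.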
    

To list the components that can be used for debugging, issue the crsctl lsmodules command using the following syntax and supply crs, evm, or css for the module_name parameter:

crsctl lsmodules module_name

Note:

You do not have to be the root user to run the crsctl command with the lsmodules option.

Table F-1 shows the components for the CRS, OCR, EVM, and CSS modules, respectively. Note that some of the component names are common between the CRS, EVM, and CSS daemons and may be enabled on that specific daemon. For example, COMMNS is the NS layer and, because each daemon uses the NS layer, you can enable this specific module component on any of the daemons to get specific debugging information.

Table F-1 Components for the CRS, OCR, EVM, and CSS Modules

CRS Modules (Footnote 1): CRSUI, CRSCOMM, CRSRTI, CRSMAIN, CRSPLACE, CRSAPP, CRSRES, CRSOCR, CRSTIMER, CRSEVT, CRSD, CLUCLS, CSSCLNT, COMMCRS, COMMNS

OCR Modules (Footnote 2): OCRAPI, OCRCLI, OCRSRV, OCRMAS, OCRMSG, OCRCAC, OCRRAW, OCRUTL, OCROSD

OCR Tools Modules: OCRCONF, OCRDUMP, OCRCHECK

EVM Modules (Footnote 3): EVMD, EVMDMAIN, EVMCOMM, EVMEVT, EVMAPP, EVMAGENT, CRSOCR, CLUCLS, CSSCLNT, COMMCRS, COMMNS

CSS Modules (Footnote 4): CSSD, COMMCRS, COMMNS

Footnote 1 List the CRS component modules using the crsctl lsmodules crs command.

Footnote 2 You cannot list the OCR modules using the crsctl lsmodules command.

Footnote 3 List the EVM component modules using the crsctl lsmodules evm command.

Footnote 4 List the CSS component modules using the crsctl lsmodules css command.

Creating an Initialization File to Contain the Debugging Level

This section describes how to specify the debugging level in an initialization file. This debugging information is stored for use during the next startup.

For each process that you want to debug, you can create an initialization file that contains the debugging level.

The initialization file name includes the name of the process that you are debugging (process_name.ini). The file is located in the ORACLE_HOME/log/hostname/admin/ directory.

For example, ORACLE_HOME/log/hostA/admin/clscfg.ini is the name for the CLSCFG debugging initialization file on hostA.
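The naming rule can be expressed as a short function. This is an illustrative sketch only: debug_ini_path is a hypothetical name, and the function composes the path from the three parts named above without creating the file or writing any contents to it.

```shell
# Hypothetical helper: debug_ini_path ORACLE_HOME HOSTNAME PROCESS_NAME
# Prints the path of the debugging initialization file for one process.
debug_ini_path() {
    oracle_home=$1
    host=$2
    process=$3
    printf '%s/log/%s/admin/%s.ini\n' "$oracle_home" "$host" "$process"
}

# Example call with placeholder values.
debug_ini_path /u01/app/oracle hostA clscfg
```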

See Also:

"Enabling Debugging for CRS, OCR, CSS, and EVM Modules" for information about dynamically changing debugging levels by specifying the level number (from 1 to 5) on the crsctl command

Oracle Clusterware Shutdown and Startup

You can start or stop Oracle Clusterware by issuing crsctl start and stop commands.

Example 1   Stopping Oracle Clusterware

To stop Oracle Clusterware and its related resources on a specific node, issue the following command:

crsctl stop crs

Example 2   Starting Oracle Clusterware

To start Oracle Clusterware and its related resources on a specific node, issue the following command:

crsctl start crs

Note:

You must run these crsctl commands as the root user.

Enabling and Disabling Oracle Clusterware Daemons

When the Oracle Clusterware daemons are enabled, they start automatically at the time the node is started. To prevent the daemons from starting, you can disable them using crsctl commands. You can use crsctl commands as follows to enable and disable the startup of the Oracle Clusterware daemons.

Run the following command to enable startup for all of the Oracle Clusterware daemons:

crsctl enable crs

Run the following command to disable the startup of all of the Oracle Clusterware daemons:

crsctl disable crs

Notes:

  • You must run these crsctl commands as the root user.

  • Neither of these commands is supported on Windows systems.

Determining the Active Versions and Software Versions

You can determine the active version or the software version of Oracle Clusterware on the local node by issuing the crsctl query crs activeversion and crsctl query crs softwareversion commands.

These versions are used while upgrading a cluster.

Example 1   Determining the Active Version

To determine the active version on the local node, issue the following command:

crsctl query crs activeversion

Example 2   Determining the Software Version

To determine the software version on the local node, issue the following command:

crsctl query crs softwareversion

Diagnostics Collection Script

Every time an Oracle Clusterware error occurs, you should run the diagcollection.pl script to collect diagnostic information from Oracle Clusterware in trace files. The diagnostics provide additional information so Oracle Support can resolve problems. Run this script from the following location:

CRS_home/bin/diagcollection.pl

Note:

You must run this script as the root user.

Oracle Clusterware Alerts

Oracle Clusterware posts alert messages when important events occur. The following is an example of alerts posted by the CSSD, EVMD, and CRSD processes:

2007-09-03 10:05:35.463
[cssd(3073)]CRS-1605:CSSD voting file is online: /dev/sdm2. Details in 
/scratch/crs/log/stnsp012/cssd/ocssd.log.
2007-09-03 10:05:35.484
[cssd(3073)]CRS-1605:CSSD voting file is online: /dev/sdl3. Details in 
/scratch/crs/log/stnsp012/cssd/ocssd.log.
[cssd(3073)]CRS-1601:CSSD Reconfiguration complete. Active nodes are 
stnsp011 stnsp012 stnsp013 stnsp014 .
2007-09-03 10:05:36.949
[evmd(2218)]CRS-1401:EVMD started on node stnsp012.
2007-09-03 10:05:36.999
[crsd(2232)]CRS-1012:The OCR service started on node stnsp012.
2007-09-03 10:05:38.770
[crsd(2232)]CRS-1201:CRSD started on node stnsp012.

The alert log is in the same location on Linux, UNIX, and Windows systems, where CRS_home is the location of the Oracle Clusterware home: CRS_home/log/hostname/alerthostname.log.
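When scanning an alert log, it can help to list which message codes occur in it. The following sketch assumes the line layout shown in the example above (a CRS-nnnn code after the process tag) and a grep that supports the -o option, as GNU and BSD grep do. The function names alert_log_path and alert_codes are hypothetical.

```shell
# Hypothetical helper: alert_log_path CRS_HOME HOSTNAME
# Prints the alert log path described above.
alert_log_path() {
    printf '%s/log/%s/alert%s.log\n' "$1" "$2" "$2"
}

# Hypothetical helper: alert_codes [FILE...]
# Prints each distinct CRS-nnnn code once, sorted; reads stdin if no file is given.
alert_codes() {
    grep -o 'CRS-[0-9][0-9]*' "$@" | sort -u
}

# Example call with the values from the sample alert above.
alert_log_path /scratch/crs stnsp012
```

For example, alert_codes "$(alert_log_path /scratch/crs stnsp012)" would list the codes present in that node's alert log.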

The following example shows an EVMD alert:

[NORMAL] CLSD-1401: EVMD started on node %s 
[ERROR] CLSD-1402: EVMD aborted on node %s. Error [%s]. Details in %s.

Resource Debugging

You can use the crsctl command to enable resource debugging using the following syntax:

crsctl debug log res "ora.node1.vip:1"

This has the effect of setting the environment variable USR_ORA_DEBUG to 1 before running the start, stop, or check action scripts for the ora.node1.vip resource.

Note:

You must run this crsctl command as the root user.

Checking the Health of the Clusterware

Use the crsctl check command to determine the health of your clusterware as in the following example:

crsctl check crs 

Run the following command to determine the health of individual daemons where daemon is crsd, cssd or evmd:

crsctl check daemon

Note:

You do not have to be the root user to perform health checks.
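To check all three daemons in one pass, the per-daemon command can be generated in a loop. This is a minimal sketch with a hypothetical function name, daemon_check_cmds. It prints the commands rather than running them, because the output format of crsctl check is not shown in this section.

```shell
# Hypothetical helper: print one health-check command per daemon.
daemon_check_cmds() {
    for daemon in crsd cssd evmd; do
        printf 'crsctl check %s\n' "$daemon"
    done
}

daemon_check_cmds
```

On a node where crsctl is on the PATH, the printed lines could be piped to sh to run them.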

Clusterware Log Files and the Unified Log Directory Structure

Oracle Database uses a unified log directory structure to consolidate the Oracle Clusterware component log files. This consolidated structure simplifies diagnostic information collection and assists during data retrieval and problem analysis.

Oracle Clusterware retains one current log file and five older log files of 50 MB each (300 MB of storage) for the cssd process, and one current log file and 10 older log files of 10 MB each (110 MB of storage) for the crsd process. In addition, each time the current log file in a log file group is stored, Oracle Clusterware overwrites the oldest retained log file in that group. Alert files are stored in the directory structures shown in Table F-2.
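The storage figures follow from counting the current log file together with the older files. The short sketch below reproduces that arithmetic; retained_mb is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: retained_mb OLDER_FILES SIZE_MB
# Storage = (one current file + older files) * size of each file.
retained_mb() {
    older_files=$1
    size_mb=$2
    echo $(( (older_files + 1) * size_mb ))
}

echo "cssd: $(retained_mb 5 50) MB"    # (1 + 5) * 50
echo "crsd: $(retained_mb 10 10) MB"   # (1 + 10) * 10
```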

Table F-2 Locations of Oracle Clusterware Component Log Files

Component   Log File Location (Footnote 1)

Cluster Ready Services Daemon (crsd) Log Files

CRS_home/log/hostname/crsd

Oracle Cluster Registry (OCR) records

The OCR tools (OCRDUMP, OCRCHECK, OCRCONFIG) record log information in the following location (Footnote 2):

CRS_home/log/hostname/client

The OCR server records log information in the following location (Footnote 3):

CRS_home/log/hostname/crsd

Oracle Process Monitor Daemon (OPROCD)

The following path is specific to Linux (Footnote 4):

/etc/oracle/hostname.oprocd.log

Cluster Synchronization Services (CSS)

CRS_home/log/hostname/cssd

Event Manager (EVM) information generated by evmd

CRS_home/log/hostname/evmd

Oracle RAC RACG

The Oracle RAC high availability trace files are located in the following two locations:

CRS_home/log/hostname/racg 

and 

$ORACLE_HOME/log/hostname/racg

Core files are in subdirectories of the log directory. Each RACG executable has a subdirectory assigned exclusively for that executable. The name of the RACG executable subdirectory is the same as the name of the executable.


Footnote 1 The directory structure is the same for Linux, UNIX, and Windows systems.

Footnote 2 To change the amount of logging, edit the CRS_home/srvm/admin/ocrlog.ini file.

Footnote 3 To change the amount of logging, edit the CRS_home/log/hostname/crsd/crsd.ini file.

Footnote 4 This path is dependent upon the installed Linux or UNIX platform.

Troubleshooting the Oracle Cluster Registry

The following topics in this section explain how to troubleshoot the OCR:

Using the OCRDUMP Utility to View Oracle Cluster Registry Content

This section explains how to use the OCRDUMP utility to view OCR content for troubleshooting. The OCRDUMP utility enables you to view the OCR contents by writing OCR content to a file or stdout in a readable format.

You can use a number of options for OCRDUMP. For example, you can limit the output to a key and its descendants. You can also write the contents to an XML file that you can view using a browser. OCRDUMP writes the OCR keys as ASCII strings and values in a datatype format. OCRDUMP retrieves header information on a best-effort basis.

OCRDUMP also creates a log file in CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.

To change the logging level for a component, edit the comploglvl= entry. For example, to change the logging of the OCRAPI component to 3 and the logging of the OCRRAW component to 5, make the following entry in the ocrlog.ini file:

comploglvl="OCRAPI:3;OCRRAW:5" 
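The entry format can be generated from a list of component:level pairs. The sketch below is illustrative only: build_comploglvl is a hypothetical name, and the function prints the line without editing ocrlog.ini. It follows the format of the example entry above, in which pairs are separated by semicolons.

```shell
# Hypothetical helper: build_comploglvl COMPONENT:LEVEL [COMPONENT:LEVEL...]
# Prints a comploglvl entry with the pairs joined by semicolons.
build_comploglvl() {
    spec=""
    for pair in "$@"; do
        spec="${spec:+$spec;}$pair"
    done
    printf 'comploglvl="%s"\n' "$spec"
}

build_comploglvl OCRAPI:3 OCRRAW:5
```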

Note:

Make sure that you have file creation privileges in the CRS_home directory before using the OCRDUMP utility.

OCRDUMP Utility Syntax and Options

This section describes the OCRDUMP utility command syntax and usage. Run the ocrdump command with the following syntax, where file_name is the name of a target file to which you want Oracle Database to write the OCR output and keyname is the name of a key from which you want Oracle Database to write OCR subtree content:

ocrdump [file_name|-stdout] [-backupfile backup_file_name] [-keyname keyname] [-xml] [-noheader] 

Table F-3 describes the OCRDUMP utility options and option descriptions.

Table F-3 OCRDUMP Options and Option Descriptions

Options Description

file_name

The name of a file to which you want OCRDUMP to write output.

By default, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE. The file_name option redirects OCRDUMP output to the file that you specify.

-stdout

Use this option to redirect the OCRDUMP output to the text terminal that initiated the program.

If you do not redirect the output, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE by default.

-keyname

The name of an OCR key whose subtree is to be dumped.

-xml

Writes the output in XML format.

-noheader

Does not print the time at which you ran the command and when the OCR configuration occurred.

-backupfile

Option to identify a backup file.

backup_file_name

The name of the backup file with the content you want to view. You can query the backups using the ocrconfig -showbackup command.


OCRDUMP Utility Examples

The following ocrdump utility examples extract various types of OCR information and write it to various targets:

ocrdump

Writes the OCR content to a file called OCRDUMPFILE in the current directory.

ocrdump MYFILE

Writes the OCR content to a file called MYFILE in the current directory.

ocrdump -stdout -keyname SYSTEM

Writes the OCR content from the subtree of the key SYSTEM to stdout.

ocrdump -stdout -xml 

Writes the OCR content to stdout in XML format.

Sample OCRDUMP Utility Output

The following OCRDUMP examples show the KEYNAME, VALUE TYPE, VALUE, permission set (user, group, world) and access rights for two sample runs of the ocrdump command. The following shows the output for the SYSTEM.language key that has a text value of AMERICAN_AMERICA.WE8ASCII37.

[SYSTEM.language]
ORATEXT : AMERICAN_AMERICA.WE8ASCII37
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
 OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
}
 

The following shows the output for the SYSTEM.version key that has an integer value of 3:

[SYSTEM.version]
UB4 (10) : 3
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
 OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group
} 
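In the sample output, each key appears on its own line in square brackets. Assuming that layout holds for the rest of a dump, the key names can be listed with a short filter. The function name list_ocr_keys is hypothetical.

```shell
# Hypothetical helper: list_ocr_keys [FILE...]
# Prints the name of every key header line of the form [KEY.NAME];
# reads stdin if no file is given.
list_ocr_keys() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$@"
}

# Example with the two key headers from the sample output above.
printf '%s\n' '[SYSTEM.language]' 'ORATEXT : AMERICAN_AMERICA.WE8ASCII37' \
    '[SYSTEM.version]' 'UB4 (10) : 3' | list_ocr_keys
```

The same filter could read a dump directly, for example ocrdump -stdout | list_ocr_keys.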

Using the OCRCHECK Utility

The OCRCHECK utility displays the version of the OCR's block format, total space available and used space, OCRID, and the OCR locations that you have configured. OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the OCRs that you have configured. It also returns an individual status for each file as well as a result for the overall OCR integrity check.

The following example shows a sample of the OCRCHECK utility output:

Status of Oracle Cluster Registry is as follows :
        Version                  :          2
        Total space (kbytes)     :     262144
        Used space (kbytes)      :      16256
        Available space (kbytes) :     245888
        ID                       : 1918913332
        Device/File Name         : /dev/raw/raw1
                                   Device/File integrity check succeeded
        Device/File Name         : /dev/raw/raw2
                                   Device/File integrity check succeeded
 
        Cluster registry integrity check succeeded 

OCRCHECK creates a log file in the directory CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.
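The OCRCHECK report can be summarized with standard text tools. The following sketch assumes the exact field labels shown in the sample output above and has hypothetical function names: ocr_used_percent computes the used space as a whole percentage, and ocr_integrity_ok succeeds only if the overall integrity line is present.

```shell
# Hypothetical helper: ocr_used_percent [FILE...]
# Prints used space as an integer percentage of total space.
ocr_used_percent() {
    awk -F: '
        /Total space \(kbytes\)/ { total = $2 + 0 }
        /Used space \(kbytes\)/  { used  = $2 + 0 }
        END {
            if (total > 0) printf "%d\n", used * 100 / total
            else exit 1
        }' "$@"
}

# Hypothetical helper: ocr_integrity_ok [FILE...]
# Returns success if the overall integrity check line reports success.
ocr_integrity_ok() {
    grep -q 'Cluster registry integrity check succeeded' "$@"
}

# Example with the figures from the sample output above.
printf '%s\n' \
    '        Total space (kbytes)     :     262144' \
    '        Used space (kbytes)      :      16256' | ocr_used_percent
```

With the sample figures, 16256 of 262144 KB is about 6.2 percent, which the function truncates to 6.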

Oracle Cluster Registry Troubleshooting

Table F-4 describes common OCR problems with corresponding resolution suggestions.

Table F-4 Common OCR Problems and Solutions

Problem Solution

Not currently using OCR mirroring and would like to enable it.

Run the ocrconfig command with the -replace option as described.

An OCR failed and you need to replace it. Error messages in Oracle Enterprise Manager or OCR log file.

Run the ocrconfig command with the -replace option as described.

An OCR has a misconfiguration.

Run the ocrconfig command with the -repair option as described.

You are experiencing a severe performance effect from OCR processing or you want to remove an OCR for other reasons.

Run the ocrconfig command with the -replace option as described.

An OCR has failed and, before you can fix it, the node needs to be rebooted with only one OCR.

Run the ocrconfig -repair command to remove the bad OCR file. Oracle Clusterware will not start if it cannot find all OCRs defined.


Enabling Additional Tracing for Oracle Clusterware High Availability

Oracle Support may ask you to enable tracing to capture additional information. Because the procedures described in this section may affect performance, only perform these activities with the assistance of Oracle Support. This section includes the following topics:

Generating Additional Trace Information for a Running Resource

To generate additional trace information for a running resource, Oracle recommends that you use CRSCTL commands. For example, issue the following command to turn on debugging for resources:

$ crsctl debug log res "resource_name:level"

For example, to set the value of the USR_ORA_DEBUG initialization parameter to 1 for the VIP resource, issue the following command:

$ crsctl debug log res ora.cwclu011.vip:1

Verifying Event Manager Daemon Communications

The event manager daemons (evmd) running on separate nodes communicate through specific ports. To determine whether the evmd for a node can send and receive messages, perform the test described in this section while running session 1 in the background.

On node 1, session 1 enter:

$ evmwatch -A -t "@timestamp @@"

On node 2, session 2 enter:

$ evmpost -u "hello" [-h nodename]

Session 1 should show output similar to the following:

$ 21-Jul-2007 08:04:26 hello

Ensure that each node can both send and receive messages by executing this test in several permutations.
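To cover the permutations systematically, you can list every ordered pair of nodes, with the first node running evmwatch and the second running evmpost. This is a minimal sketch; evm_test_pairs is a hypothetical name, and the function only prints the pairs to work through.

```shell
# Hypothetical helper: evm_test_pairs NODE [NODE...]
# Prints "listener sender" for every ordered pair of distinct nodes.
evm_test_pairs() {
    for listener in "$@"; do
        for sender in "$@"; do
            if [ "$listener" != "$sender" ]; then
                printf '%s %s\n' "$listener" "$sender"
            fi
        done
    done
}

# Example with three placeholder node names (six pairs).
evm_test_pairs node1 node2 node3
```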