Oracle8i Parallel Server Getting Started
Release 8.1.5 for Windows NT
A68813-01

C
Troubleshooting

Specific topics covered in this appendix are:

Cluster Configuration Tips

A large fraction of the cluster problems reported to Oracle Corporation are due to incorrect cluster configuration, particularly of the Cluster Manager (CM) and interconnect components.

The information in this section is based on Oracle Corporation's reference implementation of the cluster Operating System Dependent (OSD) modules. Consequently, some of this information may not be applicable to your particular cluster environment.

This section covers the following configuration and troubleshooting topics:

Cluster Software

Make sure all nodes have exactly the same cluster OSD software installed, as well as the same registry configuration. To verify the software, confirm that the OSD files on every node have the same time stamps and file sizes.
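The comparison of time stamps and file sizes can be scripted. The following is a minimal sketch, not an Oracle utility: the function names and the sample file sizes are hypothetical, and it assumes each node's OSD directory is reachable from one machine (for example, through an administrative share).

```python
import os

def snapshot(directory):
    """Map each file name in directory to (size in bytes, modification time in seconds)."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            result[entry.name] = (info.st_size, int(info.st_mtime))
    return result

def compare_nodes(reference, other):
    """Return the sorted file names that differ in size or time stamp, or exist on one node only."""
    names = set(reference) | set(other)
    return sorted(name for name in names if reference.get(name) != other.get(name))

# Hypothetical snapshots of two nodes; in practice, build them with snapshot().
node1 = {"cm.exe": (180224, 902950951), "nm.dll": (65536, 902950951)}
node2 = {"cm.exe": (180224, 902950951), "nm.dll": (65024, 902950951)}
print(compare_nodes(node1, node2))  # ['nm.dll']
```

An empty list means the two snapshots match. Any name returned is a file to re-install or investigate on one of the nodes.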

CM Configuration

This section describes the following:

Installing and De-Installing CM As a Service

CM can be started as a background process for the OSD Startup module or as a service.

To make CM a service, install it with the following command:

C:\> CM.EXE /i:"CmSrvrPath_value"

where CmSrvrPath_value is the CmSrvrPath registry value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. /i must be lowercase.

To remove CM as a service, enter:

C:\> CM.EXE /r

/r must be lowercase.

Specifying a Host Name for Cluster Interconnect

Typically, each node in a cluster has at least two network cards, one for the corporate network and one for the cluster interconnect. A computer, however, can have only one host name associated with it. To work around this, you can assign the computer a separate host name that is used only for the cluster interconnect.

To specify a host name for the cluster interconnect:

  1. For each node, ping the host name. For example:

    C:\> PING OPS1-NT.US.ORACLE.COM

    A message similar to the one below appears:

    Reply from 144.25.188.247: bytes=32 time<10ms TTL=126

    The IP address returned is for the corporate network, not the cluster interconnect.

  2. For each node, determine which Ethernet card will be used for the cluster interconnect by entering:

    C:\> IPCONFIG /ALL

    The output looks similar to the sample shown below:

    Windows NT IP Configuration

                  Host Name . . . . . . . . . : ops1-nt.us.oracle.com

    Ethernet adapter El90x1:

                  Description . . . . . . . . : 3Com 3C90x Ethernet Adapter
                  IP Address. . . . . . . . . : 144.25.188.247

    Ethernet adapter CpqNF31:

                  Description . . . . . . . . : Compaq NetFlex-3 Driver
                  IP Address. . . . . . . . . : 144.25.190.247

    In this case, the first interface is used for the corporate network, while the second interface (144.25.190.247) is the one intended for the cluster interconnect.

  3. Specify a new host name for each node's interconnect IP address in the HOSTS file (SYSTEMROOT\SYSTEM32\DRIVERS\ETC\HOSTS). For example:

    144.25.190.247 ops1-ipc
    144.25.190.248 ops2-ipc
    144.25.190.249 ops3-ipc
    144.25.190.250 ops4-ipc

  4. For each node, ensure the DefinedNodes registry value is specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. DefinedNodes specifies the member nodes in the cluster.

    DefinedNodes: REG_MULTI_SZ: ops1-ipc ops4-ipc ops5-ipc ops2-ipc

    Note:

    DefinedNodes must be of value class REG_MULTI_SZ, and each host name entry must be entered on a separate line in the Multi-String Editor dialog box.

  5. For each node, ensure the CmHostName registry value is specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. CmHostName specifies the node's interconnect host name.

    CmHostName: REG_SZ: ops1-ipc
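Because every name in DefinedNodes must resolve through the HOSTS file, a quick cross-check can catch a missing entry before CM is started. The sketch below is illustrative only: the function names are hypothetical, the HOSTS content is passed in as text rather than read from disk, and ops9-ipc is an invented name used to show a failure.

```python
def parse_hosts(text):
    """Return a dict mapping each host name in HOSTS-format text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            table[name.lower()] = ip  # host names are compared without regard to case
    return table

def missing_nodes(defined_nodes, hosts_text):
    """Return the DefinedNodes entries that have no HOSTS entry, in their original order."""
    table = parse_hosts(hosts_text)
    return [name for name in defined_nodes if name.lower() not in table]

HOSTS_SAMPLE = """\
144.25.190.247 ops1-ipc
144.25.190.248 ops2-ipc
144.25.190.249 ops3-ipc
144.25.190.250 ops4-ipc
"""

# ops9-ipc is a hypothetical node name with no HOSTS entry.
print(missing_nodes(["ops1-ipc", "ops2-ipc", "ops9-ipc"], HOSTS_SAMPLE))  # ['ops9-ipc']
```

The same check applies to CmHostName: pass it as a one-element list to confirm that the node's own interconnect name resolves.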

    Implementing Proper CM Setup

    In order for CM to start, check the following on each node:

    1. Ensure the CmSrvrPath registry value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM specifies the location of the CM executable (CM.EXE). This registry value only needs to be set if CM is started by the Startup module.
    2. Ensure NM.DLL and NM.EXE are in the same directory as CM.EXE.


    Cluster Configuration Verification with CM.LOG

    To verify your cluster configuration:

    1. Start CM.
    2. Check the bottom of CM.LOG file to ensure that each time a node is brought up, CM reconfigures with the correct number of nodes. For example, if two nodes are up, the following should be in the log file:


    Figure C-1 CM.LOG Sample

    Thread(00e9): 08/12/98 19:42:31 cm
    19:42:31 | MESSAGE | 00e9 | LoadDll(): nm.dll loaded ok
    19:42:32 | MESSAGE | 00e9 | LoadDll(): VendorId(Oracle Standalone NM OSD Reference DLL) Version(2.0)
    19:42:32 | MESSAGE | 00e9 | InitNMContext(): Local Node(1)
    19:42:34 | MESSAGE | 00e9 | NMEVENT_SUSPEND [00][00][00][00]
    19:42:35 | MESSAGE | 00e9 | NMEVENT_RECONFIG [00][00][00][03]
    19:42:35 | MESSAGE | 00e9 | CMReconfig(): Reconfig(1) ActiveNodes(2) Master(0) complete!

    The ErrorLog registry value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM specifies the location of the CM error log file, CM.LOG, if CM is started by the Startup module. If CM is started as a service, the error log is automatically placed in SYSTEMROOT\SYSTEM32\CM.LOG. In this case, it is not necessary to set a value for ErrorLog.
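Checking the bottom of CM.LOG for the node count can be automated. The sketch below assumes the CMReconfig() message format shown in Figure C-1; the function name is hypothetical, and the log text is supplied as a string rather than read from the ErrorLog location.

```python
import re

# Matches the reconfiguration message format shown in the CM.LOG sample.
RECONFIG = re.compile(r"CMReconfig\(\): Reconfig\((\d+)\) ActiveNodes\((\d+)\)")

def last_active_nodes(log_text):
    """Return ActiveNodes from the last CMReconfig() entry, or None if the log has none."""
    matches = RECONFIG.findall(log_text)
    return int(matches[-1][1]) if matches else None

SAMPLE = """\
19:42:34 | MESSAGE | 00e9 | NMEVENT_SUSPEND [00][00][00][00]
19:42:35 | MESSAGE | 00e9 | NMEVENT_RECONFIG [00][00][00][03]
19:42:35 | MESSAGE | 00e9 | CMReconfig(): Reconfig(1) ActiveNodes(2) Master(0) complete!
"""

print(last_active_nodes(SAMPLE))  # 2
```

Compare the returned value with the number of nodes you have brought up. A lower number suggests that a node has not joined the cluster.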

    CM Troubleshooting

    This section describes the following:

    Troubleshooting with NM.LOG

    During normal operation, the CM on each node checks in with the CMs on the other nodes to confirm the health of each member. These check-ins occur at an interval of N milliseconds, as specified by the PollInterval registry value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. A node is allowed to miss M check-ins before it is cast out of the cluster, as specified by the MissCount registry value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM.

    Failed check-ins are recorded in the Node Monitor error log file, NM.LOG. If CM is started by the Startup module, NM.LOG is generated in the current working directory. If CM is started as a service, NM.LOG is generated in the SYSTEMROOT\SYSTEM32 directory.

    These check-in packets are typically UDP packets, and may be lost:

    If one of your database instances is dropping out of the cluster under heavy activity, you may see messages in the NM.LOG file similar to:

    Figure C-2 NM.LOG Sample

    05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s) 
    05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s) 
    05:01:28 | MESSAGE | PollingThread(): node(1) failure detected

    This occurs if the check-in messages were lost because of the heavy activity. Make sure there is a dedicated interconnect for Oracle Parallel Server that is separate from the rest of the network. Slightly increasing the MissCount registry value may also help.
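To see how often nodes miss check-ins, NM.LOG can be summarized with a short script. This sketch assumes the PollingThread() message formats shown in Figure C-2; the function name is hypothetical.

```python
import re

# Message formats taken from the NM.LOG sample.
MISSED = re.compile(r"PollingThread\(\): node\((\d+)\) missed\((\d+)\) checkin\(s\)")
FAILED = re.compile(r"PollingThread\(\): node\((\d+)\) failure detected")

def summarize(log_text):
    """Return (worst, failed): the highest missed count per node, and the set of failed nodes."""
    worst = {}
    failed = set()
    for line in log_text.splitlines():
        missed = MISSED.search(line)
        if missed:
            node, count = int(missed.group(1)), int(missed.group(2))
            worst[node] = max(worst.get(node, 0), count)
            continue
        failure = FAILED.search(line)
        if failure:
            failed.add(int(failure.group(1)))
    return worst, failed

SAMPLE = """\
05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s)
05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s)
05:01:28 | MESSAGE | PollingThread(): node(1) failure detected
"""

print(summarize(SAMPLE))  # ({1: 5}, {1})
```

A node whose highest missed count regularly approaches MissCount without ever failing is a candidate for the dedicated interconnect or the slightly higher MissCount described above.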


    Note:

    MissCount * PollInterval should never be greater than 20 seconds. 
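The limit in this note is simple arithmetic: PollInterval is given in milliseconds, so the product is compared against 20,000 milliseconds. A minimal sketch follows, with hypothetical function names and example values that are not recommended settings.

```python
MAX_WINDOW_MS = 20_000  # 20 seconds, the limit stated in the note

def detection_window_ms(poll_interval_ms, miss_count):
    """Time in milliseconds that a node can stay silent before it is cast out."""
    return poll_interval_ms * miss_count

def is_within_limit(poll_interval_ms, miss_count):
    """True if MissCount * PollInterval does not exceed 20 seconds."""
    return detection_window_ms(poll_interval_ms, miss_count) <= MAX_WINDOW_MS

# Example values only.
print(detection_window_ms(1000, 5))  # 5000
print(is_within_limit(1000, 5))      # True
print(is_within_limit(1000, 25))     # False
```

Run the check before raising MissCount, because each increment extends the window by one full PollInterval.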


    Cluster Tracing

    CM tracing can be helpful to Oracle Worldwide Support in debugging your cluster configuration problems in cases where the database is not starting, a particular node is hanging, or there is a node crash.

    CM tracing is stored in the error log file, CM.LOG.

    To enable detailed CM tracing:

    1. Stop CM.
    2. If CM is started by the Startup module, set the CmSrvrPath registry value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM to the following:

      CmSrvrPath: REG_SZ: c:\orant\osdbin\cm.exe /c /v /d

      where:

      /c    command line mode rather than Control Panel service mode
      /v    verbose (more than /d)
      /d    debug information

      /c, /v and /d must be lowercase.

      If CM is started as a service, you must remove CM as a service, then re-install CM with the tracing parameters turned on:

      1. Remove CM as a service. Enter:

        C:\> CM.EXE /r

      2. Re-install CM as a service with tracing turned on. Enter:

        C:\> CM.EXE /i:"CmSrvrPath_value /v /d"

      where CmSrvrPath_value is the CmSrvrPath registry value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. /v and /d must be lowercase.

       
       

    CM Secondary Backup

    If you are using the secondary disk backup feature of the CM, try to use a partition on a disk that is not heavily used. The backup disk file is written to by every node member during each check-in. If the backup disk is heavily used, it may cause the CM to miss check-ins and falsely drop node members.


    Note:

    If you are using the secondary disk backup feature, do not lower PollInterval below 500 milliseconds because every node writes to the disk backup partition every PollInterval.


    Oracle Database Configuration Assistant

    This section covers the following topics:

    Database Creation Failures

    If the Oracle Database Configuration Assistant fails during the creation of a database, certain entries may have been installed in the registry. When database creation fails:

    1. Delete the DB_NAME sub-key in the Performance and Management (P&M) key in the registry on all nodes before running the application again:

      HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\PM\DB_NAME

      where DB_NAME is OP for the first installation of an Oracle8i Parallel Server database. If you do not delete this key, the Oracle Database Configuration Assistant will assume the OP database name is in use and use OA for the database name. This will result in an installation of OA rather than OP.

    2. Delete the OracleServiceSID key in the registry on the node from which the Oracle Database Configuration Assistant was run:

      HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\OracleServiceSID

    Cluster Software Detection Problems

    When the Oracle Database Configuration Assistant starts, it verifies cluster software is installed and configured. You may see the error message below if CM is installed and:

    To resolve this error message:

    1. Check to see if Cluster Manager was installed and configured properly. See your OSD vendor documentation for further information.
    2. Check if you have administrative privileges on each node by entering:

      C:\> NET USE \\host_name\C$

      where host_name is the host name defined in the DefinedNodes registry value for Cluster Manager.

      A successful connection results in "The command completed successfully."

      Oracle Corporation recommends using the same user name and password on each node in a cluster, or using a domain user name. If you use a domain user name, log on to the domain with a user name and password that have administrative privileges on each node.

    Starting Services

    If you are having difficulty starting services or the database, check the CM.LOG file.

    DYNAMIC RESOURCES ALLOCATED or DYNAMIC LOCKS ALLOCATED

    The following messages appear if LM_RESS and LM_LOCKS values are not sufficient, and additional IDLM locks or resources must be allocated dynamically from the SGA:

    If these messages appear often, the SGA may become exhausted. To avoid this, increase the LM_RESS and LM_LOCKS parameters as appropriate for your database needs.

    Additional Information:

    See Chapter 15, "Allocating PCM Instance Locks Oracle8i Parallel Server," of the Oracle8 Parallel Server Concepts & Administration guide. 

    Understanding the Trace Files

    This section discusses the following trace file subjects:

    Background Thread Trace Files

    Oracle Parallel Server background threads use trace files to record occurrences and exceptions of database operations, as well as errors. These detailed trace logs are helpful to Oracle support to debug problems in your cluster configuration. Background thread trace files are created regardless of whether the BACKGROUND_DUMP_DEST parameter is set in the INIT_COM.ORA initialization parameter file. If BACKGROUND_DUMP_DEST is set, the trace files are stored in the directory specified. If the parameter is not set, the trace files are stored in the ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP directory.

    Oracle8 database creates a different trace file for each background thread. The name of the trace file contains the name of the background thread, followed by the extension .TRC, such as:

    Oracle Parallel Server trace information is reported in the following trace files:

    Trace File     Description
    SIDBSP0.TRC    Trace file for the BSP process. This trace file shows errors associated with BSP.
    SIDLCKN.TRC    Trace file for the LCKn process. This trace file shows lock requests for other background processes.
    SIDLMDN.TRC    Trace file for the LMDn process. This trace file shows lock requests.
    SIDLMON.TRC    Trace file for the LMON process. This trace file shows the status of the cluster, including the "Reconfiguration complete" message.
    SIDP00N.TRC    Trace file for the parallel query slaves.

    User Thread Trace Files

    Trace files are also created for user threads if the USER_DUMP_DEST parameter is set in the initialization parameter file. The trace files for the user threads have the form ORAXXXXX.TRC, where XXXXX is a 5-digit number indicating the Windows NT thread ID.
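When a USER_DUMP_DEST directory holds many files, the naming rule above can be used to pick out user thread trace files and recover the thread ID. The sketch below is illustrative: the function name is hypothetical, and it assumes the five digits are decimal and that file names are matched without regard to case.

```python
import re

# ORA followed by exactly five digits and the .TRC extension.
USER_TRACE = re.compile(r"^ORA(\d{5})\.TRC$", re.IGNORECASE)

def thread_id_from_trace(file_name):
    """Return the thread ID encoded in a user trace file name, or None if the name does not match."""
    match = USER_TRACE.match(file_name)
    return int(match.group(1)) if match else None

print(thread_id_from_trace("ORA00217.TRC"))  # 217
print(thread_id_from_trace("SIDLMON.TRC"))   # None
```

Background thread trace files such as SIDLMON.TRC do not match the pattern, so the function separates the two kinds of trace file.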

    Alert File

    The alert file, SIDALRT.LOG, contains important information about error messages and exceptions that occur during database operations. Each instance has one alert file; information is appended to the file each time you start the instance. All threads can write to the alert file.

    SIDALRT.LOG is found in the directory specified by the BACKGROUND_DUMP_DEST parameter in the INIT_COM.ORA initialization parameter file. If the BACKGROUND_DUMP_DEST parameter is not set, the SIDALRT.LOG file is generated in ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP.

    Error Call Trace Stack

    Oracle Worldwide Support may ask you to create an error call trace stack for a particular trace file. An error call trace stack provides a program trace of specific background or user threads in the database.

    To create an error call trace:

    1. Obtain the Oracle process ID for the background processes:

      C:\> SVRMGRL
      SVRMGR> CONNECT INTERNAL/PASSWORD
      SVRMGR> SELECT PID "Oracle Process Id",
                     NAME
                  FROM V$PROCESS, V$BGPROCESS
                  WHERE V$PROCESS.ADDR = V$BGPROCESS.PADDR;

      Output displayed looks like this:

      Oracle Pro NAME
      ---------- -----
               2 PMON
               3 LMON
               4 LMD0
               5 DBW0
               6 LGWR
               7 CKPT
               8 SMON
               9 RECO
              10 SNP0
              11 SNP1
              13 LCK0

    2. Dump the trace stack to the trace file. For example, to dump out the trace stack of LMON:

      1. Set the Oracle process ID to LMON, which is 3 in this example:

        SVRMGR> ORADEBUG SETORAPID 3

      2. Dump the error stack to SIDLMON.TRC:

        SVRMGR> ORADEBUG DUMP ERRORSTACK 3

    Using PhysicalDrive for Raw Partitions

    When creating symbolic links for the logical partitions with the SETLINKS utility, do not use the prefix \\.\PhysicalDrive. If you use \\.\PhysicalDrive as a symbolic link, you may corrupt your database files. Use the symbolic links provided in the ORALINKx.TBL file(s), as described in Chapter 2, "Setting Up Raw Partitions".

    SHUTDOWN ABORT

    SHUTDOWN ABORT is not recommended. Oracle Corporation recommends shutting down the OracleServiceSID service so that resources, such as memory usage or files, will be cleaned up by the Windows NT operating system correctly.

    To shut down OracleServiceSID:

    Contacting Oracle Worldwide Customer Support

    If, after reading this appendix, you still cannot resolve your problems, call Oracle Worldwide Customer Support to report the error. Please have the following information at hand:

    Severe Errors

    If an ORA-600 error occurs, it is written to the SIDALRT.LOG file. If an ORA-600 error or any other severe error appears in the SIDALRT.LOG file, provide all files in ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP.


    Copyright © 1999 Oracle Corporation.
    All Rights Reserved.