Oracle Parallel Server Getting Started
Release 8.0.6 for Windows NT

A69942-01

B
Troubleshooting

Specific topics covered in this appendix are:

Cluster Configuration Tips

A large fraction of the cluster problems reported to Oracle Corporation are due to incorrect cluster configuration, particularly of the Cluster Manager (CM) and interconnect components.

The information in this section is based on Oracle Corporation's reference implementation of the cluster Operating System Dependent (OSD) modules. Consequently, some of this information may not be applicable to your particular cluster environment.


Additional Information:

Consult your hardware vendor for more details about installing and configuring your particular cluster.



Note:

The registry instructions in this section assume REGEDT32, not REGEDIT. 


Cluster Software

Make sure all nodes have exactly the same cluster OSD software installed, as well as the same registry configuration. To verify the software, check that the OSD files on every node have the same time stamps and file sizes.
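
The comparison of time stamps and file sizes can be scripted. The following is a minimal sketch, not an Oracle utility: the function names are hypothetical, and it assumes you can read the OSD installation directory of each node (for example, over a network share) and take a listing of it.

```python
import os

def snapshot(directory):
    """Map each file name in `directory` to its (size, modification time) pair."""
    listing = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            info = os.stat(path)
            # File names are case-insensitive on Windows NT, so normalize them.
            listing[name.lower()] = (info.st_size, int(info.st_mtime))
    return listing

def diff_snapshots(reference, other):
    """Return the names that are missing from, or differ between, two snapshots."""
    names = set(reference) | set(other)
    return sorted(n for n in names if reference.get(n) != other.get(n))
```

An empty result from diff_snapshots suggests that the two nodes carry the same files; any name it returns is worth checking by hand.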

CM Configuration

Typically, each node in a cluster has at least two network cards, one for the corporate network and one for the cluster interconnect. A computer, however, can have only one host name associated with it. To get around this limitation, you can assign an additional host name to the computer just for the cluster interconnect.

To specify a host name for the cluster interconnect:
  1. For each node, ping the host name. For example,

    C:\> PING OPS1-NT.US.ORACLE.COM
    
    A message similar to the one below appears:
    Reply from 144.25.188.247: bytes=32 time<10ms TTL=126
    
    

    The IP address returned is for the corporate network, not the cluster interconnect.

  2. For each node, determine which ethernet card will be used for the cluster interconnect by entering:

    C:\> IPCONFIG /ALL
    
    

    The output looks similar to the sample shown below:

    Windows NT IP Configuration 
     
                  Host Name . . . . . . . . . : ops1-nt.us.oracle.com 
     
    Ethernet adapter El90x1: 
     
                  Description . . . . . . . . : 3Com 3C90x Ethernet Adapter 
                  IP Address. . . . . . . . . : 144.25.188.247 
     
    Ethernet adapter CpqNF31: 
     
                  Description . . . . . . . . : Compaq NetFlex-3 Driver 
                  IP Address. . . . . . . . . : 144.25.190.247
    
    

    In this case, the first interface is used for the corporate network, while the second interface (144.25.190.247) is the one intended for the cluster interconnect.

  3. Specify a new host name for each node's interconnect IP address in the HOSTS file (SYSTEMROOT\SYSTEM32\DRIVERS\ETC\HOSTS). For example:

    144.25.190.247 ops1-ipc 
    144.25.190.248 ops2-ipc 
    144.25.190.249 ops3-ipc 
    144.25.190.250 ops4-ipc 
    

  4. For each node, ensure the DefinedNodes value is specified in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. DefinedNodes specifies the member nodes in the cluster.

    DefinedNodes: REG_MULTI_SZ: ops1-ipc ops4-ipc ops5-ipc ops2-ipc
    


    Note:

    DefinedNodes must be of value class REG_MULTI_SZ, and each host name entry must be entered on a separate line in the Multi-String Editor dialog box.  


  5. For each node, ensure the CmHostName value is specified in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. CmHostName specifies the node's interconnect host name.

    CmHostName: REG_SZ: ops1-ipc 
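
The HOSTS entries and the two registry values have to agree with one another. The sketch below shows one way to cross-check them. It is illustrative only: the function names are hypothetical, the registry values are passed in as plain Python values rather than read from the registry, and the rule that CmHostName should appear in DefinedNodes is an assumption drawn from the descriptions above.

```python
def parse_hosts(text):
    """Return {host name: IP address} from HOSTS-file text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping[name.lower()] = ip
    return mapping

def check_cm_config(hosts_text, defined_nodes, cm_host_name):
    """Return a list of problems found in one node's CM settings."""
    hosts = parse_hosts(hosts_text)
    problems = []
    for node in defined_nodes:
        if node.lower() not in hosts:
            problems.append(f"{node} is in DefinedNodes but not in HOSTS")
    # Assumption: a node's own interconnect name should be a cluster member.
    if cm_host_name.lower() not in (n.lower() for n in defined_nodes):
        problems.append(f"CmHostName {cm_host_name} is not in DefinedNodes")
    return problems
```

Run against the sample values shown above, a check like this would report ops5-ipc, which appears in DefinedNodes but has no entry in the sample HOSTS file.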
    

    Cluster Configuration Verification

    To verify your cluster configuration:
    1. Start PGMS on each node:

      • From the MS-DOS command line, enter:

        C:\> NET START OraclePGMSService
        
        
      • From the Control Panel's Services window, select OraclePGMSService, and click Start.

    2. Check the end of the PGMS.LOG file (SYSTEMROOT\SYSTEM32\PGMS.LOG) to ensure that each time a node is brought up, PGMS reconfigures with the correct number of nodes. For example, if two nodes are up, the log file should contain an entry similar to:

      15:06:46 | MESSAGE | 006f | HandleReconfig(): Reconfig OK - nodes(2) 
      rcfgGen(5) master(0) 
      
       
      
    3. If you are unable to bring up PGMS, check your cluster configuration to make sure that it is correct.
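
The log check in step 2 can be automated along the following lines. This is a minimal sketch that assumes the "Reconfig OK - nodes(N)" wording shown in the sample entry above; the function name is hypothetical.

```python
import re

# Matches the node count in entries such as "Reconfig OK - nodes(2)".
RECONFIG = re.compile(r"Reconfig OK - nodes\((\d+)\)")

def last_reconfig_node_count(log_text):
    """Return the node count from the last 'Reconfig OK' entry, or None."""
    matches = RECONFIG.findall(log_text)
    return int(matches[-1]) if matches else None
```

Compare the returned value with the number of nodes you have started. A result of None means that no successful reconfiguration was logged.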

    CM Troubleshooting

    During normal operation, the CM on each node checks in with the CMs on the other nodes to confirm the health of each member. These check-ins occur every N milliseconds, as specified by the PollInterval registry value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. A node is allowed to miss M check-ins before it is cast out of the cluster, as specified by the MissCount value in the same key.

    Failed check-ins are recorded in the CM error log file (CM.LOG). The check-in packets are typically UDP packets, which may be lost.

    If one of your database instances is dropping out of the cluster under heavy activity, you may see messages in the CM.LOG file similar to:

    05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s) 
    05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s) 
    05:01:28 | MESSAGE | PollingThread(): node(1) failure detected 
    
    

    This occurs if the check-in messages were lost because of the heavy activity. Make sure there is a dedicated interconnect for Oracle Parallel Server that is separate from the rest of the network. Slightly increasing the MissCount value may also help.
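
To see how often each node misses check-ins, you can summarize CM.LOG. The sketch below assumes the message wording shown in the sample above; the function name and the shape of the returned summary are hypothetical.

```python
import re

MISSED = re.compile(r"node\((\d+)\) missed\((\d+)\) checkin")
FAILED = re.compile(r"node\((\d+)\) failure detected")

def summarize_cm_log(log_text):
    """Return ({node: highest missed count}, set of nodes reported as failed)."""
    worst = {}
    failed = set()
    for line in log_text.splitlines():
        missed = MISSED.search(line)
        if missed:
            node, count = int(missed.group(1)), int(missed.group(2))
            worst[node] = max(worst.get(node, 0), count)
        failure = FAILED.search(line)
        if failure:
            failed.add(int(failure.group(1)))
    return worst, failed
```

A node whose highest missed count regularly approaches MissCount is a candidate for the interconnect and MissCount adjustments described above.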


    Note:

    MissCount * PollInterval should never be greater than 20 seconds. 
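
Because PollInterval is given in milliseconds, the limit works out to 20,000 milliseconds. The arithmetic can be sketched as follows; the function names are hypothetical and the values are passed in directly rather than read from the registry.

```python
MAX_DETECTION_MS = 20_000  # the 20-second ceiling, expressed in milliseconds

def detection_window_ms(miss_count, poll_interval_ms):
    """Time a node can stay silent before it is cast out of the cluster."""
    return miss_count * poll_interval_ms

def is_within_limit(miss_count, poll_interval_ms):
    """True if MissCount * PollInterval does not exceed 20 seconds."""
    return detection_window_ms(miss_count, poll_interval_ms) <= MAX_DETECTION_MS
```

For example, a MissCount of 5 with a PollInterval of 1000 gives a 5-second window, which is within the limit, whereas a MissCount of 50 with a PollInterval of 500 gives 25 seconds, which is not.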


    CM Secondary Backup

    If you are using the secondary disk backup feature of the CM, try to use a partition on a disk that is not heavily used. The backup disk file is written to by every node member during each check-in. If the backup disk is heavily used, it may cause the CM to miss check-ins and falsely drop node members.


    Note:

    If you are using the secondary disk backup feature, do not lower PollInterval below 500 milliseconds, because every node writes to the disk backup partition once every PollInterval.


    CM Error Log File Specification

    The CM error log file (CM.LOG) is specified by the ErrorLog value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM:

    ErrorLog: REG_SZ: c:\orant\rdbms80\trace\cm.log
    
    
    
    

    Oracle Corporation recommends specifying an error log location of ORANT\RDBMS80\TRACE\CM.LOG.

    Performance and Management Configuration Tips

    You must configure the Performance and Management (PM) module so that PGMS can determine the cluster configuration. Each Oracle Parallel Server database corresponds to a PGMS group or domain. For example, the INITSID.ORA and INIT_COM.ORA files could have the following parameters defined:

    INITOPS1.ORA:

    instance_number=1 
     
    

    INITOPS2.ORA:

    instance_number=2 
    
    

    INITOPS3.ORA:

    instance_number=3 
    
    

    INITOPS4.ORA:

    instance_number=4 
     
    

    INIT_COM.ORA:

    db_name=ops 
    
    

    The HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>PM key would then contain:


    where:



    Note:

    Each row entry must be entered on a separate line in the Multi-String Editor dialog box. Instance numbers must be sequential, such as 0, 1, 2. Do not skip instance numbers, such as 0, 1, 3. Also, the key name (OPS) must match the value of DB_NAME in INIT_COM.ORA.
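
The sequencing rule can be checked before you start the instances. The following sketch is illustrative: the function names are hypothetical, and the instance numbers are supplied as a plain list rather than read from the PM key.

```python
def find_gaps(instance_numbers):
    """Return the numbers missing between the lowest and highest value."""
    present = set(instance_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]

def find_duplicates(instance_numbers):
    """Return the instance numbers that are listed more than once."""
    seen, dupes = set(), set()
    for n in instance_numbers:
        (dupes if n in seen else seen).add(n)
    return sorted(dupes)
```

With the numbers from the note, find_gaps([0, 1, 2]) returns an empty list, while find_gaps([0, 1, 3]) returns [2], the number that was skipped.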


    ORA-29702

    If the instance numbers in the PM key do not match those specified in the INITSID.ORA file, you will receive the following error in ORACLE_HOME\RDBMS80\TRACE\SIDLMON.TRC upon instance startup:

    ORA-29702: error occurred in Group Membership Service operation 
    

    Starting Services

    If you are having difficulty starting services or the database, check the PGMS.LOG file stored in SYSTEMROOT\SYSTEM32\PGMS.LOG.

    If you used the CRTSRV script in "Step 4: Create Services", OraclePGMSService automatically starts up and shuts down together with the OracleServiceSID service.

    If you did not use the CRTSRV script, you can still have OraclePGMSService start up automatically with an OracleServiceSID service by entering the following at the command line on each node:

    C:\> OPSREG80 ADD SID 
    
    

    To stop OraclePGMSService from starting up automatically with the OracleServiceSID service, enter the following at the command line on each node:

    C:\> OPSREG80 DEL SID 
    

    DYNAMIC RESOURCES ALLOCATED or DYNAMIC LOCKS ALLOCATED

    The DYNAMIC RESOURCES ALLOCATED and DYNAMIC LOCKS ALLOCATED messages appear if the LM_RESS and LM_LOCKS values are not sufficient and additional IDLM locks or resources must be allocated dynamically from the SGA.

    If these messages appear often, the SGA may eventually be exhausted. To avoid this, increase the LM_RESS and LM_LOCKS parameters to suit your database needs.


    Additional Information:

    See Chapter 15, "Allocating PCM Instance Locks Oracle Parallel Server," of the Oracle8 Parallel Server Concepts and Administration guide. 


    Understanding the Trace Files

    This section discusses the following trace file subjects:

    Background Thread Trace Files

    Oracle Parallel Server background threads use trace files to record occurrences and exceptions of database operations, as well as errors. These detailed trace logs are helpful to Oracle support to debug problems in your cluster configuration. Background thread trace files are created regardless of whether the BACKGROUND_DUMP_DEST parameter is set in the INIT_COM.ORA initialization parameter file. If BACKGROUND_DUMP_DEST is set, the trace files are stored in the directory specified. If the parameter is not set, the trace files are stored in the ORACLE_HOME\RDBMS80\TRACE directory.

    The Oracle8 database creates a different trace file for each background thread. The name of the trace file contains the name of the background thread, followed by the extension .TRC, such as SIDLMON.TRC for the LMON thread.

    Oracle Parallel Server trace information is reported in the following trace files:

    Trace File     Description
    SIDLCKn.TRC    Trace file for the LCKn process. This trace file shows lock requests for other background processes.
    SIDLMDn.TRC    Trace file for the LMDn process. This trace file shows lock requests.
    SIDLMON.TRC    Trace file for the LMON process. This trace file shows the status of the cluster, including the "Reconfiguration complete" message.
    SIDP00n.TRC    Trace file for the parallel query slaves.

    User Thread Trace Files

    Trace files are also created for user threads if the USER_DUMP_DEST parameter is set in the initialization parameter file. The trace files for the user threads have the form ORAXXXXX.TRC, where XXXXX is a 5-digit number indicating the Windows NT thread ID.

    Alert File

    The alert file, SIDALRT.LOG, contains important information about error messages and exceptions that occur during database operations. Each instance has one alert file; information is appended to the file each time you start the instance. All threads can write to the alert file.

    SIDALRT.LOG is found in the directory specified by the BACKGROUND_DUMP_DEST parameter in the INIT_COM.ORA initialization parameter file. If the BACKGROUND_DUMP_DEST parameter is not set, the SIDALRT.LOG file is generated in ORACLE_HOME\RDBMS80\TRACE.
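
The lookup rule for the alert file can be expressed as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes that SID in SIDALRT.LOG is replaced literally by the instance's SID. It uses Python's ntpath module so that Windows-style paths are built correctly on any platform.

```python
import ntpath

def alert_log_path(sid, oracle_home, background_dump_dest=None):
    """Return the expected location of the alert file for one instance."""
    # Use BACKGROUND_DUMP_DEST if it is set, else ORACLE_HOME\RDBMS80\TRACE.
    directory = background_dump_dest or ntpath.join(oracle_home, "RDBMS80", "TRACE")
    return ntpath.join(directory, f"{sid}ALRT.LOG")
```

The same fallback applies to the background thread trace files described earlier in this section.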

    Error Call Trace Stack

    Oracle Worldwide Support may ask you to create an error call trace stack for a particular trace file. An error call trace stack provides a program trace of a specific background or user thread in the database.

    To create an error call trace:
    1. Obtain the Oracle process ID for the background processes:

      C:\> SVRMGR30
      SVRMGR30> CONNECT INTERNAL/PASSWORD
      SVRMGR30> SELECT PID "Oracle Process Id", NAME
                FROM V$PROCESS, V$BGPROCESS
                WHERE V$PROCESS.ADDR = V$BGPROCESS.PADDR;
      

        Output displayed looks like this:

        Oracle Pro NAME 
        ---------- ----- 
                 2 PMON 
                 3 LMON 
                 4 LMD0 
                 5 DBW0 
                 6 LGWR 
                 7 CKPT 
                 8 SMON 
                 9 RECO 
                10 SNP0 
                11 SNP1 
                13 LCK0 
        
        
    2. Dump the trace stack to the trace file. For example, to dump out the trace stack of LMON, enter:

      1. Set the Oracle process ID to LMON, which is 3 in this example:

        SVRMGR30> ORADEBUG SETORAPID 3 
        
        
        
      2. Dump the error stack to SIDLMON.TRC:

        SVRMGR30> ORADEBUG DUMP ERRORSTACK 3 
        

      Cluster Tracing

      CM and PGMS tracing can be helpful to Oracle Worldwide Support in debugging your cluster configuration problems in cases where the database is not starting, a particular node is hanging, or there is a node crash.

      PGMS Tracing

      PGMS tracing is stored in the PGMS log file, SYSTEMROOT\SYSTEM32\PGMS.LOG.


      Note:

      Do not enable detailed tracing during normal database operation. 


      To enable detailed PGMS tracing:
      1. De-install the OraclePGMSService:

        PGMS /R
        
        
      2. Re-install OraclePGMSService with debug flags turned on:

        PGMS /I:"C:\ORANT\BIN\PGMS.EXE /D /V /S"
        
        

        where:

        /D    debug tracing
        /V    verbose tracing
        /S    spy on PGMS network packets

      To disable tracing:

      1. De-install the OraclePGMSService:

        PGMS /R
        
        
      2. Re-install OraclePGMSService with debug flags turned off:

        PGMS /I:"C:\ORANT\BIN\PGMS.EXE"
        

      CM Tracing

      CM tracing is stored in the error log file, CM.LOG. The location of CM.LOG is defined by the ErrorLog value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM.

      To enable detailed CM tracing:
      1. Stop the CMSRVR.EXE by rebooting the node.

      2. Specify the CMSrvrpath value, including the tracing flags, in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. The ErrorLog value in the same key specifies the CM log file.

        CMSrvrpath: REG_SZ: c:\orant\osdbin\cmsrvr.exe /v /c /s 
        
        

        where:

        /v    verbose
        /c    trace client requests
        /s    spy on CM network traffic

      Using PhysicalDrive for Raw Partitions

      When creating symbolic links for the logical partitions with the SETLINKS utility, do not use the prefix \\.\PhysicalDrive. If you use \\.\PhysicalDrive as a symbolic link, you may corrupt the database files. Use the symbolic links provided in the ORALINKx.TBL files, described in Chapter 5, "Configuring Oracle Parallel Server".

      SHUTDOWN ABORT

      SHUTDOWN ABORT is not recommended. Oracle Corporation recommends shutting down the OracleServiceSID service so that resources, such as memory usage or files, will be cleaned up by the Windows NT operating system correctly.

      To shut down OracleServiceSID:
      • From the MS-DOS command line, enter:

        C:\> NET STOP OracleServiceSID
        
        
      • From the Control Panel's Services window, select the OracleServiceSID service, then choose Stop.

      Contacting Oracle Worldwide Customer Support

      If, after reading this appendix, you still cannot resolve your problem, call Oracle Worldwide Customer Support to report the error. Please have the following information at hand:

      • cluster hardware, for example, a two-node cluster of Dell PowerEdge 6100 servers

      • Windows NT version and edition (Workstation, Server, or Enterprise), for example, Windows NT Server 4.0 with Service Pack 3

      • all five digits in release number of Oracle RDBMS (for example, 8.0.5.1.0)

      • all five digits in release number of Oracle Parallel Server Option

      • version number of PGMS, which can be obtained from SYSTEMROOT\SYSTEM32\PGMS.LOG.

      • contents of HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD key

      • cluster OSD upgrades from vendor

      • particular operation that failed, for example, database startup or query

      • steps to reproduce the problem.

      Severe Errors

      If an ORA-600 error occurs, it is written to the SIDALRT.LOG file. If an ORA-600 error or any other severe error appears in SIDALRT.LOG, provide all files in ORACLE_HOME\RDBMS80\TRACE as well as PGMS.LOG, located in SYSTEMROOT\SYSTEM32.
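
Scanning the alert file for severe errors can be scripted. The following is a minimal sketch with a hypothetical function name. It assumes that errors appear in the file with an ORA- prefix followed by the error number, with or without leading zeros (ORA-600 or ORA-00600).

```python
import re

ORA_ERROR = re.compile(r"ORA-(\d+)")

def find_ora_errors(alert_text, codes=(600,)):
    """Return (line number, line) pairs that mention any of the given ORA codes."""
    wanted = set(codes)
    hits = []
    for number, line in enumerate(alert_text.splitlines(), start=1):
        # int() drops leading zeros, so ORA-00600 and ORA-600 both give 600.
        found = {int(code) for code in ORA_ERROR.findall(line)}
        if found & wanted:
            hits.append((number, line.strip()))
    return hits
```

Pass other error numbers in codes, such as 29702, to look for the Group Membership Service error described earlier in this appendix.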


    Copyright © 1999 Oracle Corporation.

    All Rights Reserved.