Oracle8i
Parallel Server Getting Started
Release 8.1.5 for Windows NT A68813-01
Specific topics covered in this appendix are:
A large fraction of the cluster problems reported to Oracle Corporation
are due to incorrect cluster configuration, particularly
of the Cluster Manager (CM) and interconnect components.
The information in this section is based on Oracle Corporation's
reference implementation of the cluster Operating System Dependent (OSD)
modules. Consequently, some of this information may not be applicable to
your particular cluster environment.
This section covers the following configuration and troubleshooting topics:
Additional Information: Consult your hardware vendor for more details about installing and configuring your particular cluster.
Note: The registry instructions in this section assume REGEDT32, not REGEDIT.
Make sure all nodes have the exact same cluster OSD software
installed, as well as the same registry configuration. Software can be
verified by ensuring nodes have the same time stamps and file sizes.
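The time stamp and file size comparison described above can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: it assumes the OSD software directory of each node is reachable as an ordinary directory path (for example through an administrative share), and the function names are hypothetical.

```python
from pathlib import Path


def file_fingerprints(directory):
    """Map each file name in `directory` to its (size, modification time)."""
    result = {}
    for entry in Path(directory).iterdir():
        if entry.is_file():
            info = entry.stat()
            # Key on the lowercased name because Windows NT file names
            # are case-insensitive.
            result[entry.name.lower()] = (info.st_size, int(info.st_mtime))
    return result


def compare_nodes(reference_dir, other_dir):
    """Return the sorted names of files that are missing or differ."""
    ref = file_fingerprints(reference_dir)
    other = file_fingerprints(other_dir)
    return sorted(
        name
        for name in ref.keys() | other.keys()
        if ref.get(name) != other.get(name)
    )
```

An empty list from compare_nodes() means both directories hold the same file names with matching sizes and time stamps. Any name returned should be checked by hand on the nodes involved.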
This section describes the following:
CM can be started as a background process for the OSD Startup
module or as a service.
To make CM a service, install it with the following command:
C:\> CM.EXE /i:"CmSrvrPath_value"
where CmSrvrPath_value is the CmSrvrPath registry
value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM.
Typically, each node in a cluster will have at least two
cards, one for the corporate network and one for the cluster interconnect.
A computer, however, can only have one host name associated with it. To
get around this problem, a host name for the computer can be assigned just
for the cluster interconnect.
To specify a host name for the cluster interconnect:
C:\> PING OPS1-NT.US.ORACLE.COM
A message similar to the one below appears:
Reply from 144.25.188.247: bytes=32 time<10ms TTL=126
The IP address returned is for the corporate network, not the cluster interconnect.
C:\> IPCONFIG /ALL
The output looks similar to the sample shown below:
Windows NT IP Configuration
        Host Name . . . . . . . . . : ops1-nt.us.oracle.com
Ethernet adapter El90x1:
        Description . . . . . . . . : 3Com 3C90x Ethernet Adapter
        IP Address. . . . . . . . . : 144.25.188.247
Ethernet adapter CpqNF31:
        Description . . . . . . . . : Compaq NetFlex-3 Driver
        IP Address. . . . . . . . . : 144.25.190.247
In this case, the first interface (144.25.188.247) is used for the corporate network, while the second interface (144.25.190.247) is the one intended for the cluster interconnect.
144.25.190.247 ops1-ipc
144.25.190.248 ops2-ipc
144.25.190.249 ops3-ipc
144.25.190.250 ops4-ipc
The HOSTS file should have one entry for each node's interconnect, and should be copied to all nodes of the cluster so that they can see each other. To verify that they can see each other, try pinging each host from each node. For example:
C:\> PING OPS3-IPC
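Before pinging each host, it can help to confirm that the HOSTS file actually contains an entry for every interconnect name. The sketch below parses HOSTS-format text and reports the names that are missing. The function names are hypothetical, and the sample text simply repeats the entries shown above.

```python
def parse_hosts(text):
    """Return {host_name: ip_address} from HOSTS-file text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip_address, names = fields[0], fields[1:]
        for name in names:
            # Host names are compared case-insensitively.
            mapping[name.lower()] = ip_address
    return mapping


def missing_interconnect_entries(hosts_text, node_names):
    """List the interconnect host names that have no HOSTS entry."""
    known = parse_hosts(hosts_text)
    return [name for name in node_names if name.lower() not in known]


SAMPLE_HOSTS = """\
144.25.190.247 ops1-ipc
144.25.190.248 ops2-ipc
144.25.190.249 ops3-ipc
144.25.190.250 ops4-ipc
"""
```

Running missing_interconnect_entries() on each node's copy of the HOSTS file, with the list of interconnect names used by the cluster, shows which nodes still need an updated copy.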
DefinedNodes: REG_MULTI_SZ: ops1-ipc ops4-ipc ops5-ipc ops2-ipc
Note: DefinedNodes must be of value class REG_MULTI_SZ, and each host name entry must be entered on a separate line in the Multi-String Editor dialog box.
CmHostName: REG_SZ: ops1-ipc
In order for CM to start, check the following on each node:
To verify your cluster configuration:
Thread(00e9): 08/12/98 19:42:31 cm
19:42:31 | MESSAGE | 00e9 | LoadDll(): nm.dll loaded ok
19:42:32 | MESSAGE | 00e9 | LoadDll(): VendorId(Oracle Standalone NM OSD Reference DLL) Version(2.0)
19:42:32 | MESSAGE | 00e9 | InitNMContext(): Local Node(1)
19:42:34 | MESSAGE | 00e9 | NMEVENT_SUSPEND [00][00][00][00]
19:42:35 | MESSAGE | 00e9 | NMEVENT_RECONFIG [00][00][00][03]
19:42:35 | MESSAGE | 00e9 | CMReconfig(): Reconfig(1) ActiveNodes(2) Master(0) complete!
The ErrorLog registry value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM
specifies the location of the CM error log file, CM.LOG, if CM is started
by the Startup module. If CM is started as a service, the error log is
automatically placed in SYSTEMROOT\SYSTEM32\CM.LOG. In this case,
it is not necessary to set a value for ErrorLog.
This section describes the following:
During normal operation, the CM on each node checks in with the
others to verify the health of each member. These check-ins occur at an interval
of N milliseconds, as specified by the PollInterval registry
value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. A node is allowed to
miss M check-ins before it is cast out of the cluster, as specified
by the MissCount registry value in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM.
Failed check-ins are recorded to the Node Monitor error log
file, NM.LOG. If CM is started by the Startup module, NM.LOG is generated
in the current working directory. If CM is started as a service, NM.LOG
is generated in SYSTEMROOT\SYSTEM32.
These check-in packets are typically UDP packets, and may be lost:
If one of your database instances is dropping out of the
cluster under heavy activity, you may see messages in the NM.LOG file similar
to:
05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s)
05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s)
05:01:28 | MESSAGE | PollingThread(): node(1) failure detected
This occurs if the check-in messages were lost because of
the heavy activity. Make sure there is a dedicated interconnect for Oracle
Parallel Server that is separate from the rest of the network. Slightly
increasing the MissCount registry value may also help.
Note: MissCount * PollInterval should never be greater than 20 seconds.
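When NM.LOG is long, the missed check-in messages shown earlier in this section can be summarized per node. The sketch below is an illustration only: the regular expressions are derived solely from the three sample lines above, so other message layouts are not recognized, and the function name is hypothetical.

```python
import re

# Patterns modelled on the sample NM.LOG lines in this section.
CHECKIN = re.compile(
    r"PollingThread\(\): node\((\d+)\) missed\((\d+)\) checkin\(s\)"
)
FAILURE = re.compile(r"PollingThread\(\): node\((\d+)\) failure detected")


def summarize_nm_log(lines):
    """Return ({node: highest missed count}, {nodes reported as failed})."""
    worst = {}
    failed = set()
    for line in lines:
        match = CHECKIN.search(line)
        if match:
            node, missed = int(match.group(1)), int(match.group(2))
            worst[node] = max(worst.get(node, 0), missed)
            continue
        match = FAILURE.search(line)
        if match:
            failed.add(int(match.group(1)))
    return worst, failed


SAMPLE = [
    "05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s)",
    "05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s)",
    "05:01:28 | MESSAGE | PollingThread(): node(1) failure detected",
]
```

A node that repeatedly approaches the MissCount value without being reported as failed is a candidate for the interconnect and MissCount checks described above.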
CM tracing can be helpful to Oracle Worldwide Support in
debugging your cluster configuration problems in cases where the database
is not starting, a particular node is hanging, or there is a node crash.
CM tracing is stored in the error log file, CM.LOG.
To enable detailed CM tracing:
CmSrvrPath: REG_SZ: c:\orant\osdbin\cm.exe /c /v /d
where:
/c    command line mode rather than Control Panel service mode
/v    verbose (more than /d)
/d    debug information
/c, /v and /d must be lowercase.
If CM is started as a service, you must remove CM as a service, then re-install CM with the tracing parameters turned on.
C:\> CM.EXE /r
C:\> CM.EXE /i:"CmSrvrPath_value /v /d"
where CmSrvrPath_value is the CmSrvrPath registry
value specified in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\CM. /v and /d
must be lowercase.
If you are using the secondary disk backup feature of the
CM, try to use a partition on a disk that is not heavily used. The backup
disk file is written to by every node member during each check-in. If the
backup disk is heavily used, it may cause the CM to miss check-ins and
falsely drop node members.
Note: If you are using the secondary disk backup feature, do not lower PollInterval below 500 milliseconds because every node writes to the disk backup partition every PollInterval.
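The two timing rules stated in the notes of this section (MissCount * PollInterval of at most 20 seconds, and a PollInterval of at least 500 milliseconds when the secondary disk backup feature is used) can be checked before the registry is changed. This is a minimal sketch with hypothetical function and constant names. It does not read the registry, so the values must be supplied by hand.

```python
MAX_DETECTION_WINDOW_MS = 20_000    # MissCount * PollInterval upper limit
MIN_POLL_WITH_DISK_BACKUP_MS = 500  # lower limit with secondary disk backup


def check_cm_timing(poll_interval_ms, miss_count, uses_disk_backup=False):
    """Return a list of rule violations; an empty list means both rules hold."""
    problems = []
    window_ms = poll_interval_ms * miss_count
    if window_ms > MAX_DETECTION_WINDOW_MS:
        problems.append(
            f"MissCount * PollInterval is {window_ms} ms, "
            f"which exceeds {MAX_DETECTION_WINDOW_MS} ms"
        )
    if uses_disk_backup and poll_interval_ms < MIN_POLL_WITH_DISK_BACKUP_MS:
        problems.append(
            f"PollInterval is {poll_interval_ms} ms, below "
            f"{MIN_POLL_WITH_DISK_BACKUP_MS} ms with disk backup enabled"
        )
    return problems
```

For example, check_cm_timing(1000, 21) reports one violation because the product is 21 seconds, while check_cm_timing(1000, 5) reports none.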
This section covers the following topics:
If the Oracle Database Configuration Assistant fails during the creation of a database, certain entries may have been installed in the registry. When database creation fails:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\OSD\PM\DB_NAME
where DB_NAME is OP for the first installation of an Oracle8i Parallel Server database. If you do not delete this key, the Oracle Database Configuration Assistant will assume the OP database name is in use and use OA for the database name. This will result in an installation of OA rather than OP.
HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\OracleServiceSID
When the Oracle Database Configuration Assistant starts, it verifies cluster software is installed and configured. You may see the error message below if CM is installed and:
To resolve this error message:
NET USE \\host_name\C$
where host_name is the host name defined in the DefinedNodes
registry value for Cluster Manager.
A successful connection results in "The command completed
successfully."
Oracle Corporation recommends using either the same user name and password on each node in a cluster or a domain user name. If you use a domain user name, log on under a domain with a user name and password that has administrative privileges on each node.
If you are having difficulty starting services or the database,
check the CM.LOG file.
The following messages appear if LM_RESS and LM_LOCKS values are not sufficient, and additional IDLM locks or resources must be allocated dynamically from the SGA:
If these messages appear often, it may lead to SGA exhaustion.
To resolve this, increase LM_RESS and LM_LOCKS parameters appropriately
based on your database needs to avoid exhausting the SGA.
Additional Information: See Chapter 15, "Allocating PCM Instance Locks Oracle8i Parallel Server," of the Oracle8 Parallel Server Concepts & Administration guide.
This section discusses the following trace file subjects:
Oracle Parallel Server background threads use trace files
to record occurrences and exceptions of database operations, as well as
errors. These detailed trace logs are helpful to Oracle support to debug
problems in your cluster configuration. Background thread trace files are
created regardless of whether the BACKGROUND_DUMP_DEST parameter is set
in the INIT_COM.ORA initialization parameter file. If BACKGROUND_DUMP_DEST
is set, the trace files are stored in the directory specified. If the parameter
is not set, the trace files are stored in the ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP
directory.
The Oracle8i database creates a different trace file for each background thread. The name of the trace file contains the name of the background thread, followed by the extension .TRC, such as:
Oracle Parallel Server trace information is reported in the following trace files:
Trace files are also created for user threads if the USER_DUMP_DEST
parameter is set in the initialization parameter file. The trace files
for the user threads have the form ORAXXXXX.TRC, where XXXXX
is a 5-digit number indicating the Windows NT thread ID.
The alert file, SIDALRT.LOG, contains important information
about error messages and exceptions that occur during database operations.
Each instance has one alert file; information is appended to the file each
time you start the instance. All threads can write to the alert file.
SIDALRT.LOG is found in the directory specified by
the BACKGROUND_DUMP_DEST parameter in the INIT_COM.ORA initialization parameter
file. If the BACKGROUND_DUMP_DEST parameter is not set, the SIDALRT.LOG
file is generated in ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP.
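The rule for locating the background trace files and the alert file can be written as a small helper. This sketch assumes that ORACLE_BASE, DB_NAME, and SID are placeholders for site-specific values and that ADMIN, PARALLEL_SERVER, and BDUMP are literal directory names. The function names and the sample values are hypothetical.

```python
import ntpath  # Windows path rules, usable on any platform


def background_dump_dir(oracle_base, db_name, sid, background_dump_dest=None):
    """Directory holding background thread trace files and the alert file."""
    if background_dump_dest:
        # BACKGROUND_DUMP_DEST is set in the initialization parameter file.
        return background_dump_dest
    # Default location when the parameter is not set.
    return ntpath.join(
        oracle_base, "ADMIN", "PARALLEL_SERVER", db_name, sid, "BDUMP"
    )


def alert_log_path(oracle_base, db_name, sid, background_dump_dest=None):
    """Full path of the alert file, SIDALRT.LOG, for one instance."""
    directory = background_dump_dir(
        oracle_base, db_name, sid, background_dump_dest
    )
    return ntpath.join(directory, f"{sid}ALRT.LOG")
```

For an instance with the illustrative SID OP1 and no BACKGROUND_DUMP_DEST setting, alert_log_path(r"C:\ORACLE", "OP", "OP1") points at OP1ALRT.LOG in the default BDUMP directory.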
Oracle Worldwide Support may ask you to create an error call
trace stack for a particular trace file. An error call trace stack provides
program trace of specific background or user threads in the database.
To create an error call trace:
C:\> SVRMGRL
SVRMGR> CONNECT INTERNAL/PASSWORD
SVRMGR> SELECT PID "Oracle Process Id", NAME
        FROM V$PROCESS, V$BGPROCESS
        WHERE V$PROCESS.ADDR = V$BGPROCESS.PADDR;
Output displayed looks like this:
Oracle Pro NAME
---------- -----
         2 PMON
         3 LMON
         4 LMD0
         5 DBW0
         6 LGWR
         7 CKPT
         8 SMON
         9 RECO
        10 SNP0
        11 SNP1
        13 LCK0
When creating symbolic links for the logical partitions with
the SETLINKS utility, do not use the prefix \\.\PhysicalDrive. If you use \\.\PhysicalDrive
as a symbolic link, you may corrupt your database files. Use the symbolic
links provided in the ORALINKx.TBL file(s), as described in Chapter
2, "Setting Up Raw Partitions".
SHUTDOWN ABORT is not recommended. Oracle Corporation recommends
shutting down the OracleServiceSID service so that resources, such
as memory usage or files, will be cleaned up by the Windows NT operating
system correctly.
To shut down OracleServiceSID:
C:\> NET STOP OracleServiceSID
If, after reading this appendix, you still cannot resolve your problems, call Oracle Worldwide Customer Support to report the error. Please have the following information at hand:
If an ORA-600 error occurred, it is recorded in the SIDALRT.LOG
file. If an ORA-600 error or any other severe error appears in the SIDALRT.LOG
file, then provide all files in ORACLE_BASE\ADMIN\PARALLEL_SERVER\DB_NAME\SID\BDUMP.
Copyright © 1999 Oracle Corporation. All Rights Reserved.