Oracle® Grid Infrastructure Installation Guide
11g Release 2 (11.2) for Linux
Part Number E10812-02
This appendix provides troubleshooting information for installing Oracle grid infrastructure.
See Also: The Oracle Database 11g Oracle RAC documentation set in the Documentation directory.
The following are examples of errors that can occur during installation, together with their causes and the actions that correct them.
An entry in /etc/oratab points to a non-existent Oracle home. The OUI log file should show the following error: "java.io.IOException: /home/oracle/OraHome/bin/kfod: not found"
Remove the entry in /etc/oratab that points to the non-existent Oracle home.
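An X window error can occur if the DISPLAY variable is not set, or if the user running the installation is not authorized to open an X window. For example, this happens if you use an su command to change from a user that is authorized to open an X window to a user account that is not authorized to open an X window on the display, such as a lower-privileged user opening windows on the root user's console display.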
Run the command echo $DISPLAY to ensure that the variable is set to the correct visual or to the correct host. If the display variable is set correctly, then either ensure that you are logged in as the user authorized to open an X window, or run the command xhost + to allow any user to open an X window.
If you are logged in locally on the server console as root, and used the su - command to change to the grid infrastructure installation owner, then log out of the server, and log back in as the grid installation owner.
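X Window display errors can also occur where xhost is not properly configured, or where you are running as a user account that is different from the account you used with the startx command to start the X server. To correct this, log in to a local terminal window as the user that started the X Window session, and enter the following command:
$ xhost fullyqualifiedRemoteHostname
For example:
$ xhost somehost.example.com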
Then, enter the following commands, where workstationname is the host name or IP address of your workstation.
Bourne, Bash, or Korn shell:
$ DISPLAY=workstationname:0.0
$ export DISPLAY
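Before you start the installer, it can help to confirm that DISPLAY holds a value of the form host:display.screen. The following Bash sketch performs that check. The helper name check_display is hypothetical and is not an Oracle tool. It validates only the format, so it does not tell you whether the X server accepts the connection; xclock remains the real test.

```shell
#!/usr/bin/env bash
# check_display [VALUE]: validate VALUE (default: $DISPLAY) against the form
# [host]:display[.screen], for example workstationname:0.0 or :0.
# Hypothetical helper; it checks the format only, not X server access.
check_display() {
  local value="${1-$DISPLAY}"
  local pattern='^[A-Za-z0-9._-]*:[0-9]+(\.[0-9]+)?$'
  if [ -z "$value" ]; then
    echo "DISPLAY is not set" >&2
    return 1
  fi
  if [[ "$value" =~ $pattern ]]; then
    echo "DISPLAY=$value"
    return 0
  fi
  echo "DISPLAY has an unexpected format: $value" >&2
  return 1
}

# Usage: check_display && xclock
```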
To determine whether X Window applications display correctly on the local system, enter the following command:
$ xclock
The X clock should appear on your monitor. If this fails to work, then use of the xhost command may be restricted.
If you are using a VNC client to access the server, then ensure that you are accessing the visual that is assigned to the user that you are trying to use for the installation. For example, if you used the su command to become the installation owner on another user visual, and use of the xhost command is restricted, then you cannot use the xhost command to change the display. If you use the visual assigned to the installation owner, then the correct display is available, and entering the xclock command displays the X clock.
When the X clock appears, close it and start the installer again.
ocrconfig can fail if the NFS mount options in the /etc/fstab file are incorrect. You can confirm this by checking the ocrconfig.log files located in the path /client and finding the following:
/u02/app/crs/clusterregistry, ret -1, errno 75, os err string Value too large for defined data type 2007-10-30 11:23:52.101: [ OCROSD]utopen:6'': OCR location
Note: You should not have netdev in the mount instructions, or vers=2. The netdev option is only required for OCFS file systems, and vers=2 forces the kernel to mount NFS using the older version 2 protocol.
After correcting the NFS mount information, remount the NFS mount point, and run the root.sh script again. For example, with the mountpoint /u02:
# umount /u02
# mount -a -t nfs
# cd $ORA_CRS_HOME
# sh root.sh
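Before remounting, you can scan the options of an NFS entry for the two settings that the note above warns against. The following Bash sketch does this. The helper names fstab_options and check_nfs_options are hypothetical. The check also accepts the spellings _netdev and nfsvers=2, which is an assumption about how the options may appear in your /etc/fstab. It does not validate the remaining NFS options.

```shell
#!/usr/bin/env bash
# fstab_options MOUNTPOINT [FSTAB]: print the options field (4th column) of the
# non-comment entry for MOUNTPOINT. Hypothetical helper.
fstab_options() {
  awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $4 }' "${2:-/etc/fstab}"
}

# check_nfs_options OPTIONS: fail if a comma-separated option list contains
# netdev or vers=2 (also accepting _netdev and nfsvers=2). Hypothetical helper.
check_nfs_options() {
  local opts="$1" opt status=0
  local IFS=','
  for opt in $opts; do
    case "$opt" in
      netdev|_netdev)
        echo "remove $opt: only required for OCFS file systems" >&2
        status=1 ;;
      vers=2|nfsvers=2)
        echo "remove $opt: forces the NFS version 2 protocol" >&2
        status=1 ;;
    esac
  done
  return "$status"
}

# Usage: check_nfs_options "$(fstab_options /u02)" && echo "no netdev or vers=2"
```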
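Do not place the Grid home under an Oracle base path. During installation, ownership of the path to the Grid home is changed to root. This change causes permission errors for other installations. In addition, the Oracle Clusterware software stack may not come up under an Oracle base path.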
Installation errors can also be caused by an insufficient /dev/shm size for the PGA and SGA.
If you are installing on a Linux system, note that Memory Size (SGA and PGA), which sets the initialization parameter MEMORY_MAX_TARGET, cannot be greater than the shared memory file system (/dev/shm) on your operating system.
To correct this, increase the /dev/shm mountpoint size. For example:
# mount -t tmpfs shmfs -o size=4g /dev/shm
Also, to make this change persistent across system restarts, add an entry in /etc/fstab similar to the following:
shmfs /dev/shm tmpfs size=4g 0
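The size rule above can be checked before installation by comparing the planned memory size with the total size of /dev/shm. The following Bash sketch makes that comparison. The helper names to_kb and shm_fits are hypothetical. By default, shm_fits reads the total size of /dev/shm from df, not the free space, which is an assumption about what the comparison should use.

```shell
#!/usr/bin/env bash
# to_kb SIZE: convert 2048k, 512m or 4g to kilobytes; a bare number is read as bytes.
to_kb() {
  local size="$1" n unit
  unit="${size: -1}"
  n="${size%[kKmMgG]}"
  case "$unit" in
    k|K) echo "$n" ;;
    m|M) echo $(( n * 1024 )) ;;
    g|G) echo $(( n * 1024 * 1024 )) ;;
    *)   echo $(( size / 1024 )) ;;
  esac
}

# shm_fits TARGET [SHM_KB]: succeed if TARGET is not larger than /dev/shm.
# SHM_KB defaults to the total size that df reports for /dev/shm (in KB).
shm_fits() {
  local target_kb shm_kb
  target_kb=$(to_kb "$1")
  shm_kb="${2:-$(df -Pk /dev/shm | awk 'NR == 2 { print $2 }')}"
  if [ "$target_kb" -gt "$shm_kb" ]; then
    echo "MEMORY_MAX_TARGET ($1) is larger than /dev/shm (${shm_kb} KB)" >&2
    return 1
  fi
  return 0
}

# Usage: shm_fits 4g || echo "increase the /dev/shm size before installing"
```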
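A failure can occur while running rootupgrade.sh if there is not enough space for the Oracle Cluster Registry. To confirm, look for the error "utopen:12:Not enough space in the backing store" in the log file, where pid in the log file name stands for the process ID.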
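Errors can occur when listener.ora, Oracle log files, or any action scripts are located on an NAS device or NFS mount, and the name service cache daemon nscd has not been activated. To start the nscd service, enter the following command on all nodes in the cluster:
/sbin/service nscd start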
If you run Cluster Verification Utility using the -verbose argument, and a Cluster Verification Utility command responds with UNKNOWN for a particular node, then this is because Cluster Verification Utility cannot determine if a check passed or failed. The following is a list of possible causes for an "Unknown" response:
The node is down
Common operating system command binaries required by Cluster Verification Utility are missing in the /bin directory in the Oracle grid infrastructure home or Oracle home directory
The user account starting Cluster Verification Utility does not have privileges to run common operating system commands on the node
The node is missing an operating system patch, or a required package
The node has exceeded the maximum number of processes or maximum number of open files, or there is a problem with IPC segments, such as shared memory or semaphores
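For the last cause in the list, you can compare the shell limits of the installation owner against a minimum on each node. The following Bash sketch shows one way to do this. The helper name limit_ok is hypothetical. The numbers in the usage comments are example values only, so check them against the resource limit requirements for your release.

```shell
#!/usr/bin/env bash
# limit_ok CURRENT REQUIRED: succeed if a resource limit meets a minimum.
# "unlimited" always passes. Hypothetical helper.
limit_ok() {
  local current="$1" required="$2"
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  [ "$current" -ge "$required" ]
}

# Usage on each node, as the installation owner (example minimums only):
# limit_ok "$(ulimit -u)" 2047 || echo "max user processes is too low"
# limit_ok "$(ulimit -n)" 1024 || echo "max open files is too low"
# ipcs -m and ipcs -s list shared memory segments and semaphore arrays.
```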
If the Cluster Verification Utility report indicates that your system fails to meet the requirements for Oracle grid infrastructure installation, then use the topics in this section to correct the problem or problems indicated in the report, and run Cluster Verification Utility again.
For each node listed as a failure node, review the installation owner user configuration to ensure that the user configuration is properly completed, and that SSH configuration is properly completed. The user that runs the Oracle Clusterware installation must have permissions to create SSH connections.
Oracle recommends that you use the SSH configuration option in OUI to configure SSH. You can use Cluster Verification Utility before installation if you configure SSH manually, or after installation, when SSH has been configured for installation.
For example, to check user equivalency for the user account oracle, use the command su - oracle and check user equivalence manually by running the ssh command on the local node with the date command argument using the following syntax:
$ ssh nodename date
The output from this command should be the timestamp of the remote node identified by the value that you use for nodename. If you are prompted for a password, then you need to configure SSH. If ssh is in the default location, the /usr/bin directory, then use ssh to configure user equivalence. You can also use rsh to confirm user equivalence.
If you see a message similar to the following when entering the date command with SSH, then this is the probable cause of the user equivalence error:
The authenticity of host 'node1 (126.96.36.199)' can't be established. RSA key fingerprint is 7z:ez:e7:f6:f4:f2:4f:8f:9z:79:85:62:20:90:92:z9. Are you sure you want to continue connecting (yes/no)?
Enter yes, and then run Cluster Verification Utility to determine if the user equivalency error is resolved.
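The manual ssh nodename date check can be repeated over every node in a single non-interactive pass. The following Bash sketch does this. The helper name check_user_equivalence and the SSH_CMD override are hypothetical. The override lets you point the check at a non-default ssh path such as /usr/local/bin/ssh. The OpenSSH option BatchMode=yes makes ssh fail instead of prompting. As a result, both a password prompt and the unknown host key prompt shown above appear as failures, so answer the host key prompt interactively once for each node before you use the sketch.

```shell
#!/usr/bin/env bash
# check_user_equivalence NODE...: run "ssh NODE date" on each node without prompting.
# BatchMode=yes turns password and host key prompts into failures.
# SSH_CMD is a hypothetical override for a non-default ssh location.
check_user_equivalence() {
  local ssh_cmd="${SSH_CMD:-ssh}" node out failed=0
  for node in "$@"; do
    if out=$("$ssh_cmd" -o BatchMode=yes "$node" date 2>/dev/null); then
      echo "$node: $out"
    else
      echo "$node: user equivalence check failed" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage, as the installation owner: check_user_equivalence node1 node2
```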
If ssh is in a location other than the default, /usr/bin, then Cluster Verification Utility reports a user equivalence check failure. To avoid this error, navigate to the directory /cv/admin, open the file cvu_config with a text editor, and add or update the key ORACLE_SRVM_REMOTESHELL to indicate the ssh path location on your system. For example:
# Locations for ssh and scp commands
ORACLE_SRVM_REMOTESHELL=/usr/local/bin/ssh
ORACLE_SRVM_REMOTECOPY=/usr/local/bin/scp
Note the following rules for modifying the cvu_config file:
Key entries have the syntax name=value
Each key entry and the value assigned to the key defines one property only
Lines beginning with the number sign (#) are comment lines, and are ignored
Lines that do not follow the syntax name=value are ignored
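The rules above can be expressed as a small lookup, which is useful for confirming the value that an edited cvu_config entry actually holds. The following Bash sketch implements them. The helper name cvu_get is hypothetical. The sketch returns the first matching entry, and how Cluster Verification Utility itself treats duplicate keys is outside its scope.

```shell
#!/usr/bin/env bash
# cvu_get KEY FILE: print the value of KEY from a file that follows the rules
# for cvu_config: name=value entries, "#" comment lines, other lines ignored.
# Hypothetical helper; returns the first match, or fails if KEY is absent.
cvu_get() {
  local key="$1" file="$2" line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      \#*) continue ;;   # comment line: ignored
      *=*) ;;            # name=value entry: examine it
      *) continue ;;     # not name=value: ignored
    esac
    if [ "${line%%=*}" = "$key" ]; then
      printf '%s\n' "${line#*=}"
      return 0
    fi
  done < "$file"
  return 1
}

# Usage: cvu_get ORACLE_SRVM_REMOTESHELL path/to/cvu_config
```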
When you have changed the path configuration, run Cluster Verification Utility again. If ssh is in a location other than the default, then you must also start OUI with additional arguments to specify a different location for the remote shell and remote copy commands. Enter runInstaller -help to obtain information about how to use these arguments.
Note: When you or OUI run ssh or rsh commands, including any login or other shell scripts they start, you may see errors about invalid arguments or standard input if the scripts generate any output. You should correct the cause of these errors.
To stop the errors, remove all commands from the oracle user's login scripts that generate output when you run ssh or rsh commands.
If you see messages about X11 forwarding, then complete the task "Setting Display and X11 Forwarding Configuration" to resolve this issue.
If you see errors similar to the following:
stty: standard input: Invalid argument
stty: standard input: Invalid argument
These errors are produced if hidden files on the system, such as login scripts, contain stty commands. If you see these errors, then refer to Chapter 2, "Preventing Installation Errors Caused by stty Commands" to correct the cause of these errors.
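One common way to keep stty in a Bourne, Bash, or Korn login script without breaking non-interactive sessions is to run it only when standard input is a terminal. The following Bash sketch shows this approach. It is not necessarily the exact text of Chapter 2, and the wrapper name safe_stty is hypothetical.

```shell
#!/usr/bin/env bash
# safe_stty ARGS...: run stty only when standard input is a terminal, so that
# non-interactive ssh or rsh sessions (including those that OUI starts) do not
# report "stty: standard input: Invalid argument". Hypothetical wrapper.
safe_stty() {
  if [ -t 0 ]; then
    stty "$@"
  fi
}

# In a login script such as .bashrc: safe_stty intr '^C'
```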
Use the ping command (ping address) to check each node address. When you find an address that cannot be reached, check your list of public and private addresses to make sure that you have them correctly configured. If you use third-party vendor clusterware, then refer to the vendor documentation for assistance. Ensure that the public and private network interfaces have the same interface names on each node of your cluster.
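With several public and private addresses per node, the ping check is easier to run as a loop that reports only the failures. The following Bash sketch does this using Linux iputils ping options (-c count, -W timeout in seconds). The helper name check_addresses is hypothetical. The PING_CMD override exists only so that the sketch can be tested without a network.

```shell
#!/usr/bin/env bash
# check_addresses ADDR...: ping each address once and print the ones that do not
# answer within 2 seconds. PING_CMD is a hypothetical override (default: ping).
check_addresses() {
  local ping_cmd="${PING_CMD:-ping}" addr status=0
  for addr in "$@"; do
    if ! "$ping_cmd" -c 1 -W 2 "$addr" >/dev/null 2>&1; then
      echo "unreachable: $addr"
      status=1
    fi
  done
  return "$status"
}

# Usage (example host names): check_addresses node1 node1-priv node2 node2-priv
```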
Run the id command on each node to confirm that the installation owner user (for example, oracle) is created with the correct group membership. Ensure that you have created the required groups, and create or modify the user account on affected nodes to establish required group membership.
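The id check can be reduced to a pass or fail test per node that names any missing group. The following Bash sketch is built on id -Gn. The helper name has_groups is hypothetical, and the group names in the usage line are examples only, so substitute the groups you created.

```shell
#!/usr/bin/env bash
# has_groups USER GROUP...: succeed only if USER is a member of every GROUP,
# based on the group names printed by id -Gn. Hypothetical helper.
has_groups() {
  local user="$1" group groups missing=0
  shift
  groups=" $(id -Gn "$user" 2>/dev/null) "
  for group in "$@"; do
    case "$groups" in
      *" $group "*) ;;
      *) echo "$user is not a member of $group" >&2
         missing=1 ;;
    esac
  done
  return "$missing"
}

# Usage on each node (example group names): has_groups oracle oinstall dba
```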
See Also: "Creating Groups, Users and Paths for Oracle Grid Infrastructure" in Chapter 2 for instructions about how to create required groups, and how to configure the installation owner user
The Oracle Clusterware alert log is the first place to look for serious errors. In the event of an error, it can contain path information to diagnostic logs that can provide specific information about the cause of errors.
After installation, Oracle Clusterware posts alert messages when important events occur. For example, you might see alert messages from the Cluster Ready Services (CRS) daemon process when it starts, if it aborts, if the failover process fails, or if automatic restart of a CRS resource failed.
Enterprise Manager monitors the Clusterware log file and posts an alert on the Cluster Home page if an error is detected. For example, if a voting disk is not available, a CRS-1604 error is raised, and a critical alert is posted on the Cluster Home page. You can customize the error detection and alert settings on the Metric and Policy Settings page.
The Oracle Clusterware log file is located in a path that includes CRS_home and hostname, where CRS_home is the directory in which Oracle Clusterware was installed and hostname is the host name of the local node.
You have missing operating system packages on your system if you receive error messages such as the following during Oracle grid infrastructure, Oracle RAC, or Oracle Database installation:
libstdc++.so.5: cannot open shared object file: No such file or directory
libXp.so.6: cannot open shared object file: No such file or directory
Errors such as these should not occur, as missing packages should have been identified during installation. They may indicate that you are using an operating system distribution that has not been certified, or that you are using an older version of the Cluster Verification Utility.
If you have a Linux support network configured, such as the Red Hat network or Oracle Unbreakable Linux support, then use the up2date command to determine the name of the package. For example:
# up2date --whatprovides libstdc++.so.5
compat-libstdc++-188.8.131.52-47.3
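When a log contains many such errors, you can first extract the distinct library names and then pass each one to up2date --whatprovides. The following Bash sketch does the extraction. The helper name missing_libs is hypothetical, and install.log in the usage line is a placeholder for wherever you captured the error output. The sketch sorts in the C locale so that the output order does not depend on the system locale.

```shell
#!/usr/bin/env bash
# missing_libs: read error output on standard input and print each library named
# in a "cannot open shared object file" message once, sorted in the C locale.
# Hypothetical helper.
missing_libs() {
  grep -o '[^ :]*: cannot open shared object file' \
    | sed 's/: cannot open shared object file$//' \
    | LC_ALL=C sort -u
}

# Usage (install.log is a placeholder):
# missing_libs < install.log | while read -r lib; do up2date --whatprovides "$lib"; done
```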
Also, download the most recent version of Cluster Verification Utility to make sure that you have the current required packages list. You can obtain the most recent version at the following URL:
If the installer does not display the Node Selection page, then use the following command syntax to check the integrity of the Cluster Manager:
cluvfy comp clumgr -n node_list -verbose
In the preceding syntax example, the variable node_list is the list of nodes in your cluster, separated by commas.
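If the node names are already held in a shell list, you can build the comma-separated node_list value instead of typing it by hand. The following Bash sketch does this. The helper name join_nodes is hypothetical. It relies on the Bash behavior that "$*" joins the arguments with the first character of IFS.

```shell
#!/usr/bin/env bash
# join_nodes NODE...: join node names with commas for the -n option of cluvfy.
# Hypothetical helper; "$*" joins arguments with the first character of IFS.
join_nodes() {
  local IFS=','
  echo "$*"
}

# Usage: cluvfy comp clumgr -n "$(join_nodes node1 node2 node3)" -verbose
```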
Note: If you encounter unexplained installation errors during or after a period when cron jobs are run, then your cron job may have deleted temporary files before the installation is finished. Oracle recommends that you complete installation before daily cron jobs are run, or disable daily cron jobs that perform cleanup until after the installation is completed.
If you use multiple network interface cards (NICs) for the interconnect, then the NICs should be bonded at the operating system level. Otherwise, the failure of a single NIC will affect the availability of the cluster node.
If you install Oracle grid infrastructure and Oracle RAC, then they must use the same NIC or bonded NIC cards for the interconnect.
If you use bonded NIC cards, then they must be on the same subnet.
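To confirm that two interconnect addresses fall in the same IPv4 subnet, mask both addresses with the prefix length and compare the results. The following Bash sketch shows this. The helper names ip_to_int and same_subnet are hypothetical, and the sketch handles IPv4 dotted-decimal addresses only.

```shell
#!/usr/bin/env bash
# ip_to_int ADDR: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet ADDR1 ADDR2 PREFIX: succeed if both addresses are in the same
# subnet for the given prefix length (for example 24 for 255.255.255.0).
same_subnet() {
  local mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Usage: same_subnet 192.168.10.1 192.168.10.2 24 && echo "same subnet"
```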
If you encounter errors, then carry out the following system checks:
Verify with your network providers that they are using correct cables (length, type) and software on their switches. In some cases, to avoid bugs that cause disconnects under loads, or to support additional features such as Jumbo Frames, you may need a firmware upgrade on interconnect switches, or you may need newer NIC driver or firmware at the operating system level. Running without such fixes can cause later instabilities to Oracle RAC databases, even though the initial installation seems to work.
Review VLAN configurations, duplex settings, and auto-negotiation in accordance with vendor and Oracle recommendations.