This chapter describes how to complete the postinstallation tasks after you have installed the Oracle Clusterware software.
You must perform the following tasks after completing your installation:
After your Oracle Clusterware installation is complete and after you are sure that your system is functioning properly, make a backup of the contents of the voting disk. Use the dd utility. For example:
# dd if=/dev/sda1 of=/dev/myvdisk1.bak
Also, make a backup copy of the voting disk contents after you complete any node additions or node deletions, and after running any deinstallation procedures.
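The backup step can be wrapped in a small helper so that each backup gets its own file name. The following is a minimal sketch, not an Oracle-supplied utility: the function name backup_voting_disk, the vdisk_ file name pattern, and the 4 KB block size are assumptions made for illustration.

```shell
# Sketch: copy a voting disk device to a timestamped backup file with dd.
# backup_voting_disk is a hypothetical helper name, not an Oracle utility.
backup_voting_disk() {
    vdisk_src="$1"        # voting disk device, for example /dev/sda1
    vdisk_dest_dir="$2"   # directory that receives the backup file
    vdisk_dest="$vdisk_dest_dir/vdisk_$(date +%Y%m%d_%H%M%S).bak"
    # dd exits nonzero on a read or write error, so the backup path is
    # printed only when the copy completed. bs=4k is an assumed block size.
    dd if="$vdisk_src" of="$vdisk_dest" bs=4k 2>/dev/null || return 1
    printf '%s\n' "$vdisk_dest"
}
```

Run as root, a call such as backup_voting_disk /dev/sda1 /var/backups prints the path of the new backup file. The destination directory here is a placeholder. Run the helper again after node additions, node deletions, or deinstallation procedures.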
Review the following sections to configure input/output fencing:
Input/output fencing (IO Fencing) is a required mechanism to ensure that a node that is evicted from the cluster is prevented from writing to the cluster shared storage. The mechanisms that provide fencing are the hangcheck-timer kernel module, and the Oracle Clusterware Process Monitor Daemon (oprocd). The oprocd process is installed during Oracle Clusterware installation. Both mechanisms are required.
The hangcheck-timer and oprocd are independent mechanisms. The oprocd process provides additional hang check capability, and it can catch hang conditions that the hangcheck-timer misses.
Load the hangcheck-timer kernel module as root, using insmod or modprobe. The examples in this section show modprobe commands.
Check the following hangcheck timer settings:
hangcheck_tick parameter: This parameter defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds. Oracle recommends that you change the value of hangcheck_tick to 1.
hangcheck_margin parameter: This parameter defines how long the timer waits, in seconds, for a response from the kernel. The default value is 180 seconds. Oracle recommends that you change the value of hangcheck_margin to 10.
The hangcheck_reboot parameter determines whether the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is equal to or greater than 1, then the hangcheck-timer module restarts the system when a hang is detected. If the hangcheck_reboot parameter is set to zero, then the hangcheck-timer module does not restart the node when a hang is detected.
Note:
On Linux 2.6 kernels, hangcheck_reboot is set to 0 by default. The value for hangcheck_reboot must always be set to 1 to restart the system if a hang is detected.
These settings assume that the CSS misscount value is set to 30 or 60 seconds, which are the defaults for release 11g and release 10g, respectively. The value for CSS misscount should always be greater than the sum of hangcheck_tick and hangcheck_margin.
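The relationship between CSS misscount and the two hangcheck values is simple arithmetic, so it can be checked before you change any settings. The following sketch uses a hypothetical helper name, hangcheck_settings_ok. It takes the three values as arguments and does not read them from the running system.

```shell
# Sketch: succeed only when CSS misscount is greater than
# hangcheck_tick + hangcheck_margin (all values in seconds).
# hangcheck_settings_ok is a hypothetical helper name.
hangcheck_settings_ok() {
    hc_tick="$1"
    hc_margin="$2"
    css_misscount="$3"
    [ "$css_misscount" -gt $((hc_tick + hc_margin)) ]
}
```

With the recommended values, hangcheck_settings_ok 1 10 30 succeeds because 1 + 10 = 11 is less than 30. With the module defaults described above, hangcheck_settings_ok 60 180 60 fails because 60 + 180 = 240 exceeds a misscount of 60.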
For optimal cluster performance, test applications with the hangcheck parameter values that Oracle recommends. If you find that the cluster produces false node evictions with these values, then increase the hangcheck_margin parameter value, with the help of Oracle Support.
Use the following procedure to configure the hangcheck timer:
Log in as root.
Check to see if settings for the hangcheck timer are listed in the module configuration file. On Red Hat and on Oracle Linux, that file is /etc/modprobe.conf. On SUSE, it is /etc/modprobe.conf.local. For example:
# more /etc/modprobe.conf |grep hang
You should see something similar to the following:
options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
If the hangcheck configuration does not exist, or if it exists but the values are set to different values than those recommended, then enter a command similar to the following to load them into the configuration file. The following example is for Red Hat and Oracle Linux:
# echo "options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1" >> /etc/modprobe.conf
If the hangcheck-timer module is already loaded, enter the following command to unload it and discard its existing values:
# /sbin/modprobe -r hangcheck-timer
Enter the following command to load the new hangcheck-timer values:
# /sbin/modprobe -v hangcheck-timer
To confirm that the hangcheck module is loaded, enter the following command:
# /sbin/lsmod | grep hang
The output should be similar to the following:
hangcheck_timer 3289 0
To ensure that the module is loaded every time the system restarts, verify that the local system startup file contains the command /sbin/modprobe -v hangcheck-timer, or add it if necessary:
Red Hat:
Add the command to the /etc/rc.d/rc.local file.
SUSE:
Add the command to the /etc/init.d/boot.local file.
Repeat this process on each node that you intend to make a member of the cluster.
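Because the echo command in the procedure appends a line each time it runs, repeated runs can leave several hangcheck-timer entries in the configuration file. The following sketch shows one way to make that step repeatable. The helper name ensure_hangcheck_options and the HANGCHECK_LINE variable are hypothetical. The file path is passed as an argument so that you can try the function on a copy before you run it as root against /etc/modprobe.conf or /etc/modprobe.conf.local.

```shell
# Sketch: leave exactly one hangcheck-timer options line, carrying the
# recommended values, in a module configuration file.
# ensure_hangcheck_options and HANGCHECK_LINE are hypothetical names.
HANGCHECK_LINE="options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1"

ensure_hangcheck_options() {
    conf_file="$1"
    new_file="$conf_file.new.$$"
    if [ -f "$conf_file" ]; then
        # Drop every existing hangcheck-timer options line, stale or not.
        sed '/^options[[:space:]][[:space:]]*hangcheck[-_]timer/d' \
            "$conf_file" > "$new_file" || return 1
    else
        : > "$new_file" || return 1
    fi
    # Append the recommended line once.
    printf '%s\n' "$HANGCHECK_LINE" >> "$new_file" || return 1
    # Write back through the original file so its owner and mode are kept.
    cat "$new_file" > "$conf_file" && rm -f "$new_file"
}
```

After the file is updated, continue with the /sbin/modprobe -r, /sbin/modprobe -v, and /sbin/lsmod commands shown in the procedure, and repeat on each node.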
The Oracle Clusterware Process Monitor Daemon (oprocd) process is part of the Oracle Clusterware software installation. It is started automatically by Oracle Clusterware to detect system hangs. When it detects a system hang, it restarts the hung node.
Oracle has found wide variations in scheduling latencies observed across operating systems and versions of operating systems. Because of these scheduling latencies, the default values for oprocd can be overly sensitive, particularly under heavy system load, resulting in unnecessary oprocd-initiated restarts (false restarts).
Oracle recommends that you address scheduling latencies with your operating system vendor to reduce or eliminate them as much as possible, as they can cause other problems.
To overcome these scheduling latencies, Oracle recommends that you set the Oracle Clusterware parameter diagwait to the value 13. This setting increases the time for failed nodes to flush final trace files, which helps to debug the cause of a node failure. You must shut down the cluster to change the diagwait setting.
If you require more aggressive failover times to meet more stringent service level requirements, then you should open a service request with Oracle Support to receive advice about how to tune for lower failover settings.
Note:
Changing the diagwait parameter requires a clusterwide shutdown. Oracle recommends that you change the diagwait setting either immediately after the initial installation, or during a scheduled outage.
To change the diagwait setting:
Log in as root, and run the following command on all nodes, where CRS_home is the home directory of the Oracle Clusterware installation:
# CRS_home/bin/crsctl stop crs
Enter the following command, where CRS_home is the Oracle Clusterware home:
# CRS_home/bin/oprocd stop
Repeat this command on all nodes.
From one node of the cluster, change the value of the diagwait parameter to 13 seconds by issuing the following command as root:
# CRS_home/bin/crsctl set css diagwait 13 -force
Restart Oracle Clusterware by running the following command on all nodes:
# CRS_home/bin/crsctl start crs
Run the following command to ensure that Oracle Clusterware is functioning properly:
# CRS_home/bin/crsctl check crs
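The order of these steps matters, because every node must be stopped before diagwait is changed from a single node. The following sketch groups the commands from the procedure into three functions and prints them by default instead of running them. The variable names CRS_HOME and DRY_RUN, the helper names, and the path /u01/app/crs are assumptions for illustration. Substitute your own Oracle Clusterware home.

```shell
# Sketch: the diagwait procedure as three phases. By default each command
# is only printed. Set DRY_RUN=0 and run as root to execute the commands.
# CRS_HOME, DRY_RUN, and the function names are hypothetical.
CRS_HOME="${CRS_HOME:-/u01/app/crs}"   # placeholder path
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Phase 1: run on every node.
stop_clusterware() {
    run "$CRS_HOME/bin/crsctl" stop crs &&
    run "$CRS_HOME/bin/oprocd" stop
}

# Phase 2: run on one node only, after phase 1 has finished on all nodes.
set_diagwait() {
    run "$CRS_HOME/bin/crsctl" set css diagwait 13 -force
}

# Phase 3: run on every node.
start_clusterware() {
    run "$CRS_HOME/bin/crsctl" start crs &&
    run "$CRS_HOME/bin/crsctl" check crs
}
```

The functions act on the local node only. Calling them on each node in the right order is left to the administrator, which keeps the sketch free of assumptions about remote access between nodes.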
Refer to the OracleMetaLink Web site for required patch updates for your installation. To download required patch updates:
Use a Web browser to view the OracleMetaLink Web site:
Log in to OracleMetaLink.
Note:
If you are not an OracleMetaLink registered user, then click Register for MetaLink and register.
On the main OracleMetaLink page, click Patches & Updates.
On the Patches & Updates page, click Advanced Search.
On the Advanced Search page, click the search icon next to the Product or Product Family field.
In the Search and Select: Product Family field, select Database and Tools in the Search list field, enter RDBMS Server in the text field, and click Go.
RDBMS Server appears in the Product or Product Family field. The current release appears in the Release field.
Select your platform from the list in the Platform field, and at the bottom of the selection list, click Go.
Any available patch updates appear under the Results heading.
Click the number of the patch that you want to download.
On the Patch Set page, click View README and read the page that appears. The README page contains information about the patch set and how to apply the patches to your installation.
Return to the Patch Set page, click Download, and save the file on your system.
Use the unzip utility provided with Oracle Database 10g to uncompress the Oracle patch updates that you downloaded from OracleMetaLink. The unzip utility is located in the $ORACLE_HOME/bin directory.
Refer to Appendix B for information about how to stop database processes in preparation for installing patches.
Oracle recommends that you complete the following tasks after installing Oracle Clusterware.
Oracle recommends that you back up the root.sh script after you complete an installation. If you install other products in the same Oracle home directory, then the Oracle Universal Installer (OUI) updates the contents of the existing root.sh script during the installation. If you require information contained in the original root.sh script, then you can recover it from the backup copy of the root.sh file.
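A timestamped copy keeps each version of root.sh separate if you back it up more than once. The following sketch assumes that root.sh is in the top level of the home directory passed as the argument. The helper name backup_root_sh and the file name pattern are hypothetical.

```shell
# Sketch: keep a timestamped copy of root.sh next to the original.
# backup_root_sh is a hypothetical helper name.
backup_root_sh() {
    oracle_home_dir="$1"
    root_sh_copy="$oracle_home_dir/root.sh.$(date +%Y%m%d_%H%M%S).bak"
    # cp -p preserves the mode and timestamps of the original script.
    cp -p "$oracle_home_dir/root.sh" "$root_sh_copy" || return 1
    printf '%s\n' "$root_sh_copy"
}
```

For example, backup_root_sh "$ORACLE_HOME" prints the path of the copy, which you can compare with root.sh after a later installation into the same home.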