7 Oracle Clusterware Postinstallation Procedures

This chapter describes how to complete the postinstallation tasks after you have installed the Oracle Clusterware software.

This chapter contains the following topics:

  • Required Postinstallation Tasks

  • Recommended Postinstallation Tasks

7.1 Required Postinstallation Tasks

You must perform the following tasks after completing your installation:

7.1.1 Back Up the Voting Disk After Installation

After your Oracle Clusterware installation is complete, and after you have confirmed that your system is functioning properly, back up the contents of the voting disk with the dd utility. For example:

# dd if=/dev/sda1 of=/dev/myvdisk1.bak

Also, make a backup copy of the voting disk contents after you complete any node additions or node deletions, and after running any deinstallation procedures.
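
Because the backup must be repeated after every node addition, node deletion, or deinstallation, it can help to wrap the dd command in a small helper that timestamps each copy. The following is a minimal sketch, not an Oracle-supplied tool. The function name backup_voting_disk, the bs=4k block size, and the use of a separate backup directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a voting disk device (or file) to a
# timestamped backup file in the given directory.
# Usage: backup_voting_disk <source_device> <backup_dir>
backup_voting_disk() {
    src=$1
    dest_dir=$2
    # Name the backup after the source plus a timestamp (assumed convention).
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"

    # Same dd invocation as in the example above, with an explicit block size.
    dd if="$src" of="$dest" bs=4k 2>/dev/null || return 1

    # Treat an empty backup file as a failure.
    [ -s "$dest" ] || return 1

    # Print the backup path so that the caller can record it.
    echo "$dest"
}
```

For example, running backup_voting_disk /dev/sda1 /backup as root would write a file such as /backup/sda1.20240101120000.bak, where /backup is a placeholder directory.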

7.1.2 Configure Input/Output Fencing

Review the following sections to configure input/output fencing:

7.1.2.1 About Input/Output Fencing

Input/output fencing (IO fencing) is a required mechanism that prevents a node that has been evicted from the cluster from writing to the cluster shared storage. Two mechanisms provide fencing: the hangcheck-timer kernel module and the Oracle Clusterware Process Monitor Daemon (oprocd). The oprocd process is installed during Oracle Clusterware installation. Both mechanisms are required.

The hangcheck-timer and oprocd are independent mechanisms. The oprocd process provides additional hang check capability, and it can catch hang conditions that the hangcheck-timer misses.

7.1.2.2 Configuring the Hangcheck-timer Module

Load the hangcheck-timer kernel module as root, using insmod or modprobe. The examples in this section show modprobe commands.

Check the following hangcheck timer settings:

  • hangcheck_tick parameter: This parameter defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds. Oracle recommends that you change the value of hangcheck_tick to 1.

  • hangcheck_margin parameter: This parameter defines how long the timer waits, in seconds, for a response from the kernel. The default value is 180 seconds. Oracle recommends that you change the value of hangcheck_margin to 10.

  • hangcheck_reboot parameter: This parameter determines whether the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is 1 or greater, then the hangcheck-timer module restarts the system when a hang is detected. If the value is 0, then the hangcheck-timer module does not restart the node when a hang is detected.

    Note:

    On Linux 2.6 kernels, hangcheck_reboot is set to 0 by default. Always set hangcheck_reboot to 1 so that the system restarts if a hang is detected.

These settings assume that the CSS misscount value is set to 30 or 60 seconds, which are the defaults for release 11g and release 10g, respectively. The value for CSS misscount should always be greater than the sum of hangcheck_tick and hangcheck_margin.
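
The relationship between CSS misscount and the two hangcheck values is simple arithmetic, and can be checked before you change any setting. The following sketch shows one way to do so. The function name hangcheck_within_misscount is hypothetical, and the values are passed in by hand rather than read from the system.

```shell
#!/bin/sh
# Hypothetical check: succeed only if CSS misscount is greater than
# hangcheck_tick + hangcheck_margin (all values in seconds).
# Usage: hangcheck_within_misscount <tick> <margin> <misscount>
hangcheck_within_misscount() {
    tick=$1
    margin=$2
    misscount=$3
    # The exit status of the test is the exit status of the function.
    [ "$misscount" -gt $((tick + margin)) ]
}
```

With the recommended values, hangcheck_within_misscount 1 10 30 succeeds, because 1 + 10 = 11 is less than 30. With the module defaults, hangcheck_within_misscount 60 180 60 fails, because 60 + 180 = 240 is greater than 60. On a running cluster, the current misscount value can typically be displayed with crsctl get css misscount.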

For optimal cluster performance, test applications with the hangcheck parameter values that Oracle recommends. If you find that the cluster produces false node evictions with these values, then increase the hangcheck_margin parameter value, with the help of Oracle Support.

Use the following procedure to configure the hangcheck timer:

  1. Log in as root.

  2. Check to see if settings for the hangcheck timer are listed in the module configuration file. On Red Hat and on Oracle Linux, that file is /etc/modprobe.conf. On SUSE, it is /etc/modprobe.conf.local. For example:

    # grep hang /etc/modprobe.conf
    

    You should see something similar to the following:

    options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
    
  3. If the hangcheck configuration does not exist, or if it exists but the values differ from those recommended, then enter a command similar to the following to add the recommended values to the configuration file. The following example is for Red Hat and Oracle Linux:

    # echo "options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 \
    hangcheck_reboot=1" >>/etc/modprobe.conf
    
  4. If the hangcheck-timer module is already loaded, then enter the following command to unload it, so that the existing values are discarded:

    # /sbin/modprobe -r hangcheck-timer
    
  5. Enter the following command to load the hangcheck-timer module with the new values:

    # /sbin/modprobe -v hangcheck-timer
    
  6. To confirm that the hangcheck module is loaded, enter the following command:

    # /sbin/lsmod | grep hang
    

    The output should be similar to the following:

     hangcheck_timer         3289  0
    
  7. To ensure that the module is loaded every time the system restarts, verify that the local system startup file contains the command /sbin/modprobe -v hangcheck-timer, or add it if necessary:

    • Red Hat:

      Add the command to the /etc/rc.d/rc.local file.

    • SUSE:

      Add the command to the /etc/init.d/boot.local file.

  8. Repeat this process on each node that you intend to make a member of the cluster.
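
Because the echo command in step 3 appends a line every time that it runs, repeating it can leave conflicting hangcheck-timer entries in the configuration file. The following is a minimal sketch of a repeatable alternative, under the assumption that the file holds at most one kind of "options hangcheck-timer" entry. The function name ensure_hangcheck_options is hypothetical, and the configuration file path is passed as an argument so that the same function can be used with /etc/modprobe.conf or /etc/modprobe.conf.local.

```shell
#!/bin/sh
# Hypothetical helper: make sure a modprobe configuration file contains the
# recommended hangcheck-timer options line exactly once.
# Usage: ensure_hangcheck_options <config_file>
ensure_hangcheck_options() {
    conf=$1
    line="options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1"
    tmp="$conf.tmp.$$"

    if [ -f "$conf" ]; then
        # Remove every existing hangcheck-timer options line.
        # grep -v exits 1 when it prints nothing, so its status is ignored.
        grep -v '^options hangcheck-timer' "$conf" > "$tmp"
        # Write back in place to keep the ownership and mode of the file.
        cat "$tmp" > "$conf"
        rm -f "$tmp"
    fi

    # Append the recommended line.
    echo "$line" >> "$conf"
}
```

Running the function twice leaves the same single entry in place. It changes only the configuration file, so steps 4 through 7 still apply afterward.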

7.1.2.3 Configuring Oracle Clusterware Process Monitor Daemon

The Oracle Clusterware Process Monitor Daemon (oprocd) process is part of the Oracle Clusterware software installation. It is started automatically by Oracle Clusterware to detect system hangs. When it detects a system hang, it restarts the hung node.

Oracle has found wide variations in scheduling latencies observed across operating systems and versions of operating systems. Because of these scheduling latencies, the default values for oprocd can be overly sensitive, particularly under heavy system load, resulting in unnecessary oprocd-initiated restarts (false restarts).

Oracle recommends that you address scheduling latencies with your operating system vendor to reduce or eliminate them as much as possible, as they can cause other problems.

To overcome these scheduling latencies, Oracle recommends that you set the Oracle Clusterware parameter diagwait to the value 13. This setting increases the time for failed nodes to flush final trace files, which helps to debug the cause of a node failure. You must shut down the cluster to change the diagwait setting.

If you require more aggressive failover times to meet more stringent service level requirements, then you should open a service request with Oracle Support to receive advice about how to tune for lower failover settings.

Note:

Changing the diagwait parameter requires a clusterwide shutdown. Oracle recommends that you change the diagwait setting either immediately after the initial installation, or during a scheduled outage.

To change the diagwait setting:

  1. Log in as root, and run the following command on all nodes, where CRS_home is the home directory of the Oracle Clusterware installation:

    # CRS_home/bin/crsctl stop crs
    
  2. Enter the following command, where CRS_home is the Oracle Clusterware home:

    # CRS_home/bin/oprocd stop
    

    Repeat this command on all nodes.

  3. From one node of the cluster, change the value of the diagwait parameter to 13 seconds by issuing the following command as root:

    # CRS_home/bin/crsctl set css diagwait 13 -force
    
  4. Restart Oracle Clusterware by running the following command on all nodes:

    # CRS_home/bin/crsctl start crs
    
  5. Run the following command to ensure that Oracle Clusterware is functioning properly:

    # CRS_home/bin/crsctl check crs
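
The order of these steps matters: every node must be stopped before the diagwait value is set from a single node. The following sketch prints the procedure as a checklist of "node: command" lines for a given set of nodes, without running anything. The function name diagwait_plan is hypothetical, as is the choice to run the final check on every node, which the procedure above does not specify.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the diagwait change procedure as
# "node: command" lines, in the order in which the steps must run.
# Usage: diagwait_plan <CRS_home> <node> [<node> ...]
diagwait_plan() {
    crs_home=$1
    shift
    first_node=$1

    # Steps 1 and 2: stop Oracle Clusterware, then oprocd, on every node.
    for node in "$@"; do echo "$node: $crs_home/bin/crsctl stop crs"; done
    for node in "$@"; do echo "$node: $crs_home/bin/oprocd stop"; done

    # Step 3: set diagwait from one node only.
    echo "$first_node: $crs_home/bin/crsctl set css diagwait 13 -force"

    # Step 4: restart Oracle Clusterware on every node.
    for node in "$@"; do echo "$node: $crs_home/bin/crsctl start crs"; done

    # Step 5: check the stack (running this on every node is an assumption).
    for node in "$@"; do echo "$node: $crs_home/bin/crsctl check crs"; done
}
```

For example, diagwait_plan /u01/crs node1 node2 prints nine lines, where /u01/crs, node1, and node2 are placeholder values. The fifth line is the single crsctl set css diagwait 13 -force command, assigned to node1.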
    

7.1.3 Download and Install Patch Updates

Refer to the OracleMetaLink Web site for required patch updates for your installation. To download required patch updates:

  1. Use a Web browser to view the OracleMetaLink Web site:

    https://metalink.oracle.com

  2. Log in to OracleMetaLink.

    Note:

    If you are not an OracleMetaLink registered user, then click Register for MetaLink and register.

  3. On the main OracleMetaLink page, click Patches & Updates.

  4. On the Patches & Updates page, click Advanced Search.

  5. On the Advanced Search page, click the search icon next to the Product or Product Family field.

  6. In the Search and Select: Product Family field, select Database and Tools in the Search list field, enter RDBMS Server in the text field, and click Go.

    RDBMS Server appears in the Product or Product Family field. The current release appears in the Release field.

  7. Select your platform from the list in the Platform field, and at the bottom of the selection list, click Go.

  8. Any available patch updates appear under the Results heading.

  9. Click the number of the patch that you want to download.

  10. On the Patch Set page, click View README and read the page that appears. The README page contains information about the patch set and how to apply the patches to your installation.

  11. Return to the Patch Set page, click Download, and save the file on your system.

  12. Use the unzip utility provided with Oracle Database 10g to uncompress the Oracle patch updates that you downloaded from OracleMetaLink. The unzip utility is located in the $ORACLE_HOME/bin directory.

  13. Refer to Appendix B for information about how to stop database processes in preparation for installing patches.

7.2 Recommended Postinstallation Tasks

Oracle recommends that you complete the following tasks after installing Oracle Clusterware.

7.2.1 Back Up the root.sh Script

Oracle recommends that you back up the root.sh script after you complete an installation. If you install other products in the same Oracle home directory, then the Oracle Universal Installer (OUI) updates the contents of the existing root.sh script during the installation. If you require information contained in the original root.sh script, then you can recover it from the root.sh file copy.
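
A timestamped copy kept beside the original is one simple way to make this backup. The following is a minimal sketch. The function name backup_root_sh and the naming convention for the copy are assumptions, not part of the installation.

```shell
#!/bin/sh
# Hypothetical helper: keep a timestamped copy of root.sh next to the original.
# Usage: backup_root_sh <oracle_home>
backup_root_sh() {
    oracle_home=$1
    copy="$oracle_home/root.sh.$(date +%Y%m%d%H%M%S).bak"

    # -p preserves the mode and timestamps of the original script.
    cp -p "$oracle_home/root.sh" "$copy" || return 1

    # Print the path of the copy so that the caller can record it.
    echo "$copy"
}
```

For example, backup_root_sh "$ORACLE_HOME" writes a file such as root.sh.20240101120000.bak in the Oracle home directory.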

7.2.2 Run CVU Postinstallation Check

After installing Oracle Clusterware, check the status of your Oracle Clusterware installation with the command cluvfy stage -post crsinst, using the following syntax:

cluvfy stage -post crsinst -n node_list [-verbose]
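
In this syntax, node_list is a comma-separated list of node names. The following sketch builds the command line from individual node names and prints it rather than running it. The function name cluvfy_post_crsinst_cmd is hypothetical, and the -verbose option is always included here as a matter of preference.

```shell
#!/bin/sh
# Hypothetical helper: print the cluvfy postinstallation check command for
# the given nodes, joining them into the comma-separated node_list.
# Usage: cluvfy_post_crsinst_cmd <node> [<node> ...]
cluvfy_post_crsinst_cmd() {
    # printf repeats the format once for each argument.
    node_list=$(printf '%s,' "$@")
    # Drop the trailing comma.
    node_list=${node_list%,}
    echo "cluvfy stage -post crsinst -n $node_list -verbose"
}
```

For example, cluvfy_post_crsinst_cmd node1 node2 prints cluvfy stage -post crsinst -n node1,node2 -verbose, where node1 and node2 are placeholder node names.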