Sun Cluster Data Service for Sun Grid Engine Guide for Solaris OS

Installing and Configuring Sun Cluster HA for Sun Grid Engine

This chapter explains how to install and configure Sun Cluster HA for Sun Grid Engine.


Note –

Sun Grid Engine was formerly known as Sun ONE Grid Engine. In this book, references to “Sun Grid Engine” also apply to Sun ONE Grid Engine unless this book explicitly states otherwise.


This chapter contains the following procedures.

Sun Cluster HA for Sun Grid Engine Overview

Sun Grid Engine is a distributed resource management program, which runs jobs in parallel on multiple machines. To minimize the loss of work that a failure of a machine might cause, nodes in the management tier must be protected against failure. However, protection of individual execution nodes in the grid against failure is not required. Failure of an individual execution node in a grid causes only a minor loss of work.

To eliminate single points of failure in the management tier of a Sun Grid Engine system, Sun Cluster HA for Sun Grid Engine provides fault monitoring and automatic fault recovery for the following Sun Grid Engine daemons:

  • Communication daemon (sge_commd)

  • Queue master daemon (sge_qmaster)

  • Scheduling daemon (sge_schedd)

You must configure Sun Cluster HA for Sun Grid Engine as a failover service.

For conceptual information about failover data services and scalable data services, see Sun Cluster Concepts Guide for Solaris OS.

Because the management tier relies on the Sun Grid Engine file system, the NFS server that exports this file system must also be protected against failure. To eliminate single points of failure in the NFS server, use the Sun Cluster HA for NFS data service. For more information about this data service, see Sun Cluster Data Service for Network File System (NFS) Guide for Solaris OS.

When Sun Grid Engine is configured in Sun Cluster, each Sun Grid Engine component, and the NFS server that the components rely on, is protected by a Sun Cluster data service. See the following table.

Table 1–1 Protection of Sun Grid Engine Components by Sun Cluster Data Services

Sun Grid Engine Component               Data Service
--------------------------------------  ----------------------------------
Sun Grid Engine daemons:                Sun Cluster HA for Sun Grid Engine
  • Communication daemon (sge_commd)    Resource type: SUNW.gds
  • Queue master daemon (sge_qmaster)
  • Scheduling daemon (sge_schedd)

NFS server                              Sun Cluster HA for NFS
                                        Resource type: SUNW.nfs

Overview of Installing and Configuring Sun Cluster HA for Sun Grid Engine

The following table summarizes the tasks for installing and configuring Sun Cluster HA for Sun Grid Engine and provides cross-references to detailed instructions for performing these tasks. Perform the tasks in the order that they are listed in the table.

Table 1–2 Tasks for Installing and Configuring Sun Cluster HA for Sun Grid Engine

Task                                            Instructions
----------------------------------------------  ----------------------------------------------
Plan the installation                           Sun Cluster HA for Sun Grid Engine Overview
                                                Planning the Sun Cluster HA for Sun Grid Engine Installation and Configuration

Prepare the nodes and disks                     Preparing the Nodes and Disks

Install and configure Sun Grid Engine           Installing and Configuring Sun Grid Engine

Verify Sun Grid Engine installation and         Verifying the Installation and Configuration of Sun Grid Engine
  configuration

Install Sun Cluster HA for Sun Grid Engine      Installing the Sun Cluster HA for Sun Grid Engine Packages
  packages

Configure the HAStoragePlus resource type to    Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine
  work with Sun Cluster HA for Sun Grid Engine

Configure Sun Cluster HA for NFS for use with   Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine
  Sun Cluster HA for Sun Grid Engine

Register and configure Sun Cluster HA for       Registering and Configuring Sun Cluster HA for Sun Grid Engine
  Sun Grid Engine

Verify Sun Cluster HA for Sun Grid Engine       Verifying the Sun Cluster HA for Sun Grid Engine Installation and Configuration
  installation and configuration

Tune Sun Cluster HA for Sun Grid Engine fault   Tuning the Sun Cluster HA for Sun Grid Engine Fault Monitors
  monitors

Debug Sun Cluster HA for Sun Grid Engine        Debugging Sun Cluster HA for Sun Grid Engine

Planning the Sun Cluster HA for Sun Grid Engine Installation and Configuration

This section contains the information that you need to plan your Sun Cluster HA for Sun Grid Engine installation and configuration.


Note –

Before you begin, consult your Sun Grid Engine documentation for configuration restrictions and requirements that are not imposed by Sun Cluster software.


Configuration Restrictions

The configuration restrictions in the subsections that follow apply only to Sun Cluster HA for Sun Grid Engine.


Caution –

Your data service configuration might not be supported if you do not observe these restrictions.


Sun Grid Engine Shadow Daemon

Do not use the Sun Grid Engine shadow daemon. The Sun Grid Engine shadow daemon provides an optional mechanism for recovery from failures. This mechanism interferes with the automatic fault recovery that Sun Cluster provides.

Start at Boot Option

Do not choose the start at boot option when installing Sun Grid Engine. To ensure that Sun Cluster HA for Sun Grid Engine can provide fault monitoring and automatic fault recovery, Sun Grid Engine must be started only by Sun Cluster.

Configuration Requirements

The configuration requirements in this section apply only to Sun Cluster HA for Sun Grid Engine.


Caution –

If your data service configuration does not conform to these requirements, the data service configuration might not be supported.


Sun Grid Engine Software Version Requirements

Use Sun Grid Engine version 5.3.

Operating System for the Sun Grid Engine Management Tier

The Sun Grid Engine management tier must run on Sun Cluster nodes. Because Sun Cluster runs only on the Solaris Operating System, the Sun Grid Engine management tier must also run on the Solaris Operating System. This requirement applies only to the management tier. Individual execution nodes in the grid can run any operating system that Sun Grid Engine supports.

Memory Requirements

Ensure that enough free memory is available on the cluster nodes where you plan to run the Sun Grid Engine master.

The amount of free memory that is required on each cluster node depends on the number of jobs that are running on the grid. For example:

Disk Space Requirements

Ensure that you have enough disk space in the Sun Grid Engine file system and on the local disk of each node.

The disk space requirements for each type of file or directory in the Sun Grid Engine file system are listed in the following table.

File Type or Directory Type  Required Disk Space
---------------------------  -------------------------------
Binary files                 15 Mbytes for each architecture
Spool directories            30–200 Mbytes
Installation tar file        40 Mbytes

On the local disk of each node, 10–20 Mbytes of disk space is required. If you install the Sun Grid Engine software on the local disk of a node, an additional 15 Mbytes is required on that disk for the binary files.
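The figures in the table above can be added up and compared with the free space on the Sun Grid Engine file system. The following is a minimal sketch, not part of the product: the function name check_sge_fs_space is hypothetical, the spool allowance defaults to the 200-Mbyte upper bound, and the sketch assumes that the fourth field of df -k output is the available space in Kbytes, as it is in the Solaris and GNU versions of df. It does not cover the 10–20 Mbytes needed on each node's local disk.

```shell
# Hypothetical helper: estimate the space that the Sun Grid Engine file
# system needs and compare it with the free space under a directory.
#   $1  directory on the Sun Grid Engine file system, e.g. /global/gridmaster
#   $2  number of binary architectures that you install
#   $3  spool allowance in Mbytes (30-200; defaults to 200)
check_sge_fs_space() {
    dir=${1:?directory required}
    archs=${2:?number of architectures required}
    spool=${3:-200}

    # 15 Mbytes per architecture + spool directories + 40-Mbyte tar file
    need_mb=$((15 * archs + spool + 40))

    # Join the data lines in case df wraps a long device name, then take
    # field 4 (available Kbytes).
    avail_kb=$(df -k "$dir" | awk 'NR > 1 { s = s " " $0 } END { split(s, f, " "); print f[4] }')
    case $avail_kb in
        ''|*[!0-9]*) echo "cannot read free space for $dir" >&2; return 1 ;;
    esac
    avail_mb=$((avail_kb / 1024))

    echo "need $need_mb Mbytes, have $avail_mb Mbytes"
    [ "$avail_mb" -ge "$need_mb" ]
}
```

For example, check_sge_fs_space /global/gridmaster 1 checks for 255 Mbytes (one architecture and the maximum spool allowance) and returns a nonzero status if less space is free.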

Sun Cluster HA for Sun Grid Engine Configuration Requirements

Configure Sun Cluster HA for Sun Grid Engine as a failover data service. You cannot configure Sun Cluster HA for Sun Grid Engine as a scalable data service. For more information, see:

NFS Configuration for the Sun Grid Engine File System

The Sun Grid Engine file system must reside on a multihost disk. This disk must be available to every cluster node that will be used for the Sun Grid Engine administrative services.

You must use NFS to export the Sun Grid Engine file system to the noncluster nodes. The NFS server that exports this file system must also be protected against failure. To protect the NFS server against failure, use the Sun Cluster HA for NFS data service. For more information about this data service, see Sun Cluster Data Service for Network File System (NFS) Guide for Solaris OS.

Sun Cluster HA for NFS Configuration Requirements

Configure the resources for the Sun Grid Engine management tier in the same resource group as the resource for NFS. For more information, see Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine.

Dependencies Between Sun Grid Engine Components

The dependencies between Sun Grid Engine components are shown in the following table.

Table 1–3 Dependencies Between Sun Grid Engine Components

Sun Grid Engine Component                          Dependency
-------------------------------------------------  ----------------------------------------------------------
Sun Grid Engine communication daemon (sge_commd)   SUNW.HAStoragePlus resource
Sun Grid Engine queue master daemon (sge_qmaster)  Sun Grid Engine communication daemon (sge_commd) resource
Sun Grid Engine scheduling daemon (sge_schedd)     Sun Grid Engine queue master daemon (sge_qmaster) resource

These dependencies are set when you register and configure Sun Cluster HA for Sun Grid Engine. For more information, see Registering and Configuring Sun Cluster HA for Sun Grid Engine.
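The dependencies in Table 1–3 form a chain, so they also fix the order in which the resources come online. As an illustration only, the POSIX tsort utility can derive that order from the pairs in the table. The resource names are the placeholders used elsewhere in this chapter (sge-hasp-rs, sge-commd-rs, sge-qmaster-rs, sge-schedd-rs) and the function name is hypothetical; in the cluster, the order is enforced by the dependencies that the registration script sets.

```shell
# Hypothetical helper: print the resources of Table 1-3 in start order.
# Each input line "A B" means that B depends on A, so A must be online first.
sge_start_order() {
    tsort <<'EOF'
sge-hasp-rs sge-commd-rs
sge-commd-rs sge-qmaster-rs
sge-qmaster-rs sge-schedd-rs
EOF
}

# Prints sge-hasp-rs, sge-commd-rs, sge-qmaster-rs, sge-schedd-rs,
# one name per line.
sge_start_order
```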

Configuration Considerations

The configuration considerations in the subsections that follow affect the installation and configuration of Sun Cluster HA for Sun Grid Engine.

Location of the Sun Grid Engine Binary Files

You can install Sun Grid Engine on one of the following locations:

For the advantages and disadvantages of placing the Sun Grid Engine binary files on a highly available local file system or on the cluster file system, see “Configuration Guidelines for Sun Cluster Data Services” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.


Tip –

To enable the type of file system to be identified from the mount point, use a prefix that indicates the type of file system as follows:


File Systems for Spool Directories and Binary Files

The optimum distribution of spool directories and binary files among file systems depends on the grid configuration. See the following table.

Grid Configuration                      File System Configuration
--------------------------------------  -----------------------------------------------------
The execution tier contains fewer than  Use a single shared NFS file system under the root of
200 hosts.                              the Sun Grid Engine file system for the spool
                                        directories and binary files.

The execution tier contains about 200   Use a separate area on an NFS file system for the
hosts, or the applications are disk     spool directories.
intensive.

The execution tier contains more than   See the Sun Grid Engine documentation for alternative
200 hosts, or NFS performance is        grid configurations.
likely to be a problem.

Configuration Planning Questions

Use the questions in this section to plan the installation and configuration of Sun Cluster HA for Sun Grid Engine. Write the answers to these questions in the space that is provided on the data service worksheets in “Configuration Worksheets” in Sun Cluster 3.1 Data Services Planning and Administration Guide.

Preparing the Nodes and Disks

Preparing the nodes and disks modifies the configuration of the operating system to enable Sun Cluster HA for Sun Grid Engine to eliminate single points of failure in a Sun Grid Engine system.

Before you begin, ensure that the requirements in the following sections are met:

How to Prepare the Nodes and Disks

  1. Become superuser on the cluster node where you are installing Sun Grid Engine.

  2. Create an administrative user account for Sun Grid Engine.

    Either select an existing user account other than root for the grid administration, or create an account specifically for grid administration.


    Tip –

    For consistency with the Sun Grid Engine documentation, name the account sgeadmin.


  3. Create a directory for the root of the Sun Grid Engine file system.


    # mkdir sge-root-dir
    
  4. Change the owner of the root of the Sun Grid Engine file system to the administrative user whose account you created in Step 2.


    # chown sge-admin sge-root-dir
    
  5. Set the mode of the root of the Sun Grid Engine file system to drwxr-xr-x.


    # chmod 755 sge-root-dir
    
  6. Specify the port number and protocol for the sge_commd service.

    Choose an unused port number below 1024. The sge_commd service uses the Transmission Control Protocol (TCP).

    To specify the port number and protocol, add the following line to the /etc/services file.

    sge_commd	port-no/tcp
  7. For each type of host in the grid, create a plain text file that contains the names of all hosts of that type in the grid.

    The install_qmaster script uses these files when you install Sun Grid Engine. Create a separate file for each type of host in the grid:

    • Execution hosts

    • Administrative hosts

    • Submit hosts
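On a node with a long /etc/services file, the entry from Step 6 is easy to get wrong. The following sketch checks that there is exactly one sge_commd line, that it uses tcp and a port below 1024, and that no other service already claims the same TCP port. The function name check_sge_commd_port and the optional file argument are hypothetical conveniences, not part of Sun Grid Engine.

```shell
# Hypothetical helper: check the sge_commd entry in a services(4)-format file.
# Pass a copy of the file to test it; with no argument it reads /etc/services.
check_sge_commd_port() {
    file=${1:-/etc/services}

    # "name port/proto" for every line that is not a comment
    entries=$(awk '!/^[ \t]*#/ && NF >= 2 { print $1, $2 }' "$file") || return 1

    count=$(printf '%s\n' "$entries" | awk '$1 == "sge_commd" { n++ } END { print n + 0 }')
    if [ "$count" -ne 1 ]; then
        echo "expected one sge_commd entry, found $count" >&2
        return 1
    fi

    portproto=$(printf '%s\n' "$entries" | awk '$1 == "sge_commd" { print $2 }')
    port=${portproto%/*}
    proto=${portproto#*/}

    if [ "$proto" != "tcp" ]; then
        echo "sge_commd must use tcp, not $proto" >&2
        return 1
    fi
    case $port in
        ''|*[!0-9]*) echo "not a port number: $port" >&2; return 1 ;;
    esac
    if [ "$port" -ge 1024 ]; then
        echo "port $port is not below 1024" >&2
        return 1
    fi

    # Another service name on the same TCP port is a conflict.
    clash=$(printf '%s\n' "$entries" |
        awk -v p="$port/tcp" '$2 == p && $1 != "sge_commd" { print $1 }')
    if [ -n "$clash" ]; then
        echo "port $port/tcp is already used by: $clash" >&2
        return 1
    fi

    echo "sge_commd uses port $port/tcp"
}
```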


Example 1–1 Preparing the Nodes and Disks for the Installation of Sun Grid Engine

This example shows how to prepare the nodes and disks for a Sun Grid Engine installation that is to be configured as follows:

The sequence of operations for preparing the nodes and disks for the installation of Sun Grid Engine is as follows:

  1. To create the /global/gridmaster directory for the root of the Sun Grid Engine file system, the following command is run:


    # mkdir /global/gridmaster
    
  2. To change the owner of the /global/gridmaster directory to the sgeadmin user, the following command is run:


    # chown sgeadmin /global/gridmaster
    
  3. To set the mode of the /global/gridmaster directory to drwxr-xr-x, the following command is run:


    # chmod 755 /global/gridmaster
    
  4. To specify that the sge_commd service is to be provided through port 536 and TCP, the following line is added to the /etc/services file:

    sge_commd	536/tcp
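If you repeat the preparation, the four operations in this example can be wrapped in a function that is safe to run twice. This is a sketch under stated assumptions: the function name prepare_sge_root and its argument order are hypothetical, it uses mkdir -p instead of mkdir so that an existing directory is not an error, and it appends the services line only when no sge_commd entry exists yet.

```shell
# Hypothetical helper that repeats Example 1-1 and is safe to rerun.
#   $1  root of the Sun Grid Engine file system (e.g. /global/gridmaster)
#   $2  administrative user (e.g. sgeadmin)
#   $3  TCP port for sge_commd (e.g. 536)
#   $4  services file (defaults to /etc/services)
prepare_sge_root() {
    root=${1:?root directory required}
    admin=${2:?administrative user required}
    port=${3:?port number required}
    services=${4:-/etc/services}

    mkdir -p "$root" || return 1        # operation 1: create the directory
    chown "$admin" "$root" || return 1  # operation 2: set the owner
    chmod 755 "$root" || return 1       # operation 3: drwxr-xr-x

    # Operation 4: add the sge_commd line only if no sge_commd entry exists,
    # so that a second run does not duplicate it.
    if ! awk '$1 == "sge_commd" { found = 1 } END { exit !found }' "$services" 2>/dev/null; then
        printf 'sge_commd\t%s/tcp\n' "$port" >> "$services" || return 1
    fi
}
```

As superuser, prepare_sge_root /global/gridmaster sgeadmin 536 reproduces the example and edits /etc/services.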

Installing and Configuring Sun Grid Engine

The procedure that follows explains only the special requirements for installing Sun Grid Engine for use with Sun Cluster HA for Sun Grid Engine. For complete information about installing and configuring Sun Grid Engine, see your Sun Grid Engine documentation.

To enable Sun Grid Engine to run in a cluster, you must modify Sun Grid Engine to use a logical host name.

How to Install and Configure Sun Grid Engine

Before you begin, ensure that you have the host names of all hosts in the grid. Create a separate list of host names for each type of host in the grid:

  • Execution hosts

  • Administrative hosts

  • Submit hosts

  1. Become superuser on the cluster node where you are installing Sun Grid Engine.

  2. Install the SDRMcomm and SDRMsp64 packages with pkgadd.

    When you install each package, you are asked for the directory for the root of the Sun Grid Engine file system.


    Where should Sun Grid Engine 5.3 be installed [default /gridware/sge] 
  3. Specify the directory for the root of the Sun Grid Engine file system that you created in Preparing the Nodes and Disks.

  4. When you are prompted, specify the following information:

    • The name of the Sun Grid Engine administrative user whose account you created in Preparing the Nodes and Disks. The default is sgeadmin.

    • The name of the user group of the Sun Grid Engine administrative user. The default is adm.

  5. Set the SGE_ROOT environment variable to the directory for the root of the Sun Grid Engine file system that you created in Preparing the Nodes and Disks.


    # SGE_ROOT=sge-root-dir 
    # export SGE_ROOT
    
  6. Go to the directory for the root of the Sun Grid Engine file system.


    # cd sge-root-dir
    
  7. Start the script that installs the Sun Grid Engine master host.


    # ./install_qmaster
    
  8. Follow the prompts on screen to provide or confirm the following information:

    • The value of the SGE_ROOT environment variable

    • The TCP port number

    • The name of the Sun Grid Engine administrative user

    • The method that you used to install the SDRMcomm and SDRMsp64 packages

    • Details of your domain name service (DNS) domains

  9. When you are prompted, specify the range of group IDs for Sun Grid Engine to use.

    To ensure that you allocate enough group IDs, specify a range of approximately 100 group IDs, for example, 20000-20100.

    You are asked if you want to install the script that starts Sun Grid Engine at boot time.


    We can install the startup script that
    Grid Engine is started at machine boot (y/n) [y] >>   
  10. When you are asked if you want to install the script that starts Sun Grid Engine at boot time, reply no.

    To ensure that Sun Cluster HA for Sun Grid Engine can provide fault monitoring and automatic fault recovery, Sun Grid Engine must be started only by Sun Cluster.

  11. When you are prompted, specify the list of execution hosts.
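For Step 9, it can help to confirm that the proposed range, for example 20000-20100, does not overlap group IDs that are already defined on the node. The sketch below assumes that the range should not collide with existing groups in a group(4)-format file; the function name check_sge_gid_range and the file argument are hypothetical.

```shell
# Hypothetical helper: report how many IDs a gid_range such as 20000-20100
# holds, and fail if any group in a group(4)-format file falls inside it.
check_sge_gid_range() {
    range=${1:?usage: check_sge_gid_range first-last [group-file]}
    groupfile=${2:-/etc/group}
    first=${range%-*}
    last=${range#*-}

    case $first in ''|*[!0-9]*) echo "bad range: $range" >&2; return 1 ;; esac
    case $last in ''|*[!0-9]*) echo "bad range: $range" >&2; return 1 ;; esac
    if [ "$first" -gt "$last" ]; then
        echo "bad range: $range" >&2
        return 1
    fi

    # group(4) lines are name:password:gid:members
    used=$(awk -F: -v f="$first" -v l="$last" \
        '$3 ~ /^[0-9]+$/ && $3 + 0 >= f + 0 && $3 + 0 <= l + 0 { print $1 "(" $3 ")" }' "$groupfile")
    if [ -n "$used" ]; then
        # $used is unquoted on purpose: it joins several matches on one line.
        echo "range $range overlaps existing groups:" $used >&2
        return 1
    fi
    echo "$((last - first + 1)) group IDs available in $range"
}
```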

How to Enable Sun Grid Engine to Run in a Cluster

  1. Become superuser on a node in the cluster that will host Sun Grid Engine.

  2. Create a failover resource group to contain the Sun Cluster HA for Sun Grid Engine resources.

    Use the resource group that you identified when you answered the questions in Configuration Planning Questions.


    # scrgadm -a -g sge-rg \
     -y Pathprefix=sge-root-dir
    
    -g sge-rg

    Specifies that the resource group that you are creating is named sge-rg.

    -y Pathprefix=sge-root-dir

    Specifies a directory on a cluster file system that Sun Cluster HA for NFS uses to maintain administrative and status information. This directory must be the directory that you created for the root of the Sun Grid Engine file system in Preparing the Nodes and Disks.

  3. Add a resource for the Sun Grid Engine logical host name to the failover resource group that you created in Step 2.


    # scrgadm -a -L -j sge-lh-rs \
    -g sge-rg \
    -l hostlist
    
    -j sge-lh-rs

    Specifies that the resource that you are creating is named sge-lh-rs

    -g sge-rg

    Specifies that the logical host name resource is to be added to the failover resource group that you created in Step 2

    -l hostlist

    Specifies a comma-separated list of host names that are to be made available by this logical host name resource
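Steps 2 and 3 can be scripted so that the commands are reviewed before they run. In this sketch the function name sge_create_rg and the RUN variable are hypothetical: with RUN=echo the function prints the two scrgadm commands from this procedure instead of running them. The check for spaces in the host list follows from the requirement that -l takes a comma-separated list.

```shell
# Hypothetical helper for Steps 2 and 3. With RUN=echo it prints the
# commands instead of running them.
#   $1  resource group (sge-rg)       $2  Pathprefix (sge-root-dir)
#   $3  logical host name resource    $4  comma-separated host list
sge_create_rg() {
    rg=${1:?resource group required}
    root=${2:?Pathprefix directory required}
    lhrs=${3:?logical host name resource required}
    hosts=${4:?host list required}

    case $hosts in
        *' '*) echo "host list must be comma-separated, without spaces" >&2; return 1 ;;
    esac

    ${RUN:-} scrgadm -a -g "$rg" -y Pathprefix="$root" || return 1
    ${RUN:-} scrgadm -a -L -j "$lhrs" -g "$rg" -l "$hosts"
}
```

Run it once with RUN=echo set to review the output, then run it again as superuser without RUN. The host name sge-lh used in testing is only a placeholder.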

Verifying the Installation and Configuration of Sun Grid Engine

Before you install the Sun Cluster HA for Sun Grid Engine packages, verify that the Sun Grid Engine software is correctly installed and configured to run in a cluster. This verification does not verify that the Sun Grid Engine application is highly available because the Sun Cluster HA for Sun Grid Engine data service is not yet installed.


Note –

If any step in this procedure fails, see your Sun Grid Engine documentation for more information about how to verify the Sun Grid Engine installation.


How to Verify the Installation and Configuration of Sun Grid Engine

You verify the installation and configuration of Sun Grid Engine by submitting a dummy job and checking that the required processes are running.

  1. Log in to the master host as the administrative user whose account you created in Preparing the Nodes and Disks.

  2. Set the SGE_ROOT environment variable to the directory for the root of the Sun Grid Engine file system that you created in Preparing the Nodes and Disks.


    $ SGE_ROOT=sge-root-dir 
    $ export SGE_ROOT
    
  3. Run the settings script in the current shell to set up the environment that Sun Grid Engine commands require.


    $ . $SGE_ROOT/default/common/settings.sh
    
  4. Submit a dummy job to Sun Grid Engine.


    $ qsub $SGE_ROOT/examples/jobs/sleeper.sh
    your job 1 (*Sleeper*) has been submitted 
  5. On the master host, confirm that these processes are running:

    • sge_commd

    • sge_qmaster

    • sge_schedd


    #  ps -ef | grep sge_ 
    root  429  1  0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_commd
    root  429  1  0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_qmaster
    root  429  1  0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_schedd
  6. View the global configuration of the grid.

    • If you are using the command line, type the following command:


      $ qconf -sconf
      
    • If you are using the QMON graphical user interface (GUI), select Cluster Configuration.

  7. On at least one execution host, confirm that these processes are running:

    • sge_commd

    • sge_execd


    #  ps -ef | grep sge_ 
    root  439  1 0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_commd
    root  451  1 0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_execd
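The ps -ef | grep sge_ check in Steps 5 and 7 also lists the grep process itself. The sketch below, with the hypothetical name check_sge_daemons, reads ps -ef output on standard input and reports each named daemon by comparing the last path component of every field with the daemon name, so the grep line does not count as a match.

```shell
# Hypothetical helper: read "ps -ef" output on standard input and report
# whether each named daemon is running. A field matches when its last
# path component equals the daemon name, so the "grep sge_" line is ignored.
check_sge_daemons() {
    ps_out=$(cat)
    missing=0
    for daemon in "$@"; do
        if printf '%s\n' "$ps_out" | awk -v d="$daemon" '
            { for (i = 1; i <= NF; i++) { n = split($i, p, "/"); if (p[n] == d) found = 1 } }
            END { exit !found }'; then
            echo "running: $daemon"
        else
            echo "not running: $daemon" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On the master host, run ps -ef | check_sge_daemons sge_commd sge_qmaster sge_schedd; on an execution host, pass sge_commd sge_execd instead.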

Installing the Sun Cluster HA for Sun Grid Engine Packages

If you did not install the Sun Cluster HA for Sun Grid Engine packages during your initial Sun Cluster installation, perform this procedure to install the packages. Perform this procedure on each cluster node where you are installing the Sun Cluster HA for Sun Grid Engine packages. To complete this procedure, you need the Sun Java Enterprise System Accessory CD Volume 3.

If you are installing more than one data service simultaneously, perform the procedure in “Installing the Software” in Sun Cluster Software Installation Guide for Solaris OS.

Install the Sun Cluster HA for Sun Grid Engine packages by using one of the following installation tools:

  • The Web Start program

  • The scinstall utility


Note –

The Web Start program is not available in releases earlier than Sun Cluster 3.1 Data Services 10/03.


How to Install the Sun Cluster HA for Sun Grid Engine Packages by Using the Web Start Program

You can run the Web Start program with a command-line interface (CLI) or with a graphical user interface (GUI). The content and sequence of instructions in the CLI and the GUI are similar. For more information about the Web Start program, see the installer(1M) man page.

  1. On the cluster node where you are installing the Sun Cluster HA for Sun Grid Engine packages, become superuser.

  2. (Optional) If you intend to run the Web Start program with a GUI, ensure that your DISPLAY environment variable is set.

  3. Load the Sun Java Enterprise System Accessory CD Volume 3 into the CD-ROM drive.

    If the Volume Management daemon vold(1M) is running and configured to manage CD-ROM devices, it automatically mounts the CD-ROM on the /cdrom/cdrom0 directory.

  4. Change to the Sun Cluster HA for Sun Grid Engine component directory of the CD-ROM.

    The Web Start program for the Sun Cluster HA for Sun Grid Engine data service resides in this directory.


    # cd /cdrom/cdrom0/\
    components/SunCluster_HA_SUN_GRID_ENG_3.1
    
  5. Start the Web Start program.


    # ./installer
    
  6. When you are prompted, select the type of installation.

    • To install only the C locale, select Typical.

    • To install other locales, select Custom.

  7. Follow instructions on the screen to install the Sun Cluster HA for Sun Grid Engine packages on the node.

    After the installation is finished, the Web Start program provides an installation summary. This summary enables you to view logs that the Web Start program created during the installation. These logs are located in the /var/sadm/install/logs directory.

  8. Exit the Web Start program.

  9. Unload the Sun Java Enterprise System Accessory CD Volume 3 from the CD-ROM drive.

    1. To ensure that the CD-ROM is not being used, change to a directory that does not reside on the CD-ROM.

    2. Eject the CD-ROM.


      # eject cdrom
      

How to Install the Sun Cluster HA for Sun Grid Engine Packages by Using the scinstall Utility

  1. Become superuser on the cluster node, and load the Sun Java Enterprise System Accessory CD Volume 3 into the CD-ROM drive.

  2. Run the scinstall utility with no options.

    This step starts the scinstall utility in interactive mode.

  3. Select the menu option, Add Support for New Data Service to This Cluster Node.

    The scinstall utility prompts you for additional information.

  4. Provide the path to the Sun Java Enterprise System Accessory CD Volume 3.

    The utility refers to the CD as the “data services cd.”

  5. Specify the data service to install.

    The scinstall utility lists the data service that you selected and asks you to confirm your choice.

  6. Exit the scinstall utility.

  7. Unload the CD from the drive.

Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine

For maximum availability of the Sun Grid Engine application, resources that Sun Cluster HA for Sun Grid Engine requires must be available before the Sun Grid Engine management tier is started. An example of such a resource is the Sun Grid Engine file system. To ensure that these resources are available, configure the HAStoragePlus resource type to work with Sun Cluster HA for Sun Grid Engine.

For information about the relationship between resource groups and disk device groups, see “Relationship Between Resource Groups and Disk Device Groups” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Configuring the HAStoragePlus resource type to work with Sun Cluster HA for Sun Grid Engine involves the following operations:

How to Register and Configure an HAStoragePlus Resource

  1. Become superuser on a node in the cluster that will host Sun Grid Engine.

  2. Register the SUNW.HAStoragePlus resource type.


    # scrgadm -a -t SUNW.HAStoragePlus
    
  3. Add an HAStoragePlus resource for the Sun Grid Engine file system to the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.


    # scrgadm -a -j sge-hasp-rs \
    -g sge-rg \
    -t SUNW.HAStoragePlus \
    -x FilesystemMountPoints=sge-root-dir
    
    -j sge-hasp-rs

    Specifies that the resource that you are creating is named sge-hasp-rs

    -g sge-rg

    Specifies that the resource is to be added to the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster

    -x FilesystemMountPoints=sge-root-dir

    Specifies that the mount point for this file system is the root of the Sun Grid Engine file system
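Before you create the HAStoragePlus resource, you can confirm that each node has a vfstab entry for the mount point that you pass in FilesystemMountPoints. This sketch assumes that HAStoragePlus requires such an entry on every node that can master the resource group; confirm this in the SUNW.HAStoragePlus(5) man page for your release. The function name check_vfstab_entry is hypothetical.

```shell
# Hypothetical helper: print the vfstab(4) entry whose mount point (field 3)
# matches $1, and fail if there is none. Defaults to /etc/vfstab.
# Assumes HAStoragePlus needs this entry on each potential master node.
check_vfstab_entry() {
    mnt=${1:?mount point required}
    vfstab=${2:-/etc/vfstab}
    awk -v m="$mnt" '
        /^[ \t]*#/ { next }
        NF >= 7 && $3 == m { print; found = 1 }
        END { exit !found }' "$vfstab"
}
```

Run check_vfstab_entry /global/gridmaster on each node that can host the sge-rg resource group.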

Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine

You must use NFS to export the Sun Grid Engine file system to the noncluster nodes. The NFS server that exports this file system must also be protected against failure. To protect the NFS server against failure, use the Sun Cluster HA for NFS data service.

The procedure that follows explains only the special requirements for using Sun Cluster HA for NFS with Sun Cluster HA for Sun Grid Engine. For complete information about installing and configuring Sun Cluster HA for NFS, see Sun Cluster Data Service for Network File System (NFS) Guide for Solaris OS.

How to Configure Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine


Note –

Commands in this procedure assume that you have set the $SGE_ROOT environment variable to specify the root of the Sun Grid Engine file system.


  1. Register the SUNW.nfs resource type.


    # scrgadm -a -t SUNW.nfs
    
  2. From any cluster node, create a directory for NFS configuration files.

    Create the directory under the root of the Sun Grid Engine file system. Name the directory SUNW.nfs.


    # mkdir -p $SGE_ROOT/SUNW.nfs
    
  3. In the directory that you created in Step 2, create a file that contains the share command for the root of the Sun Grid Engine file system.

    Name the file dfstab.sge-nfs-rs, where sge-nfs-rs is the name of the NFS resource that you will create in Step 4.


    # echo "share -F nfs -o rw $SGE_ROOT" \
     > $SGE_ROOT/SUNW.nfs/dfstab.sge-nfs-rs
    
  4. Add a SUNW.nfs resource to the failover resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.


    # scrgadm -a -j sge-nfs-rs \
    -g sge-rg \
    -t SUNW.nfs \
    -y Resource_dependencies=sge-hasp-rs
    

Example 1–2 Creating a dfstab File for the Root of the Sun Grid Engine File System

This example shows the command for creating a dfstab file for the root of the Sun Grid Engine file system.


# echo "share -F nfs -o rw /global/gridmaster" \
 > /global/gridmaster/SUNW.nfs/dfstab.sge-nfs-rs
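Steps 2 and 3, which this example shows for /global/gridmaster, can be combined in one function. This is a sketch: the function name write_sge_dfstab is hypothetical, and it overwrites the dfstab file rather than appending to it, so that rerunning it leaves a single share command.

```shell
# Hypothetical helper for Steps 2 and 3: create $1/SUNW.nfs and write the
# dfstab file for the NFS resource named $2.
write_sge_dfstab() {
    root=${1:?root of the Sun Grid Engine file system required}
    nfsrs=${2:?name of the SUNW.nfs resource required}
    mkdir -p "$root/SUNW.nfs" || return 1
    # ">" rather than ">>" so that a rerun leaves exactly one share line
    echo "share -F nfs -o rw $root" > "$root/SUNW.nfs/dfstab.$nfsrs"
}
```

With SGE_ROOT set as the note at the start of the procedure assumes, write_sge_dfstab "$SGE_ROOT" sge-nfs-rs produces the file that this example shows.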

Registering and Configuring Sun Cluster HA for Sun Grid Engine

Before you perform this procedure, ensure that the Sun Cluster HA for Sun Grid Engine data service packages are installed.

Use the configuration and registration files in the /opt/SUNWscsge/util directory to register the Sun Cluster HA for Sun Grid Engine resources. The files define the dependencies that are required between Sun Grid Engine components. For information about these dependencies, see Dependencies Between Sun Grid Engine Components. For a listing of these files, see Appendix A, Files for Configuring and Removing Sun Cluster HA for Sun Grid Engine Resources.

Registering and configuring Sun Cluster HA for Sun Grid Engine involves the tasks that are explained in the following sections:

  1. Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources

  2. How to Create and Enable Sun Cluster HA for Sun Grid Engine Resources

Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources

Sun Cluster HA for Sun Grid Engine provides scripts that automate the process of configuring and removing Sun Cluster HA for Sun Grid Engine resources. These scripts obtain configuration parameters from the sge_config file in the /opt/SUNWscsge/util/ directory. To specify configuration parameters for Sun Cluster HA for Sun Grid Engine resources, edit the sge_config file.

Each configuration parameter in the sge_config file is defined as a keyword-value pair. The sge_config file already contains the required keywords and equals signs. For more information, see Listing of sge_config. When you edit the sge_config file, add the required value to each keyword. Use the values that you identified in Configuration Planning Questions.

The keyword-value pairs in the sge_config file are as follows:

COMMDRS=sge-commd-rs
QMASTERRS=sge-qmaster-rs
SCHEDDRS=sge-schedd-rs
RG=sge-rg
LH=sge-lh-rs
SGE_ROOT=sge-root-dir
SGE_CELL=cell-name
PORT=portno
USE_INTERNAL_DEP=FALSE|TRUE

The meaning and permitted values of the keywords in the sge_config file are as follows:

COMMDRS=sge-commd-rs

Specifies the name that you are assigning to the resource for the Sun Grid Engine communication daemon sge_commd.

QMASTERRS=sge-qmaster-rs

Specifies the name that you are assigning to the resource for the Sun Grid Engine queue master daemon sge_qmaster.

SCHEDDRS=sge-schedd-rs

Specifies the name that you are assigning to the resource for the Sun Grid Engine scheduling daemon sge_schedd.

RG=sge-rg

Specifies the name of the resource group that contains the Sun Cluster HA for Sun Grid Engine resources. This name must be the name that you assigned when you created the resource group as explained in How to Enable Sun Grid Engine to Run in a Cluster.

LH=sge-lh-rs

Specifies the name of the logical host name resource for Sun Grid Engine. This name must be the name that you assigned when you created the resource in How to Enable Sun Grid Engine to Run in a Cluster.

SGE_ROOT=sge-root-dir

Specifies the root directory of the Sun Grid Engine file system. This directory must be the directory that you created for root of the Sun Grid Engine file system in Preparing the Nodes and Disks.

SGE_CELL=cell-name

Specifies the cell that Sun Grid Engine references.

PORT=portno

Sun Cluster HA for Sun Grid Engine ignores this port number, so you can specify any integer for PORT. In the sge_config file, PORT is preset to 1234.

USE_INTERNAL_DEP=FALSE|TRUE

Specifies whether dependencies are to be set between resource groups. The possible values for this keyword are as follows:

FALSE

Specifies that dependencies are not to be set between resource groups.

TRUE

Specifies that dependencies are to be set between resource groups.

In the sge_config file, USE_INTERNAL_DEP is preset to FALSE.


Example 1–3 Sample sge_config File

This example shows an sge_config file in which configuration parameters are set as follows:

COMMDRS=sge_commd-rs
QMASTERRS=sge_qmaster-rs
SCHEDDRS=sge_schedd-rs
RG=sge-rg
LH=sge-lh-rs
SGE_ROOT=/global/gridmaster
SGE_CELL=default
PORT=1234
USE_INTERNAL_DEP=FALSE
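Before you run the registration script, you can catch empty or malformed values in sge_config. This sketch assumes the one-keyword-per-line KEY=value layout shown above and checks only what this section states: every keyword has a value, PORT is an integer, and USE_INTERNAL_DEP is TRUE or FALSE. The function name check_sge_config is hypothetical and is not shipped in /opt/SUNWscsge/util.

```shell
# Hypothetical helper: check an sge_config file before you run sge_register.
# Assumes KEY=value lines; checks that every keyword has a value, that PORT
# is an integer, and that USE_INTERNAL_DEP is TRUE or FALSE.
check_sge_config() {
    cfg=${1:-/opt/SUNWscsge/util/sge_config}
    status=0
    for key in COMMDRS QMASTERRS SCHEDDRS RG LH SGE_ROOT SGE_CELL PORT USE_INTERNAL_DEP; do
        # Value of the last KEY= line, with surrounding blanks removed
        value=$(awk -F= -v k="$key" '
            $1 == k { v = substr($0, length(k) + 2) }
            END { gsub(/^[ \t]+|[ \t]+$/, "", v); print v }' "$cfg")
        if [ -z "$value" ]; then
            echo "$key has no value" >&2
            status=1
            continue
        fi
        case $key=$value in
            PORT=*[!0-9]*) echo "PORT is not an integer: $value" >&2; status=1 ;;
            USE_INTERNAL_DEP=TRUE|USE_INTERNAL_DEP=FALSE) ;;
            USE_INTERNAL_DEP=*) echo "USE_INTERNAL_DEP must be TRUE or FALSE" >&2; status=1 ;;
        esac
    done
    return "$status"
}
```

For example, check_sge_config /opt/SUNWscsge/util/sge_config && ./sge_register runs the registration script only if the file passes these checks.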

How to Create and Enable Sun Cluster HA for Sun Grid Engine Resources

Before you begin, ensure that you have edited the sge_config file to specify configuration parameters for Sun Cluster HA for Sun Grid Engine resources. For more information, see Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources.

  1. Register the SUNW.gds resource type.


    # scrgadm -a -t SUNW.gds
    
  2. Go to the directory that contains the script for creating the Sun Grid Engine resources.


    # cd /opt/SUNWscsge/util/
    
  3. Run the script that creates the Sun Grid Engine resources.


    # ./sge_register
    
  4. Bring online the failover resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.

    This resource group contains the following resources:

    • Logical host name resource

    • HAStoragePlus resource

    • NFS resource

    • Sun Grid Engine application resources


    # scswitch -Z -g sge-rg
    
    -g sge-rg

    Specifies that the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster is to be brought online.
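If you prefer to script the procedure, the following sketch runs steps 1 through 4 in order and stops at the first command that fails. The run_cmd helper and the DRY_RUN variable are hypothetical conveniences and are not Sun Cluster features. When DRY_RUN=1, the function prints the commands instead of running them so that you can review them first. The sketch assumes that the SUNW.gds resource type is not yet registered, because it runs step 1 unconditionally.

```shell
#!/bin/sh
# run_cmd CMD [ARGS...]  (hypothetical helper)
# With DRY_RUN=1, print the command; otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_sge_resources RG
#   RG: the failover resource group that you created in "How to Enable
#       Sun Grid Engine to Run in a Cluster" (sge-rg in this chapter)
create_sge_resources() {
    rg=$1
    # Step 1: register the SUNW.gds resource type.
    run_cmd scrgadm -a -t SUNW.gds || return 1
    # Steps 2 and 3: run sge_register from its directory. The subshell
    # leaves the caller's working directory unchanged.
    ( run_cmd cd /opt/SUNWscsge/util && run_cmd ./sge_register ) || return 1
    # Step 4: bring the resource group online.
    run_cmd scswitch -Z -g "$rg"
}
```

For example, `DRY_RUN=1; create_sge_resources sge-rg` prints the commands of the procedure. To run the commands as superuser, unset DRY_RUN and call the function again.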

Setting Sun Cluster HA for Sun Grid Engine Extension Properties

Extension properties for Sun Cluster HA for Sun Grid Engine resources are set when you run the script that creates these resources. You need to set these properties only if you require values other than the values that are set by the script. For information about Sun Cluster HA for Sun Grid Engine extension properties, see the SUNW.gds(5) man page. You can update some extension properties dynamically. You can update other properties, however, only when you create or disable a resource. The Tunable entry indicates when you can update a property.

To update an extension property of a resource, run the scrgadm(1M) command with the following option to modify the resource:


-x property=value

property

Identifies the extension property that you are setting

value

Specifies the value to which you are setting the extension property
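The -x option is used with the -c (change) form of scrgadm, which uses -j to identify the resource to modify. The following sketch wraps that form in a hypothetical helper named set_ext_prop. The property Probe_timeout, the value 60, and the resource name sge_qmaster-rs from Example 1–3 are illustrative only. Check the Tunable entry in the SUNW.gds(5) man page first. If the property can be updated only when the resource is disabled, disable the resource with scswitch -n -j before you change the property, then enable it again with scswitch -e -j.

```shell
#!/bin/sh
# run_cmd: with DRY_RUN=1 print the command instead of running it
# (hypothetical convenience, not a Sun Cluster feature).
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_ext_prop RESOURCE PROPERTY VALUE  (hypothetical helper)
# Changes one extension property of an existing resource.
set_ext_prop() {
    if [ $# -ne 3 ]; then
        echo "usage: set_ext_prop resource property value" >&2
        return 2
    fi
    run_cmd scrgadm -c -j "$1" -x "$2=$3"
}
```

For example, `set_ext_prop sge_qmaster-rs Probe_timeout 60` runs `scrgadm -c -j sge_qmaster-rs -x Probe_timeout=60`.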

You can also use the procedures in “Administering Data Service Resources” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS to configure resources after the resources are created.

Verifying the Sun Cluster HA for Sun Grid Engine Installation and Configuration

After you install, register, and configure Sun Cluster HA for Sun Grid Engine, verify the installation and configuration. This verification determines whether the Sun Cluster HA for Sun Grid Engine data service makes the Sun Grid Engine application highly available.

How to Verify the Sun Cluster HA for Sun Grid Engine Installation and Configuration

  1. Become superuser on a node that will host Sun Grid Engine.

  2. Verify that all Sun Grid Engine resources are online.


    # scstat
    
  3. If a Sun Grid Engine resource is not online, enable the resource.


    # scswitch -e -j sge-rs
    
  4. Switch the Sun Grid Engine resource group to another cluster node.


    # scswitch -z -g sge-rg -h node
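Steps 2 and 4 can be scripted as in the following sketch. The sketch displays the status of the resource groups, switches the group, and then displays the status again so that you can confirm that the Sun Grid Engine resources came online on the new node. The sketch uses scstat -g, which limits the output to resource groups and their resources. The names verify_sge_switchover, run_cmd, and DRY_RUN, and the node name phys-schost-2 in the usage example, are hypothetical. Step 3 is omitted because it applies only when a resource is not online.

```shell
#!/bin/sh
# run_cmd: with DRY_RUN=1 print the command instead of running it
# (hypothetical convenience, not a Sun Cluster feature).
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# verify_sge_switchover RG NODE  (hypothetical helper)
verify_sge_switchover() {
    rg=$1
    node=$2
    # Step 2: show where the resource group and its resources are online.
    run_cmd scstat -g || return 1
    # Step 4: switch the resource group to NODE.
    run_cmd scswitch -z -g "$rg" -h "$node" || return 1
    # Show the status again; the resources should now be online on NODE.
    run_cmd scstat -g
}
```

For example, run `verify_sge_switchover sge-rg phys-schost-2` as superuser, then repeat the command with each remaining node that can host the resource group.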
    

Tuning the Sun Cluster HA for Sun Grid Engine Fault Monitors

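The Sun Cluster HA for Sun Grid Engine fault monitors verify that the following daemons are running correctly:

• Sun Grid Engine communication daemon sge_commd

• Sun Grid Engine master daemon sge_qmaster

• Sun Grid Engine scheduling daemon sge_schedd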

Each Sun Cluster HA for Sun Grid Engine fault monitor is contained in the resource that represents a Sun Grid Engine component. You create these resources when you register and configure Sun Cluster HA for Sun Grid Engine. For more information, see Registering and Configuring Sun Cluster HA for Sun Grid Engine.

System properties and extension properties of these resources control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Sun Cluster installations. Therefore, you should tune the Sun Cluster HA for Sun Grid Engine fault monitor only if you need to modify this preset behavior.

For information about the tasks that are involved in tuning the Sun Cluster HA for Sun Grid Engine fault monitors, see “Tuning Fault Monitors for Sun Cluster Data Services” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.
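As one example of tuning, the standard resource property Thorough_probe_interval controls the interval between thorough fault monitor probes. You change this property with the -y option of scrgadm -c. The following sketch is hypothetical. The function name, the resource sge_qmaster-rs, and the value of 120 seconds are illustrative, not recommendations. Before you change the interval, read the referenced tuning chapter to learn how this property relates to Retry_count and Retry_interval.

```shell
#!/bin/sh
# run_cmd: with DRY_RUN=1 print the command instead of running it
# (hypothetical convenience, not a Sun Cluster feature).
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_probe_interval RESOURCE SECONDS  (hypothetical helper)
# Sets the standard Thorough_probe_interval property of one resource.
set_probe_interval() {
    rs=$1
    secs=$2
    case "$secs" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 2 ;;
    esac
    run_cmd scrgadm -c -j "$rs" -y Thorough_probe_interval="$secs"
}
```

For example, `set_probe_interval sge_qmaster-rs 120` runs `scrgadm -c -j sge_qmaster-rs -y Thorough_probe_interval=120`.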

Debugging Sun Cluster HA for Sun Grid Engine

The config file in the /opt/SUNWscsge/etc directory enables you to activate debugging on a particular node, either for all Sun Grid Engine resources or for a specific Sun Grid Engine resource. To enable debugging for Sun Cluster HA for Sun Grid Engine throughout the cluster, repeat this procedure on all nodes.

How to Activate Debugging for Sun Cluster HA for Sun Grid Engine

  1. Determine whether debugging for Sun Cluster HA for Sun Grid Engine is active.

    If debugging is inactive, daemon.notice is set in the file /etc/syslog.conf.


    # grep daemon /etc/syslog.conf
    *.err;kern.debug;daemon.notice;mail.crit        /var/adm/messages
    *.alert;kern.err;daemon.err                     operator
    #
  2. If debugging is inactive, edit the /etc/syslog.conf file to change daemon.notice to daemon.debug.

  3. Confirm that debugging for Sun Cluster HA for Sun Grid Engine is active.

    If debugging is active, daemon.debug is set in the file /etc/syslog.conf.


    # grep daemon /etc/syslog.conf
    *.err;kern.debug;daemon.debug;mail.crit        /var/adm/messages
    *.alert;kern.err;daemon.err                    operator
    #
  4. Send a hangup signal (SIGHUP) to the syslogd daemon so that it rereads the /etc/syslog.conf file.


    # pkill -1 syslogd
    
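  5. Edit the /opt/SUNWscsge/etc/config file to change DEBUG= to DEBUG=ALL or DEBUG=sge-rs. DEBUG=ALL activates debugging for all Sun Grid Engine resources, and DEBUG=sge-rs activates debugging only for the resource sge-rs.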


    # cat /opt/SUNWscsge/etc/config
    #
    # Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
    # Use is subject to license terms.
    #
    # Usage:
    #       DEBUG=<RESOURCE_NAME> or ALL
    #
    DEBUG=ALL
    #

    Note –

    To deactivate debugging, reverse the preceding steps.
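Steps 2 and 5 edit two files, and you can script both edits. The following sketch is hypothetical because enable_sge_debug is not part of SUNWscsge. The function takes the file paths as arguments so that you can test it on copies first. The function does not perform step 4, so run pkill -1 syslogd after the function succeeds. Run the function on each node where you want debugging.

```shell
#!/bin/sh
# enable_sge_debug SYSLOG_CONF SGE_CONFIG TARGET  (hypothetical helper)
#   SYSLOG_CONF: normally /etc/syslog.conf
#   SGE_CONFIG:  normally /opt/SUNWscsge/etc/config
#   TARGET:      ALL, or the name of one Sun Grid Engine resource
# Assumes that TARGET contains no '/' or '&' characters, because these
# characters are special in the sed replacement below.
enable_sge_debug() {
    syslog_conf=$1
    sge_conf=$2
    target=$3

    # Step 2: change daemon.notice to daemon.debug. Write the result to
    # a temporary file, then copy it back so that the original file
    # keeps its owner and permissions.
    sed 's/daemon\.notice/daemon.debug/' "$syslog_conf" > "$syslog_conf.tmp" || return 1
    cp "$syslog_conf.tmp" "$syslog_conf" || return 1
    rm -f "$syslog_conf.tmp"

    # Step 5: replace the uncommented DEBUG= line with DEBUG=TARGET.
    # Comment lines such as "#  DEBUG=<RESOURCE_NAME> or ALL" are left
    # unchanged because they do not start with DEBUG=.
    sed "s/^DEBUG=.*/DEBUG=$target/" "$sge_conf" > "$sge_conf.tmp" || return 1
    cp "$sge_conf.tmp" "$sge_conf" || return 1
    rm -f "$sge_conf.tmp"
}
```

For example, as superuser: `enable_sge_debug /etc/syslog.conf /opt/SUNWscsge/etc/config ALL && pkill -1 syslogd`.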