Sun Cluster Data Service for Sun Grid Engine Guide for Solaris OS

Installing and Configuring Sun Cluster HA for Sun Grid Engine

This chapter explains how to install and configure Sun Cluster HA for Sun Grid Engine.


Note –

Sun Grid Engine was formerly known as Sun ONE Grid Engine. In this book, references to Sun Grid Engine also apply to Sun ONE Grid Engine unless this book explicitly states otherwise.


This chapter contains the following sections.

Sun Cluster HA for Sun Grid Engine Overview

Sun Grid Engine is a distributed resource management program, which runs jobs in parallel on multiple machines. To minimize the loss of work that a failure of a machine might cause, nodes in the management tier must be protected against failure. However, protection of individual execution nodes in the grid against failure is not required. Failure of an individual execution node in a grid causes only a minor loss of work.

To eliminate single points of failure in the management tier of a Sun Grid Engine system, Sun Cluster HA for Sun Grid Engine provides fault monitoring and automatic fault recovery for the following Sun Grid Engine daemons:

  • Queue master daemon (sge_qmaster)

  • Scheduling daemon (sge_schedd)

You must configure Sun Cluster HA for Sun Grid Engine as a failover service.

For conceptual information about failover data services and scalable data services, see Sun Cluster Concepts Guide for Solaris OS.

Because the management tier relies on the Sun Grid Engine file system, the NFS server that exports this file system must also be protected against failure. To eliminate single points of failure in the NFS server, use the Sun Cluster HA for NFS data service. For more information about this data service, see Sun Cluster Data Service for NFS Guide for Solaris OS.

Each component of Sun Grid Engine has a data service that protects the component when the component is configured in Sun Cluster. See the following table.

Table 1 Protection of Sun Grid Engine Components by Sun Cluster Data Services

Sun Grid Engine Component                Data Service

Sun Grid Engine daemons:                 Sun Cluster HA for Sun Grid Engine
  • Queue master daemon (sge_qmaster)    The resource type is SUNW.gds.
  • Scheduling daemon (sge_schedd)

NFS server                               Sun Cluster HA for NFS
                                         The resource type is SUNW.nfs.

Overview of Installing and Configuring Sun Cluster HA for Sun Grid Engine

The following table summarizes the tasks for installing and configuring Sun Cluster HA for Sun Grid Engine and provides cross-references to detailed instructions for performing these tasks. Perform the tasks in the order that they are listed in the table.

Table 2 Tasks for Installing and Configuring Sun Cluster HA for Sun Grid Engine

Task 

Instructions 

Plan the installation 

Sun Cluster HA for Sun Grid Engine Overview

Planning the Sun Cluster HA for Sun Grid Engine Installation and Configuration

Prepare the nodes and disks 

Preparing the Nodes and Disks

Install and configure Sun Grid Engine 

Installing and Configuring Sun Grid Engine

Verify the Sun Grid Engine installation and configuration 

Verifying the Installation and Configuration of Sun Grid Engine

Install Sun Cluster HA for Sun Grid Engine packages 

Installing the Sun Cluster HA for Sun Grid Engine Packages

Configure the HAStoragePlus resource type to work with Sun Cluster HA for Sun Grid Engine

Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine

Configure Sun Cluster HA for NFS for use with Sun Cluster HA for Sun Grid Engine 

Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine

Register and configure Sun Cluster HA for Sun Grid Engine 

Registering and Configuring Sun Cluster HA for Sun Grid Engine

Verify Sun Cluster HA for Sun Grid Engine installation and configuration 

Verifying the Sun Cluster HA for Sun Grid Engine Installation and Configuration

Tune Sun Cluster HA for Sun Grid Engine fault monitors 

Tuning the Sun Cluster HA for Sun Grid Engine Fault Monitors

Debug Sun Cluster HA for Sun Grid Engine 

Debugging Sun Cluster HA for Sun Grid Engine

Planning the Sun Cluster HA for Sun Grid Engine Installation and Configuration

This section contains the information that you need to plan your Sun Cluster HA for Sun Grid Engine installation and configuration.


Note –

Before you begin, consult your Sun Grid Engine documentation for configuration restrictions and requirements that are not imposed by Sun Cluster software.


Configuration Restrictions

The configuration restrictions in the subsections that follow apply only to Sun Cluster HA for Sun Grid Engine.


Caution –

Your data service configuration might not be supported if you do not observe these restrictions.


Sun Grid Engine Shadow Daemon

Do not use the Sun Grid Engine shadow daemon. The Sun Grid Engine shadow daemon provides an optional mechanism for recovery from failures. This mechanism interferes with the automatic fault recovery that Sun Cluster provides.

Sun Grid Engine Berkeley DB Spooling Server

Do not choose the option to use a Berkeley DB spooling server. Choose either the classic spooling method or the local Berkeley DB spooling method. Currently, the Berkeley DB spooling server cannot be configured for high availability within the Sun Cluster framework.

Start at Boot Option

Do not choose the start at boot option when installing Sun Grid Engine. To ensure that Sun Cluster HA for Sun Grid Engine can provide fault monitoring and automatic fault recovery, Sun Grid Engine must be started only by Sun Cluster.

Configuration Requirements

The configuration requirements in this section apply only to Sun Cluster HA for Sun Grid Engine.


Caution –

If your data service configuration does not conform to these requirements, the data service configuration might not be supported.


Sun Grid Engine Software Version Requirements

Use Sun Grid Engine version 6.0. Apply the most recent available patches to the Sun Grid Engine software.

Operating System for the Sun Grid Engine Management Tier

The Sun Grid Engine management tier must run on Sun Cluster nodes. Because Sun Cluster runs only on the Solaris Operating System, the Sun Grid Engine management tier must also run on the Solaris Operating System. However, Sun Grid Engine supports other operating systems. Therefore, this requirement applies only to the management tier, not to individual execution nodes in the grid.

Memory Requirements

Ensure that enough free memory is available on the cluster nodes where you plan to run the Sun Grid Engine master.

The amount of free memory that is required on each cluster node depends on the number of jobs that are running on the grid. For example:

Disk Space Requirements

Ensure that you have enough disk space in the Sun Grid Engine file system and on the local disk of each node.

The disk space requirements for each type of file or directory in the Sun Grid Engine file system are listed in the following table.

File Type or Directory Type    Required Disk Space

Binary files                   15 Mbytes for each architecture
Spool directories              30–200 Mbytes
Installation tar file          40 Mbytes

On the local disk of each node, 10–20 Mbytes of disk space are required. If you are installing the Sun Grid Engine software on the local disk of a node, 15 Mbytes of disk space are additionally required for the binary files.
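As a rough worked example of the figures in the preceding table, the following sketch adds up the space that might be needed in the Sun Grid Engine file system. The function name sge_fs_space_mb and its arguments are hypothetical. The sketch assumes a POSIX-compatible shell and assumes that you supply your own spool estimate from the 30–200 Mbyte range.

```shell
# Estimate the space (in Mbytes) needed in the Sun Grid Engine file system.
# Usage: sge_fs_space_mb NUM_ARCHITECTURES SPOOL_MBYTES
# Hypothetical helper; the per-item figures come from the table above.
sge_fs_space_mb() {
    archs=$1
    spool_mb=$2
    binary_mb=$((archs * 15))   # 15 Mbytes for each architecture
    tar_mb=40                   # installation tar file
    echo $((binary_mb + spool_mb + tar_mb))
}

# Two architectures with the largest spool estimate from the table.
sge_fs_space_mb 2 200   # prints 270
```

The result covers only the shared file system. The 10–20 Mbytes on the local disk of each node are not included.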

Sun Cluster HA for Sun Grid Engine Configuration Requirements

Configure Sun Cluster HA for Sun Grid Engine as a failover data service. You cannot configure Sun Cluster HA for Sun Grid Engine as a scalable data service. For more information, see:


Note –

If you are using the Solaris 10 OS, install and configure this data service to run only in the global zone. At publication of this document, this data service is not supported in non-global zones. For updated information about supported configurations of this data service, contact your Sun service representative.


NFS Configuration for the Sun Grid Engine File System

The Sun Grid Engine file system must reside on a multihost disk. This disk must be available to the other nodes in the cluster that will be used for the Sun Grid Engine administrative services.

You must use NFS to export the Sun Grid Engine file system to the noncluster nodes. The NFS server that exports this file system must also be protected against failure. To protect the NFS server against failure, use the Sun Cluster HA for NFS data service. For more information about this data service, see Sun Cluster Data Service for NFS Guide for Solaris OS.

Sun Cluster HA for NFS Configuration Requirements

Configure the resources for the Sun Grid Engine management tier in the same resource group as the resource for NFS. For more information, see Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine.

Dependencies Between Sun Grid Engine Components

The dependencies between Sun Grid Engine components are shown in the following table.

Table 3 Dependencies Between Sun Grid Engine Components

Sun Grid Engine Component 

Dependency 

Sun Grid Engine queue master daemon (sge_qmaster)

SUNW.HAStoragePlus resource

Sun Grid Engine scheduling daemon (sge_schedd)

Sun Grid Engine queue master daemon (sge_qmaster) resource

These dependencies are set when you register and configure Sun Cluster HA for Sun Grid Engine. For more information, see Registering and Configuring Sun Cluster HA for Sun Grid Engine.

Configuration Considerations

The configuration considerations in the subsections that follow affect the installation and configuration of Sun Cluster HA for Sun Grid Engine.

Location of the Sun Grid Engine Binary Files

You can install Sun Grid Engine in one of the following locations:

For the advantages and disadvantages of placing the Sun Grid Engine binary files on a highly available local file system and the cluster file system, see Configuration Guidelines for Sun Cluster Data Services in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.


Tip –

To enable the type of file system to be identified from the mount point, use a prefix that indicates the type of file system as follows:


File Systems for Spool Directories and Binary Files

The optimum distribution of spool directories and binary files among file systems depends on the grid configuration. See the following table.

Grid Configuration 

File System Configuration 

The execution tier contains fewer than 200 hosts. 

Use a single shared NFS file system under the root of the Sun Grid Engine file system for the spool directories and binary files. 

The execution tier contains about 200 hosts, or the applications are disk intensive. 

Use a separate area on an NFS file system for the spool directories. 

The execution tier contains more than 200 hosts, or NFS performance is likely to be a problem. 

See the Sun Grid Engine documentation for alternate grid configurations. 

Configuration Planning Questions

Use the questions in this section to plan the installation and configuration of Sun Cluster HA for Sun Grid Engine. Write the answers to these questions in the space that is provided on the data service worksheets in Configuration Worksheets in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Preparing the Nodes and Disks

Preparing the nodes and disks modifies the configuration of the operating system to enable Sun Cluster HA for Sun Grid Engine to eliminate single points of failure in a Sun Grid Engine system.

Before you begin, ensure that the requirements in the following sections are met:

How to Prepare the Nodes and Disks

  1. Become superuser on all the cluster nodes where you are installing Sun Grid Engine.

  2. Create an administrative user account for Sun Grid Engine on all those cluster nodes.

    Either select an existing user account other than root for the grid administration, or create an account specifically for grid administration.


    Tip –

    For consistency with the Sun Grid Engine documentation, name the account sgeadmin.


  3. Create a directory for the root of the Sun Grid Engine file system.


    # mkdir sge-root-dir
    

    Note –

    The sge-root-dir directory must reside in the cluster file system. For more details, see Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine.


  4. Change the owner of the root of the Sun Grid Engine file system to the administrative user whose account you created in Step 2.


    # chown sge-admin sge-root-dir
    
  5. Set the mode of the root of the Sun Grid Engine file system to drwxr-xr-x.


    # chmod 755 sge-root-dir
    
  6. Specify the port number and protocol for the sge_qmaster and sge_execd services.

    Choose an unused port number below 1024. The sge_qmaster and sge_execd services are to be provided through Transmission Control Protocol (TCP).

    To specify the port number and protocol, add the following lines to the /etc/services file, replacing port-no with the port number that you chose for each service.

    sge_qmaster	port-no/tcp
    sge_execd  	port-no/tcp
  7. For each type of host in the grid, create a plain text file that contains the names of all hosts of that type in the grid.

    The install_qmaster script uses these files when you install Sun Grid Engine. Create a separate file for each type of host in the grid:

    • Execution hosts

    • Administrative hosts

    • Submit hosts
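The /etc/services entries from Step 6 can be checked with a small script before you continue. The following is a minimal sketch, not part of Sun Grid Engine or Sun Cluster. The function name check_sge_services is hypothetical, and the sketch assumes a POSIX-compatible shell and a POSIX awk (on Solaris, for example, nawk or /usr/xpg4/bin/awk).

```shell
# Check that a services file defines sge_qmaster and sge_execd over TCP
# on ports below 1024. Prints one line per service and returns non-zero
# if an entry is missing or does not meet the requirement.
# check_sge_services is a hypothetical helper, not part of either product.
check_sge_services() {
    services_file=${1:-/etc/services}
    rc=0
    for svc in sge_qmaster sge_execd; do
        # Field 1 is the service name, field 2 is port/protocol.
        entry=$(awk -v s="$svc" '$1 == s { print $2; exit }' "$services_file")
        port=${entry%%/*}
        proto=${entry#*/}
        case $port in
            ''|*[!0-9]*) echo "$svc: missing"; rc=1; continue ;;
        esac
        if [ "$proto" = "tcp" ] && [ "$port" -lt 1024 ]; then
            echo "$svc: ok ($entry)"
        else
            echo "$svc: unexpected entry ($entry)"
            rc=1
        fi
    done
    return $rc
}
```

To use the sketch, call check_sge_services with no argument to read /etc/services, or pass the path of another file in the same format.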


Example 1 Preparing the Nodes and Disks for the Installation of Sun Grid Engine

This example shows how to prepare the nodes and disks for a Sun Grid Engine installation that is to be configured as follows:

The sequence of operations for preparing the nodes and disks for the installation of Sun Grid Engine is as follows:

  1. To create the /global/gridmaster directory for the root of the Sun Grid Engine file system, the following command is run:


    # mkdir /global/gridmaster
    
  2. To change the owner of the /global/gridmaster directory to the sgeadmin user, the following command is run:


    # chown sgeadmin /global/gridmaster
    
  3. To set the mode of the /global/gridmaster directory to drwxr-xr-x, the following command is run:


    # chmod 755 /global/gridmaster
    
  4. To specify that the sge_qmaster service is to be provided through port 536 and TCP, and that the sge_execd service is to be provided through port 537 and TCP, the following lines are added to the /etc/services file:

    sge_qmaster	536/tcp
    sge_execd  	537/tcp

Installing and Configuring Sun Grid Engine

The procedure that follows explains only the special requirements for installing Sun Grid Engine for use with Sun Cluster HA for Sun Grid Engine. For complete information about installing and configuring Sun Grid Engine, see your Sun Grid Engine documentation.

To enable Sun Grid Engine to run in a cluster, you must modify Sun Grid Engine to use a logical host name.

How to Install and Configure Sun Grid Engine

Before you begin, ensure that you have the host names of all hosts in the grid. Create a separate list of host names for each type of host in the grid:

  • Execution hosts

  • Administrative hosts

  • Submit hosts

  1. Become superuser of the cluster node where you are installing Sun Grid Engine.

  2. Install the Sun Grid Engine distribution files. The files are available in the tar.gz format and in the pkgadd format. Choose one format.

    Follow the instructions in How to Load the Distribution Files On a Workstation in N1 Grid Engine 6 Installation Guide.


    Note –

    If you choose the pkgadd format, install patches for the Sun Grid Engine software on the same node on which the Sun Grid Engine packages are registered.


  3. Set the SGE_ROOT environment variable to the directory for the root of the Sun Grid Engine file system that you created in Preparing the Nodes and Disks.


    # SGE_ROOT=sge-root-dir 
    # export SGE_ROOT
    
  4. Go to the directory for the root of the Sun Grid Engine file system.


    # cd sge-root-dir
    
  5. Start the script that installs the Sun Grid Engine master host.


    # ./install_qmaster
    
  6. Follow the prompts on screen to provide or confirm the following information:

    • The name of the Sun Grid Engine administrative user

    • The value of the SGE_ROOT environment variable

    • The TCP port number

    • The name of the Sun Grid Engine cell to be configured

    • The path to the spool directory

    • The setup for the correct file permissions

    • Details of your domain name service (DNS) domains

  7. When you are asked whether you want to use classic spooling or Berkeley DB, do not choose to use a Berkeley DB spooling server.

    Either choose the classic spooling method, or choose Berkeley DB with local spooling.

  8. When you are prompted, specify the range of group IDs for Sun Grid Engine to use.

    To ensure that you allocate enough group IDs, specify a range of approximately 100 group IDs, for example, 20000-20100.

  9. Follow the prompts on screen to provide or confirm the following information:

    • The path to the spooling directory for the execution daemon

    • The email address of the user who should receive problem reports

    • The configuration parameters, which you are asked to confirm

  10. When you are asked if you want to install the script that starts Sun Grid Engine at boot time, reply no.

    The installation script displays the following prompt:


    We can install the startup script that will
    start qmaster/scheduler at machine boot (y/n) [y] >> n
    

    To ensure that Sun Cluster HA for Sun Grid Engine can provide fault monitoring and automatic fault recovery, Sun Grid Engine must be started only by Sun Cluster.

  11. Follow the prompts on screen to provide or confirm the following information:

    • Specify the list of execution, admin and submit hosts

    • Do not use a shadow host

    • Select a scheduler profile

How to Enable Sun Grid Engine to Run in a Cluster

  1. Become superuser of a node in the cluster that will host Sun Grid Engine.

  2. Create a failover resource group to contain the Sun Cluster HA for Sun Grid Engine resources.

    Use the resource group that you identified when you answered the questions in Configuration Planning Questions.


    # clresourcegroup create -p Pathprefix=sge-root-dir sge-rg
    
    -p Pathprefix=sge-root-dir

    Specifies a directory on a cluster file system that Sun Cluster HA for NFS uses to maintain administrative and status information. This directory must be the directory that you created for the root of the Sun Grid Engine file system in Preparing the Nodes and Disks.

    sge-rg

    Specifies that the resource group that you are creating is named sge-rg.

  3. Add a resource for the Sun Grid Engine logical host name to the failover resource group that you created in Step 2.


    # clreslogicalhostname create  \
    -g sge-rg \
    -h hostlist \
    sge-lh-rs
    
    -g sge-rg

    Specifies that the logical host name resource is to be added to the failover resource group that you created in Step 2

    -h hostlist

    Specifies a comma-separated list of host names that are to be made available by this logical host name resource

    sge-lh-rs

    Specifies that the resource that you are creating is named sge-lh-rs
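The two commands in this procedure can be rehearsed with concrete values before they are run on a cluster node. The following sketch only prints the commands by default. The run wrapper, the DRY_RUN variable, and the logical host name sge-lh are assumptions made for illustration, and the directory is the one used in Example 1. The sketch assumes a POSIX-compatible shell.

```shell
# Print (or, with DRY_RUN=0, execute) the two commands from this procedure.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

SGE_ROOT_DIR=/global/gridmaster   # root directory from Example 1
SGE_RG=sge-rg                     # resource group name used in this chapter
SGE_LH=sge-lh                     # hypothetical logical host name
SGE_LH_RS=sge-lh-rs               # logical host name resource

run clresourcegroup create -p Pathprefix="$SGE_ROOT_DIR" "$SGE_RG"
run clreslogicalhostname create -g "$SGE_RG" -h "$SGE_LH" "$SGE_LH_RS"
```

Printing the commands first makes it easier to confirm that the resource group and resource names match the values that you plan to put in the sge_config file.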

Verifying the Installation and Configuration of Sun Grid Engine

Before you install the Sun Cluster HA for Sun Grid Engine packages, verify that the Sun Grid Engine software is correctly installed and configured to run in a cluster. This verification does not verify that the Sun Grid Engine application is highly available because the Sun Cluster HA for Sun Grid Engine data service is not yet installed.


Note –

If any step in this procedure fails, see your Sun Grid Engine documentation for more information about how to verify the Sun Grid Engine installation.


How to Verify the Installation and Configuration of Sun Grid Engine

You verify the installation and configuration of Sun Grid Engine by submitting a dummy job and checking that the required processes are running.

  1. Log in to the master host as the administrative user whose account you created in Preparing the Nodes and Disks.

  2. Set the SGE_ROOT environment variable to the directory for the root of the Sun Grid Engine file system that you created in Preparing the Nodes and Disks.


    $ SGE_ROOT=sge-root-dir 
    $ export SGE_ROOT
    
  3. Start the script that modifies your environment to enable Sun Grid Engine to run.


    $ . $SGE_ROOT/default/common/settings.sh
    
  4. Submit a dummy job to Sun Grid Engine.


    $ qsub $SGE_ROOT/examples/jobs/sleeper.sh
    your job 1 (*Sleeper*) has been submitted 
  5. On the master host, confirm that these processes are running:

    • sge_qmaster

    • sge_schedd


    #  ps -ef | grep sge_ 
    root  429  1  0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_qmaster
    root  429  1  0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_schedd
  6. View the global configuration of the grid.

    • If you are using the command line, type the following command:


      $ qconf -sconf
      
    • If you are using the QMON graphical user interface (GUI), select Cluster Configuration.

  7. On at least one execution host, confirm that this process is running:

    • sge_execd


    #  ps -ef | grep sge_ 
    root  451  1 0  Jul 27 3:37 /global/gridmaster/bin/solaris64/sge_execd
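The process checks in Step 5 and Step 7 can also be scripted. The following is a minimal sketch that reads a process listing from standard input and reports whether each named daemon appears in it. The function name sge_daemons_running is hypothetical, and the sketch assumes a POSIX-compatible shell and that the daemons are started by full path, as in the sample output above.

```shell
# Report which of the named daemons appear in process-listing text read
# from standard input. Returns non-zero if any daemon is missing.
# sge_daemons_running is a hypothetical helper name.
sge_daemons_running() {
    listing=$(cat)
    missing=0
    for daemon in "$@"; do
        # Match the daemon name preceded by a slash, as in the full
        # paths that ps -ef shows for the Sun Grid Engine daemons.
        if printf '%s\n' "$listing" | grep "/$daemon" >/dev/null; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
            missing=1
        fi
    done
    return $missing
}

# On the master host:
#   ps -ef | sge_daemons_running sge_qmaster sge_schedd
# On an execution host:
#   ps -ef | sge_daemons_running sge_execd
```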

Installing the Sun Cluster HA for Sun Grid Engine Packages

If you did not install the Sun Cluster HA for Sun Grid Engine packages during your initial Sun Cluster installation, perform this procedure to install the packages. To install the packages, use the Sun Java™ Enterprise System Installation Wizard.

How to Install the Sun Cluster HA for Sun Grid Engine Packages

Perform this procedure on each cluster node where you are installing the Sun Cluster HA for Sun Grid Engine packages.

You can run the Sun Java Enterprise System Installation Wizard with a command-line interface (CLI) or with a graphical user interface (GUI). The content and sequence of instructions in the CLI and the GUI are similar.


Note –

Install the packages for this data service in the global zone.


Before You Begin

Ensure that you have the Sun Java Availability Suite DVD-ROM.

If you intend to run the Sun Java Enterprise System Installation Wizard with a GUI, ensure that your DISPLAY environment variable is set.

  1. On the cluster node where you are installing the data service packages, become superuser.

  2. Load the Sun Java Availability Suite DVD-ROM into the DVD-ROM drive.

    If the Volume Management daemon vold(1M) is running and configured to manage DVD-ROM devices, the daemon automatically mounts the DVD-ROM on the /cdrom directory.

  3. Change to the Sun Java Enterprise System Installation Wizard directory of the DVD-ROM.

    • If you are installing the data service packages on the SPARC® platform, type the following command:


      # cd /cdrom/cdrom0/Solaris_sparc
      
    • If you are installing the data service packages on the x86 platform, type the following command:


      # cd /cdrom/cdrom0/Solaris_x86
      
  4. Start the Sun Java Enterprise System Installation Wizard.


    # ./installer
    
  5. When you are prompted, accept the license agreement.

    If any Sun Java Enterprise System components are installed, you are prompted to select whether to upgrade the components or install new software.

  6. From the list of Sun Cluster agents under Availability Services, select the data service for Sun Grid Engine.

  7. If you require support for languages other than English, select the option to install multilingual packages.

    English language support is always installed.

  8. When prompted whether to configure the data service now or later, choose Configure Later.

    Choose Configure Later to perform the configuration after the installation.

  9. Follow the instructions on the screen to install the data service packages on the node.

    The Sun Java Enterprise System Installation Wizard displays the status of the installation. When the installation is complete, the wizard displays an installation summary and the installation logs.

  10. (GUI only) If you do not want to register the product and receive product updates, deselect the Product Registration option.

    The Product Registration option is not available with the CLI. If you are running the Sun Java Enterprise System Installation Wizard with the CLI, omit this step.

  11. Exit the Sun Java Enterprise System Installation Wizard.

  12. Unload the Sun Java Availability Suite DVD-ROM from the DVD-ROM drive.

    1. To ensure that the DVD-ROM is not being used, change to a directory that does not reside on the DVD-ROM.

    2. Eject the DVD-ROM.


      # eject cdrom
      
Next Steps

For information about installing the Sun Cluster HA for NFS packages, see Sun Cluster Data Service for NFS Guide for Solaris OS.

Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine

For maximum availability of the Sun Grid Engine application, resources that Sun Cluster HA for Sun Grid Engine requires must be available before the Sun Grid Engine management tier is started. An example of such a resource is the Sun Grid Engine file system. To ensure that these resources are available, configure the HAStoragePlus resource type to work with Sun Cluster HA for Sun Grid Engine.

For information about the relationship between resource groups and disk device groups, see Relationship Between Resource Groups and Device Groups in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Configuring the HAStoragePlus resource type to work with Sun Cluster HA for Sun Grid Engine involves the following operations:

How to Register and Configure an HAStoragePlus Resource

  1. Become superuser on a node in the cluster that will host Sun Grid Engine.

  2. Register the SUNW.HAStoragePlus resource type.


    # clresourcetype register SUNW.HAStoragePlus
    
  3. Add an HAStoragePlus resource for the Sun Grid Engine file system to the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.


    # clresource create \
    -g sge-rg \
    -t SUNW.HAStoragePlus \
    -p FilesystemMountPoints=sge-root \
    sge-hasp-rs
    
    -g sge-rg

    Specifies that the resource is to be added to the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster

    -p FilesystemMountPoints=sge-root

    Specifies that the mount point for this file system is the root of the Sun Grid Engine file system

    sge-hasp-rs

    Specifies that the resource that you are creating is named sge-hasp-rs
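Before you create the HAStoragePlus resource, it can be useful to confirm that the mount point is defined on the node. The following is a minimal sketch that assumes, as is usual for file systems managed by HAStoragePlus, that the Sun Grid Engine file system has an entry in /etc/vfstab. The function name has_vfstab_entry is hypothetical, and the sketch assumes a POSIX-compatible shell and a POSIX awk.

```shell
# Return success if MOUNT_POINT is listed as a mount point (field 3) in a
# vfstab-format file. has_vfstab_entry is a hypothetical helper name.
has_vfstab_entry() {
    mount_point=$1
    vfstab=${2:-/etc/vfstab}
    awk -v mp="$mount_point" '
        /^#/ { next }                 # skip comment lines
        $3 == mp { found = 1 }        # field 3 is the mount point
        END { exit found ? 0 : 1 }
    ' "$vfstab"
}

# Example use on a cluster node:
#   has_vfstab_entry /global/gridmaster || echo "no vfstab entry"
```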

Configuring Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine

You must use NFS to export the Sun Grid Engine file system to the noncluster nodes. The NFS server that exports this file system must also be protected against failure. To protect the NFS server against failure, use the Sun Cluster HA for NFS data service.

The procedure that follows explains only the special requirements for using Sun Cluster HA for NFS with Sun Cluster HA for Sun Grid Engine. For complete information about installing and configuring Sun Cluster HA for NFS, see Sun Cluster Data Service for NFS Guide for Solaris OS.

How to Configure Sun Cluster HA for NFS for Use With Sun Cluster HA for Sun Grid Engine


Note –

Commands in this procedure assume that you have set the $SGE_ROOT environment variable to specify the root of the Sun Grid Engine file system.


  1. Register the SUNW.nfs resource type.


    # clresourcetype register SUNW.nfs
    
  2. From any cluster node, create a directory for NFS configuration files.

    Create the directory under the root of the Sun Grid Engine file system. Name the directory SUNW.nfs.


    # mkdir -p $SGE_ROOT/SUNW.nfs
    
  3. In the directory that you created in Step 2, create a file that contains the share command for the root of the Sun Grid Engine file system.

    Name the file dfstab.sge-nfs-rs, where sge-nfs-rs is the name of the NFS resource that you will create in Step 4.


    # echo "share -F nfs -o rw sge-root" \
     > $SGE_ROOT/SUNW.nfs/dfstab.sge-nfs-rs
    
  4. Add a SUNW.nfs resource to the failover resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.


    # clresource create \
    -g sge-rg \
    -t SUNW.nfs \
    -p Resource_dependencies=sge-hasp-rs \
    sge-nfs-rs
    

Example 2 Creating a dfstab File for the Root of the Sun Grid Engine File System

This example shows the command for creating a dfstab file for the root of the Sun Grid Engine file system.


# echo "share -F nfs -o rw /global/gridmaster" \
 > /global/gridmaster/SUNW.nfs/dfstab.sge-nfs-rs
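Step 2 and Step 3 of the preceding procedure can be combined into one small function so that the directory and the file name always agree with the NFS resource name. The following is a sketch only. The function name write_sge_dfstab is hypothetical, the share options are the ones shown in Example 2, and the sketch assumes a POSIX-compatible shell.

```shell
# Write the dfstab file for an NFS resource under SGE_ROOT/SUNW.nfs.
# Usage: write_sge_dfstab SGE_ROOT [NFS_RESOURCE_NAME]
# write_sge_dfstab is a hypothetical helper name.
write_sge_dfstab() {
    sge_root=$1
    nfs_rs=${2:-sge-nfs-rs}
    # Create the directory for NFS configuration files if it is missing.
    mkdir -p "$sge_root/SUNW.nfs" || return 1
    # The file contains the share command for the root of the file system.
    echo "share -F nfs -o rw $sge_root" \
        > "$sge_root/SUNW.nfs/dfstab.$nfs_rs"
}

# Example use, matching Example 2:
#   write_sge_dfstab /global/gridmaster sge-nfs-rs
```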

Registering and Configuring Sun Cluster HA for Sun Grid Engine

Before you perform this procedure, ensure that the Sun Cluster HA for Sun Grid Engine data service packages are installed.

Use the configuration and registration files in the /opt/SUNWscsge/util directory to register the Sun Cluster HA for Sun Grid Engine resources. The files define the dependencies that are required between Sun Grid Engine components. For information about these dependencies, see Dependencies Between Sun Grid Engine Components. For a listing of these files, see Appendix A, Files for Configuring and Removing Sun Cluster HA for Sun Grid Engine Resources.

Registering and configuring Sun Cluster HA for Sun Grid Engine involves the tasks that are explained in the following sections:

  1. Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources

  2. How to Create and Enable Sun Cluster HA for Sun Grid Engine Resources

Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources

Sun Cluster HA for Sun Grid Engine provides scripts that automate the process of configuring and removing Sun Cluster HA for Sun Grid Engine resources. These scripts obtain configuration parameters from the sge_config file in the /opt/SUNWscsge/util/ directory. To specify configuration parameters for Sun Cluster HA for Sun Grid Engine resources, edit the sge_config file.

Each configuration parameter in the sge_config file is defined as a keyword-value pair. The sge_config file already contains the required keywords and equals signs. For more information, see Listing of sge_config. When you edit the sge_config file, add the required value to each keyword. Use the values that you identified in Configuration Planning Questions.

The keyword-value pairs in the sge_config file are as follows:

QMASTERRS=sge-qmaster-rs
SCHEDDRS=sge-schedd-rs
MASTERRG=sge-rg
MASTERLH=sge-lh-rs
MASTERPORT=portno
MASTERHASP=sge-hasp-rs
SGE_ROOT=sge-root-dir
SGE_CELL=cell-name
SGE_VER=6.0

The meaning and permitted values of the keywords in the sge_config file are as follows:

QMASTERRS=sge-qmaster-rs

Specifies the name that you are assigning to the resource for the Sun Grid Engine queue master daemon sge_qmaster. This must be defined.

SCHEDDRS=sge-schedd-rs

Specifies the name that you are assigning to the resource for the Sun Grid Engine scheduling daemon sge_schedd. This must be defined.

MASTERRG=sge-rg

Specifies the name of the resource group that contains the Sun Cluster HA for Sun Grid Engine resources. This name must be the name that you assigned when you created the resource group as explained in How to Enable Sun Grid Engine to Run in a Cluster. This must be defined.

MASTERLH=sge-lh-rs

Specifies the name of the logical host name resource for Sun Grid Engine. This name must be the name that you assigned when you created the resource in How to Enable Sun Grid Engine to Run in a Cluster. This must be defined.

MASTERPORT=portno

Specifies the port number that is configured for sge_qmaster. The default is 536. The value must be an integer and must be defined.

MASTERHASP=sge-hasp-rs

Specifies the name of the SUNW.HAStoragePlus resource for Sun Grid Engine. This name must be the name that you assigned when you created the resource in Configuring the HAStoragePlus Resource Type to Work With Sun Cluster HA for Sun Grid Engine. If this resource is used, it must be defined.

SGE_ROOT=sge-root-dir

Specifies the root directory of the Sun Grid Engine file system. This directory must be the directory that you created for the root of the Sun Grid Engine file system in Preparing the Nodes and Disks. This must be defined.

SGE_CELL=cell-name

Specifies the cell that Sun Grid Engine references. This must be defined.

SGE_VER=6.0

Specifies the version of the installed Sun Grid Engine configuration. This keyword must be defined. Currently, the only permitted value is 6.0.


Example 3 Sample sge_config File

This example shows an sge_config file in which configuration parameters are set as follows:

QMASTERRS=sge_qmaster-rs
SCHEDDRS=sge_schedd-rs
MASTERRG=sge-rg
MASTERLH=sge-lh-rs
MASTERPORT=536
MASTERHASP=sge-hasp-rs
SGE_ROOT=/global/gridmaster
SGE_CELL=default
SGE_VER=6.0
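
Before you run the registration script, it can help to check that every required keyword in the file has a value. The following is a minimal sketch only and is not part of the data service. The function names sge_config_value and check_sge_config are hypothetical, a POSIX shell is assumed, and the checks simply mirror the keyword rules listed above: MASTERHASP is treated as optional, MASTERPORT must be an integer, and SGE_VER must be 6.0.

```shell
# Hypothetical helpers, not shipped with Sun Cluster HA for Sun Grid Engine.

# Print the value assigned to keyword $2 in file $1 (empty if unset).
# If the keyword appears more than once, the last assignment is used.
sge_config_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Check the file $1 against the keyword rules described above.
# Prints one line per problem; returns 0 if no problem is found.
check_sge_config() {
    config_file=$1
    status=0

    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file"
        return 2
    fi

    # MASTERHASP is needed only if the resource is used, so it is not listed.
    for keyword in QMASTERRS SCHEDDRS MASTERRG MASTERLH MASTERPORT \
                   SGE_ROOT SGE_CELL SGE_VER; do
        if [ -z "$(sge_config_value "$config_file" "$keyword")" ]; then
            echo "$keyword is not defined"
            status=1
        fi
    done

    port=$(sge_config_value "$config_file" MASTERPORT)
    case $port in
        '') ;;  # already reported as not defined
        *[!0-9]*) echo "MASTERPORT is not an integer: $port"; status=1 ;;
    esac

    version=$(sge_config_value "$config_file" SGE_VER)
    if [ -n "$version" ] && [ "$version" != "6.0" ]; then
        echo "SGE_VER must be 6.0, found: $version"
        status=1
    fi

    return $status
}
```

For example, check_sge_config /mypath/sge_config prints nothing and returns 0 for a file like the one in Example 3. The sketch does not verify that the named resources, resource group, or directory actually exist in the cluster.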

Procedure: How to Create and Enable Sun Cluster HA for Sun Grid Engine Resources

Before you begin, ensure that you have edited the sge_config file or a copy of it to specify configuration parameters for Sun Cluster HA for Sun Grid Engine resources. For more information, see Specifying Configuration Parameters for Sun Cluster HA for Sun Grid Engine Resources.

  1. Register the SUNW.gds resource type.


    # clresourcetype register SUNW.gds
    
  2. Go to the directory that contains the script for creating the Sun Grid Engine resources.


    # cd /opt/SUNWscsge/util/
    
  3. Run the script that creates the Sun Grid Engine resources.


    # ./sge_register -f /mypath/sge_config
    
  4. Bring online the failover resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster.

    This resource group contains the following resources:

    • Logical host name resource

    • HAStoragePlus resource

    • NFS resource

    • Sun Grid Engine application resources


    # clresourcegroup online -M sge-rg
    
    sge-rg

    Specifies that the resource group that you created in How to Enable Sun Grid Engine to Run in a Cluster is to be brought online.


    Caution –

    Make sure that the Sun Grid Engine daemons (sge_qmaster and sge_schedd) are not running before you bring the failover resource group online. The daemons might be running because the install_qmaster installation script started them, or because they are still running after you performed the verification that is described in How to Verify the Sun Cluster HA for Sun Grid Engine Installation and Configuration.
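
One way to check for leftover daemons before you bring the resource group online is to filter the process list for the two daemon names. The following is a minimal sketch under stated assumptions: the function name find_sge_daemons is hypothetical, a POSIX shell is assumed, and the function only inspects the command names that it is given on standard input, so it can be tried without a running cluster.

```shell
# Hypothetical helper, not part of the data service.
# Reads command names on standard input, one per line, with or without a
# leading directory path. Prints each Sun Grid Engine master daemon found.
# Returns 0 if at least one daemon is listed, 1 otherwise.
find_sge_daemons() {
    found=1
    while read -r name _rest; do
        # Strip any leading path so that /path/to/sge_qmaster also matches.
        case ${name##*/} in
            sge_qmaster|sge_schedd)
                echo "${name##*/}"
                found=0
                ;;
        esac
    done
    return $found
}

# Possible use on a cluster node (ps -e -o comm= is the POSIX form):
#   if ps -e -o comm= | find_sge_daemons; then
#       echo "Stop these daemons before bringing the resource group online."
#   fi
```

The sketch only reports the daemons. How you stop them depends on how they were started, so that step is left to the administrator.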


Setting Sun Cluster HA for Sun Grid Engine Extension Properties

Extension properties for Sun Cluster HA for Sun Grid Engine resources are set when you run the script that creates these resources. You need to set these properties only if you require values other than the values that are set by the script. For information about Sun Cluster HA for Sun Grid Engine extension properties, see the SUNW.gds(5) man page. You can update some extension properties dynamically. You can update other properties, however, only when you create or disable a resource. The Tunable entry indicates when you can update a property.

To update an extension property of a resource, run the clresource(1CL) command with the following option to modify the resource:


-p property=value 
-p property

Identifies the extension property that you are setting

value

Specifies the value to which you are setting the extension property
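
As an illustration, the option is typically combined with the set subcommand of clresource and the name of the resource. The following is a minimal sketch, not a definitive procedure: the function name set_extension_property and the RUN variable are hypothetical, and the example values are assumptions (Probe_timeout is an extension property of the SUNW.gds resource type, sge-qmaster-rs is the resource name used in this chapter, and 60 is an arbitrary value). By default the sketch only prints the command so that you can review it first.

```shell
# Hypothetical helper, not part of the data service.
# Builds the clresource command line that sets one extension property.
# By default the command is only printed. Set RUN=1 to execute it, which
# requires superuser access on a cluster node.
set_extension_property() {
    resource=$1
    property=$2
    value=$3
    if [ "${RUN:-0}" = "1" ]; then
        clresource set -p "$property=$value" "$resource"
    else
        echo "clresource set -p $property=$value $resource"
    fi
}

# Example values are assumptions, chosen only for illustration.
set_extension_property sge-qmaster-rs Probe_timeout 60
```

Check the Tunable entry of a property before you change it, because some properties can be updated only when the resource is created or disabled.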

You can also use the procedures in Chapter 2, Administering Data Service Resources, in Sun Cluster Data Services Planning and Administration Guide for Solaris OS to configure resources after the resources are created.

Verifying the Sun Cluster HA for Sun Grid Engine Installation and Configuration

After you install, register, and configure Sun Cluster HA for Sun Grid Engine, verify the installation and configuration. This verification determines whether the Sun Cluster HA for Sun Grid Engine data service makes the Sun Grid Engine application highly available.

Procedure: How to Verify the Sun Cluster HA for Sun Grid Engine Installation and Configuration

  1. Become superuser on a node that will host Sun Grid Engine.

  2. Verify that all Sun Grid Engine resources are online.


    # cluster status -t rg,rs
    
  3. If a Sun Grid Engine resource is not online, enable the resource.


    # clresource enable sge-rs
    
  4. Switch the Sun Grid Engine resource group to another cluster node.


    # clresourcegroup switch -n node sge-rg
    

Tuning the Sun Cluster HA for Sun Grid Engine Fault Monitors

The Sun Cluster HA for Sun Grid Engine fault monitors verify that the Sun Grid Engine daemons sge_qmaster and sge_schedd are running correctly.

Each Sun Cluster HA for Sun Grid Engine fault monitor is contained in the resource that represents the corresponding Sun Grid Engine component. You create these resources when you register and configure Sun Cluster HA for Sun Grid Engine. For more information, see Registering and Configuring Sun Cluster HA for Sun Grid Engine.

System properties and extension properties of these resources control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Sun Cluster installations. Therefore, you should tune the Sun Cluster HA for Sun Grid Engine fault monitor only if you need to modify this preset behavior.

Tuning the Sun Cluster HA for Sun Grid Engine fault monitors involves the following tasks:

For more information, see Tuning Fault Monitors for Sun Cluster Data Services in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Debugging Sun Cluster HA for Sun Grid Engine

The config file in the /opt/SUNWscsge/etc directory enables you to activate debugging for Sun Grid Engine resources. This file enables you to activate debugging for all Sun Grid Engine resources or for a specific Sun Grid Engine resource on a particular node. If you require debugging for Sun Cluster HA for Sun Grid Engine to be enabled throughout the cluster, repeat this procedure on all nodes.

Procedure: How to Activate Debugging for Sun Cluster HA for Sun Grid Engine

  1. Determine whether debugging for Sun Cluster HA for Sun Grid Engine is active.

    If debugging is inactive, daemon.notice is set in the file /etc/syslog.conf.


    # grep daemon /etc/syslog.conf
    *.err;kern.debug;daemon.notice;mail.crit        /var/adm/messages
    *.alert;kern.err;daemon.err                     operator
    #
  2. If debugging is inactive, edit the /etc/syslog.conf file to change daemon.notice to daemon.debug.

  3. Confirm that debugging for Sun Cluster HA for Sun Grid Engine is active.

    If debugging is active, daemon.debug is set in the file /etc/syslog.conf.


    # grep daemon /etc/syslog.conf
    *.err;kern.debug;daemon.debug;mail.crit        /var/adm/messages
    *.alert;kern.err;daemon.err                    operator
    #
  4. Restart the syslogd daemon.

    • If your operating system is Solaris 9, run the following command:


      # pkill -1 syslogd
      
    • If your operating system is Solaris 10, run the following command:


      # svcadm restart system-log
      
  5. Edit the /opt/SUNWscsge/etc/config file to change DEBUG= to DEBUG=ALL or DEBUG=sge-rs.


    # cat /opt/SUNWscsge/etc/config
    #
    # Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    # Use is subject to license terms.
    #
    # ident "@(#)config     1.1     06/02/18 SMI"
    #
    # Usage:
    #       DEBUG=<RESOURCE_NAME> or ALL
    #
    DEBUG=ALL
    #

    Note –

    To deactivate debugging, reverse the preceding steps.
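
The two file edits in this procedure can also be scripted. The following is a minimal sketch, not part of the data service: the function names edit_in_place, enable_syslog_debug, and set_sge_debug are hypothetical, and a POSIX shell is assumed. The sketch avoids the sed -i option, which is not available in every sed implementation, and takes the file path as an argument so that you can try it on a copy first.

```shell
# Hypothetical helpers, not part of the data service.

# Apply the sed expression $1 to the file $2 and write the result back.
# The original file is overwritten in place, so its ownership and
# permissions are kept.
edit_in_place() {
    tmp_file="$2.tmp.$$"
    if sed "$1" "$2" > "$tmp_file"; then
        cat "$tmp_file" > "$2"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp_file"
    return $rc
}

# Change daemon.notice to daemon.debug in the syslog configuration file $1.
enable_syslog_debug() {
    edit_in_place 's/daemon\.notice/daemon.debug/' "$1"
}

# Set DEBUG= in the config file $1 to $2 (ALL or a resource name).
set_sge_debug() {
    edit_in_place "s/^DEBUG=.*/DEBUG=$2/" "$1"
}

# Possible use as superuser, with the paths from the procedure above:
#   enable_syslog_debug /etc/syslog.conf
#   set_sge_debug /opt/SUNWscsge/etc/config ALL
```

You still need to restart the syslogd daemon as described in Step 4 for the change to /etc/syslog.conf to take effect.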