Sun N1 Grid Engine 6.1 Installation Guide

Chapter 3 Automating the Installation Process

This chapter describes how you can automate the software installation process for grid engine software.

Automatic Installation Overview

You can use the sge-root/inst_sge utility to install and uninstall N1 Grid Engine master hosts, execution hosts, shadow hosts, and Berkeley DB spooling server hosts. You can also use this utility to automatically back up the N1 Grid Engine configuration and accounting data. You can use the inst_sge utility in interactive mode in place of any of the procedures that are described in Chapter 2, Installing the N1 Grid Engine Software Interactively.


Note –

The use of a Berkeley DB Spooling Server host does not provide high availability. In addition, the Berkeley DB Spooling Server has no authentication mechanism, and should only be used on a closed network with fully trusted users.


To simplify automatic installation and backup processes, use the configuration templates that are located in the sge-root/util/install_modules directory.

The automatic installation requires no user interaction. No messages are displayed on the terminal during the installation. When the installation finishes, a message indicates where the installation log file resides. The name of the installation log file is of the form install_hostname_timestamp.log. Normally, you can find information about errors during installation in this file. In case of serious errors, the installation script might not be able to move the log file into the spool directory. In this situation, the log file is placed in the /tmp directory.
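When checking the result of an unattended run, it can be convenient to look up the newest log file in one step. The following illustrative helper (not part of inst_sge; the paths in the example assume the default sge-root of /opt/n1ge61 and the default cell name) prints the most recently modified matching log file:

```shell
# latest_install_log GLOB... - print the newest matching log file, or
# nothing if no file matches. (Illustrative helper, not part of inst_sge.)
latest_install_log() {
    ls -t "$@" 2>/dev/null | head -1
}

# Example: check the default qmaster spool directory first, then /tmp:
# latest_install_log /opt/n1ge61/default/spool/qmaster/install_*.log /tmp/install_*.log
```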

Special Considerations

The first step in performing an automatic installation is to set up a configuration file. You can find configuration file templates in the sge-root/util/install_modules directory. Consider the following as you plan your automatic installation:


Note –

If you start the automatic installation on the master host, the entire cluster can be installed with one command. The automatic installation script accesses the remote hosts through rsh or ssh and starts the installation remotely. This process requires a well-configured configuration file, which each host must be able to read. That file should be installed on each host or shared through NFS.


Using the inst_sge Utility and a Configuration Template

To automate system installation, use the inst_sge utility in combination with a configuration file.


Note –

You cannot use the auto installation procedure to remotely install a Windows execution host. You must run the auto installation procedure directly on the Windows execution host.


How to Automate the Master Host Installation

Before You Begin

You need to complete the planning process as outlined in Plan the Installation.

In addition, you need to be able to connect to each of the remote hosts using the rsh or ssh commands, without supplying a password. If this type of access is not allowed on your network, you cannot use this method of installation.
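Before launching the installation, you can verify this passwordless access from the master host. The following sketch assumes ssh (BatchMode=yes is an ssh-only option that makes the connection fail instead of prompting; plain rsh has no equivalent flag), and the host names in the usage example are hypothetical:

```shell
# check_access HOST... - report which hosts accept passwordless ssh.
check_access() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail immediately rather than prompt
        # for a password, so this loop never blocks on input.
        if ssh -o BatchMode=yes "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: cannot connect without a password"
        fi
    done
}

# Example: check_access host1 host2 host3
```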

  1. Create a copy of the configuration template, sge-root/util/install_modules/inst_template.conf.


    # cd sge-root/util/install_modules
    # cp inst_template.conf my_configuration.conf
    
  2. Edit your configuration template, using the values from the worksheet you completed in Plan the Installation.

    The configuration file template includes liberal comments to help you decide where appropriate information belongs. See Configuration File Templates.

  3. Log in as root on the system that you want to be the N1 Grid Engine master host.

  4. Create the sge-root directory.

    The sge-root directory is the root directory of the N1 Grid Engine software hierarchy.

  5. Go to the sge-root directory and start the installation.


    # cd sge-root
    # ./inst_sge -m -auto full-path-to-configuration-file
    

    The -m option starts the master host installation and installs the master daemon on the local machine. In addition, the -auto option sets up any remote hosts, as specified in the configuration file.


    Note –

You cannot install a master host remotely. You must always install a master host locally.


    To prevent data loss or destroying already installed clusters, the automatic installation terminates if the configured $SGE_CELL directory or the configured Berkeley DB spooling directory already exists. If the installation terminates, the script displays the reason for the termination on the screen.

    A log file of the master installation is created in the sge-root/default/spool/qmaster directory. The file name is created using the format install_hostname_date_time.log.


    Tip –

    You can also combine options if you want to perform multiple installations with one command. For example, the following command installs the master daemon on the local machine and installs all execution hosts that are configured in the configuration file:


    ./inst_sge -m -x -auto full-path-to-configuration-file
    

  6. Wait for notification that the installation has completed.

    When the automatic installation exits successfully, it displays a message similar to the following:


    Install log can be found in: /opt/n1ge61/spool/install_myhost_30mar2007_090152.log

    The installation log file includes any script or error messages that were generated during installation. If the qmaster spool directory exists, the log file will be in that directory. If the directory does not exist, the log file will be in the /tmp directory.

Troubleshooting

If you do not want your execution hosts to spool locally, be sure to set EXECD_SPOOL_DIR_LOCAL="", with no space between the double quotes ("").
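A quick grep can catch a whitespace-only value before you start the installation. The function and configuration file names below are illustrative:

```shell
# check_local_spool FILE - warn if EXECD_SPOOL_DIR_LOCAL is set to a
# whitespace-only value, which is not the same as "".
check_local_spool() {
    if grep -q '^EXECD_SPOOL_DIR_LOCAL="[[:space:]][[:space:]]*"' "$1"; then
        echo 'EXECD_SPOOL_DIR_LOCAL contains whitespace; use "" instead'
        return 1
    fi
}

# Example: check_local_spool my_configuration.conf
```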

Automating Other Installations Through a Configuration File

In addition to installing the master host, you can perform a variety of other automatic installations using a similar process. The actual form of the inst_sge command differs slightly, and different sections of the configuration file apply. This section provides some examples.

Configuration File Templates

Configuration file templates are located in the sge-root/util/install_modules directory.


Example 3–1 Example Configuration File

#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------

# Please always use fully qualified pathnames

# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
SGE_ROOT="/opt/n1ge61"

# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not enter it this way: 1300/tcp
#(mandatory for qmaster installation)
SGE_QMASTER_PORT="6444"

# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not enter it this way: 1300/tcp
#(mandatory for qmaster installation)
SGE_EXECD_PORT="6445"

# CELL_NAME will be a dir in SGE_ROOT and contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"

# ADMIN_USER, if you want to use an admin user other than the owner
# of SGE_ROOT, you have to enter the user name here.
# If this is left blank, the owner of the SGE_ROOT dir will be used as
# the admin user
ADMIN_USER=""

# The dir, where qmaster spools those parts which are not spooled by DB
#(mandatory for qmaster installation)
QMASTER_SPOOL_DIR="/opt/n1ge61/default/spool/qmaster"

# The dir, where the execd spools (active jobs)
# This entry is needed even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory.
#(mandatory for qmaster installation)
EXECD_SPOOL_DIR="/opt/n1ge61/default/spool"

# For monitoring and accounting of jobs, every job will get a
# unique GID. So you have to enter a free GID range, which
# is assigned to each job running on a machine.
# If you want to run 100 jobs at the same time on one host, you
# have to enter a GID range like this: 16000-16100
#(mandatory for qmaster installation) 
GID_RANGE="20000-20100"

# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"

# Name of the server where the spooling DB is running.
# If the spooling method is berkeleydb, it must be "none" when
# using no spooling server, and it must contain the server name
# if a server should be used. In case of "classic" spooling, it
# can be left out.
DB_SPOOLING_SERVER="none"

# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (e.g. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB server host. No NFS mount, please
DB_SPOOLING_DIR="/opt/n1ge61/default/spooldb"

# A list of hosts which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="host1"

# A list of hosts which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="host1"

# A list of hosts which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="host1"

# The dir, where the execd spools (local configuration)
# If you want to configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
EXECD_SPOOL_DIR_LOCAL=""

# If true, the domain names will be ignored during hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"

# Shell which should be used for remote installation (rsh/ssh)
# This is only supported if your hosts and rshd/sshd are configured
# not to ask for a password or prompt any message.
SHELL_NAME="rsh"

# Enter your default domain, if you are using /etc/hosts or NIS configuration
DEFAULT_DOMAIN="none"

# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="my.name@sun.com"

# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
ADD_TO_RC="true"

# If this is "true" the file permissions of executables will be set to 755
# and of ordinary files to 644.
SET_FILE_PERMS="true"

# This option is not implemented yet.
# When an exec host should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"

# Enter one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"

# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
SHADOW_HOST="hostname"

# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
EXEC_HOST_LIST_RM="host2 host3 host4"

# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"

# Enabling WINDOWS_SUPPORT requires the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation)
WIN_ADMIN_NAME="Administrator"

# This parameter sets the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors, the user can limit the number of parallel install processes.
# e.g. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execds are installed.
PAR_EXECD_INST_COUNT="20"
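Because the automatic installation runs unattended, it can be worth sanity-checking the configuration file before launching it. The following sketch checks that the parameters the template marks as mandatory for a qmaster installation are set to non-empty values; the function name and file name are illustrative, not part of the product:

```shell
# check_sge_conf FILE - report mandatory qmaster parameters that are
# missing or empty; returns nonzero if any problem is found.
check_sge_conf() {
    conf="$1"
    missing=0
    for param in SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME \
                 QMASTER_SPOOL_DIR EXECD_SPOOL_DIR GID_RANGE SPOOLING_METHOD; do
        # A parameter counts as set when the file contains PARAM="value"
        # with at least one character between the quotes.
        if ! grep -q "^${param}=\"..*\"" "$conf"; then
            echo "missing or empty: $param"
            missing=1
        fi
    done
    return $missing
}

# Example: check_sge_conf my_configuration.conf && ./inst_sge -m -auto "$PWD/my_configuration.conf"
```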

Automatic Installation With Increased Security (CSP)

The automatic installation also supports the Certificate Security Protocol (CSP) mode described in Chapter 4, Installing the Increased Security Features. To use the CSP security mode, you must fill out the CSP parameters of the template files. The parameters are as follows:


# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
CSP_RECREATE="true"

# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"

# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"

# your state (mandatory for csp install)
CSP_STATE="Germany"

# your location, e.g. the building (mandatory for csp install)
CSP_LOCATION="Building"

# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"

# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"

# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"

To start the installation, type the following command:


inst_sge -m -csp -auto template-file-name

Note –

Certificates are created during the installation process. These certificates have to be copied to each host of the installed cluster. The installation process can do this for you; however, you first need to grant the installation process the access it needs to copy the certificates:

  1. Make sure that rsh/rcp or ssh/scp is available on each host.

  2. Provide the root user with access to each host over ssh or rsh, without entering a password.


Automatic Uninstallation

You can also uninstall hosts automatically.


Note –

Uninstall all execution hosts before you uninstall the master host. If you uninstall the master host first, you have to uninstall all execution hosts manually.


To ensure that you have a clean environment, always source the $SGE_ROOT/$SGE_CELL/common/settings.csh file before proceeding.

Uninstalling Execution Hosts

During the execution host uninstallation, all configuration information for the targeted hosts is deleted. The uninstallation attempts to stop the execution hosts gracefully. First, the queue instances associated with the target host are disabled, so that no new jobs are started. Then the following actions are performed in sequence on each running job: the job is checkpointed, rescheduled, and, if necessary, forcibly rescheduled. At this point the queue instances are empty; the execution daemon is shut down, and the configuration and the global or local spool directory are removed.
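The drain-and-remove sequence described above can be sketched as the roughly equivalent manual admin commands. This is a hedged illustration only: the function name is hypothetical, and the exact qmod/qconf options should be verified against the qmod(1) and qconf(1) man pages for your release:

```shell
# drain_host HOST - hedged sketch of the manual equivalent of the
# automatic uninstall sequence for one execution host. Assumes the
# standard Grid Engine admin commands qmod and qconf are in PATH.
drain_host() {
    host="$1"
    qmod -d "*@$host"      # disable all queue instances on the host
    qmod -rq "*@$host"     # reschedule the jobs still running there
    qconf -ke "$host"      # shut down the execution daemon
    qconf -de "$host"      # delete the execution host configuration
}

# Example: drain_host host2
```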

The configuration file template has a section for identifying hosts that can be uninstalled automatically. Look for this section:


# Remove these execution hosts in automatic mode 
EXEC_HOST_LIST_RM="host1 host2 host3 host4"

Every host in the EXEC_HOST_LIST_RM list will be automatically removed from the cluster.

To start the automatic uninstallation of execution hosts, type the following command:


% ./inst_sge -ux -auto full-path-to-configuration-file

Uninstalling the Master Host

The master host uninstallation removes all of the N1 Grid Engine configuration files. After the uninstallation procedure completes, only the binary files remain. If you think that you will need the configuration information after the uninstallation, perform a backup of the master host. The master host uninstallation supports both interactive and automatic mode.

To start the automatic uninstallation of the master host, type the following command:


% ./inst_sge -um -auto full-path-to-configuration-file

This command performs the same procedure as in interactive mode, except the user is not prompted for confirmation of any steps and all terminal output is suppressed. Once the uninstall process is started, it cannot be stopped.

Uninstalling the Shadow Host

To start the automatic uninstallation of the shadow host, type the following command:


% ./inst_sge -usm -auto full-path-to-configuration-file


Automatic Backup

The automatic backup procedure backs up configuration and accounting data in much the same way as the interactive backup procedure. You can run the automatic backup procedure as a cron job if you want to schedule unattended or periodic backups. The automatic backup requires a configuration file; a template is located in the sge-root/util/install_modules/backup_template.conf file.

Comments within the configuration file template indicate what values to use for your environment.

Starting an Automatic Backup

After you set up the configuration file, type the following command to start the automatic backup:


% ./inst_sge -bup -auto full-path-to-configuration-file

To prevent overwriting existing backup files, a date/time combination is added to the end of the backup directory name that is specified in the configuration file.
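For example, a root crontab entry along these lines would run the backup nightly at 2:00 a.m.; the sge-root path and configuration file name shown here are hypothetical:

```
# minute hour day month weekday command (root's crontab; paths are illustrative)
0 2 * * * cd /opt/n1ge61 && ./inst_sge -bup -auto /opt/n1ge61/backup.conf
```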


Example 3–2 Backup Configuration File


#--------------------------------------------------- 
# Autobackup Configuration File Template 
#--------------------------------------------------- 
# Please, enter your SGE_ROOT here (mandatory) 
SGE_ROOT="/opt/gridengine" 
# Please, enter your SGE_CELL here (mandatory) 
SGE_CELL="default" 
# Please, enter your Backup Directory here 
# After backup you will find your backup files here (mandatory) 
# The autobackup will add a time /date combination to this dirname 
# to prevent an overwriting! 
BACKUP_DIR="/opt/backups/ge_backup" 
# Please, enter true to get a tar/gz package 
# and false to copy the files only (mandatory) 
TAR="true" 
# Please, enter the backup file name here. (mandatory) 
BACKUP_FILE="backup.tar"

Troubleshooting Automatic Installation and Uninstallation

The following errors might be encountered during auto-installation.

Problem: If the $SGE_CELL directory exists, the installation terminates to avoid overwriting a previous installation.

Solution: Remove or rename the directory.

Problem: If the Berkeley database spooling directory exists, the installation terminates to avoid overwriting a previous installation.

Solution: This directory must be removed or renamed in order to proceed. Make sure that the ADMINUSER has permissions to write into the location where the Berkeley database spooling directory is located. The ADMINUSER will be the owner of the Berkeley database spooling directory.

Problem: The execution host installation appears to succeed, but the execution daemon is not started, or no load values are shown.

Solution: Verify that user root is allowed to rsh or ssh to the other host, without entering a password.

If your network does not allow user root to have permissions to connect to other hosts through rsh or ssh without asking for a password, the automatic installation will not work remotely. In this case, log in to the host and use the following command to start the automatic installation locally on each host:


% ./inst_sge -x -noremote -auto /tmp/install_config_file.conf