Extensions to Regular Shell Scripts (Sun N1 Grid Engine 6.1 User's Guide)

Extensions to Regular Shell Scripts

Some extensions to regular shell scripts influence the behavior of scripts that run under grid engine system control. The following sections describe these extensions.

How a Command Interpreter Is Selected

At submit time, you can specify the command interpreter to use to process the job script file as shown in Figure 3–5. However, if nothing is specified, the configuration variable shell_start_mode determines how the command interpreter is selected:

If shell_start_mode is set to unix_behavior, the first line of the script file specifies the command interpreter. The first line of the script file must begin with #!. If the first line does not begin with #!, the Bourne Shell sh is used by default.
For all other settings of shell_start_mode, the default command interpreter is determined by the shell parameter for the queue where the job starts. See Displaying Queues and Queue Properties and the queue_conf(5) man page.
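
The selection rule above can be sketched as a small shell function. This is only an illustration of the documented decision, not the grid engine system's own implementation. The function name pick_interpreter and its four arguments are hypothetical; the first argument stands in for an interpreter specified at submit time.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the documented selection rule for illustration.
# Arguments: 1 = interpreter given at submit time (may be empty)
#            2 = value of shell_start_mode
#            3 = shell parameter of the queue
#            4 = path to the job script
pick_interpreter() {
    submit_shell=$1 mode=$2 queue_shell=$3 script=$4

    # An interpreter specified at submit time takes precedence.
    if [ -n "$submit_shell" ]; then
        echo "$submit_shell"
        return
    fi

    if [ "$mode" = "unix_behavior" ]; then
        # Read the first line of the script and look for a #! prefix.
        IFS= read -r first_line < "$script"
        case $first_line in
            '#!'*)
                # Strip "#!" and any blanks that follow it.
                interp=${first_line#'#!'}
                interp=${interp#"${interp%%[! ]*}"}
                echo "$interp" ;;
            *)  echo "sh" ;;   # documented fallback: the Bourne shell
        esac
    else
        # Every other setting uses the queue's shell parameter.
        echo "$queue_shell"
    fi
}
```

For example, calling pick_interpreter "" unix_behavior /bin/ksh job.sh on a script whose first line is #!/bin/csh prints /bin/csh, while any other shell_start_mode value prints /bin/ksh.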

Output Redirection

Since batch jobs do not have a terminal connection, their standard output and their standard error output must be redirected into files. The grid engine system enables the user to define the location of the files to which the output is redirected. Defaults are used if no output files are specified.

The standard location for the files is the current working directory in which the jobs run. The default standard output file name is job-name.ojob-id, and the default standard error file name is job-name.ejob-id. The job-name can be built from the script file name, or defined by the user; see, for example, the -N option in the submit(1) man page. The job-id is a unique identifier that the grid engine system assigns to the job.

For array job tasks, the task identifier is added to these file names, separated by a dot. The resulting standard redirection paths are job-name.ojob-id.task-id and job-name.ejob-id.task-id. For more information, see Submitting Array Jobs.
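
The naming scheme can be made concrete with a short sketch. The helper name default_output_names and the sample job name and IDs are hypothetical; the function only assembles strings according to the pattern described above and does not query the grid engine system.

```shell
#!/bin/sh
# Hypothetical helper that builds the default redirection file names.
# Arguments: job name, job ID, optional array task ID.
default_output_names() {
    name=$1 id=$2 task=${3:-}
    out="$name.o$id"   # standard output: job-name, ".o", job-id
    err="$name.e$id"   # standard error:  job-name, ".e", job-id
    # Array job tasks get the task identifier appended after a dot.
    if [ -n "$task" ]; then
        out="$out.$task"
        err="$err.$task"
    fi
    printf '%s\n%s\n' "$out" "$err"
}

default_output_names flow.sh 4711      # serial job
default_output_names flow.sh 4711 3    # task 3 of an array job
```

The first call prints flow.sh.o4711 and flow.sh.e4711; the second prints flow.sh.o4711.3 and flow.sh.e4711.3.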

If the standard locations are not suitable, you can specify output redirections with QMON, as shown in Figure 3–6, or with the -e and -o options to the qsub command. Standard output and standard error output can be merged into one file. Redirections can also be specified on a per execution host basis, in which case the location of the output redirection file depends on the host on which the job runs. To build custom but unique redirection file paths, use dummy environment variables together with the qsub -e and -o options. A list of these variables follows.

HOME – Home directory on execution machine
USER – User ID of job owner
JOB_ID – Current job ID
JOB_NAME – Current job name; see the -N option
HOSTNAME – Name of the execution host
TASK_ID – Array job task index number

When the job runs, these variables are expanded into the actual values, and the redirection path is built with these values.

See the qsub(1) man page for further details.
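
The following sketch shows why such a path is usually written in single quotes, and what the expanded result might look like. The expand_pseudo_vars helper, the sample values (/home/alice, flow.sh, 4711), and the logs directory are hypothetical; the real expansion is done by the grid engine system when the job runs, not by this function.

```shell
#!/bin/sh
# Single quotes keep the submitting shell from expanding the names, so
# the literal text $JOB_NAME and $JOB_ID reaches qsub unchanged.
out_path='$HOME/logs/$JOB_NAME.$JOB_ID.out'

# Illustration only: substitute hypothetical sample values to show the
# kind of path that results once the variables are expanded.
expand_pseudo_vars() {
    printf '%s\n' "$1" | sed \
        -e 's|\$HOME|/home/alice|g' \
        -e 's|\$JOB_NAME|flow.sh|g' \
        -e 's|\$JOB_ID|4711|g'
}

expand_pseudo_vars "$out_path"

# With a grid engine installation, the same path would be passed like
# this (job.sh is a placeholder script name):
#   qsub -o '$HOME/logs/$JOB_NAME.$JOB_ID.out' -j y job.sh
```

With the sample values, the function prints /home/alice/logs/flow.sh.4711.out. Because the job ID is unique, each job writes to its own file.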

Active Comments

Lines with a leading # sign are treated as comments in shell scripts. However, the grid engine system recognizes special comment lines and treats the text that follows the prefix on each such line as part of the command-line argument list of the qsub command. The QMON Submit Job dialog box also interprets the qsub options that are supplied within these special comment lines: the corresponding parameters are preset when a script file is selected.

By default, the special comment lines are identified by the #$ prefix string. You can redefine the prefix string with the -C option to the qsub command.

This use of special comments is called script embedding of submit arguments. The following example shows a script file that uses script-embedded command-line options.

Example 3–2 Using Script-Embedded Command Line Options

#!/bin/csh

# Force csh if it is not the Grid Engine default shell

#$ -S /bin/csh

# This is a sample script file for compiling and
# running a sample FORTRAN program under N1 Grid Engine 6
# We want Grid Engine to send mail
# when the job begins
# and when it ends.

#$ -M EmailAddress
#$ -m b,e

# We want to name the file for the standard output
# and standard error.

#$ -o flow.out -j y

# Change to the directory where the files are located.

cd TEST

# Now we need to compile the program "flow.f" and
# name the executable "flow".

f77 flow.f -o flow

# Once it is compiled, we can run the program.

flow
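
To see which options a script embeds, the special comment lines can be filtered out with standard tools. The sketch below is an approximation for inspection purposes, not the parser that qsub uses. The function name embedded_options is hypothetical, and its optional second argument plays the role of a redefined prefix string.

```shell
#!/bin/sh
# Hypothetical helper: print the options embedded in a job script.
# Arguments: 1 = script path, 2 = prefix string (defaults to "#$").
embedded_options() {
    script=$1 prefix=${2:-'#$'}
    # Keep only lines that start with the prefix, then drop the prefix
    # and the blanks after it. index() compares plain strings, so the
    # prefix needs no regular-expression escaping.
    awk -v p="$prefix" '
        index($0, p) == 1 {
            line = substr($0, length(p) + 1)
            sub(/^[ \t]+/, "", line)
            print line
        }' "$script"
}

# Demonstration on a throwaway script.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
#$ -N flow
# an ordinary comment
#$ -o flow.out -j y
echo running
EOF
embedded_options "$sample"
rm -f "$sample"
```

The demonstration prints the two lines -N flow and -o flow.out -j y; the ordinary comment and the #! line are ignored.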

Environment Variables

When a job runs, several variables are preset into the job's environment.

ARC – The architecture name of the node on which the job is running. The name is compiled into the sge_execd binary.
SGE_ROOT – The root directory of the grid engine system as set for sge_execd before startup, or the default /usr/SGE directory.
SGE_BINARY_PATH – The directory in which the grid engine system binaries are installed.
SGE_CELL – The cell in which the job runs.
SGE_JOB_SPOOL_DIR – The directory used by sge_shepherd to store job-related data while the job runs.
SGE_O_HOME – The path to the home directory of the job owner on the host from which the job was submitted.
SGE_O_HOST – The host from which the job was submitted.
SGE_O_LOGNAME – The login name of the job owner on the host from which the job was submitted.
SGE_O_MAIL – The content of the MAIL environment variable in the context of the job submission command.
SGE_O_PATH – The content of the PATH environment variable in the context of the job submission command.
SGE_O_SHELL – The content of the SHELL environment variable in the context of the job submission command.
SGE_O_TZ – The content of the TZ environment variable in the context of the job submission command.
SGE_O_WORKDIR – The working directory of the job submission command.
SGE_CKPT_ENV – The checkpointing environment under which a checkpointing job runs. The checkpointing environment is selected with the qsub -ckpt command.
SGE_CKPT_DIR – The path ckpt_dir of the checkpoint interface. Set only for checkpointing jobs. For more information, see the checkpoint(5) man page.
SGE_STDERR_PATH – The path name of the file to which the standard error stream of the job is diverted. This file is commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
SGE_STDOUT_PATH – The path name of the file to which the standard output stream of the job is diverted. This file is commonly used for enhancing the output with messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
SGE_TASK_ID – The task identifier in the array job represented by this task.
ENVIRONMENT – Always set to BATCH. This variable indicates that the script is run in batch mode.
HOME – The user's home directory path as taken from the passwd file.
HOSTNAME – The host name of the node on which the job is running.
JOB_ID – A unique identifier assigned by the sge_qmaster daemon when the job was submitted. The job ID is a decimal integer from 1 through 9,999,999.
JOB_NAME – The job name, which is built from the file name provided with the qsub command, a period, and the digits of the job ID. You can override this default with qsub -N.
LOGNAME – The user's login name as taken from the passwd file.
NHOSTS – The number of hosts in use by a parallel job.
NQUEUES – The number of queues that are allocated for the job. This number is always 1 for serial jobs.
NSLOTS – The number of queue slots in use by a parallel job.
PATH – A default shell search path of: /usr/local/bin:/usr/ucb:/bin:/usr/bin.
PE – The parallel environment under which the job runs. This variable is for parallel jobs only.
PE_HOSTFILE – The path of a file that contains the definition of the virtual parallel machine that is assigned to a parallel job by the grid engine system. This variable is used for parallel jobs only. See the description of the $pe_hostfile parameter in the sge_pe(5) man page for details on the format of this file.
QUEUE – The name of the queue in which the job is running.
REQUEST – The request name of the job. The name is either the job script file name or is explicitly assigned to the job by the qsub -N command.
RESTARTED – Indicates whether a checkpointing job was restarted. A value of 1 means that the job was interrupted at least once and has therefore been restarted.
SHELL – The user's login shell as taken from the passwd file. Note that SHELL is not necessarily the shell that is used for the job.
TMPDIR – The absolute path to the job's temporary working directory.
TMP – The same as TMPDIR. This variable is provided for compatibility with NQS.
TZ – The time zone variable imported from sge_execd, if set.
USER – The user's login name as taken from the passwd file.
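
A job script can read these variables like any other environment variable. The following sketch reports a few of them and chooses an input file per array task. The ${VAR:-default} fallbacks are an assumption added so that the script also runs outside the grid engine system, and the input_for_task helper and the input.N file naming are hypothetical.

```shell
#!/bin/sh
# Report a few of the preset variables. The fallbacks after ":-" apply
# only when the script is run outside a grid engine job.
job_id=${JOB_ID:-none}
slots=${NSLOTS:-1}
scratch=${TMPDIR:-/tmp}

echo "job $job_id on ${HOSTNAME:-unknown-host}, queue ${QUEUE:-none}"
echo "slots: $slots, scratch directory: $scratch"

# ENVIRONMENT is set to BATCH inside a job, so it can guard batch-only steps.
if [ "${ENVIRONMENT:-}" = "BATCH" ]; then
    echo "running as a batch job"
else
    echo "running outside the grid engine system"
fi

# Hypothetical helper: array job tasks pick their own input file from
# the task identifier; anything non-numeric falls back to one shared file.
input_for_task() {
    case $1 in
        ''|*[!0-9]*) echo "input.dat" ;;
        *)           echo "input.$1" ;;
    esac
}
echo "input file: $(input_for_task "${SGE_TASK_ID:-}")"
```

Writing temporary data under TMPDIR rather than a fixed path keeps concurrent jobs on the same host from overwriting each other's files.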