The following sections describe how to submit more complex jobs through the grid engine system.
Shell scripts, also called batch jobs, are sequences of command-line instructions assembled in a file. Script files are made executable with the chmod command. When a script is invoked, a command interpreter is started, and each instruction is interpreted as if the user running the script had typed it manually. Typical command interpreters are csh, tcsh, sh, and ksh. You can invoke arbitrary commands, applications, and other shell scripts from within a shell script.
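As a minimal illustration, the following sh sketch assembles a short script in a file, makes it executable with chmod, and invokes it directly so that the interpreter named on its #! line is started. The file name hello.sh and the use of a temporary directory are assumptions made only for this demonstration.

```sh
#!/bin/sh
# Create a scratch directory for the demonstration script and remove it
# when this shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
script="$workdir/hello.sh"

# Assemble a sequence of command-line instructions in a file.
cat > "$script" <<'EOF'
#!/bin/sh
echo "step 1"
echo "step 2"
EOF

# Make the script executable, then invoke it directly; the #! line
# tells the system which command interpreter to start.
chmod +x "$script"
output=$("$script")
printf '%s\n' "$output"
```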
The command interpreter can be invoked as a login shell. To do so, the name of the command interpreter must be contained in the login_shells list of the grid engine system configuration that is in effect for the host and queue running the job.
The grid engine system configuration might be different for the various hosts and queues configured in your cluster. You can display the effective configurations with the -sconf and -sq options of the qconf command. For detailed information, see the qconf(1) man page.
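To check whether a given interpreter qualifies, you can inspect the login_shells entry in the output of qconf -sconf (the global configuration) or qconf -sconf followed by a host name (that host's local configuration). The function below is a hypothetical helper that reads such a listing from standard input; it assumes one "name value" pair per line with a comma-separated value, as in "login_shells sh,ksh,csh,tcsh".

```sh
#!/bin/sh
# Succeed if the interpreter name in $1 appears in the login_shells
# entry of a configuration listing read from standard input, e.g.:
#   qconf -sconf | in_login_shells csh
# Exact name matching is used, so "csh" does not match "tcsh".
in_login_shells() {
    awk -v want="$1" '
        $1 == "login_shells" {
            n = split($2, shells, ",")
            for (i = 1; i <= n; i++)
                if (shells[i] == want) found = 1
        }
        END { if (found) exit 0; else exit 1 }
    '
}
```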
If the command interpreter is invoked as a login shell, the environment of your job is the same as if you had logged in and run the script. With csh, for example, .login and .cshrc are executed in addition to the system default startup resource files, such as /etc/login, whereas only .cshrc is executed if csh is not invoked as a login shell. For a description of the differences between being and not being invoked as a login shell, see the man page of your command interpreter.
Example 3–1 is a simple shell script. The script first compiles the application flow from its Fortran77 source and then runs the application.
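```csh
#!/bin/csh
# This is a sample script file for compiling and
# running a sample FORTRAN program under N1 Grid Engine 6
cd TEST
# Now we need to compile the program "flow.f" and
# name the executable "flow".
f77 flow.f -o flow
# Once it is compiled, we can run the program
# (./ avoids relying on "." being in the search path).
./flow
```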
Your local system user's guide provides detailed information about building and customizing shell scripts. You might also want to look at the sh, ksh, csh, or tcsh man page. The following sections focus on the special considerations that apply when you prepare batch scripts for the grid engine system.
In general, you can submit to the grid engine system all shell scripts that you can run from your command prompt by hand. Such shell scripts must not require a terminal connection, and the scripts must not need interactive user intervention. The exceptions are the standard error and standard output devices, which are automatically redirected. Therefore, Example 3–1 is ready to be submitted to the grid engine system and the script will perform the desired action.
Some extensions to regular shell scripts influence the behavior of scripts that run under grid engine system control. The following sections describe these extensions.
At submit time, you can specify the command interpreter to use to process the job script file; see, for example, Figure 3–5. If no interpreter is specified, the configuration variable shell_start_mode determines how the command interpreter is selected:
If shell_start_mode is set to unix_behavior, the first line of the script file specifies the command interpreter. That line must begin with #! followed by the path of the interpreter. If the first line does not begin with #!, the Bourne shell sh is used by default.
For all other settings of shell_start_mode, the default command interpreter is determined by the shell parameter for the queue where the job starts. See Displaying Queues and Queue Properties and the queue_conf(5) man page.
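The selection rule above can be sketched as a small sh function. It is a hypothetical model, not grid engine code: it covers only jobs submitted without an explicit interpreter, assumes /bin/sh as the path of the Bourne shell, and drops any arguments that follow the interpreter path on the #! line. Values other than unix_behavior (for example posix_compliant) all fall through to the queue's shell parameter.

```sh
#!/bin/sh
# Pick the command interpreter for a job script. Arguments:
#   $1  value of shell_start_mode
#   $2  path of the job script
#   $3  value of the queue's shell parameter
pick_interpreter() {
    mode=$1 script=$2 queue_shell=$3
    if [ "$mode" = "unix_behavior" ]; then
        first=$(head -n 1 "$script")
        case $first in
            '#!'*)
                # Drop the two-character "#!" prefix and keep the first
                # word, which is the interpreter path.
                set -- ${first#??}
                printf '%s\n' "$1"
                ;;
            *)
                # No #! line: fall back to the Bourne shell.
                printf '%s\n' /bin/sh
                ;;
        esac
    else
        # Any other mode: the queue's shell parameter decides.
        printf '%s\n' "$queue_shell"
    fi
}
```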
Since batch jobs do not have a terminal connection, their standard output and their standard error output must be redirected into files. The grid engine system enables the user to define the location of the files to which the output is redirected. Defaults are used if no output files are specified.
The standard location for these files is the current working directory in which the job runs. The default standard output file name is job-name.ojob-id, and the default standard error file name is job-name.ejob-id. The job-name is built from the script file name, or it can be defined by the user; see, for example, the -N option in the submit(1) man page. job-id is a unique identifier that the grid engine system assigns to the job.
For array job tasks, the task identifier is appended to these file names, separated by a dot. The resulting default paths are therefore job-name.ojob-id.task-id and job-name.ejob-id.task-id. For more information, see Submitting Array Jobs.
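The naming scheme above can be expressed as a small sh function. The function name is hypothetical, and the job name flow.sh in the checks is an illustrative assumption (a job name derived from a script file called flow.sh).

```sh
#!/bin/sh
# Print the default standard output and standard error file names.
#   $1  job name
#   $2  job ID
#   $3  task ID (optional; only array job tasks have one)
default_output_names() {
    suffix=$2
    if [ -n "$3" ]; then
        # Array job task: append ".task-id".
        suffix="$2.$3"
    fi
    printf '%s.o%s\n' "$1" "$suffix"
    printf '%s.e%s\n' "$1" "$suffix"
}
```

For job ID 42 the function prints flow.sh.o42 and flow.sh.e42; for task 7 of the same array job it prints flow.sh.o42.7 and flow.sh.e42.7.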
If the standard locations are not suitable, you can specify output paths with QMON, as shown in Figure 3–6, or with the -e and -o options of the qsub command. Standard output and standard error output can be merged into one file. The redirections can also be specified per execution host, so that the location of the output redirection files depends on the host on which the job runs. To build custom but unique redirection file paths, you can use the following pseudo environment variables together with the qsub -e and -o options:

$HOME – Home directory on the execution machine
$USER – User ID of the job owner
$JOB_ID – Current job ID
$JOB_NAME – Current job name (see the -N option)
$HOSTNAME – Name of the execution host
$TASK_ID – Array job task index number
When the job runs, these variables are expanded into the actual values, and the redirection path is built with these values.
See the qsub(1) man page for further details.
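To make the expansion step concrete, the sketch below imitates it with sed for the pseudo environment variables documented in qsub(1): $HOME, $USER, $JOB_ID, $JOB_NAME, $HOSTNAME, and $TASK_ID. The function is hypothetical and only models the substitution; the grid engine system performs the real expansion when the job runs. As a simplification, replacement values must not contain "|", "&", or "\", because they are passed to sed unescaped.

```sh
#!/bin/sh
# Expand pseudo environment variables in an output path template.
# Arguments: $1 template, $2 home, $3 user, $4 job ID, $5 job name,
#            $6 host name, $7 task ID
# The patterns are single-quoted so that sed sees "\$" (a literal dollar).
expand_output_path() {
    printf '%s\n' "$1" | sed \
        -e 's|\$HOME|'"$2"'|g' \
        -e 's|\$USER|'"$3"'|g' \
        -e 's|\$JOB_ID|'"$4"'|g' \
        -e 's|\$JOB_NAME|'"$5"'|g' \
        -e 's|\$HOSTNAME|'"$6"'|g' \
        -e 's|\$TASK_ID|'"$7"'|g'
}
```

When you pass such a template to qsub, quote it so that your submit shell does not expand the variables first, for example qsub -o '$HOME/out/$JOB_NAME.$HOSTNAME.o$JOB_ID' flow.sh.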
Lines with a leading # sign are treated as comments in shell scripts. However, the grid engine system recognizes special comment lines that begin with the prefix #$ (the default prefix, which can be changed with the qsub -C option). The rest of such a line is treated as part of the qsub command-line argument list. The QMON Submit Job dialog box also interprets the qsub options supplied in these special comment lines and presets the corresponding parameters when a script file is selected.
```csh
#!/bin/csh
# Force csh if not Grid Engine default shell
#$ -S /bin/csh
# This is a sample script file for compiling and
# running a sample FORTRAN program under N1 Grid Engine 6
# We want Grid Engine to send mail
# when the job begins
# and when it ends.
# Replace EmailAddress with your own address.
#$ -M EmailAddress
#$ -m b,e
# Write standard output to flow.out and merge
# standard error into it (-j y).
#$ -o flow.out -j y
# Change to the directory where the files are located.
cd TEST
# Now we need to compile the program "flow.f" and
# name the executable "flow".
f77 flow.f -o flow
# Once it is compiled, we can run the program.
./flow
```
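To see which options a script carries, you can list its #$ lines. The sketch below is a hypothetical, simplified helper: it takes every line that starts with #$, strips the prefix, and ignores alternative prefixes set with qsub -C. Consult the qsub(1) man page for the exact parsing rules that qsub applies.

```sh
#!/bin/sh
# Print the qsub options embedded in the job script named by $1,
# one directive line per output line, with the "#$" prefix removed.
embedded_options() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}
```

Applied to the script above, this would print the three directive lines -S /bin/csh, -M EmailAddress, -m b,e, and -o flow.out -j y, each on its own line.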
When a job runs, a number of variables are set in the job's environment. The following is a list of these variables:
SGE_STDERR_PATH – The path name of the file to which the standard error stream of the job is diverted. This file is commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
SGE_STDOUT_PATH – The path name of the file to which the standard output stream of the job is diverted. This file is commonly used for enhancing the output with messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
PE_HOSTFILE – The path of a file that contains the definition of the virtual parallel machine that the grid engine system assigns to a parallel job. This variable is used for parallel jobs only. See the description of the $pe_hostfile parameter in the sge_pe(5) man page for details on the format of this file.
SHELL – The user's login shell, as taken from the passwd file. Note: This is not necessarily the shell that is used for the job.
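The variables above can be used directly in job, prolog, or epilog scripts. The two sh functions below are hypothetical helpers: one appends a timestamped message to the file named by SGE_STDERR_PATH, and the other sums the slot counts listed in PE_HOSTFILE. The second assumes that each line of the host file holds the host name, the number of slots on that host, the queue, and a processor range, separated by white space; check the sge_pe(5) man page for the format your release uses. In a real job the grid engine system sets both variables.

```sh
#!/bin/sh
# Append a timestamped message to the job's standard error file.
log_to_job_stderr() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$SGE_STDERR_PATH"
}

# Print the total number of slots granted to a parallel job by adding
# up the second column of $PE_HOSTFILE.
total_pe_slots() {
    awk '{ slots += $2 } END { print slots + 0 }' "$PE_HOSTFILE"
}
```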