Sun N1 Grid Engine 6.1 User's Guide

Transparent Remote Execution

The grid engine system provides a set of closely related facilities that support the transparent remote execution of certain computational tasks. The core tool for this functionality is the qrsh command, which is described in Remote Execution With qrsh. Two high-level facilities, qtcsh and qmake, build on top of qrsh. These two commands enable the grid engine system to transparently distribute implicit computational tasks, thereby enhancing the standard UNIX facilities make and csh. qtcsh is described in Transparent Job Distribution With qtcsh. qmake is described in Parallel Makefile Processing With qmake.

Remote Execution With `qrsh`

qrsh is built around the standard rsh facility. See the information that is provided in sge-root/3rd_party for details on the involvement of rsh. qrsh can be used for various purposes, including the following:

To provide remote execution of interactive applications through the grid engine system, comparable to the standard UNIX facility rsh. rsh is also called remsh on HP-UX systems.
To offer interactive login session capabilities through the grid engine system, similar to the standard UNIX facility rlogin. qlogin is still required as the grid engine system's counterpart to the UNIX telnet facility.
To allow for the submission of batch jobs that support terminal I/O (standard output, standard error, and standard input) and terminal control.
To provide a way to submit a standalone program that is not embedded in a shell script.

Note – You can also submit scripts with qrsh by using the -b n option. For more information, see the qrsh man page.
To provide a submission client that remains active while a batch job is pending or running and that goes away only if the job finishes or is cancelled.
To allow for the grid engine system-controlled remote running of job tasks within the framework of the dispersed resources allocated by parallel jobs. See Tight Integration of Parallel Environments and Grid Engine Software in Sun N1 Grid Engine 6.1 Administration Guide.

By virtue of these capabilities, qrsh is the major enabling infrastructure for the implementation of the qtcsh and the qmake facilities. qrsh is also used for the tight integration of the grid engine system with parallel environments such as MPI or PVM.

Invoking Transparent Remote Execution With `qrsh`

Type the qrsh command, adding options and arguments according to the following syntax:

% qrsh	[options] program|shell-script [arguments] \
	[> stdout] [>&2 stderr] [< stdin]

qrsh understands almost all options of qsub. qrsh provides the following options:

-now yes|no – -now yes specifies that the job is scheduled immediately. The job is rejected if no appropriate resources are available. -now yes is the default. -now no specifies that the job is queued like a batch job if the job cannot be started at submission time.
-inherit – qrsh does not go through the scheduling process to start a job-task. Instead, qrsh assumes that the job is embedded in a parallel job that already has allocated suitable resources on the designated remote execution host. This form of qrsh is commonly used in qmake and in a tight parallel environment integration. The default is not to inherit external job resources.
-binary yes|no – When specified with the no option (-b n), enables you to use qrsh to submit script jobs.
-noshell – With this option, you do not start the command line that is given to qrsh in a user's login shell. Instead, you execute the command without the wrapping shell. Use this option to speed up execution, as some overhead, such as the shell startup and the sourcing of shell resource files, is avoided.
-nostdin – Suppresses the input stream STDIN. With this option set, qrsh passes the -n option to the rsh command. Suppression of the input stream is especially useful if multiple tasks are executed in parallel using qrsh, for example, in a make process, because which process gets the input is otherwise undefined.
-verbose – This option presents output on the scheduling process. -verbose is mainly intended for debugging purposes and is therefore switched off by default.
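
The options above can be combined on a single qrsh command line. The following is a minimal dry-run sketch, not an excerpt from the product: run_or_show is a hypothetical helper, and ./simulate and input.dat are placeholder names. By default the helper only prints each command line, so the sketch can be read and run without a grid engine installation.

```shell
#!/bin/sh
# Dry-run sketch. run_or_show is a hypothetical helper, not part of the
# grid engine system: with DRY_RUN=1 (the default here) it only prints
# the command line; with DRY_RUN=0 it executes the command.
DRY_RUN=${DRY_RUN:-1}

run_or_show() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Queue the job like a batch job if it cannot start at submission time.
# ./simulate and input.dat are placeholder names.
run_or_show qrsh -now no ./simulate input.dat

# Skip the wrapping login shell and suppress standard input,
# for example when many tasks are started in parallel.
run_or_show qrsh -noshell -nostdin hostname

# Show output on the scheduling process, for debugging.
run_or_show qrsh -verbose hostname
```

Setting DRY_RUN=0 in the environment makes the helper execute the commands instead of printing them, which requires a working grid engine installation.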

Transparent Job Distribution With `qtcsh`

qtcsh is a fully compatible replacement for the widely known and used UNIX C shell derivative tcsh. qtcsh is built around tcsh. See the information that is provided in sge-root/3rd_party for details on the involvement of tcsh. qtcsh provides a command shell with the extension of transparently distributing execution of designated applications to suitable and lightly loaded hosts that use the grid engine system. The .qtask configuration files define the applications to execute remotely and the requirements that apply to the selection of an execution host.

The remote execution of these applications is transparent to the user. The applications are submitted to the grid engine system through the qrsh facility. qrsh provides standard output, error output, and standard input handling as well as terminal control connection to the remotely executing application. Three noticeable differences exist between running such an application remotely and running the application on the same host as the shell:

The remote host might be more powerful, more lightly loaded, and have the required hardware and software resources installed. Therefore, such a remote host would be much better suited than the local host, which might not allow running the application at all.
A small delay is incurred by the remote startup of the jobs and by their handling through the grid engine system.
Administrators can restrict the use of resources through interactive jobs (qrsh) and thus through qtcsh. If not enough suitable resources are available for an application to be started through qrsh, or if all suitable systems are overloaded, the implicit qrsh submission fails. A corresponding error message is returned, such as Not enough resources ... try later.

In addition to the standard use, qtcsh is a suitable platform for third-party code and tool integration. The single-application execution form of qtcsh is qtcsh -c app-name. The use of this form of qtcsh inside integration environments presents a persistent interface that almost never needs to be changed. All the required application, tool, integration, site, and even user-specific configurations are contained in appropriately defined .qtask files. A further advantage is that this interface can be used in shell scripts of any type, in C programs, and even in Java applications.

`qtcsh` Usage

The invocation of qtcsh is exactly the same as for tcsh. qtcsh extends tcsh in providing support for the .qtask file and by offering a set of specialized shell built-in modes.

The .qtask file is defined as follows. Each line in the file has the following format:

[!]app-name qrsh-options

The optional leading exclamation mark (!) defines the precedence between conflicting definitions in a global cluster .qtask file and the personal .qtask file of the qtcsh user. If the exclamation mark is missing in the global cluster file, a conflicting definition in the user file overrides the definition in the global cluster file. If the exclamation mark is in the global cluster file, the corresponding definition cannot be overridden.

app-name specifies the name of the application that, when typed on a command line in a qtcsh, is submitted to the grid engine system for remote execution.

qrsh-options specifies the options to the qrsh facility to use. These options define resource requirements for the application.
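
The precedence rule for the leading exclamation mark can be sketched as a small merge of the two files. This is an illustration only, not how qtcsh is implemented: effective_qtask is a hypothetical helper, and two details are assumptions that the text above does not state, namely that lines starting with # are skipped and that an exclamation mark in the personal file is simply ignored.

```shell
#!/bin/sh
# effective_qtask is a hypothetical helper, not a grid engine command.
# Usage: effective_qtask GLOBAL_QTASK USER_QTASK
# Prints one "app-name qrsh-options" line per application after applying
# the precedence rule for the leading exclamation mark.
effective_qtask() {
    awk '
        # Assumption: comment lines and blank lines carry no entry.
        /^[ \t]*#/ { next }
        /^[ \t]*$/ { next }
        {
            name = $1
            forced = sub(/^!/, "", name)    # 1 if the entry had a leading !
            opts = $0
            sub(/^[ \t]*!?[^ \t]+[ \t]*/, "", opts)   # keep only the options
            if (FILENAME == ARGV[1]) {
                # Entry from the global cluster file.
                if (forced) locked[name] = 1
            } else if (name in locked) {
                # The global entry is marked with !, so the user entry loses.
                next
            }
            if (!(name in seen)) { seen[name] = 1; order[++n] = name }
            value[name] = opts
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                out = name
                if (value[name] != "") out = name " " value[name]
                print out
            }
        }
    ' "$1" "$2"
}
```

Calling effective_qtask with the global cluster file as the first argument and the personal file as the second prints the definitions that would apply. Entries locked in the global file are printed unchanged. All other entries take the personal definition when one exists.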

The application name must appear in the command line exactly as the application is defined in the .qtask file. If the application name is prefixed with a path name, a local binary is addressed. No remote execution is intended.

csh aliases are expanded before a comparison with the application names is performed. The applications intended for remote execution can also appear anywhere in a qtcsh command line, in particular before or after standard I/O redirections.

Hence, the following examples are valid and meaningful syntax:

# .qtask file
netscape -v DISPLAY=myhost:0
grep -l h=filesurfer

Given this .qtask file, the following qtcsh command lines:

netscape
~/mybin/netscape
cat very_big_file | grep pattern | sort | uniq

implicitly result in:

qrsh -v DISPLAY=myhost:0 netscape
~/mybin/netscape
cat very_big_file | qrsh -l h=filesurfer grep pattern | sort | uniq
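
The rewriting shown in the example above can be approximated for a single command word. The following sketch is an illustration under stated assumptions, not the qtcsh implementation: qtask_rewrite is a hypothetical helper, it treats lines starting with # as comments, and it handles neither alias expansion nor pipelines.

```shell
#!/bin/sh
# qtask_rewrite is a hypothetical helper, not a grid engine command.
# Usage: qtask_rewrite QTASK_FILE COMMAND [ARGUMENTS...]
# Prints the command line that would effectively be run.
qtask_rewrite() {
    qtask_file=$1
    app=$2
    shift 2
    cmd=$app
    if [ $# -gt 0 ]; then
        cmd="$cmd $*"
    fi

    # A name prefixed with a path addresses a local binary.
    case $app in
        */*) printf '%s\n' "$cmd"; return 0 ;;
    esac

    # Look for an exact match on the application name. awk exits with
    # status 0 and prints the qrsh options only if an entry is found.
    if opts=$(awk -v app="$app" '
        /^[ \t]*#/ { next }    # assumption: lines starting with # are comments
        /^[ \t]*$/ { next }
        {
            name = $1
            sub(/^!/, "", name)
            if (name == app) {
                sub(/^[ \t]*[^ \t]+[ \t]*/, "")
                print
                found = 1
                exit
            }
        }
        END { exit !found }
    ' "$qtask_file"); then
        printf 'qrsh%s %s\n' "${opts:+ $opts}" "$cmd"
    else
        printf '%s\n' "$cmd"
    fi
}
```

With the .qtask file from the example, qtask_rewrite prints the qrsh form for netscape and for grep pattern, and leaves ~/mybin/netscape and sort unchanged.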

qtcsh can operate in different modes, influenced by switches that can be set on or off:

Local or remote execution of commands. Remote is the default.
Immediate or batch remote execution. Immediate is the default.
Verbose or nonverbose output. Nonverbose is the default.

The setting of these modes can be changed using option arguments of qtcsh at start time or with the shell built-in command qrshmode at runtime. See the qtcsh(1) man page for more information.

Parallel Makefile Processing With `qmake`

qmake is a replacement for the standard UNIX make facility. qmake extends make by enabling the distribution of independent make steps across a cluster of suitable machines. qmake is built around the popular GNU-make facility gmake. See the information that is provided in sge-root/3rd_party for details on the involvement of gmake.

To ensure that a distributed make process can run to completion, qmake first allocates the required resources in a way analogous to a parallel job. qmake then manages this set of resources without further interaction with the scheduling. qmake distributes make steps as resources become available, using the qrsh facility with the -inherit option.

qrsh provides standard output, error output, and standard input handling as well as terminal control connection to the remotely executing make step. Therefore, only three noticeable differences exist between executing a make procedure locally and using qmake:

Provided that individual make steps have a certain duration and that enough independent make steps exist to process, parallelization speeds up the make process significantly.
In the make steps to be started up remotely, an implied small overhead exists that is caused by qrsh and the remote execution.
To take advantage of the make step distribution of qmake, the user must specify as a minimum the degree of parallelization. That is, the user must specify the number of concurrently executable make steps. In addition, the user can specify the resource characteristics required by the make steps, such as available software licenses, machine architecture, memory, or CPU-time requirements.

The most common use of make is the compilation of complex software packages. Compilation might not be the major application for qmake, however. Program files are often quite small as a matter of good programming practice. Therefore, compilation of a single program file, which is a single make step, often takes only a few seconds. Furthermore, compilation usually implies significant file access. Nested include files can cause this problem. File access might not be accelerated if done for multiple make steps in parallel because the file server can become a bottleneck. Such a bottleneck effectively serializes all the file access. Therefore, the compilation process sometimes cannot be accelerated in a satisfactory manner.

Other potential applications of qmake are more appropriate. An example is the steering of the interdependencies and the workflow of complex analysis tasks through makefiles. Each make step in such environments is typically a simulation or data analysis operation with nonnegligible resource and computation time requirements. A considerable acceleration can be achieved in such cases.

`qmake` Usage

The command-line syntax of qmake looks similar to the syntax of qrsh:

% qmake [-pe pe-name pe-range][options] \
 -- [gnu-make-options][target]

Note – The -inherit option is also supported by qmake, as described later in this section.

Pay special attention to the use of the -pe option and its relation to the gmake -j option. You can use both options to express the amount of parallelism to be achieved. The difference is that gmake provides no way with -j to specify a parallel environment to use. Therefore, qmake assumes that a default environment for parallel makes, called make, is configured. Furthermore, gmake's -j option accepts only a single number, not a range. qmake interprets the number that is given with -j as a range of 1-n. By contrast, -pe permits the detailed specification of all these parameters. Consequently, the following command lines are equivalent:

% qmake -- -j 10
% qmake -pe make 1-10 --

The following command lines cannot be expressed using the -j option:

% qmake -pe make 5-10,16 --
% qmake -pe mpi 1-99999 --

Apart from the syntax, qmake supports two modes of invocation: interactively from the command line without the -inherit option, or within a batch job with the -inherit option. These two modes start different sequences of actions:

Interactive – When qmake is invoked on the command line, the make process is implicitly submitted to the grid engine system with qrsh. The process takes the resource requirements that are specified in the qmake command line into account. The grid engine system then selects a master machine for the execution of the parallel job that is associated with the parallel make job. The grid engine system starts the make procedure there. The procedure must start there because the make process can be architecture-dependent. The required architecture is specified in the qmake command line. The qmake process on the master machine then delegates execution of individual make steps to the other hosts that are allocated for the job. The list of allocated hosts is passed to qmake through the parallel environment hosts file.
Batch – In this case, qmake appears inside a batch script with the -inherit option. With this option, qmake makes use of the resources already allocated to the job into which qmake is embedded, and uses qrsh -inherit directly to start make steps. If the -inherit option is not present, a new job is spawned, as described for the interactive case. When qmake is called in batch mode, the specification of resource requirements, the -pe option, and the -j option are ignored.
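
A batch script for the second mode might look like the following sketch. The script name build.sh, the make target all, and the slot range in the comment are illustrative assumptions. The run_make_steps wrapper is a hypothetical addition that prints the intended command line when qmake is not installed, so that the sketch can be run anywhere.

```shell
#!/bin/sh
# build.sh: hypothetical batch script for a qmake run in batch mode.
# The resources are requested at submission time, for example with
# "qsub -cwd -pe make 2-8 build.sh", where the slot range is illustrative.

run_make_steps() {
    if command -v qmake >/dev/null 2>&1; then
        # -inherit reuses the slots already allocated to this job, so
        # neither -pe nor -j is given here.
        qmake -inherit -- "$@"
    else
        echo "qmake not found; would run: qmake -inherit -- $*"
    fi
}

# "all" is a placeholder make target.
run_make_steps all
```

Because the parallel environment is requested when the script is submitted, the script itself carries no resource requests on the qmake command line.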

Note –
Single CPU jobs also must request a parallel environment:
qmake -pe make 1 --
If no parallel execution is required, call qmake with gmake command-line syntax without grid engine system options and without --. This qmake command behaves like gmake.

See the qmake(1) man page for further details.
