Sun N1 Grid Engine 6.1 Administration Guide

Configuring Parallel Environments With QMON

On the QMON Main Control window, click the Parallel Environment Configuration button. The Parallel Environment Configuration dialog box appears.

Dialog box titled Parallel Environment Configuration. Shows PE List and Configuration list. Shows Add, Modify, Delete, Done, and Help buttons.

Currently configured parallel environments are displayed under PE List.

To display the contents of a parallel environment, select it. The selected parallel environment configuration is displayed under Configuration.

To delete a parallel environment, select it, and then click Delete.

To add a new parallel environment, click Add. To modify a parallel environment, select it, and then click Modify.

When you click Add or Modify, the Add/Modify PE dialog box appears.

Dialog box titled Add/Modify PE. The following text describes the fields that are shown. Shows Ok and Cancel buttons.

If you are adding a new parallel environment, type its name in the Name field. If you are modifying a parallel environment, its name is displayed in the Name field.

In the Slots box, enter the total number of job slots that can be occupied by all parallel environment jobs running concurrently.

User Lists displays the user access lists that are allowed to access the parallel environment. Xuser Lists displays the user access lists that are not allowed to access the parallel environment. See Configuring User Access Lists for more information about user access lists.

Click the icons at the right of each list to modify the content of the lists. The Select Access Lists dialog box appears.

Dialog box titled Select Access Lists. Shows Available Access Lists and Chosen Access Lists. Shows Ok, Cancel, and Help buttons.

The Start Proc Args and Stop Proc Args fields are optional. Use these fields to enter the precise invocation sequence of the parallel environment startup and stop procedures. See the sections Parallel Environment Startup Procedure and Termination of the Parallel Environment, respectively. If no such procedures are required for a certain parallel environment, you can leave the fields empty.

The first argument is usually the name of the start or stop procedure itself. The remaining parameters are command-line arguments to the procedures.

A variety of special identifiers, which begin with a $ prefix, are available to pass internal runtime information to the procedures. The sge_pe(5) man page contains a list of all available parameters.
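As an illustration of how these identifiers are typically used, the following sketch shows a hypothetical start procedure that receives the value of $pe_hostfile as its first argument and expands it into a list with one host name per slot. The script path, the function name expand_hostfile, and the sample host and queue names are assumptions made for this example. The hostfile layout assumed here (host name, slot count, queue name, processor range on each line) is the one described in the sge_pe(5) man page.

```shell
#!/bin/sh
# Hypothetical start procedure. It could be configured as
#   Start Proc Args:  /path/to/startpe.sh $pe_hostfile
# (the script path is an assumption, not part of the product).

# Expand a PE hostfile into one output line per slot.
# Each input line is expected to look like:
#   <host> <slots> <queue> <processor-range>
expand_hostfile() {
    while read -r host slots _rest; do
        [ -n "$host" ] || continue      # skip empty lines
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Demonstration with a sample hostfile, so that the sketch also runs
# outside the grid engine system.
sample=$(mktemp)
printf '%s\n' 'node1 2 all.q@node1 UNDEFINED' \
              'node2 1 all.q@node2 UNDEFINED' > "$sample"
expand_hostfile "$sample"
rm -f "$sample"
```

A list in this form is what many message-passing launchers expect as a machine file, which is why a start procedure often performs a conversion of this kind.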

The Allocation Rule field defines the number of parallel processes to allocate on each machine that is used by a parallel environment. A positive integer fixes the number of processes for each suitable host. Use the special keyword $pe_slots to allocate all processes of a job on a single host (SMP). Use the keyword $fill_up to fill each suitable host in turn before moving on to the next host, or $round_robin to hand out one slot per suitable host in turn. With both of these keywords, the number of processes can differ from host to host. For more details about these allocation rules, see the sge_pe(5) man page.
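The following sketch models how the allocation rules spread the slots of one job across hosts. It is a simplified illustration based on the rule descriptions in sge_pe(5), not the scheduler's actual implementation. The function name allocate and its argument layout are assumptions, and the rule names are written without the leading $ so that the shell does not expand them.

```shell
#!/bin/sh
# Illustrative model only. Prints the number of slots placed on each host.
# Usage: allocate RULE TOTAL FREE1 [FREE2 ...]
#   RULE  = pe_slots | fill_up | round_robin | a positive integer
#   TOTAL = number of slots that the job needs
#   FREEn = free slots on host n, with the best-suited host listed first
allocate() {
    rule=$1; need=$2; shift 2
    case $rule in
    pe_slots)       # all slots on the first host that can hold them
        placed=0
        for free in "$@"; do
            if [ "$placed" -eq 0 ] && [ "$free" -ge "$need" ]; then
                printf '%s ' "$need"; placed=1
            else
                printf '0 '
            fi
        done ;;
    fill_up)        # use up one host before moving on to the next
        for free in "$@"; do
            take=$free
            [ "$take" -gt "$need" ] && take=$need
            printf '%s ' "$take"
            need=$((need - take))
        done ;;
    round_robin)    # one slot per host and pass, until the job is placed
        grants=""; pass=1
        while [ "$need" -gt 0 ]; do
            granted=0; n=0
            for free in "$@"; do
                n=$((n + 1))
                if [ "$need" -gt 0 ] && [ "$free" -ge "$pass" ]; then
                    grants="$grants $n"; need=$((need - 1)); granted=1
                fi
            done
            [ "$granted" -eq 1 ] || break   # no host has capacity left
            pass=$((pass + 1))
        done
        n=0
        for free in "$@"; do
            n=$((n + 1)); count=0
            for g in $grants; do
                [ "$g" -eq "$n" ] && count=$((count + 1))
            done
            printf '%s ' "$count"
        done ;;
    *)              # fixed integer: exactly RULE slots on each host used
        for free in "$@"; do
            if [ "$need" -ge "$rule" ] && [ "$free" -ge "$rule" ]; then
                printf '%s ' "$rule"; need=$((need - rule))
            else
                printf '0 '
            fi
        done ;;
    esac
    echo
}

# A job that needs 6 slots, on three hosts with 4 free slots each:
allocate fill_up 6 4 4 4
allocate round_robin 6 4 4 4
allocate 2 6 4 4 4
```

In this model a job that cannot be placed completely is simply left partly allocated. The real scheduler would not start such a job.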

The Urgency Slots field specifies the method the grid engine system uses to assess the number of slots that pending jobs with a slot range get. The assumed slot allocation is meaningful when determining the resource-request-based priority contribution for numeric resources. You can specify an integer value for the number of slots. Specify min to use the slot range minimum. Specify max to use the slot range maximum. Specify avg to use the average of all numbers occurring within the job's parallel environment range request.
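The arithmetic behind these settings can be sketched as follows for a simple slot range of the form n-m. The function name urgency_slots is an assumption, the sketch does not handle comma-separated lists of slot counts, and the integer division used for avg is a limitation of shell arithmetic rather than a statement about how the grid engine system rounds.

```shell
#!/bin/sh
# Sketch: the slot count assumed for a pending job with a slot range.
# Usage: urgency_slots SETTING RANGE
#   SETTING = min | max | avg | an integer
#   RANGE   = slot range of the form n-m, for example 4-16
urgency_slots() {
    setting=$1; range=$2
    lo=${range%-*}      # text before the dash
    hi=${range#*-}      # text after the dash
    case $setting in
        min) echo "$lo" ;;
        max) echo "$hi" ;;
        # The average of all integers from lo to hi equals (lo + hi) / 2.
        avg) echo $(( (lo + hi) / 2 )) ;;
        *)   echo "$setting" ;;     # fixed integer value
    esac
}

urgency_slots min 4-16
urgency_slots max 4-16
urgency_slots avg 4-16
```

For the range 4-16, the three calls print 4, 16, and 10.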

The Control Slaves check box specifies whether the grid engine system generates parallel tasks or whether the corresponding parallel environment creates its own process. The grid engine system uses sge_execd and sge_shepherd to generate parallel tasks. Full control over slave tasks by the grid engine system is preferable, because the system provides the correct accounting and resource control. However, this functionality is available only for parallel environment interfaces especially customized for the grid engine system. See Tight Integration of Parallel Environments and Grid Engine Software for more details.

The Job Is First Task check box is meaningful only if Control Slaves is selected. If you select Job Is First Task, the job script or one of its child processes acts as one of the parallel tasks of the parallel application. With PVM, for example, you usually want the job script to be part of the parallel application. If you clear the Job Is First Task check box, the job script initiates the parallel application but does not participate. With MPI, for example, you usually do not want the job script to be part of the parallel application when you use mpirun.

Click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving changes.
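The fields of the Add/Modify PE dialog box correspond to the attributes of a parallel environment definition as described in the sge_pe(5) man page. The following sketch writes such a definition to a temporary file and prints it. All values, including the parallel environment name mpi, the slot count, and the two script paths, are assumptions chosen for illustration. The sketch only prints the qconf -Ap command that an administrator could use to load the file. It does not change any configuration.

```shell
#!/bin/sh
# Sketch: the Add/Modify PE fields written as an sge_pe(5)-style file.
# All values are illustrative assumptions, not recommendations.
pe_file=$(mktemp)

# The quoted 'EOF' keeps $pe_hostfile and $round_robin literal.
cat > "$pe_file" <<'EOF'
pe_name            mpi
slots              32
user_lists         NONE
xuser_lists        NONE
start_proc_args    /path/to/startpe.sh $pe_hostfile
stop_proc_args     /path/to/stoppe.sh
allocation_rule    $round_robin
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
EOF

cat "$pe_file"

# Printed only. Run the command by hand on a host with administrator access.
echo "To add this parallel environment: qconf -Ap $pe_file"
```

On a configured system, qconf -spl lists the existing parallel environments and qconf -sp followed by a name shows one definition in this format.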

Displaying Configured Parallel Environment Interfaces With QMON

On the QMON Main Control window, click the Parallel Environment Configuration button. The Parallel Environment Configuration dialog box appears. See Configuring Parallel Environments With QMON for more information.

The following example defines a parallel job to be submitted. The job requests the parallel environment interface mpi (message passing interface) with 4 to 16 processes, where 16 processes are preferred.

Dialog box titled Submit Job. Shows that the parallel environment named mpi is defined for the job.

To select a parallel environment from a list of available parallel environments, click the button at the right of the Parallel Environment field. A selection dialog box appears.

Dialog box titled Select an Item. Shows Available Parallel Environment list and a selection field. Shows OK, Cancel, and Help buttons.

You can add a range for the number of parallel tasks initiated by the job after the parallel environment name in the Parallel Environment field.

The qsub command corresponding to the parallel job specification described previously is as follows:


% qsub -N Flow -p -111 -P devel -a 200012240000.00 -cwd \
 -S /bin/tcsh -o flow.out -j y -pe mpi 4-16 \
 -v SHARED_MEM=TRUE,MODEL_SIZE=LARGE \
 -ac JOB_STEP=preprocessing,PORT=1234 \
 -A FLOW -w w -r y -m s,e -q big_q \
 -M me@myhost.com,me@other.address \
 flow.sh big.data

This example shows how to use the qsub -pe command to formulate an equivalent request. The qsub(1) man page provides more details about the -pe option.
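The same request can also be embedded in the job script itself, and the script can then inspect what it was granted. The following sketch is a hypothetical job script, because the contents of flow.sh are not shown in this guide. It assumes the environment variables NSLOTS and PE that the grid engine system sets for parallel jobs, and it falls back to default values so that it also runs outside the grid engine system. The function name describe_grant is an assumption.

```shell
#!/bin/sh
# Hypothetical job script with the parallel environment request embedded.
# Lines that start with "#$" are read by qsub as submit options.
#$ -N Flow
#$ -cwd
#$ -pe mpi 4-16

# Report how many slots the job received, and in which parallel environment.
# NSLOTS and PE are set for parallel jobs. The defaults apply when the
# script runs outside the grid engine system.
describe_grant() {
    echo "Granted ${NSLOTS:-1} slot(s) in parallel environment ${PE:-none}"
}

describe_grant
```

Because the request is a range, the number of slots granted can be anywhere from 4 to 16, so a script that launches parallel processes would normally read the granted value rather than assume one.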

Select a suitable parallel environment interface for a parallel job, keeping the following considerations in mind:

Ask the grid engine system administrator which of the available parallel environment interfaces are best suited for your types of parallel jobs.

You can specify resource requirements along with your parallel environment request. Specifying resource requirements further reduces the set of eligible queues for the parallel environment interface to those queues that fit the requirements. See Defining Resource Requirements in Sun N1 Grid Engine 6.1 User’s Guide.

For example, assume that you run the following command:


% qsub -pe mpi 1,2,4,8 -l nastran,arch=osf nastran.par

The queues that are suitable for this job are queues that are associated with the parallel environment interface mpi by the parallel environment configuration. Suitable queues must also satisfy the resource requirements given with the qsub -l option.


Note –

The parallel environment interface facility is highly configurable. In particular, the administrator can configure the parallel environment startup and stop procedures to support site-specific needs. See the sge_pe(5) man page for details. Use the qsub -v and qsub -V options to pass information from the user who submits the job to the startup and stop procedures. Both options export environment variables. If you are unsure, ask the administrator whether you are required to export certain environment variables.