Sun HPC ClusterTools 3.0 Administrator's Guide: With LSF

Chapter 3 Notes on LSF Batch Queues and Sun HPC Jobs

This chapter discusses various LSF Batch queue issues that are of particular interest to Sun HPC system administrators. It also discusses a new, optional configuration variable that can be used to verify project-based accounting.


Note -

The following discussion deals with LSF terms and concepts with which you are expected to be familiar. If you have not done so already, please read the LSF Batch Administrator's Guide, version 3.2.3, paying special attention to sections dealing with queues, job starters, and the queue configuration file, lsb.queues.


Creating Sun HPC-Specific Queues

Because HPC jobs distribute multiple processes across multiple nodes, their batch queue requirements are different from those of serial jobs. For this reason, you should specifically configure one or more batch queues for running Sun HPC jobs. This involves editing the queue configuration file, lsb.queues. Sections "Specify PAM as Job Starter" and "Enable Interactive Batch Mode" discuss the queue parameters of primary interest, JOB_STARTER and INTERACTIVE.

Specify PAM as Job Starter

The JOB_STARTER parameter allows an LSF batch queue to pass job launching control over to a special job-starting procedure rather than launching the job itself. For Sun HPC applications, this job launching role is given to the Parallel Application Manager (PAM), which is a utility for starting and managing MPI jobs.

PAM should be specified as the job starter on all queues that will be used by Sun HPC jobs. To do this, simply edit the JOB_STARTER line in lsb.queues to read as follows:

JOB_STARTER=pam

When a Sun HPC job is submitted to a PAM-configured queue, the queue starts PAM. PAM, in turn, launches the Sun HPC job on the cluster.

Enable Interactive Batch Mode

LSF supports the concept of interactive batch job execution. When a job is submitted in interactive batch mode, it receives the same batch scheduling and host selection services as noninteractive batch jobs, but the terminal from which the job was submitted remains attached to the job as if it were launched interactively.


Note -

Interactive batch mode is the only interactive mode Sun HPC ClusterTools 3.0 software supports.


By default, both batch mode and interactive batch mode are available. To select interactive batch mode, include the -I option on the bsub command line. Without this option, bsub invokes conventional batch mode.

The INTERACTIVE parameter in the lsb.queues file allows you to restrict a queue so that it either accepts only interactive batch jobs or excludes all interactive batch jobs. Use it to restrict Sun HPC-dedicated queues to interactive batch jobs. Otherwise, noninteractive jobs could be added to the queue, which could make the queue less efficient at handling interactive batch jobs. To impose this restriction, add the following line to the appropriate queue descriptor in the lsb.queues file:

INTERACTIVE=ONLY

All jobs submitted to a queue configured in this way must include the -I option on the bsub command line.
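
Taken together, the JOB_STARTER and INTERACTIVE settings might appear in a queue descriptor along the following lines. This is a sketch only: the queue name hpc_interactive and the DESCRIPTION text are hypothetical, and the descriptor is assumed to use the usual Begin Queue and End Queue delimiters of lsb.queues.

```text
Begin Queue
# Hypothetical queue name, chosen for illustration only
QUEUE_NAME=hpc_interactive
DESCRIPTION=Interactive batch queue for Sun HPC jobs (example)
# Hand job launching over to the Parallel Application Manager
JOB_STARTER=pam
# Accept interactive batch jobs only (bsub -I)
INTERACTIVE=ONLY
End Queue
```

With a queue defined this way, a job could be submitted with a command such as bsub -I -q hpc_interactive -n 4 a.out, where -q names the queue, -n requests four processors, and a.out stands in for the Sun HPC program.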


Note -

Separate queues can be configured for batch-mode-only jobs as well.


Because interactive batch jobs need fast response times, there are other steps you should take to minimize job launch latencies normally associated with batch queue behavior. These are described in the next section.

Configuring for Fast Interactive Batch Response Time

There are several steps you can take to optimize the response time of an interactive batch queue. These steps are discussed in Sections "Set PRIORITY in lsb.queues" through "Add Optimization Parameters to lsb.params".

Set PRIORITY in lsb.queues

The PRIORITY parameter defines a batch queue's priority relative to other batch queues. To ensure faster dispatching, assign a higher PRIORITY value to interactive batch queues than you give to noninteractive queues. A higher number equals a higher priority. For example, the following setting

PRIORITY=12

means that jobs on that queue will usually be serviced sooner than jobs on queues with a setting of PRIORITY=11 or lower.

Set NICE in lsb.queues

Set the queue's NICE parameter to 10. This ensures that jobs from this queue run at the same CPU priority as jobs from other interactive queues.

Set NEW_JOB_SCHED_DELAY in lsb.queues

Set the NEW_JOB_SCHED_DELAY parameter to 0. This will allow a new job scheduling session to be started as soon as a job is submitted to this queue.
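
The three lsb.queues settings described in this section could be added to an interactive batch queue's descriptor roughly as follows. The queue name is hypothetical, and the PRIORITY value simply reuses the example value of 12; choose a value that is higher than the PRIORITY of your noninteractive queues.

```text
Begin Queue
# Hypothetical queue name, chosen for illustration only
QUEUE_NAME=hpc_interactive
# Must exceed the PRIORITY of the noninteractive queues at your site
PRIORITY=12
# Same CPU priority as other interactive queues
NICE=10
# Start a scheduling session as soon as a job is submitted
NEW_JOB_SCHED_DELAY=0
End Queue
```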

Add Optimization Parameters to lsb.params

During installation of the Sun HPC ClusterTools 3.0 packages, you are asked if you want to modify the lsb.params file to optimize interactive batch response time. If you answered yes, the SUNWrte package makes the following changes to the lsb.params file:

MBD_SLEEP_TIME=1
MAX_SBD_FAIL=30
JOB_ACCEPT_INTERVAL=0

The first parameter, MBD_SLEEP_TIME, specifies the number of seconds LSF Batch will wait between attempts to dispatch jobs. The default is 60 seconds. SUNWrte changes the interval to 1 second.

The MAX_SBD_FAIL parameter specifies how many times LSF Batch will try to reach an unresponsive slave batch daemon before giving up. MBD_SLEEP_TIME controls the frequency of these attempts. If MAX_SBD_FAIL is not specified, its default value is three times the MBD_SLEEP_TIME value. SUNWrte sets MAX_SBD_FAIL to 30.

The JOB_ACCEPT_INTERVAL parameter specifies how many MBD_SLEEP_TIME periods LSF Batch will wait after successfully dispatching a job to a host before it dispatches another job to the same host. SUNWrte sets this parameter to 0, allowing the host to accept multiple jobs in each job dispatching period (MBD_SLEEP_TIME).

If you answered no during the installation, but now wish to enable these optimizations, simply edit these parameters in the lsb.params file as shown above.
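
If you make the change by hand, the edited portion of lsb.params might look like the following sketch. It assumes the parameters sit inside the file's Begin Parameters and End Parameters section alongside any settings already present there; the comments are added here for explanation only.

```text
Begin Parameters
# Seconds between job dispatch attempts
MBD_SLEEP_TIME=1
# Attempts to reach an unresponsive slave batch daemon before giving up
MAX_SBD_FAIL=30
# MBD_SLEEP_TIME periods to wait before dispatching another job to the same host
JOB_ACCEPT_INTERVAL=0
End Parameters
```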

Verifying Project-Based Accounting

One feature of LSF is project-based accounting--that is, individual jobs can be associated with particular projects and charges allocated accordingly. Projects can be specified in the following ways:

A new, optional configuration variable, CHECK_PROJECT_UGRPMEMBERSHIP, has been added to ensure the integrity of project-based accounting. To enable this feature, add the following line to the lsb.params file:

CHECK_PROJECT_UGRPMEMBERSHIP=y

When this entry is present in lsb.params, the software verifies that the user submitting a job is a member of the user group associated with the project, proj_name, named for that job. User groups are defined in the configuration file lsb.users. If the user is not a member of the user group associated with that project, the job is rejected.
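
As an illustration, the fragments below sketch how the two files might fit together. The group name proj_a and the user names are hypothetical. The sketch also assumes that the user group associated with a project is the group bearing the same name as the project; confirm how that association is made at your site before relying on it.

```text
# lsb.params (fragment)
Begin Parameters
CHECK_PROJECT_UGRPMEMBERSHIP=y
End Parameters

# lsb.users (fragment); group and user names are hypothetical
Begin UserGroup
GROUP_NAME    GROUP_MEMBER
proj_a        (user1 user2)
End UserGroup
```

Under these assumptions, a job that user1 submits with bsub -P proj_a would be accepted, while the same submission from a user outside the group would be rejected.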