Sun N1 Grid Engine 6.1 User's Guide

Monitoring and Controlling Queues

As described in Displaying Queues and Queue Properties, the owners of queues have permission to suspend and resume queues, and to disable and enable queues. Owners might want to suspend or disable queues if certain machines are needed for important work, and those machines are strongly affected by jobs running in the background.

You can control queues in two ways:

Monitoring and Controlling Queues With QMON

In the QMON Main Control window, click the Queue Control button. The Cluster Queues dialog box appears.

Dialog box titled Cluster Queues. Shows the Cluster
Queues tab with a list of defined cluster queues. Shows buttons you
can use to manipulate queues.

Monitoring and Controlling Cluster Queues

The Cluster Queue tab provides a quick overview of all cluster queues that are defined for the cluster. The Cluster Queue tab also provides the means to suspend and resume cluster queues, to disable and enable cluster queues, as well as to configure them.

Information displayed in the Cluster Queue dialog box is updated periodically. Click Refresh to force an update. Click a cluster queue name to select the queue.

Click Delete, Suspend, Resume, Disable, or Enable to execute the corresponding operation on cluster queues that you select. The suspend/resume and disable/enable operations require notification of the corresponding sge_execd. If notification is not possible, you can force an sge_qmaster internal status change by clicking Force. For example, notification might not be possible because a host is down.

The suspend/resume and disable/enable operations require cluster queue owner permission, grid engine manager permission, or operator permission. See Managers, Operators, and Owners for details.

Suspended cluster queues are closed for further jobs. The jobs already running in suspended queues are also suspended, as described in Monitoring and Controlling Jobs With QMON. The cluster queue and its jobs are unsuspended as soon as the queue is resumed.


Note –

If a job in a suspended cluster queue was suspended explicitly, the job is not resumed when the queue is resumed. The job must be resumed explicitly.


Disabled cluster queues are closed. However, the jobs that are running in those queues are allowed to continue. The disabling of a cluster queue is commonly used to clear a queue. After the cluster queue is enabled, it is eligible to run jobs again. No action on currently running jobs is performed.

Error states are displayed using a red font in the queue list. Click Clear Error to remove an error state from a queue.

Click Reschedule to reschedule all jobs currently running in the selected cluster queues.

To configure cluster queues and queue instances, click Add or Modify on the Cluster Queue dialog box. See Configuring Queues With QMON in Sun N1 Grid Engine 6.1 Administration Guide for details.

Click Done to close the dialog box.

Cluster Queue Status

Each row in the cluster queue table represents one cluster queue. For each cluster queue, the table lists the following information:

See the qstat(1) man page for complete information about cluster queues and their states.

Monitoring and Controlling Queue Instances

The Queue Instances tab provides a quick overview of all queue instances that are associated with the selected cluster queue. The Queue Instance tab also provides the means to suspend, resume, disable, and enable queue instances.

Dialog box titled Cluster Queues. Shows the Queue
Instances tab with a list of defined queue instances.

Click a cluster queue name to select the queue instance.

Click Suspend, Resume, Disable, or Enable to execute the corresponding operation on queue instances that you select. The suspend/resume and disable/enable operations require notification of the corresponding sge_execd. If notification is not possible, for example, because the host is not reachable, you can force an sge_qmaster internal status change by clicking Force.

The suspend/resume and disable/enable operations require queue owner permission, manager permission, or operator permission. See Managers, Operators, and Owners.

Suspended queue instances are closed for further jobs. The jobs already running in suspended queue instances are also suspended, as described in Monitoring and Controlling Jobs With QMON. The queue instance and its jobs are unsuspended as soon as the queue instance is resumed.


Note –

If a job in a suspended queue instance was suspended explicitly, the job is not resumed when the queue instance is resumed. The job must be resumed explicitly.


Disabled queue instances are closed. However, the jobs executing in those queue instances are allowed to continue. The disabling of a queue instance is commonly used to clear a queue instance. After the queue instance is enabled, it is eligible to run jobs again. No action on currently running jobs is performed.

Queue Instance Status

Each row in the queue instances table represents one queue instance. For each queue instance, the table lists the following information:

See Cluster Queue Status for a list of queue states. See the qstat(1) man page for complete information about queue instances and their states.

Displaying Queue Instance Attributes

To retrieve a queue instance's current attribute information, load information, and resource consumption information, select the queue instance, and then click Load. This information also implicitly includes information about the machine that is hosting the queue instance. The window shown in the following figure appears:

Dialog box titled Attributes for queue <name>.
Shows lists titled Attributes, Slot-Limits/Fixed Attributes, Load(scaled)/Consumable.

The Attribute column lists all attributes attached to the queue instance, including those attributes that are inherited from the host or the global cluster.

The Slot-Limits/Fixed Attributes column shows values for those attributes that are defined as per queue instance slot limits or as fixed resource attributes.

The Load(scaled)/Consumable column shows information about the reported and scaled load parameters. The column also shows information about the available resource capacities based on the consumable resources facility. See Load Parameters in Sun N1 Grid Engine 6.1 Administration Guide and Consumable Resources in Sun N1 Grid Engine 6.1 Administration Guide.

Load reports and consumable capacities can override each other if a load attribute is configured as a consumable resource. The minimum value of both, which is used in the job-dispatching algorithm, is displayed.


Note –

The displayed load and consumable values currently do not take into account load adjustment corrections, as described in Execution Hosts.


Filtering Cluster Queues and Queue Instances

The Customize button enables you to filter the cluster queues and queue instances you want to display.

The following figure shows a filtered selection of only those queue instances whose current configuration is ambiguous.

Dialog box titled Queue Customize. Shows Misc
Filters tab with User, Pattern, and State filtering options. Shows
Save, Cancel, and Ok buttons.

Click Save in the Queue Customize dialog box to store your settings in the file .qmon_preferences in your home directory for standard reactivation on later invocations of QMON.

Controlling Queues With qmod

You can use the qmod command to suspend and resume queues. You can also use qmod to disable and enable queues.

The following commands illustrate how to use qmod:


% qmod -s q-name
% qmod -us -f q-name1, q-name2
% qmod -d q-name
% qmod -e q-name1, q-name2, q-name3

qmod –s suspends a queue. qmod –us –f resumes (unsuspends) two queues. qmod –d disables a queue. qmod –e enables three queues.

The -f option forces registration of the status change in sge_qmaster when the corresponding sge_execd is not reachable, for example, due to network problems.

Suspending and resuming queues as well as disabling and enabling queues requires queue owner permission, manager permission, or operator permission. See Managers, Operators, and Owners.


Note –

You can use qmod commands with crontab or at jobs.