In the QMON Main Control window, click the Queue Control button. The Cluster Queues dialog box appears.
The Cluster Queue tab provides a quick overview of all cluster queues that are defined for the cluster. From this tab you can also suspend, resume, disable, enable, and configure cluster queues.
Information displayed in the Cluster Queue dialog box is updated periodically. Click Refresh to force an update. Click a cluster queue name to select the queue.
Click Delete, Suspend, Resume, Disable, or Enable to execute the corresponding operation on the cluster queues that you select. The suspend/resume and disable/enable operations require notification of the corresponding sge_execd. If notification is not possible, for example, because a host is down, you can force an sge_qmaster internal status change by clicking Force.
The suspend/resume and disable/enable operations require cluster queue owner permission, grid engine manager permission, or operator permission. See Managers, Operators, and Owners for details.
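For scripted use, the same operations can be expressed as command lines. The following is a minimal sketch, not part of QMON: the helper name qmod_command is hypothetical, and the option names (-sq, -usq, -d, -e, -f) are assumed from the qmod(1) man page and should be checked against the installed version. The function only builds the argument list; it does not execute anything.

```python
# Hypothetical helper that maps the QMON buttons described above to
# qmod(1) argument lists. The option names are assumptions taken from the
# qmod(1) man page; verify them before use.
QMOD_FLAGS = {
    "suspend": "-sq",
    "resume": "-usq",
    "disable": "-d",
    "enable": "-e",
}


def qmod_command(action, queues, force=False):
    """Return the argument list for a qmod call; nothing is executed here."""
    if action not in QMOD_FLAGS:
        raise ValueError(f"unsupported action: {action!r}")
    if not queues:
        raise ValueError("at least one queue name is required")
    args = ["qmod"]
    if force:
        args.append("-f")  # assumed counterpart of the Force button
    # Queue names are passed as one comma-separated list.
    args += [QMOD_FLAGS[action], ",".join(queues)]
    return args


print(" ".join(qmod_command("suspend", ["all.q"])))
print(" ".join(qmod_command("disable", ["all.q@host1"], force=True)))
```

As with the QMON buttons, running such a command requires queue owner, manager, or operator permission.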
Suspended cluster queues are closed for further jobs. The jobs already running in suspended queues are also suspended, as described in Monitoring and Controlling Jobs With QMON. The cluster queue and its jobs are unsuspended as soon as the queue is resumed.
If a job in a suspended cluster queue was suspended explicitly, the job is not resumed when the queue is resumed. The job must be resumed explicitly.
Disabled cluster queues are closed to new jobs. However, jobs that are already running in those queues are allowed to continue. Disabling a cluster queue is commonly used to clear the queue. After the cluster queue is enabled, it is eligible to run jobs again. No action is performed on currently running jobs.
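The difference between suspending and disabling a queue, and the special case of explicitly suspended jobs, can be illustrated with a small toy model. This is only a sketch of the rules stated above; the class and attribute names are hypothetical and do not correspond to any grid engine interface.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    suspended_explicitly: bool = False  # suspended directly, not through the queue
    suspended_by_queue: bool = False

    @property
    def running(self):
        return not (self.suspended_explicitly or self.suspended_by_queue)


@dataclass
class Queue:
    suspended: bool = False
    disabled: bool = False
    jobs: list = field(default_factory=list)

    def accepts_new_jobs(self):
        # Both suspended and disabled queues are closed for further jobs.
        return not (self.suspended or self.disabled)

    def suspend(self):
        self.suspended = True
        for job in self.jobs:
            job.suspended_by_queue = True

    def resume(self):
        self.suspended = False
        for job in self.jobs:
            # Only the queue-level suspension is lifted; a job that was
            # suspended explicitly stays suspended.
            job.suspended_by_queue = False

    def disable(self):
        self.disabled = True  # running jobs are left untouched

    def enable(self):
        self.disabled = False


queue = Queue(jobs=[Job("a"), Job("b", suspended_explicitly=True)])
queue.suspend()
queue.resume()
print([job.running for job in queue.jobs])  # [True, False]
```

Job "b" remains suspended after the queue is resumed because it was suspended explicitly, whereas disabling and enabling the queue never changes the state of a job.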
Error states are displayed using a red font in the queue list. Click Clear Error to remove an error state from a queue.
Click Reschedule to reschedule all jobs currently running in the selected cluster queues.
To configure cluster queues and queue instances, click Add or Modify on the Cluster Queue dialog box. See Configuring Queues With QMON in Sun N1 Grid Engine 6.1 Administration Guide for details.
Click Done to close the dialog box.
Each row in the cluster queue table represents one cluster queue. For each cluster queue, the table lists the following information:
Cluster Queue – Name of the cluster queue.
Load – Average of the normalized load average of all cluster queue hosts. Only hosts with a load value are considered.
Used – Number of currently used job slots.
Avail – Number of currently available job slots.
Total – Total number of job slots.
aoACD – Number of queue instances that are in at least one of the following states:
    a – Load threshold alarm
    o – Orphaned
    A – Suspend threshold alarm
    C – Suspended by calendar
    D – Disabled by calendar
cdsuE – Number of queue instances that are in at least one of the following states:
    c – Configuration ambiguous
    d – Disabled
    s – Suspended
    u – Unknown
    E – Error
s – Number of queue instances that are in the suspended state.
A – Number of queue instances where one or more suspend thresholds are currently exceeded. No more jobs
S – Number of queue instances that are suspended through subordination to another queue.
C – Number of queue instances that are automatically suspended by the grid engine system calendar.
u – Number of queue instances that are in an unknown state.
a – Number of queue instances where one or more load thresholds are currently exceeded.
d – Number of queue instances that are in the disabled state.
D – Number of queue instances that are automatically disabled by the grid engine system calendar.
c – Number of queue instances whose configuration is ambiguous.
o – Number of queue instances that are in the orphaned state.
E – Number of queue instances that are in the error state.
See the qstat(1) man page for complete information about cluster queues and their states.
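The summary columns described above can be reproduced from per-instance data. The following is a minimal sketch under two assumptions: each queue instance's state is available as a string of state letters (for example "au"), and a host without a load value is represented as None. The function names are hypothetical.

```python
STATE_LETTERS = "sASCuadDcoE"


def count_states(instance_states):
    """Count queue instances per state letter and for the two grouped columns."""
    counts = {letter: 0 for letter in STATE_LETTERS}
    counts["aoACD"] = 0
    counts["cdsuE"] = 0
    for states in instance_states:
        present = set(states)
        for letter in STATE_LETTERS:
            if letter in present:
                counts[letter] += 1
        # A grouped column counts an instance once, even if the instance
        # is in several of the group's states at the same time.
        if present & set("aoACD"):
            counts["aoACD"] += 1
        if present & set("cdsuE"):
            counts["cdsuE"] += 1
    return counts


def cluster_queue_load(host_loads):
    """Average the normalized load over hosts that report a load value."""
    values = [value for value in host_loads if value is not None]
    return sum(values) / len(values) if values else None


counts = count_states(["", "au", "d", "dE", "S"])
print(counts["aoACD"], counts["cdsuE"], counts["d"])  # 1 3 2
print(cluster_queue_load([0.5, None, 1.5]))  # 1.0
```

Note that the instance in state "dE" contributes to both the d and the E columns but only once to cdsuE.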
The Queue Instances tab provides a quick overview of all queue instances that are associated with the selected cluster queue. The Queue Instances tab also provides the means to suspend, resume, disable, and enable queue instances.
Click a queue instance name to select the queue instance.
Click Suspend, Resume, Disable, or Enable to execute the corresponding operation on queue instances that you select. The suspend/resume and disable/enable operations require notification of the corresponding sge_execd. If notification is not possible, for example, because the host is not reachable, you can force an sge_qmaster internal status change by clicking Force.
The suspend/resume and disable/enable operations require queue owner permission, manager permission, or operator permission. See Managers, Operators, and Owners.
Suspended queue instances are closed for further jobs. The jobs already running in suspended queue instances are also suspended, as described in Monitoring and Controlling Jobs With QMON. The queue instance and its jobs are unsuspended as soon as the queue instance is resumed.
If a job in a suspended queue instance was suspended explicitly, the job is not resumed when the queue instance is resumed. The job must be resumed explicitly.
Disabled queue instances are closed to new jobs. However, jobs that are already executing in those queue instances are allowed to continue. Disabling a queue instance is commonly used to clear the queue instance. After the queue instance is enabled, it is eligible to run jobs again. No action is performed on currently running jobs.
Each row in the queue instances table represents one queue instance. For each queue instance, the table lists the following information:
Queue – Name of the queue instance
qtype – Type of queue instance, which can be B (batch), I (interactive), or P (parallel)
used/total – Number of used job slots and the total number of job slots
load_avg – Load average of the queue instance host
arch – Architecture of the queue instance host
states – States of the queue instance
See Cluster Queue Status for a list of queue states. See the qstat(1) man page for complete information about queue instances and their states.
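When the same information is processed outside QMON, each row can be treated as a record with the six fields listed above. The sketch below assumes a whitespace-separated text row in that column order, with an empty states column omitted and any non-numeric load value (such as a placeholder for an unavailable load) mapped to None. The row layout, the class name, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueInstanceRow:
    queue: str
    qtype: str
    used: int
    total: int
    load_avg: Optional[float]  # None when no numeric load value is present
    arch: str
    states: str


def parse_row(line):
    """Parse one row: queue, qtype, used/total, load_avg, arch, and optional states."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    queue, qtype, slots, load, arch = fields[:5]
    states = fields[5] if len(fields) == 6 else ""
    used, total = (int(part) for part in slots.split("/"))
    try:
        load_avg = float(load)
    except ValueError:
        load_avg = None
    return QueueInstanceRow(queue, qtype, used, total, load_avg, arch, states)


row = parse_row("all.q@host1 BIP 1/4 0.25 lx24-amd64 d")
print(row.used, row.total, row.load_avg, row.states)  # 1 4 0.25 d
```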
To retrieve a queue instance's current attribute information, load information, and resource consumption information, select the queue instance, and then click Load. This information also implicitly includes information about the machine that is hosting the queue instance. The window shown in the following figure appears:
The Attribute column lists all attributes attached to the queue instance, including those attributes that are inherited from the host or the global cluster.
The Slot-Limits/Fixed Attributes column shows values for those attributes that are defined as per queue instance slot limits or as fixed resource attributes.
The Load(scaled)/Consumable column shows information about the reported and scaled load parameters. The column also shows information about the available resource capacities based on the consumable resources facility. See Load Parameters in Sun N1 Grid Engine 6.1 Administration Guide and Consumable Resources in Sun N1 Grid Engine 6.1 Administration Guide.
Load reports and consumable capacities can override each other if a load attribute is configured as a consumable resource. The minimum value of both, which is used in the job-dispatching algorithm, is displayed.
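The rule can be sketched as follows, assuming that a missing load report or a missing consumable capacity is represented as None. The function name displayed_value is hypothetical.

```python
def displayed_value(load_value=None, consumable_capacity=None):
    """Return the minimum of the available values, or None if neither is set."""
    candidates = [v for v in (load_value, consumable_capacity) if v is not None]
    return min(candidates) if candidates else None


# A load attribute that is also configured as a consumable resource:
print(displayed_value(load_value=1.5, consumable_capacity=2.0))  # 1.5
# Only a consumable capacity is known:
print(displayed_value(consumable_capacity=4))  # 4
```

In the first call the load report is the smaller value, so it is the one shown; if the remaining consumable capacity dropped below 1.5, that capacity would be shown instead.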
The displayed load and consumable values currently do not take into account load adjustment corrections, as described in Execution Hosts.
The Customize button enables you to filter the cluster queues and queue instances you want to display.
The following figure shows a filtered selection of only those queue instances whose current configuration is ambiguous.
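A state-based filter of this kind can be pictured as a simple selection over queue instance records. The sketch below is an illustration only and does not describe how QMON implements its filter; it assumes that each queue instance is given as a (name, states) pair, and the function name is hypothetical.

```python
def filter_by_state(instances, required_states):
    """Keep the names of queue instances that are in all of the required states."""
    required = set(required_states)
    return [name for name, states in instances if required <= set(states)]


instances = [
    ("all.q@host1", ""),
    ("all.q@host2", "c"),
    ("all.q@host3", "cd"),
]
print(filter_by_state(instances, "c"))  # ['all.q@host2', 'all.q@host3']
```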
Click Save in the Queue Customize dialog box to store your settings in the file .qmon_preferences in your home directory. The saved settings are then applied by default each time you start QMON.