Sometimes a pending job is obviously capable of being run, but the job does not get dispatched. To diagnose the reason, the grid engine system offers a pair of utilities and options, qstat -j job-id and qalter-w v job-id.
qstat -j job-id
When enabled, qstat -j job-id provides a list of reasons why a certain job was not dispatched in the last scheduling run. This monitoring can be enabled or disabled. You might want to disable monitoring because it can cause undesired communication overhead between the schedd daemon and qmaster. See schedd_job_info in the sched_conf(5) man page. The following example shows output for a job with the ID 242059:
% qstat -j 242059 scheduling info: queue "fangorn.q" dropped because it is temporarily not available queue "lolek.q" dropped because it is temporarily not available queue "balrog.q" dropped because it is temporarily not available queue "saruman.q" dropped because it is full cannot run in queue "bilbur.q" because it is not contained in its hard queuelist (-q) cannot run in queue "dwain.q" because it is not contained in its hard queue list (-q) has no permission for host "ori" |
This information is generated directly by the schedd daemon. The generating of this information takes the current usage of the cluster into account. Sometimes this information does not provide what you are looking for. For example, if all queue slots are already occupied by jobs of other users, no detailed message is generated for the job you are interested in.
qalter -w v job-id
This command lists the reasons why a job is not dispatchable in principle. For this purpose, a dry scheduling run is performed. All consumable resources, as well as all slots, are considered to be fully available for this job. Similarly, all load values are ignored because these values vary.