Checkpointing jobs are interruptible at any time because their restart capability ensures that very little work that is already done must be repeated. This ability is used to build migration and dynamic load balancing mechanism in the grid engine system. If requested, checkpointing jobs are stopped on demand. The jobs are migrated to other machines in the grid engine system, thus averaging the load in the cluster dynamically. Checkpointing jobs are stopped and migrated for the following reasons:
The executing queue or the job is suspended explicitly by a qmod or a QMON command.
The job or the queue where the job runs is suspended automatically because a suspend threshold for the queue is exceeded. The checkpoint occasion specification for the job includes the suspension case. For more information, see Configuring Load and Suspend Thresholds in Sun N1 Grid Engine 6.1 Administration Guide and Submitting, Monitoring, or Deleting a Checkpointing Job From the Command Line.
A migrating job moves back to sge_qmaster. The job is subsequently dispatched to another suitable queue if such a queue is available. In such a case, the qstat output shows R as the status.