Sun N1 Grid Engine 6.1 User's Guide

Migrating Checkpointing Jobs

Checkpointing jobs are interruptible at any time because their restart capability ensures that very little work that is already done must be repeated. This ability is used to build migration and dynamic load balancing mechanism in the grid engine system. If requested, checkpointing jobs are stopped on demand. The jobs are migrated to other machines in the grid engine system, thus averaging the load in the cluster dynamically. Checkpointing jobs are stopped and migrated for the following reasons:

A migrating job moves back to sge_qmaster. The job is subsequently dispatched to another suitable queue if such a queue is available. In such a case, the qstat output shows R as the status.