Sun N1 Grid Engine 6.1 Administration Guide

Migrating qmaster to Another Host

Because the spooling database cannot be located on an NFS-mounted file system, the following procedure requires that the Berkeley DB RPC server be used for spooling.

If you configure spooling to a local file system, you must transfer the spooling database to a local file system on the new sge_qmaster host.

Procedure: How to Migrate qmaster to Another Host Using a Script

  1. Check that the new master host has read/write access.

    The new master host must have the same read/write access to the qmaster spool directory and the common directory as the current master host. If the administrative user is root (check the admin_user setting in the global cluster configuration), verify that root can create files in these directories under the root user name.

  2. Run the migrate script on the new master host.

    On the new master host, run the following script as user root:

    # /etc/init.d/sgemaster -migrate

    This command stops sge_qmaster and sge_schedd on the old master host and starts them on the new master host. The master host name listed in the file $SGE_ROOT/$SGE_CELL/common/act_qmaster is automatically changed to the new master host. If qmaster is not running, warning messages will appear and a delay of about one minute will occur until qmaster is started on the new host.

  3. Modify the shadow_masters file if necessary.

    Check whether the $SGE_ROOT/$SGE_CELL/common/shadow_masters file exists. If the file exists, you can add the new qmaster host to this file and remove the old master host, depending on your requirements. Then stop and restart the sge_shadowd daemons by issuing the following commands on the respective machines:

    /etc/init.d/sgemaster -shadowd stop
    /etc/init.d/sgemaster -shadowd start

    Note –

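    The location of the system-wide sgemaster startup script may differ on your operating system. You can always use the copy in the cell's common directory, $SGE_ROOT/$SGE_CELL/common/sgemaster ($SGE_ROOT/default/common/sgemaster when the cell is named default).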
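
The access check in step 1 and the act_qmaster change described in step 2 can be verified with a few lines of shell. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the spool directory path in the usage line is an assumption (the actual location is site-specific).

```shell
#!/bin/sh
# can_create_file DIR: succeed only if the current user can create
# (and then remove) a file in DIR. Creating a file is the real test;
# permission bits alone can mislead on NFS exports that remap root.
can_create_file() {
    dir=$1
    [ -d "$dir" ] || return 1
    probe="$dir/.sge_write_probe.$$"
    # Run the redirection in a subshell so a failure cannot abort the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# check_master_dirs DIR...: report on each directory and return
# non-zero if any of them fails the probe.
check_master_dirs() {
    status=0
    for d in "$@"; do
        if can_create_file "$d"; then
            echo "ok: $d"
        else
            echo "NOT WRITABLE: $d"
            status=1
        fi
    done
    return $status
}

# act_qmaster_is FILE HOST: succeed if the first line of FILE is exactly HOST.
# Returns 2 if FILE cannot be read.
act_qmaster_is() {
    file=$1
    expected=$2
    [ -r "$file" ] || return 2
    current=$(head -n 1 "$file")
    [ "$current" = "$expected" ]
}
```

Before migrating, you might run `check_master_dirs "$SGE_ROOT/$SGE_CELL/common" "$SGE_ROOT/$SGE_CELL/spool/qmaster"` as the administrative user on the new master host, substituting your own spool directory. After migrating, `act_qmaster_is "$SGE_ROOT/$SGE_CELL/common/act_qmaster" newhost` confirms the change. The comparison is an exact string match, so pass the host name in the same form that Grid Engine writes to the file.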

Important Notes about Migration

The migration procedure migrates to the host on which the sgemaster -migrate command is issued. If the file primary_qmaster exists, any subsequent calls of sgemaster on the machine contained in the primary_qmaster file will cause a migration back to that machine. To avoid such a situation, change or delete the $SGE_ROOT/$SGE_CELL/common/primary_qmaster file.

Note –

Existence of the primary_qmaster file does not imply that the qmaster is actually running.
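
As an illustration of handling the primary_qmaster file, the sketch below renames the file instead of deleting it, so the change can be undone. Renaming is a stand-in chosen for this example; the guide itself only says to change or delete the file. The function name and the .disabled suffix are hypothetical.

```shell
#!/bin/sh
# set_aside_primary_qmaster FILE: if FILE exists, show the host it names
# and rename it to FILE.disabled so that it no longer takes effect.
set_aside_primary_qmaster() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "no primary_qmaster file at $file"
        return 0
    fi
    echo "primary_qmaster names: $(head -n 1 "$file")"
    mv "$file" "$file.disabled"
}
```

A typical call would be `set_aside_primary_qmaster "$SGE_ROOT/$SGE_CELL/common/primary_qmaster"`.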

Although jobs may continue to run during the migration procedure, the grid should be inactive. While the migration is taking place, any running SGE commands, such as qsub or qstat, will return an error.

If the current qmaster is down, there will be a delay in shutting down the scheduler until it times out waiting for contact with the qmaster.
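
Because client commands fail while the migration is in progress, scripts that run around a migration may want to wait until qmaster answers again. The following is a minimal sketch of such a retry loop. The function name, retry count, and delay are assumptions made for this example, not values taken from Grid Engine.

```shell
#!/bin/sh
# wait_until_ok TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds between attempts.
# Succeeds as soon as CMD succeeds; fails if every attempt fails.
wait_until_ok() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Sleep only if another attempt will follow.
        if [ "$attempt" -le "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `wait_until_ok 12 10 qstat` retries qstat for up to about two minutes, which leaves room for the one-minute delay mentioned earlier.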

The shadow_masters file has no direct effect on the migration procedure. This file only exists if one or more shadow masters have been configured. For more information on how to set up shadow masters, see Configuring Shadow Master Hosts.
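
If you decide to update shadow_masters after a migration, the edit can be prepared without touching the live file. The sketch below assumes that the file lists one host name per line. It prints the revised list with the new master first, so you can review the result before installing it. The function name is hypothetical.

```shell
#!/bin/sh
# replace_shadow_host FILE OLD NEW
# Print the hosts in FILE with OLD removed and NEW listed exactly once,
# at the top. FILE itself is not modified.
replace_shadow_host() {
    file=$1
    old=$2
    new=$3
    echo "$new"
    # -x -F: drop only lines that match a host name exactly.
    # grep exits non-zero when nothing is left to print; that is not an error here.
    grep -v -x -F -e "$old" -e "$new" "$file" || true
}
```

For example, run `replace_shadow_host "$SGE_ROOT/$SGE_CELL/common/shadow_masters" oldhost newhost > /tmp/shadow_masters.new`, inspect the output, and then copy it into place as the administrative user.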

Procedure: How to Migrate qmaster to Another Host Manually

  1. On the current master host, stop the master daemon and the scheduler daemon by typing the following command:

    qconf -ks -km

  2. Edit the $SGE_ROOT/$SGE_CELL/common/act_qmaster file according to the following guidelines:

    1. Confirm the new master host's name.

      To get the new master host name, type the following command on the new master host:

    2. In the act_qmaster file, replace the current host name with the new master host's name returned by the gethostname utility.
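
The edit in step 2 can also be scripted. The following is a minimal sketch that assumes act_qmaster contains only the master host name on a single line. The function name and the .bak backup suffix are hypothetical. The file is overwritten in place so that its owner and permissions are kept.

```shell
#!/bin/sh
# set_act_qmaster FILE NEWHOST
# Back up FILE to FILE.bak, then replace its contents with NEWHOST.
set_act_qmaster() {
    file=$1
    newhost=$2
    if [ -z "$newhost" ]; then
        echo "empty host name" >&2
        return 1
    fi
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Keep a copy of the previous contents for rollback.
    cp -p "$file" "$file.bak" || return 1
    printf '%s\n' "$newhost" > "$file"
}
```

Call it as the administrative user, for example `set_act_qmaster "$SGE_ROOT/$SGE_CELL/common/act_qmaster" newhost`, where newhost is the name that the gethostname utility reported on the new master host.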

  3. On the new master host, start sge_qmaster and sge_schedd: