Sun N1 Grid Engine 6.1 Installation Guide

Chapter 6 Verifying the Installation

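The verification phase includes the following tasks:

  • How to Verify That the Daemons Are Running on the Master Host

  • How to Verify That the Daemons Are Running on the Execution Hosts

  • How to Run Simple Commands

  • How to Submit Test Jobs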

Verifying the Installation

To confirm that the grid engine system daemons are running, look for the sge_qmaster and sge_schedd daemons on the master host, and for the sge_execd daemon on each execution host. After you have verified that the daemons are running, you can run simple commands and then submit test jobs.


Note –

If no cell name was specified during installation, the value of cell is default.


Procedure: How to Verify That the Daemons Are Running on the Master Host

  1. Log in to the master host.

    Check the file sge-root/cell/common/act_qmaster, which records the name of the current master host, to confirm that you are logged in to the master host.

  2. List the running grid engine processes.

    • On BSD-based UNIX systems, type the following command.


      % ps -ax | grep sge
      
    • On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command.


      % ps -ef | grep sge
      
  3. Look through the output for lines that are similar to the following examples.

    Specifically, you should see that the sge_qmaster daemon and the sge_schedd daemon are running.

    • On a BSD-based UNIX system, you should see output such as the following example.


      14676 p1 S <  4:47 /gridware/sge/bin/solaris/sge_qmaster
      14678 p1 S <  9:22 /gridware/sge/bin/solaris/sge_schedd

    • On a UNIX System 5-based system, you should see output such as the following example.


      root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster
      root 446 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_schedd

  4. If you do not see the appropriate strings, restart the daemons.

    To start the master host daemons, sge_qmaster and sge_schedd, type the following command as root.


    # sge-root/cell/common/sgemaster start
    
  5. Continue the verification process.

    After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
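
As a minimal sketch of the check in step 1, the following shell function compares the host name recorded in act_qmaster with the name of the current host. The function name is_master_host is hypothetical and is not part of the grid engine software. Comparing only the short host names (the part before the first dot) is an assumption made so that fully qualified and unqualified names match; adapt it to your site's naming scheme.

```shell
#!/bin/sh
# Hypothetical helper: succeed if this host is the one named in act_qmaster.
#   $1 = path to the act_qmaster file
#   $2 = name of this host (optional; defaults to the output of `hostname`)
is_master_host() {
    act_file=$1
    this_host=${2:-$(hostname)}

    # Return 2 if the file cannot be read, so that this case can be
    # distinguished from a plain mismatch (return 1).
    [ -r "$act_file" ] || return 2

    # The first line of the file holds the master host name.
    master=$(head -n 1 "$act_file")

    # Assumption: compare short names only, ignoring any domain suffix.
    [ "${master%%.*}" = "${this_host%%.*}" ]
}
```

For example, with the default cell and an installation under /gridware/sge (the path used in the sample output above), you might run `is_master_host /gridware/sge/default/common/act_qmaster && echo "on the master host"`.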

Procedure: How to Verify That the Daemons Are Running on the Execution Hosts

  1. Log in to the execution hosts on which you ran the execution host installation procedure.

  2. List the running grid engine processes.

    • On BSD-based UNIX systems, type the following command.


      % ps -ax | grep sge
      
    • On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command.


      % ps -ef | grep sge
      
  3. Look for the sge_execd string in the output to confirm that the sge_execd daemon is running.

    • On a BSD-based UNIX system, you should see output such as the following example.


      14688 p1 S <    4:27  /gridware/sge/bin/solaris/sge_execd
    • On a UNIX System 5-based system, such as the Solaris Operating System, you should see output such as the following example.


      root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd
  4. If you do not see similar output, restart the daemon by typing the following command as root.


    # sge-root/cell/common/sgeexecd start
    
  5. Continue the verification process.

    After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
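
The output check in step 3 can also be scripted. The following sketch defines a hypothetical helper, missing_daemons, that takes the text produced by ps and prints the name of each expected daemon that does not appear in it. It assumes that the daemon name is the last word on its line or is followed by a space, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected daemon that is absent from ps output.
#   $1     = text produced by `ps -ef` or `ps -ax`
#   $2 ... = daemon names to look for
missing_daemons() {
    ps_output=$1
    shift
    for daemon in "$@"; do
        # Match the name as a whole command word, with or without a
        # leading directory path such as /gridware/sge/bin/solaris/.
        if ! printf '%s\n' "$ps_output" | grep -Eq "(^|[ /])$daemon( |\$)"; then
            echo "$daemon"
        fi
    done
}
```

On an execution host you might run `missing_daemons "$(ps -ef)" sge_execd`, and on the master host `missing_daemons "$(ps -ef)" sge_qmaster sge_schedd`. No output means that every named daemon was found.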

Procedure: How to Run Simple Commands

If the necessary daemons are running on both the master host and the execution hosts, the grid engine software should be operational. Check by issuing a trial command.

  1. Log in to either the master host or another administrative host.

    Make sure that your standard search path includes sge-root/bin.

  2. From the command line, type the following command.


    % qconf -sconf
    

    This qconf command displays the current global cluster configuration (see Basic Cluster Configuration in Sun N1 Grid Engine 6.1 Administration Guide).

    If this command fails, your SGE_ROOT environment variable is probably not set correctly, or the communication ports for sge_qmaster and sge_execd cannot be resolved.

    1. Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, sge-root/cell/common/settings.csh or sge-root/cell/common/settings.sh.


      Note –

      If no cell name was specified during installation, the value of cell is default.


      • If so, make sure that SGE_EXECD_PORT and SGE_QMASTER_PORT are set to the correct values in your environment before you try the command again.

      • If not, verify that your services database, such as the NIS services map, contains entries for sge_qmaster and sge_execd.

        If the SGE_EXECD_PORT and SGE_QMASTER_PORT variables are not used in these files, then the services database, for example, /etc/services or the NIS services map, on the machine from which you run the command must provide entries for both sge_qmaster and sge_execd. If these entries do not exist, add an entry to the machine's services database, giving it the same value as is configured on the master host.

    2. Retry the qconf command.

  3. Try to submit test jobs.
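
The port lookup described in step 2 can be sketched as a shell function. The name resolve_port is hypothetical. The function prefers a value taken from the environment and otherwise searches a local services file; it does not consult an NIS services map, so treat it as an illustration of the lookup order rather than a complete check.

```shell
#!/bin/sh
# Hypothetical helper: print the port that a grid engine service would use.
#   $1 = value of the environment variable (may be empty)
#   $2 = service name, for example sge_qmaster
#   $3 = services file to search (optional; defaults to /etc/services)
resolve_port() {
    # A value from the environment takes precedence.
    if [ -n "$1" ]; then
        echo "$1"
        return 0
    fi
    # Otherwise print the port from the first "name port/protocol" entry.
    # The exit status is nonzero if no entry for the service exists.
    awk -v svc="$2" '
        $1 == svc { split($2, f, "/"); print f[1]; found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "${3:-/etc/services}"
}
```

For example, `resolve_port "${SGE_QMASTER_PORT:-}" sge_qmaster` prints the port for sge_qmaster, or prints nothing and returns a nonzero status if neither source provides one.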

Procedure: How to Submit Test Jobs

Before you start submitting batch scripts to the grid engine system, check to see whether your site's standard shell resource files (.cshrc, .profile, or .kshrc) as well as your personal resource files contain commands such as stty. Batch jobs do not have a terminal connection by default, and therefore calls to stty result in an error.

  1. Log in to the master host.

  2. Type the following command.


    % rsh exec-host-name date
    

    exec-host-name refers to one of the already installed execution hosts. You should try this test on all execution hosts if your login or home directories differ from host to host. The rsh command should give you output similar to the date command run locally on the master host. If any additional lines contain error messages, you must fix the cause of the errors before you can run a batch job successfully.

    In any command interpreter, you can test for an actual terminal connection before you run a command such as stty.

    The following is an example of a Bourne shell script to test the terminal connection.


    # Run stty only if standard input is a terminal.
    tty -s
    if [ $? -eq 0 ]; then
       stty erase '^H'
    fi
    

    The following example shows C Shell syntax.


    # Run stty only if standard input is a terminal.
    tty -s
    if ( $status == 0 ) then
       stty erase '^H'
    endif
    
  3. Submit one of the sample scripts contained in the sge-root/examples/jobs directory.


    % qsub sge-root/examples/jobs/simple.sh
    
  4. Use the qstat command to monitor the job's behavior.

    For more information about submitting and monitoring batch jobs, see Submitting Batch Jobs in Sun N1 Grid Engine 6.1 User’s Guide.

  5. After the job finishes executing, check your home directory for the redirected stdout and stderr files, script-name.ojob-id and script-name.ejob-id, respectively.

    job-id is a consecutive unique integer number assigned to each job.

    In case of problems, see Chapter 9, Fine Tuning, Error Messages, and Troubleshooting, in Sun N1 Grid Engine 6.1 Administration Guide.
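
The check in step 5 can be sketched as a shell function. The name check_job_output is hypothetical. The function builds the two file names from the script name and job ID as described above, then reports whether the stdout file exists and whether the stderr file contains anything. Treating a non-empty stderr file as a failure is an assumption; some jobs write harmless messages to stderr.

```shell
#!/bin/sh
# Hypothetical helper: inspect the redirected output files of a finished job.
#   $1 = script name, for example simple.sh
#   $2 = job ID
#   $3 = directory to search (optional; defaults to $HOME)
check_job_output() {
    script=$1
    job_id=$2
    dir=${3:-$HOME}
    out="$dir/$script.o$job_id"   # redirected stdout
    err="$dir/$script.e$job_id"   # redirected stderr

    if [ ! -f "$out" ]; then
        echo "missing: $out"
        return 1
    fi
    # Assumption: any content in the stderr file signals a problem.
    if [ -s "$err" ]; then
        echo "errors reported in: $err"
        return 1
    fi
    echo "ok: $out"
}
```

For example, after job 42 of simple.sh finishes, `check_job_output simple.sh 42` looks for simple.sh.o42 and simple.sh.e42 in your home directory.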