Sun N1 Grid Engine 6.1 Installation Guide

Procedure: How to Install the Master Host

The master installation procedure creates the appropriate directory hierarchy required by sge_qmaster and sge_schedd. The procedure starts up the grid engine system daemons, sge_qmaster and sge_schedd, on the master host. The master host is also registered as a host with administrative and submit permission. The installation procedure creates a default configuration for the system on which it is run. The installation script queries the system for the type of operating system. The script then makes meaningful settings based on this information.

If, at any time during the installation, you think something went wrong, you can quit the installation procedure and restart it.

Before You Begin

If you have decided to use an administrative user, as described in User Names, you should create that user now. This procedure assumes that you have already extracted the grid engine software, as described in Loading the Distribution Files on a Workstation.


Note –

Windows hosts cannot act as master hosts.


  1. Log in to the master host as root.

  2. If the $SGE_ROOT environment variable is not set, set it by typing:


    # SGE_ROOT=sge-root; export SGE_ROOT
    

    To confirm that you have set the $SGE_ROOT environment variable, type:


    # echo $SGE_ROOT
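
    The check can go one step further than echo. The following is a minimal sketch, assuming a POSIX-style shell. The function name check_sge_root is hypothetical and is not part of the grid engine distribution. It confirms that $SGE_ROOT is set, names a directory, and contains the install_qmaster script that is used later in this procedure.

    ```shell
    # check_sge_root: hypothetical helper, not shipped with the grid engine software.
    # Succeeds only if SGE_ROOT is set, is a directory, and contains install_qmaster.
    check_sge_root() {
        if [ -z "${SGE_ROOT:-}" ]; then
            echo "SGE_ROOT is not set" >&2
            return 1
        fi
        if [ ! -d "$SGE_ROOT" ]; then
            echo "SGE_ROOT=$SGE_ROOT is not a directory" >&2
            return 1
        fi
        if [ ! -f "$SGE_ROOT/install_qmaster" ]; then
            echo "install_qmaster not found in $SGE_ROOT" >&2
            return 1
        fi
        echo "SGE_ROOT looks usable: $SGE_ROOT"
    }
    ```

    After defining the function, type check_sge_root. A nonzero exit status suggests that the variable or the directory should be corrected before you continue.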
    
  3. Change to the installation directory.

    • If the directory where the installation files reside is visible from the master host, change directories (cd) to the installation directory sge-root, and then proceed to Step 4.

    • If the directory is not visible and cannot be made visible, do the following:

      1. Create a local installation directory, sge-root, on the master host.

      2. Copy the installation files to the local installation directory sge-root across the network, for example, by using ftp or rcp.

      3. Change directories (cd) to the local sge-root directory.

  4. Type the install_qmaster command, adding the -csp flag if you are installing using the Certificate Security Protocol method described in Chapter 4, Installing the Increased Security Features.

    This command starts the master host installation procedure. You are asked several questions, and you might be required to perform some administrative actions.


    % ./install_qmaster
    Welcome to the Grid Engine installation
    ---------------------------------------
    
    Grid Engine qmaster host installation
    -------------------------------------
    
    Before you continue with the installation please read these hints:
    
       - Your terminal window should have a size of at least
         80x24 characters
    
       - The INTR character is often bound to the key Ctrl-C.
         The term >Ctrl-C< is used during the installation if you
         have the possibility to abort the installation
    
    The qmaster installation procedure will take approximately 5-10 minutes.
    
    Hit <RETURN> to continue >> 
  5. Choose an administrative account owner.


    Choosing Grid Engine admin user account
    ---------------------------------------
    
    You may install Grid Engine that all files are created with the user id of an
    unprivileged user.
    
    This will make it possible to install and run Grid Engine in directories
    where user >root< has no permissions to create and write files and directories.
    
       - Grid Engine still has to be started by user >root<
    
       - this directory should be owned by the Grid Engine administrator
    
    Do you want to install Grid Engine
    under an user id other than >root< (y/n) [y] >> y
    

    Choosing a Grid Engine admin user name
    --------------------------------------
    
    Please enter a valid user name >> sgeadmin
    
    Installing Grid Engine as admin user >sgeadmin<
    
    Hit <RETURN> to continue >>
  6. Verify the sge-root directory setting.

    In the following example, the value of sge-root is /opt/n1ge6.


    Checking $SGE_ROOT directory
    ----------------------------
    
    The Grid Engine root directory is:
    
       $SGE_ROOT = /opt/n1ge6
    
    If this directory is not correct (e.g. it may contain an automounter
    prefix) enter the correct path to this directory or hit <RETURN>
    to use default [/opt/n1ge6] >> 
    
    Your $SGE_ROOT directory: /opt/n1ge6
    
    Hit <RETURN> to continue >> 
  7. Set up the TCP/IP services for the grid engine software.

    1. You will be notified if the TCP/IP services have not been configured.


      Grid Engine TCP/IP service >sge_qmaster<
      ----------------------------------------
      
      There is no service >sge_qmaster< available in your >/etc/services< file
      or in your NIS/NIS+ database.
      
      You may add this service now to your services database or choose a port number.
      It is recommended to add the service now. If you are using NIS/NIS+ you should
      add the service at your NIS/NIS+ server and not to the local >/etc/services<
      file.
      
      Please add an entry in the form
      
         sge_qmaster <port_number>/tcp
      
      to your services database and make sure to use an unused port number.
      
      Please add the service now or press <RETURN> to go to entering a port number >> 
    2. Start a new terminal session or window to add the information to the /etc/services file or your NIS maps.

    3. Add the correct ports to the /etc/services file or your NIS services map, as described in Network Services.

      The following example shows how you might edit your /etc/services file.


      ...
      sge_qmaster     6444/tcp
      sge_execd       6445/tcp
      

      Note –

      In this example, the entries for both sge_qmaster and sge_execd are added to /etc/services. Subsequent steps in this example assume that both entries have been made.


      Save your changes.

    4. Return to the window where the installation script is running.


      Please add the service now or press <RETURN> to go to entering a port number >> 

      Press the Return key. You will see the following output:


      sge_qmaster 6444
      
      Service >sge_qmaster< is now available.
      
      Hit <RETURN> to continue >> 

      Grid Engine TCP/IP service >sge_execd<
      --------------------------------------
      
      Using the service
      
         sge_execd
      
      for communication with Grid Engine.
      
      Hit <RETURN> to continue >> 
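
    Before you return to the installation window, you might want to confirm that the entries are in place. The following is a minimal sketch, assuming a POSIX awk and a local services file. The function name service_port is hypothetical. It reads only the file that you pass to it and does not consult NIS or NIS+ maps.

    ```shell
    # service_port: hypothetical helper that prints the TCP port registered for a
    # service name in a services-format file (default: /etc/services).
    # It does not consult NIS/NIS+ maps. Exit status is 1 if no entry is found.
    service_port() {
        name=$1
        file=${2:-/etc/services}
        awk -v svc="$name" '
            # Match "<name> <port>/tcp"; comment lines never match the name exactly.
            $1 == svc && $2 ~ /\/tcp$/ { split($2, p, "/"); print p[1]; found = 1; exit }
            END { exit (found ? 0 : 1) }
        ' "$file"
    }
    ```

    With the example entries shown in this step, service_port sge_qmaster would print 6444 and service_port sge_execd would print 6445.
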
  8. Type the name of your cell.

    The use of grid engine system cells is described in Cells.


    Grid Engine cells
    -----------------
    
    Grid Engine supports multiple cells.
    
    If you are not planning to run multiple Grid Engine clusters or if you don't
    know yet what is a Grid Engine cell it is safe to keep the default cell name
    
       default
    
    If you want to install multiple cells you can enter a cell name now.
    
    The environment variable
    
       $SGE_CELL=<your_cell_name>
    
    will be set for all further Grid Engine commands.
    
    Enter cell name [default] >> 
    • If you have decided to use cells, type the cell name now.

    • If you have decided not to use cells, press the Return key to continue.


      Using cell >default<. 
      Hit <RETURN> to continue >> 

    Press the Return key to continue.

  9. Specify a spool directory.

    For guidelines on disk space requirements for the spool directory, see Disk Space Requirements. For information on where the spool directory is installed, see Spool Directories Under the Root Directory.


    Grid Engine qmaster spool directory
    -----------------------------------
    
    The qmaster spool directory is the place where the qmaster daemon stores
    the configuration and the state of the queuing system.
    
    The admin user >sgeadmin< must have read/write access
    to the qmaster spool directory.
    
    If you will install shadow master hosts or if you want to be able to start
    the qmaster daemon on other hosts (see the corresponding section in the
    Grid Engine Installation and Administration Manual for details) the account
    on the shadow master hosts also needs read/write access to this directory.
    
    The following directory
    
    [/opt/n1ge6/default/spool/qmaster]
    
    will be used as qmaster spool directory by default!
    
    Do you want to select another qmaster spool directory (y/n) [n] >> 
    • If you want to accept the default spool directory, press the Return key to continue.

    • If you do not want to accept the default spool directory, then answer y.

      In the following example the /my/spool directory is specified as the master host spool directory.


      Do you want to select another qmaster spool directory (y/n) [n] >> y
      
      Please enter a qmaster spool directory now! >>/my/spool
      
  10. The next question concerns Windows-based execution hosts.

    If you do not plan to use Windows support, answer No. If you want Windows support, answer Yes.

    If you answer yes, you will be asked some Windows-specific questions further on in the installation process. These questions will be marked as WINDOWS ONLY.


    Windows Execution Host Support
    ------------------------------
                                                                                    
    Are you going to install Windows Execution Hosts? (y/n) [n]
  11. Verify or set the correct file permissions.

    If you used pkgadd or you know that the file permissions are correct, you should answer Yes. Answering No will direct the script to set the permissions for you as shown in the next step.


    Verifying and setting file permissions
    --------------------------------------
    
    Did you install this version with >pkgadd< or did you already
    verify and set the file permissions of your distribution (y/n) [y] >> y
    
  12. Set the correct file permissions.

    • WINDOWS ONLY – If you specified that you wanted Windows Execution Host support in the previous question, you should let the script set the file permissions for you. Answer No to the following question.


      Verifying and setting file permissions
      --------------------------------------
      
      Did you install this version with >pkgadd< or did you already
      verify and set the file permissions of your distribution (y/n) [y] >> 
      
      In some cases, eg: the binaries are stored on a NTFS or on any other 
      filesystem, which provides additional file permissions, the UNIX file 
      permissions can be wrong. In this case we would advise to verify and 
      to set the file permissions (enter: n) (y/n) [n] >>n
      
    • Verify and set file permissions.


      Verifying and setting file permissions
      --------------------------------------
      
      We may now verify and set the file permissions of your Grid Engine
      distribution.
      
      This may be useful since due to unpacking and copying of your distribution
      your files may be unaccessible to other users.
      
      We will set the permissions of directories and binaries to
      
         755 - that means executable are accessible for the world
      
      and for ordinary files to
      
         644 - that means readable for the world
      
      Do you want to verify and set your file permissions (y/n) [y] >> y
      

      Verifying and setting file permissions and owner in >3rd_party<
      Verifying and setting file permissions and owner in >bin<
      Verifying and setting file permissions and owner in >ckpt<
      Verifying and setting file permissions and owner in >examples<
      Verifying and setting file permissions and owner in >install_execd<
      Verifying and setting file permissions and owner in >install_qmaster<
      Verifying and setting file permissions and owner in >mpi<
      Verifying and setting file permissions and owner in >pvm<
      Verifying and setting file permissions and owner in >qmon<
      Verifying and setting file permissions and owner in >util<
      Verifying and setting file permissions and owner in >utilbin<
      Verifying and setting file permissions and owner in >catman<
      Verifying and setting file permissions and owner in >doc<
      Verifying and setting file permissions and owner in >man<
      Verifying and setting file permissions and owner in >inst_sge<
      Verifying and setting file permissions and owner in >bin<
      Verifying and setting file permissions and owner in >lib<
      Verifying and setting file permissions and owner in >utilbin<
      
      Your file permissions were set
      
      Hit <RETURN> to continue >> 
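
    If you want to review the permissions yourself, for example before you answer y in Step 11, a read-only listing can help. The following is a minimal sketch, assuming a POSIX find. The function name list_unexpected_modes is hypothetical. It reports directories that are not mode 755 and regular files that are neither 755 nor 644, which are the modes named in the installer output above. It changes nothing, and it also lists any file that was given a different mode on purpose.

    ```shell
    # list_unexpected_modes: hypothetical read-only helper.
    # Prints directories whose mode is not exactly 755 and regular files whose
    # mode is neither 755 nor 644 under the given directory. It changes nothing.
    list_unexpected_modes() {
        dir=$1
        find "$dir" -type d ! -perm 755 -print
        find "$dir" -type f ! -perm 755 ! -perm 644 -print
    }
    ```

    For example, list_unexpected_modes "$SGE_ROOT" prints one path per line. No output means that every directory and regular file already has one of the expected modes.
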
  13. Specify whether all of your grid engine system hosts are located in a single DNS domain.


    Select default Grid Engine hostname resolving method
    ----------------------------------------------------
    
    Are all hosts of your cluster in one DNS domain? If this is
    the case the hostnames
    
       >hostA< and >hostA.foo.com<
    
    would be treated as equal, because the DNS domain name >foo.com<
    is ignored when comparing hostnames.
    
    Are all hosts of your cluster in a single DNS domain (y/n) [y] >>   
    • If all of your grid engine system hosts are located in a single DNS domain, then answer y.


      Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y 
      
      Ignoring domainname when comparing hostnames.
      
      Hit <RETURN> to continue >> 
    • If not all of your grid engine system hosts are located in a single DNS domain, then answer n.


      Are all hosts of your cluster in a single DNS domain (y/n) [y] >> n 
      
      The domainname is not ignored when comparing hostnames.
      
      Hit <RETURN> to continue >> 

      Default domain for hostnames
      ----------------------------
      
      Sometimes the primary hostname of machines returns the short hostname
      without a domain suffix like >foo.com<.
      
      This can cause problems with getting load values of your execution hosts.
      If you are using DNS or you are using domains in your >/etc/hosts< file or
      your NIS configuration it is usually safe to define a default domain
      because it is only used if your execution hosts return the short hostname
      as their primary name.
      
      If your execution hosts reside in more than one domain, the default domain
      parameter must be set on all execution hosts individually.
      
      Do you want to configure a default domain (y/n) [y] >> 

      Press the Return key to continue.

      1. If you want to specify a default domain, then answer y.

        In the following example, sun.com is specified as the default domain.


        Do you want to configure a default domain (y/n) [y] >> y
        
        
        Please enter your default domain >> sun.com
        
        Using >sun.com< as default domain. Hit <RETURN> to continue >>
      2. If you do not want to specify a default domain, then answer n.

        In the following example, no default domain is configured.


        Do you want to configure a default domain (y/n) [y] >> n
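
    The effect of these two settings can be illustrated with a short sketch, assuming a POSIX-style shell. Both function names are hypothetical, and the code is an illustration only, not the comparison that the grid engine software itself performs. The first function treats two host names as equal when they match up to the first dot. The second function appends a default domain only when a host name has no domain suffix.

    ```shell
    # same_host_ignoring_domain: hypothetical illustration only; this is not the
    # code the grid engine software uses. Compares two host names after dropping
    # everything from the first dot onward.
    same_host_ignoring_domain() {
        [ "${1%%.*}" = "${2%%.*}" ]
    }

    # qualify_with_default_domain: hypothetical illustration only. Prints the host
    # name unchanged if it already contains a dot, otherwise appends the domain.
    qualify_with_default_domain() {
        case $1 in
            *.*) echo "$1" ;;
            *)   echo "$1.$2" ;;
        esac
    }
    ```

    Under the first function, hostA and hostA.foo.com compare as equal, but so would hostA.bar.com and hostA.foo.com. This is why the domain name should not be ignored when your hosts are spread over more than one DNS domain.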
        
  14. Press the Return key to continue.


    Making directories
    ------------------
    
    creating directory: default/common
    creating directory: /opt/n1ge6/default/spool/qmaster
    creating directory: /opt/n1ge6/default/spool/qmaster/job_scripts
    Hit <RETURN> to continue >> 
  15. Specify whether you want to use classic spooling or Berkeley DB.

    For more information on how to determine the type of spooling mechanism you want, please see Choosing Between Classic Spooling and Database Spooling.


    Setup spooling
    --------------
    Your SGE binaries are compiled to link the spooling libraries
    during runtime (dynamically). So you can choose between Berkeley DB 
    spooling and Classic spooling method.
    Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> 
    • To specify Berkeley DB spooling, press the Return key to continue.


      Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> 

      The Berkeley DB spooling method provides two configurations!
      
      1) Local spooling:
      The Berkeley DB spools into a local directory on this host (qmaster host)
      This setup is faster, but you can't setup a shadow master host
      
      2) Berkeley DB Spooling Server:
      If you want to setup a shadow master host, you need to use
      Berkeley DB Spooling Server!
      In this case you have to choose a host with a configured RPC service.
      The qmaster host connects via RPC to the Berkeley DB. This setup is more
      failsafe, but results in a clear potential security hole. RPC communication
      (as used by Berkeley DB) can be easily compromised. Please only use this
      alternative if your site is secure or if you are not concerned about
      security. Check the installation guide for further advice on how to achieve
      failsafety without compromising security.
      
      Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> 
      • To use a Berkeley DB spooling server, enter y.


        Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> y
        
        Berkeley DB Setup
        
        -----------------
        Please, log in to your Berkeley DB spooling host and execute "inst_sge -db"
        Please do not continue, before the Berkeley DB installation with
        "inst_sge -db" is completed, continue with <RETURN>
        

        Note –

        Do not press the Return key until you have completed the Berkeley DB installation on the spooling server.


        1. Start a new terminal session or window.

        2. Log in to the spooling server.

        3. Install the software, as described in How to Install the Berkeley DB Spooling Server.

        4. After you have installed the software on the spooling server, return to the master installation window, and press the Return key to continue.

        5. Type the name of the spooling server.

          In the following example, vector is the host name of the spooling server.


          Berkeley Database spooling parameters
          -------------------------------------
          
          Please enter the name of your Berkeley DB Spooling Server! >> vector
          
        6. Type the name of the spooling directory.

          In the following example, /opt/n1ge6/default/spooldb is the spooling directory.


          Please enter the Database Directory now!
          
          Default: [/opt/n1ge6/default/spooldb] >> 
          Dumping bootstrapping information
          Initializing spooling database
          
          Hit <RETURN> to continue >> 
      • If you do not want to use a Berkeley DB spooling server, type n.


        Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> n
        
        
        Hit <RETURN> to continue >> 

        Berkeley Database spooling parameters
        -------------------------------------
        
        Please enter the Database Directory now, even if you want to spool locally
        it is necessary to enter this Database Directory. 
        
        Default: [/opt/n1ge6/default/spool/spooldb] >> 

        Specify an alternate directory, or press the Return key to continue.


        creating directory: /opt/n1ge6/default/spool/spooldb
        Dumping bootstrapping information
        Initializing spooling database
        
        Hit <RETURN> to continue >> 
    • To specify classic spooling, type classic.


      Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
      

      Dumping bootstrapping information
      Initializing spooling database
      
      Hit <RETURN> to continue >> 
  16. Type a group ID range.

    For more information, see Group IDs.


    Grid Engine group id range
    --------------------------
    
    When jobs are started under the control of Grid Engine an additional group id
    is set on platforms which do not support jobs. This is done to provide maximum
    control for Grid Engine jobs.
    
    This additional UNIX group id range must be unused group id's in your system.
    Each job will be assigned a unique id during the time it is running.
    Therefore you need to provide a range of id's which will be assigned
    dynamically for jobs.
    
    The range must be big enough to provide enough numbers for the maximum number
    of Grid Engine jobs running at a single moment on a single host. E.g. a range
    like >20000-20100< means, that Grid Engine will use the group ids from
    20000-20100 and provides a range for 100 Grid Engine jobs at the same time
    on a single host.
    
    You can change at any time the group id range in your cluster configuration.
    
    Please enter a range >> 20000-20100
    
    Using >20000-20100< as gid range. Hit <RETURN> to continue >> 
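
    Because the range must consist of unused group IDs, you might want to check a candidate range before you type it. The following is a minimal sketch, assuming a POSIX awk and a local group file. The function name gids_in_use is hypothetical. Groups that are defined only in NIS or another name service are not seen by this check.

    ```shell
    # gids_in_use: hypothetical helper. Prints "name:gid" for every entry in a
    # group-format file (default: /etc/group) whose group ID lies inside the
    # inclusive range LOW-HIGH. No output means the file defines no group in
    # that range. Groups served only by a name service are not checked.
    gids_in_use() {
        range=$1
        file=${2:-/etc/group}
        low=${range%-*}     # text before the dash
        high=${range#*-}    # text after the dash
        awk -F: -v lo="$low" -v hi="$high" '
            $3 ~ /^[0-9]+$/ && $3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 { print $1 ":" $3 }
        ' "$file"
    }
    ```

    For example, gids_in_use 20000-20100 prints nothing if the local group file has no entry in that range.
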
  17. Verify the spooling directory for the execution daemon.

    For information on spooling, see Spool Directories Under the Root Directory.


    Grid Engine cluster configuration
    ---------------------------------
    
    Please give the basic configuration parameters of your Grid Engine
    installation:
    
       <execd_spool_dir>
    
    The pathname of the spool directory of the execution hosts. User >sgeadmin<
    must have the right to create this directory and to write into it.
    
    Default: [/opt/n1ge6/default/spool] >>  
  18. Type the email address of the user who should receive problem reports.

    In this example, the user who will receive problem reports is me@my.domain.


    Grid Engine cluster configuration (continued)
    ---------------------------------------------
    
    <administator_mail>
    
    The email address of the administrator to whom problem reports are sent.
    
    It's is recommended to configure this parameter. You may use >none<
    if you do not wish to receive administrator mail.
    
    Please enter an email address in the form >user@foo.com<.
    
    Default: [none] >> me@my.domain
    
  19. Verify the configuration parameters.


    The following parameters for the cluster configuration were configured:
    
       execd_spool_dir        /opt/n1ge6/default/spool
       administrator_mail     me@my.domain
    
    Do you want to change the configuration parameters (y/n) [n] >> n
    
    Creating local configuration
    ----------------------------
    Creating >act_qmaster< file
    Adding default complex attributes
    Reading in complex attributes.
    Adding default parallel environments (PE)
    Reading in parallel environments:
            PE "make".
    Adding SGE default usersets
    Reading in usersets:
            Userset "deadlineusers".
            Userset "defaultdepartment".
    Adding >sge_aliases< path aliases file
    Adding >qtask< qtcsh sample default request file
    Adding >sge_request< default submit options file
    Creating >sgemaster< script
    Creating >sgeexecd< script
    Creating settings files for >.profile/.cshrc<
    
    Hit <RETURN> to continue >> 
  20. WINDOWS-ONLY – If you specified that you want Windows support, you are asked to create Certificate Security Protocol (CSP) certificates.

    Read How to Install a CSP-Secured System for information about CSP certificates before you continue.

  21. Specify whether you want the daemons to start when the system is booted.


    qmaster/scheduler startup script
    --------------------------------
    
    We can install the startup script that will
    start qmaster/scheduler at machine boot (y/n) [y] >> y
    
    Installing startup script /etc/rc2.d/S95sgemaster
    
    Hit <RETURN> to continue >> 
    ...
  22. WINDOWS-ONLY – Add the Windows Administrator name to the SGE manager list.


    Windows Administrator Name
    --------------------------
                                                                                    
    For a later execution host installation it is recommended to add the
    Windows Administrator name to the SGE manager list
                                                                                    
    
    Please, enter the Windows Administrator name [Default: Administrator] >>
  23. Identify the hosts that you will later install as execution hosts.


    Adding Grid Engine hosts
    ------------------------
    
    Please now add the list of hosts, where you will later install your execution
    daemons. These hosts will be also added as valid submit hosts.
    
    Please enter a blank separated list of your execution hosts. You may
    press <RETURN> if the line is getting too long. Once you are finished
    simply press <RETURN> without entering a name.
    
    You also may prepare a file with the hostnames of the machines where you plan
    to install Grid Engine. This may be convenient if you are installing Grid
    Engine on many hosts.
    
    Do you want to use a file which contains the list of hosts (y/n) [n] >> n
    
    Adding admin and submit hosts
    -----------------------------
    
    Please enter a blank seperated list of hosts.
    
    Stop by entering <RETURN>. You may repeat this step until you are
    entering an empty list. You will see messages from Grid Engine
    when the hosts are added.
    
    Host(s): host1 host2 host3 host4
    
    host1 added to administrative host list
    host1 added to submit host list
    host2 added to administrative host list
    host2 added to submit host list
    host3 added to administrative host list
    host3 added to submit host list
    host4 added to administrative host list
    host4 added to submit host list
    Hit <RETURN> to continue >> 
    
    Creating the default <all.q> queue and <allhosts> hostgroup
    -----------------------------------------------------------
    
    root@vector added "@allhosts" to host group list
    root@vector added "all.q" to cluster queue list
    
    Hit <RETURN> to continue >> 
  24. Select a scheduler profile.

    For information on how to determine which profile you should use, see Scheduler Profiles.


    Scheduler Tuning
    ----------------
    
    The details on the different options are described in the manual. 
    
    Configurations
    --------------
    1) Normal
              Fixed interval scheduling, report scheduling information,
              actual + assumed load
    
    2) High
              Fixed interval scheduling, report limited scheduling information,
              actual load
    
    3) Max
              Scheduling on demand, report no scheduling information,
              actual load
    
    Enter the number of your preferred configuration and hit <RETURN>! 
    Default configuration is [1] >> 

    Once you answer this question, the installation process is complete. Several screens of information will be displayed before the script exits. The commands that are noted in those screens are also documented in this chapter.

  25. WINDOWS-ONLY – If you are using CSP mode, copy the certificate files to each execution host.

    You can use a script to perform this function.


    Tip –

    To use this functionality without being asked for a password, the root user should use rsh or ssh to access the execution hosts.



    Should the script try to copy the cert files, for you, to each
    execution host? (y/n) [y] >>
  26. Create the environment variables for use with the grid engine software.


    Note –

    If no cell name was specified during installation, the value of cell is default.


    • If you are using a C shell, type the following command:


      % source sge-root/cell/common/settings.csh
      
    • If you are using a Bourne shell or Korn shell, type the following command:


      $ . sge-root/cell/common/settings.sh
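
    For Bourne-style shells, the same step can be wrapped in a small function. The following is a minimal sketch. The function name load_sge_settings is hypothetical and is not part of the grid engine distribution. It builds the path from the sge-root directory and the cell name, falls back to the cell name default, and sources the file only if the file is readable.

    ```shell
    # load_sge_settings: hypothetical convenience wrapper for Bourne-style shells.
    # Sources <sge-root>/<cell>/common/settings.sh, using "default" when no cell
    # name is passed. Returns 1 without sourcing if the file cannot be read.
    load_sge_settings() {
        root=$1
        cell=${2:-default}
        settings="$root/$cell/common/settings.sh"
        if [ ! -r "$settings" ]; then
            echo "cannot read $settings" >&2
            return 1
        fi
        . "$settings"
    }
    ```

    With the example values used in this procedure, you would type load_sge_settings /opt/n1ge6 to load the settings for the default cell.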
      
See Also

For details about how you can verify that the master host has been set up correctly, see How to Verify That the Daemons Are Running on the Master Host.