Sun N1 Grid Engine 6.1 Installation Guide

Appendix C Other N1 Grid Engine Installation Issues

Checking Linux Motif Libraries

On newer Linux systems, the libXm.so.2 Motif libraries are not always installed. Without them, the precompiled Linux qmon binary cannot run.

To correct this problem, use the following steps.

Procedure: Installing Linux Motif Libraries

  1. Check if the libraries are already present by using this command.


    % ls -l /usr/X11R6/lib/libXm*

    If /usr/X11R6/lib/libXm.so.2 points to a libXm.so.2.x version, you are done. Note that a symbolic link to /usr/X11R6/lib/libXm.so.3 does not work.

    If the libraries are not present, continue with the following steps.

  2. Download the corresponding openmotif libraries from http://www.ist.co.uk/DOWNLOADS/motif_download.html or from the SUSE 9.1 distribution (there is an additional rpm file called openmotif21-* available).

  3. Install the missing libraries as root. For SUSE 9.1, you install the openmotif21-* package like any other package. For packages downloaded from http://www.ist.co.uk, install the libraries as in the following example.


    # rpm -i --prefix /tmp/test --force \
          openmotif-2.1.31-2_IST-JDS2003.i386.rpm
    # cd /tmp/test/OpenMotif-2.1.31/lib
    # cp libXm.so.2.1 /usr/X11R6/lib
    # cd /usr/X11R6/lib
    # ln -s libXm.so.2.1 libXm.so.2

  4. Test qmon.


    % ldd `which qmon`
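
The check in step 1 can be sketched as a small shell helper. This is only an illustration: the function names motif_target_ok and check_motif are hypothetical, and the sketch assumes a Linux system where the readlink command is available. It reports whether libXm.so.2 in a given directory is a symbolic link to a libXm.so.2.x library.

```shell
#!/bin/sh
# Sketch only: motif_target_ok and check_motif are hypothetical helper
# names, not part of Grid Engine or OpenMotif.

# Succeed if the given symlink target names a libXm.so.2.x library.
motif_target_ok() {
    case "${1##*/}" in              # strip any leading directory
        libXm.so.2.*) return 0 ;;   # a 2.x library, which qmon needs
        *)            return 1 ;;   # for example libXm.so.3, which does not work
    esac
}

# Print "ok", "wrong-version", or "missing" for libXm.so.2 in directory $1.
check_motif() {
    link="$1/libXm.so.2"
    if [ ! -L "$link" ]; then
        echo "missing"              # no symbolic link with that name
    elif motif_target_ok "$(readlink "$link")"; then
        echo "ok"
    else
        echo "wrong-version"
    fi
}
```

For example, check_motif /usr/X11R6/lib prints ok when the link already points to a 2.x library and no further action is needed.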

Installing N1 Grid Engine on a System with IPMP

How to install the Grid Engine software on hosts with the Solaris Operating Environment IP Multipathing (IPMP) technology

What is IP Multipathing?

IP Multipathing is a technology that allows grouping of TCP/IP interfaces for failover and load-balancing purposes. If an interface within an IP Multipathing group fails, the interface is disabled and its IP address is relocated to another interface in the group. Outbound IP traffic is distributed across the interfaces of a group. For further details on IP Multipathing, refer to the Solaris Operating Environment documentation, which can be found at: http://docs.sun.com/db/doc/806-4075/6jd69oabu?a=view. The IPMP features overview can be found at: http://wwws.sun.com/software/solaris/ds/ds-netmultipath/index.html.

Issues between IPMP and Grid Engine

The only major issue is the error messages that occur when you start the Grid Engine daemons on a machine whose main interface is part of an IPMP group. These errors occur because IPMP load balancing distributes the connections across the interfaces in the group. As a result, the IP packets show up at the receiving end as coming from a different host than the one associated with the main interface.

For example, consider a machine with three interfaces named qfe0, qfe1, and qfe2, whose IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3 respectively. IPMP would also need an extra test address for each interface, but that requirement is ignored in this example. Each of these addresses has a hostname associated with it. The hosts table looks like the following example:


10.1.1.1 sge
10.1.1.2 sge-qfe1
10.1.1.3 sge-qfe2

The machine's hostname is sge. When a connection is established from sge to another machine, it might go through sge, sge-qfe1, or sge-qfe2. Upon installation, Grid Engine recognizes only sge. When Grid Engine receives a connection request from sge-qfe2, it closes the connection because the request is not from one of the authorized (or known) nodes.

You solve this problem by using the host_aliases file (see the sge_h_aliases man page for details). You can use this file to "tell" Grid Engine that sge, sge-qfe1, and sge-qfe2 are all the same machine. The host_aliases file in this case would look like this:


sge sge-qfe1 sge-qfe2
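
To illustrate what this line expresses, the following sketch looks up the primary host name for any alias. It assumes, as in the example above, that the first name on each line is the name Grid Engine uses and the remaining names are its aliases. The function name primary_host is hypothetical, and the sketch does not reflect how Grid Engine itself reads the file.

```shell
#!/bin/sh
# Sketch only: primary_host is a hypothetical helper, not a Grid Engine tool.
# Assumption: each host_aliases line is "primary alias1 alias2 ...".

# Usage: primary_host <host_aliases file> <host name>
# Prints the primary name for the given host name, or the name itself
# when it does not appear in the file.
primary_host() {
    awk -v name="$2" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { if (!found) print name }
    ' "$1"
}
```

With the file shown above, primary_host host_aliases sge-qfe2 prints sge, which is the mapping Grid Engine needs in order to accept connections that arrive through the other interfaces.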

Note that if you make any changes to the $SGE_ROOT/$SGE_CELL/common/host_aliases file, you must stop and restart all running Grid Engine daemons (sge_qmaster, sge_schedd, and sge_execd). To do this, log in as root on each of your Grid Engine hosts and enter these commands:


/etc/init.d/sgemaster stop
/etc/init.d/sgeexecd stop
/etc/init.d/sgemaster start
/etc/init.d/sgeexecd start

Installing the Grid Engine Master Node With IPMP

There are two ways you can fix this problem. One way is to ignore the error messages during installation. This method is operating system independent (except for MS Windows). The other way is to temporarily disable IPMP on the interface associated with the machine's hostname. This method only works on systems with Solaris 8 or greater Operating Environments.

To use the method of ignoring the error messages, follow these steps:

  1. Run the inst_sge -m command, ignoring the error messages that appear while the daemons start up.

  2. Shut down the daemons with the /etc/init.d/sgemaster stop and /etc/init.d/sgeexecd stop commands. Because of the networking errors, some daemons fail to shut down and must be killed with the kill -9 command. To see which daemons failed to shut down, use this command: ps -e | grep sge_

  3. Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.

  4. Restart the daemons with the /etc/init.d/sgemaster start and /etc/init.d/sgeexecd start commands.
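
The check in step 2 for daemons that failed to shut down can be sketched as a filter over the ps -e output. The function name leftover_sge_pids is hypothetical, and the sketch assumes the usual ps -e layout in which the process ID is the first column and the command name is the last.

```shell
#!/bin/sh
# Sketch only: leftover_sge_pids is a hypothetical helper name.
# Assumption: ps -e prints the PID in the first column and the
# command name in the last column.

# Read ps -e output on standard input and print the PID of every
# process whose command name starts with "sge_".
leftover_sge_pids() {
    awk '$NF ~ /^sge_/ { print $1 }'
}
```

Running ps -e | leftover_sge_pids after the stop commands lists the process IDs that remain. Review that list before passing any of the IDs to kill -9.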

To use the method of temporarily disabling IPMP, follow these steps:

  1. Identify the interface associated with the machine's hostname.

  2. Verify that the interface has IPMP enabled with the ifconfig <<interface>> | grep groupname command.

  3. Take note of the group name.

  4. Disable IPMP with this command: ifconfig <<interface>> group ""

  5. Install the Grid Engine master node.

  6. Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.

  7. Restart the daemons with the /etc/init.d/sgemaster and /etc/init.d/sgeexecd commands.

  8. Re-enable IPMP using the following command: ifconfig <<interface>> group <<IPMP group>>.
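
The steps above can be sketched as a wrapper that records the group name, removes the interface from its group, runs a command, and then restores the group. The function names ipmp_group and with_ipmp_disabled are hypothetical. The sketch assumes that ifconfig prints a line whose first word is groupname, followed by the name of the group, which is what the grep in step 2 relies on.

```shell
#!/bin/sh
# Sketch only: ipmp_group and with_ipmp_disabled are hypothetical helpers.
# Assumption: "ifconfig <interface>" prints a line "groupname <name>"
# when the interface belongs to an IPMP group.

# Read ifconfig output on standard input and print the IPMP group name.
ipmp_group() {
    awk '$1 == "groupname" { print $2; exit }'
}

# Usage: with_ipmp_disabled <interface> <command> [arguments...]
with_ipmp_disabled() {
    iface="$1"; shift
    group=$(ifconfig "$iface" | ipmp_group)      # steps 2 and 3
    if [ -z "$group" ]; then
        echo "$iface is not in an IPMP group" >&2
        return 1
    fi
    ifconfig "$iface" group ""                   # step 4: disable IPMP
    "$@"                                         # steps 5 to 7
    status=$?
    ifconfig "$iface" group "$group"             # step 8: re-enable IPMP
    return $status
}
```

A call such as with_ipmp_disabled qfe0 ./install_master.sh would run a script of your own (install_master.sh is a placeholder name) that performs steps 5 through 7. The group is restored even if that script fails, and the wrapper must be run as root.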

Installing Grid Engine on an Execution Host With IPMP

Once the host_aliases file is installed and the Grid Engine daemons are restarted, you can simply start the execution host installation without further problems.

Enabling Administrative and Submit Hosts with IPMP

You have two choices when enabling these hosts with IPMP: