How to install the Grid Engine software on hosts with the Solaris Operating Environment IP Multipathing (IPMP) technology
IP Multipathing is a technology that allows grouping of TCP/IP interfaces for failover and load balancing purposes. If an interface within an IP Multipathing group fails, the interface is disabled and its IP address is relocated to another interface in the group. Outbound IP traffic is distributed across the interfaces of a group. For further details on IP Multipathing, refer to the Solaris Operating Environment documentation, which can be found at: http://docs.sun.com/db/doc/806-4075/6jd69oabu?a=view. The IPMP features overview can be found at: http://wwws.sun.com/software/solaris/ds/ds-netmultipath/index.html.
The only major issue is the set of error messages that appear when starting the Grid Engine daemons on a machine where the main interface is part of an IPMP group. This situation occurs because IPMP load balancing distributes the connections across the interfaces in the group; therefore, the IP packets show up at the receiving end as coming from a different host than the one associated with the main interface. For example, consider a machine with three interfaces named qfe0, qfe1, and qfe2, where the IP addresses for these interfaces are 10.1.1.1, 10.1.1.2, and 10.1.1.3 respectively. IPMP would need an extra address for each interface for testing; however, that requirement is ignored in this example. Each of these addresses has a hostname associated with it. The hosts table looks like the following example:
10.1.1.1 sge
10.1.1.2 sge-qfe1
10.1.1.3 sge-qfe2
The machine's hostname is sge. When a connection is established from sge to another machine, it might go through sge, sge-qfe1, or sge-qfe2. Upon installation, Grid Engine will only recognize sge. When Grid Engine receives a connection request from sge-qfe2, it closes the connection because the request is not from one of the authorized (or known) nodes.
You solve this problem by using the host_aliases file (see the sge_h_aliases man page for details). You can use this file to "tell" Grid Engine that sge, sge-qfe1, and sge-qfe2 all refer to the same machine. The host_aliases file in this case would look like this:
sge sge-qfe1 sge-qfe2
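To check that a given interface name is covered by the host_aliases file, the lookup can be sketched in a few lines of shell. This is only an illustration under the assumption that each line of the file lists the primary hostname first, followed by its aliases, as in the example above. The function name canonical_host is hypothetical and is not part of Grid Engine.

```shell
#!/bin/sh
# canonical_host NAME FILE  (hypothetical helper, not a Grid Engine command)
# Prints the first name on the line of FILE that lists NAME,
# or NAME itself when no line of FILE mentions it.
canonical_host() {
    awk -v name="$1" '
        {
            # Compare NAME against every field on the line.
            for (i = 1; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        # Runs after the last line, or right after the exit above.
        END { if (!found) print name }
    ' "$2"
}

# Example call (path as used in this document):
#   canonical_host sge-qfe2 "$SGE_ROOT/$SGE_CELL/common/host_aliases"
```

If a name that appears in the hosts table comes back unchanged, it is probably missing from the host_aliases file.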
Note that if you make any changes to the $SGE_ROOT/$SGE_CELL/common/host_aliases file, you must stop and restart all running Grid Engine daemons (sge_qmaster, sge_schedd, and sge_execd). To do this, log in as root on all your Grid Engine hosts and enter these commands:
/etc/init.d/sgemaster stop
/etc/init.d/sgeexecd stop
/etc/init.d/sgemaster start
/etc/init.d/sgeexecd start
There are two ways to work around this problem when installing the master host. One way is to ignore the error messages during installation. This method is operating system independent (except for MS Windows). The other way is to temporarily disable IPMP on the interface associated with the machine's hostname. This method only works on systems with the Solaris 8 Operating Environment or later.
To use the method of ignoring the error messages, follow these steps:
Run the inst_sge -m command and ignore the error messages that appear during the startup of the daemons.
Shut down the daemons with the /etc/init.d/sgemaster stop and /etc/init.d/sgeexecd stop commands. Due to the networking errors, some daemons fail to shut down and must be killed with the kill -9 command. To see which daemons failed to shut down, use this command: ps -e | grep sge_
Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.
Restart the daemons with the /etc/init.d/sgemaster start and /etc/init.d/sgeexecd start commands.
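The check for leftover daemons in the shutdown step can be sketched as a small filter over the ps -e output. This is only a sketch: it assumes that ps -e prints the process ID in the first column and the command name in the last column, and the function name sge_pids is hypothetical.

```shell
#!/bin/sh
# sge_pids  (hypothetical helper, not a Grid Engine command)
# Reads `ps -e` output on standard input and prints the process ID
# of every line whose command name starts with "sge_".
sge_pids() {
    awk '$NF ~ /^sge_/ { print $1 }'
}

# Example usage, run as root after the stop commands:
#   ps -e | sge_pids                 # list daemons that are still running
#   kill -9 $(ps -e | sge_pids)      # only if the list is not empty
```

Listing the process IDs first and reviewing them before running kill -9 avoids killing an unrelated process by mistake.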
To use the method of temporarily disabling IPMP, follow these steps:
Identify the interface associated with the machine's hostname.
Verify that the interface has IPMP enabled with this command: ifconfig <<interface>> | grep groupname
Take note of the group name.
Disable IPMP with this command: ifconfig <<interface>> group ""
Install the Grid Engine master node.
Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.
Restart the daemons by running the /etc/init.d/sgemaster and /etc/init.d/sgeexecd scripts, first with the stop argument and then with the start argument.
Re-enable IPMP using the following command: ifconfig <<interface>> group <<IPMP group>>.
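The group name noted in the steps above can also be captured in a variable so that it is available when IPMP is re-enabled. The following is a minimal sketch, assuming that the ifconfig output contains a line of the form "groupname <name>". The function name ipmp_group and the interface name qfe0 in the usage comments are illustrative only.

```shell
#!/bin/sh
# ipmp_group  (hypothetical helper)
# Reads `ifconfig <interface>` output on standard input and prints
# the word that follows "groupname"; prints nothing if it is absent.
ipmp_group() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "groupname") { print $(i + 1); exit }
    }'
}

# Example usage, run as root (qfe0 is an assumed interface name):
#   group=$(ifconfig qfe0 | ipmp_group)   # note the group name
#   ifconfig qfe0 group ""                # disable IPMP
#   ... install the master node, install host_aliases, restart daemons ...
#   ifconfig qfe0 group "$group"          # re-enable IPMP
```

If the variable is empty after the first command, the interface is not part of an IPMP group and the remaining ifconfig commands should be skipped.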
Once the host_aliases file is installed and the Grid Engine daemons are restarted, you can simply start the execution host installation without further problems.
You have two choices when enabling administrative or submit hosts with IPMP:
Follow the same procedure used for the execution host (updating the host_aliases file before installation).
Add all the hostnames associated with the administrative or submit host with one of the following commands:
qconf -ah <<hostname>> <<alias 1>> <<alias 2>> ... (for the administrative host)
qconf -as <<hostname>> <<alias 1>> <<alias 2>> ... (for the submit host)