This chapter describes how you can see information about the hosts constituting a grid. Information is available about the performance of all hosts as well as detailed information about any particular host.
You use the Hosts page both to check how efficiently the hosts' resources are being used and to access more details about an individual host.
The fields in the Host table have the following meanings:
Hostname – The name you assign to this host. Clicking the hostname displays the Host Details page.
Arch – The host's processor architecture, such as win32-x86 or sol-sparc64. For a complete list of supported architectures, see the Host Details page.
Average Load/CPU – Shows how efficiently the host's CPU is being used. This parameter can be any positive decimal number but is usually between zero and 2 or 3. Ideally, this number should be close to 1. A smaller number could mean that the host is underutilized, and a larger number could mean that the host is overutilized. The ideal value depends on the workload that is being run, and only the local administrator can judge the implications of that workload.
Used Mem – The percentage of total memory currently being used to execute jobs. If this value is too close to the total memory, the host might be in trouble. However, if the workloads are tuned to fit in the server, used memory that is just under the total memory could be perfectly fine. The alarm threshold is tunable: you can set the value at which the difference between Used Mem and Total Mem triggers an alarm. For example, one site might trigger a warning when the difference is less than 100 MB, while another might set the value at 25 MB.
Total Mem – The total amount of memory on this host.
Free Swap – The amount of free swap space left on this host, measured in MB. In a well-architected grid, the free swap space should never drop far below its initial value. Temporary drops in this value might be tolerable, again depending on how the grid is architected. If this value approaches zero, the host is in danger of failing completely.
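The guidance given above for Average Load/CPU, Used Mem, and Free Swap can be sketched as a small check. This is only an illustration and not part of the product: the HostRow structure, the check_host function, and every threshold (load limits of 0.5 and 1.5, a 100 MB memory margin, a 50 MB swap floor) are assumptions made for the example, and used memory is taken in MB rather than as a percentage so that it can be compared with the total directly. Appropriate values depend on the workload.

```python
from dataclasses import dataclass


@dataclass
class HostRow:
    """One row of the Hosts table (hypothetical structure for illustration)."""
    hostname: str
    load_per_cpu: float   # Average Load/CPU
    used_mem_mb: float    # memory in use, in MB (assumed unit)
    total_mem_mb: float   # Total Mem, in MB
    free_swap_mb: float   # Free Swap, in MB


def check_host(row, mem_margin_mb=100.0, low_load=0.5, high_load=1.5,
               swap_floor_mb=50.0):
    """Return a list of warning strings for one host row.

    All thresholds are illustrative assumptions, not product defaults.
    """
    warnings = []
    # A load per CPU close to 1 is the ideal described in the text.
    if row.load_per_cpu < low_load:
        warnings.append("possibly underutilized")
    elif row.load_per_cpu > high_load:
        warnings.append("possibly overutilized")
    # Warn when used memory comes within the chosen margin of total memory.
    if row.total_mem_mb - row.used_mem_mb < mem_margin_mb:
        warnings.append("memory margin below threshold")
    # Free swap close to zero means the host is in danger of failing.
    if row.free_swap_mb < swap_floor_mb:
        warnings.append("free swap nearly exhausted")
    return warnings
```

Passing mem_margin_mb=25.0 instead of the default models a site that tolerates a tighter memory margin, as in the 100 MB versus 25 MB example in the Used Mem description.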
The Host Details page contains detailed information about the host system that is helping to execute a job and is hosting a queue.
The Host Details page contains the following information:
Hostname – The name assigned to this host.
Arch – An architecture string compiled into the sge_execd daemon that describes the operating system architecture for which the daemon is targeted. Possible values are:
sol-sparc for Sun Solaris (Sparc) 7 and higher, 32-bit kernel
sol-sparc64 for Sun Solaris (Sparc) 7 and higher, 64-bit kernel
sol-x86 for Sun Solaris (x86) 8 and higher
lx24-amd64 for Linux 2.4.x (AMD64) glibc 2.2+ based
lx24-x86 for Linux 2.4.x (x86) glibc 2.2+ based
win-x86 for MS-Windows NT
An sge_execd daemon for a particular architecture may run on multiple OS versions, but the architecture string does not reveal this level of detail.
Num Proc – The number of processors provided by the execution host. The host is in this case defined by a single Internet address. For example, rack-mounted multihost systems are counted as a cluster rather than as a single multiheaded machine.
Load Avg – The same as Load Medium.
Load Short – The short-time average OS run queue length. This value is the first of the triple of values reported by the uptime command. Many implementations provide a 1-minute average with this value.
Load Medium – The medium-time average OS run queue length. This value is the second of the triple of values reported by the uptime command. Many implementations provide a 5-minute average with this value.
Load Long – The long-time average OS run queue length. This value is the third of the triple of values reported by the uptime command. Many implementations provide a 10-minute or 15-minute average with this value.
NP Load Avg – The same as NP Load Medium.
NP Load Short – The same as Load Short but divided by the number of processors. This value allows you to compare the load of single and multiheaded hosts.
NP Load Medium – The same as Load Medium but divided by the number of processors. This value allows you to compare the load of single and multiheaded hosts.
NP Load Long – The same as Load Long but divided by the number of processors. This value allows you to compare the load of single and multiheaded hosts.
Memory Free – The amount of free memory.
Memory Used – The amount of used memory.
Memory Total – The total amount of memory (free plus used).
Swap Free – The amount of free swap memory.
Swap Used – The amount of used swap space.
Swap Total – The total amount of swap space (free plus used).
Virtual Free – The sum of Memory Free and Swap Free.
Virtual Used – The sum of Memory Used and Swap Used.
Virtual Total – The sum of Memory Total and Swap Total.
CPU – The percentage of time that the CPU was busy when the data was gathered.
Date/Time – The timestamp for when the data was gathered.
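The relationships among the load, memory, and swap fields listed above can be expressed in a few lines. The following sketch is illustrative only: the function name derive_host_metrics and the dictionary keys are hypothetical, and memory and swap are assumed to share one unit (MB). It shows how the NP values follow from the load triple and Num Proc, and how the total and Virtual values follow from the Memory and Swap values.

```python
def derive_host_metrics(load_triple, num_proc, mem_free, mem_used,
                        swap_free, swap_used):
    """Compute derived Host Details fields from raw measurements.

    load_triple is (short, medium, long), ordered like the triple
    reported by the uptime command. Names here are hypothetical.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    short, medium, long_ = load_triple
    mem_total = mem_free + mem_used      # Memory Total = free plus used
    swap_total = swap_free + swap_used   # Swap Total = free plus used
    return {
        "load_avg": medium,              # Load Avg is the same as Load Medium
        # NP values: the load divided by the number of processors, which
        # makes single and multiheaded hosts comparable.
        "np_load_short": short / num_proc,
        "np_load_medium": medium / num_proc,
        "np_load_long": long_ / num_proc,
        "mem_total": mem_total,
        "swap_total": swap_total,
        "virtual_free": mem_free + swap_free,
        "virtual_used": mem_used + swap_used,
        "virtual_total": mem_total + swap_total,
    }
```

For instance, a four-processor host with a medium load of 2.0 has an NP Load Medium of 0.5, the same normalized load as a single-processor host with a medium load of 0.5.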
For more information about configuring execution host parameters, see the Configuring Execution Hosts With QMON chapter of the N1 Grid Engine Administrators Guide on docs.sun.com.