You can see potential host problems from the Host Alerts page. This page is available from the Alerts table on the Overview page.
You can set an alarm on any of the following host alert parameters. If a parameter passes its specified threshold, an alert is generated and appears in the Alerts table on the Overview page.
Load Per CPU – Shows how efficiently the host's CPU is being used. This parameter can be any positive decimal number but is usually between 0 and 2 or 3. Ideally, this number should be close to 1. A smaller number could mean the host is underutilized, and a larger number could mean the host is overutilized. The ideal value depends on the workload being run, so only the local administrator can judge what a given value means for that workload.
Used Mem. – The percentage of total memory currently being used to execute jobs. If the used memory is too close to the total memory, the host could be in trouble. However, if the workloads are tuned to fit in the server, used memory just under the total memory can be perfectly fine. The alarm is tunable: you set how small the difference between these two parameters must be to trigger it. In one case, a difference of less than 100 MB might trigger a warning, while in another case the threshold could be 25 MB.
Total Mem. – The total amount of memory on this host.
Swap Used – The amount of free swap space left on this host, measured in MB. In a well-architected grid, the free swap space should never drop far below its initial value. Temporary drops in this value may be tolerable, depending on how the grid is architected. If this value approaches zero, the host is in danger of failing completely.
Date/Time – The timestamp for when the alert was generated.
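
The threshold checks described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the `HostSample` structure, the `check_host` function, and the default threshold values are hypothetical names and numbers chosen for illustration. It also assumes that Load Per CPU is the host's load average divided by its CPU count, which this page does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One reading of a host's parameters (hypothetical structure)."""
    load_average: float   # assumed input: the host's load average
    cpu_count: int        # number of CPUs on the host
    used_mem_mb: float    # memory currently in use, in MB
    total_mem_mb: float   # total memory on the host, in MB
    free_swap_mb: float   # free swap space left, in MB


def load_per_cpu(sample: HostSample) -> float:
    # Assumption: Load Per CPU = load average / number of CPUs.
    return sample.load_average / sample.cpu_count


def check_host(sample: HostSample,
               max_load_per_cpu: float = 2.0,
               min_mem_headroom_mb: float = 100.0,
               min_free_swap_mb: float = 50.0) -> list:
    """Return the names of the parameters that passed their thresholds."""
    alerts = []
    # Load well above 1 per CPU may mean the host is overutilized.
    if load_per_cpu(sample) > max_load_per_cpu:
        alerts.append("Load Per CPU")
    # Alarm when used memory comes too close to total memory.
    if sample.total_mem_mb - sample.used_mem_mb < min_mem_headroom_mb:
        alerts.append("Used Mem.")
    # Alarm when free swap space drops toward zero.
    if sample.free_swap_mb < min_free_swap_mb:
        alerts.append("Swap Used")
    return alerts


# A host with 50 MB of memory headroom and otherwise healthy readings.
sample = HostSample(load_average=8.0, cpu_count=4,
                    used_mem_mb=15950.0, total_mem_mb=16000.0,
                    free_swap_mb=2000.0)
```

With these readings, `check_host(sample)` flags only the memory parameter under the 100 MB setting, while `check_host(sample, min_mem_headroom_mb=25.0)` flags nothing. This mirrors the point that the right threshold depends on how the workloads are tuned for the host.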