Sun N1 System Manager 1.3 Grid Engine Provisioning and Monitoring Guide

Chapter 8 Troubleshooting N1 Grid Engine

This chapter describes how to use the alerts and the N1 Grid Engine daemon logs to troubleshoot a grid.

Using N1 Grid Engine Daemon Logs

Use the N1 Grid Engine Daemon Logs page to see a historical view of all the messages logged by the N1 Grid Engine daemons. To see the log file for a particular host, click its host name. To see the log files for the system hosting the queue, click a name in the QMASTER column.

Figure 8–1 Daemon Logs List Page

This page shows you the list of available daemon logs.

The log file for a particular host contains fields for a Flag, a Time Stamp, and a Message. The flag tells you what kind of message was logged. Flags exist for the following message types:

Use the loglevel parameter in the cluster configuration to specify, either globally or locally, which message types you want to log.
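When you work with a daemon log outside the Daemon Logs page, it can help to tally messages by flag before reading them in detail. The following is a minimal sketch, not a description of the product's own tooling. It assumes a pipe-delimited line layout (time stamp, daemon, host, flag, message); that layout, the LogEntry type, the function names, and the sample lines, including the flag letters in them, are illustrative assumptions and should be checked against your own log files.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    """One parsed log line (hypothetical structure for this sketch)."""
    timestamp: str
    daemon: str
    host: str
    flag: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one line on '|' into five fields; return None if it does not fit.

    Assumption: fields are separated by '|' and the message is the last
    field, so any further '|' characters are kept as part of the message.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*(p.strip() for p in parts))


def count_flags(lines: Iterable[str]) -> Counter:
    """Count how many parsed entries carry each flag value."""
    entries = (parse_line(line) for line in lines)
    return Counter(e.flag for e in entries if e is not None)


# Made-up sample lines; the flag letters here are placeholders only.
SAMPLE = [
    "01/15/2007 09:30:01|qmaster|host-a|I|starting up",
    "01/15/2007 09:31:12|execd|host-b|E|cannot reach qmaster|retrying",
    "not a log line",
]

if __name__ == "__main__":
    for flag, count in sorted(count_flags(SAMPLE).items()):
        print(flag, count)
```

The sketch treats the flag as an opaque string, so it makes no assumption about which flag values exist. Lines that do not match the assumed layout are skipped rather than raising an error.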

Troubleshooting Queues

You can use the information on the Queue Alerts page to troubleshoot queue problems. You access this page from the Alerts table on the Overview page. Queue alerts are generated when the Queue Resource Limit parameters defined in the queue configuration (see the queue_conf man page) are exceeded.

Figure 8–2 Queue Alerts List Page

This page shows you the list of queue alerts.

The three types of queue alerts are:

The Queue states are:

Troubleshooting Hosts

You can see potential host problems from the Host Alerts page. This page is available from the Alerts table on the Overview page.

Figure 8–3 Hosts Alerts List Page

This page shows you the list of host alerts.

An alarm can be set on each of the following host alert parameters, so that if a value passes a specified threshold, an alert is generated and appears in the Alerts table on the Overview page.

Troubleshooting Jobs

You can view potential job problems from the Job Alerts page. This page is available from the Alerts table on the Overview page. An alarm can be set on the Pending Time and Deadline job alert parameters, so that if a value passes a specified threshold, an alert is generated and appears in the Alerts table on the Overview page.
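The threshold behavior described above can be pictured with a short sketch. This is only an illustration of the idea for the Pending Time parameter, not the product's implementation. The Job type, its fields, the function name, and the use of seconds as the unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Job:
    """Hypothetical job record used only for this illustration."""
    job_id: str
    pending_seconds: float


def pending_time_alerts(jobs: Iterable[Job], threshold_seconds: float) -> List[str]:
    """Return the IDs of jobs whose pending time has passed the threshold.

    Assumption: "passing" the threshold means strictly greater than it.
    """
    return [job.job_id for job in jobs if job.pending_seconds > threshold_seconds]


if __name__ == "__main__":
    jobs = [Job("101", 30.0), Job("102", 900.0), Job("103", 600.0)]
    print(pending_time_alerts(jobs, threshold_seconds=600.0))
```

In this sketch, a job whose pending time exactly equals the threshold does not raise an alert. Whether the real alarm uses a strict or inclusive comparison is not stated in this guide.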

Figure 8–4 Job Alerts List Page

This page shows you the list of job alerts.

The Job Alerts page shows the following information:

See the qstat man page for more information about alarms and thresholds.