Chapter 4 Monitoring N1 Grid Engine

This chapter tells you how to get a snapshot of a grid's performance, and how to view details about cluster queues and different types of N1 Grid Engine alerts. All these features are available from the N1 Grid Engine Monitor GUI.

To manage applications using N1GE, you must use the tools and commands available from N1GE itself. For example, you can use the N1GE Monitor GUI to view the status of a submitted job, but you cannot submit a job from this GUI.

Quickly Viewing Grid Performance

Use the Overview tab to get a quick picture of the health of your grid. This tab displays the Monitoring Overview page, which shows three tables: Summary Status, Cluster Queues, and Alerts aggregated for Queues, Hosts, and Jobs.

Reload this page to get the most current data.

Figure 4–1 N1 Grid Engine Monitoring Overview Page

Summary Status Table

The Summary Status table shows the total number of jobs in the grid in various states (pending, running, suspended, and so forth). It also shows the load averaged across all compute hosts and the total amount of used and installed memory summed over all compute hosts.
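The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Monitor GUI's actual implementation: the Host record, its field names, and the summarize function are hypothetical, and the input lists are assumed to come from whatever source reports job states and host metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host metrics; field names are assumptions."""
    name: str
    load_avg: float
    mem_used_mb: float
    mem_total_mb: float


def summarize(job_states, hosts):
    """Build summary figures from a list of job states and a list of hosts."""
    host_count = len(hosts)
    return {
        # Number of jobs in each state (pending, running, suspended, ...).
        "jobs": dict(Counter(job_states)),
        # Load is averaged across compute hosts; 0.0 if there are no hosts.
        "avg_load": sum(h.load_avg for h in hosts) / host_count if host_count else 0.0,
        # Memory figures are summed over all compute hosts.
        "mem_used_mb": sum(h.mem_used_mb for h in hosts),
        "mem_total_mb": sum(h.mem_total_mb for h in hosts),
    }


summary = summarize(
    ["pending", "running", "running"],
    [Host("node1", 1.0, 512, 1024), Host("node2", 3.0, 256, 2048)],
)
```

Note that load is averaged while memory is summed, so adding hosts changes the two figures differently.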

Cluster Queues Table

Throughout its duration, a running job is associated with its queue. Queues provide a way to define various job execution parameters that apply to multiple hosts. You can think of an N1GE queue as a container, or description, for a class of jobs. Queues that span multiple execution hosts are sometimes referred to as cluster queues.

The Cluster Queues table shows a summary of the state of all the cluster queues configured on the grid. The slots are indicative of general performance. The states indicate which queues are running and which are in various potential error states. The fields include:

For information on cluster queues, see the Monitoring and Controlling Queues section in the N1GE 6 User's Guide and the qmon man page. For more information on queue states, see Queue Alerts.

Alerts Table

The Alerts table gives a quick look at potential or actual problems with the grid. You receive an alert when any of these categories generates a warning, generates an error, or becomes disabled. Clicking on a category displays the Alert page for that category, which contains a table of alerts with additional information. Categories include Queues, Hosts, and Jobs.

Sorting and Pagination Controls

Items display ten rows at a time. You can see the entire list by using the pagination controls at the bottom of the table. By default, rows are displayed numerically by job ID, but you can use any column to change the ordering of the rows. Clicking on a column header sorts the rows according to the values in that column. Clicking on the column header again reverses the sort. The sorting is preserved across pages if you click on a pagination button.
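The sorting and paging behavior described above can be modeled with a short Python sketch. It is a simplified model under stated assumptions, not the GUI's code: the TableView class, its method names, and the row dictionaries with job_id and owner keys are all hypothetical. The sort is applied to the whole list before a page is sliced out, which is why the ordering is preserved from page to page.

```python
PAGE_SIZE = 10  # the tables display ten rows at a time


class TableView:
    """Hypothetical model of a sortable, paginated table."""

    def __init__(self, rows, sort_key="job_id"):
        self.rows = list(rows)
        self.sort_key = sort_key   # default ordering is by job ID
        self.descending = False

    def click_header(self, column):
        """Sort by a column; clicking the same column again reverses the sort."""
        if column == self.sort_key:
            self.descending = not self.descending
        else:
            self.sort_key = column
            self.descending = False

    def page_count(self):
        """Number of pages, rounding up; an empty table still has one page."""
        return max(1, -(-len(self.rows) // PAGE_SIZE))

    def page(self, number):
        """Return the rows on a 1-based page, sorted over the whole list first."""
        ordered = sorted(
            self.rows, key=lambda row: row[self.sort_key], reverse=self.descending
        )
        start = (number - 1) * PAGE_SIZE
        return ordered[start:start + PAGE_SIZE]


rows = [{"job_id": i, "owner": f"user{i % 3}"} for i in range(1, 26)]
view = TableView(rows)
first_page = [row["job_id"] for row in view.page(1)]
last_page = [row["job_id"] for row in view.page(3)]
view.click_header("job_id")  # same column again, so the sort reverses
reversed_first_page = [row["job_id"] for row in view.page(1)]
```

Keeping the sort state on the view, and not on a single page, is what lets a pagination click leave the ordering untouched.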