Sun N1 System Manager 1.3 Grid Engine Provisioning and Monitoring Guide

Chapter 5 Working With N1 Grid Engine Jobs

Each application running on the grid is considered a job. The following sections describe how you can check a job's state as well as its utilization of resources and its scheduling policy. This information is displayed in different views of a job's data, including an overview, a utilization view, and an allocation view. You can also see fine-grained information about each job, including details about each job's composite tasks.

Checking a Job's State

Use the Jobs Overview tab as a quick way to check a job's State and see some of the factors that might affect its performance. Clicking a job ID displays a Job Details page that provides very detailed information.

Figure 5–1 Jobs Overview Tab

This tab shows you an overview of all grid jobs.

The fields on the Jobs Overview tab include:

The Job User, Project, and Department are elements that you can use in an Entitlement policy (also known as a Ticket policy) to affect a job's dispatch priority. For example, jobs from one Department can always be entitled to have a higher dispatch priority than those from another Department.

Dispatch Priority is computed from three top-level scheduling policies: Entitlement, Urgency, and Custom (also known as POSIX). For more detailed information on N1GE scheduling policies and dispatch priority, see the sge_priority man page and Scheduler Policies for Job Prioritization in the Sun N1 Grid Engine 6 System (www.sun.com/blueprints/1005/819-4325.html).

Checking Grid Resources

Use the Job Utilization View tab to display information that is relevant to a job's consumption of grid computing resources as well as other elements that factor into a job's dispatch priority. Unlike the Jobs Overview tab, this view lists only running and suspended jobs. In the Utilization view, the columns are as follows:

Figure 5–2 Job Utilization View Tab

This tab shows you the job utilization view.


Note –

If the CPU usage or memory usage values are blank, the usage information for that job has not yet been reported. Check back at a later time to see if the usage is then reported.


For more information on the meaning of each column, see the qmon man page.

Normalized Priorities

The ticket, urgency, and POSIX policies are the three top-level policies that the N1GE scheduler uses to determine a job's dispatch priority. Each policy calculates a factor that contributes to the overall priority. So that these three contributions can be added together in a meaningful way, each is normalized to a number between 0 and 1.
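The following Python sketch illustrates the idea of normalizing three policy values and adding them together. It is not the scheduler's actual code: the min-max scaling, the midpoint used when all jobs tie, the function names, and the weight parameters with their values are all assumptions made for illustration. See the sge_priority man page for the exact calculation that N1GE performs.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale raw policy values into the range 0 to 1.

    Min-max scaling is an assumption of this sketch, not a statement
    of how the N1GE scheduler normalizes its values.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every job has the same raw value. Placing them all at the
        # midpoint is a choice made by this sketch.
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def dispatch_priorities(
    tickets: Sequence[float],
    urgency: Sequence[float],
    posix: Sequence[float],
    w_ticket: float = 0.01,   # hypothetical weight
    w_urgency: float = 0.1,   # hypothetical weight
    w_posix: float = 1.0,     # hypothetical weight
) -> List[float]:
    """Combine the three normalized contributions into one value per job."""
    n_tickets = normalize(tickets)
    n_urgency = normalize(urgency)
    n_posix = normalize(posix)
    return [
        w_ticket * t + w_urgency * u + w_posix * p
        for t, u, p in zip(n_tickets, n_urgency, n_posix)
    ]


if __name__ == "__main__":
    # Three hypothetical jobs with raw ticket, urgency, and POSIX values.
    priorities = dispatch_priorities(
        tickets=[100, 300, 200],
        urgency=[0, 1000, 500],
        posix=[0, 0, 0],
    )
    for job_index, priority in enumerate(priorities):
        print(f"job {job_index}: {priority:.3f}")
```

Because each contribution lies between 0 and 1 before it is weighted, a raw ticket count in the hundreds and a raw urgency in the thousands can be compared on the same scale. The weights then decide how much each policy matters.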

Checking Scheduling Policies

With the Job Allocation View tab, you can see the factors that make up the scheduling policies contributing to a job's dispatch priority. You can use this view to determine whether your priority policies are actually in effect and to troubleshoot the components that determine a job's overall priority in the queue.

A job's priority is determined based on three policies:

The first part of the equation, Tickets, tells you the calculations that the scheduler is making in order to implement the entitlement-oriented scheduling policy that has been configured. Tickets provide a window into the inner logical workings of the scheduler. This feature helps you to verify that whatever policy you wanted is in fact being obeyed. It also provides you with a means for diagnosing any problems or unexpected behavior you might be seeing.

From a high level, the number of tickets assigned to a job is directly proportional to the job's entitlement. The higher the number, the greater the entitlement. Jobs with a large entitlement often have a high priority. However, the overall priority is also affected by the other two policies, unless you have deliberately turned off the urgency and custom policies. In that case, only the entitlement ("tickets") policy is active.

The second part of the priority equation is Custom (also called POSIX) priority. An administrator can use this value to arbitrarily increase the priority of certain jobs.

The third part of the priority equation, Urgency, accounts for only the job's individual characteristics, not its owner. The urgency value is derived from the sum of three contributions: the deadline contribution, the wait-time contribution, and the resource requirement contribution.
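As a minimal sketch of the sum described above, the following Python class adds the three contributions and reports which one is largest, which can help when troubleshooting an unexpected urgency value. The class name, field names, and sample numbers are hypothetical. How N1GE derives each individual contribution is not shown here; see the sge_priority man page for that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrgencyContributions:
    """The three per-job contributions to urgency (names are hypothetical)."""

    deadline: float
    wait_time: float
    resource_requirement: float

    def total(self) -> float:
        """Urgency is the sum of the three contributions."""
        return self.deadline + self.wait_time + self.resource_requirement

    def dominant(self) -> str:
        """Return the name of the largest contribution."""
        parts = {
            "deadline": self.deadline,
            "wait_time": self.wait_time,
            "resource_requirement": self.resource_requirement,
        }
        return max(parts, key=parts.get)


if __name__ == "__main__":
    # Sample values chosen only for illustration.
    job = UrgencyContributions(
        deadline=0.0, wait_time=120.0, resource_requirement=1000.0
    )
    print(job.total(), job.dominant())
```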

For more detailed information on N1GE scheduling policies and dispatch priority, see the sge_priority man page and Scheduler Policies for Job Prioritization in the Sun N1 Grid Engine 6 System (www.sun.com/blueprints/1005/819-4325.html).

Figure 5–3 Job Allocation View Tab

This tab shows you the resources allocated for a job.

The Job Allocation View page displays the following information:


Note –

You can see the normalized values for Tickets, POSIX, and Urgency using the Job Utilization View tab.


For more information on the meaning of each column, see the qmon man page.

Seeing Detailed Job Information

You can see complete details about a job by selecting the job ID on any of the job view tabs. The Job Details page that appears presents this information in three tables: General, Usage Details, and Schedule Details.

The General table provides details including various properties related to the job's environment, resource requests, submit options, and so forth.

Figure 5–4 Job Details Page

This page shows you the complete details for a particular job.

The Usage Details table shows the current resource utilization for that job. If this information is not available, for example, because the job started too recently or the job is still pending, then this table is empty. For jobs with multiple tasks, the usage of each task appears on a separate line.

The Schedule Details table shows the scheduling information for that job.

Most of the fields on this page are self-explanatory. For more information, see the qstat man page.

Seeing Detailed Task Information

The Task Details page contains four tables that provide detailed information about the selected task. This single details page is available for each task that appears in the three job view tabs. All the information on this page is useful for diagnosing jobs that might be experiencing a problem.

Figure 5–5 Task Details Page

This page shows you the complete details for a particular job task.

The Task Details page contains tables of information, each of which corresponds to a different file from the job spool directory. For more information about the contents of the job spool directory, see the N1 Grid Engine 6 Administration manual. The tables are:

Task Summary Table

The Task Summary table provides basic information about the job task.