Sun N1 Grid Engine 6.1 User's Guide

Chapter 2 Navigating the Grid Engine System

This chapter describes how to display information about grid engine system components such as users, queues, hosts, and job attributes. The chapter also introduces some basic concepts and terminology that can help you begin to use the software. For complete background information about the product, see Chapter 1, Introduction to the N1 Grid Engine 6.1 Software.

This chapter also includes instructions for accomplishing the following tasks:

QMON Main Control Window

The grid engine system features a graphical user interface (GUI) command tool, the QMON Main Control window. The QMON Main Control window enables users to perform most grid engine system functions, including submitting jobs, controlling jobs, and gathering important information.

Launching the QMON Main Control Window

To launch the QMON Main Control window, from the command line type the following command:


% qmon

After a message window is displayed, the QMON Main Control window appears.

[Figure: Dialog box titled Main Control, showing the File, Task, and Help menus and 15 icon buttons.]

See Figure 1–3 to identify the meaning of the icons. The names of the icon buttons appear on screen as you rest the pointer over the buttons. The names describe the functions of the buttons.

Many instructions in this guide call for using the QMON Main Control window.

Customizing QMON

The look and feel of QMON is largely defined by a specifically designed resource file. Reasonable defaults are compiled in, and a sample resource file is available in sge-root/qmon/Qmon.

The cluster administration can do any of the following:

Ask your administrator if any of these cases are relevant in your environment.

In addition, users can configure personal preferences by modifying the Qmon file. The Qmon file can be moved to the home directory or to another location pointed to by the private XAPPLRESDIR search path. Users can also include the necessary resource definitions in their private .Xdefaults or .Xresources files. A private Qmon resource file can also be installed using the xrdb command, either during operation or at startup of the X11 environment, for example, from a .xinitrc file.

Refer to the comment lines in the sample Qmon file for detailed information on the possible customizations.

You can also use the Job Customize and Queue Customize dialog boxes to customize qmon. These dialog boxes are shown in Customizing the Job Control Display and in Filtering Cluster Queues and Queue Instances. In both dialog boxes, users can use the Save button to store the filtering and display definitions to the file .qmon_preferences in their home directories. When QMON is restarted, this file is read, and QMON reactivates the previously defined behavior.

Users and User Categories

Users of the grid engine system fall into four categories. Users in each category have access to their own set of grid engine system commands.

Table 2–1 shows the command capabilities that are available to the different user categories.

Table 2–1 User Categories and Associated Command Capabilities

Command   Manager   Operator                        Owner                                             User
-----------------------------------------------------------------------------------------------------------------------------------------------------
qacct     Full      Full                            Own jobs only                                     Own jobs only
qalter    Full      Full                            Own jobs only                                     Own jobs only
qconf     Full      No system setup modifications   Show only configurations and access permissions   Show only configurations and access permissions
qdel      Full      Full                            Own jobs only                                     Own jobs only
qhold     Full      Full                            Own jobs only                                     Own jobs only
qhost     Full      Full                            Full                                              Full
qlogin    Full      Full                            Full                                              Full
qmod      Full      Full                            Own jobs and owned queues only                    Own jobs only
qmon      Full      No system setup modifications   No configuration changes                          No configuration changes
qrexec    Full      Full                            Full                                              Full
qselect   Full      Full                            Full                                              Full
qsh       Full      Full                            Full                                              Full
qstat     Full      Full                            Full                                              Full
qsub      Full      Full                            Full                                              Full

User Access Permissions

The administrator can restrict access to queues and other facilities, such as parallel environment interfaces. Access can be restricted to certain users or user groups.


Note –

The grid engine software automatically takes into account the access restrictions configured by the cluster administration. The following sections are important only if you want to query your personal access permission.


For the purpose of restricting access permissions, the administrator creates and maintains access lists (ACLs). The ACLs contain user names and UNIX group names. The ACLs are then added to access-allowed or access-denied lists in the queue or in the parallel environment interface configurations. For more information, see the queue_conf(5) or sge_pe(5) man pages.

Users who belong to ACLs that are listed in access-allowed-lists have permission to access the queue or the parallel environment interface. Users who are members of ACLs in access-denied-lists cannot access the resource in question.

ACLs are also used to define projects to which the corresponding users have access, that is, projects to which those users can subordinate their jobs. The administrator can also restrict access to cluster resources on a per-project basis.

The User Configuration dialog box opens when you click the User Configuration button in the QMON Main Control window. This dialog box enables you to query for the ACLs to which you have access. For details, see Chapter 4, Managing User Access, in Sun N1 Grid Engine 6.1 Administration Guide.

You can display project access by clicking the Project Configuration icon in the QMON Main Control window. Details are described in Defining Projects in Sun N1 Grid Engine 6.1 Administration Guide.

From the command line, you can get a list of the currently configured ACLs with the following command:


% qconf -sul

You can list the entries in one or more access lists with the following command:


% qconf -su acl-name[,...]

The ACLs consist of user account names and UNIX group names, with the UNIX group names identified by a prefixed @ sign. In this way, you can determine which ACLs your account belongs to.
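The membership check described above can be scripted. The following is a minimal sketch rather than part of the product: the function name acl_has_member and the sample names (devel, alice, bob, staff) are hypothetical, and the sketch assumes that the access list definition contains a single line that starts with the keyword entries, followed by a comma-separated list of members. Check the output of qconf -su at your site before relying on this format.

```shell
# acl_has_member MEMBER
# Reads an access list definition on standard input and succeeds if MEMBER
# (a user name, or a UNIX group name written as @group) is listed.
# Assumption: all members appear on one line that starts with "entries".
acl_has_member() {
    awk -v want="$1" '
        $1 == "entries" {
            line = $0
            sub(/^[ \t]*entries[ \t]+/, "", line)   # drop the keyword
            sub(/[ \t]+$/, "", line)                # drop trailing blanks
            n = split(line, m, /[ \t]*,[ \t]*/)
            for (i = 1; i <= n; i++)
                if (m[i] == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Illustrative input standing in for: qconf -su acl-name
sample='name    devel
entries alice,bob,@staff'

if printf '%s\n' "$sample" | acl_has_member "@staff"; then
    echo "@staff is a member"
fi
```

With a live cluster, the same function could be fed directly, for example qconf -su acl-name | acl_has_member "$USER".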


Note –

If you have permission to switch your primary UNIX group with the newgrp command, your access permissions might change.


You can check for those queues or parallel environment interfaces to which you have access or to which your access is denied. Query the queue or parallel environment interface configuration, as described in Displaying Queues and Queue Properties and Configuring Parallel Environments With QMON in Sun N1 Grid Engine 6.1 Administration Guide.

The access-allowed-lists are named user_lists. The access-denied-lists are named xuser_lists. If your user account or primary UNIX group is associated with an access-allowed-list, you are allowed to access the resource in question. If you are associated with an access-denied-list, you cannot access the queue or parallel environment interface. If both lists are empty, every user with a valid account can access the resource in question.
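The access rule in the preceding paragraph can be written as a small decision function. This is a sketch under stated assumptions, not the scheduler's actual implementation: the function names in_list and may_access are hypothetical, empty lists are written as NONE, a match in xuser_lists is assumed to take precedence over a match in user_lists, and an account is assumed to be denied when user_lists is not empty and none of the account's ACLs appear in it. Confirm these last two points in the queue_conf(5) man page.

```shell
# in_list WORD LIST: succeed if WORD is one of the space-separated names in LIST.
in_list() {
    for item in $2; do
        [ "$item" = "$1" ] && return 0
    done
    return 1
}

# may_access USER_LISTS XUSER_LISTS MY_ACLS
# Each argument is a space-separated list of ACL names; NONE means empty.
# MY_ACLS holds the ACLs that contain your account or primary UNIX group.
may_access() {
    allowed=$1
    denied=$2
    mine=$3
    [ "$allowed" = "NONE" ] && allowed=""
    [ "$denied" = "NONE" ] && denied=""

    # Assumption: a match in the access-denied list always wins.
    for acl in $mine; do
        in_list "$acl" "$denied" && return 1
    done

    # An empty access-allowed list admits every remaining account.
    [ -z "$allowed" ] && return 0

    for acl in $mine; do
        in_list "$acl" "$allowed" && return 0
    done
    return 1
}

if may_access "devel" "NONE" "devel staff"; then
    echo "access allowed"
else
    echo "access denied"
fi
```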

You can display project configurations from the command line using the following commands:


% qconf -sprjl
% qconf -sprj project-name

These commands display a list of the defined projects and the configuration of a particular project, respectively. The projects are defined through ACLs. To determine your access to a project, query the ACL configurations, as described earlier in this section.

If you have access to a project, you are allowed to submit jobs that are subordinated to the project. You can submit such jobs from the command line using the following command:


% qsub -P project-name options

The cluster configurations, host configurations, and queue configurations define project access in the same way as for ACLs. These configurations use the project_lists and xproject_lists parameters for this purpose.

Managers, Operators, and Owners

Use the following command to display a list of grid engine system managers:


% qconf -sm

Use the following command to display a list of operators:


% qconf -so

Note –

The superuser of an administration host is considered to be a manager by default.


Users who are owners of a certain queue are contained in the queue configuration, as described in Displaying Queues and Queue Properties. You can display the queue configuration by typing the following command:


% qconf -sq {cluster-queue | queue-instance | queue-domain}

The queue configuration entry in question is called owner_list.
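To pick the owner_list entry out of the full queue configuration, you can filter the output of qconf -sq. The sketch below assumes that each configuration entry is printed as a parameter name followed by white space and its value on one line. The function name queue_owners and the sample values are hypothetical.

```shell
# queue_owners: read a queue configuration on standard input and print
# the value of its owner_list entry.
# Assumption: the entry has the form "owner_list <value>" on a single line.
queue_owners() {
    awk '$1 == "owner_list" { $1 = ""; sub(/^[ \t]+/, ""); print }'
}

# Illustrative input standing in for: qconf -sq cluster-queue
sample='qname       all.q
owner_list  alice,bob'

printf '%s\n' "$sample" | queue_owners
```

With a live cluster, the equivalent call would be qconf -sq cluster-queue | queue_owners.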

Displaying Queues and Queue Properties

To make the best use of the grid engine system at your site, you should be familiar with the queue structure. You should also be familiar with the properties of the queues that are configured for your grid engine system.

Displaying a List of Queues

The QMON Queue Control dialog box is shown and described in Monitoring and Controlling Queues With QMON. This dialog box provides a quick overview of the installed queues and their current status.

To display a list of queues, from the command line, type the following command:


% qconf -sql

Displaying Queue Properties

You can use either QMON or the command line to display queue properties.

How to Display Queue Properties With QMON

  1. Launch the QMON Main Control window.

  2. Click the Queue Control button.

    The Cluster Queue dialog box appears.

  3. Select a queue, and then click Show Detached Settings.

    The Browser dialog box appears.

  4. In the Browser dialog box, click Queue.

  5. In the Cluster Queue dialog box, click the Queue Instances tab.

  6. Select a queue instance.

    The Browser dialog box lists the queue properties for the selected queue instance.


Example 2–1 Queue Property Information

The following figure shows an example of some of the queue property information that is displayed.

[Figure: Dialog box titled Browser, showing a list of queue properties and the stdout, stderr, Queue, Job, and Messages buttons.]

Displaying Queue Properties From the Command Line

To display queue properties from the command line, type the following command:


% qconf -sq {queue | queue-instance | queue-domain}

Information like that shown in the previous figure is displayed.

Interpreting Queue Property Information

You can find a detailed description of each queue property in the queue_conf(5) man page.

The following is a list of some of the more important parameters:

Hosts and Host Functionality

Clicking the Host Configuration button in the QMON Main Control window displays an overview of the functionality that is associated with the hosts in your cluster. However, without manager privileges, you cannot apply any changes to the configuration.

The host configuration dialog boxes are described in Chapter 1, Configuring Hosts and Clusters, in Sun N1 Grid Engine 6.1 Administration Guide. The following sections describe the commands used to retrieve host information from the command line.

Finding the Name of the Master Host

The location of the master host can migrate between the current master host and one of the shadow master hosts at any time. Therefore, the location of the master host should be transparent to the user.

With a text editor, open the sge-root/cell/common/act_qmaster file.

The name of the current master host is in the file.
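Instead of opening the file in an editor, you can print it from a script. The following sketch uses a hypothetical function name, current_master, and takes the sge-root directory and the cell name as arguments. It assumes that the first line of the file holds the host name.

```shell
# current_master SGE_ROOT_DIR CELL
# Print the host name stored in SGE_ROOT_DIR/CELL/common/act_qmaster.
current_master() {
    file="$1/$2/common/act_qmaster"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Assumption: the host name is on the first line of the file.
    head -n 1 "$file"
}

# Typical call, if the SGE_ROOT and SGE_CELL environment variables are set
# at your site (falling back to the cell name "default" is an assumption):
#   current_master "$SGE_ROOT" "${SGE_CELL:-default}"
```

Because the master host can migrate to a shadow master host, read the file each time you need the name rather than caching the result.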

Displaying a List of Execution Hosts

To display a list of hosts that are configured as execution hosts in your cluster, use the following commands:


% qconf -sel
% qconf -se hostname
% qhost

The qconf -sel command displays a list of the names of all hosts that are currently configured as execution hosts. The qconf -se command displays detailed information about the specified execution host. The qhost command displays status and load information about the execution hosts.

See the host_conf(5) man page for details on the information displayed using qconf. See the qhost(1) man page for details on its output and other options.

Displaying a List of Administration Hosts

Use the following command to display a list of hosts with administrative permission:


% qconf -sh

Displaying a List of Submit Hosts

Use the following command to display a list of submit hosts:


% qconf -ss
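The host lists printed by qconf -sel, qconf -sh, and qconf -ss can be searched from a script, for example to check whether the machine you are logged in to is a submit host. The sketch below is illustrative only: the function name host_is_listed and the host name node01 are hypothetical, and the sketch assumes that each command prints one host name per line. The comparison is a whole-line match that ignores case, so a short host name does not match a fully qualified name.

```shell
# host_is_listed HOST
# Reads a list of host names (one per line) on standard input and
# succeeds if HOST matches one line exactly, ignoring case.
host_is_listed() {
    grep -Fxqi -- "$1"
}

# Illustrative input standing in for: qconf -ss
sample='gimli
node01'

if printf '%s\n' "$sample" | host_is_listed "gimli"; then
    echo "gimli is in the list"
fi
```

With a live cluster, the equivalent check would be qconf -ss | host_is_listed "$(hostname)".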

Requestable Attributes

When users submit a job, a requirement profile can be specified for the job. Users can specify attributes or characteristics of a host or queue that the job requires in order to run successfully. The grid engine software maps these job requirements onto the host and queue configurations of the cluster and therefore finds suitable hosts for a job.

The attributes that can be used to specify the job requirements are related to one of the following:

The attributes can also be derived from site policies such as the availability of installed software only on certain hosts.

The available attributes include the following:

For convenience, however, the administrator commonly chooses to define only a subset of all available attributes to be requestable.

The currently requestable attributes are displayed in the Requested Resources dialog box, which is shown in the following figure.

[Figure: Dialog box titled Requested Resources, showing lists of hard and soft resources and a list of available resources.]

Use the QMON Submit Job dialog box to access the Requested Resources dialog box. Requestable attributes are listed under Available Resources.

Displaying a List of Requestable Attributes

To display the list of configured resource attributes, from the command line type the following command:


% qconf -sc

The grid engine system complex contains the definitions for all resource attributes. For more information about resource attributes, see Chapter 3, Configuring Complex Resource Attributes, in Sun N1 Grid Engine 6.1 Administration Guide. See also the complex format description on the complex(5) man page.

Sample output from the qconf -sc command is shown in Example 2–2.


Example 2–2 Complex Attributes Displayed


gimli% qconf -sc
#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
arch                a          RESTRING    ==    YES         NO         NONE     0
calendar            c          STRING      ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=    YES         NO         0        0
h_core              h_core     MEMORY      <=    YES         NO         0        0
h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
h_rss               h_rss      MEMORY      <=    YES         NO         0        0
h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY      <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
hostname            h          HOST        ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
load_long           ll         DOUBLE      >=    NO          NO         0        0
load_medium         lm         DOUBLE      >=    NO          NO         0        0
load_short          ls         DOUBLE      >=    NO          NO         0        0
mem_free            mf         MEMORY      <=    YES         NO         0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
mem_used            mu         MEMORY      >=    YES         NO         0        0
min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
np_load_long        nll        DOUBLE      >=    NO          NO         0        0
np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
np_load_short       nls        DOUBLE      >=    NO          NO         0        0
num_proc            p          INT         ==    YES         NO         0        0
qname               q          STRING      ==    YES         NO         NONE     0
rerun               re         BOOL        ==    NO          NO         0        0
s_core              s_core     MEMORY      <=    YES         NO         0        0
s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
s_data              s_data     MEMORY      <=    YES         NO         0        0
s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
s_rss               s_rss      MEMORY      <=    YES         NO         0        0
s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
s_stack             s_stack    MEMORY      <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=    YES         NO         0        0
seq_no              seq        INT         ==    NO          NO         0        0
slots               s          INT         <=    YES         YES        1        1000
swap_free           sf         MEMORY      <=    YES         NO         0        0
swap_rate           sr         MEMORY      >=    YES         NO         0        0
swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
swap_total          st         MEMORY      <=    YES         NO         0        0
swap_used           su         MEMORY      >=    YES         NO         0        0
tmpdir              tmp        STRING      ==    NO          NO         NONE     0
virtual_free        vf         MEMORY      <=    YES         NO         0        0
virtual_total       vt         MEMORY      <=    YES         NO         0        0
virtual_used        vu         MEMORY      >=    YES         NO         0        0
# >#< starts a comment but comments are not saved across edits --------

The column name is identical to the first column displayed by the qconf -sq command. The shortcut column contains administrator-definable abbreviations for the full names in the first column. The user can supply either the full name or the shortcut in the request option of a qsub command.

The requestable column indicates whether the resource attribute can be used in a qsub command. The administrator can, for example, prevent the cluster's users from directly requesting certain machines or queues for their jobs by setting the entries qname, hostname, or both, to be unrequestable. Making queues or hosts unrequestable implies that feasible user requests can generally be met by multiple queues, which enforces the load balancing capabilities of the grid engine system.
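To list only the attributes that you can request, you can filter the qconf -sc output on its fifth column. The following sketch uses a hypothetical function name, requestable_attrs, and relies on the column order shown in Example 2–2. The three sample lines are copied from that example.

```shell
# requestable_attrs: read complex definitions in the qconf -sc format on
# standard input and print "name shortcut" for every attribute whose
# requestable column (field 5) is YES. Comment lines start with #.
requestable_attrs() {
    awk '!/^#/ && NF >= 5 && $5 == "YES" { print $1, $2 }'
}

# Lines copied from Example 2-2 stand in for live qconf -sc output.
sample='#name               shortcut   type        relop requestable consumable default  urgency
arch                a          RESTRING    ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
slots               s          INT         <=    YES         YES        1        1000'

printf '%s\n' "$sample" | requestable_attrs
```

For the sample input, the function prints arch and slots with their shortcuts and omits load_avg. With a live cluster, the equivalent call would be qconf -sc | requestable_attrs.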

The column relop defines the relational operator used to compute whether a queue or a host meets a user request. The comparison that is executed is as follows:


User_Request     relop     Queue/Host/... -Property

If the result of the comparison is false, the user's job cannot be run in the queue or on the host. For example, let the queue q1 be configured with a soft CPU time limit of 100 seconds, and let the queue q2 be configured with a soft CPU time limit of 1000 seconds. See the queue_conf(5) and the setrlimit(2) man pages for a description of user process limits.

The columns consumable and default affect how the administrator declares consumable resources. See Consumable Resources in Sun N1 Grid Engine 6.1 Administration Guide.

The user requests consumables just like any other attribute. The grid engine system internal bookkeeping for the resources is different, however.

Assume that a user submits the following request:


% qsub -l s_cpu=0:5:0 nastran.sh

The s_cpu=0:5:0 request asks for a queue that grants at least 5 minutes (300 seconds) of soft limit CPU time. Therefore, only queues that provide a soft CPU time limit of at least 5 minutes are set up properly to run the job. Of the two queues described earlier, q2 (1000 seconds) qualifies and q1 (100 seconds) does not. See the qsub(1) man page for details on the syntax.
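The comparison User_Request relop Property can be worked through for this request and the queues q1 and q2 introduced earlier. The following sketch only illustrates the rule and is not how the scheduler is implemented. The function names to_seconds and meets are hypothetical, and only the three operators that appear in Example 2–2 are handled.

```shell
# to_seconds VALUE: convert h:m:s to seconds; a plain number is passed through.
to_seconds() {
    echo "$1" | awk -F: '{ if (NF == 3) print $1 * 3600 + $2 * 60 + $3; else print $1 + 0 }'
}

# meets REQUEST RELOP PROPERTY: succeed if "REQUEST RELOP PROPERTY" is true.
# All values are whole numbers of seconds.
meets() {
    case "$2" in
        "==") [ "$1" -eq "$3" ] ;;
        "<=") [ "$1" -le "$3" ] ;;
        ">=") [ "$1" -ge "$3" ] ;;
        *)    return 2 ;;          # operator not covered by this sketch
    esac
}

request=$(to_seconds 0:5:0)        # the s_cpu request from the qsub example

# s_cpu uses the relop "<=" in Example 2-2; q1 and q2 carry the soft CPU
# time limits given in the text (100 and 1000 seconds).
for entry in q1:100 q2:1000; do
    queue=${entry%%:*}
    limit=${entry#*:}
    if meets "$request" "<=" "$limit"; then
        echo "$queue can run the job"
    else
        echo "$queue cannot run the job"
    fi
done
```

The request of 300 seconds fails the comparison against the 100-second limit of q1 and passes against the 1000-second limit of q2.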


Note –

The grid engine software considers workload information in the scheduling process only if more than one queue or host can run a job.