This chapter describes how to display information about grid engine system components such as users, queues, hosts, and job attributes. The chapter also introduces some basic concepts and terminology that can help you begin to use the software. For complete background information about the product, see Chapter 1, Introduction to the N1™ Grid Engine 6.1 Software.
This chapter also includes instructions for accomplishing the following tasks:
The grid engine system features a graphical user interface (GUI) command tool, the QMON Main Control window. The QMON Main Control window enables users to perform most grid engine system functions, including submitting jobs, controlling jobs, and gathering important information.
To launch the QMON Main Control window, from the command line type the following command:
% qmon
After a message window is displayed, the QMON Main Control window appears.
See Figure 1–3 to identify the meaning of the icons. The names of the icon buttons appear on screen as you rest the pointer over the buttons. The names describe the functions of the buttons.
Many instructions in this guide call for using the QMON Main Control window.
The look and feel of QMON is largely defined by a specifically designed resource file. Reasonable defaults are compiled in sge-root/qmon/Qmon, which also includes a sample resource file.
The cluster administration can do any of the following:
Install site-specific defaults in standard locations such as /usr/lib/X11/app-defaults/Qmon
Include QMON–specific resource definitions in the standard .Xdefaults or .Xresources files
Put a site-specific Qmon file in a location referenced by standard search paths such as XAPPLRESDIR
Ask your administrator if any of these cases are relevant in your environment.
In addition, users can configure personal preferences by modifying the Qmon file. The modified Qmon file can be placed in the home directory or in another location pointed to by the private XAPPLRESDIR search path. Users can also include the necessary resource definitions in their private .Xdefaults or .Xresources files. A private Qmon resource file can also be installed with the xrdb command, either during operation or at startup of the X11 environment, for example, in a .xinitrc resource file.
Refer to the comment lines in the sample Qmon file for detailed information on the possible customizations.
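As a rough illustration of the personal-preferences workflow described above, the following sketch looks for a private Qmon file and prints the first one found. The function name find_private_qmon and the search order (XAPPLRESDIR first, then the home directory) are assumptions made for this example; the real X11 resource lookup rules are more elaborate.

```shell
# Print the path of the first private Qmon resource file that exists.
# The search order is an illustrative assumption, not the full X11 lookup.
find_private_qmon() {
    for dir in "${XAPPLRESDIR:-}" "$HOME"; do
        # Skip an unset or empty XAPPLRESDIR
        [ -n "$dir" ] || continue
        # Tolerate a trailing slash on the directory name
        candidate="${dir%/}/Qmon"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (needs a running X11 session):
#   file=$(find_private_qmon) && xrdb -merge "$file"
```

Merging with xrdb affects only applications started afterward, so QMON would need to be restarted to pick up the change.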
You can also use the Job Customize and Queue Customize dialog boxes to customize QMON. These dialog boxes are shown in Customizing the Job Control Display and in Filtering Cluster Queues and Queue Instances. In both dialog boxes, users can use the Save button to store the filtering and display definitions to the file .qmon_preferences in their home directories. When QMON is restarted, this file is read, and QMON reactivates the previously defined behavior.
Users of the grid engine system fall into four categories. Users in each category have access to their own set of grid engine system commands.
Managers – Managers have full capabilities to manipulate the grid engine system. By default, the superusers of all administration hosts have manager privileges.
Operators – Operators can perform many of the same commands as managers, with the exception of making configuration changes, for example, adding, deleting, or modifying queues.
Owners – Queue owners can suspend or enable the queues that they own. Queue owners can also suspend or enable the jobs within the queues they own. Queue owners have no other management permissions.
Users – Users have certain access permissions, as described in User Access Permissions. Users have no cluster management or queue management capabilities.
Table 2–1 shows the command capabilities that are available to the different user categories.
Table 2–1 User Categories and Associated Command Capabilities
Command | Manager | Operator | Owner | User |
---|---|---|---|---|
qacct | Full | Full | Own jobs only | Own jobs only |
qalter | Full | Full | Own jobs only | Own jobs only |
qconf | Full | No system setup modifications | Show only configurations and access permissions | Show only configurations and access permissions |
qdel | Full | Full | Own jobs only | Own jobs only |
qhold | Full | Full | Own jobs only | Own jobs only |
qhost | Full | Full | Full | Full |
qlogin | Full | Full | Full | Full |
qmod | Full | Full | Own jobs and owned queues only | Own jobs only |
qmon | Full | No system setup modifications | No configuration changes | No configuration changes |
qrexec | Full | Full | Full | Full |
qselect | Full | Full | Full | Full |
qsh | Full | Full | Full | Full |
qstat | Full | Full | Full | Full |
qsub | Full | Full | Full | Full |
The administrator can restrict access to queues and other facilities, such as parallel environment interfaces. Access can be restricted to certain users or user groups.
The grid engine software automatically takes into account the access restrictions configured by the cluster administration. The following sections are important only if you want to query your personal access permission.
For the purpose of restricting access permissions, the administrator creates and maintains access lists (ACLs). The ACLs contain user names and UNIX group names. The ACLs are then added to access-allowed or access-denied lists in the queue or in the parallel environment interface configurations. For more information, see the queue_conf(5) or sge_pe(5) man pages.
Users who belong to ACLs that are listed in access-allowed-lists have permission to access the queue or the parallel environment interface. Users who are members of ACLs in access-denied-lists cannot access the resource in question.
ACLs are also used to define projects, to which the corresponding users have access, that is, to which users can subordinate their jobs. The administrator can also restrict access to cluster resources on a per project basis.
The User Configuration dialog box opens when you click the User Configuration button in the QMON Main Control window. This dialog box enables you to query for the ACLs to which you have access. For details, see Chapter 4, Managing User Access, in Sun N1 Grid Engine 6.1 Administration Guide.
You can display project access by clicking the Project Configuration icon in the QMON Main Control window. Details are described in Defining Projects in Sun N1 Grid Engine 6.1 Administration Guide.
From the command line, you can get a list of the currently configured ACLs with the following command:
% qconf -sul
You can list the entries in one or more access lists with the following command:
% qconf -su acl-name[,...]
The ACLs consist of user account names and UNIX group names, with the UNIX group names identified by a prefixed @ sign. In this way, you can determine which ACLs your account belongs to.
If you have permission to switch your primary UNIX group with the newgrp command, your access permissions might change.
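The membership check described above can be sketched as a small filter over the qconf -su output. The function name acl_has_member is hypothetical, and the sketch assumes that the output contains an entries line holding a comma-separated list on a single line; long lists that wrap onto continuation lines are not handled.

```shell
# Succeed if the user, or one of the listed UNIX groups, appears in the
# "entries" line of `qconf -su acl-name` output read from stdin.
# Assumption: entries are comma-separated on one line; groups carry an @ prefix.
acl_has_member() {
    user=$1
    shift
    wanted="$user"
    for g in "$@"; do
        wanted="$wanted,@$g"
    done
    awk -v wanted="$wanted" '
        BEGIN {
            n = split(wanted, w, ",")
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        $1 == "entries" {
            # Check every comma-separated item after the keyword
            for (f = 2; f <= NF; f++) {
                m = split($f, e, ",")
                for (i = 1; i <= m; i++) if (e[i] in want) found = 1
            }
        }
        END { exit !found }
    '
}

# Usage (the ACL name "devel" is a placeholder):
#   qconf -su devel | acl_has_member "$(id -un)" "$(id -gn)"
```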
You can check for those queues or parallel environment interfaces to which you have access or to which your access is denied. Query the queue or parallel environment interface configuration, as described in Displaying Queues and Queue Properties and Configuring Parallel Environments With QMON in Sun N1 Grid Engine 6.1 Administration Guide.
The access-allowed-lists are named user_lists. The access-denied-lists are named xuser_lists. If your user account or primary UNIX group is associated with an access-allowed-list, you are allowed to access the resource in question. If you are associated with an access-denied-list, you cannot access the queue or parallel environment interface. If both lists are empty, every user with a valid account can access the resource in question.
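The rules above can be condensed into a small decision function. This is only a sketch: the function name may_access is hypothetical, and the treatment of an account that matches both lists (denial wins) is an assumption based on the queue_conf(5) description rather than something stated in this section.

```shell
# Decide access from three facts, each given as "yes" or "no":
#   $1 = account or primary group is in an ACL named in user_lists
#   $2 = account or primary group is in an ACL named in xuser_lists
#   $3 = user_lists is empty
# Assumption: a match in xuser_lists overrides a match in user_lists.
may_access() {
    in_allowed=$1
    in_denied=$2
    allowed_empty=$3
    if [ "$in_denied" = yes ]; then
        echo denied
    elif [ "$allowed_empty" = yes ] || [ "$in_allowed" = yes ]; then
        echo allowed
    else
        echo denied
    fi
}

# Usage: both lists empty, so any valid account gets in
#   may_access no no yes
```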
You can display project configurations from the command line using the following commands:
% qconf -sprjl
% qconf -sprj project-name
These commands display a list of the defined projects and the configuration of a particular project, respectively. Because projects are defined through ACLs, you must also query the ACL configurations, as described in the previous paragraph.
If you have access to a project, you are allowed to submit jobs that are subordinated to the project. You can submit such jobs from the command line using the following command:
% qsub -P project-name options
The cluster configurations, host configurations, and queue configurations define project access in the same way as for ACLs. These configurations use the project_lists and xproject_lists parameters for this purpose.
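Before submitting a job under a project, it can be useful to confirm that the project name is defined at all. The sketch below does only that; it does not check whether your account has access to the project. The function name project_defined is hypothetical, and the sketch assumes that qconf -sprjl prints one project name per line.

```shell
# Succeed if the given project name appears as a whole line in
# `qconf -sprjl` output read from stdin (one name per line is assumed).
project_defined() {
    grep -Fqx -- "$1"
}

# Usage (project name and job script are placeholders):
#   qconf -sprjl | project_defined myproject && qsub -P myproject job.sh
```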
Use the following command to display a list of grid engine system managers:
% qconf -sm
Use the following command to display a list of operators:
% qconf -so
The superuser of an administration host is considered to be a manager by default.
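The two listings can be combined to find out which category an account falls into. The following is a minimal sketch: the function name user_category and the file names are hypothetical, the listings are assumed to contain one account name per line, and queue ownership is not considered.

```shell
# Print "manager", "operator", or "user" for an account name.
#   $1 = account name
#   $2 = file holding saved `qconf -sm` output
#   $3 = file holding saved `qconf -so` output
# Assumption: each file lists one account name per line.
user_category() {
    if grep -Fqx -- "$1" "$2"; then
        echo manager
    elif grep -Fqx -- "$1" "$3"; then
        echo operator
    else
        echo user
    fi
}

# Usage:
#   qconf -sm > managers.txt
#   qconf -so > operators.txt
#   user_category "$(id -un)" managers.txt operators.txt
```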
Users who are owners of a certain queue are contained in the queue configuration, as described in Displaying Queues and Queue Properties. You can display the queue configuration by typing the following command:
% qconf -sq {cluster-queue | queue-instance | queue-domain}
The queue configuration entry in question is called owner_list.
To make the best use of the grid engine system at your site, you should be familiar with the queue structure. You should also be familiar with the properties of the queues that are configured for your grid engine system.
The QMON Queue Control dialog box is shown and described in Monitoring and Controlling Queues With QMON. This dialog box provides a quick overview of the installed queues and their current status.
To display a list of queues, from the command line, type the following command:
% qconf -sql
You can use either QMON or the command line to display queue properties.
Launch the QMON Main Control window.
Click the Queue Control button.
The Cluster Queue dialog box appears.
Select a queue, and then click Show Detached Settings.
The Browser dialog box appears.
In the Browser dialog box, click Queue.
In the Cluster Queue dialog box, click the Queue Instances tab.
Select a queue instance.
The Browser dialog box lists the queue properties for the selected queue instance.
The following figure shows an example of some of the queue property information that is displayed.
To display queue properties from the command line, type the following command:
% qconf -sq {queue | queue-instance | queue-domain}
Information like that shown in the previous figure is displayed.
You can find a detailed description of each queue property in the queue_conf(5) man page.
The following is a list of some of the more important parameters:
qname – The queue name as requested.
hostlist – A list of hosts and host groups associated with the queue.
processors – The processors of a multiprocessor system to which the queue has access.
Do not change this value unless you are certain that you need to change it.
qtype – The type of job that can run in this queue. Currently, type can be either batch or interactive.
slots – The number of jobs that can be executed concurrently in that queue.
owner_list – The owners of the queue, as explained in Managers, Operators, and Owners.
user_lists – The user or group identifiers in the user access lists who are listed under this parameter can access the queue. For more information, see User Access Permissions.
xuser_lists – The user or group identifiers in the user access lists who are listed under this parameter cannot access the queue. For more information, see User Access Permissions.
project_lists – Jobs submitted with the project identifiers that are listed under this parameter can access the queue. For more information, see Defining Projects in Sun N1 Grid Engine 6.1 Administration Guide.
xproject_lists – Jobs submitted with the project identifiers that are listed under this parameter cannot access the queue. For more information, see Defining Projects in Sun N1 Grid Engine 6.1 Administration Guide.
complex_values – Assigns capacities as provided for this queue for certain complex resource attributes. For more information, see Requestable Attributes.
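To look at a single parameter from this list without reading the whole configuration, the qconf -sq output can be filtered. The sketch below assumes that each parameter appears as a line of the form "name value"; values that wrap onto continuation lines are not rejoined. The function name queue_param is hypothetical.

```shell
# Print the value of one parameter from `qconf -sq` output read on stdin.
# Assumption: "name value" lines; wrapped continuation lines are ignored.
queue_param() {
    awk -v key="$1" '$1 == key {
        $1 = ""                  # drop the parameter name
        sub(/^[ \t]+/, "")       # trim the separator left behind
        print
    }'
}

# Usage (the queue name is a placeholder):
#   qconf -sq all.q | queue_param slots
```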
Clicking the Host Configuration button in the QMON Main Control window displays an overview of the functionality that is associated with the hosts in your cluster. However, without manager privileges, you cannot apply any changes to the configuration.
The host configuration dialog boxes are described in Chapter 1, Configuring Hosts and Clusters, in Sun N1 Grid Engine 6.1 Administration Guide. The following sections describe the commands used to retrieve host information from the command line.
The location of the master host can migrate between the current master host and one of the shadow master hosts at any time. Therefore, the location of the master host should be transparent to the user.
With a text editor, open the sge-root/cell/common/act_qmaster file.
The name of the current master host is in the file.
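The same lookup can be done without an editor. The sketch below simply prints the file; the function name current_master is hypothetical, and the usage line assumes that the SGE_ROOT and SGE_CELL environment variables are set as in a typical installation, falling back to a cell named default.

```shell
# Print the name of the current master host.
#   $1 = sge-root directory
#   $2 = cell name
current_master() {
    cat "$1/$2/common/act_qmaster"
}

# Usage (assumes SGE_ROOT is set; "default" is used if SGE_CELL is unset):
#   current_master "$SGE_ROOT" "${SGE_CELL:-default}"
```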
To display a list of hosts that are configured as execution hosts in your cluster, use the following commands:
% qconf -sel
% qconf -se hostname
% qhost
The qconf -sel command displays a list of the names of all hosts that are currently configured as execution hosts. The qconf -se command displays detailed information about the specified execution host. The qhost command displays status and load information about the execution hosts.
See the host_conf(5) man page for details on the information displayed using qconf. See the qhost(1) man page for details on its output and other options.
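For a quick view of load per execution host, the qhost output can be reduced to two columns. This sketch assumes the usual layout: two header lines, a row for the global pseudo-host, and the load value in the fourth column. Check the qhost(1) man page if your output differs. The function name host_loads is hypothetical.

```shell
# Print "hostname load" for each execution host in `qhost` output (stdin).
# Assumptions: two header lines, a "global" row to skip, load in column 4.
host_loads() {
    awk 'NR > 2 && $1 != "global" { print $1, $4 }'
}

# Usage:
#   qhost | host_loads
```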
Use the following command to display a list of hosts with administrative permission:
% qconf -sh
Use the following command to display a list of submit hosts:
% qconf -ss
When users submit a job, a requirement profile can be specified for the job. Users can specify attributes or characteristics of a host or queue that the job requires in order to run successfully. The grid engine software maps these job requirements onto the host and queue configurations of the cluster and therefore finds suitable hosts for a job.
The attributes that can be used to specify the job requirements are related to one of the following:
The cluster, for example, space required on a network shared disk
Individual hosts, for example, operating system architecture
Queues, for example, permitted CPU time
The attributes can also be derived from site policies such as the availability of installed software only on certain hosts.
The available attributes include the following:
Queue property list – See Displaying Queues and Queue Properties
List of global and host-related attributes – See Assigning Resource Attributes to Queues, Hosts, and the Global Cluster in Sun N1 Grid Engine 6.1 Administration Guide
Administrator-defined attributes
For convenience, however, the administrator commonly chooses to define only a subset of all available attributes to be requestable.
The currently requestable attributes are displayed in the Requested Resources dialog box, which is shown in the following figure.
Use the QMON Submit Job dialog box to access the Requested Resources dialog box. Requestable attributes are listed under Available Resources.
To display the list of configured resource attributes, from the command line type the following command:
% qconf -sc
The grid engine system complex contains the definitions for all resource attributes. For more information about resource attributes, see Chapter 3, Configuring Complex Resource Attributes, in Sun N1 Grid Engine 6.1 Administration Guide. See also the complex format description on the complex(5) man page.
Sample output from the qconf -sc command is shown in Example 2–2.
    gimli% qconf -sc
    #name               shortcut   type      relop requestable consumable default  urgency
    #----------------------------------------------------------------------------------------
    arch                a          RESTRING  ==    YES         NO         NONE     0
    calendar            c          STRING    ==    YES         NO         NONE     0
    cpu                 cpu        DOUBLE    >=    YES         NO         0        0
    h_core              h_core     MEMORY    <=    YES         NO         0        0
    h_cpu               h_cpu      TIME      <=    YES         NO         0:0:0    0
    h_data              h_data     MEMORY    <=    YES         NO         0        0
    h_fsize             h_fsize    MEMORY    <=    YES         NO         0        0
    h_rss               h_rss      MEMORY    <=    YES         NO         0        0
    h_rt                h_rt       TIME      <=    YES         NO         0:0:0    0
    h_stack             h_stack    MEMORY    <=    YES         NO         0        0
    h_vmem              h_vmem     MEMORY    <=    YES         NO         0        0
    hostname            h          HOST      ==    YES         NO         NONE     0
    load_avg            la         DOUBLE    >=    NO          NO         0        0
    load_long           ll         DOUBLE    >=    NO          NO         0        0
    load_medium         lm         DOUBLE    >=    NO          NO         0        0
    load_short          ls         DOUBLE    >=    NO          NO         0        0
    mem_free            mf         MEMORY    <=    YES         NO         0        0
    mem_total           mt         MEMORY    <=    YES         NO         0        0
    mem_used            mu         MEMORY    >=    YES         NO         0        0
    min_cpu_interval    mci        TIME      <=    NO          NO         0:0:0    0
    np_load_avg         nla        DOUBLE    >=    NO          NO         0        0
    np_load_long        nll        DOUBLE    >=    NO          NO         0        0
    np_load_medium      nlm        DOUBLE    >=    NO          NO         0        0
    np_load_short       nls        DOUBLE    >=    NO          NO         0        0
    num_proc            p          INT       ==    YES         NO         0        0
    qname               q          STRING    ==    YES         NO         NONE     0
    rerun               re         BOOL      ==    NO          NO         0        0
    s_core              s_core     MEMORY    <=    YES         NO         0        0
    s_cpu               s_cpu      TIME      <=    YES         NO         0:0:0    0
    s_data              s_data     MEMORY    <=    YES         NO         0        0
    s_fsize             s_fsize    MEMORY    <=    YES         NO         0        0
    s_rss               s_rss      MEMORY    <=    YES         NO         0        0
    s_rt                s_rt       TIME      <=    YES         NO         0:0:0    0
    s_stack             s_stack    MEMORY    <=    YES         NO         0        0
    s_vmem              s_vmem     MEMORY    <=    YES         NO         0        0
    seq_no              seq        INT       ==    NO          NO         0        0
    slots               s          INT       <=    YES         YES        1        1000
    swap_free           sf         MEMORY    <=    YES         NO         0        0
    swap_rate           sr         MEMORY    >=    YES         NO         0        0
    swap_rsvd           srsv       MEMORY    >=    YES         NO         0        0
    swap_total          st         MEMORY    <=    YES         NO         0        0
    swap_used           su         MEMORY    >=    YES         NO         0        0
    tmpdir              tmp        STRING    ==    NO          NO         NONE     0
    virtual_free        vf         MEMORY    <=    YES         NO         0        0
    virtual_total       vt         MEMORY    <=    YES         NO         0        0
    virtual_used        vu         MEMORY    >=    YES         NO         0        0
    # >#< starts a comment but comments are not saved across edits --------
The column name is identical to the first column displayed by the qconf -sq command. The shortcut column contains administrator-definable abbreviations for the full names in the first column. The user can supply either the full name or the shortcut in the request option of a qsub command.
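Because either the full name or the shortcut can be used in a request, it is sometimes handy to translate a shortcut back to its full name. The following sketch does this by filtering qconf -sc output; it assumes the column order shown in the sample output (name first, shortcut second). The function name attr_full_name is hypothetical.

```shell
# Print the full attribute name for a given name or shortcut,
# using `qconf -sc` output read from stdin. Comment lines are skipped.
attr_full_name() {
    awk -v key="$1" '!/^#/ && ($1 == key || $2 == key) { print $1 }'
}

# Usage:
#   qconf -sc | attr_full_name mf
```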
The requestable column indicates whether the resource attribute can be used in a qsub command. The administrator can, for example, prevent the cluster's users from directly requesting certain machines or queues for their jobs by setting the entries qname, hostname, or both, to be unrequestable. Making queues or hosts unrequestable implies that feasible user requests can generally be met by multiple queues, which enforces the load balancing capabilities of the grid engine system.
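To see at a glance which attributes can be requested at your site, the qconf -sc output can be filtered on the requestable column. This is a sketch that assumes the eight-column layout of the sample output, with requestable as the fifth field, and that selects only rows marked YES. The function name requestable_attrs is hypothetical.

```shell
# List the names of attributes whose requestable column (field 5) is YES,
# given `qconf -sc` output on stdin. Comment lines are skipped.
requestable_attrs() {
    awk '!/^#/ && NF >= 8 && $5 == "YES" { print $1 }'
}

# Usage:
#   qconf -sc | requestable_attrs
```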
The column relop defines the relational operator used to compute whether a queue or a host meets a user request. The comparison that is executed is as follows:
User_Request relop Queue/Host/... -Property
If the result of the comparison is false, the user's job cannot be run in the queue or on the host. For example, let the queue q1 be configured with a soft CPU time limit of 100 seconds. Let the queue q2 be configured to provide 1000 seconds soft CPU time limit. See the queue_conf(5) and the setrlimit(2) man pages for a description of user process limits.
The columns consumable and default affect how the administrator declares consumable resources. See Consumable Resources in Sun N1 Grid Engine 6.1 Administration Guide.
The user requests consumables just like any other attribute. The grid engine system internal bookkeeping for the resources is different, however.
Assume that a user submits the following request:
% qsub -l s_cpu=0:5:0 nastran.sh
The s_cpu=0:5:0 request asks for a queue that grants at least 5 minutes of soft limit CPU time. Therefore, only queues providing at least 5 minutes soft CPU runtime limit are set up properly to run the job. Of the two example queues described earlier, q2 (1000 seconds) satisfies this request, whereas q1 (100 seconds) does not. See the qsub(1) man page for details on the syntax.
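The comparison for this request can be worked through in a few lines of shell. The sketch converts the requested time to seconds and applies the relop for s_cpu from the sample qconf -sc output (<=) to the limits of the example queues q1 and q2. The function names to_seconds and meets_request are hypothetical, and only numeric comparisons are handled.

```shell
# Convert a time value ("h:m:s" or plain seconds) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        if (NF == 3) print $1 * 3600 + $2 * 60 + $3
        else print $1 + 0
    }'
}

# Evaluate "request relop property" numerically; succeed if it holds.
#   $1 = requested value, $2 = relop (<=, >=, or ==), $3 = queue or host value
meets_request() {
    awk -v a="$1" -v op="$2" -v b="$3" 'BEGIN {
        if (op == "<=") ok = (a + 0 <= b + 0)
        else if (op == ">=") ok = (a + 0 >= b + 0)
        else if (op == "==") ok = (a + 0 == b + 0)
        else ok = 0
        exit !ok
    }'
}

# Usage: compare the 5-minute request with the soft limits of q1 and q2
req=$(to_seconds 0:5:0)
meets_request "$req" '<=' 100  && echo "q1 can run the job" || echo "q1 cannot run the job"
meets_request "$req" '<=' 1000 && echo "q2 can run the job" || echo "q2 cannot run the job"
```

The request is 300 seconds, so the comparison fails for q1 and holds for q2.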
The grid engine software considers workload information in the scheduling process only if more than one queue or host can run a job.