This chapter explains how to use the resource quotas feature of the N1 Grid Engine 6.1 software to limit resources by user, project, host, cluster queue, or parallel environment. For convenience, you can express these limits using user access lists, departments, or host groups.
This chapter contains the following information:
To prevent users from consuming all available resources, the N1 Grid Engine 6.1 software supports complex attributes that you can configure on a global, queue, or host layer. While this layered resource management approach is powerful, it leaves gaps that become particularly important in large installations that consist of many different custom resources, user groups, and projects. The resource quota feature closes these gaps by enabling you to manage such enterprise environments, to the extent that you can control which project or department must yield when single bottleneck resources run out.
The resource quota feature enables you to apply limits to several kinds of resources and resource consumers, to combinations of consumers, or to all jobs in the cluster. In this context, a resource is any complex attribute defined in the N1 Grid Engine configuration. For more information about complex attributes, see the complex(5) man page. Resources can be built-in attributes such as slots, arch, mem_total, num_proc, or swap_total, or any custom-defined resource such as compiler_license. Resource consumers are users, queues, hosts, projects, and parallel environments.
The resource quota feature provides a way for you to limit the resources that a consumer can use at any time. This limitation provides an indirect method to prioritize users, departments, and projects. To define directly the priorities by which a user should obtain a resource, use the resource urgency and share-based policies described in Configuring the Urgency Policy and Configuring the Share-Based Policy.
To limit resources through the N1 Grid Engine 6.1 software, use the qquota and qconf commands, or the QMON graphical interface. For more information, see the qquota(1) and qconf(1) man pages.
Resource quota sets enable you to specify the maximum resource consumption for any job request. Once you define the resource quota sets, the scheduler uses them to select the next possible jobs to run, ensuring that the quotas are not exceeded. The ultimate result of setting resource quotas is that only those jobs that do not exceed their resource quotas are scheduled and run.
A resource quota set defines a maximum resource quota for a particular job request. All of the configured rule sets apply all of the time. If multiple resource quota sets are defined, the most restrictive set applies. Every resource quota set consists of one or more resource quota rules. These rules are evaluated in order, and the first rule that matches a specific request is used. A resource quota set always results in at most one effective resource quota rule for a specific request.
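To illustrate the first-match evaluation order, consider the following hypothetical rule set (the name first_match_example and the specific limits are invented for this sketch). A request from user1 matches the first rule and is limited to 2 slots; the second rule is never evaluated for user1, while all other users fall through to the 10-slot limit:

```
{
   name         first_match_example
   enabled      true
   limit        users user1 to slots=2
   limit        users * to slots=10
}
```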
A resource quota set consists of the following information:
name – The name of the resource quota set.
enabled – A boolean value that indicates whether the resource quota set should be considered in scheduling decisions. If enabled is true, the resource quota set is active and is considered in scheduling decisions. The default value is false.
description – An optional field that contains an arbitrary string that describes the set. The default value is NONE.
limit rule – Every resource quota set needs at least one limit rule, which is contained in the limit field. For example, the following limit rule limits all users together to 10 slots: limit users * to slots=10. A limit rule contains the following information:
name – An optional name for the rule. If used, the name must be unique within the resource quota set.
filter scope – The filter scope identifies the list of resource consumers to which the quota applies. A resource consumer consists of a keyword followed by a comma-separated list of consumers. Use the following keywords: users, projects, queues (cluster queues), hosts, or pes (parallel environments). An example of a resource consumer is users {user1, user2}. An example of a filter scope is users {user1, user2} hosts *. This filter scope limits user1 and user2 each to the configured maximum, independent of the host.
To include an expandable list in the resource quota definition, use braces {} around the resource consumer list. The braces apply the limit to each listed consumer individually rather than to all of the consumers together.
To exclude a specific resource consumer from a list, use the exclamation point, ! (sometimes referred to as the “not” symbol).
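The following hypothetical rule combines both notations (the host name host3 and the slot count are invented for this sketch). The braces around the asterisk make the limit apply to every user individually, and the exclamation point excludes host3 from the hosts to which the rule applies:

```
limit        users {*} hosts !host3 to slots=4
```

With this rule, each user can occupy at most four slots on any host except host3, which is not constrained by the rule.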
limit – An attribute-value pair that defines the actual limit for the resource. For example, virtual_free=2G. You can also combine pairs into a comma-separated list of attribute-value pairs. For example, virtual_free=2G,swap_free=1.5G.
The following example resource quota set restricts user1 and user2 to two gigabytes of virtual free memory on each host in the host group @lx_hosts.
{
   name         max_virtual_free_on_lx_hosts
   description  "resource quota for virtual_free restriction"
   enabled      true
   limit        users {user1,user2} hosts {@lx_hosts} to virtual_free=2G
}
Resource quota rules always define a maximum value of a resource that can be used. In most cases, these values are static and equal for all matching filter scopes. Although you could define several different rules to apply to different scopes, you would then have several rules that are nearly identical. Instead of duplicating rules, you can instead define a dynamic limit.
A dynamic limit uses an algebraic expression to derive the rule limit value. The algebraic formula can reference a complex attribute whose value is used to calculate the resulting limit.
The following example illustrates the use of dynamic limits. Users are allowed to use 5 slots per CPU on all Linux hosts.
limit hosts {@linux_hosts} to slots=$num_proc*5
The value of num_proc is the number of processors on the host. The limit is calculated by the formula $num_proc*5, and can be different on each host. Expanding the example above, you could have the following resulting limits:
On a host that has two CPUs, users can use ten slots to run jobs.
On a host that has one CPU, users can use only five slots to run jobs.
Instead of num_proc, you could use any other complex attribute known for a host as either a load value or a consumable resource.
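As a hypothetical sketch of this point, the following rule assumes that the built-in host attribute mem_total is reported for each host, and scales a virtual_free limit to each host's installed memory (the choice of mem_total and virtual_free here is illustrative, not prescribed by the software):

```
limit        users {*} hosts {*} to virtual_free=$mem_total
```

With such a rule, each user on each host would be limited to a virtual_free quota equal to that host's total memory, so the effective limit differs from host to host without requiring a separate rule per host.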
The following task explains how to set resource quotas using the QMON graphical interface.
In the QMON Main Control window, click the Resource Quota Configuration button.
Type the Resource Quota information in the text field.
Use the same syntax as you would for the qconf command, as illustrated in the following screen example.
Use the qquota command to view information about the current N1 Grid Engine resource quotas. The qquota command lists each resource quota that is being used at least once or that defines a static limit. For each applicable resource quota, qquota displays the following information:
Resource quota rule – The name of the rule set and the name or number of the rule
Limit – The resource name, the number of available items for that resource, and the number of used items for that resource
Filter – The effective resource quota set filter, which results from applying the filter scope explained in About Resource Quota Sets.
The qquota command includes several options that you can use to limit the information to a specific host, cluster queue, project, parallel environment, resource, or user. If you use no options, qquota displays information about the resource quota sets that apply to the user who invokes the command. For more information, see the qquota(1) man page.
The following example shows information about the resource quota sets that apply to user user1.
$ qquota -u user1
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1          slots=5/20           -
max_linux/1         slots=5/5            hosts @linux
max_per_host/1      slots=1/2            users user1 hosts host2
Use the qconf command to add, modify, or delete resource quota sets and rules.
To add a resource quota set by invoking a text editor, use the following command:
$ qconf -arqs [name]
To add a set that is already defined in a file, use qconf -Arqs filename.
To modify information about a resource quota set by invoking an editor, use the following command:
$ qconf -mrqs [name]
To modify a set from information contained in a file, use qconf -Mrqs filename [name].
If you use the -mrqs or -Mrqs option without a name, the new rule set replaces all of the currently configured rule sets.
To delete a resource quota set, use the following command:
$ qconf -drqs [name_list]
To see a list of defined resource quota sets, use the following command:
$ qconf -srqsl
To view detailed information about a defined resource quota set, use the following command:
$ qconf -srqs [name_list]
For more information about qconf, see the qconf(1) man page.
The following example shows how you can use the various commands for resource quotas. The rule set shown in Example 6–4 defines the following limit:
All users together should never take more than 20 slots.
All users should take at most 5 slots on all Linux hosts.
Every user is restricted to one slot per Linux host, except user MyUser, who is restricted to two slots per Linux host. On all other hosts, slots are restricted to 0.
The host group @linux includes host1 and host2.
To configure the rule set, use one of the following forms of the qconf command:
qconf -arqs <rule-set-name> to add each rule set individually
qconf -arqs to add all rule sets at once
After jobs are submitted for different users, the qstat command shows output similar to the example shown in Example 6–5.
{
   name         maxujobs
   limit        users * to slots=20
}
{
   name         max_linux
   limit        users * hosts @linux to slots=5
}
{
   name         max_per_host
   limit        users MyUser hosts {@linux} to slots=2
   limit        users {*} hosts {@linux} to slots=1
   limit        users * hosts * to slots=0
}
$ qstat
job-ID  prior    name     user    state submit/start at      queue        slots ja-task-ID
---------------------------------------------------------------------------------------------
    27  0.55500  Sleeper  MyUser  r     02/21/2006 15:53:10  all.q@host1      1
    29  0.55500  Sleeper  MyUser  r     02/21/2006 15:53:10  all.q@host1      1
    30  0.55500  Sleeper  MyUser  r     02/21/2006 15:53:10  all.q@host2      1
    26  0.55500  Sleeper  MyUser  r     02/21/2006 15:53:10  all.q@host2      1
    28  0.55500  Sleeper  user1   r     02/21/2006 15:53:10  all.q@host2      1
$ qquota                # as user MyUser
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1          slots=5/20           -
max_linux/1         slots=5/5            hosts @linux
max_per_host/1      slots=2/2            users MyUser hosts host2
max_per_host/1      slots=2/2            users MyUser hosts host1

$ qquota -h host2       # as user MyUser
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1          slots=5/20           -
max_linux/1         slots=5/5            hosts @linux
max_per_host/1      slots=2/2            users MyUser hosts host2

$ qquota -u user1
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1          slots=5/20           -
max_linux/1          slots=5/5           hosts @linux
max_per_host/1      slots=1/2            users user1 hosts host2

$ qquota -u *
resource quota rule limit                filter
--------------------------------------------------------------------------------
maxujobs/1          slots=5/20           -
max_linux/1         slots=5/5            hosts @linux
max_per_host/1      slots=2/2            users MyUser hosts host1
max_per_host/1      slots=2/2            users MyUser hosts host2
max_per_host/1      slots=1/2            users user1 hosts host2
To provide the most efficient processing of jobs and resources in queues, put the most restrictive rule in the first position of a rule set. Following this convention helps the N1 Grid Engine scheduler restrict the number of suitable queue instances in a particularly efficient manner, because the first rule is never shadowed by any subsequent rule in the same rule set and thus always stands on its own.
To illustrate this rule, consider an environment similar to the following:
Four queues named Q001-Q004
Four managed resources named F001-F004
Jobs that require a specific managed resource, such as F001, are constrained to run in the associated queue, such as Q001
Jobs are submitted into one of five projects P001-P005
In such an environment, you might define a single rule set as follows:
{
   name         30_for_each_project
   description  "not more than 30 per project"
   enabled      TRUE
   limit        projects {*} queues Q001 to F001=30
   limit        projects {*} queues Q002 to F002=30
   limit        projects {*} queues Q003 to F003=30
   limit        projects {*} queues Q004 to F004=30
   limit        to F001=0,F002=0,F003=0,F004=0
}
The single rule set limits the utilization of each managed resource to 30 for each project and constrains the jobs to the eligible queues at the same time. This works fine, but in a larger cluster with many hosts, the single rule set would slow job dispatching.
To help the N1 Grid Engine scheduler rule out as many queue instances as possible during matchmaking, it is better to use four separate rule sets.
{
   name         30_for_each_project_in_Q001
   description  "not more than 30 per project of F001 in Q001"
   enabled      TRUE
   limit        queues !Q001 to F001=0
   limit        projects {*} queues Q001 to F001=30
}
{
   name         30_for_each_project_in_Q002
   description  "not more than 30 per project of F002 in Q002"
   enabled      TRUE
   limit        queues !Q002 to F002=0
   limit        projects {*} queues Q002 to F002=30
}
{
   name         30_for_each_project_in_Q003
   description  "not more than 30 per project of F003 in Q003"
   enabled      TRUE
   limit        queues !Q003 to F003=0
   limit        projects {*} queues Q003 to F003=30
}
{
   name         30_for_each_project_in_Q004
   description  "not more than 30 per project of F004 in Q004"
   enabled      TRUE
   limit        queues !Q004 to F004=0
   limit        projects {*} queues Q004 to F004=30
}
These four rule sets enforce the very same per-project resource quotas as the single rule set. However, the four rule sets can be processed much more efficiently, because unsuitable queue instances are excluded first. Consolidating these exclusions into a single resource quota set is not possible in this case.
The purpose of this example is not to recommend one cluster queue per resource. In fact, the opposite is true, because fewer queues always enable fewer, more powerful exclusion rules, as shown here:
{
   name         30_for_each_project_in_Q001
   description  "not more than 30 per project of F001/F002 in Q001"
   enabled      TRUE
   limit        queues !Q001 to F001=0,F002=0
   limit        projects {*} queues Q001 to F001=30,F002=30
}
{
   name         30_for_each_project_in_Q002
   description  "not more than 30 per project of F003/F004 in Q002"
   enabled      TRUE
   limit        queues !Q002 to F003=0,F004=0
   limit        projects {*} queues Q002 to F003=30,F004=30
}
In this example, the queues are consolidated from Q001-Q004 down to Q001-Q002. This consolidation actually increases overall cluster utilization and throughput.