Sun N1 Grid Engine 6.1 Administration Guide

Resource Quota Overview

To prevent users from consuming all available resources, the N1 Grid Engine 6.1 software supports complex attributes that you can configure at the global, queue, or host level. Although this layered approach to resource management is powerful, it leaves gaps that matter most in large installations with many custom resources, user groups, and projects. The resource quota feature closes these gaps: in such enterprise environments, you can control which project or department must yield when a bottleneck resource runs out.

The resource quota feature enables you to apply limits to several kinds of resources and several kinds of resource consumers, to all jobs in the cluster, and to combinations of consumers. In this context, a resource is any complex attribute defined in the N1 Grid Engine configuration. For more information about complex attributes, see the complex(5) man page. Resources can be built-in resources such as slots, arch, mem_total, num_proc, and swap_total, or any custom-defined resource such as compiler_license. Resource consumers are (per) users, (per) queues, (per) hosts, (per) projects, and (per) parallel environments.

The resource quota feature provides a way for you to limit the resources that a consumer can use at any time. This limitation is an indirect method of prioritizing users, departments, and projects. To define priorities directly, that is, the order in which users should obtain a resource, use the resource urgency and share-based policies described in Configuring the Urgency Policy and Configuring the Share-Based Policy.

To limit resources through the N1 Grid Engine 6.1 software, configure resource quotas with the qconf command or the QMON graphical interface, and use the qquota command to view current quota usage. For more information, see the qquota(1) and qconf(1) man pages.

About Resource Quota Sets

Resource quota sets enable you to specify the maximum resource consumption for any job request. Once you define the resource quota sets, the scheduler uses them when selecting the next jobs to run and ensures that no quota is exceeded. As a result, only jobs that stay within their resource quotas are scheduled and run.

A resource quota set defines a maximum resource quota for a particular job request. All configured resource quota sets apply at all times. If multiple resource quota sets are defined, the most restrictive set applies. Every resource quota set consists of one or more resource quota rules. Within a set, these rules are evaluated in order, and the first rule that matches a specific request is used. A resource quota set therefore yields at most one effective resource quota rule for a specific request.
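As a minimal sketch of first-match evaluation, consider the following hypothetical resource quota set. The set name, the user name user1, and the slot values are placeholders chosen for illustration and do not come from a real configuration.

```text
{
   name         first_match_example
   description  "illustrative only: rule order decides which limit applies"
   enabled      true
   limit        users user1 to slots=20
   limit        users * to slots=10
}
```

Under the ordering described above, a request from user1 matches the first rule, so the 20-slot limit is the effective rule for that request and the second rule is not consulted. Requests from any other user do not match the first rule and fall through to the second rule. Reversing the two lines would cause the users * rule to match every request first, so the user1 rule would never take effect.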

A resource quota set consists of the following information:

name – The name of the resource quota set.
enabled – A boolean value that indicates whether the resource quota set is considered in scheduling decisions. If enabled is true, the set is active. The default value is false.
description – An optional field that contains an arbitrary string that describes the set. The default value is NONE.
limit rule – Every resource quota set needs at least one limit rule, which is contained in the limit field. For example, the following limit rule limits all users together to 10 slots: limit users * to slots=10. The limit rule contains the following information:
- name – An optional name for the rule. If used, the name must be unique within the resource quota set.
- filter scope – The filter scope identifies the list of resource consumers to which the quota applies. A resource consumer consists of a keyword followed by a comma-separated list of consumers. Use the following keywords: users, projects, queues (cluster queues), hosts, or pes (parallel environments). An example of a resource consumer is users {user1, user2}. An example of a filter scope is users {user1, user2} hosts *. This filter scope limits user1 and user2 to the configured maximum, regardless of the host.
  
  To include an expandable list in the resource quota definition, use braces {} around the resource consumer list.
  
  To exclude a specific consumer from a list, prefix it with the exclamation point, ! (sometimes referred to as the “not” symbol).
- limit – An attribute-value pair that defines the actual limit for the resource, for example virtual_free=2G. You can also combine several pairs into a comma-separated list, for example virtual_free=2G,swap_free=1.5G.
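The parts of a limit rule described above can be combined in a single rule. The following sketch is illustrative only: the set name, the rule name per_user_mem, and the user name user1 are hypothetical, and the limit values are taken from the examples in the list above.

```text
{
   name         scope_syntax_example
   description  "illustrative only: names and values are placeholders"
   enabled      true
   limit        name per_user_mem users {*,!user1} hosts * to virtual_free=2G,swap_free=1.5G
}
```

Here name per_user_mem is the optional rule name, the braces mark the user list as expandable, !user1 excludes user1 from that list, and the text after the to keyword is a comma-separated list of two attribute-value pairs. Check the exact syntax against the man pages for your installation before using a rule of this shape.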

Example 6–1 Sample Resource Quota Set

The following example resource quota set restricts user1 and user2 to two gigabytes of free virtual memory on each host in the host group lx_hosts.

     {
        name         max_virtual_free_on_lx_hosts
        description  "resource quota for virtual_free restriction"
        enabled      true
        limit        users {user1,user2} hosts {@lx_hosts} to virtual_free=2g
     }

Static and Dynamic Resource Quotas

Resource quota rules always define a maximum value of a resource that can be used. In most cases, these values are static and equal for all matching filter scopes. You could define several different rules to apply to different scopes, but you would then have several rules that are nearly identical. Instead of duplicating rules, you can define a dynamic limit.

A dynamic limit uses an algebraic expression to derive the rule limit value. The algebraic formula can reference a complex attribute whose value is used to calculate the resulting limit.

Example 6–2 Dynamic Limit Example

The following example illustrates the use of dynamic limits. Users are allowed to use 5 slots per CPU on all Linux hosts.

limit hosts {@linux_hosts} to slots=$num_proc*5

The value of num_proc is the number of processors on the host. The limit is calculated by the formula $num_proc*5, and can be different on each host. Expanding the example above, you could have the following resulting limits:

On a host that has two CPUs, users can use ten slots to run jobs.
On a host that has one CPU, users can use only five slots to run jobs.

Instead of num_proc, you could use any other complex attribute known for a host as either a load value or a consumable resource.
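For context, the rule from the dynamic limit example could be placed in a complete resource quota set, as in the following sketch. The set name and description are hypothetical; the rule itself and the host group @linux_hosts are the ones used in the example above.

```text
{
   name         slots_per_cpu_on_linux
   description  "illustrative only: 5 slots per CPU on each host in @linux_hosts"
   enabled      true
   limit        hosts {@linux_hosts} to slots=$num_proc*5
}
```

Because the formula is evaluated per host, a single rule covers hosts of different sizes: a host where num_proc is 2 gets a limit of 2*5=10 slots, and a host where num_proc is 1 gets a limit of 1*5=5 slots.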