Sun N1 Grid Engine 6.1 Administration Guide

Chapter 6 Managing Resource Quotas

This chapter explains how to use the resource quotas feature of the N1 Grid Engine 6.1 software to limit resources by user, project, host, cluster queue, or parallel environment. For convenience, you can express these limits using user access lists, departments, or host groups.

This chapter contains the following information:

    Resource Quota Overview
    About Resource Quota Sets
    Static and Dynamic Resource Quotas
    Managing Resource Quotas With QMON
    Monitoring Resource Quota Utilization From the Command Line
    Configuring Resource Quotas from the Command Line
    Performance Considerations

Resource Quota Overview

To prevent users from consuming all available resources, the N1 Grid Engine 6.1 software supports complex attributes that you can configure at the global, queue, or host layer. While this layered resource management approach is powerful, it leaves gaps that become particularly important in large installations that consist of many different custom resources, user groups, and projects. The resource quota feature closes these gaps by enabling you to manage such enterprise environments to the extent that you can control which project or department must yield when a single bottleneck resource runs out.

The resource quota feature enables you to apply limits to several kinds of resources, to several kinds of resource consumers, to all jobs in the cluster, and to combinations of consumers. In this context, resources are any defined complex attributes known to the N1 Grid Engine configuration. For more information about complex attributes, see the complex(5) man page. Resources can be built-in attributes such as slots, arch, mem_total, num_proc, and swap_total, or any custom-defined resource such as compiler_license. Resource consumers are users, queues, hosts, projects, and parallel environments, each of which can be limited individually (per consumer) or collectively.
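
For example, a minimal rule set sketch like the following would limit each project to four licenses of the custom compiler_license resource at a time. This sketch assumes that compiler_license has been defined as a consumable complex attribute; the rule set name is only illustrative.

   {
      name        max_compiler_license
      description "limit each project to 4 compiler licenses"
      enabled     true
      limit       projects {*} to compiler_license=4
   }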

The resource quota feature provides a way for you to limit the resources that a consumer can use at any time. This limitation provides an indirect method to prioritize users, departments, and projects. To define directly the priorities by which a user obtains a resource, use the urgency and share-based policies described in Configuring the Urgency Policy and Configuring the Share-Based Policy.

To limit resources through the N1 Grid Engine 6.1 software, use the qquota and qconf commands, or the QMON graphical interface. For more information, see the qquota(1) and qconf(1) man pages.
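
For example, the following command sketch shows one way you might list the configured resource quota sets and check current quota usage from the command line:

$ qconf -srqsl        # list the names of all configured resource quota sets
$ qconf -srqs         # show the full configuration of all resource quota sets
$ qquota -u '*'       # show resource quota usage for all users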

About Resource Quota Sets

Resource quota sets enable you to specify the maximum resource consumption for any job request. Once you define resource quota sets, the scheduler uses them when selecting the next jobs to run, ensuring that the quotas will not be exceeded. The ultimate result of setting resource quotas is that only those jobs that do not exceed their resource quotas are scheduled and run.

A resource quota set defines a maximum resource quota for a particular job request. All of the configured rule sets apply all of the time. If multiple resource quota sets are defined, the most restrictive set applies. Every resource quota set consists of one or more resource quota rules. These rules are evaluated in order, and the first rule that matches a specific request is used. A resource quota set always results in at most one effective resource quota rule for a specific request.
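
Because rules are evaluated in order and only the first matching rule applies, a rule for a specific consumer must precede a more general rule. The following sketch illustrates this ordering; the user name poweruser is only illustrative. Here poweruser is limited to 40 slots, while every other user is limited to 10 slots.

   {
      name    per_user_slots
      enabled true
      limit   users {poweruser} to slots=40
      limit   users {*} to slots=10
   }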

A resource quota set consists of a name, an optional description, an enabled flag that indicates whether the set is active, and one or more resource quota rules (limits), as shown in the following example.


Example 6–1 Sample Resource Quota Set

The following example resource quota set restricts user1 and user2 to two gigabytes of free virtual memory (virtual_free) on each host in the host group lx_host.

     {
        name         max_virtual_free_on_lx_hosts
        description  "resource quota for virtual_free restriction"
        enabled      true
        limit        users {user1,user2} hosts {@lx_host} to virtual_free=2g
     }
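
To activate a rule set like this one, you could save it to a file and register it with qconf. The file name used here is only an example:

     $ qconf -Arqs max_virtual_free.rqs            # add the resource quota set from a file
     $ qconf -srqs max_virtual_free_on_lx_hosts    # verify the new rule set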

Static and Dynamic Resource Quotas

Resource quota rules always define a maximum value of a resource that can be used. In most cases, these values are static and equal for all matching filter scopes. Although you could define several different rules that apply to different scopes, you would then have several nearly identical rules. Instead of duplicating rules, you can define a dynamic limit.

A dynamic limit uses an algebraic expression to derive the rule limit value. The algebraic formula can reference a complex attribute whose value is used to calculate the resulting limit.


Example 6–2 Dynamic Limit Example

The following example illustrates the use of dynamic limits. On each host in the host group @linux_hosts, at most 5 slots per processor can be used.


limit hosts {@linux_hosts} to slots=$num_proc*5

The value of num_proc is the number of processors on the host, so the limit calculated by the formula $num_proc*5 can differ from host to host. For example, a host with 2 processors would be limited to 10 slots, while a host with 8 processors would be limited to 40 slots.

Instead of num_proc, you could use any other complex attribute known for a host as either a load value or a consumable resource.
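
For example, a dynamic limit based on the mem_total load value could cap the virtual_free consumable on each host at the host's installed memory. This is only a sketch; it assumes that virtual_free is configured as a consumable and reuses the @linux_hosts host group from the previous example.

limit hosts {@linux_hosts} to virtual_free=$mem_total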


Managing Resource Quotas With QMON

The following task explains how to set resource quotas using the QMON graphical interface.

How to Set Resource Quotas Using QMON

  1. In the QMON Main Control window, click the Resource Quota Configuration button.

    The Resource Quota Configuration button is the sixth button from the left on the bottom row.
  2. Type the Resource Quota information in the text field.

    Use the same syntax as you would for the qconf command, as illustrated in the following screen example.

    The Resource Quotas page shows a text box, with OK, Cancel, and Help buttons on the right.

Monitoring Resource Quota Utilization From the Command Line

Use the qquota command to view information about the current N1 Grid Engine resource quotas. The qquota command lists each resource quota that is being used at least once or that defines a static limit. For each applicable resource quota, qquota displays the rule that is in effect, the limit (the amount used and the maximum available), and the filter scope to which the rule applies.

The qquota command includes several options that you can use to limit the information to a specific host, cluster queue, project, parallel environment, resource, or user. If you use no options, qquota displays information about the resource quota sets that apply to the user name under which you invoke the command. For more information, see the qquota(1) man page.
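
The following commands sketch how these filter options might be combined. The host group @linux appears in the examples below; the project name myproject is only illustrative.

$ qquota -u user1 -h @linux    # quotas that affect user1 on hosts in the group @linux
$ qquota -P myproject          # quotas that affect jobs of the project myproject
$ qquota -l slots              # only quotas that limit the slots resource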


Example 6–3 Sample qquota Command

The following example shows information about the resource quota sets that apply to user user1.


$ qquota -u user1
resource quota    limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=1/2            users user1 hosts host2

Configuring Resource Quotas from the Command Line

Use the qconf command to add, modify, or delete resource quota sets and rules.

For more information about qconf, see the qconf(1) man page.
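
The following command sketch shows the typical forms of qconf used for resource quota sets; the set name maxujobs is taken from Example 6–4 below.

$ qconf -arqs                  # add a new resource quota set in an editor
$ qconf -mrqs maxujobs         # modify the resource quota set maxujobs in an editor
$ qconf -srqs maxujobs         # show the resource quota set maxujobs
$ qconf -drqs maxujobs         # delete the resource quota set maxujobs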

Example

The following example shows how you can use the various commands for resource quotas. The rule set shown in Example 6–4 defines the following limits: the maxujobs set limits all users together to 20 slots in the whole cluster, the max_linux set limits all users together to 5 slots on the hosts in the host group @linux, and the max_per_host set limits the user MyUser to 2 slots and every other user to 1 slot on each individual host in @linux, while allowing no slots on any other host.

To configure the rule set, use the qconf -arqs command to add a new resource quota set, or the qconf -mrqs command to modify an existing one.

After jobs are submitted for different users, the qstat command shows output similar to Example 6–5.
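
One way to produce jobs like those in Example 6–5 is to submit a simple sleep command under the appropriate user accounts. The exact jobs behind the sample output are not shown in this guide, so the following is only a sketch:

$ qsub -N Sleeper -b y /bin/sleep 600    # repeat as MyUser and as user1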


Example 6–4 Rule Set

{
 name maxujobs
 limit users * to slots=20
}

{
 name max_linux
 limit users * hosts @linux to slots=5
}

{
 name max_per_host
 limit users MyUser hosts {@linux} to slots=2
 limit users {*} hosts {@linux} to slots=1
 limit users * hosts * to slots=0
}


Example 6–5 qstat Output


$ qstat
job-ID  prior   name       user       state submit/start at     queue       slots ja-task-ID 
---------------------------------------------------------------------------------------------
     27 0.55500 Sleeper    MyUser     r     02/21/2006 15:53:10 all.q@host1   1        
     29 0.55500 Sleeper    MyUser     r     02/21/2006 15:53:10 all.q@host1   1        
     30 0.55500 Sleeper    MyUser     r     02/21/2006 15:53:10 all.q@host2   1        
     26 0.55500 Sleeper    MyUser     r     02/21/2006 15:53:10 all.q@host2   1        
     28 0.55500 Sleeper    user1      r     02/21/2006 15:53:10 all.q@host2   1        


Example 6–6 qquota Output


$ qquota # as user MyUser
resource quota rule    limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host2
max_per_host/1     slots=2/2            users MyUser hosts host1

$ qquota -h host2 # as user MyUser
resource quota    limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host2

$ qquota -u user1
resource quota    limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=1/2            users user1 hosts host2

$ qquota -u *
resource quota    limit                filter
--------------------------------------------------------------------------------
maxujobs/1         slots=5/20           -
max_linux/1        slots=5/5            hosts @linux
max_per_host/1     slots=2/2            users MyUser hosts host1
max_per_host/1     slots=2/2            users MyUser hosts host2
max_per_host/1     slots=1/2            users user1 hosts host2

Performance Considerations

Efficient Rule Sets

To provide the most efficient processing of jobs and resources in queues, put the most restrictive rule at the first position of a rule set. Following this convention helps the N1 Grid Engine scheduler restrict the number of eligible queue instances in a particularly efficient manner, because the first rule is never shadowed by any subsequent rule in the same rule set and thus always stands on its own.

To illustrate this rule, consider an environment with four cluster queues, Q001 through Q004, where each queue is associated with its own consumable resource, F001 through F004, and each project may consume at most 30 units of the resource attached to each queue.

In such an environment, you might define a single rule set as follows:


{
      name         30_for_each_project
      description  "not more than 30 per project"
      enabled      TRUE
      limit projects {*} queues Q001 to F001=30
      limit projects {*} queues Q002 to F002=30
      limit projects {*} queues Q003 to F003=30
      limit projects {*} queues Q004 to F004=30
      limit to F001=0,F002=0,F003=0,F004=0
   } 

The single rule set limits the utilization of each managed resource to 30 per project and, at the same time, constrains the jobs to the eligible queues. This works, but in a larger cluster with many hosts, the single rule set can cause slow job dispatching.

To help the N1 Grid Engine scheduler exclude as many queue instances as possible during matchmaking, it is better to use four separate rule sets:


{
      name         30_for_each_project_in_Q001
      description  "not more than 30 per project of F001 in Q001"
      enabled      TRUE
      limit queues !Q001 to F001=0
      limit projects {*} queues Q001 to F001=30
   }

   {
      name         30_for_each_project_in_Q002
      description  "not more than 30 per project of F002 in Q002"
      enabled      TRUE
      limit queues !Q002 to F002=0
      limit projects {*} queues Q002 to F002=30
   }

   {
      name         30_for_each_project_in_Q003
      description  "not more than 30 per project of F003 in Q003"
      enabled      TRUE
      limit queues !Q003 to F003=0
      limit projects {*} queues Q003 to F003=30
   }

   {
      name         30_for_each_project_in_Q004
      description  "not more than 30 per project of F004 in Q004"
      enabled      TRUE
      limit queues !Q004 to F004=0
      limit projects {*} queues Q004 to F004=30
   } 

These four rule sets enforce exactly the same per-project resource quotas as the single rule set. However, the four rule sets can be processed much more efficiently because unsuitable queue instances are excluded first. These exclusion rules could not be consolidated into a single resource quota set in this case.


Note –

The purpose of the sample above is not to recommend one cluster queue per resource. In fact, the opposite is true: fewer queues allow fewer, more powerful exclusion rules, as shown here:


  {
      name         30_for_each_project_in_Q001
      description  "not more than 30 per project of F001/F002 in Q001"
      enabled      TRUE
      limit queues !Q001 to F001=0,F002=0
      limit projects {*} queues Q001 to F001=30,F002=30
   }

   {
      name         30_for_each_project_in_Q002
      description  "not more than 30 per project of F003/F004 in Q002"
      enabled      TRUE
      limit queues !Q002 to F003=0,F004=0
      limit projects {*} queues Q002 to F003=30,F004=30
   }

In this example, the queues are consolidated from Q001–Q004 down to Q001–Q002, which actually increases overall cluster utilization and throughput.