Consumable resources provide an efficient way to manage limited resources such as available memory, free space on a file system, network bandwidth, or floating software licenses. Consumable resources are also called consumables. The total available capacity of a consumable is defined by the administrator, and its consumption is monitored internally by the grid engine software's bookkeeping. The grid engine system accounts for the consumption of this resource by all running jobs. Jobs are dispatched only if the internal bookkeeping indicates that sufficient consumable resources are available.
Consumables can be combined with default load parameters or user-defined load parameters. Load values can be reported for consumable attributes. Conversely, the Consumable flag can be set for load attributes. Load measures the availability of the resource. Consumable resource management takes both the load and the internal bookkeeping into account, ensuring that neither exceeds a given limit. For more information about load parameters, see Load Parameters.
To enable consumable resource management, you must define the total capacity of a resource. You can define resource capacity globally for the cluster, for specified hosts, and for specified queues. Each of these levels can further restrict the one above it: a host can restrict the availability of a global resource, and a queue can restrict both host resources and global resources.
You define resource capacities by using the complex_values attribute in the queue and host configurations. The complex_values definition of the global host specifies global cluster consumable settings. For more information, see the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.
Each consumable attribute in a complex_values list is assigned a value that denotes the maximum available amount of that resource. The internal bookkeeping subtracts from this total the assumed resource consumption of all running jobs, as expressed through the jobs' resource requests. For example, consider a parallel job submitted as follows:
% qsub -l mem=100M -pe make=8
The job requests 100 Mbytes of memory for each of its eight slots, 800 Mbytes in total. Memory usage is split across the queues and hosts on which the job runs. If four tasks run on host A and four tasks run on host B, the job consumes 400 Mbytes on each host.
Only numeric attributes can be configured as consumables. Numeric attributes are attributes whose type is INT, DOUBLE, MEMORY, or TIME.
In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears, as shown in Figure 3–1.
To enable the consumable management for an attribute, set the Consumable flag for the attribute in the complex configuration. For example, the following figure shows that the Consumable flag is set for the virtual_free memory resource.
Set up other consumable resources, guided by the examples detailed in the following sections.
Then, for each queue or host for which you want the grid engine software to do the required capacity planning, you must define the capacity in a complex_values list. An example is shown in the following figure, where 1 Gbyte of virtual memory is defined as the capacity value of the current host.
The virtual memory requirements of all jobs running concurrently in any queue on that host are accumulated. The requirements are then subtracted from the capacity of 1 Gbyte to determine available virtual memory. If a job request for virtual_free exceeds the available amount, the job is not dispatched to a queue on that host.
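The same capacity can also be set from the command line by editing the host configuration with qconf -me <hostname> so that its complex_values list contains an entry such as the following sketch:

```
complex_values            virtual_free=1G
```

Several consumables can share one complex_values entry, separated by commas.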
Jobs can be forced to request a resource, and thus to specify their assumed consumption, by setting the Requestable parameter to FORCED.
For consumable attributes that are not explicitly requested by the job, the administrator can predefine a default value for resource consumption. Doing so is meaningful only if requesting the attribute is not forced, as explained above. In this example, 200 Mbytes is set as the default value.
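In the complex configuration, such a default is recorded in the default column of the attribute definition. A sketch of the virtual_free line follows; the exact column layout can vary between grid engine versions:

```
#name          shortcut  type    relop  requestable  consumable  default
virtual_free   vf        MEMORY  <=     YES          YES         200M
```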
Use the following examples to guide you in setting up consumable resources for your site.
Suppose you are using the software package pam-crash in your cluster, and you have access to 10 floating licenses. You can use pam-crash on every system as long as no more than 10 invocations of the software are active. The goal is to configure the grid engine system in a way that prevents scheduling pam-crash jobs while all 10 licenses are occupied by other running pam-crash jobs.
With consumable resources, you can achieve this goal easily. First you must add the number of available pam-crash licenses as a global consumable resource to the complex configuration.
The name of the consumable attribute is set to pam-crash. You can use the shortcut pc instead of the full name in the qalter -l, qselect -l, qsh -l, qstat -l, and qsub -l commands.
The attribute type is defined to be an integer counter.
The Requestable flag is set to FORCED. This setting requires users to specify at submission time how many pam-crash licenses their job will occupy.
The Consumable flag specifies that the attribute is a consumable resource.
The Default setting is irrelevant because Requestable is set to FORCED, which means that every job must supply an explicit request value for this attribute.
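Taken together, these settings correspond to a complex configuration entry along the following lines (a sketch; the exact column layout can vary between grid engine versions):

```
#name       shortcut  type  relop  requestable  consumable  default
pam-crash   pc        INT   <=     FORCED       YES         0
```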
Consumables receive their value from the global, host, or queue configurations through the complex_values lists. See the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.
To activate resource planning for this attribute and for the cluster, the number of available pam-crash licenses must be defined in the global host configuration.
The value for the attribute pam-crash is set to 10, corresponding to 10 floating licenses.
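The same capacity can be defined from the command line by editing the global host configuration with qconf -me global so that its complex_values list reads:

```
complex_values            pam-crash=10
```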
Assume that a user submits the following job:
% qsub -l pc=1 pam-crash.sh
The job starts only if fewer than 10 pam-crash licenses are currently occupied. The job can run anywhere in the cluster, however, and the job occupies one pam-crash license throughout its run time.
A host in your cluster might not be able to take part in the floating license. For example, you might not have pam-crash binaries for that host. In such a case, you can exclude the host from the pam-crash license management by setting the host-related capacity for the consumable attribute pam-crash to zero. Use the Execution Host tab of the Host Configuration dialog box.
The pam-crash attribute is implicitly available to the execution host because the global attributes of the complex are inherited by all execution hosts. Instead of setting the capacity to zero, you could also restrict the number of licenses that a host can manage to a nonzero value such as two. In this case, a maximum of two pam-crash jobs could coexist on that host.
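From the command line, the corresponding setting is made in the execution host configuration (qconf -me <hostname>) of the host in question:

```
complex_values            pam-crash=0
```

Replacing the 0 with, for example, 2 would instead cap that host at two concurrent pam-crash jobs.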
Similarly, you might want to prevent a certain queue from running pam-crash jobs. For example, the queue might be an express queue with memory and CPU-time limits not suitable for pam-crash. In this case, set the corresponding capacity to zero in the queue configuration, as shown in the following figure.
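Assuming the express queue is named express.q (the name is a placeholder), the relevant fragment of its queue configuration, edited with qconf -mq express.q, would be:

```
qname                     express.q
complex_values            pam-crash=0
```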
Administrators must often tune a system to avoid performance degradation caused by memory oversubscription and the swapping that results from it. The grid engine software can support you in this task through the Consumable Resources facility.
The standard load parameter virtual_free reports the available free virtual memory, that is, the combination of available swap space and the available physical memory. To avoid swapping, the use of swap space must be minimized. In an ideal case, all the memory required by all processes running on a host should fit into physical memory.
The grid engine software can guarantee the availability of required memory for all jobs started through the grid engine system, given the following assumptions and configurations:
virtual_free is configured as a consumable resource, and its capacity on each host is set to the available physical memory, or lower.
Jobs request their anticipated memory usage, and the value that jobs request is not exceeded during run time.
In the virtual_free resource definition example, the Requestable flag is set to YES instead of FORCED, as it was in the global configuration example. This means that users need not indicate the memory requirements of their jobs. If an explicit memory request is missing, the value in the Default field is used instead. A default request of 1 Gbyte in this case means that a job without an explicit request is assumed to occupy all available physical memory.
virtual_free is one of the standard load parameters of the grid engine system, so recent memory statistics are automatically taken into account in the virtual memory capacity planning. If the load report for free virtual memory falls below the value obtained by the internal bookkeeping, the load value is used instead, to avoid memory oversubscription. Differences between the reported load values and the internal bookkeeping can easily occur if jobs are started without using the grid engine system.
If you run different job classes with different memory requirements on one machine, you might want to partition the memory that these job classes use. This functionality is called space sharing. You can accomplish this functionality by configuring a queue for each job class. Then you assign to each queue a portion of the total memory on that host.
In the example, the queue configuration attaches half of the total memory available on host carc to the queue fast.q on that host. Hence the accumulated memory consumption of all jobs that are running in queue fast.q on host carc cannot exceed 500 Mbytes. Jobs in other queues are not taken into account. Nonetheless, the total memory consumption of all running jobs on host carc cannot exceed 1 Gbyte.
The attribute virtual_free is available to all queues through inheritance from the complex.
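Based on this example, the relevant fragment of the fast.q queue configuration would look like the following sketch, while host carc's own capacity of 1 Gbyte remains in its host configuration:

```
qname                     fast.q
complex_values            virtual_free=500M
```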
Users might submit jobs to a system configured similarly to the example in either of the following forms:
% qsub -l vf=100M honest.sh
% qsub dont_care.sh
The job submitted by the first command can be started as soon as at least 100 Mbytes of memory are available. This amount is taken into account in the capacity planning for the virtual_free consumable resource. The second job runs only if no other job is on the system, as the second job implicitly requests all the available memory. In addition, the second job cannot run in queue fast.q because the job exceeds the queue's memory capacity.
Some applications need to manipulate huge data sets stored in files. Such applications therefore depend on the availability of sufficient disk space throughout their run time. This requirement is similar to the space sharing of available memory, as discussed in the preceding example. The main difference is that the grid engine system does not provide free disk space as one of its standard load parameters. Free disk space is not a standard load parameter because disks are usually partitioned into file systems in a site-specific way. Site-specific partitioning does not allow identifying the file system of interest automatically.
First, the h_fsize attribute must be configured as a consumable resource, as shown in the following figure.
In the case of local host file systems, a reasonable capacity definition for the disk space consumable can be put in the host configuration, as shown in the following figure.
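For example, a host with 20 Gbytes of free space on its local scratch file system (the figure is a placeholder) could carry the following entry in its host configuration:

```
complex_values            h_fsize=20G
```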
Submission of jobs to a grid engine system that is configured as described here works similarly to the previous examples:
% qsub -l hf=5G big-sort.sh
The h_fsize attribute is recommended here because h_fsize is also used as the hard file size limit in the queue configuration. The file size limit restricts the ability of jobs to create files that are larger than the size specified during job submission. The qsub command in this example specifies a file size limit of 5 Gbytes. If the job does not request the attribute, the corresponding value from the queue configuration or host configuration is used. If the Requestable flag for h_fsize is set to FORCED in the example, a request must be included in the qsub command. If the Requestable flag is not set, a request is optional in the qsub command.
By using the queue limit as the consumable resource, you control the requests that users specify rather than the real resource consumption of the job scripts. Any violation of the limit is penalized, eventually by aborting the job. The queue limit thus ensures that the resource requests on which the internal capacity planning of the grid engine system is based are reliable. See the queue_conf(5) and setrlimit(2) man pages for details.
Some operating systems provide only per-process file size limits. In this case, a job might create multiple files with a size up to the limit. On systems that support per-job file size limitation, the grid engine system uses this functionality with the h_fsize attribute. See the queue_conf(5) man page for further details.
Applications that are not submitted to the grid engine system might occupy disk space concurrently. In that case, the internal bookkeeping alone is not sufficient to prevent application failures caused by a lack of disk space. To avoid this problem, you can periodically collect statistics about disk space usage that reflect total disk space consumption, including consumption that occurs outside the grid engine system.
The load sensor interface enables you to enhance the set of standard load parameters with site-specific information, such as the available disk space on a file system. See Adding Site-Specific Load Parameters for more information.
By adding an appropriate load sensor and reporting free disk space for h_fsize, you can combine consumable resource management and resource availability statistics. The grid engine system compares job requirements for disk space with the available capacity and with the most recent reported load value. Available capacity is derived from the internal resource planning. Jobs get dispatched to a host only if both criteria are met.
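Such a load sensor might be sketched as follows. The monitored path (defaulting to /tmp here), the df parsing, and the simulated input at the end are assumptions for illustration, not part of the grid engine distribution; adapt them to your site's scratch file system.

```shell
#!/bin/sh
# Sketch of a site-specific load sensor that reports the free space on a
# scratch file system as the h_fsize load value.
SCRATCH=${SCRATCH:-/tmp}
HOST=`hostname`

report_loop() {
    # The execution daemon writes a line to stdin whenever a load report
    # is due, and the literal word "quit" when the sensor should stop.
    while read input; do
        if [ "$input" = "quit" ]; then
            break
        fi
        # Free kilobytes: column 4 of the second line of "df -k" output.
        free=`df -k "$SCRATCH" | awk 'NR==2 {print $4}'`
        # Each load report is framed by "begin" and "end".
        echo "begin"
        echo "$HOST:h_fsize:${free}K"
        echo "end"
    done
}

# Simulate one load interval followed by shutdown:
printf '\nquit\n' | report_loop
```

In production, the final line would simply call report_loop so that the sensor is driven by the execution daemon rather than by simulated input.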