This chapter describes how to configure resource attribute definitions. Resource attribute definitions are stored in an entity called the grid engine system complex. In addition to background information relating to the complex and its associated concepts, this chapter provides detailed instructions for configuring resource attributes with QMON and from the command line, setting up consumable resources, and writing your own load sensors.
The complex configuration provides all pertinent information about the resource attributes users can request for jobs with the qsub -l or qalter -l commands. The complex configuration also provides information about how the grid engine system should interpret these resource attributes.
The complex also provides the framework for the system's consumable resources facility. The resource attributes that are defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The attached attribute identifies a resource with the associated capability. During the scheduling process, the availability of resources and the job requirements are taken into account. The grid engine system also performs the bookkeeping and the capacity planning that is required to prevent oversubscription of consumable resources.
Typical consumable resource attributes include:
Available free memory
Unoccupied licenses of a software package
Free disk space
Available bandwidth on a network connection
Attribute definitions in the grid engine complex define how resource attributes should be interpreted. Each definition includes the following:
Name of the attribute
Shortcut to reference the attribute name
Value type of the attribute, for example, STRING or TIME
Relational operator used by the scheduler
Requestable flag, which determines whether users can request the attribute for a job
Consumable flag, which identifies the attribute as a consumable resource
Default request value that is taken into account for consumable attributes if jobs do not explicitly specify a request for the attribute
Urgency value, which determines job priorities on a per resource basis
Use the QMON Complex Configuration dialog box, which is shown in Figure 3–1, to define complex resource attributes.
In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears.
The Complex Configuration dialog box enables you to add, modify, or delete complex resource attributes.
To add a new attribute, first make sure that no line in the Attributes table is selected. In the fields above the Attributes table, type or select the values that you want, and then click Add.
If you want to add a new attribute and an existing attribute is selected, you must clear the selection. To deselect a highlighted attribute, hold down the Control key and click mouse button 1.
You can add a new attribute by copying an existing attribute and then modifying it. Make sure that the attribute name and its shortcut are unique.
To modify an attribute listed in the Attributes table, select it. The values of the selected attribute are displayed above the Attributes table. Change the attribute values, and then click Modify.
To save configuration changes to a file, click Save. To load values from a file into the complex configuration, click Load, and then select the name of a file from the list that appears.
To delete an attribute in the Attributes table, select it, and then click Delete.
See the complex(5) man page for details about the meaning of the rows and columns in the table.
To register your new or modified complex configuration with sge_qmaster, click Commit.
Resource attributes can be used in the following ways:
As queue resource attributes
As host resource attributes
As global resource attributes
A set of default resource attributes is already attached to each queue and host. Default resource attributes are built into the system and cannot be deleted, nor can their type be changed.
User-defined resource attributes must first be defined in the complex before you can assign them to a queue instance, a host, or the global cluster. When you assign a resource attribute to one of these targets, you specify a value for the attribute.
The following sections describe each attribute type in detail.
Default queue resource attributes are a set of parameters that are defined in the queue configuration. These parameters are described in the queue_conf(5) man page.
You can add new resource attributes to the default attributes. New attributes are attached only to the queue instances that you modify. When the configuration of a particular queue instance references a resource attribute that is defined in the complex, that queue configuration provides the values for the attribute definition. For details about queue configuration, see Configuring Queues.
For example, the queue configuration value h_vmem is used for the virtual memory size limit. This value limits the amount of total memory that each job can consume. An entry in the complex_values list of the queue configuration defines the total available amount of virtual memory on a host or assigned to a queue. For detailed information about consumable resources, see Consumable Resources.
Host resource attributes are parameters that are intended to be managed on a host basis.
The default host-related attributes are load values. You can add new resource attributes to the default attributes, as described earlier in Queue Resource Attributes.
Every sge_execd periodically reports load to sge_qmaster. The reported load values are either the standard load values such as the CPU load average, or the load values defined by the administrator, as described in Load Parameters.
The definitions of the standard load values are part of the default host resource attributes, whereas administrator-defined load values require extending the host resource attributes.
Host-related attributes are commonly extended to include nonstandard load parameters. Host-related attributes are also extended to manage host-related resources such as the number of software licenses that are assigned to a host, or the available disk space on a host's local file system.
If host-related attributes are associated with a host or with a queue instance on that host, a concrete value for a particular host resource attribute is determined by one of the following items:
The queue configuration, if the attribute is also assigned to the queue configuration
A reported load value
The explicit definition of a value in the complex_values entry of the corresponding host configuration. For details, see Configuring Hosts.
In some cases, none of these values is available. For example, the attribute might be intended as a load parameter, but sge_execd does not report a load value for it. In such cases, the attribute is not defined, and the qstat -F command shows that the attribute is not applicable.
For example, the total free virtual memory attribute h_vmem is defined in the queue configuration as a limit and is also reported as a standard load parameter. The total available amount of virtual memory on a host can be defined in the complex_values list of that host. The total available amount of virtual memory attached to a queue instance on that host can be defined in the complex_values list of that queue instance. If you also define h_vmem as a consumable resource, you can exploit the memory of a machine efficiently without risking memory oversubscription, which often reduces system performance because of swapping. For more information about consumable resources, see Consumable Resources.
Only the Shortcut, Relation, Requestable, Consumable, and Default columns can be changed for the default resource attributes. No default attributes can be deleted.
Global resource attributes are cluster-wide resource attributes, such as available network bandwidth of a file server or the free disk space on a network-wide available file system.
Global resource attributes can also be associated with load reports if the corresponding load report contains the GLOBAL identifier, as described in Load Parameters. Global load values can be reported from any host in the cluster. No global load values are reported by default, so there are no default global resource attributes.
Concrete values for global resource attributes are determined by the following items:
Global load reports.
Explicit definition in the complex_values parameter of the global host configuration. See Configuring Hosts.
In association with a particular host or queue and an explicit definition in the corresponding complex_values lists.
Sometimes none of these cases apply. For example, a load value might not yet be reported. In such cases, the attribute does not exist.
By adding resource attributes to the complex, the administrator can extend the set of attributes managed by the grid engine system. The administrator can also restrict the influence of user-defined attributes to particular queues, hosts, or both.
User-defined attributes are a named collection of attributes with the corresponding definitions as to how the grid engine software is to handle these attributes. You can attach one or more user-defined attributes to a queue, to a host, or globally to all hosts in the cluster. Use the complex_values parameter for the queue configuration and the host configuration. For more information, see Configuring Queues and Configuring Hosts. The attributes defined become available to the queue and to the host, respectively, in addition to the default resource attributes.
For example, say the user-defined resource attributes permas, pamcrash, and nastran, shown in the following figure, are defined.
For one or more queues, add the resource attributes to the list of associated user-defined attributes, as shown in the Complex tab of the Modify queue-name dialog box. For details on how to configure queues, see Configuring Queues and its related sections.
Then the displayed queue is configured to manage up to 10 licenses of the software package permas. Furthermore, the attribute permas becomes requestable for jobs, as expressed in the Available Resources list in the Requested Resources dialog box.
For details about how to submit jobs, see Chapter 3, Submitting Jobs, in Sun N1 Grid Engine 6.1 User’s Guide.
Alternatively, the user could submit jobs from the command line and could request attributes as follows:
% qsub -l pm=1 permas.sh
You can use the pm shortcut instead of the full attribute name permas.
Consequently, the only eligible queues for these jobs are the queues that are associated with the user-defined resource attributes and that have permas licenses configured and available.
Consumable resources provide an efficient way to manage limited resources such as available memory, free space on a file system, network bandwidth, or floating software licenses. Consumable resources are also called consumables. The total available capacity of a consumable is defined by the administrator. The consumption of the corresponding resource is monitored by grid engine software internal bookkeeping. The grid engine system accounts for the consumption of this resource for all running jobs. Jobs are dispatched only if the internal bookkeeping indicates that sufficient consumable resources are available.
Consumables can be combined with default load parameters or user-defined load parameters. Load values can be reported for consumable attributes. Conversely, the Consumable flag can be set for load attributes. Load measures the availability of the resource. Consumable resource management takes both the load and the internal bookkeeping into account, ensuring that neither exceeds a given limit. For more information about load parameters, see Load Parameters.
To enable consumable resource management, you must define the total capacity of a resource. You can define resource capacity globally for the cluster, for specified hosts, and for specified queues. Each of these levels can further restrict the level before it: a host can restrict the availability of a global resource, and a queue can restrict both host resources and global resources.
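The restriction rule can be sketched as a small shell function. This is a simplified model, not the scheduler's actual code: the hypothetical function effective_free takes the free amount of one consumable at the global, host, and queue levels (an empty argument means that level defines no capacity) and prints the smallest, because a job must fit at every level that defines the resource.

```shell
# Hypothetical helper (simplified model): print the amount of a consumable
# that a job can use, given the free amount at each configured level
# (global, host, queue). An empty argument means the level defines no
# capacity for the resource and therefore does not restrict the job.
effective_free() {
  result=""
  for level in "$@"; do
    [ -z "$level" ] && continue
    if [ -z "$result" ] || [ "$level" -lt "$result" ]; then
      result=$level
    fi
  done
  # No level defines a capacity: the resource is not managed as a consumable.
  echo "${result:-unmanaged}"
}

# 10 licenses free globally, the host limited to 2, no queue-level value.
effective_free 10 2 ""   # prints 2
```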
You define resource capacities by using the complex_values attribute in the queue and host configurations. The complex_values definition of the global host specifies global cluster consumable settings. For more information, see the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.
Each consumable attribute in a complex_values list is assigned a value that denotes the maximum available amount of that resource. The internal bookkeeping subtracts from this total the assumed resource consumption of all running jobs, as expressed through the jobs' resource requests.
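As a minimal sketch of this subtraction, assuming integer quantities such as licenses, the hypothetical shell function remaining_capacity takes the capacity from complex_values followed by the requests of the running jobs.

```shell
# Hypothetical helper: capacity from complex_values minus the sum of the
# requests of all running jobs. Assumes integer quantities (for example,
# license counts).
remaining_capacity() {
  capacity=$1
  shift
  used=0
  for request in "$@"; do
    used=$((used + request))
  done
  echo $((capacity - used))
}

# 10 licenses, three running jobs requesting 1, 1, and 3 licenses.
remaining_capacity 10 1 1 3   # prints 5
```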
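For parallel jobs, a consumable request applies to each slot. For example, the following command requests 100 Mbytes of the mem consumable for each of the 8 slots of a job in the make parallel environment:
% qsub -l mem=100M -pe make 8
The total of 800 Mbytes is split across the queues and hosts on which the job runs. If four tasks run on host A and four tasks run on host B, the job consumes 400 Mbytes on each host.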
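The per-host figures in this example follow from multiplying the per-slot request by the number of slots on each host. The hypothetical function per_host_usage sketches that arithmetic, assuming the request is given in Mbytes and the slot distribution as host:slots pairs.

```shell
# Hypothetical helper: per-host consumption of a parallel job, assuming the
# request applies per slot. Arguments: per-slot request in Mbytes, then
# host:slots pairs.
per_host_usage() {
  per_slot=$1
  shift
  for pair in "$@"; do
    host=${pair%%:*}     # text before the colon
    slots=${pair#*:}     # text after the colon
    echo "$host $((per_slot * slots))M"
  done
}

# 100 Mbytes per slot, four slots on host A and four on host B.
per_host_usage 100 A:4 B:4
```

Each output line shows one host and the memory that the job consumes there; the two lines together add up to the 800 Mbytes requested for all 8 slots.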
Only numeric attributes can be configured as consumables. Numeric attributes are attributes whose type is INT, DOUBLE, MEMORY, or TIME.
In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears, as shown in Figure 3–1.
To enable the consumable management for an attribute, set the Consumable flag for the attribute in the complex configuration. For example, the following figure shows that the Consumable flag is set for the virtual_free memory resource.
Set up other consumable resources as needed, using the examples in the following sections as a guide.
Then, for each queue or host for which you want the grid engine software to do the required capacity planning, you must define the capacity in a complex_values list. An example is shown in the following figure, where 1 Gbyte of virtual memory is defined as the capacity value of the current host.
The virtual memory requirements of all jobs running concurrently in any queue on that host are accumulated. The requirements are then subtracted from the capacity of 1 Gbyte to determine available virtual memory. If a job request for virtual_free exceeds the available amount, the job is not dispatched to a queue on that host.
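The following sketch models this check for a single host. The helper names mem_to_bytes and fits_on_host are hypothetical. mem_to_bytes assumes the MEMORY multipliers described in complex(5), where k, m, and g are powers of 1000 and K, M, and G are powers of 1024.

```shell
# Hypothetical helper: convert a MEMORY value to bytes, assuming the
# multipliers k/m/g (powers of 1000) and K/M/G (powers of 1024).
mem_to_bytes() {
  value=$1
  case $value in
    *k) echo $(( ${value%k} * 1000 )) ;;
    *K) echo $(( ${value%K} * 1024 )) ;;
    *m) echo $(( ${value%m} * 1000000 )) ;;
    *M) echo $(( ${value%M} * 1048576 )) ;;
    *g) echo $(( ${value%g} * 1000000000 )) ;;
    *G) echo $(( ${value%G} * 1073741824 )) ;;
    *)  echo $(( value )) ;;
  esac
}

# Hypothetical helper: succeed if a virtual_free request fits into the host
# capacity after subtracting the requests of the jobs already running there.
# Usage: fits_on_host CAPACITY REQUEST [RUNNING_REQUEST...]
fits_on_host() {
  free=$(mem_to_bytes "$1")
  request=$(mem_to_bytes "$2")
  shift 2
  for running in "$@"; do
    free=$((free - $(mem_to_bytes "$running")))
  done
  [ "$request" -le "$free" ]
}

# 1 Gbyte capacity, jobs using 200M and 400M running: 424M left.
if fits_on_host 1G 300M 200M 400M; then echo dispatch; else echo wait; fi   # prints dispatch
```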
Jobs can be forced to request a resource and thus to specify their assumed consumption through the FORCED value of the Requestable parameter.
For consumable attributes that a job does not explicitly request, the administrator can predefine a default value for resource consumption. Doing so is meaningful only if requesting the attribute is not forced, as explained in the previous note. In the example, 200 Mbytes is set as the default value.
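How a job's effective request follows from the Requestable flag and the Default value can be summarized in a short sketch. The function resolve_request is hypothetical and only illustrates the rules stated here; qsub and the scheduler perform these checks themselves.

```shell
# Hypothetical helper illustrating how one job's consumable request is
# resolved. Arguments:
#   $1  Requestable flag (YES, NO, or FORCED)
#   $2  explicit request from qsub -l (empty if none)
#   $3  Default value from the complex
resolve_request() {
  requestable=$1 explicit=$2 default=$3
  if [ -n "$explicit" ]; then
    if [ "$requestable" = NO ]; then
      echo "error: attribute is not requestable" >&2
      return 1
    fi
    echo "$explicit"
  elif [ "$requestable" = FORCED ]; then
    # A forced attribute has no usable default: the job must request it.
    echo "error: request is forced but missing" >&2
    return 1
  else
    echo "$default"
  fi
}

resolve_request YES "" 200M   # prints 200M
```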
Use the following examples to guide you in setting up consumable resources for your site.
Suppose you are using the software package pam-crash in your cluster, and you have access to 10 floating licenses. You can use pam-crash on every system as long as no more than 10 invocations of the software are active. The goal is to configure the grid engine system in a way that prevents scheduling pam-crash jobs while all 10 licenses are occupied by other running pam-crash jobs.
With consumable resources, you can achieve this goal easily. First you must add the number of available pam-crash licenses as a global consumable resource to the complex configuration.
The name of the consumable attribute is set to pam-crash. Instead of the full name, you can use the shortcut pc in the qalter -l, qselect -l, qsh -l, qstat -l, or qsub -l commands.
The attribute type is defined to be an integer counter.
The Requestable flag is set to FORCED. This setting means that users must state, when they submit a job, how many pam-crash licenses the job will occupy.
The Consumable flag specifies that the attribute is a consumable resource.
The Default setting is irrelevant because Requestable is set to FORCED, which means that every job must include a request value for this attribute.
Consumables receive their value from the global, host, or queue configurations through the complex_values lists. See the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.
To activate resource planning for this attribute and for the cluster, the number of available pam-crash licenses must be defined in the global host configuration.
The value for the attribute pam-crash is set to 10, corresponding to 10 floating licenses.
Assume that a user submits the following job:
% qsub -l pc=1 pam-crash.sh
The job starts only if fewer than 10 pam-crash licenses are currently occupied. The job can run anywhere in the cluster, and it occupies one pam-crash license throughout its run time.
Some hosts in the cluster might not be able to use the floating licenses. For example, you might not have pam-crash binaries for one of your hosts. In such a case, you can exclude that host from pam-crash license management by setting the host's capacity for the consumable attribute pam-crash to zero. Use the Execution Host tab of the Host Configuration dialog box.
The pam-crash attribute is implicitly available to the execution host because the global attributes of the complex are inherited by all execution hosts. Instead of zero, you could also set the host capacity to a nonzero value such as two to restrict the number of licenses that the host can use. In this case, a maximum of two pam-crash jobs could coexist on that host.
Similarly, you might want to prevent a certain queue from running pam-crash jobs. For example, the queue might be an express queue with memory and CPU-time limits not suitable for pam-crash. In this case, set the corresponding capacity to zero in the queue configuration, as shown in the following figure.
Administrators must often tune a system to avoid performance degradation caused by memory oversubscription and the resulting swapping. The grid engine software can support you in this task through the Consumable Resources facility.
The standard load parameter virtual_free reports the available free virtual memory, that is, the combination of available swap space and the available physical memory. To avoid swapping, the use of swap space must be minimized. In an ideal case, all the memory required by all processes running on a host should fit into physical memory.
The grid engine software can guarantee the availability of required memory for all jobs started through the grid engine system, given the following assumptions and configurations:
virtual_free is configured as a consumable resource, and its capacity on each host is set to the available physical memory, or lower.
Jobs request their anticipated memory usage, and the value that jobs request is not exceeded during run time.
In the virtual_free resource definition example, the Requestable flag is set to YES instead of FORCED, as it was in the pam-crash global configuration example. This means that users need not indicate the memory requirements of their jobs. The value in the Default field is used if an explicit memory request is missing. A default request of 1 Gbyte in this case means that a job without a request is assumed to occupy all available physical memory.
virtual_free is one of the standard load parameters of the grid engine system. The system automatically takes the most recent memory statistics into account in virtual memory capacity planning. If the reported load value for free virtual memory falls below the value obtained by the internal bookkeeping of the grid engine software, the load value is used to avoid memory oversubscription. Differences between the reported load values and the internal bookkeeping can easily occur if jobs are started without using the grid engine system.
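The rule that the smaller of the two figures is used can be sketched as follows. The function name effective_available is hypothetical, and all values are plain Mbytes to keep the arithmetic simple.

```shell
# Hypothetical helper: for a consumable that is also reported as a load
# value, use the smaller of (capacity - booked requests) and the most
# recently reported load value. All values in Mbytes.
effective_available() {
  capacity=$1 booked=$2 reported=$3
  left=$((capacity - booked))
  if [ "$reported" -lt "$left" ]; then echo "$reported"; else echo "$left"; fi
}

# 1024 Mbytes capacity and 300 Mbytes booked by grid engine jobs, but only
# 500 Mbytes reported free because of a job started outside the system.
effective_available 1024 300 500   # prints 500
```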
If you run different job classes with different memory requirements on one machine, you might want to partition the memory that these job classes use. This functionality is called space sharing. You can accomplish this functionality by configuring a queue for each job class. Then you assign to each queue a portion of the total memory on that host.
In the example, the queue configuration assigns half of the total memory available on host carc to the fast.q queue instance on that host. Hence the accumulated memory consumption of all jobs that are running in queue fast.q on host carc cannot exceed 500 Mbytes. Jobs in other queues are not taken into account. Nonetheless, the total memory consumption of all running jobs on host carc cannot exceed 1 Gbyte.
The attribute virtual_free is available to all queues through inheritance from the complex.
Users might submit jobs to a system configured similarly to the example in either of the following forms:
% qsub -l vf=100M honest.sh
% qsub dont_care.sh
The job submitted by the first command can be started as soon as at least 100 Mbytes of memory are available. This amount is taken into account in the capacity planning for the virtual_free consumable resource. The second job runs only if no other job is on the system, as the second job implicitly requests all the available memory. In addition, the second job cannot run in queue fast.q because the job exceeds the queue's memory capacity.
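The two submissions can be traced with a sketch of the space-sharing check, assuming values in Mbytes, a capacity of 500 Mbytes for fast.q on carc, and 1 Gbyte (1024 Mbytes) for the host. fits_queue_and_host is a hypothetical helper that checks only the virtual_free capacities, not any other scheduling criteria.

```shell
# Hypothetical helper: a job fits fast.q on carc only if its virtual_free
# request fits both the queue instance capacity and the host capacity.
# Arguments (Mbytes): request queue_cap queue_used host_cap host_used
fits_queue_and_host() {
  request=$1 queue_cap=$2 queue_used=$3 host_cap=$4 host_used=$5
  [ "$request" -le $((queue_cap - queue_used)) ] &&
    [ "$request" -le $((host_cap - host_used)) ]
}

show_eligibility() {
  if fits_queue_and_host "$@"; then echo eligible; else echo "not eligible"; fi
}

show_eligibility 100 500 0 1024 0    # honest.sh (vf=100M): prints eligible
show_eligibility 1024 500 0 1024 0   # dont_care.sh (default 1G): prints not eligible
```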
Some applications need to manipulate huge data sets stored in files. Such applications therefore depend on the availability of sufficient disk space throughout their run time. This requirement is similar to the space sharing of available memory, as discussed in the preceding example. The main difference is that the grid engine system does not provide free disk space as one of its standard load parameters. Free disk space is not a standard load parameter because disks are usually partitioned into file systems in a site-specific way. Site-specific partitioning does not allow identifying the file system of interest automatically.
First, the attribute must be configured as a consumable resource, as shown in the following figure.
In the case of local host file systems, a reasonable capacity definition for the disk space consumable can be put in the host configuration, as shown in the following figure.
Submission of jobs to a grid engine system that is configured as described here works similarly to the previous examples:
% qsub -l hf=5G big-sort.sh
The h_fsize attribute is recommended here because h_fsize is also used as the hard file size limit in the queue configuration. The file size limit prevents jobs from creating files that are larger than what is specified during job submission. The qsub command in this example specifies a file size limit of 5 Gbytes. If the job does not request the attribute, the corresponding value from the queue configuration or host configuration is used. If the Requestable flag for h_fsize is set to FORCED, a request must be included in the qsub command. If the Requestable flag is set to YES, a request is optional.
By using the queue limit as the consumable resource, you control requests that the user specifies instead of the real resource consumption by the job scripts. Any violation of the limit is sanctioned, which eventually aborts the job. The queue limit ensures that the resource requests on which the grid engine system internal capacity planning is based are reliable. See the queue_conf(5) and the setrlimit(2) man pages for details.
Some operating systems provide only per-process file size limits. In this case, a job might create multiple files with a size up to the limit. On systems that support per-job file size limitation, the grid engine system uses this functionality with the h_fsize attribute. See the queue_conf(5) man page for further details.
You might want applications that are not submitted to the grid engine system to occupy disk space concurrently. If so, the internal bookkeeping might not be sufficient to prevent application failure due to lack of disk space. To avoid this problem, you can periodically receive statistics about disk space usage that indicate total disk space consumption, including consumption that occurs outside the grid engine system.
The load sensor interface enables you to enhance the set of standard load parameters with site-specific information, such as the available disk space on a file system. See Adding Site-Specific Load Parameters for more information.
By adding an appropriate load sensor and reporting free disk space for h_fsize, you can combine consumable resource management and resource availability statistics. The grid engine system compares job requirements for disk space with the available capacity and with the most recent reported load value. Available capacity is derived from the internal resource planning. Jobs get dispatched to a host only if both criteria are met.
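A minimal load sensor for free disk space might look like the following Bourne shell sketch. It assumes that the file system of interest is mounted at the directory in SCRATCH_DIR (a hypothetical variable, /tmp by default), that df -Pk reports the available space in 1024-byte blocks in its fourth column, and that h_fsize is the attribute name used in the complex. The function name disk_sensor is hypothetical.

```shell
#!/bin/sh
# Hypothetical load sensor sketch: report free disk space as h_fsize.
# SCRATCH_DIR is an assumed variable naming the file system of interest.
SCRATCH_DIR=${SCRATCH_DIR:-/tmp}
myhost=`uname -n`

disk_sensor() {
  # Wait for a line on STDIN; each line starts one retrieval cycle.
  while read -r input; do
    if [ "$input" = quit ]; then
      return 0
    fi
    # The second line of POSIX df output holds the numbers; column 4 is
    # the available space in 1024-byte blocks.
    avail=`df -Pk "$SCRATCH_DIR" | awk 'NR == 2 { print $4 }'`
    echo begin
    # K multiplies by 1024 for MEMORY values.
    echo "$myhost:h_fsize:${avail}K"
    echo end
  done
  # read failed: STDIN was closed.
  return 1
}
```

To install it, save the function in a script that ends with a call to disk_sensor and configure that script as the load_sensor, as for any other load sensor.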
To configure the complex from the command line, type the following command with appropriate options:
% qconf options
See the qconf(1) man page for a detailed definition of the qconf command format and the valid syntax.
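The following qconf options enable you to display or modify the grid engine system complex:
-sc displays the current complex configuration.
-mc opens the complex configuration in an editor so that you can modify it.
-Mc filename replaces the complex configuration with the contents of filename.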
The following command prints the current complex configuration to the standard output stream in the file format defined in the complex(5) man page:
% qconf -sc
A sample output is shown in the following example.
#name      shortcut  type  relop  requestable  consumable  default  urgency
#---------------------------------------------------------------------------
nastran    na        INT   <=     YES          NO          0        0
pam-crash  pc        INT   <=     YES          YES         1        0
permas     pm        INT   <=     FORCED       YES         1        0
#---- # start a comment but comments are not saved across edits -----------
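To pick out the consumable attributes from this output in a script, a filter like the following sketch can be used. list_consumables is a hypothetical name; it assumes the column order shown in the sample, where the sixth column holds the Consumable flag and comment lines start with #.

```shell
# Hypothetical filter: print the names of consumable attributes found in
# complex(5)-formatted text on STDIN, for example: qconf -sc | list_consumables
list_consumables() {
  awk '!/^#/ && NF >= 8 && $6 == "YES" { print $1 }'
}

printf '%s\n' 'nastran na INT <= YES NO 0 0' 'pam-crash pc INT <= YES YES 1 0' |
  list_consumables   # prints pam-crash
```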
This section explains the grid engine system's load parameters. Instructions are included for writing your own load sensors.
By default, sge_execd periodically reports several load parameters and their corresponding values to sge_qmaster. These values are stored in the sge_qmaster internal host object, which is described in About Hosts and Daemons. However, the values are used internally only if a complex resource attribute with a corresponding name is defined. Such complex resource attributes contain the definition as to how load values are to be interpreted. See Assigning Resource Attributes to Queues, Hosts, and the Global Cluster for more information.
After the primary installation, a standard set of load parameters is reported. All attributes required for the standard load parameters are defined as host-related attributes. Subsequent releases of N1 Grid Engine 6.1 software may provide extended sets of default load parameters; therefore, the set of load parameters that is reported by default is documented in the file sge-root/doc/load_parameters.asc.
How load attributes are defined determines their accessibility. By defining load parameters as global resource attributes, you make them available for the entire cluster and for all hosts. By defining load parameters as host-related attributes, you provide the attributes for all hosts but not for the global cluster.
Do not define load attributes as queue attributes. Queue attributes would not be available to any host nor to the cluster.
The set of default load parameters might not be adequate to completely describe the load situation in a cluster. This possibility is especially likely with respect to site-specific policies, applications, and configurations. Therefore grid engine software provides the means to extend the set of load parameters. For this purpose, sge_execd offers an interface to feed load parameters and the current load values into sge_execd. Afterwards, these parameters are treated like the default load parameters. As for the default load parameters, corresponding attributes must be defined in the complex for the site-specific load parameters to become effective. See Default Load Parameters for more information.
To feed sge_execd with additional load information, you must supply a load sensor. The load sensor can be a script or a binary executable. In either case, the load sensor's handling of the standard input and standard output streams and its control flow must comply with the following rules:
The load sensor must be written as an infinite loop that waits at a certain point for input from STDIN.
If the string quit is read from STDIN, the load sensor is supposed to exit.
As soon as an end-of-line is read from STDIN, a retrieval cycle for loading data is supposed to start.
The load sensor then performs whatever operation is necessary to compute the desired load figures. At the end of the cycle, the load sensor writes the result to STDOUT.
If load retrieval takes a long time, the load measurement process can be started immediately after sending a load report. The load values are then ready to be sent as soon as the next end-of-line is received.
A load report that the load sensor writes must have the following format:
A load value report starts with a line that contains nothing but the word begin.
Individual load values are separated by newlines.
Each load value consists of three parts separated by colons (:) and contains no blanks.
The first part of a load value is either the name of the host for which load is reported or the special name global.
The second part of a load value is the symbolic name of the load value, as defined in the complex. See the complex(5) man page for details. If a load value is reported for which no entry in the complex exists, the reported load value is not used.
The third part of a load value is the measured load value. A load value report ends with a line that contains the word end.
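A report can be checked against these rules with a small filter, for example while testing a new load sensor. check_load_report is a hypothetical helper written in shell and awk; it checks only the structure of one report, not whether the names exist in the complex.

```shell
# Hypothetical checker: read one report from STDIN and succeed only if it
# starts with "begin", ends with "end", and every line in between has the
# form host:name:value with no blanks.
check_load_report() {
  awk '
    NR == 1       { if ($0 != "begin") bad = 1; next }
    ended         { bad = 1; next }          # nothing may follow "end"
    $0 == "end"   { ended = 1; next }
    !/^[^: \t]+:[^: \t]+:[^: \t]+$/ { bad = 1 }
    END           { if (bad || !ended) exit 1; exit 0 }
  '
}

printf 'begin\nmyhost:logins:3\nend\n' | check_load_report && echo valid   # prints valid
```

The same filter can be applied to the output of a sensor driven from a pipe, for example by sending it one empty line followed by quit.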
The following example shows a load sensor. The load sensor is a Bourne shell script.
#!/bin/sh

myhost=`uname -n`

while [ 1 ]; do
  # wait for input
  read input
  result=$?
  if [ $result != 0 ]; then
    exit 1
  fi
  if [ "$input" = quit ]; then
    exit 0
  fi

  # send number of users logged in
  logins=`who | cut -f1 -d" " | sort | uniq | wc -l | sed "s/^ *//"`

  echo begin
  echo "$myhost:logins:$logins"
  echo end
done

# we never get here
exit 0
Save this script to the file load.sh. Assign executable permission to the file with the chmod command. To test the script interactively from the command line, type load.sh and repeatedly press the Return key.
As soon as the procedure works, you can install it for any execution host. To install the procedure, configure the load sensor path as the load_sensor parameter in the global cluster configuration or in the host-specific configuration. See Basic Cluster Configuration or the sge_conf(5) man page for more information.
The corresponding QMON window might look like the following figure:
The reported load parameter logins is usable as soon as a corresponding attribute is added to the complex. The required definition might look like the last table entry shown in the following figure.
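The attribute can also be added from a script instead of QMON. The sketch below appends a definition to a saved copy of the complex after checking that the name and shortcut are unique; the saved file can then be loaded with qconf -Mc. The function add_complex_attribute and the values in the usage comment (shortcut lg, relational operator >=, not requestable, not consumable) are hypothetical and not taken from the figure.

```shell
# Hypothetical helper: append one complex(5) line to a saved complex file.
# Usage: add_complex_attribute FILE name shortcut type relop requestable
#        consumable default urgency
add_complex_attribute() {
  file=$1
  shift
  if [ "$#" -ne 8 ]; then
    echo "usage: add_complex_attribute FILE name shortcut type relop requestable consumable default urgency" >&2
    return 2
  fi
  # Refuse duplicates: names and shortcuts must be unique.
  if awk -v n="$1" -v s="$2" '!/^#/ && ($1 == n || $2 == s) { found = 1 } END { exit !found }' "$file"; then
    echo "error: name or shortcut already in use" >&2
    return 1
  fi
  printf '%s\n' "$*" >> "$file"
}

# Typical round trip with a running sge_qmaster:
#   qconf -sc > complex.txt
#   add_complex_attribute complex.txt logins lg INT '>=' NO NO 0 0
#   qconf -Mc complex.txt
```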