Sun N1 Grid Engine 6.1 Administration Guide

Chapter 3 Configuring Complex Resource Attributes

This chapter describes how to configure resource attribute definitions. Resource attribute definitions are stored in an entity called the grid engine system complex. In addition to background information relating to the complex and its associated concepts, this chapter provides detailed instructions on how to accomplish the following tasks:

Complex Resource Attributes

The complex configuration provides all pertinent information about the resource attributes users can request for jobs with the qsub -l or qalter -l commands. The complex configuration also provides information about how the grid engine system should interpret these resource attributes.

The complex also builds the framework for the system's consumable resources facility. The resource attributes that are defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The attached attribute identifies a resource with the associated capability. During the scheduling process, the availability of resources and the job requirements are taken into account. The grid engine system also performs the bookkeeping and the capacity planning that is required to prevent oversubscription of consumable resources.

Typical consumable resource attributes include:

Attribute definitions in the grid engine complex define how resource attributes should be interpreted.

The definition of a resource attribute includes the following:

Use the QMON Complex Configuration dialog box, which is shown in Figure 3–1, to define complex resource attributes.

Configuring Complex Resource Attributes With QMON

In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears.

Figure 3–1 Complex Configuration Dialog Box

Dialog box titled Complex Configuration. Shows resource
attributes and fields for defining new attributes. Shows Commit, Cancel, and
Help buttons.

The Complex Configuration dialog box enables you to add, modify, or delete complex resource attributes.

To add a new attribute, first make sure that no line in the Attributes table is selected. In the fields above the Attributes table, type or select the values that you want, and then click Add.


Note –

If you want to add a new attribute and an existing attribute is selected, you must clear the selection. To deselect a highlighted attribute, hold down the Control key and click mouse button 1.


You can add a new attribute by copying an existing attribute and then modifying it. Make sure that the attribute name and its shortcut are unique.

To modify an attribute listed in the Attributes table, select it. The values of the selected attribute are displayed above the Attributes table. Change the attribute values, and then click Modify.

To save configuration changes to a file, click Save. To load values from a file into the complex configuration, click Load, and then select the name of a file from the list that appears.

To delete an attribute in the Attributes table, select it, and then click Delete.

See the complex(5) man page for details about the meaning of the rows and columns in the table.

To register your new or modified complex configuration with sge_qmaster, click Commit.

Assigning Resource Attributes to Queues, Hosts, and the Global Cluster

Resource attributes can be used in the following ways:

A set of default resource attributes is already attached to each queue and host. Default resource attributes are built in to the system and cannot be deleted, nor can their type be changed.

User-defined resource attributes must first be defined in the complex before you can assign them to a queue instance, a host, or the global cluster. When you assign a resource attribute to one of these targets, you specify a value for the attribute.

The following sections describe each attribute type in detail.

Queue Resource Attributes

Default queue resource attributes are a set of parameters that are defined in the queue configuration. These parameters are described in the queue_conf(5) man page.

You can add new resource attributes to the default attributes. New attributes are attached only to the queue instances that you modify. When the configuration of a particular queue instance references a resource attribute that is defined in the complex, that queue configuration provides the values for the attribute definition. For details about queue configuration see Configuring Queues.

For example, the queue configuration value h_vmem is used for the virtual memory size limit. This value limits the amount of total memory that each job can consume. An entry in the complex_values list of the queue configuration defines the total available amount of virtual memory on a host or assigned to a queue. For detailed information about consumable resources, see Consumable Resources.

Host Resource Attributes

Host resource attributes are parameters that are intended to be managed on a host basis.

The default host-related attributes are load values. You can add new resource attributes to the default attributes, as described earlier in Queue Resource Attributes.

Every sge_execd periodically reports load to sge_qmaster. The reported load values are either the standard load values such as the CPU load average, or the load values defined by the administrator, as described in Load Parameters.

The definitions of the standard load values are part of the default host resource attributes, whereas administrator-defined load values require extending the host resource attributes.

Host-related attributes are commonly extended to include nonstandard load parameters. Host-related attributes are also extended to manage host-related resources such as the number of software licenses that are assigned to a host, or the available disk space on a host's local file system.

If host-related attributes are associated with a host or with a queue instance on that host, a concrete value for a particular host resource attribute is determined by one of the following items:

In some cases, none of these values are available. For example, say the value is supposed to be a load parameter, but sge_execd does not report a load value for the parameter. In such cases, the attribute is not defined, and the qstat -F command shows that the attribute is not applicable.

For example, the total free virtual memory attribute h_vmem is defined in the queue configuration as a limit and is also reported as a standard load parameter. The total available amount of virtual memory on a host can be defined in the complex_values list of that host. The total available amount of virtual memory attached to a queue instance on that host can be defined in the complex_values list of that queue instance. If you also define h_vmem as a consumable resource, you can efficiently exploit the memory of a machine without risking memory oversubscription, which often causes swapping and reduced system performance. For more information about consumable resources, see Consumable Resources.


Note –

Only the Shortcut, Relation, Requestable, Consumable, and Default columns can be changed for the default resource attributes. No default attributes can be deleted.


Global Resource Attributes

Global resource attributes are cluster-wide resource attributes, such as available network bandwidth of a file server or the free disk space on a network-wide available file system.

Global resource attributes can also be associated with load reports if the corresponding load report contains the GLOBAL identifier, as described in Load Parameters. Global load values can be reported from any host in the cluster. No global load values are reported by default. Therefore, there are no default global resource attributes.

Concrete values for global resource attributes are determined by the following items:

Sometimes none of these cases apply. For example, a load value might not yet be reported. In such cases, the attribute does not exist.

Adding Resource Attributes to the Complex

By adding resource attributes to the complex, the administrator can extend the set of attributes managed by the grid engine system. The administrator can also restrict the influence of user-defined attributes to particular queues, hosts, or both.

User-defined attributes are a named collection of attributes with the corresponding definitions as to how the grid engine software is to handle these attributes. You can attach one or more user-defined attributes to a queue, to a host, or globally to all hosts in the cluster. Use the complex_values parameter for the queue configuration and the host configuration. For more information, see Configuring Queues and Configuring Hosts. The attributes defined become available to the queue and to the host, respectively, in addition to the default resource attributes.

The complex_values parameter in the queue configuration and the host configuration must set concrete values for user-defined attributes that are associated with queues and hosts.

For example, say the user-defined resource attributes permas, pamcrash, and nastran, shown in the following figure, are defined.

Dialog box titled Complex Configuration. Shows permas,
pamcrash, and nastran attributes. Shows Commit, Cancel, and Help buttons.

For one or more queues, add the resource attributes to the list of associated user-defined attributes as shown in the Complex tab of the Modify queue-name dialog box. For details on how to configure queues, see Configuring Queues and its related sections.

Dialog box titled Modify <queue-name>. Shows Complex
tab with parameter you can set. Shows Ok, Cancel, Refresh, and Help buttons.

Then the displayed queue is configured to manage up to 10 licenses of the software package permas. Furthermore, the attribute permas becomes requestable for jobs, as expressed in the Available Resources list in the Requested Resources dialog box.

Dialog box titled Requested Resource. Shows lists of
requested resources for jobs. Shows OK, Cancel, Clear, and Help buttons.

For details about how to submit jobs, see Chapter 3, Submitting Jobs, in Sun N1 Grid Engine 6.1 User’s Guide.

Alternatively, the user could submit jobs from the command line and could request attributes as follows:


% qsub -l pm=1 permas.sh

Note –

You can use the pm shortcut instead of the full attribute name permas.


Consequently, the only eligible queues for these jobs are the queues that are associated with the user-defined resource attributes and that have permas licenses configured and available.

Consumable Resources

Consumable resources provide an efficient way to manage limited resources such as available memory, free space on a file system, network bandwidth, or floating software licenses. Consumable resources are also called consumables. The total available capacity of a consumable is defined by the administrator. The consumption of the corresponding resource is monitored by grid engine software internal bookkeeping. The grid engine system accounts for the consumption of this resource for all running jobs. Jobs are dispatched only if the internal bookkeeping indicates that sufficient consumable resources are available.

Consumables can be combined with default load parameters or user-defined load parameters. Load values can be reported for consumable attributes. Conversely, the Consumable flag can be set for load attributes. Load measures the availability of the resource. Consumable resource management takes both the load and the internal bookkeeping into account, ensuring that neither exceeds a given limit. For more information about load parameters, see Load Parameters.

To enable consumable resource management, you must define the total capacity of a resource. You can define resource capacity globally for the cluster, for specified hosts, and for specified queues. These categories can supersede each other in the given order. Thus a host can restrict availability of a global resource, and a queue can restrict host resources and global resources.

You define resource capacities by using the complex_values attribute in the queue and host configurations. The complex_values definition of the global host specifies global cluster consumable settings. For more information, see the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.

To each consumable attribute in a complex_values list, a value is assigned that denotes the maximum available amount for that resource. The internal bookkeeping subtracts from this total the assumed resource consumption by all running jobs as expressed through the jobs' resource requests.

A parallel job consumes as many consumable resources as it consumes job slots. For example, the following command consumes a total of 800 Mbytes of memory:


% qsub -l mem=100M -pe make 8

Memory usage is split across the queues and hosts on which the job runs. If four tasks run on host A and four tasks run on host B, the job consumes 400 Mbytes on each host.
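The arithmetic behind this split can be sketched in a few lines of Bourne shell. The function name consumed_mb is hypothetical and is not part of the grid engine software. It only multiplies the per-slot request by the number of slots, which is how the text above describes the charge for a parallel job.

```shell
#!/bin/sh
# consumed_mb REQUEST_MB SLOTS
# Hypothetical helper (not an SGE command): a consumable request is
# charged once per job slot, so the charge is request * slots.
consumed_mb() {
    echo $(( $1 * $2 ))
}

consumed_mb 100 8   # whole job with 8 slots: prints 800
consumed_mb 100 4   # share charged to a host that runs 4 tasks: prints 400
```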

Setting Up Consumable Resources

Only numeric attributes can be configured as consumables. Numeric attributes are attributes whose type is INT, DOUBLE, MEMORY, or TIME.

In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears, as shown in Figure 3–1.

To enable the consumable management for an attribute, set the Consumable flag for the attribute in the complex configuration. For example, the following figure shows that the Consumable flag is set for the virtual_free memory resource.

Figure 3–2 Complex Configuration Dialog Box: virtual_free

Dialog box titled Complex Configuration. Shows resource
attributes and fields for defining new attributes. Shows Commit, Cancel, and
Help buttons.

Set up other consumable resources, guided by the examples detailed in the following sections:

Then, for each queue or host for which you want the grid engine software to do the required capacity planning, you must define the capacity in a complex_values list. An example is shown in the following figure, where 1 Gbyte of virtual memory is defined as the capacity value of the current host.

Figure 3–3 Add/Modify Exec Host: virtual_free

Dialog box titled Add/Modify Exec Host. Shows Consumables/Fixed
Attributes tab with virtual_free memory definition. Shows Ok and Cancel buttons.

The virtual memory requirements of all jobs running concurrently in any queue on that host are accumulated. The requirements are then subtracted from the capacity of 1 Gbyte to determine available virtual memory. If a job request for virtual_free exceeds the available amount, the job is not dispatched to a queue on that host.
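The bookkeeping described above can be illustrated with a small sketch. The helpers available_mb and fits are hypothetical names used only for illustration. They model the described behavior with whole Mbytes and assume that 1 Gbyte equals 1024 Mbytes; they are not the scheduler's actual implementation.

```shell
#!/bin/sh
# available_mb CAPACITY_MB RUNNING_REQUEST_MB...
# Hypothetical helper: subtracts the requests of all running jobs
# from the configured capacity.
available_mb() {
    avail=$1
    shift
    for req in "$@"; do
        avail=$(( avail - req ))
    done
    echo "$avail"
}

# fits CAPACITY_MB NEW_REQUEST_MB RUNNING_REQUEST_MB...
# Succeeds if the new request does not exceed what is still available.
fits() {
    capacity=$1
    new=$2
    shift 2
    [ "$new" -le "$(available_mb "$capacity" "$@")" ]
}

# A 1 Gbyte host with running jobs that requested 500 and 400 Mbytes:
available_mb 1024 500 400
fits 1024 100 500 400 && echo "100M job can be dispatched"
fits 1024 200 500 400 || echo "200M job must wait"
```

In this sketch, 124 Mbytes remain, so a 100 Mbyte request fits and a 200 Mbyte request does not.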


Note –

Jobs can be forced to request a resource and thus to specify their assumed consumption through the FORCED value of the Requestable parameter.


For consumable attributes that are not explicitly requested by the job, the administrator can predefine a default value for resource consumption. Doing so is meaningful only if requesting the attribute is not forced, as explained in the previous note. 200 Mbytes is set as the default value.

Examples of Setting Up Consumable Resources

Use the following examples to guide you in setting up consumable resources for your site.

Example 1: Floating Software License Management

Suppose you are using the software package pam-crash in your cluster, and you have access to 10 floating licenses. You can use pam-crash on every system as long as no more than 10 invocations of the software are active. The goal is to configure the grid engine system in a way that prevents scheduling pam-crash jobs while all 10 licenses are occupied by other running pam-crash jobs.

With consumable resources, you can achieve this goal easily. First you must add the number of available pam-crash licenses as a global consumable resource to the complex configuration.

Dialog box titled Complex Configuration. Shows pam-crash
resource attribute definition. Shows Commit, Cancel, and Help buttons.

The name of the consumable attribute is set to pam-crash. You can use pc as a shortcut for the full attribute name in the qalter -l, qselect -l, qsh -l, qstat -l, or qsub -l commands.

The attribute type is defined to be an integer counter.

The Requestable flag is set to FORCED. This setting specifies that users must state how many pam-crash licenses their job will occupy when the job is submitted.

The Consumable flag specifies that the attribute is a consumable resource.

The setting Default is irrelevant since Requestable is set to FORCED, which means that a request value must be received for this attribute with any job.

Consumables receive their value from the global, host, or queue configurations through the complex_values lists. See the host_conf(5) and queue_conf(5) man pages, as well as Configuring Queues and Configuring Hosts.

To activate resource planning for this attribute and for the cluster, the number of available pam-crash licenses must be defined in the global host configuration.

Dialog box titled Add/Modify Exec Host. Shows Consumables/Fixed
Attributes tab with pam-crash value definition. Shows Ok and Cancel buttons.

The value for the attribute pam-crash is set to 10, corresponding to 10 floating licenses.


Note –

The table Consumables/Fixed Attributes corresponds to the complex_values entry that is described in the host configuration file format host_conf(5).
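In file form, a complex_values entry is a comma-separated list of name=value pairs, for example pam-crash=10. The following sketch assumes that format. The helper name cv_get is hypothetical and only shows how one value can be read from such a list.

```shell
#!/bin/sh
# cv_get LIST NAME
# Hypothetical helper: prints the value assigned to NAME in a
# comma-separated complex_values list such as "pam-crash=10,virtual_free=1G".
cv_get() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F= -v name="$2" '$1 == name { print $2 }'
}

cv_get "pam-crash=10,virtual_free=1G" pam-crash      # prints 10
cv_get "pam-crash=10,virtual_free=1G" virtual_free   # prints 1G
```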


Assume that a user submits the following job:


% qsub -l pc=1 pam-crash.sh

The job starts only if fewer than 10 pam-crash licenses are currently occupied. The job can run anywhere in the cluster, however, and the job occupies one pam-crash license throughout its run time.

One of your hosts in the cluster might not be able to be included in the floating license. For example, you might not have pam-crash binaries for that host. In such a case, you can exclude the host from the pam-crash license management. You can exclude the host by setting to zero the capacity that is related to that host for the consumable attribute pam-crash. Use the Execution Host tab of the Host Configuration dialog box.

Dialog box titled Add/Modify Exec Host. Shows Consumables/Fixed
Attributes tab with pam-crash value definition. Shows Ok and Cancel buttons.


Note –

The pam-crash attribute is implicitly available to the execution host because the global attributes of the complex are inherited by all execution hosts. Instead of setting the capacity to zero, you could also restrict the number of licenses that a host can manage to a nonzero value such as two. In this case, a maximum of two pam-crash jobs could coexist on that host.


Similarly, you might want to prevent a certain queue from running pam-crash jobs. For example, the queue might be an express queue with memory and CPU-time limits not suitable for pam-crash. In this case, set the corresponding capacity to zero in the queue configuration, as shown in the following figure.

Dialog box titled Modify <queue-name>. Shows Complex
tab with pam-crash value definition. Shows Ok, Cancel, Refresh, and Help buttons.


Note –

The pam-crash attribute is implicitly available to the queue because the global attributes of the complex are inherited by all queues.


Example 2: Space Sharing for Virtual Memory

Administrators must often tune a system to avoid the performance degradation that is caused by memory oversubscription and the resulting swapping on a machine. The grid engine software can support you in this task through the Consumable Resources facility.

The standard load parameter virtual_free reports the available free virtual memory, that is, the combination of available swap space and the available physical memory. To avoid swapping, the use of swap space must be minimized. In an ideal case, all the memory required by all processes running on a host should fit into physical memory.

The grid engine software can guarantee the availability of required memory for all jobs started through the grid engine system, given the following assumptions and configurations:

An example of a possible virtual_free resource definition is shown in Figure 3–2. A corresponding execution host configuration for a host with 1 Gbyte of main memory is shown in Figure 3–3.

In the virtual_free resource definition example, the Requestable flag is set to YES rather than to FORCED, which was the setting used for the global pam-crash consumable in Example 1. This means that users need not indicate the memory requirements of their jobs. The value in the Default field is used if an explicit memory request is missing. The value of 1 Gbyte as default request in this case means that a job without a request is assumed to occupy all available physical memory.


Note –

virtual_free is one of the standard load parameters of the grid engine system. The additional availability of recent memory statistics is taken into account automatically by the system in the virtual memory capacity planning. If the load report for free virtual memory falls below the value obtained by grid engine software internal bookkeeping, the load value is used to avoid memory oversubscription. Differences in the reported load values and the internal bookkeeping can occur easily if jobs are started without using the grid engine system.


If you run different job classes with different memory requirements on one machine, you might want to partition the memory that these job classes use. This functionality is called space sharing. You can accomplish this functionality by configuring a queue for each job class. Then you assign to each queue a portion of the total memory on that host.

In the example, the queue configuration attaches half of the total memory that is available to host carc to the queue fast.q for the host carc. Hence the accumulated memory consumption of all jobs that are running in queue fast.q on host carc cannot exceed 500 Mbytes. Jobs in other queues are not taken into account. Nonetheless, the total memory consumption of all running jobs on host carc cannot exceed 1 Gbyte.

Dialog box titled Modify <queue-name>. Shows Complex
tab with virtual_free memory definition. Shows Ok, Cancel, Refresh, and Help
buttons.


Note –

The attribute virtual_free is available to all queues through inheritance from the complex.


Users might submit jobs to a system configured similarly to the example in either of the following forms:


% qsub -l vf=100M honest.sh
% qsub dont_care.sh

The job submitted by the first command can be started as soon as at least 100 Mbytes of memory are available. This amount is taken into account in the capacity planning for the virtual_free consumable resource. The second job runs only if no other job is on the system, as the second job implicitly requests all the available memory. In addition, the second job cannot run in queue fast.q because the job exceeds the queue's memory capacity.

Example 3: Managing Available Disk Space

Some applications need to manipulate huge data sets stored in files. Such applications therefore depend on the availability of sufficient disk space throughout their run time. This requirement is similar to the space sharing of available memory, as discussed in the preceding example. The main difference is that the grid engine system does not provide free disk space as one of its standard load parameters. Free disk space is not a standard load parameter because disks are usually partitioned into file systems in a site-specific way. Site-specific partitioning does not allow identifying the file system of interest automatically.

Nevertheless, available disk space can be managed efficiently by the system through the consumable resources facility. You should use the host resource attribute h_fsize for this purpose.

First, the attribute must be configured as a consumable resource, as shown in the following figure.

Dialog box titled Complex Configuration. Shows h_fsize
attribute definition. Shows Add, Modify, Delete, Load, Save, Commit, Cancel,
and Help buttons.

In the case of local host file systems, a reasonable capacity definition for the disk space consumable can be put in the host configuration, as shown in the following figure.

Dialog box titled Add/Modify Exec Host. Shows h_vmem
and h_fsize attribute values. Shows Ok and Cancel buttons.

Submission of jobs to a grid engine system that is configured as described here works similarly to the previous examples:


% qsub -l hf=5G big-sort.sh

The reason the h_fsize attribute is recommended here is that h_fsize also is used as the hard file size limit in the queue configuration. The file size limit restricts the ability of jobs to create files that are larger than what is specified during job submission. The qsub command in this example specifies a file size limit of 5 Gbytes. If the job does not request the attribute, the corresponding value from the queue configuration or host configuration is used. If the Requestable flag for h_fsize is set to FORCED in the example, a request must be included in the qsub command. If the Requestable flag is not set, a request is optional in the qsub command.

By using the queue limit as the consumable resource, you control requests that the user specifies instead of the real resource consumption by the job scripts. Any violation of the limit is penalized, and the job is eventually aborted. The queue limit ensures that the resource requests on which the grid engine system internal capacity planning is based are reliable. See the queue_conf(5) and the setrlimit(2) man pages for details.


Note –

Some operating systems provide only per-process file size limits. In this case, a job might create multiple files with a size up to the limit. On systems that support per-job file size limitation, the grid engine system uses this functionality with the h_fsize attribute. See the queue_conf(5) man page for further details.


Applications that are not submitted to the grid engine system might occupy disk space concurrently. If so, the internal bookkeeping might not be sufficient to prevent application failure due to lack of disk space. To avoid this problem, you can periodically collect statistics about disk space usage that reflect the total disk space consumption, including consumption that occurs outside the grid engine system.

The load sensor interface enables you to enhance the set of standard load parameters with site-specific information, such as the available disk space on a file system. See Adding Site-Specific Load Parameters for more information.
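As a rough sketch of how such a sensor could produce its report line, the hypothetical helper below converts the output of the POSIX command df -Pk into the host:name:value form used by the load sensor example in this chapter. The path /scratch is a placeholder, and the K suffix (multiples of 1024 bytes) is assumed to be accepted for memory-type values. Check the complex(5) man page before relying on it.

```shell
#!/bin/sh
# df_to_report HOST
# Hypothetical helper: reads `df -Pk <path>` output on stdin and prints one
# load report line for h_fsize. Field 4 of the POSIX df output format is
# the available space in 1024-byte units.
df_to_report() {
    awk -v host="$1" 'NR == 2 { printf "%s:h_fsize:%sK\n", host, $4 }'
}

# Canned df output, so that the sketch runs without a real /scratch.
printf 'Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/sda1 1000 400 600 40%% /scratch\n' |
    df_to_report myhost

# In a real load sensor, between "echo begin" and "echo end":
# df -Pk /scratch | df_to_report "`uname -n`"
```

With the canned input, the sketch prints myhost:h_fsize:600K.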

By adding an appropriate load sensor and reporting free disk space for h_fsize, you can combine consumable resource management and resource availability statistics. The grid engine system compares job requirements for disk space with the available capacity and with the most recent reported load value. Available capacity is derived from the internal resource planning. Jobs get dispatched to a host only if both criteria are met.
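The two criteria can be expressed as a simple check. The helper disk_fits is a hypothetical name, and the values are whole Mbytes chosen for illustration. The sketch only models the comparison described above, not the scheduler itself.

```shell
#!/bin/sh
# disk_fits REQUEST_MB BOOKED_FREE_MB REPORTED_FREE_MB
# Hypothetical helper: succeeds only if the request fits both the capacity
# left by the internal bookkeeping and the most recently reported load value.
disk_fits() {
    [ "$1" -le "$2" ] && [ "$1" -le "$3" ]
}

# Bookkeeping says 8000 Mbytes are free, and the sensor reports 6000 Mbytes:
disk_fits 5000 8000 6000 && echo "dispatch"
# Same bookkeeping, but other applications have filled the disk:
disk_fits 5000 8000 3000 || echo "hold: reported free space is too low"
```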

Configuring Complex Resource Attributes From the Command Line

To configure the complex from the command line, type the following command with appropriate options:


% qconf options

See the qconf(1) man page for a detailed definition of the qconf command format and the valid syntax.

The following options enable you to modify the grid engine system complex:

The following command prints the current complex configuration to the standard output stream in the file format defined in the complex(5) man page:


% qconf -sc

A sample output is shown in the following example.


Example 3–1 qconf -sc Sample Output


#name      shortcut  type  relop  requestable   consumable  default  urgency
#---------------------------------------------------------------------------
nastran    na        INT   <=     YES           NO          0        0
pam-crash  pc        INT   <=     YES           YES         1        0
permas     pm        INT   <=     FORCED        YES         1        0
#---- # start a comment but comments are not saved across edits -----------
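Because the output is plain columnar text, it can be filtered with standard tools. The following sketch lists the attributes whose consumable column is YES. The helper name list_consumables is hypothetical, and the column position is taken from the header line of Example 3–1.

```shell
#!/bin/sh
# list_consumables
# Hypothetical helper: reads complex(5)-format text on stdin and prints the
# names of attributes whose consumable column (field 6) is YES.
# Lines that start with '#' are treated as comments and skipped.
list_consumables() {
    awk '!/^#/ && NF >= 8 && $6 == "YES" { print $1 }'
}

# Sample data copied from Example 3-1. On a live cluster the input
# would come from: qconf -sc | list_consumables
list_consumables <<'EOF'
#name      shortcut  type  relop  requestable   consumable  default  urgency
nastran    na        INT   <=     YES           NO          0        0
pam-crash  pc        INT   <=     YES           YES         1        0
permas     pm        INT   <=     FORCED        YES         1        0
EOF
```

For the sample data, the sketch prints pam-crash and permas, one per line.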

Load Parameters

This section explains the grid engine system's load parameters. Instructions are included for writing your own load sensors.

Default Load Parameters

By default, sge_execd periodically reports several load parameters and their corresponding values to sge_qmaster. These values are stored in the sge_qmaster internal host object, which is described in About Hosts and Daemons. However, the values are used internally only if a complex resource attribute with a corresponding name is defined. Such complex resource attributes contain the definition as to how load values are to be interpreted. See Assigning Resource Attributes to Queues, Hosts, and the Global Cluster for more information.

After the primary installation, a standard set of load parameters is reported. All attributes required for the standard load parameters are defined as host-related attributes. Subsequent releases of N1 Grid Engine 6.1 software may provide extended sets of default load parameters. Therefore, the set of load parameters that is reported by default is documented in the file sge-root/doc/load_parameters.asc.

How load attributes are defined determines their accessibility. By defining load parameters as global resource attributes, you make them available for the entire cluster and for all hosts. By defining load parameters as host-related attributes, you provide the attributes for all hosts but not for the global cluster.


Note –

Do not define load attributes as queue attributes. Queue attributes would not be available to any host nor to the cluster.


Adding Site-Specific Load Parameters

The set of default load parameters might not be adequate to completely describe the load situation in a cluster. This possibility is especially likely with respect to site-specific policies, applications, and configurations. Therefore grid engine software provides the means to extend the set of load parameters. For this purpose, sge_execd offers an interface to feed load parameters and the current load values into sge_execd. Afterwards, these parameters are treated like the default load parameters. As for the default load parameters, corresponding attributes must be defined in the complex for the site-specific load parameters to become effective. See Default Load Parameters for more information.

Writing Your Own Load Sensors

To feed sge_execd with additional load information, you must supply a load sensor. The load sensor can be a script or a binary executable. In either case, the load sensor's handling of the standard input and standard output streams and its control flow must comply with the following rules:

The load sensor then performs whatever operation is necessary to compute the desired load figures. At the end of the cycle, the load sensor writes the result to STDOUT.


Note –

If load retrieval takes a long time, the load measurement process can be started immediately after sending a load report. When the next request for a load report is received on STDIN, the load values are then available to be sent.


Load Sensor Rules Format

The format for the load sensor rules is as follows:

Example of a Load Sensor Script

The following example shows a load sensor. The load sensor is a Bourne shell script.


Example 3–2 Load Sensor – Bourne Shell Script


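#!/bin/sh

# Host name used as the first field of every load report line
myhost=`uname -n`

while :; do
     # Wait for input: each line read is a request for a new load report
     read input
     result=$?
     # A failed read means that standard input was closed, so give up
     if [ $result != 0 ]; then
          exit 1
     fi
     # Quote $input so that an empty line does not break the comparison
     if [ "$input" = quit ]; then
          exit 0
     fi
     # Send the number of distinct users that are logged in
     logins=`who | cut -f1 -d" " | sort | uniq | wc -l | sed "s/^ *//"`
     echo begin
     echo "$myhost:logins:$logins"
     echo end
done

# we never get here

exit 0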

Save this script to the file load.sh. Assign executable permission to the file with the chmod command. To test the script interactively from the command line, type load.sh and repeatedly press the Return key.
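The interactive test can also be scripted by piping the input into the sensor. The sketch below uses a stand-in function named sensor with a fixed host name and value so that it runs anywhere. Both the function and the value testhost:logins:3 are hypothetical. With the real script, pipe the same input into ./load.sh instead.

```shell
#!/bin/sh
# Stand-in for load.sh that reports a constant value (hypothetical).
sensor() {
    while read -r input; do
        if [ "$input" = quit ]; then
            return 0
        fi
        echo begin
        echo "testhost:logins:3"
        echo end
    done
    # End of input without "quit"
    return 1
}

# Two empty lines request two load reports; "quit" ends the sensor.
printf '\n\nquit\n' | sensor
```

Each empty line should produce one begin/end block, and the quit line should end the sensor without further output.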

As soon as the procedure works, you can install it for any execution host. To install the procedure, configure the load sensor path as the load_sensor parameter in the cluster configuration, either in the global configuration or in the host-specific configuration. See Basic Cluster Configuration or the sge_conf(5) man page for more information.

The corresponding QMON window might look like the following figure:

Dialog box titled Cluster Settings. Shows General Settings
tab with Mailer, Xterm, and Load Sensor paths. Shows Ok and Cancel buttons.

The reported load parameter logins is usable as soon as a corresponding attribute is added to the complex. The required definition might look like the last table entry shown in the following figure.

Dialog box titled Complex Configuration. Shows definition
for logins resource attribute. Shows Add, Modify, Delete, Load, and Save buttons.