Sun N1 Grid Engine 6.1 Administration Guide

Example 3: Managing Available Disk Space

Some applications need to manipulate huge data sets stored in files. Such applications therefore depend on the availability of sufficient disk space throughout their run time. This requirement is similar to the space sharing of available memory, as discussed in the preceding example. The main difference is that the grid engine system does not provide free disk space as one of its standard load parameters. Free disk space is not a standard load parameter because disks are usually partitioned into file systems in a site-specific way. Site-specific partitioning does not allow identifying the file system of interest automatically.

Nevertheless, available disk space can be managed efficiently by the system through the consumables resources facility. You should use the host resource attribute h_fsize for this purpose.

First, the attribute must be configured as a consumable resource, as shown in the following figure.

Dialog box titled Complex Configuration. Shows h_fsize
attribute definition. Shows Add, Modify, Delete, Load, Save, Commit, Cancel,
and Help buttons.

In the case of local host file systems, a reasonable capacity definition for the disk space consumable can be put in the host configuration, as shown in the following figure.

Dialog box titled Add/Modify Exec Host. Shows h_vmem
and h_fsize attribute values. Shows Ok and Cancel buttons.

Submission of jobs to a grid engine system that is configured as described here works similarly to the previous examples:

% qsub -l hf=5G

The reason the h_fsize attribute is recommended here is that h_fsize also is used as the hard file size limit in the queue configuration. The file size limit restricts the ability of jobs to create files that are larger than what is specified during job submission. The qsub command in this example specifies a file size limit of 5 Gbytes. If the job does not request the attribute, the corresponding value from the queue configuration or host configuration is used. If the Requestable flag for h_fsize is set to FORCED in the example, a request must be included in the qsub command. If the Requestable flag is not set, a request is optional in the qsub command.

By using the queue limit as the consumable resource, you control requests that the user specifies instead of the real resource consumption by the job scripts. Any violation of the limit is sanctioned, which eventually aborts the job. The queue limit ensures that the resource requests on which the grid engine system internal capacity planning is based are reliable. See the queue_conf(5) and the setrlimit(2) man pages for details.

Note –

Some operating systems provide only per-process file size limits. In this case, a job might create multiple files with a size up to the limit. On systems that support per-job file size limitation, the grid engine system uses this functionality with the h_fsize attribute. See the queue_conf(5) man page for further details.

You might want applications that are not submitted to the grid engine system to occupy disk space concurrently. If so, the internal bookkeeping might not be sufficient to prevent application failure due to lack of disk space. To avoid this problem, you can periodically receive statistics about disk space usage, which indicates total disk space consumption, including the one occurring outside the grid engine system.

The load sensor interface enables you to enhance the set of standard load parameters with site-specific information, such as the available disk space on a file system. See Adding Site-Specific Load Parameters for more information.

By adding an appropriate load sensor and reporting free disk space for h_fsize, you can combine consumable resource management and resource availability statistics. The grid engine system compares job requirements for disk space with the available capacity and with the most recent reported load value. Available capacity is derived from the internal resource planning. Jobs get dispatched to a host only if both criteria are met.