Sun N1 Grid Engine 6.1 Administration Guide

Example 2: Space Sharing for Virtual Memory

Administrators must often tune a system to avoid the performance degradation caused by memory oversubscription and the swapping that results on a machine. The grid engine software supports this task through the Consumable Resources facility.

The standard load parameter virtual_free reports the available free virtual memory, that is, the combination of available swap space and the available physical memory. To avoid swapping, the use of swap space must be minimized. In an ideal case, all the memory required by all processes running on a host should fit into physical memory.

The grid engine software can guarantee the availability of required memory for all jobs started through the grid engine system, given the following assumptions and configurations:

An example of a possible virtual_free resource definition is shown in Figure 3–2. A corresponding execution host configuration for a host with 1 Gbyte of main memory is shown in Figure 3–3.

In the virtual_free resource definition example, the Requestable flag is set to YES instead of to FORCED, as in the example of a global configuration. This means that users need not indicate the memory requirements of their jobs. The value in the Default field is used if an explicit memory request is missing. The default request of 1 Gbyte in this case means that a job without an explicit request is assumed to occupy all available physical memory.


Note –

virtual_free is one of the standard load parameters of the grid engine system. The system automatically takes recent memory statistics into account in its virtual memory capacity planning. If the load report for free virtual memory falls below the value obtained by the grid engine software's internal bookkeeping, the load value is used to avoid memory oversubscription. The reported load values and the internal bookkeeping can easily differ if jobs are started without using the grid engine system.
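
The rule in this note can be sketched as a small model. The following Python fragment is only an illustration and is not part of the grid engine software. The function name and its parameters are hypothetical, and 1 Gbyte is rounded to 1000 Mbytes to match the figures used in this example.

```python
def effective_virtual_free(capacity_mb, booked_mb, load_report_mb):
    """Return the memory (in Mbytes) still considered free on one host.

    capacity_mb    -- configured virtual_free capacity of the host (assumed input)
    booked_mb      -- memory already booked by jobs started through the system
    load_report_mb -- most recent virtual_free load value reported by the host
    """
    # Value obtained by internal bookkeeping: capacity minus booked requests.
    bookkeeping_mb = capacity_mb - booked_mb
    # If the load report falls below the bookkeeping value, the load value is used.
    return min(bookkeeping_mb, load_report_mb)


# Host with 1 Gbyte capacity, 100 Mbytes booked by a grid engine job.
print(effective_virtual_free(1000, 100, 2000))  # 900: bookkeeping is lower
# A process started outside the system has consumed memory in the meantime.
print(effective_virtual_free(1000, 100, 600))   # 600: load report is lower
```

The second call shows the case described in the note: a process started without the grid engine system lowers the reported load value, and the lower value is then the one used for planning.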


If you run different job classes with different memory requirements on one machine, you might want to partition the memory that these job classes use. This functionality is called space sharing. You can implement space sharing by configuring a queue for each job class and then assigning to each queue a portion of the total memory on that host.

In the example, the queue configuration attaches half of the total memory that is available on host carc to the queue fast.q on that host. Hence the accumulated memory consumption of all jobs that are running in queue fast.q on host carc cannot exceed 500 Mbytes. Jobs in other queues do not count against this queue limit. Nonetheless, the total memory consumption of all running jobs on host carc cannot exceed 1 Gbyte.

[Figure: Modify <queue-name> dialog box, Complex tab, showing the virtual_free memory definition.]

Note –

The attribute virtual_free is available to all queues through inheritance from the complex.
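
The two limits in the space sharing example, one for the queue and one for the host, can be sketched as follows. This Python fragment is a simplified model under stated assumptions and is not how the scheduler is implemented. The function name, the data layout, and the queue name other.q are hypothetical, and 1 Gbyte is rounded to 1000 Mbytes as in the text.

```python
HOST_CAPACITY_MB = 1000               # virtual_free capacity of host carc
QUEUE_CAPACITY_MB = {"fast.q": 500}   # queue-level virtual_free on carc


def can_start(request_mb, queue, running):
    """Check whether a job fits on host carc in the given queue.

    running -- list of (queue_name, booked_mb) pairs for jobs already running
    """
    # Host limit: all running jobs on the host count, whatever their queue.
    host_used = sum(mb for _, mb in running)
    if host_used + request_mb > HOST_CAPACITY_MB:
        return False
    # Queue limit: only jobs in the same queue count.
    limit = QUEUE_CAPACITY_MB.get(queue)
    if limit is None:
        return True  # no queue-level value; only the host limit applies
    queue_used = sum(mb for q, mb in running if q == queue)
    return queue_used + request_mb <= limit


# other.q is a hypothetical second queue without its own memory limit.
running = [("fast.q", 400), ("other.q", 300)]
print(can_start(100, "fast.q", running))   # True: fast.q reaches exactly 500
print(can_start(200, "fast.q", running))   # False: fast.q would reach 600
print(can_start(300, "other.q", running))  # True: host reaches exactly 1000
print(can_start(400, "other.q", running))  # False: host would reach 1100
```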


Users might submit jobs to a system configured similarly to the example in either of the following forms:


% qsub -l vf=100M honest.sh
% qsub dont_care.sh

The job submitted by the first command can be started as soon as at least 100 Mbytes of memory are available. This amount is taken into account in the capacity planning for the virtual_free consumable resource. The second job runs only if no other job is on the system, as the second job implicitly requests all the available memory. In addition, the second job cannot run in queue fast.q because the job exceeds the queue's memory capacity.
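
The effect of the default request on these two submissions can be worked through with a short model. The Python fragment below is a sketch only. The function names are hypothetical, the capacities are the ones assumed in this example, and 1 Gbyte is rounded to 1000 Mbytes.

```python
DEFAULT_REQUEST_MB = 1000   # Default field of virtual_free (1 Gbyte)
HOST_CAPACITY_MB = 1000     # virtual_free capacity of host carc
FAST_Q_CAPACITY_MB = 500    # virtual_free attached to fast.q on carc


def booked_request(explicit_mb=None):
    """Return the amount booked for a job: its request, or the default."""
    return DEFAULT_REQUEST_MB if explicit_mb is None else explicit_mb


def fits_on_host(request_mb, host_booked_mb):
    """Check the request against the memory not yet booked on the host."""
    return host_booked_mb + request_mb <= HOST_CAPACITY_MB


honest = booked_request(100)   # qsub -l vf=100M honest.sh
dont_care = booked_request()   # qsub dont_care.sh

print(fits_on_host(honest, 0))           # True: 100 Mbytes are available
print(fits_on_host(dont_care, honest))   # False: honest.sh already holds 100
print(fits_on_host(dont_care, 0))        # True: only on an otherwise empty host
print(dont_care <= FAST_Q_CAPACITY_MB)   # False: exceeds the capacity of fast.q
```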