Refer to Configuring Policy-Based Resource Management With QMON for details on the scheduling administration of resource-sharing policies of the grid engine system. The following sections focus on administering the scheduler configuration sched_conf and related issues.
The default scheduling is a first-in-first-out policy. In other words, the first job that is submitted is the first job the scheduler examines in order to dispatch it to a queue. If the first job in the list of pending jobs finds a queue that is suitable and available, that job is started first. A job ranked behind the first job can be started first only if the first job fails to find a suitable free resource.
The default strategy is to select queue instances on the least-loaded host, provided that the queues deliver suitable service for the job's resource requirements. If several suitable queues share the same load, the queue to be selected is unpredictable.
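The default behavior described above can be illustrated with a small model. This is a minimal sketch, not the scheduler's actual implementation: the `Queue` and `Job` classes, the `mem_gb` resource, and the `dispatch_fifo` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Queue:
    name: str
    host_load: float   # load reported for the host of this queue instance
    free_slots: int
    mem_gb: int        # hypothetical resource offered by the queue

@dataclass
class Job:
    name: str
    mem_gb: int        # hypothetical resource requirement of the job

def dispatch_fifo(pending, queues):
    """Examine jobs in submission order and place each one on the
    least-loaded queue that is suitable and has a free slot."""
    placements = []
    for job in pending:
        suitable = [q for q in queues
                    if q.free_slots > 0 and q.mem_gb >= job.mem_gb]
        if not suitable:
            continue  # job stays pending, so a later job may start first
        # min() returns the first of several equally loaded queues; the
        # real system makes no promise about which one is chosen.
        best = min(suitable, key=lambda q: q.host_load)
        best.free_slots -= 1
        placements.append((job.name, best.name))
    return placements

queues = [Queue("small.q@hostA", 0.2, 1, 4), Queue("small.q@hostB", 0.7, 1, 4)]
pending = [Job("big", 16), Job("first_small", 2), Job("second_small", 2)]
placements = dispatch_fifo(pending, queues)
print(placements)
```

In this example the job `big` is submitted first but finds no suitable queue, so the two jobs ranked behind it are started ahead of it.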
You can modify the job scheduling and queue selection strategy in various ways:
Changing the scheduling algorithm
Scaling system load
Selecting queue by sequence number
Selecting queue by share
Restricting the number of jobs per user or per group
The following sections explore these alternatives in detail.
The scheduler configuration parameter algorithm selects the scheduling algorithm to use. See the sched_conf(5) man page for further information. Currently, default is the only allowed setting.
To select the queue to run a job, the grid engine system uses the system load information on the machines that host queue instances. This queue selection scheme tends to balance the load across hosts, which generally makes better use of the available resources in a cluster.
However, the system load is not always a reliable measure. For example, if a multi-CPU machine is compared to a single-CPU system, the multiprocessor system usually reports higher load figures, because it probably runs more processes. The system load is strongly influenced by the number of processes trying to get CPU access, but multi-CPU systems can sustain a much higher load than single-CPU machines. This problem is addressed by processor-number-adjusted sets of load values that sge_execd reports by default. Use these load parameters instead of the raw load values to avoid the problem. See Load Parameters and the sge-root/doc/load_parameters.asc file for details.
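As an illustration of why processor-adjusted values compare better across hosts, the following sketch divides a raw load average by the number of processors. The function name `np_load` and the sample figures are assumptions made for this example; consult the load parameter documentation for the values that sge_execd actually reports.

```python
def np_load(load_avg: float, num_proc: int) -> float:
    """Return the load average normalized by the processor count."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return load_avg / num_proc

# Hypothetical hosts: the 8-CPU machine reports the higher raw load,
# yet it is the less loaded machine per processor.
single_cpu = np_load(0.9, 1)
eight_cpu = np_load(2.4, 8)
print(single_cpu, eight_cpu)
```

Comparing the raw figures (0.9 against 2.4) would favor the single-CPU host, while the normalized figures favor the 8-CPU host.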
Another example of potentially improper interpretation of load values is when systems have marked differences in their performance potential or in their price performance ratio. In both cases, equal load values do not mean that arbitrary hosts can be selected to run a job. In this situation, the administrator should define load scaling factors for the relevant execution hosts and load parameters. See Configuring Execution Hosts With QMON, and related sections.
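The effect of load scaling factors can be sketched as a per-host multiplication of the reported values. The function `scale_load`, the host roles, and the factors below are hypothetical; this is an illustration of the idea, not the system's own code.

```python
def scale_load(raw, factors):
    """Multiply each reported load value by the host's scaling factor.
    Load parameters without a factor are left unchanged (factor 1.0)."""
    return {name: value * factors.get(name, 1.0) for name, value in raw.items()}

# Two hypothetical hosts report the same normalized load. The faster host
# is given a factor below 1 so that it is still preferred.
reported = {"np_load_avg": 0.8, "mem_used": 2.0}
fast_host = scale_load(reported, {"np_load_avg": 0.5})
slow_host = scale_load(reported, {})
print(fast_host, slow_host)
```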
The scaled load parameters are also compared against the load threshold lists load-thresholds and migr-load-thresholds. See the queue_conf(5) man page for details.
Another problem associated with load parameters is the need for an application-dependent and site-dependent interpretation of the values and their relative importance. The CPU load might be dominant for a certain type of application that is common at a particular site. By contrast, the memory load might be more important for another site and for the application profile to which the site's compute cluster is dedicated. To address this problem, the grid engine system enables the administrator to specify a load formula in the scheduler configuration file sched_conf. See the sched_conf(5) man page for more details. Site-specific information on resource usage and capacity planning can be taken into account by using site-defined load parameters and consumable resources in the load formula. See the sections Adding Site-Specific Load Parameters and Consumable Resources.
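To show how a load formula changes the ranking of hosts, the sketch below evaluates a weighted sum of load values. It assumes a simple linear formula; the function `evaluate_load_formula`, the `mem_used_fraction` parameter, and all weights are hypothetical. The syntax that the scheduler actually accepts is described in sched_conf(5).

```python
def evaluate_load_formula(weights, load_values, constant=0.0):
    """Combine load values into a single figure as a weighted sum."""
    return constant + sum(w * load_values[name] for name, w in weights.items())

hosts = {
    "hostA": {"np_load_avg": 0.2, "mem_used_fraction": 0.9},
    "hostB": {"np_load_avg": 0.6, "mem_used_fraction": 0.3},
}

# A CPU-oriented site weighs only the CPU load; a memory-oriented site
# gives most of the weight to a site-defined memory parameter.
cpu_site = {"np_load_avg": 1.0}
mem_site = {"np_load_avg": 0.2, "mem_used_fraction": 0.8}

def best_host(weights):
    return min(hosts, key=lambda h: evaluate_load_formula(weights, hosts[h]))

cpu_choice = best_host(cpu_site)
mem_choice = best_host(mem_site)
print(cpu_choice, mem_choice)
```

The same two hosts are ranked in opposite order depending on which formula the site uses.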
Finally, the time dependency of load parameters must be taken into account. The load that is imposed by the jobs that are running on a system varies in time. Often the operating system needs some time before a load value, for example the CPU load, reflects the full effect of a job. If a job recently started, the reported load might not provide an accurate representation of the load that the job has imposed on that host. The reported load adapts to the real load over time. But the period of time in which the reported load is too low might lead to an oversubscription of that host. The grid engine system enables the administrator to specify load adjustment factors that are used in the scheduler to compensate for this problem. See the sched_conf(5) man page for detailed information on how to set these load adjustment factors.
Load adjustments are used to virtually increase the measured load after a job is dispatched. On machines that would otherwise be oversubscribed, this keeps the load that the scheduler sees in line with the load thresholds. If you do not need load adjustments, you should turn them off. Load adjustments impose additional work on the scheduler in connection with sorting hosts and verifying load thresholds.
To disable load adjustments, on the Load Adjustment tab of the Scheduler Configuration dialog box, set the Decay Time to zero, and delete all load adjustment values in the table. See Changing the Scheduler Configuration With QMON.
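The idea behind load adjustments and their decay time can be sketched as follows. This example assumes that the adjustment decays linearly from its full value to zero over the decay time, and it treats a decay time of zero as "adjustments disabled". The function `adjusted_load` and the numbers are hypothetical; see sched_conf(5) for the exact behavior.

```python
def adjusted_load(reported, adjustment, seconds_since_start, decay_seconds):
    """Add a decaying correction to the reported load of a host on which
    a job was recently started (linear decay is an assumption here)."""
    if decay_seconds <= 0 or seconds_since_start >= decay_seconds:
        return reported  # adjustments disabled or fully decayed
    remaining = 1.0 - seconds_since_start / decay_seconds
    return reported + adjustment * remaining

# Hypothetical values: correction of 0.5 that decays over 450 seconds.
at_start = adjusted_load(0.1, 0.5, 0, 450)
halfway = adjusted_load(0.1, 0.5, 225, 450)
at_end = adjusted_load(0.1, 0.5, 450, 450)
disabled = adjusted_load(0.1, 0.5, 0, 0)
print(at_start, halfway, at_end, disabled)
```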
Another way to change the default scheme for queue selection is to set the scheduler configuration parameter queue_sort_method to seq_no instead of to the default load. In this case, the system load is no longer used as the primary method to select queues. Instead, the sequence numbers that are assigned to the queues by the queue configuration parameter seq_no define a fixed order for queue selection. The queues must be suitable for the considered job, and they must be available. See the queue_conf(5) and sched_conf(5) man pages for more details.
This queue selection policy is useful if the machines that offer batch services at your site are ranked in a monotonic price-per-job order. For example, a job running on machine A costs 1 unit of money. The same job costs 10 units on machine B. And on machine C the job costs 100 units. Thus the preferred scheduling policy is to first fill up host A and then to use host B. Host C is used only if no alternative remains.
If you have changed the method of queue selection to seq_no, and the considered queues all share the same sequence number, queues are selected by load, as in the default method.
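The two sort orders can be compared with a short sketch. The `QueueInfo` type, the `sort_queues` function, and the sample queues are hypothetical and serve only to illustrate how sequence numbers take precedence and how load breaks ties.

```python
from typing import NamedTuple

class QueueInfo(NamedTuple):
    name: str
    seq_no: int
    load: float

def sort_queues(queues, method="load"):
    """Order candidate queues by load, or by sequence number with load
    as the tie-breaker when method is 'seq_no'."""
    if method == "seq_no":
        return sorted(queues, key=lambda q: (q.seq_no, q.load))
    return sorted(queues, key=lambda q: q.load)

queues = [
    QueueInfo("c.q", 30, 0.1),
    QueueInfo("a.q", 10, 0.8),
    QueueInfo("b.q", 20, 0.3),
    QueueInfo("a2.q", 10, 0.2),
]
by_seq = [q.name for q in sort_queues(queues, "seq_no")]
by_load = [q.name for q in sort_queues(queues, "load")]
print(by_seq, by_load)
```

With sequence-number sorting, the two queues that share the number 10 come first and are ordered by load between themselves.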
The goal of selecting queues by share is to place jobs so as to attempt to meet the targeted share of global system resources for each job. This method takes into account the resource capability represented by each host in relation to all the system resources. This method tries to balance the percentage of tickets for each host (that is, the sum of tickets for all jobs running on a host) with the percentage of the resource capability that particular host represents for the system. See Configuring Execution Hosts With QMON for instructions on how to define the capacity of a host.
The host's load, although of secondary importance, is also taken into account in the sorting. Choose this sorting method for a site that uses the share-tree policy.
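One way to picture the balancing described above is to compare each host's fraction of running tickets with its fraction of total capacity, and to prefer the host that is furthest below its capacity share. This is only a rough sketch under that assumption; the `Host` type, the `sort_hosts_by_share` function, and the figures are hypothetical and do not reproduce the scheduler's actual computation.

```python
from typing import NamedTuple

class Host(NamedTuple):
    name: str
    tickets: float    # sum of tickets of all jobs running on the host
    capacity: float   # resource capability configured for the host
    load: float

def sort_hosts_by_share(hosts):
    """Order hosts by (ticket share - capacity share); load breaks ties."""
    total_tickets = sum(h.tickets for h in hosts) or 1.0
    total_capacity = sum(h.capacity for h in hosts) or 1.0

    def key(h):
        imbalance = h.tickets / total_tickets - h.capacity / total_capacity
        return (imbalance, h.load)

    return sorted(hosts, key=key)

hosts = [
    Host("A", 600, 1.0, 0.4),
    Host("B", 200, 1.0, 0.2),
    Host("C", 200, 2.0, 0.9),
]
share_order = [h.name for h in sort_hosts_by_share(hosts)]
print(share_order)
```

Host C holds half of the capacity but only a fifth of the tickets, so it is preferred even though its load is the highest.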
The administrator can assign an upper limit to the number of jobs that any user or any UNIX group can run at any time. To enforce such a limit, do one of the following:
Set maxujobs or maxgjobs, or both, as described in the sched_conf(5) man page.
On the General Parameters tab of the Scheduler Configuration dialog box, use the Max Jobs/User field to set the maximum number of jobs a user or user group can run concurrently.
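The effect of a per-user limit such as maxujobs can be sketched as a filter over the pending list. The function `eligible_jobs` and the job data are hypothetical, and the sketch assumes that a value of 0 means "no limit" and that every job it selects is actually started.

```python
from collections import Counter

def eligible_jobs(pending, running, maxujobs):
    """Return the names of pending jobs that may start without pushing
    their owner above maxujobs concurrently running jobs."""
    counts = Counter(owner for _, owner in running)
    eligible = []
    for name, owner in pending:
        if maxujobs == 0 or counts[owner] < maxujobs:
            eligible.append(name)
            counts[owner] += 1  # assume the job is dispatched
    return eligible

running = [("j1", "alice"), ("j2", "alice")]
pending = [("j3", "alice"), ("j4", "bob"), ("j5", "bob"), ("j6", "bob")]
limited = eligible_jobs(pending, running, maxujobs=2)
unlimited = eligible_jobs(pending, running, maxujobs=0)
print(limited, unlimited)
```

With a limit of 2, the user alice is already at the limit, and bob can start only two of his three pending jobs.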