Solaris Resource Manager 1.3 System Administration Guide

CPU Resource Management

The allocation of the renewable CPU resource is controlled using a fair share scheduler called the Solaris Resource Manager SHR scheduler.

Scheduler Methodology

Each lnode is assigned a number of CPU shares. The processes associated with each lnode are allocated CPU resources in proportion to the total number of outstanding active shares (active means that the lnode has running processes attached). Only active lnodes are considered for an allocation of the resource, because only they have active processes running and need CPU time.

As a process consumes CPU ticks, the CPU usage attribute of its lnode increases. The scheduler regularly adjusts the priorities of all processes to force the relative ratios of CPU usages to converge on the relative ratios of CPU shares for all active lnodes at their respective levels. In this way, users can expect to receive at least their entitlements of CPU service in the long run, regardless of the behavior of other users.

The scheduler is hierarchical because it also ensures that groups receive their group entitlements independently of the behavior of the members. Solaris Resource Manager SHR scheduler is a long-term scheduler; it ensures that all users and applications receive a fair share over the course of the scheduler term. This means that when a light user starts to request the CPU, that user will receive commensurately more resource than heavy users until their comparative usages are in line with their relative "fair" share allocation. The more you use over your entitlement now, the less you will receive in the future.

Additionally, Solaris Resource Manager has a decay period, set by the system administrator, that does not track past usage. The decay model is one of half-life decay, where 50 percent of the resource has been decayed away within one half-life. This ensures that steady, even users are not penalized by short-term, process-intensive users. The half-life decay period sets the responsiveness, or term, of the scheduler; the default value is 120 seconds. A long half-life favors even usage, typical of longer batch jobs, while a short half-life favors interactive users. Shorter values tend to provide more even response across the system, at the expense of slightly less accuracy in computing and maintaining system-wide resource allocation. Regardless of administrative settings, the scheduler tries to prevent resource starvation and ensure reasonable behavior, even in extreme situations.

Scheduler Advantages

The primary advantage of the Solaris Resource Manager SHR scheduler over the standard Solaris scheduler is that it schedules users or applications rather than individual processes. Every process associated with an lnode is subject to a set of limits. For the simple case of one user running a single active process, this is the same as subjecting each process to the limits listed in the corresponding lnode. When more than one process is attached to an lnode, as when members of a group each run multiple processes, all of the processes are collectively subject to the listed limits. This means that users or applications cannot consume CPU at a greater rate than their entitlements allow, regardless of how many concurrent processes they run. The method for assigning entitlements as a number of shares is simple and understandable, and the effect of changing a user's shares is predictable.

Another advantage of the SHR scheduler is that while it manages the scheduling of individual threads (technically, in Solaris, the scheduled entity is a lightweight process (LWP)), it also apportions CPU resources between users.

These concepts are illustrated by the following equation:

Equation shows that new Solaris Resource Manager priority equals current priority plus CPU usage divided by number of shares.

The new_SRM_priority is then mapped to the system priority. The higher the Solaris Resource Manager priority, the lower the system priority, and vice versa. Every decay period, CPU_usage is reduced by half and incremented with the most current usage.

Each user also has a set of flags, which are boolean-like variables used to enable or disable selective system privileges, such as login. Flags can be set individually per user, or be inherited from a parent lnode.

A user's usages, limits, and flags can be read by any user, but they can be altered only by users with the correct administrative privileges.

Eliminating CPU Waste

Solaris Resource Manager never wastes CPU availability. No matter how low a user's allocation, that user is always given all the available CPU if there are no competing users. One consequence of this is that users may notice performance that is less smooth than usual. If a user with a very low effective share is running an interactive process without any competition, the job will appear to run quickly. However, as soon as another user with a greater effective share demands CPU time, it will be given to that user in preference to the first user, so the first user will notice a marked job slow-down. Nevertheless, Solaris Resource Manager goes to some lengths to ensure that legitimate users are not cut off and unable to do any work. All processes being scheduled by Solaris Resource Manager (except those with a maximum nice value) will be allocated CPU regularly by the scheduler. There is also logic to prevent a new user that has just logged in from being given an arithmetically "fair" but excessively large proportion of the CPU to the detriment of existing users.