System Administration Guide: Oracle Solaris Containers-Resource Management and Oracle Solaris Zones

Chapter 8 Fair Share Scheduler (Overview)

The analysis of workload data can indicate that a particular workload or group of workloads is monopolizing CPU resources. If these workloads are not violating resource constraints on CPU usage, you can modify the allocation policy for CPU time on the system. The fair share scheduling class described in this chapter enables you to allocate CPU time based on shares instead of the priority scheme of the timesharing (TS) scheduling class.

This chapter covers the following topics.

To begin using the fair share scheduler, see Chapter 9, Administering the Fair Share Scheduler (Tasks).

Introduction to the Scheduler

A fundamental job of the operating system is to arbitrate which processes get access to the system's resources. The process scheduler, which is also called the dispatcher, is the portion of the kernel that controls allocation of the CPU to processes. The scheduler supports the concept of scheduling classes. Each class defines a scheduling policy that is used to schedule processes within the class. The default scheduler in the Solaris Operating System, the TS scheduler, tries to give every process relatively equal access to the available CPUs. However, you might want to specify that certain processes be given more resources than others.

You can use the fair share scheduler (FSS) to control the allocation of available CPU resources among workloads, based on their importance. This importance is expressed by the number of shares of CPU resources that you assign to each workload.

You give each project CPU shares to control the project's entitlement to CPU resources. The FSS guarantees a fair dispersion of CPU resources among projects that is based on allocated shares, independent of the number of processes that are attached to a project. The FSS achieves fairness by reducing a project's entitlement for heavy CPU usage and increasing its entitlement for light usage, in accordance with other projects.

The FSS consists of a kernel scheduling class module and class-specific versions of the dispadmin(1M) and priocntl(1) commands. Project shares used by the FSS are specified through the project.cpu-shares property in the project(4) database.


Note –

If you are using the project.cpu-shares resource control on a system with zones installed, see Zone Configuration Data, Resource Controls Used in Non-Global Zones, and Using the Fair Share Scheduler on a Solaris System With Zones Installed.


CPU Share Definition

The term “share” is used to define a portion of the system's CPU resources that is allocated to a project. If you assign a greater number of CPU shares to a project, relative to other projects, the project receives more CPU resources from the fair share scheduler.

CPU shares are not equivalent to percentages of CPU resources. Shares are used to define the relative importance of workloads in relation to other workloads. When you assign CPU shares to a project, your primary concern is not the number of shares the project has. Knowing how many shares the project has in comparison with other projects is more important. You must also take into account how many of those other projects will be competing with it for CPU resources.


Note –

Processes in projects with zero shares always run at the lowest system priority (0). These processes only run when projects with nonzero shares are not using CPU resources.


CPU Shares and Process State

In the Solaris system, a project workload usually consists of more than one process. From the fair share scheduler perspective, each project workload can be in either an idle state or an active state. A project is considered idle if none of its processes are using any CPU resources. This usually means that such processes are either sleeping (waiting for I/O completion) or stopped. A project is considered active if at least one of its processes is using CPU resources. The sum of shares of all active projects is used in calculating the portion of CPU resources to be assigned to projects.

When more projects become active, each project's CPU allocation is reduced, but the proportion between the allocations of different projects does not change.

CPU Share Versus Utilization

Share allocation is not the same as utilization. A project that is allocated 50 percent of the CPU resources might average only a 20 percent CPU use. Moreover, shares serve to limit CPU usage only when there is competition from other projects. Regardless of how low a project's allocation is, it always receives 100 percent of the processing power if it is running alone on the system. Available CPU cycles are never wasted. They are distributed between projects.

The allocation of a small share to a busy workload might slow its performance. However, the workload is not prevented from completing its work if the system is not overloaded.

CPU Share Examples

Assume you have a system with two CPUs running two parallel CPU-bound workloads called A and B, respectively. Each workload is running as a separate project. The projects have been configured so that project A is assigned SA shares, and project B is assigned SB shares.

On average, under the traditional TS scheduler, each of the workloads that is running on the system would be given the same amount of CPU resources. Each workload would get 50 percent of the system's capacity.

When run under the control of the FSS scheduler with SA=SB, these projects are also given approximately the same amounts of CPU resources. However, if the projects are given different numbers of shares, their CPU resource allocations are different.

The next three examples illustrate how shares work in different configurations. These examples show that shares are only mathematically accurate for representing the usage if demand meets or exceeds available resources.

Example 1: Two CPU-Bound Processes in Each Project

If A and B each have two CPU-bound processes, and SA = 1 and SB = 3, then the total number of shares is 1 + 3 = 4. In this configuration, given sufficient CPU demand, projects A and B are allocated 25 percent and 75 percent of CPU resources, respectively.

Illustration. The context describes the graphic.

Example 2: No Competition Between Projects

If A and B have only one CPU-bound process each, and SA = 1 and SB = 100, then the total number of shares is 101. Each project cannot use more than one CPU because each project has only one running process. Because no competition exists between projects for CPU resources in this configuration, projects A and B are each allocated 50 percent of all CPU resources. In this configuration, CPU share values are irrelevant. The projects' allocations would be the same (50/50), even if both projects were assigned zero shares.

Illustration. The context describes the graphic.

Example 3: One Project Unable to Run

If A and B have two CPU-bound processes each, and project A is given 1 share and project B is given 0 shares, then project B is not allocated any CPU resources and project A is allocated all CPU resources. Processes in B always run at system priority 0, so they will never be able to run because processes in project A always have higher priorities.

Illustration. The context describes the graphic.

FSS Configuration

Projects and Users

Projects are the workload containers in the FSS scheduler. Groups of users who are assigned to a project are treated as single controllable blocks. Note that you can create a project with its own number of shares for an individual user.

Users can be members of multiple projects that have different numbers of shares assigned. By moving processes from one project to another project, processes can be assigned CPU resources in varying amounts.

For more information on the project(4) database and name services, see project Database.

CPU Shares Configuration

The configuration of CPU shares is managed by the name service as a property of the project database.

When the first task (or process) that is associated with a project is created through the setproject(3PROJECT) library function, the number of CPU shares defined as resource control project.cpu-shares in the project database is passed to the kernel. A project that does not have the project.cpu-shares resource control defined is assigned one share.

In the following example, this entry in the /etc/project file sets the number of shares for project x-files to 5:


x-files:100::::project.cpu-shares=(privileged,5,none)

If you alter the number of CPU shares allocated to a project in the database when processes are already running, the number of shares for that project will not be modified at that point. The project must be restarted for the change to become effective.

If you want to temporarily change the number of shares assigned to a project without altering the project's attributes in the project database, use the prctl command. For example, to change the value of project x-files's project.cpu-shares resource control to 3 while processes associated with that project are running, type the following:


# prctl -r -n project.cpu-shares -v 3 -i project x-files

See the prctl(1) man page for more information.

-r

Replaces the current value for the named resource control.

-n name

Specifies the name of the resource control.

-v val

Specifies the value for the resource control.

-i idtype

Specifies the ID type of the next argument.

x-files

Specifies the object of the change. In this instance, project x-files is the object.

Project system with project ID 0 includes all system daemons that are started by the boot-time initialization scripts. system can be viewed as a project with an unlimited number of shares. This means that system is always scheduled first, regardless of how many shares have been given to other projects. If you do not want the system project to have unlimited shares, you can specify a number of shares for this project in the project database.

As stated previously, processes that belong to projects with zero shares are always given zero system priority. Projects with one or more shares are running with priorities one and higher. Thus, projects with zero shares are only scheduled when CPU resources are available that are not requested by a nonzero share project.

The maximum number of shares that can be assigned to one project is 65535.

FSS and Processor Sets

The FSS can be used in conjunction with processor sets to provide more fine-grained controls over allocations of CPU resources among projects that run on each processor set than would be available with processor sets alone. The FSS scheduler treats processor sets as entirely independent partitions, with each processor set controlled independently with respect to CPU allocations.

The CPU allocations of projects running in one processor set are not affected by the CPU shares or activity of projects running in another processor set because the projects are not competing for the same resources. Projects only compete with each other if they are running within the same processor set.

The number of shares allocated to a project is system wide. Regardless of which processor set it is running on, each portion of a project is given the same amount of shares.

When processor sets are used, project CPU allocations are calculated for active projects that run within each processor set.

Project partitions that run on different processor sets might have different CPU allocations. The CPU allocation for each project partition in a processor set depends only on the allocations of other projects that run on the same processor set.

The performance and availability of applications that run within the boundaries of their processor sets are not affected by the introduction of new processor sets. The applications are also not affected by changes that are made to the share allocations of projects that run on other processor sets.

Empty processor sets (sets without processors in them) or processor sets without processes bound to them do not have any impact on the FSS scheduler behavior.

FSS and Processor Sets Examples

Assume that a server with eight CPUs is running several CPU-bound applications in projects A, B, and C. Project A is allocated one share, project B is allocated two shares, and project C is allocated three shares.

Project A is running only on processor set 1. Project B is running on processor sets 1 and 2. Project C is running on processor sets 1, 2, and 3. Assume that each project has enough processes to utilize all available CPU power. Thus, there is always competition for CPU resources on each processor set.

Diagram shows total system-wide project CPU allocations
on a server with eight CPUs that is running several CPU-bound applications
in three projects.

The total system-wide project CPU allocations on such a system are shown in the following table.

Project 

Allocation 

Project A 

4% = (1/6 X 2/8)pset1

Project B 

28% = (2/6 X 2/8)pset1+ (2/5 * 4/8)pset2

Project C 

67% = (3/6 X 2/8)pset1+ (3/5 X 4/8)pset2+ (3/3 X 2/8)pset3

These percentages do not match the corresponding amounts of CPU shares that are given to projects. However, within each processor set, the per-project CPU allocation ratios are proportional to their respective shares.

On the same system without processor sets, the distribution of CPU resources would be different, as shown in the following table.

Project 

Allocation 

Project A 

16.66% = (1/6) 

Project B 

33.33% = (2/6) 

Project C 

50% = (3/6) 

Combining FSS With Other Scheduling Classes

By default, the FSS scheduling class uses the same range of priorities (0 to 59) as the timesharing (TS), interactive (IA), and fixed priority (FX) scheduling classes. Therefore, you should avoid having processes from these scheduling classes share the same processor set. A mix of processes in the FSS, TS, IA, and FX classes could result in unexpected scheduling behavior.

With the use of processor sets, you can mix TS, IA, and FX with FSS in one system. However, all the processes that run on each processor set must be in one scheduling class, so they do not compete for the same CPUs. The FX scheduler in particular should not be used in conjunction with the FSS scheduling class unless processor sets are used. This action prevents applications in the FX class from using priorities high enough to starve applications in the FSS class.

You can mix processes in the TS and IA classes in the same processor set, or on the same system without processor sets.

The Solaris system also offers a real-time (RT) scheduler to users with superuser privileges. By default, the RT scheduling class uses system priorities in a different range (usually from 100 to 159) than FSS. Because RT and FSS are using disjoint, or non-overlapping, ranges of priorities, FSS can coexist with the RT scheduling class within the same processor set. However, the FSS scheduling class does not have any control over processes that run in the RT class.

For example, on a four-processor system, a single-threaded RT process can consume one entire processor if the process is CPU bound. If the system also runs FSS, regular user processes compete for the three remaining CPUs that are not being used by the RT process. Note that the RT process might not use the CPU continuously. When the RT process is idle, FSS utilizes all four processors.

You can type the following command to find out which scheduling classes the processor sets are running in and ensure that each processor set is configured to run either TS, IA, FX, or FSS processes.


$ ps -ef -o pset,class | grep -v CLS | sort | uniq
1 FSS
1 SYS
2 TS
2 RT
3 FX

Setting the Scheduling Class for the System

To set the default scheduling class for the system, see How to Make FSS the Default Scheduler Class, Scheduling Class in a Zone, and dispadmin(1M). To move running processes into a different scheduling class, see Configuring the FSS and priocntl(1).

Scheduling Class on a System with Zones Installed

Non-global zones use the default scheduling class for the system. If the system is updated with a new default scheduling class setting, non-global zones obtain the new setting when booted or rebooted.

The preferred way to use FSS in this case is to set FSS to be the system default scheduling class with the dispadmin command. All zones then benefit from getting a fair share of the system CPU resources. See Scheduling Class in a Zone for more information on scheduling class when zones are in use.

For information about moving running processes into a different scheduling class without changing the default scheduling class and rebooting, see Table 27–5 and the priocntl(1) man page.

Commands Used With FSS

The commands that are shown in the following table provide the primary administrative interface to the fair share scheduler.

Command Reference 

Description 

priocntl(1)

Displays or sets scheduling parameters of specified processes, moves running processes into a different scheduling class. 

ps(1)

Lists information about running processes, identifies in which scheduling classes processor sets are running. 

dispadmin(1M)

Sets the default scheduler for the system. Also used to examine and tune the FSS scheduler's time quantum value. 

FSS(7)

Describes the fair share scheduler (FSS).