Programming Interfaces Guide

Chapter 4 Process Scheduler

This chapter describes the scheduling of processes and how to modify scheduling.

Overview of the Scheduler contains an overview of the scheduler and the time-sharing scheduling class. Other scheduling classes are briefly described.
Commands and Interfaces describes the commands and interfaces that modify scheduling.
Interactions With Other Interfaces describes the effects of scheduling changes on kernel processes and certain interfaces.
Performance issues to consider when using these commands or interfaces are covered in Scheduling and System Performance.

This chapter is for developers who need more control over the order of process execution than default scheduling provides. See the Multithreaded Programming Guide for a description of multithreaded scheduling.

Overview of the Scheduler

When a process is created, the system assigns a lightweight process (LWP) to the process. If the process is multithreaded, more LWPs might be assigned to the process. An LWP is the object that is scheduled by the UNIX system scheduler, which determines when processes run. The scheduler maintains process priorities that are based on configuration parameters, process behavior, and user requests. The scheduler uses these priorities to determine which process runs next. The six priority classes are real-time, system, interactive (IA), fixed-priority (FX), fair-share (FSS), and time-sharing (TS).

The default scheduling is a time-sharing policy. This policy dynamically adjusts process priorities to balance the response time of interactive processes against the throughput of processes that use a lot of CPU time. The time-sharing class has the lowest priority.

The SunOS 5.10 scheduler also provides a real-time scheduling policy. Real-time scheduling enables users to assign fixed priorities to specific processes. The highest-priority real-time user process always gets the CPU as soon as the process is runnable.

The SunOS 5.10 scheduler also provides a policy for fixed-priority scheduling. Fixed-priority scheduling enables users to assign fixed priorities to specific processes. Fixed-priority scheduling uses the same priority range as the time-sharing scheduling class by default.

A program can be written so that its real-time processes have a guaranteed response time from the system. See Chapter 12, Real-time Programming and Administration for detailed information.

The control of process scheduling provided by real-time scheduling is rarely needed. However, when the requirements for a program include strict timing constraints, real-time processes might be the only way to satisfy those constraints.

Caution –

Careless use of real-time processes can have a dramatic negative effect on the performance of time-sharing processes.

Because changes in scheduler administration can affect scheduler behavior, programmers might also need to know something about scheduler administration. The following interfaces affect scheduler administration:

dispadmin(1M) displays or changes scheduler configuration in a running system.
ts_dptbl(4) and rt_dptbl(4) are tables that contain the time-sharing and real-time parameters that are used to configure the scheduler.

A process inherits its scheduling parameters, including scheduling class and priority within that class, when the process is created. A process changes class only by user request. The system bases its adjustments of a process' priority on user requests and the policy associated with the scheduler class of the process.

In the default configuration, the initialization process belongs to the time-sharing class. Therefore, all user login shells begin as time-sharing processes.

The scheduler converts class-specific priorities into global priorities. The global priority of a process determines when the process runs. The scheduler always runs the runnable process with the highest global priority. Higher priorities run first. A process assigned to the CPU runs until the process sleeps, uses its time slice, or is preempted by a higher-priority process. Processes with the same priority run in round-robin order.
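The dispatch rule described above can be sketched as a toy model. This is a minimal illustration, not the kernel's implementation: the RunQueues class and the simulate function are hypothetical names, every process is assumed to use its whole time slice, and sleeping and priority adjustment are ignored.

```python
from collections import deque


class RunQueues:
    """Toy dispatcher: one FIFO queue per global priority (hypothetical model)."""

    def __init__(self):
        self.queues = {}  # global priority -> deque of process names

    def make_runnable(self, name, priority):
        # A newly runnable process goes to the back of its priority's queue.
        self.queues.setdefault(priority, deque()).append(name)

    def pick_next(self):
        """Return (name, priority) for the next process to run, or None."""
        runnable = [prio for prio, queue in self.queues.items() if queue]
        if not runnable:
            return None
        top = max(runnable)  # higher global priorities run first
        return self.queues[top].popleft(), top


def simulate(procs, slices):
    """Dispatch `slices` time slices among procs, a list of (name, priority)."""
    rq = RunQueues()
    for name, prio in procs:
        rq.make_runnable(name, prio)
    order = []
    for _ in range(slices):
        picked = rq.pick_next()
        if picked is None:
            break
        name, prio = picked
        order.append(name)
        # The process used its whole slice, so it rejoins the back of its queue.
        rq.make_runnable(name, prio)
    return order


order = simulate([("a", 10), ("b", 10), ("c", 5)], 4)
```

In this model the two processes at priority 10 alternate, and the process at priority 5 does not run while either of them remains runnable.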

All real-time processes have higher priorities than any kernel process, and all kernel processes have higher priorities than any time-sharing process.

Note –

In a single-processor system, no kernel process and no time-sharing process runs while a runnable real-time process exists.
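The ordering of the classes can be pictured as bands of global priorities. The sketch below is illustrative only: the text states that real-time outranks system and that system outranks time-sharing, but the exact band boundaries used here (0 to 59, 60 to 99, 100 to 159) and the names CLASS_RANGES and global_priority are assumptions. For the time-sharing class, the level is the internal dispatch level, not the user priority.

```python
# Assumed band layout, for illustration only. Check dispadmin(1M) output on a
# real system before relying on any of these boundaries.
CLASS_RANGES = {
    "TS": range(0, 60),     # time-sharing band (assumed)
    "SYS": range(60, 100),  # system band (assumed)
    "RT": range(100, 160),  # real-time band (assumed)
}


def global_priority(cls, level):
    """Map a class-internal level (0 is lowest in the class) to a global priority."""
    band = CLASS_RANGES[cls]
    if not 0 <= level < len(band):
        raise ValueError(f"level {level} is out of range for class {cls}")
    return band[level]


lowest_rt = global_priority("RT", 0)
highest_sys = global_priority("SYS", 39)
highest_ts = global_priority("TS", 59)
```

Under these assumed bands, even the lowest real-time level maps above the highest system level, which in turn maps above the highest time-sharing level.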

Administrators specify default time slices in the configuration tables. Users can assign per-process time slices to real-time processes.

You can display the global priority of a process with the -cl options of the ps(1) command. You can display configuration information about class-specific priorities with the priocntl(1) command and the dispadmin(1M) command.

The following sections describe the scheduling policies of the six scheduling classes.

Time-Sharing Class

The goal of the time-sharing policy is to provide good response time to interactive processes and good throughput to CPU-bound processes. The scheduler switches CPU allocation often enough to provide good response time, but not so often that the system spends too much time on switching. Time slices are typically a few hundred milliseconds.

The time-sharing policy changes priorities dynamically and assigns time slices of different lengths. The scheduler raises the priority of a process that sleeps after only a little CPU use. For example, a process sleeps when the process starts an I/O operation such as a terminal read or a disk read. Frequent sleeps are characteristic of interactive tasks such as editing and running simple shell commands. The time-sharing policy lowers the priority of a process that uses the CPU for long periods without sleeping.

The default time-sharing policy gives larger time slices to processes with lower priorities. A process with a low priority is likely to be CPU-bound. Other processes get the CPU first, but when a low-priority process finally gets the CPU, that process gets a larger time slice. If a higher-priority process becomes runnable during a time slice, however, the higher-priority process preempts the running process.
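The way the time-sharing policy moves a process between priority levels can be sketched with a small table. This is a simplified model, loosely patterned on the kind of entries that ts_dptbl(4) holds: a time slice, the level to use after the slice expires, and the level to use after a sleep. The four levels and all values are hypothetical, and real tables have more levels and more fields.

```python
# Hypothetical 4-level table, lowest priority first.
# Each entry: (quantum_ms, level_after_slice_expiry, level_after_sleep)
TABLE = [
    (200, 0, 2),  # level 0: lowest priority, largest slice
    (160, 0, 3),
    (120, 1, 3),
    (80, 2, 3),   # level 3: highest priority, smallest slice
]


def quantum_ms(level):
    """Time slice granted to a process that runs at this level."""
    return TABLE[level][0]


def next_level(level, event):
    """Return the new level after a 'slice_expired' or 'slept' event."""
    _, after_expiry, after_sleep = TABLE[level]
    if event == "slice_expired":
        return after_expiry  # CPU-bound behavior lowers the priority
    if event == "slept":
        return after_sleep   # sleeping after little CPU use raises it
    raise ValueError(f"unknown event: {event}")


def run(level, events):
    """Apply a sequence of events and return the final level."""
    for event in events:
        level = next_level(level, event)
    return level


cpu_bound = run(3, ["slice_expired"] * 3)
interactive = run(0, ["slept", "slept"])
```

A process that keeps using its whole slice drifts to the lowest level and the largest slice, while a process that sleeps often climbs to the highest level.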

Global process priorities and user-supplied priorities are in ascending order: higher priorities run first. The user priority runs from the negative of a configuration-dependent maximum to the positive of that maximum. A process inherits its user priority. Zero is the default initial user priority.

The “user priority limit” is the configuration-dependent maximum value of the user priority. You can set a user priority to any value at or below the user priority limit. With appropriate permission, you can raise the user priority limit. Zero is the user priority limit by default.

You can lower the user priority of a process to give the process reduced access to the CPU. Alternatively, with the appropriate permission, raise the user priority to get faster service. The user priority cannot be set to a value that is higher than the user priority limit. Therefore, you must raise the user priority limit before raising the user priority if both have their default values at zero.
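The rules for the user priority and its limit can be written down as a small check. This is a sketch of the rules as stated in the text, not of the system's code: apply_request, its parameters, and the max_upri default of 60 are hypothetical, and the sketch rejects an out-of-bounds request where a real system might handle it differently.

```python
def apply_request(upri, uprilim, new_upri=None, new_uprilim=None,
                  privileged=False, max_upri=60):
    """Return the (user priority, user priority limit) pair after a request."""
    if new_uprilim is not None:
        if not -max_upri <= new_uprilim <= max_upri:
            raise ValueError("limit is outside the configured range")
        if new_uprilim > uprilim and not privileged:
            raise PermissionError("raising the limit needs permission")
        uprilim = new_uprilim
    if new_upri is not None:
        if not -max_upri <= new_upri <= max_upri:
            raise ValueError("user priority is outside the configured range")
        if new_upri > uprilim:
            raise ValueError("user priority cannot exceed the limit")
        upri = new_upri
    # Lowering the limit below the current user priority pulls the priority down.
    return min(upri, uprilim), uprilim


lowered = apply_request(0, 0, new_upri=-10)
raised_both = apply_request(0, 0, new_upri=5, new_uprilim=5, privileged=True)
```

Starting from the defaults of zero, lowering the user priority needs no permission, while raising it to 5 works only when the limit is raised in the same request by a caller that has permission.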

An administrator configures the maximum user priority independent of global time-sharing priorities. For example, in the default configuration a user can set a user priority in the –60 to +60 range. However, only 60 time-sharing global priorities are configured.

The scheduler manages time-sharing processes by using configurable parameters in the time-sharing parameter table ts_dptbl(4). This table contains information specific to the time-sharing class.

System Class

The system class uses a fixed-priority policy to run kernel processes such as servers and housekeeping processes like the paging daemon. The system class is reserved to the kernel. Users cannot add a process to the system class. Users cannot remove a process from the system class. Priorities for system class processes are set up in the kernel code. The priorities of system processes do not change once established. User processes that run in kernel mode are not in the system class.

Real-time Class

The real-time class uses a scheduling policy with fixed priorities so that critical processes run in predetermined order. Real-time priorities never change except when a user requests a change. Privileged users can use the priocntl(1) command or the priocntl(2) interface to assign real-time priorities.

The scheduler manages real-time processes by using configurable parameters in the real-time parameter table rt_dptbl(4). This table contains information specific to the real-time class.

Interactive Class

The IA class is very similar to the TS class. When used in conjunction with a windowing system, processes have a higher priority while running in a window with the input focus. The IA class is the default class while the system runs a windowing system. The IA class is otherwise identical to the TS class, and the two classes share the same ts_dptbl dispatch parameter table.

Fair-Share Class

The FSS class is used by the Fair-Share Scheduler (FSS(7)) to manage application performance by explicitly allocating shares of CPU resources to projects. A share indicates a project's entitlement to available CPU resources. The system tracks resource usage over time, reducing entitlement when usage is heavy and increasing entitlement when usage is light. The FSS schedules CPU time among processes according to their owners' entitlements, independent of the number of processes each project owns. The FSS class uses the same priority range as the TS and IA classes. See the FSS(7) man page for more details.
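The idea of shares can be illustrated with a short calculation. The sketch assumes that a project's entitlement is its shares divided by the total shares of the projects that are currently active. The adjustment for usage over time is left out, and the function name and project names are hypothetical.

```python
def entitlements(shares, active):
    """Return each active project's fraction of the available CPU.

    shares maps a project name to its number of shares; active lists the
    projects that currently have runnable work.
    """
    total = sum(shares[project] for project in active)
    if total == 0:
        return {project: 0.0 for project in active}
    return {project: shares[project] / total for project in active}


shares = {"batch": 1, "web": 3}
both_busy = entitlements(shares, ["batch", "web"])
only_batch = entitlements(shares, ["batch"])
```

Entitlement depends on the shares of the active projects, not on how many processes each project runs, so a project with one share can still use the whole CPU when no other project is active.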

Fixed-Priority Class

The FX class provides a fixed-priority preemptive scheduling policy. This policy is for processes that require user or application control of scheduling priorities, which the system does not adjust dynamically. By default, the FX class has the same priority range as the TS, IA, and FSS classes. The FX class allows user or application control of scheduling priorities through user priority values assigned to processes within the class. These user priority values determine the scheduling priority of a fixed-priority process relative to other processes within its class.

The scheduler manages fixed-priority processes by using configurable parameters in the fixed-priority dispatch parameter table fx_dptbl(4). This table contains information specific to the fixed-priority class.

Commands and Interfaces

The following figure illustrates the default process priorities.

Figure 4–1 Process Priorities (Programmer's View)

Real-time threads have priority over system threads. System threads have priority over time-sharing threads. Each class has a separate run queue.

A process priority has meaning only in the context of a scheduler class. You specify a process priority by specifying a class and a class-specific priority value. The class and class-specific value are mapped by the system into a global priority that the system uses to schedule processes.

A system administrator's view of priorities is different from the view of a user or programmer. When configuring scheduler classes, an administrator deals directly with global priorities. The system maps priorities supplied by users into these global priorities. See System Administration Guide: Basic Administration for more information about priorities.

The ps(1) command with -cel options reports global priorities for all active processes. The priocntl(1) command reports the class-specific priorities that users and programmers use.

The priocntl(1) command and the priocntl(2) and priocntlset(2) interfaces are used to set or retrieve scheduler parameters for processes. Setting priorities generally follows the same sequence for the command and both interfaces:

Specify the target processes.
Specify the scheduler parameters that you want for those processes.
Execute the command or interface to set the parameters for the processes.

Process IDs are basic properties of UNIX processes. See Intro(2) for more information. The class ID is the scheduler class of the process. priocntl(2) works only for the time-sharing and the real-time classes, not for the system class.

`priocntl` Usage

The priocntl(1) utility performs four different control functions on the scheduling of a process:

priocntl -l: Displays configuration information
priocntl -d: Displays the scheduling parameters of processes
priocntl -s: Sets the scheduling parameters of processes
priocntl -e: Executes a command with the specified scheduling parameters

The following examples demonstrate the use of priocntl(1).

The -l option for the default configuration produces the following output:

$ priocntl -l
CONFIGURED CLASSES
==================

SYS (System Class)

TS (Time Sharing)
Configured TS User Priority Range -60 through 60

RT (Real Time)
Maximum Configured RT Priority: 59

To display information on all processes, do the following:
$ priocntl -d -i all
To display information on all time-sharing processes, do the following:
$ priocntl -d -i class TS
To display information on all processes with user ID 103 or 6626, do the following:
$ priocntl -d -i uid 103 6626
To make the process with ID 24668 a real-time process with default parameters, do the following:
$ priocntl -s -c RT -i pid 24668
To make the process with ID 3608 a real-time process with priority 55 and a one-fifth second time slice, do the following:
$ priocntl -s -c RT -p 55 -t 1 -r 5 -i pid 3608
To change all processes into time-sharing processes, do the following:
$ priocntl -s -c TS -i all
To reduce the time-sharing user priority and user priority limit to -10 for all processes with user ID 1122, do the following:
$ priocntl -s -c TS -p -10 -m -10 -i uid 1122
To start a real-time shell with default real-time priority, do the following:
$ priocntl -e -c RT /bin/sh
To run make with a time-sharing user priority of -10, do the following:
$ priocntl -e -c TS -p -10 make bigprog

priocntl(1) includes the functionality of nice(1). nice works only on time-sharing processes and uses higher numbers to assign lower priorities. The previous example is equivalent to using nice(1) to set an increment of 10:

$ nice -10 make bigprog
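The priocntl -s examples above all follow the same pattern: choose the parameters, then name the target processes. The sketch below assembles such a command line from its parts and works out the time slice that the -t and -r options describe. The function names are hypothetical, only options that appear in the examples are used, and the default resolution of 1000 (milliseconds) is an assumption to check against the priocntl(1) man page. The sketch builds the argument list only and does not run the command.

```python
from fractions import Fraction


def build_set_command(cls, idtype, ids, priority=None, limit=None,
                      tqntm=None, res=None):
    """Build the argument list for a 'priocntl -s' invocation."""
    argv = ["priocntl", "-s", "-c", cls]
    if priority is not None:
        argv += ["-p", str(priority)]
    if limit is not None:
        argv += ["-m", str(limit)]
    if tqntm is not None:
        argv += ["-t", str(tqntm)]
    if res is not None:
        argv += ["-r", str(res)]
    # The target processes come last: an ID type followed by the IDs.
    argv += ["-i", idtype] + [str(i) for i in ids]
    return argv


def time_slice_seconds(tqntm, res=1000):
    """Time slice in seconds: tqntm units of 1/res second each."""
    return Fraction(tqntm, res)


rt_command = build_set_command("RT", "pid", [3608], priority=55, tqntm=1, res=5)
rt_slice = time_slice_seconds(1, 5)
```

For the real-time example, -t 1 -r 5 means one unit of one-fifth of a second, which is the one-fifth second time slice named in the text.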

`priocntl` Interface

priocntl(2) manages the scheduling parameters of a process or set of processes. An invocation of priocntl(2) can act on an LWP, on a single process, or on a group of processes. A group of processes can be identified by parent process, process group, session, user, group, class, or all active processes. For more details, see the priocntl(2) man page.

The PC_GETCLINFO command gets a scheduler class name and parameters when given the class ID. This command enables you to write programs that make no assumptions about what classes are configured.

The PC_SETXPARMS command sets the scheduler class and parameters of a set of processes. The idtype and id input arguments specify the processes to be changed.
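A script can follow the same principle of making no assumptions about which classes are configured by reading the output of priocntl -l. The sketch below is a command-line analogue, not a use of PC_GETCLINFO. It parses the sample output shown earlier in this chapter, and it assumes that every class appears on its own line as a short uppercase name followed by a description in parentheses. The names SAMPLE, CLASS_LINE, and configured_classes are hypothetical.

```python
import re

# Sample text copied from the priocntl -l example in this chapter.
SAMPLE = """CONFIGURED CLASSES
==================

SYS (System Class)

TS (Time Sharing)
Configured TS User Priority Range -60 through 60

RT (Real Time)
Maximum Configured RT Priority: 59
"""

# A class line looks like: NAME (Description)
CLASS_LINE = re.compile(r"^([A-Z]+) \((.+)\)$")


def configured_classes(text):
    """Return a dict that maps each class name to its description."""
    classes = {}
    for line in text.splitlines():
        match = CLASS_LINE.match(line.strip())
        if match:
            classes[match.group(1)] = match.group(2)
    return classes


classes = configured_classes(SAMPLE)
```

A program could then check for a class, for example "RT" in classes, before it tries to use that class.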

Interactions With Other Interfaces

Altering the priority of a process in the TS class can affect the behavior of other processes in the TS class. This section identifies ways in which a scheduling change can affect other processes.

Kernel Processes

The kernel's daemon and housekeeping processes are members of the system scheduler class. Users can neither add processes to nor remove processes from this class, nor can users change the priorities of these processes. The command ps -cel lists the scheduler class of all processes. A SYS entry in the CLS column identifies processes in the system class when you run ps(1) with the -c option.

Using `fork` and `exec`

Scheduler class, priority, and other scheduler parameters are inherited across the fork(2) and exec(2) interfaces.
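Inheritance across fork(2) can be observed from a small program. The Python standard library has no wrapper for priocntl(2), so this sketch uses the nice value as a stand-in for the scheduler parameters, which is an assumption of the example. The function name child_nice_after_fork is hypothetical. The child reports its nice value to the parent through a pipe.

```python
import os


def child_nice_after_fork():
    """Fork a child and return the nice value that the child observes."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the inherited nice value, then exit immediately.
        try:
            os.close(read_end)
            value = os.getpriority(os.PRIO_PROCESS, 0)
            os.write(write_end, str(value).encode())
        finally:
            os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_end)
    data = os.read(read_end, 32)
    os.close(read_end)
    os.waitpid(pid, 0)
    return int(data.decode())


parent_nice = os.getpriority(os.PRIO_PROCESS, 0)
child_nice = child_nice_after_fork()
```

The child starts with the same value as the parent because the child has done nothing to change it.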

Using `nice`

The nice(1) command and the nice(2) interface work as in previous versions of the UNIX system. These interfaces enable you to change the priority of a time-sharing process. Use lower numeric values to assign higher time-sharing priorities with these interfaces.

To change the scheduler class of a process or to specify a real-time priority, use priocntl(2). Use higher numeric values to assign higher priorities.
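The direction of the nice scale can be checked with a short program. This sketch applies a positive increment in a child process, so that the calling process keeps its own value, and reports the nice value before and after. The function name nice_before_and_after is hypothetical. A positive increment needs no special permission.

```python
import os


def nice_before_and_after(increment=1):
    """In a child process, apply a nice increment and return (before, after)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_end)
            before = os.nice(0)         # an increment of 0 only reports the value
            after = os.nice(increment)  # a higher value means a lower priority
            os.write(write_end, f"{before} {after}".encode())
        finally:
            os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 64).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    before, after = (int(field) for field in data.split())
    return before, after


before, after = nice_before_and_after(1)
```

The value after the call is never lower than the value before it. The value stays the same only when the process is already at the maximum nice value.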

init(1M)

The init(1M) process is a special case to the scheduler. To change the scheduling properties of init(1M), init must be the only process specified by idtype and id or by the procset structure.

Scheduling and System Performance

The scheduler determines when and for how long processes run. Therefore, the scheduler's behavior strongly affects a system's performance.

By default, all user processes are time-sharing processes. A process changes class only by a priocntl(2) call.

All real-time processes have a higher priority than any time-sharing process. On a single-processor system, neither time-sharing processes nor system processes can run while any real-time process is runnable. A real-time application that occasionally fails to relinquish control of the CPU can completely lock out other users and essential kernel housekeeping.

Besides controlling process class and priorities, a real-time application must also control other factors that affect its performance. The most important factors in performance are CPU power, amount of primary memory, and I/O throughput. These factors interact in complex ways. The sar(1) command has options for reporting on all performance factors.

Process State Transition

Applications that have strict real-time constraints might need to prevent processes from being swapped or paged out to secondary memory. A simplified overview of UNIX process states and the transitions between states is shown in the following figure.

Figure 4–2 Process State Transition Diagram

A running process can be preempted to memory, where it is runnable, or sleep in memory. A process in memory can be swapped.

An active process is normally in one of the five states in the diagram. The arrows show how the process changes states.

A process is running if the process is assigned to a CPU. A process is removed from the running state by the scheduler if a process with a higher priority becomes runnable. A process is also preempted if a process of equal priority is runnable when the original process consumes its entire time slice.
A process is runnable in memory if the process is in primary memory and ready to run, but is not assigned to a CPU.
A process is sleeping in memory if the process is in primary memory but is waiting for a specific event before continuing execution. For example, a process sleeps while waiting for an I/O operation to complete, for a locked resource to be unlocked, or for a timer to expire. When the event occurs, a wakeup call is sent to the process. If the reason for its sleep is gone, the process becomes runnable.
When a process' address space has been written to secondary memory, and that process is not waiting for a specific event, the process is runnable and swapped.
If a process is waiting for a specific event and has had its whole address space written to secondary memory, the process is sleeping and swapped.
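The five states can be modeled as a small state machine. The transition table is a sketch based on the descriptions above and on the usual form of this diagram. The event names are hypothetical, and the figure might show transitions that are missing here.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("running", "preempt"): "runnable_in_memory",
    ("running", "sleep"): "sleeping_in_memory",
    ("runnable_in_memory", "dispatch"): "running",
    ("sleeping_in_memory", "wakeup"): "runnable_in_memory",
    ("runnable_in_memory", "swap_out"): "runnable_swapped",
    ("sleeping_in_memory", "swap_out"): "sleeping_swapped",
    ("sleeping_swapped", "wakeup"): "runnable_swapped",
    ("runnable_swapped", "swap_in"): "runnable_in_memory",
}

STATES = {state for state, _ in TRANSITIONS} | set(TRANSITIONS.values())


def step(state, event):
    """Return the next state, or raise ValueError for an invalid transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None


def replay(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final = replay("running", ["sleep", "swap_out", "wakeup", "swap_in", "dispatch"])
```

In this model a swapped process cannot be dispatched directly. The process must first be read back into primary memory, which is the source of the delay that matters for processes with strict timing requirements.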

If a machine does not have enough primary memory to hold all its active processes, that machine must page or swap some address space to secondary memory.
When the system is short of primary memory, the system writes individual pages of some processes to secondary memory but leaves those processes runnable. When a running process accesses those pages, the process sleeps while the pages are read back into primary memory.
When the system encounters a more serious shortage of primary memory, the system writes all the pages of some processes to secondary memory. The system marks those processes as swapped. Such processes can be scheduled only when the system scheduler daemon selects these processes to be read back into memory.

Both paging and swapping cause delay when a process is ready to run again. For processes that have strict timing requirements, this delay can be unacceptable.

To avoid swapping delays, real-time processes are never swapped, though parts of such processes can be paged. A program can prevent paging and swapping by locking its text and data into primary memory. For more information, see the memcntl(2) man page. How much memory can be locked is limited by how much memory is configured. Also, locking too much can cause intolerable delays to processes that do not have their text and data locked into memory.

Trade-offs between the performance of real-time processes and the performance of other processes depend on local needs. On some systems, process locking might be required to guarantee the necessary real-time response.