System Interface Guide

Chapter 3 Process Scheduler

This chapter describes the scheduling of processes. See the Multithreaded Programming Guide for a description of multithreaded scheduling. This chapter is for programmers who need more control over the order of process execution than default scheduling provides.

Overview of the Scheduler

When a process is created, it is assigned a Light Weight Process (LWP). (If the process is multithreaded, it might be assigned more LWPs.) An LWP is the object that is actually scheduled by the UNIX system scheduler, which determines when processes run. The scheduler maintains process priorities based on configuration parameters, process behavior, and user requests. It uses these priorities to decide which process runs next.

The default scheduling is a time-sharing policy. This policy adjusts process priorities dynamically to balance the response time of interactive processes and the throughput of processes that use a lot of CPU time.

The SunOS 5.8 scheduler also provides a real-time scheduling policy. Real-time scheduling lets users set fixed priorities of specific processes. The highest-priority real-time user process always gets the CPU as soon as the process is runnable, even if system processes are runnable.

A program can be written so that its real-time processes have a guaranteed response time from the system. See Chapter 8, Real-time Programming and Administration for detailed information.

The control of process scheduling provided by real-time scheduling is rarely needed and can cause more problems than it solves. However, when the requirements for a program include strict timing constraints, real-time processes might be the only way to satisfy those constraints.

Note -

Careless use of real-time processes can have a dramatic negative effect on the performance of time-sharing processes.

Because changes in scheduler administration can affect scheduler behavior, programmers might also need to know something about scheduler administration. The manual pages affecting scheduler administration are:

dispadmin(1M) tells how to change scheduler configuration in a running system.
ts_dptbl(4) and rt_dptbl(4) describe the time-sharing and real-time parameter tables that are used to configure the scheduler.

Figure 3-1 shows how the SunOS 5.8 process scheduler works:

Figure 3-1 SunOS 5.8 Process Scheduler

When a process is created, it inherits its scheduling parameters, including scheduling class and a priority within that class. A process changes class only by user request. The system manages the priority of a process based on user requests and the policy associated with the scheduler class of the process.

In the default configuration, the initialization process belongs to the time-sharing class. So, all user login shells begin as time-sharing processes.

The scheduler converts class-specific priorities into global priorities. The global priority of a process determines when it runs--the scheduler always runs the runnable process with the highest global priority. Numerically higher priorities run first. Once the scheduler assigns a process to the CPU, the process runs until it sleeps, uses its time slice, or is preempted by a higher-priority process. Processes with the same priority run round-robin.
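The selection rule described above can be sketched in a few lines of C. This is only an illustrative model, not kernel code: the runnable structure and the pick_next function are hypothetical names, and the queue is assumed to be kept in arrival order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one runnable process; not a kernel structure. */
struct runnable {
    int pid;
    int global_pri;   /* numerically higher priorities run first */
};

/*
 * Return the index of the process to run next, or -1 if the queue is empty.
 * The queue is assumed to be in arrival order. Taking the FIRST entry with
 * the highest priority gives round-robin among equals, provided a process
 * that uses up its time slice is moved to the back of the queue.
 */
int pick_next(const struct runnable *q, size_t n)
{
    int best = -1;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || q[i].global_pri > q[best].global_pri)
            best = (int)i;
    }
    return best;
}
```

With a queue holding global priorities 20, 59, 59, and 5, pick_next returns the index of the first entry with priority 59.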

All real-time processes have higher priorities than any kernel process, and all kernel processes have higher priorities than any time-sharing process.

Note -

On a single-processor system, as long as there is a runnable real-time process, no kernel process and no time-sharing process runs.

Administrators specify default time slices in the configuration tables and users can assign per-process time slices to real-time processes.

You can display the global priority of a process with the -cl options of the ps(1) command. You can display configuration information about class-specific priorities with the priocntl(1) command and the dispadmin(1M) command.

The following sections describe the scheduling policies of the three default classes.

Time-Sharing Class

The goal of the time-sharing policy is to provide good response time to interactive processes and good throughput to CPU-bound processes. The scheduler switches CPU allocation often enough to provide good response time, but not so often that it spends too much time on switching. Time slices are typically a few hundred milliseconds.

The time-sharing policy changes priorities dynamically and assigns time slices of different lengths. The scheduler raises the priority of a process that sleeps after only a little CPU use (a process sleeps, for example, when it starts an I/O operation such as a terminal read or a disk read). Frequent sleeps are characteristic of interactive tasks such as editing and running simple shell commands. The time-sharing policy lowers the priority of a process that uses the CPU for long periods without sleeping.

The default time-sharing policy gives larger time slices to processes with lower priorities. A process with a low priority is likely to be CPU-bound. Other processes get the CPU first, but when a low-priority process finally gets the CPU, it gets a bigger chunk of time. If a higher-priority process becomes runnable during a time slice, however, it preempts the running process.
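The following C sketch models the behavior described in the two preceding paragraphs with a small lookup table. The four levels and all of the values are invented for illustration and are not the contents of ts_dptbl(4). The names ts_level, ts_next_level, and ts_quantum_ms are hypothetical.

```c
#include <assert.h>

/* Hypothetical four-level table; values are illustrative only. */
struct ts_level {
    int quantum_ms;  /* time slice granted at this level */
    int on_expire;   /* new level after using the whole time slice */
    int on_sleep;    /* new level after returning from a sleep */
};

static const struct ts_level ts_table[4] = {
    { 200, 0, 2 },   /* level 0: lowest priority, longest slice */
    { 160, 0, 3 },   /* level 1 */
    { 120, 1, 3 },   /* level 2 */
    {  40, 2, 3 },   /* level 3: highest priority, shortest slice */
};

enum ts_event { TS_SLICE_EXPIRED, TS_WOKE_FROM_SLEEP };

/* CPU-bound behavior lowers the level; sleeping raises it. */
int ts_next_level(int level, enum ts_event ev)
{
    assert(level >= 0 && level < 4);
    return ev == TS_SLICE_EXPIRED ? ts_table[level].on_expire
                                  : ts_table[level].on_sleep;
}

/* Lower levels receive longer time slices. */
int ts_quantum_ms(int level)
{
    assert(level >= 0 && level < 4);
    return ts_table[level].quantum_ms;
}
```

In this model a process that repeatedly uses its whole slice drifts toward level 0 and receives 200 ms slices, while a process that sleeps often stays near level 3.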

Global process priorities and user-supplied priorities are in ascending order: numerically higher priorities run first. The user priority runs from the negative of a configuration-dependent maximum to the positive of that maximum. A process inherits its user priority. Zero is the default initial user priority.

The "user priority limit" is the configuration-dependent maximum value of the user priority. You can set a user priority to any value at or below the user priority limit. With appropriate permission, you can raise the user priority limit. Zero is the default user priority limit.

You can lower the user priority of a process to give the process reduced access to the CPU or, with the appropriate permission, raise the user priority to get better service. Because you cannot set the user priority above the user priority limit, you must raise the user priority limit before you raise the user priority, if both have their default values at zero.

An administrator configures the maximum user priority independent of global time-sharing priorities. In the default configuration, for example, a user can set a user priority only in the range from -20 to +20, but 60 time-sharing global priorities are configured.
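The rules for the user priority and the user priority limit can be summarized in a short C model. This is a sketch under stated assumptions: the ts_user structure and ts_set function are hypothetical, the range of -20 to +20 is the default quoted above, and a request that breaks a rule is simply rejected here, which may differ from what the system actually does.

```c
#include <assert.h>
#include <stddef.h>

#define TS_MAXUPRI 20   /* default configured maximum, per the text above */

/* Hypothetical model of the two per-process values. */
struct ts_user {
    int upri;      /* user priority */
    int uprilim;   /* user priority limit */
};

/* Return 0 on success, -1 if the request is not permitted. */
int ts_set(struct ts_user *p, int upri, int uprilim, int privileged)
{
    assert(p != NULL);
    if (uprilim < -TS_MAXUPRI || uprilim > TS_MAXUPRI)
        return -1;                      /* limit outside configured range */
    if (upri < -TS_MAXUPRI)
        return -1;                      /* priority below configured range */
    if (uprilim > p->uprilim && !privileged)
        return -1;                      /* raising the limit needs permission */
    if (upri > uprilim)
        return -1;                      /* priority may not exceed the limit */
    p->upri = upri;
    p->uprilim = uprilim;
    return 0;
}
```

Starting from the defaults of zero, a request for user priority 5 fails until the limit is raised to at least 5, and raising the limit succeeds only for a privileged caller.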

The scheduler manages time-sharing processes using configurable parameters in the time-sharing parameter table ts_dptbl(4). This table contains information specific to the time-sharing class.

System Class

The system class uses a fixed-priority policy to run kernel processes such as servers and housekeeping processes like the paging daemon. The system class is reserved to the kernel. Users can neither add a process to nor remove a process from the system class. Priorities for system class processes are set up in the kernel code. Once established, the priorities of system processes do not change. (User processes running in kernel mode are not in the system class.)

Real-time Class

The real-time class uses a fixed-priority scheduling policy so that critical processes run in predetermined order. Real-time priorities never change except when a user requests a change. Privileged users can use the priocntl(1) command or the priocntl(2) function to assign real-time priorities.

The scheduler manages real-time processes using configurable parameters in the real-time parameter table rt_dptbl(4). This table contains information specific to the real-time class.

Commands and Functions

Figure 3-2 illustrates the default process priorities.

Figure 3-2 Process Priorities (Programmer's View)

A process priority has meaning only in the context of a scheduler class. You specify a process priority by specifying a class and a class-specific priority value. The class and class-specific value are mapped by the system into a global priority that the system uses to schedule processes.
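As a rough illustration of how a class and a class-specific value become a global priority, the C sketch below assumes a layout of 0 through 59 for time-sharing, 60 through 99 for system, and 100 through 159 for real-time. These ranges are an assumption, not taken from this chapter, and a configured system can differ. The names class_band and rt_global_pri are hypothetical. Only the real-time mapping is shown as a formula, because time-sharing global priorities change dynamically.

```c
#include <assert.h>

enum sched_class { CLASS_TS = 0, CLASS_SYS = 1, CLASS_RT = 2 };

struct pri_band { int lo, hi; };

/* ASSUMED default layout; check dispadmin(1M) output on a real system. */
static const struct pri_band bands[3] = {
    {   0,  59 },   /* time-sharing */
    {  60,  99 },   /* system */
    { 100, 159 },   /* real-time */
};

struct pri_band class_band(enum sched_class c)
{
    assert((int)c >= 0 && (int)c <= 2);
    return bands[c];
}

/* Real-time priorities are fixed, so the mapping is an offset into the band. */
int rt_global_pri(int rt_pri)
{
    struct pri_band b = class_band(CLASS_RT);

    assert(rt_pri >= 0 && rt_pri <= b.hi - b.lo);
    return b.lo + rt_pri;
}
```

Under these assumed ranges, a real-time priority of 55 corresponds to global priority 155, above every system and time-sharing priority.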

A system administrator's view of priorities is different from that of a user or programmer. When configuring scheduler classes, an administrator deals directly with global priorities. The system maps priorities supplied by users into these global priorities. See System Administration Guide, Volume I for more information about priorities.

The ps(1) command with the -cel options reports global priorities for all active processes. The priocntl(1) command reports the class-specific priorities that users and programmers use.

The priocntl(1) command and the priocntl(2) and priocntlset(2) functions set or retrieve scheduler parameters for processes. Setting priorities follows the same general steps for all three interfaces:

Specify the target processes.
Specify the scheduler parameters you want for those processes.
Run the command or call the function to set the parameters for the processes.

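The target processes are identified by IDs such as the process ID, parent process ID, process group ID, session ID, user ID, group ID, and class ID. These IDs are basic properties of UNIX processes. (See Intro(2).) The class ID is the scheduler class of the process. priocntl(2) works only for the time-sharing and the real-time classes, not for the system class.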

The priocntl(1) Command

The priocntl(1) utility performs four different control functions on the scheduling of a process:

`priocntl -l`	displays configuration information
`priocntl -d`	displays the scheduling parameters of processes
`priocntl -s`	sets the scheduling parameters of processes
`priocntl -e`	executes a command with the specified scheduling parameters

The following are some examples of using priocntl(1).

The output of the -l option for the default configuration is:

$ priocntl -l
CONFIGURED CLASSES
==================

SYS (System Class)

TS (Time Sharing)
Configured TS User Priority Range -20 through 20

RT (Real Time)
Maximum Configured RT Priority: 59

An example of displaying information on all processes:

$ priocntl -d -i all

An example of displaying information on all time-sharing processes:

$ priocntl -d -i class TS

An example of displaying information on all processes with user ID 103 or 6626:

$ priocntl -d -i uid 103 6626

An example of making the process with ID 24668 a real-time process with default parameters:

$ priocntl -s -c RT -i pid 24668

An example of making 3608 RT with priority 55 and a one-fifth second time slice:

$ priocntl -s -c RT -p 55 -t 1 -r 5 -i pid 3608
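The -t and -r options in the example above combine into a time slice of tqntm/res seconds. The small C helper below shows the arithmetic. The name time_slice_ms is hypothetical. When -r is omitted, the resolution is understood to be 1000, so that -t is in milliseconds. Treat that default as an assumption and confirm it in the priocntl(1) manual page.

```c
#include <assert.h>

/*
 * Convert a priocntl -t tqntm / -r res pair to milliseconds.
 * The time slice is tqntm units of 1/res seconds each.
 */
long time_slice_ms(long tqntm, long res)
{
    assert(tqntm > 0 && res > 0);
    return tqntm * 1000 / res;
}
```

For -t 1 -r 5, time_slice_ms(1, 5) yields 200, which is the one-fifth second time slice.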

An example of changing all processes into time-sharing processes:

$ priocntl -s -c TS -i all

For uid 1122, reduce TS user priority and user priority limit to -10:

$ priocntl -s -c TS -p -10 -m -10 -i uid 1122

An example of starting a real-time shell with default real-time priority:

$ priocntl -e -c RT /bin/sh

An example of running make with a time-sharing user priority of -10:

$ priocntl -e -c TS -p -10 make bigprog

priocntl(1) subsumes the function of nice(1). nice works only on time-sharing processes and uses higher numbers to assign lower priorities. The example above is equivalent to using nice(1) to set an "increment" of 10:

$ nice -10 make bigprog

The priocntl(2) Function

priocntl(2) gets or sets the scheduling parameters of a process or set of processes much as the priocntl(1) utility does for a process. An invocation of priocntl(2) can act on an LWP, on a single process, or on a group of processes. A group of processes can be identified by parent process, process group, session, user, group, class, or all active processes. The manual page contains the details of its use.

An example of using priocntl(2) to do the equivalent of priocntl -l is in Appendix A, Full Code Examples.

The PC_GETCLINFO command gets a scheduler class name and parameters given the class ID. This command makes it easy to write programs that make no assumptions about what classes are configured. An example of using priocntl(2) with PC_GETCLINFO to get the class name of a process based on the process ID is in Example A-2.

The PC_SETPARMS command sets the scheduler class and parameters of a set of processes. The idtype and id input arguments specify the processes to be changed. Example A-3 provides an example of using priocntl(2) with the PC_SETPARMS command to convert a time-share process into a real-time process.

The priocntlset(2) Function

The priocntlset(2) function changes scheduler parameters of a set of processes, like priocntl(2). priocntlset(2) has the same command set as priocntl(2). The cmd and arg input arguments are the same. But while priocntl(2) applies to a set of processes specified by a single idtype/id pair, priocntlset(2) applies to a set of processes that results from a logical combination of two idtype/id pairs. Again, refer to the manual page for details.

An example of using priocntlset(2) to change the priority of real-time processes without changing time-sharing processes with the same user ID to real-time processes is in Example A-4.
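The idea of combining two idtype/id pairs can be modeled in portable C without the system headers. The sketch below is not the priocntlset(2) interface: the types proc, id_pair, and set_op and the function in_set are hypothetical, and the four operators (intersection, union, difference, exclusive-or) are assumed to be the available combinations. See the manual page for the actual structure.

```c
#include <assert.h>
#include <stddef.h>

enum id_kind { BY_UID, BY_CLASS };               /* hypothetical subset of ID types */
enum set_op  { OP_AND, OP_OR, OP_DIFF, OP_XOR }; /* assumed combinations */

struct proc    { int pid; int uid; int class_id; };
struct id_pair { enum id_kind kind; int id; };

/* Does process p match one idtype/id pair? */
static int matches(const struct proc *p, struct id_pair s)
{
    return s.kind == BY_UID ? p->uid == s.id : p->class_id == s.id;
}

/* Is process p in the set formed by combining the two pairs with op? */
int in_set(const struct proc *p, struct id_pair left,
           enum set_op op, struct id_pair right)
{
    int l, r;

    assert(p != NULL);
    l = matches(p, left);
    r = matches(p, right);
    switch (op) {
    case OP_AND:  return l && r;    /* in both sets */
    case OP_OR:   return l || r;    /* in either set */
    case OP_DIFF: return l && !r;   /* in left but not right */
    default:      return l != r;    /* in exactly one set */
    }
}
```

Selecting "user ID 1122 AND class RT" picks only that user's real-time processes and leaves the same user's time-sharing processes outside the set.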

Interaction With Other Functions

Kernel Processes

The kernel's daemon and housekeeping processes are assigned to the system scheduler class. Users can neither add processes to nor remove processes from this class, nor can they change the priorities of these processes. The command ps -cel lists the scheduler class of all processes. Processes in the system class are identified by a SYS entry in the CLS column.

fork(2) and exec(2)

Scheduler class, priority, and other scheduler parameters are inherited across the fork(2) and exec(2) functions.
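Inheritance across fork(2) can be observed with standard POSIX calls. The sketch below uses the nice value reported by getpriority() as a portable stand-in for the full set of scheduler parameters. It does not query the scheduler class itself, and the function name child_inherits_nice is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* request POSIX/XSI declarations */
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if a forked child sees the parent's nice value, 0 if not, -1 on error. */
int child_inherits_nice(void)
{
    int parent_nice, status;
    pid_t pid;

    errno = 0;                                  /* -1 is a legal nice value */
    parent_nice = getpriority(PRIO_PROCESS, 0);
    if (parent_nice == -1 && errno != 0)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                             /* child: compare and report */
        int child_nice;

        errno = 0;
        child_nice = getpriority(PRIO_PROCESS, 0);
        _exit(errno == 0 && child_nice == parent_nice ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1)         /* parent: collect the result */
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```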

nice(2)

The nice(1) command and the nice(2) function work as in previous versions of the UNIX system. They let you change the priority of a time-sharing process. Use lower numeric values to assign higher time-sharing priorities with these functions.

To change the scheduler class of a process or to specify a real-time priority, you must use one of the priocntl functions. Use higher numeric values to assign higher priorities with the priocntl functions.
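A minimal C sketch of the nice(2) convention follows. It uses only the POSIX nice() and getpriority() calls, and the helper name lower_own_priority is hypothetical. A positive increment lowers the priority of the calling process, which needs no special permission.

```c
#define _XOPEN_SOURCE 700   /* request POSIX/XSI declarations, including nice() */
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Lower the calling process's priority by incr (a larger nice value means
 * a lower priority). Returns the resulting nice value. Because -1 is a
 * legal nice value, callers that need to detect errors should clear errno
 * first and check it afterward.
 */
int lower_own_priority(int incr)
{
    assert(incr >= 0);
    errno = 0;
    if (nice(incr) == -1 && errno != 0)
        return -1;                          /* real failure; errno is set */
    errno = 0;
    return getpriority(PRIO_PROCESS, 0);    /* read back the new value */
}
```

The system clamps the result at its maximum nice value, so a large increment does not fail. It simply leaves the process at the lowest priority.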

init(1M)

The init(1M) process is a special case to the scheduler. To change the scheduling properties of init(1M), init must be the only process specified by idtype and id or by the procset structure.

Performance

Because the scheduler determines when and for how long processes run, it has an overriding importance in the performance and perceived performance of a system.

By default, all user processes are time-sharing processes. A process changes class only by a priocntl(2) call.

All real-time processes have higher priorities than any time-sharing process. As long as any real-time process is runnable, no time-sharing process or system process ever runs. So if a real-time application fails to relinquish control of the CPU occasionally, it can completely lock out other users and essential kernel housekeeping.
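One common way for a real-time application to relinquish the CPU is to block between units of work. The C sketch below shows a periodic loop that sleeps with the POSIX nanosleep() call. The names run_periodic, count_tick, and tick_count are hypothetical, and a production loop would normally schedule against an absolute deadline rather than a fixed pause.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations, including nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical unit of work used for illustration: count invocations. */
static int tick_count;
static void count_tick(void) { tick_count++; }

/*
 * Call work() the given number of times, sleeping period_ns nanoseconds
 * after each call. While the process sleeps it is not runnable, so
 * lower-priority processes get a chance to run.
 */
int run_periodic(void (*work)(void), int iterations, long period_ns)
{
    struct timespec pause;
    int done;

    assert(work != NULL && period_ns >= 0 && period_ns < 1000000000L);
    pause.tv_sec = 0;
    pause.tv_nsec = period_ns;
    for (done = 0; done < iterations; done++) {
        work();
        nanosleep(&pause, NULL);    /* block; the CPU is released */
    }
    return done;
}
```

For example, run_periodic(count_tick, 3, 1000000L) performs three units of work with a pause of one millisecond after each.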

Besides controlling process class and priorities, a real-time application must also control several other factors that influence its performance. The most important factors in performance are CPU power, amount of primary memory, and I/O throughput. These factors interact in complex ways. The sar(1) command has options for reporting on all performance factors.

Process State Transition

Applications that have strict real-time constraints might need to prevent processes from being swapped or paged out to secondary memory. Here's a simplified overview of UNIX process states and the transitions between states:

Figure 3-3 Process State Transition Diagram

An active process is normally in one of the five states in the diagram. The arrows show how it changes states.

A process is running if it is assigned to a CPU. A process is preempted--that is, removed from the running state--by the scheduler if a process with a higher priority becomes runnable. A process is also preempted if it consumes its entire time slice and a process of equal priority is runnable.

A process is runnable in memory if it is in primary memory and ready to run, but is not assigned to a CPU.

A process is sleeping in memory if it is in primary memory but is waiting for a specific event before it can continue execution. For example, a process is sleeping if it is waiting for an I/O operation to complete, for a locked resource to be unlocked, or for a timer to expire. When the event occurs, the process is sent a wake up; if the reason for its sleep is gone, the process becomes runnable.

A process is runnable and swapped if it is not waiting for a specific event but has had its whole address space written to secondary memory to make room in primary memory for other processes.

A process is sleeping and swapped if it is both waiting for a specific event and has had its whole address space written to secondary memory to make room in primary memory for other processes.
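The five states and the transitions between them can be written as a small state machine in C. The transition set below is inferred from the descriptions above, because the diagram itself is not reproduced here, so treat it as an assumption. All of the names are hypothetical.

```c
#include <assert.h>

enum pstate {
    PS_RUNNING, PS_RUNNABLE, PS_SLEEPING,
    PS_RUNNABLE_SWAPPED, PS_SLEEPING_SWAPPED, PS_INVALID
};

enum pevent {
    EV_DISPATCH, EV_PREEMPT, EV_SLEEP, EV_WAKEUP, EV_SWAP_OUT, EV_SWAP_IN
};

/* Return the state after event e, or PS_INVALID if e does not apply in s. */
enum pstate next_state(enum pstate s, enum pevent e)
{
    switch (s) {
    case PS_RUNNING:
        if (e == EV_PREEMPT)  return PS_RUNNABLE;
        if (e == EV_SLEEP)    return PS_SLEEPING;
        break;
    case PS_RUNNABLE:
        if (e == EV_DISPATCH) return PS_RUNNING;
        if (e == EV_SWAP_OUT) return PS_RUNNABLE_SWAPPED;
        break;
    case PS_SLEEPING:
        if (e == EV_WAKEUP)   return PS_RUNNABLE;
        if (e == EV_SWAP_OUT) return PS_SLEEPING_SWAPPED;
        break;
    case PS_RUNNABLE_SWAPPED:
        if (e == EV_SWAP_IN)  return PS_RUNNABLE;   /* must be read back first */
        break;
    case PS_SLEEPING_SWAPPED:
        if (e == EV_WAKEUP)   return PS_RUNNABLE_SWAPPED;
        break;
    default:
        break;
    }
    return PS_INVALID;
}
```

In this model a swapped process can never be dispatched directly. It must first be swapped in, which is the source of the delay that matters to real-time applications.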

If a machine does not have enough primary memory to hold all its active processes, it must page or swap some address space to secondary memory.

When the system is short of primary memory, it writes individual pages of some processes to secondary memory but still leaves those processes runnable. When a process runs, if it accesses those pages, it must sleep while the pages are read back into primary memory.

When the system gets into a more serious shortage of primary memory, it writes all the pages of some processes to secondary memory and marks those processes as swapped. Such processes can be scheduled again only after the system scheduler daemon process chooses them and they are read back into memory.

Both paging and swapping cause delay when a process is ready to run again. For processes that have strict timing requirements, this delay can be unacceptable.

To avoid swapping delays, real-time processes are never swapped, though parts of them can be paged. A program can prevent paging and swapping by locking its text and data into primary memory. For more information, see memcntl(2). How much memory can be locked is limited by how much memory is configured. Also, locking too much can cause intolerable delays to processes that do not have their text and data locked into memory.
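A minimal sketch of locking a process into memory follows. It uses the POSIX mlockall() call rather than memcntl(2) directly, which is a substitution made here for portability. The helper name lock_process_memory is hypothetical. The call usually requires privilege and can fail when too little lockable memory is available.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/*
 * Lock all current and future pages of the calling process into primary
 * memory. Returns 0 on success, or the errno value on failure (for
 * example EPERM without privilege, or ENOMEM/EAGAIN when the request
 * exceeds what can be locked).
 */
int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return errno;
    return 0;
}
```

A program would typically call this once during startup, before entering its time-critical section, and release the locks with munlockall() when they are no longer needed.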

Trade-offs between performance of real-time processes and performance of other processes depend on local needs. On some systems, process locking might be required to guarantee the necessary real-time response.

Software Latencies

See "Dispatch Latency" for information about latencies in real-time applications.