System Administration Guide

Chapter 67 The Scheduler (Reference)

This chapter contains reference information for the SunOS 5.x scheduler. This is a list of the overview information in this chapter.

About the Scheduler

The scheduler (or dispatcher) is the portion of the kernel that controls the allocation of the CPU to processes. It determines when processes run and for how long, depending on their assigned priorities. Priorities are based on scheduling class and process behavior. Four scheduling classes are supported by default: timesharing, system, real-time and interactive.

The scheduler has an overriding effect on the performance of a system.


Note -

The fundamental scheduling entity is the kernel thread. For single-threaded processes, scheduling the kernel thread is synonymous with process scheduling.


The SunOS 5.x scheduler controls the order in which processes run and the amount of CPU time each process may use before another process can run.

The scheduler allocates CPU time to processes according to the scheduling policies defined for each scheduling class. Associated with each scheduling class is a set of priority levels or queues. Ready-to-run processes are moved among these queues. Within a class, you can view these queues as a contiguous set of priority levels. These priority levels are mapped into a set of global scheduling priorities.

The global priority of a process determines when it runs--the scheduler runs the process with the highest global priority that is ready to run. Processes with numerically higher priorities run first, and processes with the same priority run using a round robin scheduling policy.

Once the scheduler assigns a process to a CPU, the process runs until one of the following events occur:

By default, all real-time processes have higher priorities than any system process, and all system processes have higher priorities than any timesharing process.

A process inherits its scheduler parameters from its parent process, including its scheduler class and its priority within that class. A process changes class only from a user request (with the priocntl command or system call). The system manages the priority of a process based on user requests and the policy associated with the scheduling class of the process.

Scheduler Class Policies

The following sections describe the scheduling policies of the three default classes: timesharing, system, and real-time.

Timesharing Class Policies

In the default configuration, the initialization process (init) belongs to the timesharing class. Because processes inherit their scheduler parameters, all user login shells--and consequently the processes run from those shells--begin as timesharing processes.

The goal of the timesharing policy is to provide good response time for interactive processes and good throughput for processes that use a lot of CPU time. The scheduler tries to divide the CPU's time fairly between processes, subject to the priorities associated with the processes. Those with higher priorities get more attention than those with lower priorities. However, to prevent any one job (process) from hogging the CPU, the scheduler can move jobs from high priorities to low priorities and vice versa.

The scheduler switches CPU allocation frequently enough to provide good response time, but not so frequently that it spends too much time doing the switching. Time slices are typically on the order of a few hundredths of a second.

The timesharing policy changes priorities dynamically and assigns time slices of different lengths. Once a process has started, its timesharing priority varies according to how much CPU time it's getting, how much time it's spending in queues, and other factors. The scheduler raises the priority of a process that "sleeps." (A process sleeps, for example, when it starts an I/O operation such as a terminal read or a disk read.) Entering sleep states frequently is characteristic of interactive tasks such as editing and running simple shell commands. On the other hand, the timesharing policy lowers the priority of a process that uses the CPU for long periods without sleeping.

The default timesharing policy gives larger time slices to processes with lower priorities. A process with a low priority is likely to be stuck in the CPU. Other processes get the CPU first, but when a lower-priority process finally gets the CPU, it gets a bigger chunk of time. If a higher-priority process becomes ready to run during a time slice, however, it preempts the running process.

The scheduler manages timesharing processes using parameters in the timesharing parameter table ts_dptbl. This table contains information specific to the timesharing class. It is automatically loaded into core memory from the TS_DPTBL loadable module located in the /kernel/sched directory.

System Class Policies

The system class uses a fixed-priority policy to run kernel processes such as servers, and housekeeping processes such as the page daemon. Their priorities are not dynamically adjusted like timesharing processes. The system class is reserved for use by the kernel, and users may neither add nor remove a process from the system class. Priorities for system-class processes are set up in the kernel code for the kernel processes, and, once established, these priorities do not change. (User processes running in kernel mode are not in the system class.)

Real-Time Class Policies

The SunOS 5.x operating system uses a real-time scheduling policy as well as a timesharing policy. Real-time scheduling allows users to set fixed priorities on a per-process basis, so that critical processes can run in predetermined order. The real-time scheduler never moves jobs between priorities. Real-time priorities change only when a user requests a change (using the priocntl command). Contrast this fixed-priority policy with the timesharing policy, in which the system changes priorities to provide good interactive response time.

The user process with the highest real-time priority always gets the CPU as soon as it can be run, even if other processes are ready to run. An application can be written so that its real-time processes have a guaranteed response time from the operating system.


Note -

As long as there is a real-time process ready to run, no process and no timesharing process runs. Other real-time processes can run only if they have a higher priority. Real-time processes managed carelessly can have a dramatic negative effect on the performance of timesharing processes.


The real-time policy gives higher-priority processes smaller time slices, by default. The higher priorities are allocated to real-time processes that are driven by external events. The operating system must be able to respond instantly to I/O. The lower-priority real-time processes are those that need more computation time. If a process with the highest priority uses up its time slice, it runs again because there is no process with a higher priority to pre-empt it.

The scheduler manages real-time processes by using parameters in the real-time parameter table rt_dptbl. This table contains information specific to the real-time class. It is automatically loaded into core from the RT_DPTBL loadable module located in the /kernel/sched directory.

Scheduler Configuration

This section describes the parameters and tables that control the scheduler configuration. A basic assumption is that your work load is reasonable for your system resources, such as CPU, memory, and I/O. If your resources are inadequate to meet the demands, reconfiguring the scheduler won't help.

You can display or change (fine tune) the scheduler parameters in a running system for both the timesharing and real-time classes by using the dispadmin command. Changes made by the dispadmin command do not survive a reboot. To make permanent changes in scheduler configuration, you must change the scheduler parameter tables in the appropriate loadable module: TS_DPTBL or RT_DPTBL provided in the /kernel/sched directory. See ts_dptbl(4) and rt_dptbl(4) for instructions on replacing these modules.

The primary user command for controlling process scheduling is priocntl(1). With this command, a user can start a process at a specified priority or manipulate the priorities of running processes. You can find out what classes are configured on your system with the priocntl -l command. The primary function call for controlling process scheduling is priocntl(2).

See Chapter 63, Managing Processes (Tasks), for examples of using the priocntl command. See System Interface Guide for a detailed descriptions of real-time programming, and the dispadmin(1M) and priocntl(1) commands.

Default Global Priorities

The following table shows the scheduling order and ranges of global priorities for each scheduler class.

Table 67-1 Scheduling Order and Global Priorities

Scheduling Order 

Global Priority 

Scheduler Class  

First

159  

 

 

.  

 

 

Real-Time  

 

.  

 

 

100  

 

 

99

 

 

.  

 

 

System  

 

.  

 

 

60  

 

 

59

 

 

.  

 

 

Timesharing  

 

.  

 

Last 

 

How Global Priorities Are Constructed

When your operating system is built, it constructs the global priorities from the tunable parameters and scheduler parameter tables described in the following sections. There isn't any command that will show you this complete global priority table. However, the dispadmin command displays the priorities (from 0 to n) specific to the real-time and timesharing classes. You can display the global priority of an active process with the ps -cl command.

Initial Global Priorities of Processes

A timesharing process inherits its scheduling class and priority from its parent process. The init process is the first process to entire the timesharing class.

System processes initially run with a priority that depends on the process's importance (which is programmed into the kernel). The most important system processes start with a priority at or near the top of the system class range.

Tunable Parameters

This section describes the tunable parameters that control scheduler configuration. To change any of these kernel parameters, enter a line in the /etc/system file with the format:

set parameter=value

See system(4) for more information.

The parameters described in this section control aspects of process scheduling, timesharing policy, and real-time policy.

The initial priority of a real-time process is determined when the process is put into the real-time scheduling class.

The priocntl -p command is used to specify the relative priority within the real-time class.

This is added to the base priority of the real-time class, which by default is 100. For example:

priocntl -e -c RT -p 20 command

This command would put the command into execution at a real-time priority of 120.

Process Scheduling Parameters

The following kernel parameters control aspects of process scheduling:

Timesharing Policy

The following parameter is specified in the TS loadable module, which controls the timesharing policy:

Real-Time Policy

The following parameter is specified in the RT loadable module, which controls the real-time policy:

Scheduler Parameter Tables

The scheduler tables are described in Table 67-2.

Table 67-2 Scheduler Parameters

Table 

Used to Manage ... 

rt_dptbl

Real-time processes 

ts_dptbl

Timesharing processes 

ts_kmdpris

Sleeping timesharing processes that own critical resources 

These tables define scheduling policy by setting the scheduling parameters to use for real-time and timesharing processes. The parameters specify how much CPU time processes get at different priority levels.

Default time slices for the priority levels are specified in the ts_dptbl and rt_dptbl configuration tables, which are defined in the TS_DPTBL and RT_DPTBL loadable modules. These modules are automatically loaded from the /kernel/sched directory into the kernel as needed.

The time slices are specified in units (quanta) with a resolution defined by a "resolution" line. The default resolution is 1000, which means the time quantum values are interpreted as milliseconds. This is derived from the reciprocal of the specified resolution in seconds. The quanta are rounded up to the next integral multiple of the system clock's resolution in clock ticks. (The system clock ticks HZ times per second, where HZ is a hardware-dependent constant defined in the param.h header file.) For example, if the clock tick is 10 milliseconds, 42 quanta is rounded up to 50 milliseconds.

Timesharing Parameter Table

A default version of the ts_dptb, is delivered with the system in /kernel/sched/TS_DPTBL. The default configuration has 60 timesharing priorities.

The dispadmin -c TS -g command displays a sample ts_dptbl table.


$ dispadmin -c TS -g
# Time Sharing Dispatcher Configuration
RES=1000
 
# ts_quantum  ts_tqexp  ts_slpret  ts_maxwait ts_lwait  PRIORITY LEVEL
       200         0        50           0        50        #     0
       200         0        50           0        50        #     1
       200         0        50           0        50        #     2
       200         0        50           0        50        #     3
       200         0        50           0        50        #     4
       200         0        50           0        50        #     5
       200         0        50           0        50        #     6
       200         0        50           0        50        #     7
       200         0        50           0        50        #     8
       200         0        50           0        50        #     9
       160         0        51           0        51        #    10
       160         1        51           0        51        #    11
       160         2        51           0        51        #    12
       160         3        51           0        51        #    13
       160         4        51           0        51        #    14
       160         5        51           0        51        #    15
       160         6        51           0        51        #    16
       160         7        51           0        51        #    17
       160         8        51           0        51        #    18
       160         9        51           0        51        #    19
       120        10        52           0        52        #    20
       120        11        52           0        52        #    21
       120        12        52           0        52        #    22
       120        13        52           0        52        #    23
       120        14        52           0        52        #    24
       120        15        52           0        52        #    25
       120        16        52           0        52        #    26
       120        17        52           0        52        #    27
       120        18        52           0        52        #    28
       120        19        52           0        52        #    29
        80        20        53           0        53        #    30
        80        21        53           0        53        #    31
        80        22        53           0        53        #    32
        80        23        53           0        53        #    33
        80        24        53           0        53        #    34
        80        25        54           0        54        #    35
        80        26        54           0        54        #    36
        80        27        54           0        54        #    37
        80        28        54           0        54        #    38
        80        29        54           0        54        #    39
        40        30        55           0        55        #    40
        40        31        55           0        55        #    41
        40        32        55           0        55        #    42
        40        33        55           0        55        #    43
        40        34        55           0        55        #    44
        40        35        56           0        56        #    45
        40        36        57           0        57        #    46
        40        37        58           0        58        #    47
        40        38        58           0        58        #    48
        40        39        58           0        59        #    49
        40        40        58           0        59        #    50
        40        41        58           0        59        #    51
        40        42        58           0        59        #    52
        40        43        58           0        59        #    53
        40        44        58           0        59        #    54
        40        45        58           0        59        #    55
        40        46        58           0        59        #    56
        40        47        58           0        59        #    57
        40        48        58           0        59        #    58
        20        49        59       32000        59        #    59
$ 

Table 67-3 describes the fields in the ts_dptbl table.

Table 67-3 Fields in the ts_dptbl Table

Field Name 

Description 

ts_quantum

(runtime) 

Contains the time slice (in milliseconds by default) that a process at a given priority is allowed to run before the scheduler reevaluates its priority. If the process uses up its entire time slice, it is put on the expired-level (ts_tqexp) queue. Time slices run from 40 milliseconds for the highest priority (59) to 200 milliseconds (0) for the lowest priority.

ts_tqexp

(expired level) 

Determines the new process priority for a process whose time slice has expired. If a process uses its whole time slice without sleeping, the scheduler changes its priority to the level indicated in the ts_tqexp column. The expired level is lower than the prior level. For example, a process with a priority of 30 that used up its time slice (80 milliseconds) will get a new priority of 20.

ts_slpret

(sleep level)  

Determines the priority assigned to a process when it returns from sleep. A process may sleep during certain system calls or when waiting for I/O (for example, servicing a page fault or waiting for a lock). When a process returns from sleep, it is always a given a priority of 59.

ts_maxwait

(wait time)  

Specifies the number of seconds a process will be left on a dispatch queue without its time slice expiring. If it does not use its time slice (in ts_maxwait seconds), its new priority will be set to ts_lwait. This is used to prevent a low-priority process from being starved of CPU time.

ts_lwait

(wait level)  

Contains the new priority for a ready-to-run process that has exceeded the maximum wait time (ts_maxwait) without getting its full time slice.

PRIORITY LEVEL

Contains global priorities. Processes put in queues at the higher priority levels run first. The global priorities run from a high of 59 to a low of 0. This is the only column in the table that is not tunable.

Real-Time Parameter Table

A default version of rt_dptbl is delivered with the system in the /kernel/sched/RT_DPTBL loadable module.

The dispadmin -c RT -g command displays rt_dptbl information similar to the following.


$ dispadmin -c RT -g
# Real Time Dispatcher Configuration
RES=1000
 
# TIME QUANTUM                    PRIORITY
# (rt_quantum)                      LEVEL
      1000                    #        0
      1000                    #        1
      1000                    #        2
      1000                    #        3
      1000                    #        4
      1000                    #        5
      1000                    #        6
      1000                    #        7
      1000                    #        8
      1000                    #        9
       800                    #       10
       800                    #       11
       800                    #       12
       800                    #       13
       800                    #       14
       800                    #       15
       800                    #       16
       800                    #       17
       800                    #       18
       800                    #       19
       600                    #       20
       600                    #       21
       600                    #       22
       600                    #       23
       600                    #       24
       600                    #       25
       600                    #       26
       600                    #       27
       600                    #       28
       600                    #       29
       400                    #       30
       400                    #       31
       400                    #       32
       400                    #       33
       400                    #       34
       400                    #       35
       400                    #       36
       400                    #       37
       400                    #       38
       400                    #       39
       200                    #       40
       200                    #       41
       200                    #       42
       200                    #       43
       200                    #       44
       200                    #       45
       200                    #       46
       200                    #       47
       200                    #       48
       200                    #       49
       100                    #       50
       100                    #       51
       100                    #       52
       100                    #       53
       100                    #       54
       100                    #       55
       100                    #       56
       100                    #       57
       100                    #       58
       100                    #       59
$
 

Table 67-4 describes the fields in the real-time parameter table.

Table 67-4 Fields in the rt_dptbl Table

Field Name 

Description 

rt_glbpri

Contains global priorities. Processes put in queues at the higher priority levels run first. Note that the dispadmin command, which you can use to display the table, shows only the relative priorities within the class, and not the global priorities. This column cannot be changed with dispadmin.

rt_qntm

Describes the default time slice (in milliseconds) a process with this priority (rt_glbpri) may run before the scheduler gives another process a chance. The time slice for a real-time process can be specified with the priocntl -t command.

Kernel-Mode Parameter Table

The scheduler uses the kernel-mode parameter table, ts_kmdpris, to manage sleeping timesharing processes. A default version of ts_kmdpris is delivered with the system, in the /kernel/sched/TS_DPTBL loadable module, and is automatically built into the kernel as part of system configuration. See ts_dptbl(4) for more information.


Note -

The kernel assumes that it has at least 40 priorities in ts_kmdpris. It panics if it does not.


The kernel-mode parameter table is a one-dimensional array of global priorities from 60 through 99. If a process owns a critical resource, it is assigned a kernel priority so that it can release the resource as soon as possible. Critical resources are:

Prior to SunOS 5.3, processes were assigned kernel priorities while they were asleep. This ensured that the resources they were waiting for were not paged out before they had a chance to execute again.

In order to do this after SunOS 5.3, processes return from sleep with the highest time-share priority (59).