System Interface Guide

Scheduling

Real-time scheduling constraints are necessary to manage data acquisition or process control hardware. The real-time environment requires that a process be able to react to external events in a bounded amount of time. Such constraints can exceed the capabilities of a kernel designed to provide a "fair" distribution of the processing resources to a set of time-sharing processes.

This section describes the SunOS 5.0 through 5.8 real-time scheduler, its priority queue, and how to use system calls and utilities that control scheduling.

Dispatch Latency

The most significant element in scheduling behavior for real-time applications is the provision of a real-time scheduling class. The standard time-sharing scheduling class is not suitable for real-time applications because this scheduling class treats every process equally and has a limited notion of priority. Real-time applications require a scheduling class in which process priorities are taken as absolute and are changed only by explicit application operations.

The term dispatch latency describes the amount of time it takes for a system to respond to a request for a process to begin operation. With a scheduler written specifically to honor application priorities, real-time applications can be developed with a bounded dispatch latency.

Figure 8-2 illustrates the amount of time it takes an application to respond to a request from an external event.

Figure 8-2 Application Response Time.

The overall application response time is composed of the interrupt response time, the dispatch latency, and the time it takes the application itself to determine its response.

The interrupt response time for an application includes both the interrupt latency of the system and the device driver's own interrupt processing time. The interrupt latency is determined by the longest interval that the system must run with interrupts disabled; this is minimized in SunOS 5.0 through 5.8 using synchronization primitives that do not commonly require a raised processor interrupt level.

During interrupt processing, the driver's interrupt routine wakes up the high priority process and returns when finished. The system detects that a process with higher priority than the interrupted process in now dispatchable and arranges to dispatch that process. The time to switch context from a lower priority process to a higher priority process is included in the dispatch latency time.

Figure 8-3 illustrates the internal dispatch latency/application response time of a system, defined in terms of the amount of time it takes for a system to respond to an internal event. The dispatch latency of an internal event represents the amount of time required for one process to wake up another higher priority process, and for the system to dispatch the higher priority process.

The application response time is the amount of time it takes for a driver to wake up a higher priority process, have a low priority process release resources, reschedule the higher priority task, calculate the response, and dispatch the task.

Note -

Interrupts can arrive and be processed during the dispatch latency interval. This processing increases the application response time, but is not attributed to the dispatch latency measurement, and so is not bounded by the dispatch latency guarantee.

Figure 8-3 Internal Dispatch Latency

With the new scheduling techniques provided with real-time SunOS 5.0 through 5.8, the system dispatch latency time is within specified bounds. As you can see in the table below, dispatch latency improves with a bounded number of processes.

Table 8-1 Real-time System Dispatch Latency with SunOS 5.0 through 5.8


Workstation	Bounded Number of Processes	Arbitrary Number of Processes
SPARCstation 2	<0.5 milliseconds in a system with fewer than 16 active processes	1.0 milliseconds
SPARCstation 5	<0.3 millisecond	0.3 millisecond
Ultra 1-167	<0.15 millisecond	<0.15 millisecond

Tests for dispatch latency and experience with such critical environments as manufacturing and data acquisition have proven that SunOS 5.8 is an effective platform for the development of real-time applications. (These examples are not of current products.)

Scheduling Classes

The SunOS 5.0 through 5.8 kernel dispatches processes by priority. The scheduler (or dispatcher) supports the concept of scheduling classes. Classes are defined as Real-time (RT), System (sys), and Time-Sharing (TS). Each class has a unique scheduling policy for dispatching processes within its class.

The kernel dispatches highest priority processes first. By default, real-time processes have precedence over sys and TS processes, but administrators can configure systems so that TS and RT processes have overlapping priorities.

Figure 8-4 illustrates the concept of classes as viewed by the SunOS 5.0 through 5.8 kernel.

Figure 8-4 Dispatch Priorities for Scheduling Classes

At highest priority are the hardware interrupts; these cannot be controlled by software. The interrupt processing routines are dispatched directly and immediately from interrupts, without regard to the priority of the current process.

Real-time processes have the highest default software priority. Processes in the RT class have a priority and time quantum value. RT processes are scheduled strictly on the basis of these parameters. As long as an RT process is ready to run, no SYS or TS process can run. Fixed priority scheduling allows critical processes to run in a predetermined order until completion. These priorities never change unless an application changes them.

An RT class process inherits the parent's time quantum, whether finite or infinite. A process with a finite time quantum runs until the time quantum expires or the process terminates, blocks (while waiting for an I/O event), or is preempted by a higher priority runnable real-time process. A process with an infinite time quantum ceases execution only when it terminates, blocks, or is preempted.

The SYS class exists to schedule the execution of special system processes, such as paging, STREAMS, and the swapper. It is not possible to change the class of a process to the SYS class. The SYS class of processes has fixed priorities established by the kernel when the processes are started.

At lowest priority are the time-sharing (TS) processes. TS class processes are scheduled dynamically, with a few hundred milliseconds for each time slice. The TS scheduler switches context in round-robin fashion often enough to give every process an equal opportunity to run, depending upon its time slice value, its process history (when the process was last put to sleep), and considerations for CPU utilization. Default time-sharing policy gives larger time slices to processes with lower priority.

A child process inherits the scheduling class and attributes of the parent process through fork(2). A process' scheduling class and attributes are unchanged by exec(2).

Different algorithms dispatch each scheduling class. Class dependent routines are called by the kernel to make decisions about CPU process scheduling. The kernel is class-independent, and takes the highest priority process off its queue. Each class is responsible for calculating a process' priority value for its class. This value is placed into the dispatch priority variable of that process.

As Figure 8-5 illustrates, each class algorithm has its own method of nominating the highest priority process to place on the global run queue.

Figure 8-5 The Kernel Dispatch Queue

Each class has a set of priority levels that apply to processes in that class. A class-specific mapping maps these priorities into a set of global priorities. It is not required that a set of global scheduling priority maps start with zero, nor that they be contiguous.

By default, the global priority values for time-sharing (TS) processes range from -20 to +20, mapped into the kernel from 0-40, with temporary assignments as high as 99. The default priorities for real-time (RT) processes range from 0-59, and are mapped into the kernel from 100 to 159. The kernel's class-independent code runs the process with the highest global priority on the queue.

Dispatch Queue

The dispatch queue is a linear-linked list of processes with the same global priority. Each process is invoked with class specific information attached to it. A process is dispatched from the kernel dispatch table based upon its global priority.

Dispatching Processes

When a process is dispatched, the process' context is mapped into memory along with its memory management information, its registers, and its stack. Then execution begins. Memory management information is in the form of hardware registers containing data needed to perform virtual memory translations for the currently running process.

Preemption

When a higher priority process becomes dispatchable, the kernel interrupts its computation and forces the context switch, preempting the currently running process. A process can be preempted at any time if the kernel finds that a higher priority process is now dispatchable.

For example, suppose that process A performs a read from a peripheral device. Process A is put into the sleep state by the kernel. The kernel then finds that a lower priority process B is runnable, so process B is dispatched and begins execution. Eventually, the peripheral device interrupts, and the driver of the device is entered. The device driver makes process A runnable and returns. Rather than returning to the interrupted process B, the kernel now preempts B from processing and resumes execution of the awakened process A.

Another interesting situation occurs when several processes contend for kernel resources. When a lower priority process releases a resource for which a higher priority real-time process is waiting, the kernel immediately preempts the lower priority process and resumes execution of the higher priority process.

Kernel Priority Inversion

Priority inversion occurs when a higher priority process is blocked by one or more lower priority processes for a long time. The use of synchronization primitives such as mutual-exclusion locks in the SunOS 5.0 through 5.8 kernel can lead to priority inversion.

A process is blocked when it must wait for one or more processes to relinquish resources. If blocking continues, it can lead to missed deadlines, even for low levels of utilization.

The problem of priority inversion has been addressed for mutual-exclusion locks for the SunOS 5.0 through 5.8 kernel by implementing a basic priority inheritance policy. The policy states that a lower priority process inherits the priority of a higher priority process when the lower priority process blocks the execution of the higher priority process. This places an upper bound on the amount of time a process can remain blocked. The policy is a property of the kernel's behavior, not a solution that a programmer institutes through system calls or function execution. User-level processes can still exhibit priority inversion, however.

User Priority Inversion

This issue and the means to deal with it are discussed in "Mutual Exclusion Lock Attributes" in Multithreaded Programming Guide.

Function Calls That Control Scheduling

priocntl(2)

Control over scheduling of active classes is done with priocntl(2). Class attributes are inherited through fork(2) and exec(2), along with scheduling parameters and permissions required for priority control. This is true for both the RT and the TS classes.

The priocntl(2) function is the interface for specifying a real-time process, a set of processes, or a class to which the system call applies. priocntlset(2) also provides the more general interface for specifying an entire set of processes to which the system call applies.

The command arguments of priocntl(2) can be one of: PC_GETCID, PC_GETCLINFO, PC_GETPARMS, or PC_SETPARMS. The real or effective ID of the calling process must match that of the affected processes, or must have super-user privilege.

`PC_GETCID`	This command takes the name field of a structure that contains a recognizable class name (RT for real-time and TS for time-sharing). The class ID and an array of class attribute data are returned.
`PC_GETCLINFO`	This command takes the ID field of a structure that contains a recognizable class identifier. The class name and an array of class attribute data are returned.
`PC_GETPARMS`	This command returns the scheduling class identifier and/or the class specific scheduling parameters of one of the specified processes. Even though `idtype` & `id` might specify a big set, `PC_GETPARMS` returns the parameter of only one process. It is up to the class to select which one.
`PC_SETPARMS`	This command sets the scheduling class and/or the class specific scheduling parameters of the specified process or processes.

sched_get_priority_max(3RT)

Returns the maximum values for the specified policy.

sched_get_priority_min(3RT)

Returns the minimum values for the specified policy (see sched_get_priority_max(3R)).

sched_rr_get_interval(3RT)

Updates the specified timespec structure to the current execution time limit (see sched_get_priority_max(3RT)).

sched_setparam(3RT), sched_getparam(3RT)

Sets or gets the scheduling parameters of the specified process.

sched_yield(3RT)

Blocks the calling process until it returns to the head of the process list.

Utilities That Control Scheduling

The administrative utilities that control process scheduling are dispadmin(1M) and priocntl(1). Both these utilities support the priocntl(2) system call with compatible options and loadable modules. These utilities provide system administration functions that control real-time process scheduling during runtime.

priocntl(1)

The priocntl(1) command sets and retrieves scheduler parameters for processes.

dispadmin(1M)

The dispadmin(1M) utility displays all current process scheduling classes by including the -l command line option during runtime. Process scheduling can also be changed for the class specified after the -c option, using RT as the argument for the real-time class.

The options shown in Table 8-2 are also available.

Table 8-2 Class Options for the dispadmin(1M) Utility


Option	Meaning
`-l`	Lists scheduler classes currently configured
`-c`	Specifies the class whose parameters are to be displayed or changed
`-g`	Gets the dispatch parameters for the specified class
`-r`	Used with -g, specifies time quantum resolution
`-s`	Specifies a file where values can be located

A class specific file containing the dispatch parameters can also be loaded during runtime. Use this file to establish a new set of priorities replacing the default values established during boot time. This class specific file must assert the arguments in the format used by the -g option. Parameters for the RT class are found in the rt_dptbl(4), and are listed in the example at the end of this section.

To add an RT class file to the system, the following modules must be present:

An rt_init() routine in the class module that loads the rt_dptbl(4)
An rt_dptbl(4) module that provides the dispatch parameters and a routine to return pointers to config_rt_dptbl
The dispadmin(1M) executable

Load the class specific module with the following command, where module_name is the class specific module:
# modload /kernel/sched/module_name

Invoke the dispadmin(1M) command:
# dispadmin -c RT -s file_name
The file must describe a table with the same number of entries as the table that is being overwritten.

Configuring Scheduling

Associated with both scheduling classes is a parameter table, rt_dptbl(4), and ts_dptbl(4). These tables are configurable by using a loadable module at boot time, or with dispadmin(1M) during runtime.

Dispatcher Parameter Table

The in-core table for real-time establishes the properties for RT scheduling. The rt_dptbl(4) structure consists of an array of parameters, struct rt_dpent_t, one for each of the n priority levels. The properties of a given priority level are specified by the ith parameter structure in the array, rt_dptbl[i].

A parameter structure consists of the following members (also described in the /usr/include/sys/rt.h header file).

`rt_globpri`	The global scheduling priority associated with this priority level. The `rt_globpri` values cannot be changed with dispadmin(1M).
`rt_quantum`	The length of the time quantum allocated to processes at this level in ticks (see "Timestamp Functions"). The time quantum value is only a default or starting value for processes at a particular level. The time quantum of a realtime process can be changed by using the priocntl(1) command or the priocntl(2) system call.

Reconfiguring `config_rt_dptbl`

A real-time administrator can change the behavior of the real-time portion of the scheduler by reconfiguring the config_rt_dptbl at any time. One method is described in rt_dptbl(4) in the section titled "REPLACING THE RT_DPTBL LOADABLE MODULE."

A second method for examining or modifying the real-time parameter table on a running system is through using the dispadmin(1M) command. Invoking dispadmin(1M) for the real-time class allows retrieval of the current rt_quantum values in the current config_rt_dptbl configuration from the kernel's in-core table. When overwriting the current in-core table, the configuration file used for input to dispadmin(1M) must conform to the specific format described in rt_dptbl(4).

Following is an example of prioritized processes rtdpent_t with their associated time quantum config_rt_dptbl[] value as they might appear in config_rt_dptbl[]:

rtdpent_t  rt_dptbl[] = { 			129,    60,
	 /* prilevel Time quantum */							130,    40,
		100,    100,											131,    40,
		101,    100,											132,    40,
		102,    100,											133,    40,
		103,    100,											134,    40,
		104,    100,											135,    40,
		105,    100,											136,    40,
		106,    100,											137,    40,
		107,    100,											138,    40
		108,    100,											139,    40,
		109,    100,											140,    20,
		110,    80,												141,    20,
		111,    80,												142,    20,
		112,    80,												143,    20,
		113,    80,												144,    20,
		114,    80,												145,    20,
		115,    80,												146,    20,
		116,    80,												147,    20,
		117,    80,												148,    20,
		118,    80,												149,    20,
		119,    80,												150,    10,
		120,    60,												151,    10,
		121,    60,												152,    10,
		122,    60,												153,    10,
		123,    60,												154,    10,
		124,    60,												155,    10,
		125,    60,												156,    10,
		126,    60,												157,    10,
		126,    60,												158,    10,
		127,    60,												159,    10,
		128,    60,										}