Programming Interfaces Guide

Chapter 10 Real-time Programming and Administration

This chapter describes writing and porting real-time applications to run under SunOS. This chapter is written for programmers that are experienced in writing real-time applications and for administrators familiar with real-time processing and the Solaris system.

This chapter discusses the following topics:

Scheduling needs of real-time applications, which are covered in The Real-Time Scheduler.
Memory Locking.
Asynchronous Network Communication.

Basic Rules of Real-time Applications

Real-time response is guaranteed when certain conditions are met. This section identifies these conditions and some of the more significant design errors.

Most of the potential problems described here can degrade the response time of the system. One of the potential problems can freeze a workstation. Other, more subtle, mistakes are priority inversion and system overload.

A Solaris real-time process has the following characteristics:

Runs in the RT scheduling class, as described in The Real-Time Scheduler
Locks down all the memory in its process address space, as described in Memory Locking
Is from a statically linked program or from a program in which all dynamic binding is completed early, as described in Shared Libraries

Real-time operations are described in this chapter in terms of single-threaded processes, but the description can also apply to multithreaded processes. For detailed information about multithreaded processes, see the Multithreaded Programming Guide. To guarantee real-time scheduling of a thread, the thread must be created as a bound thread. Furthermore, the thread's LWP must be run in the RT scheduling class. The locking of memory and early dynamic binding is effective for all threads in a process.

When a process is the highest priority real-time process, the process acquires the processor within the guaranteed dispatch latency period of becoming runnable. For more information, see Dispatch Latency. The process continues to run for as long as it remains the highest priority runnable process.

A real-time process can lose control of the processor because of other system events. A real-time process can also be unable to gain control of the processor because of other system events. These events include external events, such as interrupts, resource starvation, waiting on external events such as synchronous I/O, and pre-emption by a higher priority process.

Real-time scheduling generally does not apply to system initialization and termination services such as open(2) and close(2).

Factors that Degrade Response Time

The problems described in this section all increase the response time of the system to varying extents. The degradation can be serious enough to cause an application to miss a critical deadline.

Real-time processing can also impair the operation of aspects of other applications that are active on a system that is running a real-time application. Because real-time processes have higher priority, time-sharing processes can be prevented from running for significant amounts of time. This phenomenon can cause interactive activities, such as displays and keyboard response time, to slow noticeably.

Synchronous I/O Calls

System response under SunOS provides no bounds to the timing of I/O events. This means that synchronous I/O calls should never be included in any program segment whose execution is time critical. Even program segments that permit very large time bounds must not perform synchronous I/O. Mass storage I/O is such a case, where causing a read or write operation hangs the system while the operation takes place.

A common application mistake is to perform I/O to get error message text from disk. Performing I/O in this fashion should be done from an independent process or independent thread. This independent process or independent thread should not run in real time.

Interrupt Servicing

Interrupt priorities are independent of process priorities. The priorities that are set for a group of processes are not inherited by the services of hardware interrupts that result from those processes' actions. As a consequence, devices controlled by high-priority real-time processes do not necessarily have high-priority interrupt processing.

Shared Libraries

Time-sharing processes can save significant amounts of memory by using dynamically linked, shared libraries. This type of linking is implemented through a form of file mapping. Dynamically linked library routines cause implicit reads.

Real-time programs can set the environment variable LD_BIND_NOW to a non-NULL value when the program is invoked. Setting the value of this environment value allows the use of shared libraries while avoiding dynamic binding. This procedure also forces all dynamic linking to be bound before the program begins execution. See the Linker and Libraries Guide for more information.

Priority Inversion

A time-sharing process can block a real-time process by acquiring a resource that is required by a real-time process. Priority inversion occurs when a higher priority process is blocked by a lower priority process. The term blocking describes a situation in which a process must wait for one or more processes to relinquish control of resources. Real-time processes might miss their deadlines if this blocking is prolonged.

Consider the case that is depicted in the following figure, where a high-priority process requires a shared resource. A lower priority process holds the resource and is pre-empted by an intermediate priority process, blocking the high-priority process. Any number of intermediate processes can be involved. All intermediate processes must finish executing, as well as the lower-priority process' critical section. This series of executions can take an arbitrarily long time.

Figure 10–1 Unbounded Priority Inversion

This issue and the methods of dealing with this issue are described in “Mutual Exclusion Lock Attributes” in Multithreaded Programming Guide.

Sticky Locks

A page is permanently locked into memory when its lock count reaches 65535 (0xFFFF). The value 0xFFFF is defined by the implementation and might change in future releases. Pages that are locked this way cannot be unlocked.

Runaway Real-time Processes

Runaway real-time processes can cause the system to halt. Such runaway processes can also slow the system response so much that the system appears to halt.

Note –

If you have a runaway process on a SPARC system, press Stop-A. You might have to do press Stop-A more than one time. If pressing Stop-A does not work, turn the power off, wait a moment, then turn the power back on. If you have a runaway process on a non-SPARC system, turn the power off, wait a moment, then turn the power back on.

When a high priority real-time process does not relinquish control of the CPU, you must break the infinite loop in order to regain control of the system. Such a runaway process does not respond to Control-C. Attempts to use a shell set at a higher priority than the priority of the runaway process do not work.

Asynchronous I/O Behavior

Asynchronous I/O operations do not always execute in the sequence in which the operations are queued to the kernel. Asynchronous operations do not necessarily return to the caller in the sequence in which the operations were performed.

If a single buffer is specified for a rapid sequence of calls to aioread(3AIO), the buffer's state is uncertain. The uncertainty of the buffer's state is from the time the first call is made to the time the last result is signaled to the caller.

An individual aio_result_t structure can be used for only one asynchronous operation. The operation can be a read or a write operation.

Real-time Files

SunOS provides no facilities to ensure that files are allocated as physically contiguous.

For regular files, the read(2) and write(2) operations are always buffered. An application can use mmap(2) and msync(3C) to effect direct I/O transfers between secondary storage and process memory.

The Real-Time Scheduler

Real-time scheduling constraints are necessary to manage data acquisition or process control hardware. The real-time environment requires that a process be able to react to external events in a bounded amount of time. Such constraints can exceed the capabilities of a kernel that is designed to provide a fair distribution of the processing resources to a set of time-sharing processes.

This section describes the SunOS real-time scheduler, its priority queue, and how to use system calls and utilities that control scheduling.

Dispatch Latency

The most significant element in scheduling behavior for real-time applications is the provision of a real-time scheduling class. The standard time-sharing scheduling class is not suitable for real-time applications because this scheduling class treats every process equally. The standard time-sharing scheduling class has a limited notion of priority. Real-time applications require a scheduling class in which process priorities are taken as absolute. Real-time applications also require a scheduling class in which process priorities are changed only by explicit application operations.

The term dispatch latency describes the amount of time a system takes to respond to a request for a process to begin operation. With a scheduler that is written specifically to honor application priorities, real-time applications can be developed with a bounded dispatch latency.

The following figure illustrates the amount of time an application takes to respond to a request from an external event.

Figure 10–2 Application Response Time

The overall application response time consists of the interrupt response time, the dispatch latency, and the application's response time.

The interrupt response time for an application includes both the interrupt latency of the system and the device driver's own interrupt processing time. The interrupt latency is determined by the longest interval that the system must run with interrupts disabled. This time is minimized in SunOS using synchronization primitives that do not commonly require a raised processor interrupt level.

During interrupt processing, the driver's interrupt routine wakes the high-priority process and returns when finished. The system detects that a process with higher priority than the interrupted process is now ready to dispatch and dispatches the process. The time to switch context from a lower-priority process to a higher-priority process is included in the dispatch latency time.

Figure 10–3 illustrates the internal dispatch latency and application response time of a system. The response time is defined in terms of the amount of time a system takes to respond to an internal event. The dispatch latency of an internal event represents the amount of time that a process needs to wake up a higher priority process. The dipatch latency also includes the time that the system takes to dispatch the higher priority process.

The application response time is the amount of time that a driver takes to: wake up a higher-priority process, release resources from a low-priority process, reschedule the higher-priority task, calculate the response, and dispatch the task.

Interrupts can arrive and be processed during the dispatch latency interval. This processing increases the application response time, but is not attributed to the dispatch latency measurement. Therefore, this processing is not bounded by the dispatch latency guarantee.

Figure 10–3 Internal Dispatch Latency

This graphic depicts the components of internal dispatch latency: wakeup and dispatch.

With the new scheduling techniques provided with real-time SunOS, the system dispatch latency time is within specified bounds. As you can see in the following table, dispatch latency improves with a bounded number of processes.

Table 10–1 Real-time System Dispatch Latency


Workstation	Bounded Number of Processes	Arbitrary Number of Processes
SPARCstation 2	<0.5 milliseconds in a system with fewer than 16 active processes	1.0 milliseconds
SPARCstation 5	<0.3 millisecond	0.3 millisecond
Ultra 1-167	<0.15 millisecond	<0.15 millisecond

Scheduling Classes

The SunOS kernel dispatches processes by priority. The scheduler or dispatcher supports the concept of scheduling classes. Classes are defined as real-time (RT), system (SYS), and time-sharing (TS). Each class has a unique scheduling policy for dispatching processes within its class.

The kernel dispatches highest priority processes first. By default, real-time processes have precedence over sys and TS processes. Administrators can configure systems so that the priorities for TS processes and RT processes overlap.

The following figure illustrates the concept of classes as viewed by the SunOS kernel.

Figure 10–4 Dispatch Priorities for Scheduling Classes

This graphic depicts hardware system interrupts with a higher priority than software interrupts from realtime, kernel, or time-sharing processes.

Hardware interrupts, which cannot be controlled by software, have the highest priority. The routines that process interrupts are dispatched directly and immediately from interrupts, without regard to the priority of the current process.

Real-time processes have the highest default software priority. Processes in the RT class have a priority and time quantum value. RT processes are scheduled strictly on the basis of these parameters. As long as an RT process is ready to run, no SYS or TS process can run. Fixed-priority scheduling enables critical processes to run in a predetermined order until completion. These priorities never change unless they are changed by an application.

An RT class process inherits the parent's time quantum, whether finite or infinite. A process with a finite time quantum runs until the time quantum expires. A process with a finite time quantum also stops running if the process blocks while waiting for an I/O event or is pre-empted by a higher-priority runnable real-time process. A process with an infinite time quantum ceases execution only when the process terminates, blocks, or is pre-empted.

The SYS class exists to schedule the execution of special system processes, such as paging, STREAMS, and the swapper. You cannot change the class of a process to the SYS class. The SYS class of processes has fixed priorities established by the kernel when the processes are started.

The time-sharing (TS) processes have the lowest priority. TS class processes are scheduled dynamically, with a few hundred milliseconds for each time slice. The TS scheduler switches context in round-robin fashion often enough to give every process an equal opportunity to run, depending upon:

The time slice value
The process history, which records when the process was last put to sleep
Considerations for CPU utilization

Default time-sharing policy gives larger time slices to processes with lower priority.

A child process inherits the scheduling class and attributes of the parent process through fork(2). A process's scheduling class and attributes are unchanged by exec(2).

Different algorithms dispatch each scheduling class. Class-dependent routines are called by the kernel to make decisions about CPU process scheduling. The kernel is class-independent, and takes the highest priority process off its queue. Each class is responsible for calculating a process's priority value for its class. This value is placed into the dispatch priority variable of that process.

As the following figure illustrates, each class algorithm has its own method of nominating the highest priority process to place on the global run queue.

Figure 10–5 Kernel Dispatch Queue

Each class has a set of priority levels that apply to processes in that class. A class-specific mapping maps these priorities into a set of global priorities. A set of global scheduling priority maps is not required to start with zero or be contiguous.

By default, the global priority values for time-sharing (TS) processes range from -20 to +20. These global priority values are mapped into the kernel from 0-40, with temporary assignments as high as 99. The default priorities for real-time (RT) processes range from 0-59, and are mapped into the kernel from 100 to 159. The kernel's class-independent code runs the process with the highest global priority on the queue.

Dispatch Queue

The dispatch queue is a linear-linked list of processes with the same global priority. Each process has class-specific information attached to the process upon invocation. A process is dispatched from the kernel dispatch table in an order that is based on the process' global priority.

Dispatching Processes

When a process is dispatched, the context of the process is mapped into memory along with its memory management information, its registers, and its stack. Execution begins after the context mapping is done. Memory management information is in the form of hardware registers that contain the data that is needed to perform virtual memory translations for the currently running process.

Process Pre-emption

When a higher priority process becomes dispatchable, the kernel interrupts its computation and forces the context switch, pre-empting the currently running process. A process can be pre-empted at any time if the kernel finds that a higher-priority process is now dispatchable.

For example, suppose that process A performs a read from a peripheral device. Process A is put into the sleep state by the kernel. The kernel then finds that a lower-priority process B is runnable. Process B is dispatched and begins execution. Eventually, the peripheral device sends an interrupt, and the driver of the device is entered. The device driver makes process A runnable and returns. Rather than returning to the interrupted process B, the kernel now pre-empts B from processing, resuming execution of the awakened process A.

Another interesting situation occurs when several processes contend for kernel resources. A high-priority real-time process might be waiting for a resource held by a low-priority process. When the low-priority process releases the resource, the kernel pre-empts that process to resume execution of the higher-priority process.

Kernel Priority Inversion

Priority inversion occurs when a higher-priority process is blocked by one or more lower-priority processes for a long time. The use of synchronization primitives such as mutual-exclusion locks in the SunOS kernel can lead to priority inversion.

A process is blocked when the process must wait for one or more processes to relinquish resources. Prolonged blocking can lead to missed deadlines, even for low levels of utilization.

The problem of priority inversion has been addressed for mutual-exclusion locks for the SunOS kernel by implementing a basic priority inheritance policy. The policy states that a lower-priority process inherits the priority of a higher-priority process when the lower-priority process blocks the execution of the higher-priority process. This inheritance places an upper bound on the amount of time a process can remain blocked. The policy is a property of the kernel's behavior, not a solution that a programmer institutes through system calls or interface execution. User-level processes can still exhibit priority inversion, however.

User Priority Inversion

The issue of user priority inversion, and the means to deal with priority inversion, are discussed in “Mutual Exclusion Lock Attributes” in Multithreaded Programming Guide.

Interface Calls That Control Scheduling

The following interface calls control process scheduling.

Using `priocntl`

Control over scheduling of active classes is done with priocntl(2). Class attributes are inherited through fork(2) and exec(2), along with scheduling parameters and permissions required for priority control. This inheritance happens with both the RT and the TS classes.

priocntl(2) is the interface for specifying a real-time process, a set of processes, or a class to which the system call applies. priocntlset(2) also provides the more general interface for specifying an entire set of processes to which the system call applies.

The command arguments of priocntl(2) can be one of: PC_GETCID, PC_GETCLINFO, PC_GETPARMS, or PC_SETPARMS. The real or effective ID of the calling process must match the real or effective ID of the affected processes, or must have superuser privilege.

PC_GETCID: This command takes the name field of a structure that contains a recognizable class name. The class ID and an array of class attribute data are returned.
PC_GETCLINFO: This command takes the ID field of a structure that contains a recognizable class identifier. The class name and an array of class attribute data are returned.
PC_GETPARMS: This command returns the scheduling class identifier or the class specific scheduling parameters of one of the specified processes. Even though idtype and id might specify a big set, PC_GETPARMS returns the parameter of only one process. The class selects the process.
PC_SETPARMS: This command sets the scheduling class or the class-specific scheduling parameters of the specified process or processes.

Other interface calls

sched_get_priority_max: Returns the maximum values for the specified policy.
sched_get_priority_min: Returns the minimum values for the specified policy. For mor einformation, see the sched_get_priority_max(3R) man page.
sched_rr_get_interval: Updates the specified timespec structure to the current execution time limit. For more information, see the sched_get_priority_max(3RT) man page.
sched_setparam, sched_getparam: Sets or gets the scheduling parameters of the specified process.
sched_yield: Blocks the calling process until the calling process returns to the head of the process list.

Utilities That Control Scheduling

The administrative utilities that control process scheduling are dispadmin(1M) and priocntl(1). Both of these utilities support the priocntl(2) system call with compatible options and loadable modules. These utilities provide system administration functions that control real-time process scheduling during runtime.

priocntl(1)

The priocntl(1) command sets and retrieves scheduler parameters for processes.

dispadmin(1M)

The dispadmin(1M) utility displays all current process scheduling classes by including the -l command line option during runtime. Process scheduling can also be changed for the class specified after the -c option, using RT as the argument for the real-time class.

The class options for dispadmin(1M) are in the following list:

-l: Lists scheduler classes currently configured
-c: Specifies the class with parameters to be displayed or to be changed
-g: Gets the dispatch parameters for the specified class
-r: Used with -g, specifies time quantum resolution
-s: Specifies a file where values can be located

A class-specific file that contains the dispatch parameters can also be loaded during runtime. Use this file to establish a new set of priorities that replace the default values that were established during boot time. This class-specific file must assert the arguments in the format used by the -g option. Parameters for the RT class are found in the rt_dptbl(4), and are listed in Example 10–1.

To add an RT class file to the system, the following modules must be present:

An rt_init() routine in the class module that loads the rt_dptbl(4).
An rt_dptbl(4) module that provides the dispatch parameters and a routine to return pointers to config_rt_dptbl.
The dispadmin(1M) executable.

The following steps install a RT class dispatch table:

Load the class-specific module with the following command, where module_name is the class-specific module:
# modload /kernel/sched/module_name
Invoke the dispadmin command:
# dispadmin -c RT -s file_name
The file must describe a table with the same number of entries as the table that is being overwritten.

Configuring Scheduling

Associated with both scheduling classes is a parameter table, rt_dptbl(4), and ts_dptbl(4). These tables are configurable by using a loadable module at boot time, or with dispadmin(1M) during runtime.

Dispatcher Parameter Table

The in-core table for real-time establishes the properties for RT scheduling. The rt_dptbl(4) structure consists of an array of parameters, struct rt_dpent_t. Each of the n priority levels has one parameter. The properties of a given priority level are specified by the ith parameter structure in the array, rt_dptbl[i].

A parameter structure consists of the following members, which are also described in the /usr/include/sys/rt.h header file.

rt_globpri: The global scheduling priority associated with this priority level. The rt_globpri values cannot be changed with dispadmin(1M).
rt_quantum: The length of the time quantum allocated to processes at this level in ticks. For more information, see Timestamp Interfaces. The time quantum value is only a default or starting value for processes at a particular level. The time quantum of a real-time process can be changed by using the priocntl(1) command or the priocntl(2) system call.

Reconfiguring `config_rt_dptbl`

A real-time administrator can change the behavior of the real-time portion of the scheduler by reconfiguring the config_rt_dptbl at any time. One method is described in the rt_dptbl(4) man page, in the section titled “Replacing the rt_dptbl Loadable Module.”

A second method for examining or modifying the real-time parameter table on a running system is through the dispadmin(1M) command. Invoking dispadmin(1M) for the real-time class enables retrieval of the current rt_quantum values in the current config_rt_dptbl configuration from the kernel's in-core table. When overwriting the current in-core table, the configuration file used for input to dispadmin(1M) must conform to the specific format described in the rt_dptbl(4) man page.

Following is an example of prioritized processes rtdpent_t with their associated time quantum config_rt_dptbl[] value as the processes might appear in config_rt_dptbl[].

Example 10–1 RT Class Dispatch Parameters

rtdpent_t  rt_dptbl[] = { 			129,    60,
	 /* prilevel Time quantum */							130,    40,
		100,    100,											131,    40,
		101,    100,											132,    40,
		102,    100,											133,    40,
		103,    100,											134,    40,
		104,    100,											135,    40,
		105,    100,											136,    40,
		106,    100,											137,    40,
		107,    100,											138,    40
		108,    100,											139,    40,
		109,    100,											140,    20,
		110,    80,											 141,    20,
		111,    80,											 142,    20,
		112,    80,											 143,    20,
		113,    80,											 144,    20,
		114,    80,											 145,    20,
		115,    80,											 146,    20,
		116,    80,											 147,    20,
		117,    80,											 148,    20,
		118,    80,											 149,    20,
		119,    80,											 150,    10,
		120,    60,											 151,    10,
		121,    60,											 152,    10,
		122,    60,											 153,    10,
		123,    60,											 154,    10,
		124,    60,											 155,    10,
		125,    60,											 156,    10,
		126,    60,											 157,    10,
		126,    60,											 158,    10,
		127,    60,											 159,    10,
		128,    60,										}

Memory Locking

Locking memory is one of the most important issues for real-time applications. In a real-time environment, a process must be able to guarantee continuous memory residence to reduce latency and to prevent paging and swapping.

This section describes the memory locking mechanisms that are available to real-time applications in SunOS.

Under SunOS, the memory residency of a process is determined by its current state, the total available physical memory, the number of active processes, and the processes' demand for memory. This residency is appropriate in a time-share environment. This residency is often unacceptable for a real-time process. In a real-time environment, a process must guarantee a memory residence to reduce the process' memory access and dispatch latency.

Real-time memory locking in SunOS is provided by a set of library routines. These routines allow a process running with superuser privileges to lock specified portions of its virtual address space into physical memory. Pages locked in this manner are exempt from paging until the pages are unlocked or the process exits.

The operating system has a system-wide limit on the number of pages that can be locked at any time. This limit is a tunable parameter whose default value is calculated at boot time. The default value is based on the number of page frames minus another percentage, currently set at ten percent.

Locking a Page

A call to mlock(3C) requests that one segment of memory be locked into the system's physical memory. The pages that make up the specified segment are faulted in. The lock count of each page is incremented. Any page whose lock count value is greater than zero is exempt from paging activity.

A particular page can be locked multiple times by multiple processes through different mappings. If two different processes lock the same page, the page remains locked until both processes remove their locks. However, within a given mapping, page locks do not nest. Multiple calls of locking interfaces on the same address by the same process are removed by a single unlock request.

If the mapping through which a lock has been performed is removed, the memory segment is implicitly unlocked. When a page is deleted through closing or truncating the file, the page is also implicitly unlocked.

Locks are not inherited by a child process after a fork(2) call. If a process that has some memory locked forks a child, the child must perform a memory locking operation on its own behalf to lock its own pages. Otherwise, the child process incurs copy-on-write page faults, which are the usual penalties that are associated with forking a process.

Unlocking a Page

To unlock a page of memory, a process requests the release of a segment of locked virtual pages by a calling munlock(3C). munlock decrements the lock counts of the specified physical pages. After decrementing a page's lock count to 0, the page swaps normally.

Locking All Pages

A superuser process can request that all mappings within its address space be locked by a call to mlockall(3C). If the flag MCL_CURRENT is set, all the existing memory mappings are locked. If the flag MCL_FUTURE is set, every mapping that is added to an existing mapping or that replaces an existing mapping is locked into memory.

Recovering Sticky Locks

A page is permanently locked into memory when its lock count reaches 65535 (0xFFFF). The value 0xFFFF is defined by implementation. This value might change in future releases. Pages that are locked in this manner cannot be unlocked. Reboot the system to recover.

High Performance I/O

This section describes I/O with real-time processes. In SunOS, the libraries supply two sets of interfaces and calls to perform fast, asynchronous I/O operations. The POSIX asynchronous I/O interfaces are the most recent standard. The SunOS environment also provides file and in-memory synchronization operations and modes to prevent information loss and data inconsistency.

Standard UNIX I/O is synchronous to the application programmer. An application that calls read(2) or write(2) usually waits until the system call has finished.

Real-time applications need asynchronous, bounded I/O behavior. A process that issues an asynchronous I/O call proceeds without waiting for the I/O operation to complete. The caller is notified when the I/O operation has finished.

Asynchronous I/O can be used with any SunOS file. Files are opened synchronously and no special flagging is required. An asynchronous I/O transfer has three elements: call, request, and operation. The application calls an asynchronous I/O interface, the request for the I/O is placed on a queue, and the call returns immediately. At some point, the system dequeues the request and initiates the I/O operation.

Asynchronous and standard I/O requests can be intermingled on any file descriptor. The system maintains no particular sequence of read and write requests. The system arbitrarily resequences all pending read and write requests. If a specific sequence is required for the application, the application must insure the completion of prior operations before issuing the dependent requests.

POSIX Asynchronous I/O

POSIX asynchronous I/O is performed using aiocb structures. An aiocb control block identifies each asynchronous I/O request and contains all of the controlling information. A control block can be used for only one request at a time. A control block can be reused after its request has been completed.

A typical POSIX asynchronous I/O operation is initiated by a call to aio_read(3RT) or aio_write(3RT). Either polling or signals can be used to determine the completion of an operation. If signals are used for completing operations, each operation can be uniquely tagged. The tag is then returned in the si_value component of the generated signal. See the siginfo(3HEAD) man page.

aio_read: aio_read(3RT) is called with an asynchronous I/O control block to initiate a read operation.
aio_write: aio_write(3RT) is called with an asynchronous I/O control block to initiate a write operation.
aio_return, aio_error: aio_return(3RT) and aio_error(3RT) are called to obtain return and error values, respectively, after an operation is known to have completed.
aio_cancel: aio_cancel(3RT) is called with an asynchronous I/O control block to cancel pending operations. aio_cancel can be used to cancel a specific request, if a request is specified by the control block. aio_cancel can also cancel all of the requests that are pending for the specified file descriptor.
aio_fsync: aio_fsync(3RT) queues an asynchronous fsync(3C) or fdatasync(3RT) request for all of the pending I/O operations on the specified file.
aio_suspend: aio_suspend(3RT) suspends the caller as though one or more of the preceding asynchronous I/O requests had been made synchronously.

Solaris Asynchronous I/O

This section discusses asynchronous I/O operations in the Solaris operating environment.

Notification (SIGIO)

When an asynchronous I/O call returns successfully, the I/O operation has only been queued and waits to be done. The actual operation has a return value and a potential error identifier. This return value and potential error identifier would have been returned to the caller if the call had been synchronous. When the I/O is finished, both the return and error values are stored at a location given by the user at the time of the request as a pointer to an aio_result_t. The structure of the aio_result_t is defined in <sys/asynch.h>:

typedef struct aio_result_t {
 	ssize_t	aio_return; /* return value of read or write */
 	int 		aio_errno;  /* errno generated by the IO */
 } aio_result_t;

When the aio_result_t has been updated, a SIGIO signal is delivered to the process that made the I/O request.

Note that a process with two or more asynchronous I/O operations pending has no certain way to determine the cause of the SIGIO signal. A process that receives a SIGIO should check all its conditions that could be generating the SIGIO signal.

Using `aioread`

The aioread(3AIO) routine is the asynchronous version of read(2). In addition to the normal read arguments, aioread(3AIO) takes the arguments that specify a file position and the address of an aio_result_t structure. The resulting information about the operation is stored in the aio_result_t structure. The file position specifies a seek to be performed within the file before the operation. Whether the aioread(3AIO) call succeeds or fails, the file pointer is updated.

Using `aiowrite`

The aiowrite(3AIO) routine is the asynchronous version of write(2). In addition to the normal write arguments, aiowrite(3AIO) takes arguments that specify a file position and the address of an aio_result_t structure. The resulting information about the operation is stored in the aio_result_t structure.

The file position specifies that a seek operation is to be performed within the file before the operation. If the aiowrite(3AIO) call succeeds, the file pointer is updated to the position that would have resulted in a successful seek and write. The file pointer is also updated when a write fails to allow for subsequent write requests.

Using `aiocancel`

The aiocancel(3AIO) routine attempts to cancel the asynchronous request whose aio_result_t structure is given as an argument. An aiocancel(3AIO) call succeeds only if the request is still queued. If the operation is in progress, aiocancel(3AIO) fails.

Using `aiowait`

A call to aiowait(3AIO) blocks the calling process until at least one outstanding asynchronous I/O operation is completed. The timeout parameter points to a maximum interval to wait for I/O completion. A timeout value of zero specifies that no wait is wanted. aiowait(3AIO) returns a pointer to the aio_result_t structure for the completed operation.

Using `poll()`

To determine the completion of an asynchronous I/O event synchronously rather than depend on a SIGIO interrupt, use poll(2). You can also poll to determine the origin of a SIGIO interrupt.

poll(2) is slow when used on very large numbers of files. This problem is resolved by poll(7D).

Using the `poll` Driver

Using /dev/poll provides a highly scalable way of polling a large number of file descriptors. This scalability is provided through a new set of APIs and a new driver, /dev/poll. The /dev/poll API is an alternative to, not a replacement of, poll(2). Use poll(7D) to provide details and examples of the /dev/poll API. When used properly, the /dev/poll API scales much better than poll(2). This API is especially suited for applications that satisfy the following criteria:

Applications that repeatedly poll a large number of file descriptors
Polled file descriptors that are relatively stable, meaning that the descriptors are not constantly closed and reopened
The set of file descriptors that actually have polled events pending is small, comparing to the total number of file descriptors that are being polled

Using `close`

Files are closed by calling close(2). The call to close(2) cancels any outstanding asynchronous I/O request that can be closed. close(2) waits for an operation that cannot be cancelled. For more information, see Using aiocancel. When close(2) returns, no asynchronous I/O is pending for the file descriptor. Only asynchronous I/O requests queued to the specified file descriptor are cancelled when a file is closed. Any I/O pending requests for other file descriptors are not cancelled.

Synchronized I/O

Applications might need to guarantee that information has been written to stable storage, or that file updates are performed in a particular order. Synchronized I/O provides for these needs.

Synchronization Modes

Under SunOS, a write operation succeeds when the system ensures that all written data is readable after any subsequent open of the file. This check assumes no failure of the physical storage medium. Data is successfully transferred for a read operation when an image of the data on the physical storage medium is available to the requesting process. An I/O operation is complete when the associated data has been successfully transferred, or when the operation has been diagnosed as unsuccessful.

An I/O operation has reached synchronized I/O data integrity completion when:

For reads, the operation has been completed, or diagnosed if unsuccessful. The read is complete only when an image of the data has been successfully transferred to the requesting process. If the synchronized read operation is requested when pending write requests affect the data to be read, these write requests are successfully completed before the data is read.
For writes, the operation has been completed, or diagnosed if unsuccessful. The write operation succeeds when the data specified in the write request is successfully transferred. Furthermore, all file system information required to retrieve the data must be successfully transferred.
File attributes that are not necessary for data retrieval are not transferred prior to returning to the calling process.
Synchronized I/O file integrity completion requires that all file attributes relative to the I/O operation be successfully transferred before returning to the calling process. Synchronized I/O file integrity completion is otherwise identical to synchronized I/O data integrity compleiton.

Synchronizing a File

fsync(3C) and fdatasync(3RT) explicitly synchronize a file to secondary storage.

The fsync(3C) routine guarantees that the interface is synchronized at the I/O file integrity completion level. fdatasync(3RT) guarantees that the interface is synchronized at level of I/O data integrity completion.

Applications can synchronize each I/O operation before the operation completes. Setting the O_DSYNC flag on the file description by using open(2) or fcntl(2) ensures that all I/O writes reach I/O data completion before the operation completes. Setting the O_SYNC flag on the file description ensures that all I/O writes have reached completion before the operation is indicated as completed. Setting the O_RSYNC flag on the file description ensures that all I/O reads read(2) and aio_read(3RT) reach the same level of completion that is requested by the descriptor setting. The descriptor setting can be either O_DSYNC or O_SYNC.

Interprocess Communication

This section describes the interprocess communication (IPC) interfaces of SunOS as the interfaces relate to real-time processing. Signals, pipes, FIFOs, message queues, shared memory, file mapping, and semaphores are described here. For more information about the libraries, interfaces, and routines that are useful for interprocess communication, see Chapter 6, Interprocess Communication.

Processing Signals

The sender can use sigqueue(3RT) to send a signal together with a small amount of information to a target process.

To queue subsequent occurrences of a pending signal, the target process must have the SA_SIGINFO bit set for the specified signal. See the sigaction(2) man page.

The target process normally receive signals asynchronously. To receive signals synchronously, block the signal and call either sigwaitinfo(3RT) or sigtimedwait(3RT). See the sigprocmask(2) man page. This procedure causes the signal to be received synchronously. The value sent by the caller of sigqueue(3RT) is stored in the si_value member of the siginfo_t argument. Leaving the signal unblocked causes the signal to be delivered to the signal handler specified by sigaction(2), with the value appearing in the si_value of the siginfo_t argument to the handler.

A specified number of signals with associated values can be sent by a process and remain undelivered. Storage for {SIGQUEUE_MAX} signals is allocated at the first call to sigqueue(3RT). Thereafter, a call to sigqueue(3RT) either successfully enqueues at the target process or fails within a bounded amount of time.

Pipes, Named Pipes, and Message Queues

Pipes, named pipes, and message queues behave similarly to character I/O devices. These interfaces have different methods of connecting. See Pipes Between Processes for more information about pipes. See Named Pipes for more information about named pipes. See System V Messages and POSIX Messages for more information about message queues.

Using Semaphores

Semaphores are also provided in both System V and POSIX styles. See System V Semaphores and POSIX Semaphores for more information.

Note that using semaphores can cause priority inversions unless priority inversions are explicitly avoided by the techniques mentioned earlier in this chapter.

Shared Memory

The fastest way for processes to communicate is directly, through a shared segment of memory. When more than two processes attempt to read and write shared memory simultaneously, the memory contents can become inaccurate. This potential inaccuracy is the major difficulty with using shared memory.

Asynchronous Network Communication

This section introduces asynchronous network communication, using sockets or Transport-Level Interface (TLI) for real-time applications. Asynchronous networking with sockets is done by setting an open socket, of type SOCK_STREAM, to asynchronous and non blocking. For more information on asynchronous sockets, see Advanced Socket Topics. Asynchronous network processing of TLI events is supported using a combination of STREAMS asynchronous features and the non-blocking mode of the TLI library routines.

For more information on the Transport-Level Interface, see Chapter 8, Programming With XTI and TLI.

Modes of Networking

Both sockets and transport-level interface provide two modes of service: connection-mode and connectionless-mode.

Connection-mode service is circuit-oriented. This service enables the transmission of data over an established connection in a reliable, sequenced manner. This service also provides an identification procedure that avoids the overhead of address resolution and transmission during the data transfer phase. This service is attractive for applications that require relatively long-lived, datastream-oriented interactions.

Connectionless-mode service is message-oriented and supports data transfer in self-contained units with no logical relationship required among multiple units. A single service request passes all the information required to deliver a unit of data from the sender to the transport provider. This service request includes the destination address and the data to be delivered. Connectionless-mode service is attractive for applications that involve short-term interactions that do not require guaranteed, in-sequence delivery of data. Connectionless transports are generally unreliable.

Timing Facilities

This section describes the timing facilities that are available for real-time applications under SunOS. Real-time applications that use these mechanisms require detailed information from the man pages of the routines that are listed in this section.

The timing interfaces of SunOS fall into two separate areas: timestamps and interval timers. The timestamp interfaces provide a measure of elapsed time. The timestamp interfaces also enable the application to measure the duration of a state or the time between events. Interval timers allow an application to wake up at specified times and to schedule activities based on the passage of time.

Timestamp Interfaces

Two interfaces provide timestamps. gettimeofday(3C) provides the current time in a timeval structure, representing the time in seconds and microseconds since midnight, Greenwich Mean Time, on January 1, 1970. clock_gettime, with a clockid of CLOCK_REALTIME, provides the current time in a timespec structure, representing in seconds and nanoseconds the same time interval returned by gettimeofday(3C).

SunOS uses a hardware periodic timer. For some workstations, the hardware periodic timer is the sole source of timing information. If the hardware periodic timer is the sole source of timing information, the accuracy of timestamps is limited to the timer's resolution. For other platforms, a timer register with a resolution of one microsecond means that timestamps are accurate to one microsecond.

Interval Timer Interfaces

Real-time applications often schedule actions by using interval timers. Interval timers can be either of two types: a one-shot type or a periodic type.

A one-shot is an armed timer that is set to an expiration time relative to either a current time or an absolute time. The timer expires once and is disarmed. This type of a timer is useful for clearing buffers after the data has been transferred to storage, or to time-out an operation.

A periodic timer is armed with an initial expiration time, either absolute or relative, and a repetition interval. Every time the interval timer expires, the timer is reloaded with the repetition interval. The timer is then rearmed. This timer is useful for data logging or for servo-control. In calls to interval timer interfaces, time values that are smaller than the timer's resolution are rounded up to the next multiple of the hardware timer interval. This interval is typically 10ms.

SunOS has two sets of timer interfaces. The setitimer(2) and getitimer(2) interfaces operate fixed set timers, which are called the BSD timers, using the timeval structure to specify time intervals. The POSIX timers, which are created with timer_create(3RT), operate the POSIX clock, CLOCK_REALTIME. POSIX timer operations are expressed in terms of the timespec structure.

The getitimer(2) and setitimer(2) functions retrieve and establish, respectively, the value of the specified BSD interval timer. The three BSD interval timers that are available to a process include a real-time timer designated ITIMER_REAL. If a BSD timer is armed and allowed to expire, the system sends an appropriate signal to the process that set the timer.

The timer_create(3RT) routine can create up to TIMER_MAX POSIX timers. The caller can specify what signal and what associated value are sent to the process when the timer expires. The timer_settime(3RT) and timer_gettime(3RT) routines retrieve and establish respectively the value of the specified POSIX interval timer. POSIX timers can expire while the required signal is pending delivery. The timer expirations are counted, and timer_getoverrun(3RT) retrieves the count. timer_delete(3RT) deallocates a POSIX timer.

The following example illustrates how to use setitimer(2) to generate a periodic interrupt, and how to control the arrival of timer interrupts.

Example 10–2 Controlling Timer Interrupts

#include	<unistd.h>
#include	<signal.h>
#include	<sys/time.h>

#define TIMERCNT 8

void timerhandler();
int	 timercnt;
struct	 timeval alarmtimes[TIMERCNT];

main()
{
	struct itimerval times;
	sigset_t	sigset;
	int		i, ret;
	struct sigaction act;
	siginfo_t	si;

	/* block SIGALRM */
	sigemptyset (&sigset);
	sigaddset (&sigset, SIGALRM);
	sigprocmask (SIG_BLOCK, &sigset, NULL);

	/* set up handler for SIGALRM */
	act.sa_action = timerhandler;
	sigemptyset (&act.sa_mask);
	act.sa_flags = SA_SIGINFO;
	sigaction (SIGALRM, &act, NULL);
	/*
	 * set up interval timer, starting in three seconds,
	 *	then every 1/3 second
	 */
	times.it_value.tv_sec = 3;
	times.it_value.tv_usec = 0;
	times.it_interval.tv_sec = 0;
	times.it_interval.tv_usec = 333333;
	ret = setitimer (ITIMER_REAL, &times, NULL);
	printf ("main:setitimer ret = %d\n", ret);

	/* now wait for the alarms */
	sigemptyset (&sigset);
	timerhandler (0, si, NULL);
	while (timercnt < TIMERCNT) {
		ret = sigsuspend (&sigset);
	}
	printtimes();
}

void timerhandler (sig, siginfo, context)
	int		sig;
	siginfo_t	*siginfo;
	void		*context;
{
	printf ("timerhandler:start\n");
	gettimeofday (&alarmtimes[timercnt], NULL);
	timercnt++;
	printf ("timerhandler:timercnt = %d\n", timercnt);
}

printtimes ()
{
	int	i;

	for (i = 0; i < TIMERCNT; i++) {
		printf("%ld.%0l6d\n", alarmtimes[i].tv_sec,
				alarmtimes[i].tv_usec);
	}
}