System Interface Guide

High Performance I/O

This section describes I/O with realtime processes. In SunOS 5.0 through 5.8, the libraries supply two sets of functions and calls to perform fast, asynchronous, I/O operations. The POSIX asynchronous I/O interfaces are the new standard. For robustness, SunOS also provides file and in-memory synchronization operations and modes to prevent information loss and data inconsistency.

Standard UNIX I/O is synchronous to the application programmer. An application that calls read(2) or write(2) usually waits until the system call has finished.

Real-time applications need asynchronous, bounded I/O behavior. A process that issues an asynchronous I/O call proceeds without waiting for the I/O operation to complete. The caller is notified when the I/O operation has finished. In the mean time the process does something useful.

Asynchronous I/O can be used with any SunOS file. Files are opened in the synchronous way and no special flagging is required. An asynchronous I/O transfer has three elements: call, request, and operation. The application calls an asynchronous I/O function, the request for the I/O is placed on a queue, and the call returns immediately. At some point, the system dequeues the request and initiates the I/O operation.

Asynchronous and standard I/O requests can be intermingled on any file descriptor. The system maintains no particular sequence of read and write requests. The system arbitrarily resequences all pending read and write requests. If a specific sequence is required for the application, the application must insure the completion of prior operations before issuing the dependent requests.

POSIX Asynchronous I/O

POSIX asynchronous I/O is performed using aiocb structures. An aiocb control block identifies each asynchronous I/O request and contains all of the controlling information. A control block can be used for only one request at a time and can be reused after its request has been completed.

A typical POSIX asynchronous I/O operation is initiated by a call to aio_read(3RT) or aio_write(3RT). Either polling or signals can be used to determine the completion of an operation. If signals are used for operation completion, each operation can be uniquely tagged and the tag is returned in the si_value component of the generated signal (see siginfo(3HEAD)).

aio_read(3RT)

aio_read(3RT) is called with an asynchronous I/O control block to initiate a read operation.

aio_write(3RT)

aio_write(3RT) is called with an asynchronous I/O control block to initiate a write operation.

aio_return(3RT) and aio_error(3RT)

aio_return(3RT) and aio_error(3RT) are called to obtain return and error values. respectively, after an operation is known to have been completed.

aio_cancel(3RT)

aio_cancel(3RT) is called with an asynchronous I/O control block to cancel pending operations. It can be used to cancel a specific request, if the control block specifies one, or all of the requests pending for the specified file descriptor.

aio_fsync(3RT)

aio_fsync(3RT) queues an asynchronousfsync(3C) or fdatasync(3RT) request for all of the pending I/O operations on the specified file.

aio_suspend(3RT)

aio_suspend(3RT) suspends the caller as though one, or more, of the preceding asynchronous I/O requests had been made synchronously.

Solaris Asynchronous I/O

Notification (SIGIO)

When an asynchronous I/O call returns successfully, the I/O operation has only been queued, waiting to be done. The actual operation also has a return value and a potential error identifier, the values that would have been returned to the caller as the result of a synchronous call. When the I/O is finished, the return value and error value are stored at a location given by the user at the time of the request as a pointer to an aio_result_t. The structure of the aio_result_t is defined in <sys/asynch.h>:

typedef struct aio_result_t {
 	ssize_t	aio_return; /* return value of read or write */
 	int 		aio_errno;  /* errno generated by the IO */
 } aio_result_t;

When aio_result_t has been updated, a SIGIO signal is delivered to the process that made the I/O request.

Note that a process with two or more asynchronous I/O operations pending has no certain way to determine which request, if any, is the cause of the SIGIO signal. A process receiving a SIGIO should check all its conditions that could be generating the SIGIO signal.

aioread(3AIO)

aioread(3AIO) is the asynchronous version of read(2). In addition to the normal read arguments, aioread(3AIO) takes the arguments specifying a file position and the address of an aio_result_t structure in which the system stores the result information about the operation. The file position specifies a seek to be performed within the file before the operation. Whether the aioread(3AIO) call succeeds or fails, the file pointer is updated.

aiowrite(3AIO)

aiowrite(3AIO) is the asynchronous version of write(2). In addition to the normal write arguments, aiowrite(3AIO) takes arguments specifying a file position and the address of an aio_result_t structure in which the system is to store the resulting information about the operation.

The file position specifies a seek to be performed within the file before the operation. If the aiowrite(3AIO) call succeeds, the file pointer is updated to the position that would have resulted in a successful seek and write. The file pointer is also updated when a write fails to allow for subsequent write requests.

aiocancel(3AIO)

aiocancel(3AIO) attempts to cancel the asynchronous request whose aio_result_t structure is given as an argument. An aiocancel(3AIO) call succeeds only if the request is still queued. If the operation is in progress, aiocancel(3AIO) fails.

aiowait(3AIO)

A call to aiowait(3AIO) blocks the calling process until at least one outstanding asynchronous I/O operation is completed. The timeout parameter points to a maximum interval to wait for I/O completion. A timeout value of zero specifies that no wait is wanted. aiowait(3AIO) returns a pointer to the aio_result_t structure for the completed operation.

poll(2)

To synchronously determine the completion of an asynchronous I/O event rather than depend on a SIGIO interrupt, use poll(2). You can also poll to determine the origin of a SIGIO interrupt.

Use of poll(2) for very large numbers of files is slow. This problem is resolved by poll(7D).

poll(7D)

/dev/poll provides a highly scalable way of polling a large number of file descriptors. This is provided through a new set of APIs and a new driver, /dev/poll. The /dev/poll API is an alternative to, not a replacement of poll(2). poll(7D) provides details and examples of the /dev/poll API. When used properly, the /dev/poll API scales much better than poll(2). It is especially suited for applications that satisfy the following criteria:

close(2)

Files are closed by calling close(2). close(2) cancels any outstanding asynchronous I/O request that can be. close(2) waits for an operation that cannot be cancelled (see "aiocancel(3AIO)"). When close(2) returns, there is no asynchronous I/O pending for the file descriptor. Only asynchronous I/O requests queued to the specified file descriptor are cancelled when a file is closed. Any I/O pending requests for other file descriptors are not cancelled.

Synchronized I/O

Applications might need to guarantee that information has been written to stable storage, or that file updates are performed in a particular order. Synchronized I/O provides for these needs.

Modes of Synchronization

Under SunOS 5.0 through 5.8, for a write operation, data is successfully transferred to a file when the system ensures that all written data is readable after any subsequent open of the file in the absence of a failure of the physical storage medium (even one that follows a system or power failure). For a read operation data is successfully transferred when an image of the data on the physical storage medium is available to the requesting process. An I/O operation is complete when either the associated data has been successfully transferred or the operation has been diagnosed as unsuccessful.

An I/O operation has reached synchronized I/O data integrity completion when:

For reads, the operation has been completed or diagnosed unsuccessful. The read is complete only when an image of the data has been successfully transferred to the requesting process. If there were any pending write requests affecting the data to be read at the time that the synchronized read operation was requested, these write requests are successfully transferred prior to reading the data.

For writes, the operation has been completed or diagnosed if unsuccessful. The write is complete only when the data specified in the write request is successfully transferred, and all file system information required to retrieve the data is successfully transferred.

File attributes that are not necessary for data retrieval (access time, modification time, status change time) are not transferred prior to returning to the calling process.

Synchronized I/O file integrity completion is identical to synchronized I/O data integrity completion with the addition that all file attributes relative to the I/O operation (including access time, modification time, status change time) must be successfully transferred prior to returning to the calling process.

Synchronizing a File

The fsync(3C) and fdatasync(3RT) functions explicitly synchronize a file to secondary storage.

fsync(3C) guarantees the function is synchronized at the I/O file integrity completion level, while fdatasync(3RT) guarantees the function is synchronized at the I/O data integrity completion level.

Applications can synchronize each I/O operation before the operation completes. Setting the O_DSYNC flag on the file description by open(2) or fcntl(2) ensures that all I/O writes (write(2) and aiowrite(3AIO)) have reached I/O data completion before the operation is indicated as completed. Setting the O_SYNC flag on the file description ensures that all I/O writes have reached completion before the operation is indicated as completed. Setting the O_RSYNC flag on the file description ensures that all I/O reads read(2) and aio_read(3RT)) have reached the same level of completion as request for writes by the setting, O_DSYNC or O_SYNC, on the descriptor.