STREAMS Programming Guide

Chapter 7 STREAMS Framework - Kernel Level

Because the STREAMS subsystem of UNIX™ provides a framework on which communications services can be built, it is often called the STREAMS framework. This framework consists of the Stream head and a series of utilities (put, putnext), kernel structures (mblk, dblk), and linkages (queues) that facilitate the interconnections between modules, drivers, and basic system calls. This chapter describes the STREAMS components from the kernel developer's perspective.

Overview of Streams in Kernel Space

Chapter 1, Overview of STREAMS describes a Stream as a full-duplex processing and data transfer path between a STREAMS driver in kernel space and a process in user space. In the kernel, a Stream consists of a Stream head, a driver, and zero or more modules between the driver and the Stream head.

The Stream head is the end of the Stream nearest the user process. All system calls made by user-level applications on a Stream are processed by the Stream head.

Messages are the containers in which data and control information is passed between the Stream head, modules, and drivers. The Stream head is responsible for translating the appropriate messages between the user application and the kernel. Messages are simply pointers to structures (mblk, dblk) that describe the data contained in them. Messages are categorized by type according to the purpose and priority of the message.

Queues are the basic elements by which the Stream head, modules, and drivers are connected. Queues identify the open, close, put, and service entry points. Additionally, queues specify parameters and private data for use by modules and drivers, and are the repository for messages destined for deferred processing.

In the rest of this chapter, the word "modules" refers to modules, drivers, or multiplexers, except where noted.

Stream Head

The Stream head interacts between applications in the user space and the rest of the Stream in kernel space. The Stream head is responsible for configuring the plumbing of the Stream through open, close, push, pop, link, and unlink operations. It also translates user data into messages to be passed down the Stream, and translates messages that arrive at the Stream head into user data. Any characteristics of the Stream that can be modified by the user application or the underlying Stream are controlled by the Stream head, which also informs users of data arrival and events such as error conditions.

If an application makes a system call with a STREAMS file descriptor, the Stream head routines are invoked, resulting in data copying, message generation, or control operations. Only the Stream head can copy data between the user space and kernel space. All other parts of the Stream pass data by way of messages and are thus isolated from direct interaction with users of the Stream.

Kernel-Level Messages

Chapter 3, STREAMS Application-Level Mechanisms discusses messages from the application perspective. The following sections discuss message types, message structure and linkage; how messages are sent and received; and message queues and priority from the kernel perspective.

Message Types

Several STREAMS messages differ in their purpose and queueing priority. The message types are briefly described and classified, according to their queueing priority, in Table 7-1 and Table 7-2. A detailed discussion of message types is in Chapter 8, Messages - Kernel Level.

Some message types are defined as high-priority types. The others can have a normal priority of 0, or a priority (also called a band) from 1 to 255.

Table 7-1 Ordinary Messages, Showing Direction of Communication Flow


Ordinary Messages (also called Normal Messages)		Direction
`M_BREAK`	Request to a Stream driver to send a "break"	Downstream
`M_CTL`	Control or status request used for intermodule communication	Bidirectional
`M_DATA`	User data message for I/O system calls	Bidirectional
`M_DELAY`	Request for a real-time delay on output	Downstream
`M_IOCTL`	Control/status request generated by a Stream head	Downstream
`M_PASSFP`	File pointer-passing message	Bidirectional
`M_PROTO`	Protocol control information	Bidirectional
`M_SETOPTS`	Sets options at the Stream head; sends upstream	Upstream
`M_SIG`	Signal sent from a module or driver	Upstream

Table 7-2 High-Priority Messages, Showing Direction of Communication Flow


High-Priority Messages:		Direction
`M_COPYIN`	Copies in data for transparent ioctl(2)s	Upstream
`M_COPYOUT`	Copies out data for transparent ioctl(2)s	Upstream
`M_ERROR`	Reports downstream error condition	Upstream
`M_FLUSH`	Flushes module queue	Bidirectional
`M_HANGUP`	Sets a Stream head hangup condition	Upstream
`M_UNHANGUP`	Reconnects line, sends upstream when hangup reverses	Upstream
`M_IOCACK`	Positive ioctl(2) acknowledgment	Upstream
`M_IOCDATA`	Data for transparent ioctl(2)s, sent downstream	Downstream
`M_IOCNAK`	Negative ioctl(2) acknowledgment	Upstream
`M_PCPROTO`	Protocol control information	Bidirectional
`M_PCSIG`	Sends signal from a module or driver	Upstream
`M_READ`	Read notification; sends downstream	Downstream
`M_START`	Restarts stopped device output	Downstream
`M_STARTI`	Restarts stopped device input	Downstream
`M_STOP`	Suspends output	Downstream
`M_STOPI`	Suspends input	Downstream

Message Structure

A STREAMS message in its simplest form contains three elements--a message block, a data block, and a data buffer. The data buffer is the location in memory where the actual data of the message is stored. The data block (datab(9S)) describes the data buffer--where it starts, where it ends, the message type, and how many message blocks reference it. The message block (msgb(9S)) describes the data block and how the data is used.

The data block has a typedef of dblk_t and has the following public elements:

struct datab {
	unsigned char       *db_base;          /* first byte of buffer */
	unsigned char       *db_lim;           /* last byte+1 of buffer */
	unsigned char        db_ref;           /* msg count ptg to this blk */
	unsigned char        db_type;          /* msg type */
};

typedef struct datab dblk_t;

The datab structure specifies the data buffer's fixed limits (db_base and db_lim), a reference count field (db_ref), and the message type field (db_type). db_base points to the address where the data buffer starts, db_lim points one byte beyond where the data buffer ends, and db_ref maintains a count of the number of message blocks sharing the data buffer.

Note -

db_base, db_lim, and db_ref should not be modified directly. db_type is modified under carefully monitored conditions, such as changing the message type to reuse the message block.

In a simple message, the message block references the data block, identifying for each message the address where the message data begins and ends. Each simple message block refers to the data block to identify these addresses, which must be within the confines of the buffer such that db_base <= b_rptr <= b_wptr <= db_lim. For ordinary messages, a priority band can be indicated, and this band is used if the message is queued.
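The pointer relationship can be checked mechanically: the read pointer may not precede the start of the buffer, the write pointer may not precede the read pointer, and the write pointer may not pass the buffer limit. The following is a minimal user-space sketch, not kernel code. The structures are cut-down stand-ins that keep only the fields involved, and demo_mblk_valid and demo_mblk_len are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins that keep only the fields used in this sketch. */
struct datab {
	unsigned char *db_base;		/* first byte of buffer */
	unsigned char *db_lim;		/* last byte + 1 of buffer */
};

struct msgb {
	unsigned char *b_rptr;		/* first unread byte */
	unsigned char *b_wptr;		/* first unwritten byte */
	struct datab  *b_datap;		/* data block */
};

/* Hypothetical helper: nonzero if the pointers are ordered and in bounds. */
int
demo_mblk_valid(const struct msgb *mp)
{
	const struct datab *dp;

	assert(mp != NULL && mp->b_datap != NULL);
	dp = mp->b_datap;
	return (dp->db_base <= mp->b_rptr &&
	    mp->b_rptr <= mp->b_wptr &&
	    mp->b_wptr <= dp->db_lim);
}

/* Hypothetical helper: number of unread bytes in one message block. */
size_t
demo_mblk_len(const struct msgb *mp)
{
	assert(demo_mblk_valid(mp));
	return ((size_t)(mp->b_wptr - mp->b_rptr));
}
```

A block whose b_rptr equals its b_wptr passes the check and has a length of zero, which is how an empty message block looks.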

Figure 7-1 shows the linkages between msgb, datab, and data buffer in a simple message.

Figure 7-1 Simple Message Referencing the Data Block

The message block has a typedef of mblk_t and has the following public elements:

struct msgb {
	struct msgb            *b_next;      /*next msg on queue*/
	struct msgb            *b_prev;      /*previous msg on queue*/
	struct msgb            *b_cont;      /*next msg block of message*/
	unsigned char          *b_rptr;      /*1st unread byte in bufr*/
	unsigned char          *b_wptr;      /*1st unwritten byte in bufr*/
	struct datab           *b_datap;     /*data block*/
	unsigned char           b_band;      /*message priority*/
	unsigned short          b_flag;      /*message flags*/
};

The STREAMS framework uses the b_next and b_prev fields to link messages into queues. b_rptr and b_wptr specify the current read and write pointers respectively, in the data buffer pointed to by b_datap. The fields b_rptr and b_wptr are maintained by drivers and modules.

The field b_band specifies a priority band where the message is placed when it is queued using the STREAMS utility routines. This field has no meaning for high-priority messages and is set to zero for these messages. When a message is allocated using allocb(9F), the b_band field is initially set to zero. Modules and drivers can set this field to a value from 0 to 255 depending on the number of priority bands needed. Lower numbers represent lower priority. The kernel incurs overhead in maintaining bands if nonzero numbers are used.

Note -

Modules and drivers must not modify b_next, b_prev, or b_datap. The first two fields are modified by utility routines such as putq(9F) and getq(9F). Modules and drivers can modify b_cont, b_rptr, b_wptr, b_band (for ordinary message types), and b_flag.

Note -

SunOS has b_band in the msgb structure. Some other STREAMS implementations place b_band in the datab structure. The SunOS implementation is more flexible because each message is independent. For shared data blocks, the b_band can differ in the SunOS implementation, but not in other implementations.

Message Linkage

A complex message can consist of several linked message blocks. If buffer size is limited or if processing expands the message, multiple message blocks are formed in the message, as shown in Figure 7-2. When a message is composed of multiple message blocks, the type associated with the first message block determines the overall message type, regardless of the types of the attached message blocks.
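The linkage through b_cont can be illustrated with a short user-space sketch. It assumes cut-down stand-in structures and placeholder type values (DEMO_M_DATA and DEMO_M_PROTO are not the real constants), and the helper names are hypothetical. It shows the two rules stated above: the first block supplies the message type, and the message's data is the sum of the data in every linked block.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_M_DATA	0	/* placeholder values, not the real constants */
#define	DEMO_M_PROTO	1

/* User-space stand-ins that keep only the fields used in this sketch. */
struct datab {
	unsigned char db_type;		/* message type */
};

struct msgb {
	struct msgb   *b_cont;		/* next block of the same message */
	unsigned char *b_rptr;
	unsigned char *b_wptr;
	struct datab  *b_datap;
};

/* The type of the first block is the type of the whole message. */
unsigned char
demo_msg_type(const struct msgb *mp)
{
	assert(mp != NULL && mp->b_datap != NULL);
	return (mp->b_datap->db_type);
}

/* Total unread bytes across every block linked through b_cont. */
size_t
demo_msg_bytes(const struct msgb *mp)
{
	size_t total = 0;

	for (; mp != NULL; mp = mp->b_cont)
		total += (size_t)(mp->b_wptr - mp->b_rptr);
	return (total);
}
```

For example, a control block of 8 bytes followed by a data block of 20 bytes is treated as one 28-byte message whose type is that of the control block.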

Figure 7-2 Linked Message Blocks

Queued Messages

A put procedure processes single messages immediately and can pass the message to the next module's put procedure using put or putnext. Alternatively, the message is linked on the message queue (putq(9F)) for later processing by the module's service procedure. Note that only the first message block of a set of linked message blocks is linked to the next message in the queue.

Think of linked message blocks as a concatenation of messages. Queued messages are a linked list of individual messages that can also be linked message blocks.

Figure 7-3 Queued Messages

In Figure 7-3 messages are queued: Message 1 being the first message on the queue, followed by Message 2 and Message 3. Notice that Message 1 is a linked message consisting of more than one mblk.
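The difference between the two kinds of links can be sketched in user space as follows. The structure is a stand-in that keeps only b_next and b_cont, and both helper names are hypothetical. Walking b_next visits messages; walking b_cont from the head of each message visits the blocks of that message.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in that keeps only the two link fields. */
struct msgb {
	struct msgb *b_next;	/* next message on the queue */
	struct msgb *b_cont;	/* next block of the same message */
};

/* Count messages: follow b_next from the head block of each message. */
size_t
demo_count_messages(const struct msgb *first)
{
	size_t n = 0;

	for (; first != NULL; first = first->b_next)
		n++;
	return (n);
}

/* Count message blocks: for each message, also follow b_cont. */
size_t
demo_count_blocks(const struct msgb *first)
{
	const struct msgb *bp;
	size_t n = 0;

	for (; first != NULL; first = first->b_next) {
		for (bp = first; bp != NULL; bp = bp->b_cont) {
			/* Only the head block carries the queue linkage. */
			assert(bp == first || bp->b_next == NULL);
			n++;
		}
	}
	return (n);
}
```

With a queue laid out like the one in Figure 7-3, where Message 1 has two blocks and Messages 2 and 3 have one each, the first helper reports three messages and the second reports four blocks.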

Note -

Modules or drivers must not modify b_next and b_prev. These fields are modified by utility routines such as putq(9F) and getq(9F).

Shared Data

In Figure 7-4, two message blocks are shown pointing to one data block. db_ref indicates that there are two references (mblks) to the data block. db_base and db_lim point to an address range in the data buffer. The b_rptr and b_wptr of both message blocks must fall within the assigned range specified by the data block.

Figure 7-4 Shared Data Block

Data blocks are shared using utility routines (see dupmsg(9F) or dupb(9F)). STREAMS maintains a count of the message blocks sharing a data block in the db_ref field.

These two mblks share the same data and data block. If a module changes the contents of the data or the message type, the change is visible to the owner of the other message block.

When modifying data contained in the dblk or data buffer, if the reference count of the message is greater than one, the module should copy the message using copymsg(9F), free the duplicated message, and then change the appropriate data.
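The copy-before-modify rule can be sketched in user space as follows. This is not the kernel implementation: malloc and calloc stand in for the STREAMS allocator, the structures keep only the fields involved, and demo_alloc, demo_dup, demo_free, and demo_writable are hypothetical names that only loosely mirror allocb(9F), dupb(9F), freeb(9F), and the copy-then-free sequence described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct datab {
	unsigned char *db_base;
	unsigned char *db_lim;
	unsigned char  db_ref;		/* message blocks sharing this buffer */
};

struct msgb {
	unsigned char *b_rptr;
	unsigned char *b_wptr;
	struct datab  *b_datap;
};

/* Allocate a message block with its own data block and zeroed buffer. */
struct msgb *
demo_alloc(size_t size)
{
	struct msgb *mp = malloc(sizeof (*mp));
	struct datab *dp = malloc(sizeof (*dp));
	unsigned char *buf = calloc(1, size);

	if (mp == NULL || dp == NULL || buf == NULL) {
		free(mp);
		free(dp);
		free(buf);
		return (NULL);
	}
	dp->db_base = buf;
	dp->db_lim = buf + size;
	dp->db_ref = 1;
	mp->b_rptr = mp->b_wptr = buf;
	mp->b_datap = dp;
	return (mp);
}

/* Share the data: a new message block, the same buffer, one more reference. */
struct msgb *
demo_dup(struct msgb *mp)
{
	struct msgb *np = malloc(sizeof (*np));

	if (np == NULL)
		return (NULL);
	*np = *mp;
	np->b_datap->db_ref++;
	return (np);
}

/* Drop one reference; release the buffer along with the last one. */
void
demo_free(struct msgb *mp)
{
	struct datab *dp = mp->b_datap;

	assert(dp->db_ref > 0);
	if (--dp->db_ref == 0) {
		free(dp->db_base);
		free(dp);
	}
	free(mp);
}

/* Return a block that is safe to modify, copying first if it is shared. */
struct msgb *
demo_writable(struct msgb *mp)
{
	struct datab *dp = mp->b_datap;
	size_t size = (size_t)(dp->db_lim - dp->db_base);
	struct msgb *np;

	if (dp->db_ref == 1)
		return (mp);		/* sole owner, modify in place */
	np = demo_alloc(size);
	if (np == NULL)
		return (NULL);		/* caller still owns mp */
	memcpy(np->b_datap->db_base, dp->db_base, size);
	np->b_rptr = np->b_datap->db_base + (mp->b_rptr - dp->db_base);
	np->b_wptr = np->b_datap->db_base + (mp->b_wptr - dp->db_base);
	demo_free(mp);			/* give up the shared reference */
	return (np);
}
```

After demo_writable returns a copy, each holder has a reference count of one again, so a change made through the copy no longer reaches the other owner.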

STREAMS provides utility routines and macros (identified in Appendix B "STREAMS Utilities"), to assist in managing messages and message queues, and in other areas of module and driver development. Always use utility routines to operate on a message queue or to free or allocate messages. If messages are manipulated in the queue without using the STREAMS utilities, the message ordering can become confused and cause inconsistent results.

Caution -

Not adhering to the DDI/DKI can result in panics and system crashes.

Sending and Receiving Messages

Among the message types, the most commonly used messages are M_DATA, M_PROTO, and M_PCPROTO. These messages can be passed between a process and the topmost module in a Stream, with the same message boundary alignment maintained between user and kernel space. This allows a user process to function, to some degree, as a module above the Stream and maintain a service interface. M_PROTO and M_PCPROTO messages carry service interface information among modules, drivers, and user processes.

Modules and drivers do not interact directly with any interfaces except open(2) and close(2). The Stream head translates and passes all messages between user processes and the uppermost STREAMS module. Message transfers between a process and the Stream head can occur in different forms. For example, M_DATA and M_PROTO messages can be transferred in their direct form by getmsg(2) and putmsg(2). Alternatively, write(2) creates one or more M_DATA messages from the data buffer supplied in the call. M_DATA messages received at the Stream head are consumed by read(2) and copied into the user buffer.

Any module or driver can send any message in either direction on a Stream. However, based on their intended use in STREAMS and their treatment by the Stream head, certain messages can be categorized as upstream, downstream, or bidirectional. For example, M_DATA, M_PROTO, or M_PCPROTO messages can be sent in both directions. Other message types such as M_IOCACK are sent upstream to be processed only by the Stream head. Messages to be sent downstream are silently discarded if received by the Stream head. Table 7-1 and Table 7-2 indicate the intended direction of message types.

STREAMS lets modules create messages and pass them to neighboring modules. read(2) and write(2) are not enough to allow a user process to generate and receive all messages. In the first place, read(2) and write(2) are byte-stream oriented with no concept of message boundaries. The message boundary of each service primitive must be preserved so that the start and end of each primitive can be located in order to support service interfaces. Furthermore, read(2) and write(2) offer only one buffer to the user for transmitting and receiving STREAMS messages. If control information and data is placed in a single buffer, the user has to parse the contents of the buffer to separate the data from the control information.

getmsg(2) and putmsg(2) let a user process and the Stream pass data and control information between one another while maintaining distinct message boundaries.

Message Queues and Message Priority

Message queues grow when the STREAMS scheduler is delayed from calling a service procedure by system activity, or when the procedure is blocked by flow control. When called by the scheduler, a module's service procedure processes queued messages in a FIFO manner (getq(9F)). However, some messages associated with certain conditions, such as M_ERROR, must reach their Stream destination as rapidly as possible. This is accomplished by associating priorities with the messages. These priorities imply a certain ordering of messages in the queue, as shown in Figure 7-5. Each message has a priority band associated with it. Ordinary messages have a priority band of zero. The priority band of high-priority messages is ignored since they are high priority and thus not affected by flow control. putq(9F) places high-priority messages at the head of the message queue, followed by priority band messages (expedited data) and ordinary messages.

Figure 7-5 Message Ordering in a Queue

When a message is queued, it is placed after the messages of the same priority already in the queue (in other words, FIFO within their order of queueing). This affects the flow-control parameters associated with the band of the same priority. Message priorities range from 0 (normal) to 255 (highest). This provides up to 256 bands of message flow within a Stream. An example of how to implement expedited data would be with one extra band of data flow (priority band 1), as shown in Figure 7-6. Queues are explained in detail in the next section.

Figure 7-6 Message Ordering with One Priority Band

High-priority messages are not subject to flow control. When they are queued by putq(9F), the associated queue is always scheduled, even if the queue has been disabled (noenable(9F)). When the service procedure is called by the Stream's scheduler, the procedure uses getq(9F) to retrieve the first message on the queue, which is a high-priority message. Service procedures must be implemented to act on high-priority messages immediately. The mechanisms just mentioned--priority message queueing, absence of flow control, and immediate processing by a procedure--result in rapid transport of high-priority messages between the originating and destination components in the Stream.
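The ordering rules can be modeled in user space as follows. This sketch is not the putq(9F) implementation. It assumes a stand-in msgb with a demo_hipri field in place of a check on db_type, ignores byte counts and scheduling, and uses hypothetical names throughout. It shows only the placement rule: high-priority messages first, then higher bands before lower bands, and FIFO order within the same priority.

```c
#include <assert.h>
#include <stddef.h>

struct msgb {
	struct msgb  *b_next;
	struct msgb  *b_prev;
	unsigned char b_band;		/* 0..255 for ordinary messages */
	int           demo_hipri;	/* stand-in for a high-priority type */
};

struct queue {
	struct msgb *q_first;
	struct msgb *q_last;
};

/* Rank used for ordering: high priority outranks every band. */
static int
demo_rank(const struct msgb *mp)
{
	return (mp->demo_hipri ? 256 : (int)mp->b_band);
}

/* Insert mp after all queued messages of equal or higher rank. */
void
demo_putq(struct queue *q, struct msgb *mp)
{
	struct msgb *after;

	assert(q != NULL && mp != NULL);

	/* Walk back from the tail past every message of lower rank. */
	after = q->q_last;
	while (after != NULL && demo_rank(after) < demo_rank(mp))
		after = after->b_prev;

	mp->b_prev = after;
	mp->b_next = (after != NULL) ? after->b_next : q->q_first;
	if (mp->b_next != NULL)
		mp->b_next->b_prev = mp;
	else
		q->q_last = mp;
	if (after != NULL)
		after->b_next = mp;
	else
		q->q_first = mp;
}

/* Remove and return the message at the head of the queue, or NULL. */
struct msgb *
demo_getq(struct queue *q)
{
	struct msgb *mp = q->q_first;

	if (mp == NULL)
		return (NULL);
	q->q_first = mp->b_next;
	if (q->q_first != NULL)
		q->q_first->b_prev = NULL;
	else
		q->q_last = NULL;
	mp->b_next = mp->b_prev = NULL;
	return (mp);
}
```

Queuing two ordinary messages, two band 1 messages, and one high-priority message in any interleaving yields the high-priority message first, then the band 1 messages in arrival order, then the ordinary messages in arrival order.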

Note -

In general, high-priority messages should be processed immediately by the module's put procedure and not placed on the service queue.

Caution -

A service procedure must never queue a high-priority message on its own queue because an infinite loop results. The enqueuing triggers the queue to be immediately scheduled again.

Queues

The queue is the fundamental component of a Stream. It is the interface between a STREAMS module and the rest of the Stream, and is the repository for deferred message processing. For each instance of an open driver or pushed module or Stream head, a pair of queues is allocated, one for the read side of the Stream and one for the write side.

The RD(9F), WR(9F), and OTHERQ(9F) routines allow reference of one queue from the other. Given a queue, RD(9F) returns a pointer to the read queue, WR(9F) returns a pointer to the write queue, and OTHERQ(9F) returns a pointer to the opposite queue of the pair. Also see queue(9S).
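One way such routines can work is sketched below in user space. The layout is an assumption made for illustration: the two queues of a pair are adjacent array elements, with the read queue first and marked by a flag. The flag value and the demo_ names are hypothetical, and modules should always use the real RD(9F), WR(9F), and OTHERQ(9F) rather than rely on any layout.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_QREADR	0x01u	/* placeholder value; marks the read queue */

/* User-space stand-in that keeps only the flag field. */
struct queue {
	unsigned int q_flag;
};

/* Assumed layout: pair[0] is the read queue, pair[1] is the write queue. */
void
demo_pair_init(struct queue pair[2])
{
	assert(pair != NULL);
	pair[0].q_flag = DEMO_QREADR;
	pair[1].q_flag = 0;
}

/* Return the read queue, given either queue of the pair. */
struct queue *
demo_RD(struct queue *q)
{
	return ((q->q_flag & DEMO_QREADR) ? q : q - 1);
}

/* Return the write queue, given either queue of the pair. */
struct queue *
demo_WR(struct queue *q)
{
	return ((q->q_flag & DEMO_QREADR) ? q + 1 : q);
}

/* Return the opposite queue of the pair. */
struct queue *
demo_OTHERQ(struct queue *q)
{
	return ((q->q_flag & DEMO_QREADR) ? q + 1 : q - 1);
}
```

Under this layout, applying demo_OTHERQ twice returns the starting queue, and demo_RD and demo_WR are idempotent.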

By convention, queue pairs are depicted graphically as side-by-side blocks, with the write queue on the left and the read queue on the right (see Figure 7-7).

Figure 7-7 Queue Pair Allocation

queue(9S) Structure

As previously discussed, messages are ordered in message queues. Message queues, message priority, service procedures, and basic flow control all combine in STREAMS. A service procedure processes the messages in its queue. If there is no service procedure for a queue, putq(9F) does not schedule the queue to be run. The module developer must ensure that the messages in the queue are processed. Message priority and flow control are associated with message queues.

The queue structure is defined in stream.h as the typedef queue_t, and has the following public elements:

struct  qinit   *q_qinfo;       /* procs and limits for queue */
struct  msgb    *q_first;       /* first data block */
struct  msgb    *q_last;        /* last data block */
struct  queue   *q_next;        /* Q of next stream */
struct  queue   *q_link;        /* to next Q for scheduling */
void            *q_ptr;         /* to private data structure */
size_t          q_count;        /* number of bytes on Q */
uint            q_flag;         /* queue state */
ssize_t         q_minpsz;       /* min packet size accepted by this module */
ssize_t         q_maxpsz;       /* max packet size accepted by this module */
size_t          q_hiwat;        /* queue high-water mark */
size_t          q_lowat;        /* queue low-water mark */

q_first points to the first message on the queue, and q_last points to the last message on the queue. q_count is used in flow control and contains the total number of bytes contained in normal and high-priority messages in band 0 of this queue. Each band is flow controlled individually and has its own count. See "qband(9S) Structure" for more details. qsize(9F) can be used to determine the total number of messages on the queue. q_flag indicates the state of the queue. See Table 7-3 for the definition of these flags.

q_minpsz contains the minimum packet size accepted by the queue, and q_maxpsz contains the maximum packet size accepted by the queue. These are suggested limits, and some implementations of STREAMS may not enforce them. The SunOS™ Stream head enforces these values, but enforcement is voluntary at the module level. Design modules to handle messages of any size.

q_hiwat indicates the limiting maximum number of bytes that can be put on a queue before flow control occurs. q_lowat indicates the lower limit where STREAMS flow control is released.
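The interplay of the two water marks can be modeled in user space as follows. This is a simplified sketch, not the kernel's accounting: the exact comparisons the framework uses, the placeholder flag value, and the demo_ names are assumptions. The point it illustrates is the hysteresis: once the high-water mark is reached, the queue stays flow controlled until the byte count drains to the low-water mark.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_QFULL	0x02u	/* placeholder flag value */

/* User-space stand-in that keeps only the flow-control fields. */
struct queue {
	size_t       q_count;	/* bytes currently on the queue */
	unsigned int q_flag;
	size_t       q_hiwat;	/* high-water mark */
	size_t       q_lowat;	/* low-water mark */
};

/* Account for bytes added; mark the queue full at the high-water mark. */
void
demo_enqueue_bytes(struct queue *q, size_t nbytes)
{
	q->q_count += nbytes;
	if (q->q_count >= q->q_hiwat)
		q->q_flag |= DEMO_QFULL;
}

/* Account for bytes removed; release flow control at the low-water mark. */
void
demo_dequeue_bytes(struct queue *q, size_t nbytes)
{
	assert(nbytes <= q->q_count);
	q->q_count -= nbytes;
	if (q->q_count <= q->q_lowat)
		q->q_flag &= ~DEMO_QFULL;
}

/* A sender may pass a message along only while the queue is not full. */
int
demo_canput(const struct queue *q)
{
	return ((q->q_flag & DEMO_QFULL) == 0);
}
```

With a high-water mark of 100 and a low-water mark of 20, a queue that reaches 110 bytes remains blocked at 60 bytes and accepts data again only after it drains to 20 bytes or fewer.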

q_ptr is the element of the queue structure where modules can put values or pointers to data structures that are private to the module. This data can include any information required by the module for processing messages passed through the module, such as state information, module IDs, routing tables, and so on. Effectively, this element can be used any way the module or driver writer chooses. q_ptr can be accessed or changed directly by the driver, and is typically initialized in the open(9E) routine.

When a queue pair is allocated, streamtab initializes q_qinfo, and module_info initializes q_minpsz, q_maxpsz, q_hiwat, and q_lowat. Copying values from the module_info structure allows them to be changed in the queue without modifying the streamtab and module_info values.

Queue Flags

Be aware of the following queue flags. See queue(9S).

Table 7-3 Queue Flags


`QENAB`	The queue is enabled to run the service procedure.
`QFULL`	The queue is full.
`QREADR`	Set for all read queues.
`QNOENB`	Do not enable the queue when data is placed on it.

Using Queue Information

The q_first, q_last, q_count, and q_flag components must not be modified by the module, and should be accessed using strqget(9F). The values of q_minpsz, q_maxpsz, q_hiwat, and q_lowat are accessed through strqget(9F), and are modified by strqset(9F). q_ptr can be accessed and modified by the module and contains data private to the module.

All other accesses to fields in the queue(9S) structure should be made through STREAMS utility routines (see Appendix B, "STREAMS Utilities"). Modules and drivers should not change any fields not explicitly listed previously.

strqget(9F) lets modules and drivers get information about a queue or particular band of the queue. This insulates the STREAMS data structures from the modules and drivers. The syntax of the routine is:

int
strqget(queue_t *q, qfields_t what, unsigned char pri, void *valp)

q specifies from which queue the information is to be retrieved; what defines the queue_t field value to obtain (see the following code example). pri identifies a specific priority band. The value of the field is returned in valp. The fields that can be obtained are defined in <sys/stream.h> and shown here as:

QHIWAT              /* high-water mark */
QLOWAT              /* low-water mark */
QMAXPSZ             /* largest packet accepted */
QMINPSZ             /* smallest packet accepted */
QCOUNT              /* approx. size (in bytes) of data */
QFIRST              /* first message */
QLAST               /* last message */
QFLAG               /* status */

strqset(9F) lets modules and drivers change information about a queue or a band of the queue. This also insulates the STREAMS data structures from the modules and drivers. Its format is:

int 
strqset(queue_t *q, qfields_t what, unsigned char pri, intptr_t val)

The q, what, and pri fields are the same as in strqget(9F), but the information to be updated is provided in val instead of through a pointer. If the field is read-only, EPERM is returned and the field is left unchanged. The following fields are read-only: QCOUNT, QFIRST, QLAST, and QFLAG.
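The read-only rule can be illustrated with a user-space model of the two routines. This is a sketch under stated assumptions, not the kernel interface: the demo_ types and functions are hypothetical, the pri argument and per-band data are omitted, only four fields are modeled, and returning EINVAL for an unknown field is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
	DEMO_QHIWAT, DEMO_QLOWAT, DEMO_QCOUNT, DEMO_QFLAG
} demo_qfields_t;

struct demo_queue {
	size_t       q_count;
	unsigned int q_flag;
	size_t       q_hiwat;
	size_t       q_lowat;
};

/* Read one field into *valp; returns 0 or an errno value. */
int
demo_strqget(const struct demo_queue *q, demo_qfields_t what, void *valp)
{
	assert(q != NULL && valp != NULL);
	switch (what) {
	case DEMO_QHIWAT:
		*(size_t *)valp = q->q_hiwat;
		return (0);
	case DEMO_QLOWAT:
		*(size_t *)valp = q->q_lowat;
		return (0);
	case DEMO_QCOUNT:
		*(size_t *)valp = q->q_count;
		return (0);
	case DEMO_QFLAG:
		*(unsigned int *)valp = q->q_flag;
		return (0);
	}
	return (EINVAL);
}

/* Update one field from val; read-only fields are refused with EPERM. */
int
demo_strqset(struct demo_queue *q, demo_qfields_t what, intptr_t val)
{
	assert(q != NULL);
	switch (what) {
	case DEMO_QHIWAT:
		q->q_hiwat = (size_t)val;
		return (0);
	case DEMO_QLOWAT:
		q->q_lowat = (size_t)val;
		return (0);
	case DEMO_QCOUNT:
	case DEMO_QFLAG:
		return (EPERM);	/* field left unchanged */
	}
	return (EINVAL);
}
```

A caller that raises the high-water mark reads back the new value, while an attempt to set the byte count returns EPERM and leaves the count as it was.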

Entry Points

The q_qinfo component points to a qinit structure. This structure defines the module's entry point procedures for each queue, which include the following:

int        (*qi_putp)();       /* put procedure */
int        (*qi_srvp)();       /* service procedure */
int        (*qi_qopen)();      /* open procedure */
int        (*qi_qclose)();     /* close procedure */
struct module_info *qi_minfo;  /* module information structure */

There is generally a unique qinit structure associated with the read queue and the write queue. qi_putp identifies the put procedure for the module. qi_srvp identifies the optional service procedure for the module.

The open and close entry points are required for the read-side queue. The put procedure is generally required on both queues and the service procedure is optional.

If the put procedure is not defined and a subsequent put is done to the module, a panic occurs. As a precaution, putnext should be declared as the module's put procedure.

If a module only requires a service procedure, putq(9F) can be used as the module's put procedure. If the service procedure is not defined, the module's put procedure must not queue data (putq(9F)).

The qi_qopen member of the read-side qinit structure identifies the open(9E) entry point of the module. The qi_qclose member of the read-side qinit structure identifies the close(9E) entry point of the module.

The qi_minfo member points to the module_info(9S) structure.

struct module_info {
		ushort   mi_idnum;          /* module id number */
		char     *mi_idname;        /* module name */
		ssize_t  mi_minpsz;         /* min packet size accepted */
		ssize_t  mi_maxpsz;         /* max packet size accepted */
		size_t   mi_hiwat;          /* hi-water mark */
		size_t   mi_lowat;          /* lo-water mark */
};

mi_idnum is the module's unique identifier defined by the developer and used in strlog(9F). mi_idname is an ASCII string containing the name of the module. mi_minpsz is the initial minimum packet size of the queue. mi_maxpsz is the initial maximum packet size of the queue. mi_hiwat is the initial high-water mark of the queue. mi_lowat is the initial low-water mark of the queue.
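The way these structures fit together can be sketched in user space as follows. The structure definitions here are simplified stand-ins: the entry points take void pointers instead of queue_t and mblk_t arguments, the stub routines do nothing, and the module name, ID number, and limit values are made up for illustration. demo_qinit_ok is a hypothetical helper that encodes the rules stated above: open and close on the read side, and a put procedure on both sides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

struct module_info {
	unsigned short mi_idnum;	/* module id number */
	char          *mi_idname;	/* module name */
	ssize_t        mi_minpsz;	/* min packet size accepted */
	ssize_t        mi_maxpsz;	/* max packet size accepted */
	size_t         mi_hiwat;	/* hi-water mark */
	size_t         mi_lowat;	/* lo-water mark */
};

/* Simplified signatures; the real entry points take queue_t and mblk_t. */
struct qinit {
	int (*qi_putp)(void *q, void *mp);
	int (*qi_srvp)(void *q);
	int (*qi_qopen)(void *q);
	int (*qi_qclose)(void *q);
	struct module_info *qi_minfo;
};

/* Stub entry points for a hypothetical module named xx. */
static int xxopen(void *q) { (void)q; return (0); }
static int xxclose(void *q) { (void)q; return (0); }
static int xxrput(void *q, void *mp) { (void)q; (void)mp; return (0); }
static int xxwput(void *q, void *mp) { (void)q; (void)mp; return (0); }
static int xxwsrv(void *q) { (void)q; return (0); }

static char xx_name[] = "xx";

/* Made-up ID and limits: packets of 0..4096 bytes, water marks 2048/128. */
static struct module_info xx_minfo = {
	0x0888, xx_name, 0, 4096, 2048, 128
};

/* Read side: put, no service procedure, open and close. */
struct qinit xx_rinit = { xxrput, NULL, xxopen, xxclose, &xx_minfo };

/* Write side: put and service; open and close are left NULL. */
struct qinit xx_winit = { xxwput, xxwsrv, NULL, NULL, &xx_minfo };

/* Hypothetical check of the requirements on a read/write qinit pair. */
int
demo_qinit_ok(const struct qinit *rinit, const struct qinit *winit)
{
	assert(rinit != NULL && winit != NULL);
	return (rinit->qi_qopen != NULL && rinit->qi_qclose != NULL &&
	    rinit->qi_putp != NULL && winit->qi_putp != NULL);
}
```

Both qinit structures point to the same module_info here, which is a common arrangement but not a requirement; a module can supply different limits for each side.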

`open` Routine

The open routine of a device is called once for the initial open of the device, then is called again on subsequent reopens of the Stream. Module open routines are called once for the initial push onto the Stream and again on subsequent reopens of the Stream. See open(9E).

Figure 7-8 Order of a Module's `open` Procedure

The Stream is analogous to a stack. Initially the driver is opened and, as modules are pushed onto the Stream, their open routines are invoked. Once the Stream is built, this order reverses if a reopen of the Stream occurs. For example, while building the Stream shown in Figure 7-8, device A's open routine is called, followed by B's and C's when they are pushed onto the Stream. If the Stream is reopened, Module C's open routine is called first, followed by B's, and finally by A's.

Usually the module or driver does not check this, but the issue is raised so that dependencies on the order of open routines are not introduced by the programmer. Note that although an open can happen more than once, close is only called once. See the next section on the close routine for more details. If a file is duplicated (dup(2)) the Stream is not reopened.

The syntax of the open entry point is:

int prefixopen(queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *cred_p)

q -- Pointer to the read queue of this module.

devp -- Pointer to a device number that is always associated with the device at the end of the Stream. Modules cannot modify this value, but drivers can, as described in Chapter 9, STREAMS Drivers.

oflag -- For devices, oflag can contain the following bit mask values: FEXCL, FNDELAY, FREAD, and FWRITE. See Chapter 9, STREAMS Drivers for more information on drivers.

sflag -- When the open is associated with a driver, sflag is set to 0 or CLONEOPEN. See Chapter 9, STREAMS Drivers, "Cloning" for more details. If the open is associated with a module, sflag contains the value MODOPEN.

cred_p -- Pointer to the user credentials structure.

The open routines to devices are serialized (if more than one process attempts to open the device, only one proceeds and the others wait until the first finishes). Interrupts are not blocked during an open, so the driver's interrupt and open routines must allow for this. See Chapter 9, STREAMS Drivers for more information.

The open routines for both drivers and modules have user context. For example, they can do blocking operations, but the blocking routine should return in the event of a signal. In other words, qwait_sig(9F) is allowed, but qwait(9F) is not.

If the module or driver is to allocate a controlling terminal, it should send an M_SETOPTS message with SO_ISTTY set to the Stream head.

The open routine usually initializes the q_ptr member of the queue. q_ptr is generally initialized to some private data structure that contains various state information private to the module or driver. The module's close routine is responsible for freeing resources allocated by the module including q_ptr. Example 7-1 shows a simple open routine.

Example 7-1 A Simple `open` Routine

/* example of a module open */
int xx_open(queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *crp)
{
	struct xxstr *xx_ptr;

	xx_ptr = kmem_zalloc(sizeof(struct xxstr), KM_SLEEP);
	xx_ptr->xx_foo = 1;
	q->q_ptr = WR(q)->q_ptr = xx_ptr;
	qprocson(q);
	return (0);
}

In a multithreaded environment, data can flow up the Stream during the open. A module receiving this data before its open routine finishes initialization can panic. To eliminate this problem, modules and drivers are not linked into the Stream until qprocson(9F) is called. In other words, messages flow around the module until qprocson(9F) is called. Figure 7-9 illustrates this process.

Figure 7-9 Messages Flowing Around the Module Before `qprocson`

The module or driver instance is guaranteed to be single-threaded before qprocson(9F) is called, except for interrupts or callbacks that must be handled separately. qprocson(9F) must be called before calling qbufcall(9F), qtimeout(9F), qwait(9F), or qwait_sig(9F).

`close` Routine

The close routine of devices is called only during the last close of the device. Module close routines are called during the last close or if the module is popped.

The syntax of the close entry point is:

int prefixclose(queue_t *q, int flag, cred_t *cred_p)

q is a pointer to the read queue of the module. flag is analogous to the oflag parameter of the open entry point. If FNBLOCK or FNDELAY is set, then the module should attempt to avoid blocking during the close. cred_p is a pointer to the user credential structure.

Like open, the close entry point has user context and can block. Likewise, the blocking routines should return in the event of a signal. Device drivers must take into consideration that interrupts are not blocked during close. The close routine must cancel all pending timeout(9F) and qbufcall(9F) callbacks, and process any remaining data on its service queue.

The open and close procedures are only used on the read side of the queue and can be set to NULL in the write-side qinit structure initialization. Example 7-2 shows an example of a module close routine.

Example 7-2 Example of a Module Close

/* example of a module close */
static int
xx_close(queue_t *rq, int flag, cred_t *credp)
{
	struct xxstr	*xxp;

	/*
	 * Disable xxput() and xxsrv() procedures on this queue.
	 */
	qprocsoff(rq);
	xxp = (struct xxstr *)rq->q_ptr;

	/*
	 * Cancel any pending timeout.
	 * This example assumes that the timeout was issued
	 * against the write queue.
	 */
	if (xxp->xx_timeoutid != 0) {
		(void) quntimeout(WR(rq), xxp->xx_timeoutid);
		xxp->xx_timeoutid = 0;
	}

	/*
	 * Cancel any pending bufcalls.
	 * This example assumes that the bufcall was issued
	 * against the write queue.
	 */
	if (xxp->xx_bufcallid != 0) {
		(void) qunbufcall(WR(rq), xxp->xx_bufcallid);
		xxp->xx_bufcallid = 0;
	}
	rq->q_ptr = WR(rq)->q_ptr = NULL;

	/*
	 * Free resources allocated during open
	 */
	kmem_free(xxp, sizeof (struct xxstr));
	return (0);
}

The `put` Procedure

The put procedure passes messages from the queue of a module to the queue of the next module. The queue's put procedure is invoked by the preceding module to process a message immediately (see put(9F) and putnext(9F)). Almost all modules and drivers must have a put routine. The exception is that the read-side driver does not need a put routine because there can be no downstream component to call the put.

A driver's put procedure must do one of the following:

Process and free the message
Process and route the message back upstream
Queue the message to be processed by the driver's service procedure

All M_IOCTL type messages must be acknowledged through M_IOCACK or rejected through M_IOCNAK. M_IOCTL messages should not be freed. Drivers must free any unrecognized message.

A module's put procedure must do one of the following as shown in Example 7-3:

Process and free the message
Process the message and pass it to the next module or driver
Queue the message to be processed later by the module's service procedure

Unrecognized messages are passed to the next module or driver. The Stream operates more efficiently when messages are processed in the put procedure. Processing a message with the service procedure imposes some latency on the message.

Example 7-3 Flow of `put` Procedure

If the next module is flow controlled (see canput(9F) and canputnext(9F)), the put procedure can queue the message for processing by the next service procedure (see putq(9F)). The put routine is always called before the component's corresponding srv(9E) service routine, so always use put for immediate message processing.

The preferred naming convention for a put procedure reflects the direction of the message flow. The read put procedure is suffixed by r (rput), and the write procedure by w (wput). For example, the read-side put procedure for module xx is declared as int xxrput (queue_t *q, mblk_t *mp). The write-side put procedure is declared as: int xxwput(queue_t *q, mblk_t *mp), where q points to the corresponding read or write queue and mp points to the message to be processed.

Although high-priority messages can be placed on the service queue, it is better to process them immediately in the put procedure. Place ordinary or priority-band messages on the service queue (putq(9F)) if:

The Stream has been flow controlled, that is, canput fails.
There are already messages on the service queue, that is, q_first is not NULL.
Deferred processing is desired.

If other messages already exist on the queue, and the put procedure does not queue new messages (provided they are not high-priority), messages are reordered. If the next module is flow controlled (see canput(9F) and canputnext(9F)), the put procedure can queue the message for processing by the service procedure (see putq(9F)).

/* example of a module put procedure */
int
xxrput(queue_t *q, mblk_t *mp)
{
	/*
	 * If the message is a high-priority message or
	 * the next module is not flow controlled and we have not
	 * already deferred processing, then:
	 */
	if (mp->b_datap->db_type >= QPCTL ||
	    (canputnext(q) && q->q_first == NULL)) {
		/*
		 * Process message
		 */
		.
		.
		.
		putnext(q, mp);
	} else {
		/*
		 * put message on service queue
		 */
		putq(q, mp);
	}
	return (0);
}

A module need not process the message immediately, and can queue it for later processing by the service procedure (see putq(9F)).

The SunOS STREAMS framework is multithreaded. Unsafe (nonmultithreaded) modules are not supported. To make multithreading of modules easier, the SunOS STREAMS framework uses perimeters (see "MT STREAMS Perimeters" in Chapter 12, MultiThreaded STREAMS for information). Perimeters are a facility for specifying that the framework provide exclusive access for the entire module, queue pair, or an individual queue. Perimeters make it easier to deal with multithreaded issues, such as message ordering and recursive locking.

Caution -

Mutex locks must not be held across a call to put(9F), putnext(9F), or qreply(9F).

Because of the asynchronous nature of STREAMS, don't assume that a module's put procedure has been called just because put(9F), putnext(9F), or qreply(9F) has returned.

`service` Procedure

A queue's service procedure is invoked to process messages on the queue. It removes successive messages from the queue, processes them, and calls the put procedure of the next module in the Stream to give the processed message to the next queue.

The service procedure is optional. A module or driver can use a service procedure for the following reasons:

Streams flow control is implemented by service procedures. If the next component on the Stream has been flow controlled, the put procedure can queue the message. (See "Flow Control in Service Procedures" in Chapter 7, STREAMS Framework -Kernel Level for more on Flow Control.)
Resource allocation recovery. If a put or service procedure cannot allocate a resource, such as memory, the message is usually queued to process later.
A device driver can queue a message and get out of interrupt context.
To combine multiple messages into larger messages.

The service procedure is invoked by the STREAMS scheduler. A STREAMS service procedure is scheduled to run if any of the following is true:

The queue is not disabled (noenable(9F)) and either:
- The message being queued (putq(9F)) is the first message on the queue, or
- The message being queued (putq(9F)) is a priority band message.
The message being queued (putq(9F) or putbq(9F)) is a high-priority message.
The queue has been back enabled because flow control has been relieved.
The queue has been explicitly enabled (qenable(9F)).
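
The scheduling conditions above can be restated as a single predicate. The following is only a user-space sketch for reasoning about the rules, not kernel code: struct sim_queue, enum sim_event, and sim_should_schedule() are hypothetical names invented for this illustration, and the real decision is made inside the STREAMS framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these are kernel interfaces. */
enum sim_event {
	SIM_PUTQ_ORDINARY,	/* putq-style queueing of a band 0 message */
	SIM_PUTQ_BAND,		/* putq-style queueing of a priority band message */
	SIM_PUT_HIPRI,		/* putq- or putbq-style queueing of a high-priority message */
	SIM_BACK_ENABLE,	/* flow control ahead has been relieved */
	SIM_QENABLE		/* explicit enable request */
};

struct sim_queue {
	bool noenable;	/* true after a noenable-style call */
	int  nmsgs;	/* messages already on the queue before the event */
};

/* Returns true if the modeled service procedure would be scheduled. */
bool
sim_should_schedule(const struct sim_queue *q, enum sim_event ev)
{
	assert(q != NULL);

	switch (ev) {
	case SIM_PUTQ_ORDINARY:
		/* Only the first message on an enabled queue schedules. */
		return (!q->noenable && q->nmsgs == 0);
	case SIM_PUTQ_BAND:
		/* Band messages schedule unless the queue is disabled. */
		return (!q->noenable);
	case SIM_PUT_HIPRI:
	case SIM_BACK_ENABLE:
	case SIM_QENABLE:
		/* These conditions do not depend on the noenable state. */
		return (true);
	}
	return (false);
}
```

In this model a queue that has been disabled still runs its service procedure for high-priority messages, back-enables, and explicit enables, which matches the list above.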

A service procedure usually processes all messages on its queue (getq(9F)) or takes appropriate action to ensure it is reenabled (qenable(9F)) at a later time. Figure 7-10 shows the flow of a service procedure.

Caution -

High-priority messages must never be placed back on a service queue (putbq(9F)); this can cause an infinite loop.

Figure 7-10 Flow of `service` Procedure

Example 7-4 shows a module service procedure.

Example 7-4 A Module `service` Procedure

/* example of a module service procedure */
int
xxrsrv(queue_t *q)
{
	mblk_t *mp;

	/*
	 * While there are still messages on our service queue
	 */
	while ((mp = getq(q)) != NULL) {
		/*
		 * We check for high priority messages, but
		 * none is ever seen since the put procedure
		 * never queues them.
		 * If the next module is not flow controlled, then
		 */
		if (mp->b_datap->db_type >= QPCTL || canputnext(q)) {
			/*
			 * process message
			 */
			.
			.
			.
			putnext(q, mp);
		} else {
			/*
			 * put message back on service queue
			 */
			putbq(q, mp);
			break;
		}
	}
	return (0);
}

qband(9S) Structure

The queue flow information for each band, other than band 0, is contained in a qband structure. This structure is not visible to other modules. For accessible information see strqget(9F) and strqset(9F). qband(9S) is defined as follows:

typedef struct qband {
	struct qband	*qb_next;	/* next band's info */
	size_t		qb_count;	/* number of bytes in band */
	struct msgb	*qb_first;	/* beginning of band's data */
	struct msgb	*qb_last;	/* end of band's data */
	size_t		qb_hiwat;	/* high-water mark for band */
	size_t		qb_lowat;	/* low-water mark for band */
	uint		qb_flag;	/* see below */
} qband_t;

The structure contains pointers to the linked list of messages in the queue. These pointers, qb_first and qb_last, denote the beginning and end of messages for the particular band. The qb_count field is analogous to the queue's q_count field. However, qb_count only applies to the messages in the queue in the band of data flow represented by the corresponding qband structure. In contrast, q_count only contains information regarding normal and high-priority messages.

Each band has a separate high and low watermark, qb_hiwat and qb_lowat. These are initially set to the queue's q_hiwat and q_lowat respectively. Modules and drivers can change these values through the strqset(9F) function. One flag defined for qb_flag is QB_FULL, which denotes that the particular band is full.

The qband(9S) structures are not preallocated per queue. Rather, they are allocated when a message with a priority greater than zero is placed in the queue using putq(9F), putbq(9F), or insq(9F). Since band allocation can fail, these routines return 0 on failure and 1 on success. Once a qband(9S) structure is allocated, it remains associated with the queue until the queue is freed. strqset(9F) and strqget(9F) cause qband(9S) allocation. Sending a message to a band causes all bands up to and including that one to be created.
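
The allocation rule described above (a message for band n causes every band up to and including n to exist, and the operation reports 1 or 0) can be modeled in user space. This is a minimal sketch under stated assumptions: struct sim_band, struct sim_queue, and sim_ensure_band() are hypothetical names for illustration only, and calloc stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-band record; not the kernel's qband. */
struct sim_band {
	struct sim_band *next;	/* next band's info */
	size_t hiwat;		/* starts as a copy of the queue's value */
	size_t lowat;		/* starts as a copy of the queue's value */
};

struct sim_queue {
	size_t hiwat;		/* queue-wide high watermark */
	size_t lowat;		/* queue-wide low watermark */
	unsigned nband;		/* number of band records allocated */
	struct sim_band *bands;	/* band 1 first, then band 2, ... */
};

/*
 * Make sure band records 1..band exist.
 * Returns 1 on success and 0 if an allocation fails.
 */
int
sim_ensure_band(struct sim_queue *q, unsigned band)
{
	struct sim_band **tail;

	assert(q != NULL);

	/* Walk to the end of the existing band list. */
	tail = &q->bands;
	while (*tail != NULL)
		tail = &(*tail)->next;

	/* Create every missing band up to and including the target. */
	while (q->nband < band) {
		struct sim_band *b = calloc(1, sizeof (*b));

		if (b == NULL)
			return (0);
		b->hiwat = q->hiwat;
		b->lowat = q->lowat;
		*tail = b;
		tail = &b->next;
		q->nband++;
	}
	return (1);
}
```

In the model, asking for band 3 on a fresh queue creates three records, and a later request for band 2 allocates nothing because the records persist for the life of the queue.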

Using qband(9S) Information

The STREAMS utility routines should be used when manipulating the fields in the queue and qband structures. strqget(9F) and strqset(9F) are used to access band information.

Drivers and modules can change the qb_hiwat and qb_lowat fields of the qband structure. Drivers and modules can only read the qb_count, qb_first, qb_last, and qb_flag fields of the qband structure. Only the fields listed previously can be referenced. There are fields in the structure that are reserved and are not documented.

Figure 7-11 shows a queue with two extra bands of flow.

Figure 7-11 Data Structure Linkage

Several routines are provided to aid you in controlling each priority band of data flow. These routines are flushband(9F), bcanputnext(9F), strqget(9F), and strqset(9F).

flushband(9F) is discussed in "Flush Handling". bcanputnext(9F) is discussed in "Flow Control in Service Procedures", and the other two routines are described in the following section. Appendix B, STREAMS Utilities also has a description of these routines.

Message Processing

Typically, put procedures are required in pushable modules, but service procedures are optional. If the put routine queues messages, there must exist a corresponding service routine that handles the queued messages. If the put routine does not queue messages, the service routine need not exist.

Example 7-3 shows typical processing flow for a put procedure which works as follows:

A message is received by the put procedure associated with the queue, where some processing can be performed on the message.
The put procedure determines if the message can be sent to the next module by the use of canput(9F) or canputnext(9F).
If the next module is flow controlled, the put procedure queues the message using putq(9F).
putq(9F) places the message in the queue based on its priority.
Then, putq(9F) makes the queue ready for execution by the STREAMS scheduler, following all other queues currently scheduled.
If the next module is not flow controlled, the put procedure does any processing needed on the message and sends it to the next module using putnext(9F). Note that if the module does not have a service procedure it cannot queue the message, and must process and send the message to the next module.

Figure 7-10 shows typical processing flow for a service procedure that works as follows:

When the system goes from kernel mode to user mode, the STREAMS scheduler calls the service procedure.
The service procedure gets the first message (q_first) from the message queue using the getq(9F) utility.
The service procedure determines if the message can be sent to the next module using canput(9F) or canputnext(9F).
If the next module is flow controlled, the service procedure requeues the message with putbq(9F), and then returns.
If the next module is not flow controlled, the service procedure processes the message and passes it to the put procedure of the next queue with putnext(9F).
The service procedure gets the next message and processes it. This processing continues until the queue is empty or flow control blocks further processing. The service procedure returns to the caller.

Caution -

A service or put procedure must never block since it has no user context. It must always return to its caller.

If no processing is required in the put procedure, the procedure does not have to be explicitly declared. Rather, putq(9F) can be placed in the qinit(9S) structure declaration for the appropriate queue side to queue the message for the service procedure. For example:

static struct qinit winit = { putq, modwsrv, ...... };

More typically, put procedures process high-priority messages to avoid queueing them.

Device drivers associated with hardware are examples of STREAMS devices that might not have a put procedure. Since there are no queues below the hardware level, no other module calls the driver's read-side put procedure. Data comes into the Stream from an interrupt routine, and is either processed or queued for the service procedure.

A STREAMS filter is an example of a module without a service procedure: messages passed to it are either passed or filtered. Flow control is described in "Flow Control in Service Procedures".

The key attribute of a service procedure in the STREAMS architecture is delayed processing. When a service procedure is used in a module, the module developer is implying that there are other, more time-sensitive activities to be performed elsewhere in this Stream, in other Streams, or in the system in general.

Note -

The presence of a service procedure is mandatory if the flow control mechanism is to be utilized by the queue. If you don't implement flow control, it is possible to overflow queues and hang the system.

Flow Control in Service Procedures

The STREAMS flow control mechanism is voluntary and operates between the two nearest queues in a Stream containing service procedures (see Figure 7-12). Messages are held on a queue only if a service procedure is present in the associated queue.

Messages accumulate on a queue when the queue's service procedure processing does not keep pace with the message arrival rate, or when the procedure is blocked from placing its messages on the following Stream component by the flow control mechanism. Pushable modules can contain independent upstream and downstream limits. The Stream head contains a preset upstream limit (which can be modified by a special message sent from downstream) and a driver can contain a downstream limit. See M_SETOPTS for more information.

Flow control operates as follows:

Each time a STREAMS message-handling routine (for example, putq(9F)) adds or removes a message from a message queue, the limits are checked. STREAMS calculates the total size of all message blocks (bp->b_wptr - bp->b_rptr) on the message queue.
The total is compared to the queue high and low watermark values. If the total exceeds the high watermark value, an internal full indicator is set for the queue. The operation of the service procedure in this queue is not affected if the indicator is set, and the service procedure continues to be scheduled.
The next part of flow control processing occurs in the nearest preceding queue that contains a service procedure. In Figure 7-12, if D is full and C has no service procedure, then B is the nearest preceding queue.

Figure 7-12

The service procedure in B uses canputnext(9F) to check if a queue ahead is marked full. If messages cannot be sent, the scheduler blocks the service procedure in B from further execution. B remains blocked until the low watermark of the full queue, D, is reached.
While B is blocked, any messages except high-priority messages arriving at B accumulate on its message queue (high-priority messages are not subject to flow control). Eventually, B can reach a full state and the full condition propagates back to the preceding module in the Stream.
When the service procedure processing on D causes the message block total to fall below the low watermark, the full indicator is turned off. STREAMS then schedules the nearest preceding blocked queue (B in this case). This automatic scheduling is called back-enabling a queue.
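
The watermark behavior described in these steps can be sketched as a small user-space model. It is an illustration only: struct sim_queue and the sim_* functions are hypothetical, the comparisons follow the wording of the text ("exceeds" the high watermark, "falls below" the low watermark), and a real implementation may differ in its boundary conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of queue D from the description; not kernel code. */
struct sim_queue {
	size_t count;		/* bytes currently on the message queue */
	size_t hiwat;		/* high watermark */
	size_t lowat;		/* low watermark */
	bool   full;		/* internal full indicator */
	bool   want_write;	/* a preceding queue was refused and is waiting */
	int    back_enables;	/* how many times a back-enable was issued */
};

/* Model of adding a message of nbytes to the queue. */
void
sim_enqueue(struct sim_queue *q, size_t nbytes)
{
	q->count += nbytes;
	if (q->count > q->hiwat)
		q->full = true;
}

/* Model of the test a preceding queue (B) makes before sending. */
bool
sim_canput(struct sim_queue *q)
{
	if (q->full) {
		q->want_write = true;	/* remember that someone is blocked */
		return (false);
	}
	return (true);
}

/* Model of the service procedure draining nbytes from the queue. */
void
sim_dequeue(struct sim_queue *q, size_t nbytes)
{
	assert(nbytes <= q->count);
	q->count -= nbytes;
	if (q->full && q->count < q->lowat) {
		q->full = false;
		if (q->want_write) {
			/* Back-enable the nearest preceding blocked queue. */
			q->want_write = false;
			q->back_enables++;
		}
	}
}
```

The gap between the two watermarks is what keeps the blocked queue from being rescheduled on every dequeue: the full indicator clears only after the count drops all the way below the low watermark.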

Modules and drivers need to observe the message priority. High-priority messages, determined by the type of the first block in the message,

(mp)->b_datap->db_type >= QPCTL

are not subject to flow control. They should be processed immediately and forwarded, as appropriate.

For ordinary messages, flow control must be tested before any processing is performed. canputnext(9F) determines if the forward path from the queue is blocked by flow control.

This is the general flow control processing of ordinary messages:

Retrieve the message at the head of the queue with getq(9F).
Determine if the message type is high priority and not to be processed here.
If so, pass the message to the put procedure of the following queue with putnext(9F).
Use canputnext(9F) to determine if messages can be sent onward.
If messages cannot be forwarded, put the message back in the queue with putbq(9F) and return from the procedure.

Caution -

High-priority messages must be processed and not placed back on the queue.

Otherwise, process the message.

The canonical representation of this processing within a service procedure is:

while (getq() != NULL)
	if (high priority message || no flow control) {
		process message
		putnext()
	} else {
		putbq()
		return
	}

Expedited data has its own flow control with the same processing method as that of ordinary messages. bcanputnext(9F) provides modules and drivers with a test of flow control in a priority band. It returns 1 if a message of the given priority can be placed in the queue. It returns 0 if the priority band is flow controlled. If the band does not exist in the queue in question, the routine returns 1.

If the band is flow controlled, the higher bands are not affected. However, lower bands are also stopped from sending messages. Without this, lower priority messages can be passed along ahead of the flow-controlled higher priority messages.

The call bcanputnext(q, 0); is equivalent to the call canputnext(q);.

Note -

A service procedure must process all messages in its queue unless flow control prevents this.

A service procedure must continue processing messages from its queue until getq(9F) returns NULL. When an ordinary message is queued by putq(9F), it causes the service procedure to be scheduled only if the queue was previously empty, and a previous getq(9F) call returns NULL (that is, the QWANTR flag is set). If there are messages in the queue, putq(9F) presumes the service procedure is blocked by flow control and the procedure is automatically rescheduled by STREAMS when the block is removed. If the service procedure cannot complete processing as a result of conditions other than flow control (for example, no buffers), it must ensure a later return (for example, by bufcall(9F)) or it must discard all messages in the queue. If this is not done, STREAMS never schedules the service procedure to be run unless the queue's put procedure queues a priority message with putq(9F).

Note -

High-priority messages are discarded only if there is already a high-priority message on the Stream head read queue. That is, there can be only one high-priority message (M_PCPROTO) present on the Stream head read queue at any time.

putbq(9F) replaces a message at the beginning of the appropriate section of the message queue according to its priority. This might not be the same position at which the message was retrieved by the preceding getq(9F). A subsequent getq(9F) might return a different message.
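
To see why a message returned with putbq(9F) might not be the next one retrieved, consider the following user-space model of priority-ordered queueing. It is a sketch under assumptions: the sim_* names are hypothetical, the queue is a small fixed array rather than a linked list of message blocks, and each message carries a single rank (its band number, or SIM_HIPRI for a high-priority message).

```c
#include <assert.h>
#include <string.h>

#define	SIM_MAXMSG	16
#define	SIM_HIPRI	256	/* rank above every band (0..255) */

/* Hypothetical message and queue types; not kernel structures. */
struct sim_msg {
	int id;		/* label used to tell messages apart */
	int rank;	/* band number, or SIM_HIPRI */
};

struct sim_queue {
	struct sim_msg m[SIM_MAXMSG];	/* kept in descending rank order */
	int n;
};

static void
sim_insert_at(struct sim_queue *q, int pos, struct sim_msg msg)
{
	assert(q->n < SIM_MAXMSG && pos >= 0 && pos <= q->n);
	memmove(&q->m[pos + 1], &q->m[pos],
	    (size_t)(q->n - pos) * sizeof (q->m[0]));
	q->m[pos] = msg;
	q->n++;
}

/* putq model: place the message at the END of its priority section. */
void
sim_putq(struct sim_queue *q, struct sim_msg msg)
{
	int pos = 0;

	while (pos < q->n && q->m[pos].rank >= msg.rank)
		pos++;
	sim_insert_at(q, pos, msg);
}

/* putbq model: place the message at the START of its priority section. */
void
sim_putbq(struct sim_queue *q, struct sim_msg msg)
{
	int pos = 0;

	while (pos < q->n && q->m[pos].rank > msg.rank)
		pos++;
	sim_insert_at(q, pos, msg);
}

/* getq model: remove the head message. Returns 0 if the queue is empty. */
int
sim_getq(struct sim_queue *q, struct sim_msg *out)
{
	if (q->n == 0)
		return (0);
	*out = q->m[0];
	q->n--;
	memmove(&q->m[0], &q->m[1], (size_t)q->n * sizeof (q->m[0]));
	return (1);
}
```

If an ordinary message is taken off the queue, a band 1 message then arrives, and the ordinary message is put back, the next retrieval in this model yields the band 1 message, because the returned message goes to the head of the band 0 section rather than the head of the whole queue.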

putq(9F) checks only the priority band in the first message. If a high-priority message is passed to putq(9F) with a nonzero b_band value, b_band is reset to 0 before placing the message in the queue. If the message is passed to putq(9F) with a b_band value that is greater than the number of qband(9S) structures associated with the queue, putq(9F) tries to allocate a new qband(9S) structure for each band, up to and including the band of the message.

putbq(9F) and insq(9F) work similarly. If you try to insert a message out of order in a queue with insq(9F), the message is not inserted and the routine fails.

putq(9F) does not schedule a queue if noenable(9F) was previously called for the queue. noenable(9F) forces putq(9F) to queue the message when called by this queue, but not to schedule the service procedure. noenable(9F) does not prevent the queue from being scheduled by a flow control back-enable. The inverse of noenable(9F) is enableok(9F).

The service procedure is written using the following algorithm:

while ((bp = getq(q)) != NULL) {
	if (queclass(bp) == QPCTL) {
		/* Process the message */
		putnext(q, bp);
	} else if (bcanputnext(q, bp->b_band)) {
		/* Process the message */
		putnext(q, bp);
	} else {
		putbq(q, bp);
		return;
	}
}

If the module or driver ignores priority bands, the algorithm is the same as described in the previous paragraphs, except that canputnext(q) is substituted for bcanputnext(q, bp->b_band).

qenable(9F), another flow-control utility, lets a module or driver cause one of its queues, or another module's queues, to be scheduled. qenable(9F) can also be used to delay message processing. An example of this is a buffer module that gathers messages in its message queue and forwards them as a single, larger message. This module uses noenable(9F) to inhibit its service procedure and queues messages with its put procedure until a certain byte count or "in queue" time has been reached. When either of these conditions is met, the module calls qenable(9F) to cause its service procedure to run.
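
The buffering pattern in this paragraph can be sketched in user space as follows. This is a simplified model, not a module: struct sim_bufmod and its functions are hypothetical, only the byte-count trigger is modeled (the "in queue" time trigger is omitted), and the srv_scheduled flag stands in for the effect of an explicit enable on a queue whose automatic scheduling has been inhibited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for a message-gathering module; not kernel code. */
struct sim_bufmod {
	size_t queued_bytes;	/* bytes held on the message queue */
	size_t threshold;	/* byte count that triggers forwarding */
	int    nqueued;		/* number of messages held */
	bool   srv_scheduled;	/* true once service has been requested */
};

/*
 * Put-side model: queue the message without scheduling service
 * (the noenable behavior), then request service explicitly
 * (the qenable behavior) once enough data has accumulated.
 */
void
sim_bufmod_put(struct sim_bufmod *b, size_t nbytes)
{
	assert(b != NULL);
	b->queued_bytes += nbytes;
	b->nqueued++;
	if (b->queued_bytes >= b->threshold)
		b->srv_scheduled = true;
}

/*
 * Service-side model: combine everything held into one larger
 * message and return its size; the queue is left empty.
 */
size_t
sim_bufmod_srv(struct sim_bufmod *b)
{
	size_t combined;

	assert(b != NULL);
	combined = b->queued_bytes;
	b->queued_bytes = 0;
	b->nqueued = 0;
	b->srv_scheduled = false;
	return (combined);
}
```

With a threshold of 100 bytes, two 40-byte messages leave the model idle, and a third one requests service, which then forwards a single 120-byte message.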

Another example is a communication line discipline module that implements end-to-end (for example, to a remote system) flow control. Outbound data is held on the write side message queue until the read side receives a transmit window from the remote end of the network.

Note -

STREAMS routines are called at different priority levels. Interrupt routines are called at the interrupt priority of the interrupting device. Service routines are called with interrupts enabled (so that service routines for STREAMS drivers can be interrupted by their own interrupt routines).
