STREAMS Programming Guide

Part II Kernel Interfaces

Chapter 7 STREAMS Framework - Kernel Level

Because the STREAMS subsystem of UNIX provides a framework on which communications services can be built, it is often called the STREAMS framework. This framework consists of the Stream head and a series of utilities (put, putnext), kernel structures (mblk, dblk), and linkages (queues) that facilitate the interconnections between modules, drivers, and basic system calls. This chapter describes the STREAMS components from the kernel developer's perspective.

Overview of Streams in Kernel Space

Chapter 1, Overview of STREAMS describes a Stream as a full-duplex processing and data transfer path between a STREAMS driver in kernel space and a process in user space. In the kernel, a Stream consists of a Stream head, a driver, and zero or more modules between the driver and the Stream head.

The Stream head is the end of the Stream nearest the user process. All system calls made by user-level applications on a Stream are processed by the Stream head.

Messages are the containers in which data and control information are passed between the Stream head, modules, and drivers. The Stream head is responsible for translating the appropriate messages between the user application and the kernel. A message is handled through pointers to its structures (mblk, dblk), which describe the data the message contains. Messages are categorized by type according to the purpose and priority of the message.

Queues are the basic elements by which the Stream head, modules, and drivers are connected. Queues identify the open, close, put, and service entry points. Additionally, queues specify parameters and private data for use by modules and drivers, and are the repository for messages destined for deferred processing.

In the rest of this chapter, the word "modules" refers to modules, drivers, or multiplexers, except where noted.

Stream Head

The Stream head interacts between applications in the user space and the rest of the Stream in kernel space. The Stream head is responsible for configuring the plumbing of the Stream through open, close, push, pop, link, and unlink operations. It also translates user data into messages to be passed down the Stream, and translates messages that arrive at the Stream head into user data. Any characteristics of the Stream that can be modified by the user application or the underlying Stream are controlled by the Stream head, which also informs users of data arrival and events such as error conditions.

If an application makes a system call with a STREAMS file descriptor, the Stream head routines are invoked, resulting in data copying, message generation, or control operations. Only the Stream head can copy data between the user space and kernel space. All other parts of the Stream pass data by way of messages and are thus isolated from direct interaction with users of the Stream.

Kernel-Level Messages

Chapter 3, STREAMS Application-Level Mechanisms discusses messages from the application perspective. The following sections discuss message types, message structure and linkage; how messages are sent and received; and message queues and priority from the kernel perspective.

Message Types

STREAMS messages differ in their purpose and queueing priority. The message types are briefly described and classified according to their queueing priority in Table 7-1 and Table 7-2. A detailed discussion of message types is in Chapter 8, Messages - Kernel Level.

Some message types are defined as high-priority types. The others can have a normal priority of 0, or a priority (also called a band) from 1 to 255.

Table 7-1 Ordinary Messages, Showing Direction of Communication Flow

Ordinary messages are also called normal messages.

Message Type  Description                                                   Direction
M_BREAK       Request to a Stream driver to send a "break"                  Downstream
M_CTL         Control or status request used for intermodule communication  Bidirectional
M_DATA        User data message for I/O system calls                        Bidirectional
M_DELAY       Request for a real-time delay on output                       Downstream
M_IOCTL       Control/status request generated by a Stream head             Downstream
M_PASSFP      File pointer-passing message                                  Bidirectional
M_PROTO       Protocol control information                                  Bidirectional
M_SETOPTS     Sets options at the Stream head; sends upstream               Upstream
M_SIG         Signal sent from a module or driver                           Upstream

Table 7-2 High-Priority Messages, Showing Direction of Communication Flow

Message Type  Description                                                   Direction
M_COPYIN      Copies in data for transparent ioctl(2)s                      Upstream
M_COPYOUT     Copies out data for transparent ioctl(2)s                     Upstream
M_ERROR       Reports downstream error condition                            Upstream
M_FLUSH       Flushes module queue                                          Bidirectional
M_HANGUP      Sets a Stream head hangup condition                           Upstream
M_UNHANGUP    Reconnects line, sends upstream when hangup reverses          Upstream
M_IOCACK      Positive ioctl(2) acknowledgment                              Upstream
M_IOCDATA     Data for transparent ioctl(2)s, sent downstream               Downstream
M_IOCNAK      Negative ioctl(2) acknowledgment                              Upstream
M_PCPROTO     Protocol control information                                  Bidirectional
M_PCSIG       Sends signal from a module or driver                          Upstream
M_READ        Read notification; sends downstream                           Downstream
M_START       Restarts stopped device output                                Downstream
M_STARTI      Restarts stopped device input                                 Downstream
M_STOP        Suspends output                                               Downstream
M_STOPI       Suspends input                                                Downstream

Message Structure

A STREAMS message in its simplest form contains three elements: a message block, a data block, and a data buffer. The data buffer is the location in memory where the actual data of the message is stored. The data block (datab(9S)) describes the data buffer: where it starts, where it ends, the message type, and how many message blocks reference it. The message block (msgb(9S)) describes the data block and how the data is used.

The data block has a typedef of dblk_t and has the following public elements:


struct datab {
	unsigned char       *db_base;          /* first byte of buffer */
	unsigned char       *db_lim;           /* last byte+1 of buffer */
	unsigned char        db_ref;           /* msg count ptg to this blk */
	unsigned char        db_type;          /* msg type */
};

typedef struct datab dblk_t;

The datab structure specifies the data buffer's fixed limits (db_base and db_lim), a reference count field (db_ref), and the message type field (db_type). db_base points to the address where the data buffer starts, db_lim points one byte beyond where the data buffer ends, and db_ref maintains a count of the number of message blocks sharing the data buffer.


Note -

db_base, db_lim, and db_ref should not be modified directly. db_type is modified under carefully monitored conditions, such as changing the message type to reuse the message block.


In a simple message, the message block references the data block and identifies the addresses where the message data begins (b_rptr) and ends (b_wptr). These addresses must lie within the confines of the buffer, such that db_base <= b_rptr <= b_wptr <= db_lim. For ordinary messages, a priority band can be indicated, and this band is used if the message is queued.

Figure 7-1 shows the linkages between msgb, datab, and data buffer in a simple message.

Figure 7-1 Simple Message Referencing the Data Block

The message block has a typedef of mblk_t and has the following public elements:


struct msgb {
	struct msgb            *b_next;      /*next msg on queue*/
	struct msgb            *b_prev;      /*previous msg on queue*/
	struct msgb            *b_cont;      /*next msg block of message*/
	unsigned char          *b_rptr;      /*1st unread byte in bufr*/
	unsigned char          *b_wptr;      /*1st unwritten byte in bufr*/
	struct datab           *b_datap;     /*data block*/
	unsigned char           b_band;      /*message priority*/
	unsigned short          b_flag;      /*message flags*/
};

The STREAMS framework uses the b_next and b_prev fields to link messages into queues. b_rptr and b_wptr specify the current read and write pointers respectively, in the data buffer pointed to by b_datap. The fields b_rptr and b_wptr are maintained by drivers and modules.
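The pointer rules above can be illustrated with a small userspace model. The sketch is hypothetical: model_dblk_t, model_mblk_t, and the model_* helpers are illustration-only stand-ins, not the kernel's dblk_t, mblk_t, or any DDI routine. It shows a writer appending at b_wptr without passing db_lim, and a reader consuming by advancing b_rptr, so that db_base <= b_rptr <= b_wptr <= db_lim always holds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for dblk_t and mblk_t. Only the
 * fields discussed in this section are modeled; these are not the
 * kernel's types. */
typedef struct model_dblk {
	unsigned char *db_base;   /* first byte of buffer */
	unsigned char *db_lim;    /* last byte + 1 of buffer */
	unsigned char  db_ref;    /* message blocks sharing this block */
	unsigned char  db_type;   /* message type */
} model_dblk_t;

typedef struct model_mblk {
	struct model_mblk *b_cont;    /* next block of the same message */
	unsigned char     *b_rptr;    /* first unread byte */
	unsigned char     *b_wptr;    /* first unwritten byte */
	model_dblk_t      *b_datap;   /* data block */
} model_mblk_t;

/* True when db_base <= b_rptr <= b_wptr <= db_lim. */
static bool
model_ptrs_valid(const model_mblk_t *mp)
{
	const model_dblk_t *dp = mp->b_datap;

	return (dp->db_base <= mp->b_rptr && mp->b_rptr <= mp->b_wptr &&
	    mp->b_wptr <= dp->db_lim);
}

/* Unread bytes in one block: b_wptr - b_rptr. */
static size_t
model_blk_len(const model_mblk_t *mp)
{
	return ((size_t)(mp->b_wptr - mp->b_rptr));
}

/* Writer side: copy n bytes at b_wptr and advance it, refusing to
 * write past db_lim. */
static bool
model_append(model_mblk_t *mp, const unsigned char *src, size_t n)
{
	if ((size_t)(mp->b_datap->db_lim - mp->b_wptr) < n)
		return (false);
	memcpy(mp->b_wptr, src, n);
	mp->b_wptr += n;
	return (true);
}

/* Reader side: consume up to n bytes by advancing b_rptr. */
static size_t
model_consume(model_mblk_t *mp, size_t n)
{
	size_t avail = model_blk_len(mp);

	if (n > avail)
		n = avail;
	mp->b_rptr += n;
	return (n);
}
```

Because only b_rptr and b_wptr move, the fixed limits in the data block never change; the kernel's MBLKL() macro computes the same b_wptr - b_rptr length that model_blk_len returns.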

The field b_band specifies a priority band where the message is placed when it is queued using the STREAMS utility routines. This field has no meaning for high-priority messages and is set to zero for these messages. When a message is allocated using allocb(9F), the b_band field is initially set to zero. Modules and drivers can set this field to a value from 0 to 255 depending on the number of priority bands needed. Lower numbers represent lower priority. The kernel incurs overhead in maintaining bands if nonzero numbers are used.


Note -

Modules and drivers must not modify b_next, b_prev, or b_datap. The first two fields are modified by utility routines such as putq(9F) and getq(9F). Modules and drivers can modify b_cont, b_rptr, b_wptr, b_band (for ordinary message types), and b_flag.



Note -

SunOS has b_band in the msgb structure. Some other STREAMS implementations place b_band in the datab structure. The SunOS implementation is more flexible because each message is independent. For shared data blocks, the b_band can differ in the SunOS implementation, but not in other implementations.


Message Linkage

A complex message can consist of several linked message blocks. If buffer size is limited or if processing expands the message, multiple message blocks are formed in the message, as shown in Figure 7-2. When a message is composed of multiple message blocks, the type associated with the first message block determines the overall message type, regardless of the types of the attached message blocks.
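As a hypothetical userspace sketch, the walk below follows b_cont to total a message's bytes and takes the message type from the first block only. The model_* names and the MODEL_M_* type codes are assumptions for illustration, not kernel definitions. In the kernel, msgdsize(9F) performs a similar walk but counts only bytes in M_DATA blocks, which model_data_bytes imitates.

```c
#include <stddef.h>

/* Hypothetical type codes for this model only; the real values are
 * defined in <sys/stream.h>. */
enum { MODEL_M_DATA = 0, MODEL_M_PROTO = 1 };

typedef struct model_dblk {
	unsigned char db_type;          /* message type of this block */
} model_dblk_t;

typedef struct model_mblk {
	struct model_mblk *b_cont;      /* next block of the same message */
	unsigned char     *b_rptr;      /* first unread byte */
	unsigned char     *b_wptr;      /* first unwritten byte */
	model_dblk_t      *b_datap;     /* data block */
} model_mblk_t;

/* The overall message type is the type of the first block. */
static int
model_msg_type(const model_mblk_t *mp)
{
	return (mp->b_datap->db_type);
}

/* Sum the unread bytes in every block linked through b_cont. */
static size_t
model_msg_bytes(const model_mblk_t *mp)
{
	size_t total = 0;

	for (; mp != NULL; mp = mp->b_cont)
		total += (size_t)(mp->b_wptr - mp->b_rptr);
	return (total);
}

/* Count only bytes in M_DATA blocks, as msgdsize(9F) does. */
static size_t
model_data_bytes(const model_mblk_t *mp)
{
	size_t total = 0;

	for (; mp != NULL; mp = mp->b_cont)
		if (mp->b_datap->db_type == MODEL_M_DATA)
			total += (size_t)(mp->b_wptr - mp->b_rptr);
	return (total);
}
```

For an M_PROTO block of 4 bytes followed by M_DATA blocks of 3 and 5 bytes, the message type is M_PROTO, the total is 12 bytes, and the data count is 8 bytes.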

Figure 7-2 Linked Message Blocks

Queued Messages

A put procedure processes single messages immediately and can pass the message to the next module's put procedure using put(9F) or putnext(9F). Alternatively, the put procedure can link the message onto its message queue with putq(9F), for later processing by the module's service procedure. Note that when a message made of several linked message blocks is queued, only its first message block is linked to the next message in the queue.

Think of linked message blocks as a concatenation of messages. Queued messages are a linked list of individual messages that can also be linked message blocks.

Figure 7-3 Queued Messages

In Figure 7-3 messages are queued: Message 1 being the first message on the queue, followed by Message 2 and Message 3. Notice that Message 1 is a linked message consisting of more than one mblk.


Note -

Modules or drivers must not modify b_next and b_prev. These fields are modified by utility routines such as putq(9F) and getq(9F).


Shared Data

In Figure 7-4, two message blocks are shown pointing to one data block. db_ref indicates that there are two references (mblks) to the data block. db_base and db_lim point to an address range in the data buffer. The b_rptr and b_wptr of both message blocks must fall within the assigned range specified by the data block.

Figure 7-4 Shared Data Block

Data blocks are shared using utility routines (see dupmsg(9F) or dupb(9F)). STREAMS maintains a count of the message blocks sharing a data block in the db_ref field.

These two mblks share the same data block and data buffer. If a module changes the contents of the data or the message type, the change is visible to the owner of the other message block.

When modifying data contained in the dblk or data buffer, if the reference count of the message is greater than one, the module should copy the message using copymsg(9F), free the duplicated message, and then change the data in the copy.
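The copy-before-modify rule can be sketched in userspace. The model below is hypothetical: model_dup, model_copy, and model_free only mimic the reference-count behavior of dupb(9F), copyb(9F), and freeb(9F) on a single block, and they ignore message chains and message types. model_make_writable is an illustration-only helper that applies the rule from the paragraph above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a shared data block. */
typedef struct model_dblk {
	unsigned char *db_base;
	unsigned char *db_lim;
	unsigned char  db_ref;   /* message blocks referencing this dblk */
} model_dblk_t;

typedef struct model_mblk {
	unsigned char *b_rptr;
	unsigned char *b_wptr;
	model_dblk_t  *b_datap;
} model_mblk_t;

/* Like dupb(9F): a new mblk sharing the same dblk; db_ref goes up. */
static model_mblk_t *
model_dup(model_mblk_t *mp)
{
	model_mblk_t *np = malloc(sizeof (*np));

	if (np == NULL)
		return (NULL);
	*np = *mp;
	np->b_datap->db_ref++;
	return (np);
}

/* Like copyb(9F): a private copy with its own dblk (db_ref == 1). */
static model_mblk_t *
model_copy(const model_mblk_t *mp)
{
	size_t len = (size_t)(mp->b_wptr - mp->b_rptr);
	model_mblk_t *np = malloc(sizeof (*np));
	model_dblk_t *dp = malloc(sizeof (*dp));
	unsigned char *buf = malloc(len > 0 ? len : 1);

	if (np == NULL || dp == NULL || buf == NULL) {
		free(np);
		free(dp);
		free(buf);
		return (NULL);
	}
	memcpy(buf, mp->b_rptr, len);
	dp->db_base = buf;
	dp->db_lim = buf + len;
	dp->db_ref = 1;
	np->b_rptr = buf;
	np->b_wptr = buf + len;
	np->b_datap = dp;
	return (np);
}

/* Like freeb(9F): drop one reference; free dblk and buffer at zero. */
static void
model_free(model_mblk_t *mp)
{
	if (--mp->b_datap->db_ref == 0) {
		free(mp->b_datap->db_base);
		free(mp->b_datap);
	}
	free(mp);
}

/* Return a block that is safe to modify: if the dblk is shared, copy
 * it and release the caller's reference to the shared one. */
static model_mblk_t *
model_make_writable(model_mblk_t *mp)
{
	model_mblk_t *np;

	if (mp->b_datap->db_ref == 1)
		return (mp);            /* sole owner: modify in place */
	if ((np = model_copy(mp)) == NULL)
		return (NULL);          /* caller still owns mp */
	model_free(mp);
	return (np);
}
```

After model_make_writable returns a copy, writes through the new block leave the other owner's data untouched, which is exactly the visibility problem the paragraph above warns about.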

STREAMS provides utility routines and macros (identified in Appendix B "STREAMS Utilities"), to assist in managing messages and message queues, and in other areas of module and driver development. Always use utility routines to operate on a message queue or to free or allocate messages. If messages are manipulated in the queue without using the STREAMS utilities, the message ordering can become confused and cause inconsistent results.


Caution -

Not adhering to the DDI/DKI can result in panics and system crashes.


Sending and Receiving Messages

Among the message types, the most commonly used messages are M_DATA, M_PROTO, and M_PCPROTO. These messages can be passed between a process and the topmost module in a Stream, with the same message boundary alignment maintained between user and kernel space. This allows a user process to function, to some degree, as a module above the Stream and maintain a service interface. M_PROTO and M_PCPROTO messages carry service interface information among modules, drivers, and user processes.

Modules and drivers do not interact directly with any interfaces except open(2) and close(2). The Stream head translates and passes all messages between user processes and the uppermost STREAMS module. Message transfers between a process and the Stream head can occur in different forms. For example, M_DATA and M_PROTO messages can be transferred in their direct form by getmsg(2) and putmsg(2). Alternatively, write(2) creates one or more M_DATA messages from the data buffer supplied in the call. M_DATA messages received at the Stream head are consumed by read(2) and copied into the user buffer.

Any module or driver can send any message in either direction on a Stream. However, based on their intended use in STREAMS and their treatment by the Stream head, certain messages can be categorized as upstream, downstream, or bidirectional. For example, M_DATA, M_PROTO, or M_PCPROTO messages can be sent in both directions. Other message types, such as M_IOCACK, are sent upstream to be processed only by the Stream head. Messages intended to travel downstream are silently discarded if they are received by the Stream head. Table 7-1 and Table 7-2 indicate the intended direction of each message type.

STREAMS lets modules create messages and pass them to neighboring modules. However, read(2) and write(2) are not enough to allow a user process to generate and receive all messages. First, read(2) and write(2) are byte-stream oriented with no concept of message boundaries. To support service interfaces, the message boundary of each service primitive must be preserved so that the start and end of each primitive can be located. Furthermore, read(2) and write(2) offer the user only one buffer for transmitting and receiving STREAMS messages. If control information and data are placed in a single buffer, the user has to parse the contents of the buffer to separate the data from the control information.

getmsg(2) and putmsg(2) let a user process and the Stream pass data and control information between one another while maintaining distinct message boundaries.

Message Queues and Message Priority

Message queues grow when the STREAMS scheduler is delayed from calling a service procedure by system activity, or when the procedure is blocked by flow control. When called by the scheduler, a module's service procedure processes queued messages in a FIFO manner (getq(9F)). However, some messages associated with certain conditions, such as M_ERROR, must reach their Stream destination as rapidly as possible. This is accomplished by associating priorities with the messages. These priorities imply a certain ordering of messages in the queue, as shown in Figure 7-5. Each message has a priority band associated with it. Ordinary messages have a priority band of zero. The priority band of high-priority messages is ignored since they are high priority and thus not affected by flow control. putq(9F) places high-priority messages at the head of the message queue, followed by priority band messages (expedited data) and ordinary messages.

Figure 7-5 Message Ordering in a Queue

When a message is queued, it is placed after the messages of the same priority already in the queue (in other words, FIFO within their order of queueing). This affects the flow-control parameters associated with the band of the same priority. Message priorities range from 0 (normal) to 255 (highest). This provides up to 256 bands of message flow within a Stream. An example of how to implement expedited data would be with one extra band of data flow (priority band 1), as shown in Figure 7-6. Queues are explained in detail in the next section.
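A hypothetical userspace model of these ordering rules (high-priority messages first, then bands from 255 down to 0, FIFO within each group) follows. model_putq and model_getq are illustration-only names; the real putq(9F) and getq(9F) also maintain per-band byte counts and flow-control state, which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued message: only what ordering needs. */
typedef struct model_msg {
	struct model_msg *next;
	bool              hipri;  /* high-priority message type */
	unsigned char     band;   /* 0..255; ignored when hipri */
	int               id;     /* label for the example */
} model_msg_t;

/* True if queued message a must stay ahead of a new message m. */
static bool
model_goes_first(const model_msg_t *a, const model_msg_t *m)
{
	if (a->hipri)
		return (true);            /* high priority is always ahead */
	if (m->hipri)
		return (false);           /* m jumps all ordinary messages */
	return (a->band >= m->band);  /* higher band first; FIFO in band */
}

/* Insert m in the order putq(9F) uses (no flow control modeled). */
static void
model_putq(model_msg_t **headp, model_msg_t *m)
{
	while (*headp != NULL && model_goes_first(*headp, m))
		headp = &(*headp)->next;
	m->next = *headp;
	*headp = m;
}

/* Like getq(9F) without flow control: remove the first message. */
static model_msg_t *
model_getq(model_msg_t **headp)
{
	model_msg_t *m = *headp;

	if (m != NULL)
		*headp = m->next;
	return (m);
}
```

Queueing messages 1 (band 0), 2 (band 1), 3 (high priority), 4 (band 0), 5 (band 1), and 6 (high priority) in that order yields the retrieval order 3, 6, 2, 5, 1, 4.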

Figure 7-6 Message Ordering with One Priority Band

High-priority messages are not subject to flow control. When they are queued by putq(9F), the associated queue is always scheduled, even if the queue has been disabled (noenable(9F)). When the service procedure is called by the Stream's scheduler, it uses getq(9F) to retrieve the first message on the queue, which is a high-priority message. Service procedures must be implemented to act on high-priority messages immediately. These mechanisms (priority message queueing, absence of flow control, and immediate processing by a procedure) result in rapid transport of high-priority messages between the originating and destination components in the Stream.


Note -

In general, high-priority messages should be processed immediately by the module's put procedure and not placed on the service queue.



Caution -

A service procedure must never queue a high-priority message on its own queue because an infinite loop results. The enqueuing triggers the queue to be immediately scheduled again.
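A service procedure is commonly written as a loop over getq(9F) that passes each message on with putnext(9F), but puts an ordinary message back with putbq(9F) and stops when canputnext(9F) reports that the next queue is full. The sketch below models that loop in userspace; every model_* name is a hypothetical stand-in for the real routine of similar name, and the downstream queue is reduced to a simple counter. High-priority messages are always passed on, never put back, in line with the caution above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's queue_t or mblk_t. */
typedef struct model_msg {
	struct model_msg *next;
	bool              hipri;   /* high-priority message type */
	int               id;      /* label for the example */
} model_msg_t;

typedef struct model_next {
	int room;          /* ordinary messages accepted before "full" */
	int delivered[16]; /* ids passed downstream, in order */
	int ndelivered;
} model_next_t;

/* Stand-ins for getq(9F), putbq(9F), canputnext(9F), putnext(9F). */
static model_msg_t *
model_getq(model_msg_t **qp)
{
	model_msg_t *m = *qp;

	if (m != NULL)
		*qp = m->next;
	return (m);
}

static void
model_putbq(model_msg_t **qp, model_msg_t *m)
{
	m->next = *qp;      /* back at the head: keeps its place */
	*qp = m;
}

static bool
model_canputnext(const model_next_t *n)
{
	return (n->room > 0);
}

static void
model_putnext(model_next_t *n, model_msg_t *m)
{
	if (!m->hipri)
		n->room--;      /* high-priority messages ignore flow control */
	if (n->ndelivered < 16)
		n->delivered[n->ndelivered++] = m->id;
}

/* Service routine: drain the queue; send high-priority messages at
 * once; on flow control, put the ordinary message back and stop. */
static void
model_srv(model_msg_t **qp, model_next_t *next)
{
	model_msg_t *m;

	while ((m = model_getq(qp)) != NULL) {
		if (m->hipri || model_canputnext(next)) {
			model_putnext(next, m);
		} else {
			model_putbq(qp, m);
			break;      /* back-enabling reschedules us later */
		}
	}
}
```

Stopping after putbq, rather than retrying, is the design point: the service procedure returns and is rescheduled when the next queue drains.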


Queues

The queue is the fundamental component of a Stream. It is the interface between a STREAMS module and the rest of the Stream, and is the repository for deferred message processing. For each instance of an open driver or pushed module or Stream head, a pair of queues is allocated, one for the read side of the Stream and one for the write side.

The RD(9F), WR(9F), and OTHERQ(9F) routines allow one queue of a pair to reference the other. Given a queue, RD(9F) returns a pointer to the read queue of its pair, WR(9F) returns a pointer to the write queue, and OTHERQ(9F) returns a pointer to the opposite queue of the pair. Also see queue(9S).
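One way such routines can work is sketched below, assuming (as in classic SVR4-derived implementations) that the two queues of a pair are allocated next to each other with the read queue first, and that QREADR is set only on the read queue. This layout is an assumption for illustration; modules must always use RD(9F), WR(9F), and OTHERQ(9F) rather than pointer arithmetic. The model_* names and the MODEL_QREADR value are hypothetical.

```c
/* Hypothetical flag value; mirrors the role of QREADR only. */
#define MODEL_QREADR 0x1

typedef struct model_queue {
	unsigned int q_flag;   /* MODEL_QREADR set on the read queue */
} model_queue_t;

/* Assumed layout: pair[0] is the read queue, pair[1] the write queue. */
static model_queue_t *
model_RD(model_queue_t *q)
{
	return ((q->q_flag & MODEL_QREADR) ? q : q - 1);
}

static model_queue_t *
model_WR(model_queue_t *q)
{
	return ((q->q_flag & MODEL_QREADR) ? q + 1 : q);
}

static model_queue_t *
model_OTHERQ(model_queue_t *q)
{
	return ((q->q_flag & MODEL_QREADR) ? q + 1 : q - 1);
}
```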

By convention, queue pairs are depicted graphically as side-by-side blocks, with the write queue on the left and the read queue on the right (see Figure 7-7).

Figure 7-7 Queue Pair Allocation

queue(9S) Structure

As previously discussed, messages are ordered in message queues. Message queues, message priority, service procedures, and basic flow control all combine in STREAMS. A service procedure processes the messages in its queue. If there is no service procedure for a queue, putq(9F) does not schedule the queue to be run. The module developer must ensure that the messages in the queue are processed. Message priority and flow control are associated with message queues.

The queue structure is defined in stream.h as the typedef queue_t, and has the following public elements:


struct  qinit   *q_qinfo;       /* procs and limits for queue */
struct  msgb    *q_first;       /* first data block */
struct  msgb    *q_last;        /* last data block */
struct  queue   *q_next;        /* Q of next stream */
struct  queue   *q_link;        /* to next Q for scheduling */
void            *q_ptr;         /* to private data structure */
size_t          q_count;        /* number of bytes on Q */
uint            q_flag;         /* queue state */
ssize_t         q_minpsz;       /* min packet size accepted by this module */
ssize_t         q_maxpsz;       /* max packet size accepted by this module */
size_t          q_hiwat;        /* queue high-water mark */
size_t          q_lowat;        /* queue low-water mark */

q_first points to the first message on the queue, and q_last points to the last message on the queue. q_count is used in flow control and contains the total number of bytes contained in normal and high-priority messages in band 0 of this queue. Each band is flow controlled individually and has its own count. See "qband(9S) Structure" for more details. qsize(9F) can be used to determine the total number of messages on the queue. q_flag indicates the state of the queue. See Table 7-3 for the definition of these flags.

q_minpsz contains the minimum packet size accepted by the queue, and q_maxpsz contains the maximum packet size accepted by the queue. These are suggested limits, and some implementations of STREAMS may not enforce them. The SunOS Stream head enforces these values, but enforcement is voluntary at the module level. Design modules to handle messages of any size.

q_hiwat is the maximum number of bytes that can be put on a queue before flow control takes effect. q_lowat is the lower limit at which STREAMS flow control is released.
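The two marks act as hysteresis: once a queue fills, writers stay blocked until it drains below the low-water mark, not merely below the high-water mark. The userspace sketch below assumes the queue is marked full when q_count reaches q_hiwat and released when q_count falls below q_lowat; the exact comparisons, per-band counts, and back-enabling are implementation details modeled only loosely here. All model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one queue's flow-control state. */
typedef struct model_fc {
	size_t q_count;     /* bytes currently queued */
	size_t q_hiwat;     /* high-water mark */
	size_t q_lowat;     /* low-water mark */
	bool   full;        /* plays the role of QFULL */
	int    backenables; /* times a blocked writer would be rescheduled */
} model_fc_t;

/* A writer asks, like canput(9F), whether it may send more. */
static bool
model_canput(const model_fc_t *q)
{
	return (!q->full);
}

/* Queue n bytes; mark full once the high-water mark is reached. */
static void
model_enqueue(model_fc_t *q, size_t n)
{
	q->q_count += n;
	if (q->q_count >= q->q_hiwat)
		q->full = true;
}

/* Dequeue n bytes; release flow control only below the low-water mark. */
static void
model_dequeue(model_fc_t *q, size_t n)
{
	q->q_count = (n > q->q_count) ? 0 : q->q_count - n;
	if (q->full && q->q_count < q->q_lowat) {
		q->full = false;
		q->backenables++;   /* real STREAMS back-enables the writer */
	}
}
```

With q_hiwat at 100 and q_lowat at 20, a queue holding 110 bytes stays full after draining to 60 bytes and is released only after it drops to 15 bytes.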

q_ptr is the element of the queue structure where modules can put values or pointers to data structures that are private to the module. This data can include any information required by the module for processing messages passed through the module, such as state information, module IDs, routing tables, and so on. Effectively, this element can be used any way the module or driver writer chooses. q_ptr can be accessed or changed directly by the driver, and is typically initialized in the open(9E) routine.

When a queue pair is allocated, streamtab initializes q_qinfo, and module_info initializes q_minpsz, q_maxpsz, q_hiwat, and q_lowat. Copying values from the module_info structure allows them to be changed in the queue without modifying the streamtab and module_info values.

Queue Flags

Be aware of the following queue flags. See queue(9S).

Table 7-3 Queue Flags

QENAB     The queue is enabled to run the service procedure.
QFULL     The queue is full.
QREADR    Set for all read queues.
QNOENB    Do not enable the queue when data is placed on it.

Using Queue Information

The q_first, q_last, q_count, and q_flag components must not be modified by the module, and should be accessed using strqget(9F). The values of q_minpsz, q_maxpsz, q_hiwat, and q_lowat are accessed through strqget(9F), and are modified by strqset(9F). q_ptr can be accessed and modified by the module and contains data private to the module.

All other accesses to fields in the queue(9S) structure should be made through STREAMS utility routines (see Appendix B, "STREAMS Utilities"). Modules and drivers should not change any fields not explicitly listed previously.

strqget(9F) lets modules and drivers get information about a queue or particular band of the queue. This insulates the STREAMS data structures from the modules and drivers. The syntax of the routine is:


int
strqget(queue_t *q, qfields_t what, unsigned char pri, void *valp)

q specifies from which queue the information is to be retrieved; what defines the queue_t field value to obtain (see the following code example). pri identifies a specific priority band. The value of the field is returned in valp. The fields that can be obtained are defined in <sys/stream.h> and shown here as:


QHIWAT              /* high-water mark */
QLOWAT              /* low-water mark */
QMAXPSZ             /* largest packet accepted */
QMINPSZ             /* smallest packet accepted */
QCOUNT              /* approx. size (in bytes) of data */
QFIRST              /* first message */
QLAST               /* last message */
QFLAG               /* status */

strqset(9F) lets modules and drivers change information about a queue or a band of the queue. This also insulates the STREAMS data structures from the modules and drivers. Its format is:


int 
strqset(queue_t *q, qfields_t what, unsigned char pri, intptr_t val)

The q, what, and pri fields are the same as in strqget(9F), but the information to be updated is provided in val instead of through a pointer. If the field is read-only, EPERM is returned and the field is left unchanged. The following fields are read-only: QCOUNT, QFIRST, QLAST, and QFLAG.
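The read-only rule for strqset(9F) can be modeled as follows. This is a hypothetical userspace sketch: the MQ_* names stand in for the QHIWAT-style field names, every field is reduced to an intptr_t, and the priority band argument of the real routines is omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of qfields_t; the MQ_* names are illustration-only. */
typedef enum {
	MQ_HIWAT, MQ_LOWAT, MQ_MAXPSZ, MQ_MINPSZ,
	MQ_COUNT, MQ_FIRST, MQ_LAST, MQ_FLAG,
	MQ_NFIELDS
} model_qfield_t;

/* One integer slot per field; the real fields have different types. */
typedef struct model_q {
	intptr_t val[MQ_NFIELDS];
} model_q_t;

static int
model_strqget(const model_q_t *q, model_qfield_t what, intptr_t *valp)
{
	if ((unsigned int)what >= (unsigned int)MQ_NFIELDS)
		return (EINVAL);
	*valp = q->val[what];
	return (0);
}

static int
model_strqset(model_q_t *q, model_qfield_t what, intptr_t val)
{
	switch (what) {
	case MQ_HIWAT:
	case MQ_LOWAT:
	case MQ_MAXPSZ:
	case MQ_MINPSZ:
		q->val[what] = val;     /* settable limits */
		return (0);
	case MQ_COUNT:
	case MQ_FIRST:
	case MQ_LAST:
	case MQ_FLAG:
		return (EPERM);         /* read-only: left unchanged */
	default:
		return (EINVAL);
	}
}
```

Setting the high-water mark succeeds and is visible through model_strqget, while an attempt to set the byte count returns EPERM and leaves the count as it was.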

Entry Points

The q_qinfo component points to a qinit structure. This structure defines the module's entry point procedures for each queue, which include the following:


int        (*qi_putp)();       /* put procedure */
int        (*qi_srvp)();       /* service procedure */
int        (*qi_qopen)();      /* open procedure */
int        (*qi_qclose)();     /* close procedure */
struct module_info *qi_minfo;  /* module information structure */

There is generally a unique qinit structure associated with each of the read queue and the write queue. qi_putp identifies the put procedure for the module. qi_srvp identifies the optional service procedure for the module.

The open and close entry points are required for the read-side queue. The put procedure is generally required on both queues and the service procedure is optional.

If the put procedure is not defined and a subsequent put is done to the module, a panic occurs. As a precaution, putnext should be declared as the module's put procedure.

If a module only requires a service procedure, putq(9F) can be used as the module's put procedure. If the service procedure is not defined, the module's put procedure must not queue data (putq(9F)).

The qi_qopen member of the read-side qinit structure identifies the open(9E) entry point of the module. The qi_qclose member of the read-side qinit structure identifies the close(9E) entry point of the module.

The qi_minfo member points to the module_info(9S) structure.


struct module_info {
		ushort   mi_idnum;          /* module id number */
		char     *mi_idname;        /* module name */
		ssize_t  mi_minpsz;         /* min packet size accepted */
		ssize_t  mi_maxpsz;         /* max packet size accepted */
		size_t   mi_hiwat;          /* hi-water mark */
		size_t   mi_lowat;          /* lo-water mark */
};

mi_idnum is the module's unique identifier defined by the developer and used in strlog(9F). mi_idname is an ASCII string containing the name of the module. mi_minpsz is the initial minimum packet size of the queue. mi_maxpsz is the initial maximum packet size of the queue. mi_hiwat is the initial high-water mark of the queue. mi_lowat is the initial low-water mark of the queue.
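The copy from module_info(9S) into a new queue, described earlier for q_minpsz, q_maxpsz, q_hiwat, and q_lowat, can be sketched in userspace. The structure below mirrors the listing above; the module "xx", its ID, and its limits are made-up values, and model_init_queue is a hypothetical stand-in for what the framework does when it allocates a queue pair. The sketch shows that tuning one queue afterward leaves the shared module_info values and the other queue alone.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Userspace mirror of module_info(9S) as listed above. */
struct model_module_info {
	unsigned short mi_idnum;    /* module id number */
	const char    *mi_idname;   /* module name */
	ssize_t        mi_minpsz;   /* min packet size accepted */
	ssize_t        mi_maxpsz;   /* max packet size accepted */
	size_t         mi_hiwat;    /* hi-water mark */
	size_t         mi_lowat;    /* lo-water mark */
};

/* Only the queue fields that are seeded from module_info. */
typedef struct model_queue {
	const struct model_module_info *q_minfo; /* like q_qinfo->qi_minfo */
	ssize_t q_minpsz, q_maxpsz;
	size_t  q_hiwat, q_lowat;
} model_queue_t;

/* Hypothetical module "xx": ID, name, and limits are made-up values. */
static const struct model_module_info xx_minfo = {
	0x4242, "xx", 0, 512, 8192, 1024
};

/* Seed a new queue with copies of the module_info limits, so later
 * per-queue changes do not touch the shared module_info. */
static void
model_init_queue(model_queue_t *q, const struct model_module_info *mi)
{
	q->q_minfo = mi;
	q->q_minpsz = mi->mi_minpsz;
	q->q_maxpsz = mi->mi_maxpsz;
	q->q_hiwat = mi->mi_hiwat;
	q->q_lowat = mi->mi_lowat;
}
```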

open Routine

The open routine of a device is called once for the initial open of the device, then is called again on subsequent reopens of the Stream. Module open routines are called once for the initial push onto the Stream and again on subsequent reopens of the Stream. See open(9E).

Figure 7-8 Order of a Module's open Procedure

The Stream is analogous to a stack. Initially the driver is opened and, as modules are pushed onto the Stream, their open routines are invoked. Once the Stream is built, this order reverses if a reopen of the Stream occurs. For example, while building the Stream shown in Figure 7-8, device A's open routine is called, followed by B's and C's when they are pushed onto the Stream. If the Stream is reopened, Module C's open routine is called first, followed by B's, and finally by A's.
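The stacking order can be modeled in a few lines. This hypothetical userspace sketch records the order in which open routines run for the Stream in Figure 7-8: A, B, and C on the initial open and pushes, then C, B, and A on a reopen. It demonstrates the order only; modules should not depend on it.

```c
#include <string.h>

#define MODEL_MAX 8

/* Hypothetical model of a Stream as a stack: index 0 is the driver. */
typedef struct model_stream {
	const char *names[MODEL_MAX];
	int         depth;
	char        log[128];   /* order in which open routines ran */
} model_stream_t;

static void
model_log_open(model_stream_t *s, const char *name)
{
	strncat(s->log, name, sizeof (s->log) - strlen(s->log) - 1);
}

/* Initial open of the driver, or push of a module: one open call. */
static int
model_push(model_stream_t *s, const char *name)
{
	if (s->depth == MODEL_MAX)
		return (-1);
	s->names[s->depth++] = name;
	model_log_open(s, name);
	return (0);
}

/* Reopen: open routines run from the top of the stack down. */
static void
model_reopen(model_stream_t *s)
{
	for (int i = s->depth - 1; i >= 0; i--)
		model_log_open(s, s->names[i]);
}
```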

Usually the module or driver does not check this, but the issue is raised so that the programmer does not introduce dependencies on the order of open routines. Note that although an open can happen more than once, close is only called once. See the next section on the close routine for more details. If a file is duplicated (dup(2)), the Stream is not reopened.

The syntax of the open entry point is:


int prefix open(queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *cred_p)

q -- Pointer to the read queue of this module.

devp -- Pointer to a device number that is always associated with the device at the end of the Stream. Modules cannot modify this value, but drivers can, as described in Chapter 9, STREAMS Drivers.

oflag -- For devices, oflag can contain the following bit mask values: FEXCL, FNDELAY, FREAD, and FWRITE. See Chapter 9, STREAMS Drivers for more information on drivers.

sflag -- When the open is associated with a driver, sflag is set to 0 or CLONEOPEN; see Chapter 9, STREAMS Drivers, "Cloning" for more details. If the open is associated with a module, sflag contains the value MODOPEN.

cred_p -- Pointer to the user credentials structure.

The open routines of devices are serialized (if more than one process attempts to open the device, only one proceeds and the others wait until the first finishes). Interrupts are not blocked during an open, so the driver's interrupt and open routines must allow for this. See Chapter 9, STREAMS Drivers for more information.

The open routines for both drivers and modules have user context. For example, they can do blocking operations, but the blocking routine should return in the event of a signal. In other words, qwait_sig(9F) is allowed, but qwait(9F) is not.

If the module or driver is to allocate a controlling terminal, it should send the Stream head an M_SETOPTS message with SO_ISTTY set.

The open routine usually initializes the q_ptr member of the queue. q_ptr is generally initialized to some private data structure that contains various state information private to the module or driver. The module's close routine is responsible for freeing resources allocated by the module including q_ptr. Example 7-1 shows a simple open routine.


Example 7-1 A Simple open Routine

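/* example of a module open */
int
xx_open(queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *crp)
{
	struct xxstr *xx_ptr;

	/* A reopen of the Stream: private data already exists. */
	if (q->q_ptr != NULL)
		return (0);

	/* Allocate zeroed private state; KM_SLEEP may block but never fails. */
	xx_ptr = kmem_zalloc(sizeof (struct xxstr), KM_SLEEP);
	xx_ptr->xx_foo = 1;

	/* Share the private data between the read and write queues. */
	q->q_ptr = WR(q)->q_ptr = xx_ptr;

	/* Link the module into the Stream so its put and service routines run. */
	qprocson(q);
	return (0);
}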

In a multithreaded environment, data can flow up the Stream during the open. A module receiving this data before its open routine finishes initialization can panic. To eliminate this problem, modules and drivers are not linked into the Stream until qprocson(9F) is called. In other words, messages flow around the module until qprocson(9F) is called. Figure 7-9 illustrates this process.

Figure 7-9 Messages Flowing Around the Module Before qprocson

The module or driver instance is guaranteed to be single-threaded before qprocson(9F) is called, except for interrupts or callbacks that must be handled separately. qprocson(9F) must be called before calling qbufcall(9F), qtimeout(9F), qwait(9F), or qwait_sig(9F).

close Routine

The close routine of devices is called only during the last close of the device. Module close routines are called during the last close or if the module is popped.

The syntax of the close entry point is:


int prefix close(queue_t *q, int flag, cred_t *cred_p)

q is a pointer to the read queue of the module. flag is analogous to the oflag parameter of the open entry point. If FNONBLOCK or FNDELAY is set, the module should attempt to avoid blocking during the close. cred_p is a pointer to the user credential structure.

Like open, the close entry point has user context and can block. Likewise, the blocking routines should return in the event of a signal. Device drivers must take into consideration that interrupts are not blocked during close. The close routine must cancel all pending timeout(9F) and qbufcall(9F) callbacks, and process any remaining data on its service queue.

The open and close procedures are only used on the read side of the queue and can be set to NULL in the write-side qinit structure initialization. Example 7-2 shows an example of a module close routine.


Example 7-2 Example of a Module Close

/* example of a module close */
static int
xx_close(queue_t *rq, int flag, cred_t *credp)
{
	struct xxstr	*xxp;

	/*
	 * Disable xxput() and xxsrv() procedures on this queue.
	 */
	qprocsoff(rq);
	xxp = (struct xxstr *)rq->q_ptr;

	/*
	 * Cancel any pending timeout.
	 * This example assumes that the timeout was issued
	 * against the write queue.
	 */
	if (xxp->xx_timeoutid != 0) {
		(void) quntimeout(WR(rq), xxp->xx_timeoutid);
		xxp->xx_timeoutid = 0;
	}

	/*
	 * Cancel any pending bufcalls.
	 * This example assumes that the bufcall was issued
	 * against the write queue.
	 */
	if (xxp->xx_bufcallid != 0) {
		qunbufcall(WR(rq), xxp->xx_bufcallid);
		xxp->xx_bufcallid = 0;
	}
	rq->q_ptr = WR(rq)->q_ptr = NULL;

	/*
	 * Free resources allocated during open.
	 */
	kmem_free(xxp, sizeof (struct xxstr));
	return (0);
}

The put Procedure

The put procedure passes messages from the queue of a module to the queue of the next module. The queue's put procedure is invoked by the preceding module to process a message immediately (see put(9F) and putnext(9F)). Almost all modules and drivers must have a put routine. The exception is that the read-side driver does not need a put routine because there can be no downstream component to call the put.

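A driver's put procedure must do one of the following: process and free the message, process the message and route it back upstream, or queue the message for processing by the driver's service procedure.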

All M_IOCTL messages must be acknowledged with M_IOCACK or rejected with M_IOCNAK; a driver must not simply free an M_IOCTL message. Drivers must free any unrecognized message.

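A module's put procedure must do one of the following, as shown in Example 7-3: process the message and pass it to the next module or driver, process and free the message, or queue the message for processing by the module's service procedure.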

Unrecognized messages are passed to the next module or driver. The Stream operates more efficiently when messages are processed in the put procedure. Processing a message with the service procedure imposes some latency on the message.


Example 7-3 Flow of put Procedure


If the next module is flow controlled (see canput(9F) and canputnext(9F)), the put procedure can queue the message for processing by its service procedure (see putq(9F)). The put routine is always called before the component's corresponding srv(9E) service routine, so always use put for immediate message processing.

The preferred naming convention for a put procedure reflects the direction of the message flow. The read put procedure is suffixed by r (rput), and the write procedure by w (wput). For example, the read-side put procedure for module xx is declared as int xxrput (queue_t *q, mblk_t *mp). The write-side put procedure is declared as: int xxwput(queue_t *q, mblk_t *mp), where q points to the corresponding read or write queue and mp points to the message to be processed.

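Although high-priority messages can be placed on the service queue, it is better to process them immediately in the put procedure. Place ordinary or priority-band messages on the service queue (putq(9F)) if the next queue is flow controlled (canputnext(9F) fails) or if messages are already waiting on the queue (q_first is not NULL).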

If other messages are already on the queue and the put procedure passes new ordinary messages on directly instead of queuing them behind the waiting ones, messages are reordered.


/* example of a module put procedure */
#include <sys/stream.h>

int
xxrput(queue_t *q, mblk_t *mp)
{
	/*
	 * If the message is a high-priority message or
	 * the next module is not flow controlled and we have not
	 * already deferred processing, then:
	 */
	if (mp->b_datap->db_type >= QPCTL ||
	    (canputnext(q) && q->q_first == NULL)) {
		/*
		 * Process message
		 */
		/* ... */
		putnext(q, mp);
	} else {
		/*
		 * put message on service queue
		 */
		(void) putq(q, mp);
	}
	return (0);
}

A module need not process the message immediately, and can queue it for later processing by the service procedure (see putq(9F)).

The SunOS STREAMS framework is multithreaded. Unsafe (nonmultithreaded) modules are not supported. To make multithreading of modules easier, the SunOS STREAMS framework uses perimeters (see "MT STREAMS Perimeters" in Chapter 12, MultiThreaded STREAMS for information). Perimeters are a facility for specifying that the framework provide exclusive access for the entire module, queue pair, or an individual queue. Perimeters make it easier to deal with multithreaded issues, such as message ordering and recursive locking.


Caution -

Mutex locks must not be held across a call to put(9F), putnext(9F), or qreply(9F).


Because of the asynchronous nature of STREAMS, don't assume that a module's put procedure has been called just because put(9F), putnext(9F), or qreply(9F) has returned.

service Procedure

A queue's service procedure is invoked to process messages on the queue. It removes successive messages from the queue, processes them, and calls the put procedure of the next module in the Stream to give the processed message to the next queue.

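The service procedure is optional. A module or driver can use a service procedure to take part in flow control, to defer processing when a resource such as a message buffer cannot be allocated, to move processing out of interrupt context, or to combine several messages into a larger one.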

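The service procedure is invoked by the STREAMS scheduler. A STREAMS service procedure is scheduled to run if putq(9F) places an ordinary message on an empty queue (one whose last getq(9F) call returned NULL) that has not been disabled with noenable(9F), if a high-priority message is placed on the queue, if the queue is back-enabled after flow control is relieved, or if qenable(9F) is called for the queue.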

A service procedure usually processes all messages on its queue (getq(9F)) or takes appropriate action to ensure it is reenabled (qenable(9F)) at a later time. Figure 7-10 shows the flow of a service procedure.


Caution -

High-priority messages must never be placed back on a service queue (putbq(9F)); this can cause an infinite loop.


Figure 7-10 Flow of service Procedure


Example 7-4 shows a module service procedure.


Example 7-4 A Module service Procedure

/* example of a module service procedure */
int
xxrsrv(queue_t *q)
{
	mblk_t *mp;

	/*
	 * While there are still messages on our service queue
	 */
	while ((mp = getq(q)) != NULL) {
		/*
		 * We check for high priority messages, but
		 * none is ever seen since the put procedure
		 * never queues them.
		 * If the next module is not flow controlled, then
		 */
		if (mp->b_datap->db_type >= QPCTL || canputnext(q)) {
			/*
			 * process message
			 */
			/* ... */
			putnext(q, mp);
		} else {
			/*
			 * put message back on service queue
			 */
			(void) putbq(q, mp);
			break;
		}
	}
	return (0);
}

qband(9S) Structure

The queue flow information for each band, other than band 0, is contained in a qband structure. This structure is not visible to other modules. For accessible information see strqget(9F) and strqset(9F). qband(9S) is defined as follows:


typedef struct qband {
	struct qband	*qb_next;	/* next band's info */
	size_t		qb_count;	/* number of bytes in band */
	struct msgb	*qb_first;	/* beginning of band's data */
	struct msgb	*qb_last;	/* end of band's data */
	size_t		qb_hiwat;	/* high-water mark for band */
	size_t		qb_lowat;	/* low-water mark for band */
	uint		qb_flag;	/* see below */
} qband_t;

The structure contains pointers to the linked list of messages in the queue. These pointers, qb_first and qb_last, denote the beginning and end of messages for the particular band. The qb_count field is analogous to the queue's q_count field. However, qb_count only applies to the messages in the queue in the band of data flow represented by the corresponding qband structure. In contrast, q_count only contains information regarding normal and high-priority messages.

Each band has a separate high and low watermark, qb_hiwat and qb_lowat. These are initially set to the queue's q_hiwat and q_lowat respectively. Modules and drivers can change these values through the strqset(9F) function. One flag defined for qb_flag is QB_FULL, which denotes that the particular band is full.

The qband(9S) structures are not preallocated per queue. Rather, they are allocated when a message with a priority greater than zero is placed in the queue using putq(9F), putbq(9F), or insq(9F). Since band allocation can fail, these routines return 0 on failure and 1 on success. Once a qband(9S) structure is allocated, it remains associated with the queue until the queue is freed. strqset(9F) and strqget(9F) cause qband(9S) allocation. Sending a message to a band causes all bands up to and including that one to be created.
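The on-demand allocation rule above can be illustrated with a small user-space model. It assumes a singly linked list of band records hung off the queue, which mirrors the qb_next field. model_queue, model_qband, and model_band_alloc are hypothetical names, not DDI/DKI interfaces. The model shows two things: asking for band 3 creates bands 1, 2, and 3, and each new band starts with the queue's own watermarks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for qband_t and queue_t (illustration only). */
struct model_qband {
	struct model_qband *qb_next;	/* next band's info */
	size_t qb_hiwat;		/* copied from q_hiwat when created */
	size_t qb_lowat;		/* copied from q_lowat when created */
};

struct model_queue {
	struct model_qband *q_bandp;	/* band 1 is the head of the list */
	unsigned char q_nband;		/* number of bands allocated */
	size_t q_hiwat;
	size_t q_lowat;
};

/*
 * Make sure bands 1..band exist. Returns 1 on success and 0 if an
 * allocation fails, following the 0/1 convention of putq(9F).
 */
int
model_band_alloc(struct model_queue *q, unsigned char band)
{
	struct model_qband **link = &q->q_bandp;

	/* Walk to the end of the bands that already exist. */
	while (*link != NULL)
		link = &(*link)->qb_next;

	/* Create every missing band up to and including 'band'. */
	while (q->q_nband < band) {
		struct model_qband *qbp = calloc(1, sizeof (*qbp));

		if (qbp == NULL)
			return (0);
		qbp->qb_hiwat = q->q_hiwat;
		qbp->qb_lowat = q->q_lowat;
		*link = qbp;
		link = &qbp->qb_next;
		q->q_nband++;
	}
	return (1);
}
```

Requesting a band that already exists does nothing, which matches the statement that a qband structure stays with the queue once it is allocated.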

Using qband(9S) Information

The STREAMS utility routines should be used when manipulating the fields in the queue and qband structures. strqget(9F) and strqset(9F) are used to access band information.

Drivers and modules can change the qb_hiwat and qb_lowat fields of the qband structure. Drivers and modules can only read the qb_count, qb_first, qb_last, and qb_flag fields of the qband structure. Only the fields listed previously can be referenced. There are fields in the structure that are reserved and are not documented.

Figure 7-11 shows a queue with two extra bands of flow.

Figure 7-11 Data Structure Linkage


Several routines are provided to aid you in controlling each priority band of data flow: flushband(9F), bcanputnext(9F), strqget(9F), and strqset(9F).

flushband(9F) is discussed in "Flush Handling". bcanputnext(9F) is discussed in "Flow Control in Service Procedures", and the other two routines are described in the following section. Appendix B, STREAMS Utilities also has a description of these routines.

Message Processing

Typically, put procedures are required in pushable modules, but service procedures are optional. If the put routine queues messages, there must exist a corresponding service routine that handles the queued messages. If the put routine does not queue messages, the service routine need not exist.

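Example 7-3 shows the typical processing flow for a put procedure. If the message is high priority, or if the next queue is not flow controlled and no messages are already waiting on the queue, the message is processed and passed on with putnext(9F). Otherwise it is queued with putq(9F) for the service procedure.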

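Figure 7-10 shows the typical processing flow for a service procedure. Messages are removed one at a time with getq(9F). Each high-priority message, and each ordinary message that canputnext(9F) allows through, is processed and passed on with putnext(9F). The first message blocked by flow control is returned to the queue with putbq(9F), and the procedure returns.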


Caution -

A service or put procedure must never block since it has no user context. It must always return to its caller.


If no processing is required in the put procedure, the procedure does not have to be explicitly declared. Rather, putq(9F) can be placed in the qinit(9S) structure declaration for the appropriate queue side to queue the message for the service procedure. For example:


static struct qinit winit = { putq, modwsrv, ...... };

More typically, put procedures process high-priority messages to avoid queueing them.

Device drivers associated with hardware are examples of STREAMS devices that might not have a read-side put procedure. Since there are no queues below the hardware level, no other module calls the driver's read-side put procedure. Data comes into the Stream from an interrupt routine and is either processed or queued for the service procedure.

A STREAMS filter is an example of a module without a service procedure: messages passed to it are either passed along or filtered out. Flow control is described in "Flow Control in Service Procedures".

The key attribute of a service procedure in the STREAMS architecture is delayed processing. When a service procedure is used in a module, the module developer is implying that there are other, more time-sensitive activities to be performed elsewhere in this Stream, in other Streams, or in the system in general.


Note -

The presence of a service procedure is mandatory if the flow control mechanism is to be utilized by the queue. If you don't implement flow control, it is possible to overflow queues and hang the system.


Flow Control in Service Procedures

The STREAMS flow control mechanism is voluntary and operates between the two nearest queues in a Stream containing service procedures (see Figure 7-12). Messages are held on a queue only if a service procedure is present in the associated queue.

Messages accumulate on a queue when the queue's service procedure processing does not keep pace with the message arrival rate, or when the procedure is blocked from placing its messages on the following Stream component by the flow control mechanism. Pushable modules can contain independent upstream and downstream limits. The Stream head contains a preset upstream limit (which can be modified by a special message sent from downstream) and a driver can contain a downstream limit. See M_SETOPTS for more information.

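Flow control operates as follows. Each queue has a high-water mark (q_hiwat) and a low-water mark (q_lowat). When the number of bytes on a queue reaches its high-water mark, the queue is marked full. canputnext(9F) calls made by the preceding queue then fail, and that queue holds its messages on its own service queue. When the service procedure of the full queue drains it below the low-water mark, STREAMS back-enables the nearest preceding queue that has a service procedure, and that queue resumes passing messages.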
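The watermark mechanism can be sketched as a small user-space model. fc_queue, fc_canput, fc_enqueue, fc_dequeue, and the FC_ flags are hypothetical names. They only loosely correspond to q_count, QFULL, QWANTW, canputnext(9F), putq(9F), and getq(9F). The model uses simple hysteresis: the queue becomes full at the high-water mark, and the full condition is cleared (and a refused writer back-enabled) only below the low-water mark. A real implementation may clear the full flag at a different point, so treat the exact thresholds as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define FC_FULL   0x1	/* queue at or above its high-water mark */
#define FC_WANTW  0x2	/* an upstream writer was refused */

struct fc_queue {
	size_t count;		/* bytes currently queued, like q_count */
	size_t hiwat;		/* like q_hiwat */
	size_t lowat;		/* like q_lowat */
	int flags;
	int backenabled;	/* times the upstream queue was rescheduled */
};

/* Upstream test, in the spirit of canputnext(9F). */
int
fc_canput(struct fc_queue *q)
{
	if (q->flags & FC_FULL) {
		q->flags |= FC_WANTW;	/* remember that a writer is waiting */
		return (0);
	}
	return (1);
}

/* Account for 'size' newly queued bytes, as putq(9F) would. */
void
fc_enqueue(struct fc_queue *q, size_t size)
{
	q->count += size;
	if (q->count >= q->hiwat)
		q->flags |= FC_FULL;
}

/* Account for 'size' bytes removed by the service procedure. */
void
fc_dequeue(struct fc_queue *q, size_t size)
{
	q->count -= size;
	if (q->count == 0 || q->count < q->lowat) {
		q->flags &= ~FC_FULL;
		if (q->flags & FC_WANTW) {
			q->flags &= ~FC_WANTW;
			q->backenabled++;	/* back-enable the writer */
		}
	}
}
```

The gap between the two marks keeps the upstream queue from being stopped and restarted on every message.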

Figure 7-12


Modules and drivers need to observe the message priority. High-priority messages, determined by the type of the first block in the message,


(mp)->b_datap->db_type >= QPCTL

are not subject to flow control. They should be processed immediately and forwarded, as appropriate.

For ordinary messages, flow control must be tested before any processing is performed. canputnext(9F) determines if the forward path from the queue is blocked by flow control.

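In a service procedure, ordinary messages are handled as follows. Each message is retrieved with getq(9F), and flow control is tested with canputnext(9F). If the next queue can accept the message, the message is processed and passed on with putnext(9F). If not, it is put back on the queue with putbq(9F), and the procedure returns with the remaining messages still queued.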


Caution -

High-priority messages must be processed and not placed back on the queue.


The canonical representation of this processing within a service procedure is:

while ((mp = getq(q)) != NULL) {
	if (queclass(mp) == QPCTL || canputnext(q)) {
		/* process message */
		putnext(q, mp);
	} else {
		(void) putbq(q, mp);
		return (0);
	}
}

Expedited data has its own flow control with the same processing method as that of ordinary messages. bcanputnext(9F) provides modules and drivers with a test of flow control in a priority band. It returns 1 if a message of the given priority can be placed in the queue. It returns 0 if the priority band is flow controlled. If the band does not exist in the queue in question, the routine returns 1.

If the band is flow controlled, the higher bands are not affected. However, lower bands are also stopped from sending messages. Without this, lower priority messages can be passed along ahead of the flow-controlled higher priority messages.

The call bcanputnext(q, 0); is equivalent to the call canputnext(q);.
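The band test described above can be modeled in user space as follows. bf_queue, bf_band, bf_bcanput, and the BF_ flags are hypothetical stand-ins for the queue, its qband list, bcanputnext(9F), QFULL, and QB_FULL. The model covers three cases: band 0 falls back to the ordinary full test, a band that has not been allocated always reports 1, and an allocated band reports its own full flag. It checks only the requested band, so it does not by itself show how lower bands are held back.

```c
#include <assert.h>
#include <stddef.h>

#define BF_QFULL    0x1	/* normal (band 0) flow is full, like QFULL */
#define BF_QB_FULL  0x1	/* a priority band is full, like QB_FULL */

struct bf_band {
	struct bf_band *next;	/* next higher band */
	int flags;
};

struct bf_queue {
	int flags;		/* band 0 state */
	struct bf_band *bands;	/* band 1 is first */
	unsigned char nband;	/* number of allocated bands */
};

/* Model of the bcanputnext(9F) test as applied to the next queue. */
int
bf_bcanput(const struct bf_queue *q, unsigned char pri)
{
	const struct bf_band *b;

	if (pri == 0)
		return (!(q->flags & BF_QFULL));	/* same as canput */
	if (pri > q->nband)
		return (1);			/* band does not exist */
	for (b = q->bands; --pri > 0; b = b->next)
		;				/* walk to band 'pri' */
	return (!(b->flags & BF_QB_FULL));
}
```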


Note -

A service procedure must process all messages in its queue unless flow control prevents this.


A service procedure must continue processing messages from its queue until getq(9F) returns NULL. When an ordinary message is queued by putq(9F), the service procedure is scheduled only if the queue was previously empty and a previous getq(9F) call returned NULL (that is, the QWANTR flag is set). If there are messages in the queue, putq(9F) presumes the service procedure is blocked by flow control, and STREAMS automatically reschedules the procedure when the block is removed. If the service procedure cannot complete processing for a reason other than flow control (for example, no buffers), it must arrange to be rescheduled later (for example, by bufcall(9F)) or it must discard all messages in the queue. Otherwise, STREAMS never schedules the service procedure again unless the queue's put procedure queues a high-priority message with putq(9F).


Note -

High-priority messages are discarded only if there is already a high-priority message on the Stream head read queue. That is, there can be only one high-priority message (M_PCPROTO) present on the Stream head read queue at any time.


putbq(9F) replaces a message at the beginning of the appropriate section of the message queue according to its priority. This might not be the same position at which the message was retrieved by the preceding getq(9F). A subsequent getq(9F) might return a different message.
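The ordering rules for putq(9F) and putbq(9F) can be modeled with a small array-backed queue. A queue holds high-priority messages first, then priority bands in descending order, then normal (band 0) messages. putq places a message at the end of its section. putbq places it at the beginning of its section. ord_queue, ord_msg, ord_putq, and ord_putbq are hypothetical names for this sketch. As described for putq(9F) below, a high-priority message has its band treated as 0.

```c
#include <assert.h>
#include <string.h>

#define ORD_MAX 16

/* Hypothetical message: 'hipri' marks a high-priority message. */
struct ord_msg {
	char id;
	int hipri;
	unsigned char band;
};

struct ord_queue {
	struct ord_msg msgs[ORD_MAX];	/* msgs[0] is the head */
	int n;
};

/* Larger rank means closer to the head of the queue. */
static int
ord_rank(const struct ord_msg *m)
{
	return (m->hipri ? 256 : m->band);
}

static void
ord_insert_at(struct ord_queue *q, int pos, struct ord_msg m)
{
	assert(q->n < ORD_MAX);
	memmove(&q->msgs[pos + 1], &q->msgs[pos],
	    (size_t)(q->n - pos) * sizeof (m));
	q->msgs[pos] = m;
	q->n++;
}

/* putq-style: behind every message of equal or higher rank. */
void
ord_putq(struct ord_queue *q, struct ord_msg m)
{
	int pos = 0;

	if (m.hipri)
		m.band = 0;		/* band is ignored for high priority */
	while (pos < q->n && ord_rank(&q->msgs[pos]) >= ord_rank(&m))
		pos++;
	ord_insert_at(q, pos, m);
}

/* putbq-style: ahead of equal rank, behind strictly higher rank. */
void
ord_putbq(struct ord_queue *q, struct ord_msg m)
{
	int pos = 0;

	while (pos < q->n && ord_rank(&q->msgs[pos]) > ord_rank(&m))
		pos++;
	ord_insert_at(q, pos, m);
}

/* Write the message ids, head first, into buf (for inspection). */
void
ord_ids(const struct ord_queue *q, char *buf)
{
	int i;

	for (i = 0; i < q->n; i++)
		buf[i] = q->msgs[i].id;
	buf[q->n] = '\0';
}
```

For example, queuing A (band 0), B (band 2), C (band 1), and D (high priority) with putq gives the order D B C A. Returning E (band 1) with putbq puts it ahead of C but behind B.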

putq(9F) checks only the priority band in the first message. If a high-priority message is passed to putq(9F) with a nonzero b_band value, b_band is reset to 0 before placing the message in the queue. If the message is passed to putq(9F) with a b_band value that is greater than the number of qband(9S) structures associated with the queue, putq(9F) tries to allocate a new qband(9S) structure for each band, up to and including the band of the message.

putbq(9F) and insq(9F) work similarly. If you try to insert a message out of order in a queue with insq(9F), the message is not inserted and the routine fails.

putq(9F) does not schedule a queue if noenable(9F) was previously called for the queue. noenable(9F) forces putq(9F) to queue the message when called by this queue, but not to schedule the service procedure. noenable(9F) does not prevent the queue from being scheduled by a flow control back-enable. The inverse of noenable(9F) is enableok(9F).
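The scheduling rules in the last few paragraphs can be modeled in user space. The model has three rules. An ordinary message schedules the service procedure only when the last getq found the queue empty and noenable is not in effect. A high-priority message always schedules it. A disabled queue still accepts messages. sch_queue, sch_putq, sch_getq, and the SCH_ flags are hypothetical stand-ins for the queue, putq(9F), getq(9F), QWANTR, and the noenable(9F) state. Priority-band messages are not modeled.

```c
#include <assert.h>

#define SCH_WANTR  0x1	/* last getq found the queue empty */
#define SCH_NOENB  0x2	/* noenable() is in effect */

struct sch_queue {
	int flags;
	int nmsgs;	/* messages currently on the queue */
	int scheduled;	/* set when the service procedure is scheduled */
};

void
sch_noenable(struct sch_queue *q)
{
	q->flags |= SCH_NOENB;
}

void
sch_enableok(struct sch_queue *q)
{
	q->flags &= ~SCH_NOENB;
}

/* Like qenable(9F): schedule unconditionally. */
void
sch_qenable(struct sch_queue *q)
{
	q->scheduled = 1;
}

/* getq-style: take one message; remember when the queue was empty. */
int
sch_getq(struct sch_queue *q)
{
	if (q->nmsgs == 0) {
		q->flags |= SCH_WANTR;
		return (0);
	}
	q->nmsgs--;
	return (1);
}

/* putq-style: queue one message and decide whether to schedule. */
void
sch_putq(struct sch_queue *q, int hipri)
{
	q->nmsgs++;
	if (hipri) {
		sch_qenable(q);		/* not blocked by noenable */
	} else if ((q->flags & SCH_WANTR) && !(q->flags & SCH_NOENB)) {
		q->flags &= ~SCH_WANTR;
		sch_qenable(q);
	}
}
```

A buffering module like the one described later in this section would call sch_noenable, let messages accumulate, and call sch_qenable once its byte count or time limit is reached.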

The service procedure is written using the following algorithm:

while ((bp = getq(q)) != NULL) {
	if (queclass(bp) == QPCTL) {
		/* Process the message */
		putnext(q, bp);
	} else if (bcanputnext(q, bp->b_band)) {
		/* Process the message */
		putnext(q, bp);
	} else {
		(void) putbq(q, bp);
		return (0);
	}
}

If the module or driver ignores priority bands, the algorithm is the same as described in the previous paragraphs, except that canputnext(q) is substituted for bcanputnext(q, bp->b_band).

qenable(9F), another flow-control utility, lets a module or driver cause one of its queues, or another module's queues, to be scheduled. qenable(9F) can also be used to delay message processing. An example of this is a buffer module that gathers messages in its message queue and forwards them as a single, larger message. This module uses noenable(9F) to inhibit its service procedure and queues messages with its put procedure until a certain byte count or "in queue" time has been reached. When either of these conditions is met, the module calls qenable(9F) to cause its service procedure to run.

Another example is a communication line discipline module that implements end-to-end (for example, to a remote system) flow control. Outbound data is held on the write side message queue until the read side receives a transmit window from the remote end of the network.


Note -

STREAMS routines are called at different priority levels. Interrupt routines are called at the interrupt priority of the interrupting device. Service routines are called with interrupts enabled (so that service routines for STREAMS drivers can be interrupted by their own interrupt routines).


Chapter 8 Messages - Kernel Level

This chapter describes the structure and use of each STREAMS message type.

ioctl(2) Processing

STREAMS is a special type of character device driver that is different from the historical character input/output (I/O) mechanism in several ways.

In the classical device driver, all ioctl(2) calls are processed by the single device driver, which is responsible for their resolution. The classical device driver has user context, that is, all data can be copied directly to and from user space.

By contrast, the Stream head itself can process some ioctl(2) calls (defined in streamio(7I)). Generally, STREAMS ioctl(2) calls operate independently of any particular module or driver on the Stream. This means the valid ioctl(2) calls that are processed on a Stream change over time, as modules are pushed and popped on the Stream. The Stream modules have no user context and must rely on the Stream head to perform copyin and copyout requests.

There is no user context in a module or driver when the information associated with the ioctl(2) call is received. This prevents use of ddi_copyin(9F) or ddi_copyout(9F) by the module. No user context also prevents the module and driver from associating any kernel data with the currently running process. In any case, by the time the module or driver receives the ioctl(2) call, the process that generated it might have exited.

STREAMS allows user processes to control functions on specific modules and drivers in a Stream using ioctl(2) calls. In fact, many streamio(7I) ioctl(2) commands go no further than the Stream head. They are fully processed there and no related messages are sent downstream. If, however, it is an I_STR ioctl(2) or an unrecognized ioctl(2) command, the Stream head creates an M_IOCTL message, which includes the ioctl(2) argument. This is then sent downstream to be processed by the pertinent module or driver. The M_IOCTL message is the precursor message type carrying ioctl(2) information to modules. Other message types are used to complete the ioctl processing in the Stream. Each module has its own set of M_IOCTL messages it must recognize.
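The decision each module or driver makes about an M_IOCTL message can be sketched as a user-space model. Commands the component recognizes are acknowledged (M_IOCACK) and sent back upstream. A module passes unrecognized commands downstream. A driver has nothing below it, so it rejects them (M_IOCNAK). The message layout, XX_GETMODE, XX_SETMODE, and io_handle are hypothetical. The choice of EINVAL for the rejection is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Message types used by the model (names echo the STREAMS ones). */
enum io_type { IO_M_IOCTL, IO_M_IOCACK, IO_M_IOCNAK };

/* What the model's put procedure decided to do with the message. */
enum io_action { IO_REPLY_UPSTREAM, IO_PASS_DOWNSTREAM };

/* Hypothetical commands this component understands. */
#define XX_GETMODE 1
#define XX_SETMODE 2

struct io_msg {
	enum io_type type;
	int cmd;	/* like ioc_cmd in the iocblk */
	int error;	/* like ioc_error */
};

/*
 * Handle an M_IOCTL in a module (is_driver == 0) or a driver
 * (is_driver != 0). Recognized commands are acknowledged in place;
 * unrecognized ones go downstream from a module but are rejected
 * by a driver.
 */
enum io_action
io_handle(struct io_msg *mp, int is_driver)
{
	switch (mp->cmd) {
	case XX_GETMODE:
	case XX_SETMODE:
		mp->type = IO_M_IOCACK;
		mp->error = 0;
		return (IO_REPLY_UPSTREAM);
	default:
		if (!is_driver)
			return (IO_PASS_DOWNSTREAM);
		mp->type = IO_M_IOCNAK;
		mp->error = EINVAL;
		return (IO_REPLY_UPSTREAM);
	}
}
```

In a real module, the reply step corresponds to changing the message type and calling qreply(9F), and the pass step corresponds to putnext(9F).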

Message Allocation and Freeing

The allocb(9F) utility routine allocates a message and the space to hold the data for the message. allocb(9F) returns a pointer to a message block containing a data buffer of at least the size requested, provided there is enough memory available. The routine returns NULL on failure. allocb(9F) always returns a message of type M_DATA. The type can then be changed if required. b_rptr and b_wptr are set to db_base (see msgb(9S) and datab(9S)), which is the start of the memory location for the data.

allocb(9F) can return a buffer larger than the size requested. If allocb(9F) indicates buffers are not available (allocb(9F) fails), the put or service procedure cannot block to wait for a buffer to become available. Instead, bufcall(9F) defers processing in the module or the driver until a buffer becomes available.

If message space allocation is done by the put procedure and allocb(9F) fails, the message is usually discarded. If the allocation fails in the service routine, the message is returned to the queue. bufcall(9F) is then called to schedule the service routine to run again when a message buffer becomes available, and the service routine returns.

freeb(9F) releases the message block descriptor. It also releases the corresponding data block if the data block's reference count (see datab(9S)) is equal to 1. If the reference count exceeds 1, the data block is not released; only its reference count is decremented.
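The reference-count rule can be illustrated with a user-space model of a message block that shares its data block with a duplicate, as dupb(9F) does. rc_mblk, rc_dblk, rc_allocb, rc_dupb, and rc_freeb are hypothetical stand-ins built on malloc and free. They are not the kernel routines. rc_data_frees counts how many data buffers were actually released.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-ins for datab(9S) and msgb(9S). */
struct rc_dblk {
	unsigned char *db_base;	/* the data buffer */
	int db_ref;		/* message blocks sharing this dblk */
};

struct rc_mblk {
	struct rc_dblk *b_datap;
};

static int rc_data_frees;	/* data buffers released so far */

struct rc_mblk *
rc_allocb(size_t size)
{
	struct rc_mblk *mp = malloc(sizeof (*mp));
	struct rc_dblk *dp = malloc(sizeof (*dp));
	unsigned char *base = malloc(size);

	if (mp == NULL || dp == NULL || base == NULL) {
		free(mp);
		free(dp);
		free(base);
		return (NULL);
	}
	dp->db_base = base;
	dp->db_ref = 1;
	mp->b_datap = dp;
	return (mp);
}

/* Share, do not copy, the data block (in the spirit of dupb). */
struct rc_mblk *
rc_dupb(struct rc_mblk *mp)
{
	struct rc_mblk *nmp = malloc(sizeof (*nmp));

	if (nmp == NULL)
		return (NULL);
	nmp->b_datap = mp->b_datap;
	nmp->b_datap->db_ref++;
	return (nmp);
}

void
rc_freeb(struct rc_mblk *mp)
{
	struct rc_dblk *dp = mp->b_datap;

	if (dp->db_ref == 1) {		/* last user: release the data */
		free(dp->db_base);
		free(dp);
		rc_data_frees++;
	} else {
		dp->db_ref--;		/* others still use the data */
	}
	free(mp);			/* the message block always goes */
}
```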

freemsg(9F) releases all message blocks in a message. It uses freeb(9F) to free all message blocks and corresponding data blocks.

In Example 8-1, allocb(9F) is used by the bappend subroutine that appends a character to a message block:


Example 8-1 Use of allocb(9F)

#include <sys/stream.h>

/*
 * Append a character to a message block.
 * If (*bpp) is null, it will allocate a new block
 * Returns 0 when the message block is full, 1 otherwise
 */
#define MODBLKSZ	128	/* size of message blocks */

static int
bappend(mblk_t **bpp, int ch)
{
	mblk_t *bp;

	if ((bp = *bpp) != NULL) {
		if (bp->b_wptr >= bp->b_datap->db_lim)
			return (0);
	} else {
		if ((*bpp = bp = allocb(MODBLKSZ, BPRI_MED)) == NULL)
			return (1);
	}
	*bp->b_wptr++ = (unsigned char)ch;
	return (1);
}

bappend receives a pointer to a message block and a character as arguments. If a message block is supplied (*bpp != NULL), bappend checks if there is room for more data in the block. If not, it fails. If there is no message block, a block of at least MODBLKSZ is allocated through allocb(9F).

If allocb(9F) fails, bappend returns success and discards the character. If the original message block is not full or the allocb(9F) is successful, bappend stores the character in the block.

Example 8-2 shows the processing of all the message blocks in any downstream data (type M_DATA) messages. freemsg(9F) frees messages.


Example 8-2 Subroutine modwput

/* Write side put procedure */
static int
modwput(queue_t *q, mblk_t *mp)
{
	switch (mp->b_datap->db_type) {
	default:
		putnext(q, mp);		/* Don't do these, pass along */
		break;

	case M_DATA: {
		mblk_t *bp;
		mblk_t *nmp = NULL, *nbp = NULL;
		int cr_done = 0;	/* '\r' already added for this '\n' */

		for (bp = mp; bp != NULL; bp = bp->b_cont) {
			while (bp->b_rptr < bp->b_wptr) {
				if (*bp->b_rptr == '\n' && !cr_done) {
					if (!bappend(&nbp, '\r'))
						goto newblk;
					cr_done = 1;
				}
				if (!bappend(&nbp, *bp->b_rptr))
					goto newblk;

				cr_done = 0;
				bp->b_rptr++;
				continue;

			newblk:
				/* nbp is full: add it to nmp, start anew */
				if (nmp == NULL)
					nmp = nbp;
				else
					linkb(nmp, nbp);
				nbp = NULL;
			}
		}
		if (nmp == NULL)
			nmp = nbp;
		else if (nbp != NULL)
			linkb(nmp, nbp);
		freemsg(mp);		/* de-allocate message */
		if (nmp != NULL)
			putnext(q, nmp);
		break;
	}
	}
	return (0);
}

Data messages are scanned and filtered. modwput copies the original message into new blocks, modifying as it copies. nbp points to the current new message block. nmp points to the new message being formed as multiple M_DATA message blocks. The outer for loop goes through each message block of the original message. The inner while loop goes through each byte. bappend is used to add characters to the current or new block. If bappend fails, the current new block is full. If nmp is NULL, nmp is pointed at the new block. If nmp is not NULL, the new block is linked to the end of nmp by use of linkb(9F).

At the end of the loops, the final new block is linked to nmp. The original message (all message blocks) is returned to the pool by freemsg(9F). If a new message exists, it is sent downstream.
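The copy-and-expand logic of bappend and modwput can be exercised outside the kernel with a user-space port. The sketch assumes a hypothetical ublk_t that stands in for an mblk with its data buffer, and it uses malloc in place of allocb(9F). The block size is set to 4 so that chaining is easy to observe. Two details are handled explicitly. The partially filled block is reset after it is attached to the new message. A '\r' that was already emitted is not emitted again when its '\n' has to move to a fresh block. u_allocb, u_linkb, u_bappend, u_crlf, and u_flatten are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODBLKSZ 4	/* tiny blocks to force chaining (128 in Example 8-1) */

/* Hypothetical user-space stand-in for an mblk and its data buffer. */
typedef struct ublk {
	struct ublk *b_cont;		/* next block in the message */
	unsigned char *b_rptr, *b_wptr;	/* read and write pointers */
	unsigned char *db_lim;		/* end of the data buffer */
	unsigned char buf[MODBLKSZ];
} ublk_t;

static ublk_t *
u_allocb(void)
{
	ublk_t *bp = calloc(1, sizeof (*bp));

	if (bp != NULL) {
		bp->b_rptr = bp->b_wptr = bp->buf;
		bp->db_lim = bp->buf + MODBLKSZ;
	}
	return (bp);
}

/* Append bp to the end of the chain starting at mp, like linkb(9F). */
static void
u_linkb(ublk_t *mp, ublk_t *bp)
{
	while (mp->b_cont != NULL)
		mp = mp->b_cont;
	mp->b_cont = bp;
}

/* Same contract as bappend: 0 when *bpp is full, 1 otherwise. */
static int
u_bappend(ublk_t **bpp, int ch)
{
	ublk_t *bp;

	if ((bp = *bpp) != NULL) {
		if (bp->b_wptr >= bp->db_lim)
			return (0);
	} else if ((*bpp = bp = u_allocb()) == NULL) {
		return (1);		/* no memory: the character is dropped */
	}
	*bp->b_wptr++ = (unsigned char)ch;
	return (1);
}

/* The M_DATA branch of modwput, applied to a flat input buffer. */
ublk_t *
u_crlf(const char *in, size_t len)
{
	ublk_t *nmp = NULL, *nbp = NULL;
	size_t i = 0;
	int cr_done = 0;	/* '\r' already emitted for in[i] */

	while (i < len) {
		if (in[i] == '\n' && !cr_done) {
			if (!u_bappend(&nbp, '\r'))
				goto newblk;
			cr_done = 1;
		}
		if (!u_bappend(&nbp, in[i]))
			goto newblk;
		cr_done = 0;
		i++;
		continue;
	newblk:
		/* nbp is full: attach it to nmp and start a fresh block. */
		if (nmp == NULL)
			nmp = nbp;
		else
			u_linkb(nmp, nbp);
		nbp = NULL;
	}
	if (nmp == NULL)
		nmp = nbp;
	else if (nbp != NULL)
		u_linkb(nmp, nbp);
	return (nmp);
}

/* Copy the bytes of a chain into a NUL-terminated string. */
void
u_flatten(const ublk_t *mp, char *out)
{
	for (; mp != NULL; mp = mp->b_cont) {
		size_t n = (size_t)(mp->b_wptr - mp->b_rptr);

		memcpy(out, mp->b_rptr, n);
		out += n;
	}
	*out = '\0';
}
```

For the input "abc\nd", the first block fills with "abc\r". The '\n' then starts a second block without a duplicate '\r'.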

Recovering From No Buffers

bufcall(9F) can be used to recover from an allocb(9F) failure. The call syntax is as follows:


bufcall_id_t bufcall(size_t size, uint_t pri, void (*func)(void *arg), void *arg);

Note -

qbufcall(9F) and qunbufcall(9F) must be used with perimeters.


bufcall(9F) calls (*func)(arg) when a buffer of size bytes is available. When func is called, it has no user context and must return without blocking. Also, there is no guarantee that when func is called, a buffer will actually still be available.

On success, bufcall(9F) returns a nonzero identifier that can be used as a parameter to unbufcall(9F) to cancel the request later. On failure, 0 is returned and the requested function is never called.
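The request, cancel, and callback life cycle can be sketched with a user-space model. The model keeps a fixed table of pending requests. A request returns a nonzero id, or 0 when no slot is free, in which case its function is never called. Canceling by id removes the request. Each callback runs at most once, when enough memory is reported as freed. bc_bufcall, bc_unbufcall, bc_buffer_freed, and bc_count are hypothetical names. As the text notes, a real callback has no guarantee that a buffer is still available when it runs.

```c
#include <assert.h>
#include <stddef.h>

#define BC_MAX 8

typedef void (*bc_func_t)(void *);

/* One pending request; id 0 marks a free slot. */
struct bc_req {
	int id;
	size_t size;
	bc_func_t func;
	void *arg;
};

static struct bc_req bc_table[BC_MAX];
static int bc_next_id = 1;

/* Register a callback; returns a nonzero id, or 0 if the table is full. */
int
bc_bufcall(size_t size, bc_func_t func, void *arg)
{
	int i;

	for (i = 0; i < BC_MAX; i++) {
		if (bc_table[i].id == 0) {
			bc_table[i].id = bc_next_id++;
			bc_table[i].size = size;
			bc_table[i].func = func;
			bc_table[i].arg = arg;
			return (bc_table[i].id);
		}
	}
	return (0);	/* failure: func will never be called */
}

/* Cancel a pending request, in the spirit of unbufcall(9F). */
void
bc_unbufcall(int id)
{
	int i;

	for (i = 0; i < BC_MAX; i++)
		if (bc_table[i].id == id)
			bc_table[i].id = 0;
}

/* Model allocator hook: 'avail' bytes were freed; run matching requests. */
void
bc_buffer_freed(size_t avail)
{
	int i;

	for (i = 0; i < BC_MAX; i++) {
		if (bc_table[i].id != 0 && bc_table[i].size <= avail) {
			struct bc_req r = bc_table[i];

			bc_table[i].id = 0;	/* requests are one-shot */
			r.func(r.arg);
		}
	}
}

/* Example callback: count how many times it ran. */
void
bc_count(void *arg)
{
	(*(int *)arg)++;
}
```

Canceling the second request before memory is freed mirrors what a close routine must do so that no callback runs after the module or driver has gone away.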


Caution -

Care must be taken to avoid deadlock when holding resources while waiting for bufcall(9F) to call (*func)(arg). bufcall(9F) should be used sparingly.


Two examples are provided. Example 8-3 is a device-receive-interrupt handler and Example 8-4 is a write service procedure:


Example 8-3 Device Interrupt handler

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/cmn_err.h>

#define DEVBLKSZ	512	/* receive buffer size (assumed value) */

bufcall_id_t id;		/* hold id val for unbufcall */

static void dev_re_load(void *dev);

void
dev_rintr(void *dev)
{
	/* process incoming message ... */
	/* allocate new buffer for device */
	dev_re_load(dev);
}

/*
 * Reload device with a new receive buffer
 */
static void
dev_re_load(void *dev)
{
	mblk_t *bp;

	id = 0;			/* begin with no waiting for buffers */
	if ((bp = allocb(DEVBLKSZ, BPRI_MED)) == NULL) {
		cmn_err(CE_WARN, "dev: allocb failure (size %d)\n",
		    DEVBLKSZ);
		/*
		 * Allocation failed. Use bufcall to
		 * schedule a call to ourselves.
		 */
		id = bufcall(DEVBLKSZ, BPRI_MED, dev_re_load, dev);
		return;
	}

	/* pass buffer to device ... */
}

See Chapter 12, MultiThreaded STREAMS for more information on the uses of unbufcall(9F). These references are protected by MT locks.

Since bufcall(9F) can fail, there is still a chance that the device hangs. A better strategy, if bufcall(9F) fails, is to discard the current input message and resubmit that buffer to the device. Losing input data is preferable to hanging.

In Example 8-4, mod_wsrv prefixes each output message with a header.


Example 8-4 Write Service Procedure

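#include <sys/stream.h>
#include <sys/ddi.h>

#define HDRSZ	16	/* header size (assumed value) */

static bufcall_id_t mod_bufcall_id;	/* for unbufcall in close */
static timeout_id_t mod_timeout_id;	/* for untimeout in close */

/* Wrapper so that qenable(9F) matches the callback type. */
static void
mod_reenable(void *arg)
{
	qenable((queue_t *)arg);
}

static int
mod_wsrv(queue_t *q)
{
	mblk_t *mp, *bp;

	while ((mp = getq(q)) != NULL) {
		/* check for priority messages and canput ... */

		/*
		 * Allocate a header to prepend to the message.
		 * If the allocb fails, use bufcall to reschedule.
		 */
		if ((bp = allocb(HDRSZ, BPRI_MED)) == NULL) {
			mod_bufcall_id = bufcall(HDRSZ, BPRI_MED,
			    mod_reenable, q);
			if (mod_bufcall_id == 0) {
				/* bufcall failed: retry in two seconds */
				mod_timeout_id = timeout(mod_reenable, q,
				    drv_usectohz(2000000));
			}
			/*
			 * Put the msg back and exit, we will be
			 * re-enabled later
			 */
			(void) putbq(q, mp);
			return (0);
		}
		/* process message .... */
	}
	return (0);
}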

In this example, mod_wsrv illustrates a potential deadlock case. If allocb(9F) fails, mod_wsrv attempts to recover without loss of data by calling bufcall(9F). The callback passed to bufcall(9F) re-enables the queue with qenable(9F), so when a buffer is available, the service procedure is automatically re-enabled. Before exiting, the current message is put back in the queue. Example 8-4 deals with bufcall(9F) failure by calling timeout(9F).

timeout(9F) schedules the given function to be run with the given argument after the given number of clock ticks. In this example, if bufcall(9F) fails, the system runs qenable(9F) after two seconds have passed.
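timeout(9F) takes its delay in clock ticks rather than seconds, and drv_usectohz(9F) is what converts the two-second delay in the example. The arithmetic can be sketched as follows, assuming a clock of hz ticks per second (for example, 100). usec_to_ticks is a hypothetical helper, not drv_usectohz(9F). This sketch rounds up so that the delay is never shorter than requested.

```c
#include <assert.h>

/*
 * Convert microseconds to clock ticks for a clock running at 'hz'
 * ticks per second, rounding up. Hypothetical helper for illustration.
 */
long
usec_to_ticks(long usec, long hz)
{
	return ((usec * hz + 999999L) / 1000000L);
}
```

At 100 ticks per second, two seconds (2,000,000 microseconds) is 200 ticks. At 1000 ticks per second it is 2000 ticks.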

Releasing Callback Requests

When allocb(9F) fails and bufcall(9F) is called, a callback is pending until a buffer is actually returned. Since this callback is asynchronous, it must be canceled before the module or driver finishes closing. To release this queued event, use unbufcall(9F).

Pass the id returned by bufcall(9F) to unbufcall(9F). Then close the driver in the normal way. If this sequence of unbufcall(9F) and xxclose is not followed, the callback can run after the driver has been closed. This is one of the most difficult types of bugs to find during debugging.


Caution -

All pending bufcall(9F) and timeout(9F) requests must be canceled in the close routine.


Extended STREAMS Buffers

Some hardware using the STREAMS mechanism supports memory-mapped I/O (see mmap(2)) that allows the sharing of buffers between users, kernel, and the I/O card.

If the hardware supports memory-mapped I/O, data received from the hardware is placed in the DARAM (dual access RAM) section of the I/O card. Since DARAM is memory shared between the kernel and the I/O card, coordinated data transfer between the kernel and the I/O card is eliminated. Once in kernel space, the data buffer is manipulated as if it were a kernel resident buffer. Similarly, data sent downstream is placed in the DARAM and forwarded to the network.

In a typical network arrangement, data is received from the network by the I/O card. The controller reads the block of data into the card's internal buffer and interrupts the host computer to signal that data has arrived. The STREAMS driver gives the controller the kernel address where the data block is to go and the number of bytes to transfer. After the controller has read the data into its buffer and verified the checksum, it copies the data into main memory at the address specified for the DMA (direct memory access) transfer. Once in kernel space, the data is packaged into message blocks and processed in the usual manner.

When data is transmitted from a user process to the network, it is copied from the user space to the kernel space, packaged as a message block, and sent to the downstream driver. The driver interrupts the I/O card, signaling that data is ready to be transmitted to the network. The controller copies the data from the kernel space to the internal buffer on the I/O card, and from there it is placed on the network.

The STREAMS buffer allocation mechanism enables the allocation of message and data blocks to point directly to a client-supplied (non-STREAMS) buffer. Message and data blocks allocated this way are indistinguishable from the normal data blocks. The client-supplied buffers are processed as if they were normal STREAMS data buffers.

Drivers can attach non-STREAMS data buffers to message blocks and have STREAMS free them through a driver-supplied routine. Freeing works as follows:

freeb(9F) detects when a buffer is a client-supplied, non-STREAMS buffer. If it is, freeb(9F) finds the free_rtn(9S) structure associated with the buffer. After calling the driver-dependent routine (defined in free_rtn(9S)) to free the buffer, freeb(9F) frees the message and data block.

The free routine should not reference any dynamically allocated data structures that are freed when the driver is closed, as messages can exist in a Stream after the driver is closed. For example, when a Stream is closed, the driver close routine is called and its private data structure can be deallocated. If the driver sends a message created by esballoc upstream, that message can still be on the Stream head read queue. When the Stream head read queue is flushed, the message is freed and a call is made to the driver's free routine after the driver has been closed.

The format of the free_rtn(9S) structure is as follows:


typedef struct free_rtn {
	void (*free_func)();	/* driver-dependent free routine */
	char *free_arg;		/* argument for free_func */
} frtn_t;

The structure has two fields: a pointer to a function and a location for any argument passed to the function. Instead of defining a specific number of arguments, free_arg is defined as a char *. This way, drivers can pass pointers to structures if more than one argument is needed.
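How STREAMS uses this structure when the last reference to an extended buffer is dropped can be sketched in user space. m_esballoc attaches a caller-owned buffer together with its free routine. m_freeb calls that routine, instead of freeing the buffer itself, once the reference count reaches zero. m_free_rtn, m_dblk, m_pool, and m_pool_free are hypothetical names. free_func is given a prototype here, while the structure above declares it without one.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as the free_rtn structure above, with a prototype. */
struct m_free_rtn {
	void (*free_func)(char *);	/* driver-dependent free routine */
	char *free_arg;			/* argument for free_func */
};

struct m_dblk {
	unsigned char *db_base;
	size_t db_size;
	int db_ref;
	struct m_free_rtn *db_frtnp;	/* NULL for ordinary buffers */
};

/* Attach a caller-owned buffer, in the spirit of esballoc(9F). */
struct m_dblk *
m_esballoc(unsigned char *base, size_t size, struct m_free_rtn *frp)
{
	struct m_dblk *dp;

	if (base == NULL || frp == NULL)
		return (NULL);
	if ((dp = malloc(sizeof (*dp))) == NULL)
		return (NULL);
	dp->db_base = base;
	dp->db_size = size;
	dp->db_ref = 1;
	dp->db_frtnp = frp;
	return (dp);
}

/* Drop one reference; on the last one, hand the buffer back. */
void
m_freeb(struct m_dblk *dp)
{
	if (--dp->db_ref > 0)
		return;
	if (dp->db_frtnp != NULL)
		dp->db_frtnp->free_func(dp->db_frtnp->free_arg);
	else
		free(dp->db_base);
	free(dp);
}

/* Hypothetical driver pool that counts buffers handed back to it. */
struct m_pool {
	int returned;
};

void
m_pool_free(char *arg)
{
	((struct m_pool *)(void *)arg)->returned++;
}
```

The free routine touches only the pool it was given. As the caution above explains, it should not depend on driver state that close may already have freed.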

The method by which free_func is called is implementation-specific. Do not assume that free_func is or is not called directly from STREAMS utility routines like freeb(9F). The free_func function must not call another module's put procedure nor try to acquire a private module lock that can be held by another thread across a call to a STREAMS utility routine that could free a message block. Otherwise, the possibility for lock recursion and deadlock exists.

esballoc(9F) provides a common interface for allocating and initializing data blocks. It makes the allocation as transparent to the driver as possible and initializes the fields of the data block on the driver's behalf, because those fields should be modified only by STREAMS. The driver calls this routine to attach its own data buffer to a newly allocated message and data block. If the routine successfully completes the allocation and assigns the buffer, it returns a pointer to the message block; otherwise, it returns NULL. The driver supplies four arguments to esballoc(9F): a pointer to its data buffer, the size of the buffer, the priority of the data block, and a pointer to the free_rtn structure. The pointer arguments should be non-NULL. See Appendix B, STREAMS Utilities, for a description of esballoc(9F).

esballoc(9F) Example

Example 8-5 (a skeleton that omits driver-specific details and will not compile as shown) shows how extended buffers are managed in the multithreaded environment. The driver maintains a pool of special memory whose blocks are attached to messages with esballoc(9F). The free routine uses the queue structure assigned to the driver or other queue private data, so the allocator and the close routine must coordinate to ensure that no esballoc(9F) memory blocks are still outstanding after the close. The special memory blocks are of type ebm_t; the counter ebm, the mutex mp, and the condition variable cvp implement the coordination.


Example 8-5 esballoc Example

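static kmutex_t		mp;	/* protects ebm */
static kcondvar_t	cvp;	/* broadcast when ebm drops to zero */
static int		ebm;	/* outstanding esballoc'd blocks */

static void special_free(char *arg);

/*
 * Allocate a message whose data buffer is a block of special memory.
 * ebm_alloc(), ebm_release(), and the ebm_t members used here are
 * hypothetical driver-private pool routines and fields.
 */
mblk_t *
special_new(void)
{
	ebm_t *ep;
	mblk_t *bp = NULL;

	mutex_enter(&mp);
	/*
	 * allocate some special memory and attach it to a new message
	 */
	if ((ep = ebm_alloc()) != NULL) {
		ep->ebm_frtn.free_func = special_free;
		ep->ebm_frtn.free_arg = (char *)ep;
		bp = esballoc(ep->ebm_buf, ep->ebm_size, BPRI_MED,
		    &ep->ebm_frtn);
		if (bp == NULL)
			ebm_release(ep);	/* esballoc failed */
		else
			ebm++;			/* increment counter */
	}
	mutex_exit(&mp);
	return (bp);
}

/*
 * Free routine named in free_rtn(9S).  freeb(9F) calls it when a message
 * built by special_new() is freed, so it must not call freeb() itself.
 */
static void
special_free(char *arg)
{
	ebm_t *ep = (ebm_t *)arg;

	mutex_enter(&mp);
	/*
	 * de-allocate some special memory
	 */
	ebm_release(ep);
	/*
	 * decrement counter; wake the close routine on the last one
	 */
	if (--ebm == 0)
		cv_broadcast(&cvp);
	mutex_exit(&mp);
}

static int
xxclose(queue_t *q, int flag, cred_t *credp)
{
	/*
	 * do some stuff
	 */
	/*
	 * Time to decommission the special allocator.  Are there
	 * any outstanding allocations from it?
	 */
	mutex_enter(&mp);
	while (ebm > 0)
		cv_wait(&cvp, &mp);
	mutex_exit(&mp);
	return (0);
}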


Caution -

The close routine must wait for all esballoc(9F) memory to be freed.
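The counter, mutex, and condition variable coordination in Example 8-5 can be exercised outside the kernel. The following user-space sketch substitutes POSIX threads for mutex_enter(9F), cv_wait(9F), and cv_broadcast(9F); pool_get(), pool_put(), pool_drain(), and run_close_after_late_frees() are hypothetical names standing in for special_new(), special_free(), the wait in the close routine, and a scenario in which messages are freed after close has begun.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cv = PTHREAD_COND_INITIALIZER;
static int outstanding;		/* plays the role of ebm */

/* Like special_new(): count a block as handed out. */
static void
pool_get(void)
{
	pthread_mutex_lock(&pool_lock);
	outstanding++;
	pthread_mutex_unlock(&pool_lock);
}

/* Like special_free(): uncount it and wake the closer on the last one. */
static void
pool_put(void)
{
	pthread_mutex_lock(&pool_lock);
	assert(outstanding > 0);
	if (--outstanding == 0)
		pthread_cond_broadcast(&pool_cv);
	pthread_mutex_unlock(&pool_lock);
}

/* Like the close routine: block until every block has been returned. */
static void
pool_drain(void)
{
	pthread_mutex_lock(&pool_lock);
	while (outstanding > 0)		/* loop also absorbs spurious wakeups */
		pthread_cond_wait(&pool_cv, &pool_lock);
	pthread_mutex_unlock(&pool_lock);
}

/* Thread body standing in for later freeb() calls on each message. */
static void *
late_free(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++)
		pool_put();
	return NULL;
}

/* Hand out n blocks, free them on another thread, then drain. */
static int
run_close_after_late_frees(int n)
{
	pthread_t t;

	for (int i = 0; i < n; i++)
		pool_get();
	if (pthread_create(&t, NULL, late_free, &n) != 0)
		return -1;
	pool_drain();		/* returns only after the last pool_put() */
	pthread_join(t, NULL);
	return outstanding;
}
```

The while loop around the wait matters: the closer rechecks the counter after every wakeup instead of assuming that one broadcast means all blocks are back.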


General ioctl(2) Processing


Note -

See the ioctl() section in Writing Device Drivers for information on the 64-bit data structure macros.


When the Stream head is called to process an ioctl(2) that it does not recognize, it creates an M_IOCTL message and sends it down the Stream. An M_IOCTL message is a single M_IOCTL message block followed by zero or more M_DATA blocks. The M_IOCTL message block has the form of an iocblk(9S) structure. This structure contains the following elements.


int        ioc_cmd;              /* ioctl command type */
cred_t     *ioc_cr;              /* full credentials */
uint       ioc_id;               /* ioctl id */
uint       ioc_count;            /* byte cnt in data field */
int        ioc_error;            /* error code */
int        ioc_rval;             /* return value */

For an I_STR ioctl(2), ioc_cmd contains the command supplied by the user in the ic_cmd member of the strioctl structure defined in streamio(7I). For other ioctl(2) calls, it is the value of the cmd argument to ioctl(2). The ioc_cr field points to the credentials of the user process.

The ioc_id field is a unique identifier used by the Stream head to identify the ioctl and its response messages.

The ioc_count field indicates the number of bytes of data associated with this ioctl request. If the value is greater than zero, one or more M_DATA mblks are linked to the b_cont field of the M_IOCTL mblk. If the value is zero, no M_DATA mblk is associated with the M_IOCTL mblk. If ioc_count equals the special value TRANSPARENT, exactly one M_DATA mblk is linked to the M_IOCTL mblk, and it contains the value of the argument passed to ioctl(2), which can be a user address or a numeric value (see "Transparent ioctl(2) Processing").
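The three cases for ioc_count can be summarized in a small classification function. This is a sketch under an assumption: MOCK_TRANSPARENT models TRANSPARENT as (unsigned int)-1, which should be checked against the real definition in <sys/stream.h>; the enum and function names are hypothetical.

```c
#include <assert.h>

/*
 * Assumption: TRANSPARENT is modeled as (unsigned int)-1 here; real code
 * should use the definition from <sys/stream.h>.
 */
#define MOCK_TRANSPARENT	((unsigned int)-1)

/* What the b_cont chain of an M_IOCTL holds, according to ioc_count. */
enum ioc_payload {
	PL_NONE,	/* 0: no M_DATA block is linked */
	PL_DATA,	/* > 0: M_DATA blocks hold ioc_count bytes of data */
	PL_TRANSPARENT	/* TRANSPARENT: one M_DATA holds the ioctl(2) arg */
};

static enum ioc_payload
classify_ioc_count(unsigned int ioc_count)
{
	/* Test TRANSPARENT first: as an unsigned value it is also > 0. */
	if (ioc_count == MOCK_TRANSPARENT)
		return PL_TRANSPARENT;
	return ioc_count == 0 ? PL_NONE : PL_DATA;
}
```

The ordering of the tests is the point of the example: a check such as "ioc_count > 0 means user data follows" would misclassify a transparent request.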

An M_IOCTL message is processed by the first module or driver that recognizes it. If a module does not recognize the command, it should pass the message down. If a driver does not recognize the command, it should send a negative acknowledgment (M_IOCNAK) message upstream. In all circumstances, a module or driver that processes an M_IOCTL message must acknowledge it.

Modules must always pass unrecognized messages on. Drivers should nak unrecognized ioctl(2) messages and free any other unrecognized message.

If a module or driver finds an error in an M_IOCTL message for any reason, it must produce a negative acknowledgment: set the message type to M_IOCNAK and send the message upstream. No data or return value can be sent with it. If ioc_error is 0, the Stream head causes the ioctl(2) to fail with EINVAL; optionally, the module can set ioc_error to a different error number.

ioc_error can be set to a nonzero value in both M_IOCACK and M_IOCNAK. This causes the value to be returned as an error number to the process that sent the ioctl(2).
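The rules above for turning a reply into the result of ioctl(2) can be captured in a few lines. This is a hypothetical model of the behavior the text describes, not the Stream head's code: a nonzero ioc_error fails the call with that error for either reply type, an M_IOCNAK with ioc_error of 0 fails with EINVAL, and a successful M_IOCACK returns ioc_rval.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the Stream head's handling of an ioctl reply. */
enum reply_type { REPLY_IOCACK, REPLY_IOCNAK };

struct ioctl_result {
	int rval;	/* value ioctl(2) returns to the application */
	int err;	/* errno when rval == -1, otherwise 0 */
};

static struct ioctl_result
stream_head_reply(enum reply_type type, int ioc_error, int ioc_rval)
{
	struct ioctl_result r = { ioc_rval, 0 };

	if (ioc_error != 0) {			/* applies to ack and nak */
		r.rval = -1;
		r.err = ioc_error;
	} else if (type == REPLY_IOCNAK) {	/* nak without an error code */
		r.rval = -1;
		r.err = EINVAL;
	}
	return r;
}
```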

A module that tracks the ioctl(2) requests handled by modules below it should not only watch for a specific M_IOCTL on the write side but also look for the matching M_IOCACK or M_IOCNAK on the read side. For example, the module's write side sees TCSETA (see termio(7I)) and records what is being set. The read side then knows that the module is waiting for the reply to that ioctl(2). When the read side sees an ack or nak, it checks that the reply is for the same ioctl(2) by comparing both the command (here TCSETA) and the ioc_id. If both match, the module can use the information previously saved.
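The bookkeeping described above reduces to saving the command and ioc_id on the write side and requiring both to match on the read side. In this sketch, struct pending_ioctl, note_request(), and match_reply() are hypothetical; the integer values stand in for a real command such as TCSETA and for the settings a module would record.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-module record of an ioctl seen going downstream. */
struct pending_ioctl {
	bool		active;	/* waiting for an ack or nak */
	int		cmd;	/* ioc_cmd of the request, e.g. TCSETA */
	unsigned int	id;	/* ioc_id assigned by the Stream head */
	int		saved;	/* what the module recorded from the request */
};

/* Write side: remember the request as it passes down. */
static void
note_request(struct pending_ioctl *p, int cmd, unsigned int id, int value)
{
	p->active = true;
	p->cmd = cmd;
	p->id = id;
	p->saved = value;
}

/*
 * Read side: on M_IOCACK or M_IOCNAK, use the saved data only if both
 * ioc_cmd and ioc_id match.  Returns true if the reply was ours.
 */
static bool
match_reply(struct pending_ioctl *p, int cmd, unsigned int id, bool acked,
    int *applied)
{
	if (!p->active || p->cmd != cmd || p->id != id)
		return false;
	p->active = false;
	if (acked)
		*applied = p->saved;	/* the lower module accepted it */
	return true;
}
```

Matching on ioc_id as well as the command keeps a reply to an older or unrelated request with the same command from being mistaken for the one the module recorded.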

If the module checks, for example, the TCSETA/TCGETA group of ioctl(2) calls as they pass up or down a Stream, it must never assume that a TCSETA coming down actually has a data buffer attached to it. The user can form TCSETA as an I_STR call and accidentally give a NULL data buffer pointer. Always check whether b_cont is NULL before using it to reach the data block that goes with an M_IOCTL message.

A TCGETA call formed as an I_STR call, with the data buffer pointer set by the user, always has a data buffer attached to b_cont of the main message block. Do not assume that the data block is missing, allocate a new buffer, and point b_cont at it; the original buffer would be lost.

STREAMS ioctl Issues

Regular device drivers have user context in the ioctl(9E) call. In a STREAMS driver or module, however, user context is guaranteed only in the open(9E) and close(9E) routines. A STREAMS driver or module therefore needs some other indication of the caller's context wherever user data is interpreted.

The notion of data models as well as new macros for handling data structure access are discussed in Writing Device Drivers. A STREAMS driver or module writer should use these flags and macros when dealing with structures that change size between data models.

A new flag value that represents the data model of the entity invoking the operation has been added to the ioc_flag field of the iocblk(9S) structure, the cq_flag field of the copyreq(9S) structure, and the cp_flag field of the copyresp(9S) structure.

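The data model flag is one of these possibilities:

IOC_ILP32 - the caller uses the 32-bit (ILP32) data model
IOC_LP64 - the caller uses the 64-bit (LP64) data model

In addition, IOC_NATIVE is conditionally defined to match the data model of the kernel implementation.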

By looking at the data model flag field of the relevant iocblk(9S), copyreq(9S), or copyresp(9S) structures, the STREAMS module can determine the best method of handling the data.
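As an illustration of why the flag matters, the sketch below converts a user structure containing a pointer from its 32-bit layout into the native layout, choosing the copy size and conversion by data model. It is a user-space model under stated assumptions: MOCK_IOC_ILP32 and MOCK_IOC_LP64 are hypothetical stand-ins for the real IOC_* values, the native build is assumed to be 64-bit, and struct address32, address_size(), and address_to_native() are hypothetical names. Real modules should use the data model macros described in Writing Device Drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; real code uses IOC_ILP32/IOC_LP64. */
#define MOCK_IOC_ILP32	1
#define MOCK_IOC_LP64	2

/* Native form of a user structure that contains a pointer. */
struct address {
	int	ad_len;
	char	*ad_addr;
};

/* The same structure as laid out by a 32-bit (ILP32) application. */
struct address32 {
	int32_t		ad_len;
	uint32_t	ad_addr;	/* 32-bit user address */
};

/* Number of bytes to copy in from the caller for its data model. */
static size_t
address_size(int model)
{
	return model == MOCK_IOC_ILP32 ? sizeof (struct address32)
	    : sizeof (struct address);
}

/* Convert bytes copied in from the caller into the native structure. */
static void
address_to_native(int model, const void *in, struct address *out)
{
	if (model == MOCK_IOC_ILP32) {
		const struct address32 *a32 = in;

		out->ad_len = a32->ad_len;
		out->ad_addr = (char *)(uintptr_t)a32->ad_addr;
	} else {
		*out = *(const struct address *)in;	/* already native */
	}
}
```

A module that ignored the flag would request the wrong number of bytes from a 32-bit caller and misread the pointer field, which is the size difference between data models the text warns about.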


Caution -

The layout of the iocblk, copyreq, and copyresp structures is different between the 32-bit and 64-bit kernels. Be cautious of any data structure overloading in the cp_private, cq_private, or the cq_filler fields since alignment has changed.


I_STR ioctl(2) Processing

Neither the transparent nor the nontransparent method implements the ioctl(2) in the Stream head; both implement it in the STREAMS driver or module itself. An I_STR ioctl(2) (also referred to as a nontransparent ioctl(2)) is created when a user issues I_STR and specifies a pointer to a strioctl structure as the argument. For example, assuming that fd is an open lp STREAMS device and LP_CRLF is a valid option, the user could make a request by issuing the following:

struct strioctl str;
short lp_opt = LP_CRLF;

str.ic_cmd = SET_OPTIONS;
str.ic_timout = -1;
str.ic_dp = (char *)&lp_opt;
str.ic_len = sizeof (lp_opt);

ioctl(fd, I_STR, &str);

On receipt of the I_STR ioctl(2) request, the Stream head creates an M_IOCTL message, sets ioc_cmd to SET_OPTIONS, and sets ioc_count to the value contained in ic_len (in this example, sizeof (short)). An M_DATA mblk is linked to the M_IOCTL mblk, and the data pointed to by ic_dp (in this case LP_CRLF) is copied into it.
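The mapping from the strioctl fields to the M_IOCTL contents can be sketched as a plain function. The structures here are hypothetical user-space mirrors that keep only the fields the text discusses, and the fixed 64-byte data area is a limit of the model, not a STREAMS rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the strioctl fields used by I_STR. */
struct mock_strioctl {
	int	ic_cmd;		/* command for the module or driver */
	int	ic_timout;	/* timeout; not modeled here */
	int	ic_len;		/* length of the data at ic_dp */
	char	*ic_dp;		/* user data */
};

/* Hypothetical model of the M_IOCTL block plus its linked M_DATA block. */
struct mock_ioctl_msg {
	int		ioc_cmd;	/* copied from ic_cmd */
	unsigned int	ioc_count;	/* copied from ic_len */
	unsigned char	data[64];	/* models the linked M_DATA block */
};

/* Build the M_IOCTL contents the way the text describes for I_STR. */
static int
build_istr_ioctl(const struct mock_strioctl *s, struct mock_ioctl_msg *m)
{
	if (s->ic_len < 0 || (size_t)s->ic_len > sizeof m->data)
		return -1;		/* outside what this model can hold */
	m->ioc_cmd = s->ic_cmd;
	m->ioc_count = (unsigned int)s->ic_len;
	if (s->ic_len > 0)
		memcpy(m->data, s->ic_dp, (size_t)s->ic_len);
	return 0;
}
```

With the lp example, ioc_count comes out as sizeof (short) and the linked data holds the option value, so the driver can validate both before using the data.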

Example 8-6 illustrates processing associated with an I_STR ioctl(2). lpdoioctl is called by the lp write-side put or service procedure to process M_IOCTL messages:


Example 8-6 I_STR ioctl(2)

static void
lpdoioctl(queue_t *q, mblk_t *mp)
{
	struct iocblk *iocp;
	struct lp *lp;

	lp = (struct lp *)q->q_ptr;

	/* 1st block contains iocblk structure */
	iocp = (struct iocblk *)mp->b_rptr;

	switch (iocp->ioc_cmd) {
	case SET_OPTIONS:
		/* Count should be exactly one short's worth
		 * (for this example) */
		if (iocp->ioc_count != sizeof (short))
			goto iocnak;
		if (mp->b_cont == NULL)
			goto iocnak;	/* malformed request: no data */
		/* Actual data is in 2nd message block */
		iocp->ioc_error = lpsetopt(lp, *(short *)mp->b_cont->b_rptr);

		/* ACK the ioctl */
		mp->b_datap->db_type = M_IOCACK;
		iocp->ioc_count = 0;
		qreply(q, mp);
		break;
	default:
	iocnak:
		/* NAK the ioctl */
		mp->b_datap->db_type = M_IOCNAK;
		qreply(q, mp);
	}
}

lpdoioctl illustrates driver M_IOCTL processing, which also applies to modules. In this example, only one command is recognized, SET_OPTIONS. ioc_count contains the number of user-supplied data bytes. For this example, ioc_count must equal the size of a short.

Once the command and its data have been verified, lpsetopt (not shown here) is called to process the request. lpsetopt returns 0 if the request is satisfied; otherwise, it returns an error number.

If ioc_error is nonzero, on receipt of the acknowledgment the Stream head returns -1 to the application's ioctl(2) request and sets errno to the value of ioc_error. The ioctl(2) is then acknowledged: the M_IOCTL message type is changed to M_IOCACK, and the ioc_count field is set to zero to indicate that no data is to be returned to the user. Finally, the message is sent upstream using qreply(9F).

If ioc_count was left nonzero, the Stream head would copy that many bytes from the second through the nth message blocks into the user buffer. You must set ioc_count if you want to pass any data back to the user.
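The copy-back rule can be modeled as gathering ioc_count bytes from the blocks linked after the acknowledgment block. This is a hypothetical user-space model (struct mock_mblk and copy_ack_data() are not STREAMS APIs); it simply stops at the end of the chain, and how the real Stream head treats a chain shorter than ioc_count is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal message block: a byte range plus a continuation. */
struct mock_mblk {
	const unsigned char	*b_rptr;	/* first unread byte */
	const unsigned char	*b_wptr;	/* one past the last byte */
	struct mock_mblk	*b_cont;	/* next block in the message */
};

/*
 * Copy up to ioc_count bytes from the blocks linked after the M_IOCACK
 * block (the second through nth blocks) into buf.  Returns the number
 * of bytes copied.
 */
static size_t
copy_ack_data(const struct mock_mblk *ack, unsigned char *buf,
    size_t ioc_count)
{
	size_t done = 0;

	for (const struct mock_mblk *bp = ack->b_cont;
	    bp != NULL && done < ioc_count; bp = bp->b_cont) {
		size_t n = (size_t)(bp->b_wptr - bp->b_rptr);

		if (n > ioc_count - done)
			n = ioc_count - done;	/* take only what is asked */
		memcpy(buf + done, bp->b_rptr, n);
		done += n;
	}
	return done;
}
```

Setting ioc_count to 0, as lpdoioctl does, makes the loop copy nothing, which is why a stale nonzero count would leak data back to the user.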

This example is for a driver. In the default case, for unrecognized commands, and for malformed requests, a nak is generated by changing the message type to M_IOCNAK and sending the message back upstream. A module, by contrast, does not nak an unrecognized command; it passes the message on. A module does nak a malformed request for a command it recognizes.

Transparent ioctls

Transparent ioctls let a module tell the Stream head to perform a copyin() or copyout() on its behalf. To process the copyin and copyout properly, the data model of the caller must be known. When coding a STREAMS module that uses transparent ioctls, use the ioctl macros described in Writing Device Drivers.

Transparent ioctl(2) Messages

The transparent STREAMS ioctl(2) mechanism is needed because user context does not exist in modules and drivers when an ioctl(2) is processed. This prevents them from using the kernel ddi_copyin/ddi_copyout functions.

Transparent ioctl(2) also lets an application be written using conventional ioctl(2) semantics instead of the I_STR ioctl(2) and a strioctl structure. The difference between transparent and nontransparent ioctl(2) processing in a STREAMS driver or module is the way data is transferred from user to kernel space.

The transparent ioctl(2) mechanism allows backward compatibility for older programs. This transparency only works for modules and drivers that support transparent ioctl(2). Trying to use transparent ioctl(2) on a Stream that doesn't support them makes the driver send an error message upstream, causing the ioctl to fail.

The following example illustrates the semantic difference between a nontransparent and a transparent ioctl(2). A module that allows arbitrary character translations is pushed on the Stream. The ioctl(2) specifies the translation to do; in this case, all uppercase vowels are changed to lowercase. A transparent ioctl(2) uses XCASE instead of I_STR to inform the module directly.

Assume that fd points to a Streams device and that the conversion module has been pushed onto it. Use a nontransparent I_STR command to inform the module to change the case of AEIOU. The semantics of this command are:

struct strioctl strioctl;

strioctl.ic_cmd = XCASE;
strioctl.ic_timout = 0;
strioctl.ic_dp = "AEIOU";
strioctl.ic_len = strlen(strioctl.ic_dp);
ioctl(fd, I_STR, &strioctl);

When the Stream head receives the I_STR ioctl(2) it creates an M_IOCTL message with the ioc_cmd set to XCASE and the data specified by ic_dp. AEIOU is copied into the first mblk following the M_IOCTL mblk.

The same ioctl(2) specified as a transparent ioctl(2) is called as follows:


ioctl(fd, XCASE, "AEIOU");

The Stream head creates an M_IOCTL message with the ioc_cmd set to XCASE, but the data is not copied in. Instead, ioc_count is set to TRANSPARENT and the address of the user data is placed in the first mblk following the M_IOCTL mblk. The module then requests the Stream head to copy in the data ("AEIOU") from user space.

Unlike the nontransparent ioctl(2), which can specify a timeout parameter, transparent ioctl(2)s block until processing is complete.


Caution -

Incorrectly written drivers can cause applications using transparent ioctl(2) to block indefinitely.


Notice that although this approach is simpler for the application, transparent ioctl adds considerable complexity to modules and drivers and increases the time required to process the request.

The M_IOCTL message generated by the Stream head for a transparent ioctl(2) is a single M_IOCTL message block followed by one M_DATA block. The iocblk(9S) structure in the M_IOCTL block has the same form as described under "General ioctl(2) Processing." However, ioc_cmd is set to the value of the command argument in ioctl(2), and ioc_count is set to the special value TRANSPARENT. Because an I_STR ioctl(2) can specify an ioc_cmd value equal to the command argument of a transparent ioctl(2), TRANSPARENT is what distinguishes the two cases. The b_cont block of the message contains the value of the arg parameter in the call.


Caution -

If a module processes a specific ioc_cmd and does not validate the ioc_count field of the M_IOCTL message, it breaks when transparent ioctl(2) is performed with the same command.



Note -

Write modules and drivers to support both transparent and I_STR ioctl(2).


All ioctl-related message types (M_IOCTL, M_COPYIN, M_COPYOUT, M_IOCDATA, M_IOCACK, and M_IOCNAK) have similar data structures and sizes. Reuse these structures instead of reallocating them. Note the similarities in the command type, credentials, and id fields.

The iocblk(9S) structure is contained in M_IOCTL, M_IOCACK, and M_IOCNAK message types. For the transparent case, M_IOCTL has one M_DATA message linked to it. This message contains a copy of the argument passed to ioctl(2). Transparent processing of M_IOCACK and M_IOCNAK does not allow any messages to be linked to them.

The copyreq(9S) structure is contained in M_COPYIN and M_COPYOUT message types. The M_COPYIN message type must not have any other message linked to it (that is, b_cont == NULL). The M_COPYOUT message type must have one or more M_DATA messages linked to it. These messages contain the data to be copied into user space.

The copyresp(9S) structure is contained in M_IOCDATA response message types. These messages are generated by the Stream head in response to an M_COPYIN or M_COPYOUT request. If the message is in response to an M_COPYOUT request, the message has no messages attached to it (b_cont is NULL). If the response is to an M_COPYIN, then zero or more M_DATA message types are attached to the M_IOCDATA message. These attached messages contain a copy of the user data requested by the M_COPYIN message.

The iocblk(9S), copyreq(9S), and copyresp(9S) structures contain a field indicating the type of ioctl(2) command, a pointer to the user's credentials, and a unique identifier for this ioctl(2). These fields must be preserved.

The structure member cq_private is reserved for use by the module. M_COPYIN and M_COPYOUT request messages contain a cq_private field that can be set to contain state information for ioctl(2) processing (which identifies what the subsequent M_IOCDATA response message contains). This state is returned in cp_private in the M_IOCDATA message. This state information determines the next step in processing the message. Keeping the state in the message makes the message self-describing and simplifies the ioctl(2) processing.

For each piece of data the module copies from user space, it sends an M_COPYIN message to the Stream head. The M_COPYIN message specifies the user address (cq_addr) and the number of bytes (cq_size) to copy from user space. The Stream head responds to the M_COPYIN request with an M_IOCDATA message whose b_cont field contains the data copied from that user address. Likewise, for each piece of data the module copies to user space, it sends an M_COPYOUT message to the Stream head, specifying the user address (cq_addr) and the number of bytes to copy (cq_size). The data to be copied is linked to the M_COPYOUT message as one or more M_DATA messages. The Stream head responds to M_COPYOUT requests with an M_IOCDATA message whose b_cont is NULL.

After the module has completed processing the ioctl (that is, all M_COPYIN and M_COPYOUT requests have been processed), the ioctl(2) must be acknowledged with an M_IOCACK to indicate successful completion of the command or an M_IOCNAK to indicate failure.

If an error occurs when attempting to copy data to or from user address space, the Stream head will set cp_rval in the M_IOCDATA message to the error number. In the event of such an error, the M_IOCDATA message should be freed by the module or driver. No acknowledgement of the ioctl(2) is sent in this case.
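The request and response protocol above, including the state carried in cq_private and cp_private and the rule that a failed copy is freed without an acknowledgment, can be simulated end to end in user space. Everything in this sketch is a hypothetical simplification: struct msg folds the copyreq and copyresp fields the text names into one record, head_copyin() stands in for the Stream head (a NULL user address fails with EFAULT), and the module logic follows the two-state SET_ADDR flow used in the first example that follows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified model of the messages involved. */
enum mtype { MT_COPYIN, MT_IOCDATA, MT_IOCACK, MT_IOCNAK, MT_FREED };
enum { GETSTRUCT, GETADDR };		/* states kept in the private field */

struct msg {
	enum mtype	type;
	int		private_state;	/* models cq_private / cp_private */
	const void	*addr;		/* models cq_addr */
	size_t		size;		/* models cq_size */
	int		rval;		/* models cp_rval */
	const void	*data;		/* models the data linked at b_cont */
};

struct address {			/* user structure being copied in */
	int		ad_len;
	const char	*ad_addr;
};

/* Stream head: answer an M_COPYIN with an M_IOCDATA (NULL address fails). */
static void
head_copyin(struct msg *m)
{
	m->type = MT_IOCDATA;
	m->rval = (m->addr == NULL) ? EFAULT : 0;
	m->data = m->rval ? NULL : m->addr;	/* stands in for copied bytes */
}

/* Module: act on an M_IOCDATA according to the state it carries back. */
static void
module_iocdata(struct msg *m, char *dst, size_t dstlen)
{
	const struct address *ap;

	if (m->rval != 0) {		/* copy failed: free it, send no ack */
		m->type = MT_FREED;
		return;
	}
	switch (m->private_state) {
	case GETSTRUCT:			/* structure arrived: request buffer */
		ap = m->data;
		if (ap->ad_len < 0) {
			m->type = MT_IOCNAK;
			break;
		}
		m->type = MT_COPYIN;
		m->addr = ap->ad_addr;
		m->size = (size_t)ap->ad_len;
		m->private_state = GETADDR;
		break;
	case GETADDR:			/* buffer arrived: consume, then ack */
		if (m->size > dstlen) {
			m->type = MT_IOCNAK;
			break;
		}
		memcpy(dst, m->data, m->size);
		m->type = MT_IOCACK;
		break;
	default:			/* invalid state */
		m->type = MT_IOCNAK;
		break;
	}
}

/* Drive one transparent SET_ADDR exchange; returns the final outcome. */
static enum mtype
run_set_addr(const struct address *user_struct, char *dst, size_t dstlen)
{
	struct msg m = { MT_COPYIN, GETSTRUCT, user_struct,
	    sizeof (struct address), 0, NULL };

	while (m.type == MT_COPYIN) {
		head_copyin(&m);
		module_iocdata(&m, dst, dstlen);
	}
	return m.type;
}
```

Because the state travels in the message itself, the module keeps no per-request variables between the two M_IOCDATA responses.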

Transparent ioctl(2) Examples

Following are three examples of transparent ioctl(2) processing. The first illustrates M_COPYIN to copy data from user space. The second illustrates M_COPYOUT to copy data to user space. The third is a more complex example showing state transitions that combine M_COPYIN and M_COPYOUT.

In these examples, the message blocks are reused to avoid the overhead of allocating, copying, and releasing messages. This is standard practice.

The Stream head guarantees that the size of the message block containing an iocblk(9S) structure is large enough to also hold the copyreq(9S) and copyresp(9S) structures.

M_COPYIN Example


Note -

See the copyin() section in Writing Device Drivers for information on the 64-bit data structure macros.


Example 8-7 illustrates only the processing of a transparent ioctl(2) request (nontransparent request processing is not shown). In this example, the contents of a user buffer are to be transferred into the kernel as part of an ioctl call of the form


ioctl(fd, SET_ADDR, (caddr_t) &bufadd);

where bufadd is a struct address whose elements are:


struct address {	
     int            ad_len;           /* buffer length in bytes */
     caddr_t        ad_addr;          /* buffer address */
};

This requires two pairs of messages (request and response) following receipt of the M_IOCTL message: the first copies in the structure (address), and the second copies in the buffer (address.ad_addr). Two states are maintained and processed in this example: GETSTRUCT for copying in the address structure, and GETADDR for copying in the buffer at ad_addr.

xxwput verifies that the SET_ADDR request is TRANSPARENT to avoid confusion with an I_STR ioctl(2), which can use a value of ioc_cmd equal to the command argument of a transparent ioctl(2). It does this by checking whether ioc_count is equal to TRANSPARENT. If it is, the message was generated by a transparent ioctl(2), not by an I_STR ioctl(2).


Example 8-7 M_COPYIN Example

struct address {		/* same members as in user space */
	int	ad_len;		/* length in bytes */
	caddr_t	ad_addr;	/* buffer address */
};

/* state values (overloaded in private field) */
#define GETSTRUCT	0	/* address structure */
#define GETADDR		1	/* byte string from ad_addr */

static void xxioc(queue_t *q, mblk_t *mp);

static int
xxwput(queue_t *q, mblk_t *mp)	/* q is the write queue */
{
	struct iocblk *iocbp;
	struct copyreq *cqp;

	switch (mp->b_datap->db_type) {
	/* ... other message types not shown ... */
	case M_IOCTL:
		/* Process ioctl commands */
		iocbp = (struct iocblk *)mp->b_rptr;
		switch (iocbp->ioc_cmd) {
		case SET_ADDR:
			if (iocbp->ioc_count != TRANSPARENT) {
				/* do non-transparent processing here
				 * (not shown here) */
			} else {
				/* ioctl command is transparent
				 * Reuse M_IOCTL block for first M_COPYIN request
				 * of address structure */
				cqp = (struct copyreq *)mp->b_rptr;
				/* Get user space structure address from linked
				 * M_DATA block */
				cqp->cq_addr = *(caddr_t *)mp->b_cont->b_rptr;
				cqp->cq_size = sizeof (struct address);
				/* MUST free linked blks */
				freemsg(mp->b_cont);
				mp->b_cont = NULL;

				/* identify response */
				cqp->cq_private = (mblk_t *)(uintptr_t)GETSTRUCT;

				/* Finish describing M_COPYIN message */
				cqp->cq_flag = 0;
				mp->b_datap->db_type = M_COPYIN;
				mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
				qreply(q, mp);
			}
			break;
		default: /* M_IOCTL not for us */
			/* if module, pass on */
			/* if driver, nak ioctl */
			break;
		} /* switch (iocbp->ioc_cmd) */
		break;
	case M_IOCDATA:
		/* all M_IOCDATA processing done here */
		xxioc(q, mp);
		break;
	}
	return (0);
}

The transparent part of the SET_ADDR M_IOCTL message processing requires the address structure to be copied from user address space. To accomplish this, xxwput issues an M_COPYIN request to the Stream head.

The mblk is reused and mapped into a copyreq(9S) structure. The user space address of bufadd is contained in the b_cont of the M_IOCTL mblk. This address and the size of the structure are copied into the copyreq(9S) message. The b_cont of the copy request mblk is not needed, so it is freed and then set to NULL.


Caution -

The layout of the iocblk, copyreq, and copyresp structures is different between 32-bit and 64-bit kernels. Be cautious of any data structure overloading in the cp_private or the cq_filler fields since alignment has changed.


On receipt of the M_IOCDATA message for the SET_ADDR command, the xxioc routine checks cp_rval. If an error occurred during the copyin operation, cp_rval is set. The mblk is then freed and, if necessary, xxioc cleans up from previous M_IOCTL requests, freeing memory, resetting state variables, and so on. The Stream head returns the appropriate error to the user.

If the cp_private field contains GETSTRUCT, the linked b_cont mblk contains a copy of the user's address structure. The example then copies in the buffer specified by address.ad_addr.

xxioc issues another M_COPYIN request to the Stream head, but this time cq_private contains GETADDR to indicate that the M_IOCDATA response will contain a copy of the buffer at address.ad_addr. The Stream head copies the information at the requested user address and sends it downstream in another, final M_IOCDATA message.

When the final M_IOCDATA message arrives from the Stream head, cp_private contains GETADDR. The ad_addr data is contained in the b_cont link of the mblk. If the address is successfully processed by xx_set_addr, the message is acknowledged with an M_IOCACK message; if xx_set_addr fails, the message is rejected with an M_IOCNAK message. xx_set_addr is a routine (not shown in the example) that processes the user address from the ioctl(2).

After the final M_IOCDATA message is processed, the module acknowledges the ioctl(2) to let the Stream head know that processing is complete. This is done by sending an M_IOCACK message upstream if the request was successfully processed. Always zero ioc_error; otherwise, a stale error code could be passed to the user application. ioc_rval and ioc_count are also zeroed to indicate a return value of 0 and that no data is to be passed up.

If the request cannot be processed, either an M_IOCNAK or an M_IOCACK can be sent upstream with an appropriate error number. Freeing the linked M_DATA block before sending an M_IOCNAK or M_IOCACK is not mandatory, because the Stream head frees it, but freeing it in the module is more efficient.

If ioc_error is set in an M_IOCACK or M_IOCNAK message, this error code is returned to the user. If no error code is set in an M_IOCNAK message, EINVAL is returned to the user.

static void
xxioc(queue_t *q, mblk_t *mp)		/* M_IOCDATA processing */
{
	struct iocblk *iocbp;
	struct copyreq *cqp;
	struct copyresp *csp;
	struct address *ap;

	/* the structures overlay the same message block */
	csp = (struct copyresp *)mp->b_rptr;
	iocbp = (struct iocblk *)mp->b_rptr;

	/* validate this M_IOCDATA is for this module */
	switch (csp->cp_cmd) {
	case SET_ADDR:
		if (csp->cp_rval) {	/* GETSTRUCT or GETADDR failed */
			freemsg(mp);
			return;
		}
		switch ((int)(uintptr_t)csp->cp_private) { /* determine state */
		case GETSTRUCT:		/* user structure has arrived */
			/* reuse M_IOCDATA block */
			mp->b_datap->db_type = M_COPYIN;
			mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
			cqp = (struct copyreq *)mp->b_rptr;
			/* user structure */
			ap = (struct address *)mp->b_cont->b_rptr;
			/* buffer length */
			cqp->cq_size = ap->ad_len;
			/* user space buffer address */
			cqp->cq_addr = ap->ad_addr;
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
			cqp->cq_flag = 0;
			cqp->cq_private = (mblk_t *)(uintptr_t)GETADDR; /* next state */
			qreply(q, mp);
			break;

		case GETADDR:		/* user address is here */
			/* hypothetical routine */
			if (xx_set_addr(mp->b_cont) == FAILURE) {
				mp->b_datap->db_type = M_IOCNAK;
				iocbp->ioc_error = EIO;
			} else {
				mp->b_datap->db_type = M_IOCACK; /* success */
				/* can have been overwritten */
				iocbp->ioc_error = 0;
				iocbp->ioc_count = 0;
				iocbp->ioc_rval = 0;
			}
			mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
			qreply(q, mp);
			break;

		default:	/* invalid state: can't happen */
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
			mp->b_datap->db_type = M_IOCNAK;
			mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
			/* can have been overwritten */
			iocbp->ioc_error = EINVAL;
			qreply(q, mp);
			break;
		}	/* switch (cp_private) */
		break;

	default:	/* M_IOCDATA not for us */
		/* if module, pass message on */
		/* if driver, free message */
		break;
	}	/* switch (cp_cmd) */
}

M_COPYOUT Example


Note -

See the copyout() section in Writing Device Drivers for information on the 64-bit data structure macros.


Example 8-8 returns option values for this Stream device by placing them in the user's options structure. This is done by a transparent ioctl(2) call of the form


struct options optadd;

ioctl(fd, GET_OPTIONS, (caddr_t) &optadd);

or by an I_STR call

	struct strioctl opts_strioctl;
	struct options optadd;

	opts_strioctl.ic_cmd = GET_OPTIONS;
	opts_strioctl.ic_timout = -1;
	opts_strioctl.ic_len = sizeof (struct options);
	opts_strioctl.ic_dp = (char *)&optadd;
	ioctl(fd, I_STR, (caddr_t) &opts_strioctl);

In the I_STR case, opts_strioctl.ic_dp points to the options structure, optadd.

Example 8-8 illustrates support of both the I_STR and transparent forms of ioctl(2). The transparent form requires a single M_COPYOUT message following receipt of the M_IOCTL to copyout the contents of the structure. xxwput is the write-side put procedure of module or driver xx.

This example first checks whether the ioctl(2) command is transparent. If it is, the message is reused as an M_COPYOUT copy request message. The pointer to the receiving buffer is in the linked message and is copied into cq_addr. Because only a single copyout is done, no state information needs to be stored in cq_private. The original linked message is freed in case it is not big enough to hold the options. As an optimization, the following test checks whether the linked message is large enough to be reused:


mp->b_cont->b_datap->db_lim - mp->b_cont->b_datap->db_base >=
    sizeof (struct options)
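The size test above can be wrapped in a reuse-or-replace helper. In this user-space sketch, struct mock_dblk, can_reuse(), and reuse_or_alloc() are hypothetical, and malloc()/free() stand in for allocb(9F)/freemsg(9F). A real module would typically also confirm that the data block is not shared (db_ref equal to 1) before writing into it; that check is not modeled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dblk fields used in the size test. */
struct mock_dblk {
	unsigned char	*db_base;	/* first byte of the data buffer */
	unsigned char	*db_lim;	/* one past the last usable byte */
};

/* The test from the text: can this buffer hold `need` bytes? */
static int
can_reuse(const struct mock_dblk *dbp, size_t need)
{
	return (size_t)(dbp->db_lim - dbp->db_base) >= need;
}

/*
 * Keep the existing buffer when it is big enough; otherwise replace it.
 * Returns 1 if reused, 0 if a new buffer was allocated, -1 on failure
 * (in which case the old buffer is left untouched).
 */
static int
reuse_or_alloc(struct mock_dblk *dbp, size_t need)
{
	unsigned char *nb;

	assert(dbp->db_lim >= dbp->db_base);
	if (can_reuse(dbp, need))
		return 1;
	if ((nb = malloc(need)) == NULL)
		return -1;
	free(dbp->db_base);
	dbp->db_base = nb;
	dbp->db_lim = nb + need;
	return 0;
}
```

Allocating before freeing means an allocation failure leaves the original buffer intact, so the caller can still send an M_IOCNAK with the message it already holds.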

A new linked message is then allocated to hold the option values. In the transparent case, the data in the message linked to the M_COPYOUT is passed to the Stream head. The Stream head copies the data to the user's address space and issues an M_IOCDATA in response to the M_COPYOUT message, which the module must acknowledge with an M_IOCACK message. ioc_error, ioc_count, and ioc_rval are cleared to prevent any stale data from being passed back to the Stream head.

If the message is not transparent (that is, it was issued through an I_STR ioctl(2)), the data is sent with the M_IOCACK acknowledgment message and copied into the buffer specified by the strioctl data structure.


Example 8-8 M_COPYOUT Example

struct options {		/* same members as in user space */
	int	op_one;
	int	op_two;
	short	op_three;
	long	op_four;
};

static int
xxwput(queue_t *q, mblk_t *mp)
{
	struct iocblk *iocbp;
	struct copyreq *cqp;
	struct copyresp *csp;
	int transparent = 0;

	switch (mp->b_datap->db_type) {
	/* ... other message types not shown ... */
	case M_IOCTL:
		iocbp = (struct iocblk *)mp->b_rptr;
		switch (iocbp->ioc_cmd) {
		case GET_OPTIONS:
			if (iocbp->ioc_count == TRANSPARENT) {
				transparent = 1;
				cqp = (struct copyreq *)mp->b_rptr;
				cqp->cq_size = sizeof (struct options);
				/* Get struct address from linked M_DATA block */
				cqp->cq_addr = *(caddr_t *)mp->b_cont->b_rptr;
				cqp->cq_flag = 0;
				/* No state necessary - we will only ever get one
				 * M_IOCDATA from the Stream head indicating success or
				 * failure for the copyout */
			}
			if (mp->b_cont)
				freemsg(mp->b_cont);
			if ((mp->b_cont =
			    allocb(sizeof (struct options), BPRI_MED)) == NULL) {
				mp->b_datap->db_type = M_IOCNAK;
				iocbp->ioc_error = EAGAIN;
				qreply(q, mp);
				break;
			}
			/* hypothetical routine: fills in the options and
			 * advances mp->b_cont->b_wptr */
			xx_get_options(mp->b_cont);
			if (transparent) {
				mp->b_datap->db_type = M_COPYOUT;
				mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
			} else {
				mp->b_datap->db_type = M_IOCACK;
				iocbp->ioc_count = sizeof (struct options);
			}
			qreply(q, mp);
			break;

		default: /* M_IOCTL not for us */
			/* if module, pass on; if driver, nak ioctl */
			break;
		} /* switch (iocbp->ioc_cmd) */
		break;

	case M_IOCDATA:
		csp = (struct copyresp *)mp->b_rptr;
		/* M_IOCDATA not for us */
		if (csp->cp_cmd != GET_OPTIONS) {
			/* if module, pass on; if driver, free message */
			break;
		}
		if (csp->cp_rval) {
			freemsg(mp);	/* failure */
			return (0);
		}
		/* Data successfully copied out, ack */

		/* reuse M_IOCDATA for ack */
		iocbp = (struct iocblk *)mp->b_rptr;
		mp->b_datap->db_type = M_IOCACK;
		mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
		/* can have been overwritten */
		iocbp->ioc_error = 0;
		iocbp->ioc_count = 0;
		iocbp->ioc_rval = 0;
		qreply(q, mp);
		break;
	/* ... other message types not shown ... */
	} /* switch (mp->b_datap->db_type) */
	return (0);
}

Bidirectional Transfer Example

Example 8-9 illustrates bidirectional data transfer between the kernel and application during transparent ioctl(2) processing. It also shows how to use more complex state information.

The user wants to send and receive data from user buffers as part of a transparent ioctl(2) call of the form:

	ioctl(fd, XX_IOCTL, (caddr_t) &addr_xxdata);

Three pairs of messages are required following the M_IOCTL message: the first to copy in the structure, the second to copy in one user buffer, and the third to copy out the second user buffer. xxwput is the write-side put procedure for module or driver xx:


Example 8-9 Bidirectional Transfer

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/errno.h>

/* XX_IOCTL is assumed to come from a header shared with the application */

struct xxdata {             /* same members in user space */
   int         x_inlen;     /* number of bytes copied in */
   caddr_t     x_inaddr;    /* buf addr of data copied in */
   int         x_outlen;    /* number of bytes copied out */
   caddr_t     x_outaddr;   /* buf addr of data copied out */
};
/* State information for ioctl processing */
struct state {
   int            st_state;   /* see below */
   struct xxdata  st_data;    /* see above */
};
/* state values */

#define GETSTRUCT    0   /* get xxdata structure */
#define GETINDATA    1   /* get data from x_inaddr */
#define PUTOUTDATA   2   /* get response from M_COPYOUT */

static void xxioc(queue_t *q, mblk_t *mp);

static int
xxwput(queue_t *q, mblk_t *mp)
{
	struct iocblk *iocbp;
	struct copyreq *cqp;
	struct state *stp;
	mblk_t *tmp;

	switch (mp->b_datap->db_type) {
	/* ... other message types ... */

	case M_IOCTL:
		iocbp = (struct iocblk *)mp->b_rptr;
		switch (iocbp->ioc_cmd) {
		case XX_IOCTL:
			/*
			 * Do non-transparent processing here. (See I_STR ioctl
			 * processing discussed in the previous section.)
			 */

			/* Reuse M_IOCTL block for M_COPYIN request */
			cqp = (struct copyreq *)mp->b_rptr;

			/* Get structure's user address from linked M_DATA block */
			cqp->cq_addr = *(caddr_t *)mp->b_cont->b_rptr;
			freemsg(mp->b_cont);
			mp->b_cont = NULL;

			/* Allocate state buffer */
			if ((tmp = allocb(sizeof (struct state), BPRI_MED)) == NULL) {
				mp->b_datap->db_type = M_IOCNAK;
				iocbp->ioc_count = 0;	/* overwritten by cq_addr */
				iocbp->ioc_error = EAGAIN;
				qreply(q, mp);
				break;
			}
			tmp->b_wptr += sizeof (struct state);
			stp = (struct state *)tmp->b_rptr;
			stp->st_state = GETSTRUCT;
			cqp->cq_private = tmp;

			/* Finish describing M_COPYIN message */
			cqp->cq_size = sizeof (struct xxdata);
			cqp->cq_flag = 0;
			mp->b_datap->db_type = M_COPYIN;
			mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
			qreply(q, mp);
			break;

		default: /* M_IOCTL not for us */
			/* if module, pass on */
			/* if driver, nak ioctl */
			break;

		} /* switch (iocbp->ioc_cmd) */
		break;

	case M_IOCDATA:
		xxioc(q, mp);	/* all M_IOCDATA processing here */
		break;

	/* ... other message types ... */
	} /* switch (mp->b_datap->db_type) */
	return (0);
}

xxwput allocates a message block to contain the state structure and reuses the M_IOCTL to create an M_COPYIN message to read in the xxdata structure.

M_IOCDATA processing is done in xxioc():

#include <sys/ddi.h>	/* min(9F) */
#include <sys/debug.h>	/* ASSERT */

/* hypothetical: consumes the input M_DATA, returns output (or NULL) */
static mblk_t *xx_indata(mblk_t *bp);

static void
xxioc(queue_t *q, mblk_t *mp)		/* M_IOCDATA processing */
{
	struct iocblk *iocbp;
	struct copyreq *cqp;
	struct copyresp *csp;
	struct state *stp;
	mblk_t *statemp;

	csp = (struct copyresp *)mp->b_rptr;
	iocbp = (struct iocblk *)mp->b_rptr;
	switch (csp->cp_cmd) {

	case XX_IOCTL:
		statemp = csp->cp_private;	/* save before fields are reused */
		if (csp->cp_rval) {	/* failure */
			if (statemp)	/* state structure */
				freemsg(statemp);
			freemsg(mp);
			return;
		}
		stp = (struct state *)statemp->b_rptr;
		switch (stp->st_state) {

		case GETSTRUCT:	/* xxdata structure copied in */
			/* save structure */
			stp->st_data = *(struct xxdata *)mp->b_cont->b_rptr;
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
			/* Reuse M_IOCDATA to copyin data */
			mp->b_datap->db_type = M_COPYIN;
			mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
			cqp = (struct copyreq *)mp->b_rptr;
			cqp->cq_size = stp->st_data.x_inlen;
			cqp->cq_addr = stp->st_data.x_inaddr;
			cqp->cq_flag = 0;
			cqp->cq_private = statemp;	/* keep the state attached */
			stp->st_state = GETINDATA;	/* next state */
			qreply(q, mp);
			break;

		case GETINDATA:	/* data successfully copied in */
			/* Process input, return output (hypothetical routine) */
			if ((mp->b_cont = xx_indata(mp->b_cont)) == NULL) {
				/* xx_indata failed */
				freemsg(statemp);	/* state structure */
				mp->b_datap->db_type = M_IOCNAK;
				mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
				iocbp->ioc_count = 0;
				iocbp->ioc_rval = 0;
				iocbp->ioc_error = EIO;
				qreply(q, mp);
				break;
			}
			mp->b_datap->db_type = M_COPYOUT;
			mp->b_wptr = mp->b_rptr + sizeof (struct copyreq);
			cqp = (struct copyreq *)mp->b_rptr;
			cqp->cq_size = min(msgdsize(mp->b_cont),
			    stp->st_data.x_outlen);
			cqp->cq_addr = stp->st_data.x_outaddr;
			cqp->cq_flag = 0;
			cqp->cq_private = statemp;	/* keep the state attached */
			stp->st_state = PUTOUTDATA;	/* next state */
			qreply(q, mp);
			break;

		case PUTOUTDATA:	/* data copied out, ack ioctl */
			freemsg(statemp);	/* state structure */
			mp->b_datap->db_type = M_IOCACK;
			mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
			/* these fields can have been overwritten */
			iocbp->ioc_error = 0;
			iocbp->ioc_count = 0;
			iocbp->ioc_rval = 0;
			qreply(q, mp);
			break;

		default:	/* invalid state: can't happen */
			ASSERT(0);
			/* statemp is deliberately not freed: it might not be valid */
			if (mp->b_cont != NULL) {
				freemsg(mp->b_cont);
				mp->b_cont = NULL;
			}
			mp->b_datap->db_type = M_IOCNAK;
			mp->b_wptr = mp->b_rptr + sizeof (struct iocblk);
			iocbp->ioc_count = 0;
			iocbp->ioc_error = EINVAL;
			qreply(q, mp);
			break;
		} /* switch (stp->st_state) */
		break;

	default: /* M_IOCDATA not for us */
		/* if module, pass message on */
		/* if driver, free message */
		break;
	} /* switch (csp->cp_cmd) */
}

At case GETSTRUCT, the user xxdata structure is copied into the module's state structure (pointed to by cp_private in the message) and the M_IOCDATA message is reused to create a second M_COPYIN message to read the user data. At case GETINDATA, the input user data is processed by xx_indata (not supplied in the example), which frees the linked M_DATA block and returns the output data message block. The M_IOCDATA message is reused to create an M_COPYOUT message to write the user data. At case PUTOUTDATA, the message block containing the state structure is freed and an acknowledgment is sent upstream.

Take care in the "can't happen" default case: the message block containing the state structure (cp_private) is not returned to the pool, because the pointer might not be valid. This can result in a lost block. The ASSERT helps find errors in the module if a "can't happen" condition occurs.
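The state machine that xxioc implements can be exercised outside the kernel with a small model. The sketch below is hypothetical: struct xx_model, xx_step, and the S_* and R_* enums are invented names. It starts after xxwput has sent the first M_COPYIN, a nonzero cp_rval models a failed copy just as in the driver, and a produced size of 0 models xx_indata failing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the driver's state values and replies. */
enum xx_state { S_GETSTRUCT, S_GETINDATA, S_PUTOUTDATA, S_DONE };
enum xx_reply { R_COPYIN, R_COPYOUT, R_IOCACK, R_IOCNAK, R_FREE };

struct xx_model {
	enum xx_state state;      /* st_state */
	int           have_state; /* 1 while the state mblk is allocated */
	size_t        outlen;     /* st_data.x_outlen */
};

/*
 * Handle one M_IOCDATA. cp_rval != 0 means the Stream head's copy
 * failed. produced is the size of the output xx_indata would build.
 * For R_COPYOUT, *cq_size receives the number of bytes to copy out.
 */
enum xx_reply
xx_step(struct xx_model *m, int cp_rval, size_t produced, size_t *cq_size)
{
	assert(m->have_state);            /* xxwput allocated it */

	if (cp_rval != 0) {               /* copy failed: free state and msg */
		m->have_state = 0;
		m->state = S_DONE;
		return R_FREE;
	}
	switch (m->state) {
	case S_GETSTRUCT:                 /* xxdata in: request the input */
		m->state = S_GETINDATA;
		return R_COPYIN;
	case S_GETINDATA:                 /* input in: request the copyout */
		if (produced == 0) {          /* processing failed: NAK */
			m->have_state = 0;
			m->state = S_DONE;
			return R_IOCNAK;
		}
		*cq_size = produced < m->outlen ? produced : m->outlen;
		m->state = S_PUTOUTDATA;
		return R_COPYOUT;
	case S_PUTOUTDATA:                /* output copied out: ack */
		m->have_state = 0;
		m->state = S_DONE;
		return R_IOCACK;
	default:                          /* can't happen */
		return R_IOCNAK;
	}
}
```

A successful run yields M_COPYIN, M_COPYOUT, and finally M_IOCACK, with the copyout size clipped to x_outlen; every terminal path releases the state block exactly once.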

I_LIST ioctl(2) Example

The I_LIST ioctl(2) lists the driver and modules in a Stream.

(Available as I-LIST2 file)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stropts.h>
#include <sys/types.h>
#include <sys/socket.h>

int
main(void)
{
	int               s, i;
	int               mods;
	struct str_list   mod_list;
	struct str_mlist *mlist;

	/* Get a socket... */
	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		perror("socket");
		exit(1);
	}

	/* Determine the number of modules in the stream. */
	if ((mods = ioctl(s, I_LIST, 0)) < 0) {
		perror("I_LIST ioctl");
		exit(1);
	}
	if (mods == 0) {
		printf("No modules\n");
		exit(1);
	} else {
		printf("%d modules\n", mods);
	}

	/* Allocate memory for all of the module names... */
	mlist = (struct str_mlist *)calloc(mods, sizeof (struct str_mlist));
	if (mlist == NULL) {
		perror("calloc failure");
		exit(1);
	}
	mod_list.sl_modlist = mlist;
	mod_list.sl_nmods = mods;

	/* Do the ioctl and get the module names. */
	if (ioctl(s, I_LIST, &mod_list) < 0) {
		perror("I_LIST ioctl fetch");
		exit(1);
	}

	/* Print out the names of the modules actually returned */
	for (i = 0; i < mod_list.sl_nmods; i++) {
		printf("%d: %s\n", i, mod_list.sl_modlist[i].l_name);
	}

	free(mlist);
	close(s);
	exit(0);
}

Flush Handling

All modules and drivers are expected to handle M_FLUSH messages. An M_FLUSH message can originate at the Stream head, in a module, or in a driver. The user can cause data to be flushed from queued messages of a Stream by submitting an I_FLUSH ioctl(2). Data can be flushed from the read side, write side, or both sides of a Stream.

ioctl(fd, I_FLUSH, arg);

The first byte of the M_FLUSH message is an option flag that can have values described in Table 8-1.

Table 8-1 M_FLUSH Arguments and bi_flag Values

Flag         Description
FLUSHR       Flush read side of Stream
FLUSHW       Flush write side of Stream
FLUSHRW      Flush both read and write sides of Stream
FLUSHBAND    Flush a specified priority band only

Flushing Priority Bands

In addition to being able to flush all the data from a queue, a specific band can be flushed using the I_FLUSHBAND ioctl(2).



ioctl(fd, I_FLUSHBAND, bandp); 

The ioctl(2) is passed a pointer to a bandinfo structure. The bi_pri field indicates the band priority to be flushed (from 0 to 255). The bi_flag field indicates the type of flushing to be done. The legal values for bi_flag are defined in Table 8-1. bandinfo has the following format:


struct bandinfo {
		unsigned char       bi_pri;
		int                 bi_flag;
};

See "M_FLUSH" for details on how modules and drivers should handle flush band requests.
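A band flush travels as an M_FLUSH message whose first byte carries the flags with FLUSHBAND set and whose second byte carries the band number, which is the layout the priority-band flush code in this chapter reads. The user-space sketch below builds and decodes that two-byte payload. The XX_FLUSH* values are local copies chosen to match the Solaris <sys/stropts.h> flags, an assumption that lets the example compile anywhere; struct xx_bandinfo, xx_encode_flushband, and xx_decode_flush are hypothetical names.

```c
#include <assert.h>

/* Local copies of the flush flags (assumed to match <sys/stropts.h>). */
#define XX_FLUSHR     0x01   /* flush read side */
#define XX_FLUSHW     0x02   /* flush write side */
#define XX_FLUSHRW    0x03   /* flush both sides */
#define XX_FLUSHBAND  0x04   /* flush one priority band only */

struct xx_bandinfo {          /* same shape as struct bandinfo */
	unsigned char bi_pri;     /* band to flush, 0..255 */
	int           bi_flag;    /* XX_FLUSHR, XX_FLUSHW, or XX_FLUSHRW */
};

/* Fill buf[0..1] as an M_FLUSH payload for a band flush; return its length. */
int
xx_encode_flushband(const struct xx_bandinfo *bi, unsigned char buf[2])
{
	assert((bi->bi_flag & ~XX_FLUSHRW) == 0);   /* only R/W bits allowed */
	buf[0] = (unsigned char)(bi->bi_flag | XX_FLUSHBAND);
	buf[1] = bi->bi_pri;
	return 2;
}

/* Decode a payload: which sides to flush, and which band (-1 = all). */
void
xx_decode_flush(const unsigned char *buf, int *rd, int *wr, int *band)
{
	*rd = (buf[0] & XX_FLUSHR) != 0;
	*wr = (buf[0] & XX_FLUSHW) != 0;
	*band = (buf[0] & XX_FLUSHBAND) ? buf[1] : -1;   /* byte 2 only if set */
}
```

A plain I_FLUSH message needs only the first byte; the decoder reads the second byte only when FLUSHBAND says it is present.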

Figure 8-1 and Figure 8-2 further demonstrate flushing the entire Stream due to a line break. Figure 8-1 shows the flushing of the write-side of a Stream, and Figure 8-2 shows the flushing of the read-side of a Stream. In the figures, dotted boxes indicate flushed queues.

Figure 8-1 Flushing the Write-Side of a Stream

The following takes place (dotted lines mean flushed queues):

  1. A break is detected by a driver.

  2. The driver generates an M_BREAK message and sends it upstream.

  3. The module translates the M_BREAK into an M_FLUSH message with FLUSHW set, then sends it upstream.

  4. The Stream head does not flush the write queue (no messages are ever queued there).

  5. The Stream head turns the message around (sends it down the write-side).

  6. The module flushes its write queue.

  7. The message is passed downstream.

  8. The driver flushes its write queue and frees the message.

Figure 8-2 shows flushing the read-side of a Stream.

Figure 8-2 Flushing the Read-Side of a Stream

The events taking place are:

  1. After generating the first M_FLUSH message, the module generates an M_FLUSH with FLUSHR set and sends it downstream.

  2. The driver flushes its read queue.

  3. The driver turns the message around (sends it up the read-side).

  4. The module flushes its read queue.

  5. The message is passed upstream.

  6. The Stream head flushes the read queue and frees the message.

The following code shows line discipline module flush handling.

static int
ld_put(
	queue_t *q,	/* pointer to read/write queue */
	mblk_t *mp)	/* pointer to message being passed */
{
	switch (mp->b_datap->db_type) {
	default:
		putq(q, mp);	/* queue everything */
		return (0);	/* except flush */

	case M_FLUSH:
		if (*mp->b_rptr & FLUSHW)	/* flush write q */
			flushq(WR(q), FLUSHDATA);

		if (*mp->b_rptr & FLUSHR)	/* flush read q */
			flushq(RD(q), FLUSHDATA);

		putnext(q, mp);	/* pass it on */
		return (0);
	}
}

The Stream head turns the M_FLUSH message around if FLUSHW is set, clearing FLUSHR first. A driver turns M_FLUSH around if FLUSHR is set, and should mask off FLUSHW first.
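These two turnaround rules are what keep a flush from bouncing forever. The user-space model below replays Figure 8-1 and Figure 8-2 on a three-component Stream (Stream head, one module, driver) that tracks only how many messages each queue holds. It is a sketch under stated assumptions: the component layout, struct xx_comp, and xx_flush_travel are hypothetical, and XX_FLUSHR and XX_FLUSHW are local stand-ins for the real flags.

```c
#include <assert.h>

#define XX_FLUSHR 0x01   /* stand-in for FLUSHR */
#define XX_FLUSHW 0x02   /* stand-in for FLUSHW */

enum { HEAD, MOD, DRV, NCOMP };   /* index 0 is nearest the user */

struct xx_comp {
	int rq;   /* messages queued on the read side */
	int wq;   /* messages queued on the write side */
};

/*
 * Deliver an M_FLUSH carrying flags to component `at`, travelling
 * upstream (up != 0) or downstream. Each component flushes the queues
 * named in flags; the Stream head and driver apply the turnaround
 * rules. Returns the number of components that saw the message.
 */
int
xx_flush_travel(struct xx_comp c[NCOMP], int at, int up, int flags)
{
	int hops = 0;

	for (;;) {
		hops++;
		if (flags & XX_FLUSHW)
			c[at].wq = 0;
		if (flags & XX_FLUSHR)
			c[at].rq = 0;

		if (at == HEAD && up) {
			if (!(flags & XX_FLUSHW))
				return hops;          /* Stream head frees it */
			flags &= ~XX_FLUSHR;      /* turn around, down the write side */
			up = 0;
		} else if (at == DRV && !up) {
			if (!(flags & XX_FLUSHR))
				return hops;          /* driver frees it */
			flags &= ~XX_FLUSHW;      /* turn around, up the read side */
			up = 1;
		}
		at += up ? -1 : 1;            /* pass it on */
	}
}
```

Starting at the Stream head going up with FLUSHW reproduces Figure 8-1: every write queue empties and the read queues are untouched. Starting at the driver going down with FLUSHR reproduces Figure 8-2.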

Flushing Priority Band

The bi_flag field is one of FLUSHR, FLUSHW, or FLUSHRW.

The following example shows flushing according to the priority band.

queue_t *rdq;	/* read queue of this module or driver */
queue_t *wrq;	/* write queue of this module or driver */

	case M_FLUSH:
		if (*bp->b_rptr & FLUSHBAND) {
			/* second byte holds the band; flushband(q, pri, flag) */
			if (*bp->b_rptr & FLUSHW)
				flushband(wrq, *(bp->b_rptr + 1), FLUSHDATA);
			if (*bp->b_rptr & FLUSHR)
				flushband(rdq, *(bp->b_rptr + 1), FLUSHDATA);
		} else {
			if (*bp->b_rptr & FLUSHW)
				flushq(wrq, FLUSHDATA);
			if (*bp->b_rptr & FLUSHR)
				flushq(rdq, FLUSHDATA);
		}
		/*
		 * modules pass the message on;
		 * drivers shut off FLUSHW and loop the message
		 * up the read-side if FLUSHR is set; otherwise,
		 * drivers free the message.
		 */
		break;

Note that modules and drivers are not required to treat messages as flowing in separate bands. A module or driver can instead view the queue as having only two bands of flow, normal and high priority. However, a module or driver that takes this simpler view flushes the entire queue whenever it receives an M_FLUSH message.
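The cost of the simpler two-band view shows up clearly in a toy queue. This user-space sketch contrasts flushing one band with flushing everything: a band-aware module keeps the messages in other bands, while a band-unaware one discards them all. struct xx_msg, xx_flush_band, and xx_flush_all are hypothetical names, and only ordinary data messages are modeled.

```c
#include <assert.h>

struct xx_msg {
	unsigned char band;   /* priority band, 0 = normal */
};

/* Remove messages in band pri, keeping the rest in order; return new length. */
int
xx_flush_band(struct xx_msg *q, int n, unsigned char pri)
{
	int kept = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (q[i].band != pri)
			q[kept++] = q[i];
	return kept;
}

/* A module that ignores bands simply empties the queue. */
int
xx_flush_all(struct xx_msg *q, int n)
{
	(void)q;
	(void)n;
	return 0;
}
```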

One use of the b_flag field of the msgb structure is to give the Stream head a way to stop M_FLUSH messages from being reflected forever when the Stream is used as a pipe. When the Stream head receives an M_FLUSH message, it sets the MSGNOLOOP flag in the b_flag field before reflecting the message down the write-side of the Stream. If the Stream head receives an M_FLUSH message with this flag already set, it frees the message rather than reflecting it.
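In a pipe, whatever one Stream head sends down its write side arrives at the other Stream head, so without MSGNOLOOP a FLUSHW message would bounce between the two ends indefinitely. The user-space sketch below counts how many times a Stream head receives the message, with and without the flag. XX_MSGNOLOOP and the other XX_ values are local stand-ins, and xx_pipe_flush is a hypothetical name; max_hops only exists to stop the no-flag case.

```c
#include <assert.h>

#define XX_FLUSHR     0x01   /* stand-in for FLUSHR */
#define XX_FLUSHW     0x02   /* stand-in for FLUSHW */
#define XX_FLUSHRW    0x03   /* stand-in for FLUSHRW */
#define XX_MSGNOLOOP  0x01   /* stand-in for the b_flag bit */

/*
 * Deliver an M_FLUSH to one end of a pipe and follow it until it is
 * freed. Returns how many times a Stream head received it, or
 * max_hops if it would have looped forever.
 */
int
xx_pipe_flush(int flags, int use_noloop, int max_hops)
{
	int b_flag = 0;
	int received = 0;

	assert(max_hops > 0);
	while (received < max_hops) {
		received++;                          /* a Stream head gets it */
		if (use_noloop && (b_flag & XX_MSGNOLOOP))
			return received;                 /* already reflected: free */
		if (!(flags & XX_FLUSHW))
			return received;                 /* no write flush: free */
		flags &= ~XX_FLUSHR;                 /* read side already flushed */
		if (use_noloop)
			b_flag |= XX_MSGNOLOOP;          /* mark, then reflect */
		/* reflected down the write side to the other Stream head */
	}
	return received;
}
```

With the flag, a FLUSHRW message is seen by exactly two Stream heads; without it, the loop only ends at max_hops.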

Figure 8-3 Interfaces Affecting Drivers

The set of STREAMS utilities available to drivers is listed in Appendix B, STREAMS Utilities. These utilities do not permit system-defined macros that manipulate global kernel data or introduce structure-size dependencies. As a result, some utilities that were implemented as macros in prior Solaris system releases are implemented as functions in the SunOS 5 system. This does not preclude the existence of both macro and function versions of these utilities: the intent is that driver source code includes a header file that picks up the function declarations, while the core operating system source includes a header file that defines the macros. With the DKI interface, the following STREAMS utilities are implemented as C programming language functions: datamsg(9F), OTHERQ(9F), putnext(9F), RD(9F), and WR(9F).

Replacing macros such as RD with function equivalents in the driver source code allows driver objects to be insulated from changes in the data structures and their size, increasing the useful lifetime of driver source code and objects. Multithreaded drivers are also protected against changes in implementation-specific STREAMS synchronization.

The DKI defines an interface suitable for drivers, and drivers have no need to access global kernel data structures directly; the kernel function drv_getparm(9F) fetches information from these structures. This restriction has an important consequence: because drivers are not permitted to access global kernel data structures directly, changes in the contents or offsets of information within these structures do not break driver objects.

Driver and Module Service Interfaces

STREAMS provides the means to implement a service interface between any two components in a Stream, and between a user process and the topmost module in the Stream. A service interface is defined at the boundary between a service user and a service provider (see Figure 8-4). A service interface is a set of primitives together with the rules that define a service and the allowable state transitions that result as these primitives are passed between the user and the provider. These rules are typically represented by a state machine. In STREAMS, the service user and provider are implemented in a module, driver, or user process. The primitives are carried bidirectionally between a service user and provider in M_PROTO and M_PCPROTO messages.

PROTO messages (M_PROTO and M_PCPROTO) can be multiblock, with the second through last blocks of type M_DATA. The first block in a PROTO message contains the control part of the primitive in a form agreed upon by the user and provider. The block is not intended to carry protocol headers. (Although its use is not recommended, upstream PROTO messages can have multiple PROTO blocks at the start of the message. getmsg(2) compacts the blocks into a single control part when sending to a user process.) The M_DATA block contains any data part associated with the primitive. The data part can be processed in a module that receives it, or it can be sent to the next Stream component, along with any data generated by the module. The contents of PROTO messages and their allowable sequences are determined by the service interface specification.

PROTO messages can be sent bidirectionally (upstream and downstream) on a Stream and between a Stream and a user process. The putmsg(2) and getmsg(2) system calls are analogous to write(2) and read(2), respectively, except that putmsg(2) and getmsg(2) allow the data and control parts to be passed separately, and they retain message boundaries across the user-Stream interface. putmsg(2) and getmsg(2) separately copy the control part (M_PROTO or M_PCPROTO block) and data part (M_DATA blocks) between the Stream and user process.

An M_PCPROTO message is normally used to acknowledge primitives composed of other messages. M_PCPROTO ensures that the acknowledgment reaches the service user before any other message. If the service user is a user process, the Stream head will only store a single M_PCPROTO message, and discard subsequent M_PCPROTO messages until the first one is read with getmsg(2).
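The Stream head's single-message rule for M_PCPROTO can be pictured as one slot that getmsg(2) empties. The user-space sketch below models only that rule; struct xx_head, xx_head_pcproto, and xx_head_getmsg are hypothetical names, and the control part is reduced to a short string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the Stream head's single M_PCPROTO slot. */
struct xx_head {
	int  pc_held;        /* 1 while an M_PCPROTO is waiting */
	char pc_ctl[32];     /* its control part */
	int  discarded;      /* M_PCPROTO messages dropped while full */
};

/* An M_PCPROTO arrives from downstream. Returns 1 if kept, 0 if dropped. */
int
xx_head_pcproto(struct xx_head *h, const char *ctl)
{
	assert(ctl != NULL);
	if (h->pc_held) {
		h->discarded++;
		return 0;
	}
	strncpy(h->pc_ctl, ctl, sizeof(h->pc_ctl) - 1);
	h->pc_ctl[sizeof(h->pc_ctl) - 1] = '\0';
	h->pc_held = 1;
	return 1;
}

/* getmsg(2) reads the held message and frees the slot; NULL if empty. */
const char *
xx_head_getmsg(struct xx_head *h)
{
	if (!h->pc_held)
		return NULL;
	h->pc_held = 0;
	return h->pc_ctl;
}
```

This is why an acknowledgment sent as M_PCPROTO should be read promptly: a second one arriving before getmsg(2) is lost.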

Figure 8-4 Protocol Substitution

By defining a service interface through which applications interact with a transport protocol, you can substitute a different protocol below the service interface in a way that is completely transparent to the application. In Figure 8-4, the same application can run over the Transmission Control Protocol (TCP) and the ISO transport protocol. Of course, the service interface must define a set of services common to both protocols.

The three components of any service interface are the service user, the service provider, and the service interface itself, as seen in Figure 8-5.

Figure 8-5 Service Interface

Typically, an application makes requests of a service provider using some well-defined service primitive. Responses and event indications are also passed from the provider to the user using service primitives.

Each service interface primitive is a distinct STREAMS message that has two parts: a control part and a data part. The control part contains information that identifies the primitive and includes all necessary parameters. The data part contains user data associated with that primitive.

An example of a service interface primitive is a transport protocol connect request. This primitive requests the transport protocol service provider to establish a connection with another transport user. The parameters associated with this primitive can include a destination protocol address and specific protocol options to be associated with that connection. Some transport protocols also allow a user to send data with the connect request. A STREAMS message would be used to define this primitive. The control part would identify the primitive as a connect request and would include the protocol address and options. The data part would contain the associated user data.

Service Interface Library Example

The service interface library example presented here includes four functions that let a user do the following:

Five primitives are defined. The first two represent requests from the service user to the service provider. These are:

BIND_REQ

This request asks the provider to bind a specified protocol address. It requires an acknowledgment from the provider to verify that the contents of the request were syntactically correct.

UNITDATA_REQ

This request asks the provider to send data to the specified destination address. It does not require an acknowledgment from the provider.

The three other primitives represent acknowledgments of requests, or indications of incoming events, and are passed from the service provider to the service user.

OK_ACK

This primitive informs the user that a previous bind request was received successfully by the service provider.

ERROR_ACK

This primitive informs the user that a nonfatal error was found in the previous bind request. It indicates that no action was taken with the primitive that caused the error.

UNITDATA_IND

This primitive indicates that data destined for the user has arrived.

The defined structures describe the contents of the control part of each service interface message passed between the service user and service provider. The first field of each control part defines the type of primitive being passed.

Module Service Interface Example

The following code is part of a module that illustrates the concept of a service interface. The module implements a simple service interface and mirrors the service interface library example. The following rules pertain to service interfaces.

Declarations

The service interface primitives are defined in the declarations:

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/errno.h>

/* Primitives initiated by the service user */

#define BIND_REQ                      1    /* bind request */
#define UNITDATA_REQ                  2    /* unitdata request */

 /* Primitives initiated by the service provider */

#define OK_ACK                        3    /* bind acknowledgment */
#define ERROR_ACK                     4    /* error acknowledgment */
#define UNITDATA_IND                  5    /* unitdata indication */
/*
 * The following structures define the format of the
 * stream message block of the above primitives.
 */
struct bind_req {                       /* bind request */
   t_scalar_t    PRIM_type;             /* always BIND_REQ */
   t_uscalar_t   BIND_addr;             /* addr to bind	*/
};
struct unitdata_req {                   /* unitdata request */
   t_scalar_t    PRIM_type;             /* always UNITDATA_REQ */
   t_scalar_t    DEST_addr;             /* dest addr */
};
struct ok_ack {                         /* ok acknowledgment */
   t_scalar_t    PRIM_type;             /* always OK_ACK */
};
struct error_ack {                      /* error acknowledgment */
   t_scalar_t    PRIM_type;             /* always ERROR_ACK */
   t_scalar_t    UNIX_error;            /* UNIX system error code*/
};
struct unitdata_ind {                   /* unitdata indication */
   t_scalar_t    PRIM_type;             /* always UNITDATA_IND */
   t_scalar_t    SRC_addr;              /* source addr */
};

union primitives {                      /* union of all primitives */
   t_scalar_t                type;      /* same type as each PRIM_type */
   struct bind_req           bind_req;
   struct unitdata_req       unitdata_req;
   struct ok_ack             ok_ack;
   struct error_ack          error_ack;
   struct unitdata_ind       unitdata_ind;
};
struct dgproto {                        /* per-minor-device structure */
   short state;                         /* current provider state */
   long addr;                           /* net address */
};

/* Provider states */
#define IDLE 0
#define BOUND 1

In general, the M_PROTO or M_PCPROTO block is described by a data structure containing the service interface information. In this example, union primitives is that structure.

The module recognizes two commands:

BIND_REQ

Give this Stream a protocol address (for example, give it a name on the network). After a BIND_REQ is completed, data from other senders will find their way through the network to this particular Stream.

UNITDATA_REQ

Send data to the specified address.

The module generates three messages:

OK_ACK

A positive acknowledgment (ack) of BIND_REQ.

ERROR_ACK

A negative acknowledgment (nak) of BIND_REQ.

UNITDATA_IND

Data from the network has been received.

The acknowledgment of a BIND_REQ informs the user that the request was syntactically correct (or incorrect, if ERROR_ACK is returned). The receipt of a BIND_REQ is acknowledged with an M_PCPROTO message to ensure that the acknowledgment reaches the user before any other message. For example, if a UNITDATA_IND arrived before the bind was completed, the application would be confused.

The driver uses a per-minor device data structure, dgproto, which contains the following:

state

Current state of the service provider: IDLE or BOUND

addr

Network address that has been bound to this Stream

It is assumed (though not shown) that the module open procedure sets the write queue q_ptr to point at the appropriate private data structure.

Service Interface Procedure

The write put procedure is:

static int chkaddr(t_uscalar_t addr);	/* hypothetical: 0 if legal, else an errno */

static int
protowput(queue_t *q, mblk_t *mp)
{
	union primitives *proto;
	struct dgproto *dgproto;
	int err;

	dgproto = (struct dgproto *)q->q_ptr;	/* priv data struct */
	switch (mp->b_datap->db_type) {
	default:
		/* don't understand it */
		if (mp->b_cont != NULL) {	/* keep only the first block */
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
		}
		mp->b_datap->db_type = M_ERROR;
		mp->b_rptr = mp->b_wptr = mp->b_datap->db_base;
		*mp->b_wptr++ = EPROTO;
		qreply(q, mp);
		break;

	case M_FLUSH:	/* standard flush handling goes here ... */
		break;

	case M_PROTO:
		/* Protocol message -> user request */
		proto = (union primitives *)mp->b_rptr;
		switch (proto->type) {
		default:
			if (mp->b_cont != NULL) {
				freemsg(mp->b_cont);
				mp->b_cont = NULL;
			}
			mp->b_datap->db_type = M_ERROR;
			mp->b_rptr = mp->b_wptr = mp->b_datap->db_base;
			*mp->b_wptr++ = EPROTO;
			qreply(q, mp);
			return (0);

		case BIND_REQ:
			if (dgproto->state != IDLE) {
				err = EINVAL;
				goto error_ack;
			}
			if (mp->b_wptr - mp->b_rptr != sizeof (struct bind_req)) {
				err = EINVAL;
				goto error_ack;
			}
			if ((err = chkaddr(proto->bind_req.BIND_addr)) != 0)
				goto error_ack;
			dgproto->state = BOUND;
			dgproto->addr = proto->bind_req.BIND_addr;
			mp->b_datap->db_type = M_PCPROTO;
			proto->type = OK_ACK;
			mp->b_wptr = mp->b_rptr + sizeof (struct ok_ack);
			qreply(q, mp);
			break;
		error_ack:
			mp->b_datap->db_type = M_PCPROTO;
			proto->type = ERROR_ACK;
			proto->error_ack.UNIX_error = err;
			mp->b_wptr = mp->b_rptr + sizeof (struct error_ack);
			qreply(q, mp);
			break;

		case UNITDATA_REQ:
			if (dgproto->state != BOUND)
				goto bad;
			if (mp->b_wptr - mp->b_rptr != sizeof (struct unitdata_req))
				goto bad;
			if (chkaddr(proto->unitdata_req.DEST_addr) != 0)
				goto bad;
			putq(q, mp);
			/* start device or mux output ... */
			break;
		bad:
			freemsg(mp);
			break;
		} /* switch (proto->type) */
		break;
	} /* switch (mp->b_datap->db_type) */
	return (0);
}

The write put procedure switches on the message type. The only types accepted are M_FLUSH and M_PROTO. For M_FLUSH messages, the driver performs the canonical flush handling (not shown). For M_PROTO messages, the driver assumes the message block contains a union primitives structure and switches on the type field. Two types are understood: BIND_REQ and UNITDATA_REQ.

For a BIND_REQ, the current state is checked; it must be IDLE. Next, the message size is checked. If it is the correct size, the passed-in address is verified for legality by calling chkaddr. If everything checks, the incoming message is converted into an OK_ACK and sent upstream. If there was any error, the incoming message is converted into an ERROR_ACK and sent upstream.

For UNITDATA_REQ, the state is also checked; it must be BOUND. As above, the message size and destination address are checked. If there is any error, the message is discarded. If all is well, the message is put in the queue, and the lower half of the driver is started.

If the write put procedure receives a message type that it does not understand, either a bad b_datap->db_type or bad proto->type, the message is converted into an M_ERROR message and is then sent upstream.

The generation of UNITDATA_IND messages (not shown in the example) would normally occur in the device interrupt if this is a hardware driver or in the lower read put procedure if this is a multiplexer. The algorithm is simple: the data part of the message is prefixed by an M_PROTO message block that contains a unitdata_ind structure and sent upstream.
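The checks in protowput amount to a two-state machine. The user-space sketch below reproduces only the decisions (which reply, if any, each request produces and how the state changes), not the message handling. All names are hypothetical, and xx_chkaddr stands in for the unshown chkaddr with the invented rule that address 0 is illegal.

```c
#include <assert.h>
#include <errno.h>

enum xx_pstate { XX_IDLE, XX_BOUND };
enum xx_result { XX_OK_ACK, XX_ERROR_ACK, XX_QUEUED, XX_DISCARDED };

struct xx_provider {
	enum xx_pstate state;
	unsigned long  addr;    /* bound address */
};

/* Hypothetical address check: 0 if legal, else an errno value. */
static int
xx_chkaddr(unsigned long addr)
{
	return addr != 0 ? 0 : EINVAL;   /* assumption: 0 is not legal */
}

/* BIND_REQ: must be IDLE, correctly sized, with a legal address. */
enum xx_result
xx_bind_req(struct xx_provider *p, int size_ok, unsigned long addr, int *err)
{
	assert(err);
	if (p->state != XX_IDLE || !size_ok)
		*err = EINVAL;
	else
		*err = xx_chkaddr(addr);
	if (*err != 0)
		return XX_ERROR_ACK;         /* M_PCPROTO carrying ERROR_ACK */
	p->state = XX_BOUND;
	p->addr = addr;
	return XX_OK_ACK;                /* M_PCPROTO carrying OK_ACK */
}

/* UNITDATA_REQ: must be BOUND, correctly sized, with a legal destination. */
enum xx_result
xx_unitdata_req(const struct xx_provider *p, int size_ok, unsigned long dest)
{
	if (p->state != XX_BOUND || !size_ok || xx_chkaddr(dest) != 0)
		return XX_DISCARDED;         /* freemsg(): no reply is sent */
	return XX_QUEUED;                /* putq(), then start output */
}
```

Note the asymmetry the model makes visible: a bad BIND_REQ always gets an answer, while a bad UNITDATA_REQ is dropped silently, matching the service definition that UNITDATA_REQ needs no acknowledgment.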

Message Type Change Rules

Well-known ioctl Interfaces

Many ioctl operations are common to a class of STREAMS drivers or STREAMS modules. Modules that deal with terminals usually implement a subset of the termio(7I) ioctls. Similarly, drivers that deal with audio devices usually implement a subset of the audio(7I) interfaces.

Because no data structures have changed size as a result of the LP64 data model for either termio(7I) or audio(7I), you do not need to use any of the structure macros to decode any of these ioctls.

FIORDCHK

The FIORDCHK ioctl returns, as its return value, the number of bytes to be read. Although FIORDCHK should be able to return more than MAXINT bytes, it is constrained to returning an int by the type of the ioctl(2) function.

FIONREAD

The FIONREAD ioctl returns the number of data bytes (in all data messages queued) in the location pointed to by the arg parameter. The ioctl returns a 32-bit quantity for both 32-bit and 64-bit applications. Therefore, code that passes the address of a long variable needs to be changed to pass the address of an int variable for 64-bit applications.
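A portable way to honor this is to declare the count as an int. The sketch below uses a pipe so that it can run anywhere; bytes_ready is a hypothetical helper name. Including <sys/ioctl.h> for FIONREAD is an assumption that holds on Linux and the BSDs; other systems, including Solaris, may declare FIONREAD in a different header such as <sys/filio.h>.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Return the number of bytes waiting on fd, or -1 on error.
 * The count is read into an int, as FIONREAD expects, even in
 * 64-bit programs.
 */
int
bytes_ready(int fd)
{
	int n = 0;            /* int, not long */

	assert(fd >= 0);
	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}
```

For example, after writing five bytes into a pipe, bytes_ready on the read end reports 5.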

I_NREAD

The I_NREAD ioctl (streamio(7I)) is an informational ioctl which counts the data bytes as well as the number of messages in the stream head read queue. The number of bytes in the stream head read queue is returned in the location pointed to by the arg parameter of the ioctl. The number of messages in the stream head read queue is returned as the return value of the ioctl.

Like FIONREAD, the arg parameter to the I_NREAD ioctl should be a pointer to an int, not a long. And, like FIORDCHK, the return value is constrained to be less than or equal to MAXINT bytes, even if more data is available.

Signals

STREAMS modules and drivers send signals to application processes through a special signal message. If the signal specified by the module or driver is not SIGPOLL (see signal(5)), the signal is delivered to the process group associated with the Stream. If the signal is SIGPOLL, the signal is only sent to processes that have registered for the signal by using the I_SETSIG ioctl(2).

Modules or drivers use an M_SIG message to insert an explicit in-band signal into a message Stream. For example, a message can be sent to the application process immediately before a particular service interface message. When the M_SIG message reaches the head of the Stream read queue, a signal is generated and the M_SIG message is removed. The service interface message is the next message to be processed by the user. (The M_SIG message is usually defined as part of the service interface of the driver or module.)
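The ordering guarantee can be modeled with a simple array standing in for the Stream head read queue. In this user-space sketch, an M_SIG raises its signal as soon as it reaches the front of the queue and is removed, so the message queued behind it is the next one the reader sees. struct xx_rmsg, xx_read_next, and the XX_M_* types are hypothetical, and the signal number is an arbitrary integer.

```c
#include <assert.h>

enum xx_mtype { XX_M_SIG, XX_M_PROTO };

struct xx_rmsg {
	enum xx_mtype type;
	int sig;     /* signal number for XX_M_SIG */
	int prim;    /* primitive id for XX_M_PROTO */
};

/* Raise signals for any M_SIG messages now at the front of the queue. */
static void
xx_drain_sigs(const struct xx_rmsg *q, int n, int *pos, int *sigs, int *nsigs)
{
	while (*pos < n && q[*pos].type == XX_M_SIG)
		sigs[(*nsigs)++] = q[(*pos)++].sig;   /* signal sent, M_SIG removed */
}

/* Reader takes the next message; returns its index, or -1 if none. */
int
xx_read_next(const struct xx_rmsg *q, int n, int *pos, int *sigs, int *nsigs)
{
	int idx;

	assert(pos && sigs && nsigs);
	xx_drain_sigs(q, n, pos, sigs, nsigs);   /* queue may start with M_SIG */
	if (*pos >= n)
		return -1;
	idx = (*pos)++;                          /* the reader consumes this one */
	xx_drain_sigs(q, n, pos, sigs, nsigs);   /* next M_SIG reaches the front */
	return idx;
}
```

After the first message is read, the M_SIG behind it reaches the front and its signal is raised at once; the next read then returns the service interface message that followed the M_SIG.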

Chapter 9 STREAMS Drivers

This chapter describes the operation of STREAMS drivers and some of the processing typically required in the drivers.

STREAMS Device Drivers

STREAMS drivers can be considered a subset of device drivers in general and character device drivers in particular. While there are some differences between STREAMS drivers and non-STREAMS drivers, much of the information contained in Writing Device Drivers also applies to STREAMS drivers. For more information on global driver issues and non-STREAMS drivers, see Writing Device Drivers.


Note -

The word module is used differently when talking about drivers. A device driver is a kernel-loadable module that provides the interface between a device and the Device Driver Interface, and is linked to the kernel when it is first invoked.


STREAMS drivers share a basic programming model with STREAMS modules; information common to both drivers and modules is discussed in Chapter 10, Modules. After summarizing some basic device driver concepts, this chapter discusses several topics specific to STREAMS device drivers (and not covered elsewhere) and then presents code samples illustrating basic STREAMS driver processing.

Basic Driver

A device driver is a loadable kernel module that translates between an I/O device and the kernel to operate the device.

Device drivers can also be software-only, implementing a pseudo-device, such as a RAM disk or a pseudo-terminal, that exists only in software.

In the Solaris system, the interface between the kernel and device drivers is called the Device Driver Interface (DDI/DKI). The driver entry points are specified in the Section 9E manual pages; Section 9 also documents the kernel data structures (9S) and utility functions (9F) available to drivers. Drivers that adhere to these interfaces are forward compatible with future releases of the Solaris system.

The DDI insulates the kernel from device specifics. Application programs and the rest of the kernel need little (if any) device-specific code to use the device. The DDI makes the system more portable and easier to maintain.

There are three basic types of device drivers, corresponding to the three basic types of devices. Character devices handle data serially and transfer it to and from the processor one character at a time, like keyboards and low-performance printers. Serial block devices and drivers also handle data serially, but transfer it to and from memory without processor intervention, like tape drives. Direct access block devices and drivers also transfer data without processor intervention, and blocks of storage on the device can be addressed directly, like disk drives.

There are two types of character device drivers: standard character device drivers and STREAMS device drivers. STREAMS is a separate programming model for writing a character driver. Devices that receive data asynchronously (such as terminal and network devices) are well suited to a STREAMS implementation.

STREAMS drivers share some kinds of processing with STREAMS modules. Important differences between drivers and modules include how the application manipulates drivers and modules and how interrupts are handled. In STREAMS, drivers are opened and modules are pushed. A device driver also has an interrupt routine to process hardware interrupts; a module does not.

STREAMS Driver Topics

STREAMS drivers have five different points of contact with the kernel:

Configuration (kernel dynamic loading) entry points

These are the routines that allow the kernel to find the driver binary in the file system and load it into, or unload it from, the running kernel. The entry points include _init(9E), _info(9E), and _fini(9E).

Initialization entry points

The entry points allow the driver to determine a device's presence and initialize its state. These routines are accessed through the dev_ops(9S) data structure during system initialization. They include getinfo(9E), identify(9E), probe(9E), attach(9E), and detach(9E).

Table-driven entry points

The table-driven entry points are accessed through cb_ops(9S), the character and block access table, when an application calls the appropriate interface. The members of the cb_ops(9S) structure include pointers to entry points that perform the device's functions, such as read(9E), write(9E), and ioctl(9E). The cb_ops(9S) table also contains a pointer to the streamtab(9S) structure.

STREAMS queue processing entry points

These entry points are contained in the streamtab; they read and process the STREAMS messages that travel through the queue structures. Examples of STREAMS queue processing entry points are put(9E) and srv(9E).

Interrupt routines

A driver's interrupt routine handles the interrupts from the device (or software interrupts). It is added to the kernel by ddi_add_intr(9F) when the kernel configuration software calls attach(9E).

These points of contact are discussed in the following sections.

STREAMS Configuration Entry Points

As with other SunOS 5 drivers, STREAMS drivers are dynamically linked and loaded when they are first referenced. For example, after the system boots, the STREAMS pseudo-tty slave pseudo-driver (pts(7D)) is loaded into the kernel automatically the first time it is opened.

In STREAMS, the header declarations differ between drivers and modules. (The word module is used in two different ways when talking about drivers: there are STREAMS modules, which are pushable nondriver entities, and there are kernel-loadable modules, which are components of the kernel.) See the appropriate chapters in Writing Device Drivers.

The kernel configuration mechanism must distinguish between STREAMS devices and traditional character devices because system calls to STREAMS drivers are processed by STREAMS routines, not by the system driver routines. The cb_str field of the cb_ops(9S) structure provides this distinction. If it is NULL, the driver is a traditional character driver and there are no STREAMS routines to execute; a STREAMS driver instead sets cb_str to point to a streamtab(9S) structure containing the driver's STREAMS queue processing entry points.
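To make the distinction concrete, the following user-space model shows the decision the configuration mechanism makes. It is only a sketch: struct cb_ops_model, struct streamtab_model, and choose_open_path are hypothetical stand-ins, not the real cb_ops(9S) layout or any kernel routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields needed for this decision. */
struct streamtab_model {
	const char *name;                     /* driver name, for illustration */
};

struct cb_ops_model {
	int (*cb_open)(void);                 /* traditional open entry point */
	const struct streamtab_model *cb_str; /* NULL for non-STREAMS drivers */
};

enum open_path { OPEN_VIA_CB_OPEN, OPEN_VIA_STREAMS };

/* Decide which code path an open(2) of the device would take. */
static enum open_path
choose_open_path(const struct cb_ops_model *cb)
{
	assert(cb != NULL);
	return ((cb->cb_str != NULL) ? OPEN_VIA_STREAMS : OPEN_VIA_CB_OPEN);
}
```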

STREAMS Initialization Entry Points

STREAMS drivers' initialization entry points must perform the same tasks as those of non-STREAMS drivers. See Writing Device Drivers for more information.

STREAMS Table-Driven Entry Points

In non-STREAMS drivers, most of the driver's work is accomplished through the entry points in the cb_ops(9S) structure. For STREAMS drivers, most of the work is accomplished through the message-based STREAMS queue processing entry points.

Figure 9-1 shows multiple Streams (corresponding to minor devices) connecting to a common driver. There are two distinct Streams opened from the same major device. Consequently, they have the same streamtab and the same driver procedures.

Figure 9-1 Device Driver Streams


Multiple instances (minor devices) of the same driver are handled during the initial open for each device. Typically, a driver stores the queue address in a driver-private structure that is uniquely identified by the minor device number. (The DDI/DKI provides a mechanism for uniform handling of driver-private structures; see ddi_soft_state(9F).) The q_ptr field of the queue points to this private data structure. When messages arrive, STREAMS calls the driver's put and service procedures with the address of the queue, so the procedures can determine the associated device through the q_ptr field.
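The following user-space model sketches this pattern under simplified assumptions: an array stands in for the ddi_soft_state(9F) table, and struct queue_model, struct dev_state, model_open, and model_put are hypothetical names, not DDI/DKI interfaces. The point is that the put routine receives only a queue pointer, yet finds the right device through q_ptr.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MINORS 4   /* hypothetical limit for this model */

/* Simplified stand-in for queue_t. */
struct queue_model {
	void *q_ptr;              /* driver-private pointer, NULL until open */
};

/* Hypothetical per-device (per-minor) private state. */
struct dev_state {
	int minor;                /* which minor device this state belongs to */
	struct queue_model *rq;   /* read queue saved at open time */
	unsigned long msgs_seen;  /* example per-device counter */
};

/* Models a soft-state table indexed by minor number. */
static struct dev_state soft_state[MAX_MINORS];

/* Open: find the minor's private structure and hang it off both queues. */
static int
model_open(struct queue_model *rq, struct queue_model *wq, int minor)
{
	if (minor < 0 || minor >= MAX_MINORS)
		return (-1);          /* no such device */
	soft_state[minor].minor = minor;
	soft_state[minor].rq = rq;
	rq->q_ptr = &soft_state[minor];
	wq->q_ptr = &soft_state[minor];
	return (0);
}

/* Put procedure: only the queue is passed in; q_ptr identifies the device. */
static int
model_put(struct queue_model *q)
{
	struct dev_state *sp = q->q_ptr;

	assert(sp != NULL);       /* open must have run first */
	sp->msgs_seen++;
	return (sp->minor);
}
```

In a real driver the table comes from ddi_soft_state_zalloc(9F) and ddi_get_soft_state(9F), as Example 9-3 shows.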

STREAMS guarantees that only one open or close can be active at a time per major/minor device pair.

STREAMS Queue Processing Entry Points

STREAMS device drivers have processing routines that are registered with the framework through the streamtab structure. The put procedure is a driver's entry point, but it is a message (not system) interface. STREAMS drivers and STREAMS modules implement these entry points similarly, as described in "Entry Points".

The Stream head translates write(2) and ioctl(2) calls into messages and sends them downstream to be processed by the driver's write queue put(9E) procedure. The read(2) system call is seen directly only by the Stream head, which contains the functions required to process system calls. A STREAMS driver does not see system calls other than open and close, but it can detect the absence of a read indirectly: when the application stops reading, flow control propagates from the Stream head to the driver and affects the driver's ability to send messages upstream.

For read-side processing, when the driver is ready to send data or other information to a user process, it prepares a message and sends it upstream to the read queue of the appropriate (minor device) Stream. The driver's open routine generally stores the queue address corresponding to this Stream.

For write-side (or output) processing, the driver receives messages in place of a write call. If the message cannot be sent immediately to the hardware, it may be stored on the driver's write message queue. Subsequent output interrupts can remove messages from this queue.

A driver is at the end of a Stream. As a result, drivers must include standard processing for certain message types that a module could simply pass to the next component. For example, a driver must process all M_IOCTL messages; otherwise, the Stream head blocks until it receives an M_IOCACK or M_IOCNAK, or until the ioctl timeout (potentially infinite) expires. If a driver does not understand an ioctl(2), it sends an M_IOCNAK message upstream.
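The rule that every M_IOCTL must be answered can be sketched as a small user-space model. The message and iocblk stand-ins (struct msg_model, struct ioc_model) and the ioctl code LP_IOC_KNOWN are hypothetical; a kernel driver would instead change db_type on the real mblk_t, fill in the real struct iocblk, and send the reply upstream with qreply(9F).

```c
#include <assert.h>
#include <errno.h>

/* Simplified model message types; the real values live in <sys/stream.h>. */
enum msg_type { MT_DATA, MT_IOCTL, MT_IOCACK, MT_IOCNAK };

/* Hypothetical subset of struct iocblk. */
struct ioc_model {
	int ioc_cmd;    /* ioctl command requested by the application */
	int ioc_error;  /* error code returned to the application */
	int ioc_count;  /* bytes of reply data */
};

struct msg_model {
	enum msg_type type;
	struct ioc_model ioc;
};

#define LP_IOC_KNOWN 0x4c01   /* hypothetical ioctl this driver understands */

/*
 * Driver-side ioctl handling: every M_IOCTL is answered. A recognized
 * command is ACKed and anything else is NAKed; the same message is
 * reused for the reply, as drivers typically do before qreply(9F).
 */
static void
model_handle_ioctl(struct msg_model *mp)
{
	assert(mp->type == MT_IOCTL);
	mp->ioc.ioc_count = 0;
	if (mp->ioc.ioc_cmd == LP_IOC_KNOWN) {
		mp->type = MT_IOCACK;
		mp->ioc.ioc_error = 0;
	} else {
		mp->type = MT_IOCNAK;
		mp->ioc.ioc_error = EINVAL;
	}
}
```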

Messages that are not understood by the drivers should be freed.

When the Stream head receives an M_ERROR message, it places the Stream in an error state in which most subsequent system calls on the Stream fail, so driver developers should be careful when using the M_ERROR message.

STREAMS Interrupt Handlers

Most hardware drivers have an interrupt handler routine, which the driver writer must supply. The interrupt handling for STREAMS drivers is not fundamentally different from that for other device drivers. Drivers usually register interrupt handlers in their attach(9E) entry point, using ddi_add_intr(9F), and unregister them at detach time using ddi_remove_intr(9F).

The system also supports software interrupts. The routines ddi_add_softintr(9F) and ddi_remove_softintr(9F) register and unregister (respectively) soft-interrupt handlers. A software interrupt is generated by calling ddi_trigger_softintr(9F).

See Writing Device Drivers for more information.

Driver Unloading

STREAMS drivers can prevent unloading through the standard driver detach(9E) entry point, by returning DDI_FAILURE from it.

STREAMS Driver Code Samples

The following code samples illustrate three STREAMS driver topics:

Basic hardware/pseudo drivers

This type of driver communicates with a specific piece of hardware (or simulated hardware). The lp example simulates a simple printer driver.

Clonable drivers

The STREAMS framework supports a CLONEOPEN facility, which allows multiple Streams to be opened from a single special file. If a STREAMS device driver chooses to support CLONEOPEN, it can be referred to as a clonable device. The attach(9E) routines from two Solaris drivers, ptm(7D) and log(7D), illustrate two approaches to cloning.

Multiple instances in drivers

A multiplexer driver is a regular STREAMS driver that can handle multiple Streams connected to it instead of just one Stream. Multiple connections occur when more than one minor device of the same driver is in use. See "Cloning" for more information.

Printer Driver Example

The first example is a simple interrupt-per-character line printer driver. The driver is unidirectional (it has no read-side processing). It demonstrates some differences between module and driver programming, including the following:

Most of the STREAMS processing in the driver is independent of the actual printer hardware; in this example, actual interaction with the printer is limited to the lpoutchar function, which prints one character at a time. For purposes of demonstration, the "printer hardware" is actually the system console, accessed through cmn_err(9F). Because there is no actual hardware to generate a genuine hardware interrupt, lpoutchar simulates interrupts using ddi_trigger_softintr(9F). For a real printer, the lpoutchar function would be rewritten to send a character to the printer, which should then generate a hardware interrupt.

The driver declarations follow. After specifying header files (include <sys/ddi.h> and <sys/sunddi.h> as the last two header files), the driver declares a per-printer structure, struct lp. This structure contains members that enable the driver to keep track of each instance of the driver, such as flags (what the driver is doing), msg (the current STREAMS print message), qptr (pointer to the Stream's write queue), dip (the instance's device information handle), iblock_cookie (for registering an interrupt handler), siid (the handle of the soft interrupt), and lp_lock (a mutex to protect the data structure from multithreaded race conditions). The driver next defines the bits for the flags member of struct lp; the driver defines only one flag, BUSY.

Following the function prototypes, the driver provides some standard STREAMS declarations: a module_info(9S) structure (minfo); a qinit(9S) structure for the read side (rinit) that contains the driver's open and close entry points; a qinit(9S) structure for the write side (winit) that contains the write put procedure; and a streamtab(9S) structure that points to rinit and winit. The values in the module name and ID fields in the module_info(9S) structure must be unique in the system. Because the driver is unidirectional, there is no read-side put or service procedure. The flow control limits for the write side are 50 bytes for the low-watermark and 150 bytes for the high-watermark.

The driver next declares lp_state, an opaque anchor on which the DDI-provided soft-state functions operate. The ddi_soft_state(9F) manual page describes how to maintain multiple instances of a driver.

The driver next declares a cb_ops(9S) structure, which is required in all device drivers. In non-STREAMS device drivers, cb_ops(9S) contains vectors to the table-driven entry points. For STREAMS drivers, cb_ops(9S) contains mostly nodev entries; the cb_str field, however, is initialized with a pointer to the driver's streamtab(9S) structure. This indicates to the kernel that the driver is a STREAMS driver.

Next, the driver declares a dev_ops(9S) structure, which points to the various initialization entry points as well as to the cb_ops(9S) structure. Finally, the driver declares a struct modldrv and a struct modlinkage for use by the kernel linker when the driver is dynamically loaded. The modldrv structure contains a pointer to mod_driverops. This is a significant difference between a STREAMS driver and a STREAMS module: a STREAMS module instead declares a struct modlstrmod, which points to mod_strmodops.


Example 9-1 Simple line printer driver

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/signal.h>
#include <sys/errno.h>
#include <sys/cred.h>
#include <sys/stat.h>
#include <sys/modctl.h>
#include <sys/conf.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* This is a private data structure, one per minor device number */

struct lp {
	short flags; /* flags -- see below */
	mblk_t *msg; /* current message being output */
	queue_t *qptr; /* back pointer to write queue */
	dev_info_t *dip; /* devinfo handle */
	ddi_iblock_cookie_t iblock_cookie;
	ddi_softintr_t siid;
	kmutex_t lp_lock; /* sync lock */
};

/* flags bits */

#define BUSY 1 /* dev is running, int is forthcoming */

/*
 * Function prototypes.
 */
static int lpattach(dev_info_t *, ddi_attach_cmd_t);
static int lpdetach(dev_info_t *, ddi_detach_cmd_t);
static int lpgetinfo(dev_info_t *, ddi_info_cmd_t, void *, void **);
static int lpidentify(dev_info_t *);
static uint lpintr(caddr_t lp);
static void lpout(struct lp *lp);
static void lpoutchar(struct lp *lp, char c);
static int lpopen(queue_t*, dev_t*, int, int, cred_t*);
static int lpclose(queue_t*, int, cred_t*);
static int lpwput(queue_t*, mblk_t*);

/* Standard Streams declarations */

static struct module_info minfo = {
	0xaabb,
	"lp",
	0,
	INFPSZ,
	150,
	50
};

static struct qinit rinit = {
	(int (*)()) NULL,
	(int (*)()) NULL,
	lpopen,
	lpclose,
	(int (*)()) NULL,
	&minfo,
	NULL
};

static struct qinit winit = {
	lpwput,
	(int (*)()) NULL,
	(int (*)()) NULL,
	(int (*)()) NULL,
	(int (*)()) NULL,
	&minfo,
	NULL
};

static struct streamtab lpstrinfo = { &rinit, &winit, NULL, NULL };

/*
 * An opaque handle where our lp lives
 */
static void *lp_state;

/* Module Loading/Unloading and Autoconfiguration declarations */

static struct cb_ops lp_cb_ops = {
	nodev, /* cb_open */
	nodev, /* cb_close */
	nodev, /* cb_strategy */
	nodev, /* cb_print */
	nodev, /* cb_dump */
	nodev, /* cb_read */
	nodev, /* cb_write */
	nodev, /* cb_ioctl */
	nodev, /* cb_devmap */
	nodev, /* cb_mmap */
	nodev, /* cb_segmap */
	nochpoll, /* cb_chpoll */
	ddi_prop_op, /* cb_prop_op */
	&lpstrinfo, /* cb_str */
	D_MP | D_NEW, /* cb_flag */
};

static struct dev_ops lp_ops = {
	DEVO_REV, /* devo_rev */
	0, /* devo_refcnt */
	lpgetinfo, /* devo_getinfo */
	lpidentify, /* devo_identify */
	nulldev, /* devo_probe */
	lpattach, /* devo_attach */
	lpdetach, /* devo_detach */
	nodev, /* devo_reset */
	&lp_cb_ops, /* devo_cb_ops */
	(struct bus_ops *)NULL /* devo_bus_ops */
};

/*
 * Module linkage information for the kernel.
 */
static struct modldrv modldrv = {
	&mod_driverops,
	"Simple Sample Printer Streams Driver", /* Description */
	&lp_ops, /* driver ops */
};

static struct modlinkage modlinkage = {
	MODREV_1, &modldrv, NULL
};

Example 9-2 shows the required driver configuration entry points _init(9E), _fini(9E), and _info(9E). In addition to installing the driver using mod_install(9F), the _init entry point uses ddi_soft_state_init(9F) to initialize lp_state, the anchor from which the per-instance driver structures are later allocated. _fini(9E) makes the complementary calls to mod_remove(9F) and ddi_soft_state_fini(9F) to unload the driver and release the resources used by the soft-state routines.


Example 9-2

int
_init(void)
{
	int e;

	if ((e = ddi_soft_state_init(&lp_state,
	 sizeof (struct lp), 1)) != 0) {
		return (e);
	}

	if ((e = mod_install(&modlinkage)) != 0) {
		ddi_soft_state_fini(&lp_state);
	}

	return (e);
}

int
_fini(void)
{
	int e;

	if ((e = mod_remove(&modlinkage)) != 0) {
		return (e);
	}
	ddi_soft_state_fini(&lp_state);
	return (e);
}

int
_info(struct modinfo *modinfop)
{
	return (mod_info(&modlinkage, modinfop));
}

Example 9-3 shows the lp driver's implementation of the initialization entry points. In lpidentify, the driver simply ensures that the name of the device being attached is "lp".

lpattach first uses ddi_soft_state_zalloc(9F) to allocate a per-instance structure for the printer being attached. Next it creates a node in the device tree for the printer using ddi_create_minor_node(9F); user programs use the node to access the device. lpattach then registers the driver's interrupt handler; because the sample driver controls pseudo-hardware, it uses soft interrupts. A driver for a real printer would use ddi_add_intr(9F) instead of ddi_add_softintr(9F), and would also perform any other required hardware initialization in lpattach. Finally, lpattach initializes the per-instance mutex.

In lpdetach, the driver undoes everything it did in lpattach.

lpgetinfo uses the soft-state structures to obtain the required information.


Example 9-3

static int
lpidentify(dev_info_t *dip)
{
	if (strcmp(ddi_get_name(dip), "lp") == 0) {
		return (DDI_IDENTIFIED);
	} else
		return (DDI_NOT_IDENTIFIED);
}

static int
lpattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	int instance;
	struct lp *lpp;

	switch (cmd) {

	case DDI_ATTACH:

		instance = ddi_get_instance(dip);

		if (ddi_soft_state_zalloc(lp_state, instance) != DDI_SUCCESS) {
			cmn_err(CE_CONT, "%s%d: can't allocate state\n",
			    ddi_get_name(dip), instance);
			return (DDI_FAILURE);
		}
		lpp = ddi_get_soft_state(lp_state, instance);

		if (ddi_create_minor_node(dip, "strlp", S_IFCHR,
		    instance, NULL, 0) == DDI_FAILURE)
			goto attach_failed;

		lpp->dip = dip;
		ddi_set_driver_private(dip, (caddr_t)lpp);

		/* add (soft) interrupt */
		if (ddi_add_softintr(dip, DDI_SOFTINT_LOW, &lpp->siid,
		    &lpp->iblock_cookie, NULL, lpintr, (caddr_t)lpp)
		    != DDI_SUCCESS)
			goto attach_failed;

		mutex_init(&lpp->lp_lock, "lp lock", MUTEX_DRIVER,
		    (void *)lpp->iblock_cookie);

		ddi_report_dev(dip);
		return (DDI_SUCCESS);

	default:
		return (DDI_FAILURE);
	}

attach_failed:
	/*
	 * Undo only what was set up above. Calling lpdetach here
	 * would remove a soft interrupt and destroy a mutex that
	 * may never have been added or initialized.
	 */
	ddi_set_driver_private(dip, NULL);
	ddi_remove_minor_node(dip, NULL);
	ddi_soft_state_free(lp_state, instance);
	return (DDI_FAILURE);
}

static int
lpdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	int instance;
	struct lp *lpp;

	switch (cmd) {

	case DDI_DETACH:
		/*
		 * Undo what we did in lpattach, freeing resources
		 * and removing things we installed. The system
		 * framework guarantees we are not active with this devinfo
		 * node in any other entry points at this time.
		 */
		ddi_prop_remove_all(dip);
		instance = ddi_get_instance(dip);
		lpp = ddi_get_soft_state(lp_state, instance);
		ddi_remove_minor_node(dip, NULL);
		ddi_remove_softintr(lpp->siid);
		mutex_destroy(&lpp->lp_lock);
		ddi_set_driver_private(dip, NULL);
		ddi_soft_state_free(lp_state, instance);
		return (DDI_SUCCESS);

	default:
		return (DDI_FAILURE);
	}
}

/*ARGSUSED*/
static int
lpgetinfo(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg,
          void **result)
{
	struct lp *lpp;
	int error = DDI_FAILURE;

	switch (infocmd) {
	case DDI_INFO_DEVT2DEVINFO:
		if ((lpp = ddi_get_soft_state(lp_state,
		 getminor((dev_t)arg))) != NULL) {
			*result = lpp->dip;
			error = DDI_SUCCESS;
		} else
			*result = NULL;
		break;

	case DDI_INFO_DEVT2INSTANCE:
		*result = (void *)getminor((dev_t)arg);
		error = DDI_SUCCESS;
		break;

	default:
		break;
	}

	return (error);
}

The STREAMS mechanism allows only one Stream per minor device. The driver open routine is called whenever a STREAMS device is opened. open matches the correct private data structure with the Stream using ddi_get_soft_state(9F). The driver open, lpopen in Example 9-4, has the same interface as the module open.

The Stream flag, sflag, must have the value 0, indicating a normal driver open. devp points to the major/minor device number for the port. After checking sflag, lpopen uses the minor number in devp to find the correct soft-state structure.

The next check, if (q->q_ptr)..., determines if the printer is already open. q_ptr is a driver or module private data pointer. It can be used by the driver for any purpose and is initialized to zero by STREAMS before the first open. In this example, the driver sets the value of q_ptr, in both the read and write queue structures, to point to the device's per-instance data structure. If the pointer is non-NULL, it means the printer is already open, so lpopen returns EBUSY to avoid merging printouts from multiple users.
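The checks lpopen performs can be modeled in user space as follows. This is a sketch, assuming a hypothetical struct rq_model in place of queue_t and a caller-supplied state pointer in place of the ddi_get_soft_state(9F) lookup; the error codes match those described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the read queue handed to a driver open routine. */
struct rq_model {
	void *q_ptr;   /* NULL until the first successful open */
};

/*
 * Mirrors the checks made by lpopen: refuse module or clone opens
 * (nonzero sflag), refuse minors with no soft state, and refuse a
 * second open of a Stream whose q_ptr is already set.
 */
static int
model_lpopen(struct rq_model *q, int sflag, void *state_for_minor)
{
	assert(q != NULL);
	if (sflag != 0)
		return (ENXIO);
	if (state_for_minor == NULL)
		return (ENXIO);
	if (q->q_ptr != NULL)
		return (EBUSY);
	q->q_ptr = state_for_minor;
	return (0);
}
```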

The driver close routine is called by the Stream head. Any messages left in the queue are automatically removed by STREAMS. The Stream is dismantled and data structures are released.


Example 9-4

/*ARGSUSED*/
static int
lpopen(
	queue_t *q,	/* read queue */
	dev_t *devp,
	int flag,
	int sflag,
	cred_t *credp)
{
	struct lp *lp;

	if (sflag)	/* driver refuses to do module or clone open */
		return (ENXIO);

	if ((lp = ddi_get_soft_state(lp_state, getminor(*devp))) == NULL)
		return (ENXIO);

	/* Check if open already. Can't have multiple opens */
	if (q->q_ptr)
		return (EBUSY);

	lp->qptr = WR(q);
	q->q_ptr = (char *)lp;
	WR(q)->q_ptr = (char *)lp;
	qprocson(q);
	return (0);
}

/*ARGSUSED*/
static int
lpclose(
	queue_t *q,	/* read queue */
	int flag,
	cred_t *credp)
{
	struct lp *lp;

	qprocsoff(q);
	lp = (struct lp *)q->q_ptr;

	/*
	 * Free the message being output; the queue itself
	 * is flushed automatically by STREAMS.
	 */
	mutex_enter(&lp->lp_lock);

	if (lp->msg) {
		freemsg(lp->msg);
		lp->msg = NULL;
	}

	lp->flags = 0;
	lp->qptr = NULL;	/* the queues are about to be freed */
	mutex_exit(&lp->lp_lock);

	q->q_ptr = WR(q)->q_ptr = NULL;
	return (0);
}

There are no physical pointers between the read and write queues of a pair. Instead, queue pointer functions compute the partner: WR(9F) returns the write queue pointer given the read queue pointer, RD(9F) returns the read queue pointer given the write queue pointer, and OTHERQ(9F) returns the mate of either queue.
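As a sketch of why no partner pointer is needed, the following user-space model assumes, as some historical STREAMS implementations did, that the read and write queues of a pair are allocated next to each other with the read queue first. The functions and the QREADR_MODEL flag are hypothetical; driver code should always use the real WR(9F), RD(9F), and OTHERQ(9F), because the actual layout is an implementation detail.

```c
#include <assert.h>
#include <stddef.h>

#define QREADR_MODEL 0x1   /* hypothetical flag marking the read queue */

struct q_model {
	int q_flag;
	const char *name;
};

/* Assumed layout: pair[0] is the read queue, pair[1] the write queue. */
static struct q_model *
RD_model(struct q_model *wq)   /* read queue from write queue */
{
	return (wq - 1);
}

static struct q_model *
WR_model(struct q_model *rq)   /* write queue from read queue */
{
	return (rq + 1);
}

static struct q_model *
OTHERQ_model(struct q_model *q) /* mate of either queue */
{
	assert(q != NULL);
	return ((q->q_flag & QREADR_MODEL) ? q + 1 : q - 1);
}
```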

Driver Flush Handling

The write put procedure in Example 9-5, lpwput, illustrates driver M_FLUSH handling. Note that all drivers are expected to incorporate flush handling.

If FLUSHW is set, the write message queue is flushed, and (in this example) the leading message (lp->msg) is also flushed. lp_lock protects the driver's per-instance data structure.

In most drivers, if FLUSHR is set, the read queue is flushed. However, in this example, no messages are ever placed on the read queue, so it is not necessary to flush it. The FLUSHW bit is cleared and the message is sent upstream using qreply(9F). If FLUSHR is not set, the message is discarded.

The Stream head always performs the following actions on flush requests received on the read side from downstream. If FLUSHR is set, messages waiting to be sent to user space are flushed. If FLUSHW is set, the Stream head clears the FLUSHR bit and sends the M_FLUSH message downstream. In this manner, a single M_FLUSH message sent from the driver can reach all queues in a Stream. A module must send two M_FLUSH messages to have the same effect.

lpwput queues M_DATA and M_IOCTL messages and, if the device is not busy, starts output by calling lpout. Message types that are not recognized are discarded (in the default case of the switch).


Example 9-5

static int
lpwput(
	queue_t *q,	/* write queue */
	mblk_t *mp)	/* message pointer */
{
	struct lp *lp;

	lp = (struct lp *)q->q_ptr;

	switch (mp->b_datap->db_type) {
	default:
		freemsg(mp);
		break;

	case M_FLUSH: /* Canonical flush handling */
		if (*mp->b_rptr & FLUSHW) {
			flushq(q, FLUSHDATA);
			mutex_enter(&lp->lp_lock); /* lock any access to lp */

			if (lp->msg) {
				freemsg(lp->msg);
				lp->msg = NULL;
			}

			mutex_exit(&lp->lp_lock);
		}

		if (*mp->b_rptr & FLUSHR) {
			/*
			 * Nothing is ever queued on the read side, so
			 * just turn the message around toward the head.
			 */
			*mp->b_rptr &= ~FLUSHW;
			qreply(q, mp);
		} else
			freemsg(mp);

		break;

	case M_IOCTL:
	case M_DATA:
		(void) putq(q, mp);
		mutex_enter(&lp->lp_lock);

		if (!(lp->flags & BUSY))
			lpout(lp);

		mutex_exit(&lp->lp_lock);
		break;
	}
	return (0);
}

Driver Interrupt

Example 9-6 shows the interrupt handling for the printer driver.

lpintr is the driver-interrupt handler registered by the attach routine.

lpout takes a single character from the queue and sends it to the printer. For convenience, the message currently being output is stored in lp->msg in the per-instance structure. lpout must be called with lp_lock held.

lpoutchar sends a single character to the printer (in this case the system console, using cmn_err(9F)) and then simulates the completion interrupt. With real hardware, the device itself would generate a hardware interrupt, so the call to ddi_trigger_softintr(9F) would be unnecessary.


Example 9-6

/* Device interrupt routine */
static uint
lpintr(caddr_t lp)	/* pointer to the per-instance struct lp */
{
	struct lp *lpp = (struct lp *)lp;

	mutex_enter(&lpp->lp_lock);

	if (!(lpp->flags & BUSY)) {
		mutex_exit(&lpp->lp_lock);
		return (DDI_INTR_UNCLAIMED);
	}

	lpp->flags &= ~BUSY;
	lpout(lpp);
	mutex_exit(&lpp->lp_lock);

	return (DDI_INTR_CLAIMED);
}

/*
 * Start output to device - used by put procedure and driver.
 * Must be called with lp->lp_lock held.
 */
static void
lpout(
	struct lp *lp)
{
	mblk_t *bp;
	queue_t *q;
	struct iocblk *iocp;

	q = lp->qptr;

loop:
	if ((bp = lp->msg) == NULL) {	/* no current message */
		if ((bp = getq(q)) == NULL) {
			lp->flags &= ~BUSY;
			return;
		}
		if (bp->b_datap->db_type == M_IOCTL) {
			/*
			 * A real driver would process the ioctl here
			 * (for example, in an lpdoioctl routine). This
			 * sample supports no ioctls, but every M_IOCTL
			 * must still be answered, so NAK it rather than
			 * silently dropping (and leaking) it.
			 */
			iocp = (struct iocblk *)bp->b_rptr;
			bp->b_datap->db_type = M_IOCNAK;
			iocp->ioc_count = 0;
			iocp->ioc_error = EINVAL;
			qreply(q, bp);
			goto loop;
		}

		lp->msg = bp;	/* new message */
	}

	if (bp->b_rptr >= bp->b_wptr) {	/* current block is empty */
		bp = lp->msg->b_cont;
		lp->msg->b_cont = NULL;
		freeb(lp->msg);
		lp->msg = bp;
		goto loop;
	}

	lpoutchar(lp, *bp->b_rptr++);	/* output one character */
	lp->flags |= BUSY;
}

static void
lpoutchar(
	struct lp *lp,
	char c)
{
	cmn_err(CE_CONT, "%c", c);
	ddi_trigger_softintr(lp->siid);
}
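The character-at-a-time loop formed by lpout and lpintr can be exercised in user space with the model below. It is a sketch under stated assumptions: struct mblk_model keeps only b_rptr, b_wptr, and b_cont, the printer is a character buffer, and the model omits getq(9F), the mutex, and freeb(9F); lpout_model and lpintr_model are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified message block: data lies in [b_rptr, b_wptr); b_cont chains blocks. */
struct mblk_model {
	const char *b_rptr;
	const char *b_wptr;
	struct mblk_model *b_cont;
};

#define BUSY 1   /* same meaning as the driver's flag */

struct lp_model {
	int flags;
	struct mblk_model *msg;   /* message currently being printed */
	char out[64];             /* stands in for the printer */
	size_t nout;
};

/* Mirrors lpout: skip exhausted blocks, then emit one character. */
static void
lpout_model(struct lp_model *lp)
{
	struct mblk_model *bp;

	while ((bp = lp->msg) != NULL && bp->b_rptr >= bp->b_wptr)
		lp->msg = bp->b_cont;   /* real code frees the block with freeb(9F) */
	if (bp == NULL) {
		lp->flags &= ~BUSY;     /* nothing left: device idle */
		return;
	}
	assert(lp->nout < sizeof (lp->out) - 1);
	lp->out[lp->nout++] = *bp->b_rptr++;
	lp->flags |= BUSY;          /* a completion "interrupt" is now expected */
}

/* Mirrors lpintr: each completion interrupt starts the next character. */
static int
lpintr_model(struct lp_model *lp)
{
	if (!(lp->flags & BUSY))
		return (0);             /* like DDI_INTR_UNCLAIMED */
	lp->flags &= ~BUSY;
	lpout_model(lp);
	return (1);                 /* like DDI_INTR_CLAIMED */
}
```

Each call to lpintr_model stands for one completion interrupt; the loop ends when lpout_model finds no data and clears BUSY, after which the next interrupt is reported as unclaimed.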

Driver Flow Control

The same utilities (described in Chapter 10, Modules) and mechanisms used for module flow control are used by drivers.

When a message is queued, putq(9F) increments q_count by the size of the message and compares the result to the driver's write queue high-watermark (q_hiwat). If the count reaches q_hiwat, putq(9F) sets the internal FULL indicator for the driver write queue. Upstream modules then stop sending messages (canputnext(9F) returns FALSE) until the write queue count drops below q_lowat. Messages waiting for output are dequeued by lpout, which is driven by the output interrupt routine, using getq(9F), which decrements the count. If the resulting count is below q_lowat, getq(9F) back-enables any upstream queue that had been blocked.
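The accounting just described can be modeled in user space. This is a simplified sketch that follows the text rather than the real STREAMS implementation: struct fq_model and the fq_* functions are hypothetical, full stands in for the internal FULL indicator, and wanted records that an upstream writer was blocked.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified queue carrying only the flow-control state discussed above. */
struct fq_model {
	size_t q_count;   /* bytes currently queued */
	size_t q_hiwat;   /* high-watermark */
	size_t q_lowat;   /* low-watermark */
	int full;         /* models the internal FULL indicator */
	int wanted;       /* an upstream writer found the queue full */
	int backenabled;  /* times an upstream queue was back-enabled */
};

/* What canputnext(9F) would report to the upstream neighbor. */
static int
fq_canput(struct fq_model *q)
{
	if (q->full) {
		q->wanted = 1;    /* remember to back-enable the writer later */
		return (0);
	}
	return (1);
}

/* putq(9F) accounting: add the message size; mark full at q_hiwat. */
static void
fq_putq(struct fq_model *q, size_t msgsize)
{
	q->q_count += msgsize;
	if (q->q_count >= q->q_hiwat)
		q->full = 1;
}

/* getq(9F) accounting: below q_lowat, clear full and back-enable. */
static void
fq_getq(struct fq_model *q, size_t msgsize)
{
	assert(msgsize <= q->q_count);
	q->q_count -= msgsize;
	if (q->q_count < q->q_lowat) {
		q->full = 0;
		if (q->wanted) {
			q->wanted = 0;
			q->backenabled++;
		}
	}
}
```

Using the lp limits from minfo (150 and 50 bytes), a 100-byte and a 60-byte message fill the queue, and it stays full until dequeuing drops the count below 50.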

For priority band data, qb_count, qb_hiwat, and qb_lowat are used.

STREAMS allows flow control to be used on the driver read side to handle temporary upstream blocks.

To some extent, a driver or a module can control when its upstream transmission becomes blocked. Control is available through the M_SETOPTS message (see Appendix A, Message Types) to modify the Stream head read-side flow control limits.

Cloning

In previous examples, each user process connects a Stream to a driver by explicitly opening a particular minor device of the driver. Each minor device has its own node in the device tree file system. Often, there is a need for a user process to connect a new Stream to a driver regardless of which minor device is used to access the driver. In the past, this forced the user process to poll the various minor device nodes of the driver for an available minor device. To eliminate polling, STREAMS drivers can be made clonable. If a STREAMS driver is implemented as a clonable device, a single node in the file system can be opened to access any unused device that the driver controls. This special node guarantees that each user is allocated a separate Stream to the driver for each open call. Each Stream is associated with an unused minor device, so the total number of Streams that may be connected to a particular clonable driver is limited only by the number of minor devices configured for that driver.

The clone model is useful, for example, in a networking environment where a protocol pseudo-device driver requires each user to open a separate Stream over which it establishes communication. (The decision to implement a STREAMS driver as a clonable device is made by the designers of the device driver. Knowledge of the clone driver implementation is not required to use it.)

There are two ways to open as a clone device. The first is to use the STREAMS framework-provided clone device, which arranges to open the device with the CLONEOPEN flag passed in. This method is demonstrated in Example 9-7, which shows the attach and open routines for the pseudo-terminal master ptm(7D) driver. The second way is to have the driver open itself as a clone device, without intervention from the system clone device. This method is demonstrated in the attach and open routines for the log(7D) device in Example 9-8.

The ptm(7D) device, which uses the system-provided clone device, sets up two nodes in the device file system. One has a major number of 23 (ptm's assigned major number) and a minor number of 0. The other has a major number of 11 (the clone device's assigned major number) and a minor number of 23 (ptm's assigned major number). The driver's attach routine (see Example 9-7) calls ddi_create_minor_node(9F) twice: first to set up the normal node (major number 23), and second with CLONE_DEV as the last parameter, which makes the system create the node with major number 11.


crw-rw-rw-   1 sys       11, 23 Mar  6 02:05 clone:ptmx
crw-------   1 sys       23,  0 Mar  6 02:05 ptm:ptmajor

When the special file /devices/pseudo/clone@0:ptmx is opened, the clone driver code in the kernel (accessed by major 11) passes the CLONEOPEN flag in the sflag parameter to the ptm(7D) open routine. ptm's open routine checks sflag to make sure it is being called by the clone driver. The open routine next attempts to find an unused minor device for the open by searching its table of minor devices (PT_ENTER_WRITE and PT_EXIT_WRITE are driver-defined macros for entering and exiting the driver's mutex). If it succeeds (and following other open processing), the open routine constructs a new dev_t with the new minor number, which it passes back to its caller in the devp parameter. (The new minor number is available to the user program that opened the clonable device through an fstat(2) call.)


Example 9-7

static int
ptm_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
{
		if (cmd != DDI_ATTACH)
			return (DDI_FAILURE);

		if (ddi_create_minor_node(devi, "ptmajor", S_IFCHR, 0, NULL, 0) 
				== DDI_FAILURE) {
			ddi_remove_minor_node(devi, NULL);
			return (DDI_FAILURE);
		}
		if (ddi_create_minor_node(devi, "ptmx", S_IFCHR, 0, NULL, CLONE_DEV) 
				== DDI_FAILURE) {
			ddi_remove_minor_node(devi, NULL);
			return (DDI_FAILURE);
		}
		ptm_dip = devi;
		return (DDI_SUCCESS);
}

static int
ptmopen(
		queue_t	*rqp,			/* pointer to the read side queue */
		dev_t		*devp,		/* pointer to stream tail's dev */
		int		oflag,		/* the user open(2) supplied flags */
		int		sflag,		/* open state flag */
		cred_t 	*credp)		/* credentials */
{
		struct pt_ttys	*ptmp;
		mblk_t		*mop;	/* ptr to a setopts message block */
		minor_t	dev;

		if (sflag != CLONEOPEN) {
			return (EINVAL);
		}

		for (dev = 0; dev < pt_cnt; dev++) {
			ptmp = &ptms_tty[dev];
			PT_ENTER_WRITE(ptmp);
			if (ptmp->pt_state & (PTMOPEN | PTSOPEN | PTLOCK)) {
				PT_EXIT_WRITE(ptmp);
			} else
				break;
		}

		if (dev >= pt_cnt) {
			return (ENODEV);
		}

		... <other open processing> ...

		/*
		 * The input, devp, is a major device number; the output is put
		 * into the same parameter as a major,minor pair.
		 */
		*devp = makedevice(getmajor(*devp), dev);
		return (0);
}

The log(7D) driver uses the second method: it clones itself without intervention from the system clone device. The log(7D) driver's attach routine (in Example 9-8) is similar to the one in ptm(7D). It creates two nodes using ddi_create_minor_node(9F), but neither specifies CLONE_DEV as the last parameter. Instead, one node has minor number 0 (CONSWMIN) and the other has minor number CLONEMIN. These two nodes give log(7D) two interfaces: the first write-only, the second read-write (see the log(7D) man page for more information). Users open one node or the other. If they open the CLONEMIN (clonable, read-write) node, the open routine checks its table of minor devices for an unused device. If it finds one, it returns the new dev_t to its caller in devp, as the ptm(7D) open routine does.


Example 9-8

static int
log_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
{
		if (cmd != DDI_ATTACH)
			return (DDI_FAILURE);

		if (ddi_create_minor_node(devi, "conslog", S_IFCHR, 0, NULL, 0)
				 == DDI_FAILURE ||
				 ddi_create_minor_node(devi, "log", S_IFCHR, 5, NULL, 0)
				 == DDI_FAILURE) {
			ddi_remove_minor_node(devi, NULL);
			return (DDI_FAILURE);
		}
		log_dip = devi;
		return (DDI_SUCCESS);
}

static int
logopen(
		queue_t *q,
		dev_t *devp,
		int flag,
		int sflag,
		cred_t *cr)
{
		int i;
		struct log *lp;

		/*
		 * A MODOPEN is invalid and so is a CLONEOPEN.
		 * This is because a clone open comes in as a CLONEMIN device open!!
		 */
		if (sflag)
			return (ENXIO);

		mutex_enter(&log_lock);
		switch (getminor(*devp)) {

		case CONSWMIN:
			if (flag & FREAD) { /* you can only write to this minor */
				mutex_exit(&log_lock);
				return (EINVAL);
			}
			if (q->q_ptr) { /* already open */
				mutex_exit(&log_lock);
				return (0);
			}
			lp = &log_log[CONSWMIN];
			break;

		case CLONEMIN:
			/*
			 * Find an unused minor > CLONEMIN.
			 */
			i = CLONEMIN + 1;
			for (lp = &log_log[i]; i < log_cnt; i++, lp++) {
				if (!(lp->log_state & LOGOPEN))
					break;
			}
			if (i >= log_cnt) {
				mutex_exit(&log_lock);
				return (ENXIO);
			}
			*devp = makedevice(getmajor(*devp), i); /* clone it */
			break;

		default:
			mutex_exit(&log_lock);
			return (ENXIO);
		}

		/*
		 * Finish device initialization.
		 */
		lp->log_state = LOGOPEN;
		lp->log_rdq = q;
		q->q_ptr = (void *)lp;
		WR(q)->q_ptr = (void *)lp;
		mutex_exit(&log_lock);
		qprocson(q);
		return (0);
}

Loop-Around Driver

The loop-around driver is a pseudo-driver that loops data from one open Stream to another open Stream. The associated files are almost like a full-duplex pipe to user processes. The Streams are not physically linked. The driver is a simple multiplexer that passes messages from one Stream's write queue to the other Stream's read queue.

To create a connection, a process opens two Streams, obtains the minor device number associated with one of the returned file descriptors, and sends the device number in an ioctl(2) to the other Stream. For each open, the driver's open routine places the passed queue pointer in a driver interconnection table, indexed by the device number. When the driver later receives an M_IOCTL message, it uses the device number to locate the other Stream's interconnection table entry, and stores the appropriate queue pointers in both of the Streams' interconnection table entries.
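A minimal user-space model of the interconnection table may make the bookkeeping concrete. struct model_queue and struct model_stream are hypothetical stand-ins for a Stream's read/write queue pair. struct model_loop mirrors the qptr and oqptr fields of the driver's loop structure declared in Example 9-9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a queue needs no real contents here. */
struct model_queue { int id; };
struct model_stream { struct model_queue rq, wq; };	/* read/write pair */

/* Mirrors struct loop: qptr is this Stream's write queue, oqptr the
 * other Stream's read queue (NULL while unconnected). */
struct model_loop {
	struct model_stream *strm;	/* lets the model find RD(qptr) */
	struct model_queue *qptr;
	struct model_queue *oqptr;
};

#define MODEL_NLOOP 4

/* open: record this Stream in the slot indexed by its minor number */
static void
model_loop_open(struct model_loop tbl[], int minor, struct model_stream *s)
{
	assert(minor >= 0 && minor < MODEL_NLOOP);
	tbl[minor].strm = s;
	tbl[minor].qptr = &s->wq;
	tbl[minor].oqptr = NULL;
}

/* LOOP_SET: store each Stream's read queue in the other's entry */
static void
model_loop_connect(struct model_loop tbl[], int from, int to)
{
	tbl[from].oqptr = &tbl[to].strm->rq;
	tbl[to].oqptr = &tbl[from].strm->rq;
}
```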

Subsequently, when the driver receives messages other than M_IOCTL or M_FLUSH on either Stream's write side, it switches them to the read queue following the driver on the other Stream's read side. The resultant logical connection is shown in Figure 9-2. Flow control between the two Streams must be handled explicitly, because STREAMS does not automatically propagate flow control information between two Streams that are not physically connected.

Figure 9-2 Loop-Around Streams

Example 9-9 shows the loop-around driver code. The loop structure contains the interconnection information for a pair of Streams. loop_loop is indexed by the minor device number. When a Stream is opened to the driver, the driver places the address of the corresponding loop_loop element in q_ptr (private data structure pointer) of the read-side and write-side queues. Since STREAMS clears q_ptr when the queue is allocated, a NULL value of q_ptr indicates an initial open. The oqptr field of a loop_loop element is non-NULL only while that Stream is connected to another open Stream, so the driver checks it to verify the connection.

The code presented here for the loop-around driver represents a single-threaded, uniprocessor implementation. Chapter 12, MultiThreaded STREAMS presents multiprocessor and multithreading issues such as locking to prevent race conditions and data corruption.

Example 9-9 contains the declarations for the driver.


Example 9-9 Declarations for the Driver

/* Loop-around driver */

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/signal.h>
#include <sys/errno.h>
#include <sys/cred.h>
#include <sys/stat.h>
#include <sys/modctl.h>
#include <sys/conf.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

static int loop_identify(dev_info_t *);
static int loop_attach(dev_info_t *, ddi_attach_cmd_t);
static int loop_detach(dev_info_t *, ddi_detach_cmd_t);
static int loop_devinfo(dev_info_t *, ddi_info_cmd_t, void *, void **);
static int loopopen (queue_t*, dev_t*, int, int, cred_t*);
static int loopclose (queue_t*, int, cred_t*);
static int loopwput (queue_t*, mblk_t*);
static int loopwsrv (queue_t*);
static int looprsrv (queue_t*);

static dev_info_t *loop_dip;	/* private devinfo pointer */

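static struct module_info minfo = {
		0xee12,		/* mi_idnum: module ID number */
		"loop",		/* mi_idname: module name */
		0,			/* mi_minpsz: minimum packet size */
		INFPSZ,		/* mi_maxpsz: maximum packet size (infinite) */
		512,		/* mi_hiwat: high-water mark */
		128			/* mi_lowat: low-water mark */
};

static struct qinit rinit = {
		(int (*)()) NULL,	/* qi_putp: no read-side put procedure */
		looprsrv,			/* qi_srvp: read service procedure */
		loopopen,			/* qi_qopen */
		loopclose,			/* qi_qclose */
		(int (*)()) NULL,	/* qi_qadmin: unused */
		&minfo,				/* qi_minfo */
		NULL				/* qi_mstat: no statistics */
};

static struct qinit winit = {
		loopwput,			/* qi_putp: write put procedure */
		loopwsrv,			/* qi_srvp: write service procedure */
		(int (*)()) NULL,	/* qi_qopen: open is on the read side only */
		(int (*)()) NULL,	/* qi_qclose: close is on the read side only */
		(int (*)()) NULL,	/* qi_qadmin: unused */
		&minfo,				/* qi_minfo */
		NULL				/* qi_mstat */
};

static struct streamtab loopinfo = {
		&rinit,				/* st_rdinit */
		&winit,				/* st_wrinit */
		NULL,				/* st_muxrinit: not a multiplexer */
		NULL				/* st_muxwinit */
};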

struct 	loop {
		queue_t *qptr;			/* back pointer to write queue */
		queue_t *oqptr;		/* pointer to connected read queue */
};

#define LOOP_CONF_FLAG (D_NEW | D_MP)

static struct cb_ops cb_loop_ops = {
   nulldev,               /* cb_open */
   nulldev,               /* cb_close */
   nodev,                 /* cb_strategy */
   nodev,                 /* cb_print */
   nodev,                 /* cb_dump */
   nodev,                 /* cb_read */
   nodev,                 /* cb_write */
   nodev,                 /* cb_ioctl */
   nodev,                 /* cb_devmap */
   nodev,                 /* cb_mmap */
   nodev,                 /* cb_segmap */
   nochpoll,              /* cb_chpoll */
   ddi_prop_op,           /* cb_prop_op */
   ( &loopinfo),          /* cb_str */
   (int)(LOOP_CONF_FLAG)  /* cb_flag */
};

static struct dev_ops loop_ops = {
   DEVO_REV,                /* devo_rev */
   0,                       /* devo_refcnt */
   (loop_devinfo),          /* devo_getinfo */
   (loop_identify),         /* devo_identify */
   (nulldev),               /* devo_probe */
   (loop_attach),           /* devo_attach */
   (loop_detach),           /* devo_detach */
   (nodev),                 /* devo_reset */
   &(cb_loop_ops),          /* devo_cb_ops */
   (struct bus_ops *)NULL,  /* devo_bus_ops */
   (int (*)()) NULL         /* devo_power */
};

#define LOOP_SET (('l'<<8)|1) /* in a .h file */
#define NLOOP	64
static struct loop loop_loop[NLOOP];
static int loop_cnt = NLOOP;

/*
 * Module linkage information for the kernel.
 */
extern struct mod_ops mod_driverops;

static struct modldrv modldrv = {
		&mod_driverops, "STREAMS loop driver", &loop_ops
};

static struct modlinkage modlinkage = {
		MODREV_1, &modldrv, NULL
};

int
_init(void)
{
		return (mod_install(&modlinkage));
}

int
_info(struct modinfo *modinfop)
{
		return (mod_info(&modlinkage, modinfop));
}

int
_fini(void)
{
		return (mod_remove(&modlinkage));
}

Example 9-10 contains the initialization routines.


Example 9-10 Initialization Routines

static int
loop_identify(dev_info_t *devi)
{
		if (strcmp(ddi_get_name(devi), "loop") == 0)
			return (DDI_IDENTIFIED);
		else
			return (DDI_NOT_IDENTIFIED);
}

static int
loop_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
{
		if (cmd != DDI_ATTACH)
			return (DDI_FAILURE);

		if (ddi_create_minor_node(devi, "loopmajor", S_IFCHR, 0, NULL, 0) 
				== DDI_FAILURE) {
			ddi_remove_minor_node(devi, NULL);
			return (DDI_FAILURE);
		}
		if (ddi_create_minor_node(devi, "loopx", S_IFCHR, 0, NULL, CLONE_DEV)
				 == DDI_FAILURE) {
			ddi_remove_minor_node(devi, NULL);
			return (DDI_FAILURE);
		}

		loop_dip = devi;

		return (DDI_SUCCESS);
}

static int
loop_detach(dev_info_t *devi, ddi_detach_cmd_t cmd)
{
		if (cmd != DDI_DETACH)
			return (DDI_FAILURE);

		ddi_remove_minor_node(devi, NULL);
		return (DDI_SUCCESS);
}

/*ARGSUSED*/
static int
loop_devinfo(
		dev_info_t *dip, 
		ddi_info_cmd_t infocmd, 
		void *arg, 
		void **result)
{
		int error;

		switch (infocmd) {
		case DDI_INFO_DEVT2DEVINFO:
			if (loop_dip == NULL) {
				error = DDI_FAILURE;
			} else {
				*result = (void *) loop_dip;
				error = DDI_SUCCESS;
			}
			break;
		case DDI_INFO_DEVT2INSTANCE:
			*result = (void *)0;
			error = DDI_SUCCESS;
			break;
		default:
			error = DDI_FAILURE;
		}
		return (error);
}

The open procedure (in Example 9-11) includes canonical clone processing that enables a single file system node to yield a new minor device/vnode each time the driver is opened. In loopopen, sflag can be CLONEOPEN, indicating that the driver picks an unused minor device. In this case, the driver scans its private loop_loop data structure to find an unused minor device number. If sflag is not set to CLONEOPEN, the passed-in minor device specified by getminor(*devp) is used.


Example 9-11 open Procedure

/*ARGSUSED*/
static int loopopen(
		queue_t *q,
		dev_t *devp,
		int flag,
		int sflag,
		cred_t *credp)
{
		struct loop *loop;
		minor_t newminor;

		if (q->q_ptr) /* already open */
			return(0);

		/*
		 * If CLONEOPEN, pick a minor device number to use.
		 * Otherwise, check the minor device range.
		 */

		if (sflag == CLONEOPEN) {
			for (newminor = 0; newminor < loop_cnt; newminor++ ) {
				if (loop_loop[newminor].qptr == NULL)
					break;
			}
		} else
			newminor = getminor(*devp);

		if (newminor >= loop_cnt)
			return(ENXIO);

		/*
		 * construct new device number and reset devp
		 * getmajor gets the major number
		 */

		*devp = makedevice(getmajor(*devp), newminor);
		loop = &loop_loop[newminor];
		WR(q)->q_ptr = (char *) loop;
		q->q_ptr = (char *) loop;
		loop->qptr = WR(q);
		loop->oqptr = NULL;

		qprocson(q);

		return(0);
}

Since the messages are switched to the read queue following the other Stream's read side, the driver needs a put procedure only on its write side. loopwput (Example 9-12) shows another use of an ioctl(2). The driver supports the ioc_cmd value LOOP_SET in the iocblk(9S) of the M_IOCTL message. LOOP_SET makes the driver connect the current open Stream to the Stream indicated in the message. The second message block of the M_IOCTL message (mp->b_cont) holds an integer that specifies the minor device number of the Stream to which to connect.

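The LOOP_SET ioctl(2) processing involves several checks. The ioctl must not be a transparent ioctl, ioc_count must equal the size of an int, the requested minor device number must be in range, the Stream with that minor number must be open, and neither Stream may already be connected.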

If these checks pass, the read queue pointers for the two Streams are stored in the respective oqptr fields. This cross-connects the two Streams indirectly, through loop_loop.
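These checks can be captured in a small user-space function that applies them in the order loopwput uses and returns the errno value the driver would place in ioc_error. MODEL_TRANSPARENT and struct model_slot are hypothetical. The real TRANSPARENT constant comes from the system STREAMS headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_TRANSPARENT (-1L)	/* hypothetical stand-in for TRANSPARENT */
#define MODEL_LOOP_CNT    4

/* What loopwput can see about one loop_loop entry */
struct model_slot {
	bool open;		/* qptr != NULL */
	bool connected;	/* oqptr != NULL */
};

/*
 * Returns 0 if minor `self` may be joined to minor `to`; otherwise the
 * errno value that loopwput would report, checked in the same order.
 */
static int
model_loop_set_check(const struct model_slot tbl[], int self,
    long ioc_count, int to)
{
	assert(self >= 0 && self < MODEL_LOOP_CNT);
	if (ioc_count == MODEL_TRANSPARENT)
		return (EINVAL);	/* would need M_COPYIN processing */
	if (ioc_count != (long)sizeof (int))
		return (EINVAL);	/* argument must be exactly one int */
	if (to < 0 || to >= MODEL_LOOP_CNT)
		return (ENXIO);		/* minor number out of range */
	if (!tbl[to].open)
		return (ENXIO);		/* other Stream is not open */
	if (tbl[self].connected || tbl[to].connected)
		return (EBUSY);		/* a Stream is already connected */
	return (0);
}
```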

The put procedure incorporates canonical flush handling.
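Because the driver forwards an M_FLUSH message up the other Stream, a request to flush this Stream's write side becomes a request to flush the other Stream's read side, and vice versa. The sketch below isolates the flag exchange performed by the switch statement in loopwput. The MODEL_FLUSH* values are an assumption that matches the common <sys/stropts.h> definitions. Check them against your own headers.

```c
#include <assert.h>

/* Assumed flag values; verify against <sys/stropts.h> */
#define MODEL_FLUSHR  0x01
#define MODEL_FLUSHW  0x02
#define MODEL_FLUSHRW (MODEL_FLUSHR | MODEL_FLUSHW)

/*
 * Exchange the read and write flush requests before the message is sent
 * up the other Stream. FLUSHRW (and any other value) passes unchanged,
 * as in the switch statement in loopwput.
 */
static unsigned char
model_swap_flush(unsigned char flag)
{
	switch (flag) {
	case MODEL_FLUSHW:
		return (MODEL_FLUSHR);
	case MODEL_FLUSHR:
		return (MODEL_FLUSHW);
	default:
		return (flag);
	}
}
```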

loopwput queues all other messages (for example, M_DATA or M_PROTO) for processing by its service procedure, after first checking that the Stream is connected. If it is not, an M_ERROR message is sent to the Stream head. Drivers and modules can send certain message types upstream to the Stream head, where they are translated into actions detectable by user processes. These messages may also modify the state of the Stream head:

M_ERROR

Causes the Stream head to lock up. Message transmission between Stream and user processes is terminated. All subsequent system calls except close(2) and poll(2) fail. Also causes M_FLUSH, clearing all message queues, to be sent downstream by the Stream head.

M_HANGUP

Terminates input from a user process to the Stream. All subsequent system calls that would send messages downstream fail. Once the Stream head read message queue is empty, EOF is returned on reads. This can also result in SIGHUP being sent to the process group's session leader.

M_SIG/M_PCSIG

Causes a specified signal to be sent to the process group associated with the Stream.

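putnextctl(9F) and putnextctl1(9F) allocate a nondata type message (that is, not M_DATA, M_DELAY, M_PROTO, or M_PCPROTO), place one byte in the message (putnextctl1(9F) only), and call the put(9E) procedure of the queue that follows the specified queue. Because loopwput passes its own read queue, RD(q), the M_ERROR message travels up to the Stream head.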


Example 9-12 Another Use of ioctl

static int loopwput(queue_t *q, mblk_t *mp)
{
		struct loop *loop;
		int to;

		loop = (struct loop *)q->q_ptr;

		switch (mp->b_datap->db_type) {
			case M_IOCTL: {
				struct iocblk	*iocp;
				int		error=0;

				iocp = (struct iocblk *)mp->b_rptr;

				switch (iocp->ioc_cmd) {

	 				case LOOP_SET: {
						/*
						 * if this is a transparent ioctl return an error; 
						 * the complete solution is to convert the message 
						 * into an M_COPYIN message so that the data is
						 * ultimately copied from user space to kernel space.
						 */

						if (iocp->ioc_count == TRANSPARENT) {
							error = EINVAL;
							goto iocnak;
						}

						/*
						 * Sanity check. ioc_count contains the amount
						 * of user supplied data which must equal the
						 * size of an int. Check it, and check that the
						 * data block exists, before reading the argument.
						 */

						if (iocp->ioc_count != sizeof(int) ||
						    mp->b_cont == NULL) {
							error = EINVAL;
							goto iocnak;
						}

						/* fetch other minor device number */

						to = *(int *)mp->b_cont->b_rptr;

						/* Is the minor device number in range? */

						if (to >= loop_cnt || to < 0) {
							error = ENXIO;
							goto iocnak;
						}

						/* Is the other device open? */

						if (!loop_loop[to].qptr) {
							error = ENXIO;
							goto iocnak;
						}

						/* Check if either dev is currently connected */

						if (loop->oqptr || loop_loop[to].oqptr) {
							error = EBUSY;
							goto iocnak;
						}

						/* Cross connect the streams through the loopstruct */

						loop->oqptr = RD(loop_loop[to].qptr);
						loop_loop[to].oqptr = RD(q);

						/*
						 * Return successful ioctl. Set ioc_count
						 * to zero, since no data is returned.
						 */

						mp->b_datap->db_type = M_IOCACK;
						iocp->ioc_count = 0;
						qreply(q, mp);
						break;
					}

					default:
						error = EINVAL;
						iocnak:

						/*
						 * Bad ioctl. Setting ioc_error causes the ioctl call 
						 * to return that particular errno. By default, ioctl 
						 * returns EINVAL on failure.
						 */

						mp->b_datap->db_type = M_IOCNAK;
						iocp->ioc_error = error;
						qreply(q, mp);
						break;
					}

					break;
				}

				case M_FLUSH: {

					if (*mp->b_rptr & FLUSHW) {
						flushq(q, FLUSHALL);	/* write */
						if (loop->oqptr)
							flushq(loop->oqptr, FLUSHALL);
						/* read on other side equals write on this side */
					}

					if (*mp->b_rptr & FLUSHR) {
						flushq(RD(q), FLUSHALL);
						if (loop->oqptr != NULL)
							flushq(WR(loop->oqptr), FLUSHALL);
					}

					switch(*mp->b_rptr) {

					case FLUSHW:
						*mp->b_rptr = FLUSHR;
						break;

					case FLUSHR:
						*mp->b_rptr = FLUSHW;
						break;

					}

					if (loop->oqptr != NULL)
						(void) putnext(loop->oqptr, mp);
					else
						freemsg(mp);	/* not connected: discard */
					break;
				}

				default: /* If this Stream isn't connected, send 
							 * M_ERROR upstream.
							 */
					if (loop->oqptr == NULL) {
						freemsg(mp);
						(void) putnextctl1(RD(q), M_ERROR, ENXIO);
						break;
					}
					(void) putq(q, mp);

				}

		return (0);
}

Service procedures are required in this example on both the write side and read side for flow control (see Example 9-13). The write service procedure, loopwsrv, takes on the canonical form. The queue being written to is not downstream, but upstream (found through oqptr) on the other Stream.

In this case, there is no read-side put procedure, so the read service procedure, looprsrv, is not scheduled by an associated put procedure as in the previous examples. looprsrv is scheduled only by being back-enabled when its upstream flow control blockage is released. Its purpose is to re-enable the writer (loopwsrv), using oqptr to find the related queue. STREAMS cannot back-enable loopwsrv directly because there is no direct queue linkage between the two Streams. Note that no message is ever queued to the read service procedure. Messages are kept on the write side so that flow control can propagate up to the Stream head. qenable(9F) schedules the write-side service procedure of the other Stream.
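The back-enable path can be traced with a small user-space model of one direction of the loop. loopwsrv moves messages from Stream A's write queue to Stream B's read side until B is full. When B's reader drains the queue below the low-water mark, that event stands in for the back-enable of looprsrv, which calls qenable(9F) on A's write queue. Counts are in messages. struct model_dir and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dir {
	int a_wq;			/* messages waiting on Stream A's write queue */
	int b_rq;			/* messages queued above Stream B's driver */
	int hiwat, lowat;	/* marks of the queue above B's driver */
	bool b_full;		/* canputnext(loop->oqptr) would return false */
	bool a_wsrv_enabled;	/* set by the qenable; cleared when loopwsrv runs */
};

/* loopwsrv on Stream A: move messages until B's read side fills */
static void
model_loopwsrv(struct model_dir *d)
{
	d->a_wsrv_enabled = false;
	while (d->a_wq > 0) {
		if (d->b_full)
			return;		/* putbq(): wait to be re-enabled */
		d->a_wq--;		/* getq() */
		d->b_rq++;		/* putnext() */
		if (d->b_rq >= d->hiwat)
			d->b_full = true;
	}
}

/* B's reader consumes one message. Dropping below lowat back-enables
 * looprsrv on B, which qenables A's write service procedure. */
static void
model_reader_consume(struct model_dir *d)
{
	if (d->b_rq == 0)
		return;
	d->b_rq--;
	if (d->b_full && d->b_rq < d->lowat) {
		d->b_full = false;
		d->a_wsrv_enabled = true;	/* qenable(WR(loop->oqptr)) */
	}
}
```

With five messages queued and marks of 3 and 1, the first pass moves three messages and stops. A's writer is enabled again only after the reader drains B below the low-water mark, and it then moves the remaining two.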


Example 9-13 Flow Control

static int
loopwsrv(queue_t *q)
{
		mblk_t *mp;
		struct loop *loop;

		loop = (struct loop *)q->q_ptr;

		while ((mp = getq(q)) != NULL) {
			/* The other Stream may have closed since mp was queued. */
			if (loop->oqptr == NULL) {
				freemsg(mp);
				continue;
			}

			/*
			 * Check if we can put the message up the other Stream's
			 * read queue; high-priority messages are not flow controlled.
			 */
			if (mp->b_datap->db_type < QPCTL && !canputnext(loop->oqptr)) {
				(void) putbq(q, mp);	/* read side is blocked */
				break;
			}

			/* send message to queue following other Stream read queue */
			(void) putnext(loop->oqptr, mp);
		}
		return (0);
}

static int
looprsrv(queue_t *q)
{
		/* Enter only when "back-enabled" by flow control */
		struct loop *loop;

		loop = (struct loop *)q->q_ptr;
		if (loop->oqptr == NULL)
			return (0);

		/* manually enable write service procedure */
		qenable(WR(loop->oqptr));
		return (0);
}

loopclose breaks the connection between the Streams, as shown in Example 9-14. loopclose sends an M_HANGUP message up the connected Stream to the Stream head.


Example 9-14 Breaking Stream Connections

/*ARGSUSED*/
static int loopclose (
		queue_t *q,
		int flag,
		cred_t *credp)
{
		struct loop *loop;

		loop = (struct loop *)q->q_ptr;
		loop->qptr = NULL;

		/*
		 * If we are connected to another stream, break the linkage, and 
		 * send a hangup message. The hangup message causes the stream head 
		 * to reject writes, allow the queued data to be read completely,
		 * and then return EOF on subsequent reads.
		 */

		if (loop->oqptr) {
			(void) putnextctl(loop->oqptr, M_HANGUP);
			((struct loop *)loop->oqptr->q_ptr)->oqptr = NULL;
			loop->oqptr = NULL;
		}

		qprocsoff(q);
		return (0);
}

An application using this driver would first open the clone device node created in the attach routine (/devices/pseudo/clone@0:loopx) twice to obtain two Streams. The application can determine the minor numbers of the devices by using fstat(2). Next, it joins the two Streams by using the STREAMS I_STR ioctl(2) (see streamio(7I)) to send the LOOP_SET command to one Stream, with the other Stream's minor number as the argument. Once this is done, data sent to one Stream using write(2) or putmsg(2) can be retrieved from the other Stream with read(2) or getmsg(2). The application can also interpose STREAMS modules between the Stream heads and the driver using the I_PUSH ioctl(2).

Summary

STREAMS device drivers are in many ways similar to non-STREAMS device drivers; the following points summarize the differences between STREAMS drivers and other drivers.

STREAMS device drivers also are similar to STREAMS modules. The following points summarize some of the differences between STREAMS modules and drivers.

Also see Writing Device Drivers.

Answers to Frequently Asked Questions

The Solaris 7 Ethernet drivers le(7D) and eepro(7D) both support the Data Link Provider Interface (DLPI).

When an ifconfig device0 plumb command is issued, the driver immediately receives a DL_INFO_REQ. The information returned in the DL_INFO_ACK is described by the dl_info_ack_t structure in /usr/include/sys/dlpi.h.

A driver can be a CLONE driver and also a DLPI Style 2 provider. However, the getinfo routine cannot validly map a minor number selected in the open routine to an instance until a DL_ATTACH_REQ has been received. The DL_ATTACH_REQ request assigns a physical point of attachment (PPA) to a Stream, and it can be issued any time after a file or Stream is opened. The DL_ATTACH_REQ request is not involved in assigning, retrieving, or mapping minor or instance numbers. You can issue a DL_ATTACH_REQ request for a file or Stream with a desired major/minor number. In most cases, mapping a minor number to an instance simply means that the minor number (getminor(dev)) is the instance number.

Each time a driver's attach routine is called, a minor node is created. A non-CLONE driver may need to attach to multiple boards (that is, have multiple instances) while still creating only one minor node. In that case it can encode information in the bits of a particular minor number. For example, the minor number FF can be used to map to all other minor nodes.

Chapter 10 Modules

This chapter provides specific examples of how modules work, based on code samples.

Module Overview

STREAMS modules process messages as they flow through the stream between an application and a character device driver. A STREAMS module is a pair of initialized queue structures and the specified kernel-level procedures that process data, status, and control information for the two queues. A Stream can contain zero or more modules. Application processes push (stack) modules on a Stream using the I_PUSH ioctl(2) and pop (unstack) them using the I_POP ioctl(2).

STREAMS Module Configuration

Like device drivers, STREAMS modules are dynamically linked and can be loaded into and unloaded from the running kernel.


Note -

The word module is used differently when talking about drivers. A device driver is a kernel-loadable module that provides the interface between a device and the Device Driver Interface, and is linked to the kernel when it is first invoked.


A loadable module must provide linkage information to the kernel in an initialized modlstrmod(9S) and three entry points: _init(9E), _info(9E), and _fini(9E).

STREAMS modules can be unloaded from the kernel when not pushed onto a Stream. A STREAMS module can prevent being unloaded by returning an error (selected from errno.h) from its _fini(9E) routine (EBUSY is a good choice).

Module Procedures

STREAMS module procedures (open, close, put, service) have already been described in the previous chapters. This section shows some examples and further describes attributes common to module put and service procedures.

A module's put procedure is called by the preceding module, driver, or Stream head, and always before that queue's service procedure. The put procedure does any immediate processing (for example, high-priority messages), while the corresponding service procedure performs deferred processing.

The service procedure is used primarily for performing deferred processing, with a secondary task to implement flow control. Once the service procedure is enabled, it can start, but is not guaranteed to complete, before user-level code runs again. The put and service procedures must not block because there is no thread synchronization being done.

Example 10-1 shows a STREAMS module read-side put procedure.


Example 10-1 Read-side put Procedure

static int
modrput (queue_t *q, mblk_t *mp)
{
		struct mod_prv *modptr;

		modptr = (struct mod_prv *) q->q_ptr;  /*state info*/

		if (mp->b_datap->db_type >= QPCTL){ /*proc pri msg*/
			putnext(q, mp); 						/* and pass it on */
			return (0);
		}

		switch(mp->b_datap->db_type) {
			case M_DATA:							/* can process message data */
					putq(q, mp); /* queue msg for service procedure */
					return (0);

			case M_PROTO:						/* handle protocol control message */
					.
					.
					.

			default:
					putnext(q, mp);		
					return (0);
		}
}

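The preceding code does the following. It passes high-priority messages (db_type greater than or equal to QPCTL) on immediately with putnext(9F). It queues M_DATA messages with putq(9F) for the service procedure. It handles M_PROTO protocol control messages (the processing is elided in the example). It passes all other message types on with putnext(9F).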

Example 10-2 shows a module write-side put procedure.


Example 10-2 Write-side put Procedure

static int
modwput (queue_t *q, mblk_t *mp)
{
 	struct mod_prv *modptr;

 	modptr = (struct mod_prv *) q->q_ptr;		/*state info*/

 	if (mp->b_datap->db_type >= QPCTL){	/* proc pri msg and pass it on */
			putnext(q, mp);
			return (0);
		}

		switch(mp->b_datap->db_type) {
			case M_DATA:					/* can process message data */
					putq(q, mp);			/* queue msg for service procedure or */
												/* pass message along with putnext(q,mp) */
					return (0);

			case M_PROTO:
					.
					.
					.

			case M_IOCTL:					/* if cmd in msg is recognized */
												/* process message and send reply back */
												/* else pass message downstream */

			default:
					putnext(q, mp);
					return (0);
		}
}

The write-side put procedure, unlike the read side, can be passed M_IOCTL messages. It is up to the module to recognize and process the ioctl(2) command, or pass the message downstream if it does not recognize the command.

Example 10-3 shows a general scenario employed by the module's service procedure.


Example 10-3 Service Procedure

static int
modrsrv (queue_t *q)
{
		mblk_t *mp;

		while ((mp = getq(q)) != NULL) {
			if (!(mp->b_datap->db_type >= QPCTL) && !canputnext(q)) {	
																	/* flow control check */
				putbq(q, mp);									/* return message */
				return (0);
			}
			/* process the message */
				.
				.
				.
			putnext(q, mp); /* pass the result */
		}
		return (0);
}

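The steps are as follows. getq(9F) removes the next message from the queue. If the message is not high priority and canputnext(9F) reports that the next queue is full, putbq(9F) returns the message to the head of the queue and the service procedure returns. Otherwise, the message is processed and passed on with putnext(9F).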

These steps are repeated until getq(9F) returns NULL (the queue is empty) or canputnext(9F) returns false.
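The loop can be modeled in user space with a small ring buffer. MODEL_QPCTL, struct model_q, and the room counter that stands in for canputnext(9F) are assumptions of this sketch. model_q_putbq returns a message to the head of the queue, as putbq(9F) does.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QPCTL 0x80	/* assumed high-priority boundary */
#define MODEL_QMAX  16

/* Ring buffer of message types */
struct model_q {
	int msgs[MODEL_QMAX];
	int head, len;
};

/* getq(): remove from the head; false when the queue is empty */
static bool
model_q_get(struct model_q *q, int *type)
{
	if (q->len == 0)
		return (false);
	*type = q->msgs[q->head];
	q->head = (q->head + 1) % MODEL_QMAX;
	q->len--;
	return (true);
}

/* putbq(): put a message back at the head */
static void
model_q_putbq(struct model_q *q, int type)
{
	assert(q->len < MODEL_QMAX);
	q->head = (q->head + MODEL_QMAX - 1) % MODEL_QMAX;
	q->msgs[q->head] = type;
	q->len++;
}

/*
 * The service loop of Example 10-3. `room` is how many ordinary
 * messages the next queue still accepts; high-priority messages skip
 * the check. Returns the number of messages passed on (putnext).
 */
static int
model_srv(struct model_q *q, int room)
{
	int type, sent = 0;

	while (model_q_get(q, &type)) {
		if (type < MODEL_QPCTL && room == 0) {
			model_q_putbq(q, type);	/* flow controlled */
			return (sent);
		}
		if (type < MODEL_QPCTL)
			room--;
		sent++;
	}
	return (sent);
}
```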

Filter Module Example

The module shown next, crmod in Example 10-4, is an asymmetric filter. On the write side, a newline is changed to a carriage return followed by a newline. On the read side, no conversion is done.


Example 10-4 crmod

/* Simple filter
 * converts newline -> carriage return, newline
 */
#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

static struct module_info minfo =
	{ 0x09, "crmod", 0, INFPSZ, 512, 128 };

static int modopen (queue_t*, dev_t*, int, int, cred_t*);
static int modrput (queue_t*, mblk_t*);
static int modwput (queue_t*, mblk_t*);
static int modwsrv (queue_t*);
static int modclose (queue_t*, int, cred_t*);

static struct qinit rinit = {
	modrput, NULL, modopen, modclose, NULL, &minfo, NULL};

static struct qinit winit = {
	modwput, modwsrv, NULL, NULL, NULL, &minfo, NULL};

struct streamtab crmdinfo={ &rinit, &winit, NULL, NULL};

stropts.h includes definitions of flush message options common to user applications. modrput is like modput from the null module.

Note that, in contrast to the null module example, a single module_info structure is shared by the read side and write side. The module_info includes the flow control high-water and low-water marks (512 and 128) for the write queue. (Although the same module_info is used on the read side, the read side has no service procedure, so flow control is not used there.) The write-side qinit structure, winit, contains the service procedure pointer, modwsrv.

The write-side put procedure, the beginning of the service procedure, and an example of flushing a queue are shown in Example 10-5.


Example 10-5

static int
modwput(queue_t *q, mblk_t *mp)
{
	if (mp->b_datap->db_type >= QPCTL && mp->b_datap->db_type != M_FLUSH)
			putnext(q, mp);
	else
			putq(q, mp);				 /* Put it on the queue */
	return (0);
}
static int 
modwsrv(queue_t *q)
{
	mblk_t *mp;

	while ((mp = getq(q)) != NULL) {
			switch (mp->b_datap->db_type) {
				default:
					if (canputnext(q)) {
							putnext(q, mp);
							break;
					} else {
							putbq(q, mp);
							return (0);
					}

				case M_FLUSH:
					if (*mp->b_rptr & FLUSHW)
							flushq(q, FLUSHDATA);
					putnext(q, mp);
					break;

modwput, the write put procedure, checks the message type. High-priority messages other than M_FLUSH are passed on immediately with putnext(9F), which avoids waiting for the service procedure to be scheduled. All other messages, including M_FLUSH, are queued for the service procedure. An M_FLUSH message is a request to remove messages from one or both queues. It can be processed in either the put or the service procedure.

modwsrv is the write service procedure. It takes a single argument, a pointer to the write queue. modwsrv processes only one high-priority message, M_FLUSH. No other high-priority messages should reach modwsrv.

For an M_FLUSH message, modwsrv checks the first data byte. If FLUSHW is set, the write queue is flushed by flushq(9F), which takes two arguments, the queue pointer and a flag. The flag indicates what should be flushed, data messages (FLUSHDATA) or everything (FLUSHALL). Data includes M_DATA, M_DELAY, M_PROTO, and M_PCPROTO messages. The choice of what types of messages to flush is module specific.
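The difference between FLUSHDATA and FLUSHALL can be sketched with an array standing in for a queue. The enum tags are hypothetical and are not the kernel's message-type constants. The set of types treated as data follows the list in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for this model; not the kernel's M_* values. */
enum model_mtype {
	MODEL_M_DATA, MODEL_M_PROTO, MODEL_M_PCPROTO, MODEL_M_DELAY,
	MODEL_M_IOCTL, MODEL_M_CTL
};

/* The message types that FLUSHDATA removes */
static bool
model_is_data(enum model_mtype t)
{
	return (t == MODEL_M_DATA || t == MODEL_M_DELAY ||
	    t == MODEL_M_PROTO || t == MODEL_M_PCPROTO);
}

/*
 * flushq() model: compact the array in place, keeping only surviving
 * messages. flush_all selects FLUSHALL; otherwise FLUSHDATA applies.
 * Returns the new queue length.
 */
static int
model_flushq(enum model_mtype q[], int len, bool flush_all)
{
	int i, kept = 0;

	for (i = 0; i < len; i++) {
		if (!flush_all && !model_is_data(q[i]))
			q[kept++] = q[i];
	}
	return (kept);
}
```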

Ordinary messages are returned to the queue if canputnext(9F) returns false, indicating the downstream path is blocked. The example continues with the remainder of modwsrv processing M_DATA messages:

		case M_DATA: {
			mblk_t *nbp = NULL;	/* new message block being filled */
			mblk_t *next;
			if (!canputnext(q)) {
					putbq(q, mp);
					return (0);
			}
			/* Filter data, appending to queue */
			for (; mp != NULL; mp = next) {
					while (mp->b_rptr < mp->b_wptr) {
							if (*mp->b_rptr == '\n') {
								if (!bappend(&nbp, '\r'))
										goto push;
								if (!bappend(&nbp, '\n')) {
										/* no room for LF: drop the CR, resend both */
										nbp->b_wptr--;
										goto push;
								}
							} else if (!bappend(&nbp, *mp->b_rptr))
								goto push;
							mp->b_rptr++;
							continue;
					push:
							if (nbp)
								putnext(q, nbp);
							nbp = NULL;
							if (!canputnext(q)) {
								if (mp->b_rptr >= mp->b_wptr) {
										next = mp->b_cont;
										freeb(mp);
										mp = next;
								}
								if (mp)
										putbq(q, mp);
								return (0);
							}
					} /* while */
					next = mp->b_cont;
					freeb(mp);
			}
			/* forward the last, partly filled block */
			if (nbp)
					putnext(q, nbp);
			break;
		}
		}
	}
	return (0);
}

The differences in M_DATA processing between this and the example in "Message Allocation and Freeing" relate to the manner in which the new messages are forwarded and flow controlled. For the purpose of demonstrating alternative means of processing messages, this version creates individual new messages rather than a single message containing multiple message blocks. When a new message block is full, it is immediately forwarded with putnext(9F) rather than being linked into a single large message. This alternative is not desirable because message boundaries are altered, and because of the additional overhead of handling and scheduling multiple messages.

When the filter processing is performed (following push), flow control is checked (with canputnext(9F)) after each new message is forwarded. This is done because there is no provision to hold the new message until the queue becomes unblocked. If the downstream path is blocked, the remaining part of the original message is returned to the queue. Otherwise, processing continues.
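The forwarding policy can be tried out in user space. This sketch uses a hypothetical 4-byte block size so that the splits are easy to see. It forwards the current block as soon as the next character (or CR/LF pair) would not fit, and it never splits a CR/LF pair across blocks. model_crmod, model_forward, and struct model_out are illustrative names, not part of crmod.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLKSZ  4	/* hypothetical tiny block size */
#define MODEL_MAXBLK 16	/* hypothetical limit on forwarded blocks */

/* Each row records one forwarded block as a NUL-terminated string. */
struct model_out {
	char blk[MODEL_MAXBLK][MODEL_BLKSZ + 1];
	int nblk;
};

/* Stands in for putnext(): record a copy of the block. */
static void
model_forward(struct model_out *o, const char *buf, size_t len)
{
	assert(o->nblk < MODEL_MAXBLK && len <= MODEL_BLKSZ);
	memcpy(o->blk[o->nblk], buf, len);
	o->blk[o->nblk][len] = '\0';
	o->nblk++;
}

/*
 * Convert each '\n' to "\r\n". When the next character (or CR/LF pair)
 * does not fit, forward the current block and start a new one.
 */
static void
model_crmod(const char *in, struct model_out *o)
{
	char buf[MODEL_BLKSZ];
	size_t len = 0;

	o->nblk = 0;
	for (; *in != '\0'; in++) {
		size_t need = (*in == '\n') ? 2 : 1;

		if (len + need > MODEL_BLKSZ) {
			model_forward(o, buf, len);
			len = 0;
		}
		if (*in == '\n')
			buf[len++] = '\r';
		buf[len++] = *in;
	}
	if (len > 0)	/* forward the last, partly filled block */
		model_forward(o, buf, len);
}
```

For the input "abc\n", the model forwards "abc" and then "\r\n". This shows how forwarding each full block separately changes message boundaries.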

Flow Control

To support the STREAMS flow control mechanism, modules that use service procedures must invoke canputnext(9F) before calling putnext(9F), and use appropriate values for the high-water and low-water marks. If your module has a service procedure, you manage the flow control. If you don't have a service procedure, then there is no need to do anything.

The q_hiwat and q_lowat values of a queue limit the amount of data that can be placed on it. These limits prevent depletion of buffers in the buffer pool. Flow control is advisory in nature and can be bypassed. It is managed by the high-water and low-water marks and regulated by the utility routines getq(9F), putq(9F), putbq(9F), insq(9F), rmvq(9F), and canputnext(9F).

Flow control normally proceeds as follows:

A driver sends data to a module using putnext(9F), and the module's put procedure queues the data using putq(9F). Calling putq(9F) enables the service procedure, which then runs at some indeterminate time in the future. When the service procedure runs, it retrieves the data by calling getq(9F).

If the module cannot process data at the rate at which the driver is sending the data, the following happens:

When a message is queued, putq(9F) increments q_count by the size of the message and compares the result with the queue's high-water mark (q_hiwat). If the count reaches q_hiwat, putq(9F) sets the internal FULL indicator for the queue. This halts messages from upstream in the case of a write-side queue, or from downstream in the case of a read-side queue (canputnext(9F) returns FALSE), until the queue count drops below q_lowat. getq(9F) decrements the queue count; if the resulting count is below q_lowat, getq(9F) back-enables any blocked queue, causing its service procedure to be called. Flow control does not prevent a queue from reaching q_hiwat: canputnext(9F) detects QFULL only after the mark is reached, so q_count can exceed q_hiwat before the flow stops.
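The high-water/low-water hysteresis can be illustrated with a small user-space model. struct model_queue and its functions are hypothetical stand-ins for putq(9F), getq(9F), and canputnext(9F); the real routines also handle priority bands and decide which queue to back-enable, which this sketch reduces to a single flag. The water marks used in the usage below (512 and 128) are the ones in the pts example later in this chapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of q_count, q_hiwat, q_lowat and QFULL. */
struct model_queue {
	size_t q_count;		/* bytes currently queued */
	size_t q_hiwat;		/* high-water mark */
	size_t q_lowat;		/* low-water mark */
	bool   full;		/* models the internal FULL indicator */
	bool   backenable;	/* set when a blocked neighbor is rescheduled */
};

/* putq(): add the message size; mark FULL once the count reaches q_hiwat. */
static void
model_putq(struct model_queue *q, size_t msgsize)
{
	q->q_count += msgsize;
	if (q->q_count >= q->q_hiwat)
		q->full = true;
}

/* getq(): remove a message; below q_lowat, clear FULL and back-enable. */
static void
model_getq(struct model_queue *q, size_t msgsize)
{
	assert(msgsize <= q->q_count);
	q->q_count -= msgsize;
	if (q->full && q->q_count < q->q_lowat) {
		q->full = false;
		q->backenable = true;
	}
}

/* canputnext() as seen by the neighbor that feeds this queue. */
static bool
model_canput(const struct model_queue *q)
{
	return (!q->full);
}
```

Queuing 300 bytes twice leaves q_count at 600, past q_hiwat, which shows the overshoot noted above. The queue stays full after draining to 300 and is back-enabled only when the count falls below 128.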

The next example shows flow control in a line discipline module. Example 10-6 shows both the read-side and the write-side service procedures. Note that the read side is the same as the write side but without the M_IOCTL processing.


Example 10-6 Read-side Line Discipline Module

/* read side line discipline module flow control */
static mblk_t *read_canon(mblk_t *);

static int
ld_read_srv(
	queue_t *q)			/* pointer to read queue */
{
	mblk_t *mp;			/* original message */
	mblk_t *bp;			/* canonicalized message */

	while ((mp = getq(q)) != NULL) {
		switch (mp->b_datap->db_type) {	/* type of msg */
		case M_DATA:			/* data message */
			if (canputnext(q)) {
				bp = read_canon(mp);
				putnext(q, bp);
			} else {
				putbq(q, mp);	/* put message back in queue */
				return (0);
			}
			break;

		default:
			if (mp->b_datap->db_type >= QPCTL)
				putnext(q, mp);	/* high-priority message */
			else {			/* ordinary message */
				if (canputnext(q))
					putnext(q, mp);
				else {
					putbq(q, mp);
					return (0);
				}
			}
			break;
		}
	}
	return (0);
}

/* write side line discipline module flow control */
static mblk_t *write_canon(mblk_t *);		/* defined elsewhere; not shown */
static void ld_ioctl(queue_t *, mblk_t *);	/* M_IOCTL handler; not shown */

static int
ld_write_srv(
	queue_t *q)			/* pointer to write queue */
{
	mblk_t *mp;			/* original message */
	mblk_t *bp;			/* canonicalized message */

	while ((mp = getq(q)) != NULL) {
		switch (mp->b_datap->db_type) {	/* type of msg */
		case M_DATA:			/* data message */
			if (canputnext(q)) {
				bp = write_canon(mp);
				putnext(q, bp);
			} else {
				putbq(q, mp);	/* put message back in queue */
				return (0);
			}
			break;

		case M_IOCTL:
			ld_ioctl(q, mp);
			break;

		default:
			if (mp->b_datap->db_type >= QPCTL)
				putnext(q, mp);	/* high-priority message */
			else {			/* ordinary message */
				if (canputnext(q))
					putnext(q, mp);
				else {
					putbq(q, mp);
					return (0);
				}
			}
			break;
		}
	}
	return (0);
}
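The decision each of these service procedures makes per message can be modeled in user space. In this sketch, struct model_msg, model_srv(), and the value of MODEL_QPCTL are hypothetical (real code compares against QPCTL from <sys/stream.h>), and the room argument stands in for how many more ordinary messages canputnext(9F) would allow. High-priority messages are forwarded without a flow-control check; the first ordinary message that cannot be forwarded stops the loop, as putbq(9F) followed by return does above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for this model only; real code uses QPCTL. */
#define MODEL_QPCTL 0x80

struct model_msg {
	unsigned char type;	/* models mp->b_datap->db_type */
	int           id;	/* identifies the message in the test */
};

/*
 * Process queued messages in order. room is how many ordinary messages
 * the next queue accepts before canputnext() would return FALSE.
 * Forwarded message ids go to out[]; *nout receives their count.
 * Returns the index of the first message left on the queue (n if none).
 */
static size_t
model_srv(const struct model_msg *queue, size_t n, int room,
    int *out, size_t *nout)
{
	size_t i;

	*nout = 0;
	for (i = 0; i < n; i++) {
		if (queue[i].type < MODEL_QPCTL) {	/* ordinary message */
			if (room <= 0)
				return (i);	/* putbq(q, mp); return (0); */
			room--;			/* canputnext(q) succeeded */
		}
		/* high-priority messages skip the flow-control check */
		out[(*nout)++] = queue[i].id;		/* putnext(q, mp) */
	}
	return (n);
}
```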

Design Guidelines

Module developers should follow these guidelines:

htonl(3N) and ntohl(3N)

The htonl(3N) and ntohl(3N) conversion routines follow the XNS5 publications. The functions continue to convert 32-bit quantities between network byte order and host byte order.
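At user level, the same routines are typically used when building or parsing protocol headers. A minimal sketch, assuming a POSIX system where htonl() and ntohl() are declared in <arpa/inet.h>; put_be32() and get_be32() are hypothetical helper names:

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit host value into a buffer in network byte order. */
static void
put_be32(unsigned char out[4], uint32_t host)
{
	uint32_t net = htonl(host);

	memcpy(out, &net, sizeof (net));
}

/* Read a 32-bit network-order value from a buffer into host order. */
static uint32_t
get_be32(const unsigned char in[4])
{
	uint32_t net;

	memcpy(&net, in, sizeof (net));
	return (ntohl(net));
}
```

Because network byte order is big-endian, put_be32(b, 0x01020304) leaves the bytes 0x01 0x02 0x03 0x04 in b on both little-endian and big-endian hosts.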

Answers to Frequently Asked Questions

TCP and IP are STREAMS modules in the Solaris 7 system. The command strconf < /dev/tcp lists the modules on a TCP Stream. The SunOS 4 TCP/IP implementation is not STREAMS-based.

In the Solaris 7 system, DLPI provides both connection-oriented and connectionless services, as well as multicast features. See the dlpi(7P) man page.

IP multicast is a standard supported feature in the Solaris 7 system. It is not supported in the SunOS 4 system, but an implementation for SunOS 4 is available using anonymous FTP from gregorio.stanford.edu in the file vmtp-ip/ipmulti-sunos41x.tar.Z.

IP is a STREAMS module in the Solaris 7 system, and any module or driver interface with IP should follow the STREAMS mechanism. There are no specific requirements for the interface between IP and network drivers.

Chapter 11 Configuration

This chapter contains information about configuring STREAMS drivers and modules into the Solaris 7 system. It describes how to configure a driver and a module for the STREAMS framework only. For more in-depth information on the general configuration mechanism, see Writing Device Drivers.

This chapter also includes a list of STREAMS-related parameters that can be tuned and describes the autopush(1M) facility.

Configuring STREAMS Drivers and Modules

The following sections contain descriptions of the pointer relationships maintained by the kernel and the various data structures used in STREAMS drivers. For the kernel to access a driver, it uses a sequence of pointers in various data structures. Look first at the data structure relationship, and then the entry point interface for loading the driver into the kernel and accessing the driver from the application level.

The kernel reaches a driver through the following data structures, in this order:

modlinkage(9S)

Contains the revision number and a NULL-terminated list of linkage structures to load dynamically. It is passed to mod_install(9F) in the _init(9E) routine to load the module into the kernel. Points to a modldrv(9S) or modlstrmod(9S).

modldrv(9S)

Contains information about the driver being loaded and points to the dev_ops(9S) structure.

modlstrmod(9S)

Points to an fmodsw(9S) structure, which points to a streamtab(9S). Used only by STREAMS modules.

dev_ops(9S)

Contains the list of entry points for a driver, such as identify, attach, and info. Also points to a cb_ops(9S) structure.

cb_ops(9S)

Contains the driver's threadable entry points, such as open, close, read, write, and ioctl. Also points to the streamtab(9S) structure.

streamtab(9S)

Points to the read and write queue initialization structures, qinit(9S).

qinit(9S)

Points to the entry points of the STREAMS portion of the driver, such as put, srv, open, and close, as well as to the module_info structure. These entry points only process messages.

Each STREAMS driver or module contains the linkage connections for these data structures. A driver's modldrv(9S) points to its dev_ops(9S) structure, and each dev_ops(9S) structure contains a pointer to a cb_ops(9S) structure. The cb_ops(9S) structure contains a pointer, cb_str, to a streamtab(9S) structure. If the driver is not a STREAMS driver, cb_str is NULL. If the driver is a STREAMS driver, cb_str points to a streamtab(9S) structure that identifies the driver's queue initialization routines.
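The pointer chain can be illustrated in user space. The structures below are hypothetical, heavily reduced mirrors of the kernel definitions (only the linking pointers are kept, and qi_putp takes a plain int instead of a queue and message); find_streamtab() and the "xx" and "yy" drivers are invented for illustration. The sketch follows modlinkage to modldrv to dev_ops to cb_ops and returns cb_str, which is NULL for a non-STREAMS driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space mirrors of the kernel structures, reduced to
 * the pointers that form the chain. The real definitions have more
 * members and different entry-point signatures.
 */
struct qinit      { int (*qi_putp)(int msg); };	/* simplified put procedure */
struct streamtab  { struct qinit *st_rdinit, *st_wrinit; };
struct cb_ops     { struct streamtab *cb_str; };
struct dev_ops    { struct cb_ops *devo_cb_ops; };
struct modldrv    { const char *drv_linkinfo; struct dev_ops *drv_dev_ops; };
struct modlinkage { int ml_rev; void *ml_linkage[4]; };

/* Follow modlinkage -> modldrv -> dev_ops -> cb_ops -> streamtab. */
static struct streamtab *
find_streamtab(const struct modlinkage *ml)
{
	const struct modldrv *md = ml->ml_linkage[0];

	if (md == NULL || md->drv_dev_ops == NULL ||
	    md->drv_dev_ops->devo_cb_ops == NULL)
		return (NULL);
	/* A NULL cb_str marks a regular (non-STREAMS) driver. */
	return (md->drv_dev_ops->devo_cb_ops->cb_str);
}

/* Example wiring for a hypothetical STREAMS driver "xx". */
static int xx_put(int msg) { return (msg * 2); }
static struct qinit      xx_rinit = { xx_put };
static struct qinit      xx_winit = { xx_put };
static struct streamtab  xx_info = { &xx_rinit, &xx_winit };
static struct cb_ops     xx_cb = { &xx_info };
static struct dev_ops    xx_ops = { &xx_cb };
static struct modldrv    xx_modldrv = { "xx STREAMS driver", &xx_ops };
/* ml_rev 1 stands in for MODREV_1 */
static struct modlinkage xx_modlinkage = { 1, { &xx_modldrv, NULL } };

/* The same chain for a hypothetical non-STREAMS driver "yy". */
static struct cb_ops     yy_cb = { NULL };
static struct dev_ops    yy_ops = { &yy_cb };
static struct modldrv    yy_modldrv = { "yy character driver", &yy_ops };
static struct modlinkage yy_modlinkage = { 1, { &yy_modldrv, NULL } };
```

Calling find_streamtab(&xx_modlinkage) returns &xx_info, through which the read-side put procedure can be reached; for yy_modlinkage it returns NULL.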

modlinkage(9S)

This is the definition of modlinkage(9S).


struct modlinkage {
   int      ml_rev;           /* rev of loadable modules system */
   void     *ml_linkage[4];   /* NULL terminated list of linkage 
                               * structures */
};

modldrv(9S)

This is the definition of modldrv(9S).


struct modldrv {
		struct mod_ops   *drv_modops;
		char             *drv_linkinfo;
		struct dev_ops   *drv_dev_ops;
};

modlstrmod(9S)

This is the definition of modlstrmod(9S). It does not point to dev_ops(9S) structures because modules can only be pushed onto an existing stream.


struct modlstrmod {
		struct mod_ops      *strmod_modops;
		char                *strmod_linkinfo;
		struct fmodsw       *strmod_fmodsw;
};

dev_ops(9S)

The dev_ops(9S) structure represents a specific class or type of device to the operating system. Each device driver has its own dev_ops(9S) structure, and each dev_ops(9S) structure contains a pointer to a cb_ops(9S) structure.


struct dev_ops  {
  int       devo_rev;                 /* Driver build version	*/
  int       devo_refcnt;              /* device reference count	*/
  int       (*devo_getinfo)(dev_info_t *dip, ddi_info_cmd_t infocmd, 
									 void *arg, void **result);
  int       (*devo_identify)(dev_info_t *dip);
  int       (*devo_probe)(dev_info_t *dip);
  int       (*devo_attach)(dev_info_t *dip, ddi_attach_cmd_t cmd);
  int       (*devo_detach)(dev_info_t *dip, ddi_detach_cmd_t cmd);
  int       (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
  struct cb_ops      *devo_cb_ops;    /* cb_ops ptr for leaf driver*/
  struct bus_ops     *devo_bus_ops;   /* ptr for nexus drivers */
};

cb_ops(9S)

The cb_ops(9S) structure is the SunOS 5 version of the cdevsw and bdevsw tables of previous versions of Unix System V. It contains character and block device information and the driver entry points for non-STREAMS drivers.

struct cb_ops  {
	int	(*cb_open)(dev_t *devp, int flag, int otyp, cred_t *credp);
	int	(*cb_close)(dev_t dev, int flag, int otyp, cred_t *credp);
	int	(*cb_strategy)(struct buf *bp);
	int	(*cb_print)(dev_t dev, char *str);
	int	(*cb_dump)(dev_t dev, caddr_t addr, daddr_t blkno, int nblk);
	int	(*cb_read)(dev_t dev, struct uio *uiop, cred_t *credp);
	int	(*cb_write)(dev_t dev, struct uio *uiop, cred_t *credp);
	int	(*cb_ioctl)(dev_t dev, int cmd, int arg, int mode,
		    cred_t *credp, int *rvalp);
	int	(*cb_devmap)(dev_t dev, dev_info_t *dip,
		    ddi_devmap_data_t *dvdp, ddi_devmap_cmd_t cmd, off_t offset,
		    unsigned int len, unsigned int prot, cred_t *credp);
	int	(*cb_mmap)(dev_t dev, off_t off, int prot);
	int	(*cb_segmap)(dev_t dev, off_t off, struct as *asp,
		    caddr_t *addrp, off_t len, unsigned int prot,
		    unsigned int maxprot, unsigned int flags, cred_t *credp);
	int	(*cb_chpoll)(dev_t dev, short events, int anyyet,
		    short *reventsp, struct pollhead **phpp);
	int	(*cb_prop_op)(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
		    int mod_flags, char *name, caddr_t valuep, int *length);

	struct streamtab *cb_str;	/* streams information */

	/*
	 * The cb_flag field tells the system a bit about the device.
	 * The bit definitions are in <sys/conf.h>.
	 */
	int	cb_flag;		/* driver compatibility flag */
};

streamtab(9S)

The streamtab(9S) structure contains pointers to the qinit(9S) structures that define the read and write queues of a module or driver, and, for a multiplexing driver, its lower read and write queues.

If the cb_str pointer in cb_ops(9S) is NULL, the driver has no STREAMS routines and is treated as a regular (non-STREAMS) driver. Through its qinit(9S) structures, the streamtab(9S) indirectly identifies the appropriate open, close, put, service, and administration routines. These driver and module routines should generally be declared static.


struct streamtab {
	struct qinit     *st_rdinit;      /* defines read queue */
	struct qinit     *st_wrinit;      /* defines write queue */
	struct qinit     *st_muxrinit;    /* lower read queue; multiplexing drivers only */
	struct qinit     *st_muxwinit;    /* lower write queue; multiplexing drivers only */
};

qinit(9S)

The qinit(9S) structure (also shown in Appendix A) contains pointers to the STREAMS entry points of a queue. These routines are called by the STREAMS framework rather than directly by other modules.


struct qinit {
 	int         (*qi_putp)();            /* put procedure */
 	int         (*qi_srvp)();            /* service procedure */
 	int         (*qi_qopen)();           /*called on each open or push*/
 	int         (*qi_qclose)();          /*called on last close or pop*/
 	int         (*qi_qadmin)();          /* reserved for future use */
 	struct module_info     *qi_minfo;    /* info struct */
 	struct module_stat     *qi_mstat;    /*stats struct (opt)*/
};

Entry Points

As described in Chapter 9, STREAMS Drivers, and as seen in the previous data structures, there are four groups of entry points:

  1. Kernel module loading - _init(9E), _fini(9E), _info(9E).

  2. dev_ops - identify(9E), attach(9E), getinfo(9E).

  3. cb_ops - open(9E), close(9E), read(9E), write(9E), ioctl(9E).

  4. streamtab - put(9E), srv(9E).

pts(7D) example

Now look at a real example taken from the Solaris 7 system. The driver pts(7D) is the pseudo terminal slave driver.

/*
 * Slave Stream Pseudo Terminal Module
 */

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/stat.h>
#include <sys/errno.h>
#include <sys/debug.h>
#include <sys/cmn_err.h>
#include <sys/modctl.h>
#include <sys/conf.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

static int ptsopen(queue_t *, dev_t *, int, int, cred_t *);
static int ptsclose(queue_t *, int, cred_t *);
static int ptswput (queue_t*, mblk_t*);
static int ptsrsrv (queue_t*);
static int ptswsrv (queue_t*);

static int pts_devinfo(dev_info_t *dip, ddi_info_cmd_t infocmd,
      void *arg,void **result);

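static struct module_info pts_info = {
		0xface,		/* mi_idnum: module ID number */
		"pts",		/* mi_idname: module name */
		0,		/* mi_minpsz: minimum packet size */
		512,		/* mi_maxpsz: maximum packet size */
		512,		/* mi_hiwat: high-water mark */
		128		/* mi_lowat: low-water mark */
};

static struct qinit ptsrint = {
		NULL,		/* qi_putp */
		ptsrsrv,	/* qi_srvp */
		ptsopen,	/* qi_qopen */
		ptsclose,	/* qi_qclose */
		NULL,		/* qi_qadmin */
		&pts_info,	/* qi_minfo */
		NULL		/* qi_mstat */
};

static struct qinit ptswint = {
		ptswput,	/* qi_putp */
		ptswsrv,	/* qi_srvp */
		NULL,		/* qi_qopen */
		NULL,		/* qi_qclose */
		NULL,		/* qi_qadmin */
		&pts_info,	/* qi_minfo */
		NULL		/* qi_mstat */
};

static struct streamtab ptsinfo = {
		&ptsrint,	/* st_rdinit */
		&ptswint,	/* st_wrinit */
		NULL,		/* st_muxrinit: not a multiplexer */
		NULL		/* st_muxwinit */
};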

static int pts_identify(dev_info_t *devi);
static int pts_attach(dev_info_t *devi, ddi_attach_cmd_t cmd);
static int pts_detach(dev_info_t *devi, ddi_detach_cmd_t cmd);
static dev_info_t *pts_dip;				/* private copy of devinfo ptr */

extern kmutex_t pt_lock;
extern int pt_cnt;

static struct cb_ops cb_pts_ops = {
   nulldev,       /* cb_open */ 
   nulldev,       /* cb_close */ 
   nodev,         /* cb_strategy */
   nodev,         /* cb_print */
   nodev,         /* cb_dump */
   nodev,         /* cb_read */
   nodev,         /* cb_write */
   nodev,         /* cb_ioctl */
   nodev,         /* cb_devmap */
   nodev,         /* cb_mmap */
   nodev,         /* cb_segmap */
   nochpoll,      /* cb_chpoll */
   ddi_prop_op,   /* cb_prop_op */
   &ptsinfo,      /* cb_str */
   D_MP           /* cb_flag */
};

static struct dev_ops pts_ops = {
   DEVO_REV,      /* devo_rev */
   0,             /* devo_refcnt */   
   pts_devinfo,   /* devo_getinfo */ 
   pts_identify,  /* devo_identify */
   nulldev,       /* devo_probe */
   pts_attach,    /* devo_attach */
   pts_detach,    /* devo_detach */
   nodev,         /* devo_reset */
   &cb_pts_ops,   /* devo_cb_ops */
   (struct bus_ops*) NULL   /* devo_bus_ops */
};

/*
 * Module linkage information for the kernel.
 */

static struct modldrv modldrv = {
			&mod_driverops, 		/* Type of module: a pseudo driver */
			"Slave Stream Pseudo Terminal driver 'pts'",
			&pts_ops,				/* driver ops */
};

static struct modlinkage modlinkage = {
		MODREV_1,
		(void *)&modldrv,
		NULL
};

int
_init(void)
{
		return (mod_install(&modlinkage));
}

int
_fini(void)
{
		return (mod_remove(&modlinkage));
}

int
_info(struct modinfo *modinfop)
{
		return (mod_info(&modlinkage, modinfop));
}

static int
pts_identify(dev_info_t *devi)
{
		if (strcmp(ddi_get_name(devi), "pts") == 0)
			return (DDI_IDENTIFIED);
		else
			return (DDI_NOT_IDENTIFIED);
}

static int
pts_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
{
		int i;
		char name[12];		/* room for any int in decimal */

		if (cmd != DDI_ATTACH)
			return (DDI_FAILURE);

		for (i = 0; i < pt_cnt; i++) {
			(void) sprintf(name, "%d", i);
			if (ddi_create_minor_node(devi, name, S_IFCHR, i, NULL, 0) 
						== DDI_FAILURE) {
				ddi_remove_minor_node(devi, NULL);
				return (DDI_FAILURE);
			}
		}
		pts_dip = devi;		/* saved for pts_devinfo() */
		return (DDI_SUCCESS);
}

static int
pts_detach(dev_info_t *devi, ddi_detach_cmd_t cmd)
{
		if (cmd != DDI_DETACH)
			return (DDI_FAILURE);
		ddi_remove_minor_node(devi, NULL);
		return (DDI_SUCCESS);
}

static int
pts_devinfo (dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg,
					 void **result)
{
		int error;

		switch (infocmd)   {
			case DDI_INFO_DEVT2DEVINFO:
				if (pts_dip == NULL) {
					error = DDI_FAILURE;
				} else {
					*result = (void *) pts_dip;
					error = DDI_SUCCESS;
				}
				break;
			case DDI_INFO_DEVT2INSTANCE:
				*result = (void *) 0;
				error = DDI_SUCCESS;
				break;
			default:
				error = DDI_FAILURE;
		}
		return (error);
}

/*
 * The open, close, wput, rsrv, and wsrv routines are presented
 * here solely to show how they interact with the configuration
 * data structures and routines. Therefore, the bulk of their
 * code is not included.
 */
static int
ptsopen(
		queue_t *rqp,		/* pointer to the read side queue */
		dev_t   *devp,		/* pointer to stream tail's dev */
		int     oflag,		/* the user open(2) supplied flags */
		int     sflag,		/* open state flag */
		cred_t  *credp)		/* credentials */
{
		qprocson(rqp);
		return (0);
}

static int
ptsclose(queue_t *rqp, int flag, cred_t *credp)
{
		qprocsoff(rqp);
		return (0);
}

static int
ptswput(queue_t *qp, mblk_t *mp)
{
		return (0);
}

static int
ptsrsrv(queue_t *qp)
{
		return (0);
}

static int
ptswsrv(queue_t *qp)
{
		return (0);
}

STREAMS Module Configuration

Here are the structures if you are working with a module instead of a driver. Notice that a modlstrmod(9S) is used in modlinkage(9S) and fmodsw(9S) points to streamtab(9S) instead of going through dev_ops(9S).

extern struct streamtab pteminfo;

static struct fmodsw fsw = {
		"ptem",
		&pteminfo,
		D_NEW | D_MP
};

/*
 * Module linkage information for the kernel.
 */
extern struct mod_ops mod_strmodops;

static struct modlstrmod modlstrmod = {
		&mod_strmodops,
		"pty hardware emulator",
		&fsw
};

static struct modlinkage modlinkage = {
		MODREV_1, 
		(void *)&modlstrmod,
		NULL
};
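When a module is pushed onto a Stream (for example, with the I_PUSH ioctl), the kernel must locate it by name. The sketch below models that path in user space: it follows modlinkage to modlstrmod to fmodsw to streamtab, with no dev_ops involved. The structures are hypothetical, reduced mirrors of the kernel definitions, and the loaded[] table and model_find_module() are a simplified stand-in, not the kernel's actual lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space mirrors: a STREAMS module is reached through
 * modlinkage -> modlstrmod -> fmodsw -> streamtab, with no dev_ops.
 */
struct streamtab  { const char *st_desc; };	/* stand-in contents */
struct fmodsw     { const char *f_name; struct streamtab *f_str; int f_flag; };
struct modlstrmod { const char *strmod_linkinfo; struct fmodsw *strmod_fmodsw; };
struct modlinkage { int ml_rev; void *ml_linkage[4]; };

static struct streamtab  ptem_info = { "ptem queue definitions" };
static struct fmodsw     ptem_fsw = { "ptem", &ptem_info, 0 };	/* flags omitted */
static struct modlstrmod ptem_modlstrmod = { "pty hardware emulator", &ptem_fsw };
/* ml_rev 1 stands in for MODREV_1 */
static struct modlinkage ptem_modlinkage = { 1, { &ptem_modlstrmod, NULL } };

/* Table of "loaded" modules, standing in for the kernel's module list. */
static const struct modlinkage *loaded[] = { &ptem_modlinkage };

/* Find a module's streamtab by the name a push request would use. */
static struct streamtab *
model_find_module(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof (loaded) / sizeof (loaded[0]); i++) {
		const struct modlstrmod *ms = loaded[i]->ml_linkage[0];

		if (ms != NULL && strcmp(ms->strmod_fmodsw->f_name, name) == 0)
			return (ms->strmod_fmodsw->f_str);
	}
	return (NULL);
}
```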

Compilation

Here are some compile, assemble, and link lines for an example driver with two C modules and an assembly language module.


cc -D_KERNEL -c example_one.c
cc -D_KERNEL -c example_two.c
as -P -D_ASM -D_KERNEL -I. -o example_asm.o example_asm.s
ld -r -o example example_one.o example_two.o example_asm.o

Kernel Loading

See Writing Device Drivers for more information on the sequence of installing and loading device drivers. In outline, copy your driver to /kernel/drv or your module to /kernel/strmod. For drivers, then run add_drv(1M).

Checking Module Type

The following code enables a driver to determine whether it is being pushed as a module, opened as a clonable driver, or opened as a regular driver. The open routine is passed the sflag argument, which it checks as follows:

	if (sflag == MODOPEN) {
		/* the module is being pushed */
	} else if (sflag == CLONEOPEN) {
		/* it is being opened as a clonable driver */
	} else {
		/* it is being opened as a regular driver */
	}

Tunable Parameters

Certain system parameters referred to by STREAMS are configurable when building a new operating system (see the file /etc/system and the SunOS User's Guide to System Administration for further details). These parameters are:

nstrpush

Maximum number of modules that can be pushed onto a single Stream (should be at least 8).

strmsgsz

Maximum number of bytes of information that a single system call can pass to a Stream to be placed into the data part of a message (in M_DATA blocks). Any write(2) exceeding this size is broken into multiple messages. A putmsg(2) with a data part exceeding this size fails with ERANGE. If strmsgsz is set to 0, the number of bytes passed to a Stream is unlimited.

strctlsz

Maximum number of bytes of information that a single system call can pass to a Stream to be placed into the control part of a message (in an M_PROTO or M_PCPROTO block). A putmsg(2) with a control part exceeding this size fails with ERANGE.
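The effect of these two limits can be sketched in user space. write_msg_count() and putmsg_check() are hypothetical helpers that apply only the rules stated above; other factors, such as the packet-size limits of the topmost module, may also affect how a write is split and are ignored here. The tunable values in the usage are illustrative, not system defaults.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the strmsgsz and strctlsz limits described above.
 */

/* Number of M_DATA messages a write(2) of len bytes is broken into. */
static size_t
write_msg_count(size_t len, size_t strmsgsz)
{
	assert(len > 0);		/* zero-length writes are not modeled */
	if (strmsgsz == 0)
		return (1);		/* 0 means no limit */
	return ((len + strmsgsz - 1) / strmsgsz);	/* round up */
}

/* 0 if a putmsg(2) with these part sizes is accepted, else ERANGE. */
static int
putmsg_check(size_t ctl_len, size_t data_len, size_t strctlsz,
    size_t strmsgsz)
{
	if (ctl_len > strctlsz)
		return (ERANGE);	/* control part too large */
	if (strmsgsz != 0 && data_len > strmsgsz)
		return (ERANGE);	/* data part too large */
	return (0);
}
```

For example, with strmsgsz set to 4096, a 10000-byte write(2) becomes three messages, while a putmsg(2) whose data part exceeds 4096 bytes fails with ERANGE.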

autopush(1M) Facility

The autopush(1M) facility configures the list of modules for a STREAMS device. It automatically pushes a prespecified list (/etc/iu.ap) of modules onto the Stream when the STREAMS device is opened and the device is not already open.

The STREAMS Administrative Driver (SAD) (see the sad(7D) man page) provides an interface to the autopush mechanism. System administrators can open the SAD driver and set or get autopush(1M) information on other drivers. The SAD driver caches the list of modules to push for each driver. When the driver is opened, the Stream head checks the SAD's cache to determine whether the device is configured to have modules pushed automatically. If an entry is found, the modules are pushed. If the device has been opened but not yet closed, another open does not cause the list of prespecified modules to be pushed again.

Three options configure the module list:

When the configuration list is cleared, a range of minor devices has to be cleared as a range and not in parts.

Application Interface

The SAD driver is accessed through the /dev/sad/admin or /dev/sad/user node. After the device is initialized, a program can perform any autopush configuration. The program should open the SAD driver, read a configuration file to find out which modules need to be configured for which devices, format the information into strapush structures, and make the SAD_SAP ioctl(2) calls. See the sad(7D) man page for more information.

All autopush operations are performed through ioctl(2) commands to set or get autopush information. Only root can set autopush information, but any user can get the autopush information for a device.

The ioctl call takes the form ioctl(fd, cmd, arg), where fd is the file descriptor of the SAD driver, cmd is either SAD_SAP (set autopush information) or SAD_GAP (get autopush information), and arg is a pointer to a strapush structure.

The strapush structure is:

/*
 * maximum number of modules that can be pushed on a
 * Stream using the autopush feature should be no greater
 * than nstrpush
 */
#define MAXAPUSH 8

/* autopush information common to user and kernel */

struct apcommon {
   uint     apc_cmd;          /* command - see below */
   major_t  apc_major;        /* major device number */
   minor_t  apc_minor;        /* minor device number */
   minor_t  apc_lastminor;    /* last minor dev # for range */
   uint     apc_npush;        /* number of modules to push */
};

/* apc_cmd - various options of autopush */
#define SAP_CLEAR       0 /* remove configuration list */
#define SAP_ONE         1 /* configure one minor device */
#define SAP_RANGE       2 /* config range of minor devices */
#define SAP_ALL         3 /* configure all minor devices */

/* format of autopush ioctls */
struct strapush {
		struct apcommon sap_common;
		char sap_list[MAXAPUSH] [FMNAMESZ + 1]; /* module list */
};

#define sap_cmd           sap_common.apc_cmd
#define sap_major         sap_common.apc_major
#define sap_minor         sap_common.apc_minor
#define sap_lastminor     sap_common.apc_lastminor
#define sap_npush         sap_common.apc_npush

A device is identified by its major device number, sap_major. The SAD_SAP ioctl(2) has the following options:

SAP_ONE

Configures a single minor device, sap_minor, of a driver.

SAP_RANGE

Configures a range of minor devices from sap_minor to sap_lastminor, inclusive.

SAP_ALL

Configures all minor devices of a device.

SAP_CLEAR

Clears the previous settings by removing the entry with the matching sap_major and sap_minor fields.

The list of modules is specified as a list of module names in sap_list. MAXAPUSH defines the maximum number of modules to push automatically.
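Before issuing SAD_SAP, a configuration program may want to sanity-check the request it has built. The sketch below is a hypothetical client-side check, not the SAD driver's own error checking: it uses simplified copies of the structures above (uint, major_t, and minor_t are all modeled as unsigned int), assumes FMNAMESZ is 8, and applies this sketch's reading of the options, including the assumption that a range must satisfy sap_minor < sap_lastminor.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXAPUSH  8
#define FMNAMESZ  8	/* assumed module-name length limit */
#define SAP_CLEAR 0
#define SAP_ONE   1
#define SAP_RANGE 2
#define SAP_ALL   3

/* Simplified copies of the structures above. */
struct apcommon {
	unsigned int apc_cmd, apc_major, apc_minor, apc_lastminor, apc_npush;
};
struct strapush {
	struct apcommon sap_common;
	char sap_list[MAXAPUSH][FMNAMESZ + 1];
};

/* Hypothetical sanity check of a request before SAD_SAP is issued. */
static bool
sap_valid(const struct strapush *sp)
{
	const struct apcommon *c = &sp->sap_common;
	unsigned int i;

	switch (c->apc_cmd) {
	case SAP_CLEAR:
		return (true);		/* only major and minor are used */
	case SAP_RANGE:
		if (c->apc_lastminor <= c->apc_minor)
			return (false);	/* assumed: a range needs minor < lastminor */
		break;
	case SAP_ONE:
	case SAP_ALL:
		break;
	default:
		return (false);		/* unknown command */
	}
	if (c->apc_npush < 1 || c->apc_npush > MAXAPUSH)
		return (false);
	for (i = 0; i < c->apc_npush; i++)
		if (sp->sap_list[i][0] == '\0')
			return (false);	/* each listed module needs a name */
	return (true);
}
```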

A user can query the current configuration status of a given major/minor device by issuing the SAD_GAP ioctl(2) with sap_major and sap_minor values of the device set. On successful return from this system call, the strapush structure is filled in with the corresponding information for the device. The maximum number of entries the SAD driver can cache is determined by the tunable parameter NAUTOPUSH found in the SAD driver's master file.

The following is an example of an autopush configuration file in /etc/iu.ap:


# /dev/console and /dev/contty autopush setup
#
#	major      minor         lastminor         modules

		wc         0             0                 ldterm ttcompat
		zs         0             1                 ldterm ttcompat
		ptsl       0             15                ldterm ttcompat

The first line configures a single minor device whose major name is wc and minor numbers start and end at 0, creating only one minor number. The modules automatically pushed are ldterm and ttcompat. The second line configures the zs driver whose minor device numbers are 0 and 1, and automatically pushes the same modules. The last line configures the ptsl driver whose minor device numbers are from 0 to 15, and automatically pushes the same modules.
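A configuration program like the one described under Application Interface must turn each line of /etc/iu.ap into a strapush request. The sketch below parses one line into a hypothetical struct ap_entry and picks SAP_ONE, SAP_RANGE, or SAP_ALL. The rule that a minor of -1 selects all minors, and that a lastminor of 0 (or equal to minor) selects a single minor, is an assumption based on the autopush(1M) conventions; FMNAMESZ is also assumed. Mapping the driver name to a major number is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXAPUSH  8
#define FMNAMESZ  8	/* assumed module-name length limit */
#define SAP_ONE   1
#define SAP_RANGE 2
#define SAP_ALL   3

/* Hypothetical parsed form of one /etc/iu.ap configuration line. */
struct ap_entry {
	char name[32];			/* driver (major) name, e.g. "zs" */
	int  cmd;			/* SAP_ONE, SAP_RANGE or SAP_ALL */
	int  minor, lastminor;
	int  npush;			/* number of modules listed */
	char mods[MAXAPUSH][FMNAMESZ + 1];
};

/*
 * Parse "name minor lastminor module...". Returns 0 on success or -1
 * for a malformed line (comment lines therefore also return -1).
 */
static int
parse_iuap_line(const char *line, struct ap_entry *e)
{
	char tok[64];
	int  off = 0;

	memset(e, 0, sizeof (*e));
	if (sscanf(line, "%31s %d %d%n", e->name, &e->minor,
	    &e->lastminor, &off) != 3)
		return (-1);
	line += off;
	while (sscanf(line, "%63s%n", tok, &off) == 1) {
		if (e->npush == MAXAPUSH || strlen(tok) > FMNAMESZ)
			return (-1);	/* too many modules or name too long */
		strcpy(e->mods[e->npush++], tok);
		line += off;
	}
	if (e->npush == 0)
		return (-1);		/* no modules listed */
	if (e->minor == -1)
		e->cmd = SAP_ALL;
	else if (e->lastminor == 0 || e->lastminor == e->minor)
		e->cmd = SAP_ONE;
	else
		e->cmd = SAP_RANGE;
	return (0);
}
```

Applied to the example file, the wc line yields SAP_ONE and the zs and ptsl lines yield SAP_RANGE, each with the module list ldterm, ttcompat.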

Chapter 12 MultiThreaded STREAMS

This chapter describes how to multithread a STREAMS driver or module. It covers the necessary conversion topics so that new and existing STREAMS modules and drivers run in the multithreaded kernel. It describes STREAMS-specific multithreading issues and techniques. Refer also to Writing Device Drivers.

MultiThreaded (MT) STREAMS Overview

The SunOS 5 operating system is fully multithreaded, able to make effective use of the available parallelism of a symmetric shared-memory multiprocessor computer. All kernel subsystems are multithreaded: scheduler, virtual memory, file systems, block/character/STREAMS I/O, networking protocols, and device drivers.

MT STREAMS requires you to use some new concepts and terminology. These concepts apply not only to STREAMS drivers, but to all device drivers in the SunOS 5 system. For a more complete description of these terms, see Writing Device Drivers. Additionally, see Chapter 1, Overview of STREAMS of this guide for definitions and Chapter 8, Messages - Kernel Level for elements of MT drivers.

You need to understand the following terms and ideas.

Thread

Sequence of instructions executed within the context of a process

Lock

Mechanism to restrict access to data structures

Single Threaded

Restricting access to a single thread

Multithreaded

Allowing two or more threads access

Multiprocessing

Two or more CPUs concurrently executing the OS

Concurrency

Simultaneous execution

Preemption

Suspending execution for the next thread to run

Monitor

Portion of code that is single threaded

Mutual Exclusion

Exclusive access to a data element by a single thread at one time

Condition Variables

Kernel event synchronization primitives

Counting Semaphores

Memory-based synchronization mechanism

Readers/Writer Locks

Data lock allowing one writer or many readers at one time

Callback

Module function called when a specific event occurs

MT STREAMS Framework

The STREAMS framework consists of the Stream head, STREAMS utility routines, and documented STREAMS data structures. The STREAMS framework allows multiple kernel threads to concurrently enter and execute within each module. Multiple threads can be actively executing in the open, close, put, and service procedures of each queue within the system.

The first goal of the SunOS 5 system is to preserve the interface and flavor of STREAMS and to shield module code as much as possible from the impact of migrating to the multithreaded kernel. Most of the locking is hidden from the programmer and performed by the STREAMS kernel framework. As long as module code uses the standard, documented programmatic interfaces to shared kernel data structures (such as queue_t, mblk_t, and dblk_t), it does not have to explicitly lock these framework data structures.

The second goal is to make it simple to write MT SAFE modules. The framework accomplishes this by providing the MT STREAMS perimeter mechanisms for controlling and restricting the concurrency in a STREAMS module. See the section "MT SAFE Modules".

The DDI/DKI entry points (open, close, put, and service procedures) plus certain callback procedures (scheduled with qtimeout, qbufcall, or qwriter) are synchronous entry points. All other entry points into a module are asynchronous. Examples of the latter are hardware interrupt routines, timeout, bufcall, and esballoc callback routines.

STREAMS Framework Integrity

The STREAMS framework guarantees the integrity of the STREAMS data structures, such as queue_t, mblk_t, and dblk_t. This assumes that a module conforms to the DDI/DKI and does not directly access global operating system data structures or facilities not described within the Driver-Kernel Interface.

The q_next and q_ptr fields of the queue_t structure are not modified by the system while a thread is actively executing within a synchronous entry point. The q_next field of the queue_t structure can change while a thread is executing within an asynchronous entry point.

As in previous Solaris system releases, a module must not call another module's put or service procedures directly. The DDI/DKI routines putnext(9F), put(9F), and others in Section 9F must be used to pass a message to another queue. Calling another module's routines directly circumvents the design of the MT STREAMS framework and can yield unknown results.

When making your module MT SAFE, the integrity of private module data structures must be ensured by the module itself. Knowing what the framework supports is critical in deciding what you must provide. The integrity of private module data structures can be maintained by using the MT STREAMS perimeters to control the concurrency in the module, by using module private locks, or by a combination of the two.

Message Ordering

The STREAMS framework guarantees the ordering of messages along a stream if all the modules in the stream preserve message ordering internally. This ordering guarantee only applies to messages that are sent along the same stream and produced by the same source.

The STREAMS framework does not guarantee that a message has been seen by the next put procedure when putnext(9F) or qreply(9F) returns.

MT Configurations

A module or a driver can be either MT SAFE or MT UNSAFE. Beginning with the release of the Solaris 7 system, MT UNSAFE modules and drivers are no longer supported.

MT SAFE modules

For MT SAFE mode, use MT STREAMS perimeters to restrict the concurrency in a module or driver to:

It is easiest to initially implement your module and configure it to be per-module single threaded, and increase the level of concurrency as needed. "Sample Multithreaded Device Driver" provides a complete example of using a per-module perimeter, and "Sample Multithreaded Module with Outer Perimeter" provides a complete example with a higher level of concurrency.

MT SAFE modules can use different MT STREAMS perimeters to restrict the concurrency in the module to a concurrency that is natural given the data structures that the module contains, thereby removing the need for module private locks. A module that requires unrestricted concurrency can be configured to have no perimeters. Such modules have to use explicit locking primitives to protect their data structures. While such modules can exploit the maximum level of concurrency allowed by the underlying hardware platform, they are more complex to develop and support. See "MT SAFE Modules Using Explicit Locks".

Independent of the perimeters, at most one thread is allowed within any given queue's service procedure.

MT UNSAFE Modules

MT UNSAFE mode for STREAMS modules was temporarily supported as an aid in porting SVR4 modules. MT UNSAFE modules are not supported after SVR4.

Preparing to Port

When modifying a STREAMS driver to take advantage of the multithreaded kernel, a level of MT safety is selected according to:

Note that much of the effort in conversion is simply determining the appropriate degree of data sharing and the corresponding granularity of locking. The actual time spent configuring perimeters and/or installing locks should be much smaller than the time spent in analysis.

To port your module, you must understand the data structures used within your module, as well as the accesses to those data structures. It is your responsibility to fully understand the relationship between all portions of the module and private data within that module, and to use the MT STREAMS perimeters (or the synchronization primitives available) to maintain the integrity of these private data structures.

You must explicitly restrict access to private module data structures as appropriate to ensure the integrity of these data structures. You must use the MT STREAMS perimeters to restrict the concurrency in the module so that the parts of the module that modify private data are single threaded with respect to the parts of the module that read the same data. As an alternative to perimeters, you can use the available synchronization primitives (mutexes, condition variables, readers/writer locks, semaphores) to explicitly restrict access to module private data as appropriate for the operations the module performs on that data.

The first step in multithreading a module or driver is to analyze the module, breaking the entire module up into a list of individual operations and the private data structures referenced in each operation. Part of this first step is deciding upon a level of concurrency for the module. Ask yourself which of these operations can be multithreaded and which must be single threaded. Try to find a level of concurrency that is "natural" for the module and that matches one of the available perimeters (or alternatively, requires the minimal number of locks) and that has a simple and straightforward implementation. Avoid additional complexity.

It is very common to overdo multithreading, which results in a module with very low performance.

Typical questions to ask are:

Examples of natural levels of concurrency are:

Porting to the SunOS 5 System

When porting a STREAMS module or driver from the SunOS 4 system to the SunOS 5 system, the module should be examined with respect to the following areas:

For portability and correct operation, each module must adhere to the SunOS DDI/DKI. Several facilities available in previous releases of the SunOS system have changed and can take different arguments, or produce different side effects, or no longer exist in the SunOS 5 system. The module writer should carefully review the module with respect to the DDI/DKI.

Each module that accesses underlying Sun-specific features included in the SunOS 5 system should conform to the Device Driver Interface. The SunOS 5 DDI defines the interface used by the device driver to register device hardware interrupts, access device node properties, map device slave memory, and establish and synchronize memory mappings for DVMA (Direct Virtual Memory Access). These areas are primarily applicable to hardware device drivers. Refer to the Device Driver Interface Specification in Writing Device Drivers for details on the SunOS 5 DDI and DVMA.

The kernel networking subsystem in the SunOS 5 system is STREAMS based. Datalink drivers that used the ifnet interface in the SunOS 4 system must be converted to use DLPI for the SunOS 5 system. Refer to the Data Link Provider Interface, Revision 2 specification.

After reviewing the module for conformance to the SunOS 5 DKI and DDI specifications, you should be able to consider the impact of multithreading on the module.

MT SAFE Modules

Your MT SAFE modules should use perimeters and avoid using module private locks. Should you opt to use module private locks, you need to read "MT SAFE Modules Using Explicit Locks" along with this section.

MT STREAMS Perimeters

To control and restrict the concurrency of the synchronous entry points, the STREAMS framework defines two MT perimeters: an inner perimeter and an outer perimeter. A module can be configured to have no perimeters, only an inner or only an outer perimeter, or both an inner and an outer perimeter. The inner perimeter can be given one of several different scopes. Unrestricted concurrency is obtained by configuring no perimeters.

Figure 12-1 and Figure 12-2 are examples of inner perimeters. Figure 12-3 shows multiple inner perimeters inside an outer perimeter.

Figure 12-1 Inner Perimeter Spanning a Pair of Queues (D_MTQPAIR)


Both the inner and outer perimeters act as readers/writer locks allowing multiple readers or a single writer. Thus, each perimeter can be entered in two modes: shared (reader) or exclusive (writer). By default, all synchronous entry points enter the inner perimeter exclusively and the outer perimeter shared.
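The shared/exclusive behavior can be modeled outside the kernel. The following is a minimal user-space sketch, not STREAMS code: `struct perim_model`, `perim_enter_shared`, `perim_enter_excl`, `perim_exit`, and `entry_default` are hypothetical names. `entry_default` mimics the default entry for a synchronous entry point (outer perimeter shared, inner perimeter exclusive), so threads in two queue pairs that each have their own inner perimeter can run concurrently, while two threads aiming at the same inner perimeter cannot. The model only reports whether entry is possible; the real framework does not fail an entry but makes the caller wait or defers the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one perimeter: a readers/writer gate that admits
 * many shared (reader) entries or one exclusive (writer) entry. */
struct perim_model {
    int  shared;     /* threads currently inside in shared mode */
    bool exclusive;  /* true while one thread holds exclusive access */
};

/* Try to enter in shared mode; fails while a writer is inside. */
static bool perim_enter_shared(struct perim_model *p)
{
    if (p->exclusive)
        return false;
    p->shared++;
    return true;
}

/* Try to enter exclusively; fails if anyone else is inside. */
static bool perim_enter_excl(struct perim_model *p)
{
    if (p->exclusive || p->shared > 0)
        return false;
    p->exclusive = true;
    return true;
}

/* Leave the perimeter in whichever mode it was entered. */
static void perim_exit(struct perim_model *p)
{
    if (p->exclusive) {
        p->exclusive = false;
    } else {
        assert(p->shared > 0);
        p->shared--;
    }
}

/* Default entry for a synchronous entry point: outer shared, then inner
 * exclusive. On failure, nothing stays held. */
static bool entry_default(struct perim_model *outer, struct perim_model *inner)
{
    if (!perim_enter_shared(outer))
        return false;
    if (!perim_enter_excl(inner)) {
        perim_exit(outer);
        return false;
    }
    return true;
}
```

With one outer perimeter and one inner perimeter per queue pair, a second entry into the same pair is refused while an entry into a different pair succeeds, and the outer perimeter cannot be taken exclusively while either thread is inside.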

The inner and outer perimeters are entered when one of the synchronous entry points is called. The perimeters are retained until the call returns from the entry point. Thus, for example, the thread does not leave the perimeter of one module when it calls putnext to enter another module.

Figure 12-2 Inner Perimeter Spanning All Queues in a Module (D_MTPERMOD)


When a thread is inside a perimeter and it calls putnext(9F) (or putnextctl1(9F)), the thread can "loop around" through other STREAMS modules and try to reenter a put procedure inside the original perimeter. If this reentry conflicts with the earlier entry (for example if the first entry has exclusive access at the inner perimeter), the STREAMS framework defers the reentry while preserving the order of the messages attempting to enter the perimeter. Thus, putnext(9F) returns without the message having been passed to the put procedure and the framework passes the message to the put procedure when it is possible to enter the perimeters.
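The deferral rule can be illustrated with a small single-threaded model. This sketch assumes integer "messages" and uses hypothetical names (`struct defer_model`, `model_put`, `model_putnext`, `model_leave`); it is not how the framework is implemented. A message that arrives while the perimeter is held, or while earlier messages are still waiting, is appended to a FIFO; when the holder leaves, the waiting messages reach the put procedure in arrival order.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAXMSG 16

/* Hypothetical model: an inner perimeter that may be held exclusively,
 * plus a FIFO of messages whose entry was deferred. */
struct defer_model {
    bool busy;                      /* perimeter held exclusively */
    int  pending[MODEL_MAXMSG];     /* deferred messages, oldest first */
    int  npending;
    int  delivered[MODEL_MAXMSG];   /* messages the put procedure has seen */
    int  ndelivered;
};

/* Stand-in for the module's put procedure. */
static void model_put(struct defer_model *m, int msg)
{
    assert(m->ndelivered < MODEL_MAXMSG);
    m->delivered[m->ndelivered++] = msg;
}

/* Stand-in for putnext() into this module: deliver now if the perimeter
 * is free and nothing is waiting; otherwise defer without reordering.
 * Returns true if the message reached the put procedure immediately. */
static bool model_putnext(struct defer_model *m, int msg)
{
    if (m->busy || m->npending > 0) {
        assert(m->npending < MODEL_MAXMSG);
        m->pending[m->npending++] = msg;
        return false;
    }
    m->busy = true;
    model_put(m, msg);
    m->busy = false;
    return true;
}

/* The holder leaves the perimeter: deferred messages are now passed to
 * the put procedure in their original order. */
static void model_leave(struct defer_model *m)
{
    m->busy = false;
    for (int i = 0; i < m->npending; i++)
        model_put(m, m->pending[i]);
    m->npending = 0;
}
```

In the model, as in the text, `model_putnext` returning false does not mean the message is lost; it only means the put procedure has not seen it yet.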

The optional outer perimeter, which spans all queues in a module, is illustrated in Figure 12-3.

Figure 12-3 Outer Perimeter Spanning All Queues With Inner Perimeters Spanning Each Pair (D_MTOUTPERIM Combined With D_MTQPAIR)


Perimeter options

Several flags are used to specify the perimeters. These flags fall into three categories:

The inner perimeter is controlled by these mutually exclusive flags:

The presence of the outer perimeter is configured using:

Recall that by default all synchronous entry points enter the inner perimeter exclusively and enter the outer perimeter shared. This behavior can be modified in two ways:

MT Configuration

To configure the driver as being MT SAFE, the cb_ops(9S) and dev_ops(9S) data structures must be initialized. This code must be in the header section of your module. For more information, see Example 12-1, cb_ops(9S), and dev_ops(9S).

The driver is configured to be MT SAFE by setting the cb_flag field to D_MP. It also configures any MT STREAMS perimeters by setting flags in the cb_flag field (see mt-streams(9F)). The corresponding configuration for a module is done using the f_flag field in fmodsw(9S).

qprocson(9F)/qprocsoff(9F)

The routines qprocson(9F) and qprocsoff(9F) respectively enable and disable the put and service procedures of the queue pair. Before calling qprocson(9F), and after calling qprocsoff(9F), the module's put and service procedures are disabled; messages flow around the module as if it were not present in the Stream.

qprocson(9F) must be called by the first open of a module, but only after allocation and initialization of any module resources on which the put and service procedures depend. The qprocsoff routine must be called by the close routine of the module before deallocating any resources on which the put and service procedures depend.

To avoid deadlocks, modules must not hold private locks across the calls to qprocson(9F) or qprocsoff(9F).
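The ordering rules for qprocson(9F) and qprocsoff(9F) can be sketched as a sequential user-space model. `struct inst_model`, `inst_open`, `inst_close`, and `inst_deliver` are hypothetical; the `procs_on` flag stands in for the enabled state of the put and service procedures, and a message that arrives while it is clear simply bypasses the module. The model is single threaded and ignores concurrency entirely; it only shows why allocation must precede enabling and disabling must precede freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of one module instance. */
struct inst_model {
    bool procs_on;       /* stands in for qprocson()/qprocsoff() state */
    int *counter;        /* resource the put procedure depends on */
    int  passed_around;  /* messages that bypassed the module */
};

/* Message arrival: bypass the module while its procedures are disabled;
 * otherwise the "put procedure" uses the resource. */
static void inst_deliver(struct inst_model *m)
{
    if (!m->procs_on) {
        m->passed_around++;
        return;
    }
    (*m->counter)++;
}

/* Open: allocate and initialize first, enable the procedures last. */
static int inst_open(struct inst_model *m)
{
    m->counter = calloc(1, sizeof (*m->counter));
    if (m->counter == NULL)
        return -1;
    m->procs_on = true;      /* corresponds to calling qprocson() */
    return 0;
}

/* Close: disable the procedures first, then free what they depend on. */
static void inst_close(struct inst_model *m)
{
    m->procs_on = false;     /* corresponds to calling qprocsoff() */
    free(m->counter);
    m->counter = NULL;
}
```

Reversing either order in the model would let `inst_deliver` dereference a null or freed pointer, which is the failure the rules above prevent.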

qtimeout(9F)/qunbufcall(9F)

The timeout(9F) and bufcall(9F) callbacks are asynchronous. For a module using MT STREAMS perimeters, the timeout(9F) and bufcall(9F) callback functions execute outside the scope of the perimeters. This makes it complex for the callbacks to synchronize with the rest of the module.

To make timeout(9F) and bufcall(9F) functionality easier to use for modules with perimeters, there are additional interfaces that use synchronous callbacks. These routines are qtimeout(9F), quntimeout(9F), qbufcall(9F), and qunbufcall(9F). When using these routines, the callback functions are executed inside the perimeters, hence with the same concurrency restrictions as the put and service procedures.

qwriter(9F)

Modules can use the qwriter(9F) function to upgrade from shared to exclusive access at a perimeter. For example, a module with an outer perimeter can use qwriter(9F) in the put procedure to upgrade to exclusive access at the outer perimeter. A module where the put procedure runs with shared access at the inner perimeter (D_MTPUTSHARED) can use qwriter(9F) in the put procedure to upgrade to exclusive access at the inner perimeter.


Note -

qwriter(9F) cannot be used in the open or close procedures. If a module needs exclusive access at the outer perimeter in its open or close procedures, it must specify that the outer perimeter is always entered exclusively for open and close (using D_MTOCEXCL).


The STREAMS framework guarantees that all deferred qwriter(9F) callbacks associated with a queue have executed before the module's close routine is called for that queue.

For an example of a module using qwriter(9F), see Example 12-2.
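The upgrade can be pictured with a simplified model. In this sketch, `struct outer_model`, `model_qwriter`, `model_leave_shared`, and `bump_global` are hypothetical names, and the policy (run the callback at once when the caller is the only thread inside, otherwise defer it until the last shared holder leaves) is an assumption chosen for illustration. The point it shows is the one stated above: the callback always runs with exclusive access, either immediately or as a deferred qwriter(9F) callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of upgrading from shared to exclusive access at the
 * outer perimeter, in the spirit of qwriter(9F). */
struct outer_model {
    int  shared;                 /* threads inside with shared access */
    bool exclusive;
    void (*deferred)(int *);     /* at most one deferred writer here */
    int *deferred_arg;
};

/* Called by a thread that already holds shared access. */
static void model_qwriter(struct outer_model *o, void (*fn)(int *), int *arg)
{
    assert(o->shared > 0 && !o->exclusive);
    if (o->shared == 1) {
        /* Sole occupant: upgrade in place and run the callback now. */
        o->shared = 0;
        o->exclusive = true;
        fn(arg);
        o->exclusive = false;
        o->shared = 1;
    } else {
        /* Other threads are inside: defer the callback. */
        assert(o->deferred == NULL);
        o->deferred = fn;
        o->deferred_arg = arg;
    }
}

/* A shared holder leaves; the last one out runs any deferred writer. */
static void model_leave_shared(struct outer_model *o)
{
    assert(o->shared > 0);
    if (--o->shared == 0 && o->deferred != NULL) {
        void (*fn)(int *) = o->deferred;
        o->deferred = NULL;
        o->exclusive = true;
        fn(o->deferred_arg);
        o->exclusive = false;
    }
}

/* Example writer callback: modifies "module global" data. */
static void bump_global(int *g)
{
    (*g)++;
}
```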

qwait(9F)

A module that uses perimeters and must wait in its open or close procedure for a message from another STREAMS module has to wait outside the perimeters; otherwise, the message would never be allowed to enter its put and service procedures. This is accomplished by using the qwait(9F) interface. See the qwait(9F) man page for an example.

Asynchronous Callbacks

Interrupt handlers and other asynchronous callback functions require special care by the module writer, since they can execute asynchronously to threads executing within the module open, close, put, and service procedures.

For modules using perimeters, use qtimeout(9F) and qbufcall(9F) instead of timeout(9F) and bufcall(9F). The qtimeout and qbufcall callbacks are synchronous and consequently introduce no special synchronization requirements.

Since a thread can enter the module at any time, you must ensure that the asynchronous callback function acquires the proper private locks before accessing private module data structures, and releases these locks before returning. You must cancel any outstanding registered callback routines before the data structures on which the callback routines depend are deallocated and the module closed.

Close Race Conditions

Since the callback functions are by nature asynchronous, they can be executing or about to execute at the time the module close routine is called. You must cancel all outstanding callback and interrupt conditions before deallocating those data structures or returning from the close routine.

The callback functions scheduled with timeout(9F) and bufcall(9F) are guaranteed to have been canceled by the time untimeout(9F) and unbufcall(9F) return. The same is true for qtimeout(9F) and qbufcall(9F) by the time quntimeout(9F) and qunbufcall(9F) return. You must also take responsibility for other asynchronous routines, including esballoc(9F) callbacks and hardware, as well as software interrupts.
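The cancel-before-free rule can be sketched with a toy callout table. Everything here is hypothetical (`struct callout_model`, `model_timeout`, `model_untimeout`, `model_fire_all`, `struct xx_model`, `xx_model_close`); it only mirrors the contract described above, that once the cancel call returns the callback will not run. `xx_model_close` follows the same pattern as the close routines in the sample code: cancel any pending timeout, clear the saved id, and only then release the per-Stream state.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NTIMERS 8

/* Hypothetical callout table standing in for timeout()/qtimeout().
 * An id is the slot index plus one, so 0 can mean "no timeout". */
struct callout_model {
    void (*fn[MODEL_NTIMERS])(void *);
    void *arg[MODEL_NTIMERS];
};

static int model_timeout(struct callout_model *c, void (*fn)(void *), void *arg)
{
    for (int i = 0; i < MODEL_NTIMERS; i++) {
        if (c->fn[i] == NULL) {
            c->fn[i] = fn;
            c->arg[i] = arg;
            return i + 1;
        }
    }
    return 0;   /* table full */
}

/* Once this returns, the callback for id can no longer run. */
static void model_untimeout(struct callout_model *c, int id)
{
    if (id > 0 && id <= MODEL_NTIMERS)
        c->fn[id - 1] = NULL;
}

/* The "clock" fires every pending callout once. */
static void model_fire_all(struct callout_model *c)
{
    for (int i = 0; i < MODEL_NTIMERS; i++) {
        void (*fn)(void *) = c->fn[i];
        if (fn != NULL) {
            c->fn[i] = NULL;
            fn(c->arg[i]);
        }
    }
}

/* Per-Stream state whose callout must be canceled before it is freed. */
struct xx_model {
    int timeoutid;
    int ticks;
};

static void xx_model_tick(void *arg)
{
    struct xx_model *xp = arg;

    xp->timeoutid = 0;   /* timeout has run */
    xp->ticks++;
}

/* Close path: cancel first; afterwards it is safe to release xp. */
static void xx_model_close(struct callout_model *c, struct xx_model *xp)
{
    if (xp->timeoutid != 0) {
        model_untimeout(c, xp->timeoutid);
        xp->timeoutid = 0;
    }
}
```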

Module Unloading and esballoc(9F)

The STREAMS framework prevents a module or driver text from being unloaded while there are open instances of the module or driver. If a module does not cancel all callbacks in the last close routine, it has to refuse to be unloaded.

This is an issue mainly for modules and drivers using esballoc since esballoc callbacks cannot be canceled. Thus modules and drivers using esballoc have to be prepared to handle calls to the esballoc callback free function after the last instance of the module or driver has been closed.

Modules and drivers can maintain a count of outstanding callbacks. They can refuse to be unloaded by having their _fini(9E) routine return EBUSY if there are outstanding callbacks.
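A minimal sketch of such a count follows, written as a single-threaded user-space model; `xx_outstanding`, `xx_buffer_loaned`, `xx_buffer_freed`, and `xx_fini_check` are hypothetical names. In a driver, the increment would happen when a buffer is handed out with esballoc(9F), the decrement in the free routine, and `_fini(9E)` would return `EBUSY` instead of calling mod_remove(9F) while the count is nonzero. Because the free routine runs asynchronously, a real counter must also be protected by a lock, which the model omits.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical count of free routines registered but not yet run.
 * A real module must protect this with a lock; the model does not. */
static int xx_outstanding;

/* Called whenever the module hands out a buffer with a free routine. */
static void xx_buffer_loaned(void)
{
    xx_outstanding++;
}

/* Body of the free routine: the buffer has come back. */
static void xx_buffer_freed(void)
{
    assert(xx_outstanding > 0);
    xx_outstanding--;
}

/* Decision an _fini() routine could make before calling mod_remove(). */
static int xx_fini_check(void)
{
    if (xx_outstanding > 0)
        return EBUSY;   /* refuse to unload: a callback may still arrive */
    return 0;           /* safe to proceed with mod_remove() */
}
```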

Use of q_next

The q_next field in the queue_t structure can be referenced in open, close, put, and service procedures as well as the synchronous callback procedures (scheduled with qtimeout(9F), qbufcall(9F), and qwriter(9F)). However, the value in the q_next field should not be trusted. It is relevant to the STREAMS framework, but may not be relevant to a specific module.

All other module code, such as interrupt routines and timeout(9F) and esballoc(9F) callback routines, must not dereference q_next. Those routines must use the "next" version of each function; for instance, use canputnext(9F) instead of dereferencing q_next and calling canput(9F).

MT SAFE Modules Using Explicit Locks

Although the result is not reliable, you can use explicit locks either instead of perimeters or to augment the concurrency restrictions provided by the perimeters.


Caution -

Explicit locks cannot be used to preserve message ordering in a module because of the risk of reentering the module. Use MT STREAMS perimeters to preserve message ordering.


All four types of kernel synchronization primitives are available to the module writer: mutexes, readers/writer locks, semaphores, and condition variables. Since cv_wait implies a context switch, it can only be called from the module's open and close procedures, which are executed with valid process context. You must use the synchronization primitives to protect accesses and ensure the integrity of private module data structures.

Constraints When Using Locks

When adding locks in a module, it is important to observe these constraints:

The first restriction makes it hard to use module private locks to preserve message ordering. MT STREAMS perimeters are the preferred mechanism for preserving message ordering.

Preserving Message Ordering

Module private locks cannot be used to preserve message ordering, since they cannot be held across calls to putnext(9F) and the other routines that pass messages to other modules. The alternatives for preserving message ordering are:

Perimeters are preferred, since using service procedures carries a performance penalty.

Sample Multithreaded Device Driver

Example 12-1 is a sample multithreaded, loadable STREAMS pseudo-driver. Its MT design is the simplest possible: it uses a per-module inner perimeter (D_MTPERMOD), so only one thread can execute in the driver at any time. In addition, the driver uses a qtimeout(9F) synchronous callback routine, and it cancels any outstanding qtimeout(9F) by calling quntimeout(9F) in the close routine. See "Close Race Conditions".


Example 12-1 Sample Multithreaded, Loadable, STREAMS Pseudo-Driver

/*
 * Example SunOS 5 multithreaded STREAMS pseudo device driver.
 * Using a D_MTPERMOD inner perimeter.
 */

#include <sys/types.h>
#include <sys/errno.h>
#include <sys/stropts.h>
#include <sys/stream.h>
#include <sys/strlog.h>
#include <sys/cmn_err.h>
#include <sys/modctl.h>
#include <sys/kmem.h>
#include <sys/conf.h>
#include <sys/ksynch.h>
#include <sys/stat.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/*
 * Function prototypes.
 */
static int xxidentify(dev_info_t *);
static int xxattach(dev_info_t *, ddi_attach_cmd_t);
static int xxdetach(dev_info_t *, ddi_detach_cmd_t);
static int xxgetinfo(dev_info_t *, ddi_info_cmd_t, void *, void **);
static int xxopen(queue_t *, dev_t *, int, int, cred_t *);
static int xxclose(queue_t *, int, cred_t *);
static int xxwput(queue_t *, mblk_t *);
static int xxwsrv(queue_t *);
static void xxtick(caddr_t);

/*
 * Streams Declarations
 */
static struct module_info xxm_info = {
   99,            /* mi_idnum */
   "xx",        /* mi_idname */
   0,             /* mi_minpsz */
   INFPSZ,        /* mi_maxpsz */
   0,             /* mi_hiwat */
   0              /* mi_lowat */
};

static struct qinit xxrinit = {
		NULL,           /* qi_putp */
		NULL,           /* qi_srvp */
		xxopen,         /* qi_qopen */
		xxclose,        /* qi_qclose */
		NULL,           /* qi_qadmin */
		&xxm_info,      /* qi_minfo */
		NULL            /* qi_mstat */
};

static struct qinit xxwinit = {
		xxwput,         /* qi_putp */
		xxwsrv,         /* qi_srvp */
		NULL,           /* qi_qopen */
		NULL,           /* qi_qclose */
		NULL,           /* qi_qadmin */
		&xxm_info,      /* qi_minfo */
		NULL            /* qi_mstat */
};

static struct streamtab xxstrtab = {
		&xxrinit,       /* st_rdinit */
		&xxwinit,       /* st_wrinit */
		NULL,           /* st_muxrinit */
		NULL            /* st_muxwrinit */
};

/*
 * define the xx_ops structure.
 */

static struct cb_ops cb_xx_ops = {
		nodev,            /* cb_open */
		nodev,            /* cb_close */
		nodev,            /* cb_strategy */
		nodev,            /* cb_print */
		nodev,            /* cb_dump */
		nodev,            /* cb_read */
		nodev,            /* cb_write */
		nodev,            /* cb_ioctl */
		nodev,            /* cb_devmap */
		nodev,            /* cb_mmap */
		nodev,            /* cb_segmap */
		nochpoll,         /* cb_chpoll */
		ddi_prop_op,      /* cb_prop_op */
		&xxstrtab,        /* cb_stream */
		(D_NEW|D_MP|D_MTPERMOD) /* cb_flag */
};

static struct dev_ops xx_ops = {
		DEVO_REV,         /* devo_rev */
		0,                /* devo_refcnt */
		xxgetinfo,        /* devo_getinfo */
		xxidentify,       /* devo_identify */
		nodev,            /* devo_probe */
		xxattach,         /* devo_attach */
		xxdetach,         /* devo_detach */
		nodev,            /* devo_reset */
		&cb_xx_ops,       /* devo_cb_ops */
		(struct bus_ops *)NULL /* devo_bus_ops */
};


/*
 * Module linkage information for the kernel.
 */
static struct modldrv modldrv = {
		&mod_driverops,   /* Type of module. This one is a driver */
		"xx",           /* Driver name */
		&xx_ops,          /* driver ops */
};

static struct modlinkage modlinkage = {
		MODREV_1,
		&modldrv,
		NULL
};

/*
 * Driver private data structure. One is allocated per Stream.
 */
struct xxstr {
		struct xxstr	*xx_next;	/* pointer to next in list */
		queue_t		*xx_rq;		/* read side queue pointer */
		minor_t		xx_minor;	/* minor device # (for clone) */
		int		xx_timeoutid;	/* id returned from qtimeout() */
};

/*
 * Linked list of opened Stream xxstr structures.
 * No need for locks protecting it since the whole module is
 * single threaded using the D_MTPERMOD perimeter.
 */
static struct xxstr *xxup = NULL;


/*
 * Module Config entry points
 */

int
_init(void)
{
		return (mod_install(&modlinkage));
}

int
_fini(void)
{
		return (mod_remove(&modlinkage));
}

int
_info(struct modinfo *modinfop)
{
		return (mod_info(&modlinkage, modinfop));
}

/*
 * Auto Configuration entry points
 */

/* Identify device. */
static int
xxidentify(dev_info_t *dip)
{
		if (strcmp(ddi_get_name(dip), "xx") == 0)
			return (DDI_IDENTIFIED);
		else
			return (DDI_NOT_IDENTIFIED);
}

/*
 * dev_info pointer saved by xxattach() for use by xxgetinfo().
 * This pseudo driver is assumed to have a single instance.
 */
static dev_info_t *xx_dip = NULL;

/* Attach device. */
static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
		if (cmd != DDI_ATTACH)
			return (DDI_FAILURE);

		/* This creates the device node. */
		if (ddi_create_minor_node(dip, "xx", S_IFCHR, ddi_get_instance(dip),
				DDI_PSEUDO, CLONE_DEV) == DDI_FAILURE) {
			return (DDI_FAILURE);
		}
		xx_dip = dip;
		ddi_report_dev(dip);
		return (DDI_SUCCESS);
}

/* Detach device. */
static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
		if (cmd != DDI_DETACH)
			return (DDI_FAILURE);

		ddi_remove_minor_node(dip, NULL);
		xx_dip = NULL;
		return (DDI_SUCCESS);
}

/* ARGSUSED */
static int
xxgetinfo(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg, void **resultp)
{
		int ret = DDI_FAILURE;

		switch (infocmd) {
			case DDI_INFO_DEVT2DEVINFO:
				/* Every clone minor maps to the single attached instance. */
				if (xx_dip != NULL) {
					*resultp = (void *)xx_dip;
					ret = DDI_SUCCESS;
				} else
					*resultp = NULL;
				break;

			case DDI_INFO_DEVT2INSTANCE:
				*resultp = (void *)0;	/* instance 0 */
				ret = DDI_SUCCESS;
				break;

			default:
				break;
		}
		return (ret);
}

static int
xxopen(queue_t *rq, dev_t *devp, int flag, int sflag, cred_t *credp)
{
		struct xxstr *xxp;
		struct xxstr **prevxxp;
		minor_t minordev;

		/* If this Stream already open - we're done. */
		if (rq->q_ptr)
			return (0);

		/* Determine minor device number. */
		prevxxp = &xxup;
		if (sflag == CLONEOPEN) {
			minordev = 0;
			while ((xxp = *prevxxp) != NULL) {
				if (minordev < xxp->xx_minor)
					break;
				minordev++;
				prevxxp = &xxp->xx_next;
			}
			*devp = makedevice(getmajor(*devp), minordev);
		} else
			minordev = getminor(*devp);

		/* Allocate our private per-Stream data structure (KM_SLEEP never fails). */
		xxp = kmem_alloc(sizeof (struct xxstr), KM_SLEEP);

		/* Point q_ptr at it. */
		rq->q_ptr = WR(rq)->q_ptr = (char *)xxp;

		/* Initialize it. */
		xxp->xx_minor = minordev;
		xxp->xx_timeoutid = 0;
		xxp->xx_rq = rq;

		/* Link new entry into the list of active entries. */
		xxp->xx_next = *prevxxp;
		*prevxxp = xxp;

		/* Enable xxwput() and xxwsrv() procedures on this queue pair. */
		qprocson(rq);

		return (0);
}

static int
xxclose(queue_t *rq, int flag, cred_t *credp)
{
		struct xxstr *xxp;
		struct xxstr **prevxxp;

		/* Disable xxwput() and xxwsrv() procedures on this queue pair. */
		qprocsoff(rq);

		/* Cancel any pending timeout (it was scheduled on the write queue). */
		xxp = (struct xxstr *)rq->q_ptr;
		if (xxp->xx_timeoutid != 0) {
			(void) quntimeout(WR(rq), xxp->xx_timeoutid);
			xxp->xx_timeoutid = 0;
		}

		/* Unlink per-Stream entry from the active list and free it. */
		for (prevxxp = &xxup; (xxp = *prevxxp) != NULL; prevxxp = &xxp->xx_next)
			if (xxp == (struct xxstr *)rq->q_ptr)
				break;
		*prevxxp = xxp->xx_next;
		kmem_free(xxp, sizeof (struct xxstr));

		rq->q_ptr = WR(rq)->q_ptr = NULL;

		return (0);
}

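static int
xxwput(queue_t *wq, mblk_t *mp)
{
		struct xxstr *xxp = (struct xxstr *)wq->q_ptr;

		/* do stuff here; this example simply discards the message */
		freemsg(mp);
		mp = NULL;

		/*
		 * A message still to be sent would go back upstream: the write
		 * side of a driver has no next queue, so use qreply(), not putnext().
		 */
		if (mp != NULL)
			qreply(wq, mp);
		return (0);
}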

static int
xxwsrv(queue_t *wq)
{
		mblk_t *mp;
		struct xxstr *xxp;

		xxp = (struct xxstr *)wq->q_ptr;

		while ((mp = getq(wq)) != NULL) {
			/* do stuff here */
			freemsg(mp);

			/* for example, start a timeout */
			if (xxp->xx_timeoutid != 0) {
				/* cancel running timeout */
				(void) quntimeout(wq, xxp->xx_timeoutid);
			}
			xxp->xx_timeoutid = qtimeout(wq, xxtick, (char *)xxp, 10);
		}
		return (0);
}

static void
xxtick(arg)
		caddr_t arg;
{
		struct xxstr *xxp = (struct xxstr *)arg;

		xxp->xx_timeoutid = 0;      /* timeout has run */
		/* do stuff */

}
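The clone-open path of xxopen() picks a minor number by walking the list of open Streams. That search can be exercised on its own in user space; this sketch copies the loop into hypothetical functions (`find_clone_minor`, `link_entry`) over a stand-in structure, `struct xxstr_model`. It assumes, as the loop itself does, that the list is kept sorted by minor number, which holds when every entry was added by a clone open.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct xxstr with only the fields the search uses. */
struct xxstr_model {
    struct xxstr_model *xx_next;
    unsigned int xx_minor;
};

/* Return the lowest unused minor number and, through linkp, the link at
 * which a new entry must be inserted to keep the list sorted. */
static unsigned int
find_clone_minor(struct xxstr_model **head, struct xxstr_model ***linkp)
{
    struct xxstr_model **prevxxp = head;
    struct xxstr_model *xxp;
    unsigned int minordev = 0;

    while ((xxp = *prevxxp) != NULL) {
        if (minordev < xxp->xx_minor)
            break;              /* found a gap before this entry */
        minordev++;
        prevxxp = &xxp->xx_next;
    }
    *linkp = prevxxp;
    return minordev;
}

/* Insert a new entry at the link returned by find_clone_minor(). */
static void
link_entry(struct xxstr_model **linkp, struct xxstr_model *xxp,
    unsigned int minordev)
{
    xxp->xx_minor = minordev;
    xxp->xx_next = *linkp;
    *linkp = xxp;
}
```

With minors 0 and 2 in use, the first search returns 1 and links the new entry between them; the next search returns 3 and appends at the tail.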

Sample Multithreaded Module with Outer Perimeter

Example 12-2 is a sample multithreaded, loadable STREAMS module. The module's MT design is relatively simple: a per-queue-pair inner perimeter plus an outer perimeter. The inner perimeter protects the per-instance data structures (accessed through the q_ptr field), and the outer perimeter protects the module global data. The outer perimeter is configured so that the open and close routines have exclusive access to it; this is necessary because both routines modify the global linked list of instances. Other routines that modify global data run as qwriter(9F) callbacks, which gives them exclusive access to the whole module.


Example 12-2 Multithreaded Module with Outer Perimeter

/*
 * Example SunOS 5 multi-threaded STREAMS module.
 * Using a per-queue-pair inner perimeter plus an outer perimeter.
 */

#include <sys/types.h>
#include <sys/errno.h>
#include <sys/stropts.h>
#include <sys/stream.h>
#include <sys/strlog.h>
#include <sys/cmn_err.h>
#include <sys/kmem.h>
#include <sys/conf.h>
#include <sys/ksynch.h>
#include <sys/modctl.h>
#include <sys/stat.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/*
 * Function prototypes.
 */
static int xxopen(queue_t *, dev_t *, int, int, cred_t *);
static int xxclose(queue_t *, int, cred_t *);
static int xxwput(queue_t *, mblk_t *);
static int xxwsrv(queue_t *);
static void xxwput_ioctl(queue_t *, mblk_t *);
static int xxrput(queue_t *, mblk_t *);
static void xxtick(caddr_t);

/*
 * Streams Declarations
 */
static struct module_info xxm_info = {
   99,            /* mi_idnum */
   "xx",        /* mi_idname */
   0,             /* mi_minpsz */
   INFPSZ,        /* mi_maxpsz */
   0,             /* mi_hiwat */
   0              /* mi_lowat */
};
/*
 * Define the read side qinit structure
 */
static struct qinit xxrinit = {
		xxrput,         /* qi_putp */
		NULL,           /* qi_srvp */
		xxopen,         /* qi_qopen */
		xxclose,        /* qi_qclose */
		NULL,           /* qi_qadmin */
		&xxm_info,      /* qi_minfo */
		NULL            /* qi_mstat */
};
/*
 * Define the write side qinit structure
 */
static struct qinit xxwinit = {
		xxwput,         /* qi_putp */
		xxwsrv,         /* qi_srvp */
		NULL,           /* qi_qopen */
		NULL,           /* qi_qclose */
		NULL,           /* qi_qadmin */
		&xxm_info,      /* qi_minfo */
		NULL            /* qi_mstat */
};

static struct streamtab xxstrtab = {
		&xxrinit,       /* st_rdinit */
		&xxwinit,       /* st_wrinit */
		NULL,           /* st_muxrinit */
		NULL            /* st_muxwrinit */
};

/*
 * define the fmodsw structure.
 */

static struct fmodsw xx_fsw = {
		"xx",         /* f_name */
		&xxstrtab,      /* f_str */
		(D_NEW|D_MP|D_MTQPAIR|D_MTOUTPERIM|D_MTOCEXCL) /* f_flag */
};

/*
 * Module linkage information for the kernel.
 */
static struct modlstrmod modlstrmod = {
		&mod_strmodops,	/* Type of module; a STREAMS module */
		"xx module",		/* Module name */
		&xx_fsw,				/* fmodsw */
};

static struct modlinkage modlinkage = {
		MODREV_1,
		&modlstrmod,
		NULL
};

/*
 * Module private data structure. One is allocated per Stream.
 */
struct xxstr {
		struct xxstr	*xx_next;	/* pointer to next in list */
		queue_t		*xx_rq;		/* read side queue pointer */
		int		xx_timeoutid;	/* id returned from qtimeout() */
};

/*
 * Linked list of opened Stream xxstr structures and other module
 * global data. Protected by the outer perimeter.
 */
static struct xxstr *xxup = NULL;
static int some_module_global_data;


/*
 * Module Config entry points
 */
int
_init(void)
{
		return (mod_install(&modlinkage));
}
int
_fini(void)
{
		return (mod_remove(&modlinkage));
}
int
_info(struct modinfo *modinfop)
{
		return (mod_info(&modlinkage, modinfop));
}


static int
xxopen(queue_t *rq, dev_t *devp, int flag, int sflag, cred_t *credp)
{
		struct xxstr *xxp;

		/* If this Stream already open - we're done. */
		if (rq->q_ptr)
			return (0);

		/* We must be a module */
		if (sflag != MODOPEN)
			return (EINVAL);

		/*
		 * The perimeter flag D_MTOCEXCL implies that the open and
		 * close routines have exclusive access to the module global
		 * data structures.
		 *
		 * Allocate our private per-Stream data structure.
		 */
		xxp = kmem_alloc(sizeof (struct xxstr), KM_SLEEP);

		/* Point q_ptr at it. */
		rq->q_ptr = WR(rq)->q_ptr = (char *)xxp;

		/* Initialize it. */
		xxp->xx_rq = rq;
		xxp->xx_timeoutid = 0;

		/* Link new entry into the list of active entries. */
		xxp->xx_next = xxup;
		xxup = xxp;

		/* Enable the put and service procedures on this queue pair. */
		qprocson(rq);

		/* Return success */
		return (0);
}

static int
xxclose(queue_t *rq, int flag, cred_t *credp)
{
		struct xxstr *xxp;
		struct xxstr **prevxxp;

		/* Disable the put and service procedures on this queue pair. */
		qprocsoff(rq);

		/* Cancel any pending timeout. */
		xxp = (struct xxstr *)rq->q_ptr;
		if (xxp->xx_timeoutid != 0) {
			(void) quntimeout(WR(rq), xxp->xx_timeoutid);
			xxp->xx_timeoutid = 0;
		}

		/*
		 * D_MTOCEXCL implies that the open and close routines have
		 * exclusive access to the module global data structures.
		 *
		 * Unlink per-Stream entry from the active list and free it.
		 */
		for (prevxxp = &xxup; (xxp = *prevxxp) != NULL; prevxxp = &xxp->xx_next) {
			if (xxp == (struct xxstr *)rq->q_ptr)
				break;
		}
		*prevxxp = xxp->xx_next;
		kmem_free(xxp, sizeof (struct xxstr));
		rq->q_ptr = WR(rq)->q_ptr = NULL;
		return (0);
}

static int
xxrput(queue_t *rq, mblk_t *mp)
{
		struct xxstr *xxp = (struct xxstr *)rq->q_ptr;

		/*
		 * Do stuff here. Can read "some_module_global_data" since we
		 * have shared access at the outer perimeter.
		 */
		putnext(rq, mp);
		return (0);
}

/* qwriter callback function for handling M_IOCTL messages */
static void
xxwput_ioctl(queue_t *wq, mblk_t *mp)
{
		struct xxstr *xxp = (struct xxstr *)wq->q_ptr;

		/*
		 * Do stuff here. Can modify "some_module_global_data" since
		 * we have exclusive access at the outer perimeter.
		 */
		mp->b_datap->db_type = M_IOCNAK;
		qreply(wq, mp);
}

static int
xxwput(queue_t *wq, mblk_t *mp)
{
		struct xxstr *xxp = (struct xxstr *)wq->q_ptr;

		if (mp->b_datap->db_type == M_IOCTL) {
			/* M_IOCTL will modify the module global data */
			qwriter(wq, mp, xxwput_ioctl, PERIM_OUTER);
			return (0);
		}
		/*
		 * Do stuff here. Can read "some_module_global_data" since
		 * we have shared access at the outer perimeter.
		 */
		putnext(wq, mp);
		return (0);
}

static int
xxwsrv(queue_t *wq)
{
		mblk_t *mp;
		struct xxstr *xxp = (struct xxstr *)wq->q_ptr;

		while ((mp = getq(wq)) != NULL) {
			/*
			 * Do stuff here. Can read "some_module_global_data" since
			 * we have shared access at the outer perimeter.
			 */
			freemsg(mp);

			/* for example, start a timeout */
			if (xxp->xx_timeoutid != 0) {
				/* cancel running timeout */
				(void) quntimeout(wq, xxp->xx_timeoutid);
			}
			xxp->xx_timeoutid = qtimeout(wq, xxtick, (char *)xxp, 10);
		}
		return (0);
}

static void
xxtick(arg)
		caddr_t arg;
{
		struct xxstr *xxp = (struct xxstr *)arg;

		xxp->xx_timeoutid = 0;      /* timeout has run */
		/*
		 * Do stuff here. Can read "some_module_global_data" since we
		 * have shared access at the outer perimeter.
		 */
}

Chapter 13 Multiplexing

Overview of Multiplexing

This chapter describes how STREAMS multiplexing configurations are created and also discusses multiplexing drivers. A STREAMS multiplexer is a driver with multiple Streams connected to it. The primary function of the multiplexing driver is to switch messages among the connected Streams. Multiplexer configurations are created from the user level by system calls.

STREAMS-related system calls are used to set up the "plumbing," or Stream interconnections, for multiplexing drivers. The subset of these calls that allows a user to connect (and disconnect) Streams below a driver is referred to as the multiplexing facility. This type of connection is referred to as a one-to-M, or lower, multiplexer configuration. This configuration must always contain a multiplexing driver, which is recognized by STREAMS as having special characteristics.

Multiple Streams can be connected above a driver by open(2) calls. This accommodates the loop-around driver and the driver handling multiple minor devices described in Chapter 9, STREAMS Drivers. There is no difference between the connections to these drivers; only the functions performed by the driver differ. In the multiplexing case, the driver routes data between multiple Streams. In the device driver case, the driver routes data between user processes and associated physical ports. Multiplexing with Streams connected above is referred to as an N-to-1, or upper, multiplexer. STREAMS does not provide any facilities beyond open and close to connect or disconnect upper Streams for multiplexing.

From the driver's perspective, upper and lower configurations differ only in the way they are initially connected to the driver. The implementation requirements are the same: route the data and handle flow control. All multiplexer drivers require special developer-provided software to perform the multiplexing data routing and to handle flow control. STREAMS does not directly support flow control among multiplexed Streams. M-to-N multiplexing configurations are implemented by using both of these mechanisms in a driver.
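The two duties named above, routing and flow control, can be reduced to a small decision function. This is a hypothetical user-space sketch: `struct lower_model`, `mux_route`, and the idea of selecting a lower Stream by a destination index are assumptions for illustration, and the `blocked` flag stands in for the answer a real driver would get from canputnext(9F) on the lower write queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lower-Stream state kept by a multiplexing driver. */
struct lower_model {
    bool linked;     /* a Stream has been linked below at this slot */
    bool blocked;    /* lower Stream is currently flow controlled */
    int  sent;       /* messages passed down this lower Stream */
};

/* Route one message to lower Stream dst.
 * Returns  0 if the message was sent,
 *          1 if the lower Stream is flow controlled (caller holds it),
 *         -1 if no lower Stream is linked at dst. */
static int mux_route(struct lower_model lowers[], int nlower, int dst)
{
    if (dst < 0 || dst >= nlower || !lowers[dst].linked)
        return -1;
    if (lowers[dst].blocked)
        return 1;
    lowers[dst].sent++;
    return 0;
}
```

When `mux_route` returns 1, a real multiplexer would typically hold the message (for example, with putbq(9F)) and resume once the lower Stream drains; arranging that resumption is part of the flow control the driver must implement itself.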

As discussed in Chapter 9, STREAMS Drivers, the multiple Streams that represent minor devices are actually distinct Streams in which the driver keeps track of each Stream attached to it. The STREAMS subsystem does not recognize any relationship between the Streams. The same is true for STREAMS multiplexers of any configuration. The multiplexed Streams are distinct and the driver must be implemented to do most of the work.

In addition to upper and lower multiplexers, more complex configurations can be created by connecting Streams containing multiplexers to other multiplexer drivers. With such a diversity of needs for multiplexers, it is not possible to provide general-purpose multiplexer drivers. Rather, STREAMS provides a general purpose multiplexing facility. The facility allows users to set up the intermodule or driver plumbing to create multiplexer configurations of generally unlimited interconnection.

Building a Multiplexer

This section builds a protocol multiplexer with the multiplexing configuration shown in Figure 13-1. To free users from the need to know about the underlying protocol structure, a user-level daemon process is built to maintain the multiplexing configuration. Users can then access the transport protocol directly by opening the transport protocol (TP) driver device node.

An internetworking protocol driver (IP) routes data from a single upper Stream to one of two lower Streams. This driver supports two STREAMS connections beneath it. These connections are to two distinct networks; one for the IEEE 802.3 standard through the 802.3 driver, and another to the IEEE 802.4 standard through the 802.4 driver. The TP driver multiplexes upper Streams over a single Stream to the IP driver.

Figure 13-1 Protocol Multiplexer


Example 13-1 shows how this daemon process sets up the protocol multiplexer. The necessary declarations and initialization for the daemon program follow.


Example 13-1 Protocol Daemon

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stropts.h>

int
main(void)
{
	int	fd_802_4,
		fd_802_3,
		fd_ip,
		fd_tp;

	/* daemon-ize this process */
	switch (fork()) {
	case 0:
		break;
	case -1:
		perror("fork failed");
		exit(2);
	default:
		exit(0);
	}
	(void)setsid();

This multilevel multiplexed Stream configuration is built from the bottom up, so the example begins by constructing the IP multiplexer. This multiplexing device driver is treated like any other software driver: it owns a node in the Solaris file system and is opened just like any other STREAMS device driver.

The first step is to open the multiplexing driver and the 802.4 driver, thus creating separate Streams above each driver as shown in Figure 13-2. The Stream to the 802.4 driver can then be connected below the multiplexing IP driver by using the I_LINK ioctl(2).

Figure 13-2 Before Link


The sequence of instructions to this point is:


	if ((fd_802_4 = open("/dev/802_4", O_RDWR)) < 0) {
		perror("open of /dev/802_4 failed");
		exit(1);
	}
	if ((fd_ip = open("/dev/ip", O_RDWR)) < 0) {
		perror("open of /dev/ip failed");
		exit(2);
	}
	/* now link 802.4 to underside of IP */
	if (ioctl(fd_ip, I_LINK, fd_802_4) < 0) {
		perror("I_LINK ioctl failed");
		exit(3);
	}

The I_LINK ioctl(2) takes two file descriptors as arguments. The first file descriptor, fd_ip, is the Stream connected to the multiplexing driver, and the second file descriptor, fd_802_4, is the Stream to be connected below the multiplexer. The complete Stream to the 802.4 driver is connected below the IP driver. The queues of the Stream head on the 802.4 Stream are then used by the IP driver to manage the lower half of the multiplexer.

I_LINK returns an integer value, muxid, which is used by the multiplexing driver to identify the Stream just connected below it. muxid is ignored in the example, but it is useful for dismantling a multiplexer or routing data through the multiplexer. Its significance is discussed later.

The following sequence of system calls continues building the Internetworking Protocol multiplexer (IP):


	if ((fd_802_3 = open("/dev/802_3", O_RDWR)) < 0) {
		perror("open of /dev/802_3 failed");
		exit(4);
	}
	if (ioctl(fd_ip, I_LINK, fd_802_3) < 0) {
		perror("I_LINK ioctl failed");
		exit(5);
	}

The Stream above the multiplexing driver used to establish the lower connections is the controlling Stream and has special significance when dismantling the multiplexing configuration. This is illustrated later in this chapter. The Stream referenced by fd_ip is the controlling Stream for the IP multiplexer.

The order in which the Streams in the multiplexing configuration are opened is unimportant. If it is necessary to have intermediate modules in the Stream between the IP driver and media drivers, these modules must be added to the Streams associated with the media drivers (using I_PUSH) before the media drivers are attached below the multiplexer.

The number of Streams that can be linked to a multiplexer is restricted by the design of the particular multiplexer. The manual page describing each driver (see SunOS Reference Manual, Intro(7)) describes such restrictions. However, only one I_LINK operation is allowed for each lower Stream; a single Stream cannot be linked below two multiplexers simultaneously.
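The rule that a lower Stream can be linked below only one multiplexer, together with a driver-specific limit on the number of links, can be pictured with a small user-space model. This is not STREAMS code: model_stream, model_mux, MODEL_MAX_LOWER, and the way identifiers are numbered are hypothetical, chosen only to mimic the checks made when an I_LINK is issued.

```c
/* Hypothetical per-multiplexer limit, standing in for the restriction
 * documented on a particular driver's manual page. */
#define MODEL_MAX_LOWER 4

/* A Stream that may be linked below at most one multiplexer. */
struct model_stream {
	int linked_muxid;	/* 0 while the Stream is not linked */
};

/* A multiplexer that accepts up to MODEL_MAX_LOWER lower Streams. */
struct model_mux {
	struct model_stream *lower[MODEL_MAX_LOWER];
	int nlower;
};

/* Shared counter so identifiers are unique across all model muxes. */
static int model_next_muxid = 1;

/*
 * Model of I_LINK: returns a positive muxid on success, or -1 when the
 * Stream is already linked somewhere or the multiplexer is full.
 */
int
model_link(struct model_mux *mux, struct model_stream *lower)
{
	if (lower->linked_muxid != 0)
		return (-1);	/* only one I_LINK per lower Stream */
	if (mux->nlower == MODEL_MAX_LOWER)
		return (-1);	/* driver-specific restriction */
	mux->lower[mux->nlower++] = lower;
	lower->linked_muxid = model_next_muxid++;
	return (lower->linked_muxid);
}
```

In the daemon example, trying to link the 802.4 Stream below a second multiplexer corresponds to a second model_link call on the same model_stream, which the model refuses.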

Continuing with the example, the IP driver is now linked below the transport protocol (TP) multiplexing driver. As seen in Figure 13-1, only one link is supported below the transport driver. This link is formed by the following sequence of system calls:


	if ((fd_tp = open("/dev/tp", O_RDWR)) < 0) {
		perror("open of /dev/tp failed");
		exit(6);
	}
	if (ioctl(fd_tp, I_LINK, fd_ip) < 0) {
		perror("I_LINK ioctl failed");
		exit(7);
	}

Because the controlling Stream of the IP multiplexer has been linked below the TP multiplexer, the controlling Stream for the new multilevel multiplexer configuration is the Stream above the TP multiplexer.

At this point the file descriptors associated with the lower drivers can be closed without affecting the operation of the multiplexer. If these file descriptors are not closed, all subsequent read(2), write(2), ioctl(2), poll(2), getmsg(2), and putmsg(2) calls issued to them fail. That is because I_LINK associates the Stream head of each linked Stream with the multiplexer, so the user may not access that Stream directly for the duration of the link.

The following sequence of system calls completes the daemon example:


	close(fd_802_4);
	close(fd_802_3);
	close(fd_ip);
	/*
	 * Hold the multiplexer open forever, or at least until this
	 * process is terminated by an external UNIX signal.
	 */
	pause();
}

The transport driver supports several simultaneous Streams. These Streams are multiplexed over the single Stream connected to the IP multiplexer. The mechanism for establishing multiple Streams above the transport multiplexer is actually a by-product of the way in which Streams are created between a user process and a driver. By opening different minor devices of a STREAMS driver, separate Streams will be connected to that driver. The driver must be designed with the intelligence to route data from the single lower Stream to the appropriate upper Stream.

The daemon process maintains the multiplexed Stream configuration through an open Stream (the controlling Stream) to the transport driver. Meanwhile, other users can access the services of the transport protocol by opening new Streams to the transport driver; they need no knowledge of the underlying protocol configurations and subnetworks that support the transport service.

Multilevel multiplexing configurations should be assembled from the bottom up. That is because the passing of ioctl(2) through the multiplexer is determined by the nature of the multiplexing driver and cannot generally be relied on.

Dismantling a Multiplexer

Streams connected to a multiplexing driver from above with open(2) can be dismantled by closing each Stream with close(2). The mechanism for dismantling Streams that have been linked below a multiplexing driver is less obvious, and is described next.

The I_UNLINK ioctl(2) disconnects each multiplexer link below a multiplexing driver individually. This command has the form:


ioctl(fd, I_UNLINK, muxid);

where fd is a file descriptor associated with a Stream connected to the multiplexing driver from above, and muxid is the identifier that was returned by I_LINK when a driver was linked below the multiplexer. Each lower driver may be disconnected individually in this way, or a special muxid value of MUXID_ALL can be used to disconnect all drivers from the multiplexer simultaneously.
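The following user-space sketch models the two forms of I_UNLINK over a hypothetical table of link identifiers. MODEL_MUXID_ALL stands in for MUXID_ALL, and model_driver_unlink stands in for the driver's unlink handling; neither is a real STREAMS interface. The model also reflects a point made later in this chapter: unlinking everything still reaches the driver as one unlink per lower Stream.

```c
#define MODEL_MAX_LINKS	8
#define MODEL_MUXID_ALL	(-1)	/* stands in for MUXID_ALL */

/* Hypothetical table of active links below one multiplexer. */
struct model_links {
	int muxid[MODEL_MAX_LINKS];	/* 0 marks a free slot */
	int driver_unlinks;		/* unlink requests the driver saw */
};

/* What the driver sees: exactly one unlink request per lower Stream. */
static void
model_driver_unlink(struct model_links *t, int slot)
{
	t->muxid[slot] = 0;
	t->driver_unlinks++;
}

/*
 * Model of ioctl(fd, I_UNLINK, muxid). Returns the number of links
 * removed, or -1 if a specific muxid is not linked here.
 */
int
model_unlink(struct model_links *t, int muxid)
{
	int i, removed = 0;

	for (i = 0; i < MODEL_MAX_LINKS; i++) {
		if (t->muxid[i] == 0)
			continue;
		if (muxid == MODEL_MUXID_ALL || t->muxid[i] == muxid) {
			model_driver_unlink(t, i);
			removed++;
		}
	}
	if (muxid != MODEL_MUXID_ALL && removed == 0)
		return (-1);
	return (removed);
}
```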

In the multiplexing daemon program, the multiplexer is never explicitly dismantled. That is because all links associated with a multiplexing driver are automatically dismantled when the controlling Stream associated with that multiplexer is closed. Because the controlling Stream is open to a driver, only the final call of close for that Stream will close it. In this case, the daemon is the only process that has opened the controlling Stream, so the multiplexing configuration will be dismantled when the daemon exits.

For the automatic dismantling mechanism to work in the multilevel, multiplexed Stream configuration, the controlling Stream for each multiplexer at each level must be linked under the next higher-level multiplexer. In the example, the controlling Stream for the IP driver was linked under the TP driver. This resulted in a single controlling Stream for the full, multilevel configuration. Because the multiplexing program relied on closing the controlling Stream to dismantle the multiplexed Stream configuration instead of using explicit I_UNLINK calls, the muxid values returned by I_LINK could be ignored.
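A recursive user-space model shows why a single controlling Stream is enough to tear down the whole daemon configuration. It assumes, as in the daemon, that every intermediate file descriptor (such as fd_ip) has already been closed, so an unlinked Stream that controls a lower level is closed in turn. The types and the function name are hypothetical.

```c
#define MODEL_MAX_LOWER	4

/*
 * Hypothetical model: a Stream may be linked below a multiplexer, and
 * may itself be the controlling Stream of another multiplexer.
 */
struct model_stream {
	struct model_stream *lower[MODEL_MAX_LOWER];	/* links it controls */
	int nlower;
	int linked;	/* nonzero while linked below a multiplexer */
};

/*
 * Model of the last close(2) of a controlling Stream: every link it
 * controls is unlinked. Because no descriptor still refers to an
 * unlinked Stream (an assumption of this model), that Stream is closed
 * too, which dismantles the level below it. Returns the unlink count.
 */
int
model_close_controlling(struct model_stream *ctl)
{
	int i, unlinks = 0;

	for (i = 0; i < ctl->nlower; i++) {
		struct model_stream *s = ctl->lower[i];

		s->linked = 0;
		unlinks++;
		unlinks += model_close_controlling(s);
	}
	ctl->nlower = 0;
	return (unlinks);
}
```

For the daemon's configuration (802.4 and 802.3 below IP, IP below TP), closing the TP controlling Stream yields three unlinks.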

An important side effect of automatic dismantling on the close is that a process cannot build a multiplexing configuration with I_LINK and then exit. That is because exit(2) closes all files associated with the process, including the controlling Stream. To keep the configuration intact, the process must exist for the life of that multiplexer. That is the motivation for implementing the example as a daemon process.

If the process uses persistent links through I_PLINK ioctl(2), the multiplexer configuration remains intact after the process exits. "Persistent Links" are described later in this chapter.

Routing Data Through a Multiplexer

As demonstrated, STREAMS provides a mechanism for building multiplexed Stream configurations. However, the criteria by which a multiplexer routes data are driver dependent. For example, the protocol multiplexer might use address information found in a protocol header to determine over which subnetwork data should be routed. The developer of each multiplexing driver must define its routing criteria.

One routing option available to the multiplexer is to use the muxid value to determine the Stream to which data is routed (remember that each multiplexer link has a muxid). I_LINK passes the muxid value to the driver and returns this value to the user. The driver can therefore specify that the muxid value accompany data routed through it. For example, if a multiplexer routed data from a single upper Stream to one of several lower Streams (as did the IP driver), the multiplexer can require the user to insert the muxid of the desired lower Stream into the first four bytes of each message passed to it. The driver can then match the muxid in each message with the muxid of each lower Stream, and route the data accordingly.
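A sketch of the muxid-prefix convention described above, written as ordinary user-space C so it can be run and tested. The table layout, the function name, and the most-significant-byte-first encoding of the four-byte muxid are assumptions; a real multiplexer would operate on mblk_t chains and would define its own header format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link table kept by the driver: one entry per I_LINK. */
struct model_lower {
	int32_t	muxid;	/* value I_LINK returned for this lower Stream */
	int	nmsgs;	/* messages routed to it, for illustration */
};

/*
 * Route one message whose first four bytes carry the target muxid.
 * Returns the index of the lower Stream chosen, or -1 if the message is
 * too short or names no link. On success, *payload and *paylen describe
 * the data that follows the four-byte prefix.
 */
int
model_route_by_muxid(struct model_lower *tab, int ntab,
    const uint8_t *msg, size_t len, const uint8_t **payload, size_t *paylen)
{
	int32_t muxid;
	int i;

	if (len < 4)
		return (-1);
	muxid = (int32_t)(((uint32_t)msg[0] << 24) | ((uint32_t)msg[1] << 16) |
	    ((uint32_t)msg[2] << 8) | (uint32_t)msg[3]);
	for (i = 0; i < ntab; i++) {
		if (tab[i].muxid == muxid) {
			tab[i].nmsgs++;
			*payload = msg + 4;
			*paylen = len - 4;
			return (i);
		}
	}
	return (-1);
}
```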

Connecting And Disconnecting Lower Streams

Multiple Streams are created above a driver or multiplexer by calling open(2) either on different minor devices or on a cloneable device file. Note that any driver that handles more than one minor device is considered an upper multiplexer.

Connecting Streams below a multiplexer requires additional software in the multiplexer. The main difference between STREAMS lower multiplexers and STREAMS device drivers is that multiplexers are pseudo-devices and have two additional qinit structures, pointed to by fields in streamtab(9S): the lower-half read-side qinit(9S) and the lower-half write-side qinit(9S).

The multiplexer is divided into two parts: the lower half (bottom) and the upper half (top). The multiplexer queue structures allocated when the multiplexer is opened use the usual qinit entries from the multiplexer's streamtab(9S), as in any open of a STREAMS device. When a lower Stream is linked beneath the multiplexer, the qinit structures at its Stream head are replaced by the multiplexer's lower-half qinit(9S) structures. Once the linkage is made, the multiplexer switches messages between upper and lower Streams. When messages reach the top of the lower Stream, they are handled by the put and service routines specified in the lower half of the multiplexer.

Connecting Lower Streams

A lower multiplexer is connected as follows: the initial open to a multiplexing driver creates a Stream, as in any other driver. open uses the first two streamtab structure entries to create the driver queues. At this point, the only distinguishing characteristics of this Stream are non-NULL entries in the streamtab(9S) st_muxrinit and st_muxwinit fields.

These fields are ignored by open. Any other Stream subsequently opened to this driver will have the same streamtab and thereby the same mux fields.

Next, another file is opened to create a (soon-to-be) lower Stream. The driver for the lower Stream is typically a device driver. This Stream has no distinguishing characteristics. It can include any driver compatible with the multiplexer. Any modules required on the lower Stream must be pushed onto it now.

Next, this lower Stream is connected below the multiplexing driver with an I_LINK ioctl(2) (see streamio(7I)). Until then, the queues of the lower Stream's Stream head point to the standard Stream-head routines as their procedures. An I_LINK to the upper Stream, referencing the lower Stream, causes STREAMS to modify the contents of the Stream head's queues in the lower Stream: the pointers to the Stream-head routines, and other values, are replaced with those contained in the mux fields of the multiplexing driver's streamtab. Changing the Stream-head routines on the lower Stream means that all subsequent messages sent upstream by the lower Stream's driver are passed to the put procedure designated in st_muxrinit, which belongs to the multiplexing driver. The I_LINK also establishes this upper Stream as the control Stream for this lower Stream. STREAMS remembers the relationship between these two Streams until the upper Stream is closed or the lower Stream is unlinked.
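The effect of I_LINK on the lower Stream head can be modeled as swapping one table of function pointers for another. In this hypothetical sketch, model_queue and model_qinit are stripped-down stand-ins for queue_t and qinit(9S); only the put-procedure pointer is modeled, and the lower driver's putnext() is reduced to a direct call through that pointer.

```c
/* Hypothetical, heavily simplified stand-ins for queue_t and qinit. */
struct model_queue;
typedef int (*model_putp_t)(struct model_queue *, int msg);

struct model_qinit {
	model_putp_t qi_putp;
};

struct model_queue {
	const struct model_qinit *q_qinit;
	int last_msg;
	int handled_by_mux;
};

/* Stands in for the Stream-head read put routine. */
static int
model_strhead_put(struct model_queue *q, int msg)
{
	q->last_msg = msg;	/* would be delivered to the user process */
	q->handled_by_mux = 0;
	return (0);
}

/* Stands in for the put routine named in st_muxrinit. */
static int
model_mux_lrput(struct model_queue *q, int msg)
{
	q->last_msg = msg;	/* the multiplexer now routes the message */
	q->handled_by_mux = 1;
	return (0);
}

static const struct model_qinit model_strhead_rinit = { model_strhead_put };
static const struct model_qinit model_mux_lrinit = { model_mux_lrput };

/* Model of what I_LINK does to the lower Stream head's read queue. */
void
model_i_link(struct model_queue *lower_rq)
{
	lower_rq->q_qinit = &model_mux_lrinit;
}

/* Model of the lower driver sending a message upstream. */
int
model_send_up(struct model_queue *lower_rq, int msg)
{
	return (lower_rq->q_qinit->qi_putp(lower_rq, msg));
}
```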

Finally, the Stream head sends an M_IOCTL message with ioc_cmd set to I_LINK to the multiplexing driver. The M_DATA part of the M_IOCTL contains a linkblk(9S) structure. The multiplexing driver stores information from the linkblk(9S) structure in private storage and returns an M_IOCACK acknowledgment. The l_index value from the linkblk is returned to the process requesting the I_LINK; the process uses it later to disconnect the Stream.
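A hypothetical user-space model of the driver side of the I_LINK and I_UNLINK handshake: the linkblk contents are copied into private storage and an acknowledgment is chosen. The command codes and ACK/NAK values are invented stand-ins, not the constants from the STREAMS headers, and a table of two links is an arbitrary driver restriction. The model follows the rule, stated later in this chapter, that an unlink should never be refused.

```c
#include <stddef.h>

#define MODEL_I_LINK	1	/* hypothetical command codes, not the */
#define MODEL_I_UNLINK	2	/* values from the STREAMS headers */
#define MODEL_ACK	1	/* stands in for M_IOCACK */
#define MODEL_NAK	0	/* stands in for M_IOCNAK */
#define MODEL_MAX_LINKS	2

/* Simplified stand-in for struct linkblk. */
struct model_linkblk {
	void	*l_qtop;
	void	*l_qbot;
	int	l_index;
};

/* Private storage the driver keeps for each lower link. */
struct model_link {
	void	*qbot;
	int	index;	/* 0 marks a free entry */
};

int
model_mux_ioctl(struct model_link *tab, int cmd,
    const struct model_linkblk *lb)
{
	int i;

	switch (cmd) {
	case MODEL_I_LINK:
		for (i = 0; i < MODEL_MAX_LINKS; i++) {
			if (tab[i].index == 0) {
				tab[i].qbot = lb->l_qbot;
				tab[i].index = lb->l_index;
				return (MODEL_ACK);
			}
		}
		return (MODEL_NAK);	/* a NAK causes the link to be undone */
	case MODEL_I_UNLINK:
		for (i = 0; i < MODEL_MAX_LINKS; i++) {
			if (tab[i].index == lb->l_index) {
				tab[i].index = 0;
				tab[i].qbot = NULL;
			}
		}
		return (MODEL_ACK);	/* never refuse an unlink */
	default:
		return (MODEL_NAK);
	}
}
```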

An I_LINK is required for each lower Stream connected to the driver. Additional upper Streams can be connected to the multiplexing driver by open calls. Any message type can be sent from a lower Stream to user processes along any of the upper Streams. The upper Streams provide the only interface between the user processes and the multiplexer.

No direct data structure linkage is established for the linked Streams. In the lower Stream's former Stream-head queue pair, the read queue's q_next is NULL and the write queue's q_next points to the first entity on the lower Stream. Messages flowing upstream from a lower driver (a device driver or another multiplexer) enter the multiplexing driver's lower read put procedure with the read queue of the pair identified by l_qbot, that is, RD(l_qbot), as the queue argument. The multiplexing driver has to route the messages to the appropriate upper (or lower) Stream. Similarly, a message coming downstream from user space on any upper Stream has to be processed and routed, if required, by the driver.

In general, multiplexing drivers should be implemented so that new Streams can be dynamically connected to (and existing Streams disconnected from) the driver without interfering with its ongoing operation. The number of Streams that can be connected to a multiplexer is implementation dependent.

Disconnecting Lower Streams

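Dismantling a lower multiplexer is accomplished by disconnecting (unlinking) the lower Streams. Unlinking can be initiated in three ways: an I_UNLINK ioctl(2) that references one specific lower Stream; an I_UNLINK ioctl(2) with the special muxid MUXID_ALL, which unlinks all lower Streams; or the last close(2) of the control Stream.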

As in the link, an unlink sends a linkblk(9S) structure to the driver in an M_IOCTL message. The I_UNLINK call, which unlinks a single Stream, uses the l_index value returned in the I_LINK to specify the lower Stream to be unlinked. The latter two calls must designate a file corresponding to a control Stream, which causes all the lower Streams that were previously linked by this control Stream to be unlinked. However, the driver sees a series of individual unlinks.

If no open references exist for a lower Stream, a subsequent unlink will automatically close the Stream. Otherwise, the lower Stream must be closed by close(2) following the unlink. STREAMS automatically dismantles all cascaded multiplexers (below other multiplexing Streams) if their controlling Stream is closed. An I_UNLINK leaves lower, cascaded multiplexing Streams intact unless the Stream file descriptor was previously closed.

Multiplexer Construction Example

This section describes an example of multiplexer construction and usage. Multiple upper and lower Streams interface to the multiplexer driver.

The Ethernet, LAPB, and IEEE 802.2 device drivers terminate links to other nodes. The multiplexer driver is an Internet Protocol (IP) multiplexer that switches data among the various nodes or sends data upstream to users in the system. The net modules typically provide a convergence function that matches the multiplexer driver and device driver interface.

Streams A, B, and C are opened by the process, and modules are pushed as needed. Two upper Streams are opened to the IP multiplexer. The rightmost Stream represents multiple Streams, each connected to a process using the network. The Stream second from the right provides a direct path to the multiplexer for supervisory functions. It is the control Stream, leading to a process, that sets up and supervises this configuration. It is always directly connected to the IP driver. Although not shown, modules can be pushed on the control Stream.

After the Streams are opened, the supervisory process typically transfers routing information to the IP driver (and to any other multiplexers above it) and initializes the links. As each link becomes operational, its Stream is connected below the IP driver. If a more complex multiplexing configuration is required, the IP multiplexer Stream with all its connected links can be connected below another multiplexer driver.

Multiplexing Driver

This section contains an example of a multiplexing driver that implements an N-to-1 configuration. This configuration might be used for terminal windows, where each transmission to or from the terminal identifies the window. It resembles a typical device driver, with two differences: the device-handling functions are performed by a separate driver, connected as a lower Stream, and the device information (that is, the relevant user process) is contained in the input data rather than in an interrupt call.

Each upper Stream is created by open(2). A single lower Stream is opened and then it is linked by use of the multiplexing facility. This lower Stream might connect to the TTY driver. The implementation of this example is a foundation for an M-to-N multiplexer.

As in the loop-around driver (Chapter 9, STREAMS Drivers), flow control requires the use of standard and special code, since physical connectivity among the Streams is broken at the driver. Different approaches are used for flow control on the lower Stream, for messages coming upstream from the device driver, and on the upper Streams, for messages coming downstream from the user processes.


Note -

The code presented here for the multiplexing driver represents a single-threaded, uniprocessor implementation. See Chapter 12, MultiThreaded STREAMS, for details on multiprocessor and multithreading issues, such as locking to prevent data corruption and race conditions.


Example 13-2 shows the multiplexer declarations:


Example 13-2 Multiplexer Declarations

#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/stropts.h>
#include <sys/errno.h>
#include <sys/cred.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

static int muxopen (queue_t*, dev_t*, int, int, cred_t*);
static int muxclose (queue_t*, int, cred_t*);
static int muxuwput (queue_t*, mblk_t*);
static int muxlwsrv (queue_t*);
static int muxlrput (queue_t*, mblk_t*);
static int muxuwsrv (queue_t*);

static struct module_info info = {
	0xaabb, "mux", 0, INFPSZ, 512, 128 };

static struct qinit urinit = {	/* upper read */
	NULL, NULL, muxopen, muxclose, NULL, &info, NULL };

static struct qinit uwinit = {	/* upper write */
	muxuwput, muxuwsrv, NULL, NULL, NULL, &info, NULL };

static struct qinit lrinit = {	/* lower read */
	muxlrput, NULL, NULL, NULL, NULL, &info, NULL };

static struct qinit lwinit = {	/* lower write */
	NULL, muxlwsrv, NULL, NULL, NULL, &info, NULL };

struct streamtab muxinfo = {
	&urinit, &uwinit, &lrinit, &lwinit };

struct mux {
	queue_t		*qptr;		/* back pointer to read queue */
	bufcall_id_t	bufcid;		/* bufcall return value */
};
extern struct mux mux_mux[];
extern int mux_cnt;		/* max number of muxes */

static queue_t *muxbot;		/* linked lower queue */
static int muxerr;		/* set if error or hangup on lower Stream */

The four streamtab entries correspond to the upper read, upper write, lower read, and lower write qinit structures. The multiplexing qinit structures replace those in each (in this case there is only one) lower Stream head after the I_LINK has concluded successfully. In a multiplexing configuration, the processing performed by the multiplexing driver can be partitioned between the upper and lower queues. There must be an upper-Stream write put procedure and lower-Stream read put procedure. If the queue procedures of the opposite upper/lower queue are not needed, the queue can be skipped, and the message put to the following queue.

In the example, the upper read-side procedures are not used. The lower-Stream read queue put procedure transfers the message directly to the read queue upstream from the multiplexer. There is no lower write put procedure because the upper write put procedure directly feeds the lower write queue downstream from the multiplexer.

The driver uses a private data structure, mux. mux_mux[dev] points back to the opened upper read queue. This is used to route messages coming upstream from the driver to the appropriate upper queue. It is also used to find a free minor device for a CLONEOPEN driver open.
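A clone open must also cope with the case where every minor device is already in use. This user-space sketch of the free-slot search uses a hypothetical in_use array in place of mux_mux[].qptr, and returns ENXIO when no slot is free; that error choice is an assumption that mirrors the out-of-range check for an ordinary open.

```c
#include <errno.h>

/*
 * Find the first unused minor device. in_use[i] is nonzero when minor
 * device i already has an open Stream (hypothetical stand-in for a
 * non-NULL mux_mux[i].qptr). Returns 0 and sets *minorp, or ENXIO.
 */
int
model_clone_minor(const int *in_use, int mux_cnt, int *minorp)
{
	int device;

	for (device = 0; device < mux_cnt; device++)
		if (!in_use[device])
			break;
	if (device == mux_cnt)
		return (ENXIO);	/* every minor device is already open */
	*minorp = device;
	return (0);
}
```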

Example 13-3, the upper queue open, contains the canonical driver open code:


Example 13-3 Upper Queue Open

static int
muxopen(queue_t *q, dev_t *devp, int flag,
    int sflag, cred_t *credp)
{
	struct mux *mux;
	minor_t device;

	if (q->q_ptr)
		return (EBUSY);

	if (sflag == CLONEOPEN) {
		/* Find the first unused minor device */
		for (device = 0; device < mux_cnt; device++)
			if (mux_mux[device].qptr == 0)
				break;
		if (device >= mux_cnt)
			return (ENXIO);	/* all minor devices in use */
		*devp = makedevice(getmajor(*devp), device);
	} else {
		device = getminor(*devp);
		if (device >= mux_cnt)
			return (ENXIO);
	}

	mux = &mux_mux[device];
	mux->qptr = q;
	mux->bufcid = 0;
	q->q_ptr = (char *)mux;
	WR(q)->q_ptr = (char *)mux;
	qprocson(q);
	return (0);
}

muxopen checks for a clone or ordinary open call. It initializes q_ptr to point at the mux_mux[] structure.

The core multiplexer processing is the following: downstream data written to an upper Stream is queued on the corresponding upper write message queue if the lower Stream is flow controlled. This allows flow control to propagate toward the Stream head for each upper Stream. A lower write service procedure, rather than a write put procedure, is used so that flow control, coming up from the driver below, may be handled.
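Queue water marks drive this scheme. The user-space model below approximates how a STREAMS queue uses its high and low water marks (512 and 128 in the module_info of Example 13-2): the queue is marked full at the high-water mark and is no longer full once it drops below it, a refused writer is remembered, and that writer is back-enabled only after the queue drains below the low-water mark. The model is a simplification; the type and function names are hypothetical, and byte counts stand in for messages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one queue's flow-control state. */
struct model_wq {
	size_t	count;		/* bytes queued */
	size_t	hiwat, lowat;
	bool	full;		/* models QFULL */
	bool	wantw;		/* a writer was refused: back-enable later */
	int	backenables;	/* times the upstream service was scheduled */
};

/* Models canput(): refuse, and remember the refusal, while full. */
bool
model_canput(struct model_wq *q)
{
	if (q->full) {
		q->wantw = true;
		return (false);
	}
	return (true);
}

void
model_putq(struct model_wq *q, size_t n)
{
	q->count += n;
	if (q->count >= q->hiwat)
		q->full = true;
}

/* Models getq(): draining below lowat back-enables a refused writer. */
void
model_getq(struct model_wq *q, size_t n)
{
	q->count = (n > q->count) ? 0 : q->count - n;
	if (q->count < q->hiwat)
		q->full = false;
	if (q->wantw && (q->count == 0 || q->count < q->lowat)) {
		q->wantw = false;
		q->backenables++;	/* e.g. muxlwsrv gets scheduled */
	}
}
```

The gap between the two marks keeps the writer from being woken for every message that drains, which is why the back-enable in this model waits for the low-water mark.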

On the lower read side, data coming up the lower Stream are passed to the lower read put procedure. The procedure routes the data to an upper Stream based on the first byte of the message. This byte holds the minor device number of an upper Stream. The put procedure handles flow control by testing the upper Stream at the first upper read queue beyond the driver.
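The routing rule for the lower read side can be modeled in user space as follows. The message is a plain byte array rather than an mblk_t chain, MODEL_MUX_CNT is an arbitrary stand-in for mux_cnt, and the open and full flags stand in for a non-NULL qptr and for canputnext() failing.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MUX_CNT	4	/* hypothetical number of upper Streams */

/* Hypothetical upper-Stream state. */
struct model_upper {
	int	open;
	int	full;		/* models canputnext() failing upstream */
	size_t	delivered;	/* payload bytes passed upstream */
};

/*
 * Route one inbound message. Byte 0 names the upper Stream's minor
 * device; the rest is payload. Returns the minor device used, or -1 if
 * the message is discarded (empty, out of range, closed, or full).
 */
int
model_route_up(struct model_upper *up, const uint8_t *msg, size_t len)
{
	int device;

	if (len < 1)
		return (-1);
	device = msg[0];
	if (device >= MODEL_MUX_CNT)
		return (-1);
	if (!up[device].open || up[device].full)
		return (-1);	/* no read-side flow control: drop it */
	up[device].delivered += len - 1;
	return (device);
}
```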

Upper Write Put Procedure

muxuwput, the upper-queue write put procedure, traps ioctl, in particular I_LINK and I_UNLINK:


/*
 * Callback registered with bufcall(9F): a buffer may now be available,
 * so clear the pending request and reschedule the upper write service
 * procedure, which retries the allocation.
 */
static void
mux_qenable(void *arg)
{
	queue_t *q = (queue_t *)arg;
	struct mux *mux = (struct mux *)q->q_ptr;

	mux->bufcid = 0;
	qenable(q);
}

static int
muxuwput(queue_t *q, mblk_t *mp)
{
	struct mux *mux;

	mux = (struct mux *)q->q_ptr;
	switch (mp->b_datap->db_type) {
	case M_IOCTL: {
		struct iocblk *iocp;
		struct linkblk *linkp;

		/*
		 * ioctl. Only channel 0 can do ioctls. Two
		 * calls are recognized: LINK and UNLINK.
		 */
		if (mux != mux_mux)
			goto iocnak;

		iocp = (struct iocblk *)mp->b_rptr;
		switch (iocp->ioc_cmd) {
		case I_LINK:
			/*
			 * Link. The data contains a linkblk structure.
			 * Remember the bottom queue in muxbot.
			 */
			if (muxbot != NULL)
				goto iocnak;

			linkp = (struct linkblk *)mp->b_cont->b_rptr;
			muxbot = linkp->l_qbot;
			muxerr = 0;

			mp->b_datap->db_type = M_IOCACK;
			iocp->ioc_count = 0;
			qreply(q, mp);
			break;
		case I_UNLINK:
			/*
			 * Unlink. The data contains a linkblk structure.
			 * Should not fail an unlink. Null out muxbot.
			 */
			muxbot = NULL;
			mp->b_datap->db_type = M_IOCACK;
			iocp->ioc_count = 0;
			qreply(q, mp);
			break;
		default:
		iocnak:
			/* fail ioctl */
			mp->b_datap->db_type = M_IOCNAK;
			qreply(q, mp);
		}
		break;
	}
	case M_FLUSH:
		if (*mp->b_rptr & FLUSHW)
			flushq(q, FLUSHDATA);
		if (*mp->b_rptr & FLUSHR) {
			*mp->b_rptr &= ~FLUSHW;
			qreply(q, mp);
		} else
			freemsg(mp);
		break;
	case M_DATA:
		/*
		 * Data. If we have no bottom queue, fail.
		 * Otherwise, queue the data on this upper write
		 * queue; muxuwsrv adds the routing byte and sends
		 * it down when the lower Stream can accept it.
		 */
		if (muxerr || muxbot == NULL)
			goto bad;
		(void) putq(q, mp);
		break;
	default:
	bad:
		/*
		 * Send an error message upstream, discarding any
		 * data blocks chained to the original message.
		 */
		if (mp->b_cont != NULL) {
			freemsg(mp->b_cont);
			mp->b_cont = NULL;
		}
		mp->b_datap->db_type = M_ERROR;
		mp->b_rptr = mp->b_wptr = mp->b_datap->db_base;
		*mp->b_wptr++ = EINVAL;
		qreply(q, mp);
	}
	return (0);
}

First, there is a check to enforce that the Stream associated with minor device 0 will be the single, controlling Stream. The ioctl(2)s are only accepted on this Stream. As described previously, a controlling Stream is the one that issues the I_LINK. Having a single control Stream is a recommended practice. I_LINK and I_UNLINK include a linkblk structure containing:

l_qtop

The upper write queue from which the ioctl(2) is coming. It always equals q for an I_LINK, and NULL for I_PLINK.

l_qbot

The new lower write queue. It is the former Stream-head write queue. It is of most interest since that is where the multiplexer gets and puts its data.

l_index

A unique (system-wide) identifier for the link. It can be used for routing or during selective unlinks. Since the example only supports a single link, l_index is not used.

For I_LINK, l_qbot is saved in muxbot and a positive acknowledgment is generated. From this point on, until an I_UNLINK occurs, data from upper queues will be routed through muxbot. Note that when an I_LINK is received, the lower Stream has already been connected. This allows the driver to send messages downstream to perform any initialization functions. Returning an M_IOCNAK message (negative acknowledgment) in response to an I_LINK causes the lower Stream to be disconnected.

The I_UNLINK handling code nulls out muxbot and generates a positive acknowledgment. A negative acknowledgment should not be returned to an I_UNLINK. The Stream head ensures that the lower Stream is connected to a multiplexer before sending an I_UNLINK M_IOCTL.

Drivers can handle the persistent link requests, the I_PLINK and I_PUNLINK ioctl(2)s (see "Persistent Links"), in the same manner, except that l_qtop in the linkblk structure passed to the put routine is NULL instead of identifying the controlling Stream.

muxuwput handles M_FLUSH messages as a normal driver does, except that there are no messages queued on the upper read queue, so there is no need to call flushq if FLUSHR is set.

M_DATA messages are not placed on the lower write message queue. They are queued on the upper write message queue. When flow control subsides on the lower Stream, the lower service procedure, muxlwsrv, is scheduled to start output. This is similar to starting output on a device driver.

Upper Write Service Procedure

The following example shows the code for the upper multiplexer write service procedure:


static int
muxuwsrv(queue_t *q)
{
	mblk_t *mp, *bp;
	struct mux *muxp;

	muxp = (struct mux *)q->q_ptr;

	if (muxbot == NULL || muxerr) {
		flushq(q, FLUSHALL);
		return (0);
	}
	while ((mp = getq(q)) != NULL) {
		if (!canputnext(muxbot)) {
			(void) putbq(q, mp);
			return (0);
		}
		/* Prepend one byte holding this Stream's minor device */
		if ((bp = allocb(1, BPRI_MED)) == NULL) {
			(void) putbq(q, mp);
			if (muxp->bufcid == 0)
				muxp->bufcid = bufcall(1, BPRI_MED,
				    mux_qenable, q);
			return (0);
		}
		*bp->b_wptr++ = (unsigned char)(muxp - mux_mux);
		bp->b_cont = mp;
		putnext(muxbot, bp);
	}
	return (0);
}

As long as a Stream is still linked under the multiplexer and there are no errors, the service procedure takes each message off the queue and, if flow control allows, prepends a one-byte header holding the upper Stream's minor device number and sends the result downstream. If the header block cannot be allocated, the message is put back on the queue and bufcall arranges for mux_qenable to reschedule the service procedure when a buffer becomes available.

Lower Write Service Procedure

muxlwsrv, the lower (linked) queue write service procedure, is scheduled as a result of flow control subsiding downstream (it is back-enabled).


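static int muxlwsrv(queue_t *q)
{
	int i;

	/*
	 * Upper Streams hold their pending data on their write queues,
	 * and mux_mux[i].qptr is the read queue, so enable WR() of it.
	 */
	for (i = 0; i < mux_cnt; i++)
		if (mux_mux[i].qptr != NULL &&
		    WR(mux_mux[i].qptr)->q_first != NULL)
			qenable(WR(mux_mux[i].qptr));
	return (0);
}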

muxlwsrv steps through all possible upper queues. If a queue is active and there are messages on the queue, then its upper write service procedure is enabled through qenable.
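A user-space model of this scan, with hypothetical fields standing in for a non-NULL qptr, a non-empty upper write queue, and a qenable() request. Enabling every waiting upper Stream is simple; because each upper write service procedure re-checks canputnext() before sending, enabling more queues than the lower Stream can absorb costs only some extra scheduling.

```c
/* Hypothetical stand-in for one upper Stream's write queue. */
struct model_upper_wq {
	int	open;		/* models mux_mux[i].qptr != NULL */
	int	nqueued;	/* messages waiting on the write queue */
	int	enabled;	/* models a qenable() request */
};

/* Model of the scan: schedule every open upper Stream with data. */
int
model_lwsrv(struct model_upper_wq *wq, int mux_cnt)
{
	int i, n = 0;

	for (i = 0; i < mux_cnt; i++) {
		if (wq[i].open && wq[i].nqueued > 0) {
			wq[i].enabled = 1;
			n++;
		}
	}
	return (n);
}
```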

Lower Read Put Procedure

The lower (linked) queue read put procedure is:


Example 13-4

static int
muxlrput(queue_t *q, mblk_t *mp)
{
	queue_t *uq;
	int device;

	if(muxerr) {
			freemsg(mp);
			return (0);
	}

	switch(mp->b_datap->db_type) {
	case M_FLUSH:
			/*
			 * Flush queues. NOTE: sense of tests is reversed
			 * since we are acting like a "stream head"
			 */
			if (*mp->b_rptr & FLUSHW) {
				*mp->b_rptr &= ~FLUSHR;
				qreply(q, mp);
			} else
				freemsg(mp);
			break;
	case M_ERROR:
	case M_HANGUP:
			muxerr = 1;
			freemsg(mp);
			break;
	case M_DATA:
			/*
			 * Route message. The first byte indicates the
			 * device to send to. There is no read-side flow
			 * control: a message for a backed-up Stream is
			 * discarded.
			 *
			 * Extract and delete device number. If the
			 * leading block is now empty and more blocks
			 * follow, strip the leading block.
			 */
			if (mp->b_rptr >= mp->b_wptr) {
				freemsg(mp);	/* no device byte */
				break;
			}
			device = *mp->b_rptr++;
			if (mp->b_rptr == mp->b_wptr && mp->b_cont != NULL) {
				mblk_t *nmp = mp->b_cont;

				freeb(mp);
				mp = nmp;
			}

			/* Sanity check. Device must be in range */
			if (device < 0 || device >= mux_cnt) {
				freemsg(mp);
				break;
			}
			/*
			 * If upper stream is open and not backed up,
			 * send the message there, otherwise discard it.
			 */
			uq = mux_mux[device].qptr;
			if (uq != NULL && canputnext(uq))
				putnext(uq, mp);
			else
				freemsg(mp);
			break;
	default:
			freemsg(mp);
	}
	return (0);
}

muxlrput receives messages from the linked Stream. In this case, it is acting as a Stream head. It handles M_FLUSH messages; note that its flag tests are the reverse of those in a driver handling M_FLUSH messages from upstream. There is no need to flush the read queue because no data is ever placed on it.
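The reversed sense of the flush tests can be captured in two small functions that only compute the flag byte sent back. MODEL_FLUSHR and MODEL_FLUSHW are hypothetical stand-ins for FLUSHR and FLUSHW, and the actual queue flushing is omitted; a return value of 0 means the message is freed rather than turned around.

```c
/* Hypothetical flag values standing in for FLUSHR and FLUSHW. */
#define MODEL_FLUSHR	0x01
#define MODEL_FLUSHW	0x02

/*
 * Driver at the bottom of a Stream, receiving M_FLUSH from above (as in
 * muxuwput): if FLUSHR is set, clear FLUSHW and send it back upstream.
 */
int
model_driver_flush(int flags)
{
	if (flags & MODEL_FLUSHR)
		return (flags & ~MODEL_FLUSHW);
	return (0);
}

/*
 * Pseudo Stream head receiving M_FLUSH from below (as in muxlrput): the
 * sense is reversed, so FLUSHW sends the message back down, with
 * FLUSHR cleared.
 */
int
model_head_flush(int flags)
{
	if (flags & MODEL_FLUSHW)
		return (flags & ~MODEL_FLUSHR);
	return (0);
}
```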

muxlrput also handles M_ERROR and M_HANGUP messages. If one is received, it locks up the upper Streams by setting muxerr.

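M_DATA messages are routed by checking the first data byte of the message. This byte contains the minor device of the upper Stream. Several sanity checks are made: the minor device must be in range, the upper Stream for that minor device must be open, and that upper Stream must not be flow controlled.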

This multiplexer does not support flow control on the read side; it is merely a router. If it passes all checks, the message is put to the proper upper queue. Otherwise, the message is discarded.

The upper Stream close routine clears the mux entry so this queue will no longer be found. Any outstanding bufcall request for the Stream is cancelled with unbufcall.


/*
* Upper queue close
*/
static int
muxclose(queue_t *q, int flag, cred_t *credp)
{
	struct mux *mux;

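	mux = (struct mux *) q->q_ptr;
	qprocsoff(q);			/* disable put and service procedures */
	if (mux->bufcid != 0)		/* cancel any pending bufcall */
		unbufcall(mux->bufcid);
	mux->bufcid = 0;
	mux->qptr = NULL;		/* muxlrput no longer finds this queue */
	q->q_ptr = NULL;
	WR(q)->q_ptr = NULL;
	return (0);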
}

Persistent Links

With the I_LINK and I_UNLINK ioctl(2)s, the file descriptor of the controlling Stream (the Stream above the multiplexer that was used to set up the lower multiplexer connections) must remain open for the duration of the configuration. Closing that file descriptor dismantles the whole multiplexing configuration. It is not always desirable to keep a process running merely to hold the multiplexer configuration together, so "free-standing" links below a multiplexer are needed. A persistent link is such a link. It is similar to a STREAMS multiplexer link, except that no process is needed to hold the links together. After the multiplexer has been set up, the process may close all file descriptors and exit, and the multiplexer remains intact.

Two ioctl(2)s, I_PLINK and I_PUNLINK, are used to create and remove persistent links that are associated with the Stream above the multiplexer. Neither close(2) nor I_UNLINK can disconnect persistent links (see strconf(1) and strchg(1)).

The format of I_PLINK is:


ioctl(fd0, I_PLINK, fd1)

The first file descriptor, fd0, must reference the Stream connected to the multiplexing driver, and the second file descriptor, fd1, must reference the Stream to be connected below the multiplexer. The persistent link can be created in the following way:


upper_stream_fd = open("/dev/mux", O_RDWR);
lower_stream_fd = open("/dev/driver", O_RDWR);
muxid = ioctl(upper_stream_fd, I_PLINK, lower_stream_fd);
if (muxid < 0) {
	perror("I_PLINK");
	exit(1);
}
/*
 * save muxid in a file
 */
exit(0);

The persistent link remains even after the file descriptor associated with the upper Stream to the multiplexing driver is closed. The I_PLINK ioctl(2) returns an integer value, muxid, that can be used to dismantle the multiplexing configuration. If the process that created the persistent link still exists, it may pass the muxid value to another process that will dismantle the link when required, or it can leave the muxid value in a file so that other processes can find it later.
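One way to leave muxid in a file is sketched below, assuming a plain-text file that holds a single decimal integer. The helpers save_muxid and load_muxid and the file format are hypothetical; STREAMS does not prescribe where or how muxid is stored.

```c
#include <stdio.h>

/*
 * Hypothetical helper: record the muxid returned by I_PLINK as one
 * decimal integer. Returns 0 on success, -1 on any I/O error.
 */
static int
save_muxid(const char *path, int muxid)
{
	FILE *fp = fopen(path, "w");

	if (fp == NULL)
		return -1;
	if (fprintf(fp, "%d\n", muxid) < 0) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: read back a muxid saved by save_muxid.
 * Returns 0 and stores the value in *muxidp, or -1 on failure.
 */
static int
load_muxid(const char *path, int *muxidp)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (fp == NULL)
		return -1;
	ok = (fscanf(fp, "%d", muxidp) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}
```

The process that issues I_PLINK would call save_muxid after a successful ioctl, and a later process would call load_muxid before issuing I_PUNLINK with the recovered value.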

Several users can open the MUX driver and send data to Driver1, because the persistent link to Driver1 remains intact.

The I_PUNLINK ioctl(2) is used to dismantle the persistent link. Its format is:


ioctl(fd0, I_PUNLINK, muxid)

where fd0 is the file descriptor associated with the Stream connected to the multiplexing driver from above, and muxid is the value returned by the I_PLINK ioctl(2) for the Stream that was connected below the multiplexer. I_PUNLINK removes the persistent link between the multiplexer referenced by fd0 and the Stream to the driver designated by muxid. Each persistent link below the multiplexer can be disconnected individually. An I_PUNLINK ioctl(2) with the muxid value MUXID_ALL removes all persistent links below the multiplexing driver referenced by fd0.

The following code example shows how to dismantle the previously given configuration:


fd = open("/dev/mux", O_RDWR);
/*
 * retrieve muxid from the file
 */
if (ioctl(fd, I_PUNLINK, muxid) < 0) {
	perror("I_PUNLINK");
	exit(1);
}
exit(0);

Do not mix the I_PLINK and I_PUNLINK ioctl(2)s with I_LINK and I_UNLINK. Any attempt to unlink a regular link with I_PUNLINK, or a persistent link with I_UNLINK, fails with errno set to EINVAL.
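The rules for regular and persistent links can be summarized in a small user-space model of the links below one multiplexer. Everything here is hypothetical: the table and the functions model_link, model_unlink, and model_close stand in for I_LINK/I_PLINK, I_UNLINK/I_PUNLINK, and close(2) of the controlling Stream, and MODEL_MUXID_ALL stands in for MUXID_ALL. Unlike ioctl(2), model_unlink returns EINVAL directly instead of setting errno.

```c
#include <errno.h>

#define MODEL_MAXLINKS	8
#define MODEL_MUXID_ALL	(-1)		/* hypothetical stand-in for MUXID_ALL */

struct model_entry {
	int in_use;
	int persistent;			/* created by I_PLINK, not I_LINK */
};

static struct model_entry links[MODEL_MAXLINKS];

/* I_LINK (persistent == 0) or I_PLINK (persistent == 1): returns muxid. */
static int
model_link(int persistent)
{
	for (int i = 0; i < MODEL_MAXLINKS; i++) {
		if (!links[i].in_use) {
			links[i].in_use = 1;
			links[i].persistent = persistent;
			return i;
		}
	}
	return -1;			/* table full */
}

/* I_UNLINK (persistent == 0) or I_PUNLINK (persistent == 1). */
static int
model_unlink(int muxid, int persistent)
{
	if (muxid == MODEL_MUXID_ALL) {
		/* remove every link of the requested kind */
		for (int i = 0; i < MODEL_MAXLINKS; i++)
			if (links[i].in_use && links[i].persistent == persistent)
				links[i].in_use = 0;
		return 0;
	}
	if (muxid < 0 || muxid >= MODEL_MAXLINKS || !links[muxid].in_use ||
	    links[muxid].persistent != persistent)
		return EINVAL;		/* no such link, or wrong kind of link */
	links[muxid].in_use = 0;
	return 0;
}

/* Closing the controlling Stream removes regular links only. */
static void
model_close(void)
{
	for (int i = 0; i < MODEL_MAXLINKS; i++)
		if (!links[i].persistent)
			links[i].in_use = 0;
}
```

In this model a persistent link survives model_close and is removed only by a matching persistent unlink, which mirrors the behavior described for I_PLINK and I_PUNLINK.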

Because multilevel multiplexing configurations are allowed in STREAMS, persistent links can exist below a multiplexer whose Stream is connected to the multiplexer above it by regular links. Closing the file descriptor associated with the controlling Stream removes the regular link but not the persistent links below it. Conversely, regular links can exist below a multiplexer whose Stream is connected to the multiplexer above it by persistent links. In this case, the regular links are removed if the persistent link above is removed and no other references to the lower Streams exist.

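The construction of cycles is not allowed when creating links. A cycle could be constructed, for example, by linking multiplexing driver 1 below multiplexing driver 2 and then linking multiplexing driver 2 below multiplexing driver 1.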

This is not allowed. The operating system prevents a multiplexer configuration from containing a cycle to ensure that messages cannot be routed infinitely, thus creating an infinite loop or overflowing the kernel stack.
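One way to enforce the no-cycle rule is to refuse a link whenever the upper multiplexer is already reachable from the lower one. The sketch below is an illustrative depth-first search over a hypothetical adjacency table of multiplexers; the names below, reaches, model_link_below, and MODEL_NMUX are assumptions of this sketch, and it is not the operating system's implementation.

```c
#include <stdbool.h>

#define MODEL_NMUX	8		/* hypothetical number of multiplexers */

/* below[u][l] is true when multiplexer l is linked below multiplexer u. */
static bool below[MODEL_NMUX][MODEL_NMUX];

/* True if target can be reached from node by following links downward. */
static bool
reaches(int node, int target, bool *seen)
{
	if (node == target)
		return true;
	if (seen[node])
		return false;
	seen[node] = true;
	for (int next = 0; next < MODEL_NMUX; next++)
		if (below[node][next] && reaches(next, target, seen))
			return true;
	return false;
}

/*
 * Link lower below upper unless that would close a cycle, that is,
 * unless upper is already reachable from lower. Returns 0 on success,
 * -1 if the link is refused.
 */
static int
model_link_below(int upper, int lower)
{
	bool seen[MODEL_NMUX] = { false };

	if (reaches(lower, upper, seen))
		return -1;
	below[upper][lower] = true;
	return 0;
}
```

With this check, linking 2 below 1 and then 1 below 2 is refused, while two separate paths down to the same lower multiplexer are accepted, because such a configuration contains no cycle.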

Design Guidelines

The following are general multiplexer design guidelines: