Writing Device Drivers

Part II Designing Specific Kinds of Device Drivers

The second part of the book provides design information that is specific to the type of driver:

Chapter 15 Drivers for Character Devices

A character device, such as a tape drive or serial port, does not have physically addressable storage media, and I/O is normally performed in a byte stream. This chapter describes the structure of a character device driver, focusing in particular on entry points for character drivers. In addition, this chapter describes the use of physio(9F) and aphysio(9F) in the context of synchronous and asynchronous I/O transfers.


Overview of the Character Driver Structure

Figure 15–1 shows data structures and routines that define the structure of a character device driver. Device drivers typically include a loadable-driver section, a device configuration section, and a device access section.

The shaded device access section in the following figure illustrates character driver entry points.

Figure 15–1 Character Driver Roadmap (diagram showing the structures and entry points for character device drivers)

Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. These structures contain pointers to the driver entry points.


Note –

Some of these entry points can be replaced with nodev(9F) or nulldev(9F) as appropriate.


Character Device Autoconfiguration

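The attach(9E) routine should perform the common initialization tasks that all devices require, such as allocating per-instance state, mapping the device's registers, adding interrupt handlers, initializing mutexes and condition variables, creating power-manageable components, and creating minor nodes.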

See attach() Entry Point for code examples of these tasks.

Character device drivers create minor nodes of type S_IFCHR. A minor node of S_IFCHR causes a character special file that represents the node to eventually appear in the /devices hierarchy.

The following example shows a typical attach(9E) routine for character drivers. Properties that are associated with the device are commonly declared in an attach() routine. This example uses a predefined Size property. Size is the equivalent of the Nblocks property for getting the size of a partition in a block device. If, for example, you are doing character I/O on a disk device, you might use Size to get the size of a partition. Because Size is a 64-bit property, you must use a 64-bit property interface, in this case ddi_prop_update_int64(9F). See Device Properties for more information about properties.


Example 15–1 Character Driver attach() Routine

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    int instance = ddi_get_instance(dip);
    struct xxstate *xsp;
    major_t maj_number;

    switch (cmd) {
    case DDI_ATTACH:
        /* Allocate a state structure and initialize it. */
        if (ddi_soft_state_zalloc(statep, instance) != DDI_SUCCESS)
            return (DDI_FAILURE);
        xsp = ddi_get_soft_state(statep, instance);
        xsp->dip = dip;
        /*
         * Map the device's registers.
         * Add the device driver's interrupt handler(s).
         * Initialize any mutexes and condition variables.
         * Create power manageable components.
         *
         * Create the device's minor node. Note that the node_type
         * argument is set to DDI_NT_TAPE.
         */
        if (ddi_create_minor_node(dip, minor_name, S_IFCHR,
            instance, DDI_NT_TAPE, 0) == DDI_FAILURE) {
            /* Free resources allocated so far. */
            /* Remove any previously allocated minor nodes. */
            ddi_remove_minor_node(dip, NULL);
            ddi_soft_state_free(statep, instance);
            return (DDI_FAILURE);
        }
        /*
         * Create driver properties like "Size." Use "Size"
         * instead of "size" to ensure the property works
         * for large bytecounts.
         */
        xsp->Size = size_of_device_in_bytes;
        maj_number = ddi_driver_major(dip);
        if (ddi_prop_update_int64(makedevice(maj_number, instance),
            dip, "Size", xsp->Size) != DDI_PROP_SUCCESS) {
            cmn_err(CE_CONT, "%s: cannot create Size property\n",
                ddi_get_name(dip));
            /* Free resources allocated so far. */
            ddi_remove_minor_node(dip, NULL);
            ddi_soft_state_free(statep, instance);
            return (DDI_FAILURE);
        }
        /* ... */
        return (DDI_SUCCESS);
    case DDI_RESUME:
        /*
         * See the "Power Management" chapter in this book.
         * Until resume support is implemented, fail the request.
         */
        /* FALLTHROUGH */
    default:
        return (DDI_FAILURE);
    }
}

Device Access (Character Drivers)

Access to a device by one or more application programs is controlled through the open(9E) and close(9E) entry points. An open(2) system call to a special file representing a character device always causes a call to the open(9E) routine for the driver. For a particular minor device, open(9E) can be called many times. The close(9E) routine is called only when the final reference to a device is removed. If the device is accessed through file descriptors, the final call to close(9E) can occur as a result of a close(2) or exit(2) system call. If the device is accessed through memory mapping, the final call to close(9E) can occur as a result of a munmap(2) system call.

open() Entry Point (Character Drivers)

The primary function of open() is to verify that the open request is allowed. The syntax for open(9E) is as follows:

int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);

where:

devp

Pointer to a device number. The open() routine is passed a pointer so that the driver can change the minor number. With this pointer, drivers can dynamically create minor instances of the device. An example would be a pseudo terminal driver that creates a new pseudo-terminal whenever the driver is opened. A driver that dynamically chooses the minor number normally creates only one minor device node in attach(9E) with ddi_create_minor_node(9F), then changes the minor number component of *devp using makedevice(9F) and getmajor(9F):

    *devp = makedevice(getmajor(*devp), new_minor);

You do not have to call ddi_create_minor_node(9F) for the new minor. A driver must not change the major number of *devp. The driver must keep track of available minor numbers internally.

flag

Flag with bits to indicate whether the device is opened for reading (FREAD), writing (FWRITE), or both. User threads issuing the open(2) system call can also request exclusive access to the device (FEXCL) or specify that the open should not block for any reason (FNDELAY), but the driver must enforce both cases. A driver for a write-only device such as a printer might consider an open(9E) for reading invalid.

otyp

Integer that indicates how open() was called. The driver must check that the value of otyp is appropriate for the device. For character drivers, otyp should be OTYP_CHR (see the open(9E) man page).

credp

Pointer to a credential structure containing information about the caller, such as the user ID and group IDs. Drivers should not examine the structure directly, but should instead use drv_priv(9F) to check for the common case of root privileges. In the following example, only root or a user with the PRIV_SYS_DEVICES privilege is allowed to open the device for writing.

The following example shows a character driver open(9E) routine.


Example 15–2 Character Driver open(9E) Routine

static int
xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)
{
    minor_t        instance;

    instance = getminor(*devp); /* one-to-one example mapping */
    /* Is the instance attached? */
    if (ddi_get_soft_state(statep, instance) == NULL)
        return (ENXIO);
    /* verify that otyp is appropriate */
    if (otyp != OTYP_CHR)
        return (EINVAL);
    /* drv_priv() returns 0 only for a privileged caller */
    if ((flag & FWRITE) && drv_priv(credp) != 0)
        return (EPERM);
    return (0);
}
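The devp description above notes that a driver that chooses minor numbers dynamically must track the available minors itself. The following user-space sketch shows one simple way to do that with a bitmap. The names xx_alloc_minor(), xx_free_minor(), and XX_MAX_MINORS are hypothetical. In a real driver the bitmap would live in driver state and be protected by a mutex, because open(9E) can run concurrently, and the allocated value would be passed to makedevice(9F) to update *devp.

```c
#include <stdint.h>

/* Hypothetical limit on dynamically created minors (an assumption). */
#define XX_MAX_MINORS 64

/*
 * Bitmap of in-use minor numbers: bit n set means minor n is taken.
 * A driver must hold a mutex around every access to this map.
 */
static uint64_t xx_minor_map;

/* Return the lowest free minor number and mark it used, or -1 if none. */
static int
xx_alloc_minor(void)
{
    for (int m = 0; m < XX_MAX_MINORS; m++) {
        if ((xx_minor_map & ((uint64_t)1 << m)) == 0) {
            xx_minor_map |= (uint64_t)1 << m;
            return (m);
        }
    }
    return (-1);
}

/* Release a minor number, typically from the close(9E) path. */
static void
xx_free_minor(int m)
{
    if (m >= 0 && m < XX_MAX_MINORS)
        xx_minor_map &= ~((uint64_t)1 << m);
}
```

Freed numbers are reused lowest-first, which keeps the minor space compact for a pseudo-terminal style clone driver.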

close() Entry Point (Character Drivers)

The syntax for close(9E) is as follows:

int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);

close() should perform any cleanup necessary to finish using the minor device, and prepare the device (and driver) to be opened again. For example, the open routine might have been invoked with the exclusive access (FEXCL) flag. A call to close(9E) would allow additional opens to succeed. Other functions that close(9E) might perform include waiting for I/O to drain from output buffers before returning, rewinding a tape (for a tape device), and hanging up the phone line (for a modem device).

A driver that waits for I/O to drain could wait forever if draining stalls due to external conditions such as flow control. See Threads Unable to Receive Signals for information about how to avoid this problem.
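Because close(9E) is called only when the final reference to the device goes away, a driver that honors FEXCL can keep an open count plus an exclusive flag, and clear both in close(9E). The following user-space model is a sketch of that bookkeeping only. SIM_FEXCL, struct xx_open_state, xx_sim_open(), and xx_sim_close() are hypothetical names, and EBUSY is assumed as the error for a conflicting open. A real driver would keep this state in its soft state structure, update it under a mutex, and return the error from open(9E).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FEXCL open flag (an assumption). */
#define SIM_FEXCL 0x1

/* Hypothetical per-device open state; a driver keeps this in soft state. */
struct xx_open_state {
    int  open_count;   /* number of successful opens since the last close */
    bool exclusive;    /* true while an exclusive open is held */
};

/* Open: refuse if an exclusive holder exists, or if FEXCL meets other opens. */
static int
xx_sim_open(struct xx_open_state *st, int flag)
{
    if (st->exclusive)
        return (EBUSY);
    if ((flag & SIM_FEXCL) && st->open_count > 0)
        return (EBUSY);
    if (flag & SIM_FEXCL)
        st->exclusive = true;
    st->open_count++;
    return (0);
}

/* Close runs only on the final close, so it resets everything. */
static void
xx_sim_close(struct xx_open_state *st)
{
    st->open_count = 0;
    st->exclusive = false;
}
```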

I/O Request Handling

This section discusses I/O request processing in detail.

User Addresses

When a user thread issues a write(2) system call, the thread passes the address of a buffer in user space:

    char buffer[] = "python";
    count = write(fd, buffer, strlen(buffer) + 1);

The system builds a uio(9S) structure to describe this transfer by allocating an iovec(9S) structure and setting the iov_base field to the address passed to write(2), in this case, buffer. The uio(9S) structure is passed to the driver write(9E) routine. See Vectored I/O for more information about the uio(9S) structure.

The address in the iovec(9S) is in user space, not kernel space. Thus, the address is not guaranteed to be resident in memory, or even to be valid. In either case, accessing a user address directly from the device driver or from the kernel could crash the system. Device drivers should therefore never access user addresses directly. Instead, use a data transfer routine from the Solaris DDI/DKI to move data into or out of the kernel. These routines can handle page faults: they either bring in the proper user page and continue the copy transparently, or return an error on an invalid access.

copyout(9F) can be used to copy data from kernel space to user space. copyin(9F) can copy data from user space to kernel space. ddi_copyout(9F) and ddi_copyin(9F) operate similarly but are to be used in the ioctl(9E) routine. copyin(9F) and copyout(9F) can be used on the buffer described by each iovec(9S) structure, or uiomove(9F) can perform the entire transfer to or from a contiguous area of driver or device memory.

Vectored I/O

In character drivers, transfers are described by a uio(9S) structure. The uio(9S) structure contains information about the direction and size of the transfer, plus an array of buffers for one end of the transfer. The other end is the device.

The uio(9S) structure contains the following members:

iovec_t       *uio_iov;       /* base address of the iovec */
                              /* buffer description array */
int           uio_iovcnt;     /* the number of iovec structures */
off_t         uio_offset;     /* 32-bit offset into file where */
                              /* data is transferred from or to */
offset_t      uio_loffset;    /* 64-bit offset into file where */
                              /* data is transferred from or to */
uio_seg_t     uio_segflg;     /* identifies the type of I/O transfer */
                              /* UIO_SYSSPACE:  kernel <-> kernel */
                              /* UIO_USERSPACE: kernel <-> user */
short         uio_fmode;      /* file mode flags (not driver settable) */
daddr_t       uio_limit;      /* 32-bit ulimit for file (maximum */
                              /* block offset). not driver settable. */
diskaddr_t    uio_llimit;     /* 64-bit ulimit for file (maximum */
                              /* block offset). not driver settable. */
int           uio_resid;      /* amount (in bytes) not */
                              /* transferred on completion */

A uio(9S) structure is passed to the driver read(9E) and write(9E) entry points. This structure is generalized to support what is called gather-write and scatter-read. When writing to a device, the data buffers to be written do not have to be contiguous in application memory. Similarly, data that is transferred from a device into memory comes off in a contiguous stream but can go into noncontiguous areas of application memory. See the readv(2), writev(2), pread(2), and pwrite(2) man pages for more information on scatter-gather I/O.
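From the application side, gather-write and scatter-read look like the following sketch, which uses writev(2) and readv(2) on a pipe so that it runs without any device. For a character device, the iovec array passed to writev(2) is what the driver's write(9E) routine sees through uio_iov. The function name gather_scatter_demo() is hypothetical.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather-write two separate buffers into a pipe with writev(2), then
 * scatter-read the bytes back into the caller's two buffers with readv(2).
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
gather_scatter_demo(char *first, size_t first_len,
    char *second, size_t second_len)
{
    int fds[2];
    char part1[] = "hello, ";
    char part2[] = "world";
    struct iovec out[2] = {
        { .iov_base = part1, .iov_len = strlen(part1) },
        { .iov_base = part2, .iov_len = strlen(part2) },
    };
    struct iovec in[2] = {
        { .iov_base = first, .iov_len = first_len },
        { .iov_base = second, .iov_len = second_len },
    };
    ssize_t n;

    if (pipe(fds) == -1)
        return (-1);
    /* One system call, two noncontiguous source buffers. */
    if (writev(fds[1], out, 2) == -1) {
        close(fds[0]);
        close(fds[1]);
        return (-1);
    }
    /* The first buffer is filled completely before the second is used. */
    n = readv(fds[0], in, 2);
    close(fds[0]);
    close(fds[1]);
    return (n);
}
```

With a 4-byte first buffer, the 12 bytes written come back split as "hell" and "o, world".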

Each buffer is described by an iovec(9S) structure. This structure contains a pointer to the data area and the number of bytes to be transferred.

caddr_t    iov_base;    /* address of buffer */
int        iov_len;     /* amount to transfer */

The uio structure contains a pointer to an array of iovec(9S) structures. The base address of this array is held in uio_iov, and the number of elements is stored in uio_iovcnt.

The uio_offset field contains the 32-bit offset into the device at which the application needs to begin the transfer. uio_loffset is used for 64-bit file offsets. If the device does not support the notion of an offset, these fields can be safely ignored. The driver should interpret either uio_offset or uio_loffset, but not both. If the driver has set the D_64BIT flag in the cb_ops(9S) structure, that driver should use uio_loffset.

The uio_resid field starts out as the number of bytes to be transferred, that is, the sum of all the iov_len fields in uio_iov. Before returning, the driver must set this field to the number of bytes that were not transferred. The read(2) and write(2) system calls use the return value from the read(9E) and write(9E) entry points to detect failed transfers. If a failure occurs, the system calls return -1. If the return value indicates success, the system calls return the number of bytes requested minus uio_resid. If the driver does not change uio_resid, read(2) and write(2) return 0, which the caller interprets as end-of-file even if all the data was in fact transferred.
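The bookkeeping that uiomove(9F) performs on a uio(9S) structure can be modeled in user space as follows. This is a simplified sketch, not the kernel's implementation: struct sim_iovec, struct sim_uio, and sim_uiomove_read() are hypothetical stand-ins that keep only the fields discussed above. The value that read(2) reports corresponds to the requested count minus the remaining uio_resid.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for iovec(9S) and uio(9S). */
struct sim_iovec {
    char   *iov_base;   /* address of buffer */
    size_t  iov_len;    /* amount still to transfer into this buffer */
};

struct sim_uio {
    struct sim_iovec *uio_iov;     /* current iovec */
    int               uio_iovcnt;  /* iovecs remaining */
    long long         uio_loffset; /* models the 64-bit offset_t */
    size_t            uio_resid;   /* bytes not yet transferred */
};

/*
 * Copy up to n bytes from a driver buffer into the iovecs, as a read
 * would: advance each iovec and the offset, and decrement uio_resid
 * by exactly the number of bytes moved.
 */
static void
sim_uiomove_read(const char *src, size_t n, struct sim_uio *uio)
{
    while (n > 0 && uio->uio_iovcnt > 0) {
        struct sim_iovec *iov = uio->uio_iov;
        size_t cnt = iov->iov_len < n ? iov->iov_len : n;

        if (cnt == 0) {            /* this iovec is full; use the next */
            uio->uio_iov++;
            uio->uio_iovcnt--;
            continue;
        }
        memcpy(iov->iov_base, src, cnt);
        iov->iov_base += cnt;
        iov->iov_len -= cnt;
        uio->uio_loffset += cnt;
        uio->uio_resid -= cnt;
        src += cnt;
        n -= cnt;
    }
}
```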

The support routines uiomove(9F), physio(9F), and aphysio(9F) update the uio(9S) structure directly. These support routines update the device offset to account for the data transfer. Neither the uio_offset field nor the uio_loffset field needs to be adjusted when the driver is used with a seekable device that uses the concept of position. I/O performed to a device in this manner is constrained by the maximum possible value of uio_offset or uio_loffset. An example of such a usage is raw I/O on a disk.

If the device has no concept of position, the driver can take the following steps:

  1. Save uio_offset or uio_loffset.

  2. Perform the I/O operation.

  3. Restore uio_offset or uio_loffset to the field's initial value.

I/O that is performed to a device in this manner is not constrained by the maximum possible value of uio_offset or uio_loffset. An example of this type of usage is I/O on a serial line.

The following example shows one way to preserve uio_loffset in the read(9E) function.

static int
xxread(dev_t dev, struct uio *uio_p, cred_t *cred_p)
{
    offset_t off;
    int error = 0;
    /* ... */
    off = uio_p->uio_loffset;  /* save the offset */
    /* do the transfer, setting error on failure */
    uio_p->uio_loffset = off;  /* restore it */
    return (error);
}

Differences Between Synchronous and Asynchronous I/O

Data transfers can be synchronous or asynchronous. The determining factor is whether the entry point that schedules the transfer returns immediately or waits until the I/O has been completed.

The read(9E) and write(9E) entry points are synchronous entry points. These entry points must not return until the I/O is complete. Upon return from the routines, the process knows whether the transfer has succeeded.

The aread(9E) and awrite(9E) entry points are asynchronous entry points. Asynchronous entry points schedule the I/O and return immediately. Upon return, the process that issues the request knows that the I/O is scheduled and that the status of the I/O must be determined later. In the meantime, the process can perform other operations.

With an asynchronous I/O request to the kernel, the process is not required to wait while the I/O is in process. A process can perform multiple I/O requests and allow the kernel to handle the data transfer details. Asynchronous I/O requests enable applications such as transaction processing to use concurrent programming methods to increase performance or response time. Any performance boost for applications that use asynchronous I/O, however, comes at the expense of greater programming complexity.

Data Transfer Methods

Data can be transferred using either programmed I/O or DMA. These data transfer methods can be used either by synchronous or by asynchronous entry points, depending on the capabilities of the device.

Programmed I/O Transfers

Programmed I/O devices rely on the CPU to perform the data transfer. Programmed I/O data transfers are identical to other read and write operations for device registers. Various data access routines are used to read or store values to device memory.

uiomove(9F) can be used to transfer data to some programmed I/O devices. uiomove(9F) transfers data between the user space, as defined by the uio(9S) structure, and the kernel. uiomove() can handle page faults, so the memory to which data is transferred need not be locked down. uiomove() also updates the uio_resid field in the uio(9S) structure. The following example shows one way to write a ramdisk read(9E) routine. It uses synchronous I/O and relies on the presence of the following fields in the ramdisk state structure:

caddr_t    ram;        /* base address of ramdisk */
int        ramsize;    /* size of the ramdisk */

Example 15–3 Ramdisk read(9E) Routine Using uiomove(9F)

static int
rd_read(dev_t dev, struct uio *uiop, cred_t *credp)
{
     rd_devstate_t     *rsp;

     rsp = ddi_get_soft_state(rd_statep, getminor(dev));
     if (rsp == NULL)
       return (ENXIO);
     if (uiop->uio_offset >= rsp->ramsize)
       return (EINVAL);
     /*
      * uiomove takes the offset into the kernel buffer,
      * the data transfer count (minimum of the requested and
      * the remaining data), the UIO_READ flag, and a pointer
      * to the uio structure.
      */
     return (uiomove(rsp->ram + uiop->uio_offset,
         min(uiop->uio_resid, rsp->ramsize - uiop->uio_offset),
         UIO_READ, uiop));
}
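The bounds logic of rd_read() can be exercised in user space with the following model. struct sim_rd and sim_rd_read() are hypothetical; a single destination buffer with explicit resid and offset replaces the uio(9S) structure, and memcpy() stands in for uiomove(9F).

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical user-space model of the ramdisk state used above. */
struct sim_rd {
    char   *ram;      /* base address of ramdisk */
    size_t  ramsize;  /* size of the ramdisk */
};

/*
 * Model of rd_read(): copy min(requested, remaining) bytes into dst,
 * then advance *offset and decrement *resid, as uiomove(9F) would.
 */
static int
sim_rd_read(const struct sim_rd *rsp, char *dst, size_t *resid,
    off_t *offset)
{
    size_t cnt;

    if (*offset < 0 || (size_t)*offset >= rsp->ramsize)
        return (EINVAL);
    cnt = rsp->ramsize - (size_t)*offset;   /* bytes left on the ramdisk */
    if (*resid < cnt)
        cnt = *resid;                       /* caller asked for fewer */
    memcpy(dst, rsp->ram + *offset, cnt);
    *offset += cnt;
    *resid -= cnt;
    return (0);
}
```

As in Example 15–3, a read that starts at or beyond the end of the ramdisk fails with EINVAL. A driver that prefers end-of-file semantics could instead return 0 without transferring anything, which makes read(2) return 0 as described in Vectored I/O.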

Another example of programmed I/O would be a driver that writes data one byte at a time directly to the device's memory. Each byte is retrieved from the uio(9S) structure by using uwritec(9F). The byte is then sent to the device. read(9E) can use ureadc(9F) to transfer a byte from the device to the area described by the uio(9S) structure.


Example 15–4 Programmed I/O write(9E) Routine Using uwritec(9F)

static int
xxwrite(dev_t dev, struct uio *uiop, cred_t *credp)
{
    int    value;
    struct xxstate     *xsp;

    xsp = ddi_get_soft_state(statep, getminor(dev));
    if (xsp == NULL)
        return (ENXIO);
    /* if the device implements a power manageable component, do this: */
    pm_busy_component(xsp->dip, 0);
    if (xsp->pm_suspended) {
        /* XX_NORMAL_POWER is a driver-defined power level */
        (void) pm_raise_power(xsp->dip, 0, XX_NORMAL_POWER);
    }

    while (uiop->uio_resid > 0) {
        /*
         * do the programmed I/O access
         */
        value = uwritec(uiop);
        if (value == -1) {
            pm_idle_component(xsp->dip, 0);  /* balance pm_busy */
            return (EFAULT);
        }
        ddi_put8(xsp->data_access_handle, &xsp->regp->data,
            (uint8_t)value);
        ddi_put8(xsp->data_access_handle, &xsp->regp->csr,
            START_TRANSFER);
        /*
         * this device requires a ten microsecond delay
         * between writes
         */
        drv_usecwait(10);
    }
    pm_idle_component(xsp->dip, 0);
    return (0);
}

DMA Transfers (Synchronous)

Character drivers generally use physio(9F) to do the setup work for DMA transfers in read(9E) and write(9E), as is shown in Example 15–5.

int physio(int (*strat)(struct buf *), struct buf *bp,
     dev_t dev, int rw, void (*mincnt)(struct buf *),
     struct uio *uio);

physio(9F) requires the driver to provide the address of a strategy(9E) routine. physio(9F) ensures that memory space is locked down, that is, memory cannot be paged out, for the duration of the data transfer. This lock-down is necessary for DMA transfers because DMA transfers cannot handle page faults. physio(9F) also provides an automated way of breaking a larger transfer into a series of smaller, more manageable ones. See minphys() Entry Point for more information.


Example 15–5 read(9E) and write(9E) Routines Using physio(9F)

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
     struct xxstate *xsp;
     int ret;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL)
        return (ENXIO);
     ret = physio(xxstrategy, NULL, dev, B_READ, xxminphys, uiop);
     return (ret);
}    

static int
xxwrite(dev_t dev, struct uio *uiop, cred_t *credp)
{     
     struct xxstate *xsp;
     int ret;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL)
        return (ENXIO);
     ret = physio(xxstrategy, NULL, dev, B_WRITE, xxminphys, uiop);
     return (ret);
}

In the call to physio(9F), xxstrategy is a pointer to the driver strategy() routine. Passing NULL as the buf(9S) structure pointer tells physio(9F) to allocate a buf(9S) structure. If the driver must provide physio(9F) with a buf(9S) structure, getrbuf(9F) should be used to allocate the structure. physio(9F) returns zero if the transfer completes successfully, or an error number on failure. After calling strategy(9E), physio(9F) calls biowait(9F) to block until the transfer either completes or fails. The return value of physio(9F) is determined by the error field in the buf(9S) structure set by bioerror(9F).

DMA Transfers (Asynchronous)

Character drivers that support aread(9E) and awrite(9E) use aphysio(9F) instead of physio(9F).

int aphysio(int (*strat)(struct buf *), int (*cancel)(struct buf *),
     dev_t dev, int rw, void (*mincnt)(struct buf *),
     struct aio_req *aio_reqp);

Note –

The address of anocancel(9F) is the only value that can currently be passed as the second argument to aphysio(9F).


aphysio(9F) requires the driver to pass the address of a strategy(9E) routine. aphysio(9F) ensures that memory space is locked down, that is, cannot be paged out, for the duration of the data transfer. This lock-down is necessary for DMA transfers because DMA transfers cannot handle page faults. aphysio(9F) also provides an automated way of breaking a larger transfer into a series of smaller, more manageable ones. See minphys() Entry Point for more information.

Example 15–5 and Example 15–6 demonstrate that the aread(9E) and awrite(9E) entry points differ only slightly from the read(9E) and write(9E) entry points. The difference is primarily in their use of aphysio(9F) instead of physio(9F).


Example 15–6 aread(9E) and awrite(9E) Routines Using aphysio(9F)

static int
xxaread(dev_t dev, struct aio_req *aiop, cred_t *cred_p)
{
     struct xxstate *xsp;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL)
         return (ENXIO);
     return (aphysio(xxstrategy, anocancel, dev, B_READ,
         xxminphys, aiop));
}

static int
xxawrite(dev_t dev, struct aio_req *aiop, cred_t *cred_p)
{
     struct xxstate *xsp;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL)
         return (ENXIO);
     return (aphysio(xxstrategy, anocancel, dev, B_WRITE,
         xxminphys, aiop));
}

In the call to aphysio(9F), xxstrategy() is a pointer to the driver strategy routine. aiop is a pointer to the aio_req(9S) structure. aiop is passed to aread(9E) and awrite(9E). aio_req(9S) describes where the data is to be stored in user space. aphysio(9F) returns zero if the I/O request is scheduled successfully or an error number on failure. After calling strategy(9E), aphysio(9F) returns without waiting for the I/O to complete or fail.

minphys() Entry Point

The minphys() entry point is a function whose address is passed to physio(9F) or aphysio(9F), which then call it. The purpose of xxminphys() is to ensure that the size of the requested transfer does not exceed a driver-imposed limit. If the user requests a larger transfer, strategy(9E) is called repeatedly, requesting no more than the imposed limit at a time. This approach is important because DMA resources are limited. Drivers for slow devices, such as printers, should be careful not to tie up resources for a long time.

Usually, a driver passes the address of the kernel function minphys(9F), but the driver can define its own xxminphys() routine instead. The job of xxminphys() is to keep the b_bcount field of the buf(9S) structure under a driver's limit. The driver should adhere to other system limits as well. For example, the driver's xxminphys() routine should call the system minphys(9F) routine after setting the b_bcount field and before returning.


Example 15–7 minphys(9F) Routine

#define XXMINVAL (512 << 10)    /* 512 KB */

static void
xxminphys(struct buf *bp)
{
     if (bp->b_bcount > XXMINVAL)
         bp->b_bcount = XXMINVAL;
     minphys(bp);    /* then apply the system limit */
}
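The loop that physio(9F) runs around a minphys routine can be sketched in user space to show how one large request becomes several strategy(9E) calls. This is a simplified model: struct sim_buf, sim_minphys(), sim_xxminphys(), sim_physio_split(), and the 1 MB SIM_MAXPHYS system limit are assumptions, and the real physio(9F) also locks down user memory and waits for each chunk to complete.

```c
#include <stddef.h>

#define XXMINVAL    (512 << 10)    /* driver limit: 512 KB */
#define SIM_MAXPHYS (1024 << 10)   /* hypothetical system-wide limit */

/* Hypothetical stand-in keeping only b_bcount from buf(9S). */
struct sim_buf {
    size_t b_bcount;
};

/* Stands in for the system minphys(9F). */
static void
sim_minphys(struct sim_buf *bp)
{
    if (bp->b_bcount > SIM_MAXPHYS)
        bp->b_bcount = SIM_MAXPHYS;
}

/* Mirrors a driver xxminphys(): apply the driver limit, then the system's. */
static void
sim_xxminphys(struct sim_buf *bp)
{
    if (bp->b_bcount > XXMINVAL)
        bp->b_bcount = XXMINVAL;
    sim_minphys(bp);
}

/*
 * Split a request of total bytes: each pass lets the minphys routine
 * clamp b_bcount, records the chunk where strategy(9E) would be called,
 * and continues with what is left. Returns the number of chunks.
 */
static int
sim_physio_split(size_t total, size_t chunks[], int max_chunks)
{
    int n = 0;

    while (total > 0 && n < max_chunks) {
        struct sim_buf b = { total };

        sim_xxminphys(&b);
        chunks[n++] = b.b_bcount;   /* strategy(9E) would run here */
        total -= b.b_bcount;
    }
    return (n);
}
```

A 1,300,000-byte request therefore becomes two 524,288-byte transfers followed by one of 251,424 bytes.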

strategy() Entry Point

The strategy(9E) routine originated in block drivers. The strategy function got its name from implementing a strategy for efficient queuing of I/O requests to a block device. A driver for a character-oriented device can also use a strategy(9E) routine. In the character I/O model presented here, strategy(9E) does not maintain a queue of requests, but rather services one request at a time.

In the following example, the strategy(9E) routine for a character-oriented DMA device allocates DMA resources for synchronous data transfer. strategy() starts the command by programming the device register. See Chapter 9, Direct Memory Access (DMA) for a detailed description.


Note –

strategy(9E) does not receive a device number (dev_t) as a parameter. Instead, the device number is retrieved from the b_edev field of the buf(9S) structure passed to strategy(9E).



Example 15–8 strategy(9E) Routine

static int
xxstrategy(struct buf *bp)
{
     minor_t            instance;
     struct xxstate     *xsp;
     ddi_dma_cookie_t   cookie;

     instance = getminor(bp->b_edev);
     xsp = ddi_get_soft_state(statep, instance);
     if (xsp == NULL) {
         bioerror(bp, ENXIO);   /* fail the request */
         biodone(bp);
         return (0);
     }
     /* ... */
     /*
      * If the device has power manageable components,
      * mark the device busy with pm_busy_component(9F),
      * and then ensure that the device is
      * powered up by calling pm_raise_power(9F).
      */
     /* Set up DMA resources with ddi_dma_alloc_handle(9F) and
      * ddi_dma_buf_bind_handle(9F).
      */
     xsp->bp = bp; /* remember bp */
     /* Program DMA engine and start command */
     return (0);
}


Note –

Although strategy() is declared to return an int, strategy() must always return zero.


On completion of the DMA transfer, the device generates an interrupt, causing the interrupt routine to be called. In the following example, xxintr() receives a pointer to the state structure for the device that might have generated the interrupt.


Example 15–9 Interrupt Routine

static u_int
xxintr(caddr_t arg)
{
     struct xxstate *xsp = (struct xxstate *)arg;
     uint8_t status;

     /* INTERRUPTING and DEVICE_ERROR are driver-defined csr bits */
     status = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
     if ((status & INTERRUPTING) == 0) {
        return (DDI_INTR_UNCLAIMED);   /* device did not interrupt */
     }
     if (status & DEVICE_ERROR) {
        bioerror(xsp->bp, EIO);        /* record the failure */
     }
     /* Release any resources used in the transfer, such as DMA resources.
      * ddi_dma_unbind_handle(9F) and ddi_dma_free_handle(9F)
      * Notify threads that the transfer is complete.
      */
     biodone(xsp->bp);
     return (DDI_INTR_CLAIMED);
}

The driver indicates an error by calling bioerror(9F). The driver must call biodone(9F) when the transfer is complete or after indicating an error with bioerror(9F).
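The completion handshake between strategy(9E), the interrupt routine, and physio(9F) can be modeled in user space with a mutex and a condition variable. In the following sketch, which is an illustration rather than the kernel's implementation, a thread plays the role of the interrupt routine: it records an error the way bioerror(9F) does and then signals completion the way biodone(9F) does, while sim_biowait() blocks like biowait(9F) and returns the recorded error. All sim_ names are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of the completion fields of buf(9S). */
struct sim_iobuf {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             done;    /* set by sim_biodone() */
    int             error;   /* set by sim_bioerror(); 0 means success */
};

static void
sim_bioerror(struct sim_iobuf *bp, int error)   /* like bioerror(9F) */
{
    pthread_mutex_lock(&bp->lock);
    bp->error = error;
    pthread_mutex_unlock(&bp->lock);
}

static void
sim_biodone(struct sim_iobuf *bp)               /* like biodone(9F) */
{
    pthread_mutex_lock(&bp->lock);
    bp->done = 1;
    pthread_cond_broadcast(&bp->cv);
    pthread_mutex_unlock(&bp->lock);
}

/* Like biowait(9F): block until done, then return the recorded error. */
static int
sim_biowait(struct sim_iobuf *bp)
{
    int error;

    pthread_mutex_lock(&bp->lock);
    while (!bp->done)
        pthread_cond_wait(&bp->cv, &bp->lock);
    error = bp->error;
    pthread_mutex_unlock(&bp->lock);
    return (error);
}

/* Plays the interrupt routine: fail the transfer, then complete it. */
static void *
sim_intr(void *arg)
{
    struct sim_iobuf *bp = arg;

    sim_bioerror(bp, EIO);
    sim_biodone(bp);
    return (NULL);
}
```

The ordering matters in the same way as in a driver: the error must be recorded before completion is signaled, or the waiter could see success.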

Mapping Device Memory

Some devices, such as frame buffers, have memory that is directly accessible to user threads by way of memory mapping. Drivers for these devices typically do not support the read(9E) and write(9E) interfaces. Instead, these drivers support memory mapping with the devmap(9E) entry point. For example, a frame buffer driver might implement the devmap(9E) entry point to enable the frame buffer to be mapped in a user thread.

The devmap(9E) entry point is called to export device memory or kernel memory to user applications. The devmap() function is called from devmap_setup(9F) inside segmap(9E) or on behalf of ddi_devmap_segmap(9F).

The segmap(9E) entry point is responsible for setting up a memory mapping requested by an mmap(2) system call. Drivers for many memory-mapped devices use ddi_devmap_segmap(9F) as the entry point rather than defining their own segmap(9E) routine.

See Chapter 10, Mapping Device and Kernel Memory and Chapter 11, Device Context Management for details.

Multiplexing I/O on File Descriptors

A thread sometimes needs to handle I/O on more than one file descriptor. One example is an application program that needs to read the temperature from a temperature-sensing device and then report the temperature to an interactive display. A program that makes a read request with no data available should not block while waiting for the temperature before interacting with the user again.

The poll(2) system call provides users with a mechanism for multiplexing I/O over a set of file descriptors that reference open files. poll(2) identifies those file descriptors on which a program can send or receive data without blocking, or on which certain events have occurred.
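The application side of poll(2) is shown in the following sketch, which uses a pipe so that it runs without a device. When the descriptor refers to a character device instead, the kernel calls the driver's chpoll(9E) entry point to compute the same revents bits. poll_readable() is a hypothetical helper name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Return the revents that poll(2) reports for fd when asking about
 * POLLIN, using a zero timeout so that the call never blocks.
 */
static short
poll_readable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, 0) <= 0)   /* 0: nothing ready, -1: error */
        return (0);
    return (pfd.revents);
}
```

An empty pipe reports nothing; after one byte is written to it, the read end reports POLLIN.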

To enable a program to poll a character driver, the driver must implement the chpoll(9E) entry point. The system calls chpoll(9E) when a user process issues a poll(2) system call on a file descriptor associated with the device. The chpoll(9E) entry point routine is used by non-STREAMS character device drivers that need to support polling.

The chpoll(9E) function uses the following syntax:

int xxchpoll(dev_t dev, short events, int anyyet, short *reventsp,
     struct pollhead **phpp);

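In the chpoll(9E) entry point, the driver must follow these rules. If any of the requested events are currently satisfied, set the corresponding bits in *reventsp. Otherwise, set *reventsp to 0 and, if anyyet is zero, return the address of the driver's pollhead structure in *phpp. The driver must also call pollwakeup(9F) whenever a polled-for condition occurs.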

Example 15–10 and Example 15–11 show how to implement the polling discipline and how to use pollwakeup(9F).

The following example shows how to handle the POLLIN and POLLERR events. The driver first reads the status register to determine the current state of the device. The parameter events specifies which conditions the driver should check. If an appropriate condition has occurred, the driver sets that bit in *reventsp. If none of the conditions has occurred and if anyyet is not set, the address of the pollhead structure is returned in *phpp.


Example 15–10 chpoll(9E) Routine

static int
xxchpoll(dev_t dev, short events, int anyyet,
    short *reventsp, struct pollhead **phpp)
{
     uint8_t status;
     short revent;
     struct xxstate *xsp;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL)
         return (ENXIO);
     revent = 0;
     /*
      * Valid events are:
      * POLLIN | POLLOUT | POLLPRI | POLLHUP | POLLERR
      * This example checks only for POLLIN and POLLERR.
      * DATA_AVAILABLE and DEVICE_ERROR are driver-defined
      * bits in the status register.
      */
     status = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
     if ((events & POLLIN) && (status & DATA_AVAILABLE)) {
        revent |= POLLIN;
     }
     if (status & DEVICE_ERROR) {
        revent |= POLLERR;
     }
     /* if nothing has occurred */
     if (revent == 0) {
        if (!anyyet) {
           *phpp = &xsp->pollhead;
        }
     }
     *reventsp = revent;
     return (0);
}

The following example shows how to use the pollwakeup(9F) function. The pollwakeup(9F) function usually is called in the interrupt routine when a supported condition has occurred. The interrupt routine reads the status from the status register and checks for the conditions. The routine then calls pollwakeup(9F) for each event to possibly notify polling threads that they should check again. Note that pollwakeup(9F) should not be called with any locks held, since deadlock could result if another routine tried to enter chpoll(9E) and grab the same lock.


Example 15–11 Interrupt Routine Supporting chpoll(9E)

static u_int
xxintr(caddr_t arg)
{
     struct xxstate *xsp = (struct xxstate *)arg;
     uint8_t    status;
     /* normal interrupt processing */
     /* ... */
     status = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
     if (status & DEVICE_ERROR) {
        pollwakeup(&xsp->pollhead, POLLERR);
     }
     if (status & READ_COMPLETE) {  /* driver-defined: a read just completed */
        pollwakeup(&xsp->pollhead, POLLIN);
     }
     /* ... */
     return (DDI_INTR_CLAIMED);
}

Miscellaneous I/O Control

The ioctl(9E) routine is called when a user thread issues an ioctl(2) system call on a file descriptor associated with the device. The I/O control mechanism is a catchall for getting and setting device-specific parameters. This mechanism is frequently used to set a device-specific mode, either by setting internal driver software flags or by writing commands to the device. The control mechanism can also be used to return information to the user about the current device state. In short, the control mechanism can do whatever the application and driver need to have done.

ioctl() Entry Point (Character Drivers)

int xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
     cred_t *credp, int *rvalp);

The cmd parameter indicates which command ioctl(9E) should perform. By convention, the driver with which an I/O control command is associated is indicated in bits 8-15 of the command. Typically, the ASCII code of a character represents the driver. The driver-specific command is in bits 0-7. The creation of some I/O commands is illustrated in the following example:

#define XXIOC    ('x' << 8)     /* 'x' is a character representing */
                                      /* device xx */
#define XX_GET_STATUS    (XXIOC | 1)  /* get status register */
#define XX_SET_CMD       (XXIOC | 2)  /* send command */
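The bit layout can be checked with a small sketch. ioc_group() and ioc_number() are hypothetical helpers, not DDI interfaces; they simply undo the shift and mask used to build the commands above.

```c
/* The driver character goes in bits 8-15, the command number in bits 0-7. */
#define XXIOC          ('x' << 8)
#define XX_GET_STATUS  (XXIOC | 1)   /* get status register */
#define XX_SET_CMD     (XXIOC | 2)   /* send command */

/* Recover the driver character from bits 8-15 of a command. */
static int
ioc_group(int cmd)
{
    return ((cmd >> 8) & 0xff);
}

/* Recover the driver-specific command number from bits 0-7. */
static int
ioc_number(int cmd)
{
    return (cmd & 0xff);
}
```

With these definitions, XX_SET_CMD is 0x7802: 'x' (0x78) in bits 8-15 and command number 2 in bits 0-7.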

The interpretation of arg depends on the command. I/O control commands should be documented in the driver documentation or a man page. The command should also be defined in a public header file, so that applications can determine the name of the command, what the command does, and what the command accepts or returns as arg. Any data transfer of arg into or out of the driver must be performed by the driver.

Certain classes of devices such as frame buffers or disks must support standard sets of I/O control requests. These standard I/O control interfaces are documented in the Solaris 8 Reference Manual Collection. For example, fbio(7I) documents the I/O controls that frame buffers must support, and dkio(7I) documents standard disk I/O controls. See Miscellaneous I/O Control for more information on I/O controls.

Drivers must use ddi_copyin(9F) to transfer arg data from the user-level application to the kernel level. Drivers must use ddi_copyout(9F) to transfer data from the kernel to the user level. Failure to use ddi_copyin(9F) or ddi_copyout(9F) can result in a panic if the architecture separates the kernel and user address spaces, or if the user address has been swapped out.

ioctl(9E) is usually a switch statement with a case for each supported ioctl(9E) request.


Example 15–12 ioctl(9E) Routine

static int
xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *credp, int *rvalp)
{
     uint8_t        csr;
     struct xxstate     *xsp;

     xsp = ddi_get_soft_state(statep, getminor(dev));
     if (xsp == NULL) {
        return (ENXIO);
     }
     switch (cmd) {
     case XX_GET_STATUS:
       csr = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
       if (ddi_copyout(&csr, (void *)arg,
           sizeof (uint8_t), mode) != 0) {
           return (EFAULT);
       }
       break;
     case XX_SET_CMD:
       if (ddi_copyin((void *)arg, &csr,
         sizeof (uint8_t), mode) != 0) {
         return (EFAULT);
       }
       ddi_put8(xsp->data_access_handle, &xsp->regp->csr, csr);
       break;
     default:
       /* generic "ioctl unknown" error */
       return (ENOTTY);
     }
     return (0);
}

The cmd variable identifies a specific device control operation. If arg contains a user virtual address, the driver must not dereference that address directly. Instead, ioctl(9E) must call ddi_copyin(9F) or ddi_copyout(9F) to transfer data between the data structure in the application program pointed to by arg and the driver. In Example 15–12, for the case of an XX_GET_STATUS request, the contents of xsp->regp->csr are copied to the address in arg. ioctl(9E) can store in *rvalp any integer value as the return value to the ioctl(2) system call that makes a successful request. Negative return values, such as -1, should be avoided, because many application programs assume that negative values indicate failure.

The following example demonstrates an application that uses the I/O controls discussed in the previous paragraph.


Example 15–13 Using ioctl(9E)

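#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "xxio.h"     /* contains device's ioctl cmds and args */

int
main(void)
{
     int        fd;
     uint8_t    status;
     /* ... open the device with open(2) and assign the descriptor to fd ... */
     /*
      * read the device status
      */
     if (ioctl(fd, XX_GET_STATUS, &status) == -1) {
         perror("ioctl");    /* error handling */
         exit(1);
     }
     printf("device status %x\n", status);
     exit(0);
}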

I/O Control Support for 64-Bit Capable Device Drivers

The Solaris kernel runs in 64-bit mode on suitable hardware, supporting both 32-bit applications and 64-bit applications. A 64-bit device driver is required to support I/O control commands from programs of both sizes. The difference between a 32-bit program and a 64-bit program is the C language type model. A 32-bit program is ILP32, and a 64-bit program is LP64. See Appendix C, Making a Device Driver 64-Bit Ready for information on C data type models.

If data that flows between programs and the kernel is not identical in format, the driver must be able to handle the model mismatch. Handling a model mismatch requires making appropriate adjustments to the data.

To determine whether a model mismatch exists, the ioctl(9E) mode parameter passes the data model bits to the driver. As Example 15–14 shows, the mode parameter is then passed to ddi_model_convert_from(9F) to determine whether any model conversion is necessary.

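A flag subfield of the mode argument is used to pass the data model to the ioctl(9E) routine. The flag is set to one of the following:

FILP32

Application uses the ILP32 data model.

FLP64

Application uses the LP64 data model.

FNATIVE

Application uses the same data model as the kernel.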

FNATIVE is conditionally defined to match the data model of the kernel implementation. The FMODELS mask should be used to extract the flag from the mode argument. The driver can then examine the data model explicitly to determine how to copy the application data structure.

The DDI function ddi_model_convert_from(9F) is a convenience routine that can assist some drivers with their ioctl() calls. The function takes the data type model of the user application as an argument and returns one of the following values: DDI_MODEL_ILP32 or DDI_MODEL_NONE.

DDI_MODEL_NONE is returned if no data conversion is necessary, as occurs when the application and driver have the same data model. DDI_MODEL_ILP32 is returned to a driver that is compiled to the LP64 model and that communicates with a 32-bit application.
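The decision that ddi_model_convert_from(9F) makes for an LP64 driver can be modeled as a mask-and-compare on the mode bits. In this sketch, all SIM_ constants are placeholder values chosen for illustration, not the values from the Solaris headers, and sim_model_convert_from() is a hypothetical stand-in for the real routine. The sketch assumes a 64-bit kernel, so the native model is LP64.

```c
#include <stdint.h>

/*
 * Placeholder values for illustration only; the real FILP32, FLP64,
 * FMODELS and FNATIVE come from the Solaris kernel headers.
 */
#define SIM_FILP32   0x00100000u   /* application is ILP32 */
#define SIM_FLP64    0x00200000u   /* application is LP64 */
#define SIM_FMODELS  0x0ff00000u   /* mask for the data-model bits */
#define SIM_FNATIVE  SIM_FLP64     /* assumes a 64-bit kernel */

enum sim_model { SIM_DDI_MODEL_NONE, SIM_DDI_MODEL_ILP32 };

/*
 * Model of the answer given to an LP64 driver: conversion is needed
 * only when the caller's data-model bits say ILP32.
 */
static enum sim_model
sim_model_convert_from(uint32_t mode)
{
    if ((mode & SIM_FMODELS) == SIM_FILP32)
        return (SIM_DDI_MODEL_ILP32);
    return (SIM_DDI_MODEL_NONE);
}
```

Other bits in mode, such as open flags, do not affect the result because they are masked off first.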

In the following example, the driver copies a data structure that contains a user address. The data structure changes size from ILP32 to LP64. Accordingly, the 64-bit driver uses a 32-bit version of the structure when communicating with a 32-bit application.


Example 15–14 ioctl(9E) Routine to Support 32-bit Applications and 64-bit Applications

struct args32 {
    uint32_t    addr;    /* 32-bit address in LP64 */
    int     len;
};
struct args {
    caddr_t     addr;    /* 64-bit address in LP64 */
    int     len;
};

static int
xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *credp, int *rvalp)
{
    struct  xxstate  *xsp;
    struct  args     a;
    xsp = ddi_get_soft_state(statep, getminor(dev));
    if (xsp == NULL) {
        return (ENXIO);
    }
    switch (cmd) {
    case XX_COPYIN_DATA:
        switch (ddi_model_convert_from(mode & FMODELS)) {
        case DDI_MODEL_ILP32:
        {
            struct args32 a32;

            /* copy 32-bit args data shape */
            if (ddi_copyin((void *)arg, &a32,
                sizeof (struct args32), mode) != 0) {
                return (EFAULT);
            }
            /* convert 32-bit to 64-bit args data shape */
            a.addr = (caddr_t)(uintptr_t)a32.addr;
            a.len = a32.len;
            break;
        }
        case DDI_MODEL_NONE:
            /* application and driver have same data model. */
            if (ddi_copyin((void *)arg, &a, sizeof (struct args),
                mode) != 0) {
                return (EFAULT);
            }
        }
        /* continue using data shape in native driver data model. */
        break;

    case XX_COPYOUT_DATA:
        /* copyout handling */
        break;
    default:
        /* generic "ioctl unknown" error */
        return (ENOTTY);
    }
    return (0);
}
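The conversion in the DDI_MODEL_ILP32 case can be tried in user space. This sketch uses void * in place of caddr_t and a hypothetical helper, args32_to_native(); the cast through uintptr_t widens the 32-bit address to a pointer without an integer-to-pointer size mismatch.

```c
#include <stdint.h>

/* Shadow structure: fixed-width types give the same layout in any model. */
struct args32 {
    uint32_t addr;   /* user address from a 32-bit application */
    int32_t  len;
};

/* Native structure: addr is 8 bytes when compiled LP64. */
struct args {
    void *addr;
    int   len;
};

/* Widen a 32-bit argument block into the native form. */
static void
args32_to_native(const struct args32 *a32, struct args *a)
{
    a->addr = (void *)(uintptr_t)a32->addr;
    a->len = a32->len;
}
```

On common LP64 platforms, sizeof (struct args) is 16 bytes, while sizeof (struct args32) stays 8 bytes in either model, which is why the shadow structure uses fixed-width types.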

Handling copyout() Overflow

Sometimes a driver needs to copy out a native quantity that no longer fits in the 32-bit sized structure. In this case, the driver should return EOVERFLOW to the caller. EOVERFLOW serves as an indication that the data type in the interface is too small to hold the value to be returned, as shown in the following example.


Example 15–15 Handling copyout(9F) Overflow

int
xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *cr, int *rval_p)
{
     struct resdata res;

     /* body of driver */
     switch (ddi_model_convert_from(mode & FMODELS)) {
     case DDI_MODEL_ILP32: {
          struct resdata32 res32;

          if (res.size > UINT_MAX)
               return (EOVERFLOW);
          res32.size = (size32_t)res.size;
          res32.flag = res.flag;
          if (ddi_copyout(&res32,
              (void *)arg, sizeof (res32), mode))
               return (EFAULT);
          break;
     }
     case DDI_MODEL_NONE:
          if (ddi_copyout(&res, (void *)arg, sizeof (res), mode))
               return (EFAULT);
          break;
     }
     return (0);
}
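The overflow check is the part of Example 15–15 that needs the most care, so the following sketch isolates it. The resdata and resdata32 layouts are assumptions, because the example does not show them, and uint64_t is used for the native size so that the sketch behaves the same on any platform. resdata_to_32() is a hypothetical helper.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical native and 32-bit result structures. */
struct resdata {
    uint64_t size;
    int      flag;
};

struct resdata32 {
    uint32_t size;
    int32_t  flag;
};

/*
 * Narrow a native result for a 32-bit caller. Returns 0 on success, or
 * EOVERFLOW when size does not fit in the 32-bit field.
 */
static int
resdata_to_32(const struct resdata *res, struct resdata32 *res32)
{
    if (res->size > UINT32_MAX)
        return (EOVERFLOW);
    res32->size = (uint32_t)res->size;
    res32->flag = res->flag;
    return (0);
}
```

Checking before the cast matters: the cast alone would silently keep only the low 32 bits of the size.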

32-bit and 64-bit Data Structure Macros

The method in Example 15–15 works well for many drivers. An alternate scheme is to use the data structure macros that are provided in <sys/model.h> to move data between the application and the kernel. These macros make the code less cluttered and behave identically, from a functional perspective.


Example 15–16 Using Data Structure Macros to Move Data

int
xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *cr, int *rval_p)
{
     STRUCT_DECL(opdata, op);

     if (cmd != OPONE)
          return (ENOTTY);

     STRUCT_INIT(op, mode);

     if (ddi_copyin((void *)arg,
         STRUCT_BUF(op), STRUCT_SIZE(op), mode))
          return (EFAULT);

     if (STRUCT_FGET(op, flag) != XXACTIVE ||
         STRUCT_FGET(op, size) > XXSIZE)
          return (EINVAL);
     xxdowork(device_state, STRUCT_FGET(op, size));
     return (0);
}

How Do the Structure Macros Work?

In a 64-bit device driver, structure macros enable the same piece of kernel memory to be used by data structures of both sizes. The memory buffer holds the contents of either the native form of the data structure, that is, the LP64 form, or the ILP32 form. Each structure access is implemented by a conditional expression. When compiled as a 32-bit driver, only one data model, the native form, is supported, so no conditional expression is used.

The 64-bit versions of the macros depend on the definition of a shadow version of the data structure. The shadow version describes the 32-bit interface with fixed-width types. The name of the shadow data structure is formed by appending “32” to the name of the native data structure. For convenience, place the definition of the shadow structure in the same file as the native structure to reduce future maintenance costs.
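The following user-space analogy shows the idea behind the handle-based macros: one buffer large enough for the native form, a model tag, and a conditional expression on every field access. It is not the implementation in <sys/model.h>; OP_SIZE(), OP_FGET(), op_load(), and the opdata structures are hypothetical names that mirror STRUCT_SIZE(), STRUCT_FGET(), and a copy into STRUCT_BUF().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native structure and its "32" shadow. */
struct opdata {
    int     flag;
    size_t  size;
};

struct opdata32 {
    int32_t  flag;
    uint32_t size;
};

enum umodel { UMODEL_ILP32, UMODEL_NATIVE };

/* Handle: storage sized for the (larger) native form plus a model tag. */
struct opdata_handle {
    enum umodel model;
    union {
        struct opdata   native;
        struct opdata32 ilp32;
    } buf;
};

/* Analogous to STRUCT_SIZE(): the answer depends on the embedded model. */
#define OP_SIZE(h) ((h).model == UMODEL_ILP32 ? \
    sizeof ((h).buf.ilp32) : sizeof ((h).buf.native))

/* Analogous to STRUCT_FGET(): every access is a conditional expression. */
#define OP_FGET(h, field) ((h).model == UMODEL_ILP32 ? \
    (int64_t)(h).buf.ilp32.field : (int64_t)(h).buf.native.field)

/* Stand-in for copying caller data into STRUCT_BUF(): OP_SIZE() bytes. */
static void
op_load(struct opdata_handle *h, const void *src)
{
    memcpy(&h->buf, src, OP_SIZE(*h));
}
```

Because the union is at least as large as the native form, the same storage serves either caller, which matches the assumption stated for STRUCT_DECL(9F) that the native form is never smaller than the ILP32 form.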

The macros can take the following arguments:

structname

The structure name of the native form of the data structure as entered after the struct keyword.

umodel

A flag word that contains the user data model, such as FILP32 or FLP64, extracted from the mode parameter of ioctl(9E).

handle

The name used to refer to a particular instance of a structure that is manipulated by these macros.

fieldname

The name of the field within the structure.

When to Use Structure Macros

Macros enable you to make in-place references only to the fields of a data item. Macros do not provide a way to take separate code paths that are based on the data model. Macros should be avoided if the number of fields in the data structure is large. Macros should also be avoided if the frequency of references to these fields is high.

Macros hide many of the differences between data models in the implementation of the macros. As a result, code written with this interface is generally easier to read. When compiled as a 32-bit driver, the resulting code is compact without needing clumsy #ifdefs, but still preserves type checking.

Declaring and Initializing Structure Handles

STRUCT_DECL(9F) and STRUCT_INIT(9F) can be used to declare and initialize a handle and space for decoding an ioctl on the stack. STRUCT_HANDLE(9F) and STRUCT_SET_HANDLE(9F) declare and initialize a handle without allocating space on the stack. The latter macros can be useful if the structure is very large, or is contained in some other data structure.


Note –

Because the STRUCT_DECL(9F) and STRUCT_HANDLE(9F) macros expand to data structure declarations, these macros should be grouped with such declarations in C code.


The macros for declaring and initializing structures are as follows:

STRUCT_DECL(structname, handle)

Declares a structure handle that is called handle for a structname data structure. STRUCT_DECL allocates space for its native form on the stack. The native form is assumed to be larger than or equal to the ILP32 form of the structure.

STRUCT_INIT(handle, umodel)

Initializes the data model for handle to umodel. This macro must be invoked before any access is made to a structure handle declared with STRUCT_DECL(9F).

STRUCT_HANDLE(structname, handle)

Declares a structure handle that is called handle. Contrast with STRUCT_DECL(9F).

STRUCT_SET_HANDLE(handle, umodel, addr)

Initializes the data model for handle to umodel, and sets addr as the buffer used for subsequent manipulation. Invoke this macro before accessing a structure handle declared with STRUCT_HANDLE(9F).

Operations on Structure Handles

The macros for performing operations on structures are as follows:

size_t STRUCT_SIZE(handle)

Returns the size of the structure referred to by handle, according to its embedded data model.

typeof fieldname STRUCT_FGET(handle, fieldname)

Returns the indicated field in the data structure referred to by handle. This field is a non-pointer type.

typeof fieldname STRUCT_FGETP(handle, fieldname)

Returns the indicated field in the data structure referred to by handle. This field is a pointer type.

STRUCT_FSET(handle, fieldname, val)

Sets the indicated field in the data structure referred to by handle to value val. The type of val should match the type of fieldname. The field is a non-pointer type.

STRUCT_FSETP(handle, fieldname, val)

Sets the indicated field in the data structure referred to by handle to value val. The field is a pointer type.

typeof fieldname *STRUCT_FADDR(handle, fieldname)

Returns the address of the indicated field in the data structure referred to by handle.

struct structname *STRUCT_BUF(handle)

Returns a pointer to the native structure described by handle.

Other Operations

Some miscellaneous structure macros follow:

size_t SIZEOF_STRUCT(struct_name, datamodel)

Returns the size of struct_name, which is based on the given data model.

size_t SIZEOF_PTR(datamodel)

Returns the size of a pointer based on the given data model.

Chapter 16 Drivers for Block Devices

This chapter describes the structure of block device drivers. The kernel views a block device as a set of randomly accessible logical blocks. The file system uses a list of buf(9S) structures to buffer the data blocks between a block device and the user space. Only block devices can support a file system.

This chapter provides information on the following subjects:

Block Driver Structure Overview

Figure 16–1 shows data structures and routines that define the structure of a block device driver. Device drivers typically include the following elements:

The shaded device access section in the following figure illustrates entry points for block drivers.

Figure 16–1 Block Driver Roadmap

Diagram shows structures and entry points for block device
drivers.

Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. See Chapter 6, Driver Autoconfiguration for details on driver data structures.

Block device drivers provide these entry points:


Note –

Some of the entry points can be replaced by nodev(9F) or nulldev(9F) as appropriate.


File I/O

A file system is a tree-structured hierarchy of directories and files. Some file systems, such as the UNIX File System (UFS), reside on block-oriented devices. Disks are partitioned with format(1M), and file systems are created with newfs(1M).

When an application issues a read(2) or write(2) system call to an ordinary file on the UFS file system, the file system can call the device driver strategy(9E) entry point for the block device on which the file system resides. The file system code can call strategy(9E) several times for a single read(2) or write(2) system call.

The file system code determines the logical device address, or logical block number, for each ordinary file block. A block I/O request is then built in the form of a buf(9S) structure directed at the block device. The driver strategy(9E) entry point then interprets the buf(9S) structure and completes the request.

Block Device Autoconfiguration

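attach(9E) should perform the common initialization tasks for each instance of a device, such as allocating and initializing per-instance state, mapping the device's registers, adding interrupt handlers, initializing mutexes and condition variables, creating power-manageable components, and creating minor nodes.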

Block device drivers create minor nodes of type S_IFBLK. As a result, a block special file that represents the node appears in the /devices hierarchy.

Logical device names for block devices appear in the /dev/dsk directory, and consist of a controller number, bus-address number, disk number, and slice number. These names are created by the devfsadm(1M) program if the node type is set to DDI_NT_BLOCK or DDI_NT_BLOCK_CHAN. DDI_NT_BLOCK_CHAN should be specified if the device communicates on a channel, that is, a bus with an additional level of addressability. SCSI disks are a good example. DDI_NT_BLOCK_CHAN causes a bus-address field (tN) to appear in the logical name. DDI_NT_BLOCK should be used for most other devices.
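The naming scheme can be illustrated with a small formatter. dsk_name() is a hypothetical helper, not part of devfsadm(1M); the has_target flag models the tN bus-address field that DDI_NT_BLOCK_CHAN adds.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a /dev/dsk-style name: controller, optional bus address (tN),
 * disk, and slice. Returns the snprintf() result (the name length).
 */
static int
dsk_name(char *buf, size_t len, int ctrl, bool has_target, int target,
    int disk, int slice)
{
    if (has_target)     /* DDI_NT_BLOCK_CHAN style */
        return (snprintf(buf, len, "c%dt%dd%ds%d",
            ctrl, target, disk, slice));
    /* DDI_NT_BLOCK style: no bus-address field */
    return (snprintf(buf, len, "c%dd%ds%d", ctrl, disk, slice));
}
```

For example, controller 0, target 3, disk 0, slice 2 yields c0t3d0s2, while controller 1, disk 0, slice 0 without a bus address yields c1d0s0.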

A minor device refers to a partition on the disk. For each minor device, the driver must create an nblocks or Nblocks property. This integer property gives the number of blocks supported by the minor device expressed in units of DEV_BSIZE, that is, 512 bytes. The file system uses the nblocks and Nblocks properties to determine device limits. Nblocks is the 64-bit version of nblocks. Nblocks should be used with storage devices that can hold over 1 Tbyte of storage per disk. See Device Properties for more information.
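The arithmetic behind the 1-Tbyte limit can be checked directly. This sketch assumes that 1 Tbyte means 2^40 bytes and that nblocks is a signed 32-bit int property; bytes_to_blocks() and fits_nblocks() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512   /* bytes per block, as stated in the text */
#endif

/* Number of whole DEV_BSIZE blocks in a partition of the given size. */
static uint64_t
bytes_to_blocks(uint64_t bytes)
{
    return (bytes / DEV_BSIZE);
}

/* True if the block count still fits a signed 32-bit "nblocks" value. */
static bool
fits_nblocks(uint64_t blocks)
{
    return (blocks <= INT32_MAX);
}
```

A 2^40-byte partition holds 2^31 blocks of 512 bytes, one more than INT32_MAX, so its size can be expressed only through the 64-bit Nblocks property.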

Example 16–1 shows a typical attach(9E) entry point with emphasis on creating the device's minor node and the Nblocks property. Note that because this example uses Nblocks and not nblocks, ddi_prop_update_int64(9F) is called instead of ddi_prop_update_int(9F).

As a side note, this example shows the use of makedevice(9F) to create a device number for ddi_prop_update_int64(). The example obtains the driver's major number with ddi_driver_major(9F), which takes a pointer to a dev_info_t structure, and passes that major number and the instance number to makedevice(). Using ddi_driver_major() is similar to using getmajor(9F), which extracts the major number from a dev_t device number.


Example 16–1 Block Driver attach() Routine

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
     int instance = ddi_get_instance(dip);
     struct xxstate *xsp;
     major_t maj_number;
     uint64_t size;    /* size of the device in 512-byte blocks */

     switch (cmd) {
     case DDI_ATTACH:
          /*
           * allocate a state structure and initialize it
           * map the devices registers
           * add the device driver's interrupt handler(s)
           * initialize any mutexes and condition variables
           * read label information if the device is a disk
           *   (this is where size is determined)
           * create power manageable components
           *
           * Create the device minor node. Note that the node_type
           * argument is set to DDI_NT_BLOCK.
           */
          if (ddi_create_minor_node(dip, "minor_name", S_IFBLK,
              instance, DDI_NT_BLOCK, 0) == DDI_FAILURE) {
               /* free resources allocated so far */
               /* Remove any previously allocated minor nodes */
               ddi_remove_minor_node(dip, NULL);
               return (DDI_FAILURE);
          }
          /*
           * Create driver properties like "Nblocks". If the device
           * is a disk, the Nblocks property is usually calculated from
           * information in the disk label.  Use "Nblocks" instead of
           * "nblocks" to ensure the property works for large disks.
           */
          xsp = ddi_get_soft_state(statep, instance);
          xsp->Nblocks = size;
          maj_number = ddi_driver_major(dip);
          if (ddi_prop_update_int64(makedevice(maj_number, instance),
              dip, "Nblocks", xsp->Nblocks) != DDI_PROP_SUCCESS) {
               cmn_err(CE_CONT, "%s: cannot create Nblocks property\n",
                   ddi_get_name(dip));
               /* free resources allocated so far */
               return (DDI_FAILURE);
          }
          xsp->open = 0;
          xsp->nlayered = 0;
          /* ... */
          return (DDI_SUCCESS);

     case DDI_RESUME:
          /* For information, see Chapter 12, "Power Management," in this book. */
     default:
          return (DDI_FAILURE);
     }
}

Controlling Device Access

This section describes the entry points for open() and close() functions in block device drivers. See Chapter 15, Drivers for Character Devices for more information on open(9E) and close(9E).

open() Entry Point (Block Drivers)

The open(9E) entry point is used to gain access to a given device. The open(9E) routine of a block driver is called when a user thread issues an open(2) or mount(2) system call on a block special file associated with the minor device, or when a layered driver calls open(9E). See File I/O for more information.

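The open() entry point should check that the device exists, that the open type (otyp) is valid, and that an exclusive open (FEXCL) is not requested while the device is already open.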

The following example demonstrates a block driver open(9E) entry point.


Example 16–2 Block Driver open(9E) Routine

static int
xxopen(dev_t *devp, int flags, int otyp, cred_t *credp)
{
     minor_t          instance;
     struct xxstate   *xsp;

     instance = getminor(*devp);
     xsp = ddi_get_soft_state(statep, instance);
     if (xsp == NULL)
          return (ENXIO);
     mutex_enter(&xsp->mu);
     /*
      * only honor FEXCL. If a regular open or a layered open
      * is still outstanding on the device, the exclusive open
      * must fail.
      */
     if ((flags & FEXCL) && (xsp->open || xsp->nlayered)) {
          mutex_exit(&xsp->mu);
          return (EAGAIN);
     }
     switch (otyp) {
     case OTYP_LYR:
          xsp->nlayered++;
          break;
     case OTYP_BLK:
          xsp->open = 1;
          break;
     default:
          mutex_exit(&xsp->mu);
          return (EINVAL);
     }
     mutex_exit(&xsp->mu);
     return (0);
}

The otyp argument is used to specify the type of open on the device. OTYP_BLK is the typical open type for a block device. A device can be opened several times with otyp set to OTYP_BLK. close(9E) is called only once when the final close of type OTYP_BLK has occurred for the device. otyp is set to OTYP_LYR if the device is being used as a layered device. For every open of type OTYP_LYR, the layering driver issues a corresponding close of type OTYP_LYR. The example keeps track of each type of open so the driver can determine when the device is not being used in close(9E).

close() Entry Point (Block Drivers)

The close(9E) entry point uses the same arguments as open(9E) with one exception. dev is the device number rather than a pointer to the device number.

The close() routine should verify otyp in the same way as was described for the open(9E) entry point. In the following example, close() must determine when the device can really be closed. Closing is affected by the number of block opens and layered opens.


Example 16–3 Block Device close(9E) Routine

static int
xxclose(dev_t dev, int flag, int otyp, cred_t *credp)
{
     minor_t instance;
     struct xxstate *xsp;

     instance = getminor(dev);
     xsp = ddi_get_soft_state(statep, instance);
     if (xsp == NULL)
          return (ENXIO);
     mutex_enter(&xsp->mu);
     switch (otyp) {
     case OTYP_LYR:
          xsp->nlayered--;
          break;
     case OTYP_BLK:
          xsp->open = 0;
          break;
     default:
          mutex_exit(&xsp->mu);
          return (EINVAL);
     }

     if (xsp->open || xsp->nlayered) {
          /* not done yet */
          mutex_exit(&xsp->mu);
          return (0);
     }
     /* cleanup (rewind tape, free memory, etc.) */
     /* wait for I/O to drain */
     mutex_exit(&xsp->mu);

     return (0);
}
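The bookkeeping shared by Example 16–2 and Example 16–3 can be modeled in user space. In this sketch the mutex is omitted because the model is single-threaded, and SIM_OTYP_BLK, SIM_OTYP_LYR, sim_open(), and sim_close() are hypothetical stand-ins for the driver's open types and entry points.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for OTYP_BLK and OTYP_LYR. */
enum sim_otyp { SIM_OTYP_BLK, SIM_OTYP_LYR };

struct sim_state {
    bool open;       /* set while block (OTYP_BLK) opens are outstanding */
    int  nlayered;   /* one count per outstanding layered open */
};

static int
sim_open(struct sim_state *s, enum sim_otyp otyp, bool excl)
{
    /* An exclusive open fails if any open is still outstanding. */
    if (excl && (s->open || s->nlayered))
        return (EAGAIN);
    if (otyp == SIM_OTYP_LYR)
        s->nlayered++;
    else
        s->open = true;
    return (0);
}

/* Returns true when the last open is gone and cleanup may run. */
static bool
sim_close(struct sim_state *s, enum sim_otyp otyp)
{
    if (otyp == SIM_OTYP_LYR)
        s->nlayered--;
    else
        s->open = false;    /* close(9E) sees only the final block close */
    return (!s->open && s->nlayered == 0);
}
```

sim_close() returns true only after the block open and every layered open have been closed, which is the point at which the driver's cleanup would run.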

strategy() Entry Point

The strategy(9E) entry point is used to read and write data buffers to and from a block device. The name strategy refers to the fact that this entry point might implement some optimal strategy for ordering requests to the device.

strategy(9E) can be written to process one request at a time, that is, a synchronous transfer. strategy() can also be written to queue multiple requests to the device, as in an asynchronous transfer. When choosing a method, the abilities and limitations of the device should be taken into account.

The strategy(9E) routine is passed a pointer to a buf(9S) structure. This structure describes the transfer request, and contains status information on return. buf(9S) and strategy(9E) are the focus of block device operations.

buf Structure

The following buf structure members are important to block drivers:

     int          b_flags;     /* Buffer Status */
     struct buf       *av_forw;    /* Driver work list link */
     struct buf       *av_back;    /* Driver work list link */
     size_t       b_bcount;    /* # of bytes to transfer */
     union {
     caddr_t      b_addr;      /* Buffer's virtual address */
     } b_un;
     daddr_t      b_blkno;     /* Block number on device */
     diskaddr_t       b_lblkno;    /* Expanded block number on device */
     size_t       b_resid;     /* # of bytes not transferred */
                       /* after error */
     int          b_error;     /* Expanded error field */
     void         *b_private;      /* “opaque” driver private area */
     dev_t        b_edev;      /* expanded dev field */

where:

av_forw and av_back

Pointers that the driver can use to manage a list of buffers. See Asynchronous Data Transfers (Block Drivers) for a discussion of the av_forw and av_back pointers.

b_bcount

Specifies the number of bytes to be transferred by the device.

b_un.b_addr

The kernel virtual address of the data buffer. This address is valid only after a call to bp_mapin(9F).

b_blkno

The starting 32-bit logical block number on the device for the data transfer, which is expressed in 512-byte DEV_BSIZE units. The driver should use either b_blkno or b_lblkno but not both.

b_lblkno

The starting 64-bit logical block number on the device for the data transfer, which is expressed in 512-byte DEV_BSIZE units. The driver should use either b_blkno or b_lblkno but not both.

b_resid

Set by the driver to indicate the number of bytes that were not transferred because of an error. See Example 16–7 for an example of setting b_resid. The b_resid member is overloaded. b_resid is also used by disksort(9F).

b_error

Set to an error number by the driver when a transfer error occurs. b_error is set in conjunction with the b_flags B_ERROR bit. See the Intro(9E) man page for details about error values. Drivers should use bioerror(9F) rather than setting b_error directly.

b_flags

Flags with status and transfer attributes of the buf structure. If B_READ is set, the buf structure indicates a transfer from the device to memory. Otherwise, this structure indicates a transfer from memory to the device. If the driver encounters an error during data transfer, the driver should set the B_ERROR bit in the b_flags member. In addition, the driver should provide a more specific error value in b_error. Drivers should use bioerror(9F) rather than setting B_ERROR.


Caution –

Drivers should never clear b_flags.


b_private

For exclusive use by the driver to store driver-private data.

b_edev

Contains the device number of the device that was used in the transfer.
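The following sketch models how a driver might use these members to reject a request that starts outside the device or runs past its last block, rounding b_bcount up to whole DEV_BSIZE blocks. struct sim_buf, SIM_B_ERROR, sim_bioerror(), and sim_validate() are hypothetical user-space stand-ins for buf(9S), B_ERROR, and bioerror(9F); setting b_resid to b_bcount on failure is an assumption about how a driver reports that nothing was transferred.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

#define SIM_B_ERROR 0x1   /* hypothetical stand-in for B_ERROR */

/* The few buf(9S) members this sketch needs. */
struct sim_buf {
    int      b_flags;
    size_t   b_bcount;   /* bytes to transfer */
    int64_t  b_lblkno;   /* starting block, in DEV_BSIZE units */
    size_t   b_resid;    /* bytes not transferred */
    int      b_error;
};

/* Models bioerror(9F): set the error, or clear it when error is 0. */
static void
sim_bioerror(struct sim_buf *bp, int error)
{
    bp->b_error = error;
    if (error != 0)
        bp->b_flags |= SIM_B_ERROR;
    else
        bp->b_flags &= ~SIM_B_ERROR;
}

/* nblocks is the device size in DEV_BSIZE blocks. */
static int
sim_validate(struct sim_buf *bp, int64_t nblocks)
{
    int64_t nblk = (int64_t)((bp->b_bcount + DEV_BSIZE - 1) / DEV_BSIZE);

    if (bp->b_lblkno < 0 || bp->b_lblkno >= nblocks ||
        nblk > nblocks - bp->b_lblkno) {
        sim_bioerror(bp, EINVAL);
        bp->b_resid = bp->b_bcount;   /* nothing was transferred */
        return (EINVAL);
    }
    return (0);
}
```

Comparing nblk against nblocks - b_lblkno, rather than adding the start and length, avoids overflowing the sum for very large block numbers.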

bp_mapin() and bp_mapout()

A buf structure pointer can be passed into the device driver's strategy(9E) routine. However, the data buffer referred to by b_un.b_addr is not necessarily mapped in the kernel's address space. Therefore, the driver cannot directly access the data. Most block-oriented devices have DMA capability and therefore do not need to access the data buffer directly. Instead, these devices use the DMA mapping routines to enable the device's DMA engine to do the data transfer. For details about using DMA, see Chapter 9, Direct Memory Access (DMA).

If a driver needs to access the data buffer directly, that driver must first map the buffer into the kernel's address space by using bp_mapin(9F). bp_mapout(9F) should be used when the driver no longer needs to access the data directly.


Caution –

bp_mapout(9F) should only be called on buffers that have been allocated and are owned by the device driver. bp_mapout() must not be called on buffers that are passed to the driver through the strategy(9E) entry point, for example, by a file system. bp_mapin(9F) does not keep a reference count. bp_mapout(9F) removes any kernel mapping on which a layer over the device driver might rely.


Synchronous Data Transfers (Block Drivers)

This section presents a simple method for performing synchronous I/O transfers. This method assumes that the hardware is a simple disk device that can transfer only one data buffer at a time by using DMA. Another assumption is that the disk can be spun up and spun down by software command. The device driver's strategy(9E) routine waits for the current request to be completed before accepting a new request. The device interrupts when the transfer is complete. The device also interrupts if an error occurs.
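The busy-flag discipline used in the steps below can be modeled in user space with POSIX threads, where pthread_cond_wait() plays the role of cv_wait(9F) and pthread_cond_signal() the role of cv_signal(9F). struct sim_dev, sim_begin_transfer(), and sim_end_transfer() are hypothetical names; in the driver, the release half runs in the interrupt routine.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-device state: a busy flag guarded by a mutex. */
struct sim_dev {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool            busy;
};

/* Like the top of xxstrategy(): wait until idle, then claim the device. */
static void
sim_begin_transfer(struct sim_dev *d)
{
    pthread_mutex_lock(&d->mu);
    while (d->busy)                      /* re-check after every wakeup */
        pthread_cond_wait(&d->cv, &d->mu);
    d->busy = true;
    pthread_mutex_unlock(&d->mu);
}

/* Like the completion interrupt: release the device and wake a waiter. */
static void
sim_end_transfer(struct sim_dev *d)
{
    pthread_mutex_lock(&d->mu);
    d->busy = false;
    pthread_cond_signal(&d->cv);
    pthread_mutex_unlock(&d->mu);
}
```

Signaling a single waiter fits this design because only one request can own the device at a time; the while loop keeps the code correct even if a broadcast or a spurious wakeup releases more than one thread.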

The steps for performing a synchronous data transfer for a block driver are as follows:

  1. Check for invalid buf(9S) requests.

    Check the buf(9S) structure that is passed to strategy(9E) for validity. All drivers should check the following conditions:

    • The request begins at a valid block. The driver converts the b_blkno field to the correct device offset and then determines whether the offset is valid for the device.

    • The request does not go beyond the last block on the device.

    • Device-specific requirements are met.

    If an error is encountered, the driver should indicate the appropriate error with bioerror(9F). The driver should then complete the request by calling biodone(9F). biodone() notifies the caller of strategy(9E) that the transfer is complete. In this case, the transfer has stopped because of an error.

  2. Check whether the device is busy.

    Synchronous data transfers allow single-threaded access to the device. The device driver enforces this access in two ways:

    • The driver maintains a busy flag that is guarded by a mutex.

    • The driver waits on a condition variable with cv_wait(9F), when the device is busy.

    If the device is busy, the thread waits until the interrupt handler indicates that the device is no longer busy. The available status can be indicated by either the cv_broadcast(9F) or the cv_signal(9F) function. See Chapter 3, Multithreading for details on condition variables.

    When the device is no longer busy, the strategy(9E) routine marks the device as busy, which claims it for the current request. strategy() then prepares the buffer and the device for the transfer.

  3. Set up the buffer for DMA.

    Prepare the data buffer for a DMA transfer by using ddi_dma_alloc_handle(9F) to allocate a DMA handle. Use ddi_dma_buf_bind_handle(9F) to bind the data buffer to the handle. For information on setting up DMA resources and related data structures, see Chapter 9, Direct Memory Access (DMA).

  4. Begin the transfer.

    At this point, a pointer to the buf(9S) structure is saved in the state structure of the device. The interrupt routine can then complete the transfer by calling biodone(9F).

    The device driver then accesses device registers to initiate a data transfer. In most cases, the driver should protect the device registers from other threads by using mutexes. In this case, because strategy(9E) is single-threaded, guarding the device registers is not necessary. See Chapter 3, Multithreading for details about data locks.

    When the executing thread has started the device's DMA engine, the driver can return execution control to the calling routine, as follows:

    static int
    xxstrategy(struct buf *bp)
    {
        struct xxstate *xsp;
        struct device_reg *regp;
        minor_t instance;
        ddi_dma_cookie_t cookie;
        instance = getminor(bp->b_edev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL) {
           bioerror(bp, ENXIO);
           biodone(bp);
           return (0);
        }
        /* validate the transfer request */
        if ((bp->b_blkno >= xsp->Nblocks) || (bp->b_blkno < 0)) {
           bioerror(bp, EINVAL);    
           biodone(bp);
           return (0);
        }
        /*
         * Hold off all threads until the device is not busy.
         */
        mutex_enter(&xsp->mu);
        while (xsp->busy) {
           cv_wait(&xsp->cv, &xsp->mu);
        }
        xsp->busy = 1;
        mutex_exit(&xsp->mu);
        /* 
         * If the device has power manageable components, 
         * mark the device busy with pm_busy_components(9F),
         * and then ensure that the device 
         * is powered up by calling pm_raise_power(9F).
         *
         * Set up DMA resources with ddi_dma_alloc_handle(9F) and
         * ddi_dma_buf_bind_handle(9F). The bind call fills in
         * cookie, which is used below to program the device.
         */
        xsp->bp = bp;
        regp = xsp->regp;
        ddi_put32(xsp->data_access_handle, &regp->dma_addr,
            cookie.dmac_address);
        ddi_put32(xsp->data_access_handle, &regp->dma_size,
             (uint32_t)cookie.dmac_size);
        ddi_put8(xsp->data_access_handle, &regp->csr,
             ENABLE_INTERRUPTS | START_TRANSFER);
        return (0);
    }

  5. Handle the interrupting device.

    When the device finishes the data transfer, the device generates an interrupt, which eventually results in the driver's interrupt routine being called. Most drivers specify the state structure of the device as the argument to the interrupt routine when registering interrupts. See the ddi_add_intr(9F) man page and Registering Interrupts. The interrupt routine can then access the buf(9S) structure being transferred, plus any other information that is available from the state structure.

    The interrupt handler should check the device's status register to determine whether the transfer completed without error. If an error occurred, the handler should indicate the appropriate error with bioerror(9F). The handler should also clear the pending interrupt for the device and then complete the transfer by calling biodone(9F).

    As the final task, the handler clears the busy flag. The handler then calls cv_signal(9F) or cv_broadcast(9F) on the condition variable, signaling that the device is no longer busy. This notification enables other threads waiting for the device in strategy(9E) to proceed with the next data transfer.

    The following example shows a synchronous interrupt routine.


Example 16–4 Synchronous Interrupt Routine for Block Drivers

static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    struct buf *bp;
    uint8_t status;
    mutex_enter(&xsp->mu);
    status = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
    if (!(status & INTERRUPTING)) {
       mutex_exit(&xsp->mu);
       return (DDI_INTR_UNCLAIMED);
    }
    /* Get the buf responsible for this interrupt */
    bp = xsp->bp;
    xsp->bp = NULL;
    /*
     * This example is for a simple device which either
     * succeeds or fails the data transfer, indicated in the
     * command/status register.
     */
    if (status & DEVICE_ERROR) {
       /* failure */
       bp->b_resid = bp->b_bcount;
       bioerror(bp, EIO);
    } else {
       /* success */
       bp->b_resid = 0;
    }
    ddi_put8(xsp->data_access_handle, &xsp->regp->csr,
       CLEAR_INTERRUPT);
    /* The transfer has finished, successfully or not */
    biodone(bp);
    /*
     * If the device has power manageable components that were
     * marked busy in strategy(9E), mark them idle now with
     * pm_idle_component(9F).
     * Release any resources used in the transfer, such as DMA
     * resources (ddi_dma_unbind_handle(9F) and
     * ddi_dma_free_handle(9F)).
     *
     * Let the next I/O thread have access to the device.
     */
    xsp->busy = 0;
    cv_signal(&xsp->cv);
    mutex_exit(&xsp->mu);
    return (DDI_INTR_CLAIMED);
}
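The busy-flag protocol from steps 2 through 5 can be tried outside the kernel. The following sketch is a user-space model, not driver code: POSIX pthread_mutex_t and pthread_cond_t stand in for the kernel mutex and condition variable, and every name (sim_dev, sim_strategy(), sim_intr(), run_sync_sim()) is hypothetical. As a simplification, the "interrupt" is an ordinary function call made by the thread that started the transfer, which real hardware would not do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sim_dev {
    pthread_mutex_t mu;      /* guards every field below */
    pthread_cond_t  cv;      /* signaled when busy drops to 0 */
    int busy;                /* 1 while a transfer owns the device */
    int in_device;           /* transfers currently "on the hardware" */
    int max_in_device;       /* high-water mark; should stay at 1 */
    int completed;           /* number of finished transfers */
};

/* Analog of the top of xxstrategy(): wait for the device, then claim it. */
static void sim_strategy(struct sim_dev *d)
{
    pthread_mutex_lock(&d->mu);
    while (d->busy)                  /* re-check after every wakeup */
        pthread_cond_wait(&d->cv, &d->mu);
    d->busy = 1;
    d->in_device++;
    if (d->in_device > d->max_in_device)
        d->max_in_device = d->in_device;
    pthread_mutex_unlock(&d->mu);
}

/* Analog of xxintr(): finish the transfer, clear busy, wake one waiter. */
static void sim_intr(struct sim_dev *d)
{
    pthread_mutex_lock(&d->mu);
    assert(d->busy);
    d->in_device--;
    d->completed++;
    d->busy = 0;
    pthread_cond_signal(&d->cv);
    pthread_mutex_unlock(&d->mu);
}

static void *io_thread(void *arg)
{
    struct sim_dev *d = arg;

    for (int i = 0; i < 100; i++) {
        sim_strategy(d);
        sim_intr(d);                 /* the "hardware" finishes at once */
    }
    return NULL;
}

/* Run nthreads (at most 16) I/O threads against one simulated device. */
static int run_sync_sim(int nthreads, struct sim_dev *d)
{
    pthread_t tids[16];

    if (nthreads < 1 || nthreads > 16)
        return -1;
    pthread_mutex_init(&d->mu, NULL);
    pthread_cond_init(&d->cv, NULL);
    d->busy = d->in_device = d->max_in_device = d->completed = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, io_thread, d);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&d->cv);
    pthread_mutex_destroy(&d->mu);
    return 0;
}
```

After run_sync_sim(4, &d) returns, d.completed is 400 and d.max_in_device is 1: no matter how many threads compete, only one transfer holds the device at a time. The while loop around the wait matters, because a woken thread must re-check the flag before claiming the device.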

Asynchronous Data Transfers (Block Drivers)

This section presents a method for performing asynchronous I/O transfers. The driver queues the I/O requests and then returns control to the caller. Again, the assumption is that the hardware is a simple disk device that allows one transfer at a time. The device interrupts when a data transfer has completed. An interrupt also takes place if an error occurs. The basic steps for performing asynchronous data transfers are:

  1. Check for invalid buf(9S) requests.

  2. Enqueue the request.

  3. Start the first transfer.

  4. Handle the interrupting device.

Checking for Invalid buf Requests

As in the synchronous case, the device driver should check the buf(9S) structure passed to strategy(9E) for validity. See Synchronous Data Transfers (Block Drivers) for more details.

Enqueuing the Request

Unlike synchronous data transfers, a driver does not wait for an asynchronous request to complete. Instead, the driver adds the request to a queue. The request that is currently being transferred can remain at the head of the queue, or it can be held in a separate field of the state structure, as in Example 16–5.

If the queue is initially empty, then the hardware is not busy and strategy(9E) starts the transfer before returning. Otherwise, if a transfer completes with a non-empty queue, the interrupt routine begins a new transfer. Example 16–5 places the decision of whether to start a new transfer into a separate routine for convenience.

The driver can use the av_forw and the av_back members of the buf(9S) structure to manage a list of transfer requests. A single pointer can be used to manage a singly linked list, or both pointers can be used together to build a doubly linked list. The characteristics of the device hardware determine which type of list management, such as the insertion policy, gives the best device performance. The transfer list is a per-device list, so the head and tail of the list are stored in the state structure.

In the following example, multiple threads can access the driver's shared data, such as the transfer list. You must identify the shared data and protect it with a mutex. See Chapter 3, Multithreading for more details about mutex locks.


Example 16–5 Enqueuing Data Transfer Requests for Block Drivers

static int
xxstrategy(struct buf *bp)
{
    struct xxstate *xsp;
    minor_t instance;
    instance = getminor(bp->b_edev);
    xsp = ddi_get_soft_state(statep, instance);
    /* ... */
    /* validate transfer request */
    /* ... */
    /*
     * Add the request to the end of the queue. Depending on the
     * device, a sorting algorithm such as disksort(9F) can be used
     * if it improves the performance of the device.
     */
    mutex_enter(&xsp->mu);
    bp->av_forw = NULL;
    if (xsp->list_head) {
       /* Non-empty transfer list */
       xsp->list_tail->av_forw = bp;
       xsp->list_tail = bp;
    } else {
       /* Empty Transfer list */
       xsp->list_head = bp;
       xsp->list_tail = bp;
    }
    mutex_exit(&xsp->mu);
    /* Start the transfer if possible */
    (void) xxstart((caddr_t)xsp);
    return (0);
}
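Example 16–5 links requests through av_forw only. When a driver also needs to unlink an arbitrary request, for instance to cancel it, using av_back as well gives a doubly linked list with constant-time removal. The following is a minimal user-space sketch of that variant. The types sim_buf and sim_queue and the q_* functions are hypothetical stand-ins for buf(9S) and the driver state structure, and the caller is assumed to hold the device mutex, as in Example 16–5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for buf(9S): only the fields used here. */
struct sim_buf {
    long b_blkno;              /* starting block of the request */
    struct sim_buf *av_forw;   /* next request in the queue */
    struct sim_buf *av_back;   /* previous request in the queue */
};

/* Per-device queue head, as kept in the driver's state structure. */
struct sim_queue {
    struct sim_buf *list_head;
    struct sim_buf *list_tail;
};

/* Append bp at the tail. */
static void q_enqueue(struct sim_queue *q, struct sim_buf *bp)
{
    assert(bp != NULL);
    bp->av_forw = NULL;
    bp->av_back = q->list_tail;
    if (q->list_tail != NULL)
        q->list_tail->av_forw = bp;
    else
        q->list_head = bp;          /* queue was empty */
    q->list_tail = bp;
}

/* Remove and return the head, or NULL if the queue is empty. */
static struct sim_buf *q_dequeue(struct sim_queue *q)
{
    struct sim_buf *bp = q->list_head;

    if (bp == NULL)
        return NULL;
    q->list_head = bp->av_forw;
    if (q->list_head != NULL)
        q->list_head->av_back = NULL;
    else
        q->list_tail = NULL;        /* queue is now empty */
    bp->av_forw = bp->av_back = NULL;
    return bp;
}

/* Unlink a request from anywhere in the queue, e.g. to cancel it. */
static void q_remove(struct sim_queue *q, struct sim_buf *bp)
{
    assert(bp != NULL);
    if (bp->av_back != NULL)
        bp->av_back->av_forw = bp->av_forw;
    else
        q->list_head = bp->av_forw;
    if (bp->av_forw != NULL)
        bp->av_forw->av_back = bp->av_back;
    else
        q->list_tail = bp->av_back;
    bp->av_forw = bp->av_back = NULL;
}
```

The cost of the second pointer is one more store per insertion and removal; the benefit is that q_remove() never has to walk the list to find the predecessor.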

Starting the First Transfer

Device drivers that implement queuing usually have a start() routine. start() dequeues the next request and starts the data transfer to or from the device. In this example, start() is called whenever a request might be able to proceed, regardless of whether the device is busy or free. start() checks the device state itself and returns immediately if it cannot begin a transfer.


Note –

start() must be written to be called from any context. start() can be called by both the strategy routine in kernel context and the interrupt routine in interrupt context.


start() is called by strategy(9E) every time strategy() queues a request so that an idle device can be started. If the device is busy, start() returns immediately.

start() is also called by the interrupt handler before the handler returns from a claimed interrupt so that a nonempty queue can be serviced. If the queue is empty, start() returns immediately.

Because start() is a private driver routine, start() can take any arguments and can return any type. The following code sample is written to be used as a DMA callback, although that portion is not shown. Accordingly, the example must take a caddr_t as an argument and return an int. See Handling Resource Allocation Failures for more information about DMA callback routines.


Example 16–6 Starting the First Data Request for a Block Driver

static int
xxstart(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    struct buf *bp;
    ddi_dma_cookie_t cookie;   /* filled in when the buffer is bound */

    mutex_enter(&xsp->mu);
    /*
     * If there is nothing more to do, or the device is
     * busy, return.
     */
    if (xsp->list_head == NULL || xsp->busy) {
       mutex_exit(&xsp->mu);
       return (0);
    }
    xsp->busy = 1;
    /* Get the first buffer off the transfer list */
    bp = xsp->list_head;
    /* Update the head and tail pointer */
    xsp->list_head = xsp->list_head->av_forw;
    if (xsp->list_head == NULL)
       xsp->list_tail = NULL;
    bp->av_forw = NULL;
    mutex_exit(&xsp->mu);
    /*
     * If the device has power manageable components,
     * mark the device busy with pm_busy_components(9F),
     * and then ensure that the device
     * is powered up by calling pm_raise_power(9F).
     *
     * Set up DMA resources with ddi_dma_alloc_handle(9F) and
     * ddi_dma_buf_bind_handle(9F).
     */
    xsp->bp = bp;
    ddi_put32(xsp->data_access_handle, &xsp->regp->dma_addr,
        cookie.dmac_address);
    ddi_put32(xsp->data_access_handle, &xsp->regp->dma_size,
         (uint32_t)cookie.dmac_size);
    ddi_put8(xsp->data_access_handle, &xsp->regp->csr,
         ENABLE_INTERRUPTS | START_TRANSFER);
    return (0);
}

Handling the Interrupting Device

The interrupt routine is similar to the synchronous version, with the addition of the call to start() and the removal of the call to cv_signal(9F).


Example 16–7 Block Driver Routine for Asynchronous Interrupts

static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;
    struct buf *bp;
    uint8_t status;
    mutex_enter(&xsp->mu);
    status = ddi_get8(xsp->data_access_handle, &xsp->regp->csr);
    if (!(status & INTERRUPTING)) {
        mutex_exit(&xsp->mu);
        return (DDI_INTR_UNCLAIMED);
    }
    /* Get the buf responsible for this interrupt */
    bp = xsp->bp;
    xsp->bp = NULL;
    /*
     * This example is for a simple device which either
     * succeeds or fails the data transfer, indicated in the
     * command/status register.
     */
    if (status & DEVICE_ERROR) {
        /* failure */
        bp->b_resid = bp->b_bcount;
        bioerror(bp, EIO);
    } else {
        /* success */
        bp->b_resid = 0;
    }
    ddi_put8(xsp->data_access_handle, &xsp->regp->csr,
        CLEAR_INTERRUPT);
    /* The transfer has finished, successfully or not */
    biodone(bp);
    /*
     * If the device has power manageable components that were
     * marked busy in xxstart(), mark them idle now with
     * pm_idle_component(9F).
     * Release any resources used in the transfer, such as DMA
     * resources (ddi_dma_unbind_handle(9F) and
     * ddi_dma_free_handle(9F)).
     *
     * Let the next I/O thread have access to the device.
     */
    xsp->busy = 0;
    mutex_exit(&xsp->mu);
    (void) xxstart((caddr_t)xsp);
    return (DDI_INTR_CLAIMED);
}
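The interplay among strategy(), start(), and the interrupt routine in Examples 16–5 through 16–7 can be traced in a single-threaded user-space model. In the following sketch, all names are hypothetical, the mutex is omitted because only one thread runs, and the "interrupt" is a direct function call. The point is the control flow: strategy() always calls start(), start() does nothing while the device is busy, and each interrupt clears the busy flag and calls start() again so that the queue drains in order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: an id plus the av_forw link that buf(9S) provides. */
struct sim_req {
    int id;
    struct sim_req *av_forw;
};

/* Hypothetical per-device state, modeled on the xxstate fields above. */
struct sim_state {
    struct sim_req *list_head;   /* first queued request */
    struct sim_req *list_tail;   /* last queued request */
    struct sim_req *rp;          /* request on the "hardware", like xsp->bp */
    int busy;                    /* 1 while a transfer is active */
    int done_order[16];          /* ids in completion order */
    int ndone;
};

/* Analog of xxstart(): begin the next transfer only if the device is idle. */
static int sim_start(struct sim_state *s)
{
    struct sim_req *rp;

    if (s->list_head == NULL || s->busy)
        return 0;                          /* nothing to do, or busy */
    s->busy = 1;
    rp = s->list_head;                     /* dequeue the head */
    s->list_head = rp->av_forw;
    if (s->list_head == NULL)
        s->list_tail = NULL;
    rp->av_forw = NULL;
    s->rp = rp;                            /* "start the DMA engine" */
    return 1;
}

/* Analog of xxstrategy(): append the request, then try to start it. */
static void sim_strategy(struct sim_state *s, struct sim_req *rp)
{
    rp->av_forw = NULL;
    if (s->list_head != NULL) {
        s->list_tail->av_forw = rp;
        s->list_tail = rp;
    } else {
        s->list_head = s->list_tail = rp;
    }
    (void) sim_start(s);
}

/* Analog of xxintr(): complete the active request, then service the queue. */
static void sim_intr(struct sim_state *s)
{
    assert(s->busy && s->rp != NULL);
    assert(s->ndone < 16);
    s->done_order[s->ndone++] = s->rp->id;   /* stands in for biodone() */
    s->rp = NULL;
    s->busy = 0;
    (void) sim_start(s);
}
```

If three requests are queued back to back, only the first reaches the "hardware" immediately; the other two wait in the list, and three simulated interrupts complete them in arrival order.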

dump() and print() Entry Points

This section discusses the dump(9E) and print(9E) entry points.

dump() Entry Point (Block Drivers)

The dump(9E) entry point is used to copy a portion of virtual address space directly to the specified device in the case of a system failure. dump() is also used to copy the state of the kernel out to disk during a checkpoint operation. See the cpr(7) and dump(9E) man pages for more information. The entry point must be capable of performing this operation without the use of interrupts, because interrupts are disabled during the checkpoint operation.

int dump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)

where:

dev

Device number of the device to receive the dump.

addr

Base kernel virtual address at which to start the dump.

blkno

Block at which the dump is to start.

nblk

Number of blocks to dump.

The dump() routine can succeed only if the rest of the driver is working properly.
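Because interrupts are not available, a dump routine has to poll the device for completion instead of waiting for the interrupt handler. The following user-space sketch illustrates how the addr, blkno, and nblk arguments relate and what a polled write loop looks like. It is a model only: the simulated disk, its busy countdown, the 512-byte block size, and the names sim_disk and sim_dump() are all assumptions, and a real dump(9E) takes a dev_t and programs actual device registers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SIM_BSIZE   512     /* assumed block size in bytes */
#define SIM_NBLOCKS 8       /* assumed size of the dump partition */

/* Hypothetical polled "device": a byte array plus a busy countdown. */
struct sim_disk {
    unsigned char media[SIM_NBLOCKS * SIM_BSIZE];
    int busy_polls;         /* polls remaining before the write finishes */
    long polls;             /* total status polls, for inspection */
};

/* Start writing one block; the device then reads as busy for 3 polls. */
static void sim_write_block(struct sim_disk *d, long blk,
    const unsigned char *src)
{
    memcpy(&d->media[blk * SIM_BSIZE], src, SIM_BSIZE);
    d->busy_polls = 3;
}

/* Read the status register: nonzero while the write is in progress. */
static int sim_status_busy(struct sim_disk *d)
{
    d->polls++;
    if (d->busy_polls > 0) {
        d->busy_polls--;
        return 1;
    }
    return 0;
}

/*
 * Copy nblk blocks starting at addr to the device starting at block
 * blkno, polling for completion of each block.
 */
static int sim_dump(struct sim_disk *d, const char *addr, long blkno, int nblk)
{
    if (blkno < 0 || nblk < 0 || blkno + nblk > SIM_NBLOCKS)
        return EINVAL;
    for (int i = 0; i < nblk; i++) {
        sim_write_block(d, blkno + i,
            (const unsigned char *)addr + (long)i * SIM_BSIZE);
        while (sim_status_busy(d))
            ;               /* spin: no interrupt will arrive */
    }
    return 0;
}
```

With this model, dumping two blocks at block 6 takes four status polls per block, and a request that would run past the end of the partition is rejected with EINVAL before anything is written.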

print() Entry Point (Block Drivers)

int print(dev_t dev, char *str)

The print(9E) entry point is called by the system to display a message about an exception that has been detected. print(9E) should call cmn_err(9F) to post the message to the console on behalf of the system. The following example demonstrates a typical print() entry point.

static int
xxprint(dev_t dev, char *str)
{
    cmn_err(CE_CONT, "xx: %s\n", str);
    return (0);
}

Disk Device Drivers

Disk devices represent an important class of block device drivers.

Disk ioctls

Solaris disk drivers must support a minimum set of disk-specific ioctl commands. These I/O controls are specified in the dkio(7I) manual page. Disk I/O controls transfer disk information to or from the device driver, and they enable disk utility commands such as format(1M) and newfs(1M) to work with a Solaris disk device. The mandatory Sun disk I/O controls are as follows:

DKIOCINFO

Returns information that describes the disk controller

DKIOCGAPART

Returns a disk's partition map

DKIOCSAPART

Sets a disk's partition map

DKIOCGGEOM

Returns a disk's geometry

DKIOCSGEOM

Sets a disk's geometry

DKIOCGVTOC

Returns a disk's Volume Table of Contents

DKIOCSVTOC

Sets a disk's Volume Table of Contents

Disk Performance

The Solaris DDI/DKI provides facilities to optimize I/O transfers for improved file system performance. A mechanism manages the list of I/O requests so as to optimize disk access for a file system. See Asynchronous Data Transfers (Block Drivers) for a description of enqueuing an I/O request.

The diskhd structure is used to manage a linked list of I/O requests.

struct diskhd {
    long        b_flags;             /* not used, needed for consistency */
    struct buf  *b_forw, *b_back;    /* queue of unit queues */
    struct buf  *av_forw, *av_back;  /* queue of bufs for this unit */
    long        b_bcount;            /* active flag */
};

The diskhd data structure has two buf pointers that the driver can manipulate. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list.

A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the current buf structure being processed. The disksort() routine sorts the buf requests to optimize disk seek. The routine then inserts the buf pointer into the diskhd list. The disksort() routine uses the value in the b_resid field of the buf structure as the sort key. The driver is responsible for setting this value. Most Sun disk drivers use the cylinder group as the sort key. This approach optimizes the file system read-ahead accesses.

When data has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart() routine pulls the first buf structure off the diskhd list and starts a transfer.

If the device is busy, the driver should return from the xxstrategy() entry point. When the hardware is done with the data transfer, an interrupt is generated. The driver's interrupt routine is then called to service the device. After servicing the interrupt, the driver can then call the start() routine to process the next buf structure in the diskhd list.
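The idea of keeping the queue ordered by a driver-supplied sort key can be illustrated with a simplified user-space model. The following sketch inserts each request so that the list stays in ascending key order, which is one sweep of an elevator. It is not the disksort(9F) algorithm: the real routine's ordering policy is more involved and is not reproduced here. The names sim_buf, sim_diskhd, and sim_disksort() are hypothetical, and sortkey plays the role that b_resid plays for disksort().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request; sortkey plays the role of b_resid. */
struct sim_buf {
    long sortkey;                /* e.g. a cylinder group number */
    struct sim_buf *av_forw;
};

/* Hypothetical queue head modeled on the av_forw/av_back pair of diskhd. */
struct sim_diskhd {
    struct sim_buf *av_forw;     /* first queued request */
    struct sim_buf *av_back;     /* last queued request */
};

/*
 * Insert bp so the queue stays in ascending sortkey order. Requests
 * with equal keys keep their arrival order.
 */
static void sim_disksort(struct sim_diskhd *dp, struct sim_buf *bp)
{
    struct sim_buf **linkp = &dp->av_forw;

    assert(dp != NULL && bp != NULL);
    while (*linkp != NULL && (*linkp)->sortkey <= bp->sortkey)
        linkp = &(*linkp)->av_forw;
    bp->av_forw = *linkp;
    *linkp = bp;
    if (bp->av_forw == NULL)
        dp->av_back = bp;        /* bp is the new tail */
}
```

Walking a pointer to the link field, rather than the node itself, lets the same code insert at the head, in the middle, or at the tail without special cases. A start() routine would then take requests from av_forw exactly as in Example 16–6.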

Chapter 17 SCSI Target Drivers

The Solaris DDI/DKI divides the software interface to SCSI devices into two major parts: target drivers and host bus adapter (HBA) drivers. Target refers to a driver for a device on a SCSI bus, such as a disk or a tape drive. Host bus adapter refers to the driver for the SCSI controller on the host machine. SCSA defines the interface between these two components. This chapter discusses target drivers only. See Chapter 18, SCSI Host Bus Adapter Drivers for information on host bus adapter drivers.


Note –

The terms “host bus adapter” and “HBA” are equivalent to “host adapter,” which is defined in SCSI specifications.


This chapter provides information on the following subjects:

Introduction to Target Drivers

Target drivers can be either character or block device drivers, depending on the device. Drivers for tape drives are usually character device drivers, while disks are handled by block device drivers. This chapter describes how to write a SCSI target driver. The chapter discusses the additional requirements that SCSA places on block and character drivers for SCSI target devices.

The following reference documents provide supplemental information needed by the designers of target drivers and host bus adapter drivers.

Small Computer System Interface 2 (SCSI-2), ANSI/NCITS X3.131-1994, Global Engineering Documents, 1998. ISBN 1199002488.

The Basics of SCSI, Fourth Edition, ANCOT Corporation, 1998. ISBN 0963743988.

Refer also to the SCSI command specification for the target device, provided by the hardware vendor.

Sun Common SCSI Architecture Overview

The Sun Common SCSI Architecture (SCSA) is the Solaris DDI/DKI programming interface for the transmission of SCSI commands from a target driver to a host bus adapter driver. This interface is independent of the type of host bus adapter hardware, the platform, the processor architecture, and the SCSI command being transported across the interface.

Conforming to the SCSA enables the target driver to pass SCSI commands to target devices without knowledge of the hardware implementation of the host bus adapter.

The SCSA conceptually separates building the SCSI command from transporting the command with data across the SCSI bus. The architecture defines the software interface between high-level and low-level software components. The higher-level software component consists of one or more SCSI target drivers, which translate I/O requests into SCSI commands appropriate for the peripheral device. The following figure illustrates the SCSI architecture.

Figure 17–1 SCSA Block Diagram

Diagram shows the role of the Sun Common SCSI Architecture
in relation to SCSI drivers in the operating system.

The lower-level software component consists of a SCSA interface layer and one or more host bus adapter drivers. The target driver is responsible for the generation of the proper SCSI commands required to execute the desired function and for processing the results.

General Flow of Control

Assuming no transport errors occur, the following steps describe the general flow of control for a read or write request.

  1. The target driver's read(9E) or write(9E) entry point is invoked. physio(9F) is used to lock down memory, prepare a buf structure, and call the strategy routine.

  2. The target driver's strategy(9E) routine checks the request. strategy() then allocates a scsi_pkt(9S) by using scsi_init_pkt(9F). The target driver initializes the packet and sets the SCSI command descriptor block (CDB) using the scsi_setup_cdb(9F) function. The target driver also specifies a timeout. Then, the driver provides a pointer to a callback function. The callback function is called by the host bus adapter driver on completion of the command. The buf(9S) pointer should be saved in the SCSI packet's target-private space.

  3. The target driver submits the packet to the host bus adapter driver by using scsi_transport(9F). The target driver is then free to accept other requests. The target driver should not access the packet while the packet is in transport. If either the host bus adapter driver or the target supports queueing, new requests can be submitted while the packet is in transport.

  4. As soon as the SCSI bus is free and the target is not busy, the host bus adapter driver selects the target and passes the CDB. The target executes the command and then performs the requested data transfers.

  5. After the target sends completion status and the command completes, the host bus adapter driver notifies the target driver. To perform the notification, the host bus adapter driver calls the completion function that was specified in the SCSI packet. At this time the host bus adapter driver is no longer responsible for the packet, and the target driver has regained ownership of the packet.

  6. The SCSI packet's completion routine analyzes the returned information. The completion routine then determines whether the SCSI operation was successful. If a failure has occurred, the target driver retries the command by calling scsi_transport(9F) again. If the host bus adapter driver does not support auto request sense, the target driver must submit a request sense packet to retrieve the sense data in the event of a check condition.

  7. After successful completion or if the command cannot be retried, the target driver calls scsi_destroy_pkt(9F). scsi_destroy_pkt() synchronizes the data. scsi_destroy_pkt() then frees the packet. If the target driver needs to access the data before freeing the packet, scsi_sync_pkt(9F) is called.

  8. Finally, the target driver notifies the requesting application that the read or write transaction is complete. This notification is made by returning from the read(9E) entry point in the driver for character devices. Otherwise, notification is made indirectly through biodone(9F).

SCSA allows many such operations to execute, both overlapped and queued, at various points in the process. The model places the management of system resources on the host bus adapter driver. The software interface lets target drivers work with host bus adapter drivers for SCSI bus adapters of widely varying sophistication.
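Step 6 of the flow above is a decision: the completion routine either finishes the request, retries it, or fetches sense data first. The following sketch reduces that decision to a pure function so that the branches are easy to see. All names are hypothetical and the inputs are deliberately simplified: sim_reason loosely mirrors pkt_reason (CMD_CMPLT versus a transport error such as a timeout), sim_status stands for the SCSI status byte, and a real driver would also decode the sense data instead of retrying blindly.

```c
#include <assert.h>

/* Hypothetical, simplified inputs and outputs of a completion routine. */
enum sim_reason { SIM_CMD_CMPLT, SIM_CMD_TIMEOUT, SIM_CMD_TRAN_ERR };
enum sim_status { SIM_STATUS_GOOD, SIM_STATUS_CHECK, SIM_STATUS_BUSY };
enum sim_action { SIM_DONE_OK, SIM_RETRY, SIM_REQUEST_SENSE, SIM_DONE_ERROR };

/*
 * Decide what happens next. retries_left counts remaining resubmissions
 * through scsi_transport(9F); auto_rqsense says whether the host bus
 * adapter already collected sense data on a check condition.
 */
static enum sim_action
sim_complete(enum sim_reason reason, enum sim_status status,
    int retries_left, int auto_rqsense)
{
    assert(retries_left >= 0);
    if (reason != SIM_CMD_CMPLT)             /* transport-level failure */
        return (retries_left > 0) ? SIM_RETRY : SIM_DONE_ERROR;
    switch (status) {
    case SIM_STATUS_GOOD:
        return SIM_DONE_OK;
    case SIM_STATUS_CHECK:
        /* Without auto request sense, the driver must fetch sense data. */
        if (!auto_rqsense)
            return SIM_REQUEST_SENSE;
        /* A real driver decodes the sense data here; this model retries. */
        return (retries_left > 0) ? SIM_RETRY : SIM_DONE_ERROR;
    case SIM_STATUS_BUSY:
    default:
        return (retries_left > 0) ? SIM_RETRY : SIM_DONE_ERROR;
    }
}
```

Keeping the retry budget as an explicit argument makes the termination rule visible: once retries run out, the request is completed with an error and, as step 7 describes, the packet is destroyed.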

SCSA Functions

SCSA defines functions to manage the allocation and freeing of resources, the sensing and setting of control states, and the transport of SCSI commands. These functions are listed in the following table.

Table 17–1 Standard SCSA Functions

Function Name                    Category

scsi_abort(9F)                   Error handling
scsi_alloc_consistent_buf(9F)
scsi_destroy_pkt(9F)
scsi_dmafree(9F)
scsi_free_consistent_buf(9F)
scsi_ifgetcap(9F)                Transport information and control
scsi_ifsetcap(9F)
scsi_init_pkt(9F)                Resource management
scsi_poll(9F)                    Polled I/O
scsi_probe(9F)                   Probe functions
scsi_reset(9F)
scsi_setup_cdb(9F)               CDB initialization function
scsi_sync_pkt(9F)
scsi_transport(9F)               Command transport
scsi_unprobe(9F)


Note –

If your driver needs to work with a SCSI-1 device, use makecom(9F).


Hardware Configuration File

Because SCSI devices are not self-identifying, a hardware configuration file is required for a target driver. See the driver.conf(4) and scsi(4) man pages for details. The following is a typical configuration file:

    name="xx" class="scsi" target=2 lun=0;

The system reads the file during autoconfiguration. The system uses the class property to identify the driver's possible parent. Then, the system attempts to attach the driver to any parent driver that is of class scsi. All host bus adapter drivers are of this class. Using the class property rather than the parent property is preferred. This approach enables any host bus adapter driver that finds the expected device at the specified target and lun IDs to attach to the target. The target driver is responsible for verifying the class in its probe(9E) routine.

Declarations and Data Structures

Target drivers must include the header file <sys/scsi/scsi.h>.

SCSI target drivers must use the following command to generate a binary module:

ld -r -o xx xx.o -N"misc/scsi"

scsi_device Structure

The host bus adapter driver allocates and initializes a scsi_device(9S) structure for the target driver before either the probe(9E) or attach(9E) routine is called. This structure stores information about each SCSI logical unit, including pointers to information areas that contain both generic and device-specific information. One scsi_device(9S) structure exists for each logical unit that is attached to the system. The target driver can retrieve a pointer to this structure by calling ddi_get_driver_private(9F).


Caution –

Because the host bus adapter driver uses the private field in the target device's dev_info structure, target drivers must not use ddi_set_driver_private(9F).


The scsi_device(9S) structure contains the following fields:

struct scsi_device {
    struct scsi_address           sd_address;    /* opaque address */
    dev_info_t                    *sd_dev;       /* device node */
    kmutex_t                      sd_mutex;
    void                          *sd_reserved;
    struct scsi_inquiry           *sd_inq;
    struct scsi_extended_sense    *sd_sense;
    caddr_t                       sd_private;
};

where:

sd_address

Data structure that is passed to the routines for SCSI resource allocation.

sd_dev

Pointer to the target's dev_info structure.

sd_mutex

Mutex for use by the target driver. This mutex is initialized by the host bus adapter driver and can be used by the target driver as a per-device mutex. Do not hold this mutex across a call to scsi_transport(9F) or scsi_poll(9F). See Chapter 3, Multithreading for more information on mutexes.

sd_inq

Pointer to the target device's SCSI inquiry data. The scsi_probe(9F) routine allocates a buffer, fills the buffer with inquiry data, and attaches the buffer to this field.

sd_sense

Pointer to a buffer to contain SCSI request sense data from the device. The target driver must allocate and manage this buffer. See attach() Entry Point (SCSI Target Drivers).

sd_private

Pointer field for use by the target driver. This field is commonly used to store a pointer to a private target driver state structure.

scsi_pkt Structure (Target Drivers)

The scsi_pkt structure contains the following fields:

struct scsi_pkt {
    opaque_t  pkt_ha_private;         /* private data for host adapter */
    struct scsi_address pkt_address;  /* destination packet is for */
    opaque_t  pkt_private;            /* private data for target driver */
    void     (*pkt_comp)(struct scsi_pkt *);  /* completion routine */
    uint_t   pkt_flags;               /* flags */
    int      pkt_time;                /* time allotted to complete command */
    uchar_t  *pkt_scbp;               /* pointer to status block */
    uchar_t  *pkt_cdbp;               /* pointer to command block */
    ssize_t  pkt_resid;               /* data bytes not transferred */
    uint_t   pkt_state;               /* state of command */
    uint_t   pkt_statistics;          /* statistics */
    uchar_t  pkt_reason;              /* reason completion called */
};

where:

pkt_address

Target device's address set by scsi_init_pkt(9F).

pkt_private

Place to store private data for the target driver. pkt_private is commonly used to save the buf(9S) pointer for the command.

pkt_comp

Address of the completion routine. The host bus adapter driver calls this routine when it has finished transporting the command. Transporting the command does not mean that the command succeeded. The target might have been busy, or the target might not have responded before the timeout period elapsed. See the description of the pkt_time field. The target driver must set this field, either to a valid completion routine or to NULL if the driver does not want to be notified.


Note –

Two different SCSI callback routines are provided. The pkt_comp field identifies a completion callback routine, which is called when the host bus adapter completes its processing. A resource callback routine is also available, which is called when currently unavailable resources are likely to be available. See the scsi_init_pkt(9F) man page.


pkt_flags

Provides additional control information, for example, to transport the command without disconnect privileges (FLAG_NODISCON) or to disable callbacks (FLAG_NOINTR). See the scsi_pkt(9S) man page for details.

pkt_time

Timeout value in seconds. If the command is not completed within this time, the host bus adapter driver calls the completion routine with pkt_reason set to CMD_TIMEOUT. The target driver should set this field to a value longer than the maximum time the command might take. If the timeout is zero, no timeout is requested. The timeout starts when the command is transmitted on the SCSI bus.

pkt_scbp

Pointer to the block for SCSI status completion. This field is filled in by the host bus adapter driver.

pkt_cdbp

Pointer to the SCSI command descriptor block, the actual command to be sent to the target device. The host bus adapter driver does not interpret this field. The target driver must fill the field in with a command that the target device can process.

pkt_resid

Residual of the operation. The pkt_resid field has two different meanings, depending on when it is examined. After scsi_init_pkt(9F) allocates DMA resources for a command, pkt_resid indicates the number of bytes for which DMA resources could not be allocated. DMA resources might not be allocated due to DMA hardware scatter-gather or other device limitations. After command transport, pkt_resid indicates the number of data bytes that were not transferred. The host bus adapter driver fills in the field before the completion routine is called.

pkt_state

Indicates the state of the command. The host bus adapter driver fills in this field as the command progresses. One bit is set in this field for each of the following five command states:

  • STATE_GOT_BUS – Acquired the bus

  • STATE_GOT_TARGET – Selected the target

  • STATE_SENT_CMD – Sent the command

  • STATE_XFERRED_DATA – Transferred data, if appropriate

  • STATE_GOT_STATUS – Received status from the device
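When a command fails, the state bits tell a completion routine how far the command got, which helps distinguish, say, a target that never answered selection from one that returned status. The following user-space sketch decodes the furthest phase reached. The SIM_* bit values are assumptions chosen for illustration; a real target driver tests the STATE_* constants from the SCSA headers. The scan runs from the last phase down because data transfer happens only "if appropriate", so a command with no data phase can still have reached status.

```c
#include <assert.h>

/*
 * Illustrative bit values mirroring the five STATE_* flags above.
 * These values are assumptions; a real driver uses the SCSA headers.
 */
#define SIM_GOT_BUS       0x01u
#define SIM_GOT_TARGET    0x02u
#define SIM_SENT_CMD      0x04u
#define SIM_XFERRED_DATA  0x08u
#define SIM_GOT_STATUS    0x10u

/* The phases in the order a command passes through them. */
static const unsigned int sim_phase_bits[] = {
    SIM_GOT_BUS, SIM_GOT_TARGET, SIM_SENT_CMD, SIM_XFERRED_DATA, SIM_GOT_STATUS
};
static const char *const sim_phase_names[] = {
    "none", "acquired bus", "selected target", "sent command",
    "transferred data", "received status"
};

/* Return the furthest phase reached: 0 for none, up to 5 for status. */
static int sim_furthest_phase(unsigned int pkt_state)
{
    for (int i = 4; i >= 0; i--)
        if (pkt_state & sim_phase_bits[i])
            return i + 1;
    return 0;
}

/* Return a printable name for the furthest phase reached. */
static const char *sim_phase_name(unsigned int pkt_state)
{
    int n = sim_furthest_phase(pkt_state);

    assert(n >= 0 && n <= 5);
    return sim_phase_names[n];
}
```

For example, a state of SIM_GOT_BUS | SIM_GOT_TARGET decodes as "selected target": the bus was won and the target answered selection, but the command was never sent.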

pkt_statistics

Contains transport-related statistics set by the host bus adapter driver.

pkt_reason

Gives the reason the completion routine was called. The completion routine decodes this field. The routine then takes the appropriate action. If the command completes, that is, no transport errors occur, this field is set to CMD_CMPLT. Other values in this field indicate an error. After a command is completed, the target driver should examine the pkt_scbp field for a check condition status. See the scsi_pkt(9S) man page for more information.

Autoconfiguration for SCSI Target Drivers

SCSI target drivers must implement the standard autoconfiguration routines _init(9E), _fini(9E), and _info(9E). See Loadable Driver Interfaces for more information.

The following routines are also required, but these routines must perform specific SCSI and SCSA processing:

probe() Entry Point (SCSI Target Drivers)

SCSI target devices are not self-identifying, so target drivers must have a probe(9E) routine. This routine must determine whether the expected type of device is present and responding.

The general structure and the return codes of the probe(9E) routine are the same as the structure and return codes for other device drivers. SCSI target drivers must use the scsi_probe(9F) routine in their probe(9E) entry point. scsi_probe(9F) sends a SCSI inquiry command to the device and returns a code that indicates the result. If the SCSI inquiry command is successful, scsi_probe(9F) allocates a scsi_inquiry(9S) structure and fills the structure in with the device's inquiry data. Upon return from scsi_probe(9F), the sd_inq field of the scsi_device(9S) structure points to this scsi_inquiry(9S) structure.

Because probe(9E) must be stateless, the target driver must call scsi_unprobe(9F) before probe(9E) returns, even if scsi_probe(9F) fails.
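One way to guarantee that probe(9E) leaves no state behind is to give it a single exit path that always calls the unprobe routine, which is the shape Example 17–1 uses. The following user-space sketch models that pattern. sim_scsi_probe() and sim_scsi_unprobe() are hypothetical stand-ins that allocate and free an "inquiry" buffer, the result codes only mirror the SCSIPROBE_* and DDI_PROBE_* cases in the example, and a counter makes any leaked buffer visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result codes mirroring the cases in Example 17-1. */
enum sim_probe_rc { SIM_PROBE_EXISTS, SIM_PROBE_NORESP, SIM_PROBE_NOMEM };
enum sim_ddi_rc { SIM_DDI_PROBE_SUCCESS, SIM_DDI_PROBE_PARTIAL,
    SIM_DDI_PROBE_FAILURE };

/* Hypothetical device; sd_inq stands in for the scsi_inquiry buffer. */
struct sim_device {
    int present;         /* does the simulated device respond? */
    int dtype;           /* hypothetical device-type code */
    int *sd_inq;         /* attached by a successful probe */
};

static int sim_outstanding;   /* live inquiry buffers, to detect leaks */

static enum sim_probe_rc sim_scsi_probe(struct sim_device *sdp)
{
    if (!sdp->present)
        return SIM_PROBE_NORESP;
    sdp->sd_inq = malloc(sizeof (int));
    if (sdp->sd_inq == NULL)
        return SIM_PROBE_NOMEM;
    *sdp->sd_inq = sdp->dtype;
    sim_outstanding++;
    return SIM_PROBE_EXISTS;
}

/* Safe to call whether or not the probe allocated anything. */
static void sim_scsi_unprobe(struct sim_device *sdp)
{
    if (sdp->sd_inq != NULL) {
        free(sdp->sd_inq);
        sdp->sd_inq = NULL;
        sim_outstanding--;
    }
}

/* probe(9E)-style routine: one exit path, which always unprobes. */
static enum sim_ddi_rc sim_probe(struct sim_device *sdp, int wanted_dtype)
{
    enum sim_ddi_rc rval;

    switch (sim_scsi_probe(sdp)) {
    case SIM_PROBE_EXISTS:
        rval = (*sdp->sd_inq == wanted_dtype) ?
            SIM_DDI_PROBE_SUCCESS : SIM_DDI_PROBE_FAILURE;
        break;
    case SIM_PROBE_NORESP:
    case SIM_PROBE_NOMEM:
        rval = SIM_DDI_PROBE_PARTIAL;   /* might succeed later */
        break;
    default:
        rval = SIM_DDI_PROBE_FAILURE;
        break;
    }
    sim_scsi_unprobe(sdp);              /* leave no state behind */
    return rval;
}
```

Whatever the outcome, sim_outstanding returns to zero after each call, which is the property that makes it safe for the system to call probe(9E) more than once.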

Example 17–1 shows a typical probe(9E) routine. The routine in the example retrieves the scsi_device(9S) structure from the private field of its dev_info structure. The routine also retrieves the device's SCSI target and logical unit numbers for printing in messages. The probe(9E) routine then calls scsi_probe(9F) to verify that the expected device, a printer in this case, is present.

If successful, scsi_probe(9F) attaches the device's SCSI inquiry data in a scsi_inquiry(9S) structure to the sd_inq field of the scsi_device(9S) structure. The driver can then determine whether the device type is a printer, which is reported in the inq_dtype field. If the device is a printer, the type is reported with scsi_log(9F), using scsi_dname(9F) to convert the device type into a string.


Example 17–1 SCSI Target Driver probe(9E) Routine

static int
xxprobe(dev_info_t *dip)
{
    struct scsi_device *sdp;
    int rval, target, lun;
    /*
     * Get a pointer to the scsi_device(9S) structure
     */
    sdp = (struct scsi_device *)ddi_get_driver_private(dip);

    target = sdp->sd_address.a_target;
    lun = sdp->sd_address.a_lun;
    /*
     * Call scsi_probe(9F) to send the Inquiry command. It will
     * fill in the sd_inq field of the scsi_device structure.
     */
    switch (scsi_probe(sdp, NULL_FUNC)) {
    case SCSIPROBE_FAILURE:
    case SCSIPROBE_NORESP:
    case SCSIPROBE_NOMEM:
        /*
         * In these cases, the device might be powered off,
         * in which case we might be able to successfully
         * probe it at some future time - referred to
         * as `deferred attach'.
         */
        rval = DDI_PROBE_PARTIAL;
        break;
    case SCSIPROBE_NONCCS:
    default:
        /*
         * Device isn't of the type we can deal with,
         * and/or it will never be usable.
         */
        rval = DDI_PROBE_FAILURE;
        break;
    case SCSIPROBE_EXISTS:
        /*
         * There is a device at the target/lun address. Check
         * inq_dtype to make sure that it is the right device
         * type. See scsi_inquiry(9S) for possible device types.
         */
        switch (sdp->sd_inq->inq_dtype) {
        case DTYPE_PRINTER:
            /* scsi_log(9F) takes the dev_info pointer first. */
            scsi_log(dip, "xx", SCSI_DEBUG,
                "found %s device at target%d, lun%d\n",
                scsi_dname((int)sdp->sd_inq->inq_dtype),
                target, lun);
            rval = DDI_PROBE_SUCCESS;
            break;
        case DTYPE_NOTPRESENT:
        default:
            rval = DDI_PROBE_FAILURE;
            break;
        }
        break;
    }
    /* probe(9E) must be stateless: always undo scsi_probe(9F). */
    scsi_unprobe(sdp);
    return (rval);
}

A more thorough probe(9E) routine could also check other fields of the scsi_inquiry(9S) data, such as the vendor and product identification, to make sure that the device is the specific model that a particular driver expects.
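As a sketch of such a check, assuming the vendor and product identification fields of scsi_inquiry(9S) are the fixed-width arrays inq_vid (8 bytes) and inq_pid (16 bytes), the following user-space C helper compares one of those fields with an expected string. Inquiry identification fields are ASCII, left-aligned, and padded with spaces rather than NUL-terminated, so a plain strcmp() would be wrong. The helper name inq_field_matches() is hypothetical and is not part of the DDI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare a fixed-width, space-padded SCSI inquiry
 * field (for example the 8-byte vendor ID or 16-byte product ID) with a
 * NUL-terminated string. The field is not NUL-terminated, so the
 * comparison is bounded by the field width, and everything after the
 * expected text must be space padding.
 */
static bool
inq_field_matches(const char *field, size_t width, const char *want)
{
    size_t len = strlen(want);
    size_t i;

    if (len > width)
        return (false);
    if (memcmp(field, want, len) != 0)
        return (false);
    for (i = len; i < width; i++) {
        if (field[i] != ' ')
            return (false);
    }
    return (true);
}
```

In a probe(9E) routine, such a helper might be called as inq_field_matches(sdp->sd_inq->inq_vid, sizeof (sdp->sd_inq->inq_vid), "ACME"), where "ACME" stands in for the vendor the driver expects.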

attach() Entry Point (SCSI Target Drivers)

After the probe(9E) routine has verified that the expected device is present, attach(9E) is called. attach() performs these tasks:

A SCSI target driver needs to call scsi_probe(9F) again to retrieve the device's inquiry data. The driver must also create a SCSI request sense packet. If the attach is successful, the attach() function should not call scsi_unprobe(9F).

Three routines are used to create the request sense packet: scsi_alloc_consistent_buf(9F), scsi_init_pkt(9F), and scsi_setup_cdb(9F). scsi_alloc_consistent_buf(9F) allocates a buffer that is suitable for consistent DMA. scsi_alloc_consistent_buf() then returns a pointer to a buf(9S) structure. The advantage of a consistent buffer is that no explicit synchronization of the data is required. In other words, the target driver can access the data after the callback. The sd_sense element of the device's scsi_device(9S) structure must be initialized with the address of the sense buffer. scsi_init_pkt(9F) creates and partially initializes a scsi_pkt(9S) structure. scsi_setup_cdb(9F) creates a SCSI command descriptor block, in this case by creating a SCSI request sense command.
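The request sense CDB that Example 17–2 builds with scsi_setup_cdb(9F) is a 6-byte Group 0 command: operation code 0x03 in byte 0, the allocation length (the number of sense bytes the target may return) in byte 4, and zeros elsewhere. The following user-space sketch fills in that layout by hand only to make the byte positions visible. The function and macro names are hypothetical; a real driver should keep using scsi_setup_cdb(9F).

```c
#include <stdint.h>
#include <string.h>

#define XX_SCMD_REQUEST_SENSE 0x03  /* REQUEST SENSE operation code */

/* Hypothetical: fill a 6-byte (Group 0) REQUEST SENSE CDB by hand. */
static void
xx_build_request_sense(uint8_t cdb[6], uint8_t alloc_len)
{
    memset(cdb, 0, 6);
    cdb[0] = XX_SCMD_REQUEST_SENSE;
    cdb[4] = alloc_len;     /* maximum sense bytes the target may return */
    /* Bytes 1-3 and the control byte (byte 5) stay zero. */
}
```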

Note that a SCSI device is not self-identifying and does not have a reg property. As a result, the driver must set the pm-hardware-state property. Setting pm-hardware-state informs the framework that this device needs to be suspended and then resumed.

The following example shows the SCSI target driver's attach() routine.


Example 17–2 SCSI Target Driver attach(9E) Routine

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    struct xxstate         *xsp;
    struct scsi_pkt        *rqpkt = NULL;
    struct scsi_device     *sdp;
    struct buf             *bp = NULL;
    int                    instance;

    instance = ddi_get_instance(dip);
    switch (cmd) {
    case DDI_ATTACH:
        break;
    case DDI_RESUME:
        /* For information, see the "Direct Memory Access (DMA)" */
        /* chapter in this book. */
    default:
        return (DDI_FAILURE);
    }
    /*
     * Allocate a state structure and initialize it.
     */
    if (ddi_soft_state_zalloc(statep, instance) != DDI_SUCCESS)
        return (DDI_FAILURE);
    xsp = ddi_get_soft_state(statep, instance);
    sdp = (struct scsi_device *)ddi_get_driver_private(dip);
    /*
     * Cross-link the state and scsi_device(9S) structures.
     */
    sdp->sd_private = (caddr_t)xsp;
    xsp->sdp = sdp;
    /*
     * Call scsi_probe(9F) again to get and validate inquiry data.
     */
    if (scsi_probe(sdp, NULL_FUNC) != SCSIPROBE_EXISTS)
        goto failed;
    /*
     * Allocate a request sense buffer. The buf(9S) structure
     * is set to NULL to tell the routine to allocate a new one.
     * The callback function is set to NULL_FUNC to tell the
     * routine to return failure immediately if no
     * resources are available.
     */
    bp = scsi_alloc_consistent_buf(&sdp->sd_address, NULL,
        SENSE_LENGTH, B_READ, NULL_FUNC, NULL);
    if (bp == NULL)
        goto failed;
    /*
     * Create a Request Sense scsi_pkt(9S) structure.
     */
    rqpkt = scsi_init_pkt(&sdp->sd_address, NULL, bp,
        CDB_GROUP0, 1, 0, PKT_CONSISTENT, NULL_FUNC, NULL);
    if (rqpkt == NULL)
        goto failed;
    /*
     * scsi_alloc_consistent_buf(9F) returned a buf(9S) structure.
     * The actual buffer address is in b_un.b_addr.
     */
    sdp->sd_sense = (struct scsi_extended_sense *)bp->b_un.b_addr;
    /*
     * Create a Group0 CDB for the Request Sense command.
     */
    if (scsi_setup_cdb((union scsi_cdb *)rqpkt->pkt_cdbp,
        SCMD_REQUEST_SENSE, 0, SENSE_LENGTH, 0) == 0)
        goto failed;
    /*
     * Fill in the rest of the scsi_pkt structure.
     * xxcallback() is the private command completion routine.
     */
    rqpkt->pkt_comp = xxcallback;
    rqpkt->pkt_time = 30; /* 30 second command timeout */
    rqpkt->pkt_flags |= FLAG_SENSING;
    xsp->rqs = rqpkt;
    xsp->rqsbuf = bp;
    /*
     * Create minor nodes, report device, and do any other
     * initialization (not shown).
     *
     * Since the device does not have the 'reg' property,
     * cpr will not call its DDI_SUSPEND/DDI_RESUME entries.
     * The following code tells cpr that this device
     * needs to be suspended and resumed.
     */
    (void) ddi_prop_update_string(DDI_DEV_T_NONE, dip,
        "pm-hardware-state", "needs-suspend-resume");
    xsp->open = 0;
    return (DDI_SUCCESS);
failed:
    /* Destroy the packet before freeing the buffer it is bound to. */
    if (rqpkt)
        scsi_destroy_pkt(rqpkt);
    if (bp)
        scsi_free_consistent_buf(bp);
    sdp->sd_private = (caddr_t)NULL;
    sdp->sd_sense = NULL;
    scsi_unprobe(sdp);
    /* Free any other resources, such as the state structure. */
    ddi_soft_state_free(statep, instance);
    return (DDI_FAILURE);
}

detach() Entry Point (SCSI Target Drivers)

The detach(9E) entry point is the inverse of attach(9E). detach() must free all resources that were allocated in attach(). If successful, the detach should call scsi_unprobe(9F). The following example shows a target driver detach() routine.


Example 17–3 SCSI Target Driver detach(9E) Routine

static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    struct xxstate *xsp;

    switch (cmd) {
    case DDI_DETACH:
        /*
         * Normal detach(9E) operations, such as getting a
         * pointer to the state structure
         */
        xsp = ddi_get_soft_state(statep, ddi_get_instance(dip));
        /* Destroy the packet before freeing the buffer it is bound to. */
        scsi_destroy_pkt(xsp->rqs);
        scsi_free_consistent_buf(xsp->rqsbuf);
        xsp->sdp->sd_private = (caddr_t)NULL;
        xsp->sdp->sd_sense = NULL;
        scsi_unprobe(xsp->sdp);
        /*
         * Remove minor nodes.
         * Free resources, such as the state structure and properties.
         */
        return (DDI_SUCCESS);
    case DDI_SUSPEND:
        /* For information, see the "Direct Memory Access (DMA)" */
        /* chapter in this book. */
    default:
        return (DDI_FAILURE);
    }
}

getinfo() Entry Point (SCSI Target Drivers)

The getinfo(9E) routine for SCSI target drivers is much the same as for other drivers (see getinfo() Entry Point for more information on the DDI_INFO_DEVT2INSTANCE case). However, in the DDI_INFO_DEVT2DEVINFO case of the getinfo() routine, the target driver must return a pointer to its dev_info node. This pointer can be saved in the driver state structure or can be retrieved from the sd_dev field of the scsi_device(9S) structure. The following example shows an alternative SCSI target driver getinfo() code fragment.


Example 17–4 Alternative SCSI Target Driver getinfo() Code Fragment

case DDI_INFO_DEVT2DEVINFO:
    dev = (dev_t)arg;
    instance = getminor(dev);
    xsp = ddi_get_soft_state(statep, instance);
    if (xsp == NULL)
        return (DDI_FAILURE);
    *result = (void *)xsp->sdp->sd_dev;
    return (DDI_SUCCESS);

Resource Allocation

To send a SCSI command to the device, the target driver must create and initialize a scsi_pkt(9S) structure. This structure must then be passed to the host bus adapter driver.

scsi_init_pkt() Function

The scsi_init_pkt(9F) routine allocates and zeroes a scsi_pkt(9S) structure. scsi_init_pkt() also sets the pkt_private, pkt_scbp, and pkt_cdbp pointers. Additionally, scsi_init_pkt() provides a callback mechanism to handle the case where resources are not available. This function has the following syntax:

struct scsi_pkt *scsi_init_pkt(struct scsi_address *ap,
     struct scsi_pkt *pktp, struct buf *bp, int cmdlen,
     int statuslen, int privatelen, int flags,
     int (*callback)(caddr_t), caddr_t arg)

where:

ap

Pointer to a scsi_address structure. ap is the sd_address field of the device's scsi_device(9S) structure.

pktp

Pointer to the scsi_pkt(9S) structure to be initialized. If this pointer is set to NULL, a new packet is allocated.

bp

Pointer to a buf(9S) structure. If this pointer is not null and has a valid byte count, DMA resources are allocated.

cmdlen

Length of the SCSI command descriptor block in bytes.

statuslen

Required length of the SCSI status completion block in bytes.

privatelen

Number of bytes to allocate for the pkt_private field.

flags

Set of flags:

  • PKT_CONSISTENT – This bit must be set if the DMA buffer was allocated using scsi_alloc_consistent_buf(9F). In this case, the host bus adapter driver guarantees that the data transfer is properly synchronized before performing the target driver's command completion callback.

  • PKT_DMA_PARTIAL – This bit can be set if the driver accepts a partial DMA mapping. If set, scsi_init_pkt(9F) allocates DMA resources with the DDI_DMA_PARTIAL flag set. The pkt_resid field of the scsi_pkt(9S) structure can be returned with a nonzero residual. A nonzero value indicates the number of bytes for which scsi_init_pkt(9F) was unable to allocate DMA resources.

callback

Specifies the action to take if resources are not available. If set to NULL_FUNC, scsi_init_pkt(9F) returns the value NULL immediately. If set to SLEEP_FUNC, scsi_init_pkt() does not return until resources are available. Any other valid kernel address is interpreted as the address of a function to be called when resources are likely to be available.

arg

Parameter to pass to the callback function.
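To illustrate what a nonzero pkt_resid means under PKT_DMA_PARTIAL, the following user-space model assumes that each DMA mapping covers at most a fixed number of bytes (window, a hypothetical limit) and counts how many mapped portions, and therefore packet submissions, a transfer of bcount bytes needs. After each mapping, the unmapped remainder plays the role of pkt_resid. This is a simplified model, not how any particular HBA computes its DMA windows.

```c
#include <stddef.h>

/*
 * Hypothetical model of PKT_DMA_PARTIAL: each mapping covers at most
 * `window` bytes, and the bytes left unmapped play the role of
 * pkt_resid. Returns the number of transfers needed to move `bcount`
 * bytes, or 0 if nothing can be mapped (window == 0) or bcount is 0.
 */
static size_t
xx_partial_transfers(size_t bcount, size_t window)
{
    size_t transfers = 0;
    size_t resid = bcount;      /* bytes not yet mapped */

    if (window == 0)
        return (0);
    while (resid > 0) {
        size_t mapped = resid < window ? resid : window;

        resid -= mapped;        /* nonzero resid: more portions remain */
        transfers++;
    }
    return (transfers);
}
```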

The scsi_init_pkt() routine synchronizes the data prior to transport. If the driver needs to access the data after transport, the driver should call scsi_sync_pkt(9F), which synchronizes any cached data, for example by flushing intermediate caches.

scsi_sync_pkt() Function

If the target driver needs to resubmit the packet after changing the data, scsi_sync_pkt(9F) must be called before calling scsi_transport(9F). However, if the target driver does not need to access the data, scsi_sync_pkt() does not need to be called after the transport.

scsi_destroy_pkt() Function

The scsi_destroy_pkt(9F) routine synchronizes any remaining cached data that is associated with the packet, if necessary. The routine then frees the packet and associated command, status, and target driver-private data areas. This routine should be called in the command completion routine.

scsi_alloc_consistent_buf() Function

For most I/O requests, the data buffer passed to the driver entry points is not accessed directly by the driver. The buffer is just passed on to scsi_init_pkt(9F). If a driver sends SCSI commands that operate on buffers that the driver itself examines, the buffers should be DMA consistent. The SCSI request sense command is a good example. The scsi_alloc_consistent_buf(9F) routine allocates a buf(9S) structure and a data buffer that is suitable for DMA-consistent operations. The HBA performs any necessary synchronization of the buffer before performing the command completion callback.


Note –

scsi_alloc_consistent_buf(9F) uses scarce system resources. Thus, you should use scsi_alloc_consistent_buf() sparingly.


scsi_free_consistent_buf() Function

scsi_free_consistent_buf(9F) releases a buf(9S) structure and the associated data buffer allocated with scsi_alloc_consistent_buf(9F). See attach() Entry Point (SCSI Target Drivers) and detach() Entry Point (SCSI Target Drivers) for examples.

Building and Transporting a Command

The host bus adapter driver is responsible for transmitting the command to the device. Furthermore, the driver is responsible for handling the low-level SCSI protocol. The scsi_transport(9F) routine hands a packet to the host bus adapter driver for transmission. The target driver has the responsibility to create a valid scsi_pkt(9S) structure.

Building a Command

The routine scsi_init_pkt(9F) allocates space for a SCSI CDB, allocates DMA resources if necessary, and sets the pkt_flags field, as shown in this example:

pkt = scsi_init_pkt(&sdp->sd_address, NULL, bp,
CDB_GROUP0, 1, 0, 0, SLEEP_FUNC, NULL);

This example creates a new packet and allocates DMA resources as specified by the buf(9S) structure pointer that is passed in. A SCSI CDB is allocated for a Group 0 (6-byte) command, and one byte is allocated for the status block. The flags argument is zero, and no space is allocated for the pkt_private field. Because of the SLEEP_FUNC parameter, this call to scsi_init_pkt(9F) waits indefinitely for resources if no resources are currently available.

The next step is to initialize the SCSI CDB, using the scsi_setup_cdb(9F) function:

    if (scsi_setup_cdb((union scsi_cdb *)pkt->pkt_cdbp,
     SCMD_READ, bp->b_blkno, bp->b_bcount >> DEV_BSHIFT, 0) == 0)
     goto failed;

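This example builds a Group 0 command descriptor block in the space that pkt_cdbp points to. The operation code is SCMD_READ, the block address is taken from bp->b_blkno, the block count is bp->b_bcount converted from bytes to blocks (shifted right by DEV_BSHIFT), and the final argument, which supplies any other CDB data, is zero.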


Note –

scsi_setup_cdb(9F) does not support setting a target device's logical unit number (LUN) in bits 5-7 of byte 1 of the SCSI command block. This requirement is defined by SCSI-1. For SCSI-1 devices that require the LUN bits set in the command block, use makecom_g0(9F) or some equivalent rather than scsi_setup_cdb(9F).
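The byte layout of a Group 0 read command, including the SCSI-1 LUN bits that scsi_setup_cdb(9F) does not set, can be sketched in user-space C as follows. The function xx_build_read6() and the XX_ macro are hypothetical; in a driver you would use scsi_setup_cdb(9F), or makecom_g0(9F) when the LUN bits are required. The sketch assumes the block count has already been converted from bytes to blocks, as the b_bcount >> DEV_BSHIFT expression does above.

```c
#include <assert.h>
#include <stdint.h>

#define XX_SCMD_READ_G0 0x08    /* READ(6) operation code */

/*
 * Hypothetical: build a Group 0 READ(6) CDB. Group 0 commands carry a
 * 21-bit block address and an 8-bit block count, where a count of 0
 * means 256 blocks. If lun_in_cdb is nonzero, the LUN is also placed
 * in bits 5-7 of byte 1, as some SCSI-1 devices require.
 */
static void
xx_build_read6(uint8_t cdb[6], uint32_t blkno, unsigned nblks,
    unsigned lun, int lun_in_cdb)
{
    assert(blkno <= 0x1FFFFF);          /* must fit in 21 bits */
    assert(nblks >= 1 && nblks <= 256);
    assert(lun <= 7);

    cdb[0] = XX_SCMD_READ_G0;
    cdb[1] = (uint8_t)((blkno >> 16) & 0x1F);   /* address bits 20-16 */
    if (lun_in_cdb)
        cdb[1] |= (uint8_t)(lun << 5);          /* SCSI-1 LUN field */
    cdb[2] = (uint8_t)((blkno >> 8) & 0xFF);
    cdb[3] = (uint8_t)(blkno & 0xFF);
    cdb[4] = (uint8_t)(nblks & 0xFF);           /* 256 is encoded as 0 */
    cdb[5] = 0;                                 /* control byte */
}
```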


After initializing the SCSI CDB, initialize three other fields in the packet and store a pointer to the packet in the state structure.

pkt->pkt_private = (opaque_t)bp;
pkt->pkt_comp = xxcallback;
pkt->pkt_time = 30;
xsp->pkt = pkt;

The buf(9S) pointer is saved in the pkt_private field for later use in the completion routine.

Setting Target Capabilities

Target drivers use scsi_ifsetcap(9F) to set the capabilities of the host adapter driver. A capability is a name-value pair, consisting of a null-terminated character string and an integer value. The current value of a capability can be retrieved using scsi_ifgetcap(9F). scsi_ifsetcap(9F) allows capabilities to be set for all targets on the bus.

In general, however, setting capabilities of targets that are not owned by the target driver is not recommended. This practice is not universally supported by HBA drivers. Some capabilities, such as disconnect and synchronous, can be set by default by the HBA driver. Other capabilities might need to be set explicitly by the target driver. Wide-xfer and tagged-queueing, for example, must be set by the target driver.
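The integer results of scsi_ifsetcap(9F) are easy to misread, which is why Example 17–6 tests for == 1 rather than for a nonzero value. The following user-space mock models a per-target capability table on the HBA side, assuming the scsi_ifsetcap(9F) convention that 1 means the capability was set, 0 means it cannot be changed, and -1 means it is undefined or the change failed. The table, its contents, and xx_mock_setcap() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target capability table kept by a mock HBA. */
struct xx_cap {
    const char *name;
    int value;
    int settable;       /* 0 if this HBA does not allow changes */
};

static struct xx_cap xx_caps[] = {
    { "disconnect",   1, 0 },
    { "wide-xfer",    0, 1 },
    { "tagged-qing",  0, 1 },
    { "auto-rqsense", 0, 1 },
};

/*
 * Mock of the assumed scsi_ifsetcap(9F) return convention:
 * 1 if the capability was set, 0 if it cannot be changed,
 * -1 if the name is not a known capability.
 */
static int
xx_mock_setcap(const char *cap, int value)
{
    size_t i;

    for (i = 0; i < sizeof (xx_caps) / sizeof (xx_caps[0]); i++) {
        if (strcmp(xx_caps[i].name, cap) != 0)
            continue;
        if (!xx_caps[i].settable)
            return (0);
        xx_caps[i].value = value;
        return (1);
    }
    return (-1);
}
```

Because both 0 and -1 mean "not enabled", a caller that only tested for a nonzero result would wrongly treat an unknown capability as enabled.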

Transporting a Command

After the scsi_pkt(9S) structure is filled in, use scsi_transport(9F) to hand the structure to the bus adapter driver:

    if (scsi_transport(pkt) != TRAN_ACCEPT) {
        bp->b_resid = bp->b_bcount;
        bioerror(bp, EIO);
        scsi_destroy_pkt(pkt);      /* Release the packet resources */
        biodone(bp);
    }

The other return values from scsi_transport(9F) are TRAN_BUSY, TRAN_BADPKT, and TRAN_FATAL_ERROR. See the scsi_transport(9F) man page for their meanings.


Note –

The mutex sd_mutex in the scsi_device(9S) structure must not be held across a call to scsi_transport(9F).


If scsi_transport(9F) returns TRAN_ACCEPT, the packet becomes the responsibility of the host bus adapter driver. The packet should not be accessed by the target driver until the command completion routine is called.

Synchronous scsi_transport() Function

If FLAG_NOINTR is set in the packet, then scsi_transport(9F) does not return until the command is complete. No callback is performed.


Note –

Do not use FLAG_NOINTR in interrupt context.


Command Completion

When the host bus adapter driver is through with the command, the driver invokes the packet's completion callback routine, passing a pointer to the scsi_pkt(9S) structure as a parameter. After decoding the packet, the completion routine takes the appropriate action.

Example 17–5 presents a simple completion callback routine. This code checks for transport failures. In case of failure, the routine gives up rather than retrying the command. If the target is busy, extra code is required to resubmit the command at a later time.

If the command results in a check condition, the target driver needs to send a request sense command unless auto request sense has been enabled.

Otherwise, the command succeeded. At the end of processing for the command, the completion routine destroys the packet and calls biodone(9F).

In the event of a transport error, such as a bus reset or parity problem, the target driver can resubmit the packet by using scsi_transport(9F). No values in the packet need to be changed prior to resubmitting.

The following example does not attempt to retry incomplete commands.


Note –

Normally, the target driver's callback function is called in interrupt context. Consequently, the callback function should never sleep.



Example 17–5 Completion Routine for a SCSI Driver

static void
xxcallback(struct scsi_pkt *pkt)
{
    struct buf         *bp;
    struct xxstate     *xsp;
    minor_t            instance;
    struct scsi_status *ssp;
    /*
     * Get a pointer to the buf(9S) structure for the command
     * and to the per-instance data structure.
     */
    bp = (struct buf *)pkt->pkt_private;
    instance = getminor(bp->b_edev);
    xsp = ddi_get_soft_state(statep, instance);
    /*
     * Figure out why this callback routine was called
     */
    if (pkt->pkt_reason != CMD_CMPLT) {
        bp->b_resid = bp->b_bcount;
        bioerror(bp, EIO);
        scsi_destroy_pkt(pkt);          /* Release resources */
        biodone(bp);                    /* Notify waiting threads */
    } else {
        /*
         * Command completed, check status.
         * See scsi_status(9S)
         */
        ssp = (struct scsi_status *)pkt->pkt_scbp;
        if (ssp->sts_busy) {
            /* Error, target busy or reserved: resubmit later (not shown). */
        } else if (ssp->sts_chk) {
            /* Send a request sense command (not shown). */
        } else {
            bp->b_resid = pkt->pkt_resid;   /* Packet completed OK */
            scsi_destroy_pkt(pkt);
            biodone(bp);
        }
    }
}
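The decision structure of Example 17–5, extended with the auto-request sense check that is described later in this chapter, can be modeled in user space as a pure function. The structure xx_pkt_model and its fields are hypothetical stand-ins: completed for pkt_reason == CMD_CMPLT, sts_busy and sts_chk for the scsi_status(9S) bits, and arq_done for STATE_ARQ_DONE in pkt_state. Keeping the classification separate from the actions (destroying the packet, calling biodone(9F), scheduling a retry) is one way to make a completion routine easier to test; it is a design suggestion, not a requirement of the DDI.

```c
/* Hypothetical outcomes a completion routine might choose between. */
enum xx_action {
    XX_FAIL,            /* transport error: fail the buf with EIO */
    XX_RETRY_LATER,     /* target busy or reserved */
    XX_GET_SENSE,       /* check condition, no sense data yet */
    XX_USE_ARQ_SENSE,   /* check condition, sense data already fetched */
    XX_DONE             /* command completed successfully */
};

/* Hypothetical stand-in for the scsi_pkt(9S) fields being examined. */
struct xx_pkt_model {
    int completed;      /* pkt_reason == CMD_CMPLT */
    int sts_busy;       /* busy bit in the status block */
    int sts_chk;        /* check condition bit in the status block */
    int arq_done;       /* STATE_ARQ_DONE set in pkt_state */
};

static enum xx_action
xx_classify(const struct xx_pkt_model *p)
{
    if (!p->completed)
        return (XX_FAIL);       /* Example 17-5 gives up, no retry */
    if (p->sts_busy)
        return (XX_RETRY_LATER);
    if (p->sts_chk)
        return (p->arq_done ? XX_USE_ARQ_SENSE : XX_GET_SENSE);
    return (XX_DONE);
}
```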

Reuse of Packets

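A target driver can reuse packets in the following ways:

  • Resubmit the packet unchanged by calling scsi_transport(9F) again, for example after a transport error.

  • Call scsi_sync_pkt(9F) to synchronize the data, process the data in the driver, and then resubmit the packet.

  • Free the packet's DMA resources with scsi_dmafree(9F), and then pass the packet pointer to scsi_init_pkt(9F) to bind the packet to a new buf(9S) structure. The target driver is then responsible for reinitializing the packet.

  • If only partial DMA resources were allocated by the first call to scsi_init_pkt(9F), call scsi_init_pkt(9F) again with the same packet and buf(9S) structure to move the DMA resources to the next portion of the transfer.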

Auto-Request Sense Mode

Auto-request sense mode is most desirable if queuing is used, whether the queuing is tagged or untagged. A contingent allegiance condition is cleared by any subsequent command and, consequently, the sense data is lost. Most HBA drivers start the next command before performing the target driver callback. Other HBA drivers can use a separate, lower-priority thread to perform the callbacks. This approach might increase the time needed to notify the target driver that the packet completed with a check condition. In this case, the target driver might not be able to submit a request sense command in time to retrieve the sense data.

To avoid this loss of sense data, the HBA driver, or controller, should issue a request sense command if a check condition has been detected. This mode is known as auto-request sense mode. Note that not all HBA drivers are capable of auto-request sense mode, and some drivers can only operate with auto-request sense mode enabled.

A target driver enables auto-request-sense mode by using scsi_ifsetcap(9F). The following example shows auto-request sense enabling.


Example 17–6 Enabling Auto-Request Sense Mode

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    struct xxstate *xsp;
    struct scsi_device *sdp = (struct scsi_device *)
        ddi_get_driver_private(dip);

    /* ... allocate the soft state as in Example 17-2 ... */
    xsp = ddi_get_soft_state(statep, ddi_get_instance(dip));
    /*
     * Enable auto-request-sense; an auto-request-sense cmd might
     * fail due to a BUSY condition or transport error. Therefore,
     * it is recommended to allocate a separate request sense
     * packet as well.
     * Note that scsi_ifsetcap(9F) can return -1, 0, or 1.
     */
    xsp->sdp_arq_enabled =
        ((scsi_ifsetcap(&sdp->sd_address, "auto-rqsense", 1, 1) == 1) ?
        1 : 0);
    /*
     * If the HBA driver supports auto request sense, the status
     * blocks should be sizeof (struct scsi_arq_status); otherwise,
     * one byte is sufficient.
     */
    xsp->sdp_cmd_stat_size = (xsp->sdp_arq_enabled ?
        sizeof (struct scsi_arq_status) : 1);
    /* ... */
    return (DDI_SUCCESS);
}

If a packet is allocated using scsi_init_pkt(9F) and auto-request sense is desired on this packet, the target driver must request additional space for the status block so that the block can hold the auto-request sense structure, struct scsi_arq_status. The sense length used in the request sense command is sizeof (struct scsi_extended_sense). Auto-request sense can be disabled for an individual packet by allocating only sizeof (struct scsi_status) for the status block.

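The packet is submitted using scsi_transport(9F) as usual. When a check condition occurs on this packet, the host adapter driver obtains the sense data, issuing a request sense command itself if the controller lacks auto-request sense capability. The driver then fills in the scsi_arq_status structure in the packet's status block, sets STATE_ARQ_DONE in pkt_state, and calls the packet's completion routine.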

The target driver's callback routine should verify that sense data is available by checking the STATE_ARQ_DONE bit in pkt_state. STATE_ARQ_DONE implies that a check condition has occurred and that a request sense has been performed. If auto-request sense has been temporarily disabled in a packet, subsequent retrieval of the sense data cannot be guaranteed.

The target driver should then verify whether the auto-request sense command completed successfully and decode the sense data.
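Decoding typically starts with the sense key and the additional sense code (ASC) and qualifier (ASCQ). In fixed-format sense data (response code 0x70 or 0x71), the sense key is the low 4 bits of byte 2, the additional sense length is byte 7, and the ASC and ASCQ are bytes 12 and 13. The following user-space sketch decodes those raw bytes. In a driver, the same values would normally be read through struct scsi_extended_sense (see scsi_extended_sense(9S)); the type and function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct xx_sense {
    uint8_t key;    /* sense key: low 4 bits of byte 2 */
    uint8_t asc;    /* additional sense code (byte 12) */
    uint8_t ascq;   /* additional sense code qualifier (byte 13) */
};

/*
 * Decode fixed-format sense data. Returns 0 on success, or -1 if the
 * buffer is too short or does not hold fixed-format data (for example,
 * descriptor-format data with response code 0x72).
 */
static int
xx_decode_sense(const uint8_t *buf, size_t len, struct xx_sense *out)
{
    uint8_t code;

    if (len < 8)
        return (-1);
    code = buf[0] & 0x7F;           /* ignore the VALID bit (bit 7) */
    if (code != 0x70 && code != 0x71)
        return (-1);
    out->key = buf[2] & 0x0F;
    /* ASC/ASCQ are present only if the additional length covers them. */
    if (len >= 14 && buf[7] >= 6) {
        out->asc = buf[12];
        out->ascq = buf[13];
    } else {
        out->asc = 0;
        out->ascq = 0;
    }
    return (0);
}
```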

Dump Handling

The dump(9E) entry point copies a portion of virtual address space directly to the specified device in the case of system failure or checkpoint operation. See the cpr(7) and dump(9E) man pages. The dump(9E) entry point must be capable of performing this operation without the use of interrupts.

The arguments for dump() are as follows:

dev

Device number of the dump device

addr

Kernel virtual address at which to start the dump

blkno

First destination block on the device

nblk

Number of blocks to dump


Example 17–7 dump(9E) Routine

static int
xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)
{
    struct xxstate     *xsp;
    struct buf         *bp;
    struct scsi_pkt    *pkt;
    int    rval;
    int    instance;

    instance = getminor(dev);
    xsp = ddi_get_soft_state(statep, instance);

    /* The suspended and dumping fields are driver-defined state. */
    if (xsp->suspended) {
        (void) pm_raise_power(xsp->sdp->sd_dev, 0, 1);
    }

    bp = getrbuf(KM_NOSLEEP);
    if (bp == NULL) {
        return (EIO);
    }

    /* Calculate block number relative to partition (not shown). */

    bp->b_un.b_addr = addr;
    bp->b_edev = dev;
    bp->b_bcount = nblk * DEV_BSIZE;
    bp->b_flags = B_WRITE | B_BUSY;
    bp->b_blkno = blkno;

    /* No driver-private area is needed for a polled dump command. */
    pkt = scsi_init_pkt(&xsp->sdp->sd_address, NULL, bp, CDB_GROUP1,
        sizeof (struct scsi_arq_status), 0, 0, NULL_FUNC, NULL);
    if (pkt == NULL) {
        freerbuf(bp);
        return (EIO);
    }
    (void) scsi_setup_cdb((union scsi_cdb *)pkt->pkt_cdbp,
        SCMD_WRITE_G1, blkno, nblk, 0);
    /*
     * While dumping in polled mode, other cmds might complete
     * and these should not be resubmitted. We set the
     * dumping flag here, which prevents requeueing cmds.
     */
    xsp->dumping = 1;
    rval = scsi_poll(pkt);          /* 0 on success */
    xsp->dumping = 0;

    scsi_destroy_pkt(pkt);
    freerbuf(bp);

    return (rval == 0 ? 0 : EIO);
}
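Example 17–7 uses a Group 1 (10-byte) write command. A Group 0 CDB carries only a 21-bit block address, which with 512-byte blocks reaches only 1 Gbyte, whereas a Group 1 CDB carries a 32-bit block address and a 16-bit block count; that range is one reason to prefer Group 1 for a dump device. The following user-space sketch shows the byte layout of a WRITE(10) CDB, operation code 0x2A, with both fields stored most-significant byte first. The names are hypothetical; the driver itself relies on scsi_setup_cdb(9F).

```c
#include <stdint.h>
#include <string.h>

#define XX_SCMD_WRITE_G1 0x2A   /* WRITE(10) operation code */

/*
 * Hypothetical: build a Group 1 WRITE(10) CDB with a 32-bit block
 * address (bytes 2-5) and a 16-bit block count (bytes 7-8), both
 * big-endian. Flags, group number, and control bytes are zero.
 */
static void
xx_build_write10(uint8_t cdb[10], uint32_t blkno, uint16_t nblks)
{
    memset(cdb, 0, 10);
    cdb[0] = XX_SCMD_WRITE_G1;
    cdb[2] = (uint8_t)(blkno >> 24);
    cdb[3] = (uint8_t)(blkno >> 16);
    cdb[4] = (uint8_t)(blkno >> 8);
    cdb[5] = (uint8_t)blkno;
    cdb[7] = (uint8_t)(nblks >> 8);
    cdb[8] = (uint8_t)nblks;
}
```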

SCSI Options

SCSA defines a global variable, scsi_options, for control and debugging. The defined bits in scsi_options can be found in the file <sys/scsi/conf/autoconf.h>. The bits in scsi_options are used as follows:

SCSI_OPTIONS_DR

Enables global disconnect or reconnect.

SCSI_OPTIONS_FAST

Enables global FAST SCSI support: 10 Mbytes/sec transfers. The HBA should not operate in FAST SCSI mode unless the SCSI_OPTIONS_FAST (0x100) bit is set.

SCSI_OPTIONS_FAST20

Enables global FAST20 SCSI support: 20 Mbytes/sec transfers. The HBA should not operate in FAST20 SCSI mode unless the SCSI_OPTIONS_FAST20 (0x400) bit is set.

SCSI_OPTIONS_FAST40

Enables global FAST40 SCSI support: 40 Mbytes/sec transfers. The HBA should not operate in FAST40 SCSI mode unless the SCSI_OPTIONS_FAST40 (0x800) bit is set.

SCSI_OPTIONS_FAST80

Enables global FAST80 SCSI support: 80 Mbytes/sec transfers. The HBA should not operate in FAST80 SCSI mode unless the SCSI_OPTIONS_FAST80 (0x1000) bit is set.

SCSI_OPTIONS_FAST160

Enables global FAST160 SCSI support: 160 Mbytes/sec transfers. The HBA should not operate in FAST160 SCSI mode unless the SCSI_OPTIONS_FAST160 (0x2000) bit is set.

SCSI_OPTIONS_FAST320

Enables global FAST320 SCSI support: 320 Mbytes/sec transfers. The HBA should not operate in FAST320 SCSI mode unless the SCSI_OPTIONS_FAST320 (0x4000) bit is set.

SCSI_OPTIONS_LINK

Enables global link support.

SCSI_OPTIONS_PARITY

Enables global parity support.

SCSI_OPTIONS_QAS

Enables the Quick Arbitration Select feature. QAS is used to decrease protocol overhead when devices arbitrate for and access the bus. QAS is only supported on Ultra4 (FAST160) SCSI devices, although not all such devices support QAS. The HBA should not operate in QAS SCSI mode unless the SCSI_OPTIONS_QAS (0x100000) bit is set. Consult the appropriate Sun hardware documentation to determine whether your machine supports QAS.

SCSI_OPTIONS_SYNC

Enables global synchronous transfer capability.

SCSI_OPTIONS_TAG

Enables global tagged queuing support.

SCSI_OPTIONS_WIDE

Enables global WIDE SCSI.


Note –

The setting of scsi_options affects all host bus adapter drivers and all target drivers that are present on the system. Refer to the scsi_hba_attach(9F) man page for information on controlling these options for a particular host adapter.
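As an illustration of how an HBA driver might honor these bits, the following user-space sketch picks the fastest transfer mode that is both enabled in an options word and supported by the hardware. The bit values are the ones listed above. The XX_ macro names, the hw_max_rate parameter, and xx_max_fast_rate() are hypothetical, and a real driver would use the SCSI_OPTIONS_* definitions from <sys/scsi/conf/autoconf.h>. The sketch ignores per-target negotiation, which also limits the rate actually used.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values as given in the SCSI options list. */
#define XX_OPT_FAST     0x100
#define XX_OPT_FAST20   0x400
#define XX_OPT_FAST40   0x800
#define XX_OPT_FAST80   0x1000
#define XX_OPT_FAST160  0x2000
#define XX_OPT_FAST320  0x4000

/*
 * Hypothetical: return the fastest rate, in Mbytes/sec, that both the
 * options word and the hardware (hw_max_rate) allow. Returns 0 if no
 * FAST mode may be used.
 */
static unsigned
xx_max_fast_rate(uint32_t options, unsigned hw_max_rate)
{
    static const struct {
        uint32_t bit;
        unsigned rate;
    } modes[] = {
        { XX_OPT_FAST320, 320 }, { XX_OPT_FAST160, 160 },
        { XX_OPT_FAST80, 80 },   { XX_OPT_FAST40, 40 },
        { XX_OPT_FAST20, 20 },   { XX_OPT_FAST, 10 },
    };
    size_t i;

    /* Modes are listed fastest first, so the first match wins. */
    for (i = 0; i < sizeof (modes) / sizeof (modes[0]); i++) {
        if ((options & modes[i].bit) && modes[i].rate <= hw_max_rate)
            return (modes[i].rate);
    }
    return (0);
}
```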


Chapter 18 SCSI Host Bus Adapter Drivers

This chapter contains information on creating SCSI host bus adapter (HBA) drivers. The chapter provides sample code illustrating the structure of a typical HBA driver. The sample code shows the use of the HBA driver interfaces that are provided by the Sun Common SCSI Architecture (SCSA). This chapter provides information on the following subjects:

Introduction to Host Bus Adapter Drivers

As described in Chapter 17, SCSI Target Drivers, the DDI/DKI divides the software interface to SCSI devices into two major parts:

Target device refers to a device on a SCSI bus, such as a disk or a tape drive. Target driver refers to a software component installed as a device driver. Each target device on a SCSI bus is controlled by one instance of the target driver.

Host bus adapter device refers to HBA hardware, such as an SBus or PCI SCSI adapter card. Host bus adapter driver refers to a software component that is installed as a device driver. Some examples are the esp driver on a SPARC machine, the ncrs driver on an x86 machine, and the isp driver, which works on both architectures. An instance of the HBA driver controls each of its host bus adapter devices that are configured in the system.

The Sun Common SCSI Architecture (SCSA) defines the interface between the target and HBA components.


Note –

Understanding SCSI target drivers is an essential prerequisite to writing effective SCSI HBA drivers. For information on SCSI target drivers, see Chapter 17, SCSI Target Drivers. Target driver developers can also benefit from reading this chapter.


The host bus adapter driver is responsible for performing the following tasks:

SCSI Interface

SCSA is the DDI/DKI programming interface for the transmission of SCSI commands from a target driver to a host adapter driver. By conforming to the SCSA, the target driver can easily pass any combination of SCSI commands and sequences to a target device. Knowledge of the hardware implementation of the host adapter is not necessary. Conceptually, SCSA separates the building of a SCSI command from the transporting of the command with data to the SCSI bus. SCSA manages the connections between the target and HBA drivers through an HBA transport layer, as shown in the following figure.

Figure 18–1 SCSA Interface

Diagram shows the host bus adapter transport layer between
a target driver and SCSI devices.

The HBA transport layer is a software and hardware layer that is responsible for transporting a SCSI command to a SCSI target device. The HBA driver provides resource allocation, DMA management, and transport services in response to requests made by SCSI target drivers through SCSA. The host adapter driver also manages the host adapter hardware and the SCSI protocols necessary to perform the commands. When a command has been completed, the HBA driver calls the target driver's SCSI pkt command completion routine.

The following figure illustrates this flow, with emphasis on the transfer of information from target drivers to SCSA to HBA drivers. The figure also shows typical transport entry points and function calls.

Figure 18–2 Transport Layer Flow

Diagram shows how commands flow through the HBA transport
layer.

SCSA HBA Interfaces

SCSA HBA interfaces include HBA entry points, HBA data structures, and an HBA framework.

SCSA HBA Entry Point Summary

SCSA defines a number of HBA driver entry points. These entry points are listed in the following table. The entry points are called by the system when a target driver instance connected to the HBA driver is configured. The entry points are also called when the target driver makes a SCSA request. See Entry Points for SCSA HBA Drivers for more information.

Table 18–1 SCSA HBA Entry Point Summary

Function Name             Called as a Result of
tran_abort(9E)            Target driver calling scsi_abort(9F)
tran_bus_reset(9E)        System resetting bus
tran_destroy_pkt(9E)      Target driver calling scsi_destroy_pkt(9F)
tran_dmafree(9E)          Target driver calling scsi_dmafree(9F)
tran_getcap(9E)           Target driver calling scsi_ifgetcap(9F)
tran_init_pkt(9E)         Target driver calling scsi_init_pkt(9F)
tran_quiesce(9E)          System quiescing bus
tran_reset(9E)            Target driver calling scsi_reset(9F)
tran_reset_notify(9E)     Target driver calling scsi_reset_notify(9F)
tran_setcap(9E)           Target driver calling scsi_ifsetcap(9F)
tran_start(9E)            Target driver calling scsi_transport(9F)
tran_sync_pkt(9E)         Target driver calling scsi_sync_pkt(9F)
tran_tgt_free(9E)         System detaching target device instance
tran_tgt_init(9E)         System attaching target device instance
tran_tgt_probe(9E)        Target driver calling scsi_probe(9F)
tran_unquiesce(9E)        System resuming activity on bus

SCSA HBA Data Structures

SCSA defines data structures to enable the exchange of information between the target and HBA drivers. The following data structures are included:

scsi_hba_tran() Structure

Each instance of an HBA driver must allocate a scsi_hba_tran(9S) structure by using the scsi_hba_tran_alloc(9F) function in the attach(9E) entry point. The scsi_hba_tran_alloc() function initializes the scsi_hba_tran structure. The HBA driver must initialize specific vectors in the transport structure to point to entry points within the HBA driver. After the scsi_hba_tran structure is initialized, the HBA driver exports the transport structure to SCSA by calling the scsi_hba_attach_setup(9F) function.


Caution –

Because SCSA keeps a pointer to the transport structure in the driver-private field on the devinfo node, HBA drivers must not use ddi_set_driver_private(9F). HBA drivers can, however, use ddi_get_driver_private(9F) to retrieve the pointer to the transport structure.


The SCSA interfaces require the HBA driver to supply a number of entry points that are callable through the scsi_hba_tran structure. See Entry Points for SCSA HBA Drivers for more information.

The scsi_hba_tran structure contains the following fields:

struct scsi_hba_tran {
    dev_info_t          *tran_hba_dip;          /* HBA's dev_info pointer */
    void                *tran_hba_private;      /* HBA softstate */
    void                *tran_tgt_private;      /* HBA target private pointer */
    struct scsi_device  *tran_sd;               /* scsi_device */
    int                 (*tran_tgt_init)();     /* Transport target */
                                                /* Initialization */
    int                 (*tran_tgt_probe)();    /* Transport target probe */
    void                (*tran_tgt_free)();     /* Transport target free */
    int                 (*tran_start)();        /* Transport start */
    int                 (*tran_reset)();        /* Transport reset */
    int                 (*tran_abort)();        /* Transport abort */
    int                 (*tran_getcap)();       /* Capability retrieval */
    int                 (*tran_setcap)();       /* Capability establishment */
    struct scsi_pkt     *(*tran_init_pkt)();    /* Packet and DMA allocation */
    void                (*tran_destroy_pkt)();  /* Packet and DMA */
                                                /* Deallocation */
    void                (*tran_dmafree)();      /* DMA deallocation */
    void                (*tran_sync_pkt)();     /* Sync DMA */
    void                (*tran_reset_notify)(); /* Bus reset notification */
    int                 (*tran_bus_reset)();    /* Reset bus only */
    int                 (*tran_quiesce)();      /* Quiesce a bus */
    int                 (*tran_unquiesce)();    /* Unquiesce a bus */
    int                 tran_interconnect_type; /* transport interconnect */
};

The following descriptions give more information about these scsi_hba_tran structure fields:

tran_hba_dip

Pointer to the HBA device instance dev_info structure. The function scsi_hba_attach_setup(9F) sets this field.

tran_hba_private

Pointer to private data maintained by the HBA driver. Usually, tran_hba_private contains a pointer to the state structure of the HBA driver.

tran_tgt_private

Pointer to private data maintained by the HBA driver when using cloning. By specifying SCSI_HBA_TRAN_CLONE when calling scsi_hba_attach_setup(9F), the scsi_hba_tran(9S) structure is cloned once per target. This approach enables the HBA to initialize this field to point to a per-target instance data structure in the tran_tgt_init(9E) entry point. If SCSI_HBA_TRAN_CLONE is not specified, tran_tgt_private is NULL, and tran_tgt_private must not be referenced. See Transport Structure Cloning for more information.

tran_sd

Pointer to a per-target instance scsi_device(9S) structure used when cloning. If SCSI_HBA_TRAN_CLONE is passed to scsi_hba_attach_setup(9F), tran_sd is initialized to point to the per-target scsi_device structure. This initialization takes place before any HBA functions are called on behalf of that target. If SCSI_HBA_TRAN_CLONE is not specified, tran_sd is NULL, and tran_sd must not be referenced. See Transport Structure Cloning for more information.

tran_tgt_init

Pointer to the HBA driver entry point that is called when initializing a target device instance. If no per-target initialization is required, the HBA can leave tran_tgt_init set to NULL.

tran_tgt_probe

Pointer to the HBA driver entry point that is called when a target driver instance calls scsi_probe(9F). This routine is called to probe for the existence of a target device. If no target probing customization is required for this HBA, the HBA should set tran_tgt_probe to scsi_hba_probe(9F).

tran_tgt_free

Pointer to the HBA driver entry point that is called when a target device instance is destroyed. If no per-target deallocation is necessary, the HBA can leave tran_tgt_free set to NULL.

tran_start

Pointer to the HBA driver entry point that is called when a target driver calls scsi_transport(9F).

tran_reset

Pointer to the HBA driver entry point that is called when a target driver calls scsi_reset(9F).

tran_abort

Pointer to the HBA driver entry point that is called when a target driver calls scsi_abort(9F).

tran_getcap

Pointer to the HBA driver entry point that is called when a target driver calls scsi_ifgetcap(9F).

tran_setcap

Pointer to the HBA driver entry point that is called when a target driver calls scsi_ifsetcap(9F).

tran_init_pkt

Pointer to the HBA driver entry point that is called when a target driver calls scsi_init_pkt(9F).

tran_destroy_pkt

Pointer to the HBA driver entry point that is called when a target driver calls scsi_destroy_pkt(9F).

tran_dmafree

Pointer to the HBA driver entry point that is called when a target driver calls scsi_dmafree(9F).

tran_sync_pkt

Pointer to the HBA driver entry point that is called when a target driver calls scsi_sync_pkt(9F).

tran_reset_notify

Pointer to the HBA driver entry point that is called when a target driver calls scsi_reset_notify(9F).

tran_bus_reset

Pointer to the HBA driver entry point that resets the SCSI bus without resetting targets.

tran_quiesce

Pointer to the HBA driver entry point that waits for all outstanding commands to complete and blocks (or queues) any new I/O requests.

tran_unquiesce

Pointer to the HBA driver entry point that allows I/O activity to resume on the SCSI bus.

tran_interconnect_type

Integer value denoting the interconnect type of the transport, as defined in the services.h header file.

scsi_address Structure

The scsi_address(9S) structure provides transport and addressing information for each SCSI command that is allocated and transported by a target driver instance.

The scsi_address structure contains the following fields:

struct scsi_address {
    struct scsi_hba_tran    *a_hba_tran;    /* Transport vectors */
    ushort_t                a_target;       /* Target identifier */
    uchar_t                 a_lun;          /* LUN on that target */
    uchar_t                 a_sublun;       /* Sub LUN on that LUN */
                                            /* Not used */
};

a_hba_tran

Pointer to the scsi_hba_tran(9S) structure, as allocated and initialized by the HBA driver. If SCSI_HBA_TRAN_CLONE was specified as the flag to scsi_hba_attach_setup(9F), a_hba_tran points to a copy of that structure.

a_target

Identifies the SCSI target on the SCSI bus.

a_lun

Identifies the SCSI logical unit on the SCSI target.

scsi_device Structure

The HBA framework allocates and initializes a scsi_device(9S) structure for each instance of a target device. The allocation and initialization occur before the framework calls the HBA driver's tran_tgt_init(9E) entry point. This structure stores information about each SCSI logical unit, including pointers to information areas that contain both generic and device-specific information. One scsi_device(9S) structure exists for each target device instance that is attached to the system.

If the per-target initialization is successful, the HBA framework sets the target driver's per-instance private data to point to the scsi_device(9S) structure, using ddi_set_driver_private(9F). Note that an initialization is successful if tran_tgt_init() returns success or if the vector is null.

The scsi_device(9S) structure contains the following fields:

struct scsi_device {
    struct scsi_address           sd_address;    /* routing information */
    dev_info_t                    *sd_dev;       /* device dev_info node */
    kmutex_t                      sd_mutex;      /* mutex used by device */
    void                          *sd_reserved;
    struct scsi_inquiry           *sd_inq;
    struct scsi_extended_sense    *sd_sense;
    caddr_t                       sd_private;    /* for driver's use */
};

where:

sd_address

Data structure that is passed to the routines for SCSI resource allocation.

sd_dev

Pointer to the target's dev_info structure.

sd_mutex

Mutex for use by the target driver. This mutex is initialized by the HBA framework. The mutex can be used by the target driver as a per-device mutex. This mutex should not be held across a call to scsi_transport(9F) or scsi_poll(9F). See Chapter 3, Multithreading for more information on mutexes.

sd_inq

Pointer to the target device's SCSI inquiry data. The scsi_probe(9F) routine allocates a buffer, fills the buffer in, and attaches the buffer to this field.

sd_sense

Pointer to a buffer to contain request sense data from the device. The target driver must allocate and manage this buffer itself. See the target driver's attach(9E) routine in attach() Entry Point for more information.

sd_private

Pointer field for use by the target driver. This field is commonly used to store a pointer to a private target driver state structure.

scsi_pkt Structure (HBA)

To execute SCSI commands, a target driver must first allocate a scsi_pkt(9S) structure for the command. The target driver must then specify its own private data area length, the command status, and the command length. The HBA driver is responsible for implementing the packet allocation in the tran_init_pkt(9E) entry point. The HBA driver is also responsible for freeing the packet in its tran_destroy_pkt(9E) entry point. See scsi_pkt Structure (Target Drivers) for more information.

The scsi_pkt(9S) structure contains these fields:

struct scsi_pkt {
    opaque_t pkt_ha_private;             /* private data for host adapter */
    struct scsi_address pkt_address;     /* destination address */
    opaque_t pkt_private;                /* private data for target driver */
    void (*pkt_comp)(struct scsi_pkt *); /* completion routine */
    uint_t  pkt_flags;                   /* flags */
    int     pkt_time;                    /* time allotted to complete command */
    uchar_t *pkt_scbp;                   /* pointer to status block */
    uchar_t *pkt_cdbp;                   /* pointer to command block */
    ssize_t pkt_resid;                   /* data bytes not transferred */
    uint_t  pkt_state;                   /* state of command */
    uint_t  pkt_statistics;              /* statistics */
    uchar_t pkt_reason;                  /* reason completion called */
};

where:

pkt_ha_private

Pointer to per-command HBA-driver private data.

pkt_address

The scsi_address(9S) structure, embedded in the packet, that provides address information for this command.

pkt_private

Pointer to per-packet target-driver private data.

pkt_comp

Pointer to the target-driver completion routine called by the HBA driver when the transport layer has completed this command.

pkt_flags

Flags for the command.

pkt_time

Specifies the completion timeout in seconds for the command.

pkt_scbp

Pointer to the status completion block for the command.

pkt_cdbp

Pointer to the command descriptor block (CDB) for the command.

pkt_resid

Count of the data bytes that were not transferred when the command completed. This field can also be used to specify the amount of data for which resources have not been allocated. The HBA must modify this field during transport.

pkt_state

State of the command. The HBA must modify this field during transport.

pkt_statistics

Provides a history of the events that the command experienced while in the transport layer. The HBA must modify this field during transport.

pkt_reason

Reason for command completion. The HBA must modify this field during transport.
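To make this division of labor concrete, the following user-space sketch models how an HBA completion path might fill in pkt_resid, pkt_state, and pkt_reason before calling pkt_comp. It is not kernel code: struct mock_pkt, the MOCK_* values, and both functions are hypothetical stand-ins, not the definitions in <sys/scsi/scsi.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>      /* ssize_t */

/* Hypothetical stand-ins for the real pkt_reason and pkt_state values. */
#define MOCK_CMD_CMPLT      0       /* command completed */
#define MOCK_CMD_INCOMPLETE 1       /* command did not complete */
#define MOCK_STATE_GOT_BUS  0x01
#define MOCK_STATE_SENT_CMD 0x02
#define MOCK_STATE_XFERRED  0x04

struct mock_pkt {
    void          (*pkt_comp)(struct mock_pkt *);  /* target completion */
    ssize_t       pkt_resid;
    unsigned int  pkt_state;
    unsigned int  pkt_statistics;
    unsigned char pkt_reason;
    int           completions;      /* test hook: times pkt_comp ran */
};

/*
 * Hypothetical HBA completion step: the hardware reports that 'moved'
 * of 'requested' bytes were transferred.
 */
static void
mock_hba_complete(struct mock_pkt *pkt, size_t requested, size_t moved)
{
    assert(moved <= requested);
    pkt->pkt_state = MOCK_STATE_GOT_BUS | MOCK_STATE_SENT_CMD;
    if (moved > 0)
        pkt->pkt_state |= MOCK_STATE_XFERRED;
    pkt->pkt_resid = (ssize_t)(requested - moved);  /* bytes NOT moved */
    pkt->pkt_reason = MOCK_CMD_CMPLT;
    if (pkt->pkt_comp != NULL)
        pkt->pkt_comp(pkt);         /* hand the packet back to the target */
}

/* Hypothetical target driver completion routine. */
static void
mock_target_done(struct mock_pkt *pkt)
{
    pkt->completions++;
}
```

In a real driver the equivalent step runs from the interrupt path, and the HBA would also record transport events in pkt_statistics.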

Per-Target Instance Data

An HBA driver must allocate a scsi_hba_tran(9S) structure during attach(9E). The HBA driver must then initialize the vectors in this transport structure to point to the required entry points for the HBA driver. This scsi_hba_tran structure is then passed into scsi_hba_attach_setup(9F).

The scsi_hba_tran structure contains a tran_hba_private field, which can be used to refer to the HBA driver's per-instance state.

Each scsi_address(9S) structure contains a pointer to the scsi_hba_tran structure. In addition, the scsi_address structure provides the target (a_target) and logical unit (a_lun) addresses for the particular target device. Each entry point for the HBA driver is passed a pointer to the scsi_address structure, either directly or indirectly through the scsi_device(9S) structure. As a result, the HBA driver can reference its own state and can identify the target device that is addressed.

The following figure illustrates the HBA data structures for transport operations.

Figure 18–3 HBA Transport Structures

Diagram shows the relationships of structures involved in the HBA transport layer.
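The lookup path described above can be modeled in user space. In this sketch, all mock_* names are hypothetical and the real structures have many more fields. Two scsi_address instances for different targets share one transport structure, so a single entry point reaches the same HBA soft state and uses a_target to tell the targets apart. The 16-entry array assumes wide SCSI target IDs 0 through 15.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down models of the SCSA structures. */
struct mock_hba {                   /* HBA per-instance soft state */
    int instance;
    int cmds_started[16];           /* per-target counter (assumed 16 IDs) */
};

struct mock_hba_tran {
    void *tran_hba_private;         /* points at struct mock_hba */
};

struct mock_scsi_address {
    struct mock_hba_tran *a_hba_tran;
    unsigned short        a_target;
    unsigned char         a_lun;
};

/*
 * Shape of an HBA entry point: everything it needs is reachable
 * from the scsi_address it is handed.
 */
static int
mock_tran_start(struct mock_scsi_address *ap)
{
    struct mock_hba *hba = ap->a_hba_tran->tran_hba_private;

    if (ap->a_target >= 16)
        return (-1);                /* not a target this HBA supports */
    hba->cmds_started[ap->a_target]++;
    return (0);
}
```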

Transport Structure Cloning

Cloning can be useful if an HBA driver needs to maintain per-target private data in the scsi_hba_tran(9S) structure. Cloning can also be used to maintain a more complex address than is provided in the scsi_address(9S) structure.

In the cloning process, the HBA driver must still allocate a scsi_hba_tran structure at attach(9E) time. The HBA driver must also initialize the tran_hba_private soft state pointer and the entry point vectors for the HBA driver. The difference occurs when the framework begins to connect an instance of a target driver to the HBA driver. Before calling the HBA driver's tran_tgt_init(9E) entry point, the framework clones the scsi_hba_tran structure that is associated with that instance of the HBA. Accordingly, each scsi_address structure that is allocated and initialized for a particular target device instance points to a per-target instance copy of the scsi_hba_tran structure. The scsi_address structures do not point to the scsi_hba_tran structure that is allocated by the HBA driver at attach() time.

An HBA driver can use two important pointers when cloning is specified. These pointers are contained in the scsi_hba_tran structure. The first pointer is the tran_tgt_private field, which the driver can use to point to per-target HBA private data. The tran_tgt_private pointer is useful, for example, if an HBA driver needs to maintain a more complex address than a_target and a_lun provide. The second pointer is the tran_sd field, which is a pointer to the scsi_device(9S) structure referring to the particular target device.

When specifying cloning, the HBA driver must allocate and initialize the per-target data. The HBA driver must then initialize the tran_tgt_private field to point to this data during its tran_tgt_init(9E) entry point. The HBA driver must free this per-target data during its tran_tgt_free(9E) entry point.

When cloning, the framework initializes the tran_sd field to point to the scsi_device structure before the HBA driver tran_tgt_init() entry point is called. The driver requests cloning by passing the SCSI_HBA_TRAN_CLONE flag to scsi_hba_attach_setup(9F). The following figure illustrates the HBA data structures for cloning transport operations.

Figure 18–4 Cloning Transport Operation

Diagram shows an example of cloned HBA structures.
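The cloning sequence can be sketched in user space as follows. This is a model under stated assumptions, not the SCSA implementation: mock_attach_target() stands in for the framework (copy the template, set tran_sd, then call tran_tgt_init), mock_tgt_init() stands in for a cloning HBA's tran_tgt_init(9E), and mock_detach_target() combines tran_tgt_free(9E) with the framework releasing the clone. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mock_device;                 /* stands in for scsi_device */

struct mock_hba_tran {
    void               *tran_hba_private;   /* shared HBA soft state */
    void               *tran_tgt_private;   /* per-target, clones only */
    struct mock_device *tran_sd;            /* per-target, clones only */
    int               (*tran_tgt_init)(struct mock_hba_tran *);
};

struct mock_device {
    int                   target;
    struct mock_hba_tran *tran;     /* what sd_address.a_hba_tran would hold */
};

struct mock_tgt_priv {              /* hypothetical per-target HBA data */
    int target;
};

/* What a cloning HBA's tran_tgt_init might do. */
static int
mock_tgt_init(struct mock_hba_tran *clone)
{
    struct mock_tgt_priv *tp = malloc(sizeof (*tp));

    if (tp == NULL)
        return (-1);
    tp->target = clone->tran_sd->target;
    clone->tran_tgt_private = tp;
    return (0);
}

/* Framework side: copy the template, wire up tran_sd, then call init. */
static int
mock_attach_target(const struct mock_hba_tran *tmpl, struct mock_device *sd)
{
    struct mock_hba_tran *clone = malloc(sizeof (*clone));

    if (clone == NULL)
        return (-1);
    memcpy(clone, tmpl, sizeof (*clone));   /* vectors + hba_private */
    clone->tran_sd = sd;                    /* set BEFORE tran_tgt_init */
    if (clone->tran_tgt_init != NULL && clone->tran_tgt_init(clone) != 0) {
        free(clone);
        return (-1);
    }
    sd->tran = clone;
    return (0);
}

/* Counterpart of tran_tgt_free plus the framework freeing the clone. */
static void
mock_detach_target(struct mock_device *sd)
{
    free(sd->tran->tran_tgt_private);
    free(sd->tran);
    sd->tran = NULL;
}
```

Because each target gets its own copy, setting tran_tgt_private in one clone leaves the template and the other clones untouched, while tran_hba_private in every clone still points at the shared HBA soft state.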

SCSA HBA Functions

SCSA also provides a number of functions for use by HBA drivers. These functions are listed in the following table, together with the driver entry point from which each function is called.

Table 18–2 SCSA HBA Functions

Function Name                 Called by Driver Entry Point
scsi_hba_init(9F)             _init(9E)
scsi_hba_fini(9F)             _fini(9E)
scsi_hba_attach_setup(9F)     attach(9E)
scsi_hba_detach(9F)           detach(9E)
scsi_hba_tran_alloc(9F)       attach(9E)
scsi_hba_tran_free(9F)        detach(9E)
scsi_hba_probe(9F)            tran_tgt_probe(9E)
scsi_hba_pkt_alloc(9F)        tran_init_pkt(9E)
scsi_hba_pkt_free(9F)         tran_destroy_pkt(9E)
scsi_hba_lookup_capstr(9F)    tran_getcap(9E) and tran_setcap(9E)

HBA Driver Dependency and Configuration Issues

In addition to incorporating SCSA HBA entry points, structures, and functions into a driver, a developer must deal with driver dependency and configuration issues. These issues involve configuration properties, dependency declarations, state structure and per-command structure, entry points for module initialization, and autoconfiguration entry points.

Declarations and Structures

HBA drivers must include the following header files:

#include <sys/scsi/scsi.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

To inform the system that the module depends on SCSA routines, the driver binary must be generated with the following command. See SCSA HBA Interfaces for more information on SCSA routines.


% ld -r xx.o -o xx -N "misc/scsi"

The code samples are derived from a simplified isp driver for the QLogic Intelligent SCSI Peripheral device. The isp driver supports WIDE SCSI, with up to 15 target devices and 8 logical units (LUNs) per target.

Per-Command Structure

An HBA driver usually needs to define a structure to maintain state for each command submitted by a target driver. The layout of this per-command structure is entirely up to the device driver writer. The layout needs to reflect the capabilities and features of the hardware and the software algorithms that are used in the driver.

The following structure is an example of a per-command structure. The remaining code fragments of this chapter use this structure to illustrate the HBA interfaces.

struct isp_cmd {
     struct isp_request     cmd_isp_request;
     struct isp_response    cmd_isp_response;
     struct scsi_pkt        *cmd_pkt;
     struct isp_cmd         *cmd_forw;
     uint32_t               cmd_dmacount;
     ddi_dma_handle_t       cmd_dmahandle;
     uint_t                 cmd_cookie;
     uint_t                 cmd_ncookies;
     uint_t                 cmd_cookiecnt;
     uint_t                 cmd_nwin;
     uint_t                 cmd_curwin;
     off_t                  cmd_dma_offset;
     uint_t                 cmd_dma_len;
     ddi_dma_cookie_t       cmd_dmacookies[ISP_NDATASEGS];
     u_int                  cmd_flags;
     u_short                cmd_slot;
     u_int                  cmd_cdblen;
     u_int                  cmd_scblen;
 };
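The source does not say how cmd_forw is used. One plausible use, suggested by its name, is as the forward link of a singly linked FIFO of commands waiting for a free request slot. The following user-space sketch assumes that use; struct mock_cmd and struct mock_cmdq are hypothetical. In a driver, such a queue would be protected by the HBA's mutex.

```c
#include <assert.h>
#include <stddef.h>

struct mock_cmd {                   /* trimmed stand-in for struct isp_cmd */
    struct mock_cmd *cmd_forw;      /* forward link */
    unsigned short   cmd_slot;
};

struct mock_cmdq {                  /* hypothetical FIFO of waiting commands */
    struct mock_cmd *head;
    struct mock_cmd *tail;
};

/* Append a command at the tail of the queue. */
static void
mock_cmdq_put(struct mock_cmdq *q, struct mock_cmd *cmd)
{
    cmd->cmd_forw = NULL;
    if (q->tail == NULL)
        q->head = cmd;
    else
        q->tail->cmd_forw = cmd;
    q->tail = cmd;
}

/* Remove and return the oldest command, or NULL if the queue is empty. */
static struct mock_cmd *
mock_cmdq_get(struct mock_cmdq *q)
{
    struct mock_cmd *cmd = q->head;

    if (cmd != NULL) {
        q->head = cmd->cmd_forw;
        if (q->head == NULL)
            q->tail = NULL;
        cmd->cmd_forw = NULL;
    }
    return (cmd);
}
```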

Entry Points for Module Initialization

This section describes the entry points for operations that are performed by SCSI HBA drivers.

The dev_ops(9S) structure in Example 18–1 is representative of a SCSI HBA driver. The driver must initialize the devo_bus_ops field in this structure to NULL. A SCSI HBA driver can provide leaf driver interfaces for special purposes, in which case the devo_cb_ops field might point to a cb_ops(9S) structure. In that example, no leaf driver interfaces are exported, so the devo_cb_ops field is initialized to NULL.

_init() Entry Point (SCSI HBA Drivers)

The _init(9E) function initializes a loadable module. _init() is called before any other routine in the loadable module.

In a SCSI HBA driver, the _init() function must call scsi_hba_init(9F) to inform the framework of the existence of the HBA driver before calling mod_install(9F). If scsi_hba_init() returns a nonzero value, _init() should return this value. Otherwise, _init() must return the value returned by mod_install(9F).

The driver should initialize any required global state before calling mod_install(9F).

If mod_install() fails, the _init() function must free any global resources allocated. _init() must call scsi_hba_fini(9F) before returning.

The following example uses a global mutex to show how to allocate data that is global to all instances of a driver. The code declares a global mutex and soft-state structure information. The global mutex and soft state are initialized during _init().

_fini() Entry Point (SCSI HBA Drivers)

The _fini(9E) function is called when the system is about to try to unload the SCSI HBA driver. The _fini() function must call mod_remove(9F) to determine whether the driver can be unloaded. If mod_remove() returns 0, the module can be unloaded. The HBA driver must deallocate any global resources allocated in _init(9E). The HBA driver must also call scsi_hba_fini(9F).

_fini() must return the value returned by mod_remove().


Note –

The HBA driver must not free any resources or call scsi_hba_fini(9F) unless mod_remove(9F) returns 0.


Example 18–1 shows module initialization for SCSI HBA.


Example 18–1 Module Initialization for SCSI HBA

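static struct dev_ops isp_dev_ops = {
    DEVO_REV,       /* devo_rev */
    0,              /* refcnt  */
    isp_getinfo,    /* getinfo */
    nulldev,        /* identify (obsolete) */
    nulldev,        /* probe */
    isp_attach,     /* attach */
    isp_detach,     /* detach */
    nodev,          /* reset */
    NULL,           /* driver operations: no cb_ops exported */
    NULL,           /* bus operations: must be NULL */
    isp_power,      /* power management */
};

static struct modldrv modldrv = {
    &mod_driverops,             /* module type: device driver */
    "ISP SCSI HBA driver",      /* module description */
    &isp_dev_ops                /* driver ops */
};

static struct modlinkage modlinkage = {
    MODREV_1, (void *)&modldrv, NULL
};

/*
 * Local static data
 */
static kmutex_t      isp_global_mutex;
static void          *isp_state;

int
_init(void)
{
    int     err;

    if ((err = ddi_soft_state_init(&isp_state,
        sizeof (struct isp), 0)) != 0) {
        return (err);
    }
    if ((err = scsi_hba_init(&modlinkage)) != 0) {
        /* undo the soft-state setup before failing */
        ddi_soft_state_fini(&isp_state);
        return (err);
    }
    mutex_init(&isp_global_mutex, "isp global mutex",
        MUTEX_DRIVER, NULL);
    if ((err = mod_install(&modlinkage)) != 0) {
        /* unwind in the reverse order of setup */
        mutex_destroy(&isp_global_mutex);
        scsi_hba_fini(&modlinkage);
        ddi_soft_state_fini(&isp_state);
    }
    return (err);
}

int
_fini(void)
{
    int     err;

    /* free nothing unless the module can actually be removed */
    if ((err = mod_remove(&modlinkage)) == 0) {
        mutex_destroy(&isp_global_mutex);
        scsi_hba_fini(&modlinkage);
        ddi_soft_state_fini(&isp_state);
    }
    return (err);
}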

Autoconfiguration Entry Points

Associated with each device driver is a dev_ops(9S) structure, which enables the kernel to locate the autoconfiguration entry points of the driver. A complete description of these autoconfiguration routines is given in Chapter 6, Driver Autoconfiguration. This section describes only those entry points associated with operations performed by SCSI HBA drivers. These entry points include attach(9E) and detach(9E).

attach() Entry Point (SCSI HBA Drivers)

The attach(9E) entry point for a SCSI HBA driver performs several tasks when configuring and attaching an instance of the driver for the device. For a typical driver of real devices, the following operating system and hardware concerns must be addressed:

Soft-State Structure

When allocating the per-device-instance soft-state structure, a driver must clean up carefully if an error occurs.
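One common way to keep that cleanup correct is to unwind in the reverse order of setup with goto labels. The following user-space sketch models the pattern; the four steps loosely mirror the tasks in this section, and mock_attach() with its fail_at test hook is hypothetical rather than part of the DDI.

```c
#include <assert.h>
#include <stdbool.h>

/* Records which setup steps are currently "live" (test hook). */
struct mock_attach_state {
    bool soft_state, tran, regs, intr;
};

/*
 * fail_at selects a step to fail (0 = none, 1..4 = that step), so that
 * every unwind path can be exercised. Steps: soft state, transport
 * structure, register mapping, interrupt handler.
 */
static int
mock_attach(struct mock_attach_state *s, int fail_at)
{
    if (fail_at == 1)
        return (-1);                /* nothing allocated yet */
    s->soft_state = true;

    if (fail_at == 2)
        goto fail_soft_state;
    s->tran = true;

    if (fail_at == 3)
        goto fail_tran;
    s->regs = true;

    if (fail_at == 4)
        goto fail_regs;
    s->intr = true;
    return (0);

    /* Unwind in the reverse order of setup. */
fail_regs:
    s->regs = false;
fail_tran:
    s->tran = false;
fail_soft_state:
    s->soft_state = false;
    return (-1);
}
```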

DMA

The HBA driver must describe the attributes of its DMA engine by properly initializing the ddi_dma_attr_t structure.

static ddi_dma_attr_t isp_dma_attr = {
     DMA_ATTR_V0,        /* ddi_dma_attr version */
     0,                  /* low address */
     0xffffffff,         /* high address */
     0x00ffffff,         /* counter upper bound */
     1,                  /* alignment requirements */
     0x3f,               /* burst sizes */
     1,                  /* minimum DMA access */
     0xffffffff,         /* maximum DMA access */
     (1<<24)-1,          /* segment boundary restrictions */
     1,                  /* scatter-gather list length */
     512,                /* device granularity */
     0                   /* DMA flags */
};
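The DDI DMA framework, not the driver, enforces these attributes when it builds DMA cookies. As a rough illustration of what three of the fields mean, the following user-space sketch checks a single cookie against the high address, the count limit (dma_attr_count_max is expressed as the maximum count minus one), and the 16-Mbyte segment boundary taken from isp_dma_attr. isp_cookie_ok() is hypothetical, and it ignores alignment, burst sizes, granularity, and the scatter-gather list length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from isp_dma_attr; the low address is 0. */
#define ISP_ADDR_HI   0xffffffffULL         /* high address */
#define ISP_COUNT_MAX 0x00ffffffULL         /* counter upper bound */
#define ISP_SEG       ((1ULL << 24) - 1)    /* segment boundary */

/*
 * Would a single DMA cookie [addr, addr + len) satisfy these limits?
 * Models the intent of the attributes only.
 */
static bool
isp_cookie_ok(uint64_t addr, uint64_t len)
{
    uint64_t last;

    if (len == 0)
        return (false);
    last = addr + len - 1;
    if (last > ISP_ADDR_HI)
        return (false);             /* beyond the addressable range */
    if (len - 1 > ISP_COUNT_MAX)
        return (false);             /* transfer counter would overflow */
    /* The cookie must not cross a 16-Mbyte (2^24) boundary. */
    return ((addr & ~ISP_SEG) == (last & ~ISP_SEG));
}
```

For example, a 0x2000-byte cookie starting at 0x00fff000 fails because it straddles the 16-Mbyte boundary at 0x01000000, while a full 16-Mbyte cookie starting exactly on that boundary passes.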

The driver, if providing DMA, should also check that its hardware is installed in a DMA-capable slot:

if (ddi_slaveonly(dip) == DDI_SUCCESS) {
    return (DDI_FAILURE);
}

Transport Structure

The driver should then allocate and initialize a transport structure for this instance. The tran_hba_private field is set to point to this instance's soft-state structure. If no special probe customization is needed, the tran_tgt_probe field is set to scsi_hba_probe(9F) to get the default behavior.

tran = scsi_hba_tran_alloc(dip, SCSI_HBA_CANSLEEP);

isp->isp_tran                   = tran;
isp->isp_dip                    = dip;

tran->tran_hba_private          = isp;
tran->tran_tgt_private          = NULL;
tran->tran_tgt_init             = isp_tran_tgt_init;
tran->tran_tgt_probe            = scsi_hba_probe;
tran->tran_tgt_free             = (void (*)())NULL;

tran->tran_start                = isp_scsi_start;
tran->tran_abort                = isp_scsi_abort;
tran->tran_reset                = isp_scsi_reset;
tran->tran_getcap               = isp_scsi_getcap;
tran->tran_setcap               = isp_scsi_setcap;
tran->tran_init_pkt             = isp_scsi_init_pkt;
tran->tran_destroy_pkt          = isp_scsi_destroy_pkt;
tran->tran_dmafree              = isp_scsi_dmafree;
tran->tran_sync_pkt             = isp_scsi_sync_pkt;
tran->tran_reset_notify         = isp_scsi_reset_notify;
tran->tran_quiesce              = isp_tran_bus_quiesce;
tran->tran_unquiesce            = isp_tran_bus_unquiesce;
tran->tran_bus_reset            = isp_tran_bus_reset;
/* an integer interconnect type as defined in services.h */
tran->tran_interconnect_type    = isp_tran_interconnect_type;

Attaching an HBA Driver

The driver should attach this instance of the device, and perform error cleanup if necessary.

i = scsi_hba_attach_setup(dip, &isp_dma_attr, tran, 0);
if (i != DDI_SUCCESS) {
    /* do error recovery */
    return (DDI_FAILURE);
}

Register Mapping

The driver should map in its device's registers. The driver needs to specify the following device access attributes:

ddi_device_acc_attr_t    dev_attributes;

dev_attributes.devacc_attr_version = DDI_DEVICE_ATTR_V0;
dev_attributes.devacc_attr_dataorder = DDI_STRICTORDER_ACC;
dev_attributes.devacc_attr_endian_flags = DDI_STRUCTURE_LE_ACC;

if (ddi_regs_map_setup(dip, 0, (caddr_t *)&isp->isp_reg,
    0, sizeof (struct ispregs), &dev_attributes,
    &isp->isp_acc_handle) != DDI_SUCCESS) {
    /* do error recovery */
    return (DDI_FAILURE);
}

Adding an Interrupt Handler

The driver must first obtain the iblock cookie to initialize any mutexes that are used in the driver handler. Only after those mutexes have been initialized can the interrupt handler be added.

i = ddi_get_iblock_cookie(dip, 0, &isp->iblock_cookie);
if (i != DDI_SUCCESS) {
    /* do error recovery */
    return (DDI_FAILURE);
}

mutex_init(&isp->mutex, "isp_mutex", MUTEX_DRIVER,
    (void *)isp->iblock_cookie);
i = ddi_add_intr(dip, 0, &isp->iblock_cookie,
    0, isp_intr, (caddr_t)isp);
if (i != DDI_SUCCESS) {
    /* do error recovery */
    return (DDI_FAILURE);
}

If a high-level interrupt handler is required, the driver should be coded to provide one. A driver that does not provide a high-level handler must be able to fail the attach when its interrupt is high level. See Handling High-Level Interrupts for a description of high-level interrupt handling.

Create Power Manageable Components

With power management, if the host bus adapter needs to power down only when all target devices are at power level 0, the HBA driver needs to provide only a power(9E) entry point. Refer to Chapter 12, Power Management. The HBA driver also needs to create a pm-components(9P) property that describes the components that the device implements.

Nothing more is necessary, because the components default to idle, and the power management framework's default dependency processing ensures that the host bus adapter is powered up whenever a target device is powered up. Provided that automatic power management is enabled, this processing also powers down the host bus adapter when all target devices are powered down.
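The default dependency rule described above can be modeled as a function of the target power levels. The following user-space sketch assumes a two-level HBA component (0 = off, 1 = on), such as one described by a pm-components(9P) property with the entries NAME=isp, 0=Off, and 1=On (an assumed example). mock_hba_required_level() is hypothetical and does not describe how the framework is implemented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The HBA must be on (level 1) while any target is above level 0,
 * and may drop to level 0 only when every target is at level 0.
 */
static int
mock_hba_required_level(const int *target_levels, size_t ntargets)
{
    for (size_t i = 0; i < ntargets; i++) {
        if (target_levels[i] > 0)
            return (1);
    }
    return (0);
}
```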

Report Attachment Status

Finally, the driver should report that this instance of the device is attached and return success.

ddi_report_dev(dip);
return (DDI_SUCCESS);

detach() Entry Point (SCSI HBA Drivers)

The driver should perform standard detach operations, including calling scsi_hba_detach(9F).

Entry Points for SCSA HBA Drivers

An HBA driver can work with target drivers through the SCSA interface. The SCSA interfaces require the HBA driver to supply a number of entry points that are callable through the scsi_hba_tran(9S) structure.

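These entry points fall into six functional groups: target driver instance initialization, resource allocation, command transport, capability management, abort and reset, and dynamic reconfiguration.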

The following table lists the entry points for SCSA HBA by function groups.

Table 18–3 SCSA Entry Points

Function Groups                         Entry Points Within Group  Description
Target Driver Instance Initialization   tran_tgt_init(9E)          Performs per-target initialization (optional)
                                        tran_tgt_probe(9E)         Probes SCSI bus for existence of a target (optional)
                                        tran_tgt_free(9E)          Performs per-target deallocation (optional)
Resource Allocation                     tran_init_pkt(9E)          Allocates SCSI packet and DMA resources
                                        tran_destroy_pkt(9E)       Frees SCSI packet and DMA resources
                                        tran_sync_pkt(9E)          Synchronizes memory before and after DMA
                                        tran_dmafree(9E)           Frees DMA resources
Command Transport                       tran_start(9E)             Transports a SCSI command
Capability Management                   tran_getcap(9E)            Inquires about a capability's value
                                        tran_setcap(9E)            Sets a capability's value
Abort and Reset                         tran_abort(9E)             Aborts outstanding SCSI commands
                                        tran_reset(9E)             Resets a target device or the SCSI bus
                                        tran_bus_reset(9E)         Resets the SCSI bus
                                        tran_reset_notify(9E)      Request to notify target of bus reset (optional)
Dynamic Reconfiguration                 tran_quiesce(9E)           Stops activity on the bus
                                        tran_unquiesce(9E)         Resumes activity on the bus

Target Driver Instance Initialization

The following sections describe target entry points.

tran_tgt_init() Entry Point

The tran_tgt_init(9E) entry point enables the HBA to allocate and initialize any per-target resources. tran_tgt_init() also enables the HBA to qualify the device's address as valid and supportable for that particular HBA. If tran_tgt_init() returns DDI_FAILURE, the instance of the target driver for that device is not probed or attached.

tran_tgt_init() is not required. If tran_tgt_init() is not supplied, the framework attempts to probe and attach all possible instances of the appropriate target drivers.

static int
isp_tran_tgt_init(
    dev_info_t            *hba_dip,
    dev_info_t            *tgt_dip,
    scsi_hba_tran_t       *tran,
    struct scsi_device    *sd)
{
    return ((sd->sd_address.a_target < N_ISP_TARGETS_WIDE &&
        sd->sd_address.a_lun < 8) ? DDI_SUCCESS : DDI_FAILURE);
}

tran_tgt_probe() Entry Point

The tran_tgt_probe(9E) entry point enables the HBA to customize the operation of scsi_probe(9F), if necessary. This entry point is called only when the target driver calls scsi_probe().

The HBA driver can retain the normal operation of scsi_probe() by calling scsi_hba_probe(9F) and returning its return value.

This entry point is not required, and if not needed, the HBA driver should set the tran_tgt_probe vector in the scsi_hba_tran(9S) structure to point to scsi_hba_probe().

scsi_probe() allocates a scsi_inquiry(9S) structure and sets the sd_inq field of the scsi_device(9S) structure to point to the data in scsi_inquiry. scsi_hba_probe() handles this task automatically. scsi_unprobe(9F) then frees the scsi_inquiry data.

Except for the allocation of scsi_inquiry data, tran_tgt_probe() must be stateless, because the same SCSI device might call tran_tgt_probe() several times. Normally, allocation of scsi_inquiry data is handled by scsi_hba_probe().


Note –

The allocation of the scsi_inquiry(9S) structure is handled automatically by scsi_hba_probe(). This information is only of concern if you want custom scsi_probe() handling.


static int
isp_tran_tgt_probe(
    struct scsi_device    *sd,
    int                   (*callback)())
{
    /*
     * Perform any special probe customization needed.
     * Normal probe handling.
     */
    return (scsi_hba_probe(sd, callback));
}

tran_tgt_free() Entry Point

The tran_tgt_free(9E) entry point enables the HBA to perform any deallocation or clean-up procedures for an instance of a target. This entry point is optional.

static void
isp_tran_tgt_free(
    dev_info_t            *hba_dip,
    dev_info_t            *tgt_dip,
    scsi_hba_tran_t       *hba_tran,
    struct scsi_device    *sd)
{
    /*
     * Undo any special per-target initialization done
     * earlier in tran_tgt_init(9E) and tran_tgt_probe(9E)
     */
}

Resource Allocation

The following sections discuss resource allocation.

tran_init_pkt() Entry Point

The tran_init_pkt(9E) entry point allocates and initializes a scsi_pkt(9S) structure and DMA resources for a target driver request.

The tran_init_pkt(9E) entry point is called when the target driver calls the SCSA function scsi_init_pkt(9F).

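Each call of the tran_init_pkt(9E) entry point is a request to perform one or more of three possible services: allocation and initialization of a scsi_pkt(9S) structure, allocation of DMA resources for data transfer, and reallocation of DMA resources for the next portion of the data transfer.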

Allocation and Initialization of a scsi_pkt(9S) Structure

The tran_init_pkt(9E) entry point must allocate a scsi_pkt(9S) structure through scsi_hba_pkt_alloc(9F) if pkt is NULL.

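scsi_hba_pkt_alloc(9F) allocates space for the scsi_pkt(9S) structure itself, a SCSI CDB of length cmdlen, a status completion area of length statuslen, a per-packet target driver private data area of length tgtlen, and a per-packet HBA driver private data area of length hbalen.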

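All scsi_pkt(9S) structure members must be initialized to zero except for the following members: pkt_scbp (status completion area), pkt_cdbp (CDB), pkt_ha_private (HBA driver private data), and pkt_private (target driver private data).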

These members are pointers to memory space where the values of the fields are stored, as shown in the following figure. For more information, refer to scsi_pkt Structure (HBA).

Figure 18–5 scsi_pkt(9S) Structure Pointers (diagram: the scsi_pkt structure members that point to values rather than being initialized to zero)

The following example shows allocation and initialization of a scsi_pkt structure.


Example 18–2 HBA Driver Initialization of a SCSI Packet Structure

static struct scsi_pkt *
isp_scsi_init_pkt(
    struct scsi_address    *ap,
    struct scsi_pkt        *pkt,
    struct buf             *bp,
    int                    cmdlen,
    int                    statuslen,
    int                    tgtlen,
    int                    flags,
    int                    (*callback)(),
    caddr_t                arg)
{
    struct isp_cmd         *sp;
    struct isp             *isp;
    struct scsi_pkt        *new_pkt;

    ASSERT(callback == NULL_FUNC || callback == SLEEP_FUNC);

    isp = (struct isp *)ap->a_hba_tran->tran_hba_private;
    /*
     * First step of isp_scsi_init_pkt:  pkt allocation
     */
    if (pkt == NULL) {
        pkt = scsi_hba_pkt_alloc(isp->isp_dip, ap, cmdlen,
            statuslen, tgtlen, sizeof (struct isp_cmd),
            callback, arg);
        if (pkt == NULL) {
            return (NULL);
        }

        sp = (struct isp_cmd *)pkt->pkt_ha_private;
        /*
         * Initialize the new pkt
         */
        sp->cmd_pkt         = pkt;
        sp->cmd_flags       = 0;
        sp->cmd_scblen      = statuslen;
        sp->cmd_cdblen      = cmdlen;
        sp->cmd_dmahandle   = NULL;
        sp->cmd_ncookies    = 0;
        sp->cmd_cookie      = 0;
        sp->cmd_cookiecnt   = 0;
        sp->cmd_nwin        = 0;
        pkt->pkt_address    = *ap;
        pkt->pkt_comp       = (void (*)())NULL;
        pkt->pkt_flags      = 0;
        pkt->pkt_time       = 0;
        pkt->pkt_resid      = 0;
        pkt->pkt_statistics = 0;
        pkt->pkt_reason     = 0;

        new_pkt = pkt;
    } else {
        sp = (struct isp_cmd *)pkt->pkt_ha_private;
        new_pkt = NULL;
    }
    /*
     * Second step of isp_scsi_init_pkt:  dma allocation/move
     */
    if (bp && bp->b_bcount != 0) {
        if (sp->cmd_dmahandle == NULL) {
            /* First DMA allocation for this pkt */
            if (isp_i_dma_alloc(isp, pkt, bp,
                flags, callback) == 0) {
                /* Free the pkt only if this call allocated it */
                if (new_pkt) {
                    scsi_hba_pkt_free(ap, new_pkt);
                }
                return ((struct scsi_pkt *)NULL);
            }
        } else {
            /* Move DMA resources to the next portion of the transfer */
            ASSERT(new_pkt == NULL);
            if (isp_i_dma_move(isp, pkt, bp) == 0) {
                return ((struct scsi_pkt *)NULL);
            }
        }
    }
    return (pkt);
}
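
Figure 18–5 can also be read as a memory layout. The following user-space sketch is a hypothetical model, not the scsi_hba_pkt_alloc(9F) implementation: it assumes one zero-filled allocation that is carved into the packet header followed by the CDB, status, target-private, and HBA-private areas, each rounded up to pointer alignment. The names model_pkt, model_pkt_alloc(), and model_roundup() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for scsi_pkt(9S): only the members discussed above. */
struct model_pkt {
    unsigned char   *pkt_cdbp;          /* SCSI CDB, cmdlen bytes */
    unsigned char   *pkt_scbp;          /* status completion area, statuslen bytes */
    void            *pkt_private;       /* target driver private data, tgtlen bytes */
    void            *pkt_ha_private;    /* HBA driver private data, hbalen bytes */
    int             pkt_flags;          /* an ordinary member, left at zero */
};

/* Round n up to a multiple of the pointer size so every area is aligned. */
static size_t
model_roundup(size_t n)
{
    size_t a = sizeof (void *);

    return ((n + a - 1) / a * a);
}

/*
 * Hypothetical allocator: one zeroed block holds the packet followed by
 * the CDB, status, target-private, and HBA-private areas.
 */
static struct model_pkt *
model_pkt_alloc(size_t cmdlen, size_t statuslen, size_t tgtlen, size_t hbalen)
{
    size_t hdr = model_roundup(sizeof (struct model_pkt));
    size_t total = hdr + model_roundup(cmdlen) + model_roundup(statuslen) +
        model_roundup(tgtlen) + model_roundup(hbalen);
    unsigned char *base = calloc(1, total);
    unsigned char *p;
    struct model_pkt *pkt;

    if (base == NULL)
        return (NULL);
    pkt = (struct model_pkt *)base;     /* calloc() zeroed every member */
    p = base + hdr;
    pkt->pkt_cdbp = p;
    p += model_roundup(cmdlen);
    pkt->pkt_scbp = p;
    p += model_roundup(statuslen);
    pkt->pkt_private = p;
    p += model_roundup(tgtlen);
    pkt->pkt_ha_private = p;
    assert(p + model_roundup(hbalen) == base + total);
    return (pkt);
}

/* One allocation, so a single free() releases the packet and all areas. */
static void
model_pkt_free(struct model_pkt *pkt)
{
    free(pkt);
}
```

In this model a single free() releases all five areas; a real HBA driver instead releases packets with scsi_hba_pkt_free(9F), as Example 18–2 does on its error path.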

Allocation of DMA Resources

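The tran_init_pkt(9E) entry point must allocate DMA resources for a data transfer if all of the following conditions are true: bp is not NULL, bp->b_bcount is not zero, and DMA resources have not yet been allocated for this scsi_pkt(9S).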

The HBA driver needs to track whether DMA resources have been allocated for a particular command. The driver can do this with a flag bit or a DMA handle in the per-packet HBA driver private data.

The PKT_DMA_PARTIAL flag, which the target driver passes in the flags argument of scsi_init_pkt(9F), enables the target driver to break up a data transfer into multiple SCSI commands to accommodate the complete request. This approach is useful when the HBA hardware scatter-gather capabilities or system DMA resources cannot complete a request in a single SCSI command.

When PKT_DMA_PARTIAL is set, the HBA driver can set the DDI_DMA_PARTIAL flag when it allocates the DMA resources for this SCSI command, for example with ddi_dma_buf_bind_handle(9F). The DMA attributes used when allocating the DMA resources should accurately describe any constraints placed on the ability of the HBA hardware to perform DMA. If the system can allocate DMA resources for only part of the request, ddi_dma_buf_bind_handle(9F) returns DDI_DMA_PARTIAL_MAP.

The tran_init_pkt(9E) entry point must return the amount of DMA resources not allocated for this transfer in the field pkt_resid.

A target driver can make one request to tran_init_pkt(9E) to simultaneously allocate both a scsi_pkt(9S) structure and DMA resources for that pkt. In this case, if the HBA driver is unable to allocate DMA resources, that driver must free the allocated scsi_pkt(9S) before returning. The scsi_pkt(9S) must be freed by calling scsi_hba_pkt_free(9F).

The target driver might first allocate the scsi_pkt(9S) and allocate DMA resources for this pkt at a later time. In this case, if the HBA driver is unable to allocate DMA resources, the driver must not free pkt. The target driver in this case is responsible for freeing the pkt.
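
The ownership rule in the two preceding paragraphs reduces to one question: did this call allocate the packet? The following user-space sketch models that decision with hypothetical names (model_init_pkt(), model_pkt_alloc()) and a dma_ok argument that simulates the outcome of DMA allocation. It mirrors the new_pkt logic of Example 18–2 but calls no DDI functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scsi_pkt(9S) and its DMA state. */
struct model_pkt {
    bool has_dma;
};

static int model_pkts_live;     /* packets currently allocated */

static struct model_pkt *
model_pkt_alloc(void)
{
    struct model_pkt *pkt = calloc(1, sizeof (*pkt));

    if (pkt != NULL)
        model_pkts_live++;
    return (pkt);
}

static void
model_pkt_free(struct model_pkt *pkt)
{
    assert(model_pkts_live > 0);
    free(pkt);
    model_pkts_live--;
}

/*
 * Model of the tran_init_pkt() ownership rule: a packet allocated by
 * this call is freed if DMA allocation fails; a packet passed in by the
 * caller is never freed here.
 */
static struct model_pkt *
model_init_pkt(struct model_pkt *pkt, bool want_dma, bool dma_ok)
{
    struct model_pkt *new_pkt = NULL;

    if (pkt == NULL) {
        if ((pkt = model_pkt_alloc()) == NULL)
            return (NULL);
        new_pkt = pkt;
    }
    if (want_dma) {
        if (!dma_ok) {
            if (new_pkt != NULL)
                model_pkt_free(new_pkt);    /* allocated here: free here */
            return (NULL);                  /* caller still owns its pkt */
        }
        pkt->has_dma = true;
    }
    return (pkt);
}
```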


Example 18–3 HBA Driver Allocation of DMA Resources

static int
isp_i_dma_alloc(
    struct isp         *isp,
    struct scsi_pkt    *pkt,
    struct buf         *bp,
    int                flags,
    int                (*callback)())
{
    struct isp_cmd     *sp  = (struct isp_cmd *)pkt->pkt_ha_private;
    int                dma_flags;
    ddi_dma_attr_t     tmp_dma_attr;
    int                (*cb)(caddr_t);
    int                i;

    ASSERT(callback == NULL_FUNC || callback == SLEEP_FUNC);

    /*
     * Derive the DMA direction and options from the buf and pkt flags
     */
    if (bp->b_flags & B_READ) {
        sp->cmd_flags &= ~CFLAG_DMASEND;
        dma_flags = DDI_DMA_READ;
    } else {
        sp->cmd_flags |= CFLAG_DMASEND;
        dma_flags = DDI_DMA_WRITE;
    }
    if (flags & PKT_CONSISTENT) {
        sp->cmd_flags |= CFLAG_CMDIOPB;
        dma_flags |= DDI_DMA_CONSISTENT;
    }
    if (flags & PKT_DMA_PARTIAL) {
        dma_flags |= DDI_DMA_PARTIAL;
    }

    tmp_dma_attr = isp_dma_attr;
    tmp_dma_attr.dma_attr_burstsizes = isp->isp_burst_size;

    /*
     * A NULL_FUNC caller must not sleep
     */
    cb = (callback == NULL_FUNC) ? DDI_DMA_DONTWAIT : DDI_DMA_SLEEP;

    if ((i = ddi_dma_alloc_handle(isp->isp_dip, &tmp_dma_attr,
        cb, 0, &sp->cmd_dmahandle)) != DDI_SUCCESS) {
        switch (i) {
        case DDI_DMA_BADATTR:
            bioerror(bp, EFAULT);
            break;
        case DDI_DMA_NORESOURCES:
            bioerror(bp, 0);
            break;
        }
        /*
         * Never go on to bind with an invalid handle
         */
        sp->cmd_dmahandle = NULL;
        return (0);
    }

    sp->cmd_curwin = 0;    /* start with the first DMA window */

    i = ddi_dma_buf_bind_handle(sp->cmd_dmahandle, bp, dma_flags,
        cb, 0, &sp->cmd_dmacookies[0], &sp->cmd_ncookies);

    switch (i) {
    case DDI_DMA_PARTIAL_MAP:
        /*
         * Only part of the request was mapped: load the first window
         */
        if (ddi_dma_numwin(sp->cmd_dmahandle, &sp->cmd_nwin) ==
            DDI_FAILURE) {
            cmn_err(CE_PANIC, "ddi_dma_numwin() failed\n");
        }

        if (ddi_dma_getwin(sp->cmd_dmahandle, sp->cmd_curwin,
            &sp->cmd_dma_offset, &sp->cmd_dma_len,
            &sp->cmd_dmacookies[0], &sp->cmd_ncookies) ==
            DDI_FAILURE) {
            cmn_err(CE_PANIC, "ddi_dma_getwin() failed\n");
        }
        goto get_dma_cookies;

    case DDI_DMA_MAPPED:
        /*
         * The whole request fits in a single window
         */
        sp->cmd_nwin = 1;
        sp->cmd_dma_len = 0;
        sp->cmd_dma_offset = 0;

get_dma_cookies:
        /*
         * Collect cookies, up to the hardware segment limit
         */
        i = 0;
        sp->cmd_dmacount = 0;
        for (;;) {
            sp->cmd_dmacount += sp->cmd_dmacookies[i++].dmac_size;

            if (i == ISP_NDATASEGS || i == sp->cmd_ncookies)
                break;
            ddi_dma_nextcookie(sp->cmd_dmahandle,
                &sp->cmd_dmacookies[i]);
        }
        sp->cmd_cookie = i;
        sp->cmd_cookiecnt = i;

        sp->cmd_flags |= CFLAG_DMAVALID;
        /*
         * Report the bytes that this command does not cover
         */
        pkt->pkt_resid = bp->b_bcount - sp->cmd_dmacount;
        return (1);

    case DDI_DMA_NORESOURCES:
        bioerror(bp, 0);
        break;

    case DDI_DMA_NOMAPPING:
        bioerror(bp, EFAULT);
        break;

    case DDI_DMA_TOOBIG:
        bioerror(bp, EINVAL);
        break;

    case DDI_DMA_INUSE:
        cmn_err(CE_PANIC, "ddi_dma_buf_bind_handle:"
            " DDI_DMA_INUSE impossible\n");
        /* NOTREACHED */

    default:
        cmn_err(CE_PANIC, "ddi_dma_buf_bind_handle:"
            " 0x%x impossible\n", i);
    }

    ddi_dma_free_handle(&sp->cmd_dmahandle);
    sp->cmd_dmahandle = NULL;
    sp->cmd_flags &= ~CFLAG_DMAVALID;
    return (0);
}

Reallocation of DMA Resources for Data Transfer

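For a previously allocated packet with data remaining to be transferred, the tran_init_pkt(9E) entry point must reallocate DMA resources when all of the following conditions apply: partial DMA resources have already been allocated, a nonzero pkt_resid was returned in the previous call to tran_init_pkt(9E), bp is not NULL, and bp->b_bcount is not zero.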

When reallocating DMA resources to the next portion of the transfer, tran_init_pkt(9E) must return the amount of DMA resources not allocated for this transfer in the field pkt_resid.

If an error occurs while attempting to move DMA resources, tran_init_pkt(9E) must not free the scsi_pkt(9S). The target driver in this case is responsible for freeing the packet.

If the callback parameter is NULL_FUNC, the tran_init_pkt(9E) entry point must not sleep or call any function that might sleep. If the callback parameter is SLEEP_FUNC and resources are not immediately available, the tran_init_pkt(9E) entry point should sleep. Unless the request is impossible to satisfy, tran_init_pkt() should sleep until resources become available.


Example 18–4 DMA Resource Reallocation for HBA Drivers

static int
isp_i_dma_move(
    struct isp         *isp,
    struct scsi_pkt    *pkt,
    struct buf         *bp)
{
    struct isp_cmd     *sp  = (struct isp_cmd *)pkt->pkt_ha_private;
    int                i;

    ASSERT(sp->cmd_flags & CFLAG_COMPLETED);
    sp->cmd_flags &= ~CFLAG_COMPLETED;
    /*
     * If there are no more cookies remaining in this window,
     * must move to the next window first.
     */
    if (sp->cmd_cookie == sp->cmd_ncookies) {
        /*
         * For small pkts, leave things where they are
         */
        if (sp->cmd_curwin == sp->cmd_nwin && sp->cmd_nwin == 1)
            return (1);
        /*
         * At last window, cannot move
         */
        if (++sp->cmd_curwin >= sp->cmd_nwin)
            return (0);
        if (ddi_dma_getwin(sp->cmd_dmahandle, sp->cmd_curwin,
            &sp->cmd_dma_offset, &sp->cmd_dma_len,
            &sp->cmd_dmacookies[0], &sp->cmd_ncookies) ==
            DDI_FAILURE)
            return (0);
        sp->cmd_cookie = 0;
    } else {
        /*
         * Still more cookies in this window - get the next one
         */
        ddi_dma_nextcookie(sp->cmd_dmahandle,
            &sp->cmd_dmacookies[0]);
    }
    /*
     * Get remaining cookies in this window, up to our maximum
     */
    i = 0;
    for (;;) {
        sp->cmd_dmacount += sp->cmd_dmacookies[i++].dmac_size;
        sp->cmd_cookie++;
        if (i == ISP_NDATASEGS ||
            sp->cmd_cookie == sp->cmd_ncookies)
            break;
        ddi_dma_nextcookie(sp->cmd_dmahandle,
            &sp->cmd_dmacookies[i]);
    }
    sp->cmd_cookiecnt = i;
    pkt->pkt_resid = bp->b_bcount - sp->cmd_dmacount;
    return (1);
}
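
The cookie bookkeeping in Examples 18–3 and 18–4 can be modeled in user space to check how pkt_resid evolves across a partial transfer. In this sketch, which is an illustration rather than DDI code, a bound DMA object is represented as an array of windows, each holding cookie sizes. MODEL_NDATASEGS stands in for ISP_NDATASEGS, and all model_* names are hypothetical. As in isp_i_dma_move(), a batch never spans two windows, and each call returns the residual that tran_init_pkt(9E) would report.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NDATASEGS 4   /* hypothetical stand-in for ISP_NDATASEGS */

/* One DMA window: the sizes of its cookies. */
struct model_win {
    const size_t    *cookie;
    int             ncookies;
};

/* Hypothetical model of a bound DMA object and the driver's counters. */
struct model_dma {
    const struct model_win  *win;
    int                     nwin;
    int                     curwin;     /* current window index */
    int                     cookie;     /* cookies used in current window */
    size_t                  dmacount;   /* bytes covered by all batches */
    int                     cookiecnt;  /* cookies in the current batch */
};

/*
 * Gather the cookies for the next SCSI command. Returns the residual
 * (bytes not yet covered), or -1 when the last window is used up.
 */
static long
model_next_batch(struct model_dma *d, size_t bcount)
{
    const struct model_win *w;
    int i = 0;

    if (d->cookie == d->win[d->curwin].ncookies) {
        if (d->curwin + 1 >= d->nwin)
            return (-1);                /* at last window: cannot move */
        d->curwin++;
        d->cookie = 0;
    }
    w = &d->win[d->curwin];
    while (i < MODEL_NDATASEGS && d->cookie < w->ncookies) {
        d->dmacount += w->cookie[d->cookie++];
        i++;
    }
    d->cookiecnt = i;
    assert(d->dmacount <= bcount);
    return ((long)(bcount - d->dmacount));
}
```

With one window of six 512-byte cookies and a second window of one 1024-byte cookie, a 4096-byte request under a four-segment limit takes three commands, and the successive residuals are 2048, 1024, and 0.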

tran_destroy_pkt() Entry Point

The tran_destroy_pkt(9E) entry point is the HBA driver function that deallocates scsi_pkt(9S) structures. The tran_destroy_pkt() entry point is called when the target driver calls scsi_destroy_pkt(9F).

The tran_destroy_pkt() entry point must free any DMA resources that have been allocated for the packet. An implicit DMA synchronization occurs if the DMA resources are freed and any cached data remains after the completion of the transfer. The tran_destroy_pkt() entry point frees the SCSI packet by calling scsi_hba_pkt_free(9F).


Example 18–5 HBA Driver tran_destroy_pkt(9E) Entry Point

static void
isp_scsi_destroy_pkt(
    struct scsi_address    *ap,
    struct scsi_pkt        *pkt)
{
    struct isp_cmd *sp = (struct isp_cmd *)pkt->pkt_ha_private;
   /*
    * Free the DMA, if any
    */
    if (sp->cmd_flags & CFLAG_DMAVALID) {
        sp->cmd_flags &= ~CFLAG_DMAVALID;
        (void) ddi_dma_unbind_handle(sp->cmd_dmahandle);
        ddi_dma_free_handle(&sp->cmd_dmahandle);
        sp->cmd_dmahandle = NULL;
    }
   /*
    * Free the pkt
    */
    scsi_hba_pkt_free(ap, pkt);
}

tran_sync_pkt() Entry Point

The tran_sync_pkt(9E) entry point synchronizes the DMA object allocated for the scsi_pkt(9S) structure before or after a DMA transfer. The tran_sync_pkt() entry point is called when the target driver calls scsi_sync_pkt(9F).

If the data transfer direction is a DMA read from device to memory, tran_sync_pkt() must synchronize the CPU's view of the data. If the data transfer direction is a DMA write from memory to device, tran_sync_pkt() must synchronize the device's view of the data.


Example 18–6 HBA Driver tran_sync_pkt(9E) Entry Point

static void
isp_scsi_sync_pkt(
    struct scsi_address    *ap,
    struct scsi_pkt        *pkt)
{
    struct isp_cmd *sp = (struct isp_cmd *)pkt->pkt_ha_private;

    if (sp->cmd_flags & CFLAG_DMAVALID) {
        (void)ddi_dma_sync(sp->cmd_dmahandle, sp->cmd_dma_offset,
        sp->cmd_dma_len,
        (sp->cmd_flags & CFLAG_DMASEND) ?
        DDI_DMA_SYNC_FORDEV : DDI_DMA_SYNC_FORCPU);
    }
}

tran_dmafree() Entry Point

The tran_dmafree(9E) entry point deallocates DMA resources that have been allocated for a scsi_pkt(9S) structure. The tran_dmafree() entry point is called when the target driver calls scsi_dmafree(9F).

tran_dmafree() must free only DMA resources allocated for a scsi_pkt(9S) structure, not the scsi_pkt(9S) itself. When DMA resources are freed, a DMA synchronization is implicitly performed.


Note –

The scsi_pkt(9S) is freed in a separate request to tran_destroy_pkt(9E). Because tran_destroy_pkt() must also free DMA resources, the HBA driver must keep accurate track of whether each scsi_pkt(9S) structure has DMA resources allocated.



Example 18–7 HBA Driver tran_dmafree(9E) Entry Point

static void
isp_scsi_dmafree(
    struct scsi_address    *ap,
    struct scsi_pkt        *pkt)
{
    struct isp_cmd    *sp = (struct isp_cmd *)pkt->pkt_ha_private;

    if (sp->cmd_flags & CFLAG_DMAVALID) {
        sp->cmd_flags &= ~CFLAG_DMAVALID;
        (void)ddi_dma_unbind_handle(sp->cmd_dmahandle);
        ddi_dma_free_handle(&sp->cmd_dmahandle);
        sp->cmd_dmahandle = NULL;
    }
}
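
Both tran_dmafree(9E) and tran_destroy_pkt(9E) release DMA resources, so the CFLAG_DMAVALID test in Examples 18–5 and 18–7 is what keeps a handle from being unbound twice. The following user-space sketch isolates that pattern. The flag value and the model_* functions are hypothetical, and counters stand in for ddi_dma_unbind_handle(9F), ddi_dma_free_handle(9F), and scsi_hba_pkt_free(9F).

```c
#include <assert.h>

#define MODEL_CFLAG_DMAVALID 0x1    /* hypothetical, mirrors CFLAG_DMAVALID */

struct model_cmd {
    int cmd_flags;
};

static int model_unbinds;       /* simulated unbind + free-handle calls */
static int model_pkt_frees;     /* simulated scsi_hba_pkt_free() calls */

/* Release DMA resources only if the flag says they are held. */
static void
model_release_dma(struct model_cmd *sp)
{
    if (sp->cmd_flags & MODEL_CFLAG_DMAVALID) {
        sp->cmd_flags &= ~MODEL_CFLAG_DMAVALID;
        model_unbinds++;
    }
}

/* tran_dmafree() model: DMA resources only, never the packet. */
static void
model_dmafree(struct model_cmd *sp)
{
    model_release_dma(sp);
}

/* tran_destroy_pkt() model: DMA resources if still held, then the packet. */
static void
model_destroy_pkt(struct model_cmd *sp)
{
    model_release_dma(sp);
    assert((sp->cmd_flags & MODEL_CFLAG_DMAVALID) == 0);
    model_pkt_frees++;
}
```

Clearing the flag in the same step that releases the resources is what makes the two entry points safe to call in either order.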

Command Transport

An HBA driver goes through the following steps as part of command transport:

  1. Accept a command from the target driver.

  2. Issue the command to the device hardware.

  3. Service any interrupts that occur.

  4. Manage timeouts.

tran_start() Entry Point

The tran_start(9E) entry point for a SCSI HBA driver is called to transport a SCSI command to the addressed target. The SCSI command is described entirely within the scsi_pkt(9S) structure, which the target driver allocated through the HBA driver's tran_init_pkt(9E) entry point. If the command involves a data transfer, DMA resources must also have been allocated for the scsi_pkt(9S) structure.

The tran_start() entry point is called when a target driver calls scsi_transport(9F).

tran_start() should perform basic error checking along with any initialization that is required by the command. The FLAG_NOINTR flag in the pkt_flags field of the scsi_pkt(9S) structure can affect the behavior of tran_start(). If FLAG_NOINTR is not set, tran_start() must queue the command for execution on the hardware and return immediately. Upon completion of the command, the HBA driver should call the pkt completion routine.

If FLAG_NOINTR is set, the HBA driver should not call the pkt completion routine. Instead, as Example 18–8 shows, the driver polls for command completion.

The following example demonstrates how to handle the tran_start(9E) entry point. The ISP hardware provides a per-target queue. For devices that can manage only one active outstanding command, the driver is typically required to manage a per-target queue. The driver then starts up a new command upon completion of the current command in a round-robin fashion.


Example 18–8 HBA Driver tran_start(9E) Entry Point

static int
isp_scsi_start(
    struct scsi_address    *ap,
    struct scsi_pkt        *pkt)
{
    struct isp_cmd         *sp;
    struct isp             *isp;
    struct isp_request     *req;
    u_long                 cur_lbolt;
    int                    xfercount;
    int                    rval = TRAN_ACCEPT;
    int                    i;

    sp = (struct isp_cmd *)pkt->pkt_ha_private;
    isp = (struct isp *)ap->a_hba_tran->tran_hba_private;

    sp->cmd_flags = (sp->cmd_flags & ~CFLAG_TRANFLAG) |
                CFLAG_IN_TRANSPORT;
    pkt->pkt_reason = CMD_CMPLT;
   /*
    * set up request in cmd_isp_request area so it is ready to
    * go once we have the request mutex
    */
    req = &sp->cmd_isp_request;

    req->req_header.cq_entry_type = CQ_TYPE_REQUEST;
    req->req_header.cq_entry_count = 1;
    req->req_header.cq_flags = 0;
    req->req_header.cq_seqno = 0;
    req->req_reserved = 0;
    req->req_token = (opaque_t)sp;
    req->req_target = TGT(sp);
    req->req_lun_trn = LUN(sp);
    req->req_time = pkt->pkt_time;
    ISP_SET_PKT_FLAGS(pkt->pkt_flags, req->req_flags);
   /*
    * Set up data segments for dma transfers.
    */
    if (sp->cmd_flags & CFLAG_DMAVALID) {

        if (sp->cmd_flags & CFLAG_CMDIOPB) {
            (void) ddi_dma_sync(sp->cmd_dmahandle,
            sp->cmd_dma_offset, sp->cmd_dma_len,
            DDI_DMA_SYNC_FORDEV);
        }

        ASSERT(sp->cmd_cookiecnt > 0 &&
            sp->cmd_cookiecnt <= ISP_NDATASEGS);

        xfercount = 0;
        req->req_seg_count = sp->cmd_cookiecnt;
        for (i = 0; i < sp->cmd_cookiecnt; i++) {
            req->req_dataseg[i].d_count =
            sp->cmd_dmacookies[i].dmac_size;
            req->req_dataseg[i].d_base =
            sp->cmd_dmacookies[i].dmac_address;
            xfercount +=
            sp->cmd_dmacookies[i].dmac_size;
        }

        for (; i < ISP_NDATASEGS; i++) {
            req->req_dataseg[i].d_count = 0;
            req->req_dataseg[i].d_base = 0;
        }

        pkt->pkt_resid = xfercount;

        if (sp->cmd_flags & CFLAG_DMASEND) {
            req->req_flags |= ISP_REQ_FLAG_DATA_WRITE;
        } else {
            req->req_flags |= ISP_REQ_FLAG_DATA_READ;
        }
    } else {
        req->req_seg_count = 0;
        req->req_dataseg[0].d_count = 0;
    }
   /*
    * Set up cdb in the request
    */
    req->req_cdblen = sp->cmd_cdblen;
    bcopy((caddr_t)pkt->pkt_cdbp, (caddr_t)req->req_cdb,
    sp->cmd_cdblen);
   /*
    * Start the cmd.  If NO_INTR, must poll for cmd completion.
    */
    if ((pkt->pkt_flags & FLAG_NOINTR) == 0) {
        mutex_enter(ISP_REQ_MUTEX(isp));
        rval = isp_i_start_cmd(isp, sp);
        mutex_exit(ISP_REQ_MUTEX(isp));
    } else {
        rval = isp_i_polled_cmd_start(isp, sp);
    }
    return (rval);
}
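
The ISP hardware queues commands per target, so Example 18–8 does not need a software queue. For a device that can manage only one outstanding command, the per-target, round-robin scheme described before Example 18–8 might look like the following user-space sketch. The queue depth, target count, and all model_* names are assumptions for illustration. A real driver would protect this state with a mutex and would call the scheduler from its completion path.

```c
#include <assert.h>

#define MODEL_NTARGETS  4   /* hypothetical number of targets */
#define MODEL_QDEPTH    8   /* hypothetical per-target queue capacity */

/* FIFO of command ids for one target; busy means one command is active. */
struct model_tq {
    int cmd[MODEL_QDEPTH];
    int head, count;
    int busy;
};

struct model_hba {
    struct model_tq tq[MODEL_NTARGETS];
    int             next_tgt;   /* where the round-robin scan resumes */
};

static int
model_enqueue(struct model_hba *h, int tgt, int cmd)
{
    struct model_tq *q = &h->tq[tgt];

    if (q->count == MODEL_QDEPTH)
        return (-1);
    q->cmd[(q->head + q->count++) % MODEL_QDEPTH] = cmd;
    return (0);
}

/*
 * Start the next command: scan targets round-robin from the one after
 * the last target served, skipping busy or empty targets.
 * Returns the command id, or -1 if nothing can start.
 */
static int
model_start_next(struct model_hba *h)
{
    int n, cmd;

    for (n = 0; n < MODEL_NTARGETS; n++) {
        int t = (h->next_tgt + n) % MODEL_NTARGETS;
        struct model_tq *q = &h->tq[t];

        if (q->busy || q->count == 0)
            continue;
        cmd = q->cmd[q->head];
        q->head = (q->head + 1) % MODEL_QDEPTH;
        q->count--;
        q->busy = 1;
        h->next_tgt = (t + 1) % MODEL_NTARGETS;
        return (cmd);
    }
    return (-1);
}

/* A completion on tgt lets that target's next queued command start. */
static void
model_complete(struct model_hba *h, int tgt)
{
    assert(h->tq[tgt].busy);
    h->tq[tgt].busy = 0;
}
```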

Interrupt Handler and Command Completion

The interrupt handler must check the status of the device to be sure the device is generating the interrupt in question. The interrupt handler must also check for any errors that have occurred and service any interrupts generated by the device.

If data is transferred, the hardware should be checked to determine how much data was actually transferred. The pkt_resid field in the scsi_pkt(9S) structure should be set to the residual of the transfer.

Commands that are marked with the PKT_CONSISTENT flag when DMA resources are allocated through tran_init_pkt(9E) require special handling. The HBA driver must ensure that the data transfer for the command is correctly synchronized before the target driver's command completion callback is performed.

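Once a command has completed, you need to act on two requirements: if a new command is queued, start it on the hardware as quickly as possible, and call the command completion callback that the target driver set up in the pkt_comp field of the scsi_pkt(9S) structure.

Start a new command on the hardware, if possible, before calling the pkt_comp command completion callback, because the command completion handling can take considerable time. Typically, the target driver calls functions such as biodone(9F) and possibly scsi_transport(9F) to begin a new command.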

The interrupt handler must return DDI_INTR_CLAIMED if this interrupt is claimed by this driver. Otherwise, the handler returns DDI_INTR_UNCLAIMED.

The following example shows an interrupt handler for the SCSI HBA isp driver. The caddr_t parameter is set up when the interrupt handler is added in attach(9E). This parameter is typically a pointer to the state structure, which is allocated on a per-instance basis.


Example 18–9 HBA Driver Interrupt Handler

static u_int
isp_intr(caddr_t arg)
{
    struct isp_cmd         *sp;
    struct isp_cmd         *head, *tail;
    u_short                response_in;
    struct isp_response    *resp;
    struct isp             *isp = (struct isp *)arg;
    struct isp_slot        *isp_slot;
    int                    n;

    if (ISP_INT_PENDING(isp) == 0) {
        return (DDI_INTR_UNCLAIMED);
    }

    do {
again:
       /*
        * head list collects completed packets for callback later
        */
        head = tail = NULL;
       /*
        * Assume no mailbox events (e.g., mailbox cmds, asynch
        * events, and isp dma errors) as common case.
        */
        if (ISP_CHECK_SEMAPHORE_LOCK(isp) == 0) {
            mutex_enter(ISP_RESP_MUTEX(isp));
           /*
            * Loop through completion response queue and post
            * completed pkts.  Check response queue again
            * afterwards in case there are more.
            */
            isp->isp_response_in =
            response_in = ISP_GET_RESPONSE_IN(isp);
           /*
            * Calculate the number of requests in the queue
            */
            n = response_in - isp->isp_response_out;
            if (n < 0) {
                n = ISP_MAX_REQUESTS -
                isp->isp_response_out + response_in;
            }
            while (n-- > 0) {
                ISP_GET_NEXT_RESPONSE_OUT(isp, resp);
                sp = (struct isp_cmd *)resp->resp_token;
                /*
                 * Copy over response packet in sp
                 */
                isp_i_get_response(isp, resp, sp);
                /*
                 * Append every completed pkt to the head list;
                 * linking inside the loop keeps all responses
                 */
                if (head) {
                    tail->cmd_forw = sp;
                    tail = sp;
                    tail->cmd_forw = NULL;
                } else {
                    tail = head = sp;
                    sp->cmd_forw = NULL;
                }
            }
            ISP_SET_RESPONSE_OUT(isp);
            ISP_CLEAR_RISC_INT(isp);
            mutex_exit(ISP_RESP_MUTEX(isp));

            if (head) {
                isp_i_call_pkt_comp(isp, head);
            }
        } else {
            if (isp_i_handle_mbox_cmd(isp) != ISP_AEN_SUCCESS) {
                return (DDI_INTR_CLAIMED);
            }
           /*
            * if there was a reset then check the response
            * queue again
            */
            goto again;    
        }

    } while (ISP_INT_PENDING(isp));

    return (DDI_INTR_CLAIMED);
}

static void
isp_i_call_pkt_comp(
    struct isp             *isp,
    struct isp_cmd         *head)
{
    struct isp_cmd         *sp;
    struct scsi_pkt        *pkt;
    struct isp_response    *resp;
    u_char                 status;

    while (head) {
        sp = head;
        pkt = sp->cmd_pkt;
        head = sp->cmd_forw;

        ASSERT(sp->cmd_flags & CFLAG_FINISHED);

        resp = &sp->cmd_isp_response;

        pkt->pkt_scbp[0] = (u_char)resp->resp_scb;
        pkt->pkt_state = ISP_GET_PKT_STATE(resp->resp_state);
        pkt->pkt_statistics = (u_long)
            ISP_GET_PKT_STATS(resp->resp_status_flags);
        pkt->pkt_resid = (long)resp->resp_resid;
       /*
        * If data was xferred and this is a consistent pkt,
        * do a dma sync
        */
        if ((sp->cmd_flags & CFLAG_CMDIOPB) &&
            (pkt->pkt_state & STATE_XFERRED_DATA)) {
                (void) ddi_dma_sync(sp->cmd_dmahandle,
                sp->cmd_dma_offset, sp->cmd_dma_len,
                DDI_DMA_SYNC_FORCPU);
        }

        sp->cmd_flags = (sp->cmd_flags & ~CFLAG_IN_TRANSPORT) |
            CFLAG_COMPLETED;
       /*
        * Call packet completion routine if FLAG_NOINTR is not set.
        */
        if (((pkt->pkt_flags & FLAG_NOINTR) == 0) &&
            pkt->pkt_comp) {
                (*pkt->pkt_comp)(pkt);
        }
    }
}
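
Two details of Example 18–9 are easy to get wrong: counting response-queue entries when the producer index has wrapped, and appending every drained entry to the completion list before the mutex is dropped. The following user-space sketch models both. MODEL_MAX_REQUESTS stands in for ISP_MAX_REQUESTS, and the model_* names and the recording callback are hypothetical. As in isp_i_call_pkt_comp(), the list is advanced before each callback runs, because the target driver's callback might reuse or free the packet.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_REQUESTS 8    /* hypothetical ring size */

struct model_cmd {
    int                 id;
    void                (*pkt_comp)(struct model_cmd *);  /* completion callback */
    struct model_cmd    *cmd_forw;                        /* completion list link */
};

static int model_done[MODEL_MAX_REQUESTS];  /* ids in callback order */
static int model_ndone;

/* Example callback: record the order in which completions run. */
static void
model_record(struct model_cmd *sp)
{
    model_done[model_ndone++] = sp->id;
}

/* Entries posted by the hardware but not yet consumed, with wraparound. */
static int
model_ring_count(int in, int out)
{
    int n = in - out;

    if (n < 0)
        n = MODEL_MAX_REQUESTS - out + in;
    assert(n >= 0 && n < MODEL_MAX_REQUESTS);
    return (n);
}

/* Drain n entries into a FIFO list, appending each one at the tail. */
static struct model_cmd *
model_collect(struct model_cmd *ring[], int *out, int n)
{
    struct model_cmd *head = NULL, *tail = NULL, *sp;

    while (n-- > 0) {
        sp = ring[*out];
        *out = (*out + 1) % MODEL_MAX_REQUESTS;
        sp->cmd_forw = NULL;
        if (head == NULL)
            head = sp;
        else
            tail->cmd_forw = sp;
        tail = sp;
    }
    return (head);
}

/* After the mutex is dropped, run each completion callback in order. */
static void
model_call_comp(struct model_cmd *head)
{
    while (head != NULL) {
        struct model_cmd *sp = head;

        head = sp->cmd_forw;    /* advance before the callback runs */
        if (sp->pkt_comp != NULL)
            (*sp->pkt_comp)(sp);
    }
}
```

For example, with a ring of eight entries, an input index of 1 and an output index of 6 mean that three entries are pending, in slots 6, 7, and 0.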

Timeout Handler

The HBA driver is responsible for enforcing timeouts. A command must complete within the number of seconds specified in the pkt_time field of the scsi_pkt(9S) structure, unless pkt_time is zero.

When a command times out, the HBA driver should mark the scsi_pkt(9S) with pkt_reason set to CMD_TIMEOUT and pkt_statistics OR'd with STAT_TIMEOUT. The HBA driver should also attempt to recover the target and bus. If this recovery can be performed successfully, the driver should mark the scsi_pkt(9S) using pkt_statistics OR'd with either STAT_BUS_RESET or STAT_DEV_RESET.

After the recovery attempt has completed, the HBA driver should call the command completion callback.


Note –

If recovery was unsuccessful or not attempted, the target driver might attempt to recover from the timeout by calling scsi_reset(9F).


The ISP hardware manages command timeouts directly and returns timed-out commands with the necessary status. The timeout handler for the isp sample driver checks active commands for the timeout state only once every 60 seconds.

The isp sample driver uses the timeout(9F) facility to arrange for the kernel to call the timeout handler every 60 seconds. The caddr_t argument is the parameter set up when the timeout is initialized at attach(9E) time. In this case, the caddr_t argument is a pointer to the state structure allocated per driver instance.

If timed-out commands have not been returned as timed-out by the ISP hardware, a problem has occurred. The hardware is not functioning correctly and needs to be reset.
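
The timeout rules above can be sketched as a periodic scan. In this user-space model, which simplifies recovery to a single reset_ok argument, a command times out once the time since it started reaches its pkt_time, and a pkt_time of zero disables the check. The constant values and model_* names are placeholders rather than the real CMD_TIMEOUT, STAT_TIMEOUT, and STAT_BUS_RESET definitions. In a driver, the completion callback would be called where the model clears the active flag.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for the model only, not the SCSA definitions. */
enum { MODEL_CMD_CMPLT = 0, MODEL_CMD_TIMEOUT = 1 };
#define MODEL_STAT_TIMEOUT      0x1
#define MODEL_STAT_BUS_RESET    0x2

struct model_cmd {
    long    start;          /* time the command started, in seconds */
    int     pkt_time;       /* allowed seconds; 0 means no timeout */
    int     pkt_reason;
    int     pkt_statistics;
    int     active;
};

/*
 * Scan active commands at time now. reset_ok simulates whether the
 * recovery attempt succeeded. Returns the number of commands timed out.
 */
static int
model_watch(struct model_cmd *cmds, size_t n, long now, int reset_ok)
{
    size_t i;
    int timed_out = 0;

    assert(cmds != NULL || n == 0);
    for (i = 0; i < n; i++) {
        struct model_cmd *c = &cmds[i];

        if (!c->active || c->pkt_time == 0 || now - c->start < c->pkt_time)
            continue;
        c->pkt_reason = MODEL_CMD_TIMEOUT;
        c->pkt_statistics |= MODEL_STAT_TIMEOUT;
        if (reset_ok)
            c->pkt_statistics |= MODEL_STAT_BUS_RESET;
        c->active = 0;      /* the completion callback would run here */
        timed_out++;
    }
    return (timed_out);
}
```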

Capability Management

The following sections discuss capability management.

tran_getcap() Entry Point

The tran_getcap(9E) entry point for a SCSI HBA driver is called by scsi_ifgetcap(9F). The target driver calls scsi_ifgetcap() to determine the current value of one of a set of SCSA-defined capabilities.

The target driver can request the current setting of the capability for a particular target by setting the whom parameter to nonzero. A whom value of zero indicates a request for the current setting of the general capability for the SCSI bus or for adapter hardware.

The tran_getcap() entry point should return -1 for undefined capabilities or the current value of the requested capability.

The HBA driver can use the function scsi_hba_lookup_capstr(9F) to compare the capability string against the canonical set of defined capabilities.


Example 18–10 HBA Driver tran_getcap(9E) Entry Point

static int
isp_scsi_getcap(
    struct scsi_address    *ap,
    char                   *cap,
    int                    whom)
{
    struct isp             *isp;
    int                    rval = 0;
    u_char                 tgt = ap->a_target;
    /*
     * Bus-wide requests (whom == 0) are not supported;
     * only per-target capabilities can be queried
     */
    if (cap == NULL || whom == 0) {
        return (-1);
    }
    isp = (struct isp *)ap->a_hba_tran->tran_hba_private;
    ISP_MUTEX_ENTER(isp);

    switch (scsi_hba_lookup_capstr(cap)) {
    case SCSI_CAP_DMA_MAX:
        rval = 1 << 24; /* Limit to 16MB max transfer */
        break;
    case SCSI_CAP_MSG_OUT:
        rval = 1;
        break;
    case SCSI_CAP_DISCONNECT:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_DR) == 0) {
            break;
        } else if (
            (isp->isp_cap[tgt] & ISP_CAP_DISCONNECT) == 0) {
            break;
        }
        rval = 1;
        break;
    case SCSI_CAP_SYNCHRONOUS:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_SYNC) == 0) {
            break;
        } else if (
            (isp->isp_cap[tgt] & ISP_CAP_SYNC) == 0) {
            break;
        }
        rval = 1;
        break;
    case SCSI_CAP_WIDE_XFER:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_WIDE) == 0) {
            break;
        } else if (
            (isp->isp_cap[tgt] & ISP_CAP_WIDE) == 0) {
            break;
        }
        rval = 1;
        break;
    case SCSI_CAP_TAGGED_QING:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_DR) == 0 ||
            (isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_TAG) == 0) {
            break;
        } else if (
            (isp->isp_cap[tgt] & ISP_CAP_TAG) == 0) {
            break;
        }
        rval = 1;
        break;
    case SCSI_CAP_UNTAGGED_QING:
        rval = 1;
        break;
    case SCSI_CAP_PARITY:
        if (isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_PARITY) {
            rval = 1;
        }
        break;
    case SCSI_CAP_INITIATOR_ID:
        rval = isp->isp_initiator_id;
        break;
    case SCSI_CAP_ARQ:
        if (isp->isp_cap[tgt] & ISP_CAP_AUTOSENSE) {
            rval = 1;
        }
        break;
    case SCSI_CAP_LINKED_CMDS:
        break;
    case SCSI_CAP_RESET_NOTIFICATION:
        rval = 1;
        break;
    case SCSI_CAP_GEOMETRY:
        rval = (64 << 16) | 32;
        break;
    default:
        rval = -1;
        break;
    }

    ISP_MUTEX_EXIT(isp);
    return (rval);
}
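
The SCSI_CAP_GEOMETRY case in Example 18–10 packs two values into one int. Assuming the convention that the upper 16 bits hold the number of heads and the lower 16 bits hold the sectors per track, which matches the 64-head, 32-sector value returned above, an HBA driver could encode the value and a target driver could decode it as in the following sketch. The helper names are hypothetical.

```c
#include <assert.h>

/* Pack heads (upper 16 bits) and sectors per track (lower 16 bits). */
static int
model_geom_encode(int heads, int sectors)
{
    /* Keep heads below 0x8000 so the shift cannot overflow a signed int. */
    assert(heads >= 0 && heads <= 0x7fff);
    assert(sectors >= 0 && sectors <= 0xffff);
    return ((heads << 16) | sectors);
}

static int
model_geom_heads(int geom)
{
    return ((geom >> 16) & 0xffff);
}

static int
model_geom_sectors(int geom)
{
    return (geom & 0xffff);
}
```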

tran_setcap() Entry Point

The tran_setcap(9E) entry point for a SCSI HBA driver is called by scsi_ifsetcap(9F). A target driver calls scsi_ifsetcap() to change the current value of one of a set of SCSA-defined capabilities.

The target driver might request that the new value be set for a particular target by setting the whom parameter to nonzero. A whom value of zero means the request is to set the new value for the SCSI bus or for adapter hardware in general.

tran_setcap() should return -1 for undefined capabilities, 0 if the capability cannot be set to the new value, and 1 if the capability was successfully set.

The HBA driver can use the function scsi_hba_lookup_capstr(9F) to compare the capability string against the canonical set of defined capabilities.


Example 18–11 HBA Driver tran_setcap(9E) Entry Point

static int
isp_scsi_setcap(
    struct scsi_address    *ap,
    char                   *cap,
    int                    value,
    int                    whom)
{
    struct isp             *isp;
    int                    rval = 0;
    u_char                 tgt = ap->a_target;
    int                    update_isp = 0;
    /*
     * Bus-wide requests (whom == 0) are not supported;
     * only per-target capabilities can be set
     */
    if (cap == NULL || whom == 0) {
        return (-1);
    }

    isp = (struct isp *)ap->a_hba_tran->tran_hba_private;
    ISP_MUTEX_ENTER(isp);

    switch (scsi_hba_lookup_capstr(cap)) {
    case SCSI_CAP_DMA_MAX:
    case SCSI_CAP_MSG_OUT:
    case SCSI_CAP_PARITY:
    case SCSI_CAP_UNTAGGED_QING:
    case SCSI_CAP_LINKED_CMDS:
    case SCSI_CAP_RESET_NOTIFICATION:
   /*
    * None of these are settable through
    * the capability interface.
    */
        break;
    case SCSI_CAP_DISCONNECT:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_DR) == 0) {
                break;
        } else {
            if (value) {
                isp->isp_cap[tgt] |= ISP_CAP_DISCONNECT;
            } else {
                isp->isp_cap[tgt] &= ~ISP_CAP_DISCONNECT;
            }
        }
        rval = 1;
        break;
    case SCSI_CAP_SYNCHRONOUS:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_SYNC) == 0) {
                break;
        } else {
            if (value) {
                isp->isp_cap[tgt] |= ISP_CAP_SYNC;
            } else {
                isp->isp_cap[tgt] &= ~ISP_CAP_SYNC;
            }
        }
        rval = 1;
        break;
    case SCSI_CAP_TAGGED_QING:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_DR) == 0 ||
            (isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_TAG) == 0) {
                break;
        } else {
            if (value) {
                isp->isp_cap[tgt] |= ISP_CAP_TAG;
            } else {
                isp->isp_cap[tgt] &= ~ISP_CAP_TAG;
            }
        }
        rval = 1;
        break;
    case SCSI_CAP_WIDE_XFER:
        if ((isp->isp_target_scsi_options[tgt] &
            SCSI_OPTIONS_WIDE) == 0) {
                break;
        } else {
            if (value) {
                isp->isp_cap[tgt] |= ISP_CAP_WIDE;
            } else {
                isp->isp_cap[tgt] &= ~ISP_CAP_WIDE;
            }
        }
        rval = 1;
        break;
    case SCSI_CAP_INITIATOR_ID:
        if (value < N_ISP_TARGETS_WIDE) {
            struct isp_mbox_cmd mbox_cmd;
            isp->isp_initiator_id = (u_short) value;
            /*
             * set Initiator SCSI ID
             */
            isp_i_mbox_cmd_init(isp, &mbox_cmd, 2, 2,
            ISP_MBOX_CMD_SET_SCSI_ID,
            isp->isp_initiator_id,
            0, 0, 0, 0);
            if (isp_i_mbox_cmd_start(isp, &mbox_cmd) == 0) {
                rval = 1;
            }
        }
        break;
    case SCSI_CAP_ARQ:
        if (value) {
            isp->isp_cap[tgt] |= ISP_CAP_AUTOSENSE;
        } else {
            isp->isp_cap[tgt] &= ~ISP_CAP_AUTOSENSE;
        }
        rval = 1;
        break;
    default:
        rval = -1;
        break;
    }
    ISP_MUTEX_EXIT(isp);
    return (rval);
}

Abort and Reset Management

The following sections discuss the abort and reset entry points for SCSI HBA drivers.

tran_abort() Entry Point

The tran_abort(9E) entry point for a SCSI HBA driver is called to abort any commands that are currently in transport for a particular target. This entry point is called when a target driver calls scsi_abort(9F).

The tran_abort() entry point should attempt to abort the command denoted by the pkt parameter. If the pkt parameter is NULL, tran_abort() should attempt to abort all outstanding commands in the transport layer for the particular target or logical unit.

Each command successfully aborted must be marked with pkt_reason CMD_ABORTED and pkt_statistics OR'd with STAT_ABORTED.
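The marking rule above can be modeled in user space. The sketch below is an illustration only: struct model_pkt is a hypothetical stand-in that borrows the real scsi_pkt(9S) field names pkt_reason and pkt_statistics, the constant values are placeholders (a real driver takes CMD_ABORTED and STAT_ABORTED from <sys/scsi/scsi.h>), and the next link stands in for an assumed per-target queue.

```c
#include <stddef.h>

/* Placeholder values; a real HBA driver uses <sys/scsi/scsi.h>. */
#define CMD_ABORTED   5
#define STAT_ABORTED  0x20

/* Hypothetical stand-in for struct scsi_pkt. */
struct model_pkt {
    unsigned char     pkt_reason;      /* why the command ended */
    unsigned int      pkt_statistics;  /* accumulated STAT_* bits */
    struct model_pkt *next;            /* hypothetical per-target queue link */
};

/* Mark one successfully aborted command. */
static void
mark_aborted(struct model_pkt *pkt)
{
    pkt->pkt_reason = CMD_ABORTED;
    pkt->pkt_statistics |= STAT_ABORTED;   /* OR in; keep earlier bits */
}

/*
 * Model of tran_abort(): abort one packet, or every outstanding packet
 * on the target's queue when pkt is NULL. Returns 1 (success) here.
 */
static int
model_abort(struct model_pkt *queue, struct model_pkt *pkt)
{
    if (pkt != NULL) {
        mark_aborted(pkt);
        return (1);
    }
    for (struct model_pkt *p = queue; p != NULL; p = p->next)
        mark_aborted(p);
    return (1);
}
```

A real tran_abort() must also remove the aborted packets from its queues and complete them through their pkt_comp callbacks; the model shows only the marking.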

tran_reset() Entry Point

The tran_reset(9E) entry point for a SCSI HBA driver is called to reset either the SCSI bus or a particular SCSI target device. This entry point is called when a target driver calls scsi_reset(9F).

The tran_reset() entry point must reset the SCSI bus if level is RESET_ALL. If level is RESET_TARGET, just the particular target or logical unit must be reset.

Active commands affected by the reset must be marked with pkt_reason CMD_RESET. The type of reset determines whether STAT_BUS_RESET or STAT_DEV_RESET should be used to OR pkt_statistics.

Commands in the transport layer, but not yet active on the target, must be marked with pkt_reason CMD_RESET, and pkt_statistics OR'd with STAT_ABORTED.
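The three marking rules for tran_reset() can be sketched together. As before, this is a user-space model: the constants are placeholders for the values in <sys/scsi/scsi.h>, struct model_pkt is hypothetical, and the active flag is an assumed way of telling started commands from commands still queued in the HBA. The list passed in is assumed to hold only the packets affected by the reset.

```c
#include <stddef.h>

/* Placeholder values; a real driver uses <sys/scsi/scsi.h>. */
#define CMD_RESET       4
#define STAT_BUS_RESET  0x08
#define STAT_DEV_RESET  0x10
#define STAT_ABORTED    0x20
#define RESET_ALL       0
#define RESET_TARGET    1

/* Hypothetical stand-in for struct scsi_pkt plus driver bookkeeping. */
struct model_pkt {
    unsigned char     pkt_reason;
    unsigned int      pkt_statistics;
    int               active;   /* 1 if already started on the target */
    struct model_pkt *next;
};

/* Mark every packet affected by a reset at the given level. */
static void
model_mark_reset(struct model_pkt *list, int level)
{
    unsigned int reset_stat =
        (level == RESET_ALL) ? STAT_BUS_RESET : STAT_DEV_RESET;

    for (struct model_pkt *p = list; p != NULL; p = p->next) {
        p->pkt_reason = CMD_RESET;
        if (p->active)
            p->pkt_statistics |= reset_stat;    /* was running on target */
        else
            p->pkt_statistics |= STAT_ABORTED;  /* still queued in HBA */
    }
}
```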

tran_bus_reset() Entry Point

tran_bus_reset(9E) must reset the SCSI bus without resetting targets.

#include <sys/scsi/scsi.h>

int tran_bus_reset(dev_info_t *hba_dip, int level);

where:

hba_dip

Pointer to the device information node (dev_info_t) of the SCSI HBA

level

Must be set to RESET_BUS so that only the SCSI bus is reset, not the targets

The tran_bus_reset() vector in the scsi_hba_tran(9S) structure should be initialized during the HBA driver's attach(9E). The vector should point to an HBA entry point that is to be called when a user initiates a bus reset.

Implementation is hardware specific. If the HBA driver cannot reset the SCSI bus without affecting the targets, the driver should fail RESET_BUS or not initialize this vector.
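The policy described above can be sketched as a small model. RESET_BUS has a placeholder value, struct model_hba and its can_reset_bus_only flag are hypothetical, and the return convention (1 for success, 0 for failure) reflects our reading of tran_bus_reset(9E); check the man page for your release.

```c
/* Placeholder value; a real driver gets RESET_BUS from <sys/scsi/scsi.h>. */
#define RESET_BUS  3

/* Hypothetical per-HBA state. */
struct model_hba {
    int can_reset_bus_only;   /* hardware can reset the bus, not targets */
    int bus_resets;           /* bus-only resets performed */
};

/* Model of a tran_bus_reset() entry point: 1 on success, 0 on failure. */
static int
model_bus_reset(struct model_hba *hba, int level)
{
    if (level != RESET_BUS || !hba->can_reset_bus_only)
        return (0);     /* refuse anything that would also reset targets */
    hba->bus_resets++;  /* stands in for the hardware-specific reset */
    return (1);
}
```

A driver whose hardware can never satisfy the bus-only condition can instead leave the tran_bus_reset vector uninitialized, as the text notes.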

tran_reset_notify() Entry Point

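The tran_reset_notify(9E) entry point is called when a target driver calls scsi_reset_notify(9F) to request, or to cancel, notification of SCSI bus resets. When the request flag is SCSI_RESET_NOTIFY, the HBA driver records the target driver's callback and argument so that it can invoke the callback after a bus reset occurs. When the flag is SCSI_RESET_CANCEL, the HBA driver removes that record.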


Example 18–12 HBA Driver tran_reset_notify(9E) Entry Point

static int
isp_scsi_reset_notify(
    struct scsi_address    *ap,
    int                    flag,
    void                   (*callback)(caddr_t),
    caddr_t                arg)
{
    struct isp                       *isp;
    struct isp_reset_notify_entry    *p, *beforep;
    int                              rval = DDI_FAILURE;

    isp = (struct isp *)ap->a_hba_tran->tran_hba_private;
    mutex_enter(ISP_REQ_MUTEX(isp));
    /*
     * Try to find an existing entry for this target
     */
    p = isp->isp_reset_notify_listf;
    beforep = NULL;

    while (p) {
        if (p->ap == ap)
            break;
        beforep = p;
        p = p->next;
    }

    if ((flag & SCSI_RESET_CANCEL) && (p != NULL)) {
        if (beforep == NULL) {
            isp->isp_reset_notify_listf = p->next;
        } else {
            beforep->next = p->next;
        }
        kmem_free((caddr_t)p, sizeof (struct
            isp_reset_notify_entry));
        rval = DDI_SUCCESS;
    } else if ((flag & SCSI_RESET_NOTIFY) && (p == NULL)) {
        p = kmem_zalloc(sizeof (struct isp_reset_notify_entry),
            KM_SLEEP);
        p->ap = ap;
        p->callback = callback;
        p->arg = arg;
        p->next = isp->isp_reset_notify_listf;
        isp->isp_reset_notify_listf = p;
        rval = DDI_SUCCESS;
    }

    mutex_exit(ISP_REQ_MUTEX(isp));
    return (rval);
}
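When the HBA driver later resets the bus, it walks the list that isp_scsi_reset_notify() maintains and invokes each registered callback. The following user-space sketch models that step. struct model_notify_entry mirrors only the callback, arg, and next members of the example's isp_reset_notify_entry, caddr_t is written as char *, and count_reset() is a hypothetical target-driver callback used for illustration.

```c
#include <stddef.h>

/* Hypothetical model of the list built by isp_scsi_reset_notify(). */
struct model_notify_entry {
    void                      (*callback)(char *);  /* caddr_t is char * */
    char                      *arg;
    struct model_notify_entry *next;
};

/* Hypothetical target-driver callback: counts notifications. */
static void
count_reset(char *arg)
{
    (*(int *)arg)++;
}

/*
 * Called after the HBA has reset the bus: invoke every registered
 * callback once with the argument supplied at registration.
 * Returns the number of callbacks invoked.
 */
static int
model_notify_all(struct model_notify_entry *list)
{
    int n = 0;

    for (struct model_notify_entry *p = list; p != NULL; p = p->next) {
        p->callback(p->arg);
        n++;
    }
    return (n);
}
```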

Dynamic Reconfiguration

To support the minimal set of hot-plugging operations, drivers might need to implement support for bus quiesce, bus unquiesce, and bus reset. The scsi_hba_tran(9S) structure supports these operations. If the hardware does not require quiesce, unquiesce, or reset, no driver changes are needed.

The scsi_hba_tran structure includes the following fields:

int (*tran_quiesce)(dev_info_t *hba_dip);
int (*tran_unquiesce)(dev_info_t *hba_dip);
int (*tran_bus_reset)(dev_info_t *hba_dip, int level);

These interfaces quiesce and unquiesce a SCSI bus.

#include <sys/scsi/scsi.h>

int prefix_tran_quiesce(dev_info_t *hba_dip);
int prefix_tran_unquiesce(dev_info_t *hba_dip);

In these prototypes, prefix stands for the driver's own function-name prefix.

tran_quiesce(9E) and tran_unquiesce(9E) are used for SCSI devices that are not designed for hot-plugging. These functions must be implemented by an HBA driver to support dynamic reconfiguration (DR).

The tran_quiesce() and tran_unquiesce() vectors in the scsi_hba_tran(9S) structure should be initialized to point to HBA entry points during attach(9E). These functions are called when a user initiates quiesce and unquiesce operations.

The tran_quiesce() entry point stops all activity on a SCSI bus prior to and during the reconfiguration of devices that are attached to the SCSI bus. The tran_unquiesce() entry point is called by the SCSA framework to resume activity on the SCSI bus after the reconfiguration operation has been completed.

HBA drivers are required to handle tran_quiesce() by waiting for all outstanding commands to complete before returning success. After the driver has quiesced the bus, any new I/O requests must be queued until the SCSA framework calls the corresponding tran_unquiesce() entry point.

HBA drivers handle calls to tran_unquiesce() by starting any target driver I/O requests in the queue.
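The quiesce and unquiesce rules can be sketched in user space, with POSIX mutexes and condition variables standing in for the kernel's mutex_enter(9F) and cv_wait(9F) routines. All names are hypothetical, the fixed-size array is a simplification of a real driver's request queue, and the 0 return value means success only within this model.

```c
#include <pthread.h>

#define MODEL_QMAX 16   /* hypothetical queue capacity */

struct model_bus {
    pthread_mutex_t lock;
    pthread_cond_t  idle;          /* signaled when outstanding hits 0 */
    int             outstanding;   /* commands active on the bus */
    int             quiesced;
    int             queued[MODEL_QMAX];   /* held request IDs */
    int             nqueued;
    int             started;       /* total commands ever started */
};

static void
model_bus_init(struct model_bus *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->idle, NULL);
    b->outstanding = b->quiesced = b->nqueued = b->started = 0;
}

/* Submit a request: start it now, or hold it while quiesced. */
static int
model_submit(struct model_bus *b, int id)
{
    int rv = 0;

    pthread_mutex_lock(&b->lock);
    if (!b->quiesced) {
        b->outstanding++;
        b->started++;
    } else if (b->nqueued < MODEL_QMAX) {
        b->queued[b->nqueued++] = id;
    } else {
        rv = -1;    /* queue full in this model */
    }
    pthread_mutex_unlock(&b->lock);
    return (rv);
}

/* Completion path (the interrupt handler in a real driver). */
static void
model_complete(struct model_bus *b)
{
    pthread_mutex_lock(&b->lock);
    if (--b->outstanding == 0)
        pthread_cond_broadcast(&b->idle);
    pthread_mutex_unlock(&b->lock);
}

/* tran_quiesce() model: stop new I/O, then wait for active I/O. */
static int
model_quiesce(struct model_bus *b)
{
    pthread_mutex_lock(&b->lock);
    b->quiesced = 1;
    while (b->outstanding > 0)
        pthread_cond_wait(&b->idle, &b->lock);
    pthread_mutex_unlock(&b->lock);
    return (0);
}

/* tran_unquiesce() model: resume and start every held request. */
static int
model_unquiesce(struct model_bus *b)
{
    pthread_mutex_lock(&b->lock);
    b->quiesced = 0;
    b->outstanding += b->nqueued;
    b->started += b->nqueued;
    b->nqueued = 0;
    pthread_mutex_unlock(&b->lock);
    return (0);
}
```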

SCSI HBA Driver Specific Issues

This section covers issues specific to SCSI HBA drivers.

Installing HBA Drivers

A SCSI HBA driver is installed in a similar fashion to a leaf driver. See Chapter 21, Compiling, Loading, Packaging, and Testing Drivers. The difference is that the add_drv(1M) command must specify the driver class as scsi, for example:

# add_drv -m" * 0666 root root" -i'"pci1077,1020"' -c scsi isp

HBA Configuration Properties

When attaching an instance of an HBA device, scsi_hba_attach_setup(9F) creates a number of SCSI configuration properties for that HBA instance. A particular property is created only if no existing property of the same name is already attached to the HBA instance. This restriction avoids overriding any default property values in an HBA configuration file.

An HBA driver must use ddi_prop_get_int(9F) to retrieve each property. The HBA driver then modifies or accepts the default value of the properties to configure its specific operation.

scsi-reset-delay Property

The scsi-reset-delay property is an integer that specifies the recovery time, in milliseconds, to allow after a reset of either the SCSI bus or a SCSI device.

scsi-options Property

The scsi-options property is an integer specifying a number of options through individually defined bits:

Per-Target scsi-options

An HBA driver might support a per-target scsi-options feature in the following format:

target<n>-scsi-options=<hex value>

In this format, <n> is the target ID. If the per-target scsi-options property is defined, the HBA driver uses that value rather than the scsi-options property of the HBA instance. This approach can provide more precise control if, for example, synchronous data transfer needs to be disabled for just one particular target device. The per-target scsi-options property can be defined in the driver.conf(4) file.

The following example shows a per-target scsi-options property definition to disable synchronous data transfer for target device 3:

target3-scsi-options=0x2d8
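As a worked check of the example above, the sketch below builds the per-target property name and decodes the value. The SCSI_OPTIONS_* values are given as we understand them from <sys/scsi/conf/autoconf.h>; verify them against your headers. The helper names are hypothetical, and a real driver would obtain both property values with ddi_prop_get_int(9F) rather than receive them as arguments.

```c
#include <stdio.h>

/* scsi-options bits; values believed to match <sys/scsi/conf/autoconf.h>. */
#define SCSI_OPTIONS_DR      0x008   /* disconnect/reconnect */
#define SCSI_OPTIONS_LINK    0x010   /* linked commands */
#define SCSI_OPTIONS_SYNC    0x020   /* synchronous transfers */
#define SCSI_OPTIONS_PARITY  0x040   /* parity checking */
#define SCSI_OPTIONS_TAG     0x080   /* tagged queuing */
#define SCSI_OPTIONS_FAST    0x100   /* fast SCSI */
#define SCSI_OPTIONS_WIDE    0x200   /* wide transfers */

/* Build the per-target property name, e.g. "target3-scsi-options". */
static int
target_options_name(char *buf, size_t len, int target)
{
    return (snprintf(buf, len, "target%d-scsi-options", target));
}

/*
 * Pick the options for one target: the per-target value when that
 * property exists (has_target != 0), otherwise the per-HBA value.
 */
static int
effective_options(int hba_options, int has_target, int target_options)
{
    return (has_target ? target_options : hba_options);
}
```

With these values, clearing SCSI_OPTIONS_SYNC and SCSI_OPTIONS_FAST from 0x3f8 (all of the listed bits set) gives 0x2d8, which is why target3-scsi-options=0x2d8 turns off synchronous (and therefore fast) transfers while leaving wide transfers, tagged queuing, parity, linked commands, and disconnect/reconnect enabled.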

x86 Target Driver Configuration Properties

Some x86 SCSI target drivers, such as the driver for cmdk disk, use the following configuration properties:

If you use the cmdk sample driver to write an HBA driver for an x86 platform, any appropriate properties must be defined in the driver.conf(4) file.


Note –

These property definitions should appear only in an HBA driver's driver.conf(4) file. The HBA driver itself should not inspect or attempt to interpret these properties in any way. These properties are advisory only and serve as an adjunct to the cmdk driver. The properties should not be relied upon in any way. The property definitions might not be used in future releases.


The disk property can be used to define the type of disk supported by cmdk. For a SCSI HBA, the only possible value for the disk property is:

The queue property defines how the disk driver sorts the queue of incoming requests during strategy(9E). Two values are possible:

The flow_control property defines how commands are transported to the HBA driver. Three values are possible:

The following example is a driver.conf(4) file for use with an x86 HBA PCI device that has been designed for use with the cmdk sample driver:

#
# config file for ISP 1020 SCSI HBA driver     
#
       flow_control="dsngl" queue="qsort" disk="scdk"
       scsi-initiator-id=7;

Support for Queuing

For a definition of tagged queuing, refer to the SCSI-2 specification. To support tagged queuing, first check the scsi_options flag SCSI_OPTIONS_TAG to see whether tagged queuing is enabled globally. Next, check to see whether the target is a SCSI-2 device and whether the target has tagged queuing enabled. If these conditions are all true, attempt to enable tagged queuing by using scsi_ifsetcap(9F).

If tagged queuing fails, you can attempt to set untagged queuing. In this mode, you submit as many commands as you think necessary or optimal to the host adapter driver. Then the host adapter queues the commands to the target one command at a time, in contrast to tagged queuing. In tagged queuing, the host adapter submits as many commands as possible until the target indicates that the queue is full.
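The decision sequence above can be sketched as follows. The function pointer models scsi_ifsetcap(9F), which takes a capability string such as "tagged-qing" or "untagged-qing" and returns 1 when the capability was set; the real function also takes a struct scsi_address pointer and a whom argument, which this model omits. hba_all() and hba_untagged_only() are hypothetical adapters used for illustration.

```c
#include <string.h>

#define SCSI_OPTIONS_TAG  0x080   /* believed to match sys/scsi/conf/autoconf.h */

enum queue_mode { QUEUE_NONE, QUEUE_UNTAGGED, QUEUE_TAGGED };

/* Models scsi_ifsetcap(9F): returns 1 if the capability was set. */
typedef int (*setcap_fn)(const char *cap, int value);

/* Hypothetical adapter that accepts every capability. */
static int
hba_all(const char *cap, int value)
{
    (void) cap;
    (void) value;
    return (1);
}

/* Hypothetical adapter that supports only untagged queuing. */
static int
hba_untagged_only(const char *cap, int value)
{
    (void) value;
    return (strcmp(cap, "untagged-qing") == 0);
}

/* Try tagged queuing when all checks pass; otherwise fall back. */
static enum queue_mode
choose_queuing(int scsi_options, int is_scsi2, int target_tq,
    setcap_fn setcap)
{
    if ((scsi_options & SCSI_OPTIONS_TAG) && is_scsi2 && target_tq &&
        setcap("tagged-qing", 1) == 1)
        return (QUEUE_TAGGED);
    if (setcap("untagged-qing", 1) == 1)
        return (QUEUE_UNTAGGED);
    return (QUEUE_NONE);
}
```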

Chapter 19 Drivers for Network Devices

To write a network driver for the Solaris OS, use the Solaris Generic LAN Driver (GLD) framework.

GLDv3 Network Device Driver Framework

The GLDv3 framework is a function-call-based interface of MAC plugins and MAC driver service routines and structures. The GLDv3 framework implements the necessary STREAMS entry points on behalf of GLDv3 compliant drivers and handles DLPI compatibility.

This section discusses the following topics:

GLDv3 MAC Registration

GLDv3 defines a driver API for drivers that register with a plugin type of MAC_PLUGIN_IDENT_ETHER.

GLDv3 MAC Registration Process

A GLDv3 device driver must perform the following steps to register with the MAC layer:

GLDv3 MAC Registration Functions

The GLDv3 interface includes driver entry points that are advertised during registration with the MAC layer and MAC entry points that are invoked by drivers.

The mac_init_ops() and mac_fini_ops() Functions

void mac_init_ops(struct dev_ops *ops, const char *name);

A GLDv3 device driver must invoke the mac_init_ops(9F) function in its _init(9E) entry point before calling mod_install(9F).

void mac_fini_ops(struct dev_ops *ops);

A GLDv3 device driver must invoke the mac_fini_ops(9F) function in its _fini(9E) entry point after calling mod_remove(9F).


Example 19–1 The mac_init_ops() and mac_fini_ops() Functions

int
_init(void)
{
        int     rv;
        mac_init_ops(&xx_devops, "xx");
        if ((rv = mod_install(&xx_modlinkage)) != DDI_SUCCESS) {
                mac_fini_ops(&xx_devops);
        }
        return (rv);
}

int
_fini(void)
{
        int     rv;
        if ((rv = mod_remove(&xx_modlinkage)) == DDI_SUCCESS) {
                mac_fini_ops(&xx_devops);
        }
        return (rv);
}

The mac_alloc() and mac_free() Functions

mac_register_t *mac_alloc(uint_t version);

The mac_alloc(9F) function allocates a new mac_register structure and returns a pointer to it. Initialize the structure members before you pass the new structure to mac_register(). MAC-private elements are initialized by the MAC layer before mac_alloc() returns. The value of version must be MAC_VERSION_V1. Example 19–2 passes MAC_VERSION, which the MAC headers define as MAC_VERSION_V1.

void mac_free(mac_register_t *mregp);

The mac_free(9F) function frees a mac_register structure that was previously allocated by mac_alloc().

The mac_register() and mac_unregister() Functions

int mac_register(mac_register_t *mregp, mac_handle_t *mhp);

To register a new instance with the MAC layer, a GLDv3 driver must invoke the mac_register(9F) function in its attach(9E) entry point. The mregp argument is a pointer to a mac_register registration information structure. On success, the mhp argument is a pointer to a MAC handle for the new MAC instance. This handle is needed by other routines such as mac_tx_update(), mac_link_update(), and mac_rx().


Example 19–2 The mac_alloc(), mac_register(), and mac_free() Functions and mac_register Structure

int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
        mac_register_t        *macp;

/* ... */

        if ((macp = mac_alloc(MAC_VERSION)) == NULL) {
                xx_error(dip, "mac_alloc failed");
                goto failed;
        }

        macp->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
        macp->m_driver = xxp;
        macp->m_dip = dip;
        macp->m_src_addr = xxp->xx_curraddr;
        macp->m_callbacks = &xx_m_callbacks;
        macp->m_min_sdu = 0;
        macp->m_max_sdu = ETHERMTU;
        macp->m_margin = VLAN_TAGSZ;

        if (mac_register(macp, &xxp->xx_mh) == DDI_SUCCESS) {
                mac_free(macp);
                return (DDI_SUCCESS);
        }

        /* failed to register with MAC */
        mac_free(macp);
failed:
        /* ... */
        return (DDI_FAILURE);
}

int mac_unregister(mac_handle_t mh);

The mac_unregister(9F) function unregisters a MAC instance that was previously registered with mac_register(). The mh argument is the MAC handle that was allocated by mac_register(). Invoke mac_unregister() from the detach(9E) entry point.


Example 19–3 The mac_unregister() Function

int
xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
        xx_t        *xxp; /* driver soft state */

        /* ... */

        switch (cmd) {
        case DDI_DETACH:

                if (mac_unregister(xxp->xx_mh) != 0) {
                        return (DDI_FAILURE);
                }
                /* ... */
                return (DDI_SUCCESS);

        /* ... */

        default:
                return (DDI_FAILURE);
        }
}

GLDv3 MAC Registration Data Structures

The structures described in this section are defined in the sys/mac_provider.h header file. Include the following three MAC header files in your GLDv3 driver: sys/mac.h, sys/mac_ether.h, and sys/mac_provider.h. Do not include any other MAC-related header file.

The mac_register(9S) data structure is the MAC registration information structure that is allocated by mac_alloc() and passed to mac_register(). Initialize the structure members before you pass the new structure to mac_register(). MAC-private elements are initialized by the MAC layer before mac_alloc() returns. The m_version structure member is the MAC version. Do not modify the MAC version. The m_type_ident structure member is the MAC type identifier. Set the MAC type identifier to MAC_PLUGIN_IDENT_ETHER. The m_callbacks member of the mac_register structure is a pointer to an instance of the mac_callbacks structure.

The mac_callbacks(9S) data structure is the structure that your device driver uses to expose its entry points to the MAC layer. These entry points are used by the MAC layer to control the driver. These entry points are used to do tasks such as start and stop the adapters, manage multicast addresses, set promiscuous mode, query the capabilities of the adapter, and get and set properties. See Table 19–1 for a complete list of required and optional GLDv3 entry points. Provide a pointer to your mac_callbacks structure in the m_callbacks field of the mac_register structure.

The mc_callbacks member of the mac_callbacks structure is a bit mask that is a combination of the following flags that specify which of the optional entry points are implemented by the driver. Other members of the mac_callbacks structure are pointers to each of the entry points of the driver.

MC_IOCTL

The mc_ioctl() entry point is present.

MC_GETCAPAB

The mc_getcapab() entry point is present.

MC_SETPROP

The mc_setprop() entry point is present.

MC_GETPROP

The mc_getprop() entry point is present.

MC_PROPINFO

The mc_propinfo() entry point is present.

MC_PROPERTIES

All properties entry points are present. Setting MC_PROPERTIES is equivalent to setting all three flags: MC_SETPROP, MC_GETPROP, and MC_PROPINFO.


Example 19–4 The mac_callbacks Structure

#define XX_M_CALLBACK_FLAGS \
    (MC_IOCTL | MC_GETCAPAB | MC_PROPERTIES)

static mac_callbacks_t xx_m_callbacks = {
        XX_M_CALLBACK_FLAGS,
        xx_m_getstat,     /* mc_getstat() */
        xx_m_start,       /* mc_start() */
        xx_m_stop,        /* mc_stop() */
        xx_m_promisc,     /* mc_setpromisc() */
        xx_m_multicst,    /* mc_multicst() */
        xx_m_unicst,      /* mc_unicst() */
        xx_m_tx,          /* mc_tx() */
        NULL,             /* Reserved, do not use */
        xx_m_ioctl,       /* mc_ioctl() */
        xx_m_getcapab,    /* mc_getcapab() */
        NULL,             /* Reserved, do not use */
        NULL,             /* Reserved, do not use */
        xx_m_setprop,     /* mc_setprop() */
        xx_m_getprop,     /* mc_getprop() */
        xx_m_propinfo     /* mc_propinfo() */
};

GLDv3 Capabilities

GLDv3 implements a capability mechanism that allows the framework to query and enable capabilities that are supported by the GLDv3 driver. Use the mc_getcapab(9E) entry point to report capabilities. If a capability is supported by the driver, pass information about that capability, such as capability-specific entry points or flags, through mc_getcapab(). Pass a pointer to the mc_getcapab() entry point in the mac_callbacks structure. See GLDv3 MAC Registration Data Structures for more information about the mac_callbacks structure.

boolean_t mc_getcapab(void *driver_handle, mac_capab_t cap, void *cap_data);

The cap argument specifies the type of capability being queried. The value of cap can be either MAC_CAPAB_HCKSUM (hardware checksum offload) or MAC_CAPAB_LSO (large segment offload). Use the cap_data argument to return the capability data to the framework.

If the driver supports the cap capability, the mc_getcapab() entry point must return B_TRUE. If the driver does not support the cap capability, mc_getcapab() must return B_FALSE.


Example 19–5 The mc_getcapab() Entry Point

static boolean_t
xx_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
        switch (cap) {
        case MAC_CAPAB_HCKSUM: {
                uint32_t *txflags = cap_data;
                *txflags = HCKSUM_INET_FULL_V4 | HCKSUM_IPHDRCKSUM;
                break;
        }
        case MAC_CAPAB_LSO: {
                /* ... */
                break;
        }
        default:
                return (B_FALSE);
        }
        return (B_TRUE);
}

The following sections describe the supported capabilities and the corresponding capability data to return.

Hardware Checksum Offload

To get data about support for hardware checksum offload, the framework sends MAC_CAPAB_HCKSUM in the cap argument. See Hardware Checksum Offload Capability Information.

To query checksum offload metadata and retrieve the per-packet hardware checksumming metadata when hardware checksumming is enabled, use mac_hcksum_get(9F). See The mac_hcksum_get() Function Flags.

To set checksum offload metadata, use mac_hcksum_set(9F). See The mac_hcksum_set() Function Flags.

See Hardware Checksumming: Hardware and Hardware Checksumming: MAC Layer for more information.

Hardware Checksum Offload Capability Information

To pass information about the MAC_CAPAB_HCKSUM capability to the framework, the driver must set a combination of the following flags in cap_data, which points to a uint32_t. These flags indicate the level of hardware checksum offload that the driver is capable of performing for outbound packets.

HCKSUM_INET_PARTIAL

Partial 1's complement checksum ability

HCKSUM_INET_FULL_V4

Full 1's complement checksum ability for IPv4 packets

HCKSUM_INET_FULL_V6

Full 1's complement checksum ability for IPv6 packets

HCKSUM_IPHDRCKSUM

IPv4 Header checksum offload capability
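The 1's complement checksum named in these flags is the standard Internet checksum. For reference, the sketch below computes it in software by summing big-endian 16-bit words, folding carries back into the low 16 bits, and inverting the result. It illustrates the arithmetic that the hardware offloads and is not part of the GLDv3 interface; inet_cksum() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 16-bit 1's complement checksum over len bytes (RFC 1071 style).
 * The 32-bit accumulator is adequate for inputs up to roughly 128 KB.
 */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];   /* big-endian word */
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;   /* pad an odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    return ((uint16_t)~sum);
}
```

For an IPv4 header, the checksum field is zeroed before the sum is taken; with HCKSUM_IPHDRCKSUM the hardware performs this header computation instead.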

The mac_hcksum_get() Function Flags

The flags argument of mac_hcksum_get() is a combination of the following values:

HCK_FULLCKSUM

Compute the full checksum for this packet.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

Compute the partial 1's complement checksum based on other parameters passed to mac_hcksum_get(). HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

Compute the IP header checksum.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.

The mac_hcksum_set() Function Flags

The flags argument of mac_hcksum_set() is a combination of the following values:

HCK_FULLCKSUM

The full checksum was computed and passed through the value argument.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

The partial checksum was computed and passed through the value argument. HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

The IP header checksum was computed and passed through the value argument.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.

Large Segment (or Send) Offload

To query support for large segment (or send) offload, the framework sends MAC_CAPAB_LSO in the cap argument and expects the information back in cap_data, which points to a mac_capab_lso(9S) structure. The framework allocates the mac_capab_lso structure and passes a pointer to this structure in cap_data. The mac_capab_lso structure consists of an lso_basic_tcp_ipv4(9S) structure and an lso_flags member. If the driver instance supports LSO for TCP on IPv4, set the LSO_TX_BASIC_TCP_IPV4 flag in lso_flags and set the lso_max member of the lso_basic_tcp_ipv4 structure to the maximum payload size supported by the driver instance.
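The MAC_CAPAB_LSO arm that Example 19–5 elides might look like the following sketch. The types are hypothetical stand-ins that mirror the member names of mac_capab_lso(9S) and lso_basic_tcp_ipv4(9S); the flag value and struct model_xx are assumptions, and a real driver uses the definitions from sys/mac_provider.h.

```c
#include <stdint.h>

/* Placeholder value; real definition comes from the MAC headers. */
#define LSO_TX_BASIC_TCP_IPV4  0x01

/* Hypothetical stand-ins mirroring the documented member names. */
typedef struct {
    uint32_t lso_max;    /* largest LSO payload the instance accepts */
} model_lso_basic_tcp_ipv4_t;

typedef struct {
    uint32_t                   lso_flags;
    model_lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
} model_capab_lso_t;

/* Hypothetical per-instance driver state. */
struct model_xx {
    int      lso_enabled;
    uint32_t lso_max;
};

/* The MAC_CAPAB_LSO arm of an mc_getcapab()-style routine. */
static int
model_getcapab_lso(struct model_xx *xxp, void *cap_data)
{
    model_capab_lso_t *lso = cap_data;   /* allocated by the framework */

    if (!xxp->lso_enabled)
        return (0);                      /* B_FALSE: not supported */
    lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
    lso->lso_basic_tcp_ipv4.lso_max = xxp->lso_max;
    return (1);                          /* B_TRUE */
}
```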

Use mac_lso_get(9F) to obtain per-packet LSO metadata. If LSO is enabled for this packet, the HW_LSO flag is set in the mac_lso_get() flags argument. The maximum segment size (MSS) to be used during segmentation of the large segment is returned through the location pointed to by the mss argument. See Large Segment Offload for more information.
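The MSS returned by mac_lso_get() determines how the large payload is cut into segments. The arithmetic is sketched below as plain helper functions; the names are hypothetical, mss is assumed to be nonzero, and the payload length is assumed to exclude the headers that are replicated into each segment.

```c
#include <stdint.h>

/* Segments produced by cutting payload_len bytes into mss-sized pieces. */
static uint32_t
lso_segment_count(uint32_t payload_len, uint32_t mss)
{
    return ((payload_len + mss - 1) / mss);   /* ceil(payload_len / mss) */
}

/*
 * Payload length of segment i (0-based); only the last segment can be
 * short. i must be less than lso_segment_count(payload_len, mss).
 */
static uint32_t
lso_segment_len(uint32_t payload_len, uint32_t mss, uint32_t i)
{
    uint32_t left = payload_len - i * mss;

    return ((left < mss) ? left : mss);
}
```

For example, a 64000-byte payload with an MSS of 1460 yields 44 segments: 43 full segments and a final segment of 1220 bytes.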

GLDv3 Data Paths

Data-path entry points comprise the following components:

Transmit Data Path

The GLDv3 framework uses the transmit entry point, mc_tx(9E), to pass a chain of message blocks to the driver. Provide a pointer to the mc_tx() entry point in your mac_callbacks structure. See GLDv3 MAC Registration Data Structures for more information about the mac_callbacks structure.


Example 19–6 The mc_tx() Entry Point

mblk_t *
xx_m_tx(void *arg, mblk_t *mp)
{
        xx_t    *xxp = arg;
        mblk_t   *nmp;

        mutex_enter(&xxp->xx_xmtlock);

        if (xxp->xx_flags & XX_SUSPENDED) {
                while ((nmp = mp) != NULL) {
                        xxp->xx_carrier_errors++;
                        mp = mp->b_next;
                        freemsg(nmp);
                }
                mutex_exit(&xxp->xx_xmtlock);
                return (NULL);
        }

        while (mp != NULL) {
                nmp = mp->b_next;
                mp->b_next = NULL;

                if (!xx_send(xxp, mp)) {
                        mp->b_next = nmp;
                        break;
                }
                mp = nmp;
        }
        mutex_exit(&xxp->xx_xmtlock);

        return (mp);
}

The following sections discuss topics related to transmitting data to the hardware.

Flow Control

If the driver cannot send the packets because of insufficient hardware resources, the driver returns the sub-chain of packets that could not be sent. When more descriptors become available at a later time, the driver must invoke mac_tx_update(9F) to notify the framework.
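One way to honor this rule is to record that mc_tx() returned packets and to call mac_tx_update() from the reclaim path once descriptors free up. The user-space model below tracks only counters. struct model_tx and its fields are hypothetical, and the increment of tx_updates marks where a real driver would call mac_tx_update(9F); a real driver would also do this bookkeeping under its transmit lock, as Example 19–6 does with xx_xmtlock.

```c
/* Hypothetical model of a transmit ring and the mac_tx_update() rule. */
struct model_tx {
    int free_desc;    /* descriptors available for new packets */
    int blocked;      /* set when mc_tx() returned unsent packets */
    int tx_updates;   /* times the model "called" mac_tx_update() */
};

/* mc_tx() model: send up to n packets; return how many were NOT sent. */
static int
model_tx(struct model_tx *t, int n)
{
    int sent = (n < t->free_desc) ? n : t->free_desc;

    t->free_desc -= sent;
    if (sent < n)
        t->blocked = 1;   /* remember to notify the framework later */
    return (n - sent);
}

/* Reclaim path: hardware has completed `done` descriptors. */
static void
model_reclaim(struct model_tx *t, int done)
{
    t->free_desc += done;
    if (t->blocked && t->free_desc > 0) {
        t->blocked = 0;
        t->tx_updates++;  /* a real driver calls mac_tx_update(mh) here */
    }
}
```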

Hardware Checksumming: Hardware

If the driver specified hardware checksum support (see Hardware Checksum Offload), then the driver must do the following tasks:

Large Segment Offload

If the driver specified LSO capabilities (see Large Segment (or Send) Offload), then the driver must use mac_lso_get(9F) to query whether LSO must be performed on the packet.

Virtual LAN: Hardware

When the administrator configures VLANs, the MAC layer inserts the needed VLAN headers on the outbound packets before they are passed to the driver through the mc_tx() entry point.

Receive Data Path

Call the mac_rx(9F) function in your driver's interrupt handler to pass a chain of one or more packets up the stack to the MAC layer. Avoid holding mutexes or other locks during the call to mac_rx(). In particular, do not hold locks that could be taken by a transmit thread during a call to mac_rx(). See mc_unicst(9E) for information about the packets that must be sent up to the MAC layer.
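The locking rule can be modeled in user space: the interrupt routine detaches the received chain while holding its lock, drops the lock, and only then hands the chain up. In the sketch below, model_mac_rx() is a hypothetical stand-in for mac_rx(9F) that also records whether the receive lock was still held, and struct model_mblk keeps only the b_next link that mblk_t chains use. POSIX mutexes stand in for kernel mutexes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet: only the b_next chain link of an mblk_t. */
struct model_mblk {
    struct model_mblk *b_next;
};

/* Hypothetical per-instance receive state. */
struct model_rx {
    pthread_mutex_t    rx_lock;
    struct model_mblk *ring;             /* packets received so far */
    int                delivered;
    int                delivered_locked; /* deliveries with rx_lock held */
};

static void
model_rx_init(struct model_rx *r, struct model_mblk *ring)
{
    pthread_mutex_init(&r->rx_lock, NULL);
    r->ring = ring;
    r->delivered = r->delivered_locked = 0;
}

/* Stand-in for mac_rx(): count packets, note whether rx_lock was held. */
static void
model_mac_rx(struct model_rx *r, struct model_mblk *chain)
{
    if (pthread_mutex_trylock(&r->rx_lock) == 0)
        pthread_mutex_unlock(&r->rx_lock);
    else
        r->delivered_locked++;
    for (; chain != NULL; chain = chain->b_next)
        r->delivered++;
}

/* Interrupt-handler model: collect under the lock, deliver after. */
static void
model_rx_intr(struct model_rx *r)
{
    struct model_mblk *chain;

    pthread_mutex_lock(&r->rx_lock);
    chain = r->ring;          /* detach everything received so far */
    r->ring = NULL;
    pthread_mutex_unlock(&r->rx_lock);

    if (chain != NULL)
        model_mac_rx(r, chain);   /* no driver locks held here */
}
```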

The following sections discuss topics related to sending data to the MAC layer.

Hardware Checksumming: MAC Layer

If the driver specified hardware checksum support (see Hardware Checksum Offload), then the driver must use the mac_hcksum_set(9F) function to associate hardware checksumming metadata with the packet.

Virtual LAN: MAC Layer

VLAN packets must be passed with their tags to the MAC layer. Do not strip the VLAN headers from the packets.

GLDv3 State Change Notifications

A driver can call the following functions to notify the network stack that the driver's state has changed.

void mac_tx_update(mac_handle_t mh);

The mac_tx_update(9F) function notifies the framework that more TX descriptors are available. If mc_tx() returns a non-empty chain of packets, then the driver must call mac_tx_update() as soon as possible after resources are available to inform the MAC layer to retry the packets that were returned as not sent. See Transmit Data Path for more information about the mc_tx() entry point.

void mac_link_update(mac_handle_t mh, link_state_t new_state);

The mac_link_update(9F) function notifies the MAC layer that the state of the media link has changed. The new_state argument must be one of the following values:

LINK_STATE_UP

The media link is up.

LINK_STATE_DOWN

The media link is down.

LINK_STATE_UNKNOWN

The media link is unknown.

GLDv3 Network Statistics

Device drivers maintain a set of statistics for the device instances they manage. The MAC layer queries these statistics through the mc_getstat(9E) entry point of the driver.

int mc_getstat(void *driver_handle, uint_t stat, uint64_t *stat_value);

The GLDv3 framework uses stat to specify the statistic being queried. The driver uses stat_value to return the value of the statistic specified by stat. If the value of the statistic is returned, mc_getstat() must return 0. If the stat statistic is not supported by the driver, mc_getstat() must return ENOTSUP.

The GLDv3 statistics that are supported are the union of generic MAC statistics and Ethernet-specific statistics. See the mc_getstat(9E) man page for a complete list of supported statistics.


Example 19–7 The mc_getstat() Entry Point

int
xx_m_getstat(void *arg, uint_t stat, uint64_t *val)
{
        xx_t    *xxp = arg;

        mutex_enter(&xxp->xx_xmtlock);
        if ((xxp->xx_flags & (XX_RUNNING|XX_SUSPENDED)) == XX_RUNNING)
                xx_reclaim(xxp);
        mutex_exit(&xxp->xx_xmtlock);

        switch (stat) {
        case MAC_STAT_MULTIRCV:
                *val = xxp->xx_multircv;
                break;
        /* ... */
        case ETHER_STAT_MACRCV_ERRORS:
                *val = xxp->xx_macrcv_errors;
                break;
        /* ... */
        default:
                return (ENOTSUP);
        }
        return (0);
}

GLDv3 Properties

Use the mc_propinfo(9E) entry point to return immutable attributes of a property. This information includes permissions, default values, and allowed value ranges. Use mc_setprop(9E) to set the value of a property for this particular driver instance. Use mc_getprop(9E) to return the current value of a property.

See the mc_propinfo(9E) man page for a complete list of properties and their types.

The mc_propinfo() entry point should invoke the mac_prop_info_set_perm(), mac_prop_info_set_default(), and mac_prop_info_set_range() functions to associate specific attributes of the property being queried, such as default values, permissions, or allowed value ranges.

The mac_prop_info_set_default_uint8(9F), mac_prop_info_set_default_str(9F), and mac_prop_info_set_default_link_flowctrl(9F) functions associate a default value with a specific property. The mac_prop_info_set_range_uint32(9F) function associates an allowed range of values for a specific property.

The mac_prop_info_set_perm(9F) function specifies the permission of the property. The permission can be one of the following values:

MAC_PROP_PERM_READ

The property is read-only.

MAC_PROP_PERM_WRITE

The property is write-only.

MAC_PROP_PERM_RW

The property can be read and written.

If the mc_propinfo() entry point does not call mac_prop_info_set_perm() for a particular property, the GLDv3 framework assumes that the property has read and write permissions, corresponding to MAC_PROP_PERM_RW.
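The default-permission rule can be modeled as follows. The MODEL_PERM_* constants are placeholders for the MAC_PROP_PERM_* values, and the property names and the propinfo record are hypothetical. In a real driver, mc_propinfo() reports these attributes by calling mac_prop_info_set_perm(), mac_prop_info_set_default_uint8(), and mac_prop_info_set_range_uint32() rather than by filling in a structure.

```c
#include <string.h>

/* Placeholders for the MAC_PROP_PERM_* values. */
#define MODEL_PERM_UNSET  0
#define MODEL_PERM_READ   1
#define MODEL_PERM_WRITE  2
#define MODEL_PERM_RW     (MODEL_PERM_READ | MODEL_PERM_WRITE)

/* Hypothetical record filled in by an mc_propinfo()-style routine. */
struct model_propinfo {
    int      perm;      /* MODEL_PERM_UNSET until "set_perm" is called */
    unsigned def_val;
    unsigned min, max;  /* allowed range */
};

/* Hypothetical driver routine describing two private properties. */
static void
model_xx_propinfo(const char *name, struct model_propinfo *pi)
{
    if (strcmp(name, "_xx_read_only") == 0) {
        pi->perm = MODEL_PERM_READ;   /* like mac_prop_info_set_perm() */
    } else if (strcmp(name, "_xx_intr_delay") == 0) {
        pi->def_val = 10;             /* like ..._set_default_uint8() */
        pi->min = 0;
        pi->max = 100;                /* like ..._set_range_uint32() */
        /* permission deliberately left unset */
    }
}

/* Framework side: the permission assumed when the driver sets none. */
static int
model_effective_perm(const struct model_propinfo *pi)
{
    return (pi->perm == MODEL_PERM_UNSET ? MODEL_PERM_RW : pi->perm);
}
```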

In addition to the properties listed in the mc_propinfo(9E) man page, drivers can also expose driver-private properties. Use the m_priv_props field of the mac_register structure to specify driver-private properties supported by the driver. The framework passes the MAC_PROP_PRIVATE property ID in mc_setprop(), mc_getprop(), or mc_propinfo(). See the mc_propinfo(9E) man page for more information.

Summary of GLDv3 Interfaces

The following table lists entry points, other DDI functions, and data structures that are part of the GLDv3 network device driver framework.

Table 19–1 GLDv3 Interfaces

Interface Name 

Description 

Required Entry Points

mc_getstat(9E)

Retrieve network statistics from the driver. See GLDv3 Network Statistics.

mc_start(9E)

Start a driver instance. The GLDv3 framework invokes the start entry point before any operation is attempted. 

mc_stop(9E)

Stop a driver instance. The MAC layer invokes the stop entry point before the device is detached. 

mc_setpromisc(9E)

Change the promiscuous mode of the device driver instance. 

mc_multicst(9E)

Add or remove a multicast address. 

mc_unicst(9E)

Set the primary unicast address. The device must start passing back through mac_rx() the packets with a destination MAC address that matches the new unicast address. See Receive Data Path for information about mac_rx().

mc_tx(9E)

Send one or more packets. See Transmit Data Path.

Optional Entry Points

mc_ioctl(9E)

Optional ioctl driver interface. This facility is intended to be used only for debugging purposes. 

mc_getcapab(9E)

Retrieve capabilities. See GLDv3 Capabilities.

mc_setprop(9E)

Set a property value. See GLDv3 Properties.

mc_getprop(9E)

Get a property value. See GLDv3 Properties.

mc_propinfo(9E)

Get information about a property. See GLDv3 Properties.

Data Structures

mac_register(9S)

Registration information. See GLDv3 MAC Registration Data Structures.

mac_callbacks(9S)

Driver callbacks. See GLDv3 MAC Registration Data Structures.

mac_capab_lso(9S)

LSO metadata. See Large Segment (or Send) Offload.

lso_basic_tcp_ipv4(9S)

LSO metadata for TCP/IPv4. See Large Segment (or Send) Offload.

MAC Registration Functions

mac_alloc(9F)

Allocate a new mac_register structure. See GLDv3 MAC Registration.

mac_free(9F)

Free a mac_register structure.

mac_register(9F)

Register with the MAC layer. 

mac_unregister(9F)

Unregister from the MAC layer. 

mac_init_ops(9F)

Initialize the driver's dev_ops(9S) structure.

mac_fini_ops(9F)

Release the driver's dev_ops structure.

Data Transfer Functions

mac_rx(9F)

Pass up received packets. See Receive Data Path.

mac_tx_update(9F)

Notify the framework that transmit resources are available again. See GLDv3 State Change Notifications.

mac_link_update(9F)

Notify the framework that the link state has changed.

mac_hcksum_get(9F)

Retrieve hardware checksum information. See Hardware Checksum Offload and Transmit Data Path.

mac_hcksum_set(9F)

Attach hardware checksum information. See Hardware Checksum Offload and Receive Data Path.

mac_lso_get(9F)

Retrieve LSO information. See Large Segment (or Send) Offload.

Properties Functions

mac_prop_info_set_perm(9F)

Set the permission of a property. See GLDv3 Properties.

mac_prop_info_set_default_uint8(9F), mac_prop_info_set_default_str(9F), mac_prop_info_set_default_link_flowctrl(9F)

Set the default value of a property. See GLDv3 Properties.

mac_prop_info_set_range_uint32(9F)

Set the range of allowed values for a property.

GLDv2 Network Device Driver Framework

GLDv2 is a multithreaded, clonable, loadable kernel module that provides support to device drivers for local area networks. Local area network (LAN) device drivers in the Solaris OS are STREAMS-based drivers that use the Data Link Provider Interface (DLPI) to communicate with network protocol stacks. These protocol stacks use the network drivers to send and receive packets on a LAN. GLDv2 implements much of the STREAMS and DLPI functionality for a Solaris LAN driver and provides common code that many network drivers can share. Using GLDv2 reduces duplicate code and simplifies your network driver.

For more information about GLDv2, see the gld(7D) man page.

STREAMS drivers are documented in Part II, “Kernel Interface,” of the STREAMS Programming Guide; see in particular Chapter 9, “STREAMS Drivers.” The STREAMS framework is message-based. Interfaces that are unique to STREAMS drivers include the STREAMS message queue processing entry points.

The DLPI specifies an interface to the Data Link Services (DLS) of the Data Link Layer of the OSI Reference Model. The DLPI enables a DLS user to access and use any of a variety of conforming DLS providers without special knowledge of the provider's protocol. The DLPI specifies access to the DLS provider in the form of M_PROTO and M_PCPROTO type STREAMS messages. A DLPI module uses STREAMS ioctl calls to link to the MAC sub-layer. For more information about the DLPI protocol, including Solaris-specific DLPI extensions, see the dlpi(7P) man page. For general information about DLPI, see the DLPI standard at http://www.opengroup.org/pubs/catalog/c811.htm.

A Solaris network driver that is implemented using GLDv2 has two distinct parts:

GLDv2 drivers must process fully formed MAC-layer packets and must not perform logical link control (LLC) handling.


GLDv2 Device Support

The GLDv2 framework supports the following types of devices:

Ethernet V2 and ISO 8802-3 (IEEE 802.3)

For devices that are declared to be type DL_ETHER, GLDv2 provides support for both Ethernet V2 and ISO 8802-3 (IEEE 802.3) packet processing. Ethernet V2 enables a user to access a conforming provider of data link services without special knowledge of the provider's protocol. A service access point (SAP) is the point through which the user communicates with the service provider.

Streams bound to SAP values in the range [0-255] are treated as equivalent and denote that the user wants to use 8802-3 mode. If the SAP value of the DL_BIND_REQ is within this range, GLDv2 computes the length of each subsequent DL_UNITDATA_REQ message on that stream. The length does not include the 14-byte media access control (MAC) header. GLDv2 then transmits 8802-3 frames that have those lengths in the MAC frame header type fields. Such lengths do not exceed 1500.

Frames that have a type field in the range [0-1500] are assumed to be 8802-3 frames. These frames are routed up all open streams in 8802-3 mode. Those streams with SAP values in the [0-255] range are considered to be in 8802-3 mode. If more than one stream is in 8802-3 mode, the incoming frame is duplicated and routed up these streams.

Those streams that are bound to SAP values that are greater than 1500 are assumed to be in Ethernet V2 mode. These streams receive incoming packets whose Ethernet MAC header type value exactly matches the value of the SAP to which the stream is bound.
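The DL_ETHER delivery rules above reduce to a small decision function. The following is a user-space sketch of those rules, not GLDv2 source; ether_mode, bind_mode(), and stream_accepts() are hypothetical names. The text does not say how SAP values from 256 through 1500 are treated, so the sketch assumes such streams receive nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Modes a DL_ETHER stream can be in, modeled from the rules above. */
enum ether_mode { MODE_8023, MODE_ETHERV2, MODE_UNSPECIFIED };

#define ETHER_MAX_SDU 1500  /* largest value that is an 802.3 length */

/* Classify a stream by the SAP it bound to with DL_BIND_REQ. */
static enum ether_mode
bind_mode(uint32_t sap)
{
	if (sap <= 255)
		return (MODE_8023);
	if (sap > ETHER_MAX_SDU)
		return (MODE_ETHERV2);
	return (MODE_UNSPECIFIED);  /* 256..1500: not covered by the text */
}

/* Does a stream bound to sap receive a frame whose type field is type? */
static int
stream_accepts(uint32_t sap, uint16_t type)
{
	if (type <= ETHER_MAX_SDU)  /* 8802-3 frame: type field is a length */
		return (bind_mode(sap) == MODE_8023);
	/* Ethernet V2 frame: the type must match the bound SAP exactly. */
	return (bind_mode(sap) == MODE_ETHERV2 && sap == type);
}
```

Every stream in 8802-3 mode accepts the same 8802-3 frame, which is why GLDv2 duplicates such a frame when several of those streams are open.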

TPR and FDDI: SNAP Processing

For media types DL_TPR and DL_FDDI, GLDv2 implements minimal SNAP (Sub-Net Access Protocol) processing. This processing is for any stream that is bound to a SAP value that is greater than 255. SAP values in the range [0-255] are LLC SAP values. Such values are carried naturally by the media packet format. SAP values that are greater than 255 require a SNAP header, subordinate to the LLC header, to carry the 16-bit Ethernet V2-style SAP value.

SNAP headers are carried under LLC headers with destination SAP 0xAA. Outbound packets with SAP values that are greater than 255 require an LLC+SNAP header that takes the following form:

AA AA 03 00 00 00 XX XX

XX XX represents the 16-bit SAP, corresponding to the Ethernet V2-style type. This is the only form of SNAP header that GLDv2 supports: non-zero organizationally unique identifier (OUI) fields are not supported. Packets whose LLC control field is other than 03 are treated as LLC packets with SAP 0xAA. Clients that want to use SNAP formats other than this format must use LLC and bind to SAP 0xAA.

Incoming packets are checked for conformance with the above format. Packets that conform are matched to any streams that have been bound to the packet's 16-bit SNAP type. In addition, these packets are considered to match the LLC SNAP SAP 0xAA.

Packets received for any LLC SAP are passed up all streams that are bound to an LLC SAP, as described for media type DL_ETHER.
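The 8-byte header shown above can be built and checked as follows. This is a user-space sketch of the byte layout only; snap_build(), snap_parse(), and the macro names are hypothetical. It assumes the 16-bit SAP is stored most significant byte first, as an Ethernet type field is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LLC_SNAP_SAP 0xAA
#define LLC_CTL_UI   0x03
#define SNAP_HDR_LEN 8

/* Write AA AA 03 00 00 00 XX XX for a SAP greater than 255. */
static void
snap_build(uint8_t hdr[SNAP_HDR_LEN], uint16_t sap)
{
	hdr[0] = LLC_SNAP_SAP;         /* destination SAP */
	hdr[1] = LLC_SNAP_SAP;         /* source SAP */
	hdr[2] = LLC_CTL_UI;           /* LLC control field */
	hdr[3] = hdr[4] = hdr[5] = 0;  /* OUI bytes, zero in this form */
	hdr[6] = (uint8_t)(sap >> 8);  /* SAP, high byte first */
	hdr[7] = (uint8_t)(sap & 0xFF);
}

/* Return 1 and store the SNAP type if buf matches the form above;
 * return 0 otherwise, in which case the packet is plain LLC. */
static int
snap_parse(const uint8_t *buf, size_t len, uint16_t *type)
{
	if (len < SNAP_HDR_LEN || buf[0] != LLC_SNAP_SAP ||
	    buf[1] != LLC_SNAP_SAP || buf[2] != LLC_CTL_UI ||
	    buf[3] != 0 || buf[4] != 0 || buf[5] != 0)
		return (0);
	*type = (uint16_t)((buf[6] << 8) | buf[7]);
	return (1);
}
```

A packet that passes snap_parse() would be matched both to streams bound to the returned type and to streams bound to LLC SAP 0xAA.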

TPR: Source Routing

For type DL_TPR devices, GLDv2 implements minimal support for source routing.

Source routing support adds routing information fields to the MAC headers of outgoing packets and recognizes such fields in incoming packets.

GLDv2 source routing support does not implement the full route determination entity (RDE) specified in Section 9 of ISO 8802-2 (IEEE 802.2). However, this support can interoperate with any RDE implementations that might exist in the same or a bridged network.

GLDv2 DLPI Providers

GLDv2 implements both Style 1 and Style 2 DLPI providers. A physical point of attachment (PPA) is the point at which a system attaches itself to a physical communication medium. All communication on that physical medium funnels through the PPA. The Style 1 provider attaches the streams to a particular PPA based on the major or minor device that has been opened. The Style 2 provider requires the DLS user to explicitly identify the desired PPA using DL_ATTACH_REQ. In this case, open(9E) creates a stream between the user and GLDv2, and DL_ATTACH_REQ subsequently associates a particular PPA with that stream. Style 2 is denoted by a minor number of zero. If a device node whose minor number is not zero is opened, Style 1 is indicated and the associated PPA is the minor number minus 1. In both Style 1 and Style 2 opens, the device is cloned.
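The minor-number convention above can be expressed as a small decoding function. This user-space sketch covers only the mapping of the opened device node's minor number, not the cloning that follows; dlpi_style and decode_minor() are hypothetical names.

```c
#include <assert.h>

enum dlpi_style { STYLE1, STYLE2 };

/* Decode the minor number of the opened node into a DLPI style.
 * *ppa is set only for Style 1; Style 2 gets its PPA later from
 * DL_ATTACH_REQ. */
static enum dlpi_style
decode_minor(unsigned int minor, unsigned int *ppa)
{
	if (minor == 0)
		return (STYLE2);
	*ppa = minor - 1;  /* Style 1: PPA is the minor number minus 1 */
	return (STYLE1);
}
```

For example, opening minor 1 attaches to PPA 0, while opening minor 0 leaves the stream unattached until DL_ATTACH_REQ arrives.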

GLDv2 DLPI Primitives

GLDv2 implements several DLPI primitives. The DL_INFO_REQ primitive requests information about the DLPI streams. The message consists of one M_PROTO message block. GLDv2 returns device-dependent values in the DL_INFO_ACK response to this request. These values are based on information that the GLDv2-based driver specified in the gld_mac_info(9S) structure that was passed to the gld_register(9F) function.

GLDv2 returns the following values on behalf of all GLDv2-based drivers:


Note –

Contrary to the DLPI specification, GLDv2 returns the correct address length and broadcast address of the device in DL_INFO_ACK even before the stream has been attached to a PPA.


The DL_ATTACH_REQ primitive is used to associate a PPA with a stream. This request is needed for Style 2 DLS providers to identify the physical medium over which the communication is sent. Upon completion, the state changes from DL_UNATTACHED to DL_UNBOUND. The message consists of one M_PROTO message block. This request is not allowed when Style 1 mode is used. Streams that are opened using Style 1 are already attached to a PPA by the time the open completes.

The DL_DETACH_REQ primitive requests to detach the PPA from the stream. This detachment is allowed only if the stream was opened using Style 2.

The DL_BIND_REQ and DL_UNBIND_REQ primitives bind and unbind a DLSAP (data link service access point) to and from the stream. Initialization of the PPA that is associated with a stream is complete before processing of the DL_BIND_REQ on that stream completes. You can bind multiple streams to the same SAP. In this case, each stream receives a copy of any packets that are received for that SAP.
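The attach, detach, bind, and unbind rules can be summarized as a small state machine. The following user-space sketch is a simplification, not GLDv2 code: the state and function names are hypothetical, S_IDLE stands for DL_IDLE (the DLPI state of a bound connectionless stream), and error acknowledgments are reduced to a -1 return.

```c
#include <assert.h>

/* Simplified DLPI states for one stream. */
enum dl_state { S_UNATTACHED, S_UNBOUND, S_IDLE };

struct dl_stream {
	int style2;          /* nonzero for a Style 2 open */
	enum dl_state state; /* Style 1 starts UNBOUND, Style 2 UNATTACHED */
};

/* Each handler returns 0 on success or -1 if the primitive is rejected. */
static int
do_attach(struct dl_stream *s)
{
	if (!s->style2 || s->state != S_UNATTACHED)
		return (-1);  /* not allowed for Style 1 streams */
	s->state = S_UNBOUND;
	return (0);
}

static int
do_detach(struct dl_stream *s)
{
	if (!s->style2 || s->state != S_UNBOUND)
		return (-1);
	s->state = S_UNATTACHED;
	return (0);
}

static int
do_bind(struct dl_stream *s)
{
	if (s->state != S_UNBOUND)
		return (-1);  /* must be attached and not yet bound */
	s->state = S_IDLE;
	return (0);
}

static int
do_unbind(struct dl_stream *s)
{
	if (s->state != S_IDLE)
		return (-1);
	s->state = S_UNBOUND;
	return (0);
}
```

A Style 2 user therefore issues DL_ATTACH_REQ and then DL_BIND_REQ, and must unbind before it can detach.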

The DL_ENABMULTI_REQ and DL_DISABMULTI_REQ primitives enable and disable reception of individual multicast group addresses. Through iterative use of these primitives, an application or other DLS user can create or modify a set of multicast addresses. The streams must be attached to a PPA for these primitives to be accepted.

The DL_PROMISCON_REQ and DL_PROMISCOFF_REQ primitives turn promiscuous mode on or off on a per-stream basis. These controls operate either at the physical level or at the SAP level. In promiscuous mode, the DLS provider routes all messages received on the medium to the DLS user. Routing continues until a DL_DETACH_REQ is received, a DL_PROMISCOFF_REQ is received, or the stream is closed. You can specify physical-level promiscuous reception of all packets on the medium or of multicast packets only.


Note –

The streams must be attached to a PPA for these promiscuous mode primitives to be accepted.


The DL_UNITDATA_REQ primitive is used to send data in a connectionless transfer. Because this service is not acknowledged, delivery is not guaranteed. The message consists of one M_PROTO message block followed by one or more M_DATA blocks containing at least one byte of data.
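The required shape of a DL_UNITDATA_REQ message can be checked by walking the chain of message blocks. This is a user-space model: mblk, msg_type, and unitdata_req_ok() are hypothetical stand-ins for STREAMS mblk_t and its message types, and the sketch reads "at least one byte of data" as a total across the M_DATA blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for STREAMS message types. */
enum msg_type { T_PROTO, T_DATA, T_OTHER };

/* Minimal model of a chain of message blocks. */
struct mblk {
	enum msg_type type;
	size_t len;         /* bytes of data in this block */
	struct mblk *cont;  /* next block, playing the role of b_cont */
};

/* Accept one M_PROTO block followed by one or more M_DATA blocks
 * that together hold at least one byte of data. */
static int
unitdata_req_ok(const struct mblk *mp)
{
	size_t total = 0;

	if (mp == NULL || mp->type != T_PROTO || mp->cont == NULL)
		return (0);
	for (mp = mp->cont; mp != NULL; mp = mp->cont) {
		if (mp->type != T_DATA)
			return (0);
		total += mp->len;
	}
	return (total > 0);
}
```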

The DL_UNITDATA_IND type is used when a packet is to be passed on upstream. The packet is put into an M_PROTO message with the primitive set to DL_UNITDATA_IND.

The DL_PHYS_ADDR_REQ primitive requests the MAC address currently associated with the PPA attached to the streams. The address is returned by the DL_PHYS_ADDR_ACK primitive. When using Style 2, this primitive is only valid following a successful DL_ATTACH_REQ.

The DL_SET_PHYS_ADDR_REQ primitive changes the MAC address currently associated with the PPA attached to the streams. This primitive affects all other current and future streams attached to this device. Once changed, all streams currently or subsequently opened and attached to this device obtain this new physical address. The new physical address remains in effect until this primitive changes the physical address again or the driver is reloaded.


Note –

The superuser is allowed to change the physical address of a PPA while other streams are bound to the same PPA.


The DL_GET_STATISTICS_REQ primitive requests a DL_GET_STATISTICS_ACK response containing statistics information associated with the PPA attached to the stream. Style 2 streams must be attached to a particular PPA using DL_ATTACH_REQ before this primitive can succeed.

GLDv2 I/O Control Functions

GLDv2 implements the ioctl ioc_cmd function described below. If GLDv2 receives an unrecognizable ioctl command, GLDv2 passes the command to the device-specific driver's gldm_ioctl() routine, as described in gld(9E).

The DLIOCRAW ioctl function is used by some DLPI applications, most notably the snoop(1M) command. The DLIOCRAW command puts the stream into a raw mode. In raw mode, the driver passes full MAC-level incoming packets upstream in M_DATA messages instead of transforming the packets into the DL_UNITDATA_IND form. The DL_UNITDATA_IND form is normally used for reporting incoming packets. Packet SAP filtering is still performed on streams that are in raw mode. If a stream user wants to receive all incoming packets, the user must also select the appropriate promiscuous modes. After successfully selecting raw mode, the application is also allowed to send fully formatted packets to the driver as M_DATA messages for transmission. DLIOCRAW takes no arguments. Once enabled, the stream remains in this mode until closed.

GLDv2 Driver Requirements

GLDv2-based drivers must include the header file <sys/gld.h>.

GLDv2-based drivers must be linked with the -N"misc/gld" option:

% ld -r -N"misc/gld" xx.o -o xx

GLDv2 implements the following functions on behalf of the device-specific driver:

The mi_idname element of the module_info(9S) structure is a string that specifies the name of the driver. This string must exactly match the name of the driver module as defined in the file system.

The read-side qinit(9S) structure should specify the following elements:

qi_putp

NULL

qi_srvp

gld_rsrv

qi_qopen

gld_open

qi_qclose

gld_close

The write-side qinit(9S) structure should specify these elements:

qi_putp

gld_wput

qi_srvp

gld_wsrv

qi_qopen

NULL

qi_qclose

NULL

The devo_getinfo element of the dev_ops(9S) structure should specify gld_getinfo as the getinfo(9E) routine.

The driver's attach(9E) function associates the hardware-specific device driver with the GLDv2 facility. attach() then prepares the device and driver for use.

The attach(9E) function allocates a gld_mac_info(9S) structure using gld_mac_alloc(). The driver usually needs to save more information per device than is defined in the macinfo structure. The driver should allocate the additional required data structure and save a pointer to the structure in the gldm_private member of the gld_mac_info(9S) structure.

The attach(9E) routine must initialize the macinfo structure as described in the gld_mac_info(9S) man page. The attach() routine should then call gld_register() to link the driver with the GLDv2 module. The driver should map registers if necessary and be fully initialized and prepared to accept interrupts before calling gld_register(). The attach(9E) function should add interrupts but should not enable the device to generate these interrupts. The driver should reset the hardware before calling gld_register() to ensure the hardware is quiescent. A device must not be put into a state where the device might generate an interrupt before gld_register() is called. The device is started later when GLDv2 calls the driver's gldm_start() entry point, which is described in the gld(9E) man page. After gld_register() succeeds, the gld(9E) entry points might be called by GLDv2 at any time.

The attach(9E) routine should return DDI_SUCCESS if gld_register() succeeds. If gld_register() fails, it returns DDI_FAILURE. In that case, the attach(9E) routine should deallocate any resources that were allocated before gld_register() was called and then return DDI_FAILURE. A macinfo structure for which registration failed should never be reused; deallocate it using gld_mac_free().
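The failure handling described above is commonly written as a goto-based unwind that releases resources in reverse order of allocation. The following user-space sketch only models that pattern: xx_attach(), xx_alloc(), xx_free(), XX_SUCCESS, XX_FAILURE, and the register_succeeds flag are hypothetical stand-ins for DDI_SUCCESS, DDI_FAILURE, gld_mac_alloc(), gld_mac_free(), and the outcome of gld_register().

```c
#include <assert.h>
#include <stdlib.h>

#define XX_SUCCESS 0     /* stands in for DDI_SUCCESS */
#define XX_FAILURE (-1)  /* stands in for DDI_FAILURE */

struct xx_macinfo { void *private; };  /* hypothetical macinfo */

static int live_allocs;  /* resources currently held, for checking */

static void *
xx_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p != NULL)
		live_allocs++;
	return (p);
}

static void
xx_free(void *p)
{
	if (p != NULL) {
		free(p);
		live_allocs--;
	}
}

/* Allocate, fill in, register; on failure undo everything allocated
 * so far, in reverse order, and never reuse the macinfo. */
static int
xx_attach(int register_succeeds)
{
	struct xx_macinfo *mac;
	void *soft;

	if ((mac = xx_alloc(sizeof (*mac))) == NULL)  /* gld_mac_alloc() */
		return (XX_FAILURE);
	if ((soft = xx_alloc(64)) == NULL)            /* per-instance state */
		goto fail_mac;
	mac->private = soft;                          /* gldm_private */
	if (!register_succeeds)                       /* gld_register() */
		goto fail_soft;
	return (XX_SUCCESS);

fail_soft:
	xx_free(soft);
fail_mac:
	xx_free(mac);                                 /* gld_mac_free() */
	return (XX_FAILURE);
}
```

After a failed attach, no resources remain allocated; after a successful one, the macinfo and the private state stay allocated until detach.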

The detach(9E) function should attempt to unregister the driver from GLDv2 by calling gld_unregister(). For more information about gld_unregister(), see the gld(9F) man page. The detach(9E) routine can get a pointer to the needed gld_mac_info(9S) structure from the device's private data using ddi_get_driver_private(9F). gld_unregister() checks certain conditions that could require that the driver not be detached. If the checks fail, gld_unregister() returns DDI_FAILURE, in which case the driver's detach(9E) routine must leave the device operational and return DDI_FAILURE.

If the checks succeed, gld_unregister() ensures that the device interrupts are stopped. The driver's gldm_stop() routine is called if necessary. The driver is unlinked from the GLDv2 framework. gld_unregister() then returns DDI_SUCCESS. In this case, the detach(9E) routine should remove interrupts and use gld_mac_free() to deallocate any macinfo data structures that were allocated in the attach(9E) routine. The detach() routine should then return DDI_SUCCESS. The routine must remove the interrupt before calling gld_mac_free().
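The ordering constraints in detach can be made explicit by recording each step. This user-space sketch is hypothetical throughout: toy_unregister() stands in for gld_unregister(), the "remove_intr" step stands for whatever interrupt removal the driver uses (for example, ddi_remove_intr(9F) after ddi_add_intr(9F)), and "mac_free" stands for gld_mac_free().

```c
#include <assert.h>
#include <string.h>

#define XX_SUCCESS 0     /* stands in for DDI_SUCCESS */
#define XX_FAILURE (-1)  /* stands in for DDI_FAILURE */
#define MAX_STEPS  8

static const char *steps[MAX_STEPS];
static int nsteps;

/* Append a step name to the log, ignoring overflow. */
static void
record(const char *s)
{
	if (nsteps < MAX_STEPS)
		steps[nsteps++] = s;
}

/* Toy gld_unregister(): fails when the device is still in use. */
static int
toy_unregister(int busy)
{
	if (busy)
		return (XX_FAILURE);
	record("unregister");
	return (XX_SUCCESS);
}

/* Leave everything in place if unregistering fails; otherwise remove
 * the interrupt before freeing the macinfo. */
static int
xx_detach(int busy)
{
	if (toy_unregister(busy) != XX_SUCCESS)
		return (XX_FAILURE);  /* device stays operational */
	record("remove_intr");
	record("mac_free");
	return (XX_SUCCESS);
}
```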

GLDv2 Network Statistics

Solaris network drivers must implement statistics variables. GLDv2 tallies some network statistics, but other statistics must be counted by each GLDv2-based driver. GLDv2 provides support for GLDv2-based drivers to report a standard set of network driver statistics. Statistics are reported by GLDv2 using the kstat(7D) and kstat(9S) mechanisms. The DL_GET_STATISTICS_REQ DLPI command can also be used to retrieve the current statistics counters. All statistics are maintained as unsigned. The statistics are 32 bits unless otherwise noted.

GLDv2 maintains and reports the following statistics.

rbytes64

Total bytes successfully received on the interface. Stores 64-bit statistics.

rbytes

Total bytes successfully received on the interface.

obytes64

Total bytes that have requested transmission on the interface. Stores 64-bit statistics.

obytes

Total bytes that have requested transmission on the interface.

ipackets64

Total packets successfully received on the interface. Stores 64-bit statistics.

ipackets

Total packets successfully received on the interface.

opackets64

Total packets that have requested transmission on the interface. Stores 64-bit statistics.

opackets

Total packets that have requested transmission on the interface.

multircv

Multicast packets successfully received, including group and functional addresses (long).

multixmt

Multicast packets requested to be transmitted, including group and functional addresses (long).

brdcstrcv

Broadcast packets successfully received (long).

brdcstxmt

Broadcast packets that have requested transmission (long).

unknowns

Valid received packets not accepted by any stream (long).

noxmtbuf

Packets discarded on output because transmit buffer was busy, or no buffer could be allocated for transmit (long).

blocked

Number of times a received packet could not be put up a stream because the queue was flow-controlled (long).

xmtretry

Times transmit was retried after having been delayed due to lack of resources (long).

promisc

Current “promiscuous” state of the interface (string).
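The 64-bit variants matter because a 32-bit byte counter wraps after 2^32 bytes (4 GiB), which takes about 34 seconds at 1 Gb/s. The following user-space sketch shows the effect; rx_counters and count_rx() are hypothetical and only mirror the rbytes and rbytes64 pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters kept in both widths, like rbytes/rbytes64. */
struct rx_counters {
	uint32_t rbytes;
	uint64_t rbytes64;
};

/* Count nbytes received in both counters. */
static void
count_rx(struct rx_counters *c, uint32_t nbytes)
{
	c->rbytes += nbytes;    /* unsigned: wraps modulo 2^32 */
	c->rbytes64 += nbytes;  /* continues counting past 4 GiB */
}
```

Starting 10 bytes below the 32-bit limit and counting 20 more bytes leaves rbytes at 10, while rbytes64 holds the true total.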

The device-dependent driver tracks the following statistics in a private per-instance structure. To report statistics, GLDv2 calls the driver's gldm_get_stats() entry point. gldm_get_stats() then updates device-specific statistics in the gld_stats(9S) structure. See the gldm_get_stats(9E) man page for more information. GLDv2 then reports the updated statistics using the named statistics variables that are shown below.

ifspeed

Current estimated bandwidth of the interface in bits per second. Stores 64-bit statistics.

media

Current media type in use by the device (string).

intr

Number of times that the interrupt handler was called, causing an interrupt (long).

norcvbuf

Number of times a valid incoming packet was known to have been discarded because no buffer could be allocated for receive (long).

ierrors

Total number of packets that were received but could not be processed due to errors (long).

oerrors

Total packets that were not successfully transmitted because of errors (long).

missed

Packets known to have been dropped by the hardware on receive (long).

uflo

Times FIFO underflowed on transmit (long).

oflo

Times receiver overflowed during receive (long).

The following group of statistics applies to networks of type DL_ETHER. These statistics are maintained by device-specific drivers of that type, as shown previously.

align_errors

Packets that were received with framing errors, that is, the packets did not contain an integral number of octets (long).

fcs_errors

Packets received with CRC errors (long).

duplex

Current duplex mode of the interface (string).

carrier_errors

Number of times carrier was lost or never detected on a transmission attempt (long).

collisions

Ethernet collisions during transmit (long).

ex_collisions

Frames where excess collisions occurred on transmit, causing transmit failure (long).

tx_late_collisions

Number of times a transmit collision occurred late, that is, after 512 bit times (long).

defer_xmts

Packets without collisions where first transmit attempt was delayed because the medium was busy (long).

first_collisions

Packets successfully transmitted with exactly one collision.

multi_collisions

Packets successfully transmitted with multiple collisions.

sqe_errors

Number of times that SQE test error was reported.

macxmt_errors

Packets encountering transmit MAC failures, except carrier and collision failures.

macrcv_errors

Packets received with MAC errors, except align_errors, fcs_errors, and toolong_errors.

toolong_errors

Packets received larger than the maximum allowed length.

runt_errors

Packets received smaller than the minimum allowed length (long).

The following group of statistics applies to networks of type DL_TPR. These statistics are maintained by device-specific drivers of that type, as shown above.

line_errors

Packets received with non-data bits or FCS errors.

burst_errors

Number of times an absence of transitions for five half-bit timers was detected.

signal_losses

Number of times loss of signal condition on the ring was detected.

ace_errors

Number of times that an AMP or SMP frame in which the A and C bits are both 0 is followed by another SMP frame without an intervening AMP frame.

internal_errors

Number of times the station recognized an internal error.

lost_frame_errors

Number of times the TRR timer expired during transmit.

frame_copied_errors

Number of times a frame addressed to this station was received with the FS field 'A' bit set to 1.

token_errors

Number of times the station acting as the active monitor recognized an error condition that needed a token transmitted.

freq_errors

Number of times the frequency of the incoming signal differed from the expected frequency.

The following group of statistics applies to networks of type DL_FDDI. These statistics are maintained by device-specific drivers of that type, as shown above.

mac_errors

Frames detected in error by this MAC that had not been detected in error by another MAC.

mac_lost_errors

Frames received with format errors such that the frame was stripped.

mac_tokens

Number of tokens that were received, that is, the total of non-restricted and restricted tokens.

mac_tvx_expired

Number of times that TVX has expired.

mac_late

Number of TRT expirations since either this MAC was reset or a token was received.

mac_ring_ops

Number of times the ring has entered the “Ring Operational” state from the “Ring Not Operational” state.

GLDv2 Declarations and Data Structures

This section describes the gld_mac_info(9S) and gld_stats structures.

gld_mac_info Structure

The GLDv2 MAC information (gld_mac_info) structure is the main data interface that links the device-specific driver with GLDv2. This structure contains data required by GLDv2 and a pointer to an optional additional driver-specific information structure.

Allocate the gld_mac_info structure using gld_mac_alloc(). Deallocate the structure using gld_mac_free(). Drivers must not make any assumptions about the length of this structure, which might vary in different releases of the Solaris OS, GLDv2, or both. Structure members private to GLDv2, not documented here, should neither be set nor be read by the device-specific driver.

The gld_mac_info(9S) structure contains the following fields.

caddr_t              gldm_private;              /* Driver private data */
int                  (*gldm_reset)();           /* Reset device */
int                  (*gldm_start)();           /* Start device */
int                  (*gldm_stop)();            /* Stop device */
int                  (*gldm_set_mac_addr)();    /* Set device phys addr */
int                  (*gldm_set_multicast)();   /* Set/delete multicast addr */
int                  (*gldm_set_promiscuous)(); /* Set/reset promiscuous mode */
int                  (*gldm_send)();            /* Transmit routine */
uint_t               (*gldm_intr)();            /* Interrupt handler */
int                  (*gldm_get_stats)();       /* Get device statistics */
int                  (*gldm_ioctl)();           /* Driver-specific ioctls */
char                 *gldm_ident;               /* Driver identity string */
uint32_t             gldm_type;                 /* Device type */
uint32_t             gldm_minpkt;               /* Minimum packet size */
                                                /* accepted by driver */
uint32_t             gldm_maxpkt;               /* Maximum packet size */
                                                /* accepted by driver */
uint32_t             gldm_addrlen;              /* Physical address length */
int32_t              gldm_saplen;               /* SAP length for DL_INFO_ACK */
unsigned char        *gldm_broadcast_addr;      /* Physical broadcast addr */
unsigned char        *gldm_vendor_addr;         /* Factory MAC address */
t_uscalar_t          gldm_ppa;                  /* Physical Point of */
                                                /* Attachment (PPA) number */
dev_info_t           *gldm_devinfo;             /* Pointer to device's */
                                                /* dev_info node */
ddi_iblock_cookie_t  gldm_cookie;               /* Device's interrupt */
                                                /* block cookie */

The gldm_private structure member is private to the device-specific driver: GLDv2 neither uses nor modifies it. Conventionally, gldm_private is used as a pointer to private data, pointing to a per-instance data structure that is both defined and allocated by the driver.

The following group of structure members must be set by the driver before calling gld_register(), and should not thereafter be modified by the driver. Because gld_register() might use or cache the values of structure members, changes made by the driver after calling gld_register() might cause unpredictable results. For more information on the entry points, see the gld(9E) man page.

gldm_reset

Pointer to driver entry point.

gldm_start

Pointer to driver entry point.

gldm_stop

Pointer to driver entry point.

gldm_set_mac_addr

Pointer to driver entry point.

gldm_set_multicast

Pointer to driver entry point.

gldm_set_promiscuous

Pointer to driver entry point.

gldm_send

Pointer to driver entry point.

gldm_intr

Pointer to driver entry point.

gldm_get_stats

Pointer to driver entry point.

gldm_ioctl

Pointer to driver entry point. This pointer is allowed to be null.

gldm_ident

Pointer to a string that contains a short description of the device. This pointer is used to identify the device in system messages.

gldm_type

Type of device the driver handles. GLDv2 currently supports the following values:

  • DL_ETHER (ISO 8802-3 (IEEE 802.3) and Ethernet Bus)

  • DL_TPR (IEEE 802.5 Token Passing Ring)

  • DL_FDDI (ISO 9314-2 Fibre Distributed Data Interface)

This structure member must be correctly set for GLDv2 to function properly.

gldm_minpkt

Minimum Service Data Unit size: the minimum packet size, not including the MAC header, that the device can transmit. This size is allowed to be zero if the device-specific driver handles any required padding.

gldm_maxpkt

Maximum Service Data Unit size: the maximum size of packet, not including the MAC header, that can be transmitted by the device. For Ethernet, this number is 1500.

gldm_addrlen

The length in bytes of physical addresses handled by the device. For Ethernet, Token Ring, and FDDI, the value of this structure member should be 6.

gldm_saplen

The length in bytes of the SAP address used by the driver. For GLDv2-based drivers, the length should always be set to -2. A length of -2 indicates that 2-byte SAP values are supported and that the SAP appears after the physical address in a DLSAP address. See Appendix A.2, “Message DL_INFO_ACK,” in the DLPI specification for more details.

gldm_broadcast_addr

Pointer to an array of bytes of length gldm_addrlen containing the broadcast address to be used for transmit. The driver must provide space to hold the broadcast address, fill the space with the appropriate value, and set gldm_broadcast_addr to point to the address. For Ethernet, Token Ring, and FDDI, the broadcast address is normally 0xFF-FF-FF-FF-FF-FF.

gldm_vendor_addr

Pointer to an array of bytes of length gldm_addrlen that contains the vendor-provided network physical address of the device. The driver must provide space to hold the address, fill the space with information from the device, and set gldm_vendor_addr to point to the address.

gldm_ppa

PPA number for this instance of the device. The PPA number should always be set to the instance number that is returned from ddi_get_instance(9F).

gldm_devinfo

Pointer to the dev_info node for this device.

gldm_cookie

Interrupt block cookie returned by one of the following routines:

This cookie must correspond to the device's receive-interrupt, from which gld_recv() is called.
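Several of the member requirements above are simple value checks that a driver could verify before calling gld_register(). The following is a user-space model, not GLDv2 code: macinfo_subset, macinfo_ok(), xx_send_stub(), and the media codes are hypothetical stand-ins for a few gld_mac_info members and for DL_ETHER, DL_TPR, and DL_FDDI. It checks for a transmit entry point, a supported type, a 6-byte address length, a gldm_saplen of -2, a broadcast address, and a PPA equal to the instance number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codes standing in for DL_ETHER, DL_TPR, DL_FDDI. */
enum media { M_ETHER = 1, M_TPR, M_FDDI };

/* Hypothetical subset of the gld_mac_info members described above. */
struct macinfo_subset {
	int (*send)(void);           /* gldm_send */
	int (*ioctl)(void);          /* gldm_ioctl, may be NULL */
	uint32_t type;               /* gldm_type */
	uint32_t addrlen;            /* gldm_addrlen */
	int32_t saplen;              /* gldm_saplen */
	const unsigned char *bcast;  /* gldm_broadcast_addr */
	uint32_t ppa;                /* gldm_ppa */
};

/* Hypothetical transmit entry point used to fill in the model. */
static int xx_send_stub(void) { return (0); }

/* Return 1 if the members satisfy the rules above; instance is the
 * value ddi_get_instance(9F) would return for this device. */
static int
macinfo_ok(const struct macinfo_subset *m, uint32_t instance)
{
	int media_ok = (m->type == M_ETHER || m->type == M_TPR ||
	    m->type == M_FDDI);

	return (m->send != NULL && media_ok && m->addrlen == 6 &&
	    m->saplen == -2 && m->bcast != NULL && m->ppa == instance);
}
```

A NULL gldm_ioctl is accepted, because the text allows that one entry point pointer to be null.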

gld_stats Structure

When GLDv2 calls the driver's gldm_get_stats() entry point, a GLDv2-based driver uses the gld_stats structure to communicate statistics and state information to GLDv2. See the gld(9E) and gld(7D) man pages. GLDv2 uses the members of this structure, as filled in by the driver, when it reports the statistics. In the listings below, the comment on each member names the statistics variable reported by GLDv2. See the gld(7D) man page for a more detailed description of the meaning of each statistic.

Drivers must not make any assumptions about the length of this structure. The structure length might vary in different releases of the Solaris OS, GLDv2, or both. Structure members private to GLDv2, which are not documented here, should not be set or be read by the device-specific driver.

The following structure members are defined for all media types:

uint64_t    glds_speed;                   /* ifspeed */
uint32_t    glds_media;                   /* media */
uint32_t    glds_intr;                    /* intr */
uint32_t    glds_norcvbuf;                /* norcvbuf */
uint32_t    glds_errrcv;                  /* ierrors */
uint32_t    glds_errxmt;                  /* oerrors */
uint32_t    glds_missed;                  /* missed */
uint32_t    glds_underflow;               /* uflo */
uint32_t    glds_overflow;                /* oflo */
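A gldm_get_stats() implementation typically copies counters that the driver keeps in its per-instance state into the structure that GLDv2 passes in. The following user-space sketch models that copy with a hypothetical subset of the members listed above; stats_subset, xx_soft, and xx_get_stats() are hypothetical, the real gld_stats structure has more members and no fixed size, and a real driver would also hold its own lock while reading the counters. The 0 return value is assumed to represent success.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of gld_stats; do not assume the real layout. */
struct stats_subset {
	uint64_t glds_speed;     /* ifspeed */
	uint32_t glds_intr;      /* intr */
	uint32_t glds_norcvbuf;  /* norcvbuf */
	uint32_t glds_errrcv;    /* ierrors */
	uint32_t glds_errxmt;    /* oerrors */
};

/* Hypothetical per-instance state a driver might maintain. */
struct xx_soft {
	uint64_t link_bps;
	uint32_t intr_count, rx_nobuf, rx_err, tx_err;
};

/* Copy the driver's counters into the structure supplied by GLDv2. */
static int
xx_get_stats(const struct xx_soft *sp, struct stats_subset *gs)
{
	gs->glds_speed = sp->link_bps;
	gs->glds_intr = sp->intr_count;
	gs->glds_norcvbuf = sp->rx_nobuf;
	gs->glds_errrcv = sp->rx_err;
	gs->glds_errxmt = sp->tx_err;
	return (0);  /* success */
}
```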

The following structure members are defined for media type DL_ETHER:

uint32_t    glds_frame;                   /* align_errors */
uint32_t    glds_crc;                     /* fcs_errors */
uint32_t    glds_duplex;                  /* duplex */
uint32_t    glds_nocarrier;               /* carrier_errors */
uint32_t    glds_collisions;              /* collisions */
uint32_t    glds_excoll;                  /* ex_collisions */
uint32_t    glds_xmtlatecoll;             /* tx_late_collisions */
uint32_t    glds_defer;                   /* defer_xmts */
uint32_t    glds_dot3_first_coll;         /* first_collisions */
uint32_t    glds_dot3_multi_coll;         /* multi_collisions */
uint32_t    glds_dot3_sqe_error;          /* sqe_errors */
uint32_t    glds_dot3_mac_xmt_error;      /* macxmt_errors */
uint32_t    glds_dot3_mac_rcv_error;      /* macrcv_errors */
uint32_t    glds_dot3_frame_too_long;     /* toolong_errors */
uint32_t    glds_short;                   /* runt_errors */

The following structure members are defined for media type DL_TPR:

uint32_t    glds_dot5_line_error;         /* line_errors */
uint32_t    glds_dot5_burst_error;        /* burst_errors */
uint32_t    glds_dot5_signal_loss;        /* signal_losses */
uint32_t    glds_dot5_ace_error;          /* ace_errors */
uint32_t    glds_dot5_internal_error;     /* internal_errors */
uint32_t    glds_dot5_lost_frame_error;   /* lost_frame_errors */
uint32_t    glds_dot5_frame_copied_error; /* frame_copied_errors */
uint32_t    glds_dot5_token_error;        /* token_errors */
uint32_t    glds_dot5_freq_error;         /* freq_errors */

The following structure members are defined for media type DL_FDDI:

uint32_t    glds_fddi_mac_error;          /* mac_errors */
uint32_t    glds_fddi_mac_lost;           /* mac_lost_errors */
uint32_t    glds_fddi_mac_token;          /* mac_tokens */
uint32_t    glds_fddi_mac_tvx_expired;    /* mac_tvx_expired */
uint32_t    glds_fddi_mac_late;           /* mac_late */
uint32_t    glds_fddi_mac_ring_op;        /* mac_ring_ops */

Most of the above statistics variables are counters that record the number of times that the particular event was observed. The following statistics are not event counters:

glds_speed

Estimate of the interface's current bandwidth in bits per second. This object should contain the nominal bandwidth for those interfaces that do not vary in bandwidth or where an accurate estimate cannot be made.

glds_media

Type of media (wiring) or connector used by the hardware. The following media names are supported:

  • GLDM_AUI

  • GLDM_BNC

  • GLDM_TP

  • GLDM_10BT

  • GLDM_100BT

  • GLDM_100BTX

  • GLDM_100BT4

  • GLDM_RING4

  • GLDM_RING16

  • GLDM_FIBER

  • GLDM_PHYMII

  • GLDM_UNKNOWN

glds_duplex

Current duplex state of the interface. Supported values are GLD_DUPLEX_HALF and GLD_DUPLEX_FULL. GLD_DUPLEX_UNKNOWN is also allowed.

GLDv2 Function Arguments

The following arguments are used by the GLDv2 routines.

macinfo

Pointer to a gld_mac_info(9S) structure.

macaddr

Pointer to the beginning of a character array that contains a valid MAC address. The array is of the length specified by the driver in the gldm_addrlen element of the gld_mac_info(9S) structure.

multicastaddr

Pointer to the beginning of a character array that contains a multicast, group, or functional address. The array is of the length specified by the driver in the gldm_addrlen element of the gld_mac_info(9S) structure.

multiflag

Flag indicating whether to enable or disable reception of the multicast address. This argument is specified as GLD_MULTI_ENABLE or GLD_MULTI_DISABLE.

promiscflag

Flag indicating what type of promiscuous mode, if any, is to be enabled. This argument is specified as GLD_MAC_PROMISC_PHYS, GLD_MAC_PROMISC_MULTI, or GLD_MAC_PROMISC_NONE.

mp

gldm_ioctl() uses mp as a pointer to a STREAMS message block that contains the ioctl to be executed. gldm_send() uses mp as a pointer to a STREAMS message block that contains the packet to be transmitted. gld_recv() uses mp as a pointer to a message block that contains a received packet.

stats

Pointer to a gld_stats(9S) structure to be filled in with the current values of statistics counters.

q

Pointer to the queue(9S) structure to be used in the reply to the ioctl.

dip

Pointer to the device's dev_info structure.

name

Device interface name.

GLDv2 Entry Points

Entry points must be implemented by a device-specific network driver that has been designed to interface with GLDv2.

The gld_mac_info(9S) structure is the main structure for communication between the device-specific driver and the GLDv2 module. See the gld(7D) man page. Some elements in that structure are function pointers to the entry points that are described here. The device-specific driver must, in its attach(9E) routine, initialize these function pointers before calling gld_register().

gldm_reset() Entry Point

int prefix_reset(gld_mac_info_t *macinfo);

gldm_reset() resets the hardware to its initial state.

gldm_start() Entry Point

int prefix_start(gld_mac_info_t *macinfo);

gldm_start() enables the device to generate interrupts. gldm_start() also prepares the driver to call gld_recv() to deliver received data packets to GLDv2.

gldm_stop() Entry Point

int prefix_stop(gld_mac_info_t *macinfo);

gldm_stop() disables the device from generating any interrupts and stops the driver from calling gld_recv() for delivering data packets to GLDv2. GLDv2 depends on the gldm_stop() routine to ensure that the device will no longer interrupt. gldm_stop() must do so without fail. This function should always return GLD_SUCCESS.

gldm_set_mac_addr() Entry Point

int prefix_set_mac_addr(gld_mac_info_t *macinfo, unsigned char *macaddr);

gldm_set_mac_addr() sets the physical address that the hardware is to use for receiving data. The driver programs the device with the MAC address passed in macaddr. If sufficient resources are not currently available to carry out the request, gldm_set_mac_addr() should return GLD_NORESOURCES. If the requested function is not supported, gldm_set_mac_addr() should return GLD_NOTSUPPORTED.

gldm_set_multicast() Entry Point

int prefix_set_multicast(gld_mac_info_t *macinfo, 
     unsigned char *multicastaddr, int multiflag);

gldm_set_multicast() enables and disables device-level reception of specific multicast addresses. If the third argument multiflag is set to GLD_MULTI_ENABLE, then gldm_set_multicast() sets the interface to receive packets with the multicast address. gldm_set_multicast() uses the multicast address that is pointed to by the second argument. If multiflag is set to GLD_MULTI_DISABLE, the driver is allowed to disable reception of the specified multicast address.

This function is called whenever GLDv2 wants to enable or disable reception of a multicast, group, or functional address. GLDv2 makes no assumptions about how the device implements multicast support and calls this function to enable or disable a specific multicast address. Some devices might use a hash algorithm and a bitmask to enable collections of multicast addresses. This procedure is allowed, and GLDv2 filters out any superfluous packets. If disabling an address could disable more than one address at the device level, the device driver should keep enough state to avoid disabling an address that GLDv2 has enabled but not yet disabled.

gldm_set_multicast() is not called to enable a particular multicast address that is already enabled. Similarly, gldm_set_multicast() is not called to disable an address that is not currently enabled. GLDv2 keeps track of multiple requests for the same multicast address. GLDv2 only calls the driver's entry point when the first request to enable, or the last request to disable, a particular multicast address is made. If sufficient resources are not currently available to carry out the request, the function should return GLD_NORESOURCES. If the requested function is not supported, the function should return GLD_NOTSUPPORTED.
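The bookkeeping for a hash-based filter can be sketched in C as follows. This is a user-space illustration rather than driver code: the 64-bucket filter, the hash (the last address byte modulo 64), and the names mcast_filter and mcast_update() are hypothetical. A real gldm_set_multicast() would also program hw_mask into the device and return GLD_SUCCESS. Each bucket keeps a count of enabled addresses, so disabling one address leaves the bucket bit set while another enabled address still hashes to it.

```c
#include <stdint.h>

#define ETHERADDRL    6     /* length of an Ethernet MAC address */
#define MCAST_BUCKETS 64    /* hypothetical 64-bit hash filter */

/* Hypothetical per-device soft state kept only by the driver. */
struct mcast_filter {
    uint64_t hw_mask;               /* filter bits to program into the device */
    uint16_t refcnt[MCAST_BUCKETS]; /* enabled addresses per bucket */
};

/* Hypothetical hash: a real device defines its own, often CRC-based. */
static unsigned mcast_bucket(const unsigned char *addr)
{
    return addr[ETHERADDRL - 1] % MCAST_BUCKETS;
}

/* Enable (enable != 0) or disable one multicast address. The bucket bit
 * is cleared only when no other enabled address still maps to it. */
static void mcast_update(struct mcast_filter *f, const unsigned char *addr,
                         int enable)
{
    unsigned b = mcast_bucket(addr);

    if (enable) {
        if (f->refcnt[b]++ == 0)          /* first address in this bucket */
            f->hw_mask |= (uint64_t)1 << b;
    } else if (f->refcnt[b] > 0) {
        if (--f->refcnt[b] == 0)          /* last address left the bucket */
            f->hw_mask &= ~((uint64_t)1 << b);
    }
}
```

Because GLDv2 never enables an already enabled address or disables one that is not enabled, the per-bucket count only has to track distinct addresses.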

gldm_set_promiscuous() Entry Point

int prefix_set_promiscuous(gld_mac_info_t *macinfo, int promiscflag);

gldm_set_promiscuous() enables and disables promiscuous mode. This function is called whenever GLDv2 wants to enable or disable the reception of all packets on the medium, or of all multicast packets on the medium. If the second argument promiscflag is set to GLD_MAC_PROMISC_PHYS, the function enables physical-level promiscuous mode, which causes the reception of all packets on the medium. If promiscflag is set to GLD_MAC_PROMISC_MULTI, reception of all multicast packets is enabled. If promiscflag is set to GLD_MAC_PROMISC_NONE, promiscuous mode is disabled.

When GLD_MAC_PROMISC_MULTI is requested, drivers for devices without a multicast-only promiscuous mode must set the device to physical promiscuous mode. This approach ensures that all multicast packets are received. In this case, the routine should return GLD_SUCCESS, and the GLDv2 software filters out any superfluous packets. If sufficient resources are not currently available to carry out the request, the function should return GLD_NORESOURCES. The gldm_set_promiscuous() function should return GLD_NOTSUPPORTED if the requested function is not supported.

For forward compatibility, gldm_set_promiscuous() routines should treat any unrecognized values for promiscflag as though these values were GLD_MAC_PROMISC_PHYS.
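A sketch of how a driver might map promiscflag to a hardware receive mode, assuming a hypothetical device that may or may not support a multicast-only promiscuous mode. The SKETCH_PROMISC_* macros are stand-ins for GLD_MAC_PROMISC_NONE, GLD_MAC_PROMISC_PHYS, and GLD_MAC_PROMISC_MULTI so that the example compiles on its own; a real driver uses the constants from the GLDv2 header, and the numeric values here are not meant to match them.

```c
#include <stdbool.h>

/* Stand-ins for the GLDv2 promiscflag values (real values not assumed). */
#define SKETCH_PROMISC_NONE  0
#define SKETCH_PROMISC_PHYS  1
#define SKETCH_PROMISC_MULTI 2

/* Receive modes of a hypothetical device. */
enum hw_rx_mode { HW_RX_NORMAL, HW_RX_ALL_MULTI, HW_RX_ALL };

/* Choose the hardware receive mode for a promiscflag value. */
static enum hw_rx_mode promisc_to_hw(int promiscflag, bool has_allmulti)
{
    switch (promiscflag) {
    case SKETCH_PROMISC_NONE:
        return HW_RX_NORMAL;
    case SKETCH_PROMISC_MULTI:
        /* Without a multicast-only mode, fall back to full promiscuous
         * mode; GLDv2 filters out the extra packets. */
        return has_allmulti ? HW_RX_ALL_MULTI : HW_RX_ALL;
    case SKETCH_PROMISC_PHYS:
    default:
        /* Forward compatibility: unknown values behave like PHYS. */
        return HW_RX_ALL;
    }
}
```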

gldm_send() Entry Point

int prefix_send(gld_mac_info_t *macinfo, mblk_t *mp);

gldm_send() queues a packet to the device for transmission. This routine is passed a STREAMS message that contains the packet to be sent. The message might include multiple message blocks, and gldm_send() must traverse all the message blocks in the message to access the entire packet. The driver should be prepared to handle and skip over any zero-length message continuation blocks in the chain. The driver should also check that the packet does not exceed the maximum allowable packet size, and must pad the packet, if necessary, to the minimum allowable packet size. If the send routine successfully transmits or queues the packet, it should return GLD_SUCCESS.
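The message traversal described above might look like the following sketch, which copies a possibly fragmented packet into a contiguous transmit buffer. It assumes a driver that copies packets into its own buffer rather than binding the message for DMA. It models only the b_rptr, b_wptr, and b_cont fields of a STREAMS message block in a hypothetical sketch_mblk structure, and it uses the standard Ethernet limits (60 and 1514 bytes, excluding the frame check sequence) as the minimum and maximum packet sizes.

```c
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a STREAMS message block. */
struct sketch_mblk {
    unsigned char *b_rptr;       /* first byte of valid data */
    unsigned char *b_wptr;       /* one past the last byte of valid data */
    struct sketch_mblk *b_cont;  /* next block of the message, or NULL */
};

#define ETHER_MIN_FRAME 60    /* minimum Ethernet frame, without FCS */
#define ETHER_MAX_FRAME 1514  /* maximum untagged frame, without FCS */

/* Copy a packet into txbuf (at least ETHER_MAX_FRAME bytes). Returns the
 * number of bytes to transmit, or -1 if the packet is too large. */
static long copy_packet(const struct sketch_mblk *mp, unsigned char *txbuf)
{
    size_t len = 0;

    for (; mp != NULL; mp = mp->b_cont) {
        size_t frag = (size_t)(mp->b_wptr - mp->b_rptr);

        if (frag == 0)
            continue;                 /* skip zero-length continuation blocks */
        if (len + frag > ETHER_MAX_FRAME)
            return -1;                /* packet exceeds the maximum size */
        memcpy(txbuf + len, mp->b_rptr, frag);
        len += frag;
    }
    if (len < ETHER_MIN_FRAME) {      /* pad short packets with zeros */
        memset(txbuf + len, 0, ETHER_MIN_FRAME - len);
        len = ETHER_MIN_FRAME;
    }
    return (long)len;
}
```

How an oversized packet is handled (for example, by returning GLD_BADARG) is left to the driver; the sketch only reports the condition.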

The send routine should return GLD_NORESOURCES if the packet for transmission cannot be immediately accepted. In this case, GLDv2 retries later. If gldm_send() ever returns GLD_NORESOURCES, the driver must call gld_sched() at a later time when resources have become available. This call to gld_sched() informs GLDv2 to retry packets that the driver previously failed to queue for transmission. (If the driver's gldm_stop() routine is called, the driver is absolved from this obligation until the driver returns GLD_NORESOURCES from the gldm_send() routine. However, extra calls to gld_sched() do not cause incorrect operation.)
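One way to keep track of the gld_sched() obligation is a flag that the send path sets whenever it reports a lack of resources and that the transmit-completion path clears. In this sketch, tx_state, tx_try_queue(), tx_reclaim(), and the counting stand-in sketch_gld_sched() are hypothetical. A real driver calls gld_sched(macinfo) and protects the flag and descriptor count with a lock, which is omitted here.

```c
#include <stdbool.h>

/* Hypothetical transmit-side soft state. */
struct tx_state {
    int  free_desc;     /* free transmit descriptors */
    bool need_sched;    /* send path reported "no resources" */
    int  sched_calls;   /* calls made to the stand-in below */
};

/* Stand-in for gld_sched(); it only counts calls. */
static void sketch_gld_sched(struct tx_state *s)
{
    s->sched_calls++;
}

/* Send path: true if the packet was queued; false corresponds to
 * returning GLD_NORESOURCES, so a later gld_sched() call is owed. */
static bool tx_try_queue(struct tx_state *s)
{
    if (s->free_desc == 0) {
        s->need_sched = true;
        return false;
    }
    s->free_desc--;
    return true;
}

/* Transmit-completion path, for example from the interrupt routine. */
static void tx_reclaim(struct tx_state *s, int done)
{
    s->free_desc += done;
    if (s->need_sched && s->free_desc > 0) {
        s->need_sched = false;
        sketch_gld_sched(s);    /* ask GLDv2 to retry deferred packets */
    }
}
```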

If the driver's send routine returns GLD_SUCCESS, then the driver is responsible for freeing the message when the message is no longer needed. If the hardware uses DMA to read the data directly, the driver must not free the message until the hardware has completely read the data. In this case, the driver can free the message in the interrupt routine. Alternatively, the driver can reclaim the buffer at the start of a future send operation. If the send routine returns anything other than GLD_SUCCESS, then the driver must not free the message. Return GLD_NOLINK if gldm_send() is called when there is no physical connection to the network or link partner.

gldm_intr() Entry Point

int prefix_intr(gld_mac_info_t *macinfo);

gldm_intr() is called when the device might have interrupted. Because interrupts can be shared with other devices, the driver must check the device status to determine whether that device actually caused the interrupt. If the device that the driver controls did not cause the interrupt, then this routine must return DDI_INTR_UNCLAIMED. Otherwise, the driver must service the interrupt and return DDI_INTR_CLAIMED. If the interrupt was caused by successful receipt of a packet, this routine should put the received packet into a STREAMS message of type M_DATA and pass that message to gld_recv().

gld_recv() passes the inbound packet upstream to the appropriate next layer of the network protocol stack. The driver must correctly set the b_rptr and b_wptr members of the STREAMS message before calling gld_recv().

The driver should avoid holding mutexes or other locks during the call to gld_recv(). In particular, locks that could be taken by a transmit thread must not be held during a call to gld_recv(). In some cases, the interrupt thread that calls gld_recv() sends an outgoing packet, which results in a call to the driver's gldm_send() routine. If gldm_send() tries to acquire a mutex that is held by gldm_intr() when gld_recv() is called, a panic occurs due to recursive mutex entry. If other driver entry points attempt to acquire a mutex that the driver holds across a call to gld_recv(), deadlock can result.
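The following sketch shows one way to structure gldm_intr() so that no driver lock is held across gld_recv(). Received packets are detached from the device state while the lock is held, the lock is released, and only then are the packets passed upstream. To keep the example runnable in user space, the lock is modeled by a boolean flag, packets by a hypothetical rx_pkt list, and DDI_INTR_CLAIMED, DDI_INTR_UNCLAIMED, and gld_recv() by stand-ins. sketch_gld_recv() counts any delivery made while the lock is held.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_INTR_UNCLAIMED 0u  /* stand-in for DDI_INTR_UNCLAIMED */
#define SKETCH_INTR_CLAIMED   1u  /* stand-in for DDI_INTR_CLAIMED */

struct rx_pkt { struct rx_pkt *next; };   /* hypothetical received packet */

struct dev_state {
    bool locked;               /* models the driver mutex */
    bool intr_pending;         /* models the device's interrupt status */
    struct rx_pkt *rx_ready;   /* packets received by the hardware */
    int delivered;             /* packets passed to the gld_recv() stand-in */
    int violations;            /* deliveries made while the lock was held */
};

/* Stand-in for gld_recv(). */
static void sketch_gld_recv(struct dev_state *sp, struct rx_pkt *p)
{
    (void)p;
    if (sp->locked)
        sp->violations++;
    sp->delivered++;
}

static unsigned sketch_intr(struct dev_state *sp)
{
    struct rx_pkt *chain;

    sp->locked = true;                 /* mutex_enter() in a real driver */
    if (!sp->intr_pending) {
        sp->locked = false;
        return SKETCH_INTR_UNCLAIMED;  /* another device on a shared line */
    }
    sp->intr_pending = false;          /* acknowledge the interrupt */
    chain = sp->rx_ready;              /* take the packets under the lock */
    sp->rx_ready = NULL;
    sp->locked = false;                /* mutex_exit() before going upstream */

    while (chain != NULL) {
        struct rx_pkt *next = chain->next;
        chain->next = NULL;
        sketch_gld_recv(sp, chain);
        chain = next;
    }
    return SKETCH_INTR_CLAIMED;
}
```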

The interrupt code should increment statistics counters for any errors. Errors include the failure to allocate a buffer that is needed for the received data and any hardware-specific errors, such as CRC errors or framing errors.

gldm_get_stats() Entry Point

int prefix_get_stats(gld_mac_info_t *macinfo, struct gld_stats *stats);

gldm_get_stats() gathers statistics from the hardware, driver private counters, or both, and updates the gld_stats(9S) structure pointed to by stats. This routine is called by GLDv2 for statistics requests. GLDv2 uses the gldm_get_stats() mechanism to acquire device-dependent statistics from the driver before GLDv2 composes the reply to the statistics request. See the gld_stats(9S), gld(7D), and qreply(9F) man pages for more information about defined statistics counters.

gldm_ioctl() Entry Point

int prefix_ioctl(gld_mac_info_t *macinfo, queue_t *q, mblk_t *mp);

gldm_ioctl() implements any device-specific ioctl commands. This element is allowed to be null if the driver does not implement any ioctl functions. The driver is responsible for converting the message block into an ioctl reply message and calling the qreply(9F) function before returning GLD_SUCCESS. This function should always return GLD_SUCCESS. The driver should report any errors as needed in a message to be passed to qreply(9F). If the gldm_ioctl element is specified as NULL, GLDv2 returns a message of type M_IOCNAK with an error of EINVAL.

GLDv2 Return Values

Some entry point functions in GLDv2 can return the following values, subject to the restrictions above:

GLD_BADARG

If the function detected an unsuitable argument, for example, a bad multicast address, a bad MAC address, or a bad packet

GLD_FAILURE

On hardware failure

GLD_SUCCESS

On success

GLDv2 Service Routines

This section provides the syntax and description for the GLDv2 service routines.

gld_mac_alloc() Function

gld_mac_info_t *gld_mac_alloc(dev_info_t *dip);

gld_mac_alloc() allocates a new gld_mac_info(9S) structure and returns a pointer to the structure. Some of the GLDv2-private elements of the structure might be initialized before gld_mac_alloc() returns. All other elements are initialized to zero. The device driver must initialize some structure members, as described in the gld_mac_info(9S) man page, before passing the pointer to the gld_mac_info structure to gld_register().

gld_mac_free() Function

void gld_mac_free(gld_mac_info_t *macinfo);

gld_mac_free() frees a gld_mac_info(9S) structure previously allocated by gld_mac_alloc().

gld_register() Function

int gld_register(dev_info_t *dip, char *name, gld_mac_info_t *macinfo);

gld_register() is called from the device driver's attach(9E) routine. gld_register() links the GLDv2-based device driver with the GLDv2 framework. Before calling gld_register(), the device driver's attach(9E) routine uses gld_mac_alloc() to allocate a gld_mac_info(9S) structure, and then initializes several structure elements. See gld_mac_info(9S) for more information. A successful call to gld_register() performs the following actions:

The device interface name passed to gld_register() must exactly match the name of the driver module as that name exists in the file system.

The driver's attach(9E) routine should return DDI_SUCCESS if gld_register() succeeds. If gld_register() does not return DDI_SUCCESS, the attach(9E) routine should deallocate any resources that it allocated before calling gld_register(), and then return DDI_FAILURE.

gld_unregister() Function

int gld_unregister(gld_mac_info_t *macinfo);

gld_unregister() is called by the device driver's detach(9E) function, and if successful, performs the following tasks:

If gld_unregister() returns DDI_SUCCESS, the detach(9E) routine should deallocate any data structures allocated in the attach(9E) routine, using gld_mac_free() to deallocate the macinfo structure, and return DDI_SUCCESS. If gld_unregister() does not return DDI_SUCCESS, the driver's detach(9E) routine must leave the device operational and return DDI_FAILURE.

gld_recv() Function

void gld_recv(gld_mac_info_t *macinfo, mblk_t *mp);

gld_recv() is called by the driver's interrupt handler to pass a received packet upstream. The driver must construct and pass a STREAMS M_DATA message containing the raw packet. gld_recv() determines which STREAMS queues should receive a copy of the packet, duplicating the packet if necessary. gld_recv() then formats a DL_UNITDATA_IND message, if required, and passes the data up all appropriate streams.

The driver should avoid holding mutexes or other locks during the call to gld_recv(). In particular, locks that could be taken by a transmit thread must not be held during a call to gld_recv(). In some cases, the interrupt thread that calls gld_recv() carries out processing that includes sending an outgoing packet. Transmission of the packet results in a call to the driver's gldm_send() routine. If gldm_send() tries to acquire a mutex that is held by gldm_intr() when gld_recv() is called, a panic occurs due to recursive mutex entry. If other driver entry points attempt to acquire a mutex that the driver holds across a call to gld_recv(), deadlock can result.

gld_sched() Function

void gld_sched(gld_mac_info_t *macinfo);

gld_sched() is called by the device driver to reschedule stalled outbound packets. Whenever the driver's gldm_send() routine returns GLD_NORESOURCES, the driver must call gld_sched() to inform the GLDv2 framework to retry previously unsendable packets. gld_sched() should be called as soon as possible after resources become available so that GLDv2 resumes passing outbound packets to the driver's gldm_send() routine. (If the driver's gldm_stop() routine is called, the driver need not call gld_sched() until gldm_send() again returns GLD_NORESOURCES. However, extra calls to gld_sched() do not cause incorrect operation.)

gld_intr() Function

uint_t gld_intr(caddr_t);

gld_intr() is GLDv2's main interrupt handler. Normally, gld_intr() is specified as the interrupt routine in the device driver's call to ddi_add_intr(9F). The argument to the interrupt handler is specified as int_handler_arg in the call to ddi_add_intr(9F). This argument must be a pointer to the gld_mac_info(9S) structure. gld_intr(), when appropriate, calls the device driver's gldm_intr() function, passing that pointer to the gld_mac_info(9S) structure. However, to use a high-level interrupt, the driver must provide its own high-level interrupt handler and trigger a soft interrupt from within the handler. In this case, gld_intr() would normally be specified as the soft interrupt handler in the call to ddi_add_softintr(). gld_intr() returns a value that is appropriate for an interrupt handler.

Chapter 20 USB Drivers

This chapter describes how to write a client USB device driver using the USBA 2.0 framework for the Solaris environment. This chapter discusses the following topics:

USB in the Solaris Environment

The Solaris USB architecture includes the USBA 2.0 framework and USB client drivers.

USBA 2.0 Framework

The USBA 2.0 framework is a service layer that presents an abstract view of USB devices to USBA-compliant client drivers. The framework enables USBA-compliant client drivers to manage their USB devices. The USBA 2.0 framework supports the USB 2.0 specification except for high-speed isochronous pipes. For information on the USB 2.0 specification, see http://www.usb.org/home.

The USBA 2.0 framework is platform-independent. The Solaris USB architecture is shown in the following figure. The USBA 2.0 framework is the USBA layer in the figure. This layer interfaces through a hardware-independent host controller driver interface to hardware-specific host controller drivers. The host controller drivers access the USB physical devices through the host controllers they manage.

Figure 20–1 Solaris USB Architecture

Diagram shows the flow of control from client and hub drivers, through the USB Architecture Interfaces, to the controllers and devices.

USB Client Drivers

The USBA 2.0 framework is not itself a device driver. This chapter describes the client drivers shown in Figure 20–1 and Figure 20–2. The client drivers interact with various kinds of USB devices such as mass storage devices, printers, and human interface devices. The hub driver is a client driver that is also a nexus driver. The hub driver enumerates devices on its ports, creates devinfo nodes for those devices, and then attaches the client drivers. This chapter does not describe how to write a hub driver.

USB drivers have the same structure as any other Solaris driver. USB drivers can be block drivers, character drivers, or STREAMS drivers. USB drivers follow the calling conventions and use the data structures and routines described in the Solaris OS section 9 man pages. See Intro(9E), Intro(9F), and Intro(9S).

The difference between USB drivers and other Solaris drivers is that USB drivers call USBA 2.0 framework functions to access the device instead of directly accessing the device. The USBA 2.0 framework supplements the standard Solaris DDI routines. See the following figure.

Figure 20–2 Driver and Controller Interfaces

Diagram shows DDI and USBAI functions, different versions of the USBA framework, and different types of host controllers.

Figure 20–2 shows interfaces in more detail than Figure 20–1 does. Figure 20–2 shows that the USBA is a kernel subsystem into which a client driver can call, just as a client driver can call DDI functions.

Not all systems have all of the host controller interfaces shown in Figure 20–2. OHCI (Open Host Controller Interface) hardware is most prevalent on SPARC systems and third-party USB PCI cards. UHCI (Universal Host Controller Interface) hardware is most prevalent on x86 systems. However, both OHCI and UHCI hardware can be used on any system. When EHCI (Enhanced Host Controller Interface) hardware is present, the EHCI hardware is on the same card and shares the same ports with either OHCI or UHCI.

The host controllers, host controller drivers, and HCDI make up a transport layer that is commanded by the USBA. You cannot call directly into the OHCI, EHCI, or UHCI host controller drivers. You call into them indirectly through the platform-independent USBA interface.

Binding Client Drivers

This section discusses binding a driver to a device. It discusses compatible device names for devices with single interfaces and devices with multiple interfaces.

How USB Devices Appear to the System

A USB device can support multiple configurations. Only one configuration is active at any given time. The active configuration is called the current configuration.

A configuration can have more than one interface, possibly with intervening interface-associations that group two or more interfaces for a function. All interfaces of a configuration are active simultaneously. Different interfaces might be operated by different device drivers.

An interface can represent itself to the host system in different ways by using alternate settings. Only one alternate setting is active for any given interface.

Each alternate setting provides device access through endpoints. Each endpoint has a specific purpose. The host system communicates with the device by establishing a communication channel to an endpoint. This communication channel is called a pipe.
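The relationships described above can be modeled with a few hypothetical C structures. The sketch counts the endpoints exposed by the current configuration: every interface of that configuration is active, each through its single active alternate setting. The type names are illustrative only and are not the USBA descriptor types; the default control endpoint is not modeled.

```c
#include <stddef.h>

struct endpoint    { unsigned char address; };   /* target of a pipe */
struct alt_setting { const struct endpoint *ep; size_t nep; };
struct usb_iface {
    const struct alt_setting *alt;
    size_t nalt;
    size_t cur_alt;            /* the one active alternate setting */
};
struct configuration { const struct usb_iface *ifc; size_t nif; };
struct device {
    const struct configuration *cfg;
    size_t ncfg;
    size_t cur_cfg;            /* the current configuration */
};

/* Endpoints exposed by the current configuration: all of its interfaces
 * are active, each through its current alternate setting. */
static size_t active_endpoints(const struct device *d)
{
    const struct configuration *c = &d->cfg[d->cur_cfg];
    size_t n = 0;

    for (size_t i = 0; i < c->nif; i++) {
        const struct usb_iface *ifp = &c->ifc[i];
        n += ifp->alt[ifp->cur_alt].nep;
    }
    return n;
}
```

Switching an interface to another alternate setting changes which endpoints, and therefore which pipes, are available without changing the configuration.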

USB Devices and the Solaris Device Tree

If a USB device has one configuration, one interface, and device class zero, the device is represented as a single device node. If a USB device has multiple interfaces, the device is represented as a hierarchical device structure. In a hierarchical device structure, the device node for each interface is a child of the top-level device node. An example of a device with multiple interfaces is an audio device that simultaneously presents both an audio control interface and an audio streaming interface to the host computer. The audio control interface and the audio streaming interface could each be controlled by its own driver.

Compatible Device Names

The Solaris software builds an ordered list of compatible device names for USB binding based on identification information kept within each device. This information includes device class, subclass, vendor ID, product ID, revision, and protocol. See http://www.usb.org/home for a list of USB classes and subclasses.

This name hierarchy enables binding to a general driver if a more device-specific driver is not available. An example of a general driver is a class-specific driver. Device names that begin with usbif designate single interface devices. See Example 20–1 for examples. The USBA 2.0 framework defines all compatible names for a device. Use the prtconf command to display these device names, as shown in Example 20–2.

The following example shows the compatible device names for a USB mouse device. This mouse device is represented by a combined node that is entirely operated by a single driver. The USBA 2.0 framework gives this device node the names shown in the example, in the order shown.


Example 20–1 USB Mouse Compatible Device Names

  1. 'usb430,100.102'      Vendor 430, product 100, revision 102
  2. 'usb430,100'          Vendor 430, product 100
  3. 'usbif430,class3.1.2' Vendor 430, class 3, subclass 1, protocol 2
  4. 'usbif430,class3.1'   Vendor 430, class 3, subclass 1
  5. 'usbif430,class3'     Vendor 430, class 3
  6. 'usbif,class3.1.2'    Class 3, subclass 1, protocol 2
  7. 'usbif,class3.1'      Class 3, subclass 1
  8. 'usbif,class3'        Class 3

Note that the names in the above example progress from the most specific to the most general. Entry 1 binds only to a particular revision of a specific product from a particular vendor. Entries 3, 4, and 5 are for class 3 devices manufactured by vendor 430. Entries 6, 7, and 8 are for class 3 devices from any vendor. The binding process looks for a match on the name from the top name down. To bind, drivers must be added to the system with an alias that matches one of these names. To get a list of compatible device names to which to bind when you add your driver, check the compatible property of the device in the output from the prtconf -vp command.
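The ordering in Example 20–1 can be reproduced with the following sketch. It assumes that each number is printed in lowercase hexadecimal without leading zeros, which is consistent with the names shown in Example 20–1 and Example 20–2; under that assumption, revision 102 is the value 0x102. The function usb_compat_names() is hypothetical. The USBA 2.0 framework generates these names itself, and drivers do not need to.

```c
#include <stdio.h>
#include <string.h>

#define NCOMPAT    8
#define COMPAT_LEN 40

/* Fill out[] with the compatible names of a single-interface device,
 * from most specific to most general. */
static void usb_compat_names(unsigned vid, unsigned pid, unsigned rev,
                             unsigned cls, unsigned sub, unsigned proto,
                             char out[NCOMPAT][COMPAT_LEN])
{
    snprintf(out[0], COMPAT_LEN, "usb%x,%x.%x", vid, pid, rev);
    snprintf(out[1], COMPAT_LEN, "usb%x,%x", vid, pid);
    snprintf(out[2], COMPAT_LEN, "usbif%x,class%x.%x.%x", vid, cls, sub, proto);
    snprintf(out[3], COMPAT_LEN, "usbif%x,class%x.%x", vid, cls, sub);
    snprintf(out[4], COMPAT_LEN, "usbif%x,class%x", vid, cls);
    snprintf(out[5], COMPAT_LEN, "usbif,class%x.%x.%x", cls, sub, proto);
    snprintf(out[6], COMPAT_LEN, "usbif,class%x.%x", cls, sub);
    snprintf(out[7], COMPAT_LEN, "usbif,class%x", cls);
}
```

Calling usb_compat_names(0x430, 0x100, 0x102, 3, 1, 2, names) yields the eight names of Example 20–1 in the same order.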

The following example shows compatible property lists for a keyboard and a mouse. Use the prtconf -D command to display the bound driver.


Example 20–2 Compatible Device Names Shown by the Print Configuration Command


# prtconf -vD | grep compatible
            compatible: 'usb430,5.200' + 'usb430,5' + 'usbif430,class3.1.1'
+ 'usbif430,class3.1' + 'usbif430,class3' + 'usbif,class3.1.1' +
'usbif,class3.1' + 'usbif,class3'
            compatible: 'usb2222,2071.200' + 'usb2222,2071' +
'usbif2222,class3.1.2' + 'usbif2222,class3.1' + 'usbif2222,class3' +
'usbif,class3.1.2' + 'usbif,class3.1' + 'usbif,class3'

Use the most specific name you can to more accurately identify a driver for a device or group of devices. To bind drivers written for a specific revision of a specific product, use the most specific name match possible. For example, if you have a USB mouse driver written by vendor 430 for revision 102 of their product 100, use the following command to add that driver to the system:

add_drv -n -i '"usb430,100.102"' specific_mouse_driver

To add a driver written for any USB mouse (class 3, subclass 1, protocol 2) from vendor 430, use the following command:

add_drv -n -i '"usbif430,class3.1.2"' more_generic_mouse_driver

If you install both of these drivers and then connect a compatible device, the system binds the correct driver to the connected device. For example, if you connect a vendor 430, product 100, revision 102 device, this device is bound to specific_mouse_driver. If you connect a vendor 430, product 98 mouse that reports class 3, subclass 1, protocol 2, this device is bound to more_generic_mouse_driver. A mouse from another vendor does not match usbif430,class3.1.2, so neither of these drivers binds to it; that device is bound to a driver only if some other alias matches one of its compatible names, such as usbif,class3.1.2. If multiple drivers are available for a specific device, the driver binding framework selects the driver with the first matching compatible name in the compatible names list.
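The selection rule in the last sentence amounts to a nested search: walk the device's compatible names in order and stop at the first name for which some driver has an alias. This sketch uses a hypothetical in-memory alias table in place of the system's binding database and is meant only to illustrate the ordering.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: a driver and the compatible name it was
 * added with, as with add_drv -i. */
struct alias {
    const char *driver;
    const char *name;
};

/* Return the driver for the first compatible name, in list order, that
 * has an alias; NULL if no name matches. */
static const char *bind_driver(const char *const *compat, size_t ncompat,
                               const struct alias *aliases, size_t naliases)
{
    for (size_t i = 0; i < ncompat; i++)
        for (size_t j = 0; j < naliases; j++)
            if (strcmp(compat[i], aliases[j].name) == 0)
                return aliases[j].driver;
    return NULL;
}
```

With only the two aliases from the add_drv commands above in the table, a device whose compatible names contain no usbif430 entry, such as a mouse from another vendor, matches neither alias, so bind_driver() returns NULL.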

Devices With Multiple Interfaces

Composite devices are devices that support multiple interfaces. Composite devices have a list of compatible names for each interface. This compatible names list ensures that the best available driver is bound to the interface. The most general multiple interface entry is usb,device.

For a USB audio composite device, the compatible names are as follows:

1. 'usb471,101.100'     Vendor 471, product 101, revision 100
2. 'usb471,101'         Vendor 471, product 101
3. 'usb,device'         Generic USB device

The name usb,device is a compatible name that represents any whole USB device. The usb_mid(7D) driver (USB multiple-interface driver) binds to the usb,device device node if no other driver has claimed the whole device. The usb_mid driver creates a child device node for each interface of the physical device. The usb_mid driver also generates a set of compatible names for each interface. Each of these generated compatible names begins with usbif. The system then uses these generated compatible names to find the best driver for each interface. In this way, different interfaces of one physical device can be bound to different drivers.

For example, the usb_mid driver binds to a multiple-interface audio device through the usb,device node name of that audio device. The usb_mid driver then creates interface-specific device nodes. Each of these interface-specific device nodes has its own compatible name list. For an audio control interface node, the compatible name list might look like the list shown in the following example.


Example 20–3 USB Audio Compatible Device Names

1. 'usbif471,101.100.config1.0' Vend 471, prod 101, rev 100, cnfg 1, iface 0
2. 'usbif471,101.config1.0'     Vend 471, product 101, config 1, interface 0
3. 'usbif471,class1.1.0'        Vend 471, class 1, subclass 1, protocol 0
4. 'usbif471,class1.1'          Vend 471, class 1, subclass 1
5. 'usbif471,class1'            Vend 471, class 1
6. 'usbif,class1.1.0'           Class 1, subclass 1, protocol 0
7. 'usbif,class1.1'             Class 1, subclass 1
8. 'usbif,class1'               Class 1
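Only the first two names in Example 20–3 differ in form from the single-interface names in Example 20–1: they add the configuration and interface numbers. A sketch of how those two names are composed, assuming that numbers are printed in lowercase hexadecimal without leading zeros and using a hypothetical function name:

```c
#include <stdio.h>
#include <string.h>

#define IF_NAME_LEN 48

/* Build the vendor- and product-specific interface names, most specific
 * first, for the given configuration and interface numbers. */
static void usb_if_names(unsigned vid, unsigned pid, unsigned rev,
                         unsigned cfg, unsigned ifno,
                         char out[2][IF_NAME_LEN])
{
    snprintf(out[0], IF_NAME_LEN, "usbif%x,%x.%x.config%x.%x",
             vid, pid, rev, cfg, ifno);
    snprintf(out[1], IF_NAME_LEN, "usbif%x,%x.config%x.%x",
             vid, pid, cfg, ifno);
}
```

The remaining six names in Example 20–3 follow the same vendor-and-class and class-only pattern as Example 20–1, using the interface's class, subclass, and protocol.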

Use the following command to bind a vendor-specific, device-specific client driver named vendor_model_audio_usb to the vendor-specific, device-specific compatible name for configuration 1, interface 0 shown in Example 20–3:

add_drv -n -i '"usbif471,101.config1.0"' vendor_model_audio_usb

Use the following command to bind a class driver named audio_class_usb_if_driver to the more general class 1, subclass 1 interface compatible name shown in Example 20–3:

add_drv -n -i '"usbif,class1.1"' audio_class_usb_if_driver

Use the prtconf -D command to show a list of devices and their drivers. In the following example, the prtconf -D command shows that the usb_mid driver manages the audio device. The usb_mid driver is splitting the audio device into interfaces. Each interface is indented under the audio device name. For each interface shown in the indented list, the prtconf -D command shows which driver manages the interface.

audio, instance #0 (driver name: usb_mid)
    sound-control, instance #2 (driver name: usb_ac)
    sound, instance #2 (driver name: usb_as)
    input, instance #8 (driver name: hid)

Checking Device Driver Bindings

The file /etc/driver_aliases contains entries for the bindings that already exist on a system. Each line of the /etc/driver_aliases file shows a driver name, followed by a space, followed by a device name. Use this file to check existing device driver bindings.


Note –

Do not edit the /etc/driver_aliases file manually. Use the add_drv(1M) command to establish a binding. Use the update_drv(1M) command to change a binding.
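Reading the file is safe even though editing it is not. The following minimal sketch splits one line into its driver name and device name. It assumes that the device name may be enclosed in double quotes, as it is when passed to add_drv -i; the parse_alias_line() function is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split one /etc/driver_aliases line of the form
 *     driver-name device-name
 * into its two fields. Surrounding double quotes on the device name are
 * stripped (an assumption about the file contents). The line is modified
 * in place. Returns 0 on success, -1 if the line is malformed.
 */
int
parse_alias_line(char *line, char **driver, char **device)
{
    char *sep = strchr(line, ' ');
    if (sep == NULL || sep == line)
        return -1;
    *sep = '\0';
    *driver = line;

    char *dev = sep + 1;
    while (*dev == ' ')
        dev++;
    size_t len = strcspn(dev, "\r\n");   /* drop a trailing newline */
    dev[len] = '\0';
    if (len >= 2 && dev[0] == '"' && dev[len - 1] == '"') {
        dev[len - 1] = '\0';
        dev++;
        len -= 2;
    }
    if (len == 0)
        return -1;
    *device = dev;
    return 0;
}
```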


Basic Device Access

This section describes how to access a USB device and how to register a client driver. This section also discusses the descriptor tree.

Before the Client Driver Is Attached

The following events take place before the client driver is attached:

  1. The PROM (OBP/BIOS) and USBA framework gain access to the device before any client driver is attached.

  2. The hub driver probes devices on each of its hub's ports for identity and configuration.

  3. The default control pipe to each device is opened, and each device is probed for its device descriptor.

  4. Compatible names properties are constructed for each device, using the device and interface descriptors.

The compatible names properties define different parts of the device that can be individually bound to client drivers. Client drivers can bind either to the entire device or to just one interface. See Binding Client Drivers.

The Descriptor Tree

Parsing descriptors involves aligning structure members at natural boundaries and converting the structure members to the endianness of the host CPU. Parsed standard USB configuration descriptors, interface descriptors, and endpoint descriptors are available to the client driver in the form of a hierarchical tree for each configuration. Any raw class-specific or vendor-specific descriptor information also is available to the client driver in the same hierarchical tree.
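The USBA framework performs this parsing for you. To show what the endianness conversion involves, the following sketch parses a raw standard endpoint descriptor, whose layout and little-endian wMaxPacketSize field come from chapter 9 of the USB 2.0 specification. The ep_descr structure and the parse_ep_descr() function are illustrative stand-ins, not the usb_ep_descr_t type or the usb_parse_data(9F) interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order view of a standard endpoint descriptor (USB 2.0, chapter 9). */
struct ep_descr {
    uint8_t  bLength;
    uint8_t  bDescriptorType;   /* 0x05 for an endpoint descriptor */
    uint8_t  bEndpointAddress;
    uint8_t  bmAttributes;
    uint16_t wMaxPacketSize;    /* naturally aligned, host byte order */
    uint8_t  bInterval;
};

/*
 * Parse a raw endpoint descriptor. USB multi-byte fields are little-endian
 * on the wire, so wMaxPacketSize is assembled byte by byte, which yields the
 * correct value on both big-endian and little-endian hosts.
 * Returns 0 on success, -1 if the buffer is not an endpoint descriptor.
 */
int
parse_ep_descr(const uint8_t *raw, size_t len, struct ep_descr *out)
{
    if (len < 7 || raw[0] < 7 || raw[1] != 0x05)
        return -1;
    out->bLength = raw[0];
    out->bDescriptorType = raw[1];
    out->bEndpointAddress = raw[2];
    out->bmAttributes = raw[3];
    out->wMaxPacketSize = (uint16_t)(raw[4] | (raw[5] << 8));
    out->bInterval = raw[6];
    return 0;
}
```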

Call the usb_get_dev_data(9F) function to retrieve the hierarchical descriptor tree. The “SEE ALSO” section of the usb_get_dev_data(9F) man page lists the man pages for each standard USB descriptor. Use the usb_parse_data(9F) function to parse raw descriptor information.

A descriptor tree for a device with two configurations might look like the tree shown in the following figure.

Figure 20–3 A Hierarchical USB Descriptor Tree

Diagram shows a tree of pairs of descriptors for each
interface of a device with two configurations.

The dev_cfg array shown in the above figure contains nodes that correspond to configurations. Each node contains the following information:

The node that represents the second interface of the second indexed configuration is at dev_cfg[1].cfg_if[1] in the diagram. That node contains an array of nodes that represent the alternate settings for that interface. The hierarchy of USB descriptors propagates through the tree. ASCII strings from string descriptor data are attached where the USB specification says these strings exist.

The array of configurations is non-sparse and is indexed by the configuration index. The first valid configuration (configuration 1) is dev_cfg[0]. Interfaces and alternate settings have indices that align with their numbers. Endpoints of each alternate setting are indexed consecutively. The first endpoint of each alternate setting is at index 0.

This numbering scheme makes the tree easy to traverse. For example, the raw descriptor data of endpoint index 0, alternate 0, interface 1, configuration index 1 is at the node defined by the following path:

dev_cfg[1].cfg_if[1].if_alt[0].altif_ep[0].ep_descr
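The following user-space sketch mirrors that path with simplified stand-in structures. The member names dev_cfg, cfg_if, if_alt, altif_ep, and ep_descr come from the path above; the count members, the two-field ep_descr structure, and the lookup_ep() helper are hypothetical additions that make the bounds checks explicit.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a parsed endpoint descriptor (two fields only). */
struct ep_descr {
    uint8_t  bEndpointAddress;
    uint16_t wMaxPacketSize;
};

/* Simplified stand-ins for the descriptor tree nodes. */
struct ep_node  { struct ep_descr ep_descr; };
struct alt_node { struct ep_node *altif_ep; size_t altif_n_ep; };
struct if_node  { struct alt_node *if_alt; size_t if_n_alt; };
struct cfg_node { struct if_node *cfg_if; size_t cfg_n_if; };

/*
 * Return &dev_cfg[cfg].cfg_if[ifc].if_alt[alt].altif_ep[ep].ep_descr,
 * or NULL if any index is out of range. cfg is a configuration index,
 * so configuration 1 is cfg == 0.
 */
const struct ep_descr *
lookup_ep(const struct cfg_node *dev_cfg, size_t n_cfg,
    size_t cfg, size_t ifc, size_t alt, size_t ep)
{
    if (cfg >= n_cfg || ifc >= dev_cfg[cfg].cfg_n_if)
        return NULL;
    const struct if_node *ifp = &dev_cfg[cfg].cfg_if[ifc];
    if (alt >= ifp->if_n_alt)
        return NULL;
    const struct alt_node *altp = &ifp->if_alt[alt];
    if (ep >= altp->altif_n_ep)
        return NULL;
    return &altp->altif_ep[ep].ep_descr;
}
```

For a two-configuration device, lookup_ep(dev_cfg, 2, 1, 1, 0, 0) corresponds to the path shown above.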

An alternative to using the descriptor tree directly is to use the usb_lookup_ep_data(9F) function. This function takes as arguments the interface number, the alternate setting, which matching endpoint to return, the endpoint type, and the direction, and it traverses the descriptor tree to find that endpoint. See the usb_get_dev_data(9F) man page for more information.
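The endpoint selection that usb_lookup_ep_data(9F) performs within one alternate setting can be sketched as follows. This is a hypothetical model: it assumes that the "which endpoint" argument counts matching endpoints to skip. The type and direction tests use the bit fields defined in chapter 9 of the USB 2.0 specification (bits 1..0 of bmAttributes and bit 7 of bEndpointAddress); the find_ep() function and its enumerations are not part of the USBA interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Transfer types encoded in bits 1..0 of bmAttributes. */
enum ep_type { EP_CONTROL = 0, EP_ISOC = 1, EP_BULK = 2, EP_INTR = 3 };
/* Direction is bit 7 of bEndpointAddress: set for IN (device to host). */
enum ep_dir { EP_OUT = 0x00, EP_IN = 0x80 };

struct ep_descr {
    uint8_t  bEndpointAddress;
    uint8_t  bmAttributes;
    uint16_t wMaxPacketSize;
};

/*
 * Return the endpoint of the given type and direction after skipping the
 * first `skip` matches, or NULL if there is no such endpoint.
 */
const struct ep_descr *
find_ep(const struct ep_descr *eps, size_t n, unsigned skip,
    enum ep_type type, enum ep_dir dir)
{
    for (size_t i = 0; i < n; i++) {
        if ((eps[i].bmAttributes & 0x03) == (uint8_t)type &&
            (eps[i].bEndpointAddress & 0x80) == (uint8_t)dir) {
            if (skip == 0)
                return &eps[i];
            skip--;
        }
    }
    return NULL;
}
```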

Registering Drivers to Gain Device Access

Two of the first calls into the USBA 2.0 framework by a client driver are calls to the usb_client_attach(9F) function and the usb_get_dev_data(9F) function. These two calls come from the client driver's attach(9E) entry point. You must call the usb_client_attach(9F) function before you call the usb_get_dev_data(9F) function.

The usb_client_attach(9F) function registers a client driver with the USBA 2.0 framework. The usb_client_attach(9F) function enforces versioning. All client driver source files must start with the following lines:


#define USBDRV_MAJOR_VER        2
#define USBDRV_MINOR_VER        minor-version
#include <sys/usb/usba.h>

The value of minor-version must be less than or equal to USBA_MINOR_VER. The symbol USBA_MINOR_VER is defined in the <sys/usb/usbai.h> header file. The <sys/usb/usbai.h> header file is included by the <sys/usb/usba.h> header file.

USBDRV_VERSION is a macro that generates the version number from USBDRV_MAJOR_VER and USBDRV_MINOR_VER. The second argument to usb_client_attach() must be USBDRV_VERSION. The usb_client_attach() function fails if the second argument is not USBDRV_VERSION or if USBDRV_VERSION reflects an invalid version. This restriction ensures programming interface compatibility.
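The compatibility rule can be expressed as a small check. In the following hypothetical sketch, a driver version is accepted when its major version equals the framework's and its minor version is not greater than the framework's, matching the requirement that minor-version not exceed USBA_MINOR_VER. The (major << 8) | minor encoding and the MAKE_VER, VER_MAJOR, VER_MINOR, and version_ok() names are assumptions for illustration; real drivers pass USBDRV_VERSION and let usb_client_attach(9F) perform the check.

```c
#include <stdbool.h>

/* Assumed encoding for this sketch only. */
#define MAKE_VER(major, minor)  ((unsigned)(((major) << 8) | (minor)))
#define VER_MAJOR(v)            ((v) >> 8)
#define VER_MINOR(v)            ((v) & 0xffu)

/* Accept a driver built against the same major version and an equal or
 * older minor version of the framework interface. */
bool
version_ok(unsigned driver_ver, unsigned framework_ver)
{
    return VER_MAJOR(driver_ver) == VER_MAJOR(framework_ver) &&
        VER_MINOR(driver_ver) <= VER_MINOR(framework_ver);
}
```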

The usb_get_dev_data() function returns information that is required for proper USB device management. For example, the usb_get_dev_data() function returns the following information:

The call to the usb_get_dev_data() function is mandatory. Calling usb_get_dev_data() is the only way to retrieve the default control pipe and retrieve the iblock_cookie required for mutex initialization.

After calling usb_get_dev_data(), the client driver's attach(9E) routine typically copies the desired descriptors and data from the descriptor tree to the driver's soft state. Endpoint descriptors copied to the soft state are used later to open pipes to those endpoints. The attach(9E) routine usually calls usb_free_descr_tree(9F) to free the descriptor tree after copying descriptors. Alternatively, you might choose to keep the descriptor tree and not copy the descriptors.

Specify one of the following three parse levels to the usb_get_dev_data(9F) function to request the breadth of the descriptor tree you want returned. You need greater tree breadth if your driver needs to bind to more of the device.

The client driver's detach(9E) routine must call the usb_free_dev_data(9F) function to release all resources allocated by the usb_get_dev_data() function. The usb_free_dev_data() function accepts handles where the descriptor tree has already been freed with the usb_free_descr_tree() function. The client driver's detach() routine also must call the usb_client_detach(9F) function to release all resources allocated by the usb_client_attach(9F) function.

Device Communication

USB devices operate by passing requests through communication channels called pipes. Pipes must be open before you can submit requests. Pipes also can be flushed, queried, and closed. This section discusses pipes, data transfers and callbacks, and data requests.

USB Endpoints

There are four kinds of pipes, one for each of the four kinds of USB endpoints: control, bulk, interrupt, and isochronous.

See Chapter 5 of the USB 2.0 specification or see Requests for more information on the transfer types that correspond to these endpoints.

The Default Pipe

Each USB device has a special control endpoint called the default endpoint. Its communication channel is called the default pipe. Most, if not all, device setup is done through this pipe. Many USB devices have this pipe as their only control pipe.

The usb_get_dev_data(9F) function provides the default control pipe to the client driver. This pipe is pre-opened to accommodate any special setup needed before opening other pipes. This default control pipe is special in the following ways:

Other pipes, including other control pipes, must be opened explicitly and are exclusive-open only.

Pipe States

Pipes are in one of the following states:

Call the usb_pipe_get_state(9F) function to retrieve the state of a pipe.

Opening Pipes

To open a pipe, pass to the usb_pipe_open(9F) function the endpoint descriptor that corresponds to the pipe you want to open. Use the usb_get_dev_data(9F) and usb_lookup_ep_data(9F) functions to retrieve the endpoint descriptor from the descriptor tree. The usb_pipe_open(9F) function returns a handle to the pipe.

You must specify a pipe policy when you open a pipe. The pipe policy contains an estimate of the number of concurrent asynchronous operations on this pipe that each require a separate thread. One way to make this estimate is to count the parallel operations that could occur during a callback. The value of this estimate must be at least 2. See the usb_pipe_open(9F) man page for more information on pipe policy.

Closing Pipes

The driver must use the usb_pipe_close(9F) function to close pipes other than the default pipe. The usb_pipe_close(9F) function enables all remaining requests in the pipe to complete. The function then allows one second for all callbacks of those requests to complete.

Data Transfer

For all pipe types, the programming model is as follows:

  1. Allocate a request.

  2. Submit the request using one of the pipe transfer functions. See the usb_pipe_bulk_xfer(9F), usb_pipe_ctrl_xfer(9F), usb_pipe_intr_xfer(9F), and usb_pipe_isoc_xfer(9F) man pages.

  3. Wait for completion notification.

  4. Free the request.

See Requests for more information on requests. The following sections describe the features of different request types.

Synchronous and Asynchronous Transfers and Callbacks

Transfers are either synchronous or asynchronous. Synchronous transfers block until they complete. Asynchronous transfers call back into the client driver when they complete. Most transfer functions called with the USB_FLAGS_SLEEP flag set in the flags argument are synchronous.

Continuous transfers such as polling and isochronous transfers cannot be synchronous. Calls to transfer functions for continuous transfers made with the USB_FLAGS_SLEEP flag set block only to wait for resources before the transfer begins.

Synchronous transfers are the simplest transfers to set up because they do not require any callback functions. Synchronous transfer functions return a transfer start status, even though they block until the transfer is completed. Upon completion, you can find additional information about the transfer status in the completion reason field and the callback flags field of the request. These fields are discussed below.

If the USB_FLAGS_SLEEP flag is not specified in the flags argument, the transfer operation is asynchronous. Isochronous transfers are an exception: they are always asynchronous, whether or not USB_FLAGS_SLEEP is set. Asynchronous transfer operations set up and start the transfer, and then return before the transfer is complete. Asynchronous transfer operations return a transfer start status. The client driver receives transfer completion status through callback handlers.

Callback handlers are functions that are called when asynchronous transfers complete. Do not set up an asynchronous transfer without callbacks. The two types of callback handlers are normal completion handlers and exception handlers. You can specify one handler to be called in both of these cases.

Both completion handlers and exception handlers receive the transfer's request as an argument. Exception handlers use the completion reason and callback status in the request to find out what happened. The completion reason (usb_cr_t) indicates how the original transaction completed. For example, a completion reason of USB_CR_TIMEOUT indicates that the transfer timed out. As another example, if a USB device is removed while in use, client drivers might receive USB_CR_DEV_NOT_RESP as the completion reason on their outstanding requests. The callback status (usb_cb_flags_t) indicates what the USBA framework did to remedy the situation. For example, a callback status of USB_CB_STALL_CLEARED indicates that the USBA framework cleared a functional stall condition. See the usb_completion_reason(9S) man page for more information on completion reasons. See the usb_callback_flags(9S) man page for more information on callback status flags.
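A driver can register one function as both its normal completion callback and its exception callback, and branch on the request fields. The following user-space sketch shows that shape. The enumerations are hypothetical stand-ins for a few usb_cr_t and usb_cb_flags_t values, and the retry policy is an illustrative choice, not a USBA requirement.

```c
/* Hypothetical stand-ins for a few completion reasons and callback flags. */
enum cr { CR_OK, CR_TIMEOUT, CR_DEV_NOT_RESP, CR_STALL };
enum { CB_NO_INFO = 0x00, CB_STALL_CLEARED = 0x01 };

struct req {
    enum cr  completion_reason;  /* how the transfer completed */
    unsigned cb_flags;           /* what the framework did about it */
};

enum action { ACT_CONSUME, ACT_RETRY, ACT_DISCONNECTED, ACT_FAIL };

/* One handler for both normal completion and exceptions: check the
 * completion reason first, then the callback flags. */
enum action
handle_completion(const struct req *r)
{
    switch (r->completion_reason) {
    case CR_OK:
        return ACT_CONSUME;
    case CR_TIMEOUT:
        return ACT_RETRY;
    case CR_DEV_NOT_RESP:
        /* Device probably removed; hotplug callbacks handle the cleanup. */
        return ACT_DISCONNECTED;
    case CR_STALL:
        /* Retry only if the framework already cleared the stall. */
        return (r->cb_flags & CB_STALL_CLEARED) ? ACT_RETRY : ACT_FAIL;
    }
    return ACT_FAIL;
}
```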

The context of the callback and the policy of the pipe on which the requests are run limit what you can do in the callback.

Requests

This section discusses request structures and allocating and deallocating different types of requests.

Request Allocation and Deallocation

Requests are implemented as initialized request structures. Each different endpoint type takes a different type of request. Each type of request has a different request structure type. The following table shows the structure type for each type of request. This table also lists the functions to use to allocate and free each type of structure.

Table 20–1 Request Initialization

Pipe or Endpoint Type  Request Structure                      Allocation Function     Free Function
Control                usb_ctrl_req_t (usb_ctrl_request(9S))  usb_alloc_ctrl_req(9F)  usb_free_ctrl_req(9F)
Bulk                   usb_bulk_req_t (usb_bulk_request(9S))  usb_alloc_bulk_req(9F)  usb_free_bulk_req(9F)
Interrupt              usb_intr_req_t (usb_intr_request(9S))  usb_alloc_intr_req(9F)  usb_free_intr_req(9F)
Isochronous            usb_isoc_req_t (usb_isoc_request(9S))  usb_alloc_isoc_req(9F)  usb_free_isoc_req(9F)

Each request structure is described in the man page shown in parentheses.

The following table lists the transfer functions that you can use for each type of request.

Table 20–2 Request Transfer Setup

Pipe or Endpoint Type  Transfer Functions
Control                usb_pipe_ctrl_xfer(9F), usb_pipe_ctrl_xfer_wait(9F)
Bulk                   usb_pipe_bulk_xfer(9F)
Interrupt              usb_pipe_intr_xfer(9F), usb_pipe_stop_intr_polling(9F)
Isochronous            usb_pipe_isoc_xfer(9F), usb_pipe_stop_isoc_polling(9F)

Use the following procedure to allocate and deallocate a request:

  1. Use the appropriate allocation function to allocate a request structure for the type of request you need. The man pages for the request structure allocation functions are listed in Table 20–1.

  2. Initialize any fields you need in the structure. See Request Features and Fields or the appropriate request structure man page for more information. The man pages for the request structures are listed in Table 20–1.

  3. When the data transfer is complete, use the appropriate free function to free the request structure. The man pages for the request structure free functions are listed in Table 20–1.

Request Features and Fields

Data for all requests is passed in message blocks so that the data is handled uniformly whether the driver is a STREAMS, character, or block driver. The message block type, mblk_t, is described in the mblk(9S) man page. The DDI offers several routines for manipulating message blocks. Examples include allocb(9F) and freemsg(9F). To learn about other routines for manipulating message blocks, see the “SEE ALSO” sections of the allocb(9F) and freemsg(9F) man pages. Also see the STREAMS Programming Guide.

The following request fields are included in all transfer types. In each field name, the possible values for xxxx are: ctrl, bulk, intr, or isoc.

xxxx_client_private

This field value is a pointer that is intended for internal data to be passed around the client driver along with the request. This pointer is not used to transfer data to the device. 

xxxx_attributes

This field value is a set of transfer attributes. While this field is common to all request structures, the initialization of this field is somewhat different for each transfer type. See the appropriate request structure man page for more information. These man pages are listed in Table 20–1. See also the usb_request_attributes(9S) man page.

xxxx_cb

This field value is a callback function for normal transfer completion. This function is called when an asynchronous transfer completes without error. 

xxxx_exc_cb

This field value is a callback function for error handling. This function is called only when asynchronous transfers complete with errors. 

xxxx_completion_reason

This field holds the completion status of the transfer itself. If an error occurred, this field shows what went wrong. See the usb_completion_reason(9S) man page for more information. This field is updated by the USBA 2.0 framework.

xxxx_cb_flags

This field lists the recovery actions that were taken by the USBA 2.0 framework before calling the callback handler. The USB_CB_INTR_CONTEXT flag indicates whether a callback is running in interrupt context. See the usb_callback_flags(9S) man page for more information. This field is updated by the USBA 2.0 framework.

The following sections describe the request fields that are different for the four different transfer types. These sections describe how to initialize these structure fields. These sections also describe the restrictions on various combinations of attributes and parameters.

Control Requests

Use control requests to initiate message transfers down a control pipe. You can set up transfers manually, as described below. You can also set up and send synchronous transfers using the usb_pipe_ctrl_xfer_wait(9F) wrapper function.

The client driver must initialize the ctrl_bmRequestType, ctrl_bRequest, ctrl_wValue, ctrl_wIndex, and ctrl_wLength fields as described in the USB 2.0 specification.
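These five fields form the 8-byte SETUP packet that the USB 2.0 specification defines for control transfers. The client driver only fills in the request fields. The following sketch, which is not part of the USBA interface, shows how those fields map to the bytes on the wire, with the 16-bit fields in little-endian order.

```c
#include <stdint.h>

/* Serialize the five control-request fields into an 8-byte SETUP packet
 * (USB 2.0, chapter 9). Multi-byte fields are little-endian on the wire. */
void
build_setup(uint8_t pkt[8], uint8_t bmRequestType, uint8_t bRequest,
    uint16_t wValue, uint16_t wIndex, uint16_t wLength)
{
    pkt[0] = bmRequestType;
    pkt[1] = bRequest;
    pkt[2] = (uint8_t)(wValue & 0xff);
    pkt[3] = (uint8_t)(wValue >> 8);
    pkt[4] = (uint8_t)(wIndex & 0xff);
    pkt[5] = (uint8_t)(wIndex >> 8);
    pkt[6] = (uint8_t)(wLength & 0xff);
    pkt[7] = (uint8_t)(wLength >> 8);
}
```

For example, a GET_DESCRIPTOR request for the 18-byte device descriptor uses bmRequestType 0x80, bRequest 6, wValue 0x0100, wIndex 0, and wLength 18, which serializes to 80 06 00 01 00 00 12 00.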

The ctrl_data field of the request must be initialized to point to a data buffer. The usb_alloc_ctrl_req(9F) function initializes this field when you pass a positive value as its len argument. The buffer must, of course, be initialized for any outbound transfers. In all cases, the client driver must free the request when the transfer is complete.

Multiple control requests can be queued. Queued requests can be a combination of synchronous and asynchronous requests.

The ctrl_timeout field defines the maximum wait time for the request to be processed, excluding wait time on the queue. This field applies to both synchronous and asynchronous requests. The ctrl_timeout field is specified in seconds.

The ctrl_exc_cb field accepts the address of a function to call if an exception occurs. The arguments of this exception handler are specified in the usb_ctrl_request(9S) man page. The second argument of the exception handler is the usb_ctrl_req_t structure. Passing the request structure as an argument allows the exception handler to check the ctrl_completion_reason and ctrl_cb_flags fields of the request to determine the best recovery action.

The USB_ATTRS_ONE_XFER and USB_ATTRS_ISOC_* flags are invalid attributes for all control requests. The USB_ATTRS_SHORT_XFER_OK flag is valid only for host-bound requests.

Bulk Requests

Use bulk requests to send data that is not time-critical. Bulk requests can take several USB frames to complete, depending on overall bus load.

All requests must receive an initialized message block. See the mblk(9S) man page for a description of the mblk_t message block type. This message block either supplies the data or stores the data, depending on the transfer direction. Refer to the usb_bulk_request(9S) man page for more details.

The USB_ATTRS_ONE_XFER and USB_ATTRS_ISOC_* flags are invalid attributes for all bulk requests. The USB_ATTRS_SHORT_XFER_OK flag is valid only for host-bound requests.

The usb_pipe_get_max_bulk_transfer_size(9F) function returns the maximum number of bytes per request. You can use the retrieved value as the upper limit in the client driver's minphys(9F) routine.
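A driver-specific minphys routine typically clamps each buffer to that maximum, and physio(9F) then issues larger requests in pieces. The following sketch models only the clamping step. The buf_model structure stands in for struct buf, and xx_minphys() takes the maximum as an extra parameter so that it can be tested in user space; a real minphys routine receives only the struct buf pointer and would read the saved maximum from its soft state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one struct buf member minphys adjusts. */
struct buf_model {
    size_t b_bcount;    /* bytes requested for this transfer */
};

/* Clamp one transfer to max_bulk, the value obtained from
 * usb_pipe_get_max_bulk_transfer_size(9F) during attach. */
void
xx_minphys(struct buf_model *bp, size_t max_bulk)
{
    if (bp->b_bcount > max_bulk)
        bp->b_bcount = max_bulk;
}
```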

Multiple bulk requests can be queued.

Interrupt Requests

Interrupt requests typically are for periodic inbound data. Interrupt requests periodically poll the device for data. However, the USBA 2.0 framework supports one-time inbound interrupt data requests, as well as outbound interrupt data requests. All interrupt requests can take advantage of the USB interrupt transfer features of timeliness and retry.

The USB_ATTRS_ISOC_* flags are invalid attributes for all interrupt requests. The USB_ATTRS_SHORT_XFER_OK and USB_ATTRS_ONE_XFER flags are valid only for host-bound requests.

Only one-time polls can be done as synchronous interrupt transfers. Specifying the USB_ATTRS_ONE_XFER attribute in the request results in a one-time poll.

Periodic polling is started as an asynchronous interrupt transfer. An original interrupt request is passed to usb_pipe_intr_xfer(9F). When polling finds new data to return, a new usb_intr_req_t structure is cloned from the original and is populated with an initialized data block. When allocating the request, specify zero for the len argument to the usb_alloc_intr_req(9F) function. The len argument is zero because the USBA 2.0 framework allocates and fills in a new request with each callback. After you allocate the request structure, fill in the intr_len field to specify the number of bytes you want the framework to allocate with each poll. Data beyond intr_len bytes is not returned.

The client driver must free each request it receives. If the message block is sent upstream, decouple the message block from the request before you send the message block upstream. To decouple the message block from the request, set the data pointer of the request to NULL. Setting the data pointer of the request to NULL prevents the message block from being freed when the request is deallocated.
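The ownership rule can be shown with simplified stand-ins. In the following sketch, struct msg stands in for mblk_t, struct intr_req stands in for usb_intr_req_t and is modeled on its intr_data member, and free_intr_req() models a free routine that also frees any attached data. These names are hypothetical. In a real STREAMS driver, the same sequence would end with usb_free_intr_req(9F) followed by putnext(9F) on the saved message block.

```c
#include <stdlib.h>

/* Hypothetical, much-simplified stand-ins for mblk_t and usb_intr_req_t. */
struct msg { size_t len; };
struct intr_req { struct msg *intr_data; };

/* Model of a request free routine: frees attached data, then the request. */
void
free_intr_req(struct intr_req *req)
{
    free(req->intr_data);   /* free(NULL) is a no-op */
    free(req);
}

/*
 * Take ownership of a received request's data, then free the request.
 * Clearing intr_data first keeps free_intr_req() from freeing the message
 * that is about to be sent upstream.
 */
struct msg *
take_data_and_free(struct intr_req *req)
{
    struct msg *mp = req->intr_data;
    req->intr_data = NULL;
    free_intr_req(req);
    return mp;
}
```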

Call the usb_pipe_stop_intr_polling(9F) function to cancel periodic polling. When polling is stopped or the pipe is closed, the original request structure is returned through an exception callback. This returned request structure has its completion reason set to USB_CR_STOPPED_POLLING.

Do not start polling while polling is already in progress. Do not start polling while a call to usb_pipe_stop_intr_polling(9F) is in progress.

Isochronous Requests

Isochronous requests are for streaming, constant-rate, time-relevant data. Retries are not made on errors. Isochronous requests have the following request-specific fields:

isoc_frame_no

Specify this field when the overall transfer must start from a specific frame number. The value of this field must be greater than the current frame number. Use usb_get_current_frame_number(9F) to find the current frame number. Note that the current frame number is a moving target: on low-speed and full-speed buses a new frame begins every millisecond, and on high-speed buses a new frame begins every 0.125 milliseconds. Set the USB_ATTRS_ISOC_START_FRAME attribute so that the isoc_frame_no field is recognized.

To ignore this frame number field and start as soon as possible, set the USB_ATTRS_ISOC_XFER_ASAP flag.

isoc_pkts_count

This field is the number of packets in the request. This value is bounded by the value returned by the usb_get_max_pkts_per_isoc_request(9F) function and by the size of the isoc_pkt_descr array (see below). The number of bytes transferable with this request is equal to the product of this isoc_pkts_count value and the wMaxPacketSize value of the endpoint.

isoc_pkts_length

This field is the sum of the lengths of all packets of the request, and it is set by the initiator. Set this value to zero so that the framework automatically uses the sum of the isoc_pkt_length fields in the isoc_pkt_descr array; no check is then applied to this field.

isoc_error_count

This field is the number of packets that completed with errors. This value is set by the USBA 2.0 framework. 

isoc_pkt_descr

This field points to an array of packet descriptors that define how much data to transfer per packet. For an outgoing request, this value defines a private queue of sub-requests to process. For an incoming request, this value describes how the data arrived in pieces. The client driver allocates these descriptors for outgoing requests. The framework allocates and initializes these descriptors for incoming requests. Descriptors in this array contain framework-initialized fields that hold the number of bytes actually transferred and the status of the transfer. See the usb_isoc_request(9S) man page for more details.
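The frame-number and packet rules above reduce to simple arithmetic. The following hypothetical sketch picks a start frame a given number of milliseconds ahead, assuming that the frame number advances once per frame at the rates stated for isoc_frame_no, and totals an outgoing packet list the way a zero isoc_pkts_length asks the framework to. The pkt_descr structure is a simplified stand-in for the real packet descriptor, and the lead time is a driver tuning choice, not a USBA value.

```c
#include <stddef.h>
#include <stdint.h>

enum bus_speed { SPEED_LOW, SPEED_FULL, SPEED_HIGH };

/* Frames per millisecond: one on low- and full-speed buses, eight (one
 * every 0.125 ms) on high-speed buses. */
static uint64_t
frames_per_ms(enum bus_speed speed)
{
    return speed == SPEED_HIGH ? 8 : 1;
}

/* Pick an isoc_frame_no lead_ms milliseconds after `current`, the value
 * returned by usb_get_current_frame_number(9F). The start frame must be
 * strictly greater than the current frame, so the lead is at least 1 ms. */
uint64_t
pick_start_frame(uint64_t current, enum bus_speed speed, unsigned lead_ms)
{
    if (lead_ms == 0)
        lead_ms = 1;
    return current + lead_ms * frames_per_ms(speed);
}

/* Simplified stand-in for one packet descriptor. */
struct pkt_descr {
    uint16_t isoc_pkt_length;         /* bytes requested for this packet */
    uint16_t isoc_pkt_actual_length;  /* bytes transferred, set by framework */
};

/* Validate an outgoing packet list and return its total length. max_pkts
 * stands for the usb_get_max_pkts_per_isoc_request(9F) limit.
 * Returns 0 if the list is invalid. */
size_t
isoc_total_length(const struct pkt_descr *pkts, size_t count,
    uint16_t wMaxPacketSize, size_t max_pkts)
{
    if (count == 0 || count > max_pkts)
        return 0;
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (pkts[i].isoc_pkt_length > wMaxPacketSize)
            return 0;
        total += pkts[i].isoc_pkt_length;
    }
    return total;   /* at most count * wMaxPacketSize */
}
```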

All requests must receive an initialized message block. This message block either supplies the data or stores the data. See the mblk(9S) man page for a description of the mblk_t message block type.

The USB_ATTRS_ONE_XFER flag is an invalid attribute for isochronous requests because the system decides how to distribute the data across the available packets. The USB_ATTRS_SHORT_XFER_OK flag is valid only for host-bound requests.
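The attribute rules stated for control, bulk, interrupt, and isochronous requests can be gathered into one check. In the following sketch, the flag bit values and the attrs_valid() function are hypothetical; the real flags are defined in <sys/usb/usbai.h>.

```c
#include <stdbool.h>

/* Hypothetical bit values for the request attribute flags. */
enum {
    ATTRS_SHORT_XFER_OK    = 0x01,
    ATTRS_ONE_XFER         = 0x02,
    ATTRS_ISOC_START_FRAME = 0x04,
    ATTRS_ISOC_XFER_ASAP   = 0x08
};
#define ATTRS_ISOC_ANY (ATTRS_ISOC_START_FRAME | ATTRS_ISOC_XFER_ASAP)

enum pipe_type { PIPE_CTRL, PIPE_BULK, PIPE_INTR, PIPE_ISOC };

/* Apply the per-type attribute rules. host_bound is true for IN
 * (device-to-host) requests. */
bool
attrs_valid(enum pipe_type type, unsigned attrs, bool host_bound)
{
    if ((attrs & ATTRS_SHORT_XFER_OK) && !host_bound)
        return false;   /* short transfers: host-bound requests only */
    switch (type) {
    case PIPE_CTRL:
    case PIPE_BULK:
        return (attrs & (ATTRS_ONE_XFER | ATTRS_ISOC_ANY)) == 0;
    case PIPE_INTR:
        if (attrs & ATTRS_ISOC_ANY)
            return false;
        return !(attrs & ATTRS_ONE_XFER) || host_bound;
    case PIPE_ISOC:
        return (attrs & ATTRS_ONE_XFER) == 0;
    }
    return false;
}
```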

The usb_pipe_isoc_xfer(9F) function makes all isochronous transfers asynchronous, regardless of whether the USB_FLAGS_SLEEP flag is set. All isochronous input requests start polling.

Call the usb_pipe_stop_isoc_polling(9F) function to cancel periodic polling. When polling is stopped or the pipe is closed, the original request structure is returned through an exception callback. This returned request structure has its completion reason set to USB_CR_STOPPED_POLLING.

Polling continues until one of the following events occurs:

Flushing Pipes

You might need to clean up a pipe after errors, or you might want to wait for a pipe to clear. Use one of the following methods to flush or clear pipes:

Device State Management

Managing a USB device includes accounting for hotplugging, system power management (checkpoint and resume), and device power management. All client drivers should implement the basic state machine shown in the following figure. For more information, see /usr/include/sys/usb/usbai.h.

Figure 20–4 USB Device State Machine

Diagram shows what state the device goes to after each
of seven different events.

This state machine and its four states can be augmented with driver-specific states. Device states 0x80 to 0xff can be defined and used only by client drivers.
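The following reduced model covers only a few transitions suggested by this chapter: hot removal and reinsertion, checkpoint and resume, and power down and power up. The numeric state values and all names are assumptions for illustration; see /usr/include/sys/usb/usbai.h for the real definitions and Figure 20–4 for the complete set of events.

```c
#include <stdbool.h>

/* Hypothetical values for the four framework-defined device states. */
enum dev_state {
    DEV_ONLINE = 1,
    DEV_DISCONNECTED = 2,
    DEV_SUSPENDED = 3,
    DEV_PWRED_DOWN = 4
};

enum dev_event {
    EV_DISCONNECT,        /* hot removal */
    EV_RECONNECT_SAME,    /* reinsertion of the same device */
    EV_RECONNECT_OTHER,   /* a different device in the same port */
    EV_CHECKPOINT,        /* system suspend */
    EV_RESUME,
    EV_POWER_DOWN,        /* device power management lowers power */
    EV_POWER_UP
};

/* Device states 0x80 through 0xff are reserved for client drivers. */
bool
is_driver_private_state(unsigned state)
{
    return state >= 0x80 && state <= 0xff;
}

/* Return the next state; transitions not modeled leave the state as is. */
enum dev_state
next_state(enum dev_state s, enum dev_event ev)
{
    switch (ev) {
    case EV_DISCONNECT:
        return s == DEV_ONLINE ? DEV_DISCONNECTED : s;
    case EV_RECONNECT_SAME:
        return s == DEV_DISCONNECTED ? DEV_ONLINE : s;
    case EV_RECONNECT_OTHER:
        return s;   /* stay DISCONNECTED, failing requests and opens */
    case EV_CHECKPOINT:
        return s == DEV_ONLINE ? DEV_SUSPENDED : s;
    case EV_RESUME:
        return s == DEV_SUSPENDED ? DEV_ONLINE : s;
    case EV_POWER_DOWN:
        return s == DEV_ONLINE ? DEV_PWRED_DOWN : s;
    case EV_POWER_UP:
        return s == DEV_PWRED_DOWN ? DEV_ONLINE : s;
    }
    return s;
}
```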

Hotplugging USB Devices

USB devices support hotplugging. A USB device can be inserted or removed at any time. The client driver must handle removal and reinsertion of an open device. Use hotplug callbacks to handle open devices. Insertion and removal of closed devices is handled by the attach(9E) and detach(9E) entry points.

Hotplug Callbacks

The USBA 2.0 framework supports the following event notifications:

Client drivers must call usb_register_hotplug_cbs(9F) in their attach(9E) routine to register for event callbacks. Drivers must call usb_unregister_hotplug_cbs(9F) in their detach(9E) routine before dismantling.

Hot Insertion

The sequence of events for hot insertion of a USB device is as follows:

  1. The hub driver, hubd(7D), waits for a port connect status change.

  2. The hubd driver detects a port connect.

  3. The hubd driver enumerates the device, creates child device nodes, and attaches client drivers. Refer to Binding Client Drivers for compatible names definitions.

  4. The client driver manages the device. The driver is in the ONLINE state.

Hot Removal

The sequence of events for hot removal of a USB device is as follows:

  1. The hub driver, hubd(7D), waits for a port connect status change.

  2. The hubd driver detects a port disconnect.

  3. The hubd driver sends a disconnect event to the child client driver. If the child client driver is the hubd driver or the usb_mid(7D) multi-interface driver, then the child client driver propagates the event to its children.

  4. The client driver receives the disconnect event notification in kernel thread context. Kernel thread context enables the driver's disconnect handler to block.

  5. The client driver moves to the DISCONNECTED state. Outstanding I/O transfers fail with the completion reason of device not responding. All new I/O transfers and attempts to open the device node also fail. The client driver is not required to close pipes. The driver is required to save the device and driver context that needs to be restored if the device is reconnected.

  6. The hubd driver attempts to offline the OS device node and its children in bottom-up order.

The following events take place if the device node is not open when the hubd driver attempts to offline the device node:

  1. The client driver's detach(9E) entry point is called.

  2. The device node is destroyed.

  3. The port becomes available for a new device.

  4. The hotplug sequence of events starts over. The hubd driver waits for a port connect status change.

The following events take place if the device node is open when the hubd driver attempts to offline the device node:

  1. The hubd driver puts the offline request in the periodic offline retry queue.

  2. The port remains unavailable for a new device.

If the device node was open when the hubd driver attempted to offline it and the user later closes the device node, the hubd driver's periodic offline attempt for that device node succeeds and the following events take place:

  1. The client driver's detach(9E) entry point is called.

  2. The device node is destroyed.

  3. The port becomes available for a new device.

  4. The hotplug sequence of events starts over. The hubd driver waits for a port connect status change.

If the user closes all applications that use the device, the port becomes available again. If the application does not terminate or does not close the device, the port remains unavailable.

Hot Reinsertion

The following events take place if a previously removed device is reinserted into the same port while the device node of the device is still open:

  1. The hub driver, hubd(7D), detects a port connect.

  2. The hubd driver restores the bus address and the device configuration.

  3. The hubd driver cancels the offline retry request.

  4. The hubd driver sends a connect event to the client driver.

  5. The client driver receives the connect event.

  6. The client driver determines whether the new device is the same as the device that was previously connected. The client driver makes this determination first by comparing device descriptors. The client driver might also compare serial numbers and configuration descriptor clouds.
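The comparison in step 6 can be sketched as follows. This hypothetical same_device() function compares the saved and new raw device descriptors byte for byte and, when both devices report one, their serial number strings. Comparing configuration descriptor clouds would be an additional step that is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of a standard USB device descriptor (USB 2.0, chapter 9). */
#define DEV_DESCR_LEN 18

/*
 * Treat the device as the same one only if the raw device descriptors
 * match and the serial numbers match. NULL means "no serial number";
 * a serial number that appeared or disappeared counts as a mismatch.
 */
bool
same_device(const unsigned char old_descr[DEV_DESCR_LEN],
    const unsigned char new_descr[DEV_DESCR_LEN],
    const char *old_serial, const char *new_serial)
{
    if (memcmp(old_descr, new_descr, DEV_DESCR_LEN) != 0)
        return false;
    if (old_serial != NULL && new_serial != NULL)
        return strcmp(old_serial, new_serial) == 0;
    return old_serial == new_serial;   /* true only if both are NULL */
}
```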

The following events might take place if the client driver determines that the current device is not the same as the device that was previously connected:

  1. The client driver might issue a warning message to the console.

  2. The user might remove the device again. If the user removes the device again, the hot remove sequence of events starts over. The hubd driver detects a port disconnect. If the user does not remove the device again, the following events take place:

    1. The client driver remains in the DISCONNECTED state, failing all requests and opens.

    2. The port remains unavailable. The user must close and disconnect the device to free the port.

    3. The hotplug sequence of events starts over when the port is freed. The hubd driver waits for a port connect status change.

The following events might take place if the client driver determines that the current device is the same as the device that was previously connected:

  1. The client driver might restore its state and continue normal operation. This policy is up to the client driver. Audio speakers are a good example where the client driver should continue.

  2. If it is safe to continue using the reconnected device, the hotplug sequence of events starts over. The hubd driver waits for a port connect status change. The device is in service once again.

Power Management

This section discusses device power management and system power management.

Device power management manages individual USB devices depending on their I/O activity or idleness.

System power management uses checkpoint and resume to checkpoint the state of the system into a file and shut down the system completely. (Checkpoint is sometimes called “system suspend.”) The system is resumed to its pre-suspend state when the system is powered up again.

Device Power Management

The following summary lists what your driver needs to do to power manage a USB device. A more detailed description of power management follows this summary.

  1. Create power management components during attach(9E). See the usb_create_pm_components(9F) man page.

  2. Implement the power(9E) entry point.

  3. Call pm_busy_component(9F) and pm_raise_power(9F) before accessing the device.

  4. Call pm_idle_component(9F) when finished accessing the device.

The USBA 2.0 framework supports four power levels as specified by the USB interface power management specification. See /usr/include/sys/usb/usbai.h for information on mapping USB power levels to operating system power levels.

The hubd driver suspends the port when the device goes to the USB_DEV_OS_PWR_OFF state. The hubd driver resumes the port when the device goes to the USB_DEV_OS_PWR_1 state and above. Note that port suspend is different from system suspend. In port suspend, only the USB port is shut off. System suspend is defined in System Power Management.
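The port rule above can be written as a mapping from power level to port state. This C sketch is a model only: the enum names and numeric values are placeholders that assume the four OS levels are ordered from off (lowest) to full power (highest). The real USB_DEV_OS_* constants are defined in /usr/include/sys/usb/usbai.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed ordering of the four OS power levels, lowest to highest.
 * Names and values are placeholders; see <sys/usb/usbai.h>.
 */
enum mydrv_os_pwr {
	MYDRV_OS_PWR_OFF = 0,
	MYDRV_OS_PWR_1,
	MYDRV_OS_PWR_2,
	MYDRV_OS_FULL_PWR
};

enum mydrv_port_state { MYDRV_PORT_SUSPENDED, MYDRV_PORT_ACTIVE };

/* True if level is one of the four supported levels. */
static bool
mydrv_pwr_level_valid(int level)
{
	return (level >= MYDRV_OS_PWR_OFF && level <= MYDRV_OS_FULL_PWR);
}

/* Port suspended at the OFF level, resumed at PWR_1 and above. */
static enum mydrv_port_state
mydrv_port_state_for(int level)
{
	assert(mydrv_pwr_level_valid(level));
	return (level == MYDRV_OS_PWR_OFF ?
	    MYDRV_PORT_SUSPENDED : MYDRV_PORT_ACTIVE);
}
```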

The client driver might choose to enable remote wakeup on the device. See the usb_handle_remote_wakeup(9F) man page. When the hubd driver sees a remote wakeup on a port, the hubd driver completes the wakeup operation and calls pm_raise_power(9F) to notify the child.

The following figure shows the relationship between the different pieces of power management.

Figure 20–5 USB Power Management

Diagram shows when to employ two different power management schemes.

The driver can implement one of the two power management schemes described at the bottom of Figure 20–5. The passive scheme is simpler than the active scheme because the passive scheme does not do power management during device transfers.

Active Power Management

This section describes the functions you need to use to implement the active power management scheme.

Do the following work in the attach(9E) entry point for your driver:

  1. Call usb_create_pm_components(9F).

  2. Optionally call usb_handle_remote_wakeup(9F) with USB_REMOTE_WAKEUP_ENABLE as the second argument to enable remote wakeup on the device.

  3. Call pm_busy_component(9F).

  4. Call pm_raise_power(9F) to take power to the USB_DEV_OS_FULL_PWR level.

  5. Communicate with the device to initialize the device.

  6. Call pm_idle_component(9F).

Do the following work in the detach(9E) entry point for your driver:

  1. Call pm_busy_component(9F).

  2. Call pm_raise_power(9F) to take power to the USB_DEV_OS_FULL_PWR level.

  3. If you called the usb_handle_remote_wakeup(9F) function in your attach(9E) entry point, call usb_handle_remote_wakeup(9F) here with USB_REMOTE_WAKEUP_DISABLE as the second argument.

  4. Communicate with the device to cleanly shut down the device.

  5. Call pm_lower_power(9F) to take power to the USB_DEV_OS_PWR_OFF level.

    This is the only time a client driver calls pm_lower_power(9F).

  6. Call pm_idle_component(9F).
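The attach and detach sequences above both bracket their device access with a busy call and a matching idle call. The following C sketch models only the resulting bookkeeping; it does not call the DDI. The mydrv_pm structure, the mydrv_* functions, and the level values are hypothetical, and comments mark which numbered step each line stands for.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed placeholder values; the real levels are in <sys/usb/usbai.h>. */
enum { MYDRV_PWR_OFF = 0, MYDRV_FULL_PWR = 3 };

/* Hypothetical record of what the driver's PM calls have done. */
struct mydrv_pm {
	int busy;		/* busy calls minus idle calls */
	int level;		/* last power level requested */
	bool remote_wakeup;	/* remote wakeup enabled on the device */
	bool initialized;	/* device set up by attach */
};

static void
mydrv_busy(struct mydrv_pm *pm)
{
	pm->busy++;
}

static void
mydrv_idle(struct mydrv_pm *pm)
{
	assert(pm->busy > 0);	/* every idle must pair with a busy */
	pm->busy--;
}

/* attach(9E) steps 2-6, after the PM components exist (step 1). */
static void
mydrv_attach_pm(struct mydrv_pm *pm, bool want_wakeup)
{
	pm->busy = 0;
	pm->remote_wakeup = want_wakeup;	/* step 2 (optional) */
	mydrv_busy(pm);				/* step 3 */
	pm->level = MYDRV_FULL_PWR;		/* step 4 */
	pm->initialized = true;			/* step 5 */
	mydrv_idle(pm);				/* step 6 */
}

/* detach(9E) steps 1-6. */
static void
mydrv_detach_pm(struct mydrv_pm *pm)
{
	mydrv_busy(pm);				/* step 1 */
	pm->level = MYDRV_FULL_PWR;		/* step 2 */
	pm->remote_wakeup = false;		/* step 3: disable if attach enabled it */
	pm->initialized = false;		/* step 4 */
	pm->level = MYDRV_PWR_OFF;		/* step 5: the only lower-power request */
	mydrv_idle(pm);				/* step 6 */
}
```

After either routine returns, the busy count is back to zero, which is what allows the framework to manage the device's power between driver operations.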

When a driver thread wants to start I/O to the device, that thread performs the following steps:

  1. Call pm_busy_component(9F).

  2. Call pm_raise_power(9F) to take power to the USB_DEV_OS_FULL_PWR level.

  3. Begin the I/O transfer.

The driver calls pm_idle_component(9F) when the driver receives notice that an I/O transfer has completed.
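The I/O pattern pairs each busy call made when a transfer starts with one idle call made from the completion path. The following C sketch models that pairing with a simple counter. The mydrv_io structure and mydrv_* names are hypothetical, the counter stands in for the framework's busy state, and the model assumes power is only lowered when no busy calls are outstanding.

```c
#include <assert.h>
#include <stdbool.h>

enum { MYDRV_PWR_OFF = 0, MYDRV_FULL_PWR = 3 };	/* assumed values */

/* Hypothetical model of the state touched by the I/O path. */
struct mydrv_io {
	int busy;		/* outstanding busy calls */
	int level;		/* current power level */
	int in_flight;		/* transfers started but not completed */
};

/* Start an I/O: busy, raise to full power, then begin the transfer. */
static void
mydrv_start_io(struct mydrv_io *io)
{
	io->busy++;			/* real driver: pm_busy_component(9F) */
	io->level = MYDRV_FULL_PWR;	/* real driver: pm_raise_power(9F) */
	io->in_flight++;		/* begin the transfer */
}

/* Completion callback: the transfer is done, so mark one unit idle. */
static void
mydrv_io_done(struct mydrv_io *io)
{
	assert(io->in_flight > 0 && io->busy > 0);
	io->in_flight--;
	io->busy--;			/* real driver: pm_idle_component(9F) */
}

/* In this model, lowering power is allowed only with no busy calls outstanding. */
static bool
mydrv_may_power_down(const struct mydrv_io *io)
{
	return (io->busy == 0);
}
```

With two overlapping transfers, the device stays busy until the second completion arrives.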

In the power(9E) entry point for your driver, check whether the power level to which you are transitioning is valid. You might also need to account for different threads calling into power(9E) at the same time.

The power(9E) routine might be called to take the device to the USB_DEV_OS_PWR_OFF state if the device has been idle for some time or the system is shutting down. This state corresponds to the PWRED_DWN state shown in Figure 20–4. If the device is going to the USB_DEV_OS_PWR_OFF state, do the following work in your power(9E) routine:

  1. Put all open pipes into the idle state. For example, stop polling on the interrupt pipe.

  2. Save any device or driver context that needs to be saved.

    The port to which the device is connected is suspended after the call to power(9E) completes.

The power(9E) routine might be called to power on the device when either a device-initiated remote wakeup or a system-initiated wakeup is received. Wakeup notices occur after the device has been powered down due to extended idle time or system suspend. If the device is going to the USB_DEV_OS_PWR_1 state or above, do the following work in your power(9E) routine:

  1. Restore any needed device and driver context.

  2. Restart activity on the pipe that is appropriate to the specified power level. For example, start polling on the interrupt pipe.

If the port to which the device is connected was previously suspended, that port is resumed before power(9E) is called.

Passive Power Management

The passive power management scheme is simpler than the active power management scheme described above. In this passive scheme, no power management is done during transfers. To implement this passive scheme, call pm_busy_component(9F) and pm_raise_power(9F) when you open the device. Then call pm_idle_component(9F) when you close the device.
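In the passive scheme, the busy and idle calls move to the open and close paths. The following C sketch is a model under one stated assumption: the hypothetical driver allows only one open at a time, so every successful open is paired with exactly one close. The mydrv_passive structure, the mydrv_* names, and the level value are hypothetical; comments mark where the real DDI calls would go.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { MYDRV_FULL_PWR = 3 };	/* assumed value */

/* Hypothetical state for a driver that allows one open at a time. */
struct mydrv_passive {
	bool is_open;
	int busy;	/* busy calls minus idle calls */
	int level;
};

/* open(9E) path: mark busy and raise power once for the whole open. */
static int
mydrv_open(struct mydrv_passive *dp)
{
	if (dp->is_open)
		return (EBUSY);		/* exclusive open assumed */
	dp->busy++;			/* real driver: pm_busy_component(9F) */
	dp->level = MYDRV_FULL_PWR;	/* real driver: pm_raise_power(9F) */
	dp->is_open = true;
	return (0);
}

/* close(9E) path: the device may now be powered down. */
static int
mydrv_close(struct mydrv_passive *dp)
{
	assert(dp->is_open && dp->busy > 0);
	dp->is_open = false;
	dp->busy--;			/* real driver: pm_idle_component(9F) */
	return (0);
}
```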

System Power Management

System power management consists of turning off the entire system after saving its state, and restoring the state after the system is turned back on. This process is called CPR (checkpoint and resume). USB client drivers operate the same way that other client drivers operate with respect to CPR. To suspend a device, the driver's detach(9E) entry point is called with a cmd argument of DDI_SUSPEND. To resume a device, the driver's attach(9E) entry point is called with a cmd argument of DDI_RESUME. When you handle the DDI_SUSPEND command in your detach(9E) routine, clean up device and driver state as much as necessary for a clean resume later. (This corresponds to the SUSPENDED state in Figure 20–4.) When you handle the DDI_RESUME command in your attach(9E) routine, always take the device to full power so that the system and the device are back in sync.

For USB devices, suspend and resume are handled similarly to a hotplug disconnect and reconnect (see Hotplugging USB Devices). An important difference between CPR and hotplugging is that with CPR, the driver can fail the checkpoint process if the device is not in a state from which it can be suspended. For example, the device cannot be suspended if error recovery is in progress on the device. The device also cannot be suspended if it is busy and cannot be stopped safely.
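The suspend and resume rules above reduce to a refusal check on DDI_SUSPEND and an unconditional return to full power on DDI_RESUME. The following C sketch models that logic only. The state names, the mydrv_cpr structure, and the level value are hypothetical, and -1 stands in for DDI_FAILURE.

```c
#include <assert.h>
#include <stdbool.h>

enum { MYDRV_FULL_PWR = 3 };	/* assumed value */

/* Hypothetical device states for this model. */
enum mydrv_dev_state { MYDRV_ONLINE, MYDRV_ERR_RECOVERY, MYDRV_SUSPENDED };

struct mydrv_cpr {
	enum mydrv_dev_state state;
	bool busy_unstoppable;	/* I/O in progress that cannot be stopped safely */
	int level;
};

/* DDI_SUSPEND handling in detach(9E). */
static int
mydrv_suspend(struct mydrv_cpr *cp)
{
	if (cp->state == MYDRV_ERR_RECOVERY || cp->busy_unstoppable)
		return (-1);		/* fail the checkpoint */
	/* ...clean up device and driver state for a clean resume... */
	cp->state = MYDRV_SUSPENDED;
	return (0);
}

/* DDI_RESUME handling in attach(9E): always go to full power. */
static int
mydrv_resume(struct mydrv_cpr *cp)
{
	assert(cp->state == MYDRV_SUSPENDED);
	cp->level = MYDRV_FULL_PWR;
	cp->state = MYDRV_ONLINE;
	return (0);
}
```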

Serialization

In general, a driver should not call USBA functions while the driver is holding a mutex. Because a mutex cannot be held to protect such calls, race conditions in a client driver can be difficult to prevent.

Do not allow normal operational code to run simultaneously with the processing of asynchronous events such as a disconnect or CPR. These types of asynchronous events normally clean up and dismantle pipes and could disrupt the normal operational code.

One way to manage race conditions and protect normal operational code is to write a serialization facility that can acquire and release an exclusive-access synchronization object. You can write the serialization facility in such a way that the synchronization object is safe to hold through calls to USBA functions. The usbskel sample driver demonstrates this technique. See Sample USB Device Driver for information on the usbskel driver.
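One common way to build such a facility is an ownership flag guarded by a mutex and a condition variable: the mutex is held only while the flag is tested or changed, so the owner holds no mutex while it does the protected work. The following C sketch illustrates that general technique with POSIX threads; it is not the usbskel code. In a kernel driver, the kernel mutex and condition variable routines (such as mutex_enter(9F) and cv_wait(9F)) would take the place of pthreads. All mydrv_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical exclusive-access object. The mutex is held only while the
 * owner flag is examined or changed, never across the protected work, so
 * the owner can make calls that must not be made with a mutex held.
 */
struct mydrv_serial {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool owned;
};

static void
mydrv_serial_init(struct mydrv_serial *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cv, NULL);
	s->owned = false;
}

/* Block until no other thread owns the object, then take ownership. */
static void
mydrv_serial_enter(struct mydrv_serial *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->owned)
		pthread_cond_wait(&s->cv, &s->lock);
	s->owned = true;
	pthread_mutex_unlock(&s->lock);
}

/* Give up ownership and wake one waiter. */
static void
mydrv_serial_exit(struct mydrv_serial *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->owned);
	s->owned = false;
	pthread_cond_signal(&s->cv);
	pthread_mutex_unlock(&s->lock);
}

#define	MYDRV_ITERS	10000

static struct mydrv_serial mydrv_ser;
static long mydrv_counter;	/* protected by ownership of mydrv_ser */

/* Stands in for code paths (I/O, disconnect, CPR) that must not overlap. */
static void *
mydrv_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < MYDRV_ITERS; i++) {
		mydrv_serial_enter(&mydrv_ser);
		mydrv_counter++;	/* real code could call USBA functions here */
		mydrv_serial_exit(&mydrv_ser);
	}
	return (NULL);
}
```

Because a waiting thread sleeps on the condition variable rather than spinning, a long-running owner, such as a disconnect handler dismantling pipes, simply makes normal operational code wait its turn.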

Utility Functions

This section describes several functions that are of general use.

Device Configuration Facilities

This section describes functions related to device configuration.

Getting Interface Numbers

If you are using a multiple-interface device where the usb_mid(7D) driver is making only one of its interfaces available to the calling driver, you might need to know the number of the interface to which the calling driver is bound. Use the usb_get_if_number(9F) function to do any of the following tasks:

Managing Entire Devices

If a driver manages an entire composite device, that driver can bind to the entire device by using a compatible name that contains vendor ID, product ID, and revision ID. A driver that is bound to an entire composite device must manage all the interfaces of that device as a nexus driver would. In general, you should not bind your driver to an entire composite device. Instead, you should use the generic multiple-interface driver usb_mid(7D).

Use the usb_owns_device(9F) function to determine whether a driver owns an entire device. The device might be a composite device. The usb_owns_device(9F) function returns TRUE if the driver owns the entire device.

Multiple-Configuration Devices

USB devices make only a single configuration available to the host at any particular time. Most devices support only a single configuration. However, a few USB devices support multiple configurations.

Any device that has multiple configurations is placed into the first configuration for which a driver is available. When seeking a match, device configurations are considered in numeric order. If no matching driver is found, the device is set to the first configuration. In this case, the usb_mid driver takes over the device and splits the device into interface nodes. Use the usb_get_cfg(9F) function to return the current configuration of a device.

You can use either of the following two methods to request a different configuration. Using either of these two methods to modify the device configuration ensures that the USBA module remains in sync with the device.


Caution –

Do not change the device configuration by issuing a SET_CONFIGURATION USB request manually. Changing the configuration with a manual SET_CONFIGURATION request is not supported.


Modifying or Getting the Alternate Setting

A client driver can call the usb_set_alt_if(9F) function to change the selected alternate setting of the currently selected interface. Before you call usb_set_alt_if(9F), close all pipes that you opened explicitly: when switching alternate settings, usb_set_alt_if(9F) verifies that only the default pipe is open. Also make sure the device is settled before you call usb_set_alt_if(9F).

Changing the alternate setting can affect which endpoints and which class-specific and vendor-specific descriptors are available to the driver. See The Descriptor Tree for more information about endpoints and descriptors.

Call the usb_get_alt_if(9F) function to retrieve the number of the current alternate setting.


Note –

When you request a new alternate setting, a new configuration, or a new interface, all pipes except the default pipe to the device must be closed. This is because changing an alternate setting, a configuration, or an interface changes the mode of operation of the device. Also, changing an alternate setting, a configuration, or an interface changes the device's presentation to the system.
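The precondition described above can be modeled as a check over a table of pipes in which slot 0 is the default pipe. The following C sketch is hypothetical throughout (the mydrv_pipes table, its size, and the mydrv_* names); it only mirrors the rule that a change is refused while any explicitly opened pipe remains open, with -1 standing in for a failure return.

```c
#include <assert.h>
#include <stdbool.h>

#define	MYDRV_MAX_PIPES	8	/* hypothetical limit for this model */

/* Hypothetical pipe table; slot 0 is the default pipe. */
struct mydrv_pipes {
	bool open[MYDRV_MAX_PIPES];
	int cur_alt;
};

static bool
mydrv_only_default_open(const struct mydrv_pipes *p)
{
	for (int i = 1; i < MYDRV_MAX_PIPES; i++)
		if (p->open[i])
			return (false);
	return (true);
}

/* Close every pipe the driver opened explicitly; keep the default pipe. */
static void
mydrv_close_explicit_pipes(struct mydrv_pipes *p)
{
	for (int i = 1; i < MYDRV_MAX_PIPES; i++)
		p->open[i] = false;
}

/* Models the open-pipe check before an alternate setting change. */
static int
mydrv_set_alt(struct mydrv_pipes *p, int alt)
{
	assert(alt >= 0);
	if (!mydrv_only_default_open(p))
		return (-1);
	p->cur_alt = alt;
	return (0);
}
```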


Other Utility Functions

This section describes other functions that are useful in USB device drivers.

Retrieving a String Descriptor

Call the usb_get_string_descr(9F) function to retrieve a string descriptor given its index. Some configuration, interface, or device descriptors have string IDs associated with them. Such descriptors contain string index fields with nonzero values. Pass a string index field value to the usb_get_string_descr(9F) function to retrieve the corresponding string.
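Only nonzero string index fields refer to strings, so a driver typically filters the index fields before asking for strings. The following C sketch shows that filtering for the three string index fields of a device descriptor (iManufacturer, iProduct, iSerialNumber). The mydrv_str_idx structure and mydrv_string_indexes function are hypothetical; each index the function returns is the kind of value you would then pass to usb_get_string_descr(9F).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the string index fields of a device descriptor. */
struct mydrv_str_idx {
	uint8_t iManufacturer;
	uint8_t iProduct;
	uint8_t iSerialNumber;
};

/*
 * Copy the nonzero string indexes into out[] (room for 3 entries) and
 * return how many there are. An index of 0 means "no string".
 */
static size_t
mydrv_string_indexes(const struct mydrv_str_idx *d, uint8_t out[3])
{
	size_t n = 0;

	assert(d != NULL && out != NULL);
	const uint8_t fields[3] = {
		d->iManufacturer, d->iProduct, d->iSerialNumber
	};
	for (size_t i = 0; i < 3; i++)
		if (fields[i] != 0)
			out[n++] = fields[i];
	return (n);
}
```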

Pipe Private Data Facility

Each pipe has one pointer of space set aside for the client driver's private use. Use the usb_pipe_set_private(9F) function to install a value. Use the usb_pipe_get_private(9F) function to retrieve the value. This facility is useful in callbacks, where a pipe might need to carry its own client-defined state into the callback for specific processing.
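The value of a per-pipe private pointer shows up when one callback serves several pipes. This C sketch models the idea with a hypothetical mydrv_pipe structure that holds a single pointer; the mydrv_pipe_set_private and mydrv_pipe_get_private helpers stand in for usb_pipe_set_private(9F) and usb_pipe_get_private(9F) and do not reproduce their real signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pipe handle with one private pointer. */
struct mydrv_pipe {
	void *priv;
};

static void
mydrv_pipe_set_private(struct mydrv_pipe *p, void *data)
{
	p->priv = data;
}

static void *
mydrv_pipe_get_private(const struct mydrv_pipe *p)
{
	return (p->priv);
}

/* Hypothetical client-defined state carried by each pipe. */
struct mydrv_pipe_state {
	const char *name;
	int completions;
};

/* One callback serves several pipes; the private pointer tells them apart. */
static void
mydrv_xfer_cb(struct mydrv_pipe *p)
{
	struct mydrv_pipe_state *st = mydrv_pipe_get_private(p);

	assert(st != NULL);
	st->completions++;
}
```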

Clearing a USB Condition

Use the usb_clr_feature(9F) function to do the following tasks:

Getting Device, Interface, or Endpoint Status

Use the usb_get_status(9F) function to issue a USB GET_STATUS request to retrieve the status of a device, interface, or endpoint.

Getting the Bus Address of a Device

Use the usb_get_addr(9F) function to get the USB bus address of a device for debugging purposes. This address maps to a particular USB port.

Sample USB Device Driver

This section describes a template USB device driver that uses the USBA 2.0 framework for the Solaris environment. This driver demonstrates many of the features discussed in this chapter. This template or skeleton driver is named usbskel.

The usbskel driver is a template that you can use to start your own USB device driver. The usbskel driver demonstrates the following features:

This usbskel driver is available on Sun's web site at http://www.sun.com/bigadmin/software/usbskel/.

For source for additional USB drivers, see the OpenSolaris web site. Go to http://hub.opensolaris.org/bin/view/Main/, and click “Source Browser” in the menu on the left side of the page.