Writing Device Drivers

Appendix G Advanced Topics

This appendix collects several advanced topics. Not every driver needs to address the issues discussed here.

Multithreading

This section supplements the guidelines presented in Chapter 4, Multithreading, for writing an MT-safe driver, a driver that safely supports multiple threads.

Lock Granularity

Here are some issues to consider when deciding how many locks to use in a driver:

A little thought in reorganizing the ordering and types of locks around such data can lead to considerable savings.

Avoiding Unnecessary Locks

To avoid unnecessary locks, note the following:


Note -

Kernel-thread stacks are small (currently 8 Kbytes), so do not allocate large automatic variables, and avoid deep recursion.


Locking Order

When acquiring multiple mutexes, be sure to acquire them in the same order on every code path. For example, suppose mutexes A and B protect two resources and are acquired on two code paths as follows:

Code Path 1                     Code Path 2
mutex_enter(&A);                mutex_enter(&B);
    ...                             ...
mutex_enter(&B);                mutex_enter(&A);
    ...                             ...
mutex_exit(&B);                 mutex_exit(&A);
    ...                             ...
mutex_exit(&A);                 mutex_exit(&B);

If thread 1 is executing code path 1 and thread 2 is executing code path 2, the following could occur:

  1. Thread 1 acquires mutex A.

  2. Thread 2 acquires mutex B.

  3. Thread 1 needs mutex B, so it blocks while holding mutex A.

  4. Thread 2 needs mutex A, so it blocks while holding mutex B.

These threads are now deadlocked. Such deadlocks are hard to track down, particularly because real code paths are rarely this straightforward, and because the deadlock does not always occur: it depends on the relative timing of threads 1 and 2.
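The cure is to choose one order and follow it on every path. The following minimal sketch is a user-space analog that uses POSIX threads (pthread_mutex_lock() and pthread_mutex_unlock()) in place of mutex_enter(9F) and mutex_exit(9F); the function and variable names are hypothetical. Both code paths take A before B, so neither thread can ever hold B while waiting for A.

```c
#include <assert.h>
#include <pthread.h>

/*
 * User-space analog (POSIX threads, not the DDI) of the two code paths
 * above, rewritten so that both acquire A before B.
 */
static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;
static long resource_a;    /* protected by A */
static long resource_b;    /* protected by B */

#define ITERATIONS 100000

static void *code_path_1(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&A);       /* always A first ... */
        pthread_mutex_lock(&B);       /* ... then B */
        resource_a++;
        resource_b++;
        pthread_mutex_unlock(&B);     /* release in reverse order */
        pthread_mutex_unlock(&A);
    }
    return NULL;
}

static void *code_path_2(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&A);       /* same order as code_path_1 */
        pthread_mutex_lock(&B);
        resource_b += 2;
        resource_a += 2;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return NULL;
}

/* Runs both paths concurrently; returns only if no deadlock occurred. */
static void run_both_paths(void)
{
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, code_path_1, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, code_path_2, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

With the original, inconsistent ordering, the same program could hang; with a single agreed order it always completes.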

Scope of a Lock

Experience has shown that it is easier to deal with locks that are either held throughout the execution of a routine, or both acquired and released in one routine. Avoid acquiring a lock in one routine and releasing it in another, as in this example:

static void
xxfoo(...)
{
	mutex_enter(&softc->lock);
	...
	xxbar();
}

static void
xxbar(...)
{
	...
	mutex_exit(&softc->lock);
}

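This code works, but it is likely to cause maintenance problems: someone reading xxbar() on its own cannot tell that its caller must hold softc->lock, or that xxbar() releases it.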
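A version in which the lock is acquired and released in the same routine might look like the following minimal sketch. It is a user-space analog using POSIX threads, and xx_softc, xxbar_locked(), and the state field are hypothetical names. In a real driver, the helper could also state its requirement with ASSERT(mutex_owned(&softc->lock)).

```c
#include <assert.h>
#include <pthread.h>

/*
 * User-space analog (POSIX threads, not the DDI). The lock is acquired
 * and released in xxfoo(); the helper never touches the lock itself.
 */
struct xx_softc {
    pthread_mutex_t lock;
    int             state;    /* protected by lock */
};

/* Caller must hold sc->lock; this routine neither enters nor exits it. */
static void xxbar_locked(struct xx_softc *sc, int newstate)
{
    sc->state = newstate;
}

static void xxfoo(struct xx_softc *sc, int newstate)
{
    pthread_mutex_lock(&sc->lock);
    xxbar_locked(sc, newstate);
    pthread_mutex_unlock(&sc->lock);  /* released where it was acquired */
}
```

The "_locked" suffix is just one naming convention for marking routines that expect the caller to hold the lock.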

If contention is likely in a particular code path, try to hold locks for a short time. In particular, arrange to drop locks before calling kernel routines that might block. For example:

mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
...
mutex_exit(&softc->lock);

Because kmem_alloc(9F) with KM_SLEEP can block until memory becomes available, this is better coded as:

thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
mutex_enter(&softc->lock);
...
softc->foo = bar;
softc->thingp = thingp;
...
mutex_exit(&softc->lock);
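The same pattern as a runnable user-space sketch, assuming POSIX threads, with malloc() standing in for kmem_alloc(9F). The helper xx_install_thing() and the structure layout are hypothetical. The potentially blocking allocation happens before the lock is taken, and the old object is freed after the lock is dropped, so the critical section contains only assignments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * User-space analog: malloc() stands in for kmem_alloc(9F) with
 * KM_SLEEP, which may block. No lock is held while allocating.
 */
typedef struct thing { int payload[64]; } thing_t;

struct xx_softc {
    pthread_mutex_t lock;
    int             foo;      /* protected by lock */
    thing_t        *thingp;   /* protected by lock */
};

/* Returns 0 on success, -1 if the allocation failed. */
static int xx_install_thing(struct xx_softc *sc, int bar)
{
    thing_t *thingp = malloc(sizeof (thing_t));   /* may block: no lock held */
    if (thingp == NULL)
        return -1;            /* unlike KM_SLEEP, malloc() can fail */

    pthread_mutex_lock(&sc->lock);
    thing_t *old = sc->thingp;
    sc->foo = bar;
    sc->thingp = thingp;
    pthread_mutex_unlock(&sc->lock);

    free(old);                /* in this sketch, nothing else refers to old */
    return 0;
}
```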

Potential Panics

The following panics are caused by mutex misuse:

panic: recursive mutex_enter. mutex %x caller %x

Mutexes are not reentrant: a thread that already owns a mutex cannot acquire it again. Attempting to do so causes this panic.

panic: mutex_adaptive_exit: mutex not held by thread

Releasing a mutex that the current thread does not hold causes this panic.
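These two mistakes have a user-space counterpart that can help when exercising locking logic outside the kernel. A POSIX mutex of type PTHREAD_MUTEX_ERRORCHECK reports them as error returns instead of panicking: EDEADLK when a thread relocks a mutex it already holds, and EPERM when a thread unlocks a mutex it does not hold. This is only an analogy; kernel mutexes are not configured this way, and the helper names below are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes *mp as an error-checking mutex; returns 0 or an errno value. */
static int make_errorcheck_mutex(pthread_mutex_t *mp)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(mp, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Locks mp twice from the same thread; returns the second lock's result. */
static int relock_result(pthread_mutex_t *mp)
{
    int rc = pthread_mutex_lock(mp);

    assert(rc == 0);
    rc = pthread_mutex_lock(mp);      /* kernel: "recursive mutex_enter" */
    pthread_mutex_unlock(mp);         /* undo the first, successful lock */
    return rc;
}
```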

panic: lock_set: lock held and only one CPU

This panic occurs only on a uniprocessor. It indicates that a thread is trying to acquire a spin mutex that is already held, and would spin forever because no other CPU is available to release it. This can happen if the driver forgot to release the mutex on some code path, or blocked while holding it.

A common cause of this panic is a high-level device interrupt (see ddi_intr_hilevel(9F) and Intro(9F)) whose handler calls a routine that blocks while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but less so if the handler blocks while acquiring an adaptive mutex with mutex_enter(9F).


Note -

In principle, this is only a problem for drivers that operate above lock level.


Sun Disk Device Drivers

Sun disk drivers are an important class of block device drivers. A Sun disk device is one that is supported by disk utility commands such as format(1M) and newfs(1M).

Disk I/O Controls

Sun disk drivers must support a minimum set of disk-specific I/O controls, which are specified in the dkio(7) manual page. Disk I/O controls transfer disk information to or from the device driver. When data is copied out of the driver to the user, use ddi_copyout(9F) to copy the information into the user's address space. When data is copied from the user to the driver, use ddi_copyin(9F) to copy it into the kernel's address space. Table G-1 lists the mandatory Sun disk I/O controls.

Table G-1 Mandatory Sun Disk I/O Controls

I/O Control     Description
DKIOCINFO       Returns information describing the disk controller.
DKIOCGAPART     Returns a disk's partition map.
DKIOCSAPART     Sets a disk's partition map.
DKIOCGGEOM      Returns a disk's geometry.
DKIOCSGEOM      Sets a disk's geometry.
DKIOCGVTOC      Returns a disk's Volume Table of Contents.
DKIOCSVTOC      Sets a disk's Volume Table of Contents.

Disk Performance

The Solaris 7 DDI/DKI provides facilities to optimize I/O transfers for improved file system performance. It provides a mechanism for managing the list of pending I/O requests so that disk accesses are ordered efficiently for a file system. See "Asynchronous Data Transfers" for a description of enqueuing an I/O request.

The diskhd structure is used to manage a linked list of I/O requests.

struct diskhd {
	long		b_flags;		/* not used, needed for consistency */
	struct buf	*b_forw, *b_back;	/* queue of unit queues */
	struct buf	*av_forw, *av_back;	/* queue of bufs for this unit */
	long		b_bcount;		/* active flag */
};

The diskhd data structure has two buf pointers that the driver can manipulate. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list.

A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the buf structure currently being processed. disksort(9F) sorts the buf requests so as to minimize disk seeks, and inserts the buf pointer into the diskhd list. It uses the value in the b_resid field of the buf structure as the sort key; the driver is responsible for setting this value. Most Sun disk drivers use the cylinder group as the sort key, which tends to optimize file system read-ahead accesses.
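To make the list manipulation concrete, here is a user-space model of inserting a request into the diskhd list by its b_resid key. It is deliberately simplified: it keeps the list in plain ascending key order and is not the actual disksort(9F) algorithm. The stripped-down struct buf and struct diskhd declare only the fields the sketch needs, and sorted_insert() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: only the fields used by this sketch. */
struct buf {
    long        b_resid;      /* sort key set by the driver, e.g. cylinder */
    struct buf *av_forw;      /* next request in the queue */
};

struct diskhd {
    struct buf *av_forw;      /* first queued request */
    struct buf *av_back;      /* last queued request */
};

/* Inserts bp after every request whose key is <= bp's key. */
static void sorted_insert(struct diskhd *dp, struct buf *bp)
{
    struct buf *prev = NULL;
    struct buf *cur = dp->av_forw;

    while (cur != NULL && cur->b_resid <= bp->b_resid) {
        prev = cur;
        cur = cur->av_forw;
    }
    bp->av_forw = cur;
    if (prev == NULL)
        dp->av_forw = bp;     /* bp becomes the first request */
    else
        prev->av_forw = bp;
    if (cur == NULL)
        dp->av_back = bp;     /* bp becomes the last request */
}
```

Requests with equal keys stay in arrival order, because the scan stops only at the first strictly larger key.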

Once a request has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart() routine pulls the first buf structure off the diskhd list and starts a transfer.

If the device is busy, the driver simply returns from its xxstrategy() entry point. When the hardware finishes the data transfer, it generates an interrupt, and the driver's interrupt routine is called to service the device. After servicing the interrupt, the driver can call xxstart() to process the next buf structure in the diskhd list.
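The strategy, start, and interrupt cycle described above can be simulated in user space. In this sketch the names xxstrategy(), xxstart(), and xxintr() follow the text, while the FIFO queue, the active pointer used as the busy flag, and the completed log are assumptions. A real driver would insert requests with disksort(9F), protect the queue with a mutex, program the hardware in xxstart(), and call biodone(9F) on completion.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct buf: only the fields this sketch uses. */
struct buf {
    long        b_blkno;
    struct buf *av_forw;
};

static struct buf *q_head, *q_tail;   /* pending requests (FIFO here) */
static struct buf *active;            /* request the "device" is working on */
static long completed[16];            /* block numbers, in completion order */
static int  ncompleted;

static void xxstart(void)
{
    if (active != NULL || q_head == NULL)
        return;                       /* device busy, or nothing queued */
    active = q_head;                  /* pull the first request off the list */
    q_head = active->av_forw;
    if (q_head == NULL)
        q_tail = NULL;
    /* ... a real driver would start the hardware transfer here ... */
}

static void xxstrategy(struct buf *bp)
{
    bp->av_forw = NULL;
    if (q_tail != NULL)
        q_tail->av_forw = bp;
    else
        q_head = bp;
    q_tail = bp;
    xxstart();                        /* returns at once if the device is busy */
}

/* Called when the "hardware" finishes the active transfer. */
static void xxintr(void)
{
    assert(active != NULL);
    completed[ncompleted++] = active->b_blkno;  /* real driver: biodone(9F) */
    active = NULL;
    xxstart();                        /* start the next queued request */
}
```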

SCSA

Global Data Definitions

The following information is useful for debugging when a driver experiences bus-wide problems. The SCSA implementation defines one global variable, scsi_options, a configuration longword used for debugging and control. The bits defined in scsi_options are listed in the file <sys/scsi/conf/autoconf.h>. Table G-2 shows their meanings when set.

Table G-2 SCSA Options

Option                 Description
SCSI_OPTIONS_DR        Enables global disconnect/reconnect.
SCSI_OPTIONS_SYNC      Enables global synchronous transfer capability.
SCSI_OPTIONS_LINK      Enables global link support.
SCSI_OPTIONS_PARITY    Enables global parity support.
SCSI_OPTIONS_TAG       Enables global tagged queuing support.
SCSI_OPTIONS_FAST      Enables global FAST SCSI support: 10 MB/sec transfers, as opposed to 5 MB/sec.
SCSI_OPTIONS_FAST20    Enables global FAST20 SCSI support: 20 MB/sec transfers.
SCSI_OPTIONS_FAST40    Enables global FAST40 SCSI support: 40 MB/sec transfers.
SCSI_OPTIONS_FAST80    Enables global FAST80 SCSI support: 80 MB/sec transfers.
SCSI_OPTIONS_WIDE      Enables global WIDE SCSI support.


Note -

The setting of scsi_options affects all host adapter and target drivers present on the system, unlike scsi_ifsetcap(9F), which changes a capability for a single host adapter or target. Refer to scsi_hba_attach(9F) in the Solaris 2.6 Reference Manual for information on controlling these options for a particular host adapter.
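Testing a bit in scsi_options is an ordinary bitwise check. The sketch below uses placeholder bit values and a local stand-in for scsi_options so that it compiles on its own; a real driver would use the kernel's scsi_options and the constants from <sys/scsi/conf/autoconf.h>. The helper scsi_options_enabled() is hypothetical.

```c
#include <assert.h>

/*
 * Placeholder bit values for illustration only; real drivers must use
 * the definitions in <sys/scsi/conf/autoconf.h>.
 */
#define SCSI_OPTIONS_DR     (1u << 0)
#define SCSI_OPTIONS_SYNC   (1u << 1)
#define SCSI_OPTIONS_TAG    (1u << 2)
#define SCSI_OPTIONS_WIDE   (1u << 3)

/* Local stand-in for the kernel's global scsi_options longword. */
static unsigned int scsi_options = SCSI_OPTIONS_DR | SCSI_OPTIONS_TAG;

/* Returns nonzero only if every bit in "wanted" is set in scsi_options. */
static int scsi_options_enabled(unsigned int wanted)
{
    return (scsi_options & wanted) == wanted;
}
```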


The default setting for scsi_options has these values set:

Tagged Queuing

For a definition of tagged queuing, refer to the SCSI-2 specification. To support tagged queuing, first check the scsi_options flag SCSI_OPTIONS_TAG to see whether tagged queuing is enabled globally. Next, check whether the target is a SCSI-2 device whose inquiry data indicates support for tagged queuing. If all of these conditions are true, attempt to enable tagged queuing by using scsi_ifsetcap(9F). Example G-1 shows how a driver might do this.


Example G-1 Supporting SCSI Tagged Queuing

#define ROUTE	(&devp->sd_address)
	...
	/*
	 * If SCSI-2 tagged queueing is supported by the disk drive and
	 * by the host adapter, enable it. Otherwise use untagged
	 * queueing if the host adapter supports it, or send one
	 * command at a time.
	 */
	xsp->tagflags = 0;
	if ((scsi_options & SCSI_OPTIONS_TAG) &&
	    (devp->sd_inq->inq_rdf == RDF_SCSI2) &&
	    (devp->sd_inq->inq_cmdque)) {
		if (scsi_ifsetcap(ROUTE, "tagged-qing", 1, 1) == 1) {
			/* Tagged queueing enabled: allow many commands. */
			xsp->tagflags = FLAG_STAG;
			xsp->throttle = 256;
		} else if (scsi_ifgetcap(ROUTE, "untagged-qing", 0) == 1) {
			/* Host adapter queues untagged commands for us. */
			xsp->dp->options |= XX_QUEUEING;
			xsp->throttle = 3;
		} else {
			/* No queueing: one outstanding command at a time. */
			xsp->dp->options &= ~XX_QUEUEING;
			xsp->throttle = 1;
		}
	}

Untagged Queueing

If tagged queuing cannot be enabled, you can try untagged queuing instead. In this mode, you submit to the host adapter driver as many commands as you consider necessary or optimal, and the host adapter passes them to the target one at a time. With tagged queuing, by contrast, the host adapter submits as many commands as it can until the target indicates that its queue is full.
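One way a driver can bound the number of commands it hands to the host adapter is a throttle, such as the xsp->throttle value set in Example G-1. The following hypothetical sketch counts commands outstanding at the host adapter and submits waiting commands only while that count is below the throttle. The xx_queue structure and its functions are illustrative names, not DDI interfaces; a real driver would hand each command to the host adapter with scsi_transport(9F).

```c
#include <assert.h>

/* Hypothetical per-target bookkeeping for throttled submission. */
struct xx_queue {
    int throttle;     /* maximum commands outstanding at the host adapter */
    int ncmds;        /* commands currently outstanding */
    int nwaiting;     /* commands held back in the driver */
};

/* Submits as many waiting commands as the throttle allows; returns count. */
static int xx_submit_waiting(struct xx_queue *q)
{
    int sent = 0;

    while (q->nwaiting > 0 && q->ncmds < q->throttle) {
        /* ... a real driver would call scsi_transport(9F) here ... */
        q->nwaiting--;
        q->ncmds++;
        sent++;
    }
    return sent;
}

/* A command completed: free its slot and try to send more. */
static int xx_command_done(struct xx_queue *q)
{
    assert(q->ncmds > 0);
    q->ncmds--;
    return xx_submit_waiting(q);
}
```

With a throttle of 1, this degenerates to strictly one command at a time, matching the fallback branch of Example G-1.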