Writing Device Drivers

Summary of Changes

Autoconfiguration Changes

Starting with the SunOS 4.1.2 system, the framework initialized all the drivers in the system before starting init(8). The advent of loadable module technology enabled some device drivers to be added and removed manually at later times in the life of the system.

The SunOS 5.7 system extends this idea to make every driver loadable, and to allow the system to automatically configure itself continually in response to the needs of applications. This, plus the unification of the "mb" style and Open Boot style autoconfiguration, has meant some significant changes to the probe(9E) and attach(9E) routines, and has added detach(9E).

Because all device drivers are loadable, the kernel no longer needs to be recompiled and relinked to add a driver. The config(8) program has been replaced by Open Boot PROM information and supplemented by information in hardware configuration files (see driver.conf(4)).

Changes to Routines

The xxinit() routine for loadable modules in the SunOS 4.1 system has been split into three routines. The VDLOAD case has become _init(9E), the VDUNLOAD case has become _fini(9E), and the VDSTAT case has become _info(9E).

The SunOS 5.7 probe(9E) routine is not the same as probe(9E) in the SunOS 4.1 system. It is called before attach(9E), and may be called any number of times, so it must be stateless. If it allocates resources before it probes the device, it must deallocate them before returning (regardless of success or failure). attach(9E) will not be called unless probe(9E) succeeds.
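
The cleanup rule for probe(9E) can be illustrated with a small user-space model. This is only a sketch: xx_probe_model(), xx_alloc(), xx_free(), and the XX_PROBE_* values are hypothetical stand-ins (a real driver would use the kernel allocator and return the DDI probe codes), and the device_responds parameter replaces an actual hardware access.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the probe return codes. */
enum { XX_PROBE_SUCCESS = 0, XX_PROBE_FAILURE = 1 };

/* Counts allocations that have not been freed yet. */
static int xx_live_allocs;

static void *
xx_alloc(size_t n)
{
	void *p = malloc(n);

	if (p != NULL)
		xx_live_allocs++;
	return (p);
}

static void
xx_free(void *p)
{
	if (p != NULL) {
		free(p);
		xx_live_allocs--;
	}
}

/*
 * Model of a stateless probe: scratch memory is released on every
 * path, and nothing is remembered between calls.
 */
static int
xx_probe_model(int device_responds)
{
	void *scratch = xx_alloc(64);
	int rv;

	if (scratch == NULL)
		return (XX_PROBE_FAILURE);
	rv = device_responds ? XX_PROBE_SUCCESS : XX_PROBE_FAILURE;
	xx_free(scratch);	/* released on success and on failure */
	return (rv);
}
```

Because the model keeps no state, calling it any number of times, in any order, leaves the allocation count at zero.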

attach(9E) is called to allocate any resources the driver needs to operate the device. The system now assigns the instance number (previously known as the unit number) to the device.

The reason the rules are so stringent is that the implementation will change. If driver routines follow these rules, they will not be affected by changes to the implementation. If, however, they assume that the autoconfiguration routines are called only in a certain order (first probe(9E), then attach(9E), for example), these drivers will break in some future release.

Instance Numbers

In the SunOS 4.1 system, drivers counted the number of devices that they found, and assigned a unit number to each (in the range 0 to the number of units found less one). Now, these unit numbers are called instance numbers, and the system assigns the numbers to devices.

Instances can be thought of as a shorthand name for a particular instance of a device (foo0 could name instance 0 of device foo). The system assigns the instance numbers and remembers them across any number of reboots. This is because at open(2) time all the system has is a dev_t. To determine which device is needed (as it may need to be attached), the system needs to get the instance number (which the driver retrieves from the minor number).

The mapping between instance numbers and minor numbers (see getinfo(9E)) should be static. The driver should not require any state information to do the translation, since that information might not be available (the device might not be attached).
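
One way to keep the translation static is to encode the instance in fixed bits of the minor number, so that it can be recovered by arithmetic alone. The sketch below assumes a hypothetical layout (low three bits select a sub-device, the remaining bits hold the instance); the names and the bit split are illustrative and not taken from any particular driver.

```c
/* Hypothetical layout: low 3 bits = sub-device, upper bits = instance. */
#define	XX_UNIT_BITS	3
#define	XX_UNIT_MASK	((1u << XX_UNIT_BITS) - 1u)

/* Build a minor number from an instance and a sub-device index. */
static unsigned int
xx_make_minor(unsigned int instance, unsigned int unit)
{
	return ((instance << XX_UNIT_BITS) | (unit & XX_UNIT_MASK));
}

/* Recover the instance; needs no driver state, so it works unattached. */
static unsigned int
xx_minor_to_instance(unsigned int minor)
{
	return (minor >> XX_UNIT_BITS);
}

/* Recover the sub-device index. */
static unsigned int
xx_minor_to_unit(unsigned int minor)
{
	return (minor & XX_UNIT_MASK);
}
```

Since neither conversion consults a state structure, a getinfo(9E) routine built on this scheme can answer even when the device is not attached.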

Changes to `/devices`

All devices in the system are represented by a data structure in the kernel called the device tree. The /devices hierarchy is a representation of this tree in the file system.

In the SunOS 4.1 system, the administrator created special device files using mknod (or an installation script running mknod). Now, device drivers notify the kernel of entries by calling ddi_create_minor_node(9F) once they have determined a particular device exists. drvconfig(1M) actually maintains the file system nodes. This results in names that completely identify the device.

Changes to `/dev`

In the SunOS 4.1 system, device special files were located (by convention) in /dev. Now that the /devices directory is used for special files, /dev is used for logical device names. Usually, these are symbolic links to the real names in /devices.

Logical names can provide backward compatibility with SunOS 4.1 applications, a short name for the real /devices name, or a way to identify a device without having to know where it is in the /devices tree. For example, /dev/fb could refer to a cgsix, cgthree, or bwtwo framebuffer, but the application does not need to know this.

See disks(1M), tapes(1M), ports(1M), devlinks(1M), and /etc/devlink.tab for system-supported ways of creating these links. See also Chapter 5, Autoconfiguration, for more information.

Multithreading Changes

The SunOS 5.7 system supports multiple threads in the kernel, and multiple CPUs. A thread is a sequence of instructions being executed by a program. In the SunOS 5.7 system, there are application threads, and there are kernel threads. Kernel threads are used to execute kernel code, and are the threads of concern to the driver writer.

Interrupts are also handled as threads. Because of this, there is less of a distinction between the top half and bottom half of a driver than there was in the SunOS 4.1 system. All driver code is executed by a thread, which may be running in parallel with threads in other (or the same) parts of the driver. The distinction now is whether these threads have user context.

See Chapter 4, Multithreading, for more information.

Locking Changes

Starting with the SunOS 4.1.2 system, only one processor could be in the kernel at any one time. This was accomplished by using a master lock around the entire kernel. When a processor needed to execute kernel code, it acquired the lock (excluding other processors from running the code protected by the lock) and then released the lock when it was through. Because of this master lock, drivers written for uniprocessor systems did not need to change for multiprocessor systems: two processors could not execute driver code at the same time.

In the SunOS 5.7 system, instead of one master lock, there are many smaller locks that protect smaller regions of code. For example, there may be a kernel lock that protects access to a particular vnode, and one that protects an inode. Only one processor can be running code dealing with that vnode at a time, but another could be accessing an inode. This allows a greater degree of concurrency.

However, because the kernel is multithreaded, it is possible that two (or more) threads are in driver code at the same time.

One thread could be in an entry point, and another in the interrupt routine. The driver had to handle this in the SunOS 4.1 system, but with the restriction that the interrupt routine blocked the user context routine while it ran.
Two threads could be in a routine at the same time. This could not happen in the SunOS 4.1 system.

Both of these cases are similar to situations present in the SunOS 4.1 system, but now these threads could run at the same time on different CPUs. The driver must be prepared to handle these types of occurrences.

Mutual Exclusion Locks

In the SunOS 4.1 system, a driver had to be careful when accessing data shared between the top half and the interrupt routine. Because the interrupt could occur asynchronously, the interrupt routine could corrupt data or simply hang. To prevent this, portions of the top half of the driver would use the various spl routines to raise the interrupt priority level of the CPU, blocking the interrupt from being handled:

	s = splr(pritospl(6));
	/* access shared data */
	(void)splx(s);

In the SunOS 5.7 system, this no longer works. Changing the interrupt priority level of one CPU does not necessarily prevent another CPU from handling the interrupt. Also, two top-half routines may be running simultaneously with the interrupt running on a third CPU.

To solve this problem, the SunOS 5.7 system provides:

A uniform model of execution: even interrupts run as threads. This blurs the distinction between the top half and the bottom half, as effectively every routine is a bottom-half routine.
A number of locking mechanisms. A common mechanism is to use mutual exclusion locks (mutexes):

	mutex_enter(&mu);
	/* access shared data */
	mutex_exit(&mu);

A subtle difference from the SunOS 4.1 system is that, because everything is run by kernel threads, the interrupt routine needs to explicitly acquire and release the mutex. In the SunOS 4.1 system, this was implicit since the interrupt handler automatically ran at an elevated priority.

See "Multithreading Additions to the State Structure" for more information on locking.

Condition Variables

In the SunOS 4.1 system, when the driver needed the current process to wait for something (such as a data transfer to complete), it called sleep(), specifying a channel and a dispatch priority. The interrupt routine then called wakeup() on that channel to notify all processes waiting on that channel that something happened. Because the interrupt could occur at any time, the interrupt priority was usually raised to ensure that the wakeup could not occur until the process was asleep.

Example A-1 SunOS 4.1 Synchronization Method

int	busy;	/* global device busy flag */

int
xxread(dev, uio)
dev_t		dev;
struct uio	*uio;
{
	int	s;

	s = splr(pritospl(6));
	while (busy)
		sleep(&busy, PRIBIO + 1);
	busy = 1;
	(void)splx(s);
	/* do the read */
}

int
xxintr()
{
	busy = 0;
	wakeup(&busy);
}

The SunOS 5.7 system provides similar functionality with condition variables. Threads are blocked on condition variables until they are notified that the condition has occurred. The driver must acquire a mutex that protects the condition variable before blocking the thread. The mutex is then released before the thread is blocked (similar to blocking/unblocking interrupts in the SunOS 4.1 system).

Example A-2 Synchronization in SunOS 5.7 Similar to SunOS 4.1

int		busy;		/* global device busy flag */
kmutex_t	busy_mu;	/* mutex protecting busy flag */
kcondvar_t	busy_cv;	/* condition variable for busy flag */

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
	mutex_enter(&busy_mu);
	while (busy)
		cv_wait(&busy_cv, &busy_mu);	/* busy_mu is released while blocked */
	busy = 1;
	mutex_exit(&busy_mu);
	/* do the read */
	return (0);
}

static u_int
xxintr(caddr_t arg)
{
	mutex_enter(&busy_mu);
	busy = 0;
	cv_broadcast(&busy_cv);
	mutex_exit(&busy_mu);
	return (DDI_INTR_CLAIMED);
}

Like wakeup(), cv_broadcast(9F) unblocks all threads waiting on the condition variable. To wake up one thread, use cv_signal(9F) (there was no documented equivalent for cv_signal(9F) in the SunOS 4.1 system).

Note -

There is no equivalent to the dispatch priority passed to sleep().

Though the sleep() and wakeup() calls exist, do not use them, since the result would be an MT-unsafe driver.

See "Thread Synchronization" for more information.

Catching Signals

The driver could accidentally wait for an event that will never occur, or the event might not happen for a long time. In either case, the user might want to abort the process by sending it a signal (or typing a character that causes a signal to be sent to the process). Whether the signal causes the driver to wake up depends upon the driver.

In the SunOS 4.1 system, whether sleep() was signal-interruptible depended upon the dispatch priority passed to sleep(). If the priority was greater than PZERO, the driver was signal-interruptible; otherwise the driver would not be awakened by a signal. Normally, a signal interrupt caused sleep() to return to the user, without notifying the driver that the signal had occurred. Drivers that needed to release resources before returning to the user passed the PCATCH flag to sleep(), then looked at the return value of sleep() to determine why they awoke:

	while (busy) {
		if (sleep(&busy, PCATCH | (PRIBIO + 1))) {
			/* awakened because of a signal */
			/* free resources */
			return (EINTR);
		}
	}

In the SunOS 5.7 system, the driver can use cv_wait_sig(9F) to wait on the condition variable while remaining signal-interruptible. Note that cv_wait_sig(9F) returns zero to indicate that the return was due to a signal, whereas sleep() in the SunOS 4.1 system returned a nonzero value:

	while (busy) {
		if (cv_wait_sig(&busy_cv, &busy_mu) == 0) {
			/* returned because of signal */
			/* free resources */
			return (EINTR);
		}
	}
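
Because the two return conventions are inverted, a port that copies the old test unchanged will treat every normal wakeup as a signal. The following sketch isolates the difference in two hypothetical helpers (the names are illustrative, not part of either system); each maps the raw return value to the errno value that the surrounding loop would return.

```c
#include <errno.h>

/*
 * SunOS 4.1 convention: sleep() with PCATCH returned nonzero when a
 * signal interrupted the wait.
 */
static int
xx_sleep_status_to_errno(int sleep_rv)
{
	return (sleep_rv != 0 ? EINTR : 0);
}

/*
 * SunOS 5.7 convention: cv_wait_sig(9F) returns zero when a signal
 * interrupted the wait, and nonzero for a normal wakeup.
 */
static int
xx_cv_wait_sig_status_to_errno(int cv_rv)
{
	return (cv_rv == 0 ? EINTR : 0);
}
```

Keeping the comparison in one place makes the inversion explicit when 4.1 code is converted.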

`cv_timedwait()`

Another solution drivers used to avoid blocking on events that would not occur was to set a timeout before the call to sleep(). This timeout would occur far enough in the future that the event should have happened, and if it did run, it would awaken the blocked process. The driver would then see if the timeout function had run, and return some sort of error.

This can still be done in the SunOS 5.7 system, but the same thing may be accomplished with cv_timedwait(9F). An absolute time to wait is passed to cv_timedwait(9F), which returns -1 if the time is reached and the event has not occurred. See Example 4-3 for an example usage of cv_timedwait(9F). Also see "cv_wait_sig()" for information on cv_timedwait_sig(9F).
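
Since the time passed is absolute rather than relative, the driver has to add its timeout to the current tick count before waiting. The arithmetic is sketched below as plain C so it can be checked outside the kernel. Everything prefixed xx_ is hypothetical: xx_clock_t stands in for the kernel tick type, the caller supplies the current tick count and the tick rate (a driver would normally obtain these with routines such as ddi_get_lbolt(9F) and drv_usectohz(9F)), and rounding up is a choice made for this sketch. The classifier reflects the return values of cv_timedwait_sig(9F) as documented in its manual page: -1 for a timeout, 0 for a signal, and a positive value otherwise.

```c
typedef long xx_clock_t;	/* stand-in for the kernel tick type */

/* Convert microseconds to ticks, rounding up so short waits are not lost. */
static xx_clock_t
xx_usec_to_ticks(long usec, long hz)
{
	return ((xx_clock_t)(((long long)usec * hz + 999999LL) / 1000000LL));
}

/* Absolute deadline: current tick count plus the relative timeout. */
static xx_clock_t
xx_deadline(xx_clock_t now, long usec, long hz)
{
	return (now + xx_usec_to_ticks(usec, hz));
}

enum xx_wake { XX_TIMED_OUT, XX_SIGNALLED, XX_EVENT };

/* Interpret a cv_timedwait_sig(9F)-style return value. */
static enum xx_wake
xx_classify_timedwait_sig(int rv)
{
	if (rv == -1)
		return (XX_TIMED_OUT);
	if (rv == 0)
		return (XX_SIGNALLED);
	return (XX_EVENT);
}
```

When the wait sits inside a loop, computing the deadline once before the loop keeps the total wait bounded even if the thread is awakened several times before the condition becomes true.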

Other Locks

Semaphores and readers/writers locks are also available. See semaphore(9F) and rwlock(9F).

Lock Granularity

Generally, start with one lock, and add more depending upon the abilities of the device. See "Choosing a Locking Scheme" and Appendix G, Advanced Topics, for more information.

Interrupt Changes

In the SunOS 4.1 system, two distinct methods were used for handling interrupts.

Polled, or autovectored, interrupts were handled by calling the xxpoll() routine of the device driver. This routine was responsible for checking all of the driver's active units.
Vectored interrupt handlers were called directly in response to a particular hardware interrupt on the basis of the interrupt vector number assigned to the device.

In the SunOS 5.7 system, the interrupt handler model has been unified. The device driver registers an interrupt handler for each device instance, and the system either polls all the handlers for the currently active interrupt level, or calls that handler directly (if it is vectored). The driver no longer needs to care which type of interrupt mechanism is in use (in the handler).

ddi_add_intr(9F) is used to register a handler with the system. One of its arguments is a driver-defined value of type caddr_t that the system passes to the interrupt handler. The address of the state structure is a good choice. The handler can then cast the caddr_t back to the type that was passed. See "Registering Interrupts" and "Responsibilities of an Interrupt Handler" for more information.
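
The per-instance argument and the polling behavior can be modeled in user space as follows. This is a sketch under stated assumptions, not the kernel interface: xx_caddr_t stands in for caddr_t, the XX_INTR_* values stand in for the claimed and unclaimed return codes, struct xx_handler and xx_poll_level() are hypothetical models of what the system keeps per registered handler, and the irq_pending field replaces a read of the device's interrupt status register.

```c
#include <stddef.h>

typedef char *xx_caddr_t;	/* stand-in for caddr_t */

enum { XX_INTR_UNCLAIMED = 0, XX_INTR_CLAIMED = 1 };

/* Per-instance state; its address is the handler argument. */
struct xx_state {
	int instance;
	int irq_pending;	/* models the device's interrupt-pending bit */
	int handled;		/* number of interrupts serviced */
};

/* Handler: cast the argument back and claim only our own interrupts. */
static unsigned int
xx_intr(xx_caddr_t arg)
{
	struct xx_state *sp = (struct xx_state *)arg;

	if (!sp->irq_pending)
		return (XX_INTR_UNCLAIMED);
	sp->irq_pending = 0;
	sp->handled++;
	return (XX_INTR_CLAIMED);
}

/* Model of one registration: a handler plus its argument. */
struct xx_handler {
	unsigned int (*fn)(xx_caddr_t);
	xx_caddr_t arg;
};

/* Model of polling every handler at one level; returns the number claimed. */
static int
xx_poll_level(struct xx_handler *h, size_t n)
{
	int claimed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (h[i].fn(h[i].arg) == XX_INTR_CLAIMED)
			claimed++;
	}
	return (claimed);
}
```

The handler is the same whether it is reached by polling or called directly, which is the point of the unified model: it only needs to decide, from its own instance state, whether the interrupt is its own.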

DMA Changes

In the SunOS 4.1 system, to do a DMA transfer the driver mapped a buffer into the DMA space, retrieved the DMA address and programmed the device, did the transfer, and freed the mapping. This was accomplished in this sequence:

mb_mapalloc() - Map buffer into DMA space
MBI_ADDR() - Retrieve address from returned cookie
Program the device and start the DMA
mb_mapfree() - Free mapping when DMA is complete

The first three usually occurred in a start() routine, and the last in the interrupt routine.

The SunOS 5.7 DMA model is similar, but it has been extended. The goal of the new DMA model is to abstract the platform-dependent details of DMA away from the driver. A sliding DMA window has been added for drivers that need to do DMA to large objects, and the DMA routines can be informed of device limitations (such as 24-bit addressing).

The sequence for DMA is as follows: The driver allocates a DMA handle using ddi_dma_alloc_handle(9F). The DMA handle can be reused for subsequent DMA transfers. Then the driver commits DMA resources using either ddi_dma_buf_bind_handle(9F) or ddi_dma_addr_bind_handle(9F), retrieves the DMA address from the DMA cookie to do the DMA, and frees the mapping with ddi_dma_unbind_handle(9F). The new sequence is something like this:

ddi_dma_alloc_handle(9F) - Allocate a DMA handle
ddi_dma_buf_bind_handle(9F) - Allocate DMA resources and retrieve address from the returned cookie
Program the device and start the DMA
Perform the transfer.

Note -

If the transfer involves several windows, you can call ddi_dma_getwin(9F) to move to subsequent windows.

ddi_dma_unbind_handle(9F) - Free mapping when DMA is complete
ddi_dma_free_handle(9F) - Free DMA handle when no longer needed
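
The ordering constraints in this sequence, including reuse of the handle for later transfers, can be summarized as a small state machine. The sketch below is a user-space model only; the xx_ names are hypothetical and it calls none of the DDI routines. It assumes, as the sequence above implies, that a handle must be unbound before it is freed.

```c
enum xx_dma_state { XX_NO_HANDLE, XX_HANDLE_IDLE, XX_HANDLE_BOUND };
enum xx_dma_op { XX_ALLOC_HANDLE, XX_BIND, XX_UNBIND, XX_FREE_HANDLE };

/*
 * Apply one step of the DMA sequence.  Returns 0 and updates *st if
 * the step is legal in the current state, or -1 (state unchanged) if
 * it is out of order.
 */
static int
xx_dma_step(enum xx_dma_state *st, enum xx_dma_op op)
{
	switch (op) {
	case XX_ALLOC_HANDLE:	/* models ddi_dma_alloc_handle(9F) */
		if (*st != XX_NO_HANDLE)
			return (-1);
		*st = XX_HANDLE_IDLE;
		return (0);
	case XX_BIND:		/* models ddi_dma_buf_bind_handle(9F) */
		if (*st != XX_HANDLE_IDLE)
			return (-1);
		*st = XX_HANDLE_BOUND;
		return (0);
	case XX_UNBIND:		/* models ddi_dma_unbind_handle(9F) */
		if (*st != XX_HANDLE_BOUND)
			return (-1);
		*st = XX_HANDLE_IDLE;
		return (0);
	case XX_FREE_HANDLE:	/* models ddi_dma_free_handle(9F) */
		if (*st != XX_HANDLE_IDLE)
			return (-1);
		*st = XX_NO_HANDLE;
		return (0);
	}
	return (-1);
}
```

The idle state is what allows one handle to serve many transfers: bind and unbind can alternate any number of times between a single allocation and a single free.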

Additional routines have been added to synchronize any underlying caches and buffers, and handle IOPB memory. See Chapter 7, DMA, for details.

In addition, in the SunOS 4.1 system, the driver had to inform the system that it might do DMA, either through the mb_driver structure or with a call to adddma(). This was needed because the kernel might need to block interrupts to prevent DMA, but needed to know the highest interrupt level to block. Because the new implementation uses mutexes, this is no longer needed.
