Writing Device Drivers

Chapter 3 Overview of SunOS Device Drivers

This chapter gives an overview of SunOS device drivers. It discusses what a device driver is and the types of device drivers that Solaris 7 supports. It also provides a general discussion of the routines that device drivers must implement and points out compiler-related issues.

What Is a Device Driver?

A device driver is a kernel module responsible for managing low-level I/O operations for a particular hardware device. Device drivers can also be software-only, emulating a device that exists only in software, such as a RAM disk or a pseudo-terminal. Such device drivers are called pseudo device drivers and cannot perform functions requiring hardware (such as DMA).

A device driver contains all the device-specific code necessary to communicate with a device and provides a standard I/O interface to the rest of the system. This interface protects the kernel from device specifics just as the system call interface protects application programs from platform specifics. Application programs and the rest of the kernel need little (if any) device-specific code to address the device. In this way, device drivers make the system more portable and easier to maintain.

Types of Device Drivers

There are several kinds of device drivers, each handling a different kind of I/O. Block device drivers manage devices with physically addressable storage media, such as disks. All other devices are considered character devices. Two types of character device drivers are standard character device drivers and STREAMS device drivers.

Block Device Drivers

Devices that support a file system are known as block devices. Drivers written for these devices are known as block device drivers. Block device drivers take a file system request (in the form of a buf(9S) structure) and issue the I/O operations to the disk to transfer the specified block. The main interface to the file system is the strategy(9E) routine. See Chapter 10, Drivers for Block Devices, for more information.

Block device drivers can also provide a character driver interface that allows utility programs to bypass the file system and access the device directly. This device access is commonly referred to as the raw interface to a block device.

Standard Character Device Drivers

Character device drivers normally perform I/O in a byte stream. They can also provide additional interfaces not present in block drivers, such as I/O control (ioctl(9E)) commands, memory mapping, and device polling. See Chapter 9, Drivers for Character Devices, for more information.

Byte-Stream I/O

The main task of any device driver is to perform I/O, and many character device drivers do what is called byte-stream or character I/O. The driver transfers data to and from the device without using a specific device address. This is in contrast to block device drivers, where part of the file system request identifies a specific location on the device.

The read(9E) and write(9E) entry points handle byte-stream I/O for standard character drivers. See "I/O Request Handling" for more information.

I/O Control

Many devices have characteristics and behavior that can be configured or tuned. The ioctl(2) system call and the ioctl(9E) driver entry point provide a mechanism for application programs to change and determine the status of a driver's configurable characteristics. For example, the baud rate of a serial communications port is usually configurable in this way.

The I/O control interface is open ended, enabling device drivers to define special commands for the device. The definition of the commands is entirely determined by the driver and is restricted only by the requirements of the application programs using the device and the device itself.

Certain classes of devices such as frame buffers or disks must support standard sets of I/O control requests. These standard I/O control interfaces are documented in the Solaris 2.7 Reference Manual. For example, fbio(7I) documents the I/O controls that frame buffers must support, and dkio(7I) documents standard disk I/O controls. See "Miscellaneous I/O Control" for more information on I/O control.

Note -

This manual does not cover I/O control commands.

Device Memory Mapping

For certain devices, such as frame buffers, it is more efficient for application programs to have direct access to device memory. Applications can map device memory into their address spaces using the mmap(2) system call. To support memory mapping, device drivers implement segmap(9E) and devmap(9E) entry points. For information on devmap(9E), see Chapter 11, Mapping Device or Kernel Memory. For information on segmap(9E), see Chapter 9, Drivers for Character Devices.

Drivers that define a devmap(9E) entry point usually do not define read(9E) and write(9E) entry points, as application programs perform I/O directly to the devices after calling mmap(2).

Device Polling

The poll(2) system call enables application programs to monitor or poll a set of file descriptors for certain conditions or events. poll(2) can be used to find out whether data are available to be read from the file descriptors or whether data may be written to the file descriptors without delay. Drivers referred to by these file descriptors must provide support for the poll(2) system call by implementing a chpoll(9E) entry point.

Drivers for communication devices such as serial ports should support polling, as they are used by applications that require synchronous notification of changes in read and write status. Many communications devices, however, are better implemented as STREAMS drivers.

STREAMS Drivers

STREAMS is a separate programming model for writing a character driver. Devices that receive data asynchronously (such as terminal and network devices) are suited to a STREAMS implementation. STREAMS device drivers must provide the loading and autoconfiguration support described in Chapter 5, Autoconfiguration. See the Streams Programming Guide for additional information on how to write STREAMS drivers.

Bus Address Spaces

Three types of bus address space are memory space, I/O space, and configuration space. The device driver usually accesses memory space through memory mapping and I/O space through I/O ports. The configuration address space is accessed primarily during system initialization.

The preferred method depends on the device; it is generally not software configurable. For example, SBus and VMEbus devices do not provide I/O ports or configuration space, but some PCI devices may provide all three.

The data format of the host may also have different endian characteristics than the data format of the device. If this is the case, data transferred between the host and the device needs to be byte swapped to conform to the data format requirements of the destination location. Other devices may have the same endian characteristics as their host. In this case, no byte swapping is required. The DDI framework performs any required byte swapping on behalf of the driver. The driver simply needs to specify the endianness of the device to the framework.
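To make the byte-swapping rule concrete, the following user-level sketch shows roughly the decision that has to be made for a 32-bit value. It is not DDI code and does not reflect how the framework is implemented: swap32(), host_is_big_endian(), and device_to_host32() are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
uint32_t
swap32(uint32_t v)
{
	return ((v >> 24) |
	    ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) |
	    (v << 24));
}

/* Hypothetical helper: detect the host byte order at run time. */
int
host_is_big_endian(void)
{
	const uint16_t probe = 0x0102;

	/* On a big-endian host the first byte in memory is 0x01. */
	return (*(const uint8_t *)&probe == 0x01);
}

/*
 * Hypothetical helper: convert a raw value read from a device to host
 * byte order. A swap is needed only when the two byte orders differ.
 */
uint32_t
device_to_host32(uint32_t raw, int device_is_big_endian)
{
	assert(device_is_big_endian == 0 || device_is_big_endian == 1);
	if (device_is_big_endian != host_is_big_endian())
		return (swap32(raw));
	return (raw);
}
```

A driver never writes this logic itself. It only declares the endianness of the device, and the data access routines apply the equivalent conversion on its behalf.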

Address Mapping Setup

Before a driver can access a device's bus address, the bus address spaces must be set up using ddi_regs_map_setup(9F). The driver can then access the device by passing the data access handle returned from ddi_regs_map_setup(9F) to one of the ddi_get8(9F) or ddi_put8(9F) family of routines.

One of the arguments required by ddi_regs_map_setup(9F) is a pointer to a device access attributes structure, ddi_device_acc_attr(9S). The ddi_device_acc_attr(9S) structure describes the data access characteristics and requirements of the device. The ddi_device_acc_attr(9S) structure contains the following members:

	ushort_t   devacc_attr_version;
 	uchar_t    devacc_attr_endian_flags;
 	uchar_t    devacc_attr_dataorder;

The devacc_attr_version member identifies the version number of this structure. The current version number is DDI_DEVICE_ATTR_V0.

The devacc_attr_endian_flags member describes the endian characteristics of the device. If DDI_NEVERSWAP_ACC is set, data access with no byte swapping is indicated. This flag should be set when no byte swapping is required. For example, if a device does byte-stream I/O, no byte swapping is required. If DDI_STRUCTURE_BE_ACC is set, the device data format is big endian. If DDI_STRUCTURE_LE_ACC is set, the device data format is little endian.

The framework will do any required byte swapping on behalf of the driver based on the flags indicated in devacc_attr_endian_flags and the host's data format endian characteristics.

devacc_attr_dataorder describes the order in which the CPU will reference data. Certain hosts may load or store data in certain orders to pipeline performance. The data ordering may be programmed to execute in one of the following ways:

Strong data ordering - If DDI_STRICTORDER_ACC is set, the CPU must issue the references in order, as specified by the programmer. This is the default behavior.
Reordering - If DDI_UNORDERED_OK_ACC is set, the CPU may reorder the data reference. This includes all kinds of reordering (for example, a load followed by a store may be replaced by a store followed by a load).
Data merging - If DDI_MERGING_OK_ACC is set, the CPU may merge individual stores to consecutive locations. For example, the CPU may turn two consecutive byte stores into one halfword store. It may also batch individual loads. For example, the CPU may turn two consecutive byte loads into one halfword load. DDI_MERGING_OK_ACC also implies reordering.
Cache loading - If DDI_LOADCACHING_OK_ACC is set, the CPU may cache the data it fetches and reuse it until another store occurs. The default behavior is to fetch new data on every load. DDI_LOADCACHING_OK_ACC also implies merging and reordering.
Cache storing - If DDI_STORECACHING_OK_ACC is set, the CPU may keep the data in the cache and push it to the device (perhaps with other data) at a later time. The default behavior is to push the data right away. DDI_STORECACHING_OK_ACC also implies load caching, merging, and reordering.

Note -

The restrictions on the host diminish when moving from strong data ordering to cache storing; each successive setting gives the host more freedom in how it carries out the driver's data accesses.

The values assigned to devacc_attr_dataorder are advisory, not mandatory. For example, data can be ordered without being merged or cached, even though a driver requests unordered, merged, and cached together.

A driver for a big-endian device that requires strict data ordering during data accesses would encode the ddi_device_acc_attr structure as follows:

	static ddi_device_acc_attr_t access_attr = {
		DDI_DEVICE_ATTR_V0,		/* version number */
		DDI_STRUCTURE_BE_ACC,		/* big endian */
		DDI_STRICTORDER_ACC		/* strict ordering */
	};

The system will use the information stored in the ddi_device_acc_attr structure and other system-specific information to encode an opaque data handle as one of the returned parameters from ddi_regs_map_setup(9F). The returned data handle is used as a parameter to the data access routines (such as ddi_put8(9F) or ddi_get8(9F)) during subsequent accesses to the mapped registers. The driver must never attempt to interpret the contents of the data handle.

If successful, ddi_regs_map_setup(9F) also returns a kernel virtual address that is mapped to the bus address base. The address base may be used as a base reference address in deriving the effective address of other registers by adding the appropriate offset.

Note -

Drivers should not directly dereference the returned address. A driver must access the device through one of the data access functions.

Data Access Functions

Data access functions allow drivers to transfer data to and from devices without directly referencing the hardware registers. The driver can transfer data to the device or receive data from the device using the ddi_put8(9F) or the ddi_get8(9F) families of routines. The ddi_put8(9F) routines allow a driver to write data to the device in quantities of 8 bits (ddi_put8(9F)), 16 bits (ddi_put16(9F)), 32 bits (ddi_put32(9F)), and 64 bits (ddi_put64(9F)). The ddi_get8(9F) routines exist for reading from a device. Multiple values may be written or read by using the ddi_rep_put8(9F) or ddi_rep_get8(9F) family of routines respectively. See Appendix C, Summary of Solaris 7 DDI/DKI Services for more information on data access functions.

Note -

These routines may be applied to any address base returned from ddi_regs_map_setup(9F) regardless of the address space the register resides in (such as memory, I/O, or configuration space).

Example 3-1 illustrates the use of ddi_regs_map_setup(9F) and ddi_put8(9F) to access device registers.

Example 3-1 Accessing Device Registers

static ddi_device_acc_attr_t access_attr = {
	DDI_DEVICE_ATTR_V0, /*version number */
	DDI_STRUCTURE_BE_ACC, /* big endian */
	DDI_STRICTORDER_ACC /*strict ordering */
};

caddr_t reg_addr;
ddi_acc_handle_t data_access_handle;

ddi_regs_map_setup(..., &reg_addr, ..., &access_attr,
	&data_access_handle);

When ddi_regs_map_setup(9F) returns, reg_addr contains the address base and data_access_handle contains the opaque data handle to be used in subsequent data accesses. The driver may now access the mapped registers. The following example writes one byte to the first mapped location.

ddi_put8(data_access_handle, (uint8_t *)reg_addr, 0x10);

Similarly, the driver could have used ddi_get8(9F) to read data from the device registers.

Memory Space Access

In memory-mapped access, device registers appear in memory address space. The driver must call ddi_regs_map_setup(9F) to set up the mapping. The driver can then access the device registers using one of the ddi_put8(9F) or ddi_get8(9F) family of routines.

The driver can also access memory space using the ddi_mem_put8(9F) and ddi_mem_get8(9F) family of routines. These functions may be more efficient on some platforms. Use of these routines, however, may limit the ability of the driver to remain portable across different bus versions of the device.

I/O Space Access

In I/O space access, the device registers appear in I/O space. Each addressable element of the I/O address is called an I/O port. Device registers are accessed through I/O port numbers. These port numbers can refer to 8, 16, or 32-bit registers. The driver must call ddi_regs_map_setup(9F) to set up the mapping, and it can then access the I/O port using one of the ddi_put8(9F) or ddi_get8(9F) family of routines.

The driver can also access I/O space using the ddi_io_put8(9F) and ddi_io_get8(9F) family of routines. These functions may be more efficient on some platforms. Use of these routines, however, may limit the ability of the driver to remain portable across different bus versions of the device.

Configuration Space Access

Configuration space is used primarily during device initialization. It determines the location and size of register sets and memory buffers located on the device. The driver can access configuration space using ddi_regs_map_setup(9F) and the ddi_put8(9F) or ddi_get8(9F) family of routines, as described previously.

Note -

For PCI local bus devices, an alternative set of routines exists. To get access to the configuration address space, the driver can use pci_config_setup(9F) in place of ddi_regs_map_setup(9F). The pci_config_get8(9F) and pci_config_put8(9F) family of routines may be used in place of the generic routines ddi_get8(9F) and ddi_put8(9F). These functions provide equivalent configuration space access as defined in the PCI bus binding for the IEEE 1275 specifications for FCode drivers. However, use of these routines may limit the ability of the driver to remain portable across different bus versions of the device.

Example Device Registers

Most of the examples in this manual use a fictitious device that has an 8-bit command and status register (csr), followed by an 8-bit data register. The command and status register is so called because writes to it go to an internal command register, and reads from it are directed to an internal status register.

The command register looks like this:

The status register looks like this:

Many drivers provide macros for the various bits in their registers to make the code more readable. The examples in this manual use the following names for the bits in the command register:

	#define	ENABLE_INTERRUPTS	0x10
	#define	CLEAR_INTERRUPT		0x08
	#define	START_TRANSFER		0x04

For the bits in the status register, the examples use the following macros:

	#define	INTERRUPTS_ENABLED	0x10
	#define	INTERRUPTING		0x08
	#define	DEVICE_BUSY		0x04
	#define	DEVICE_ERROR		0x02
	#define	TRANSFER_COMPLETE	0x01
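As a small illustration of how such macros are typically combined, the following user-level sketch builds a command value and tests a status value. The helpers xx_start_command() and xx_transfer_ok() are hypothetical, and the sketch assumes that the fictitious device reports errors through DEVICE_ERROR in the same status byte as TRANSFER_COMPLETE.

```c
#include <assert.h>
#include <stdint.h>

/* Command register bits, as defined above. */
#define	ENABLE_INTERRUPTS	0x10
#define	CLEAR_INTERRUPT		0x08
#define	START_TRANSFER		0x04

/* Status register bits, as defined above. */
#define	INTERRUPTS_ENABLED	0x10
#define	INTERRUPTING		0x08
#define	DEVICE_BUSY		0x04
#define	DEVICE_ERROR		0x02
#define	TRANSFER_COMPLETE	0x01

/* Hypothetical helper: build the value to write to the command register. */
uint8_t
xx_start_command(int want_interrupts)
{
	uint8_t cmd = START_TRANSFER;

	if (want_interrupts)
		cmd |= ENABLE_INTERRUPTS;
	return (cmd);
}

/* Hypothetical helper: true if a transfer finished without an error. */
int
xx_transfer_ok(uint8_t status)
{
	return ((status & TRANSFER_COMPLETE) != 0 &&
	    (status & DEVICE_ERROR) == 0);
}
```

In a driver, the value returned by xx_start_command() would be written to the csr with ddi_put8(9F), and the argument to xx_transfer_ok() would come from ddi_get8(9F).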

Device Register Structure

Using pointer accesses to communicate with the device results in unreadable code. For example, the code that reads the data register when a transfer has been completed might look like this:

	uint8_t data;
 	uint8_t status;
 	/* get status */
 	status = ddi_get8(data_access_handle, (uint8_t *)reg_addr);
 	if (status & TRANSFER_COMPLETE) {
 		data = ddi_get8(data_access_handle,
 			(uint8_t *)reg_addr + 1); /* read data */
 	}

To make the code more readable, it is common to define a structure that matches the layout of the device registers. In this case, the structure could look like this:

	struct device_reg {
 		uint8_t csr;
 		uint8_t data;
 	};

The driver then maps the registers into memory and refers to them through a pointer to the structure:

	struct device_reg *regp;
	...
	ddi_regs_map_setup(..., (caddr_t *)&regp, ... ,
		&access_attr, &data_access_handle);
 	...

The code that reads the data register upon a completed transfer now looks like this:

	uint8_t data;
 	uint8_t status;
 	/* get status */
 	status = ddi_get8(data_access_handle, &regp->csr); 	
 	if (status & TRANSFER_COMPLETE) {
 		/* read data */
 		data = ddi_get8(data_access_handle, &regp->data); 	
 	}
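The two versions compute the same register address. The following user-level sketch checks this, with ordinary memory standing in for the mapped registers. The helper names data_addr_by_offset() and data_addr_by_struct() are hypothetical, and the check assumes the compiler places no padding between two adjacent uint8_t members, which holds on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device_reg {
	uint8_t csr;
	uint8_t data;
};

/* Address of the data register computed by hand, as in the first version. */
uint8_t *
data_addr_by_offset(void *reg_addr)
{
	return ((uint8_t *)reg_addr + 1);
}

/* Address of the data register computed through the structure. */
uint8_t *
data_addr_by_struct(void *reg_addr)
{
	struct device_reg *regp = reg_addr;

	return (&regp->data);
}
```

Either address would be passed to ddi_get8(9F) in a real driver. The structure form only documents the register layout; it does not change how the access is performed.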

Structure Padding

A device that has a 1-byte command and status register followed by a 4-byte data register might lead to the following structure layout:

	struct device_reg {
		uint8_t		csr;
		uint32_t	data;
	};

This structure is not correct, because the compiler places padding between the two fields. For example, the SPARC processor requires each type to be on its natural boundary, which is 1-byte alignment for the csr field, but 4-byte alignment for the data field. This results in three unused bytes between the two fields. When the driver accesses a data register, it will be three bytes off. Consequently, this layout should not be used.

Finding Padding

The ANSI C offsetof(3C) macro may be used in a test program to determine the offset of each element in the structure. Knowing the offset and the size of each element, the location and size of any padding can be determined.

Example 3-2 Structure Padding

#include <sys/types.h>
#include <stdio.h>
#include <stddef.h>

struct device_reg {
	uint8_t			csr;
	uint32_t			data;
};

int main(void)
{
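	/* offsetof and sizeof yield size_t, so cast to match the format. */
	printf("The offset of csr is %lu, its size is %lu.\n",
			(unsigned long)offsetof(struct device_reg, csr),
			(unsigned long)sizeof (uint8_t));
	printf("The offset of data is %lu, its size is %lu.\n",
			(unsigned long)offsetof(struct device_reg, data),
			(unsigned long)sizeof (uint32_t));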
	return (0);
}

Here is a sample compilation with Sun WorkShop^TM Compiler C version 4.2 and a subsequent run of the program:

test% cc -Xa c.c
test% a.out
The offset of csr is 0, its size is 1.
The offset of data is 4, its size is 4.

Be aware that padding is dependent not only on the processor but also on the compiler.
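The same offsets can be turned directly into a padding size. The sketch below is one way to do that for the two-member structure; padding_before_data() is a hypothetical helper, and the value it returns depends on the processor and compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device_reg {
	uint8_t		csr;
	uint32_t	data;
};

/* Hypothetical helper: bytes of padding the compiler put before data. */
size_t
padding_before_data(void)
{
	/* First byte past the end of csr. */
	size_t csr_end = offsetof(struct device_reg, csr) + sizeof (uint8_t);

	/* Anything between the end of csr and the start of data is padding. */
	return (offsetof(struct device_reg, data) - csr_end);
}
```

A nonzero result means the structure does not match a device whose data register immediately follows the csr.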

Driver Interfaces

The kernel expects device drivers to provide certain routines that must perform certain operations; these routines are called entry points. This is similar to the requirement that application programs have a _start() entry point or that C applications have the more familiar main() routine.

Entry Points

Each device driver defines a standard set of functions called entry points, which are defined in the Solaris 2.7 Reference Manual. Drivers for different types of devices have different sets of entry points according to the kinds of operations the devices perform. A driver for a memory-mapped character-oriented device, for example, supports a devmap(9E) entry point, while a block driver does not.

Some operations are common to all drivers, such as the functions that are required for module loading (_init(9E), _info(9E), and _fini(9E)), and the required autoconfiguration entry points attach(9E) and getinfo(9E). Drivers may also support the optional autoconfiguration entry points for probe(9E) and detach(9E). Most drivers have open(9E) and close(9E) entry points to control access to their devices.

Traditionally, all driver function and variable names have some prefix added to them. Usually, this is the name of the driver, such as xxopen() for the open(9E) routine of driver xx. In subsequent examples, xx is used as the driver prefix.

Note -

In the SunOS 5.7 system, only the loadable module routines must be visible outside the driver object module. Other routines can have the storage class static.

Loadable Module Routines

	int _init(void);
 	int _info(struct modinfo *modinfop);
 	int _fini(void);

All drivers must implement the _init(9E), _fini(9E) and _info(9E) entry points to load, unload and report information about the driver module. The driver is single-threaded when the kernel calls _init(9E). No other thread will enter a driver routine until mod_install(9F) returns success.

The driver should allocate and initialize any global resources in _init(9E) before calling mod_install(9F) and release global resources in _fini(9E) after mod_remove(9F) returns success.

Note -

Drivers must use these names, and they must not be declared static, unlike the other entry points where the names and storage classes are determined by the driver.

Autoconfiguration Entry Points

	static int xxprobe(dev_info_t *dip);
 	static int xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd);
 	static int xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd);
 	static int xxgetinfo(dev_info_t *dip,ddi_info_cmd_t infocmd,
 					void *arg, void **result);

Any per-device resources should be allocated in attach(9E) and released in detach(9E). No resources global to the driver should be allocated in attach(9E). For information on autoconfiguration entry points, see Chapter 5, Autoconfiguration.

Block Driver Entry Points

	int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);
 	int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);
 	int xxstrategy(struct buf *bp);
 	int xxprint(dev_t dev, char *str);
 	int xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk);
 	int xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
 				   int mod_flags, char *name, caddr_t valuep,
 				   int *length);

For information on block driver entry points, see Chapter 10, Drivers for Block Devices.

Character Driver Entry Points

	int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);
 	int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);
 	int xxread(dev_t dev, struct uio *uiop, cred_t *credp);
 	int xxwrite(dev_t dev, struct uio *uiop, cred_t *credp);
 	int xxioctl(dev_t dev, int cmd, intptr_t arg, int mode,
 				cred_t *credp, int *rvalp);
 	int xxdevmap(dev_t dev, devmap_cookie_t dhp, offset_t off,
 				size_t len, size_t *maplen, uint_t model);
 	int xxmmap(dev_t dev, off_t off, int prot);
 	int xxsegmap(dev_t dev, off_t off, struct as *asp,
 				caddr_t *addrp, off_t len, unsigned int prot,
 				unsigned int maxprot, unsigned int flags,
 				cred_t *credp);
 	int xxchpoll(dev_t dev, short events, int anyyet,
 				short *reventsp, struct pollhead **phpp);
 	int xxprop_op(dev_t dev, dev_info_t *dip,
 				ddi_prop_op_t prop_op, int mod_flags,
 				char *name, caddr_t valuep, int *length);
 	int xxaread(dev_t dev, struct aio_req *aio, cred_t *credp);
 	int xxawrite(dev_t dev, struct aio_req *aio, cred_t *credp);

For information on character driver entry points, see Chapter 9, Drivers for Character Devices.

Power Management Entry Point

	int xxpower(dev_info_t *dip, int component, int level);

Drivers for hardware devices that provide Power Management functionality may support the optional power(9E) entry point. See Chapter 8, Power Management, for details about this entry point.

Driver Structure Overview

Figure 3-1 shows data structures and routines that may define the structure of a character or block device driver. Such drivers typically include a device loadable driver section, device configuration section, and device access section.

Figure 3-1 Device Driver Roadmap

Note -

The first two sections in Figure 3-1 are discussed in Chapter 5, Autoconfiguration; the third section is discussed in Chapter 9, Drivers for Character Devices and Chapter 10, Drivers for Block Devices.

Callback Functions

Some routines provide a callback mechanism. This is a way to schedule a function to be called when a condition is met. Typical conditions for which callback functions are set up include:

When a transfer has completed
When a resource might become available
When a time-out period has expired

Transfer completion callbacks perform the tasks usually done in an interrupt service routine.

In some sense, callback functions are similar to entry points. The functions that allow callbacks expect the callback function to perform certain tasks. In the case of DMA routines, a callback function must return a value indicating whether the callback function needs to be rescheduled in case of a failure.

Callback functions execute as a separate interrupt thread and must handle all the usual multithreading issues.

Note -

A driver must cancel all scheduled callback functions before detaching a device.

Interrupt Handling

The Solaris 7 DDI/DKI addresses these aspects of device interrupt handling:

Registering device interrupts with the system
Removing device interrupts from the system

Interrupt information is contained in a property called interrupts (or intr on x86 platforms, see isa(4)). This property is provided by the PROM of a self-identifying device, by a hardware configuration file, or by the booting system on the x86 platform. See sbus(4), vme(4), pci(4), eisa(4), isa(4), mca(4), and "Properties" for more information.

Because the internal implementation of interrupts is an architectural detail, special interrupt cookies are used to enable drivers to perform interrupt-related tasks. The types of cookies for interrupts are:

Device-interrupt cookies
Block-interrupt cookies

Device-Interrupt Cookies

Defined as type ddi_idevice_cookie_t, this cookie is a data structure containing information used by a driver to program the interrupt-request level (or the equivalent) for a programmable device. See ddi_add_intr(9F), ddi_idevice_cookie(9S), and "Registering Interrupts" for more information.

Block-Interrupt Cookies

Defined as type ddi_iblock_cookie_t, this cookie is used by a driver to initialize the mutual exclusion locks it uses to protect data. This cookie should not be interpreted by the driver in any way. For more information on ddi_get_iblock_cookie(9F), see "Interrupt Block Cookies".

Driver Context

The driver context determines which kernel routines the driver is permitted to call. For example, in kernel context the driver must not call copyin(9F). There are four contexts in which driver code executes:

User context - A driver entry point has user context if it was directly invoked because of a user thread. For example, the read(9E) entry point of the driver, invoked by a read(2) system call, has user context.
Kernel context - A driver function has kernel context if it was invoked by some other part of the kernel. In a block device driver, the strategy(9E) entry point may be called by the pageout daemon to write pages to the device. Because the pageout daemon has no relation to the current user thread, strategy(9E) has kernel context in this case.
Interrupt context - Interrupt context is a more restrictive form of kernel context. Driver interrupt routines operate in interrupt context and have an interrupt level associated with them. See Chapter 6, Interrupt Handlers for more information.
High-level interrupt context - High-level interrupt context is a more restricted form of interrupt context. If ddi_intr_hilevel(9F) indicates that an interrupt is high level, the driver interrupt handler will run in high-level interrupt context. See "Handling High-Level Interrupts" for more information.

The manual pages in section 9F document the allowable contexts for each function.

Printing Messages

Device drivers do not usually print messages. Instead, the driver entry points should return error codes so that the application can determine how to handle the error. If the driver really needs to print a message, it can use cmn_err(9F) to do so. This is similar to the C function printf(3S), but only prints to the console, to the message buffer displayed by dmesg(1M), or both.

void cmn_err(int level, char *format, ...);

format is similar to the printf(3S) format string, with the addition of the format %b, which prints bit fields. level indicates which label will be printed, as shown in Table 3-1.

Table 3-1 cmn_err() Messages


Level	Message
CE_NOTE	NOTICE: format\n
CE_WARN	WARNING: format\n
CE_CONT	format
CE_PANIC	panic: format\n

CE_PANIC has the side effect of crashing the system. This level should only be used if the system is in such an unstable state that to continue would cause more problems. It can also be used to get a system core dump when debugging.

The first character of the format string is treated specially. See cmn_err(9F) for more details.
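The labels in Table 3-1 can be modeled at user level as follows. This is only an illustration of the table, not the kernel routine: the enum xx_level and the helper xx_format_message() are hypothetical, the level values are not the real CE_* constants, and the model ignores the %b format, the special first character, and the panic side effect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical level names for this model; not the real CE_* values. */
enum xx_level { XX_CONT, XX_NOTE, XX_WARN, XX_PANIC };

/*
 * Hypothetical helper: build the text that Table 3-1 describes for a
 * given level. Returns the result of snprintf().
 */
int
xx_format_message(enum xx_level level, const char *text, char *buf,
    size_t buflen)
{
	switch (level) {
	case XX_NOTE:
		return (snprintf(buf, buflen, "NOTICE: %s\n", text));
	case XX_WARN:
		return (snprintf(buf, buflen, "WARNING: %s\n", text));
	case XX_PANIC:
		return (snprintf(buf, buflen, "panic: %s\n", text));
	case XX_CONT:
	default:
		/* Continuation: no label and no added newline. */
		return (snprintf(buf, buflen, "%s", text));
	}
}
```

The model shows why CE_CONT is useful for continuing a message started at another level: it adds neither a label nor a trailing newline.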

Dynamic Memory Allocation

Device drivers must be prepared to simultaneously handle all attached devices that they claim to drive. There should be no driver limit on the number of devices that the driver handles, and all per-device information must be dynamically allocated.

void *kmem_alloc(size_t size, int flag);

The standard kernel memory allocation routine is kmem_alloc(9F). It is similar to the C library routine malloc(3C), with the addition of the flag argument. The flag argument can be either KM_SLEEP or KM_NOSLEEP, indicating whether the caller is willing to block if the requested size is not available. If KM_NOSLEEP is set, and memory is not available, kmem_alloc(9F) returns NULL.

kmem_zalloc(9F) is similar to kmem_alloc(9F), but also clears the contents of the allocated memory.

Note -

Kernel memory is a limited resource, not pageable, and competes with user applications and the rest of the kernel for physical memory. Drivers that allocate a large amount of kernel memory may cause system performance to degrade.

void kmem_free(void *cp, size_t size);

Memory allocated by kmem_alloc(9F) or by kmem_zalloc(9F) is returned to the system with kmem_free(9F). This is similar to the C library routine free(3C), with the addition of the size argument. Drivers must keep track of the size of each object they allocate in order to call kmem_free(9F) later.
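One common way to meet the size-tracking requirement is to store the size next to the pointer. The following user-level sketch shows the idea with calloc(3C) and free(3C) standing in for kmem_zalloc(9F) and kmem_free(9F). The structure xx_buffer and both helpers are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record that keeps an allocation together with its size. */
struct xx_buffer {
	void	*addr;
	size_t	size;
};

/* User-level stand-in for kmem_zalloc(9F): zeroed memory, size remembered. */
int
xx_buffer_alloc(struct xx_buffer *bp, size_t size)
{
	bp->addr = calloc(1, size);
	if (bp->addr == NULL) {
		bp->size = 0;
		return (-1);
	}
	bp->size = size;
	return (0);
}

/* User-level stand-in for kmem_free(9F), which needs the original size. */
void
xx_buffer_free(struct xx_buffer *bp)
{
	assert(bp->addr != NULL && bp->size != 0);
	free(bp->addr);	/* a driver would pass bp->size to kmem_free() here */
	bp->addr = NULL;
	bp->size = 0;
}
```

For fixed-size objects such as a per-device state structure, the size is simply sizeof of the structure and no extra field is needed.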

Software State Management

Software State Structure

For each device that the driver handles, the driver must keep some state information. At a minimum, this consists of a pointer to the dev_info node for the device (required by getinfo(9E)). The driver can define a structure that contains all the information needed about a single device:

	struct xxstate {
 		dev_info_t			*dip;
 	};

This structure will grow as the device driver evolves. Additional useful fields might be a pointer to each of the device's mapped registers, or flags such as busy or suspended. The initial state structure the examples in this book use is given in Example 3-3.

Example 3-3 Initial State Structure

struct xxstate {
	dev_info_t			*dip;
	struct device_reg		*regp;
	int				xx_busy;
	struct xx_saved_device_state	device_state;
};

Subsequent chapters in this manual may require that new fields be added to the state structure. Each chapter will list any additions.

Software State Management Routines

To assist device driver writers in allocating state structures, the Solaris 7 DDI/DKI provides a set of memory management routines called the software state management routines (also known as the soft state routines). These routines dynamically allocate, retrieve, and destroy memory items of a specified size, and hide all the details of list management in a multithreaded kernel. An item number is used to identify the desired memory item; this number can be (and usually is) the instance number assigned by the system.

The driver must provide a state pointer, which is used by the soft state system to create the list of memory items:

	static void *statep;

Routines are provided to:

Initialize the provided state pointer - ddi_soft_state_init(9F)
Allocate space for a certain item - ddi_soft_state_zalloc(9F)
Retrieve a pointer to the indicated item - ddi_get_soft_state(9F)
Free the memory item - ddi_soft_state_free(9F)
Finish using the state pointer - ddi_soft_state_fini(9F)

When the module is loaded, the driver calls ddi_soft_state_init(9F) to initialize the driver state pointer, passing a hint indicating how many items to pre-allocate. If more items are needed, the soft state routines allocate them as necessary. The driver must call ddi_soft_state_fini(9F) when the driver is unloaded.

To allocate an instance of the soft state structure, the driver calls ddi_soft_state_zalloc(9F). Once the item is allocated, the driver calls ddi_get_soft_state(9F) to retrieve the pointer to the allocated structure. This is usually done when the device is attached. When the device is detached, the driver calls ddi_soft_state_free(9F) to free the memory.

See "Loadable Driver Interface" for an example use of these routines.
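To make the life cycle of a soft state item concrete, the following sketch models the five routines in user space. It is a toy model under stated assumptions, not the DDI/DKI implementation: every xx_ss name is hypothetical, malloc(3C) stands in for kernel allocation, no locking is shown, and the failure and cleanup behavior is a choice made for the model.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy anchor behind the state pointer: a growable table of items. */
struct xx_ss {
	size_t	item_size;	/* size of every item, fixed at init */
	size_t	n_items;	/* current length of the table */
	void	**items;	/* items[n] is NULL until allocated */
};

/* Models ddi_soft_state_init(9F): n_hint slots are pre-allocated. */
static int
xx_ss_init(void **statep, size_t item_size, size_t n_hint)
{
	struct xx_ss *ss;
	size_t i;

	if (n_hint == 0)
		n_hint = 1;
	if ((ss = malloc(sizeof (*ss))) == NULL)
		return (-1);
	if ((ss->items = malloc(n_hint * sizeof (void *))) == NULL) {
		free(ss);
		return (-1);
	}
	for (i = 0; i < n_hint; i++)
		ss->items[i] = NULL;
	ss->item_size = item_size;
	ss->n_items = n_hint;
	*statep = ss;
	return (0);
}

/* Models ddi_soft_state_zalloc(9F): grows the table, zero-fills the item. */
static int
xx_ss_zalloc(void *state, size_t item)
{
	struct xx_ss *ss = state;

	if (item >= ss->n_items) {
		size_t i, n = ss->n_items;
		void **p;

		while (n <= item)
			n *= 2;
		if ((p = realloc(ss->items, n * sizeof (void *))) == NULL)
			return (-1);
		for (i = ss->n_items; i < n; i++)
			p[i] = NULL;
		ss->items = p;
		ss->n_items = n;
	}
	if (ss->items[item] != NULL)
		return (-1);	/* model choice: item already exists */
	ss->items[item] = calloc(1, ss->item_size);
	return (ss->items[item] != NULL ? 0 : -1);
}

/* Models ddi_get_soft_state(9F): NULL if the item is not allocated. */
static void *
xx_ss_get(void *state, size_t item)
{
	struct xx_ss *ss = state;

	return (item < ss->n_items ? ss->items[item] : NULL);
}

/* Models ddi_soft_state_free(9F). */
static void
xx_ss_free(void *state, size_t item)
{
	struct xx_ss *ss = state;

	if (item < ss->n_items) {
		free(ss->items[item]);
		ss->items[item] = NULL;
	}
}

/* Models ddi_soft_state_fini(9F); the model releases any remaining items. */
static void
xx_ss_fini(void **statep)
{
	struct xx_ss *ss = *statep;
	size_t i;

	for (i = 0; i < ss->n_items; i++)
		free(ss->items[i]);
	free(ss->items);
	free(ss);
	*statep = NULL;
}
```

In a driver, the item number passed to these routines would typically be the instance number, and the pointer returned by the lookup would be cast to struct xxstate *.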

Properties

Properties define arbitrary characteristics of the device or device driver. Properties may be defined by the FCode of a self-identifying device, by a hardware configuration file (see driver.conf(4)), or by the driver itself using the ddi_prop_update(9F) family of routines.

A property is a name-value pair. The name is a string that identifies the property with an associated value. Examples of properties are the height and width of a frame buffer, the number of blocks in a partition of a block device, or the name of a device. The value of a property can be one of five types:

A byte array that has an arbitrary length and whose value is a series of bytes
An integer property whose value is an integer
An integer array property whose value is an array of integers
A string property whose value is a NULL-terminated string
A string array property whose value is a list of NULL-terminated strings

A property that has no value is known as a Boolean property. It is considered to be true if it exists and false if it doesn't exist.

Note -

Strictly speaking, DDI/DKI software property names are not restricted in any way; however, there are certain recommended uses. As defined in IEEE 1275-1994 (the Standard for Boot Firmware), a property name "is a human readable text string consisting of one to thirty-one printable characters. Property names shall not contain upper case characters or the characters "/", "\", ":", "[", "]" and "@". Property names beginning with the character "+" are reserved for use by future revisions of IEEE 1275-1994." By convention, underscores are not used in property names; use a hyphen (-) instead. Also by convention, property names ending with the question mark character (for example, auto-boot?) contain values that are strings, typically true or false.

A driver can request a property from its parent, which in turn might ask its parent. The driver can control whether the request can go higher than its parent.

For example, the "esp" driver maintains an integer property for each target called targetx-sync-speed, where "x" is the target number. The prtconf(1M) command in its verbose mode displays driver properties. The following example shows a partial listing for the "esp" driver.

test% prtconf -v
...
       esp, instance #0
            Driver software properties:
                name <target2-sync-speed> length <4>
                    value <0x00000fa0>.
...
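The way a property request can pass from a device node to its parent can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not the DDI implementation: xx_node, xx_prop, xx_prop_get_int(), and XX_PROP_DONTPASS are hypothetical names, with the flag modeled on DDI_PROP_DONTPASS, and the clock-frequency property and its value are invented for the test data. The target2-sync-speed value is the one shown in the listing above (0x00000fa0, or 4000 decimal).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, modeled on DDI_PROP_DONTPASS. */
#define	XX_PROP_DONTPASS	0x1

/* An integer property: a name-value pair. */
struct xx_prop {
	const char	*name;
	int		value;
};

/* A device node with a property list and a link to its parent. */
struct xx_node {
	const struct xx_node	*parent;
	const struct xx_prop	*props;
	size_t			nprops;
};

/*
 * Look up an integer property on the node. If it is not found and
 * XX_PROP_DONTPASS is clear, ask the parent, which in turn asks its
 * parent. Returns defvalue if no node in the search has the property.
 */
static int
xx_prop_get_int(const struct xx_node *np, int flags, const char *name,
    int defvalue)
{
	while (np != NULL) {
		size_t i;

		for (i = 0; i < np->nprops; i++) {
			if (strcmp(np->props[i].name, name) == 0)
				return (np->props[i].value);
		}
		if (flags & XX_PROP_DONTPASS)
			break;		/* do not go higher than this node */
		np = np->parent;
	}
	return (defvalue);
}
```

With a parent node that defines clock-frequency and a child that defines only target2-sync-speed, a lookup of clock-frequency on the child succeeds by default and returns the default value when XX_PROP_DONTPASS is set.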

Table 3-2 provides information on the property interfaces.

Table 3-2 Property Interface Uses


Family	Property Interfaces	Description
ddi_prop_lookup	ddi_prop_exists(9F)	Looks up property and returns success if one exists. Returns failure if one does not exist.
	ddi_prop_get_int(9F)	Looks up and returns an integer property.
	ddi_prop_lookup_int_array(9F)	Looks up and returns an integer array property.
	ddi_prop_lookup_string(9F)	Looks up and returns a string property.
	ddi_prop_lookup_string_array(9F)	Looks up and returns a string array property.
	ddi_prop_lookup_byte_array(9F)	Looks up and returns a byte array property.
ddi_prop_update	ddi_prop_update_int(9F)	Updates an integer property.
	ddi_prop_update_int_array(9F)	Updates an integer array property.
	ddi_prop_update_string(9F)	Updates a string property.
	ddi_prop_update_string_array(9F)	Updates a string array property.
	ddi_prop_update_byte_array(9F)	Updates a byte array property.
ddi_prop_remove	ddi_prop_remove(9F)	Removes a property.
	ddi_prop_remove_all(9F)	Removes all properties associated with a device.

prop_op(9E)

The prop_op(9E) entry point reports the values of device properties to the system. In many cases, the ddi_prop_op(9F) routine may be used as the driver's prop_op(9E) entry point in the cb_ops(9S) structure. ddi_prop_op(9F) performs all of the required processing and is sufficient for drivers that do not need to perform any special processing when handling a device property request.

However, there are cases when it is necessary for the driver to provide a prop_op(9E) entry point. For example, if a driver maintains a property whose value changes frequently, updating the property with ddi_prop_update(9F) each time it changes may not be efficient. Instead, the driver can maintain a local copy of the property in a C variable. The driver updates the C variable when the value of the property changes and does not call one of the ddi_prop_update(9F) routines. In this case, the prop_op(9E) entry point would need to intercept requests for this property and call one of the ddi_prop_update(9F) routines to update the value of the property before passing the request to ddi_prop_op(9F) to process the property request. See Example 3-4.

Here is the prop_op(9E) prototype:

int xxprop_op(dev_t dev, dev_info_t *dip,
 	ddi_prop_op_t  prop_op, int flags, char *name,
 	caddr_t valuep, int *lengthp);

Example 3-4 shows a simple implementation of the prop_op(9E) routine. This routine intercepts property requests and then uses the existing software property routines to update property values. For a complete description of all the parameters to prop_op(9E), see the manual page.

In Example 3-4, prop_op(9E) intercepts requests for the temperature property. The driver updates a variable in the state structure whenever the property changes but only updates the property when a request is made. It then uses the system routine ddi_prop_op(9F) to process the property request. If the property request is not specific to a device, the driver does not intercept the request. This is indicated when the value of the dev parameter is equal to DDI_DEV_T_ANY (the wildcard device number).

This example adds the following field to the state structure. See "Software State Structure" for more information.

	int		temperature; /* current device temperature */

Example 3-4 prop_op(9E) Routine

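static int
xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
	int flags, char *name, caddr_t valuep, int *lengthp)
{
	minor_t instance;
	struct xxstate *xsp;

	/* Requests not specific to a device are not intercepted. */
	if (dev == DDI_DEV_T_ANY) {
		return (ddi_prop_op(dev, dip, prop_op, flags, name,
			valuep, lengthp));
	}

	instance = getminor(dev);
	xsp = ddi_get_soft_state(statep, instance);
	if (xsp == NULL)
		return (DDI_PROP_NOT_FOUND);
	if (strcmp(name, "temperature") == 0) {
		/* Refresh the property from the value cached in the state. */
		(void) ddi_prop_update_int(dev, dip, name, xsp->temperature);
	}

	/* other cases */

	/* Let the system routine process the request. */
	return (ddi_prop_op(dev, dip, prop_op, flags, name,
		valuep, lengthp));
}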

Driver Layout

Driver code is usually divided into the following files:

Headers (.h files)
Source files (.c files)
Optional configuration files (driver.conf file)

Note -

These files represent a typical driver layout. They are not absolutely required for a driver, as only the final object module matters to the system.

Header Files

Header files define data structures specific to the device (such as a structure representing the device registers), data structures defined by the driver for maintaining state information, defined constants (such as those representing the bits of the device registers), and macros (such as those defining the static mapping between the minor device number and the instance number).
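As an illustration of the defined constants and macros such a header might contain, consider the following fragment. Everything in it is hypothetical: the XX_ names, the register bit assignments, and the choice of three low-order minor-number bits for a unit number are assumptions made for the sketch, not values any real device requires.

```c
/* Hypothetical bits of the device's control/status register. */
#define	XX_CSR_ENABLE	0x01	/* device enabled */
#define	XX_CSR_INTR	0x02	/* interrupt pending */
#define	XX_CSR_ERROR	0x80	/* error summary */

/*
 * Hypothetical static mapping between the minor device number and
 * the instance number: the low three bits select a unit within the
 * device, and the remaining bits hold the instance number.
 */
#define	XX_UNIT_BITS	3
#define	XX_UNIT_MASK	((1U << XX_UNIT_BITS) - 1)

#define	XX_INSTANCE(minor)	((minor) >> XX_UNIT_BITS)
#define	XX_UNIT(minor)		((minor) & XX_UNIT_MASK)
#define	XX_MINOR(instance, unit)	\
	(((instance) << XX_UNIT_BITS) | ((unit) & XX_UNIT_MASK))
```

With this mapping, instance 5 unit 2 has minor number 42, and XX_INSTANCE(42) yields 5 again. A driver that uses the minor number directly as the instance number needs no such macros.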

Some of this information, such as the state structure, may only be needed by the device driver. This information should go in private headers. These header files are included only by the device driver itself.

Any information that an application might require, such as the I/O control commands, should be in public header files. These are included by the driver and any applications that need information about the device.

There is no standard for naming private and public files. One possible convention is to name the private header file xximpl.h and the public header file xxio.h. See Appendix E, Driver Code Layout Structure, for more information.

Source Files

A .c file for a device driver contains the data declarations and the code for the entry points of the driver. It contains the #include statements the driver needs, declares extern references, declares local data, sets up the cb_ops and dev_ops structures, declares and initializes the module configuration section, makes any other necessary declarations, and defines the driver entry points. See Appendix E, Driver Code Layout Structure, for more information.

Configuration Files

See driver.conf(4), sbus(4), pci(4), isa(4), and vme(4).

64-Bit-Safe Device Drivers

The Solaris system can run in 64-bit mode on appropriate hardware and provides a 64-bit kernel with a 64-bit address space for applications. To update a device driver to be 64-bit ready, driver writers need to understand the 32-bit and 64-bit C data type models, know how to use the system derived types and the fundamental C data types, and understand specific driver issues, such as how to enable a 64-bit driver and a 32-bit application to share data structures.

For details on making a device driver ready for a 64-bit environment, see Appendix F, Making a Device Driver 64-Bit Ready.
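The data-sharing issue can be sketched as follows. A structure built from long and pointer members changes size between the 32-bit and 64-bit data models, so a 64-bit driver cannot read a 32-bit application's copy of it directly. One approach, shown here as an assumption-laden sketch rather than the mechanism the DDI provides, is to describe the 32-bit layout with fixed-width types and convert it. The xx_args names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Native layout: long and pointer sizes depend on the data model. */
struct xx_args_native {
	long	count;
	void	*buf;
};

/* Layout as a 32-bit application sees it, in fixed-width types. */
struct xx_args32 {
	int32_t		count;
	uint32_t	buf;	/* 32-bit user address held as an integer */
};

/* Widen a 32-bit application's arguments into the native form. */
static void
xx_args_from32(const struct xx_args32 *a32, struct xx_args_native *a)
{
	a->count = (long)a32->count;		/* sign-extends */
	a->buf = (void *)(uintptr_t)a32->buf;	/* zero-extends */
}
```

The fixed-width structure is 8 bytes under either data model, whereas the native structure is 8 bytes when compiled 32-bit and 16 bytes when compiled 64-bit.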

C Language and Compiler Modes

The Sun WorkShop™ Compiler C version 4.2 provides ANSI C compilers for the Solaris environment. It supports several compilation modes, a number of useful keywords, and function prototypes.

Compiler Modes

Note the following compiler modes.

-Xa (ANSI C Mode)

This mode accepts ANSI C and Sun C compatibility extensions. In case of a conflict between ANSI and Sun C, the compiler issues a warning and uses ANSI C interpretations. This is the default mode.

-Xt (Transition Mode)

This mode accepts ANSI C and Sun C compatibility extensions. In case of a conflict between ANSI and Sun C, a warning is issued and Sun C semantics are used.

Function Prototypes

Function prototypes specify the following information to the compiler:

The type returned by the function
The number of arguments to the function
The type of each argument

Example 3-5 Function Prototypes

static int
xxgetinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
   void **result)
{
	/* definition */
}

static int
xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)
{
	/* definition */
}

This allows the compiler to do more type checking and also to promote the types of the parameters to the type expected by the function. For example, if the compiler knows that a function takes a pointer, casting NULL to that pointer type is no longer necessary. Prototypes are provided for all Solaris 7 DDI/DKI functions, provided the driver includes the proper header file (documented in the manual page for the function).
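The effect of a prototype on argument conversion can be seen in a small user-space sketch. The function names are hypothetical and have no connection to the DDI/DKI.

```c
#include <stddef.h>

/* Prototypes: return type, number of arguments, type of each argument. */
static double xx_half(double x);
static int xx_count_nonnull(const char *a, const char *b);

static double
xx_demo_half(void)
{
	/* The int constant 3 is converted to double before the call. */
	return (xx_half(3));
}

static int
xx_demo_null(void)
{
	/* NULL needs no cast: the prototype says a pointer is expected. */
	return (xx_count_nonnull("xx", NULL));
}

static double
xx_half(double x)
{
	return (x / 2.0);
}

/* Count how many of the two arguments are non-null pointers. */
static int
xx_count_nonnull(const char *a, const char *b)
{
	return ((a != NULL) + (b != NULL));
}
```

Without the prototypes in scope, the call xx_half(3) would pass an int where the function expects a double, and the result would be undefined.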

Keywords

ANSI C provides the following driver-related keywords.

`const`

The const keyword can be used to define constants instead of using #define:

	const int	count = 5;

However, it is most useful when combined with function prototypes. Routines that should not be modifying parameters can define the parameters as constants, and the compiler will then give errors if the parameter is modified. Because C passes parameters by value, most parameters don't need to be declared as constants. If the parameter is a pointer, though, it can be declared to point to a constant object:

	int strlen(const char *s)
 	{
 		...
 	}

Any attempt to change the string by strlen() is an error, and the compiler will catch the error.
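The difference between a parameter passed by value and a pointer to a constant object can be sketched as follows. Both functions are hypothetical illustrations.

```c
#include <stddef.h>

/*
 * n is a private copy of the caller's value, so declaring it const
 * would protect nothing the caller can see.
 */
static int
xx_sum_to(int n)
{
	int sum = 0;

	while (n > 0)
		sum += n--;	/* modifies only the local copy */
	return (sum);
}

/*
 * s points to characters the function promises not to change.
 * The pointer itself is still a local copy and may be advanced.
 */
static size_t
xx_count_char(const char *s, char c)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (*s == c)
			n++;
	}
	/* An assignment such as *s = c would be rejected by the compiler. */
	return (n);
}
```

A caller that passes a variable to xx_sum_to() finds the variable unchanged afterward, and a caller of xx_count_char() can pass a string literal without risk of it being modified.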

`volatile`

The correct use of volatile is necessary to prevent elusive bugs. It instructs the compiler to use exact semantics for the declared objects--in particular, to not optimize away or reorder accesses to the object. There are two instances where device drivers must use the volatile qualifier:

When data refers to an external hardware device register (memory that has side effects other than just storage). Note, however, that if the DDI data access functions are used to access device registers, it is not necessary to use volatile.
When data refers to global memory that is accessible by more than one thread, is not protected by locks, and therefore is relying on the sequencing of memory accesses.

In general, drivers should not qualify a variable as volatile if it is merely accessible by more than one thread and protected from conflicting access by synchronization routines.

The following example uses volatile. A busy flag, which is not protected by a lock, is used to prevent a thread from continuing while the device is busy:
```
	while (busy) {
  		/* do something else */
  	}
```
The testing thread will continue when another thread turns off the busy flag:
```
	busy = 0;
```
However, since busy is accessed frequently in the testing thread, the compiler may optimize the test by placing the value of busy in a register, then test the contents of the register without reading the value of busy in memory before every test. The testing thread would never see busy change and the other thread would only change the value of busy in memory, resulting in deadlock. Declaring the busy flag as volatile forces its value to be read before each test.

Note -
It would probably be preferable to use a condition variable and mutex, discussed under "Condition Variables," rather than the busy flag in this example.
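A user-space analog of the busy flag can be written with a signal handler in place of the second thread. This is a sketch of the declaration only, under the assumption that ISO C signals are an acceptable stand-in for asynchronous modification; it says nothing about kernel threads. The names xx_done() and xx_wait_demo() are hypothetical.

```c
#include <signal.h>

/*
 * The flag is changed outside the normal flow of the polling code,
 * so it is declared volatile. sig_atomic_t is the type ISO C allows
 * a signal handler to write.
 */
static volatile sig_atomic_t busy;

/* Stand-in for the other thread: turns off the busy flag. */
static void
xx_done(int sig)
{
	(void) sig;
	busy = 0;
}

/* Returns the number of times the loop polled the flag, or -1. */
static int
xx_wait_demo(void)
{
	int polls = 0;
	void (*old)(int);

	busy = 1;
	old = signal(SIGINT, xx_done);
	if (old == SIG_ERR)
		return (-1);
	raise(SIGINT);		/* handler runs before raise() returns */
	while (busy)		/* busy is re-read on every test */
		polls++;
	(void) signal(SIGINT, old);
	return (polls);
}
```

Because the handler has already cleared the flag by the time the loop starts, the loop body never runs here. The point of the sketch is the declaration, which obliges the compiler to read busy from memory on each test.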

It is also recommended that the volatile qualifier be used in such a way as to avoid the risk of accidental omission. For example, this code
```
	struct device_reg {
 		volatile uint8_t csr;
 		volatile uint8_t data;
 	};
 	struct device_reg *regp;
```
is recommended over:
```
	struct device_reg {
 		uint8_t csr;
 		uint8_t data;
 	};
 	volatile struct device_reg *regp;
```
Although the two examples are functionally equivalent, the second one requires the writer to ensure that volatile is used in every declaration of type struct device_reg. The first example results in the data being treated as volatile in all declarations and is therefore preferred. Note, as mentioned above, that the use of the DDI data access functions to access device registers makes it unnecessary to qualify variables as volatile.