Writing Device Drivers

Chapter 5 Managing Events and Queueing Tasks

Drivers use events to respond to state changes. This chapter provides the following information on events:

• Introduction to Events

• Using ddi_log_sysevent() to Log Events

• Defining Event Attributes

Drivers use task queues to manage resource dependencies between tasks. This chapter provides the following information about task queues:

• Introduction to Task Queues

• Task Queue Interfaces

• Observing Task Queues

Managing Events

A system often needs to respond to a condition change such as a user action or system request. For example, a device might issue a warning when a component begins to overheat, or might start a movie player when a DVD is inserted into a drive. Device drivers can use a special message called an event to inform the system that a change in state has taken place.

Introduction to Events

An event is a message that a device driver sends to interested entities to indicate that a change of state has taken place. Events are implemented in the Oracle Solaris OS as user-defined, name-value pair structures that are managed using the nvlist* functions. (See the nvlist_alloc(9F) man page.) Events are organized by vendor, class, and subclass. For example, you could define a class for monitoring environmental conditions. An environmental class could have subclasses to indicate changes in temperature, fan status, and power.

When a change in state occurs, the device notifies the driver. The driver then uses the ddi_log_sysevent(9F) function to log this event in a queue called sysevent. The sysevent queue passes events to the user level for handling by either the syseventd daemon or syseventconfd daemon. These daemons send notifications to any applications that have subscribed for notification of the specified event.

Designers of user-level applications can deal with events in either of two ways: an application can subscribe with the syseventd daemon for notification when specific events occur, or an application can be registered with the syseventconfd daemon to be run in response to an event.

This process is illustrated in the following figure.

Figure 5–1 Event Plumbing

[Diagram shows how events are logged into the sysevent queue for notification of user-level applications.]

Using ddi_log_sysevent() to Log Events

Device drivers use the ddi_log_sysevent(9F) interface to generate and log events with the system.

ddi_log_sysevent() Syntax

ddi_log_sysevent() uses the following syntax:

int ddi_log_sysevent(dev_info_t *dip, char *vendor, char *class,
    char *subclass, nvlist_t *attr_list, sysevent_id_t *eidp, int sleep_flag);

where:

dip

A pointer to the dev_info node for this driver.

vendor

A pointer to a string that defines the driver's vendor. Third-party drivers should use their company's stock symbol or a similarly enduring identifier. Oracle-supplied drivers use DDI_VENDOR_SUNW.

class

A pointer to a string defining the event's class. class is a driver-specific value. An example of a class might be a string that represents a set of environmental conditions that affect a device. This value must be understood by the event consumer.

subclass

A driver-specific string that represents a subset of the class argument. For example, within a class that represents environmental conditions, an event subclass might refer to the device's temperature. This value must be intelligible to the event consumer.

attr_list

A pointer to an nvlist_t structure that lists name-value attributes associated with the event. Name-value attributes are driver-defined and can refer to a specific attribute or condition of the device.

For example, consider a device that reads both CD-ROMs and DVDs. That device could have an attribute with the name disc_type and the value equal to either cd_rom or dvd.

As with class and subclass, an event consumer must be able to interpret the name-value pairs.

For more information on name-value pairs and the nvlist_t structure, see Defining Event Attributes, as well as the nvlist_alloc(9F) man page.

If the event has no attributes, then this argument should be set to NULL.

eidp

The address of a sysevent_id_t structure. The sysevent_id_t structure is used to provide a unique identification for the event. ddi_log_sysevent(9F) returns this structure with a system-provided event sequence number and time stamp. See the ddi_log_sysevent(9F) man page for more information on the sysevent_id_t structure.

sleep_flag

A flag that indicates how the caller wants to handle the possibility of resources not being available. If sleep_flag is set to DDI_SLEEP, the driver blocks until the resources become available. With DDI_NOSLEEP, an allocation does not sleep and is not guaranteed to succeed. If DDI_ENOMEM is returned, the driver must retry the operation at a later time.

Even with DDI_SLEEP, other error returns are possible with this interface, such as the system being busy (DDI_EBUSY), the syseventd daemon not responding (DDI_ETRANSPORT), or an attempt to log an event in interrupt context (DDI_ECONTEXT).
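If the driver cannot block, as in interrupt context, it can pass DDI_NOSLEEP and handle an allocation failure explicitly. The following minimal sketch illustrates that pattern; the xx names, the vendor string, and the class and subclass strings are hypothetical, and the attribute list is assumed to have been built earlier.

#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/nvpair.h>
#include <sys/cmn_err.h>

/*
 * Sketch only: log an event without blocking. The xx names and the
 * vendor/class/subclass strings are illustrative assumptions.
 */
static void
xx_log_overheat_event(dev_info_t *dip, nvlist_t *attrs)
{
    sysevent_id_t eid;
    int err;

    err = ddi_log_sysevent(dip, "DDI_VENDOR_XX", "XX_environment",
        "XX_overheat", attrs, &eid, DDI_NOSLEEP);
    if (err == DDI_ENOMEM) {
        /* Resources unavailable; retry later from a safer context. */
        cmn_err(CE_NOTE, "overheat event deferred: no memory");
    } else if (err != DDI_SUCCESS) {
        cmn_err(CE_WARN, "overheat event not logged (%d)", err);
    }
}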

Sample Code for Logging Events

A device driver performs the following tasks to log events:

• Allocate memory for the attribute list with nvlist_alloc(9F).

• Add name-value pairs to the attribute list.

• Use ddi_log_sysevent(9F) to log the event in the sysevent queue.

• Call nvlist_free(9F) when the attribute list is no longer needed.

The following example demonstrates how to use ddi_log_sysevent().


Example 5–1 Calling ddi_log_sysevent()

char *vendor_name = "DDI_VENDOR_JGJG";
char *my_class = "JGJG_event";
char *my_subclass = "JGJG_alert";
nvlist_t *nvl;
/* ... */
nvlist_alloc(&nvl, nvflag, kmflag);
/* ... */
(void) nvlist_add_byte_array(nvl, propname, (uchar_t *)propval, proplen + 1);
/* ... */
if (ddi_log_sysevent(dip, vendor_name, my_class,
    my_subclass, nvl, NULL, DDI_SLEEP) != DDI_SUCCESS)
    cmn_err(CE_WARN, "error logging system event");
nvlist_free(nvl);

Defining Event Attributes

Event attributes are defined as a list of name-value pairs. The Oracle Solaris DDI provides routines and structures for storing information in name-value pairs. Name-value pairs are retained in an nvlist_t structure, which is opaque to the driver. The value for a name-value pair can be a Boolean, an int, a byte, a string, an nvlist, or an array of these data types. An int can be defined as 16 bits, 32 bits, or 64 bits and can be signed or unsigned.

The steps in creating a list of name-value pairs are as follows.

  1. Create an nvlist_t structure with nvlist_alloc(9F).

    The nvlist_alloc() interface takes three arguments:

    • nvlp – Pointer to a pointer to an nvlist_t structure

    • nvflag – Flag to indicate the uniqueness of the names of the pairs. If this flag is set to NV_UNIQUE_NAME_TYPE, any existing pair that matches the name and type of a new pair is removed from the list. If the flag is set to NV_UNIQUE_NAME, then any existing pair with a duplicate name is removed, regardless of its type. Specifying NV_UNIQUE_NAME_TYPE allows a list to contain two or more pairs with the same name as long as their types are different, whereas with NV_UNIQUE_NAME only one instance of a pair name can be in the list. If the flag is not set, then no uniqueness checking is done and the consumer of the list is responsible for dealing with duplicates.

    • kmflag – Flag to indicate the allocation policy for kernel memory. If this argument is set to KM_SLEEP, then the driver blocks until the requested memory is available for allocation. KM_SLEEP allocations might sleep but are guaranteed to succeed. KM_NOSLEEP allocations are guaranteed not to sleep but might return NULL if no memory is currently available.

  2. Populate the nvlist with name-value pairs. For example, to add a string, use nvlist_add_string(9F). To add an array of 32-bit integers, use nvlist_add_int32_array(9F). The nvlist_add_boolean(9F) man page contains a complete list of interfaces for adding pairs.

To deallocate a list, use nvlist_free(9F).

The following code sample illustrates the creation of a name-value list.


Example 5–2 Creating and Populating a Name-Value Pair List

nvlist_t *
create_nvlist(void)
{
    int err;
    char *str = "child";
    int32_t ints[] = {0, 1, 2};
    nvlist_t *nvl;

    err = nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0);    /* allocate list */
    if (err)
        return (NULL);
    if ((nvlist_add_string(nvl, "name", str) != 0) ||
        (nvlist_add_int32_array(nvl, "prop", ints, 3) != 0)) {
        nvlist_free(nvl);
        return (NULL);
    }
    return (nvl);
}

Drivers can retrieve the elements of an nvlist by using a lookup function for that type, such as nvlist_lookup_int32_array(9F), which takes as an argument the name of the pair to be searched for.
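For example, a consumer of the list created in Example 5–2 could retrieve the integer array as shown in the following minimal sketch, which assumes that nvl is the list returned by create_nvlist().

int32_t *ints;
uint_t nelem, i;

/* Look up the "prop" array that Example 5-2 added to the list. */
if (nvlist_lookup_int32_array(nvl, "prop", &ints, &nelem) == 0) {
    /* ints points at nelem values that remain owned by the nvlist. */
    for (i = 0; i < nelem; i++)
        cmn_err(CE_CONT, "prop[%u] = %d\n", i, ints[i]);
}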


Note –

These interfaces work only if either NV_UNIQUE_NAME or NV_UNIQUE_NAME_TYPE is specified when nvlist_alloc(9F) is called. Otherwise, ENOTSUP is returned, because the list can contain multiple pairs with the same name.


A list of name-value pairs can be placed in contiguous memory. This approach is useful for passing the list to an entity that has subscribed for notification. The first step is to get the size of the memory block that is needed for the list with nvlist_size(9F). The next step is to pack the list into the buffer with nvlist_pack(9F). The consumer receiving the buffer's content can unpack the buffer with nvlist_unpack(9F).
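The following minimal sketch shows that sequence with native encoding. The function name is illustrative; nvl is assumed to be a previously created list. When the buffer pointer passed to nvlist_pack() is NULL, the function allocates the buffer itself.

#include <sys/kmem.h>
#include <sys/nvpair.h>

static void
xx_pack_unpack(nvlist_t *nvl)
{
    char *buf = NULL;
    size_t buflen;
    nvlist_t *unpacked;

    /* Get the size of the memory block needed for the packed list. */
    if (nvlist_size(nvl, &buflen, NV_ENCODE_NATIVE) != 0)
        return;

    /* Pack the list; because buf is NULL, nvlist_pack() allocates it. */
    if (nvlist_pack(nvl, &buf, &buflen, NV_ENCODE_NATIVE, KM_SLEEP) != 0)
        return;

    /* A consumer reconstructs an equivalent list from the buffer. */
    if (nvlist_unpack(buf, buflen, &unpacked, KM_SLEEP) == 0)
        nvlist_free(unpacked);

    kmem_free(buf, buflen);    /* free the buffer that nvlist_pack() allocated */
}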

The functions for manipulating name-value pairs are available to both user-level and kernel-level developers. You can find identical man pages for these functions in both man pages section 3: Library Interfaces and Headers and in man pages section 9: DDI and DKI Kernel Functions. For a list of functions that operate on name-value pairs, see the following table.

Table 5–1 Functions for Using Name-Value Pairs

Man Page 

Purpose / Functions 

nvlist_add_boolean(9F)

Add name-value pairs to the list. Functions include: 

nvlist_add_boolean(), nvlist_add_boolean_value(), nvlist_add_byte(), nvlist_add_int8(), nvlist_add_uint8(), nvlist_add_int16(), nvlist_add_uint16(), nvlist_add_int32(), nvlist_add_uint32(), nvlist_add_int64(), nvlist_add_uint64(), nvlist_add_string(), nvlist_add_nvlist(), nvlist_add_nvpair(), nvlist_add_boolean_array(), nvlist_add_int8_array(), nvlist_add_uint8_array(), nvlist_add_nvlist_array(), nvlist_add_byte_array(), nvlist_add_int16_array(), nvlist_add_uint16_array(), nvlist_add_int32_array(), nvlist_add_uint32_array(), nvlist_add_int64_array(), nvlist_add_uint64_array(), nvlist_add_string_array()

nvlist_alloc(9F)

Manipulate the name-value list buffer. Functions include: 

nvlist_alloc(), nvlist_free(), nvlist_size(), nvlist_pack(), nvlist_unpack(), nvlist_dup(), nvlist_merge()

nvlist_lookup_boolean(9F)

Search for name-value pairs. Functions include: 

nvlist_lookup_boolean(), nvlist_lookup_boolean_value(), nvlist_lookup_byte(), nvlist_lookup_int8(), nvlist_lookup_int16(), nvlist_lookup_int32(), nvlist_lookup_int64(), nvlist_lookup_uint8(), nvlist_lookup_uint16(), nvlist_lookup_uint32(), nvlist_lookup_uint64(), nvlist_lookup_string(), nvlist_lookup_nvlist(), nvlist_lookup_boolean_array(), nvlist_lookup_byte_array(), nvlist_lookup_int8_array(), nvlist_lookup_int16_array(), nvlist_lookup_int32_array(), nvlist_lookup_int64_array(), nvlist_lookup_uint8_array(), nvlist_lookup_uint16_array(), nvlist_lookup_uint32_array(), nvlist_lookup_uint64_array(), nvlist_lookup_string_array(), nvlist_lookup_nvlist_array(), nvlist_lookup_pairs()

nvlist_next_nvpair(9F)

Get name-value pair data. Functions include: 

nvlist_next_nvpair(), nvpair_name(), nvpair_type()

nvlist_remove(9F)

Remove name-value pairs. Functions include: 

nvlist_remove(), nvlist_remove_all()

Queueing Tasks

This section discusses how to use task queues to postpone processing of some tasks and delegate their execution to another kernel thread.

Introduction to Task Queues

A common operation in kernel programming is to schedule a task to be performed at a later time, by a different thread. The following examples give some reasons that you might want a different thread to perform a task at a later time:

• Your current code path is time critical, but the additional task you want to perform is not.

• The task might need to take a lock that another thread is currently holding, and you cannot afford to block waiting for that lock.

• The task needs to run at a different priority than the current thread.

In each of these cases, a task is executed in a different context. A different context is usually a different kernel thread with a different set of locks held and possibly a different priority. Task queues provide a generic kernel API for scheduling asynchronous tasks.

A task queue is a list of tasks with one or more threads to service the list. If a task queue has a single service thread, all tasks are guaranteed to execute in the order in which they are added to the list. If a task queue has more than one service thread, the order in which the tasks will execute is not known.


Note –

If the task queue has more than one service thread, make sure that the execution of one task does not depend on the execution of any other task. Dependencies between tasks can cause a deadlock to occur.


Task Queue Interfaces

The following DDI interfaces manage task queues. These interfaces are defined in the sys/sunddi.h header file. See the taskq(9F) man page for more information about these interfaces.

ddi_taskq_t

Opaque handle 

TASKQ_DEFAULTPRI

System default priority 

DDI_SLEEP

Can block for memory 

DDI_NOSLEEP

Cannot block for memory 

ddi_taskq_create()

Create a task queue 

ddi_taskq_destroy()

Destroy a task queue 

ddi_taskq_dispatch()

Add a task to a task queue 

ddi_taskq_wait()

Wait for pending tasks to complete 

ddi_taskq_suspend()

Suspend a task queue 

ddi_taskq_suspended()

Check whether a task queue is suspended 

ddi_taskq_resume()

Resume a suspended task queue 
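The following minimal sketch shows the typical lifecycle of these interfaces in a driver: create the queue in attach(9E), dispatch from interrupt context, and drain and destroy the queue in detach(9E). The xx names and the per-instance state structure are illustrative assumptions. Creating the queue with a single service thread preserves dispatch order, which avoids the dependency problem described in the note above.

#include <sys/sunddi.h>

/* Hypothetical per-instance state; only the task queue field matters here. */
struct xx_state {
    ddi_taskq_t *xx_tq;
};

/* Task function: runs later in a task queue thread, not in interrupt context. */
static void
xx_handle_work(void *arg)
{
    struct xx_state *sp = arg;
    /* ... process the deferred work for sp ... */
}

/* Called from attach(9E): one service thread keeps tasks in dispatch order. */
static int
xx_setup_taskq(dev_info_t *dip, struct xx_state *sp)
{
    sp->xx_tq = ddi_taskq_create(dip, "xx_taskq", 1, TASKQ_DEFAULTPRI, 0);
    return (sp->xx_tq == NULL ? DDI_FAILURE : DDI_SUCCESS);
}

/* Called from the interrupt handler: defer work without blocking. */
static void
xx_intr_defer(struct xx_state *sp)
{
    if (ddi_taskq_dispatch(sp->xx_tq, xx_handle_work, sp,
        DDI_NOSLEEP) != DDI_SUCCESS) {
        /* Dispatch failed; drop the work or count the failure. */
    }
}

/* Called from detach(9E): drain pending tasks, then destroy the queue. */
static void
xx_teardown_taskq(struct xx_state *sp)
{
    ddi_taskq_wait(sp->xx_tq);
    ddi_taskq_destroy(sp->xx_tq);
}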

Observing Task Queues

The typical usage in drivers is to create task queues at attach(9E) time. Most taskq_dispatch() invocations are made from interrupt context.

This section describes two techniques that you can use to monitor the system resources that are consumed by a task queue. Task queues export statistics on the use of system time by task queue threads. Task queues also provide DTrace SDT probes that fire when a task queue starts and finishes execution of a task.

Task Queue Kernel Statistics Counters

Every task queue has an associated set of kstat counters. Examine the output of the following kstat(1M) command:


$ kstat -c taskq
module: unix                            instance: 0     
name:   ata_nexus_enum_tq               class:    taskq
        crtime                          53.877907833
        executed                        0
        maxtasks                        0
        nactive                         1
        nalloc                          0
        priority                        60
        snaptime                        258059.249256749
        tasks                           0
        threads                         1
        totaltime                       0

module: unix                            instance: 0     
name:   callout_taskq                   class:    taskq
        crtime                          0
        executed                        13956358
        maxtasks                        4
        nactive                         4
        nalloc                          0
        priority                        99
        snaptime                        258059.24981709
        tasks                           13956358
        threads                         2
        totaltime                       120247890619

The kstat output shown above includes the following information:

• The number of scheduled (tasks) and executed (executed) tasks

• The number of kernel threads processing the task queue (threads) and their priority (priority)

• The total time (in nanoseconds) spent executing all the tasks (totaltime)

The following example shows how you can use the kstat command to observe how a counter (number of scheduled tasks) increases over time:


$ kstat -p unix:0:callout_taskq:tasks 1 5
unix:0:callout_taskq:tasks      13994642

unix:0:callout_taskq:tasks      13994711

unix:0:callout_taskq:tasks      13994784

unix:0:callout_taskq:tasks      13994855

unix:0:callout_taskq:tasks      13994926

Task Queue DTrace SDT Probes

Task queues provide several useful SDT probes. All the probes described in this section have the following two arguments:

• arg0 – A pointer to the taskq_t structure that identifies the task queue

• arg1 – A pointer to the taskq_ent_t structure that describes the scheduled task

You can use these probes to collect precise timing information about individual task queues and individual tasks being executed through them. For example, the following script prints, every 10 seconds, the functions that were scheduled through task queues:


#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
{
  this->tq  = (taskq_t *)arg0;
  this->tqe = (taskq_ent_t *)arg1;
  @[this->tq->tq_name,
    this->tq->tq_instance,
    this->tqe->tqent_func] = count();
}

tick-10s
{
  printa("%s(%d): %a called %@d times\n", @);
  trunc(@);
}

On a particular machine, the above D script produced the following output:


callout_taskq(1): genunix`callout_execute called 51 times
callout_taskq(0): genunix`callout_execute called 701 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 40 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 256 times
callout_taskq(0): genunix`callout_execute called 702 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 28 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 228 times
callout_taskq(0): genunix`callout_execute called 706 times
callout_taskq(1): genunix`callout_execute called 24 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 141 times
callout_taskq(0): genunix`callout_execute called 708 times