Writing Device Drivers     Oracle Solaris 11.1 Information Library
Device Power Management Model

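The following sections describe the details of the device power management model. This model includes the following elements, each covered in its own section: components, busy and idle states, power levels, dependencies, automatic power management, the device power management interfaces, and the power() entry point.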

Power Management Components

A device is power manageable if the power consumption of the device can be reduced when the device is idle. Conceptually, a power-manageable device consists of a number of power-manageable hardware units that are called components.

The device driver notifies the system about device components and their associated power levels. Accordingly, the driver creates a pm-components(9P) property in the driver's attach(9E) entry point as part of driver initialization.

Most devices that are power manageable implement only a single component. An example of a single-component, power-manageable device is a disk whose spindle motor can be stopped to save power when the disk is idle.

If a device has multiple power-manageable units that are separately controllable, the device should implement multiple components.

An example of a two-component, power-manageable device is a frame buffer card with a monitor. The frame buffer electronics are the first component (component 0). The power consumption of the frame buffer can be reduced when the frame buffer is not in use. The monitor is the second component (component 1). The monitor can also enter a lower power mode when the monitor is not in use. The system treats the frame buffer electronics and the monitor as one device with two components.

Multiple Power Management Components

To the power management framework, all components are considered equal and completely independent of each other. If the component states are not completely compatible, the device driver must ensure that undesirable state combinations do not occur. For example, a frame buffer/monitor card has the following possible states: D0, D1, D2, and D3. The monitor attached to the card has the following potential states: On, Standby, Suspend, and Off. These states are not necessarily compatible with each other. For example, if the monitor is On, then the frame buffer must be at D0, that is, full on. If the frame buffer driver gets a request to power up the monitor to On while the frame buffer is at D3, the driver must call pm_raise_power(9F) to bring the frame buffer up before setting the monitor On. System requests to lower the power of the frame buffer while the monitor is On must be refused by the driver.
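The compatibility rule in this example can be expressed as a small predicate that a driver consults before accepting a power request. The following is a minimal sketch in plain C, not code from an actual driver. The enum names, their numeric values, and the xx_combo_allowed() function are hypothetical. Only the rule stated above is encoded: a monitor that is On requires the frame buffer to be at D0.

```c
#include <stdbool.h>

/* Hypothetical frame buffer states, named after the D-states in the text. */
enum xx_fb_state { XX_FB_D3, XX_FB_D2, XX_FB_D1, XX_FB_D0 };	/* D0 = full on */

/* Hypothetical monitor states. */
enum xx_mon_state { XX_MON_OFF, XX_MON_SUSPEND, XX_MON_STANDBY, XX_MON_ON };

/*
 * Return true if the combination of states is acceptable.
 * Only one rule is modeled: a monitor that is On needs the
 * frame buffer at D0.
 */
static bool
xx_combo_allowed(enum xx_fb_state fb, enum xx_mon_state mon)
{
	if (mon == XX_MON_ON && fb != XX_FB_D0)
		return (false);
	return (true);
}
```

A driver could apply such a check in both directions: before raising the monitor to On, and before agreeing to lower the frame buffer while the monitor is On.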

Power Management States

Each component of a device can be in one of two states: busy or idle. The device driver notifies the framework of changes in the device state by calling pm_busy_component(9F) and pm_idle_component(9F). When components are initially created, the components are considered idle.

Power Levels

From the pm-components property exported by the device, the Device Power Management framework knows what power levels the device supports. Power-level values must be non-negative integers. The interpretation of power levels is determined by the device driver writer, with one exception: a power level of 0 is interpreted by the framework to mean off. Power levels must be listed in monotonically increasing order in the pm-components property. When the framework must power up a device due to a dependency, the framework sets each component at its highest power level.
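The ordering rules for power levels are easy to check mechanically. The following sketch is a plain C helper, not part of the DDI. The function name xx_levels_valid() is hypothetical, and the sketch assumes that "monotonically increasing" means strictly increasing, so duplicate levels are rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if levels[0..n-1] could serve as the power levels of
 * one component: at least one entry, no negative values, and each
 * value greater than the one before it.
 */
static bool
xx_levels_valid(const int *levels, size_t n)
{
	size_t i;

	if (levels == NULL || n == 0)
		return (false);
	if (levels[0] < 0)
		return (false);
	for (i = 1; i < n; i++) {
		if (levels[i] <= levels[i - 1])
			return (false);
	}
	return (true);
}
```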

The following example shows a pm-components entry from the .conf file of a driver that implements a single power-managed component consisting of a disk spindle motor. The disk spindle motor is component 0. The spindle motor supports two power levels. These levels represent “stopped” and “spinning at full speed.”

Example 12-1 Sample pm-components Entry

pm-components="NAME=Spindle Motor", "0=Stopped", "1=Full Speed";

The following example shows how Example 12-1 could be implemented in the attach() routine of the driver.

Example 12-2 attach(9E) Routine With pm-components Property

static char *pmcomps[] = {
    "NAME=Spindle Motor",
    "0=Stopped",
    "1=Full Speed"
};
/* ... */
static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    /* ... */
    /* Export the component and its power levels to the framework */
    if (ddi_prop_update_string_array(DDI_DEV_T_NONE, dip,
        "pm-components", &pmcomps[0],
        sizeof (pmcomps) / sizeof (char *)) != DDI_PROP_SUCCESS)
        goto failed;
    /* ... */

The following example shows a frame buffer that implements two components. Component 0 is the frame buffer electronics that support four different power levels. Component 1 represents the state of power management of the attached monitor.

Example 12-3 Multiple Component pm-components Entry

pm-components="NAME=Frame Buffer", "0=Off", "1=Suspend", \
    "2=Standby", "3=On",
    "NAME=Monitor", "0=Off", "1=Suspend", "2=Standby", "3=On";

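When a device driver is first attached, the framework does not know the power level of the device. A power transition can occur when the driver calls pm_raise_power(9F) or pm_lower_power(9F), when the framework lowers the power level of a component that has been idle for longer than its threshold, or when a dependency on another device requires a change in power level.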

After a power transition, the framework begins tracking the power level of each component of the device. Tracking also occurs if the driver has informed the framework of the power level. The driver informs the framework of a power level change by calling pm_power_has_changed(9F).

The system calculates a default threshold for each potential power transition. These thresholds are based on the system idleness threshold. Another default threshold based on the system idleness threshold is used when the component power level is unknown.

Power Management Dependencies

Some devices should be powered down only when other devices are also powered down. For example, if a CD-ROM drive is allowed to power down, necessary functions, such as the ability to eject a CD, might be lost.

To prevent a device from powering down independently, you can make that device dependent on another device that is likely to remain powered on. Typically, a device is made dependent upon a frame buffer, because a monitor is generally on whenever a user is utilizing a system.

In /etc/power.conf, a device-dependency entry names a dependent-phys-path followed by a phys-path. Here, dependent-phys-path is the device that is kept powered up, such as the CD-ROM drive. phys-path is the device whose power state is depended on, such as the frame buffer.

The following syntax enables you to indicate dependency in a general fashion:

device-dependency-property property phys-path

Such an entry mandates that any device that exports the named property must be dependent upon the device named by phys-path. Because this dependency applies especially to removable-media devices, /etc/power.conf includes the following line by default:

device-dependency-property  removable-media  /dev/fb

With this syntax, no device that exports the removable-media property can be powered down unless the console frame buffer is also powered down.

For more information, see the removable-media(9P) man page.

Automatic Power Management for Devices

If automatic power management is enabled, then all devices with a pm-components(9P) property automatically use power management. After a component has been idle for a default period, the component is automatically lowered to the next lower power level. The power management framework calculates the default period so that the entire device reaches its lowest power state within the system idleness threshold.
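The "next lower power level" step can be illustrated with a small helper. The following sketch is a user-level model written for this explanation, not framework code. The function name xx_next_lower_level() is hypothetical, and the sketch assumes that the levels array is listed in increasing order, as the pm-components property requires.

```c
#include <stddef.h>

/*
 * Given the supported levels of a component in increasing order,
 * return the closest level below 'current'. If 'current' is already
 * the lowest level, return it unchanged.
 */
static int
xx_next_lower_level(const int *levels, size_t n, int current)
{
	size_t i;
	int next = current;

	for (i = 0; i < n; i++) {
		/* The list is increasing, so the last match is the closest. */
		if (levels[i] < current)
			next = levels[i];
	}
	return (next);
}
```

Under this model, a four-level component that stays idle steps down one level per idle period until it reaches its lowest level.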


Note - By default, automatic power management is enabled on all SPARC desktop systems first shipped after July 1, 1999. This feature is disabled by default for all other systems.


Device Power Management Interfaces

A device driver that supports a device with power-manageable components must create a pm-components(9P) property. This property indicates to the system that the device has power-manageable components. pm-components also tells the system which power levels are available. The driver typically informs the system by calling ddi_prop_update_string_array(9F) from the driver's attach(9E) entry point. An alternative means of informing the system is from a driver.conf(4) file. See the pm-components(9P) man page for details.

Busy-Idle State Transitions

The driver must keep the framework informed of device state transitions from idle to busy or busy to idle. Where these transitions happen is entirely device-specific. The transitions between the busy and idle states depend on the nature of the device and the abstraction represented by the specific component. For example, SCSI disk target drivers typically export a single component, which represents whether the SCSI target disk drive is spun up or not. The component is marked busy whenever an outstanding request to the drive exists. The component is marked idle when the last queued request finishes. Some components are created and never marked busy. For example, components created by pm-components(9P) are created in an idle state.

The pm_busy_component(9F) and pm_idle_component(9F) interfaces notify the power management framework of busy-idle state transitions. The pm_busy_component(9F) call has the following syntax:

int pm_busy_component(dev_info_t *dip, int component);

pm_busy_component(9F) marks component as busy. While the component is busy, that component should not be powered off. If the component is already powered off, then marking that component busy does not change the power level. To raise the power level, the driver needs to call pm_raise_power(9F). Calls to pm_busy_component(9F) are cumulative and require a corresponding number of calls to pm_idle_component(9F) to idle the component.

The pm_idle_component(9F) routine has the following syntax:

int pm_idle_component(dev_info_t *dip, int component);

pm_idle_component(9F) marks component as idle. An idle component is subject to being powered off. pm_idle_component(9F) must be called once for each call to pm_busy_component(9F) in order to idle the component.
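The cumulative behavior of these two calls can be pictured as a counter. The following sketch is a user-level model for illustration only. The structure and the xx_model_*() functions are hypothetical stand-ins and do not reproduce the framework implementation. In particular, returning an error for an idle call without a matching busy call is a choice made for this model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the busy count of one component. */
struct xx_comp_model {
	int busy_count;
};

/* Model of a busy call: each call adds one to the count. */
static void
xx_model_busy(struct xx_comp_model *c)
{
	c->busy_count++;
}

/*
 * Model of an idle call: each call removes one from the count.
 * Returns 0 on success, or -1 if no busy call is outstanding.
 */
static int
xx_model_idle(struct xx_comp_model *c)
{
	if (c->busy_count == 0)
		return (-1);
	c->busy_count--;
	return (0);
}

/* The component is idle only when every busy call has been matched. */
static bool
xx_model_is_idle(const struct xx_comp_model *c)
{
	return (c->busy_count == 0);
}
```

With two outstanding requests, the component in this model remains busy after the first request completes and becomes idle only after the second.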

Device Power State Transitions

A device driver can call pm_raise_power(9F) to request that a component be set to at least a given power level. Setting the power level in this manner is necessary before using a component that has been powered off. For example, the read(9E) routine of a SCSI disk target driver might need to spin up the disk, if the disk has been powered off. The pm_raise_power(9F) function requests the power management framework to initiate a device power state transition to a higher power level. Normally, reductions in component power levels are initiated by the framework. However, a device driver should call pm_lower_power(9F) when detaching, in order to reduce the power consumption of unused devices as much as possible.

Powering down can pose risks for some devices. For example, some tape drives damage tapes when power is removed. Similarly, some disk drives have a limited tolerance for power cycles, because each cycle results in a head landing. Use the no-involuntary-power-cycles(9P) property to notify the system that the device driver should control all power cycles for the device. This approach prevents power from being removed from a device while the device driver is detached unless the device was powered off by a driver's call to pm_lower_power(9F) from its detach(9E) entry point.

The pm_raise_power(9F) function is called when the driver discovers that a component needed for some operation is at an insufficient power level. This interface causes the driver to raise the current power level of the component to the needed level. All the devices that depend on this device are also brought back to full power by this call.
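The usual ordering in an I/O path is to mark the component busy, raise the power level if it is insufficient, access the device, and then mark the component idle. The following sketch models that ordering in user-level C. It does not call the real DDI functions: struct xx_model, xx_model_raise_power(), and xx_model_io() are hypothetical stand-ins, and the two power levels are assumed to match the spindle motor example.

```c
#define	XX_LEVEL_OFF	0
#define	XX_LEVEL_FULL	1

/* Hypothetical model of one component as seen by the driver. */
struct xx_model {
	int busy;	/* outstanding busy calls */
	int level;	/* current power level */
	int raises;	/* number of actual power-up transitions */
};

/*
 * Stand-in for a raise-power request: brings the component to at
 * least 'level' and never lowers it. Returns 0 on success.
 */
static int
xx_model_raise_power(struct xx_model *m, int level)
{
	if (m->level < level) {
		m->level = level;
		m->raises++;
	}
	return (0);
}

/* Model of one I/O request. Returns 0 on success, -1 on failure. */
static int
xx_model_io(struct xx_model *m)
{
	/* Mark busy first so the component is not powered off mid-request. */
	m->busy++;
	if (m->level < XX_LEVEL_FULL &&
	    xx_model_raise_power(m, XX_LEVEL_FULL) != 0) {
		m->busy--;
		return (-1);
	}
	/* Device access would happen here. */
	m->busy--;
	return (0);
}
```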

Call the pm_lower_power(9F) function when the device is detaching once access to the device is no longer needed. Call pm_lower_power(9F) to set each component at the lowest power so that the device uses as little power as possible while not in use. The pm_lower_power() function must be called from the detach() entry point. The pm_lower_power() function has no effect if it is called from any other part of the driver.

The pm_power_has_changed(9F) function is called to notify the framework about a power transition. The transition might be due to the device changing its own power level. The transition might also be due to an operation such as suspend-resume. Like pm_raise_power(9F), pm_power_has_changed(9F) takes the dip of the device, the component number, and the power level as arguments.

power() Entry Point

The power management framework uses the power(9E) entry point.

power() uses the following syntax:

int power(dev_info_t *dip, int component, int level);

When a component's power level needs to be changed, the system calls the power(9E) entry point. The action taken by this entry point is device driver-specific. In the example of the SCSI target disk driver mentioned previously, setting the power level to 0 results in sending a SCSI command to spin down the disk, while setting the power level to the full power level results in sending a SCSI command to spin up the disk.

If a power transition can cause the device to lose state, the driver must save any necessary state in memory for later restoration. If a power transition requires the saved state to be restored before the device can be used again, then the driver must restore that state. The framework makes no assumptions about what power transitions cause the loss of state or require the restoration of state for automatically power-managed devices. The following example shows a sample power() routine.

Example 12-4 Using the power() Routine for a Single-Component Device

int
xxpower(dev_info_t *dip, int component, int level)
{
    struct xxstate *xsp;
    int instance;

    instance = ddi_get_instance(dip);
    xsp = ddi_get_soft_state(statep, instance);
    /*
     * Make sure the request is valid
     */
    if (!xx_valid_power_level(component, level))
        return (DDI_FAILURE);
    mutex_enter(&xsp->mu);
    /*
     * If the device is busy, don't lower its power level
     */
    if (xsp->xx_busy[component] &&
        xsp->xx_power_level[component] > level) {
        mutex_exit(&xsp->mu);
        return (DDI_FAILURE);
    }

    if (xsp->xx_power_level[component] != level) {
        /*
         * device- and component-specific setting of power level
         * goes here
         */
        xsp->xx_power_level[component] = level;
    }
    mutex_exit(&xsp->mu);
    return (DDI_SUCCESS);
}
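Example 12-4 leaves the device-specific step as a comment. For a device that loses state when powered off, that step typically includes saving and restoring state. The following sketch models the idea in user-level C. It is illustrative only: struct xx_model_dev, the register array, and xx_model_set_level() are hypothetical, and the sketch assumes that only a transition to level 0 loses state.

```c
#include <string.h>

#define	XX_NREGS	4

/* Hypothetical model of a device whose registers are lost at level 0. */
struct xx_model_dev {
	int level;			/* current power level, 0 = off */
	unsigned int regs[XX_NREGS];	/* stand-in for device registers */
	unsigned int saved[XX_NREGS];	/* copy kept in memory */
};

static void
xx_model_set_level(struct xx_model_dev *d, int level)
{
	if (d->level == level)
		return;
	if (level == 0) {
		/* State is about to be lost, so save it first. */
		memcpy(d->saved, d->regs, sizeof (d->saved));
		/* Model the loss of register contents at power off. */
		memset(d->regs, 0, sizeof (d->regs));
	} else if (d->level == 0) {
		/* Coming back from off: restore the saved state. */
		memcpy(d->regs, d->saved, sizeof (d->regs));
	}
	d->level = level;
}
```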

The following example is a power() routine for a device with two components, where component 0 must be on when component 1 is on.

Example 12-5 power(9E) Routine for Multiple-Component Device

int
xxpower(dev_info_t *dip, int component, int level)
{
    struct xxstate *xsp;
    int instance;

    instance = ddi_get_instance(dip);
    xsp = ddi_get_soft_state(statep, instance);
    /*
     * Make sure the request is valid
     */
    if (!xx_valid_power_level(component, level))
        return (DDI_FAILURE);
    mutex_enter(&xsp->mu);
    /*
     * If the device is busy, don't lower its power level
     */
    if (xsp->xx_busy[component] &&
        xsp->xx_power_level[component] > level) {
        mutex_exit(&xsp->mu);
        return (DDI_FAILURE);
    }
    /*
     * This code implements inter-component dependencies:
     * If we are bringing up component 1 and component 0 
     * is off, we must bring component 0 up first, and if
     * we are asked to shut down component 0 while component
     * 1 is up we must refuse
     */
    if (component == 1 && level > 0 && xsp->xx_power_level[0] == 0) {
        xsp->xx_busy[0]++;
        if (pm_busy_component(dip, 0) != DDI_SUCCESS) {
            /*
             * This can only happen if the args to 
             * pm_busy_component()
             * are wrong, or pm-components property was not
             * exported by the driver.
             */
            xsp->xx_busy[0]--;
            mutex_exit(&xsp->mu);
            cmn_err(CE_WARN,
                "xxpower pm_busy_component() failed");
            return (DDI_FAILURE);
        }
        mutex_exit(&xsp->mu);
        if (pm_raise_power(dip, 0, XX_FULL_POWER_0) != DDI_SUCCESS)
            return (DDI_FAILURE);
        mutex_enter(&xsp->mu);
    }
    if (component == 0 && level == 0 && xsp->xx_power_level[1] != 0) {
        mutex_exit(&xsp->mu);
        return (DDI_FAILURE);
    }
    if (xsp->xx_power_level[component] != level) {
        /*
         * device- and component-specific setting of power level
         * goes here
         */
        xsp->xx_power_level[component] = level;
    }
    mutex_exit(&xsp->mu);
    return (DDI_SUCCESS);
}