Writing Device Drivers

Device Power Management Model

The following sections describe the details of the device power management model. This model includes the following elements: components, idleness, power levels, dependency, and policy.

Components

A device is power manageable if the power consumption of the device can be reduced when it is idle. Conceptually, a power-manageable device consists of a number of power-manageable hardware units called components.

The device driver notifies the system of the existence of device components and the power levels that they support by creating a pm-components(9) property in its attach(9E) entry point as part of driver initialization.

Most power-manageable devices implement only a single component. An example of a single-component, power-manageable device is a disk whose spindle motor can be stopped to save power when the disk is idle.

If a device has multiple power-manageable units that are separately controllable, it should implement multiple components.

An example of a two-component, power-manageable device is a frame buffer card with a monitor connected to it. The frame buffer electronics are the first component (component 0); their power consumption can be reduced when they are not in use. The monitor is the second component (component 1), which can also enter a lower power mode when not in use. The system considers the combination of frame buffer electronics and monitor to be one device with two components.

Multiple Components

To the power management framework, all components are equal and completely independent of each other. If this is not true for a particular device, the device driver is responsible for ensuring that undesirable state combinations do not occur. For example, in the frame buffer/monitor combination described in the previous section, for each possible power state of the monitor (On, Standby, Suspend, Off), some states of the frame buffer electronics (D0, D1, D2, D3) are not allowed if the device is to work properly. If the monitor is On, the frame buffer must be at D0 (full on). So if the frame buffer driver gets a request to power up the monitor to On while the frame buffer is at D3, it must ask the system to bring the frame buffer back up (by calling pm_raise_power(9F)) before setting the monitor On. If the frame buffer driver gets a request from the system to lower the power of the frame buffer while the monitor is On, it must fail that request.
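The pairing rule above can be expressed as a small decision function. The following is a user-space model, not driver code: the names fb_check_request, FB_COMP, MON_COMP, and LEVEL_ON are hypothetical, the levels use the numbering of Example 9-3 (0 = Off through 3 = On for both components, so the frame buffer's full-on D0 state is assumed to be level 3), and only the one rule stated above is modeled.

```c
#include <assert.h>

/* Level numbering assumed from Example 9-3: 0 = Off ... 3 = On. */
#define FB_COMP         0       /* frame buffer electronics */
#define MON_COMP        1       /* monitor */
#define LEVEL_ON        3       /* full power for either component */

enum fb_action {
        FB_ACT_OK,              /* apply the requested level directly */
        FB_ACT_RAISE_FB,        /* raise the frame buffer to On first */
        FB_ACT_REFUSE           /* fail the request */
};

/*
 * Decide how to handle a request to set `component` to `new_level`,
 * given the current levels of both components.  Only the rule from the
 * text is modeled: the monitor may be On only while the frame buffer is
 * fully on.
 */
enum fb_action
fb_check_request(int fb_level, int mon_level, int component, int new_level)
{
        assert(component == FB_COMP || component == MON_COMP);
        if (component == MON_COMP && new_level == LEVEL_ON &&
            fb_level != LEVEL_ON)
                return (FB_ACT_RAISE_FB);
        if (component == FB_COMP && new_level < LEVEL_ON &&
            mon_level == LEVEL_ON)
                return (FB_ACT_REFUSE);
        return (FB_ACT_OK);
}
```

In a real driver, FB_ACT_RAISE_FB would correspond to calling pm_raise_power(9F) for component 0, and FB_ACT_REFUSE to returning DDI_FAILURE from power(9E), much as Example 9-5 does for a similar inter-component rule.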

Idleness

Each component of a device may be in one of two states: busy or idle. The device driver notifies the framework of changes in the device state by calling pm_busy_component(9F) and pm_idle_component(9F). When components are initially created, they are considered idle.

Power Levels

From the pm-components property exported by the device, the Device Power Management framework knows what power levels the device supports. Power level values must be non-negative integers. The interpretation of power levels is determined by the device driver writer, but they must be listed in monotonically increasing order in the pm-components property, and a power level of 0 is interpreted by the framework to mean off. When the framework must power up a device because of a dependency, it brings each component to its highest power level.

Example 9-1 is an example pm-components entry from the .conf file of a driver that implements a single power-managed component consisting of a disk spindle motor. The disk spindle motor is component 0, and it supports 2 power levels, which represent stopped and spinning at full speed.


Example 9-1 Sample pm-components Entry

pm-components="NAME=Spindle Motor", "0=Stopped", "1=Full Speed";

Example 9-2 shows an example of how Example 9-1 could be implemented in the attach() routine of the driver.


Example 9-2 attach(9E) Routine With pm-components Property

static char *pmcomps[] = {
        "NAME=Spindle Motor",
        "0=Stopped",
        "1=Full Speed"
};

...

static int
xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
...
        /* Export the component and power level descriptions */
        if (ddi_prop_update_string_array(DDI_DEV_T_NONE, dip,
            "pm-components", &pmcomps[0],
            sizeof (pmcomps) / sizeof (char *)) != DDI_PROP_SUCCESS)
                goto failed;
...

Example 9-3 shows a frame buffer that implements two components. Component 0 is the frame buffer electronics that support 4 different power levels. Component 1 represents the state of power management of the attached monitor.


Example 9-3 Multiple Component pm-components Entry

pm-components="NAME=Frame Buffer", "0=Off", "1=Suspend", "2=Standby", "3=On",
        "NAME=Monitor", "0=Off", "1=Suspend", "2=Standby", "3=On";
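The ordering rules for pm-components entries can be checked mechanically. The following user-space sketch is not the framework's own parser, and the function name pmc_count_components is hypothetical. It counts the components in a pm-components-style string array and rejects arrays whose levels are negative, out of order, or appear before any NAME= entry. Requiring at least one level per component is an additional assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the number of components described by a pm-components style
 * string array, or -1 if the array breaks one of these rules: every
 * component starts with a "NAME=" entry, and its levels are "N=desc"
 * entries with non-negative N in strictly increasing order.
 * (At least one level per component is an extra assumption.)
 */
int
pmc_count_components(const char *const *ent, size_t n)
{
        int ncomp = 0;
        long prev = -1;         /* last level seen in this component */
        int nlevels = 0;        /* levels seen in this component */

        assert(ent != NULL || n == 0);
        for (size_t i = 0; i < n; i++) {
                if (strncmp(ent[i], "NAME=", 5) == 0) {
                        if (ncomp > 0 && nlevels == 0)
                                return (-1);    /* empty component */
                        ncomp++;
                        prev = -1;
                        nlevels = 0;
                        continue;
                }
                if (ncomp == 0)
                        return (-1);            /* level before NAME= */
                char *end;
                long lvl = strtol(ent[i], &end, 10);
                if (end == ent[i] || *end != '=' || lvl < 0 || lvl <= prev)
                        return (-1);            /* malformed or unordered */
                prev = lvl;
                nlevels++;
        }
        if (ncomp > 0 && nlevels == 0)
                return (-1);
        return (ncomp);
}
```

Applied to the strings of Example 9-1 it returns 1, and applied to those of Example 9-3 it returns 2.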

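When a device driver is first attached, the framework does not know the power level of the device. A power transition may occur when the driver calls pm_raise_power(9F) or pm_lower_power(9F), when the framework lowers the power level of a component because an idle-time threshold has been exceeded, or when another device changes power and a dependency exists between the two devices.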

Once a power transition has occurred or the driver has informed the framework of the power level, the framework tracks the current power level of each component of the device. The driver can inform the framework of a power level change by calling pm_power_has_changed(9F).

The system calculates a default threshold for each possible transition from one power level to the next lower level, based on the system idleness threshold. These default thresholds can be overridden using dtpower(1M) or power.conf(4). Another default threshold, based on the system idleness threshold, is used when the component power level is unknown.
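The exact formula the framework uses for these defaults is not given here. As an illustration only, the following user-space model assumes that each downward transition gets an equal share of the system idleness threshold, so that a component can step from its highest level to 0 within that threshold; while the level is unknown, it returns a separate, caller-supplied threshold. The names default_threshold and LVL_UNKNOWN are hypothetical.

```c
#include <assert.h>

#define LVL_UNKNOWN     (-1)    /* hypothetical "level not yet known" marker */

/*
 * Model of a per-transition default threshold, in seconds.  Assumption:
 * the system idleness threshold `sys_secs` is split evenly across the
 * nlevels - 1 downward transitions of the component.  Integer division
 * rounds down, so the steps never add up to more than `sys_secs`.
 */
int
default_threshold(int sys_secs, int nlevels, int cur_level, int unknown_secs)
{
        assert(sys_secs >= 0 && nlevels >= 2);
        if (cur_level == LVL_UNKNOWN)
                return (unknown_secs);
        return (sys_secs / (nlevels - 1));
}
```

Under this assumption, a system idleness threshold of 1800 seconds and a component with four levels (as in Example 9-3) give 600 seconds per transition.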

Dependency

A device might depend on one or more other devices. A device depends on another device if it can be powered off only when all the components of all the devices it depends on are also powered off. For example, when the window system is not running, the frame buffer device depends on the keyboard device by default, so the frame buffer components can be powered off only when the keyboard device is powered off.

The power.conf(4) file specifies the dependencies among devices. A parent node in the device tree implicitly depends upon its children. This dependency is handled automatically by the power management framework.
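The dependency rule can be modeled as a check over a dependency list, using the frame buffer and keyboard example. This is a user-space sketch only: struct dep_dev, all_off, and may_power_off are hypothetical names, the dependencies are supplied directly rather than read from power.conf(4), and the implicit dependency of a parent on its children is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device and the devices it depends on. */
struct dep_dev {
        const int *levels;                  /* level of each component */
        size_t ncomp;
        const struct dep_dev *const *deps;  /* devices this one depends on */
        size_t ndeps;
};

/* True if every component of the device is at level 0 (off). */
static bool
all_off(const struct dep_dev *d)
{
        for (size_t c = 0; c < d->ncomp; c++)
                if (d->levels[c] != 0)
                        return (false);
        return (true);
}

/*
 * A device may be powered off only when all components of all the
 * devices it depends on are already off.
 */
bool
may_power_off(const struct dep_dev *d)
{
        assert(d->ndeps == 0 || d->deps != NULL);
        for (size_t i = 0; i < d->ndeps; i++)
                if (!all_off(d->deps[i]))
                        return (false);
        return (true);
}
```

With the frame buffer depending on the keyboard, may_power_off() for the frame buffer stays false until the keyboard's component reaches level 0.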

Policy

If automatic power management is enabled by dtpower(1M) or power.conf(4), then all devices with a pm-components(9) property are automatically power managed. After each component has been idle for a default period, it is automatically brought to its next lower power level. The framework calculates the default period so that the entire device reaches its lowest power state within the system idleness threshold.
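The stepping behavior of the automatic policy can be illustrated with a user-space model. It assumes a fixed per-step period and counts idleness continuously from the moment the component went idle at its highest level; the framework's actual bookkeeping may differ, and the function name level_after_idle is hypothetical.

```c
#include <assert.h>

/*
 * Model of the automatic policy: a component that starts at level `top`
 * drops one level for every `step_secs` seconds of continuous idleness
 * and stops at 0 (off).
 */
int
level_after_idle(int top, int idle_secs, int step_secs)
{
        assert(top >= 0 && idle_secs >= 0 && step_secs > 0);
        int steps = idle_secs / step_secs;      /* completed idle periods */

        return (steps >= top ? 0 : top - steps);
}
```

With a 600-second period, a component at level 3 is still at level 3 after 599 seconds, is at level 2 after 600 seconds, and reaches level 0 after 1800 seconds of idleness.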


Note -

By default, automatic power management is enabled on all SPARC desktop systems first shipped after July 1, 1999. This feature is disabled by default for all other systems. To determine whether automatic power management is enabled on your machine, see the instructions in the power.conf(4) man page.


dtpower(1M) or power.conf(4) may be used to override the defaults calculated by the framework.

Device Power Management Interfaces

A device driver that supports a device with power-manageable components must notify the system of the existence of these components and the power levels that they support by creating a pm-components(9) property. This is typically done from the driver's attach(9E) entry point by calling ddi_prop_update_string_array(9F), but may be done from a driver.conf(4) file instead. See the pm-components(9) man page for details.

Busy-Idle State Transitions

The driver must keep the framework informed of device state transitions from idle to busy or busy to idle. Where these transitions happen is entirely device specific. The transitions from idle to busy and from busy to idle depend on the nature of the device and the abstraction represented by the specific component. For example, SCSI disk target drivers typically export a single component, which represents whether the SCSI target disk drive is spun up or not. It is marked busy whenever there is an outstanding request to the drive and idle when the last queued request finishes. Some components are created and never marked busy (components created by pm-components(9) are created in an idle state). For example, the keyboard and mouse are never marked busy but have their idle time reset each time a keystroke or mouse event is processed.
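The SCSI disk pattern described above can be sketched as a user-space model. The structure and function names (disk_model, disk_request_start, disk_request_done) are hypothetical, and the counters only record where a real driver would call pm_busy_component(9F) and pm_idle_component(9F); no DDI calls are made.

```c
#include <assert.h>

/*
 * Hypothetical model of the SCSI disk pattern: the component is
 * reported busy when the first request is queued and idle when the
 * last queued request finishes.
 */
struct disk_model {
        int outstanding;        /* requests queued but not yet finished */
        int busy;               /* 1 while the component is reported busy */
        int busy_calls;         /* would-be pm_busy_component(9F) calls */
        int idle_calls;         /* would-be pm_idle_component(9F) calls */
};

void
disk_request_start(struct disk_model *d)
{
        if (d->outstanding++ == 0) {    /* idle -> busy transition */
                d->busy = 1;
                d->busy_calls++;
        }
}

void
disk_request_done(struct disk_model *d)
{
        assert(d->outstanding > 0);     /* completion without a request */
        if (--d->outstanding == 0) {    /* busy -> idle transition */
                d->busy = 0;
                d->idle_calls++;
        }
}
```

Because the model reports busy only for the first outstanding request and idle only when the last one completes, every busy call is matched by exactly one idle call.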

The following interfaces notify the power management framework of busy-idle state transitions.

pm_busy_component(9F)

    int pm_busy_component(dev_info_t *dip, int component);

pm_busy_component(9F) marks component as busy. While the component is busy, it will not be powered off. If the component is already powered off, then marking it busy doesn't change its power level. The driver needs to call pm_raise_power(9F) for this purpose. Calls to pm_busy_component(9F) are stacked and require a corresponding number of calls to pm_idle_component(9F) to idle the component.

pm_idle_component(9F)

    int pm_idle_component(dev_info_t *dip, int component);

pm_idle_component(9F) marks component as idle. An idle component is subject to being powered off. pm_idle_component(9F) must be called once for each call to pm_busy_component(9F) in order to idle the component.
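The stacking behavior of these two interfaces can be illustrated with a user-space model of the bookkeeping. The names comp_model, comp_busy, comp_idle, and comp_may_power_off are hypothetical and do not reflect how the framework stores this state; treating an unmatched idle call as an error is a choice of this model only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of busy/idle bookkeeping: busy marks stack,
 * marking a component busy does not change its power level, and only a
 * component with no outstanding busy marks may be powered off.
 */
struct comp_model {
        int busy_count;         /* unmatched busy marks */
        int level;              /* current power level, 0 = off */
};

void
comp_busy(struct comp_model *c)
{
        c->busy_count++;        /* level is deliberately left alone */
}

void
comp_idle(struct comp_model *c)
{
        assert(c->busy_count > 0);      /* model: unmatched idle is an error */
        c->busy_count--;
}

bool
comp_may_power_off(const struct comp_model *c)
{
        return (c->busy_count == 0);
}
```

Two busy calls followed by one idle call leave the component busy; only the second idle call makes it eligible for power-off. The component's level stays at 0 throughout, because raising power is a separate step.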

Device Power State Transitions

A device driver can call pm_raise_power(9F) to request that a component be set to at least a given power level. This is necessary before using a component that has been powered off. For example, if the disk has been powered off, a SCSI disk target driver's read(9E) or write(9E) routine might need to spin it up before completing the read or write. pm_raise_power(9F) requests the power management framework to initiate a device power state transition to a higher power level. Normally, the framework initiates reductions in component power levels. However, a device driver should call pm_lower_power(9F) when detaching, to reduce the power consumption of unused devices as much as possible.
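The read or write path described above can be sketched as a user-space model. It follows the order used in Example 9-5: mark the component busy first, so that it is not powered off underneath the request, and then ask for a raise only if the current level is too low. The names io_comp, io_prepare, and io_finish are hypothetical, and the comments only indicate where the real DDI calls would go.

```c
#include <assert.h>

/* Hypothetical stand-in for the framework's view of one component. */
struct io_comp {
        int level;              /* current power level, 0 = off */
        int busy;               /* stacked busy count */
        int raises;             /* raise requests issued */
};

/* Prepare a component for an I/O request that needs `needed`. */
void
io_prepare(struct io_comp *c, int needed)
{
        c->busy++;                      /* pm_busy_component(9F) goes here */
        if (c->level < needed) {
                c->raises++;            /* pm_raise_power(9F) goes here */
                c->level = needed;      /* power(9E) raises it to `needed` */
        }
}

/* Release the component once the request has completed. */
void
io_finish(struct io_comp *c)
{
        assert(c->busy > 0);
        c->busy--;                      /* pm_idle_component(9F) goes here */
}
```

A second request that arrives while the component is already at the needed level marks it busy again but issues no further raise.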

pm_raise_power(9F)

    int pm_raise_power(dev_info_t *dip, int component, int level);

pm_raise_power(9F) is called when the driver discovers that a component needed for some operation is at a lower power level than the operation requires. This interface arranges for the driver's power(9E) entry point to be called to raise the current power level of the component to at least the level specified in the request. All the devices that depend on this device are also brought back to full power by this call.

pm_lower_power(9F)

    int pm_lower_power(dev_info_t *dip, int component, int level);

pm_lower_power(9F) is called when the device is detaching, once access to the device is no longer needed. It should be called for each component to set each component to its lowest power so that the device uses as little power as possible while it is not in use.

pm_power_has_changed(9F)

    int pm_power_has_changed(dev_info_t *dip, int component, int level);

pm_power_has_changed(9F) is called to notify the framework when a device has made a power transition on its own, or to inform the framework of the power level of a device, for example, after a suspend-resume operation.

Entry Points Used by Device Power Management

The Power Management framework uses the power(9E) entry point.

power(9E)

    int power(dev_info_t *dip, int component, int level);

The system calls the power(9E) entry point (either directly or as a result of a call to pm_raise_power() or pm_lower_power()) when it determines that a component's current power level needs to be changed. The action taken by this entry point is device driver specific. In the example of the SCSI target disk driver mentioned previously, setting the power level to 0 results in sending a SCSI command to spin down the disk, while setting the power level to the full power level results in sending a SCSI command to spin up the disk.

If a power transition will cause the device to lose state, then the driver must ensure that any necessary state is saved in memory so that it can be restored when it is needed again. If a power transition will require that saved state be restored before the device can be used again, then the driver must restore that state. The framework makes no assumptions about which power transitions cause the loss of state or require its restoration for automatically power-managed devices. Example 9-4 shows a sample power(9E) routine.
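The save-and-restore obligation can be illustrated with a user-space model of a device whose registers are lost only at level 0. Which transitions lose state is device specific, so that choice, along with the names dev_model, dev_set_level, and NREGS, is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

#define NREGS   4

/*
 * Hypothetical device model in which register contents are lost at
 * level 0 and kept at every other level.
 */
struct dev_model {
        int level;
        unsigned int regs[NREGS];       /* live hardware registers */
        unsigned int saved[NREGS];      /* copy kept in memory while off */
};

void
dev_set_level(struct dev_model *d, int level)
{
        assert(level >= 0);
        if (d->level > 0 && level == 0) {
                /* save state before the transition that loses it */
                memcpy(d->saved, d->regs, sizeof (d->regs));
                memset(d->regs, 0, sizeof (d->regs));   /* simulate loss */
        } else if (d->level == 0 && level > 0) {
                /* restore saved state before the device is used again */
                memcpy(d->regs, d->saved, sizeof (d->regs));
        }
        d->level = level;
}
```

Transitions between nonzero levels leave the registers alone in this model; only entering and leaving level 0 trigger a save or a restore.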


Example 9-4 power(9E) Routine for Single-Component Device

int
xxpower(dev_info_t *dip, int component, int level)
{
        struct xxstate *xsp;
        int instance;

        instance = ddi_get_instance(dip);
        xsp = ddi_get_soft_state(statep, instance);
        /*
         * Make sure the request is valid
         */
        if (!xx_valid_power_level(component, level))
                return (DDI_FAILURE);
        mutex_enter(&xsp->mu);
        /*
         * If the device is busy, don't lower its power level
         */
        if (xsp->xx_busy[component] &&
            xsp->xx_power_level[component] > level) {
                mutex_exit(&xsp->mu);
                return (DDI_FAILURE);
        }

        if (xsp->xx_power_level[component] != level) {
                /*
                 * device- and component-specific setting of power level
                 * goes here
                 */
                ...
                xsp->xx_power_level[component] = level;
        }
        mutex_exit(&xsp->mu);
        return (DDI_SUCCESS);
}

Example 9-5 is a power(9E) routine for a device with two components, where component 0 must be on when component 1 is on.


Example 9-5 power(9E) Routine for Multiple Component Device

int
xxpower(dev_info_t *dip, int component, int level)
{
        struct xxstate *xsp;
        int instance;

        instance = ddi_get_instance(dip);
        xsp = ddi_get_soft_state(statep, instance);
        /*
         * Make sure the request is valid
         */
        if (!xx_valid_power_level(component, level))
                return (DDI_FAILURE);
        mutex_enter(&xsp->mu);
        /*
         * If the device is busy, don't lower its power level
         */
        if (xsp->xx_busy[component] &&
            xsp->xx_power_level[component] > level) {
                mutex_exit(&xsp->mu);
                return (DDI_FAILURE);
        }

        /*
         * This code implements inter-component dependencies:
         * If we are bringing up component 1 and component 0 is off, we must
         * bring component 0 up first, and if we are asked to shut down
         * component 0 while component 1 is up we must refuse
         */
        if (component == 1 && level > 0 && xsp->xx_power_level[0] == 0) {
                xsp->xx_busy[0]++;
                if (pm_busy_component(dip, 0) != DDI_SUCCESS) {
                        /*
                         * This can only happen if the args to
                         * pm_busy_component() are wrong, or the
                         * pm-components property was not exported
                         * by the driver.
                         */
                        xsp->xx_busy[0]--;
                        mutex_exit(&xsp->mu);
                        cmn_err(CE_WARN, "xxpower pm_busy_component() failed");
                        return (DDI_FAILURE);
                }
                mutex_exit(&xsp->mu);
                if (pm_raise_power(dip, 0, XX_FULL_POWER_0) != DDI_SUCCESS) {
                        /* Undo the busy marking so component 0 can idle */
                        mutex_enter(&xsp->mu);
                        xsp->xx_busy[0]--;
                        mutex_exit(&xsp->mu);
                        (void) pm_idle_component(dip, 0);
                        return (DDI_FAILURE);
                }
                mutex_enter(&xsp->mu);
        }
        if (component == 0 && level == 0 && xsp->xx_power_level[1] != 0) {
                mutex_exit(&xsp->mu);
                return (DDI_FAILURE);
        }
        if (xsp->xx_power_level[component] != level) {
                /*
                 * device- and component-specific setting of power level
                 * goes here
                 */
                ...
                xsp->xx_power_level[component] = level;
        }
        mutex_exit(&xsp->mu);
        return (DDI_SUCCESS);
}