Writing Device Drivers

Post-Mortem Debugging

When kadb is running and the system panics, control is passed to the debugger so that you can investigate the source of the problem. However, kadb is not always the best tool for problem analysis; frequently it is easier to use ':c' to continue execution and allow the system to save a crash dump. When the system reboots, you can perform post-mortem analysis on the saved crash dump. This process is analogous to debugging an application crash from a process core file.

Post-mortem analysis offers several advantages to driver developers: it allows more than one developer to examine a problem in parallel; it allows developers to retrieve information about a problem that occurred in production at a customer site, where interactive debugging is not acceptable; and it is required for certain types of advanced kernel analysis, such as checking for kernel memory leaks.

Getting Started With the Modular Debugger

The modular debugger, mdb, provides sophisticated debugging support for analyzing kernel problems. This section provides an overview of mdb features. For a more complete discussion of mdb, refer to the Solaris Modular Debugger Guide.

mdb command syntax is compatible with the kadb syntax, and mdb can execute all of the kadb (and legacy adb) macros. The macros are stored in /usr/lib/adb and /usr/platform/`uname -i`/lib/adb for 32-bit kernels, and in /usr/lib/adb/sparcv9 and /usr/platform/`uname -i`/lib/adb/sparcv9 for 64-bit kernels.

In addition to macro files, mdb supports debugger commands (or dcmds). These dcmds can be dynamically loaded at runtime from a set of debugger modules. mdb provides a first-class programming API for implementing debugger modules so that driver developers can implement their own custom debugging support. mdb also provides a host of usability features, such as command line editing, command history, an output pager, and online help.

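mdb provides a rich set of modules and dcmds for debugging the Solaris kernel and associated modules and device drivers. The sections that follow cover the facilities most useful to driver authors: displaying data structures, navigating the device tree, retrieving driver soft state, and detecting kernel memory leaks.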


Note –

In earlier versions of the Solaris operating environment, adb(1) was the recommended tool for post-mortem analysis. In the Solaris 9 operating environment, mdb(1) is the recommended tool. It provides an upward-compatible syntax and a feature set that surpasses the commands available from the legacy crash(1M) utility, which has been removed from Solaris 9.


To get started, type mdb and supply it with a system crash dump:

% cd /var/crash/testsystem
% ls
bounds     unix.0    vmcore.0
% mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix ufs_log ip usba s1394 cpc nfs ]
> ::status
debugging crash dump vmcore.0 (64-bit) from testsystem
operating system: 5.9 Generic (sun4u)
panic message: zero
dump content: kernel pages only

When mdb responds with the '>' prompt, it is ready for commands. To examine the running kernel on a live system, type:

# mdb -k
Loading modules: [ unix krtld genunix ufs_log ip usba s1394 ptm cpc ipc nfs ]
> ::status
debugging live kernel (64-bit) on testsystem
operating system: 5.9 Generic (sun4u)

Important mdb Commands

This section provides a tutorial for some of the mdb debugger commands most applicable to driver authors. Note that the information presented here is dependent on the type of system used. A Sun Blade™ 100 workstation running the 64-bit kernel was used to produce these examples.

The Solaris Modular Debugger Guide provides details about each debugger command discussed here, as well as more information about all aspects of mdb. Online help is available from within mdb using the ::help built-in command.

Displaying Data Structures with mdb

mdb provides a powerful facility for displaying kernel data structures, so that earlier kadb(1M) and mdb(1) debugger macros are no longer needed. Starting in Solaris 9, the operating system kernel maintains a highly compressed database of data structure type information in nonpageable system memory. As a result, when a crash occurs, this type information is saved as part of the crash dump.

Here is an example of using the kernel type information to display all of the fields of a scsi_pkt structure:


Example 18–5 Displaying Kernel Information with mdb

> 7079ceb0::print -t 'struct scsi_pkt'
{
    opaque_t pkt_ha_private = 0x7079ce20
    struct scsi_address pkt_address = {
        struct scsi_hba_tran *a_hba_tran = 0x70175e68
        ushort_t a_target = 0x6
        uchar_t a_lun = 0
        uchar_t a_sublun = 0
    }
    opaque_t pkt_private = 0x708db4d0
    int (*)() *pkt_comp = sd_intr
    uint_t pkt_flags = 0
    int pkt_time = 0x78
    uchar_t *pkt_scbp = 0x7079ce74
    uchar_t *pkt_cdbp = 0x7079ce64
    ssize_t pkt_resid = 0
    uint_t pkt_state = 0x37
    uint_t pkt_statistics = 0
    uchar_t pkt_reason = 0
}

Each data structure member is presented along with its type. Nested structures are expanded for easy viewing. ::print can also decode arrays and unions.

It is frequently helpful to discover the size of a particular kernel data structure; doing so is simple, using the ::sizeof dcmd:


> ::sizeof 'struct scsi_pkt'
sizeof (struct scsi_pkt) = 0x58

You can also locate the offset of a field within a data structure:


> ::offsetof 'struct scsi_pkt' pkt_state
offsetof (pkt_state) = 0x48

The -a option may be used to view the offset of each member of a data structure; if no address is specified to ::print, the output begins at address 0, providing the offset of each field:


Example 18–6 mdb: Viewing Data Members

> ::print -at 'struct scsi_pkt'

{
    0 opaque_t pkt_ha_private
    8 struct scsi_address pkt_address {
        8 struct scsi_hba_tran *a_hba_tran
        10 ushort_t a_target
        12 uchar_t a_lun
        13 uchar_t a_sublun
    }
    18 opaque_t pkt_private
    20 int (*)() *pkt_comp
    28 uint_t pkt_flags
    2c int pkt_time
    30 uchar_t *pkt_scbp
    38 uchar_t *pkt_cdbp
    40 ssize_t pkt_resid
    48 uint_t pkt_state
    4c uint_t pkt_statistics
    50 uchar_t pkt_reason
}
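
The offsets that ::print -a reports are the same quantities that the C sizeof and offsetof operators yield at compile time. The following sketch is a userland model, not the kernel header: struct mock_scsi_address and struct mock_scsi_pkt are hypothetical stand-ins whose members copy the names and widths shown in the listing above, with 64-bit pointers represented as uint64_t. The helper functions are likewise invented for illustration. On an ABI that aligns uint64_t to 8 bytes, the computed size and offsets should agree with the mdb output for the 64-bit kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scsi_address (64-bit kernel layout). */
struct mock_scsi_address {
    uint64_t a_hba_tran;    /* a pointer in the real structure */
    uint16_t a_target;
    uint8_t  a_lun;
    uint8_t  a_sublun;
};

/* Hypothetical stand-in for struct scsi_pkt, following the listing above. */
struct mock_scsi_pkt {
    uint64_t pkt_ha_private;
    struct mock_scsi_address pkt_address;
    uint64_t pkt_private;
    uint64_t pkt_comp;      /* a function pointer in the real structure */
    uint32_t pkt_flags;
    int32_t  pkt_time;
    uint64_t pkt_scbp;
    uint64_t pkt_cdbp;
    int64_t  pkt_resid;
    uint32_t pkt_state;
    uint32_t pkt_statistics;
    uint8_t  pkt_reason;
};

/* Address of a member, given the structure's base address and the offset. */
uint64_t
member_addr(uint64_t base, size_t offset)
{
    /* The offset must fall inside the structure. */
    assert(offset < sizeof (struct mock_scsi_pkt));
    return (base + offset);
}

/* Address of pkt_state within a packet located at pkt_base. */
uint64_t
pkt_state_addr(uint64_t pkt_base)
{
    return (member_addr(pkt_base,
        offsetof(struct mock_scsi_pkt, pkt_state)));
}
```

Adding a member offset to the address passed to ::print gives the address of that member in the dump. For the packet at 7079ceb0 shown earlier, pkt_state_addr() adds the pkt_state offset to that base address.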

The ::print, ::sizeof, and ::offsetof dcmds help you debug problems that arise when your driver interacts with the Solaris kernel more quickly.


Caution –

This facility provides access to “raw” kernel data structures. You may examine any structure whether it appears as part of the DDI or not; therefore, refrain from relying on any data structure that is not explicitly part of the DDI.



Note –

These dcmds may only be used with objects that contain compressed symbolic debugging information designed for use with mdb. This information is currently only available for certain Solaris kernel modules. The SUNWzlib (32-bit) or SUNWzlibx (64-bit) decompression software must be installed in order to process the symbolic debugging information.


Navigating the Device Tree with mdb

mdb provides the ::prtconf dcmd to display the kernel device tree. The output of this dcmd is similar to the output of the prtconf(1M) command:

> ::prtconf
300015d3e08      SUNW,Sun-Blade-100
    300015d3c28      packages (driver not attached)
        300015d3868      SUNW,builtin-drivers (driver not attached)
        300015d3688      deblocker (driver not attached)
        300015d34a8      disk-label (driver not attached)
        300015d32c8      terminal-emulator (driver not attached)
        300015d30e8      obp-tftp (driver not attached)
        300015d2f08      dropins (driver not attached)
        300015d2d28      kbd-translator (driver not attached)
        300015d2b48      ufs-file-system (driver not attached)
    300015d3a48      chosen (driver not attached)
    300015d2968      openprom (driver not attached)
    ...

You can then display a node by passing its address to the ::devinfo dcmd or to a macro such as $<devinfo_brief:

> 300015d3e08::devinfo
300015d3e08      SUNW,Sun-Blade-100
        System properties at 0x300015abdc0:
            name='relative-addressing' type=int items=1
                value=00000001
            name='MMU_PAGEOFFSET' type=int items=1
                value=00001fff
            name='MMU_PAGESIZE' type=int items=1
                value=00002000
            name='PAGESIZE' type=int items=1
                value=00002000
        Driver properties at 0x300015abe00:
            name='pm-hardware-state' type=string items=1
                value='no-suspend-resume'


> 300015d3e08$<devinfo_brief

                ============== devinfo  300015d3e08
                                binding_name
0x300013b3058:                    SUNW,Sun-Blade-100
                                node_name
0x300013b3118:                    SUNW,Sun-Blade-100
                                addr
0x300015b1760:
                                node_state
                                  6                 DS_READY
                                major (hex)
                                  1
0x300015d3e08:  parent          child           sibling
                0               300015d3c28     0
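
The parent, child, and sibling fields in the macro output are the links that tie the device tree together. The following sketch is a simplified userland model of that linkage, not the kernel's dev_info structure: struct mock_node, mock_add_child(), and mock_walk() are hypothetical names. The walk prints an indented listing in the style of ::prtconf by following the first-child link downward and the sibling link across each level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node; the real structure carries many more fields. */
struct mock_node {
    const char *name;
    struct mock_node *parent;
    struct mock_node *child;    /* first child */
    struct mock_node *sibling;  /* next node at the same level */
};

/* Attach child as the last child of parent. */
void
mock_add_child(struct mock_node *parent, struct mock_node *child)
{
    struct mock_node **link = &parent->child;

    while (*link != NULL)
        link = &(*link)->sibling;
    child->parent = parent;
    child->sibling = NULL;
    *link = child;
}

/*
 * Print node, its later siblings, and all of their descendants,
 * indenting four spaces per level. Returns the number of nodes printed.
 */
int
mock_walk(const struct mock_node *node, int depth)
{
    int count = 0;

    assert(depth >= 0);
    for (; node != NULL; node = node->sibling) {
        printf("%*s%s\n", depth * 4, "", node->name);
        count += 1 + mock_walk(node->child, depth + 1);
    }
    return (count);
}
```

Calling mock_walk() on the root node with a depth of 0 visits every node once, in the same parent-before-child order that the ::prtconf listing uses.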

Use ::prtconf to see where your driver has attached in the device tree, and to display device properties. You can also specify the verbose (-v) flag to ::prtconf to display the properties for each device node:

> ::prtconf -v
DEVINFO          NAME                                              
300015d3e08      SUNW,Sun-Blade-100
        System properties at 0x300015abdc0:
            name='relative-addressing' type=int items=1
                value=00000001
            name='MMU_PAGEOFFSET' type=int items=1
                value=00001fff
            name='MMU_PAGESIZE' type=int items=1
                value=00002000
            name='PAGESIZE' type=int items=1
                value=00002000
        Driver properties at 0x300015abe00:
            name='pm-hardware-state' type=string items=1
                value='no-suspend-resume'
        ...
        300015ce798      pci10b9,5229, instance #0
                Driver properties at 0x300015ab980:
                    name='target2-dcd-options' type=any items=4
                        value=00.00.00.a4
                    name='target1-dcd-options' type=any items=4
                        value=00.00.00.a2
                    name='target0-dcd-options' type=any items=4
                        value=00.00.00.a4
                ...

Another way to locate instances of your driver is the ::devbindings dcmd. Given a driver name, it displays a list of all instances of the named driver:

> ::devbindings dad
300015ce3d8      ide-disk (driver not attached)
300015c9a60      dad, instance #0
        System properties at 0x300015ab400:
            name='lun' type=int items=1
                value=00000000
            name='target' type=int items=1
                value=00000000
            name='class_prop' type=string items=1
                value='ata'
            name='type' type=string items=1
                value='ata'
            name='class' type=string items=1
                value='dada'
	...
300015c9880      dad, instance #1
        System properties at 0x300015ab080:
            name='lun' type=int items=1
                value=00000000
            name='target' type=int items=1
                value=00000002
            name='class_prop' type=string items=1
                value='ata'
            name='type' type=string items=1
                value='ata'
            name='class' type=string items=1
                value='dada'
	...

Retrieving Driver Soft State Information

A common problem when debugging a driver is retrieving the soft state for a particular driver instance. The soft state is allocated with the ddi_soft_state_zalloc(9F) routine and obtained by drivers using ddi_get_soft_state(9F). If you know the name of the soft state pointer (the first argument to ddi_soft_state_init(9F)), mdb enables you to retrieve the soft state for a particular driver instance using the ::softstate dcmd:

> *bst_state::softstate 0x3
702b7578

In this case, ::softstate fetches the soft state for instance 3 of the bst sample driver. The address returned is that of the bst_soft structure that the driver uses to track state for this instance.
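
Conceptually, a soft state handle maps an instance number to a per-instance structure. The following sketch is a simplified userland model of that idea and makes no claim about how the kernel implements it: struct mock_soft_state and the mock_* functions are hypothetical, the slot array has a fixed size, and malloc(3C) stands in for kernel allocation. It only loosely mirrors the roles of ddi_soft_state_init(9F), ddi_soft_state_zalloc(9F), and ddi_get_soft_state(9F).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland model of a soft state handle. */
struct mock_soft_state {
    size_t item_size;   /* size of each per-instance structure */
    size_t n_items;     /* number of slots in the array */
    void **items;       /* items[instance] is NULL until allocated */
};

/* Create a handle with n_items empty slots; returns NULL on failure. */
struct mock_soft_state *
mock_soft_state_init(size_t item_size, size_t n_items)
{
    struct mock_soft_state *ss;

    if (item_size == 0 || n_items == 0)
        return (NULL);
    if ((ss = malloc(sizeof (*ss))) == NULL)
        return (NULL);
    if ((ss->items = calloc(n_items, sizeof (void *))) == NULL) {
        free(ss);
        return (NULL);
    }
    ss->item_size = item_size;
    ss->n_items = n_items;
    return (ss);
}

/* Allocate zeroed state for one instance; 0 on success, -1 on failure. */
int
mock_soft_state_zalloc(struct mock_soft_state *ss, size_t instance)
{
    if (instance >= ss->n_items || ss->items[instance] != NULL)
        return (-1);
    ss->items[instance] = calloc(1, ss->item_size);
    return (ss->items[instance] != NULL ? 0 : -1);
}

/* Look up the state for an instance; NULL if none was allocated. */
void *
mock_get_soft_state(const struct mock_soft_state *ss, size_t instance)
{
    assert(ss != NULL);
    if (instance >= ss->n_items)
        return (NULL);
    return (ss->items[instance]);
}

/* Free every allocated instance and the handle itself. */
void
mock_soft_state_fini(struct mock_soft_state *ss)
{
    size_t i;

    for (i = 0; i < ss->n_items; i++)
        free(ss->items[i]);
    free(ss->items);
    free(ss);
}
```

In this model, the lookup that ::softstate performs corresponds to mock_get_soft_state(): given the handle and an instance number, it yields the address of that instance's structure, or nothing if the instance was never allocated.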

Detecting Kernel Memory Leaks

The ::findleaks dcmd provides powerful and efficient detection of memory leaks in kernel crash dumps. The full set of kernel-memory debugging features should be enabled for ::findleaks to be effective. For more information, see the discussion of kmem_flags. Running ::findleaks during driver development and testing can detect code that leaks memory and wastes kernel resources. See “Debugging With the Kernel Memory Allocator” in the Solaris Modular Debugger Guide for a complete discussion of ::findleaks.


Note –

Use ::findleaks to detect and eliminate kernel memory leaks caused by your code. Code that leaks kernel memory can render the system vulnerable to denial-of-service attacks.
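
A common source of leaks is an error path that returns before releasing a buffer. The following userland sketch illustrates that pattern only; it does not describe how ::findleaks operates. All names are hypothetical, malloc(3C) stands in for kernel allocation, and the outstanding-allocation counter is a teaching device for making the leak visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static long mock_outstanding;   /* allocations not yet freed */

/* Allocate memory and count the allocation. */
void *
mock_alloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        mock_outstanding++;
    return (p);
}

/* Free memory obtained from mock_alloc() and update the count. */
void
mock_free(void *p)
{
    if (p != NULL) {
        assert(mock_outstanding > 0);
        mock_outstanding--;
        free(p);
    }
}

/* Number of buffers allocated but not yet freed. */
long
mock_outstanding_allocs(void)
{
    return (mock_outstanding);
}

/* Leaky version: the failure path returns without freeing buf. */
int
mock_setup_leaky(int fail)
{
    void *buf = mock_alloc(128);

    if (buf == NULL)
        return (-1);
    if (fail)
        return (-1);    /* BUG: buf is never freed */
    mock_free(buf);
    return (0);
}

/* Corrected version: every path releases buf before returning. */
int
mock_setup_fixed(int fail)
{
    void *buf = mock_alloc(128);
    int rv = 0;

    if (buf == NULL)
        return (-1);
    if (fail)
        rv = -1;
    mock_free(buf);
    return (rv);
}
```

Each failing call to mock_setup_leaky() strands one more buffer, which is why a leak on a path that a caller can trigger repeatedly can gradually exhaust kernel memory.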


Writing Debugger Commands with mdb

mdb provides a powerful API for implementing new debugger facilities that you can use to debug your driver. The Solaris Modular Debugger Guide explains the programming API in more detail, and the SUNWmdbdm package installs sample mdb source code in the directory /usr/demo/mdb. You can use mdb to automate lengthy debugging chores or help to validate that your driver is behaving properly. You can also package your mdb debugging modules with your driver product so that these facilities will be available to service personnel at a customer site.