This section describes debuggers that can be applied to device drivers. Debuggers are described in detail in the Oracle Solaris Modular Debugger Guide.
The kmdb(1) kernel debugger provides typical runtime debugger facilities, such as breakpoints, watchpoints, and single-stepping. The kmdb debugger supersedes kadb, which was available in previous releases. The commands that were available in kadb are also available in kmdb, along with new functionality. Whereas kadb could be loaded only at boot time, kmdb can be loaded at any time. Because of its execution controls, kmdb is preferred for live, interactive debugging.
The mdb(1) modular debugger is more limited than kmdb as a real-time debugger, but mdb has rich facilities for postmortem debugging.
A debug version of the current Oracle Solaris kernel is also available. To enable a debug build of the kernel and install modules provided by Oracle Solaris, issue the following command:
# pkg change-variant variant.debug.osnet=true
The kmdb and mdb debuggers mostly share the same user interface. Many debugging techniques therefore can be applied with the same commands in both tools. Both debuggers support macros, dcmds, and dmods. A dcmd (pronounced dee-command) is a routine in the debugger that can access any of the properties of the current target program. A dcmd can be dynamically loaded at runtime. A dmod, which is short for debugger module, is a package of dcmds that can be loaded to provide non-standard behavior.
Both mdb and kmdb are backward-compatible with legacy debuggers such as adb and kadb. The mdb debugger can execute all of the macros that are available to kmdb as well as any legacy user-defined macros for adb. See the Oracle Solaris Modular Debugger Guide for information about where to find standard macro sets.
Beginning with the current Oracle Solaris release, kernel protection that uses Application Data Integrity (ADI), known as KADI, is supported on SPARC M7, T7, and S7 servers and on SPARC M8 and T8 servers. By default, KADI is enabled on debug kernels and disabled on non-debug kernels.
To manage KADI, use the sxadm command. For example, to enable KADI on a non-debug system, type:
$ sxadm enable kadi
Then reboot the Oracle Solaris system for the change to take effect.
To check the status of KADI on a system, type sxadm status. For example:
$ sxadm status
EXTENSION     STATUS                   FLAGS
aslr          enabled (tagged-files)   u-c--
nxstack       enabled (all)            u-c--
nxheap        enabled (tagged-files)   u-c--
adiheap       enabled (tagged-files)   u-c--
adistack      enabled (tagged-files)   u-c--
kadi          disabled                 -kcr-
hw_bti        not supported            -----
See the sxadm(8) man page.
Example 127 Debugging KADI Panics
This example uses the case of an ADI precise mismatch trap panic to show how to debug and analyze ADI-related errors. The analysis relies on the error messages to trace where the problem lies.
The error message contains the following line:
panic[cpu6]/thread=4001840011d70040: ADI precise mismatch trap (cache version 1)
1. Identify the address involved in the panic.
The address might be displayed as follows:
type=1a rp=2a10543f790 addr=18400383e3b18 mmu_fsr=0
addr=0x18400383e3b18
000002a10543f4c0 unix:die+d8 (1016e000, 2a10543f790, 18400383e3b18, 0
2. Identify the in-memory version at the addressed location.
If you are debugging a live system that is at the kmdb prompt, type:
kmdb> 18400383e3b18::adiver
The command output will include the version number.
If you are not debugging a live system that is at the kmdb prompt, see if the error message itself already identifies the version at the addressed location.
For example, in the error message, the version is indicated as follows:
panic[cpu6]/thread=4001840011d70040: ADI precise mismatch trap (cache version 1)
Version 1 is used by KADI for freed memory and for kmem_alloc metadata.
A mismatch trap against Version 1 can indicate a use-after-free error, a stray pointer error, or a buffer overrun error. Thus, the cause of the panic is narrowed down to these three possibilities.
3. Identify the version used by the instruction.
This step can be accomplished in two ways:
If you have access to a live system or to the crash dump, examine the instruction pointed to by the PC and the register(s) it used. Then calculate the virtual address it used.
The high-order four bits of the address calculated in this way are the version used by the instruction.
For example, the PC pointing at the instruction might appear as follows:
pid=109962, pc=0x10b25dd4, sp=0x2a10543f031, tstate=0x4480081604, context=0x3ee3
If you have access only to error messages, then look for other occurrences of the data address identified in Step 1, while dropping low-order hex digits one by one, until you find matches in the error messages.
In the current case, instead of 18400383e3b18, search for the string 18400383e3b1. You might see the following lines:
000002a10543f830 genunix:as_execute_callback+74 (d00184001c084ca0, a0018400383e3b10, 4, 80000000, 40, 4001840011d70044)
  %l0-3: d00184001c084ca0 000000001084cc00 0000000000000002 000000007b4e7050
  %l4-7: 0000000080000004 001ffe81aaa86000 0000000000000000 0000000000000000
000002a10543f8e0 genunix:as_unmap_impl+2c8 (d00184001c084ca0, a0018400383e3b10, 0, 1ffe81aaa84000, d00184001c084caa, 0)
  %l0-3: 001ffe81aaa84000 0000000000000001 001ffe81aaa86000 c001840013418dd0
  %l4-7: 0000000020909558 0000000020909db0 0000000000002000 0000000000002000
Because 0x18400383e3b18 is in the same cache line as 0x18400383e3b10, and the high-order four bits of the pointer value a0018400383e3b10 are 0xA, the version used by the instruction was 0xA.
4. Identify the object at the data address and determine what the code was doing when the object was accessed.
The message block of Step 3 indicates that at this point in the code, as_unmap_impl() is calling as_execute_callback(), during which the value of %i1 is 0xa0018400383e3b10. In other words, 0xa0018400383e3b10 is an argument to as_execute_callback().
0xa0018400383e3b10 serves as a pointer to an as_callback structure.
In the as_callback structure, offset 8 is ascb_events.
In summary, the panic occurs while as_execute_callback() is accessing the as_callback structure through 0xa0018400383e3b10, specifically at offset 8, which is ascb_events.
Of the three possible causes identified in Step 2, the most likely is a use-after-free error, in which the code accesses a structure that has already been freed. The panic might therefore indicate a race condition.
5. Check the code to see what freed the callback structure and why the structure was accessed after it was freed.
Postmortem analysis offers numerous advantages to driver developers. More than one developer can examine a problem in parallel. Multiple instances of the debugger can be used simultaneously on a single crash dump. The analysis can be performed offline so that the crashed system can be returned to service, if possible. Postmortem analysis enables the use of user-developed debugger functionality in the form of dmods. Dmods can bundle functionality that would be too memory-intensive for real-time debuggers, such as kmdb.
When a system panics while kmdb is loaded, control is passed to the debugger for immediate investigation. If kmdb does not seem appropriate for analyzing the current problem, a good strategy is to use :c to continue execution and save the crash dump. When the system reboots, you can perform postmortem analysis with mdb on the saved crash dump. This process is analogous to debugging an application crash from a process core file.
The kmdb debugger is an interactive kernel debugger that provides the following capabilities:
Control of kernel execution
Inspection of the kernel state
Live modifications to the code
This section assumes that you are already familiar with the kmdb debugger. The focus in this section is on kmdb capabilities that are useful in device driver design. To learn how to use kmdb in detail, refer to the kmdb(1) man page and to the Oracle Solaris Modular Debugger Guide. If you are familiar with kadb, refer to the kadb(8) man page for the major differences between kadb and kmdb.
The kmdb debugger can be loaded and unloaded at will. Instructions for loading and unloading kmdb are in the Oracle Solaris Modular Debugger Guide. For safety and convenience, booting with an alternate kernel is highly encouraged. The boot process is slightly different between the SPARC platform and the x86 platform, as described in this section.
Use either of the following commands to boot a SPARC system with both kmdb and an alternate kernel:
boot kmdb -D kernel.test/sparcv9/unix
boot kernel.test/sparcv9/unix -k
Use either of the following commands to boot an x86 system with both kmdb and an alternate kernel:
b kmdb -D kernel.test/unix
b kernel.test/unix -k
Use the bp command to set a breakpoint, as shown in the following example.
Example 128 Setting Standard Breakpoints in kmdb
[0]> myModule`myBreakpointLocation::bp
If the target module has not been loaded, then an error message that indicates this condition is displayed, and the breakpoint is not created. In this case you can use a deferred breakpoint. A deferred breakpoint activates automatically when the specified module is loaded. Set a deferred breakpoint by specifying the target location after the bp command. The following example demonstrates a deferred breakpoint.
Example 129 Setting Deferred Breakpoints in kmdb
[0]> ::bp myModule`myBreakpointLocation
For more information about using breakpoints, see the Oracle Solaris Modular Debugger Guide. You can also get help by typing either of the following two lines:
> ::help bp
> ::bp dcmd
The kmdb(1) debugger supports macros that can be used to display kernel data structures. Use $M to display kmdb macros. Macros are used in the form:
[ address ] $<macroname
The kmdb macros in the following table are particularly useful to developers of device drivers. For convenience, legacy macro names are shown where applicable.
The ::devinfo dcmd displays a node state that can have one of the following values:
DS_ATTACHED – The driver's attach(9E) routine returned successfully.
DS_BOUND – The node is bound to a driver, but the driver's probe(9E) routine has not yet been called.
DS_INITIALIZED – The parent nexus has assigned a bus address for the driver. The implementation-specific initializations have been completed. The driver's probe(9E) routine has not yet been called at this point.
DS_LINKED – The device node has been linked into the kernel's device tree, but the system has not yet found a driver for this node.
DS_PROBED – The driver's probe(9E) routine returned successfully.
DS_READY – The device is fully configured.
The mdb(1) modular debugger can be applied to the following types of files:
Live operating system components
Operating system crash dumps
User processes
User process core dumps
Object files
The mdb debugger provides sophisticated debugging support for analyzing kernel problems. This section provides an overview of mdb features. For a complete discussion of mdb, refer to the Oracle Solaris Modular Debugger Guide.
Although mdb can be used to alter live kernel state, mdb lacks the kernel execution control that is provided by kmdb. As a result, kmdb is preferred for runtime debugging, and mdb is used more for static situations such as postmortem analysis.
The mdb debugger provides an extensive programming API for implementing debugger modules so that driver developers can implement custom debugging support. The mdb debugger also provides many usability features, such as command-line editing, command history, an output pager, and online help.
The mdb debugger provides a rich set of modules and dcmds. With these tools, you can debug the Oracle Solaris kernel, any associated modules, and device drivers. These facilities enable you to perform tasks such as:
Formulate complex debugging queries
Locate all the memory allocated by a particular thread
Print a visual picture of a kernel STREAM
Determine what type of structure a particular address refers to
Locate leaked memory blocks in the kernel
Analyze memory to locate stack traces
Assemble dcmds into modules called dmods for creating customized operations
To get started, switch to the crash directory and type mdb, specifying a system crash dump, as illustrated in the following example.
Example 130 Invoking mdb on a Crash Dump
% cd /var/crash/testsystem
% ls
bounds     unix.0    vmcore.0
% mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix ufs_log ip usba s1394 cpc nfs ]
> ::status
debugging crash dump vmcore.0 (64-bit) from testsystem
operating system: 5.10 Generic (sun4v)
panic message: zero
dump content: kernel pages only
When mdb responds with the > prompt, you can run commands.
To examine the running kernel on a live system, run mdb from the system prompt as follows.
Example 131 Invoking mdb on a Running Kernel
# mdb -k
Loading modules: [ unix krtld genunix ufs_log ip usba s1394 ptm cpc ipc nfs ]
> ::status
debugging live kernel (64-bit) on testsystem
operating system: 5.10 Generic (sun4v)
This section provides examples of useful debugging tasks. The tasks in this section can be performed with either mdb or kmdb unless specifically noted. This section assumes a basic knowledge of the use of kmdb and mdb. Note that the information presented here is dependent on the type of system used.
Caution - Because modifying data in kernel structures can irreversibly destroy data, you should exercise extreme caution. Do not modify or rely on data in structures that are not part of the Oracle Solaris DDI. See the Intro(9S) man page for information about structures that are part of the Oracle Solaris DDI.
The kmdb debugger can display system registers as a group or individually. To display all registers as a group, use $r as shown in the following example.
Example 132 Reading All Registers on a SPARC Processor With kmdb
[0]> $r
g0    0                                 l0    0
g1    100130a4 debug_enter              l1    edd00028
g2    10411c00 tsbmiss_area+0xe00       l2    10449c90
g3    10442000 ti_statetbl+0x1ba        l3    1b
g4    3000061a004                       l4    10474400 ecc_syndrome_tab+0x80
g5    0                                 l5    3b9aca00
g6    0                                 l6    0
g7    2a10001fd40                       l7    0
o0    0                                 i0    0
o1    c                                 i1    10449e50
o2    20                                i2    0
o3    300006b2d08                       i3    10
o4    0                                 i4    0
o5    0                                 i5    b0
sp    2a10001b451                       fp    2a10001b521
o7    1001311c debug_enter+0x78         i7    1034bb24 zsa_xsint+0x2c4
y     0
tstate: 1604  (ccr=0x0, asi=0x0, pstate=0x16, cwp=0x4)
pstate: ag:0 ie:1 priv:1 am:0 pef:1 mm:0 tle:0 cle:0 mg:0 ig:0
winreg: cur:4 other:0 clean:7 cansave:1 canrest:5 wstate:14
tba   0x10000000
pc    edd000d8: ta    %icc,%g0 + 125
npc   edd000dc: nop
The debugger exports each register value to a variable with the same name as the register. Reading the variable returns the current value of the register; writing to the variable changes the value of the associated machine register. The following example reads the %eax register on an x86 machine, sets it to 0, reads it again to confirm the change, and then restores the original value.
Example 133 Reading and Writing Registers on an x86 Machine With kmdb
[0]> <eax=K
        c1e6e0f0
[0]> 0>eax
[0]> <eax=K
        0
[0]> c1e6e0f0>eax
If you need to inspect the registers of a different processor, use the ::cpuregs dcmd. The ID of the processor to be examined can be supplied either as the address to the dcmd or as the value of the –c option, as shown in the following example.
Example 134 Inspecting the Registers of a Different Processor
[0]> 0::cpuregs
%cs = 0x0158            %eax = 0xc1e6e0f0 kmdbmod`kaif_dvec
%ds = 0x0160            %ebx = 0x00000000
The following example switches from processor 0 to processor 3 on a SPARC machine. The %g3 register is inspected and then cleared. To confirm the new value, %g3 is read again.
Example 135 Retrieving the Value of an Individual Register From a Specified Processor
[0]> 3::switch
[3]> <g3=K
        24
[3]> 0>g3
[3]> <g3
        0
The ::findleaks dcmd provides powerful, efficient detection of memory leaks in kernel crash dumps. The full set of kernel-memory debugging features must be enabled for ::findleaks to be effective. For more information, see Setting kmem_flags Debugging Flags. Run ::findleaks during driver development and testing to detect code that leaks memory and thus wastes kernel resources. See Chapter 9, Debugging With the Kernel Memory Allocator, in Oracle Solaris Modular Debugger Guide for a complete discussion of ::findleaks.
The mdb debugger provides a powerful API for implementing debugger facilities that you customize to debug your driver. The Oracle Solaris Modular Debugger Guide explains the programming API in detail.
The SUNWmdbdm package installs sample mdb source code in the directory /usr/demo/mdb. You can use mdb to automate lengthy debugging chores or to help validate that your driver is behaving properly. You can also package your mdb debugging modules with your driver product. With packaging, these facilities are available to service personnel at a customer site.
The Oracle Solaris kernel provides data type information for its structures, which can be inspected with either kmdb or mdb.
The following example demonstrates how to display the data in the scsi_pkt structure.
Example 136 Displaying Kernel Data Structures With a Debugger
> 7079ceb0::print -t 'struct scsi_pkt'
{
    opaque_t pkt_ha_private = 0x7079ce20
    struct scsi_address pkt_address = {
        struct scsi_hba_tran *a_hba_tran = 0x70175e68
        ushort_t a_target = 0x6
        uchar_t a_lun = 0
        uchar_t a_sublun = 0
    }
    opaque_t pkt_private = 0x708db4d0
    int (*)() *pkt_comp = sd_intr
    uint_t pkt_flags = 0
    int pkt_time = 0x78
    uchar_t *pkt_scbp = 0x7079ce74
    uchar_t *pkt_cdbp = 0x7079ce64
    ssize_t pkt_resid = 0
    uint_t pkt_state = 0x37
    uint_t pkt_statistics = 0
    uchar_t pkt_reason = 0
}
The size of a data structure can be useful in debugging. Use the ::sizeof dcmd to obtain the size of a structure, as shown in the following example.
Example 137 Displaying the Size of a Kernel Data Structure
> ::sizeof struct scsi_pkt
sizeof (struct scsi_pkt) = 0x58
The address of a specific member within a structure is also useful in debugging. Several methods are available for determining a member's address.
Use the ::offsetof dcmd to obtain the offset for a given member of a structure, as in the following example.
Example 138 Displaying the Offset to a Kernel Data Structure
> ::offsetof struct scsi_pkt pkt_state
offsetof (struct scsi_pkt, pkt_state) = 0x48
Use the ::print dcmd with the –a option to display the addresses of all members of a structure, as in the following example.
Example 139 Displaying the Relative Addresses of a Kernel Data Structure
> ::print -a struct scsi_pkt
{
    0 pkt_ha_private
    8 pkt_address {
        ...
    }
    18 pkt_private
    ...
}
If an address is specified with ::print in conjunction with the –a option, the absolute address for each member is displayed.
Example 140 Displaying the Absolute Addresses of a Kernel Data Structure
> 10000000::print -a struct scsi_pkt
{
    10000000 pkt_ha_private
    10000008 pkt_address {
        ...
    }
    10000018 pkt_private
    ...
}
The ::print, ::sizeof, and ::offsetof dcmds enable you to debug problems when your driver interacts with the Oracle Solaris kernel.
Caution - This facility provides access to raw kernel data structures. You can examine any structure whether or not that structure appears as part of the DDI. Therefore, you should refrain from relying on any data structure that is not explicitly part of the DDI.
The mdb debugger provides the ::prtconf dcmd for displaying the kernel device tree. For example:
Example 141 Using the ::prtconf Dcmd
> ::prtconf
DEVINFO          NAME
400a2cc7a48      ORCL,SPARC-T4-1B, instance #0 (driver name: rootnex)
    400a6f003a8      scsi_vhci, instance #0 (driver name: scsi_vhci)
        a006009c748      scsiclass,00, instance #4 (driver not attached)
        a006009ca90      scsiclass,00, instance #3 (driver name: sd)
    400a2cc7700      packages (driver not attached)
        400a2cc7070      SUNW,builtin-drivers (driver not attached)
        400a2cc6d28      deblocker (driver not attached)
        400a2cc69e0      disk-label (driver not attached)
        400a2cc6698      terminal-emulator (driver not attached)
        400a2cc6350      dropins (driver not attached)
        400a2cc6008      SUNW,asr (driver not attached)
        400a6f2da50      kbd-translator (driver not attached)
        400a6f2d708      obp-tftp (driver not attached)
        400a6f2d3c0      vdisk-helper-pkg (driver not attached)
        400a6f2d078      vnet-helper-pkg (driver not attached)
        400a6f2cd30      zfs-file-system (driver not attached)
        400a6f2c9e8      hsfs-file-system (driver not attached)
    400a2cc73b8      chosen (driver not attached)
    400a6f2c6a0      openprom (driver not attached)
You can display an individual node by using the ::devinfo dcmd, as shown in the following example.
Example 142 Displaying Device Information for an Individual Node
> 400a2cc7a48::devinfo
400a2cc7a48 ORCL,SPARC-T4-1B, instance #0 (driver name: rootnex)
        Driver properties at a0060015400 (1 reference):
            name='fm-ereport-capable' type=any items=0
            name='fm-errcb-capable' type=any items=0
            name='pm-hardware-state' type=string items=1 value='no-suspend-resume'
        System properties at a0060015310 (1 reference):
            name='MMU_PAGEOFFSET' type=int items=1 value=00001fff
            name='MMU_PAGESIZE' type=int items=1 value=00002000
            name='PAGESIZE' type=int items=1 value=00002000
            name='archfs' type=any items=4 value=fe.6d.e8.48
            name='archive-fstype' type=any items=1 value='hsfs'
            name='bootarchive' type=any items=1 value='/ramdisk-root'
            name='bootargs' type=string items=1 value=''
            name='bootfs' type=any items=4 value=fe.6e.15.98
Use ::prtconf to see where your driver has attached in the device tree, and to display device properties. You can also specify the verbose (–v) flag to ::prtconf to display the properties for each device node, as follows.
Example 143 Using the ::prtconf Dcmd in Verbose Mode
> ::prtconf -v
DEVINFO          NAME
400a2cc7a48      ORCL,SPARC-T4-1B, instance #0 (driver name: rootnex)
        Driver properties at a0060015400 (1 reference):
            name='fm-ereport-capable' type=any items=0
            name='fm-errcb-capable' type=any items=0
            name='pm-hardware-state' type=string items=1 value='no-suspend-resume'
        System properties at a0060015310 (1 reference):
            name='MMU_PAGEOFFSET' type=int items=1 value=00001fff
            name='MMU_PAGESIZE' type=int items=1 value=00002000
            name='PAGESIZE' type=int items=1 value=00002000
            name='archfs' type=any items=4 value=fe.6d.e8.48
            name='archive-fstype' type=any items=1 value='hsfs'
            name='bootarchive' type=any items=1 value='/ramdisk-root'
            name='bootargs' type=string items=1 value=''
            name='bootfs' type=any items=4 value=fe.6e.15.98
            name='bootpath' type=any items=1 value='/pci@400/pci@1/pci@0/pci@c/LSI,sas@0/disk@w5000c500138bfd21,0:a'
            name='elfheader-address' type=int items=1 value=50000000
            name='elfheader-length' type=any items=4 value=00.45.40.00
            name='fm-capable' type=int items=1 value=00000009
            name='fs-package' type=any items=1
Another way to locate instances of your driver is the ::devbindings dcmd. Given a driver name, the command displays a list of all instances of the named driver as demonstrated in the following example.
Example 144 Using the ::devbindings Dcmd to Locate Driver Instances
> ::devbindings dad
300015ce3d8 ide-disk (driver not attached)
300015c9a60 dad, instance #0
        System properties at 0x300015ab400:
            name='lun' type=int items=1 value=00000000
            name='target' type=int items=1 value=00000000
            name='class_prop' type=string items=1 value='data'
            name='type' type=string items=1 value='data'
            name='class' type=string items=1 value='dada'
            ...
300015c9880 dad, instance #1
        System properties at 0x300015ab080:
            name='lun' type=int items=1 value=00000000
            name='target' type=int items=1 value=00000002
            name='class_prop' type=string items=1 value='data'
            name='type' type=string items=1 value='data'
            name='class' type=string items=1 value='dada'
A common problem when debugging a driver is retrieving the soft state for a particular driver instance. The soft state is allocated with the ddi_soft_state_zalloc(9F) routine. The driver can obtain the soft state through ddi_get_soft_state(9F). The name of the soft state pointer is the first argument to ddi_soft_state_init(9F). With this name, you can use mdb to retrieve the soft state for a particular driver instance through the ::softstate dcmd:
> *bst_state::softstate 0x3
702b7578
In this case, ::softstate is used to fetch the soft state for instance 3 of the bst sample driver. This pointer references a bst_soft structure that the driver uses to track state for this instance.
You can use both kmdb and mdb to modify kernel variables or other kernel state. Kernel state modification with mdb should be done with care, because mdb does not stop the kernel before making modifications. Groups of modifications can be made atomically by using kmdb, because kmdb stops the kernel before allowing access by the user. The mdb debugger is capable of making single atomic modifications only.
Be sure to use the proper format specifier to perform the modification. The formats are:
w – Writes the lowest two bytes of the value of each expression to the target, beginning at the location specified by dot
W – Writes the lowest four bytes of the value of each expression to the target, beginning at the location specified by dot
Z – Writes the complete eight bytes of the value of each expression to the target, beginning at the location specified by dot
Use the ::sizeof dcmd to determine the size of the variable to be modified.
The following example overwrites the value of moddebug with the value 0x80000000.
Example 145 Modifying a Kernel Variable With a Debugger
> moddebug/W 0x80000000
moddebug:       0               =       0x80000000