This example shows how kadb(1M) can be used to debug a driver bug. It is taken from the development of the ramdisk sample driver, which exports physical memory as a virtual disk. In this case, the dd(1M) command hangs while trying to copy data onto the device and cannot be aborted. Although a crash dump could be forced, kadb(1M) is used here for illustrative purposes. After logging in to the system remotely, ps(1) is used to determine that the system is still running and that only the dd(1M) command is hung.
At this point, the system is rebooted with kadb, which can now be entered by typing STOP-A on the system console. After the rest of the kernel has loaded, moddebug is patched to see if loading is the problem:
stopped at:
edd000d8:       ta      %icc,%g0 + 125
kadb[0]: moddebug/X
moddebug:
moddebug:       0
kadb[0]: moddebug/W 0x80000000
moddebug:       0x0     =       0x80000000
kadb[0]: :c
modload(1M) is then used to load the driver explicitly, which separates module loading from the first real access to the device:
# modload /home/driver/drv/ramdisk
It loads without errors, so loading is not the problem. The condition is recreated with dd(1M):
# dd if=/dev/zero of=/devices/pseudo/ramdisk@0:c,raw
dd(1M) hangs. At this point, kadb(1M) is entered and the stack examined:
stopped at:
edd000d8:       ta      %icc,%g0 + 125
kadb[0]: $c
intr_vector() + 7dcfc0d8
debug_enter(0,0,10431e50,10,1,b0) + 78
zsa_xsint(80,7044a06c,44,7044a000,ff0113,0) + 278
zs_high_intr(7044a000,1,1,1042f78c,10424680,100949d0) + 20c
sbus_intr_wrapper(704dfad4,0,702bd048,7029cec0,630,10260250) + 30
current_thread(4001fe60,1041a550,10424698,10424698,10150f08,0) + 180
idle(1040b6c0,0,0,1041a550,704d6a98,0) + 54
thread_start(0,0,0,0,0,0) + 4
The presence of idle on the current thread stack indicates that this thread is not the cause of the deadlock. To determine the deadlocked thread, the entire thread list is checked:
kadb[0]: $<threadlist
...
                ============== thread_id        70cef120
70c8b1c0:       process args    dd if=/dev/zero of=/devices/pseudo/ramdisk@0:c,raw
70cef1c8:       lwp             procp           wchan
                70fa9080        70c8aec0        70691fc8
70cef144:       pc              sp
                sema_p+0x290    40313a78
?(70691fc8,10424680,1,1042b99c,10460f8c,70691fc8)
biowait(70691f60,1041a6c4,70691f60,70c385d0,40313bcc,705c73a0) + 8c
default_physio(1042e8fc,200,129,100,70eb5b54,705c73a0) + 3bc
write(2002,70aac1d0,70f9f9ac,200,4,200) + 23c
...
Of all the threads, only one has a stack trace that references the ramdisk driver. The process running dd(1M) appears to be blocked in biowait(9F). The first parameter of biowait(9F) is a pointer to a buf(9S) structure, here 70691f60. The next step is to examine this structure:
kadb[0]: 70691f60$<buf
70691f60:       flags           forw            back
                204129          0               0
70691f6c:       av_forw         av_back         bcount
                0               0               512
70691fa0:       bufsize         error           edev
                0               0               1180000
70691f7c:       un.b_addr       _b_blkno        resid
                710e8000        0               0
70691f94:       proc            iodone          vp
                70c8aec0        0               0
70691f98:       pages
                0
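The flags word in this dump can also be decoded by hand. The following user-space sketch shows one way to do so. It assumes that kadb prints flags in hexadecimal and that the B_* values match those in the Solaris <sys/buf.h> header; both are assumptions that should be checked on the target release. The helper names buf_is_done() and describe_flags() are hypothetical and are not part of the DDI.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * ASSUMED flag values, modeled on Solaris <sys/buf.h>.
 * Check them against the header on the target release.
 */
#define	B_BUSY	0x0001u	/* buffer is in use */
#define	B_DONE	0x0002u	/* I/O finished; set by biodone(9F) */
#define	B_ERROR	0x0004u	/* I/O failed */
#define	B_PHYS	0x0020u	/* physical (raw) I/O */
#define	B_WRITE	0x0100u	/* write request */

/* Hypothetical helper: nonzero if the buffer is marked done. */
static int
buf_is_done(uint32_t flags)
{
	return ((flags & B_DONE) != 0);
}

/* Hypothetical helper: print the names of the known bits that are set. */
static void
describe_flags(uint32_t flags)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} known[] = {
		{ B_BUSY, "B_BUSY" },
		{ B_DONE, "B_DONE" },
		{ B_ERROR, "B_ERROR" },
		{ B_PHYS, "B_PHYS" },
		{ B_WRITE, "B_WRITE" }
	};
	size_t i;

	printf("flags 0x%x:", (unsigned)flags);
	for (i = 0; i < sizeof (known) / sizeof (known[0]); i++) {
		if (flags & known[i].bit)
			printf(" %s", known[i].name);
	}
	printf("\n");
}
```

Under these assumptions, buf_is_done(0x204129) returns 0: the B_DONE bit is clear in the dumped flags value, which would be consistent with a thread that is still waiting in biowait(9F).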
The resid field is 0, which indicates that the transfer is complete. However, physio(9F) is still blocked in biowait(9F). The reference for physio(9F) in the Solaris 8 Reference Manual Collection points out that biodone(9F) should be called to unblock biowait(9F). This is the problem: rd_strategy() never called biodone(9F), so the thread waiting in biowait(9F) was never awakened. Adding a call to biodone(9F) before rd_strategy() returns fixes the problem.
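The shape of the fix can be sketched in a small user-space model. This is not the ramdisk driver source and does not use the real DDI: struct model_buf, model_biodone(), model_rd_strategy(), RD_SIZE, and MODEL_B_DONE are all hypothetical stand-ins chosen for illustration. The point it shows is that the strategy routine marks the buffer done on every path, both success and failure, before it returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	RD_SIZE		4096	/* hypothetical ramdisk size in bytes */
#define	MODEL_B_DONE	0x0002u	/* stand-in for the B_DONE flag */

/* Hypothetical, simplified stand-in for buf(9S). */
struct model_buf {
	unsigned flags;		/* status bits */
	char *addr;		/* data to transfer */
	size_t bcount;		/* bytes requested */
	size_t resid;		/* bytes not transferred */
	size_t offset;		/* byte offset into the ramdisk */
	int error;		/* 0 or an errno value */
};

static char rd_mem[RD_SIZE];	/* memory backing the model ramdisk */

/* Stand-in for biodone(9F): mark the request complete. */
static void
model_biodone(struct model_buf *bp)
{
	bp->flags |= MODEL_B_DONE;
}

/* Model of a write-only strategy routine that always signals completion. */
static int
model_rd_strategy(struct model_buf *bp)
{
	assert(bp != NULL);

	if (bp->offset > RD_SIZE || bp->bcount > RD_SIZE - bp->offset) {
		/* Request does not fit: record the error, transfer nothing. */
		bp->error = ENXIO;
		bp->resid = bp->bcount;
	} else {
		memcpy(rd_mem + bp->offset, bp->addr, bp->bcount);
		bp->resid = 0;
	}

	/* The step that was missing: signal completion on every path. */
	model_biodone(bp);
	return (0);
}
```

In a real driver, the equivalent call is biodone(9F) on the buf(9S) pointer that was passed to the strategy routine. Omitting it on any return path, including an error path, leaves the caller blocked in biowait(9F) in the same way as shown in this example.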