During the development of the example ramdisk driver, the system crashes with a data fault when running mkfs(1M).
    test# mkfs -F ufs -o nsect=8,ntrack=8,free=5 /devices/pseudo/ramdisk:0,raw 1024
    BAD TRAP
    mkfs: Data fault
    kernel read fault at addr=0x4, pme=0x0
    Sync Error Reg 80<INVALID>
    pid=280, pc=0xff2f88b0, sp=0xf01fe750, psr=0xc0, context=2
    g1-g7: ffffff98, 8000000, ffffff80, 0, f01fe9d8, 1, ff1d4900
    Begin traceback... sp = f01fe750
    Called from f0098050,fp=f01fe7b8,args=1180000 f01fe878 ff1ed280 ff1ed280 2 ff2f8884
    Called from f0097d94,fp=f01fe818,args=ff24fd40 f01fe878 f01fe918 0 0 ff2c9504
    Called from f0024e8c,fp=f01fe8b0,args=f01fee90 f01fe918 2 f01fe8a4 f01fee90 3241c
    Called from f0005a28,fp=f01fe930,args=f00c1c54 f01fe98c 1 f00b9d58 0 3
    Called from 15c9c,fp=effffca0,args=5 3241c 200 0 0 7fe00
    End traceback...
    panic: Data fault
When the system comes up, it saves the kernel and the core file, which can then be examined with adb(1):
    # cd /var/crash/test
    # ls
    bounds unix.0 vmcore.0
    # adb -k unix.0 vmcore.0
    physmem 1ece
The first step is to examine the stack to determine where the system was when it crashed:
    $c
    complete_panic(0x0,0x1,0xf00b6c00,0x7d0,0xf00b6c00,0xe3) + 114
    do_panic(0xf00be7ac,0xf0269750,0x4,0xb,0xb,0xf00b6c00) + 1c
    die(0x9,0xf0269704,0x4,0x80,0x1,0xf00be7ac) + 5c
    trap(0x9,0xf0269704,0x4,0x80,0x1,0xf02699d8) + 6b4
This stack trace is not helpful initially, as the ramdisk routines are not on the stack trace. However, there is a useful bit of information: the call to trap(). The first argument to trap() is the trap type. The second argument to trap() is a pointer to a regs structure containing the state of the registers at the time of the trap. See The SPARC Architecture Manual, Version 9 for more information.
    0xf0269704$<regs
    0xf0269704:     psr             pc              npc
                    c0              ff2dd8b0        ff2dd8b4
    0xf0269710:     y               g1              g2              g3
                    e0000000        ffffff98        8000000         ffffff80
    0xf0269720:     g4              g5              g6              g7
                    0               f02699d8        1               ff22c800
    0xf0269730:     o0              o1              o2              o3
                    f02697a0        ff080000        19000           ef709000
    0xf0269740:     o4              o5              o6              o7
                    8000            0               f0269750        7fffffff
Note that the program counter (pc) in the previous example was ff2dd8b0 when the trap occurred. The next step is to determine which routine it is in.
    ff2dd8b0/i
    rd_write+0x2c:  ld      [%o2 + 0x4], %o3
The pc corresponds to rd_write(), which is a routine in the ramdisk driver. The bug is in the ramdisk write routine, and occurs during a load (ld) instruction. This load instruction reads from the address o2+4, so the next step is to determine the value of o2.
Using the $r command to examine the registers is inappropriate because the registers have been reused in the trap routine. Instead, examine the value of o2 from the regs structure.
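As a rough illustration of where the saved o2 lives, the sketch below lays out a structure whose fields follow the column labels printed by the $<regs macro, assuming each field is one 32-bit word. The name regs_dump and the helper saved_o2_addr() are hypothetical and are not the system's actual regs declaration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative layout only: the field names and order mirror the column
 * labels in the $<regs dump, assuming every field is one 32-bit word.
 * This is not the system's actual struct regs declaration.
 */
struct regs_dump {
	uint32_t psr, pc, npc;
	uint32_t y, g1, g2, g3;
	uint32_t g4, g5, g6, g7;
	uint32_t o0, o1, o2, o3;
	uint32_t o4, o5, o6, o7;
};

/* Address of the saved o2, given the address passed to $<regs. */
static uint32_t
saved_o2_addr(uint32_t regs_addr)
{
	return (regs_addr + (uint32_t)offsetof(struct regs_dump, o2));
}
```

Under these assumptions, the saved o2 sits 0x34 bytes into the structure, which falls in the row of the dump that starts at 0xf0269730.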
o2 has the value 19000 in the regs structure. Valid kernel addresses are constrained to be above KERNELBASE by the ABI, so this is probably a user address. The ramdisk does not deal with user addresses; consequently, the ramdisk write routine should not dereference an address below KERNELBASE.
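The address test described above can be sketched as a simple comparison. The value of EXAMPLE_KERNELBASE below is an assumption chosen for illustration; KERNELBASE is platform specific, and the helper name looks_like_kernel_addr() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed value for illustration only. KERNELBASE is platform specific;
 * take the real value from the headers of the system being debugged.
 */
#define	EXAMPLE_KERNELBASE	0xf0000000u

/* True if addr is at or above the assumed kernel base. */
static bool
looks_like_kernel_addr(uint32_t addr)
{
	return (addr >= EXAMPLE_KERNELBASE);
}
```

With this assumed base, 0x19000 is rejected as a kernel address, while the faulting pc ff2dd8b0 is accepted.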
To match the assembly language with the C code, the routine is disassembled up to the problem instruction. Each instruction is 4 bytes in size, so 0x2c/4, or 0xb, instructions precede the one at rd_write+0x2c. A count of 0xc therefore displays the routine up to and including the problem instruction:
    rd_write,c/i
    rd_write:
    rd_write:       sethi   %hi(0xfffffc00), %g1
                    add     %g1, 0x398, %g1         ! ffffff98
                    save    %sp, %g1, %sp
                    st      %i0, [%fp + 0x44]
                    st      %i1, [%fp + 0x48]
                    st      %i2, [%fp + 0x4c]
                    ld      [%fp + 0x44], %o0
                    call    getminor
                    nop
                    st      %o0, [%fp - 0x4]
                    ld      [%fp - 0x8], %o2
                    ld      [%o2 + 0x4], %o3
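The count given to adb follows from the offset of the faulting instruction. A minimal sketch of that arithmetic, using hypothetical helper names, might look like this:

```c
#include <stdint.h>

#define	SPARC_INSTR_SIZE	4u	/* each instruction is 4 bytes */

/* Number of instructions that precede the one at the given byte offset. */
static uint32_t
instrs_before(uint32_t offset)
{
	return (offset / SPARC_INSTR_SIZE);
}

/* Count to give adb so the listing ends with the instruction at offset. */
static uint32_t
adb_count_through(uint32_t offset)
{
	return (instrs_before(offset) + 1);
}
```

For the offset 0x2c, instrs_before() yields 0xb and adb_count_through() yields 0xc, which matches the twelve instructions in the listing.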
The crash occurs a few instructions after a call to getminor(9F). If the ramdisk.c file is examined, the following lines stand out in rd_write:
    int instance = getminor(dev);
    rd_devstate_t *rsp;

    if (uiop->uio_offset >= rsp->ramsize)
            return (EINVAL);
Notice that rsp is never initialized. This is the problem. It is fixed by including the correct call to ddi_get_soft_state(9F) (as the ramdisk driver uses the soft state routines to do state management):
    int instance = getminor(dev);
    rd_devstate_t *rsp = ddi_get_soft_state(rd_state, instance);

    if (uiop->uio_offset >= rsp->ramsize)
            return (EINVAL);
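Because ddi_get_soft_state(9F) returns NULL when no state exists for the requested instance, a driver may also want to check the pointer before using it. The following userland sketch imitates that pattern. Everything in it is a stand-in: mock_get_soft_state(), mock_state, the single-field rd_devstate_t, and rd_write_check() are hypothetical, and returning ENXIO for a missing instance is an assumed convention rather than something taken from ramdisk.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-instance state. */
typedef struct rd_devstate {
	long ramsize;
} rd_devstate_t;

#define	MOCK_MAX_INSTANCES	4
static rd_devstate_t *mock_state[MOCK_MAX_INSTANCES];

/*
 * Userland mock: like ddi_get_soft_state(9F), it returns NULL when the
 * instance has no state. It is not the real DDI routine.
 */
static void *
mock_get_soft_state(int instance)
{
	if (instance < 0 || instance >= MOCK_MAX_INSTANCES)
		return (NULL);
	return (mock_state[instance]);
}

/* The bounds check from rd_write(), with the pointer validated first. */
static int
rd_write_check(int instance, long offset)
{
	rd_devstate_t *rsp = mock_get_soft_state(instance);

	if (rsp == NULL)
		return (ENXIO);		/* assumed error for a missing instance */
	if (offset >= rsp->ramsize)
		return (EINVAL);
	return (0);
}
```

In this sketch, a request for an instance that was never set up fails cleanly with an error code instead of dereferencing an invalid pointer.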
Many data fault panics are the result of bad pointer references.