Writing Device Drivers

Example: adb on a Core Dump

During the development of the example ramdisk driver, the system crashes with a data fault when running mkfs(1M).

test# mkfs -F ufs -o nsect=8,ntrack=8,free=5 /devices/pseudo/ramdisk:0,raw 1024
BAD TRAP
mkfs: Data fault
kernel read fault at addr=0x4, pme=0x0
Sync Error Reg 80<INVALID>
pid=280, pc=0xff2f88b0, sp=0xf01fe750, psr=0xc0, context=2
g1-g7: ffffff98, 8000000, ffffff80, 0, f01fe9d8, 1, ff1d4900
Begin traceback... sp = f01fe750
Called from f0098050,fp=f01fe7b8,args=1180000 f01fe878 ff1ed280 ff1ed280 2 ff2f8884
Called from f0097d94,fp=f01fe818,args=ff24fd40 f01fe878 f01fe918 0 0 ff2c9504
Called from f0024e8c,fp=f01fe8b0,args=f01fee90 f01fe918 2 f01fe8a4 f01fee90 3241c
Called from f0005a28,fp=f01fe930,args=f00c1c54 f01fe98c 1 f00b9d58 0 3
Called from 15c9c,fp=effffca0,args=5 3241c 200 0 0 7fe00
End traceback...
panic: Data fault

When the system comes up, it saves the kernel and the core file, which can then be examined with adb(1):

# cd /var/crash/test
# ls
bounds    unix.0    vmcore.0
# adb -k unix.0 vmcore.0
physmem 1ece

The first step is to examine the stack to determine where the system was when it crashed:

$c
complete_panic(0x0,0x1,0xf00b6c00,0x7d0,0xf00b6c00,0xe3) + 114
do_panic(0xf00be7ac,0xf0269750,0x4,0xb,0xb,0xf00b6c00) + 1c
die(0x9,0xf0269704,0x4,0x80,0x1,0xf00be7ac) + 5c
trap(0x9,0xf0269704,0x4,0x80,0x1,0xf02699d8) + 6b4

At first glance this stack trace is not helpful, because none of the ramdisk routines appear in it. It does contain one useful piece of information, however: the call to trap(). The first argument to trap() is the trap type. The second argument is a pointer to a regs structure containing the state of the registers at the time of the trap. See The SPARC Architecture Manual, Version 9 for more information.

0xf0269704$<regs
0xf0269704: psr					pc			npc
			c0		ff2dd8b0			ff2dd8b4
0xf0269710: y						g1			g2			g3
			e0000000			ffffff98			8000000			ffffff80
0xf0269720: g4						g5			g6			g7
			0			f02699d8			1			ff22c800
0xf0269730: o0						o1			o2			o3
			f02697a0			ff080000			19000			ef709000
0xf0269740: o4						o5			o6			o7
			8000			0			f0269750			7fffffff

Note that the program counter (pc) in the regs structure was ff2dd8b0 when the trap occurred. The next step is to determine which routine contains that address.

ff2dd8b0/i
rd_write+0x2c: ld [%o2 + 0x4], %o3

The pc corresponds to rd_write(), which is a routine in the ramdisk driver. The bug is in the ramdisk write routine, and occurs during a load (ld) instruction. This load instruction dereferences the address o2+4, so the next step is to determine the value of o2.


Note -

Using the $r command to examine the registers is inappropriate because the registers have been reused in the trap routine. Instead, examine the value of o2 from the regs structure.


o2 has the value 19000 in the regs structure. Valid kernel addresses are constrained to be above KERNELBASE by the ABI, so this is probably a user address. The ramdisk does not deal with user addresses; consequently, the ramdisk write routine should not dereference an address below KERNELBASE.
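
The address test described above can be sketched in C. This is only an illustration: the value of EXAMPLE_KERNELBASE is an assumption chosen to fit the addresses in this crash dump (the real KERNELBASE is platform dependent and comes from the system headers), and looks_like_kernel_addr() is a hypothetical helper, not a DDI routine.

```c
#include <stdint.h>

/*
 * ASSUMPTION: illustrative value only. The real KERNELBASE is platform
 * dependent and is defined by the system headers.
 */
#define	EXAMPLE_KERNELBASE	0xf0000000u

/*
 * Hypothetical helper: returns nonzero if addr is at or above the assumed
 * kernel base, and zero if it falls in the user address range.
 */
int
looks_like_kernel_addr(uint32_t addr)
{
	return (addr >= EXAMPLE_KERNELBASE);
}
```

Under that assumed base, the o2 value 0x19000 is classified as a user address, while the regs pointer 0xf0269704 is classified as a kernel address.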

To match the assembly language with the C code, the routine is disassembled up to the problem instruction. Each instruction is 4 bytes in size, so 0x2c/4, or 0xb, additional instructions should be displayed after the first one, for a count of 0xc:

rd_write,c/i
rd_write:
rd_write:			sethi		%hi(0xfffffc00), %g1
			add		%g1, 0x398, %g1		 ! ffffff98
			save		%sp, %g1, %sp
			st		%i0, [%fp + 0x44]
			st		%i1, [%fp + 0x48]
			st		%i2, [%fp + 0x4c]
			ld		[%fp + 0x44], %o0
			call		getminor
			nop
			st		%o0, [%fp - 0x4]
			ld		[%fp - 0x8], %o2
			ld		[%o2 + 0x4], %o3
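
The count given to adb in the command above follows from simple arithmetic on the faulting offset. A minimal sketch, assuming fixed 4-byte SPARC instructions as stated earlier; the helper name insns_through_offset() is hypothetical and is not part of adb or the DDI.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one SPARC instruction in bytes (fixed-width encoding). */
#define	SPARC_INSN_BYTES	4u

/*
 * Hypothetical helper: number of instructions to display, starting at the
 * routine entry point, so that the instruction at byte offset `offset` is
 * the last one shown.
 */
uint32_t
insns_through_offset(uint32_t offset)
{
	/* Instruction offsets are always a multiple of the instruction size. */
	assert(offset % SPARC_INSN_BYTES == 0);
	return (offset / SPARC_INSN_BYTES + 1);
}
```

For the fault at rd_write+0x2c, insns_through_offset(0x2c) yields 0xc, which is the count used in rd_write,c/i.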

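The crash occurs a few instructions after a call to getminor(9F). The return value of getminor() is stored at [%fp - 0x4], but %o2 is then loaded from [%fp - 0x8], a stack location that the routine has not yet written. In ramdisk.c, the following lines of rd_write() stand out: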

int instance = getminor(dev);
rd_devstate_t *rsp;

if (uiop->uio_offset >= rsp->ramsize)
	return (EINVAL);

Notice that rsp is never initialized, so the comparison dereferences whatever value happens to be on the stack. This is the problem. The fix is to initialize rsp with the correct call to ddi_get_soft_state(9F), since the ramdisk driver uses the soft state routines to manage its state:

int instance = getminor(dev);
rd_devstate_t *rsp = ddi_get_soft_state(rd_state, instance);

if (uiop->uio_offset >= rsp->ramsize)
	return (EINVAL);

Note -

Many data fault panics are the result of bad pointer references.
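
Because ddi_get_soft_state(9F) returns NULL when no state exists for the requested instance, a driver may also want to check the pointer before dereferencing it. The following is a user-level sketch of that pattern, not the ramdisk driver's actual code: the rd_devstate layout, the mock_ functions, and the choice of ENXIO are all assumptions made so that the example can be compiled and run outside the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-instance state. */
typedef struct rd_devstate {
	long ramsize;		/* size of the ramdisk in bytes */
} rd_devstate_t;

#define	MOCK_MAX_INSTANCES	4

/* Mock soft state table; a slot is NULL until an instance is attached. */
static rd_devstate_t *mock_state[MOCK_MAX_INSTANCES];

/* Mock of attach-time setup: record the state for an instance. */
void
mock_attach(int instance, rd_devstate_t *sp)
{
	if (instance >= 0 && instance < MOCK_MAX_INSTANCES)
		mock_state[instance] = sp;
}

/* Mock lookup that, like ddi_get_soft_state(9F), returns NULL on failure. */
rd_devstate_t *
mock_get_soft_state(int instance)
{
	if (instance < 0 || instance >= MOCK_MAX_INSTANCES)
		return (NULL);
	return (mock_state[instance]);
}

/* Sketch of the checks at the top of a write routine. */
int
mock_rd_write_check(int instance, long offset)
{
	rd_devstate_t *rsp = mock_get_soft_state(instance);

	if (rsp == NULL)
		return (ENXIO);		/* no state for this instance */
	if (offset >= rsp->ramsize)
		return (EINVAL);	/* offset beyond end of ramdisk */
	return (0);
}
```

With this check in place, a lookup that fails produces an error return to the caller instead of a data fault on a bad pointer.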