Kernel Destructive Actions (Solaris Dynamic Tracing Guide)

Kernel Destructive Actions

Some destructive actions are destructive to the entire system. Use these actions with extreme care, because they affect every process on the system and any other system that implicitly or explicitly depends on the affected system's network services.

`breakpoint()`

void breakpoint(void)

The breakpoint() action induces a kernel breakpoint, causing the system to stop and transfer control to the kernel debugger. The kernel debugger will emit a string denoting the DTrace probe that triggered the action. For example, suppose you run the following command:

# dtrace -w -n clock:entry'{breakpoint()}'
dtrace: allowing destructive actions
dtrace: description 'clock:entry' matched 1 probe

On Solaris running on SPARC, the following message might appear on the console:

dtrace: breakpoint action at probe fbt:genunix:clock:entry (ecb 30002765700)
Type  'go' to resume
ok

On Solaris running on x86, the following message might appear on the console:

dtrace: breakpoint action at probe fbt:genunix:clock:entry (ecb d2b97060)
stopped at      int20+0xb:      ret
kmdb[0]:

The address following the probe description is the address of the enabling control block (ECB) within DTrace. You can use this address to determine more details about the probe enabling that induced the breakpoint action.

A mistake with the breakpoint() action may cause it to be called far more often than intended. This behavior might in turn prevent you from even terminating the DTrace consumer that is triggering the breakpoint actions. In this situation, set the kernel integer variable dtrace_destructive_disallow to 1. This setting will disallow all destructive actions on the machine. Apply this setting only in this particular situation.

The exact method for setting dtrace_destructive_disallow will depend on the kernel debugger that you are using. If using the OpenBoot PROM on a SPARC system, use w!:

ok 1 dtrace_destructive_disallow w!
ok

Confirm that the variable has been set using w?:

ok dtrace_destructive_disallow w?
1
ok

Continue by typing go:

ok go

If using kmdb(1) on x86 or SPARC systems, use the 4-byte write modifier (W) with the / formatting dcmd:

kmdb[0]: dtrace_destructive_disallow/W 1
dtrace_destructive_disallow:    0x0             =       0x1
kmdb[0]:

Continue using :c:

kmdb[0]: :c

To re-enable destructive actions after continuing, you will need to explicitly reset dtrace_destructive_disallow back to 0 using mdb(1):

# echo "dtrace_destructive_disallow/W 0" | mdb -kw
dtrace_destructive_disallow:    0x1             =       0x0
#
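
One way to lower the risk of a runaway breakpoint() enabling is to guard the action so that it can fire only once. The following D script is a minimal sketch, not a recipe taken from this guide. The global variable fired is a hypothetical name introduced for illustration, and the probe is the same clock:entry probe used in the earlier example. The guard is not atomic, so on a multiprocessor system two CPUs could in principle pass the predicate before either one sets the flag.

```dtrace
#!/usr/sbin/dtrace -s

/* Allow destructive actions (equivalent to running dtrace with -w). */
#pragma D option destructive

/* Hypothetical guard flag; D global variables start out as zero. */
int fired;

fbt::clock:entry
/!fired/
{
	/* Set the flag first so that later firings skip this clause. */
	fired = 1;
	breakpoint();
}
```

After you resume the system from the debugger, the enabling remains active, but the predicate should keep breakpoint() from being called again.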

`panic()`

void panic(void)

The panic() action causes a kernel panic when triggered. Use this action to force a system crash dump at a time of interest. You can use this action together with ring buffering and postmortem analysis to understand a problem. For more information, see Chapter 11, Buffers and Buffering, and Chapter 37, Postmortem Tracing, respectively. When the panic() action is used, a panic message appears that denotes the probe causing the panic. For example:

  panic[cpu0]/thread=30001830b80: dtrace: panic action at probe
  syscall::mmap:entry (ecb 300000acfc8)

  000002a10050b840 dtrace:dtrace_probe+518 (fffe, 0, 1830f88, 1830f88,
    30002fb8040, 300000acfc8)
    %l0-3: 0000000000000000 00000300030e4d80 0000030003418000 00000300018c0800
    %l4-7: 000002a10050b980 0000000000000500 0000000000000000 0000000000000502
  000002a10050ba30 genunix:dtrace_systrace_syscall32+44 (0, 2000, 5,
    80000002, 3, 1898400)
    %l0-3: 00000300030de730 0000000002200008 00000000000000e0 000000000184d928
    %l4-7: 00000300030de000 0000000000000730 0000000000000073 0000000000000010

  syncing file systems... 2 done
  dumping to /dev/dsk/c0t0d0s1, offset 214827008, content: kernel
  100% done: 11837 pages dumped, compression ratio 4.66, dump
  succeeded
  rebooting...

syslogd(1M) will also emit a message upon reboot:

  Jun 10 16:56:31 machine1 savecore: [ID 570001 auth.error] reboot after panic:
  dtrace: panic action at probe syscall::mmap:entry (ecb 300000acfc8)

The message buffer of the crash dump also contains the probe and ECB responsible for the panic() action.
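
The combination of panic() with ring buffering mentioned earlier might look like the following sketch. It assumes a hypothetical process name, "myapp", and an arbitrary buffer size. The bufpolicy and bufsize options are the ones covered in the Buffers and Buffering chapter. With the ring policy, each CPU's buffer keeps only the most recent records, so the crash dump forced by panic() should contain the system calls that led up to the mmap() call of interest.

```dtrace
#!/usr/sbin/dtrace -s

/* Allow destructive actions (equivalent to running dtrace with -w). */
#pragma D option destructive
/* Keep only the most recent records instead of dropping new ones. */
#pragma D option bufpolicy=ring
/* Illustrative per-CPU buffer size; adjust as needed. */
#pragma D option bufsize=16k

/* Record every system call made by the hypothetical process "myapp". */
syscall:::entry
/execname == "myapp"/
{
	trace(timestamp);
}

/* Force a crash dump the moment "myapp" calls mmap(). */
syscall::mmap:entry
/execname == "myapp"/
{
	panic();
}
```

After the reboot, the traced records can be recovered from the crash dump as described in the Postmortem Tracing chapter.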

`chill()`

void chill(int nanoseconds)

The chill() action causes DTrace to spin for the specified number of nanoseconds. chill() is primarily useful for exploring problems that might be timing related. For example, you can use this action to open race condition windows, or to bring periodic events into or out of phase with one another. Because interrupts are disabled while in DTrace probe context, any use of chill() will induce interrupt latency, scheduling latency, and dispatch latency. Therefore, chill() can cause unexpected systemic effects and should not be used indiscriminately. Because system activity relies on periodic interrupt handling, DTrace will refuse to execute the chill() action for more than 500 milliseconds out of each one-second interval on any given CPU. If the maximum chill() interval is exceeded, DTrace will report an illegal operation error, as shown in the following example:

# dtrace -w -n syscall::open:entry'{chill(500000001)}'
dtrace: allowing destructive actions
dtrace: description 'syscall::open:entry' matched 1 probe
dtrace: 57 errors
CPU     ID                    FUNCTION:NAME
dtrace: error on enabled probe ID 1 (ID 14: syscall::open:entry): \
  illegal operation in action #1

This limit is enforced even if the time is spread across multiple calls to chill(), or multiple DTrace consumers of a single probe. For example, the same error would be generated by the following command:

# dtrace -w -n syscall::open:entry'{chill(250000000); chill(250000001);}'
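
Because the 500-millisecond budget applies to each CPU in each one-second interval, it helps to keep every chill() short and to restrict the enabling with a predicate. The following sketch assumes a hypothetical process name, "myapp", and an arbitrary delay of 1 millisecond. At that delay, more than about 500 firings per second on a single CPU would exhaust the budget and start producing the illegal operation error shown above.

```dtrace
#!/usr/sbin/dtrace -s

/* Allow destructive actions (equivalent to running dtrace with -w). */
#pragma D option destructive

/*
 * Delay each open() call made by the hypothetical process "myapp"
 * by 1 millisecond (1,000,000 nanoseconds) to widen a suspected
 * race window.
 */
syscall::open:entry
/execname == "myapp"/
{
	chill(1000000);
}
```

The predicate keeps the delay away from unrelated processes, which limits the interrupt and scheduling latency that chill() adds to the rest of the system.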