4.2.4 Parameters that Control Kernel Panics

The following parameters control the circumstances under which a kernel panic can occur:

kernel.hung_task_panic

(UEK R3 only) If set to 1, the kernel panics if any kernel or user thread sleeps in the TASK_UNINTERRUPTIBLE state (D state) for more than kernel.hung_task_timeout_secs seconds. A process remains in D state while waiting for I/O to complete. You cannot kill or interrupt a process in this state.

The default value is 0, which disables the panic.

Tip

To diagnose a hung thread, you can examine /proc/PID/stack, which displays the kernel stack for both kernel and user threads.

kernel.hung_task_timeout_secs

(UEK R3 only) Specifies how long a user or kernel thread can remain in D state before a warning message is generated or the kernel panics (if the value of kernel.hung_task_panic is 1). The default value is 120 seconds. A value of 0 disables the timeout.

kernel.nmi_watchdog

If set to 1 (default), enables the non-maskable interrupt (NMI) watchdog thread in the kernel. If you want to use the NMI switch or the OProfile system profiler to generate an undefined NMI, set the value of kernel.nmi_watchdog to 0.

kernel.panic

Specifies the number of seconds after a panic before a system will automatically reset itself.

If the value is 0, the system hangs, which allows you to collect detailed information about the panic for troubleshooting. This is the default value.

To enable automatic reset, set a non-zero value. If you require a memory image (vmcore), allow enough time for Kdump to create this image. The suggested value is 30 seconds, although large systems will require a longer time.

kernel.panic_on_io_nmi

If set to 0 (default), the system tries to continue operations if the kernel detects an I/O channel check (IOCHK) NMI that usually indicates a uncorrectable hardware error. If set to 1, the system panics.

kernel.panic_on_oops

If set to 0, the system tries to continue operations if the kernel encounters an oops or BUG condition. If set to 1 (default), the system delays a few seconds to give the kernel log daemon, klogd, time to record the oops output before the panic occurs.

In an OCFS2 cluster. set the value to 1 to specify that a system must panic if a kernel oops occurs. If a kernel thread required for cluster operation crashes, the system must reset itself. Otherwise, another node might not be able to tell whether a node is slow to respond or unable to respond, causing cluster operations to hang.

kernel.panic_on_stackoverflow

(RHCK only) If set to 0 (default), the system tries to continue operations if the kernel detects an overflow in a kernel stack. If set to 1, the system panics.

kernel.panic_on_unrecovered_nmi

If set to 0 (default), the system tries to continue operations if the kernel detects an NMI that usually indicates an uncorrectable parity or ECC memory error. If set to 1, the system panics.

kernel.softlockup_panic

If set to 0 (default), the system tries to continue operations if the kernel detects a soft-lockup error that causes the NMI watchdog thread to fail to update its time stamp for more than twice the value of kernel.watchdog_thresh seconds. If set to 1, the system panics.

kernel.unknown_nmi_panic

If set to 1, the system panics if the kernel detects an undefined NMI. You would usually generate an undefined NMI by manually pressing an NMI switch. As the NMI watchdog thread also uses the undefined NMI, set the value of kernel.unknown_nmi_panic to 0 if you set kernel.nmi_watchdog to 1.

kernel.watchdog_thresh

Specifies the interval between generating an NMI performance monitoring interrupt that the kernel uses to check for hard-lockup and soft-lockup errors. A hard-lockup error is assumed if a CPU is unresponsive to the interrupt for more than kernel.watchdog_thresh seconds. The default value is 10 seconds. A value of 0 disables the detection of lockup errors.

vm.panic_on_oom

If set to 0 (default), the kernel’s OOM-killer scans through the entire task list and attempts to kill a memory-hogging process to avoid a panic. If set to 1, the kernel panics but can survive under certain conditions. If a process limits allocations to certain nodes by using memory policies or cpusets, and those nodes reach memory exhaustion status, the OOM-killer can kill one process. No panic occurs in this case because other nodes’ memory might be free and the system as a whole might not yet be out of memory. If set to 2, the kernel always panics when an OOM condition occurs. Settings of 1 and 2 are for intended for use with clusters, depending on your preferred failover policy.