10.1 About Kdump

10.1.1 Configuring and Using Kdump
10.1.2 Files Used by Kdump
10.1.3 Using Kdump with OCFS2
10.1.4 Using Kdump with a System Hang

Kdump is the Linux kernel crash-dump mechanism. Oracle recommends that you enable the Kdump feature. In the event of a system crash, Kdump creates a memory image (vmcore) that can help in determining the cause of the crash. Enabling Kdump requires you to reserve a portion of system memory for exclusive use by Kdump. This memory is unavailable for other uses.

Kdump uses kexec to boot into a second kernel whenever the system crashes. kexec is a fast-boot mechanism that allows a Linux kernel to boot from inside the context of a kernel that is already running, without passing through the boot loader stage.

10.1.1 Configuring and Using Kdump

During installation, you are given the option of enabling Kdump and specifying the amount of memory to reserve for it. If you prefer, you can enable Kdump later as described in this section.

If the kexec-tools and system-config-kdump packages are not already installed on your system, use yum to install them.

To enable Kdump by using the Kernel Dump Configuration GUI:

  1. Enter the following command:

    # system-config-kdump

    The Kernel Dump Configuration GUI starts. If Kdump is currently disabled, the green Enable button is selectable and the Disable button is greyed out.

  2. Click Enable to enable Kdump.

  3. You can select the following settings tabs to adjust the configuration of Kdump:

    Basic Settings

    Allows you to specify the amount of memory to reserve for Kdump. The default setting is 128 MB.

    Target Settings

    Allows you to specify the target location for the vmcore dump file: a locally accessible file system, a raw disk device, or a remote directory accessed by using NFS or SSH over IPv4. The default location is /var/crash.

    You cannot save a dump file on an eCryptfs file system, in remote directories that are NFS mounted on the root file system (rootfs), or in remote directories that require IPv6, SMB, CIFS, FCoE, wireless NICs, multipathed storage, or iSCSI over software initiators to access them.

    Filtering Settings

    Allows you to select which types of data to include in or exclude from the dump file. Selecting or deselecting the options alters the value of the argument that Kdump passes to the -d option of the core collector program, makedumpfile.

    Expert Settings

    Allows you to choose which kernel to use, edit the command line options that are passed to the kernel and the core collector program, choose the default action if the dump fails, and modify the options to the core collector program, makedumpfile.

    For example, suppose that Kdump fails to start and the following error appears in /var/log/messages:

    kdump: No crashkernel parameter specified for running kernel

    In this case, set the offset for the reserved memory to 48 MB or greater in the command line options, for example crashkernel=128M@48M.

    The Unbreakable Enterprise Kernel supports the crashkernel=auto setting in UEK Release 3 Quarterly Update 1 and later. If you use crashkernel=auto, the output of the dmesg command shows crashkernel=XM@0M, which is normal. The setting actually reserves 128 MB plus 64 MB for each terabyte of physical memory.

    Note

    You cannot configure crashkernel=auto for Xen or for the UEK prior to UEK Release 3 Quarterly Update 1. Only standard settings such as crashkernel=128M@48M are supported. For systems with more than 128 GB of memory, the recommended setting is crashkernel=512M@64M.

    You can select one of the following five default actions to take if the dump fails:

    mount rootfs and run /sbin/init

    Mount the root file system and run init. The /etc/init.d/kdump script attempts to save the dump to /var/crash, which requires a large amount of memory to be reserved.

    reboot

    Reboot the system, losing the vmcore. This is the default action.

    shell

    Enter a shell session inside the initramfs so that you can attempt to record the core. To reboot the system, exit the shell.

    halt

    Halt the system.

    poweroff

    Power down the system.

    Click Help for more information on these settings.

  4. Click Apply to save your changes. The GUI displays a popup message to remind you that you must reboot the system for the changes to take effect.

  5. Click OK to dismiss the popup message.

  6. Select File > Quit.

  7. Reboot the system at a suitable time.
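The crashkernel rules described under Expert Settings can also be checked from a script. The following is a minimal Bash sketch, not part of kexec-tools or system-config-kdump: all function names are hypothetical, only sizes and offsets with an M suffix are handled, the 128M@48M fallback for systems with 128 GB or less is an assumption, and the crashkernel=auto estimate assumes that 1 TB means 1024^3 kB and that partial terabytes are rounded down, which this section does not state.

```shell
#!/bin/bash
# Hypothetical helpers for reasoning about the crashkernel= boot parameter.
# Only sizes and offsets written with an M (megabyte) suffix are handled.

# Print the crashkernel= value from a kernel command line string,
# or return 1 if the parameter is absent.
get_crashkernel() {
    local -a args
    local arg
    read -r -a args <<< "$1"
    for arg in "${args[@]}"; do
        case "$arg" in
            crashkernel=*) echo "${arg#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

# Succeed if a SIZE@OFFSET value such as 128M@48M has an offset of at
# least 48 MB. Values without an @OFFSETM part (for example "auto") fail.
offset_at_least_48m() {
    local offset
    case "$1" in
        *@*M) offset=${1#*@}; offset=${offset%M} ;;
        *) return 1 ;;
    esac
    [ "$offset" -ge 48 ] 2>/dev/null
}

# Suggest a standard setting from the memory size in GB: 512M@64M above
# 128 GB as recommended here, otherwise 128M@48M (an assumption).
recommend_crashkernel() {
    if [ "$1" -gt 128 ]; then
        echo "512M@64M"
    else
        echo "128M@48M"
    fi
}

# Estimate the reservation made by crashkernel=auto, in MB: 128 MB plus
# 64 MB per terabyte. Input is MemTotal in kB, as in /proc/meminfo;
# 1 TB is taken as 1024^3 kB and partial terabytes are rounded down.
crashkernel_auto_mb() {
    local tb=$(( $1 / (1024 * 1024 * 1024) ))
    echo $(( 128 + 64 * tb ))
}
```

For example, get_crashkernel "$(cat /proc/cmdline)" prints the setting of the running kernel, or fails if the kernel was booted without one, which is the situation reported by the "No crashkernel parameter" error. Similarly, crashkernel_auto_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" gives a rough estimate; MemTotal is somewhat smaller than the installed memory, so treat the result as approximate.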

10.1.2 Files Used by Kdump

The Kernel Dump Configuration GUI modifies the following files:

File

Description

/boot/grub/grub.conf

Appends the crashkernel option to the kernel line to specify the amount of reserved memory and any offset value.

/etc/kdump.conf

Sets the location where the dump file can be written, the filtering level for the makedumpfile command, and the default behavior to take if the dump fails. See the comments in the file for information about the supported parameters.
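As an illustration, a minimal /etc/kdump.conf might contain entries like the following. The directive names (path, core_collector, default, net) are assumed to match the commented examples in the kdump.conf file shipped with kexec-tools on this release; the host names and export path are hypothetical, and the comments in your own /etc/kdump.conf and the kdump.conf(5) manual page remain the authority.

```
# Illustrative /etc/kdump.conf fragment (values are examples only)

# Write the vmcore under /var/crash.
path /var/crash

# Compress the dump (-c) and exclude all filterable page types (-d 31).
core_collector makedumpfile -c -d 31

# Action to take if saving the dump fails; the GUI's five choices are
# assumed to map to: reboot, halt, poweroff, shell, mount_root_run_init.
default reboot

# Hypothetical remote targets (uncomment one and adjust):
#net nfs.example.com:/export/crash
#net root@dump.example.com
```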

If you edit these files, you must reboot the system for the changes to take effect.
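Editing /boot/grub/grub.conf by hand amounts to adding a crashkernel option to each kernel line, which is the change the GUI makes. A minimal sketch of that edit as a Bash function wrapping GNU sed follows; the function name set_crashkernel is hypothetical, it assumes GRUB legacy syntax in which each boot entry has a single kernel line, and it writes to standard output rather than changing the file in place.

```shell
#!/bin/bash
# Hypothetical filter: read a GRUB legacy configuration on stdin and set
# crashkernel=VALUE on every "kernel" line, replacing any existing
# crashkernel= option. Other lines are passed through unchanged.
set_crashkernel() {
    local value=$1
    sed -E "/^[[:space:]]*kernel[[:space:]]/{
s/[[:space:]]+crashkernel=[^[:space:]]+//g
s/\$/ crashkernel=${value}/
}"
}
```

For example, set_crashkernel 128M@48M < /boot/grub/grub.conf > /tmp/grub.conf.new produces a candidate file that you can compare with the original by using diff before copying it into place. The change takes effect only after a reboot.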

For more information, see the kdump.conf(5) manual page.
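The value passed to makedumpfile -d, which the Filtering Settings tab and the filtering level in /etc/kdump.conf control, is a bitmask in which each bit excludes one type of page from the dump. The following Bash sketch adds up the bits. The bit values are taken from the dump-level table in the makedumpfile(8) manual page; the function name dump_level and the short page-type names are hypothetical labels.

```shell
#!/bin/bash
# Hypothetical helper: print the makedumpfile dump level (-d value) that
# excludes the named page types. Unknown names cause an error.
dump_level() {
    local level=0 type
    for type in "$@"; do
        case "$type" in
            zero)          level=$(( level | 1 )) ;;   # pages filled with zeros
            cache)         level=$(( level | 2 )) ;;   # cache pages without private data
            cache_private) level=$(( level | 4 )) ;;   # cache pages, including those with private data
            user)          level=$(( level | 8 )) ;;   # user process data pages
            free)          level=$(( level | 16 )) ;;  # free pages
            *) echo "unknown page type: $type" >&2; return 1 ;;
        esac
    done
    echo "$level"
}
```

For example, dump_level zero cache cache_private user free prints 31, the value in the commonly used core_collector makedumpfile -c -d 31 setting, and dump_level zero free prints 17. Because the bits are combined with a bitwise OR, naming a type twice does not change the result.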

10.1.3 Using Kdump with OCFS2

By default, a fenced node in an OCFS2 cluster restarts instead of panicking so that it can quickly rejoin the cluster. If the reason for the restart is not apparent, you can change the node's behavior so that it panics and generates a vmcore for analysis.

To configure a node to panic when it next fences, run the following command on the node after the cluster starts:

# echo panic > /sys/kernel/config/cluster/cluster_name/fence_method

where cluster_name is the name of the cluster. To set the value after each reboot of the system, add this line to /etc/rc.local. To restore the default behavior, set the value of fence_method to reset instead of panic and remove the line from /etc/rc.local.
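If you script this change, it can help to validate the value and confirm that the cluster directory exists, which it does only after the cluster starts, before writing. A minimal Bash sketch follows; set_fence_method is a hypothetical helper, mycluster is a placeholder cluster name, and the optional third argument (an alternative configfs root) exists only so that the function can be exercised against an ordinary directory.

```shell
#!/bin/bash
# Hypothetical helper: write panic or reset to the fence_method attribute
# of an OCFS2 cluster in configfs. Any other value is rejected.
set_fence_method() {
    local cluster=$1 method=$2
    local root=${3:-/sys/kernel/config/cluster}
    case "$method" in
        panic|reset) ;;
        *) echo "fence_method must be panic or reset, not: $method" >&2
           return 1 ;;
    esac
    # The attribute exists only while the cluster is running.
    if [ ! -e "$root/$cluster/fence_method" ]; then
        echo "no fence_method for cluster $cluster (is the cluster started?)" >&2
        return 1
    fi
    echo "$method" > "$root/$cluster/fence_method"
}
```

For example, running set_fence_method mycluster panic on the node has the same effect as the echo command for a cluster named mycluster, and set_fence_method mycluster reset restores the default behavior.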

For more information, see Section 21.3.5, “Configuring the Behavior of Fenced Nodes”.

10.1.4 Using Kdump with a System Hang

To troubleshoot an issue in which a user or kernel thread sleeps in the TASK_UNINTERRUPTIBLE state (D state) for longer than the interval defined by the kernel.hung_task_timeout_secs parameter, use sysctl to set kernel.hung_task_panic to 1. The system then panics when it detects such a hung task and generates a vmcore for analysis.

# sysctl -w kernel.hung_task_panic=1
kernel.hung_task_panic = 1

The setting remains in force only until the system is rebooted. To make it persist across reboots, add the line kernel.hung_task_panic = 1 to the /etc/sysctl.conf file. To restore the default behavior, set the value of kernel.hung_task_panic to 0.
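Adding the setting to /etc/sysctl.conf can be scripted so that repeated runs do not leave duplicate entries. The following minimal Bash sketch uses a hypothetical helper, persist_sysctl, that replaces an existing assignment of the key or appends one if none exists; the optional third argument naming the file is an assumption added for testing.

```shell
#!/bin/bash
# Hypothetical helper: set "key = value" in a sysctl.conf-style file.
# The first existing assignment of the key is replaced in place, later
# duplicates are dropped, and the line is appended if the key is absent.
persist_sysctl() {
    local key=$1 value=$2 file=${3:-/etc/sysctl.conf}
    local tmp
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v line="$key = $value" '
        { k = $0; sub(/=.*/, "", k); gsub(/[ \t]/, "", k) }  # key part of the line
        k == key { if (!done) print line; done = 1; next }
        { print }
        END { if (!done) print line }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the contents back so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, persist_sysctl kernel.hung_task_panic 1 followed by sysctl -p records the setting and reloads /etc/sysctl.conf. Running sysctl kernel.hung_task_timeout_secs shows the timeout that determines when a task counts as hung.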

For more information, see Section 5.2.2, “Changing Kernel Parameters” and Section 5.2.4, “Parameters that Control Kernel Panics”.