10.2 Using the crash Debugger

10.2.1 Installing the crash Packages
10.2.2 Running crash
10.2.3 Kernel Data Structure Analysis Commands
10.2.4 System State Commands
10.2.5 Helper Commands
10.2.6 Session Control Commands
10.2.7 Guidelines for Examining a Dump File

The crash utility allows you to analyze the state of a running Oracle Linux system or of a core dump that resulted from a kernel crash. crash has been merged with the GNU Debugger gdb to provide source code debugging capabilities.

10.2.1 Installing the crash Packages

To use crash, you must install the crash package and the appropriate debuginfo and debuginfo-common packages.

To install the required packages:

  1. Install the latest version of the crash package:

    # yum install crash
  2. Download the appropriate debuginfo and debuginfo-common packages for the vmcore or kernel that you want to examine from https://oss.oracle.com/ol6/debuginfo/:

    • If you want to examine the running Unbreakable Enterprise Kernel on the system, use commands such as the following to download the packages:

      # export DLP="https://oss.oracle.com/ol6/debuginfo"
      # wget ${DLP}/kernel-uek-debuginfo-`uname -r`.rpm
      # wget ${DLP}/kernel-uek-debuginfo-common-`uname -r`.rpm
    • If you want to examine the running Red Hat Compatible Kernel on the system, use commands such as the following to download the packages:

      # export DLP="https://oss.oracle.com/ol6/debuginfo"
      # wget ${DLP}/kernel-debuginfo-`uname -r`.rpm
      # wget ${DLP}/kernel-debuginfo-common-`uname -r`.rpm
    • If you want to examine a vmcore file that relates to a different kernel than the one that is currently running, download the appropriate debuginfo and debuginfo-common packages for the kernel that produced the vmcore, for example:

      # export DLP="https://oss.oracle.com/ol6/debuginfo"
      # wget ${DLP}/kernel-uek-debuginfo-2.6.32-300.27.1.el6uek.x86_64.rpm
      # wget ${DLP}/kernel-uek-debuginfo-common-2.6.32-300.27.1.el6uek.x86_64.rpm
      Note

      If the vmcore file was produced by Kdump, you can use the following crash command to determine the version:

      # crash --osrelease /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
      2.6.39-200.24.1.el6uek.x86_64
  3. Install the debuginfo and debuginfo-common packages, for example:

    # rpm -Uhv kernel-uek-debuginfo-2.6.32-300.27.1.el6uek.x86_64.rpm \
      kernel-uek-debuginfo-common-2.6.32-300.27.1.el6uek.x86_64.rpm

    The vmlinux kernel object file (also known as the namelist file) that crash requires is installed in /usr/lib/debug/lib/modules/kernel_version/.
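The URL pattern used in the wget commands above can be wrapped in a small shell function. The following is a minimal sketch, assuming that the package names always follow the kernel[-uek]-debuginfo[-common]-release.rpm pattern shown in the examples and that any release string containing uek is an Unbreakable Enterprise Kernel. debuginfo_urls is a hypothetical helper name, not part of crash or yum.

```shell
# Hypothetical helper: print the debuginfo and debuginfo-common package URLs
# for a kernel release, following the naming pattern in the wget examples.
# Assumption: a release string containing "uek" is a UEK kernel; anything
# else is treated as a Red Hat Compatible Kernel.
DLP="https://oss.oracle.com/ol6/debuginfo"

debuginfo_urls() {
    case "$1" in
        *uek*) prefix="kernel-uek" ;;
        *)     prefix="kernel" ;;
    esac
    echo "${DLP}/${prefix}-debuginfo-$1.rpm"
    echo "${DLP}/${prefix}-debuginfo-common-$1.rpm"
}
```

For the running kernel, pass the output of uname -r; for a vmcore from a different kernel, pass the output of crash --osrelease instead, for example:

    # for url in $(debuginfo_urls "$(uname -r)"); do wget "$url"; done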

10.2.2 Running crash

Warning

Running crash on a live system is dangerous and can cause data corruption or total system failure. Do not use crash to examine a production system unless so directed by Oracle Support.

To examine the currently running kernel:

# crash

To determine the version of the kernel that produced a vmcore file:

# crash --osrelease /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
2.6.39-200.24.1.el6uek.x86_64

To examine a vmcore file, specify the path to the file as an argument, for example:

# crash /var/tmp/vmcore/2013-0211-2358.45-host03.28.core

The appropriate vmlinux file must exist in /usr/lib/debug/lib/modules/kernel_version/.

If the vmlinux file is located elsewhere, specify its path before the path to the vmcore file, for example:

# crash /var/tmp/namelist/vmlinux-host03.28 /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
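The version check and the default vmlinux location can be combined so that crash is always started with a matching namelist file. The following is a minimal sketch, assuming the debuginfo packages are installed in the default location described above; vmlinux_path and open_vmcore are hypothetical helper names.

```shell
# Hypothetical helpers: locate the vmlinux file that matches a vmcore and
# start crash with it. Assumes the default debuginfo install location.

vmlinux_path() {
    # Default location of the namelist file for a given kernel release.
    echo "/usr/lib/debug/lib/modules/$1/vmlinux"
}

open_vmcore() {
    local vmcore="$1" release vmlinux
    # Ask crash which kernel produced the dump.
    release="$(crash --osrelease "$vmcore")" || return 1
    vmlinux="$(vmlinux_path "$release")"
    if [ ! -f "$vmlinux" ]; then
        echo "no vmlinux for $release; install its debuginfo packages" >&2
        return 1
    fi
    # Pass the namelist file first, then the dump file.
    crash "$vmlinux" "$vmcore"
}
```

For example:

    # open_vmcore /var/tmp/vmcore/2013-0211-2358.45-host03.28.core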

The following crash output is from a vmcore file that was dumped after a system panic:

      KERNEL: /usr/lib/debug/lib/modules/2.6.39-200.24.1.el6uek.x86_64/vmlinux
    DUMPFILE: /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
        CPUS: 2
        DATE: Fri Feb 11 16:55:41 2013
      UPTIME: 04:24:54
LOAD AVERAGE: 0.00, 0.01, 0.05
       TASKS: 84
    NODENAME: host03.mydom.com
     RELEASE: 2.6.39-200.24.1.el6uek.x86_64
     VERSION: #1 SMP Sat Jun 23 02:39:07 EDT 2012
     MACHINE: x86_64  (2992 MHz)
      MEMORY: 2 GB
       PANIC: "Oops: 0002" (check log for details)
         PID: 1696
     COMMAND: "insmod"
        TASK: c74de000
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash>

The output includes the number of CPUs; the load average over the last 1, 5, and 15 minutes; the number of tasks; the amount of memory; the panic string; and the command that was executing when the dump was created. In this example, an attempt by insmod to install a module resulted in an oops.

At the crash> prompt, you can enter help or ? to display the available crash commands. Enter help followed by a command name (for example, help bt) to display more information about that command.

crash commands fall into the following groups according to purpose:

Kernel Data Structure Analysis Commands

Display kernel text and data structures. See Section 10.2.3, “Kernel Data Structure Analysis Commands”.

System State Commands

Examine kernel subsystems on a system-wide or a per-task basis. See Section 10.2.4, “System State Commands”.

Helper Commands

Perform calculation, translation, and search functions. See Section 10.2.5, “Helper Commands”.

Session Control Commands

Control the crash session. See Section 10.2.6, “Session Control Commands”.

For more information, see the crash(8) manual page.

10.2.3 Kernel Data Structure Analysis Commands

The following crash commands take advantage of gdb integration to display kernel data structures symbolically:

*

You can use the pointer-to command instead of struct or union; the gdb module determines which of the two applies and calls the appropriate function. For example:

crash> *buffer_head
struct buffer_head {
    long unsigned int b_state;
    struct buffer_head *b_this_page;
    struct page *b_page;
    sector_t b_blocknr;
    size_t b_size;
    char *b_data;
    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;
    void *b_private;
    struct list_head b_assoc_buffers;
    struct address_space *b_assoc_map;
    atomic_t b_count;
}
SIZE: 104

dis

Disassembles the machine instructions of a complete kernel function, from a specified address for a specified number of instructions, or from the beginning of a function up to a specified address. For example:

crash> dis fixup_irqs
0xffffffff81014486 <fixup_irqs>:        push   %rbp
0xffffffff81014487 <fixup_irqs+1>:      mov    %rsp,%rbp
0xffffffff8101448a <fixup_irqs+4>:      push   %r15
0xffffffff8101448c <fixup_irqs+6>:      push   %r14
0xffffffff8101448e <fixup_irqs+8>:      push   %r13
0xffffffff81014490 <fixup_irqs+10>:     push   %r12
0xffffffff81014492 <fixup_irqs+12>:     push   %rbx
0xffffffff81014493 <fixup_irqs+13>:     sub    $0x18,%rsp
0xffffffff81014497 <fixup_irqs+17>:     nopl   0x0(%rax,%rax,1)
...

p

Displays the contents of a kernel variable. For example:

crash> p init_mm
init_mm = $5 = {
  mmap = 0x0, 
  mm_rb = {
    rb_node = 0x0
  }, 
  mmap_cache = 0x0, 
  get_unmapped_area = 0, 
  unmap_area = 0, 
  mmap_base = 0, 
  task_size = 0, 
  cached_hole_size = 0, 
  free_area_cache = 0, 
  pgd = 0xffffffff81001000, 
...

struct

Displays either a structure definition, or a formatted display of the contents of a structure at a specified address. For example:

crash> struct cpu
struct cpu {
    int node_id;
    int hotpluggable;
    struct sys_device sysdev;
}
SIZE: 88

sym

Translates a kernel symbol name to a kernel virtual address and section, or a kernel virtual address to a symbol name and section. You can also query (-q) the symbol list for all symbols containing a specified string or list (-l) all kernel symbols. For example:

crash> sym jiffies
ffffffff81b45880 (A) jiffies
crash> sym -q runstate
c590 (d) per_cpu__runstate
c5c0 (d) per_cpu__runstate_snapshot
ffffffff8100e563 (T) xen_setup_runstate_info
crash> sym -l
0 (D) __per_cpu_start
0 (D) per_cpu__irq_stack_union
4000 (D) per_cpu__gdt_page
5000 (d) per_cpu__exception_stacks
b000 (d) per_cpu__idt_desc
b010 (d) per_cpu__xen_cr0_value
b018 (D) per_cpu__xen_vcpu
b020 (D) per_cpu__xen_vcpu_info
b060 (d) per_cpu__mc_buffer
c570 (D) per_cpu__xen_mc_irq_flags
c578 (D) per_cpu__xen_cr3
c580 (D) per_cpu__xen_current_cr3
c590 (d) per_cpu__runstate
c5c0 (d) per_cpu__runstate_snapshot
...

union

Similar to the struct command, but displays kernel data types that are defined as unions instead of structures.

whatis

Displays the definition of structures, unions, or typedefs, or of text or data symbols. For example:

crash> whatis linux_binfmt
struct linux_binfmt {
    struct list_head lh;
    struct module *module;
    int (*load_binary)(struct linux_binprm *, struct pt_regs *);
    int (*load_shlib)(struct file *);
    int (*core_dump)(long int, struct pt_regs *, struct file *, long unsigned int);
    long unsigned int min_coredump;
    int hasvdso;
}
SIZE: 64

10.2.4 System State Commands

The following commands display kernel subsystems on a system-wide or per-task basis:

bt

Displays a kernel stack trace of the current context or of a specified PID or task. In the case of a dump that followed a kernel panic, the command traces the functions that were called leading up to the panic. For example:

crash> bt
PID: 10651  TASK: d1347000  CPU: 1   COMMAND: "insmod"
 #0 [d1547e44] die at c010785a
 #1 [d1547e54] do_invalid_op at c0107b2c
 #2 [d1547f0c] error_code (via invalid_op) at c01073dc
...

You can use the -l option to display the line number of the source file that corresponds to each function call in a stack trace. For example:

crash> bt -l 1
PID: 1      TASK: ffff88007d032040  CPU: 1   COMMAND: "init"
 #0 [ffff88007d035878] schedule at ffffffff8144fdd4
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/kernel/sched.c: 3091
 #1 [ffff88007d035950] schedule_hrtimeout_range at ffffffff814508e4
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/current.h: 14
 #2 [ffff88007d0359f0] poll_schedule_timeout at ffffffff811297d5
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/current.h: 14
 #3 [ffff88007d035a10] do_select at ffffffff81129d72
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 500
 #4 [ffff88007d035d80] core_sys_select at ffffffff8112a04c
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 575
 #5 [ffff88007d035f10] sys_select at ffffffff8112a326
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 615
 #6 [ffff88007d035f80] system_call_fastpath at ffffffff81011cf2
    /usr/src/debug////////kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/kernel/entry_64.S: 488
    RIP: 00007fce20a66243  RSP: 00007fff552c1038  RFLAGS: 00000246
    RAX: 0000000000000017  RBX: ffffffff81011cf2  RCX: ffffffffffffffff
    RDX: 00007fff552c10e0  RSI: 00007fff552c1160  RDI: 000000000000000a
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000200
    R10: 00007fff552c1060  R11: 0000000000000246  R12: 00007fff552c1160
    R13: 00007fff552c10e0  R14: 00007fff552c1060  R15: 00007fff552c121f
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

bt is probably the most useful crash command. It has a large number of options that you can use to examine a kernel stack trace. For more information, enter help bt.
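Because crash can redirect command output to a file, a saved stack trace can be reduced to its list of function names, which makes traces from several dumps easier to compare. The following is a minimal sketch, assuming the frame-line layout shown in the examples above (#N [stack address] function at address); bt_functions is a hypothetical helper name.

```shell
# Hypothetical helper: read saved bt output (for example, from
# "crash> bt > bt.txt") on standard input and print the function name
# of each stack frame, innermost frame first.
bt_functions() {
    # Frame lines start with "#<number>"; the function name is field 3.
    # Source-line and register lines printed by bt -l do not match.
    awk '$1 ~ /^#[0-9]+$/ { print $3 }'
}
```

For example:

    # bt_functions < bt.txt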

dev

Displays character and block device data. The -d and -i options display disk I/O statistics and I/O port usage, respectively. For example:

crash> dev
CHRDEV    NAME                 CDEV        OPERATIONS      
   1      mem            ffff88007d2a66c0  memory_fops
   4      /dev/vc/0      ffffffff821f6e30  console_fops
   4      tty            ffff88007a395008  tty_fops
   4      ttyS           ffff88007a3d3808  tty_fops
   5      /dev/tty       ffffffff821f48c0  tty_fops
...
BLKDEV    NAME                GENDISK      OPERATIONS      
   1      ramdisk        ffff88007a3de800  brd_fops
 259      blkext              (none)     
   7      loop           ffff880037809800  lo_fops
   8      sd             ffff8800378e9800  sd_fops
   9      md                  (none)     
...
crash> dev -d
MAJOR GENDISK            NAME       REQUEST QUEUE      TOTAL ASYNC  SYNC   DRV
    8 0xffff8800378e9800 sda        0xffff880037b513e0    10     0    10     0
   11 0xffff880037cde400 sr0        0xffff880037b50b10     0     0     0     0
  253 0xffff880037902c00 dm-0       0xffff88003705b420     0     0     0     0
  253 0xffff880037d5f000 dm-1       0xffff88003705ab50     0     0     0     0
crash> dev -i
    RESOURCE        RANGE    NAME
ffffffff81a9e1e0  0000-ffff  PCI IO
ffffffff81a96e30  0000-001f  dma1
ffffffff81a96e68  0020-0021  pic1
ffffffff81a96ea0  0040-0043  timer0
ffffffff81a96ed8  0050-0053  timer1
ffffffff81a96f10  0060-0060  keyboard
...
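Saved dev -d output can be filtered to show only the disks whose TOTAL column is nonzero, that is, disks with I/O requests in progress at the time of the dump. The following is a minimal sketch, assuming the column layout shown above, in which NAME is the third and TOTAL the fifth field of each data row; busy_disks is a hypothetical helper name.

```shell
# Hypothetical helper: read saved "dev -d" output on standard input and
# print the name and TOTAL count of each disk whose TOTAL is nonzero.
busy_disks() {
    # NR > 1 skips the header line (its "REQUEST QUEUE" label contains a
    # space, so its fields do not line up with the data rows).
    awk 'NR > 1 && $5 + 0 > 0 { print $3, $5 }'
}
```

For example, after saving the output with crash> dev -d > disks.txt:

    # busy_disks < disks.txt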
files

Displays information about files that are open in the current context or in the context of a specific PID or task. For example:

crash> files 12916
PID: 12916  TASK: ffff8800276a2480  CPU: 0   COMMAND: "firefox"
ROOT: /    CWD: /home/guest
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff88001c57ab00 ffff88007ac399c0 ffff8800378b1b68 CHR  /null
  1 ffff88007b315cc0 ffff88006046f800 ffff8800604464f0 REG  /home/guest/.xsession-errors
  2 ffff88007b315cc0 ffff88006046f800 ffff8800604464f0 REG  /home/guest/.xsession-errors
  3 ffff88001c571a40 ffff88001d605980 ffff88001be45cd0 REG  /home/guest/.mozilla/firefox
  4 ffff88003faa7300 ffff880063d83440 ffff88001c315bc8 SOCK
  5 ffff88003f8f6a40 ffff88007b41f080 ffff88007aef0a48 FIFO
...

fuser

Displays the tasks that reference a specified file name or inode address as the current root directory, current working directory, open file descriptor, or that memory map the file. For example:

crash> fuser /home/guest
 PID         TASK        COMM             USAGE
 2990  ffff88007a2a8440  "gnome-session"  cwd 
 3116  ffff8800372e6380  "gnome-session"  cwd 
 3142  ffff88007c54e540  "metacity"       cwd 
 3147  ffff88007aa1e440  "gnome-panel"    cwd 
 3162  ffff88007a2d04c0  "nautilus"       cwd 
 3185  ffff88007c00a140  "bluetooth-appl  cwd 
...

irq

Displays interrupt request queue data. For example:

crash> irq 0
    IRQ: 0
 STATUS: 400000 ()
HANDLER: ffffffff81b3da30            <ioapic_chip>
         typename: ffffffff815cdaef  "IO-APIC"
          startup: ffffffff8102a513  <startup_ioapic_irq>
         shutdown: ffffffff810aef92  <default_shutdown>
           enable: ffffffff810aefe3  <default_enable>
          disable: ffffffff810aeecc  <default_disable>
              ack: ffffffff8102a43d  <ack_apic_edge>
             mask: ffffffff81029be1  <mask_IO_APIC_irq>
...

kmem

Displays the state of the kernel memory subsystems. For example:

crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM   512658         2 GB         ----
      FREE    20867      81.5 MB    4% of TOTAL MEM
      USED   491791       1.9 GB   95% of TOTAL MEM
    SHARED   176201     688.3 MB   34% of TOTAL MEM
   BUFFERS     8375      32.7 MB    1% of TOTAL MEM
    CACHED   229933     898.2 MB   44% of TOTAL MEM
      SLAB    39551     154.5 MB    7% of TOTAL MEM

TOTAL SWAP  1032190       3.9 GB         ----
 SWAP USED     2067       8.1 MB    0% of TOTAL SWAP
 SWAP FREE  1030123       3.9 GB   99% of TOTAL SWAP

kmem has a large number of options. For more information, enter help kmem.
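In the kmem -i output, each TOTAL value is the PAGES value multiplied by the page size. As a worked check, the following sketch reproduces several TOTAL figures from the example above, assuming the 4 KB page size used on x86_64 systems; pages_to_mb is a hypothetical helper name.

```shell
# Hypothetical helper: convert a page count to megabytes, assuming
# 4096-byte pages (the x86_64 default).
PAGE_SIZE=4096

pages_to_mb() {
    awk -v pages="$1" -v size="$PAGE_SIZE" \
        'BEGIN { printf "%.1f\n", pages * size / (1024 * 1024) }'
}

pages_to_mb 20867    # FREE: prints 81.5
pages_to_mb 176201   # SHARED: prints 688.3
pages_to_mb 39551    # SLAB: prints 154.5
```

The percentages follow the same way; for example, 20867 free pages out of 512658 total pages is about 4% of total memory.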

log

Displays the kernel message buffer in chronological order. This is the same data that dmesg displays, but the output can include messages that never made it to syslog or to disk.

mach

Displays machine-specific information such as the cpuinfo structure and the physical memory map.

mod

Displays information about the currently installed kernel modules. The -s and -S options load debug data (if available) from the specified module object files to enable symbolic debugging.

mount

Displays information about currently mounted file systems.

net

Displays network-related information.

ps

Displays information about processes. For example:

crash> ps Xorg crash bash
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   2679   2677   0  ffff88007cbcc400  IN   4.0  215488  84880  Xorg
> 13362  11853   0  ffff88007b25a500  RU   6.9  277632 145612  crash
   3685   3683   1  ffff880058714580  IN   0.1  108464   1984  bash
  11853  11845   1  ffff88001c6826c0  IN   0.1  108464   1896  bash
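
A plain grep for a state string matches it anywhere on the line, including inside a command name. The following is a minimal sketch of a stricter filter that checks only the ST column, assuming the layout shown above, in which the active task on each CPU is marked with a leading >; ps_state is a hypothetical helper name.

```shell
# Hypothetical helper: read saved ps output on standard input and print the
# PID and command name of each task in the given state (for example, UN).
ps_state() {
    awk -v st="$1" '{
        # Active tasks carry a leading ">" field that shifts the columns.
        off = ($1 == ">") ? 1 : 0
        if ($(5 + off) == st) print $(1 + off), $(9 + off)
    }'
}
```

For example, after saving the output with crash> ps > ps.txt:

    # ps_state UN < ps.txt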
pte

Translates a page table entry (PTE) to the physical page address and page bit settings. If the PTE refers to a swap location, the command displays the swap device and offset.

runq

Displays the list of tasks that are on the run queue of each CPU.

sig

Displays signal-handling information for the current context or for a specified PID or task.

swap

Displays information about the configured swap devices.

task

Displays the contents of the task_struct for the current context or for a specified PID or task.

timer

Displays the entries in the timer queue in chronological order.

vm

Displays the virtual memory data, including the addresses of mm_struct and the page directory, resident set size, and total virtual memory size for the current context or for a specified PID or task.

vtop

Translates a user or kernel virtual address to a physical address. The command also displays the PTE translation, vm_area_struct data for user virtual addresses, mem_map page data for a physical page, and the swap location or file location if the page is not mapped.

waitq

Displays tasks that are blocked on a specified wait queue.

10.2.5 Helper Commands

The following commands perform calculation, translation, and search functions:

ascii

Translates a hexadecimal value to ASCII. With no argument, the command displays an ASCII chart.

btop

Translates a hexadecimal address to a page number.

eval

Evaluates an expression and displays the result in hexadecimal, decimal, octal, and binary. For example:

crash> eval 4g / 0x100
hexadecimal: 1000000  (16MB)
    decimal: 16777216  
      octal: 100000000
     binary: 0000000000000000000000000000000000000001000000000000000000000000
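
The same arithmetic can be cross-checked with shell arithmetic expansion outside crash. In this sketch, the 4g operand is written out as 4 << 30 (4 GB), because the shell does not understand crash's unit suffixes; printf has no binary conversion, so only the hexadecimal, decimal, and octal forms are shown.

```shell
# Reproduce "eval 4g / 0x100" with shell arithmetic: 4 GB is 4 << 30 bytes.
value=$(( (4 << 30) / 0x100 ))
printf 'hexadecimal: %x\n' "$value"   # 1000000
printf '    decimal: %d\n' "$value"   # 16777216
printf '      octal: %o\n' "$value"   # 100000000
```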
list

Displays the contents of a linked list of data objects, typically structures, starting at a specified address.

ptob

Translates a page number to its physical address (byte value).
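
With 4 KB pages, btop and ptob amount to shifting a value right or left by 12 bits. The following sketch illustrates that relationship, assuming a 4 KB page size (a page shift of 12, the x86_64 default); btop_sh and ptob_sh are hypothetical shell stand-ins, not the crash commands themselves, and they do not reproduce crash's output format.

```shell
# Hypothetical stand-ins for btop/ptob, assuming 4 KB pages (x86_64).
PAGE_SHIFT=12

btop_sh() {   # address -> page number (hex in, hex out)
    printf '%x\n' $(( $1 >> $PAGE_SHIFT ))
}

ptob_sh() {   # page number -> address of the first byte of the page
    printf '%x\n' $(( $1 << $PAGE_SHIFT ))
}

btop_sh 0x512a123   # prints 512a
ptob_sh 0x512a      # prints 512a000
```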

ptov

Translates a physical address to a kernel virtual address.

search

Searches for a specified value in a specified range of user virtual memory, kernel virtual memory, or physical memory.

rd

Displays a selected range of user virtual memory, kernel virtual memory, or physical memory using the specified format.

wr

Writes a value to a memory location specified by symbol or address.

Warning

To avoid data loss or data corruption, take great care when using the wr command.

10.2.6 Session Control Commands

The following commands control the crash session:

alias

Defines an alias for a command. With no argument, the command displays the current list of aliases.

exit, q, or quit

Ends the crash session.

extend

Loads or unloads the specified crash extension shared object libraries.

foreach

Executes a bt, files, net, task, set, sig, vm, or vtop command on multiple tasks.

gdb

Passes any arguments to the GNU Debugger for processing.

repeat

Repeats a command indefinitely until you type Ctrl-C. This command is only useful when you use crash to examine a live system.

set

Sets the context to a specified PID or task. With no argument, the command displays the current context.

10.2.7 Guidelines for Examining a Dump File

The steps for debugging a memory dump from a kernel crash vary widely according to the problem. The following guidelines suggest some basic investigations that you can try:

  • Use bt to trace the functions that led to the kernel panic.

  • Use bt -a to trace the active task on each CPU. There is often a relationship between the panicking task on one CPU and the running tasks on the other CPUs. If a trace shows the swapper task or the cpu_idle function, that CPU was idle, with no task running on it.

  • Use bt -l to display the line number of the source file that corresponds to each function call in the stack trace.

  • Use kmem -i to obtain a summary of memory and swap usage. Look for a SLAB value greater than 500 MB and for a SWAP USED value greater than 0%.

  • Use ps | grep UN to check for processes in the TASK_UNINTERRUPTIBLE state (D state). Such processes are usually waiting on I/O; they contribute to the load average and cannot be killed.

  • Use files to display the files that a process had open.
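
The kmem -i guideline can be applied to saved output. The following is a minimal sketch, assuming the kmem -i layout shown in Section 10.2.4 and sizes reported in KB, MB, or GB; kmem_check is a hypothetical helper name, and its 500 MB and 0% thresholds are the ones given in the guideline.

```shell
# Hypothetical helper: read saved "kmem -i" output on standard input and
# report the warning signs named in the guideline.
kmem_check() {
    awk '
        # Convert a size and its unit to megabytes.
        function to_mb(v, u) {
            if (u == "GB") return v * 1024
            if (u == "KB") return v / 1024
            return v + 0
        }
        # SLAB line: SLAB <pages> <size> <unit> <percent> ...
        $1 == "SLAB" && to_mb($3, $4) > 500 {
            print "SLAB usage above 500 MB: " $3 " " $4
        }
        # SWAP USED line: SWAP USED <pages> <size> <unit> <percent> ...
        $1 == "SWAP" && $2 == "USED" && $6 + 0 > 0 {
            print "swap in use: " $6 " of total swap"
        }
    '
}
```

For example, after saving the output with crash> kmem -i > kmem.txt:

    # kmem_check < kmem.txt

An empty result means that neither warning sign is present.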

You can use shell redirection operators to save the output of a command to a file for later analysis, or pipe the output through commands such as grep, for example:

crash> foreach files > files.txt
crash> foreach bt | grep bash
PID: 3685   TASK: ffff880058714580  CPU: 1   COMMAND: "bash"
PID: 11853  TASK: ffff88001c6826c0  CPU: 0   COMMAND: "bash"
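
Output saved with foreach files can then be summarized outside crash. The following is a minimal sketch that counts open file descriptors per process, assuming the files output layout shown in Section 10.2.4 (a PID: header line for each task, followed by one row per descriptor that begins with the FD number) and command names without spaces; open_file_counts is a hypothetical helper name.

```shell
# Hypothetical helper: read saved "foreach files" output on standard input
# and print "PID COMMAND COUNT" for each task, most open descriptors first.
open_file_counts() {
    awk '
        # Task header: PID: <pid> TASK: <addr> CPU: <n> COMMAND: "<name>"
        $1 == "PID:" { pid = $2; cmd = $NF; count[pid " " cmd] += 0; next }
        # Descriptor rows start with a numeric FD.
        $1 ~ /^[0-9]+$/ && pid != "" { count[pid " " cmd]++ }
        END { for (key in count) print key, count[key] }
    ' | sort -k3,3nr
}
```

For example, using the files.txt file saved above:

    # open_file_counts < files.txt | head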