System Administration Guide

System Crashes

System crashes can occur due to hardware malfunctions, power failures, I/O (input/output) problems, and software errors. If a software error, such as a fatal kernel error caused by an operating system bug, crashes the system, the system writes an image of its physical memory to a core file at the end of the swap slice of the disk. This file is a snapshot of the kernel's state, including its program text, data, and control structures, captured at the time of the crash.

Crash Dump (or Core) Files

The crash dump or core file written when a UNIX system crashes can provide clues about what caused the crash if it is examined by someone experienced in kernel debugging. However, when a UNIX system reboots after a crash, it generally overwrites any core file that may have been produced, unless you have enabled the system to save the core file in a crash dump file.

See "Using Crash Dumps Task Map" for detailed instructions on how to enable a system to save crash dump files. Crash dump files can be very large, so do not retain them longer than necessary.
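
One way to keep track of crash dump files that have outlived their usefulness is a small script that lists old ones with their sizes. The following Python sketch is illustrative only and is not part of this guide's procedures. It assumes that saved dumps sit in a single directory and are named unix.N and vmcore.N; the function name find_stale_dumps and the 30-day threshold are hypothetical choices. Adjust all of these to match your system. The script only reports files and deletes nothing.

```python
import sys
import time
from pathlib import Path


def find_stale_dumps(dump_dir, max_age_days, now=None):
    """Return (path, size_in_bytes) pairs for dump files older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for entry in sorted(Path(dump_dir).iterdir()):
        if not entry.is_file():
            continue
        # Assumed naming convention: "unix.N" and "vmcore.N".
        if not entry.name.startswith(("unix.", "vmcore.")):
            continue
        info = entry.stat()
        if info.st_mtime < cutoff:
            stale.append((entry, info.st_size))
    return stale


if __name__ == "__main__":
    # Usage: python stale_dumps.py <crash-dump-directory>
    if len(sys.argv) == 2:
        for path, size in find_stale_dumps(sys.argv[1], max_age_days=30):
            print(f"{path}  {size} bytes")
```

Review the reported files before removing any of them, particularly if a customer service representative may still need a dump.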

Saving Crash Dumps

You can examine the control structures, active tables, memory images of a live or crashed system kernel, and other information about the operation of the kernel by using the crash utility. Using crash to its full potential requires a detailed knowledge of the kernel and is beyond the scope of this manual. See crash(1M) for more details on the operation of the crash utility.

Additionally, saved crash dumps can be sent to a customer service representative for analysis of why the system is crashing. If you plan to send crash dump files to a customer service representative, perform the first three tasks listed in "Using Crash Dumps Task Map".