Checklist for Troubleshooting a System Crash
Answer the questions in the following checklist to help isolate the system problem and to
prepare to consult with your support providers.
|
|
Is a system crash dump available?
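One way to answer this question is to look in the savecore directory for saved dump files. The following is a minimal sketch, assuming the dump files use the usual unix.N and vmcore.N names (or vmdump.N for compressed dumps on later releases). The function name list_crash_dumps is illustrative, and the directory is passed as an argument because its location varies. On a live system, running dumpadm with no arguments reports the configured savecore directory.

```sh
# List saved crash dump files in a savecore directory.
# Usage: list_crash_dumps <savecore-directory>
# Returns 0 if at least one dump file is found, 1 otherwise.
list_crash_dumps() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    status=1
    for f in "$dir"/unix.* "$dir"/vmcore.* "$dir"/vmdump.*; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -f "$f" ]; then
            echo "$f"
            status=0
        fi
    done
    return $status
}
```

An empty result does not prove that no dump was taken. The dump might still be on the dump device if savecore was disabled or did not run.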
|
|
Identify the operating system release and appropriate software application release
levels.
|
|
Identify system hardware.
|
|
Include prtdiag output for SPARC systems. Include Explorer output for other
systems.
|
|
Are patches installed? If so, include showrev -p output.
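The release, hardware, and patch items above can be gathered into a single report to send to a support provider. The following is a sketch under stated assumptions: the commands are on the PATH, /etc/release exists on the system, and the function names run_labeled and collect_sysinfo are illustrative. A command that is missing or that exits with a nonzero status is noted in the report rather than treated as fatal.

```sh
# Run a command if it exists and label its output; otherwise note its absence.
run_labeled() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        # Record a nonzero exit status instead of aborting the report.
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "(command not found: $1)"
    fi
}

# Collect release, hardware, and patch information into one report.
collect_sysinfo() {
    run_labeled uname -a
    run_labeled cat /etc/release
    run_labeled prtdiag -v
    run_labeled showrev -p
}
```

For example, collect_sysinfo > sysinfo.txt saves the report to a file that can be attached to a service request.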
|
|
Is the problem reproducible?
This is important because a reproducible test case is often essential for debugging
difficult problems. By reproducing the problem, the service provider can build kernels with special
instrumentation to trigger, diagnose, and fix the bug.
|
|
Does the system have any third-party drivers?
Drivers run in the same address space as the kernel, with all the same privileges, so they can
cause system crashes if they have bugs.
|
|
What was the system doing before it crashed?
If the system was doing anything unusual, such as running a new stress test or experiencing
higher-than-usual load, that might have led to the crash.
|
|
Were there any unusual console messages right before the system crashed?
Sometimes the system will show signs of distress before it actually crashes; this information
is often useful.
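Console messages are typically also written to the system log, so the log can be searched after the reboot. The following is a minimal sketch, assuming the log is /var/adm/messages and that the lines of interest contain the strings panic or WARNING. Both the patterns and the function name distress_messages are assumptions and might need adjusting for a particular system.

```sh
# Print log lines that mention a panic or a kernel warning.
# Usage: distress_messages [logfile]   (defaults to /var/adm/messages)
distress_messages() {
    file=${1:-/var/adm/messages}
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    awk '/panic/ || /WARNING/ { print }' "$file"
}
```

Compare the timestamps on the matching lines with the time of the crash to see which messages appeared shortly before it.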
|
|
Did you add any parameters to the /etc/system file?
Sometimes tuning changes can cause the system to crash, for example, increasing shared memory
segments so that the system tries to allocate more memory than it has.
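To see which settings are in effect, it can help to list only the active lines of the file. The following is a minimal sketch, assuming that comment lines begin with an asterisk (*) or a hash mark (#). The function name active_system_settings is illustrative.

```sh
# Print the non-blank, non-comment lines of /etc/system (or another file).
# Usage: active_system_settings [file]
active_system_settings() {
    file=${1:-/etc/system}
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    # Keep lines that have at least one field and whose first field
    # does not start with a comment character.
    awk 'NF > 0 && $1 !~ /^[*#]/ { print }' "$file"
}
```

Any line in the output that you do not recognize as part of the original installation is a candidate to report to your support provider.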
|
|
Did the problem start recently?
If so, did the onset of problems coincide with any changes to the system, for example, new
drivers, new software, a different workload, a CPU upgrade, or a memory upgrade?