Oracle Solaris Administration: Common Tasks     Oracle Solaris 11 Information Library

Troubleshooting a System Crash

If a system running Oracle Solaris crashes, provide your service provider with as much information as possible, including crash dump files.

What to Do If the System Crashes

The following list describes the most important information to remember in the event of a system crash:

  1. Write down the system console messages.

    If a system crashes, getting it running again might seem like your most pressing concern. However, before you reboot the system, examine the console screen for messages. These messages can provide some insight into what caused the crash. Even if the system reboots automatically and the console messages have disappeared from the screen, you might still be able to check these messages by viewing the system error log, the /var/adm/messages file. For more information about viewing system error log files, see How to View System Messages.

    If you have frequent crashes and cannot determine the cause, gather all of the information you can from the system console or the /var/adm/messages files and have it ready for a customer service representative to examine. For a complete list of troubleshooting information to gather for your service provider, see Troubleshooting a System Crash.


  2. Synchronize the disks and reboot.

    ok sync

    If the system fails to reboot successfully after a system crash, see Chapter 20, Troubleshooting Miscellaneous System and Software Problems (Tasks).

Check whether a system crash dump was generated after the system crash. System crash dumps are saved by default. For information about crash dumps, see Chapter 17, Managing System Crash Information (Tasks).
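
The advice in step 1 about checking the /var/adm/messages file after a reboot can be sketched as a small shell function. This is a minimal illustration, not a prescribed procedure: the function name crash_hints and the search patterns are assumptions chosen for this example, and the sketch assumes a POSIX-conforming grep (on some Solaris releases that is /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helper: print lines from a message log that often accompany a crash.
# The patterns (panic, warning, fatal) are illustrative assumptions, not a complete list.
crash_hints() {
    logfile=${1:-/var/adm/messages}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # -i ignores case; -E enables extended regular expressions (POSIX grep).
    grep -iE 'panic|warning|fatal' "$logfile"
}
```

Calling crash_hints with no argument reads /var/adm/messages. Passing a path lets you scan a rotated or copied log instead.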

Gathering Troubleshooting Data

Answer the following questions to help isolate the system problem. Use the Troubleshooting a System Crash Checklist to gather troubleshooting data for a crashed system.

Table 19-1 Identifying System Crash Data

Question: Can you reproduce the problem?
Description: This is important because a reproducible test case is often essential for debugging difficult problems. By reproducing the problem, the service provider can build kernels with special instrumentation to trigger, diagnose, and fix the bug.

Question: Are you using any third-party drivers?
Description: Drivers run in the same address space as the kernel, with all the same privileges, so they can cause system crashes if they have bugs.

Question: What was the system doing just before it crashed?
Description: If the system was doing anything unusual, such as running a new stress test or experiencing higher-than-usual load, that might have led to the crash.

Question: Were there any unusual console messages right before the crash?
Description: Sometimes the system shows signs of distress before it actually crashes. This information is often useful.

Question: Did you add any tuning parameters to the /etc/system file?
Description: Sometimes tuning parameters can cause the system to crash, for example, increasing shared memory segments so that the system tries to allocate more memory than it has.

Question: Did the problem start recently?
Description: If so, did the onset of problems coincide with any changes to the system, for example, new drivers, new software, a different workload, a CPU upgrade, or a memory upgrade?
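
To answer the /etc/system question above, it can help to list only the lines in that file that are actually in effect. The following is a minimal sketch: the function name active_settings is hypothetical, and the sketch assumes that comment lines begin with an asterisk (*) or a hash mark (#).

```shell
# Hypothetical helper: print the non-comment, non-blank lines of a system file.
# Assumption: comment lines start with '*' or '#', optionally after whitespace.
active_settings() {
    sysfile=${1:-/etc/system}
    # First expression deletes blank lines, second deletes comment lines;
    # everything that remains is printed.
    sed -e '/^[[:space:]]*$/d' -e '/^[[:space:]]*[*#]/d' "$sysfile"
}
```

If the function prints nothing, the file contains no active entries, which rules out tuning parameters as a cause.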

Troubleshooting a System Crash Checklist

Use this checklist when gathering system data for a crashed system.

Item: Is a system crash dump available?
Your Data:

Item: Identify the operating system release and appropriate software application release levels.
Your Data:

Item: Identify system hardware. Include prtdiag output for SPARC systems. Include Explorer output for other systems.
Your Data:

Item: Are patches installed? If so, include showrev -p output.
Your Data:

Item: Is the problem reproducible?
Your Data:

Item: Does the system have any third-party drivers?
Your Data:

Item: What was the system doing before it crashed?
Your Data:

Item: Were there any unusual console messages right before the system crashed?
Your Data:

Item: Did you add any parameters to the /etc/system file?
Your Data:

Item: Did the problem start recently?
Your Data:
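
Several checklist items call for command output. As a rough sketch, the commands named above (prtdiag and showrev -p) plus uname -a for the release level could be gathered into one report file. The function names collect and gather_crash_data and the default file name crash-report.txt are hypothetical. Because not every command is available or on the search path on every system, the sketch records a note instead of failing when a command is missing.

```shell
# Hypothetical helper: run a command if it exists and append its output to a report.
collect() {
    report=$1
    shift
    echo "== $* ==" >> "$report"
    if command -v "$1" >/dev/null 2>&1; then
        # Capture both stdout and stderr; note a failure rather than aborting.
        "$@" >> "$report" 2>&1 || echo "($1 exited with an error)" >> "$report"
    else
        echo "($1 not found on this system)" >> "$report"
    fi
}

# Gather the command output named in the checklist into a single file.
gather_crash_data() {
    report=${1:-crash-report.txt}
    : > "$report"                 # start with an empty report
    collect "$report" uname -a    # operating system release (an addition for this sketch)
    collect "$report" prtdiag     # hardware configuration, per the checklist
    collect "$report" showrev -p  # installed patches, per the checklist
}
```

Running gather_crash_data /var/tmp/crash-report.txt would write one labeled section per command. The remaining checklist items still need to be answered by hand.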