Responding to a Hung System

Troubleshooting a hung x86 system that runs Oracle Linux or Oracle Solaris can be difficult because false error indications from another part of the system might mask the root cause of the problem. Therefore, carefully examine all the information sources available to you before you attempt any remedy. It also helps to know the type of hang that the system is experiencing. This hang state information is especially important to Oracle Service personnel if you contact them.

A system "soft hang" can be characterized by any of the following symptoms:

  • Usability or performance of the system gradually decreases.

  • New attempts to access the system fail.

  • Some parts of the system appear to stop responding.

Some soft hangs might dissipate on their own, while others will require that the system be interrupted to gather information. A soft hang responds to a break signal that is sent through the system console.

A system "hard hang" leaves the system unresponsive to a system break sequence. A system is in a hard hang state if you have attempted all the soft hang remedies without success.

See Troubleshoot a Hung System.

A system that appears to be hung might not actually be hung, because another condition can produce the same symptoms. For example, a network or network share problem, or a power or boot issue, could be the cause. For information about how to eliminate conditions that give the appearance of a system hang, go to My Oracle Support and refer to Knowledge Article Doc ID 1012991.1.

Troubleshoot a Hung System

This procedure describes how to troubleshoot a hung system by using the Oracle Linux console and the Oracle Solaris serial console.

  1. Verify that the system is hung.
    1. Type the ping command to determine whether there is any network activity.
    2. Type the ps -ef command to determine whether any other user sessions are active or responding.

      If another user session is active, use it to review the contents of the /var/adm/messages file (Oracle Solaris) or the /var/log/messages file (Oracle Linux) for any indications of the system problem.

    3. Try to access the system console through Oracle ILOM.

      If you can establish a working system console connection, the problem might not be a true hang but might instead be a network-related problem. For suspected network problems, use the ping or ssh command to reach another system that is on the same subnet, hub, or router. If the affected system provides NFS services, determine whether NFS activity is present on other systems.

  2. If there are no responding user sessions, record the state of the system LEDs.

    The system LEDs might indicate a hardware failure in the system. You can use Oracle ILOM to check the state of the system LEDs. For more information about how to interpret system LEDs, refer to the server Service Manual.

  3. To force a kernel core dump on an x86 system, go to My Oracle Support and refer to Knowledge Article Doc ID 1003085.1.
  4. Review the contents of the /var/adm/messages file (Oracle Solaris) or the /var/log/messages file (Oracle Linux).

    Look for the following information about the system state:

    • Any large gaps between the time stamps of operating system or application messages

    • Warning messages about any hardware or software components

    • Information about the most recent root logins, which can identify system administrators who might be able to describe the system state at the time of the hang

  5. If possible, verify whether the system saved a core dump file.

    Core dump files give your support provider invaluable information for diagnosing system problems. For further information about saving core dump files, see Core Dump File.
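
Two of the checks in this procedure lend themselves to a small script: confirming that a host still accepts network connections (Step 1) and finding large time stamp gaps in the messages file (Step 4). The following Python sketch is illustrative only and is not part of any Oracle tool. The function names tcp_reachable and find_gaps, the default port 22 (SSH), the 10-minute threshold, and the assumption that each log line begins with a syslog-style "Mon DD HH:MM:SS" time stamp with English month names are all choices made for this example. Because such time stamps omit the year, the caller must supply it, and a log that spans a year boundary would need extra handling.

```python
import socket
from datetime import datetime, timedelta


def tcp_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, or name lookup failed.
        return False


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Yield (previous, current) datetimes for consecutive time-stamped
    lines that are more than threshold apart."""
    previous = None
    for line in lines:
        stamp = line[:15]  # assumed layout: "Mon DD HH:MM:SS"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a time-stamped line, for example a continuation line
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current
```

For example, you might call tcp_reachable with the address of a neighboring system on the same subnet, or pass an open messages file and the current year to find_gaps and record each pair that it reports. A successful TCP connection shows only that the network stack on the target still answers. It does not prove that the operating system is otherwise healthy.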