1 Introduction to System Diagnostics and Troubleshooting

Oracle provides a wide spectrum of diagnostic and troubleshooting tools for use with Oracle x86 servers. These tools include integrated log file information, operating system diagnostics, and hardware LED indicators, all of which contain clues helpful in narrowing down the possible sources of a problem.

Some diagnostic tools stress the system by running tests in parallel, while other tools run sequential tests, enabling the system to continue its normal functions. Some diagnostic tools function on Standby power or when the system is offline, while others require the operating system to be up and running.

This section describes the Oracle diagnostic tools for x86 servers equipped with Oracle ILOM Firmware Releases 4.x and 5.x. It includes the following topics:

Diagnostic and Troubleshooting Tools

Why are there so many different diagnostic and troubleshooting tools? There are several reasons for the lack of a single all-in-one diagnostic test, starting with the complexity of the server. Consider also that some diagnostics must function even when the system fails to boot. Any diagnostic capable of isolating problems in that state must be independent of the operating system. However, a diagnostic that is independent of the operating system cannot use the operating system’s considerable resources to investigate the more complex causes of faults or failures. Consider the different tasks you expect to perform with your diagnostic and troubleshooting tools:

  • Isolating faults to a specific replaceable hardware component

  • Exercising the system to disclose more subtle problems that might or might not be hardware related

  • Monitoring the system to catch problems before they become serious enough to cause unplanned downtime

No single diagnostic tool can be optimized for all of these varied tasks. Instead of one unified diagnostic tool, Oracle provides a palette of tools, each of which has its own strengths and applications.

The following diagnostic and troubleshooting tools are available for your server.

| Tool | Description | Link |
|---|---|---|
| Status indicators | Status indicators (LEDs) located on the chassis and on selected system components can serve as front-line indicators of a limited set of hardware failures. | System LEDs and Diagnostics |
| Oracle ILOM Diagnostics | Oracle ILOM displays the status of system components. You can then replace a failed component, which often clears the problem. | Oracle ILOM Diagnostics |
| HWdiag (Oracle ILOM Diag shell) | Oracle ILOM allows you to run HWdiag, a command-line utility that checks the status of system components. Access the hwdiag command from the Oracle ILOM Diag shell. | Using the Oracle ILOM Diag Shell |
| Snapshot Utility (Oracle ILOM) | Oracle ILOM collects information about the current state of the Oracle ILOM SP, including environmental data, logs, and information about field-replaceable units installed on the server. You also can use Snapshot to run diagnostics on the host and capture the diagnostics log files. | Using the Snapshot Utility |
| UEFIdiag (Oracle ILOM/UEFI shell) | Oracle ILOM allows you to run diagnostics in a UEFI environment to evaluate system components, such as the CPU, memory, disk drives, and I/O cards. | Using UEFI Diagnostics |
| Oracle Solaris Diagnostics | Use Oracle Solaris diagnostics to diagnose component problems and interpret the log files. | Core Dump File |

Troubleshooting System Components

The following table lists the system components and shows which utility you can use to either test the components or get status information about them.

| Server Component | Oracle ILOM | UEFIdiag | HWdiag |
|---|---|---|---|
| Service processor | Yes | No | Yes |
| CPU and memory | Yes | Yes | Yes |
| Fans | Yes | No | Yes |
| Power supplies | Yes | No | Yes |
| Storage devices | Yes (limited) | Yes | Yes (limited) |
| Network interface | Yes | Yes (limited) | Yes (limited) |
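
If you script your troubleshooting workflow, the component coverage table above can be captured as a simple lookup. The following is a minimal Python sketch, not an Oracle-provided utility: the names `COVERAGE` and `tools_for` are hypothetical, and the values are transcribed from the table, with "limited" standing for "Yes (limited)".

```python
# Coverage levels transcribed from the component table above.
# "yes" = supported, "limited" = "Yes (limited)", "no" = not supported.
# COVERAGE and tools_for are illustrative names, not part of any Oracle tool.
COVERAGE = {
    "Service processor": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "CPU and memory": {"Oracle ILOM": "yes", "UEFIdiag": "yes", "HWdiag": "yes"},
    "Fans": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "Power supplies": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "Storage devices": {"Oracle ILOM": "limited", "UEFIdiag": "yes", "HWdiag": "limited"},
    "Network interface": {"Oracle ILOM": "yes", "UEFIdiag": "limited", "HWdiag": "limited"},
}


def tools_for(component, include_limited=True):
    """Return the utilities that can test or report status for a component.

    When include_limited is False, utilities with only limited coverage
    are left out of the result.
    """
    accepted = {"yes", "limited"} if include_limited else {"yes"}
    row = COVERAGE[component]
    return [tool for tool, level in row.items() if level in accepted]


if __name__ == "__main__":
    for name in COVERAGE:
        print(f"{name}: {', '.join(tools_for(name))}")
```

Calling `tools_for("Storage devices", include_limited=False)` narrows the result to the utilities with full coverage for that component, which can help when choosing the first tool to run.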