1 Introduction to System Diagnostics and Troubleshooting
Oracle provides a wide spectrum of diagnostic and troubleshooting tools for use with Oracle x86 servers. These tools include integrated log file information, operating system diagnostics, and hardware LED indicators, all of which contain clues helpful in narrowing down the possible sources of a problem.
Some diagnostic tools stress the system by running tests in parallel, while other tools run sequential tests, enabling the system to continue its normal functions. Some diagnostic tools function on Standby power or when the system is offline, while others require the operating system to be up and running.
This section describes the Oracle diagnostic tools for x86 servers equipped with Oracle ILOM Firmware Releases 4.x and 5.x. It includes the following topics:
Diagnostic and Troubleshooting Tools
Why are there so many different diagnostic and troubleshooting tools? No single all-in-one diagnostic test exists, for several reasons, starting with the complexity of the server. Some diagnostics must also function even when the system fails to boot, and any diagnostic that can isolate problems in that state must be independent of the operating system. However, a diagnostic that is independent of the operating system cannot use the operating system’s considerable resources to investigate the more complex causes of faults or failures.
Consider the different tasks you expect to perform with your diagnostic and troubleshooting tools:
- Isolating faults to a specific replaceable hardware component
- Exercising the system to disclose more subtle problems that might or might not be hardware related
- Monitoring the system to catch problems before they become serious enough to cause unplanned downtime
No single diagnostic tool can be optimized for all of these varied tasks. Instead of one unified diagnostic tool, Oracle provides a palette of tools, each of which has its own strengths and applications.
The following diagnostic and troubleshooting tools are available for your server.
Tool | Description |
---|---|
Status indicators | Status indicators (LEDs) located on the chassis and on selected system components can serve as front-line indicators of a limited set of hardware failures. |
Oracle ILOM Diagnostics | Oracle ILOM displays the status of system components. You can then replace a failed component, which often clears the problem. |
HWdiag (Oracle ILOM Diag shell) | Oracle ILOM allows you to run HWdiag, a command-line utility that checks the status of system components. Access the |
Snapshot Utility (Oracle ILOM) | Oracle ILOM collects information about the current state of the Oracle ILOM SP, including environmental data, logs, and information about field-replaceable units installed on the server. You also can use Snapshot to run diagnostics on the host and capture the diagnostics log files. |
UEFIdiag (Oracle ILOM/UEFI shell) | Oracle ILOM allows you to run diagnostics in a UEFI environment to evaluate system components, such as the CPU, memory, disk drives, and I/O cards. |
Oracle Solaris Diagnostics | Use Oracle Solaris diagnostics to diagnose component problems and interpret the log files. |
Troubleshooting System Components
The following table lists the system components and shows which utility you can use to either test the components or get status information about them.
Server Component | Oracle ILOM | UEFIdiag | HWdiag |
---|---|---|---|
Service processor | Yes | No | Yes |
CPU and memory | Yes | Yes | Yes |
Fans | Yes | No | Yes |
Power supplies | Yes | No | Yes |
Storage devices | Yes (limited) | Yes | Yes (limited) |
Network interface | Yes | Yes (limited) | Yes (limited) |
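If you script your troubleshooting workflow, the component table above can be captured as a small lookup structure. The following Python sketch is illustrative only: the names `SUPPORT` and `tools_for` are hypothetical and are not part of any Oracle tool. The values are transcribed from the table, with "Yes (limited)" recorded as `limited`.

```python
# Support matrix transcribed from the component table above.
# Levels: "yes" (supported), "limited" (partial coverage), "no" (not supported).
# SUPPORT and tools_for are hypothetical names used only for this sketch.
SUPPORT = {
    "Service processor": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "CPU and memory": {"Oracle ILOM": "yes", "UEFIdiag": "yes", "HWdiag": "yes"},
    "Fans": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "Power supplies": {"Oracle ILOM": "yes", "UEFIdiag": "no", "HWdiag": "yes"},
    "Storage devices": {"Oracle ILOM": "limited", "UEFIdiag": "yes", "HWdiag": "limited"},
    "Network interface": {"Oracle ILOM": "yes", "UEFIdiag": "limited", "HWdiag": "limited"},
}


def tools_for(component, include_limited=True):
    """Return the tools that cover a component, full-coverage tools first."""
    row = SUPPORT[component]
    full = [tool for tool, level in row.items() if level == "yes"]
    limited = [tool for tool, level in row.items() if level == "limited"]
    return full + limited if include_limited else full


if __name__ == "__main__":
    for name in SUPPORT:
        print(f"{name}: {', '.join(tools_for(name))}")
```

Listing full-coverage tools first is a design choice for this sketch; it simply suggests starting with the utility that tests the component most completely.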