You can use a variety of diagnostic tools, commands, and indicators to monitor and troubleshoot a server:
LEDs – Provide a quick visual notification of the status of the server and of some of the field-replaceable units (FRUs).
Oracle ILOM – This firmware runs on the service processor. In addition to providing the interface between the hardware and the OS, Oracle ILOM tracks and reports the health of key server components. Oracle ILOM works closely with POST and Oracle Solaris Predictive Self-Healing technology to keep the system running even when a component is faulty.
Power-on self-test (POST) – POST performs diagnostics on system components when the system is reset, to verify the integrity of those components. POST is configurable and works with Oracle ILOM to take faulty components offline if needed.
Oracle Solaris OS Predictive Self-Healing (PSH) – This technology continuously monitors the health of the CPU, memory, and other components, and works with Oracle ILOM to take a faulty component offline if needed. Predictive Self-Healing enables systems to accurately predict component failures and mitigate many serious problems before they occur.
Log files and command interface – The standard Oracle Solaris OS log files and investigative commands, which you can access and display on the device of your choice.
Oracle VTS – An application that exercises the system, provides hardware validation, and identifies possibly faulty components, with recommendations for repair.
The LEDs, Oracle ILOM, PSH, and many of the log files and console messages are integrated. For example, when the Oracle Solaris software detects a fault, it displays the fault, logs it, and passes the information to Oracle ILOM, where it is also logged. Depending on the fault, one or more LEDs might also be lit.
The diagnostic flow chart in Diagnostics Process describes an approach for using the server diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might perform some actions and not others.