You can use a variety of diagnostic tools, commands, and indicators to monitor and troubleshoot a server module:
LEDs – Provide a quick visual notification of the status of the server module and of some of the FRUs.
Oracle ILOM – This firmware runs on the SP. In addition to providing the interface between the hardware and OS, Oracle ILOM also tracks and reports the health of key server module components. Oracle ILOM works closely with POST and Oracle Solaris PSH technology to keep the system running even when there is a faulty component. You can log in to multiple SP accounts simultaneously and have separate Oracle ILOM shell commands executing concurrently under each account.
Note - Unless indicated otherwise, all examples of interaction with the SP are depicted with Oracle ILOM shell commands.
POST – Performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and works with Oracle ILOM to take faulty components offline if needed.
Oracle Solaris PSH – This technology continuously monitors the health of the CPU, memory, and other components, and works with Oracle ILOM to take a faulty component offline if needed. The PSH technology enables systems to accurately predict component failures and mitigate many serious problems before they occur.
Log files and command interface – Provide the standard Oracle Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice.
The LEDs, Oracle ILOM, PSH, and many of the log files and console messages are integrated. For example, when the Oracle Solaris software detects a fault, it displays the fault, logs it, and passes the fault information to Oracle ILOM, which logs it as well. Depending on the fault, one or more LEDs might also be illuminated.
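As a rough illustration of the Oracle Solaris side of this flow, the following minimal sketch gathers output from a few investigative commands in one pass. The command list is an assumption, not something this section specifies: fmadm faulty, fmdump, and dmesg are standard Oracle Solaris commands, but the helper names (run_if_available, collect_diagnostics) are hypothetical, and the commands might need elevated privileges on your system.

```python
import shutil
import subprocess
from typing import Dict, Optional, Sequence, Tuple

# Assumed command set; this section does not name specific commands.
SOLARIS_DIAGNOSTICS: Tuple[Tuple[str, ...], ...] = (
    ("fmadm", "faulty"),  # faults currently diagnosed by the fault manager
    ("fmdump",),          # history of diagnosed faults
    ("dmesg",),           # recent system messages
)


def run_if_available(cmd: Sequence[str], timeout: float = 30.0) -> Optional[str]:
    """Return stdout plus stderr of cmd, or None if it cannot be run.

    None is returned when the executable is not on PATH, when it fails
    to start, or when it exceeds the timeout.
    """
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # a nonzero exit status still yields useful output
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout + result.stderr


def collect_diagnostics(
    commands: Sequence[Sequence[str]] = SOLARIS_DIAGNOSTICS,
) -> Dict[str, Optional[str]]:
    """Map each command line to its output, or to None if unavailable."""
    return {" ".join(cmd): run_if_available(cmd) for cmd in commands}


if __name__ == "__main__":
    for name, output in collect_diagnostics().items():
        print(f"== {name} ==")
        print(output if output is not None else "(command unavailable on this host)")
```

Because a missing command is reported as None instead of raising an error, the same script can run on hosts where only some of these tools are installed.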
The diagnostic flow chart in Diagnostics Process describes an approach for using the server module diagnostics to identify a faulty replaceable unit. The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting. Therefore, you might perform some actions and not others.