In Oracle Solaris 11.2 SRU 10, enhanced diagnostic features were added to collect more data from disks connected to the Sun Storage 6 Gb SAS PCIe HBA, Internal (SGX-SAS6-INT-Z). The additional data includes various disk errors and SMART events. These events also help identify suspect physical disks among the logical disks in a RAID volume. The events are captured and logged in /var/log/ssm/event.log when the hardware management agent (svc:/system/sp/management:default) is running.
The following table lists the enhanced diagnostic events that are logged.
The controller polls each physical disk in the volume at regular intervals. If a disk has encountered an error, an event is generated by the controller. The hardware management agent captures that event and enters it in the hardware management event log.
To view the event information in the hardware management event log, type:
# view /var/log/ssm/event.log
For disk events, you will see information similar to:
Thu Apr 30 12:32:31 2015:(CLI) Event Name : PD_MEDIA_ERROR
Thu Apr 30 12:32:31 2015:(CLI) Event Description : A medium error was detected by the device that was non-recoverable.
Thu Apr 30 12:32:31 2015:(CLI) ASC : 0x10
Thu Apr 30 12:32:31 2015:(CLI) ASCQ : 0x3
Thu Apr 30 12:32:31 2015:(CLI) Sense Key : 0x3
Thu Apr 30 12:32:31 2015:(CLI) Source : LSI
Thu Apr 30 12:32:31 2015:(CLI) SAS Address : 0x5000cca01200fadd
Thu Apr 30 12:32:31 2015:(CLI) LSI Description : Unexpected sense: PD 0c(e0xfc/s1) Path 5000cca01200fadd, CDB: 2f 00 00 fc 4d 42 00 10 00 00, Sense: 3/10/03
Thu Apr 30 12:32:31 2015:(CLI) Event TimeStamp : 04/30/2015 ; 19:30:25
Thu Apr 30 12:32:31 2015:(CLI) Node ID : 00000000:12
Thu Apr 30 12:32:31 2015:(CLI) Nac Name : /SYS/HDD1
Thu Apr 30 12:32:31 2015:(CLI) Serial Number : 001015N0JPXA PMG0JPXA
Thu Apr 30 12:32:31 2015:(CLI) WWN No : PDS:5000cca01200fadd
Thu Apr 30 12:32:31 2015:(CLI) Disk Model : H106030SDSUN300G
You can then use the information in the event listing to determine which physical disk in the system has the issue. Fields such as the Oracle ILOM Nac Name (which matches the label on the front panel of the system) and the drive Serial Number help you identify the disk and its drive slot in the system.
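When the log holds many entries, it can be easier to pull out the identifying fields with a script. The following is a minimal sketch, not part of the hardware management software. It assumes that every line has the form shown in the sample (a timestamp, the `:(CLI) ` tag, then `field : value`) and that each event begins with an `Event Name` line. The function name `parse_events` and the `Logged` key are illustrative, and lines tagged differently from the sample are skipped.

```python
# Lines copied from the sample event shown above (a subset of its fields).
SAMPLE = """\
Thu Apr 30 12:32:31 2015:(CLI) Event Name : PD_MEDIA_ERROR
Thu Apr 30 12:32:31 2015:(CLI) Sense Key : 0x3
Thu Apr 30 12:32:31 2015:(CLI) Node ID : 00000000:12
Thu Apr 30 12:32:31 2015:(CLI) Nac Name : /SYS/HDD1
Thu Apr 30 12:32:31 2015:(CLI) Serial Number : 001015N0JPXA PMG0JPXA
Thu Apr 30 12:32:31 2015:(CLI) WWN No : PDS:5000cca01200fadd
"""


def parse_events(lines):
    """Group log lines into one dict per event, keyed by field name.

    Assumption: a new event starts at each "Event Name" line.
    """
    events = []
    current = None
    for line in lines:
        # Split "<timestamp>:(CLI) <field> : <value>" at the tag.
        stamp, sep, rest = line.strip().partition(":(CLI) ")
        if not sep:
            continue  # line is not in the sample's format; skip it
        # Split at the first colon only, because values may contain colons.
        key, sep, value = rest.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Event Name":
            current = {"Logged": stamp}  # time the agent wrote the entry
            events.append(current)
        if current is not None:
            current[key] = value
    return events


events = parse_events(SAMPLE.splitlines())
for ev in events:
    print("%s  %s  slot=%s  serial=%s" % (
        ev["Logged"], ev["Event Name"],
        ev.get("Nac Name", "?"), ev.get("Serial Number", "?")))
```

To run it against the real log, pass an open file instead of the sample, for example `parse_events(open("/var/log/ssm/event.log"))`, from an account that can read that file.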
For the other disk diagnostic events described in this document, it is up to the administrator to check the hardware management event log for these disk events when a disk problem is suspected. There is currently no alert mechanism to proactively announce these events.
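Because the log must be checked manually, an administrator might choose to script a periodic check, for example from cron. The following is a rough sketch of one possible approach and not a supported alert mechanism. It remembers how far into the log it has read and returns only the `Event Name` lines added since the previous run. The function name `new_event_lines`, the state file, and the demo's second log entry (with a made-up timestamp) are assumptions for illustration; the line format is taken from the sample event above.

```python
import os
import tempfile


def new_event_lines(log_path, state_path):
    """Return "Event Name" lines appended to log_path since the last call.

    The byte offset already processed is kept in state_path (hypothetical).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    if offset > os.path.getsize(log_path):
        offset = 0  # log is shorter than before (rotated or truncated)

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # leave a partially written last line for next time
    data = data[:end]

    with open(state_path, "w") as f:
        f.write(str(offset + end))

    text = data.decode("utf-8", "replace")
    return [ln for ln in text.splitlines() if ":(CLI) Event Name :" in ln]


# Demo on a temporary copy so that the real log is not touched.
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "event.log")
state = os.path.join(workdir, "event.offset")

with open(log, "w") as f:
    f.write("Thu Apr 30 12:32:31 2015:(CLI) Event Name : PD_MEDIA_ERROR\n")
    f.write("Thu Apr 30 12:32:31 2015:(CLI) Nac Name : /SYS/HDD1\n")

first = new_event_lines(log, state)   # sees the event already in the log
second = new_event_lines(log, state)  # nothing new since the first call

with open(log, "a") as f:
    # Made-up later entry, used only to exercise the incremental read.
    f.write("Thu Apr 30 12:40:00 2015:(CLI) Event Name : PD_MEDIA_ERROR\n")

third = new_event_lines(log, state)   # sees only the appended event
for line in first + third:
    print(line)
```

In practice, `log_path` would be /var/log/ssm/event.log, and the returned lines could be mailed or forwarded by whatever notification tooling the site already uses.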