APPENDIX E
Event Messages
This appendix lists the following event messages:
There are three categories of events, as shown in TABLE E-1.
The controller records all array events while it is powered on, up to a maximum of one thousand events.
Note - Powering off or resetting the controller automatically deletes all recorded event log entries.
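The log behavior described above can be pictured with a minimal sketch. The class name `EventLog` is hypothetical, and the sketch assumes that once the log is full the oldest entry is dropped first; this appendix states only the capacity and the effect of a reset, not what happens on overflow.

```python
from collections import deque

MAX_EVENTS = 1000  # capacity stated in this appendix


class EventLog:
    """Illustrative in-memory model of the controller event log."""

    def __init__(self, capacity=MAX_EVENTS):
        # Assumption: when the log is full, the oldest entry is discarded.
        self._events = deque(maxlen=capacity)

    def record(self, message):
        self._events.append(message)

    def reset(self):
        # Models a controller reset or power-off: all entries are lost.
        self._events.clear()

    def __len__(self):
        return len(self._events)


log = EventLog()
for n in range(1200):
    log.record(f"event {n}")
print(len(log))  # 1000
log.reset()
print(len(log))  # 0
```

Because entries do not survive a reset or power cycle, any events that need to be kept should be saved before the controller is reset.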
Controller event messages include the following:
A failure in a dual-redundant configuration has occurred and the other controller is managing all controller functions. When you reset the active controller or power-cycle the array, the failed controller is restarted and the following alert is displayed.
Controller ALERT: Controller Unrecoverable Error 000n [followed by code trap data]
This message is displayed each time a failed controller is restarted after a redundant controller failure and does not indicate a new controller failure. The date and time of the event message indicate the time the controller was restarted, not the time of the failure.
Note - If the unrecoverable error recurs, clear the core only on the advice of your support representative.
Power supply unstable, NVRAM has failed, firmware update failure, or incorrect configuration (for example, a controller combined with the wrong backplane type).
Memory capacity not sufficient to support current configuration.
This event message indicates that voltage dropped below the low voltage threshold (shown in parentheses).
Battery under charge and unable to support a configuration change. It is normal to see this message periodically as the battery regularly discharges and recharges.
There is a temperature sensor on the charger board. The upper threshold is 65° C. The controller will resume charging when normal temperature is restored.
The controller has been forced to adopt a safe caching mode by an event-trigger condition. The safety trigger can shut down the controller or change the caching mode. The conditions that trigger these safety mechanisms are user-configurable and include battery condition, overheated board temperature, and peripheral device failure.
Firmware settings have been restored to factory defaults. Options for restoring defaults are not available to users and are reserved for qualified engineers.
The battery backup unit (BBU) was previously removed and has now been reinstalled.
The battery was previously absent or failed and is now restored to normal functionality; charging has resumed.
The battery has been recharged. It is normal to see this message periodically as the battery regularly discharges and recharges.
Memory is now sufficient to support current configuration.
An environmental trigger event occurred that caused the controller to switch the cache policy to write-through (see the following message).
The environmental event that caused the cache policy switch (see above message) was corrected and the previous write policy was reestablished.
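The two cache policy messages above describe a switch to write-through and a later return to the previous policy. A minimal sketch of that behavior follows; the class and method names are hypothetical, and `write-back` is assumed as the policy in effect before the trigger, which this appendix does not state.

```python
class CachePolicy:
    """Illustrative model of the trigger-driven cache policy switch."""

    def __init__(self, policy="write-back"):  # assumed starting policy
        self.policy = policy
        self._saved = None  # policy to restore once the trigger clears

    def on_trigger(self):
        # An environmental trigger forces write-through caching.
        if self._saved is None:
            self._saved = self.policy
        self.policy = "write-through"

    def on_clear(self):
        # The trigger condition is corrected: restore the previous policy.
        if self._saved is not None:
            self.policy = self._saved
            self._saved = None


cache = CachePolicy()
cache.on_trigger()
print(cache.policy)  # write-through
cache.on_clear()
print(cache.policy)  # write-back
```

Saving the earlier policy only on the first trigger means that a second trigger arriving before the first is cleared does not overwrite the policy to be restored.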
Physical drive event messages include the following:
Drive SCSI target select timeout. The specified hard drive cannot be selected by the controller. This can occur if a disk drive hangs a bus during the selection phase, resulting in a selection timeout. Because the bus is hung, the controller resets the bus, which results in a Gross Phase error being reported by the controller. Drives are failed because the controller can no longer communicate properly. This is not necessarily due to a faulty disk drive but can be the result of any connection to the specific SCSI bus, including the disk drive, controller, cable, or I/O module.
Drive-side SCSI phase/signal abnormality detected. This can occur if a disk drive hangs a bus during the selection phase, resulting in a selection timeout. Because the bus is hung, the controller resets the bus, which results in a Gross Phase error being reported by the controller. Drives are failed because the controller can no longer communicate properly. This is not necessarily due to a faulty disk drive but can be the result of any connection to the specific SCSI bus, including the disk drive, controller, cable, or I/O module.
Drive-side SCSI target unexpected disconnect detected.
Drive-side SCSI target I/O timeout. Possible causes are abnormal drive-side cabling, termination, or canister connection, or a malfunctioning drive.
SCSI parity/CRC error detected while communicating with the specified hard drive.
Drive installed does not respond with "Ready."
Hard drive media error reported. A bad block is encountered in the specified hard drive. The RAID controller will ask the hard drive to retry. If the host attempts a read to this location, a "media error" status will be returned. If it attempts a write, the block will be recovered and the "recovered" message will be displayed.
Drive-Side SCSI drive unrecoverable hardware error reported.
Unit attention received on the SCSI drive target.
SCSI drive aborted command reported.
Drive-side SCSI drive unexpected sense data received.
Note - The three-digit code in parentheses provides additional information about the drive error. The first of these three digits represents the SCSI Sense Key. The remaining two digits represent the Additional Sense Code (ASC). For more information about SCSI sense codes, refer to:
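As an illustration of how the three-digit code splits into its two parts, here is a minimal sketch. The function name `decode_drive_error` is hypothetical, and the sketch assumes the digits are hexadecimal. The sense key names are the standard SCSI ones; ASC values are returned as numbers only, because their meanings depend on the SCSI specification and the device.

```python
import re

# Standard SCSI sense key names, indexed by the first digit of the code.
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
}


def decode_drive_error(code):
    """Split a three-digit code into (sense key, key name, ASC)."""
    # Assumption: all three digits are hexadecimal.
    if not re.fullmatch(r"[0-9A-Fa-f]{3}", code):
        raise ValueError(f"expected three hex digits, got {code!r}")
    sense_key = int(code[0], 16)   # first digit: SCSI Sense Key
    asc = int(code[1:], 16)        # remaining two: Additional Sense Code
    return sense_key, SENSE_KEYS.get(sense_key, "Reserved"), asc


key, name, asc = decode_drive_error("311")
print(f"sense key {key:X} ({name}), ASC 0x{asc:02X}")
# sense key 3 (Medium Error), ASC 0x11
```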
Rewrites attempted and bad blocks have been successfully reassigned.
Drive-side block reassignment failed. Drive will be considered as having media errors or failed.
Drive-side SCSI target data overrun or underrun detected.
Drive-side SCSI target sync/wide negotiation abnormality detected.
Drive-side SCSI invalid status/sense data received from target.
The pair loop of the loop connection on which CHL:_ ID:_ resides may have been disconnected.
The SMART detect function has detected a Recovered Error (0x01) check condition.
(Test Mode) This message appears when simulating the SMART detect function. This message shows that your drives support SMART functions.
SMART errors detected; a spare drive is used to rebuild and/or replace the faulty drive, according to the preset scheme.
SMART errors detected and a spare drive is used to rebuild. The cloning process was halted because of a power interruption or because another member drive failed. Any interruption to array integrity, such as a drive failure, halts the cloning process.
Scanning new/missing drives from a SCSI channel successful.
CHL:_ loop connection restored.
Alternate connection to the dual-ported device, CHL:_ ID:_ is restored.
Channel event messages include the following:
Drive channel CHL:_ select timeout. The specified drive channel cannot be selected by the controller. The channel has been disconnected; or the mode, cabling, termination, or canister for the channel is out of order.
Gross phase/signal error found on the channel path used for redundant controller communications. This can occur if a disk drive hangs a bus during the selection phase, resulting in a selection timeout. Because the bus is hung, the controller resets the bus, which results in a Gross Phase error being reported by the controller. Drives are failed because the controller can no longer communicate properly. This is not necessarily due to a faulty disk drive but can be the result of any connection to the specific SCSI bus, including the disk drive, controller, cable, or I/O module.
Unexpected disconnect detected on the channel path used for redundant controller communications. This can occur if a disk drive hangs a bus during the selection phase, resulting in a selection timeout. Because the bus is hung, the controller resets the bus, which results in a Gross Phase error being reported by the controller. Drives are failed because the controller can no longer communicate properly. This is not necessarily due to a faulty disk drive but can be the result of any connection to the specific SCSI bus, including the disk drive, controller, cable, or I/O module.
Unexpected disconnect detected on the drive channel CHL:_.
I/O timeout on the channel path used for redundant controller communications. Possible causes are abnormal or malfunctioning channel path cabling, termination, or canister connection.
I/O timeout on the drive channel path CHL:_.
SCSI parity/CRC error detected on the channel path used for redundant controller communications.
SCSI parity/CRC error detected on the drive channel path CHL:_.
Unit attention received on the channel path used for redundant controller communications.
Unit attention received on the drive channel CHL:_.
Data overrun or underrun detected on the channel path used for redundant controller communications.
Data overrun or underrun detected on the drive channel CHL:_.
SCSI target sync/wide negotiation abnormality detected on the channel path used for redundant controller communications.
SCSI target sync/wide negotiation abnormality detected on the drive channel CHL:_.
Invalid status/sense data received on the channel path used for redundant controller communications.
Invalid status/sense data received on the drive channel CHL:_.
Host SCSI bus CHL:_ reset issued.
One of the dual loop members may have failed or been disconnected. Make sure that all channels are properly connected and that the topology is configured correctly.
Specific drive channel CHL:_ may have failed or been disconnected.
Fibre channel loop failure is detected.
The pair loop of CHL:_ has failed.
The pair loop of the loop connection on which CHL:_ ID:_ resides may have been disconnected.
Fibre Loop LIP issued on CHL:_.
SCSI bus reset issued on CHL:_.
CHL:_ loop connection restored.
Logical drive event messages often begin with the letters LG, an abbreviation for Logical Group that identifies the logical drive number to which the message applies.
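When logs are processed by script, the LG, CHL, and ID fields can be pulled out of a message with a short pattern match. The sketch below is illustrative only: the function name `parse_fields` is hypothetical, and the sample string is built from the placeholder notation used in this appendix rather than copied from real controller output.

```python
import re

# Matches LG, CHL, or ID followed by an optional ':' or '_' and a number.
FIELD_PATTERN = re.compile(r"\b(LG|CHL|ID)\s*[:_]?\s*(\d+)")


def parse_fields(message):
    """Return a dict of the LG, CHL, and ID numbers found in a message."""
    return {name: int(value) for name, value in FIELD_PATTERN.findall(message)}


# Hypothetical sample with the placeholders filled in.
sample = "Media scan failed on the member of logical drive LG:0 (CHL:2, ID:3)"
print(parse_fields(sample))  # {'LG': 0, 'CHL': 2, 'ID': 3}
```

A message that carries no such fields yields an empty dictionary, so callers can test for the `LG` key before treating a message as a logical drive event.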
Logical drive event messages include the following:
A member hard drive in the specified logical drive is missing.
A member hard drive in the specified logical drive has failed.
The creation process of logical drive LG_ is aborted.
The creation process of logical drive LG_ has failed.
The initialization process of logical drive LG_ has failed.
A member drive or other hardware failed, bad blocks were encountered, or the user cancelled the operation.
The rebuilding operation on logical drive LG_ is aborted.
The rebuilding operation on logical drive LG_ has failed. It can be the result of the following conditions:
During the parity-regeneration process, one member drive failed.
Media scan failed on the member of logical drive LG_ (CHL_, ID_).
Media scan canceled by user or aborted on the member of logical drive LG_ (CHL_, ID_) for array integrity concerns.
Cloning process failed when proceeding with the member of logical drive LG_, CHL_, ID_.
Bad block table full with entries found in logical drive LG_.
Logical drive LG_ bad block table has failed.
The table storing information about online initialization progress of logical drive LG_ has failed.
One or more bad blocks found during media scan, parity regeneration, or normal write check operations on logical drive LG_. The block was marked BAD so that the host can deal with it appropriately without risking data.
Bad blocks remained unrecoverable even after the controller attempted to rewrite data onto them. Block address is 0x_______.
Bad blocks found on drive CHL_ ID_. Block address is _______ (___).
Bad blocks encountered on CHL_ ID_. Block address is 0x_______.
A Fatal Fail condition occurred on Logical Drive LG:_.
A Fatal Fail condition occurred on LG:_ while under load. Data in cache was discarded.
A message related to "Immediate Array Availability." The controller/subsystem starts assembling member hard drives into a logical drive, LG_. The logical drive will be ready for I/O when creation is done, and the controller/subsystem will find an appropriate time to conduct parity initialization.
A message related to "Immediate Array Availability." The controller/subsystem starts initializing the logical drive. "On-Line" means the array is immediately accessible, even before the initialization process is completed.
"Off-Line" means the array is accessible only after the initialization process is completed. The controller/subsystem starts initializing the logical drive once the array is configured.
A message related to "Immediate Array Availability." Initialization of logical drive, LG_, is completed.
Initialization of logical drive LG_ is completed.
A message related to "Immediate Array Availability." Member hard drives have been successfully grouped into a logical drive, LG_. The logical drive is now ready for I/O, and the controller/subsystem will find an appropriate time to complete parity initialization.
The rebuild process on logical drive LG_ has started.
Logical drive LG_ has been successfully rebuilt.
Start regenerating parity data of logical drive LG_.
Parity regeneration on logical drive LG_ completed.
Start expanding the logical drive. Data re-striping is carried out later in the background.
Start expanding the logical drive. Data re-striping is carried out immediately.
Logical drive expansion completed.
Logical drive expansion completed.
Expansion "by adding new drive" has started.
The expansion "by adding new drive" is completed.
The expansion process is halted because of one of the following events:
The "Add Drive" process was paused and has now resumed. The target logical drive has been restored to its previous status, and the system can continue with the "Add Drive" operation.
This message is displayed when a member drive is manually cloned to a spare, or a spare is automatically applied to clone a faulty member on SMART-detected errors.
This message is displayed when a spare is used to replace a member drive suspected of imminent faults. This message indicates completion of cloning.
Cloning process on the member of LG_, CHL_, ID_, has been completed.
Starting media scan on the members of logical drive LG_. Each member being scanned is recognized by its channel and channel ID. This message is shown when member drives are being scanned.
Media scan is completed on a member drive (CHL:_ and ID:_).
Bad block recovered by rewriting data onto it.
Bad block recovered by rewriting data onto it. Block address is 0x______ .
Inconsistent parity of logical drive LG:_, found on block address _______.
Scanning new/missing drives on a SCSI channel successful.
CHL:_ loop connection restored.
Alternate connection to the dual-ported device, CHL:_ ID:_ is restored.
General target event messages include SAF-TE device messages, controller self-diagnostic messages, I2C messages, SES device messages, and general peripheral device messages.
SAF-TE device event messages include the following:
Power supply (device __; device ID__) failure detected by enclosure management.
Fan (_) is missing from device slot.
Temperature exceeding threshold on SAF-TE device_.
UPS power failure detected through SAF-TE device_.
The failed fan on device _ is back online (device ID:_).
Temperature restored to within safety range.
Power supply module_ back on-line (device ID:_), reported through SAF-TE device (_).
UPS power restored, reported through SAF-TE device (_).
Controller self-diagnostic event messages include the following:
This event refers to the cooling fan in the front bezel. Check the cable connection and see whether a fan has failed.
This message refers to the cooling fan in the controller's front bezel. Low rotation speed detected.
The detected +3.3V voltage source is now lower than the preset threshold.
Main board temperature restored to within safety range.
Firmware updated for both controllers in the dual-controller configuration.
+12V restored to within upper safety threshold.
+12V restored to within lower safety threshold.
I2C event messages include the following:
Fan module _ back online (Fan_, _RPM).
Controller fan_ (fan on the front bezel) back online (_RPM).
SES event messages include the following:
Unrecognizable device type on C_ I_. (SES.)
Unrecognizable device type on C_ I_. (SES.)
A voltage sensor has detected a critical under-voltage condition.
A voltage sensor has detected a power supply failure. The value can be either 1 or 2, depending on which power supply was detected.
Cooling fan_ back online, reported through SES (C_I_).
Temperature restored to within safety range; detected by SES (C_ I_) sensor.
Power supply_ back online, reported through SES (C_I_).
UPS_ power back online, reported through SES (C_I_).
General peripheral device event messages include the following:
Power supply failure detected.
Power supply module installed but not present now.
Low voltage detected from power supply module __.
Fan module_ installed but not present now.
Fan module_ low rotation speed detected (__RPM).
CPU temperature dropped below preset threshold.
Elevated ambient temperature within chassis.
Peripheral device temperature sensor_ failure detected.
Peripheral device temp sensor_ installed but not present now.
Cold temperature detected by device_ (_C).
A board temperature event has been detected for Board 1, Board 2, or a CPU.
Fan module_ back online (_ RPM).
Temperature detected through sensor_ restored to within safety range.
Temperature detected through sensor_ restored to within safety range (_ C).
Temperature sensor_ is present.
Power supply module_ back online.
Power supply module_ back online (_._V).
UPS battery charge restored to within safe levels.
Controller/subsystem lost connection with UPS device.
UPS battery found under-charge, charge percentage _%.
Copyright © 2009, Dot Hill Systems Corporation. All rights reserved.