APPENDIX B |
System Events |
This appendix contains tables of details and troubleshooting suggestions for system events. The tables are organized in alphabetical order, by component and sub-type.
Machine Check error detected on cpu <CPU>. [Machine Check in Progress.] [Error IP Valid.] [Restart IP Valid.] Error detected in [Data Cache] | [Instruction Cache] | [Bus Unit] | [Load/Store unit] | [North Bridge] | [Invalid bank reached]. [Second error detected.] [Error not corrected] [Error reporting disabled.] [Misc. register contains more info.] [Error occurred at address <address>.] [Processor state may have been corrupted] [Correctable ECC error.] [Un-correctable ECC error.] [Detected on a scrub.] Raw data: <data> |
|
See Machine Check Error. |
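Because the Machine Check message is assembled from optional phrases, a log-scanning script can test for each phrase separately. The following is a minimal sketch, assuming the message arrives as a single line of text in the form shown above. The function name `parse_machine_check`, the dictionary keys, and the single-token address format are hypothetical and are not part of the system software.

```python
import re

# Unit names taken from the message template; the optional space tolerates
# both "Instruction Cache" and "InstructionCache".
UNIT_RE = re.compile(
    r"Error detected in (Data Cache|Instruction ?Cache|Bus Unit|"
    r"Load/Store unit|North Bridge|Invalid bank reached)"
)
CPU_RE = re.compile(r"Machine Check error detected on cpu (\w+)\.")
# Assumption: the address is a single token such as 0x3f00.
ADDRESS_RE = re.compile(r"Error occurred at address (\w+)\.")


def parse_machine_check(message):
    """Return selected fields of a Machine Check message, or None."""
    cpu = CPU_RE.search(message)
    if cpu is None:
        return None  # not a Machine Check event
    unit = UNIT_RE.search(message)
    address = ADDRESS_RE.search(message)
    # Test the un-correctable phrase first so the two cases stay distinct.
    if "Un-correctable ECC error" in message:
        ecc = "uncorrectable"
    elif "Correctable ECC error" in message:
        ecc = "correctable"
    else:
        ecc = None
    return {
        "cpu": cpu.group(1),
        "unit": unit.group(1) if unit else None,
        "address": address.group(1) if address else None,
        "ecc": ecc,
        "second_error": "Second error detected" in message,
        "found_on_scrub": "Detected on a scrub" in message,
    }
```

A script of this kind is only useful for triage, for example to separate correctable from un-correctable ECC reports before following the troubleshooting steps.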
Crowbar; fatal error in the power supply or the VRM modules has occurred. |
|
Sensor <sensor> reports that [crowbar failure has been detected - attempting to power system off] | [crowbar failure has been cleared]. |
|
One of the VRM modules has indicated an over-temperature condition, an over-current condition, or an inability to regulate voltage properly, or the condition has been cleared. When a failure is detected, the cause is usually an over-temperature condition. |
|
See Thermal Trip Events. |
A power supply which had previously failed or been unplugged is now available and working normally. |
|
A new power supply has been plugged into the system and identified. |
|
One of the power supplies no longer can be accessed. It is assumed that it has been removed. |
|
Sensor <sensor> reports that the fans have resumed normal operation. |
|
The internal fans within a power supply have recovered from a failure and now are working normally. |
|
Power supply temperature is too high or has returned to normal. |
|
Sensor <sensor> reports that the [temperature has exceeded specification] | [temperature has returned to normal]. |
|
The power supply temperature is too high or has returned to normal. |
|
See Thermal Trip Events. |
Dimm Fault: CPU <cpu>, Dimm <dimm>, [Fault Detected] | [Paired with faulty Dimm] | [Unknown] |
|
The platform BIOS has detected an error in the DIMMs during memory configuration and initialization. It might or might not be possible to isolate the fault to a specific DIMM. (Certain memory configurations do not allow for fault isolation across the paired DIMMs of a single memory channel.) |
|
Run the memory diagnostics tests and see DIMM Faults. |
SP <hostname> IP [is now set to <ip_addr>] | [deconfigured]. |
|
SP hostname set to <hostname>, IP is [<ip_addr>] | [not configured.] |
|
SP has been rebooted by PRS due to lost heartbeat or failure of SP to initialize. |
|
SP Rebooted by PRS - reason is [SP Failed to Initialize] | [SP Heartbeat was lost] | [SP Failed Init and HB]. |
|
The SP failed to boot properly and was reset by the platform power sequencing chip. A failure to initialize indicates that the SP did not boot far enough, or quickly enough, to signal to PRS that it had completed initialization. A loss of heartbeat indicates that the SP either failed to complete the boot process or hung during normal operation. |
|
See DIMM Faults. |
An error occurred while writing the content of the SDRR to persistent storage. This usually results when persistent storage is full. |
|
Application failure after 3 or more restarts within 90 seconds. |
|
An application is not operating properly and is exiting shortly after being started. This is probably caused by an intermittent hardware problem on the Service Processor (for example, one of the sensor devices has gone into an incorrect state and is causing problems). This also can be caused by a bad SP software load or by misconfigured network or file system settings. |
|
See DIMM Faults. |
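The "3 or more restarts within 90 seconds" condition can be pictured as a sliding time window over restart timestamps. The sketch below is illustrative only and does not describe how the SP software detects the condition. The class name `RestartMonitor` and its parameters are hypothetical, and timestamps are assumed to be in seconds and to arrive in increasing order.

```python
from collections import deque


class RestartMonitor:
    """Flag an application that restarts too often within a time window."""

    def __init__(self, max_restarts=3, window_seconds=90.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._times = deque()

    def record_restart(self, timestamp):
        """Record a restart; return True if the failure threshold is reached."""
        self._times.append(timestamp)
        # Drop restarts that fall outside the window ending at `timestamp`.
        # The newest entry always stays, so the deque is never empty here.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.max_restarts


monitor = RestartMonitor()
for t in (0.0, 40.0, 85.0):
    failed = monitor.record_restart(t)
# Three restarts fell within 90 seconds, so `failed` is now True.
```

Restarts spaced more widely (for example at 0, 50, and 100 seconds) do not reach the threshold, because the oldest restart has left the window by the time the third one is recorded.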
Copyright © 2005, Sun Microsystems, Inc. All Rights Reserved.