A red state condition can be identified by host console output similar to this:
Redstate trap occurred on socket 4 strand 80
2013-08-08 18:17:03 4:10:0> NOTICE: Redstate handler finished
After a red state condition, autorecovery is initiated. Even when the host's autorunonerror property is set to powercycle, the host might not complete the automatic restart. Fault messages similar to the following might appear on the host console during autorecovery:
2013-08-08 18:41:51 SP> NOTICE: Faulted /SYS/SSB7/SA/SLINK12 will exclude /SYS/CMU2/CMP1 on future reboots
2013-08-08 18:41:52 SP> NOTICE: Abort boot due to /SYS/SSB7/SA/SLINK12. Power Cycle Host
2013-08-08 18:41:53 SP> NOTICE: Faulted /SYS/CMU2/CMP1/SLINK4 will exclude /SYS/CMU2/CMP1 on future reboots
2013-08-08 18:41:56 SP> NOTICE: Start Host in progress: Step 6 of 9
2013-08-08 18:42:04 SP> NOTICE: Faulted /SYS/SSB7/SA/SLINK13 will exclude /SYS/CMU0/CMP1 on future reboots
. . .
2013-08-08 18:43:13 SP> NOTICE: Check for usable CPUs in /SYS/DCU0
2013-08-08 18:43:14 SP> NOTICE: Exclude /SYS/CMU0/CMP0. Reason: Prior fault on dependent resource
2013-08-08 18:43:15 SP> NOTICE: Exclude /SYS/CMU0/CMP1. Reason: Prior fault on dependent resource
. . .
2013-08-08 18:43:19 SP> NOTICE: Apply configuration rules to /SYS/DCU0
2013-08-08 18:43:20 SP> NOTICE: Exclude all of /SYS/DCU0. Reason: No configurable CPU in an even slot
2013-08-08 18:43:21 SP> NOTICE: HOST0 cannot be restarted. Reason: No configurable CPUs
2013-08-08 18:44:03 SP> NOTICE: Host is off
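If console output is captured to a file, the signatures shown above can be picked out mechanically. The following is a minimal sketch in Python, not an Oracle tool. The function name `summarize_console` is hypothetical, and the patterns are derived only from the sample messages in this note, so the exact wording on a given firmware release is an assumption.

```python
import re

# Patterns are based solely on the sample messages in this note (assumption:
# other firmware releases may word these messages differently).
REDSTATE = re.compile(r"Redstate trap occurred on socket (\d+) strand (\d+)")
FAULTED = re.compile(r"NOTICE: Faulted (\S+) will exclude (\S+) on future reboots")
NOT_RESTARTED = re.compile(r"NOTICE: (HOST\d+) cannot be restarted\. Reason: (.+)$")


def summarize_console(lines):
    """Collect red state traps, faulted links, and failed restarts from console lines."""
    summary = {"redstate": [], "faulted": [], "not_restarted": []}
    for raw in lines:
        line = raw.rstrip()
        match = REDSTATE.search(line)
        if match:
            # (socket, strand) of the trap
            summary["redstate"].append((int(match.group(1)), int(match.group(2))))
            continue
        match = FAULTED.search(line)
        if match:
            # (faulted resource, resource excluded on future reboots)
            summary["faulted"].append((match.group(1), match.group(2)))
            continue
        match = NOT_RESTARTED.search(line)
        if match:
            # (host name, reason text)
            summary["not_restarted"].append((match.group(1), match.group(2)))
    return summary
```

A non-empty `not_restarted` list together with faulted SLINK resources would suggest that the host is in the state this note describes and that the manual workaround applies.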
Workaround: Manually stop the hosts, acquit the faults, and start the hosts.
Stop all hosts.
-> stop /Servers/PDomains/PDomain_x/HOST
where x is 0, 1, 2, or 3. Run the command once for each host.
Start the Oracle ILOM fault management shell.
-> start -script /SP/faultmgmt/shell
List the faults.
faultmgmtsp> fmadm faulty
Record the UUIDs of the faults that affect SLINKs.
For example:
Time                UUID                                 msgid           Severity
------------------- ------------------------------------ --------------  --------
2013-08-16/12:56:32 09135d98-eafb-ee84-8643-fd8bb879cb6f SPSUN4V-8001-83 Critical
. . .
Suspect 1 of 2
   Fault class : fault.asic.switch.c2c-uc
   Certainty   : 50%
   Affects     : /SYS/SSB7/SA/SLINK13
   Status      : faulted
. . .
Suspect 2 of 2
   Fault class : fault.cpu.generic-sparc.c2c-uc
   Certainty   : 50%
   Affects     : /SYS/CMU0/CMP1/SLINK4
   Status      : faulted
. . .
The UUID of the faults affecting SLINKs /SYS/SSB7/SA/SLINK13 and /SYS/CMU0/CMP1/SLINK4 is 09135d98-eafb-ee84-8643-fd8bb879cb6f.
Acquit the faults.
faultmgmtsp> fmadm acquit UUID
where UUID is the UUID of the fault. For example:
faultmgmtsp> fmadm acquit 09135d98-eafb-ee84-8643-fd8bb879cb6f
Repeat the fmadm acquit command for each fault UUID that you recorded.
Exit the Oracle ILOM fault management shell.
faultmgmtsp> exit
->
Start all hosts.
-> start /Servers/PDomains/PDomain_x/HOST
where x is 0, 1, 2, or 3. Run the command once for each host.