
SPARC T7 Series Servers Product Notes


Updated: July 2019
 
 

Solaris OS Fails to Drop to OpenBoot Upon Retirement of SCC Cores and L2DS (21644300, 21772653)

This issue was fixed in Sun System Firmware 9.5.2.

If the primary domain is configured without enough resources (two SCCs or fewer) and correctable errors trigger an FMA retirement action that affects both of these SCCs, the domain hangs upon reboot. Other domains are not affected and continue to run normally as long as their own network cards and drives remain available. If an error triggers a domain retirement, you can view the fault with the fmadm faulty command.


Note - This example shows output from a SPARC T7-2 server.
SUNW-MSG-ID: SPSUN4V-8001-YA, TYPE: Problem, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Oct  6 18:50:50 EDT 2015
PLATFORM: SPARC T7-2, CSN: 12345678, HOSTNAME: bur-t72-303-sp
SOURCE: fdd, REV: 1.0
EVENT-ID: f78853a2-87cf-e147-efb3-ecc370ef147e
DESC: An event was received indicating a fault was diagnosed by another fault manager.
AUTO-RESPONSE: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
IMPACT: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest service procedures and policies regarding this diagnosis. 

-> fmadm faulty

Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:51:00 abea80bd-6d18-46a4-e9cc-fda7df765748 SPSUN4V-8001-YA Major

Problem Status    : open [injected]
Diag Engine       : fdd 1.0
System
   Manufacturer   : Oracle Corporation
   Name           : SPARC T7-2
   Part_Number    : 87654321
   Serial_Number  : 12345678

----------------------------------------
Suspect 1 of 1
   Fault class  : fault.cpu.generic-sparc.l2d-uc
   Certainty    : 100%
   Affects      : /SYS/MB/CM0/CMP/SCC3/L2D1
   Status       : faulted

   FRU
      Status            : faulty
      Location          : /SYS/MB
      Manufacturer      : Oracle Corporation
      Name              : ASY,MB,T7-2
      Part_Number       : 7093274
      Revision          : 02
      Serial_Number     : 465769T+1434NH00JJ
      Chassis
         Manufacturer   : Oracle Corporation
         Name           : SPARC T7-2
         Part_Number    : 87654321
         Serial_Number  : 12345678

Description : A cpu has experienced an uncorrectable level 2 data cache
              error (UE).

Response    : Cpu cores associated with the cache will be deconfigured.

Impact      : Some services may be lost and performance may be impacted.

Action      : Use 'fmadm faulty' to provide a more detailed view of this
              event. Please refer to the associated reference document at
              http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest
              service procedures and policies regarding this diagnosis.

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:50:50 f78853a2-87cf-e147-efb3-ecc370ef147e SPSUN4V-8001-YA Major

Problem Status    : open [injected]
Diag Engine       : fdd 1.0
System
   Manufacturer   : Oracle Corporation
   Name           : SPARC T7-2
   Part_Number    : 87654321
   Serial_Number  : 12345678

----------------------------------------
Suspect 1 of 1
   Fault class  : fault.cpu.generic-sparc.l2d-uc
   Certainty    : 100%
   Affects      : /SYS/MB/CM0/CMP/SCC3/L2D0
   Status       : faulted

   FRU
      Status            : faulty
      Location          : /SYS/MB
      Manufacturer      : Oracle Corporation
      Name              : ASY,MB,T7-2
      Part_Number       : 7093274
      Revision          : 02
      Serial_Number     : 465769T+1434NH00JJ
      Chassis
         Manufacturer   : Oracle Corporation
         Name           : SPARC T7-2
         Part_Number    : 87654321
         Serial_Number  : 12345678

Description : A cpu has experienced an uncorrectable level 2 data cache
              error (UE).

Response    : Cpu cores associated with the cache will be deconfigured.

Impact      : Some services may be lost and performance may be impacted.

Action      : Use 'fmadm faulty' to provide a more detailed view of this
              event. Please refer to the associated reference document at
              http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest
              service procedures and policies regarding this diagnosis. 

This fault is the root cause of a domain retirement if it is reported on the same cores that run the primary domain. In that case, the primary domain hangs upon reboot.
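To see at a glance which SCCs the reported faults touch, the Affects lines of the fmadm faulty output can be grouped by SCC. The following is a minimal Python sketch, not an Oracle-provided tool. It assumes the output has been captured as text and that resource paths follow the /SYS/.../SCCn/... pattern shown in the example above. The function name faulted_sccs and the SAMPLE excerpt are illustrative only.

```python
import re
from collections import defaultdict

# Matches lines such as "   Affects      : /SYS/MB/CM0/CMP/SCC3/L2D1"
AFFECTS_RE = re.compile(r"^\s*Affects\s*:\s*(\S+)\s*$", re.MULTILINE)
# Captures the resource path up to and including the SCC component
SCC_RE = re.compile(r"^(.*/SCC\d+)(?:/.*)?$")


def faulted_sccs(fmadm_output: str) -> dict[str, list[str]]:
    """Group the 'Affects' resource paths by the SCC that contains them.

    Paths without an SCC component are ignored (an assumption of this sketch).
    """
    by_scc = defaultdict(list)
    for path in AFFECTS_RE.findall(fmadm_output):
        match = SCC_RE.match(path)
        if match:
            by_scc[match.group(1)].append(path)
    return dict(by_scc)


# Excerpt of the two suspects from the example output (illustrative input).
SAMPLE = """\
Suspect 1 of 1
   Fault class  : fault.cpu.generic-sparc.l2d-uc
   Affects      : /SYS/MB/CM0/CMP/SCC3/L2D1
   Status       : faulted
Suspect 1 of 1
   Fault class  : fault.cpu.generic-sparc.l2d-uc
   Affects      : /SYS/MB/CM0/CMP/SCC3/L2D0
   Status       : faulted
"""

if __name__ == "__main__":
    for scc, resources in faulted_sccs(SAMPLE).items():
        print(scc, resources)
```

For the example output, both faults fall under /SYS/MB/CM0/CMP/SCC3, so a primary domain that runs only on cores of that SCC would be exposed to this issue.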

Workaround: Ensure that the primary guest domain is assigned more than two SCCs' worth of cores (that is, a minimum of two SCCs plus a few additional cores) on the same node.
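The reasoning behind the workaround can be sketched as a simple set check: the primary domain is exposed when every SCC that supplies its cores has been retired. The Python below is a hypothetical planning aid under that assumption. The function name and the SCC identifiers are illustrative, and how you obtain the list of SCCs assigned to the primary domain is outside the scope of this sketch.

```python
def primary_domain_at_risk(assigned_sccs, retired_sccs):
    """Return True if every SCC assigned to the primary domain is retired.

    Both arguments are iterables of SCC identifiers (hypothetical strings
    such as "/SYS/MB/CM0/CMP/SCC3"). An empty assignment returns False,
    because there is nothing to evaluate.
    """
    assigned = set(assigned_sccs)
    return bool(assigned) and assigned <= set(retired_sccs)


if __name__ == "__main__":
    retired = {"/SYS/MB/CM0/CMP/SCC2", "/SYS/MB/CM0/CMP/SCC3"}

    # Two SCCs only, both retired: the configuration this note warns about.
    print(primary_domain_at_risk(
        ["/SYS/MB/CM0/CMP/SCC2", "/SYS/MB/CM0/CMP/SCC3"], retired))

    # Two SCCs plus cores from a third SCC: one SCC survives the retirement.
    print(primary_domain_at_risk(
        ["/SYS/MB/CM0/CMP/SCC2", "/SYS/MB/CM0/CMP/SCC3",
         "/SYS/MB/CM0/CMP/SCC4"], retired))
```

The first call reports the domain as at risk and the second does not, which mirrors the guidance to give the primary domain cores beyond its first two SCCs.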

Recovery: Force a reset of the domain (reset -f /HOST) to regain access. Upon reboot, the server cannot access the most recently saved SPM configuration and reverts to the factory default configuration instead.