This issue is fixed in System Firmware 9.5.2.
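If the primary domain is not configured with sufficient resources (two SCCs or fewer), correctable errors trigger an FMA retire action that affects both SCCs, and the domain then hangs on reboot. Other domains are unaffected and continue to run normally as long as their own NICs and drives remain available. If an error triggered a domain retire, you can view the fault with the fmadm faulty command.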
SUNW-MSG-ID: SPSUN4V-8001-YA, TYPE: Problem, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Oct 6 18:50:50 EDT 2015
PLATFORM: SPARC T7-2, CSN: 12345678, HOSTNAME: bur-t72-303-sp
SOURCE: fdd, REV: 1.0
EVENT-ID: f78853a2-87cf-e147-efb3-ecc370ef147e
DESC: An event was received indicating a fault was diagnosed by another fault manager.
AUTO-RESPONSE: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
IMPACT: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest service procedures and policies regarding this diagnosis.

-> fmadm faulty
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:51:00 abea80bd-6d18-46a4-e9cc-fda7df765748 SPSUN4V-8001-YA Major

Problem Status : open [injected]
Diag Engine    : fdd 1.0
System
   Manufacturer  : Oracle Corporation
   Name          : SPARC T7-2
   Part_Number   : 87654321
   Serial_Number : 12345678
----------------------------------------
Suspect 1 of 1
   Fault class : fault.cpu.generic-sparc.l2d-uc
   Certainty   : 100%
   Affects     : /SYS/MB/CM0/CMP/SCC3/L2D1
   Status      : faulted
   FRU
      Status        : faulty
      Location      : /SYS/MB
      Manufacturer  : Oracle Corporation
      Name          : ASY,MB,T7-2
      Part_Number   : 7093274
      Revision      : 02
      Serial_Number : 465769T+1434NH00JJ
      Chassis
         Manufacturer  : Oracle Corporation
         Name          : SPARC T7-2
         Part_Number   : 87654321
         Serial_Number : 12345678
Description : A cpu has experienced an uncorrectable level 2 data cache error (UE).
Response    : Cpu cores associated with the cache will be deconfigured.
Impact      : Some services may be lost and performance may be impacted.
Action      : Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest service procedures and policies regarding this diagnosis.

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:50:50 f78853a2-87cf-e147-efb3-ecc370ef147e SPSUN4V-8001-YA Major

Problem Status : open [injected]
Diag Engine    : fdd 1.0
System
   Manufacturer  : Oracle Corporation
   Name          : SPARC T7-2
   Part_Number   : 87654321
   Serial_Number : 12345678
----------------------------------------
Suspect 1 of 1
   Fault class : fault.cpu.generic-sparc.l2d-uc
   Certainty   : 100%
   Affects     : /SYS/MB/CM0/CMP/SCC3/L2D0
   Status      : faulted
   FRU
      Status        : faulty
      Location      : /SYS/MB
      Manufacturer  : Oracle Corporation
      Name          : ASY,MB,T7-2
      Part_Number   : 7093274
      Revision      : 02
      Serial_Number : 465769T+1434NH00JJ
      Chassis
         Manufacturer  : Oracle Corporation
         Name          : SPARC T7-2
         Part_Number   : 87654321
         Serial_Number : 12345678
Description : A cpu has experienced an uncorrectable level 2 data cache error (UE).
Response    : Cpu cores associated with the cache will be deconfigured.
Impact      : Some services may be lost and performance may be impacted.
Action      : Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest service procedures and policies regarding this diagnosis.
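If the fault is reported on the same cores that run the primary domain, this issue is the root cause of the domain retire, and the primary domain will hang on reboot.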
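Comparing the faulted resources against the primary domain's cores can be scripted. The following is a minimal sketch, not an Oracle-provided tool: it assumes you have captured the fmadm faulty text yourself and that you already know which (CPU module, SCC) pairs the primary domain runs on. The helper names affected_sccs and primary_at_risk and the primary_sccs argument are hypothetical. The path pattern is taken from the Affects fields in the sample output.

```python
import re

# Matches the "Affects" field of captured fmadm faulty output, e.g.
#   Affects : /SYS/MB/CM0/CMP/SCC3/L2D1
AFFECTS_RE = re.compile(r"Affects\s*:\s*(\S+)")
# Pulls the CPU module and SCC indices out of a resource path.
SCC_RE = re.compile(r"/CM(\d+)/CMP/SCC(\d+)")


def affected_sccs(fmadm_output):
    """Return the set of (cm, scc) pairs named in 'Affects' fields."""
    found = set()
    for path in AFFECTS_RE.findall(fmadm_output):
        match = SCC_RE.search(path)
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return found


def primary_at_risk(fmadm_output, primary_sccs):
    """True if any faulted SCC is one the primary domain runs on.

    primary_sccs is supplied by the caller (an assumption of this sketch);
    the script does not discover it.
    """
    return bool(affected_sccs(fmadm_output) & set(primary_sccs))


# Two "Affects" lines copied from the sample output.
sample = """
Affects : /SYS/MB/CM0/CMP/SCC3/L2D1
Affects : /SYS/MB/CM0/CMP/SCC3/L2D0
"""

print(affected_sccs(sample))
print(primary_at_risk(sample, {(0, 3), (0, 4)}))
```

Both sample faults point at the same SCC (SCC3 on CM0), so a primary domain whose cores include that SCC would be flagged.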
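Workaround: Ensure that the primary guest domain is assigned two or more SCCs on the same node (that is, a minimum of two SCCs plus a few additional cores).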
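One way to read the workaround is as a check on the primary domain's core allocation. The sketch below is illustrative only and rests on several assumptions: the caller supplies one (node, scc) tuple per assigned core, since this note does not define how core IDs map to SCCs. The default of four cores per SCC is an assumption to adjust for your platform. "Two SCCs plus a few additional cores" is interpreted as at least two fully assigned SCCs on one node plus at least one more core on that node. The function name meets_workaround is hypothetical.

```python
from collections import Counter, defaultdict


def meets_workaround(core_locations, cores_per_scc=4):
    """Check a primary-domain core allocation against the workaround.

    core_locations: iterable of (node, scc) tuples, one per assigned core.
    cores_per_scc:  assumed SCC size; not stated in this note.
    """
    per_scc = Counter(core_locations)   # cores assigned in each (node, scc)
    full_sccs = defaultdict(int)        # node -> count of fully assigned SCCs
    total_cores = defaultdict(int)      # node -> total cores assigned
    for (node, _scc), count in per_scc.items():
        total_cores[node] += count
        if count >= cores_per_scc:
            full_sccs[node] += 1
    # Some node must hold two whole SCCs and at least one additional core.
    return any(
        full_sccs[node] >= 2 and total_cores[node] > 2 * cores_per_scc
        for node in total_cores
    )


# Exactly two SCCs on node 0: the configuration described as insufficient.
two_sccs = [(0, 0)] * 4 + [(0, 1)] * 4
# Two SCCs plus two extra cores from a third SCC on the same node.
two_plus_extra = two_sccs + [(0, 2)] * 2

print(meets_workaround(two_sccs))
print(meets_workaround(two_plus_extra))
```

Passing the locations in explicitly keeps the sketch independent of any particular core numbering scheme.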
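Recovery: Force a reset of the domain (reset -f /HOST) to regain access. On reboot, the server cannot access the most recently saved SPM configuration and instead reverts to the factory default configuration.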