This issue is fixed in System Firmware 9.5.2.
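If the primary domain has not been configured with sufficient resources (two SCCs or fewer), a correctable error can trigger an FMA retire action that affects both of those SCCs, and the domain then hangs on reboot. Other domains are unaffected and continue to run normally as long as their own NICs and drives remain available. If an error has triggered a domain retire, you can view the fault with the fmadm faulty command.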
SUNW-MSG-ID: SPSUN4V-8001-YA, TYPE: Problem, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Oct 6 18:50:50 EDT 2015
PLATFORM: SPARC T7-2, CSN: 12345678, HOSTNAME: bur-t72-303-sp
SOURCE: fdd, REV: 1.0
EVENT-ID: f78853a2-87cf-e147-efb3-ecc370ef147e
DESC: An event was received indicating a fault was diagnosed by another fault manager.
AUTO-RESPONSE: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
IMPACT: Refer to the document at http://support.oracle.com/msg/SPSUN4V-8001-YA.
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest service procedures and policies regarding this diagnosis.
-> fmadm faulty
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:51:00 abea80bd-6d18-46a4-e9cc-fda7df765748 SPSUN4V-8001-YA Major
Problem Status : open [injected]
Diag Engine : fdd 1.0
System
Manufacturer : Oracle Corporation
Name : SPARC T7-2
Part_Number : 87654321
Serial_Number : 12345678
----------------------------------------
Suspect 1 of 1
Fault class : fault.cpu.generic-sparc.l2d-uc
Certainty : 100%
Affects : /SYS/MB/CM0/CMP/SCC3/L2D1
Status : faulted
FRU
Status : faulty
Location : /SYS/MB
Manufacturer : Oracle Corporation
Name : ASY,MB,T7-2
Part_Number : 7093274
Revision : 02
Serial_Number : 465769T+1434NH00JJ
Chassis
Manufacturer : Oracle Corporation
Name : SPARC T7-2
Part_Number : 87654321
Serial_Number : 12345678
Description : A cpu has experienced an uncorrectable level 2 data cache
error (UE).
Response : Cpu cores associated with the cache will be deconfigured.
Impact : Some services may be lost and performance may be impacted.
Action : Use 'fmadm faulty' to provide a more detailed view of this
event. Please refer to the associated reference document at
http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest
service procedures and policies regarding this diagnosis.
------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2015-10-06/22:50:50 f78853a2-87cf-e147-efb3-ecc370ef147e SPSUN4V-8001-YA Major
Problem Status : open [injected]
Diag Engine : fdd 1.0
System
Manufacturer : Oracle Corporation
Name : SPARC T7-2
Part_Number : 87654321
Serial_Number : 12345678
----------------------------------------
Suspect 1 of 1
Fault class : fault.cpu.generic-sparc.l2d-uc
Certainty : 100%
Affects : /SYS/MB/CM0/CMP/SCC3/L2D0
Status : faulted
FRU
Status : faulty
Location : /SYS/MB
Manufacturer : Oracle Corporation
Name : ASY,MB,T7-2
Part_Number : 7093274
Revision : 02
Serial_Number : 465769T+1434NH00JJ
Chassis
Manufacturer : Oracle Corporation
Name : SPARC T7-2
Part_Number : 87654321
Serial_Number : 12345678
Description : A cpu has experienced an uncorrectable level 2 data cache
error (UE).
Response : Cpu cores associated with the cache will be deconfigured.
Impact : Some services may be lost and performance may be impacted.
Action : Use 'fmadm faulty' to provide a more detailed view of this
event. Please refer to the associated reference document at
http://support.oracle.com/msg/SPSUN4V-8001-YA for the latest
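service procedures and policies regarding this diagnosis.
If the fault is reported on the same cores that run the primary domain, this issue is the root cause of the domain retire, and the primary domain will hang on reboot.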
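To see at a glance which SCCs the reported faults touch, the "Affects" lines of the fmadm faulty output can be grouped by SCC. The following is a minimal sketch, not an Oracle tool: the function name faulted_sccs and the regular expressions are illustrative assumptions, and the parser relies only on the "Affects : <path>" layout shown in the output above. Comparing the result with the cores assigned to the primary domain is left to the administrator.

```python
import re
from collections import defaultdict

# Matches the "Affects" line of each suspect, for example:
#   Affects : /SYS/MB/CM0/CMP/SCC3/L2D1
AFFECTS_RE = re.compile(r"^\s*Affects\s*:\s*(\S+)", re.MULTILINE)

# Splits a resource path into the part up to and including the SCC
# component and whatever follows it (assumed path layout).
SCC_RE = re.compile(r"^(.*?/SCC\d+)(?:/(.*))?$")


def faulted_sccs(fmadm_output: str) -> dict[str, list[str]]:
    """Group faulted resources by the SCC they belong to (hypothetical helper)."""
    grouped = defaultdict(list)
    for path in AFFECTS_RE.findall(fmadm_output):
        match = SCC_RE.match(path)
        if match:  # ignore resources that are not under an SCC
            grouped[match.group(1)].append(match.group(2) or "")
    return dict(grouped)


# Excerpt of the two suspects from the output shown above.
sample = """\
Affects : /SYS/MB/CM0/CMP/SCC3/L2D1
Status : faulted
Affects : /SYS/MB/CM0/CMP/SCC3/L2D0
Status : faulted
"""

for scc, resources in faulted_sccs(sample).items():
    print(scc, resources)
```

In this excerpt both suspects fall under /SYS/MB/CM0/CMP/SCC3. A result that lists every SCC assigned to the primary domain would correspond to the hang condition described in this entry.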
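Workaround: Ensure that the primary guest domain is assigned two or more SCCs on the same node (that is, a minimum of two SCCs plus a few additional cores).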
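Recovery: Force a reset of the domain (reset -f /HOST) to regain access. On reboot, the server cannot access the most recently saved SPM configuration and instead reverts to the factory default configuration.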