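This issue affects SPARC M7-16 servers only.
If the ECB on PDECB number 0, 1, 2, or 3 trips, the host that owns the corresponding CMIOU power cycles. If the host restarts during the power cycle, all components on DCU0 and DCU1 are excluded.
To verify that you have encountered this issue, check the host console for the following messages: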
host-name-pd0 console login: 2015-11-03 11:35:17 SP> NOTICE: Fatal error occurred. Collecting diagnostic information.
2015-11-03 11:40:03 SP> NOTICE: Abort boot due to /SYS/CMIOU1. Power Cycle Host
2015-11-03 11:42:38 SP> NOTICE: Exclude all of /SYS/DCU0. Reason: Not enough power supplies
2015-11-03 11:42:38 SP> NOTICE: Exclude all of /SYS/DCU1. Reason: Not enough power supplies
Similarly, if the ECB on PDECB number 14, 15, 16, or 17 trips, all components on DCU2 and DCU3 are excluded.
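The check above can be sketched as a small script. This is only an illustration, assuming the console output has been saved as text; the names `PDECB_TO_EXCLUDED_DCUS`, `EXCLUDE_RE`, and `excluded_dcus` are hypothetical and are not part of Oracle ILOM. The mapping covers only the PDECB numbers named in this note.

```python
import re

# PDECB number -> DCUs excluded when that PDECB's ECB trips.
# Only the PDECB numbers listed in this note are included.
PDECB_TO_EXCLUDED_DCUS = {
    **{n: ("DCU0", "DCU1") for n in (0, 1, 2, 3)},
    **{n: ("DCU2", "DCU3") for n in (14, 15, 16, 17)},
}

# Matches the SP notice shown in the console example above.
EXCLUDE_RE = re.compile(
    r"NOTICE: Exclude all of /SYS/(DCU\d+)\. Reason: Not enough power supplies"
)


def excluded_dcus(console_text):
    """Return the sorted, de-duplicated DCU names reported as excluded."""
    return sorted(set(EXCLUDE_RE.findall(console_text)))
```

If `excluded_dcus` returns a non-empty list for a saved console log, the host has probably hit this issue, and the fault listing in the recovery steps should confirm it.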
Workaround: None.
Recovery: If this problem occurs, a fault of class fault.chassis.voltage.isolated is logged against that PDECB and its corresponding CMIOU.
Start the fault management shell.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp>
View the faults.
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-11-12/15:42:38 45ce7f9f-bd7e-4599-db3d-ef728e714f31 SPT-8001-XC    Critical
Problem Status    : open
Diag Engine       : fdd 1.0
System           
   Manufacturer   : Oracle Corporation
   Name           : SPARC M7-16
   Part_Number    : 32397701+7+1
   Serial_Number  : AK00192372
----------------------------------------
Suspect 1 of 1
   Fault class  : fault.chassis.voltage.isolated
   Certainty    : 100%
   Affects      : /SYS/PDECB1
   Status       : faulted
   FRU                 
      Status            : faulty
      Location          : /SYS/PDECB1
      Manufacturer      : Celestica Holdings PTE LTD
      Name              : ECB
      Part_Number       : 7082640
      Revision          : 02
      Serial_Number     : 465769T+14029F01YV
      Chassis          
         Manufacturer   : Oracle Corporation
         Name           : SPARC M7-16
         Part_Number    : 32397701+7+1
         Serial_Number  : AK00192372
Description : A power supply has failed to maintain a good POK (Power On
              OK) condition.
Response    : The system will shutdown in a non-graceful fashion.
Impact      : The platform will restart with the affected component
              deconfigured.
Action      : Please refer to the associated reference document at
              http://support.oracle.com/msg/SPT-8001-XC for the latest
              service procedures and policies regarding this diagnosis.
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-11-12/15:33:20 faf9042a-5452-ee1c-d9c3-a9f6d3248c17 SPT-8001-XC    Critical
Problem Status    : open
Diag Engine       : fdd 1.0
System           
   Manufacturer   : Oracle Corporation
   Name           : SPARC M7-16
   Part_Number    : 32397701+7+1
   Serial_Number  : AK00192372
----------------------------------------
Suspect 1 of 1
   Fault class  : fault.chassis.voltage.isolated
   Certainty    : 100%
   Affects      : /SYS/CMIOU1
   Status       : faulted
   FRU                 
      Status            : faulty
      Location          : /SYS/CMIOU1
      Manufacturer      : Oracle Corporation
      Name              : CMIOU Module
      Part_Number       : 7090838
      Revision          : 04
      Serial_Number     : 465769T+14456C01VH
      Chassis          
         Manufacturer   : Oracle Corporation
         Name           : SPARC M7-16
         Part_Number    : 32397701+7+1
         Serial_Number  : AK00192372
Description : A power supply has failed to maintain a good POK (Power On
              OK) condition.
Response    : The system will shutdown in a non-graceful fashion.
Impact      : The platform will restart with the affected component
              deconfigured.
Action      : Please refer to the associated reference document at
              http://support.oracle.com/msg/SPT-8001-XC for the latest
              service procedures and policies regarding this diagnosis.
faultmgmtsp>
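When the fault listing is long, the relevant entries can be picked out with a short script. The following is a minimal sketch, assuming the `fmadm faulty` output has been captured as text in the layout shown above; the function name `affected_by_class` is hypothetical.

```python
import re

FAULT_CLASS_RE = re.compile(r"\s*Fault class\s*:\s*(\S+)")
AFFECTS_RE = re.compile(r"\s*Affects\s*:\s*(\S+)")


def affected_by_class(fmadm_text, fault_class="fault.chassis.voltage.isolated"):
    """List the 'Affects' targets of every suspect with the given fault class."""
    affected = []
    current_class = None
    for line in fmadm_text.splitlines():
        match = FAULT_CLASS_RE.match(line)
        if match:
            # Remember the class until the suspect's 'Affects' line is reached.
            current_class = match.group(1)
            continue
        match = AFFECTS_RE.match(line)
        if match:
            if current_class == fault_class:
                affected.append(match.group(1))
            current_class = None
    return affected
```

For the listing above, this should report both /SYS/PDECB1 and /SYS/CMIOU1, which matches the pairing of a PDECB with its corresponding CMIOU described in the recovery note.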
In Oracle ILOM, some power supplies disappear from the list of power supplies for this host, as shown in the following example.
-> show  /Servers/PDomains/PDomain_0/System/Power/Power_Supplies
 /Servers/PDomains/PDomain_0/System/Power/Power_Supplies
    Targets:
        Power_Supply_6
        Power_Supply_7
        Power_Supply_8
        Power_Supply_9
        Power_Supply_10
        Power_Supply_11
        Power_Supply_12
        Power_Supply_13
        Power_Supply_14
        Power_Supply_15
...
-> 
After several minutes, all power supplies should reappear in the list, as shown in the following example.
-> show  /Servers/PDomains/PDomain_0/System/Power/Power_Supplies
 /Servers/PDomains/PDomain_0/System/Power/Power_Supplies
    Targets:
        Power_Supply_0
        Power_Supply_1
        Power_Supply_2
        Power_Supply_3
        Power_Supply_4
        Power_Supply_5
        Power_Supply_6
        Power_Supply_7
        Power_Supply_8
        Power_Supply_9
        Power_Supply_10
        Power_Supply_11
        Power_Supply_12
        Power_Supply_13
        Power_Supply_14
        Power_Supply_15
...
->
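Comparing the two listings by eye is easy to get wrong, so a small script can help. This is a sketch only, assuming both `show` outputs have been saved as text; the function names `power_supply_targets` and `missing_supplies` are hypothetical. It compares only the targets actually present in the captured text and does not guess at anything elided by `...`.

```python
import re

TARGET_RE = re.compile(r"^[ \t]*(Power_Supply_\d+)[ \t]*$", re.MULTILINE)


def power_supply_targets(show_output):
    """Collect the Power_Supply_N target names from one 'show' listing."""
    return set(TARGET_RE.findall(show_output))


def missing_supplies(degraded_listing, full_listing):
    """Return targets present in the full listing but absent from the degraded one."""
    missing = power_supply_targets(full_listing) - power_supply_targets(degraded_listing)
    # Sort by the numeric suffix so Power_Supply_10 follows Power_Supply_9.
    return sorted(missing, key=lambda name: int(name.rsplit("_", 1)[1]))
```

Applied to the two example listings, the difference should be Power_Supply_0 through Power_Supply_5. An empty result when comparing a fresh listing against the full one suggests the supplies have reappeared.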
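When the power supplies reappear in Oracle ILOM, restart the host. The previously excluded DCUs should now be included. Open a service call to resolve the fault on the PDECB.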