3 Hardware Issues
This section describes important operating notes and known hardware issues for Oracle Server X8-2.
- Oracle Dual Port 10/25GbE Ethernet Controller-LOM Firmware 214.2.271.5
- 10GBASE-T RJ-45 GbE Ports Are Not Operating And LEDs Remain Dark After Port Switch and Power Cycle
- Diagnosing SAS Data Path Failures on Servers Using MegaRAID Disk Controllers
- Failure of a Single Server Fan Module Might Impact Performance
- Remove and Replace a Fan Module Within 60 Seconds
- Lockstep Memory (Channel) Mode Is Not Supported
- Do Not Install an Avago 10G SR Optical Transceiver Into an SFP28 Port
- Oracle Dual Port 25 Gb Ethernet Adapter Can Experience a Fault During System Reset
Oracle Dual Port 10/25GbE Ethernet Controller-LOM Firmware 214.2.271.5
Bug ID: 30658955
Issue: After updating the Oracle Dual Port 10/25GbE Ethernet Controller LOM from 20.8.x to 214.2.271.5, its NCSI firmware goes into INIT state and requires an AC power cycle or the Oracle ILOM command hwdiag system info to initialize it.
Affected Hardware: Oracle Server X8-2, Oracle Dual Port 10/25GbE Ethernet Controller
Affected Software: Oracle Server X8-2 Software 3.1.0, Oracle Dual Port 10/25GbE Ethernet Controller-LOM Firmware 214.2.271.5
Workaround: Initialize the NCSI firmware by performing an AC power cycle or by running the Oracle ILOM command hwdiag system info:
- To perform an AC power cycle, you can unplug and then insert all server power cords while in Standby mode, or run the Oracle ILOM deep_power_cycle command from the Oracle ILOM SP console. For example, to run the Oracle ILOM deep_power_cycle command, type:
  -> stop -f /system
  Are you sure you want to immediately stop /System (y/n)? y
  Stopping /System immediately
  -> ls /System/ power_state
   /System
      Properties:
          power_state = Off
  -> set /System/ deep_power_cycle=true
  Set 'deep_power_cycle' to 'true' [false]
  Connection was aborted.
  ......
  Refer to the server Installation Guide for more information on controlling system power and power cables.
- To run the Oracle ILOM HWdiag command, type:
  -> start /sp/diag/shell
  Are you sure you want to start /SP/diag/shell (y/n)? y
  -> start /sp/diag/shell
  diag> hwdiag system info
  Refer to the Oracle ILOM User's Guide for more information on running the x86 HWDiag Tool within the Oracle ILOM Diag Shell.
10GBASE-T RJ-45 GbE Ports Are Not Operating And LEDs Remain Dark After Port Switch and Power Cycle
Bug ID: 30353680
Issue: After switching the onboard Oracle Dual Port 10/25GbE Ethernet Controller active_media from SFP28 ports to RJ45 ports, and after issuing a power cycle, both RJ45 ports are not operational and the port LEDs are not lit. Oracle ILOM reports that the two RJ45 ports are active and that their status is up, but the two port LEDs stay unlit.
The status of both Oracle Dual Port 10/25GbE Ethernet Controller-LOM RJ45 ports is reported as up, but the 10GBASE-T RJ-45 GbE (NET 1 and NET 2) Ethernet port LEDs remain unlit after active_media is switched from SFP28 to RJ45 and the host is reset.
Affected Hardware: Oracle Server X8-2, Oracle Dual Port 10/25GbE Ethernet Controller
Affected Software: Oracle Server X8-2 Software 3.1.0, Oracle Dual Port 10/25GbE Ethernet Controller-LOM Firmware 214.2.271.5, Oracle Server X8-2 Software 1.1.0 and 1.1.1, Oracle Dual Port 10/25GbE Ethernet Controller-LOM Firmware 20.08.01.18
Workaround: To ensure that the server back panel LEDs light and the RJ45 ports are in use, reset the system after switching the onboard Oracle Dual Port 10/25GbE Ethernet Controller active_media from SFP28 ports to RJ45 ports.
Note:
Do not power off the server. Also reset the system after switching active_media from RJ45 ports to SFP28 ports.

-> set active_media=RJ45
Set 'active_media' to 'RJ45'
The host must be reset or powered off for the new host media to take effect.
-> reset /System/
Are you sure you want to reset /System (y/n)? y
Performing hard reset on /System
Recovery: If a system already encounters this RJ45 ports-off condition, AC power cycle the server to recover the RJ45 ports. You can unplug and then insert all server power cords while in Standby mode, or run the Oracle ILOM deep_power_cycle command from the Oracle ILOM SP console.

For example, to run the Oracle ILOM deep_power_cycle command, type:

-> stop -f /system
Are you sure you want to immediately stop /System (y/n)? y
Stopping /System immediately
-> ls /System/ power_state
 /System
    Properties:
        power_state = Off
-> set /System/ deep_power_cycle=true
Set 'deep_power_cycle' to 'true' [false]
Connection was aborted.
......

Refer to the server Installation Guide for more information on controlling system power and power cables.
Diagnosing SAS Data Path Failures on Servers Using MegaRAID Disk Controllers
Important Operating Note
On Oracle x86 servers using MegaRAID disk controllers, Serial Attached SCSI (SAS) data path errors can occur. To triage and isolate a data path problem on the SAS disk controller, disk backplane (DBP), SAS cable, SAS expander, or hard disk drive (HDD), gather and review the events in the disk controller event log. Classify and analyze all failure events reported by the disk controller based on the server SAS topology.
To classify a MegaRAID disk controller event:
- Gather and parse the MegaRAID disk controller event logs either by running the automated sundiag utility or manually by using the StorCLI command.
  - For Oracle Exadata Database Machine database or storage cell servers, run the sundiag utility.
  - For Oracle Server X8-2, use the StorCLI command.

For example, manually gather and parse the controller event log by using the StorCLI command. At the root prompt, type:

root# ./storcli64 /c0 show events file=event.log
Controller=0
Status=Success
Note:
Use the existing name of the event log as the name for the disk controller event log. This produces a MegaRAID controller event log with the given file name event.log.

To show drive and slot errors separately, at the root prompt, type:

root# /opt/MegaRAID/storcli/storcli64 /c0 /eall /sall show errorcounters
Controller=0
Status=Success
Description=Show Drive/Cable Error Counters Succeeded.

Error Counters:
| Drive | Error Counter for Drive Error | Error Counter for Slot | 
|---|---|---|
| /c0/e8/s0 | 0 | 0 | 
| /c0/e8/s1 | 0 | 0 | 
| /c0/e8/s2 | 0 | 0 | 
| /c0/e8/s3 | 0 | 0 | 
| /c0/e8/s4 | 0 | 0 | 
| /c0/e8/s5 | 0 | 0 | 
| /c0/e8/s12 | 0 | 0 | 
| /c0/e8/s13 | 0 | 0 | 
These error counters reflect drive or slot errors separately.
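When many drives are listed, a short script can pick out the rows with nonzero counters. The following Python sketch is illustrative only and is not part of StorCLI or any Oracle tool. It assumes that each data row of the saved command output contains a /cN/eN/sN drive path followed by the drive error counter and then the slot error counter, as in the table above. The function name nonzero_error_counters is hypothetical.

```python
import re

# Assumption: each data row holds a drive path such as /c0/e8/s0 followed by
# two integer counters (drive errors, then slot errors). Separators may be
# pipes or plain whitespace.
_ROW_RE = re.compile(r"(/c\d+/e\d+/s\d+)\D+(\d+)\D+(\d+)")


def nonzero_error_counters(output_text):
    """Return {drive_path: (drive_errors, slot_errors)} for rows where
    at least one of the two counters is nonzero."""
    flagged = {}
    for line in output_text.splitlines():
        match = _ROW_RE.search(line)
        if not match:
            continue  # header, separator, or status line
        drive_errors = int(match.group(2))
        slot_errors = int(match.group(3))
        if drive_errors or slot_errors:
            flagged[match.group(1)] = (drive_errors, slot_errors)
    return flagged
```

An empty result means that no listed drive or slot reported errors. A nonzero slot counter with a zero drive counter is a hint to look at the path to that slot rather than at the drive itself.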
The following SCSI sense key errors in the disk controller event log indicate a SAS data path fault:

B/4B/05 :SERIOUS: DATA OFFSET ERROR
B/4B/03 :SERIOUS: ACK/NAK TIMEOUT
B/47/01 :SERIOUS: DATA PHASE CRC ERROR DETECTED
B/4B/00 :SERIOUS: DATA PHASE ERROR

A communication fault between the disk and the host bus adapter causes these errors. The presence of these errors, even on a single disk, means there is a data path issue. The RAID controller, SAS cables, SAS expander, or disk backplane might be causing the interruption to the communication in the path between the RAID controller and the disks.
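As an illustration, the event log gathered earlier can be searched for these four sense codes with a short script. The Python sketch below is not an Oracle or Broadcom tool. It assumes that the sense key, ASC, and ASCQ appear in the log text as slash-separated hexadecimal fields (for example B/4B/05), which might not match every firmware's log format. The function name find_sas_data_path_errors is hypothetical.

```python
import re

# The four sense key/ASC/ASCQ codes listed in this section.
SAS_DATA_PATH_SENSE = {
    "B/4B/05": "DATA OFFSET ERROR",
    "B/4B/03": "ACK/NAK TIMEOUT",
    "B/47/01": "DATA PHASE CRC ERROR DETECTED",
    "B/4B/00": "DATA PHASE ERROR",
}

# Assumption: codes appear as key/ASC/ASCQ in hexadecimal, in any letter case.
_SENSE_RE = re.compile(
    r"(?<![0-9A-Fa-f/])([0-9A-Fa-f]{1,2})/([0-9A-Fa-f]{2})/([0-9A-Fa-f]{2})"
    r"(?![0-9A-Fa-f/])"
)


def find_sas_data_path_errors(log_text):
    """Return a list of (line_number, code, description) tuples for every
    occurrence of one of the four SAS data path sense codes."""
    hits = []
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        for match in _SENSE_RE.finditer(line):
            # Normalize case and drop a leading zero on the sense key.
            key = match.group(1).upper().lstrip("0") or "0"
            code = "/".join([key, match.group(2).upper(), match.group(3).upper()])
            if code in SAS_DATA_PATH_SENSE:
                hits.append((line_no, code, SAS_DATA_PATH_SENSE[code]))
    return hits
```

For example, calling find_sas_data_path_errors(open("event.log").read()) returns one tuple per matching occurrence. Any non-empty result, even for a single disk, points to a data path issue rather than to the disk alone.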
Oracle Service personnel can find more information about the diagnosis and triage of hard disk and SAS data path failures on x86 servers at the My Oracle Support web site: https://support.oracle.com. Refer to the Knowledge Article Doc ID 2161195.1. If there are multiple, simultaneous disk problems on an Exadata server, Oracle Service personnel can refer to Knowledge Article Doc ID 1370640.1.
Failure of a Single Server Fan Module Might Impact Performance
Important Operating Note
If a single server fan module fails and the server's operating temperature rises above 30 degrees C (86 degrees F), the performance of the server's processors might be reduced.
Remove and Replace a Fan Module Within 60 Seconds
Important Operating Note
When removing and replacing a server fan module, you must complete the entire removal and replacement procedure within 60 seconds in order to maintain adequate cooling within the system. In anticipation of this time limit, prior to starting the replacement procedure, obtain the replacement fan module and verify that the new fan module is ready for installation. Remove and replace only one fan module at a time.
Fan modules are hot-swappable components, with N+1 fan redundancy. Each fan module contains two fans, with two fan motors per fan. The four fan motors provide separate tachometer signals so that the fan module reports four tachometer signals to Oracle ILOM. Even if only one fan motor is faulted within the fan module, the Oracle ILOM service processor detects that four fan motors have failed to spin while the fan module is removed. If the fan module is not replaced within 60 seconds of removal, Oracle ILOM will take the protective action to shut down the system to prevent thermal damage to the system. This is expected behavior.
Lockstep Memory (Channel) Mode Is Not Supported
Important Operating Note
Oracle Server X8-2 does not support lockstep memory mode, which is also known as double device data correction, or Extended ECC.
Do Not Install an Avago 10G SR Optical Transceiver Into an SFP28 Port
Important Operating Note
Do not install an Avago 10G SR Optical transceiver (part number AFBR-703SDDZ-SN1) into an SFP28 port on the Oracle Server X8-2. Once installed, the transceiver could be extremely difficult to remove, and removing it might damage the SFP28 metal cage on the system motherboard.
Oracle Dual Port 25 Gb Ethernet Adapter Can Experience a Fault During System Reset
Bug ID: 26259122
Issue: The Oracle Dual Port 25 Gb Ethernet Adapter can experience a completion timeout fault during a system warm reset operation. The fault is logged by Oracle ILOM.
Affected Hardware: Oracle Dual Port 25 Gb Ethernet Adapter
Workaround: This issue has no functional impact on normal system behavior and can be ignored.