2 Hardware Issues

This section describes important operating notes and known hardware issues for Oracle Server X7-8.

Diagnosing SAS Data Path Failures on Servers Using MegaRAID Disk Controllers

Important Operating Note

On Oracle x86 servers using MegaRAID disk controllers, Serial Attached SCSI (SAS) data path errors can occur. To triage and isolate a data path problem on the SAS disk controller, disk backplane (DBP), SAS cable, SAS expander, or hard disk drive (HDD), gather and review the events in the disk controller event log. Classify and analyze all failure events reported by the disk controller based on the server SAS topology.

To classify a MegaRAID disk controller event:

  • Gather and parse the MegaRAID disk controller event logs either by running the automated sundiag utility or manually using the StorCLI command.

    • For Oracle Exadata Database Machine database or storage cell servers, run the sundiag utility.

    • For Oracle Server X7-8, use the StorCLI command.

4-Socket Configuration Controller Event Log

Manually gather and parse the controller event log by using the StorCLI command.

At the root prompt, type:

# storcli /c0/eall/sall show errorcounters
Controller=0
Status=Success
Description = Show Drive/Cable Error Counters Succeeded.

Note:

Use the existing name of the event log as the name for the disk controller event log. This produces a MegaRAID controller event log with the given file name event.log.

These error counters report drive errors and slot errors separately. The following table lists each drive with its error counter for drive errors and its error counter for the slot.

Drive          Error Counter for Drive Error    Error Counter for Slot
/c0/e252/s0    0                                0
/c0/e252/s1    0                                0
/c0/e252/s2    0                                1
/c0/e252/s3    0                                0

8-Socket Configuration Controller Event Log

Manually gather and parse the controller event log by using the StorCLI command.

Note:

Use the existing name of the event log as the name for the disk controller event log. This produces a MegaRAID controller event log with the given file name event.log.

For an 8-socket configuration, run storcli twice, once for c0 and once for c1:

Run storcli for c0:

[root@sca05-0a81e75a ~]# storcli /c0/eall/sall show errorcounters
Controller = 0
Status = Success
Description = Show Drive/Cable Error Counters Succeeded.

Run storcli for c1:

[root@sca05-0a81e75a ~]# storcli /c1/eall/sall show errorcounters
Controller = 0
Status = Success
Description = Show Drive/Cable Error Counters Succeeded.
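The two invocations above can also be scripted. The following is a minimal sketch, not an Oracle-provided tool. It assumes that storcli is on the PATH and that the script runs as root. The function names error_counter_command and collect_error_counters are hypothetical, and only the command line shown in this section is used.

```python
import subprocess


def error_counter_command(controller):
    """Build the StorCLI command line shown above for one controller index."""
    return ["storcli", f"/c{controller}/eall/sall", "show", "errorcounters"]


def collect_error_counters(controllers=(0, 1)):
    """Run the error counter command once per controller.

    Returns a dict that maps the controller index to the raw command output.
    Assumes storcli is installed and on the PATH; raises CalledProcessError
    if a command exits with a non-zero status.
    """
    outputs = {}
    for controller in controllers:
        result = subprocess.run(
            error_counter_command(controller),
            capture_output=True,
            text=True,
            check=True,
        )
        outputs[controller] = result.stdout
    return outputs
```

On an 8-socket configuration, calling collect_error_counters() with the default argument would run the command for c0 and c1. For a 4-socket configuration, pass controllers=(0,).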

The following table lists each drive on c0 with its error counter for drive errors and its error counter for the slot.

Drive          Error Counter for Drive Error    Error Counter for Slot
/c0/e252/s0    0                                0
/c0/e252/s1    0                                0
/c0/e252/s2    0                                1
/c0/e252/s3    0                                0

The following table lists each drive on c1 with its error counter for drive errors and its error counter for the slot.

Drive          Error Counter for Drive Error    Error Counter for Slot
/c1/e252/s0    0                                0
/c1/e252/s1    0                                0
/c1/e252/s2    0                                0
/c1/e252/s3    0                                0
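When many drives are present, it can help to filter for the non-zero counters. The following sketch uses the values from the c0 and c1 tables in this section, entered by hand. It does not parse StorCLI output, and the names CounterRow and nonzero_rows are hypothetical.

```python
from typing import NamedTuple


class CounterRow(NamedTuple):
    drive: str          # StorCLI drive path, for example "/c0/e252/s2"
    drive_errors: int   # value from the "Error Counter for Drive Error" column
    slot_errors: int    # value from the "Error Counter for Slot" column


def nonzero_rows(rows):
    """Return only the rows where at least one counter is non-zero."""
    return [row for row in rows if row.drive_errors or row.slot_errors]


# Values transcribed from the c0 and c1 tables in this section.
ROWS = [
    CounterRow("/c0/e252/s0", 0, 0),
    CounterRow("/c0/e252/s1", 0, 0),
    CounterRow("/c0/e252/s2", 0, 1),
    CounterRow("/c0/e252/s3", 0, 0),
    CounterRow("/c1/e252/s0", 0, 0),
    CounterRow("/c1/e252/s1", 0, 0),
    CounterRow("/c1/e252/s2", 0, 0),
    CounterRow("/c1/e252/s3", 0, 0),
]

for row in nonzero_rows(ROWS):
    print(f"{row.drive}: drive={row.drive_errors} slot={row.slot_errors}")
```

With the table values above, only /c0/e252/s2 is reported, because its slot error counter is 1.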

The following SCSI sense key errors, when found in the disk controller event log, indicate a SAS data path fault:

B/4B/05 :SERIOUS: DATA OFFSET ERROR
B/4B/03 :SERIOUS: ACK/NAK TIMEOUT
B/47/01 :SERIOUS: DATA PHASE CRC ERROR DETECTED
B/4B/00 :SERIOUS: DATA PHASE ERROR

A communication fault between the disk and the host bus adapter causes these errors. The presence of these errors, even on a single disk, means there is a data path issue. The RAID controller, SAS cables, SAS expander, or disk backplane might be causing the interruption to the communication in the path between the RAID controller and the disks.
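Searching a saved event log for these four codes can be automated. The following is a minimal sketch that assumes the codes appear in the log text in the same key/ASC/ASCQ form listed above, for example B/4B/03. The actual layout of a MegaRAID event log might differ, so adjust the pattern accordingly. The function name find_sas_path_errors is hypothetical.

```python
import re

# Sense key / ASC / ASCQ triplets listed in this section, with descriptions.
SAS_PATH_SENSE_CODES = {
    "B/4B/05": "DATA OFFSET ERROR",
    "B/4B/03": "ACK/NAK TIMEOUT",
    "B/47/01": "DATA PHASE CRC ERROR DETECTED",
    "B/4B/00": "DATA PHASE ERROR",
}

# Match any listed code as a whole token, ignoring letter case.
# Assumption: the log prints the triplet exactly in this slash-separated form.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(code) for code in SAS_PATH_SENSE_CODES) + r")\b",
    re.IGNORECASE,
)


def find_sas_path_errors(log_text):
    """Return (line_number, code, description) for every match in the log text."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            code = match.group(1).upper()
            hits.append((line_number, code, SAS_PATH_SENSE_CODES[code]))
    return hits
```

Because a single occurrence on a single disk already points to a data path issue, any non-empty result from find_sas_path_errors warrants further triage.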

Oracle Service personnel can find more information about the diagnosis and triage of hard disk and SAS data path failures on x86 servers at the My Oracle Support web site: https://support.oracle.com . Refer to the Knowledge Article Doc ID 2161195.1. If there are multiple, simultaneous disk problems on an Exadata server, Oracle Service personnel can refer to Knowledge Article Doc ID 1370640.1.

Failure of a Single Server Fan Module Might Impact Performance

Important Operating Note

If a single server fan module fails and the server's operating temperature rises above 30 degrees C (86 degrees F), the performance of the server processors might be reduced.

Remove and Replace a Fan Module Within 30 Seconds

Important Operating Note

When removing and replacing a server fan module, you must complete the entire removal and replacement procedure within 30 seconds in order to maintain adequate cooling within the system. In anticipation of this time limit, prior to starting the replacement procedure, obtain the replacement fan module and verify that the new fan module is ready for installation. Remove and replace only one fan module at a time.

Fan modules are hot-swappable components, with N+1 fan redundancy. Each fan module contains two complete counter-rotating fans with two fan rotors per fan. Fan rotors provide separate tachometer signals so that the fan module reports tach signals to Oracle ILOM. Even if only one fan is faulted within the fan module, the Oracle ILOM service processor detects that fans have failed to spin while the fan module is being removed for replacement. If the fan module is not replaced within 30 seconds of removal, Oracle ILOM takes protective action and shuts down the system to prevent thermal damage. This is expected behavior.

Lockstep Memory (Channel) Mode Is Not Supported

Important Operating Note

Oracle Server X7-8 does not support lockstep memory mode, which is also known as double device data correction, or Extended ECC.

MAC Address Mapping to Ethernet Ports

Important Operating Note

A system serial label that displays the MAC ID (and the associated barcode) for the server is attached to the top, front-left side of the Oracle Server X7-8 server disk cage bezel.

This MAC ID (and barcode) corresponds to the hexadecimal (base 16) base MAC address for a sequence of six consecutive MAC addresses. These six MAC addresses correspond to the server network ports, as shown in the following table.

Base MAC Address    Corresponding Ethernet Port
“base” + 0          NET 0
“base” + 1          NET 1
“base” + 2          NET 2
“base” + 3          NET 3
“base” + 4          SP (NET MGT)
“base” + 5          Used only when Network Controller-Sideband Interface (NC-SI) sideband management is configured.
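The offset arithmetic in the table can be worked through in code. The following sketch derives the six addresses from a base MAC address. The base address used in the usage note is a placeholder from the range reserved for documentation, not a real Oracle address, and the function names mac_for_offset and port_macs are hypothetical.

```python
# Offsets from the base MAC address, as listed in the table above.
PORT_OFFSETS = {
    0: "NET 0",
    1: "NET 1",
    2: "NET 2",
    3: "NET 3",
    4: "SP (NET MGT)",
    5: "NC-SI sideband management (only when configured)",
}


def mac_for_offset(base_mac, offset):
    """Add an offset to a MAC address and return it as colon-separated hex."""
    digits = base_mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hexadecimal digits, got {base_mac!r}")
    value = int(digits, 16) + offset
    if value >= 1 << 48:
        raise ValueError("offset carries past the 48-bit MAC address space")
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def port_macs(base_mac):
    """Map each port name from the table to its derived MAC address."""
    return {port: mac_for_offset(base_mac, offset)
            for offset, port in PORT_OFFSETS.items()}
```

For example, with the placeholder base address 00:00:5E:00:53:0A, port_macs returns 00:00:5E:00:53:0A for NET 0 and 00:00:5E:00:53:0E for SP (NET MGT). The addition is done on the full 48-bit value, so a carry out of the last octet is handled correctly.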

Oracle Dual Port 25 Gb Ethernet Adapter Hot Plug Is Not Supported

Bug ID: 26713370

Issue: Oracle Dual Port 25 Gb Ethernet Adapters are supported by all supported operating systems. However, Oracle Dual Port 25 Gb Ethernet Adapter hot plug is not supported on Oracle Server X7-8 software release 1.0.0.

Affected Hardware: Oracle Dual Port 25 Gb Ethernet Adapter

Workaround: You must power down the system before removing and replacing Oracle Dual Port 25 Gb Ethernet Adapters.

Oracle ILOM Reports Fault with Oracle Dual Port 25 Gb Ethernet Adapter During System Reset

Bug ID: 26259122

Issue: The Oracle Dual Port 25 Gb Ethernet Adapter can experience a completion timeout fault during a system warm reset operation. The fault is logged by Oracle ILOM.

Affected Hardware: Oracle Dual Port 25 Gb Ethernet Adapter

Workaround: This issue has no functional impact on normal system behavior and can be ignored.

Oracle ILOM Reports Fault alert.chassis.domain.boot.power-on-failed After System Stop

Bug ID: 26629988

Issue: On rare occasions, the Oracle Server X7-8 might remain powered off after the Oracle ILOM start /System command, an SMOD power button press, or a host reset. When this condition occurs, Oracle ILOM reports the following failure: alert.chassis.domain.boot.power-on-failed.

Affected Hardware: Oracle Server X7-8

Workaround: Power on the server again. If the second power-on attempt fails, contact an Oracle Service representative for assistance.