4 Oracle ILOM Issues

This section describes important operating notes and known Oracle ILOM issues for Oracle Server X9-2L.

For updated information about Oracle ILOM, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Servers Documentation - Systems Management.

Oracle ILOM Command Force Stop of PCIe Slot Power Can Cause Server PCIe Bus Error

Bug ID: 334503411

Issue: Enhancement 34371396, delivered in SW3.3.3 Enterprise Oracle ILOM build02: 5.1.0.23_r146986, implements the CLI command stop /SYS/MB/PCIEn. Using the stop /SYS/MB/PCIEn command can cause some Smart NIC PCIe cards to stop operating, and the system might report a PCIe bus error. Systems with this firmware enhancement should not need this command unless a Cavium LiquidIO III 100 Gb Network Interface Card (NIC) is installed in a PCIe slot. If you are not resetting a Smart NIC Add-In Card (AIC) PCIe slot, do not use the stop /SYS/MB/PCIEn command on PCIe slots and cards unless instructed by Oracle Service personnel.

  1. When host power is off, you can start any PCIe slot power with the start /SYS/MB/PCIEn command and stop any PCIe slot power with the stop /SYS/MB/PCIEn command without PCIe bus errors.

    • -> stop /SYS/MB/PCIE#
    • -> start /SYS/MB/PCIE#
  2. Note: When host power is on, you can use the -force option to force stop or force start PCIe slot power. However, doing so risks causing system and PCIe bus errors.

    Before using the -force option to force stop or force start PCIe slot power, ensure that the following preconditions are met.

    1. Verify that the UEFI BIOS has already enabled the PCIe slot hotplug feature.

    2. Verify that the OS is Linux UEK4 or UEK5, and the Cavium LiquidIO III 100 Gb Network Interface Card (NIC) remote console is idle.

    3. Shut down PCIe communication and data traffic for the slot that contains the Cavium LiquidIO III 100 Gb Network Interface Card (NIC).

  3. When host power is on, use these commands only on PCIe slots in which Cavium LiquidIO III 100 Gb Network Interface Cards (NICs) are installed.

    • -> stop -force /SYS/MB/PCIE#
    • -> start -force /SYS/MB/PCIE#
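
The rules in the list above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name pcie_slot_command and all of its parameters are hypothetical and are not part of Oracle ILOM. It only builds the command string described in this section and does not connect to a service processor.

```python
def pcie_slot_command(action, slot, host_power_on,
                      hotplug_enabled=False, os_is_uek4_or_uek5=False,
                      nic_console_idle=False, slot_traffic_stopped=False):
    """Return the ILOM CLI command string for a PCIe slot power action.

    Hypothetical helper that mirrors the rules in this section only.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    target = f"/SYS/MB/PCIE{slot}"

    if not host_power_on:
        # Host power off: plain start/stop works without PCIe bus errors.
        return f"{action} {target}"

    # Host power on: -force is needed, and every precondition must hold.
    preconditions = {
        "UEFI BIOS PCIe slot hotplug enabled": hotplug_enabled,
        "OS is Linux UEK4 or UEK5": os_is_uek4_or_uek5,
        "NIC remote console idle": nic_console_idle,
        "PCIe traffic for the slot shut down": slot_traffic_stopped,
    }
    unmet = [name for name, ok in preconditions.items() if not ok]
    if unmet:
        raise RuntimeError("preconditions not met: " + ", ".join(unmet))
    return f"{action} -force {target}"


if __name__ == "__main__":
    print(pcie_slot_command("stop", 8, host_power_on=False))
```

Raising an error when a precondition is unmet, rather than silently falling back to the plain command, reflects the point that a plain stop is not allowed while host power is on.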

A Cavium LiquidIO III 100 Gb Network Interface Card (NIC) might require start/stop power cycles while server host main power is on. Using the -force option to start or stop power to its PCIe slot allows this without affecting other AIC cards. For all other AIC cards installed in system configurations, exercise caution when using the start and stop /SYS/MB/PCIEn commands to force start or force stop PCIe slot power. Using force stop on Oracle Flash Accelerator F640 PCIe Cards and Oracle Quad Port 10GBase-T Adapters installed in PCIe slots may result in PCIe bus issues. Refer to Enhancement 34371396 for details.

Example of force stop command that may cause a reset:

-> stop /SYS/MB/PCIE8
Are you sure you want to stop /SYS/MB/PCIE8 (y/n)? y
stop: Operation is not allowed when Host power is on.
-> stop -f /SYS/MB/PCIE8
Are you sure you want to immediately stop /SYS/MB/PCIE8 (y/n)? y
Stopping /SYS/MB/PCIE8 immediately
-> ls
 /SYS/MB/PCIE8
    Targets:
        POWER
        PRSNT
        SERVICE
    Properties:
        type = PCIE Module
        fault_state = OK
        clear_fault_action = (none)
        power_state = Off
    Commands:
        cd
        reset
        set
        show
        start
        stop
-> show faulty
Target                       | Property                          | Value
-----------------------------+-----------------------------------+------------
-> show /sp/logs/event/list/
Event
ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
324    Wed Aug 10 06:41:56 2022  Chassis   State     minor   
       /SYS/MB/PCIE8 power is disabled
323    Wed Aug 10 06:41:51 2022  Power     Off       major   
       Power to /SYS/MB/PCIE8 has been turned off by: Shell session, Username:
322    Wed Aug 10 05:23:51 2022  IPMI      Log       minor   
       ID =  131 : 08/10/2022 : 05:23:51 : System Firmware Progress : BIOS : System boot initiated : Asserted
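
When reviewing a saved copy of the event log, it can help to pick out the entries that mention a PCIe slot. The following Python sketch is a minimal example under stated assumptions: it assumes the log was captured as plain text in the layout shown above (a header line with ID, date, class, type, and severity, followed by indented message lines), and the names parse_events and EVENT_HEADER are hypothetical, not part of any Oracle tool.

```python
import re

# Header line: ID, date/time, class, type, severity (layout assumed from the example above).
EVENT_HEADER = re.compile(
    r"^(\d+)\s+(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_events(text):
    """Parse captured event-list text into a list of dictionaries."""
    events = []
    for line in text.splitlines():
        match = EVENT_HEADER.match(line)
        if match:
            event_id, when, cls, typ, severity = match.groups()
            events.append({"id": int(event_id), "time": when, "class": cls,
                           "type": typ, "severity": severity, "message": ""})
        elif events and line.strip():
            # Any other non-blank line continues the latest event's message.
            joined = events[-1]["message"] + " " + line.strip()
            events[-1]["message"] = joined.strip()
    return events


SAMPLE = """\
324    Wed Aug 10 06:41:56 2022  Chassis   State     minor
       /SYS/MB/PCIE8 power is disabled
323    Wed Aug 10 06:41:51 2022  Power     Off       major
       Power to /SYS/MB/PCIE8 has been turned off by: Shell session,
Username:
322    Wed Aug 10 05:23:51 2022  IPMI      Log       minor
       ID =  131 : 08/10/2022 : 05:23:51 : System Firmware Progress : BIOS :
System boot initiated : Asserted
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        if "/SYS/MB/PCIE" in event["message"]:
            print(event["id"], event["severity"], event["message"])
```

Treating every non-header line as a continuation lets the parser cope with message lines that were wrapped when the log was captured.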

Affected Hardware: Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X7-2, Oracle Server X7-2L

Affected Software:

The following x86 server software and Oracle ILOM release, or later, on Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X7-2, and Oracle Server X7-2L: SW3.3.3 Enterprise ILOM build02: 5.1.0.23_r146986, BIOS: 42120100

Workaround: Use the Oracle ILOM CLI command stop /SYS/MB/PCIEn only when a slot that contains a Cavium LiquidIO III 100 Gb Network Interface Card (NIC) requires a start/stop power cycle. Do not use the Oracle ILOM CLI command stop /SYS/MB/PCIEn for any other purpose unless instructed by Oracle Service personnel.

Oracle Service personnel can find more information about x86 servers at My Oracle Support. Refer to Oracle ILOM update Enh 34371396 - Added --force to start/stop PCIe slot power.

DIMM Fault SPX86A-800A-95 - Memtest Single Symbol Test Failed - ILOM 5.1.0.21

Bug ID: 34325538, 34445460

Issue: The following DIMM Fault message is seen: SPX86A-800A-95 - Memtest Single Symbol Test Failed (Doc ID 2317012.1). SPX86A-800A-95 indicates that the ILOM fault manager has received an error report that a memory DIMM produced correctable errors (CE) during both passes of the memory test.

If the server encounters multiple runtime memory fault events, the increase in runtime error messages might be related to DIMM memory testing conditions. Oracle ILOM Adaptive Double DRAM Device Correction (ADDDC) and Post Package Repair (PPR) features are enabled in the server firmware. ADDDC Sparing is a RAS feature to test memory reliability. The Advanced Memory Test (AMT) in the Memory Reference Code (MRC) can fail a DIMM that has a single symbol error, after which PPR tries to repair the defect.

When enabled, PPR may be able to repair affected DRAM areas on a DIMM. PPR runs when ADDDC was activated before the reboot or when memory tests failed during MRC initialization. That is, if the server encounters any memory-related fault event during MRC initialization, or certain correctable memory events during runtime that trigger ADDDC on first occurrence, PPR is activated after the next system initialization or reboot and attempts to repair the DIMM.
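
The trigger condition described above can be written as a single boolean expression. The following Python sketch is only a reading of that paragraph, not firmware logic; the function and parameter names are hypothetical.

```python
def ppr_scheduled_for_next_boot(ppr_enabled,
                                adddc_triggered_at_runtime,
                                mrc_memory_test_failed):
    """Return True if, per the description above, PPR would attempt a
    repair during the next system initialization or reboot."""
    if not ppr_enabled:
        # PPR can act only when the feature is enabled on the server.
        return False
    # Either trigger is sufficient: ADDDC activated by runtime correctable
    # events, or a memory test failure during MRC initialization.
    return adddc_triggered_at_runtime or mrc_memory_test_failed


if __name__ == "__main__":
    print(ppr_scheduled_for_next_boot(True, True, False))
```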

Note:

DIMMs from certain manufacturers may exhibit different memory failure patterns, and may not support soft PPR configuration (which enables temporarily attempting a repair action).

Affected Hardware: Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X8-8, Oracle Server X8-2, Oracle Server X8-2L, Oracle Server X7-8, Oracle Server X7-2, Oracle Server X7-2L

Note:

Not all server systems enable ADDDC.

Affected Software:

The following x86 server software and Oracle ILOM releases, or later, support PPR (Post-Package Repair).

  • Oracle Server X9-2 SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X9-2L SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X8-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X8-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X8-2L SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X7-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-2L SW3.2.2 ILOM 5.0.2.24
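
To check whether a given Oracle ILOM version is at or above the minimum release listed above, the dotted version strings can be compared field by field. The following Python sketch is a minimal example: the table is copied from the list above, while the names MIN_PPR_ILOM, ADDDC_NOT_ENABLED, version_key, and supports_ppr are hypothetical. It assumes that Oracle ILOM versions order numerically by their dotted fields and that any suffix such as _r146986 can be ignored for this comparison.

```python
# Minimum Oracle ILOM releases that support PPR, copied from the list above.
MIN_PPR_ILOM = {
    "X9-2": "5.0.2.21",
    "X9-2L": "5.0.2.21",
    "X8-8": "5.0.2.22",
    "X8-2": "5.0.2.24",
    "X8-2L": "5.0.2.24",
    "X7-8": "5.0.2.22",
    "X7-2": "5.0.2.24",
    "X7-2L": "5.0.2.24",
}

# Models flagged above with "Does not enable ADDDC."
ADDDC_NOT_ENABLED = {"X9-2", "X9-2L", "X8-8", "X7-8"}


def version_key(version):
    """Turn '5.1.0.23_r146986' into (5, 1, 0, 23) for comparison.

    Assumption: the part after '_' is a build revision and is ignored.
    """
    base = version.split("_")[0]
    return tuple(int(part) for part in base.split("."))


def supports_ppr(model, ilom_version):
    """Return True if ilom_version is at or above the listed minimum."""
    return version_key(ilom_version) >= version_key(MIN_PPR_ILOM[model])


if __name__ == "__main__":
    print(supports_ppr("X9-2L", "5.1.0.23_r146986"))
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating "5.0.2.9" as later than "5.0.2.21".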

Workaround: Some DIMM faults are recoverable errors if PPR is enabled on the server. If multiple DIMM memory errors are detected on the server:

  1. Log in to the Oracle ILOM command-line interface (CLI) using an account with admin (a) role privileges.
  2. From the Oracle ILOM CLI, launch the Oracle ILOM Fault Management Shell.
    -> start /SP/faultmgmt/shell 
    Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
  3. Display information about faulted server components using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm faulty
  4. Manually clear server faults using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm repair <FRU>
  5. Exit the Oracle ILOM Fault Management Shell and return to the Oracle ILOM CLI command prompt.
    faultmgmtsp> exit
  6. Upgrade the server to the latest ILOM/UEFI firmware release that supports PPR. The system resets during the firmware upgrade and runs memory tests again.
  7. If memory-related fault events continue to be logged, replace the faulted DIMMs in the server. Log an Oracle Support case through the support portal for further assistance.
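
Steps 2 through 5 above form a fixed command sequence, with one fmadm repair command for each faulted FRU. The following Python sketch lists that sequence for a given set of FRUs so that it can be reviewed before typing it at the console. It is illustrative only: the function name fault_clear_session is hypothetical, the FRU string in the usage example is a placeholder rather than a value from this document, and the sketch does not connect to Oracle ILOM.

```python
def fault_clear_session(frus):
    """Return (prompt, command) pairs for steps 2 through 5 above.

    frus: FRU names as reported by 'fmadm faulty' (placeholders here).
    """
    steps = [
        ("->", "start /SP/faultmgmt/shell"),   # step 2: open the shell
        ("faultmgmtsp>", "fmadm faulty"),      # step 3: list faults
    ]
    # Step 4: one repair command per faulted FRU.
    steps += [("faultmgmtsp>", f"fmadm repair {fru}") for fru in frus]
    steps.append(("faultmgmtsp>", "exit"))     # step 5: leave the shell
    return steps


if __name__ == "__main__":
    # "<FRU>" is a placeholder; substitute the name shown by fmadm faulty.
    for prompt, command in fault_clear_session(["<FRU>"]):
        print(prompt, command)
```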

Oracle Service personnel can find more information about the diagnosis and triage of DIMM Fault failures on x86 servers at My Oracle Support. Refer to the Knowledge Article Doc ID 2698328.1. If there are multiple, simultaneous DIMM Fault message problems on a server, Oracle Service personnel can refer to Knowledge Articles Doc IDs 1603015.1 (KA single symbol error) and 2317012.1 (KA multiple symbol errors).

Note:

Adaptive Double DRAM Device Correction (ADDDC) is also referred to as Adaptive Device Correction (ADC) in some Oracle documents.

BIOS Setup Utility Looks Distorted When Accessed Through Oracle ILOM Remote System Console

Bug ID: 32035569

Issue: If you disable serial console redirection, and then launch the BIOS Setup Utility through the Oracle ILOM Remote System Console, the main screen of the BIOS Setup Utility is distorted upon first accessing the utility.

Workaround: From the main screen of the BIOS Setup Utility, click the Advanced tab, and then return to the Main tab. Returning to the main screen of the BIOS Setup Utility clears the distortion.

Virtual Boot Device Is Unavailable After Disabling SSL From Oracle ILOM Remote System Console

Bug ID: 32403589

Issue: If you clear the SSL Enable check box from the KVMS and Storage menu of the Oracle ILOM Remote System Console, and then you add a Linux ISO install boot device, the virtual boot device might not be displayed as an available boot device from the system UEFI Menu.

Workaround: Restore the default factory settings for KVMS and storage devices, and reboot the system. You can then return to the Oracle ILOM Remote System Console, clear the SSL Enable check box again, and the virtual boot device will now be displayed as available from the UEFI Menu.

Resolving Warning Messages for Custom CA and Self-Signed SSL Certificates

Important Operating Note

The following information applies to users of the Oracle ILOM Remote System Console and the Oracle ILOM Remote System Console Plus.

A warning message occurs when the Java client is not properly configured to validate the Secure Sockets Layer (SSL) certificate that is currently being used by Oracle ILOM. This validation behavior applies to Oracle ILOM firmware version 3.2.8 or later for systems using the default self-signed SSL certificate, and to Oracle ILOM firmware version 3.2.10 and later for systems using a Custom Certification Authority (CA) SSL certificate.

To resolve the SSL warning message, refer to the applicable sections in the Oracle ILOM Administrator’s Guide for Configuration and Maintenance Firmware Release 5.1.x, which is available at Servers Documentation - Systems Management.

The Default Baud Rate for the SER MGT Port Is 115200

Important Operating Note

When you connect to the Oracle ILOM Service Processor through the Oracle Server X9-2L SER MGT port, note that the default baud rate configured in BIOS for the port is 115200. For many other Oracle servers, the default baud rate is 9600.