5 Oracle ILOM Issues

This section describes important operating notes and known Oracle ILOM issues for Oracle Server X7-2L.

For updated information about Oracle ILOM, refer to the latest Oracle ILOM documents at Servers Documentation - Systems Management.

Oracle ILOM Command Force Stop of PCIe Slot Power Can Cause Server PCIe Bus Error

Bug ID: 334503411

Issue: Enhancement 34371396, delivered in SW3.3.3 (Enterprise Oracle ILOM build02: 5.1.0.23_r146986), implements the CLI command stop /SYS/MB/PCIEn. Using the stop /SYS/MB/PCIEn command can cause some Smart NIC PCIe cards to stop operating, and the system might report a PCIe bus error. Systems with this firmware enhancement should not need this command unless a Cavium LiquidIO III 100 Gb Network Interface Card (NIC) is installed in a PCIe slot. Unless you are resetting a Smart NIC Add-In Card (AIC) PCIe slot, avoid using the stop /SYS/MB/PCIEn command on PCIe slots and cards unless instructed by Oracle Service personnel.

  1. When host power is off, you can start any PCIe slot power with the start /SYS/MB/PCIEn command and stop any PCIe slot power with the stop /SYS/MB/PCIEn command without PCIe bus errors.

    • -> stop /SYS/MB/PCIE#
    • -> start /SYS/MB/PCIE#
  2. Note: When host power is on, you can use the -force option to force stop or force start PCIe slot power. However, doing so risks causing system and PCIe bus errors.

    Before using the -force option to force stop or force start PCIe slot power, ensure that the following preconditions are met.

    1. Verify that the UEFI BIOS has already enabled the PCIe slot hotplug feature.

    2. Verify that the OS is Linux UEK4 or UEK5, and the Cavium LiquidIO III 100 Gb Network Interface Card (NIC) remote console is idle.

    3. Shut down PCIe communication and data traffic for the slot with the installed Cavium LiquidIO III 100 Gb Network Interface Card (NIC).

  3. When host power is on, use these commands only on a PCIe slot that holds a Cavium LiquidIO III 100 Gb Network Interface Card (NIC).

    • -> stop -force /SYS/MB/PCIE#
    • -> start -force /SYS/MB/PCIE#

A Cavium LiquidIO III 100 Gb Network Interface Card (NIC) might require start/stop power cycles that do not affect other AICs. For this card, use the -force option to start or stop PCIe slot power when server host main power is on. For all other AICs installed in system configurations, exercise caution when using the stop /SYS/MB/PCIEn command to force start or force stop PCIe slot power. Using force stop on Oracle Flash Accelerator F640 PCIe Cards and Oracle Quad Port 10GBase-T Adapters installed in PCIe slots might result in PCIe bus issues. Refer to Enhancement 34371396 for details.
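The preconditions for a forced slot power stop can be summarized as a simple decision check. The following Python sketch is illustrative only: SlotState and may_force_stop are hypothetical names, not part of Oracle ILOM, and every value is assumed to be verified by hand by an administrator rather than read from the service processor.

```python
from dataclasses import dataclass

# Hypothetical record of facts an administrator verifies manually.
# Nothing here is queried from Oracle ILOM by this sketch.
@dataclass
class SlotState:
    host_power_on: bool
    card_is_liquidio_iii: bool   # Cavium LiquidIO III 100 Gb NIC in the slot
    bios_hotplug_enabled: bool   # UEFI BIOS PCIe slot hotplug feature
    os_name: str                 # for example "UEK4" or "UEK5"
    nic_console_idle: bool       # NIC remote console is idle
    slot_traffic_stopped: bool   # PCIe traffic for the slot is shut down

SUPPORTED_OS = {"UEK4", "UEK5"}

def may_force_stop(s: SlotState) -> tuple[bool, str]:
    """Return (allowed, reason) for stopping slot power, per the notes above."""
    if not s.host_power_on:
        return True, "host power is off: plain stop/start is sufficient"
    if not s.card_is_liquidio_iii:
        return False, "force stop is intended only for LiquidIO III NIC slots"
    if not s.bios_hotplug_enabled:
        return False, "PCIe slot hotplug is not enabled in the UEFI BIOS"
    if s.os_name not in SUPPORTED_OS:
        return False, "OS is not Linux UEK4 or UEK5"
    if not s.nic_console_idle:
        return False, "NIC remote console is not idle"
    if not s.slot_traffic_stopped:
        return False, "PCIe traffic for the slot is still active"
    return True, "preconditions met: -force can be used, with residual risk"

# Example: host is on, but slot traffic has not been shut down yet.
allowed, reason = may_force_stop(
    SlotState(True, True, True, "UEK5", True, False)
)
print(allowed, reason)
```

Even when the check passes, the risk of system and PCIe bus errors described above still applies; the sketch only orders the manual checks.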

Example of force stop command that may cause a reset:

-> stop /SYS/MB/PCIE8
Are you sure you want to stop /SYS/MB/PCIE8 (y/n)? y
stop: Operation is not allowed when Host power is on.
-> stop -f /SYS/MB/PCIE8
Are you sure you want to immediately stop /SYS/MB/PCIE8 (y/n)? y
Stopping /SYS/MB/PCIE8 immediately
-> ls
 /SYS/MB/PCIE8
    Targets:
        POWER
        PRSNT
        SERVICE

    Properties:
        type = PCIE Module
        fault_state = OK
        clear_fault_action = (none)
        power_state = Off

    Commands:
        cd
        reset
        set
        show
        start
        stop
-> show faulty
Target                       | Property                          | Value
-----------------------------+-----------------------------------+------------
-> show /sp/logs/event/list/
Event
ID     Date/Time                 Class          Type      Severity
-----  ------------------------  --------  --------  --------
324    Wed Aug 10 06:41:56 2022  Chassis   State     minor   
       /SYS/MB/PCIE8 power is disabled
323    Wed Aug 10 06:41:51 2022  Power     Off       major   
       Power to /SYS/MB/PCIE8 has been turned off by: Shell session, 
Username:
322    Wed Aug 10 05:23:51 2022  IPMI      Log       minor   
       ID =  131 : 08/10/2022 : 05:23:51 : System Firmware Progress : BIOS : 
System boot initiated : Asserted
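When reviewing a saved copy of the event log, it can help to filter for entries that mention a particular slot. The following Python sketch assumes the log text has been captured manually (for example, pasted from a CLI session) and that entries follow the layout shown in the transcript above. The function names parse_events and slot_power_events are hypothetical, and the sketch does not connect to Oracle ILOM.

```python
import re

# Matches the first line of an event entry as shown in the transcript:
# ID, a date such as "Wed Aug 10 06:41:56 2022", then Class, Type, Severity.
HEADER = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<date>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<severity>\S+)\s*$"
)

def parse_events(text: str) -> list[dict]:
    """Group each entry's first line with its continuation lines."""
    events = []
    for line in text.splitlines():
        stripped = line.strip()
        m = HEADER.match(stripped)
        if m:
            events.append({**m.groupdict(), "message": ""})
        elif events and stripped:
            # Continuation line: append to the most recent event's message.
            prev = events[-1]["message"]
            events[-1]["message"] = (prev + " " + stripped).strip()
    return events

def slot_power_events(text: str, slot: str) -> list[dict]:
    """Return events whose message mentions the given slot target."""
    return [e for e in parse_events(text) if slot in e["message"]]

# Sample copied from the transcript above (the user name is blank there).
SAMPLE = """\
324    Wed Aug 10 06:41:56 2022  Chassis   State     minor
       /SYS/MB/PCIE8 power is disabled
323    Wed Aug 10 06:41:51 2022  Power     Off       major
       Power to /SYS/MB/PCIE8 has been turned off by: Shell session,
Username:
322    Wed Aug 10 05:23:51 2022  IPMI      Log       minor
       ID =  131 : 08/10/2022 : 05:23:51 : System Firmware Progress : BIOS :
System boot initiated : Asserted
"""

for event in slot_power_events(SAMPLE, "/SYS/MB/PCIE8"):
    print(event["id"], event["severity"], event["message"])
```

Lines that precede the first entry, such as the column header and the dashed rule, are ignored because no event has been opened yet.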

Affected Hardware: Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X7-2, Oracle Server X7-2L

Affected Software:

The following x86 server software release, or later, on Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X7-2, and Oracle Server X7-2L: SW3.3.3 Enterprise ILOM build02: 5.1.0.23_r146986, BIOS: 42120100

For details, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Systems Management Documentation.

Workaround: Use the Oracle ILOM CLI command stop /SYS/MB/PCIEn only when a Cavium LiquidIO III 100 Gb Network Interface Card (NIC) slot requires a start/stop power cycle. Do not use the Oracle ILOM CLI command stop /SYS/MB/PCIEn for any other purpose unless instructed by Oracle Service personnel.

Oracle Service personnel can find more information about x86 servers at My Oracle Support. Refer to Oracle ILOM update Enh 34371396 - Added --force to start/stop PCIe slot power.

DIMM Fault SPX86A-800A-95 - Memtest Single Symbol Test Failed - ILOM 5.1.0.21

Bug ID: 34325538, 34445460

Issue: The following DIMM Fault message is seen: SPX86A-800A-95 - Memtest Single Symbol Test Failed (Doc ID 2317012.1). SPX86A-800A-95 indicates that the ILOM fault manager has received an error report indicating that a memory DIMM produced correctable errors (CE) during both passes of the memory test.

If the server encounters multiple runtime memory-fault-related events, the increase in runtime error messages might be related to DIMM memory testing conditions. The Oracle ILOM Adaptive Double DRAM Device Correction (ADDDC) and Post Package Repair (PPR) features are enabled in the server firmware. ADDDC Sparing is a RAS feature to test memory reliability. The Advanced Memory Test (AMT) in the Memory Reference Code (MRC) can fail a DIMM with a single symbol error, after which PPR tries to repair the defect.

When enabled, PPR might be able to repair affected DRAM areas on a DIMM. PPR runs when ADDDC was activated before the reboot or when memory tests failed during MRC initialization. That is, if the server encounters any memory-related fault event during MRC initialization, or experiences certain correctable memory events during runtime that trigger ADDDC on first occurrence, PPR is activated after the next system initialization or reboot and attempts to repair the DIMM.

Note:

DIMMs from certain manufacturers might exhibit different memory failure patterns, and might not support soft PPR configuration (which enables a temporary repair attempt).

Affected Hardware: Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X8-8, Oracle Server X8-2, Oracle Server X8-2L, Oracle Server X7-8, Oracle Server X7-2, Oracle Server X7-2L

Note:

Not all server systems enable ADDDC.

Affected Software:

The following x86 server software and Oracle ILOM releases, or later, support Post Package Repair (PPR).

  • Oracle Server X9-2 SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X9-2L SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X8-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X8-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X8-2L SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X7-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-2L SW3.2.2 ILOM 5.0.2.24
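To check whether an installed Oracle ILOM release meets the minimum listed above, the version strings can be compared field by field. The following Python sketch is a minimal illustration: the table is transcribed from the list above, the function names are hypothetical, and it assumes that ILOM version strings compare numerically by dotted field and that any build suffix (such as _r146986) can be ignored.

```python
# Minimum Oracle ILOM release that supports PPR, transcribed from the list
# above. The flag records whether the list notes "Does not enable ADDDC."
PPR_MINIMUM = {
    "X9-2":  ("5.0.2.21", True),
    "X9-2L": ("5.0.2.21", True),
    "X8-8":  ("5.0.2.22", True),
    "X8-2":  ("5.0.2.24", False),
    "X8-2L": ("5.0.2.24", False),
    "X7-8":  ("5.0.2.22", True),
    "X7-2":  ("5.0.2.24", False),
    "X7-2L": ("5.0.2.24", False),
}

def parse_version(version: str) -> tuple[int, ...]:
    """Turn '5.1.0.23_r146986' into (5, 1, 0, 23), dropping the build suffix."""
    dotted = version.split("_")[0]
    return tuple(int(part) for part in dotted.split("."))

def supports_ppr(model: str, installed: str) -> bool:
    """True if the installed ILOM release is at or above the listed minimum."""
    minimum, _ = PPR_MINIMUM[model]
    return parse_version(installed) >= parse_version(minimum)

def adddc_noted_disabled(model: str) -> bool:
    """True if the list above says the model does not enable ADDDC."""
    return PPR_MINIMUM[model][1]

print(supports_ppr("X7-2L", "5.1.0.23_r146986"))
print(adddc_noted_disabled("X7-8"))
```

Comparing tuples of integers avoids the string-comparison pitfall where "5.0.10" would sort before "5.0.2".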

Workaround: Some DIMM faults are recoverable errors if PPR is enabled on the server. If multiple DIMM memory errors are detected on the server:

  1. Log in to the Oracle ILOM command-line interface (CLI) using an account with admin (a) role privileges.
  2. From the Oracle ILOM CLI, launch the Oracle ILOM Fault Management Shell.
    -> start /SP/faultmgmt/shell 
    Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
  3. Display information about faulted server components using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm faulty
  4. Manually clear server faults using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm repair <FRU>
  5. Exit the Oracle ILOM Fault Management Shell and return to the Oracle ILOM CLI command prompt.
    faultmgmtsp> exit
  6. Upgrade the server to the latest ILOM/UEFI firmware release that supports PPR. The system resets during the firmware upgrade and runs memory tests again.
  7. If memory-related fault events continue to be logged, replace the faulted DIMMs in the server. Log an Oracle Support case through the support portal for further assistance.

For updated information about Oracle ILOM, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Servers Documentation - Systems Management.

Oracle Service personnel can find more information about the diagnosis and triage of DIMM Fault failures on x86 servers at My Oracle Support. Refer to the Knowledge Article Doc ID 2698328.1. If there are multiple, simultaneous DIMM Fault message problems on a server, Oracle Service personnel can refer to Knowledge Articles Doc IDs 1603015.1 (KA single symbol error) and 2317012.1 (KA multiple symbol errors).

Note:

Adaptive Double DRAM Device Correction (ADDDC) is also referred to as Adaptive Device Correction (ADC) in some Oracle documents.

SSL Must Be Turned On When Booting a Redirected ISO Image

Important Operating Note

When booting a redirected ISO image for an operating system installation, SSL (Secure Sockets Layer) must be turned on. This is the default setting. If SSL is not turned on, the installation might stall or fail. This affects all supported operating systems.

Reset Takes a Long Time and Causes the Server to Power Cycle

Important Operating Note

If you have a pending BIOS upgrade, a routine reset takes longer than expected, and causes the server to power cycle and reboot several times. This is expected behavior, as it is necessary to power cycle the server to upgrade the BIOS firmware. If the upgrade includes an FPGA update, it can take more than 30 minutes to complete the upgrade, and the service processor (SP) resets during the process. Since the SP resets during the process, you will need to reestablish any active connection to the SP after the update.

A pending BIOS upgrade exists when the following conditions are true:

  • You update the BIOS and service processor firmware using Oracle Integrated Lights Out Manager (ILOM).

  • You select the Oracle ILOM option to Delay BIOS Upgrade.

  • The host is powered on.

If you then reboot the server expecting a routine server reset and instead initiate a (delayed) BIOS upgrade, wait until the upgrade is finished. Do not interrupt the process, as this can result in corrupted firmware and server down time.

Caution:

Firmware data corruption and system downtime. Interrupting the firmware upgrade process can corrupt the firmware and render the server inoperable. Do not interrupt the upgrade. Allow the process to finish.

Note:

Oracle ILOM and BIOS updates are designed to work together. When you have a pending BIOS upgrade, it is recommended that you install the upgrade by resetting or power cycling your server as soon as possible.

For details, refer to “Update the BIOS and Service Processor Firmware (Oracle ILOM)” in the Oracle X7 Series Servers Administration Guide at https://www.oracle.com/goto/x86admindiag/docs .

Deprecation Notice for Oracle ILOM IPMI 2.0 Management Service

Important Operating Note

Present Behavior: IPMI 2.0 Management Sessions - Enabled (default setting).

Future Behavior: The following IPMI Management Service changes will occur in a future Oracle ILOM firmware release after firmware version 4.0.2.

  • First IPMI Service Support Change – The default configuration property for IPMI 2.0 Sessions will change from Enabled to Disabled. Clients relying on Oracle ILOM IPMI 2.0 session support by default will no longer be able to communicate with Oracle ILOM. To enable IPMI communication with Oracle ILOM, perform one of the following:

    • Use the Oracle IPMI TLS service and interface. For more information, refer to “IPMI TLS Service and Interface” in the Oracle ILOM Protocol Management Reference SNMP and IPMI Firmware Release 4.0.x.

      or

    • Manually enable the configuration property for IPMI 2.0 Session. For details, refer to “IPMI Service Configuration Properties” in the Oracle ILOM Administrator’s Guide for Configuration and Maintenance Firmware Release 4.0.x.

  • Second IPMI Service Support Change – Removal of IPMI 2.0 client support. IPMI 2.0 clients no longer will be able to communicate with Oracle ILOM. Clients relying on IPMI communication will need to use the IPMI TLS service and interface. For more information, refer to “IPMI TLS Service and Interface” in the Oracle ILOM Protocol Management Reference SNMP and IPMI Firmware Release 4.0.x.

For future updates about IPMI Management Service support in Oracle ILOM, refer to the latest firmware release information published in the Oracle ILOM Feature Updates and Release Notes Firmware Release 4.0.x. You can access the Oracle ILOM documents at https://www.oracle.com/goto/ilom/docs .

Resolving Warning Messages for Custom CA and Self-Signed SSL Certificates

Important Operating Note

The following information applies to users of the Oracle ILOM Remote System Console and the Oracle ILOM Remote System Console Plus.

A warning message occurs when the Java client is not properly configured to validate the Secure Sockets Layer (SSL) certificate that is currently being used by Oracle ILOM. This validation behavior applies to Oracle ILOM firmware version 3.2.8 or later for systems using the default self-signed SSL certificate, and to Oracle ILOM firmware version 3.2.10 and later for systems using a Custom Certification Authority (CA) SSL certificate.

To resolve the SSL warning message, refer to the following applicable sections in the Oracle ILOM Administrator’s Guide for Configuration and Maintenance Firmware Release 4.0.x, which is available at https://www.oracle.com/goto/ilom/docs :

  • “Warning Messages for Self-Signed SSL Certificate”

  • “Resolving Warning Messages for Custom Certification Authority (CA) SSL Certificate”

Changes to TLSv1.1 Configuration Property as of ILOM 4.0.3.x

Important Operating Note

Present Behavior: The Oracle ILOM TLSv1.1 configuration property is Enabled by default.

Future Behavior: The following changes will occur to the TLSv1.1 configuration property sometime after the Oracle ILOM 4.0.3 firmware release:

  • First Change: The TLSv1.1 configuration property will default to Disabled in the next minor release of Oracle ILOM.

  • Second Change: The TLSv1.1 configuration property will no longer be supported and will be removed from all Oracle ILOM user interfaces in the next major release of Oracle ILOM.

For future updates regarding TLSv1.1 support in Oracle ILOM, refer to the latest release information in the Oracle ILOM Feature Updates and Release Notes for Firmware 4.0.x at https://docs.oracle.com/cd/E81115_01/index.html .