6.1 Oracle Private Cloud Appliance Hardware

This section describes hardware-related limitations and workarounds.

6.1.1 Compute Node Boot Sequence Interrupted by LSI BIOS Battery Error

When a compute node has been powered off for an extended period (a week or longer), the LSI BIOS may halt on a battery error during the next boot and wait for the user to press a key before continuing.

Workaround: Wait for approximately 10 minutes to confirm that the compute node is stuck in boot. Use the Reprovision button in the Oracle Private Cloud Appliance Dashboard to reboot the server and restart the provisioning process.

Bug 16985965

6.1.2 Reboot From Oracle Linux Prompt May Cause Management Node to Hang

When the reboot command is issued from the Oracle Linux command line on a management node, the operating system could hang during boot. Recovery requires manual intervention through the server ILOM.

Workaround: When the management node hangs during (re-)boot, log in to the ILOM and run these two commands in succession: stop -f /SYS and start /SYS. The management node should reboot normally.

Bug 28871758

6.1.3 NM2-36P Sun Datacenter InfiniBand Expansion Switch Firmware Upgrade 2.2.9-3 Requires A Two-Phased Procedure

Recent InfiniBand switches use a power supply that requires a newer firmware version. Because some firmware versions may cause the ILOM shell to hang, Oracle PCA requires that you install firmware version 2.2.9-3. In this version, the ILOM issue has been addressed by setting the parameter polling_retry_number to a value of 5.

Oracle PCA racks shipped prior to Release 2.3.4 all contain InfiniBand switches with firmware version 2.1.8-1 or older. Because the firmware has changed from unsigned to signed packages, there is no direct upgrade path to version 2.2.9-3. Therefore, an intermediate upgrade to unsigned version 2.2.7-2 is required.

Workaround: Upgrade the firmware of both NM2-36P Sun Datacenter InfiniBand Expansion Switches twice: first to version 2.2.7-2, then to version 2.2.9-3. Both required firmware versions are provided as part of the Oracle PCA Release 2.3.4 controller software. For upgrade instructions, refer to the section Upgrading the NM2-36P Sun Datacenter InfiniBand Expansion Switch Firmware in the Oracle Private Cloud Appliance Administrator's Guide.
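The two-phased logic can be expressed as a small decision helper. The following is a minimal sketch, not an Oracle-provided tool: the function names version_lt and upgrade_plan are hypothetical, the comparison relies on GNU sort -V, and it assumes that a switch already at or beyond the intermediate version can be upgraded directly to the target.

```shell
#!/bin/sh
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on GNU "sort -V" (version sort).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# upgrade_plan CURRENT INTERMEDIATE TARGET
# Prints the firmware versions to install, in order, or "none".
# Assumption: firmware at or beyond INTERMEDIATE can go straight to TARGET.
upgrade_plan() {
    current=$1 intermediate=$2 target=$3
    if ! version_lt "$current" "$target"; then
        echo "none"
    elif version_lt "$current" "$intermediate"; then
        echo "$intermediate $target"
    else
        echo "$target"
    fi
}
```

For example, upgrade_plan 2.1.8-1 2.2.7-2 2.2.9-3 prints both versions in installation order, which corresponds to the two upgrades described in the workaround.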

Note

Firmware version 2.2.7-2 is delivered as part of the 2.2.7-1 package. Instructions are included in the readme file inside the 2.2.7-1 directory.

Bugs 27724015 and 27275899

6.1.4 Oracle ZFS Storage Appliance Firmware Upgrade 8.7.20 Requires A Two-Phased Procedure

Oracle PCA racks shipped prior to Release 2.3.4 have all been factory-installed with an older version of the Operating Software (AK-NAS) on the controllers of the ZFS Storage Appliance. A new version has been qualified for use with Oracle PCA Release 2.3.4, but a direct upgrade is not possible. An intermediate upgrade to version 8.7.14 is required.

Workaround: Upgrade the firmware of storage heads twice: first to version 8.7.14, then to version 8.7.20. Both required firmware versions are provided as part of the Oracle PCA Release 2.3.4 controller software. For upgrade instructions, refer to the section Upgrading the Operating Software on the Oracle ZFS Storage Appliance in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 28913616

6.1.5 Interruption of iSCSI Connectivity Leads to LUNs Remaining in Standby

If network connectivity between compute nodes and their LUNs is disrupted, one or more compute nodes may mark one or more iSCSI LUNs as being in standby state. The system cannot automatically recover from this state without operations requiring downtime, such as rebooting VMs or even rebooting compute nodes. The standby LUNs are caused by the specific methods that the Linux kernel and the ZFS Storage Appliance use to handle failover of LUN paths.

Workaround: This issue was resolved in the ZFS Storage Appliance firmware version AK 8.7.6. Customers who have run into issues with missing LUN paths and standby LUNs should update the ZFS Storage Appliance firmware to version AK 8.7.6 or later before upgrading Oracle Private Cloud Appliance.

Bug 24522087

6.1.6 Emulex Fibre Channel HBAs Discover Maximum 256 LUN Paths

When optional Broadcom/Emulex Fibre Channel expansion cards are installed in Oracle Server X8-2 compute nodes and the FC configuration results in more than 256 LUN paths between the compute nodes and the FC storage hardware, only 256 paths may be discovered. This is typically caused by a driver parameter for Emulex HBAs.

Workaround: Update the Emulex lpfc driver settings by performing the steps below on each affected compute node.

  1. On the compute node containing the Emulex card, modify the file /etc/default/grub. At the end of the GRUB_CMDLINE_LINUX parameter, append the scsi_mod and lpfc module options shown.

    GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg/lvroot rd.lvm.lv=vg/lvswap \
    rd.lvm.lv=vg/lvusr rhgb quiet numa=off transparent_hugepage=never \
    scsi_mod.max_luns=4096 scsi_mod.max_report_luns=4096 lpfc.lpfc_max_luns=4096"

  2. Rebuild the grub configuration with the new parameters.

    # grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

  3. Reboot the compute node.
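
Step 1 can also be scripted when several compute nodes need the same change. The following is a minimal sketch under stated assumptions: the function name append_lun_opts is hypothetical, and it assumes that GRUB_CMDLINE_LINUX is defined on a single line enclosed in double quotes. It keeps a .bak copy of the file and does nothing if the lpfc option is already present.

```shell
#!/bin/sh
# Module options from step 1 of the workaround.
LUN_OPTS="scsi_mod.max_luns=4096 scsi_mod.max_report_luns=4096 lpfc.lpfc_max_luns=4096"

# append_lun_opts FILE
# Appends LUN_OPTS to the end of the GRUB_CMDLINE_LINUX value in FILE.
# Assumption: the parameter sits on one line, enclosed in double quotes.
append_lun_opts() {
    grub_file=$1
    # Skip the edit if the options were added earlier (idempotent).
    if grep -q 'lpfc\.lpfc_max_luns=' "$grub_file"; then
        return 0
    fi
    # Keep a backup, then rewrite the file from the backup copy.
    cp -p "$grub_file" "$grub_file.bak" || return 1
    sed "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 $LUN_OPTS\"|" \
        "$grub_file.bak" > "$grub_file"
}

# Example (run as root on the compute node):
#   append_lun_opts /etc/default/grub
```

After the edit, continue with steps 2 and 3. Once the node has rebooted, the active kernel command line in /proc/cmdline should contain the three options.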

Bug 30461433

6.1.7 Fibre Channel LUN Path Discovery Is Disrupted by Other Oracle VM Operations

During the setup of Fibre Channel storage, when the zones on the FC switch have been created, the LUNs become visible to the connected compute nodes. Discovery operations are started automatically, and all discovered LUNs are added to the multipath configuration on the compute nodes. If the storage configuration contains a large number of LUNs, the multipath configuration may take a long time to complete. As long as the multipath configuration has not finished, the system is under high load, and concurrent Oracle VM operations may prevent some of the FC LUN paths from being added to multipath.

Workaround: Avoid Oracle VM operations during FC LUN discovery. Operations related to compute node provisioning and tenant group configuration are especially disruptive, because they include refreshing the storage layer. When LUNs become visible to the compute nodes, they are detected almost immediately. In contrast, the multipath configuration stage is time-consuming and resource-intensive.

Use the lsscsi command to determine the number of detected LUN paths. The number of lines in the command output equals the number of LUN paths plus one for the system disk. Next, verify that all paths have been added to multipath. The multipath configuration is complete once the number of "active ready running" paths in the multipath -ll command output equals the line count of the lsscsi command minus 1 (for the system disk).

# lsscsi | wc -l
251
# multipath -ll | grep "active ready running" | wc -l
250

When you have established that the multipath configuration is complete, all Oracle VM operations can be resumed.
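
The comparison above can be repeated automatically until the counts match. The following is a minimal sketch, not part of the product: the function names paths_complete and wait_for_multipath and the default 60-second polling interval are assumptions, while the lsscsi and multipath invocations are the ones shown above.

```shell
#!/bin/sh
# paths_complete SCSI_LINES MPATH_PATHS
# Succeeds when the multipath path count equals the lsscsi line count
# minus 1 (the system disk).
paths_complete() {
    [ "$2" -eq $(( $1 - 1 )) ]
}

# wait_for_multipath [INTERVAL_SECONDS]
# Polls until all detected LUN paths have been added to multipath.
wait_for_multipath() {
    interval=${1:-60}
    while :; do
        scsi=$(lsscsi | wc -l)
        mpath=$(multipath -ll | grep -c "active ready running")
        if paths_complete "$scsi" "$mpath"; then
            echo "multipath configuration complete: $mpath paths"
            return 0
        fi
        echo "waiting: $mpath of $(( scsi - 1 )) paths in multipath"
        sleep "$interval"
    done
}
```

Running wait_for_multipath on a compute node returns once the counts match. Because each multipath -ll call adds some load of its own, a long polling interval is the safer choice on systems with many LUNs.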

Bug 30461555

6.1.8 Poor Oracle VM Performance During Configuration of Fibre Channel LUNs

Discovering Fibre Channel LUNs is a time-consuming and resource-intensive operation. As a result, Oracle VM jobs take an unusually long time to complete. Therefore, it is advisable to complete the FC storage configuration and make sure that the configuration is stable before initiating new Oracle VM operations.

Workaround: Schedule Fibre Channel storage setup and configuration changes at a time when no other Oracle VM operations are required. Verify that all FC configuration jobs have been completed, as explained in Section 6.1.7, “Fibre Channel LUN Path Discovery Is Disrupted by Other Oracle VM Operations”. When the FC configuration is finished, all Oracle VM operations can be resumed.

Bug 30461478

6.1.9 ILOM Firmware Does Not Allow Loopback SSH Access

In Oracle Integrated Lights Out Manager (ILOM) firmware releases newer than 3.2.4, the service processor configuration contains a field, called allowed_services, to control which services are permitted on an interface. By default, SSH is not permitted on the loopback interface. However, Oracle Enterprise Manager uses this mechanism to register Oracle Private Cloud Appliance management nodes. Therefore, SSH must be enabled manually if the ILOM version is newer than 3.2.4.

Workaround: On management nodes running an ILOM version more recent than 3.2.4, make sure that SSH is included in the allowed_services field of the network configuration. Log in to the ILOM CLI through the NETMGT Ethernet port and enter the following commands:

-> cd /SP/network/interconnect
-> set hostmanaged=false
-> set allowed_services=fault-transport,ipmi,snmp,ssh
-> set hostmanaged=true 

Bug 26953763

6.1.10 incorrect opcode Messages in the Console Log

Any installed packages that use the mstflint command with a device (-d flag) format using the PCI ID will generate the mst_ioctl 1177: incorrect opcode = 8008d10 error message. Messages similar to the following appear in the console log:

Sep 26 09:50:12 ovcacn10r1 kernel: [  218.707917]   MST::  : print_opcode  549: MST_PARAMS=8028d001 
Sep 26 09:50:12 ovcacn10r1 kernel: [  218.707919]   MST::  : print_opcode  551: PCICONF_READ4=800cd101 
Sep 26 09:50:12 ovcacn10r1 kernel: [  218.707920]   MST::  : print_opcode  552: PCICONF_WRITE4=400cd102 

This issue is caused by an error in the PCI memory mapping associated with the InfiniBand ConnectX device. The messages can be safely ignored: the reported error has no impact on PCA functionality.

Workaround: Using mstflint, access the device through its PCI configuration interface path instead of its PCI ID.

[root@ovcamn06r1 ~]# mstflint -d /proc/bus/pci/13/00.0 q
Image type: FS2
FW Version: 2.11.1280
Device ID: 4099
HW Access Key: Disabled
Description: Node Port1 Port2 Sysimage
GUIDs: 0010e0000159ed0c 0010e0000159ed0d 0010e0000159ed0e 0010e0000159ed0f
MACs: 0010e059ed0d 0010e059ed0e
VSD:
PSID: ORC1090120019 
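
The device path in the example follows the Linux /proc/bus/pci/<bus>/<device>.<function> layout, so it can be derived from the PCI address that lspci reports. The following is a minimal sketch: the function name pci_conf_path is hypothetical, and it assumes the device is in the default PCI domain 0000.

```shell
#!/bin/sh
# pci_conf_path PCI_ADDRESS
# Converts a PCI address such as 13:00.0 or 0000:13:00.0 into the
# matching /proc/bus/pci path for use with "mstflint -d".
# Assumption: the device is in PCI domain 0000.
pci_conf_path() {
    bdf=${1#0000:}      # drop the default domain prefix if present
    bus=${bdf%%:*}      # part before the colon: bus number
    devfn=${bdf#*:}     # part after the colon: device.function
    echo "/proc/bus/pci/$bus/$devfn"
}

# Example (on a node with the ConnectX device at 13:00.0):
#   mstflint -d "$(pci_conf_path 13:00.0)" q
```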

Bug 29623624

6.1.11 Megaraid Firmware Crash Dump Is Not Available

ILOM console logs may contain many messages similar to this:

[ 1756.232496] megaraid_sas 0000:50:00.0: Firmware crash dump is not available
[ 1763.578890] megaraid_sas 0000:50:00.0: Firmware crash dump is not available
[ 2773.220852] megaraid_sas 0000:50:00.0: Firmware crash dump is not available

These are notifications, not errors or warnings. The crash dump feature in the megaraid controller firmware is not enabled, as it is not required in Oracle Private Cloud Appliance.

Workaround: This behavior is not a bug. No workaround is required.

Bug 30274703

6.1.12 North-South Traffic Connectivity Fails After Restarting Network

This issue may occur if you have not upgraded the Cisco switch firmware to version NX-OS I7(7) or later. For upgrade instructions, refer to the section Upgrading the Cisco Switch Firmware in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 29585636

6.1.13 Some Services Require an Upgrade of Hardware Management Pack

Certain secondary services running on Oracle Private Cloud Appliance, such as Oracle Auto Service Request or the Oracle Enterprise Manager Agent, depend on a specific or minimum version of the Oracle Hardware Management Pack. By design, the Controller Software upgrade does not install the new Oracle Hardware Management Pack or server ILOM version included in the ISO image. This may leave the Hardware Management Pack in a degraded state and not fully compatible with the ILOM version running on the servers.

Workaround: When upgrading the Oracle Private Cloud Appliance Controller Software, make sure that all component firmware matches the qualified versions for the installed Controller Software release. To ensure correct operation of services depending on the Oracle Hardware Management Pack, make sure that the relevant oracle-hmp*.rpm packages are upgraded to the versions delivered in the Controller Software ISO.
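
To see which packages still need upgrading, the installed versions can be compared with those delivered in the ISO. The following is a minimal sketch: the function name hmp_mismatches, the list file names, and the ISO mount point /mnt/pca-iso are hypothetical, and the location of the oracle-hmp*.rpm files inside the ISO must be filled in for your release.

```shell
#!/bin/sh
# hmp_mismatches INSTALLED_LIST ISO_LIST
# Each list file holds one "name version-release" pair per line.
# Prints every package whose installed version differs from the ISO.
hmp_mismatches() {
    awk 'NR == FNR { installed[$1] = $2; next }
         ($1 in installed) && installed[$1] != $2 {
             print $1 ": installed " installed[$1] ", ISO " $2
         }' "$1" "$2"
}

# Example (the ISO path is a placeholder, adjust to your environment):
#   fmt='%{NAME} %{VERSION}-%{RELEASE}\n'
#   rpm -qa --qf "$fmt" 'oracle-hmp*' > hmp_installed.txt
#   rpm -qp --qf "$fmt" /mnt/pca-iso/<path>/oracle-hmp*.rpm > hmp_iso.txt
#   hmp_mismatches hmp_installed.txt hmp_iso.txt
```

An empty result indicates that every installed oracle-hmp package listed in both files already matches the version on the ISO.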

Bug 30123062