- 6.1.1 Compute Node Boot Sequence Interrupted by LSI Bios Battery Error
- 6.1.2 Reboot From Oracle Linux Prompt May Cause Management Node to Hang
- 6.1.3 NM2-36P Sun Datacenter InfiniBand Expansion Switch Firmware Upgrade 2.2.9-3 Requires A Two-Phased Procedure
- 6.1.4 Oracle ZFS Storage Appliance Firmware Upgrade 8.7.20 Requires A Two-Phased Procedure
- 6.1.5 Interruption of iSCSI Connectivity Leads to LUNs Remaining in Standby
- 6.1.6 Emulex Fibre Channel HBAs Discover Maximum 256 LUN Paths
- 6.1.7 Fibre Channel LUN Path Discovery Is Disrupted by Other Oracle VM Operations
- 6.1.8 Poor Oracle VM Performance During Configuration of Fibre Channel LUNs
- 6.1.9 ILOM Firmware Does Not Allow Loopback SSH Access
- 6.1.10 incorrect opcode Messages in the Console Log
- 6.1.11 Megaraid Firmware Crash Dump Is Not Available
- 6.1.12 North-South Traffic Connectivity Fails After Restarting Network
- 6.1.13 Some Services Require an Upgrade of Hardware Management Pack
This section describes hardware-related limitations and workarounds.
When a compute node is powered off for an extended period of time, a week or longer, the LSI BIOS may stop because of a battery error, waiting for the user to press a key in order to continue.
Workaround: Wait for approximately 10 minutes to confirm that the compute node is stuck in boot. Use the Reprovision button in the Oracle Private Cloud Appliance Dashboard to reboot the server and restart the provisioning process.
Bug 16985965
When the reboot command is issued from the Oracle Linux command line on a management node, the operating system could hang during boot. Recovery requires manual intervention through the server ILOM.
Workaround: When the management node hangs during (re-)boot, log in to the ILOM and run these two commands in succession: stop -f /SYS and start /SYS. The management node should reboot normally.
Bug 28871758
Recent InfiniBand switches use a power supply that requires a newer firmware version. Because some firmware versions may cause the ILOM shell to hang, Oracle PCA requires that you install firmware version 2.2.9-3. In this version, the ILOM issue has been addressed by setting the parameter polling_retry_number to a value of 5.
Oracle PCA racks shipped prior to Release 2.3.4 all contain InfiniBand switches with firmware version 2.1.8-1 or older. Because the firmware has changed from unsigned to signed packages, there is no direct upgrade path to version 2.2.9-3. Therefore, an intermediate upgrade to unsigned version 2.2.7-2 is required.
Workaround: Upgrade the firmware of both NM2-36P Sun Datacenter InfiniBand Expansion Switches twice: first to version 2.2.7-2, then to version 2.2.9-3. Both required firmware versions are provided as part of the Oracle PCA Release 2.3.4 controller software. For upgrade instructions, refer to the section Upgrading the NM2-36P Sun Datacenter InfiniBand Expansion Switch Firmware in the Oracle Private Cloud Appliance Administrator's Guide.
Firmware version 2.2.7-2 is delivered as part of the 2.2.7-1 package. Instructions are included in the readme file inside the 2.2.7-1 directory.
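The two-phased logic above can be sketched as a small shell helper. This is an illustration only: the function names `version_lt` and `upgrade_path` are hypothetical, version comparison relies on `sort -V` (GNU coreutils), and treating every version older than 2.2.7-2 as needing the intermediate step is an assumption based on the description above, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Oracle PCA software.

# Succeeds when version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Prints, one per line, the firmware versions still to be installed.
# Arguments: current version, intermediate version, target version.
upgrade_path() {
    current=$1
    intermediate=$2
    target=$3
    if version_lt "$current" "$intermediate"; then
        echo "$intermediate"
    fi
    if version_lt "$current" "$target"; then
        echo "$target"
    fi
}

# Example: a switch still running 2.1.8-1 needs both phases.
#   upgrade_path 2.1.8-1 2.2.7-2 2.2.9-3
```

The helper only plans the order of the phases; the actual upgrade still follows the procedure in the Administrator's Guide.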
Bugs 27724015 and 27275899
Oracle PCA racks shipped prior to Release 2.3.4 have all been factory-installed with an older version of the Operating Software (AK-NAS) on the controllers of the ZFS Storage Appliance. A new version has been qualified for use with Oracle PCA Release 2.3.4, but a direct upgrade is not possible. An intermediate upgrade to version 8.7.14 is required.
Workaround: Upgrade the firmware of storage heads twice: first to version 8.7.14, then to version 8.7.20. Both required firmware versions are provided as part of the Oracle PCA Release 2.3.4 controller software. For upgrade instructions, refer to the section Upgrading the Operating Software on the Oracle ZFS Storage Appliance in the Oracle Private Cloud Appliance Administrator's Guide.
Bug 28913616
If network connectivity between compute nodes and their LUNs is disrupted, one or more compute nodes may mark one or more iSCSI LUNs as being in standby state. The system cannot automatically recover from this state without operations requiring downtime, such as rebooting VMs or even rebooting compute nodes. The standby LUNs are caused by the specific methods that the Linux kernel and the ZFS Storage Appliance use to handle failover of LUN paths.
Workaround: This issue was resolved in the ZFS Storage Appliance firmware version AK 8.7.6. Customers who have run into issues with missing LUN paths and standby LUNs should update the ZFS Storage Appliance firmware to version AK 8.7.6 or later before upgrading Oracle Private Cloud Appliance.
Bug 24522087
When using optional Broadcom/Emulex Fibre Channel expansion cards in Oracle Server X8-2 compute nodes, if your FC configuration results in more than 256 LUN paths between the compute nodes and the FC storage hardware, only 256 paths may be discovered. This is typically caused by a driver parameter for Emulex HBAs.
Workaround: Update the Emulex lpfc driver settings by performing the steps below on each affected compute node.
On the compute node containing the Emulex card, modify the file /etc/default/grub. At the end of the GRUB_CMDLINE_LINUX parameter, append the scsi_mod and lpfc module options shown.
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg/lvroot rd.lvm.lv=vg/lvswap \
rd.lvm.lv=vg/lvusr rhgb quiet numa=off transparent_hugepage=never \
scsi_mod.max_luns=4096 scsi_mod.max_report_luns=4096 lpfc.lpfc_max_luns=4096"
Rebuild the grub configuration with the new parameters.
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Reboot the compute node.
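Before rebuilding the grub configuration, it can help to confirm that all three options were actually added. The following is a minimal sketch; the helper name `check_lun_opts` is hypothetical, and it simply searches the whole file for each option string rather than parsing the GRUB_CMDLINE_LINUX value, so a commented-out line would also count as a match.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Oracle PCA software.
# Reports which LUN-limit options from the workaround are missing
# from a grub defaults file (default: /etc/default/grub).
# Returns 0 when all options are present, 1 otherwise.
check_lun_opts() {
    grub_file=${1:-/etc/default/grub}
    missing=0
    for opt in scsi_mod.max_luns=4096 \
               scsi_mod.max_report_luns=4096 \
               lpfc.lpfc_max_luns=4096; do
        # -F: fixed string match, -q: no output, only exit status
        if ! grep -qF -- "$opt" "$grub_file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example on a compute node:
#   check_lun_opts /etc/default/grub && echo "all options present"
```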
Bug 30461433
During the setup of Fibre Channel storage, when the zones on the FC switch have been created, the LUNs become visible to the connected compute nodes. Discovery operations are started automatically, and all discovered LUNs are added to the multipath configuration on the compute nodes. If the storage configuration contains a large number of LUNs, the multipath configuration may take a long time to complete. As long as the multipath configuration has not finished, the system is under high load, and concurrent Oracle VM operations may prevent some of the FC LUN paths from being added to multipath.
Workaround: Avoid Oracle VM operations during FC LUN discovery. Operations related to compute node provisioning and tenant group configuration are especially disruptive, because they include refreshing the storage layer. When LUNs become visible to the compute nodes, they are detected almost immediately. In contrast, the multipath configuration stage is time-consuming and resource-intensive.
Use the lsscsi command to determine the number of detected LUN paths. The number of lines in the command output equals the number of LUN paths plus one for the system disk. Next, verify that all paths have been added to multipath. The multipath configuration is complete once the number of active paths reported by the multipath -ll command equals the lsscsi line count minus 1 (for the system disk).
# lsscsi | wc -l
251
# multipath -ll | grep "active ready running" | wc -l
250
When you have established that the multipath configuration is complete, all Oracle VM operations can be resumed.
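The comparison described above can be wrapped in a small check. This is a sketch under the assumptions stated in the text (exactly one system disk in the lsscsi output); the function name `multipath_complete` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Oracle PCA software.
# Succeeds when the number of active multipath paths equals the
# lsscsi line count minus 1 (the system disk).
# Arguments: lsscsi line count, active multipath path count.
multipath_complete() {
    scsi_count=$1
    mp_count=$2
    [ "$mp_count" -eq $((scsi_count - 1)) ]
}

# Example on a compute node (grep -c counts matching lines):
#   if multipath_complete "$(lsscsi | wc -l)" \
#        "$(multipath -ll | grep -c 'active ready running')"; then
#       echo "multipath configuration complete"
#   else
#       echo "multipath configuration still in progress"
#   fi
```

With the sample values shown above, `multipath_complete 251 250` succeeds.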
Bug 30461555
Discovering Fibre Channel LUNs is a time-consuming and resource-intensive operation. As a result, Oracle VM jobs take an unusually long time to complete. Therefore, it is advisable to complete the FC storage configuration and make sure that the configuration is stable before initiating new Oracle VM operations.
Workaround: Schedule Fibre Channel storage setup and configuration changes at a time when no other Oracle VM operations are required. Verify that all FC configuration jobs have been completed, as explained in Section 6.1.7, “Fibre Channel LUN Path Discovery Is Disrupted by Other Oracle VM Operations”. When the FC configuration is finished, all Oracle VM operations can be resumed.
Bug 30461478
In Oracle Integrated Lights Out Manager (ILOM) firmware releases newer than 3.2.4, the service processor configuration contains a field, called allowed_services, to control which services are permitted on an interface. By default, SSH is not permitted on the loopback interface. However, Oracle Enterprise Manager uses this mechanism to register Oracle Private Cloud Appliance management nodes. Therefore, SSH must be enabled manually if the ILOM version is newer than 3.2.4.
Workaround: On management nodes running an ILOM version more recent than 3.2.4, make sure that SSH is included in the allowed_services field of the network configuration. Log in to the ILOM CLI through the NETMGT Ethernet port and enter the following commands:
-> cd /SP/network/interconnect
-> set hostmanaged=false
-> set allowed_services=fault-transport,ipmi,snmp,ssh
-> set hostmanaged=true
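To check whether a given allowed_services value already includes SSH, for example a value copied from the ILOM CLI, a comma-separated membership test is enough. The sketch below runs in an ordinary shell, not in the ILOM CLI, and the function name `service_allowed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of ILOM or the Oracle PCA software.
# Succeeds when service $2 appears in the comma-separated list $1.
service_allowed() {
    # Wrapping both sides in commas avoids partial matches,
    # e.g. "ssh" inside a longer service name.
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using the value set in the workaround:
#   service_allowed "fault-transport,ipmi,snmp,ssh" ssh && echo "SSH permitted"
```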
Bug 26953763
Any installed packages that use the mstflint command with a device (-d flag) format using the PCI ID will generate the mst_ioctl 1177: incorrect opcode = 8008d10 error message. Messages similar to the following appear in the console log:
Sep 26 09:50:12 ovcacn10r1 kernel: [ 218.707917] MST:: : print_opcode 549: MST_PARAMS=8028d001
Sep 26 09:50:12 ovcacn10r1 kernel: [ 218.707919] MST:: : print_opcode 551: PCICONF_READ4=800cd101
Sep 26 09:50:12 ovcacn10r1 kernel: [ 218.707920] MST:: : print_opcode 552: PCICONF_WRITE4=400cd102
This issue is caused by an error in the PCI memory mapping associated with the InfiniBand ConnectX device. The messages can be safely ignored, because the reported error has no impact on PCA functionality.
Workaround: Using mstflint, access the device from the PCI configuration interface, instead of the PCI ID.
[root@ovcamn06r1 ~]# mstflint -d /proc/bus/pci/13/00.0 q
Image type: FS2
FW Version: 2.11.1280
Device ID: 4099
HW Access Key: Disabled
Description: Node Port1 Port2 Sysimage
GUIDs: 0010e0000159ed0c 0010e0000159ed0d 0010e0000159ed0e 0010e0000159ed0f
MACs: 0010e059ed0d 0010e059ed0e
VSD:
PSID: ORC1090120019
Bug 29623624
ILOM console logs may contain many messages similar to this:
[ 1756.232496] megaraid_sas 0000:50:00.0: Firmware crash dump is not available
[ 1763.578890] megaraid_sas 0000:50:00.0: Firmware crash dump is not available
[ 2773.220852] megaraid_sas 0000:50:00.0: Firmware crash dump is not available
These are notifications, not errors or warnings. The crash dump feature in the megaraid controller firmware is not enabled, as it is not required in Oracle Private Cloud Appliance.
Workaround: This behavior is not a bug. No workaround is required.
Bug 30274703
This issue may occur if you have not upgraded the Cisco Switch firmware to version NX-OS I7(7) or later. See Upgrading the Cisco Switch Firmware in the Oracle Private Cloud Appliance Administrator's Guide.
Bug 29585636
Certain secondary services running on Oracle Private Cloud Appliance, such as Oracle Auto Service Request or the Oracle Enterprise Manager Agent, depend on a specific or minimum version of the Oracle Hardware Management Pack. By design, the Controller Software upgrade does not include the installation of a new Oracle Hardware Management Pack or server ILOM version included in the ISO image. This may leave the Hardware Management Pack in a degraded state and not fully compatible with the ILOM version running on the servers.
Workaround: When upgrading the Oracle Private Cloud Appliance Controller Software, make sure that all component firmware matches the qualified versions for the installed Controller Software release. To ensure correct operation of services depending on the Oracle Hardware Management Pack, make sure that the relevant oracle-hmp*.rpm packages are upgraded to the versions delivered in the Controller Software ISO.
Bug 30123062