5.1 Oracle Private Cloud Appliance Hardware

This section describes hardware-related limitations and workarounds.

5.1.1 Oracle Server X7-2 External Layout and Connectivity

As of Release 2.3.2, the Oracle Private Cloud Appliance Controller Software supports Oracle Server X7-2 compute nodes. Their function is identical to that of the Oracle Server X6-2, and they are administered in exactly the same way. However, the front and back panel layout of the new hardware is slightly different.

After you install an Oracle Server X7-2 in the appliance rack, connect it using the same pre-installed cables and the equivalent ports on the back of the server. The following two images illustrate the subtle differences between the two server models. The letters indicate where the required cables must be connected, and the figure legend below identifies the ports and their purpose.

Figure 5.1 Oracle Server X7-2 Cabling Requirements

Figure showing the rear panel of an Oracle Server X7-2 compute node. The call-outs identify the required cable connections.

Figure 5.2 Oracle Server X6-2 Cabling Requirements

Figure showing the rear panel of an Oracle Server X6-2 compute node. The call-outs identify the required cable connections.

Table 5.1 Figure Legend

Item    Description
A       Power supplies
B       InfiniBand connectors
C       NET0 Ethernet port


5.1.2 Change in Expansion Rack Support

The ability to add expansion racks to the Oracle Private Cloud Appliance base rack is no longer available.

5.1.3 Compute Node Boot Sequence Interrupted by LSI BIOS Battery Error

When a compute node is powered off for an extended period of time, a week or longer, the LSI BIOS may halt with a battery error and wait for the user to press a key before continuing.

Workaround: Wait for approximately 10 minutes to confirm that the compute node is stuck in boot. Use the Reprovision button in the Oracle PCA Dashboard to reboot the server and restart the provisioning process.

Bug 16985965

5.1.4 Management Node Network Interfaces Are Down After System Restart

If the Oracle PCA needs to be powered down and restarted for maintenance or in the event of a power failure, the components should come back up in this order: first networking, then storage, and then the management and compute nodes. For detailed instructions to bring the appliance offline and return it to operation, refer to the section Powering Down Oracle Private Cloud Appliance in the Oracle Private Cloud Appliance Administrator's Guide.

It may occur that the management nodes complete their boot sequence before the appliance network configuration is up. In that case, the management nodes are unreachable because their bond0 and bond2 interfaces are down.

Workaround: Reboot the management nodes again. When they come back online, their network interfaces should be correctly configured.
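To confirm that the interfaces came up correctly after the reboot, you can inspect the bond state from the management node console. This is a diagnostic sketch using standard Linux tools, not part of the official procedure; it assumes shell access to the management node and the bond0/bond2 interface names used above.

```shell
# Check the link state of the management node bond interfaces.
for bond in bond0 bond2; do
    ip link show "$bond"              # the state flag should read UP
    cat /proc/net/bonding/"$bond"     # per-slave status from the bonding driver
done
```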

Bug 17648454

5.1.5 Only One Oracle Switch ES1-24 May Be Connected Upstream

Do not connect port 24 of both internal Oracle Switch ES1-24 Ethernet switches to the next-level switch of your data center network. Doing so causes spanning tree issues and provisioning failures. Only one IP address in the appliance management network range is available for customer use: 192.168.4.254. Make sure that the upstream network is configured to protect the appliance management network against DHCP leaks.

The upstream link from an Oracle Switch ES1-24 may only be used for out-of-band management of the Oracle PCA. It must never be used as a data path for virtual machines.

Workaround: Connect only one Oracle Switch ES1-24 (port 24) to the next-level data center switch. If provisioning failures have occurred, reprovision the affected compute nodes.

Bug 21554535

5.1.6 Removing I/O Cards from Fabric Interconnect Slots Is Not Supported

Once an I/O card has been installed and initialized in an expansion slot of an Oracle Fabric Interconnect F1-15, that slot can no longer be left empty. If the card is removed and the slot is left empty, the Fabric Interconnect generates errors containing "Unsupported IO Card state resourceMissing" and prevents normal operation of the entire appliance.

Replacing a defective component with a new one of the same type is supported. Removing the I/O card and leaving the slot empty is not allowed.

Bug 25918553

5.1.7 Sun ZFS Storage Appliance 7320 Firmware Upgrade Must Be Performed After Management Node Update to Release 2.0.2

The Oracle PCA Release 2.0.2 software contains firmware upgrades for a number of appliance hardware components. If you are using a base rack with Sun Server X3-2 management nodes and Sun ZFS Storage Appliance 7320, the firmware upgrade is likely to cause storage connectivity issues.

Workaround: Make sure that the Release 2.0.2 software is installed on the management nodes before you upgrade the ZFS storage appliance firmware.

Bug 20319302

5.1.8 ZFS Storage Appliance Firmware Upgrade and Network Configuration Fail with Appliance Software Release 2.1.1 or 2.2.1

During the controller software update from Release 2.0.5 to Release 2.1.1 or 2.2.1, an automated firmware upgrade takes place on the ZFS Storage Appliance. This upgrade, and the network configuration of the storage appliance, can fail if another user or process takes control of the console. It has also been observed that the text strings passed in the Pexpect commands sometimes contain wrong or missing characters, which results in configuration errors.

Workaround: Make sure that there is no other activity on the ZFS Storage Appliance console, and that any external backup activity is suspended for the duration of the software update and firmware upgrade. If the firmware upgrade fails, retrying the same procedure could resolve the problem.

Bug 22269393

5.1.9 Interruption of iSCSI Connectivity Leads to LUNs Remaining in Standby

If network connectivity between compute nodes and their LUNs is disrupted, it may occur that one or more compute nodes mark one or more iSCSI LUNs as being in standby state. The system cannot automatically recover from this state without operations requiring downtime, such as rebooting VMs or even rebooting compute nodes. The standby LUNs are caused by the specific methods that the Linux kernel and the ZFS Storage Appliance use to handle failover of LUN paths.
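On an affected compute node, paths stuck in standby can be spotted with the standard Open-iSCSI and device-mapper multipath tools. This is a diagnostic sketch, not an official procedure, and it assumes both tools are present on the compute node.

```shell
# List the iSCSI sessions and their connection state.
iscsiadm -m session -P 1

# Show the multipath topology; a path stuck in standby appears
# with a status other than "active" in the path group listing.
multipath -ll
```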

Workaround: The root cause has been identified, and an update of the ZFS Storage Appliance firmware is being developed and tested. Until the new firmware is released, customers who have run into issues with missing LUN paths and standby LUNs are advised not to upgrade Oracle PCA. The new firmware is likely to be released independently, not as part of the Oracle PCA Controller Software ISO.

Bug 24522087

5.1.10 Catastrophic Failure of ZFS Storage Appliance Controller Causes Management Node Fencing

The shared ocfs2 file system used as cluster heartbeat device by the management node cluster is located on the ZFS Storage Appliance. In the event of a catastrophic failure of a storage controller, the standby controller needs several minutes to take over all storage services. The downtime is likely to exceed the heartbeat limit, which causes the management nodes to begin the fencing process and eventually shut down.

Workaround: Follow the recovery procedure in the section Recovering From a Catastrophic Storage Controller Failure in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 25410225

5.1.11 ILOM Service Processor Clocks Are Out-of-Sync

Most Oracle PCA components are equipped with an Oracle Integrated Lights Out Manager (ILOM). Each ILOM Service Processor (SP) contains its own clock, which is synchronized with the operating system (OS) clock before it leaves the factory. However, when new expansion nodes are installed or when parts in a component have been repaired or replaced, SP clocks could be out-of-sync. The problem may also be the result of a configuration error or normal clock drift.

If necessary, the SP clock can be synchronized manually. There is no need to continually update the hardware clock, because it only serves as a reference point for the host OS. Once the systems are up and running the OS obtains the correct time through NTP.

Workaround: After configuring the NTP server in the Oracle PCA Dashboard, synchronize the ILOM SPs with the OS clock. The easiest way is to log into the host and run this command: hwclock --systohc.
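For example, once NTP is configured, the sequence on each host looks roughly as follows. This is a sketch built around the single command mentioned above; ntpq is assumed to be available on the host.

```shell
# Confirm that the OS clock is synchronized with the configured
# NTP server (the selected peer is marked with an asterisk).
ntpq -p

# Copy the OS time to the hardware clock of the service processor.
hwclock --systohc

# Verify that both clocks now agree.
hwclock --show
date
```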

Bug 17664050

5.1.12 Compute Node ILOM Firmware Causes Provisioning Failure and OSA Image Corruption with Oracle PCA Release 2.3.x Controller Software

Certain versions of the ILOM firmware installed on Oracle Server X5-2, Sun Server X4-2, or Sun Server X3-2, have Oracle System Assistant (OSA) enabled by default. In combination with Oracle PCA Controller Software Release 2.3.x, this setting is not permitted for Oracle PCA compute nodes, as it exposes an additional target disk for the operating system installation. However, in some cases OSA cannot be disabled or is re-enabled automatically.

If a compute node is provisioned with an ILOM firmware in this state, while Oracle PCA is running Controller Software Release 2.3.x, the Oracle VM Server installation will fail or at least be incorrect, and the OSA image becomes corrupted even when disabled. Therefore, it is critical that you upgrade the ILOM firmware before you start the Oracle PCA Release 2.3.x Controller Software update to prevent provisioning failures.

Note

For this reason, ILOM firmware version 3.2.4.52 is not supported on Oracle Server X5-2 nodes.

Workaround: Before starting the Oracle PCA Release 2.3.x Controller Software update, upgrade the ILOM firmware on all compute nodes to version 3.2.4.68 or newer. If provisioning has corrupted the OSA image, follow the recovery procedure in the section Rebuilding a Corrupted Compute Node OSA Image in the Oracle Private Cloud Appliance Administrator's Guide.
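One way to check the current ILOM firmware version of a compute node before the update is to query the service processor from the host. This assumes ipmitool is installed on the host and that the server supports the Sun OEM extension.

```shell
# Query the ILOM service processor for its firmware version;
# the reported version should be 3.2.4.68 or newer.
ipmitool sunoem version
```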

Bug 25392805

5.1.13 ILOM Firmware Does Not Allow Loopback SSH Access

In Oracle Integrated Lights Out Manager (ILOM) firmware releases newer than 3.2.4, the service processor configuration contains a field, called allowed_services, that controls which services are permitted on an interface. By default, SSH is not permitted on the loopback interface. However, Oracle Enterprise Manager requires SSH access over this interface to register Oracle PCA management nodes. Therefore, SSH must be enabled manually if the ILOM version is newer than 3.2.4.

Workaround: On management nodes running an ILOM version more recent than 3.2.4, make sure that SSH is included in the allowed_services field of the network configuration. Log into the ILOM CLI through the NETMGT Ethernet port and enter the following commands:

-> cd /SP/network/interconnect
-> set hostmanaged=false
-> set allowed_services=fault-transport,ipmi,snmp,ssh
-> set hostmanaged=true 
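To verify that the change took effect, you can display the interface configuration again in the same ILOM CLI session and confirm that ssh appears in the allowed_services field:

```
-> show /SP/network/interconnect
```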

Bug 26953763