6.2 Oracle Private Cloud Appliance Software

6.2.1 Do Not Install Additional Software on Appliance Components
6.2.2 Do Not Reconfigure Network During Compute Node Provisioning or Upgrade
6.2.3 Nodes Attempt to Synchronize Time with the Wrong NTP Server
6.2.4 Node Manager Does Not Show Node Offline Status
6.2.5 Compute Node State Changes Despite Active Provisioning Lock
6.2.6 Compute Nodes Are Available in Oracle VM Server Pool Before Provisioning Completes
6.2.7 Reprovisioning or Upgrading a Compute Node Hosting Virtual Machines Leads to Errors
6.2.8 Virtual Machines Remain in Running Status when Host Compute Node Is Reprovisioned
6.2.9 Provisioning Is Slow in Systems with Many VMs and VLANs
6.2.10 Management Nodes Have Non-Functional bond0 Network Interface
6.2.11 Network Performance Is Impacted by VxLAN Encapsulation
6.2.12 Altering Custom Network VLAN Tag Is Not Supported
6.2.13 Host Network Parameter Validation Is Too Permissive
6.2.14 Virtual Appliances Cannot Be Imported Over a Host Network
6.2.15 Customizations for ZFS Storage Appliance in multipath.conf Are Not Supported
6.2.16 Customer Created LUNs Are Mapped to the Wrong Initiator Group
6.2.17 Storage Head Failover Disrupts Running Virtual Machines
6.2.18 Oracle VM Manager Fails to Restart after Restoring a Backup Due to Password Mismatch
6.2.19 Changing Multiple Component Passwords Causes Authentication Failure in Oracle VM Manager
6.2.20 ILOM Password of Expansion Compute Nodes Is Not Synchronized During Provisioning
6.2.21 SSH Host Key Mismatch After Management Node Failover
6.2.22 External Storage Cannot Be Discovered Over Data Center Network
6.2.23 High Network Load with High MTU May Cause Time-Out and Kernel Panic in Compute Nodes
6.2.24 User Interface Does Not Support Internet Explorer 10 and 11
6.2.25 Mozilla Firefox Cannot Establish Secure Connection with User Interface
6.2.26 Authentication Error Prevents Oracle VM Manager Login
6.2.27 Virtual Machine with High Availability Takes Five Minutes to Restart when Failover Occurs
6.2.28 Compute Node CPU Load at 100 Percent Due to Hardware Management Daemon
6.2.29 CLI Command configure Results in Failure
6.2.30 CLI Command update appliance Deprecated
6.2.31 Adding the Virtual Machine Role to the Storage Network Causes Cluster to Lose Heartbeat Networking
6.2.32 Adding Virtual Machine Role to the Management Network Causes Oracle VM Manager to Lose Contact with the Compute Nodes

This section describes software-related limitations and workarounds.

6.2.1 Do Not Install Additional Software on Appliance Components

Oracle Private Cloud Appliance is delivered as an appliance: a complete and controlled system composed of selected hardware and software components. If you install additional software packages on the pre-configured appliance components, whether a compute node, management node, or storage component, you introduce new variables that can disrupt the operation of the appliance as a whole. Unless otherwise instructed, Oracle advises against the installation of additional packages, either from a third party or from Oracle's own software channels, such as the Oracle Linux YUM repositories.

Workaround: Do not install additional software on any internal Oracle Private Cloud Appliance system components. If your internal processes require certain additional tools, contact your Oracle representative to discuss these requirements.

6.2.2 Do Not Reconfigure Network During Compute Node Provisioning or Upgrade

In the Oracle Private Cloud Appliance Dashboard, the Network Setup tab becomes available when the first compute node has been provisioned successfully. However, when installing and provisioning a new system, you must wait until all nodes have completed the provisioning process before changing the network configuration. Also, when provisioning new nodes at a later time, or when upgrading the environment, do not apply a new network configuration before all operations have completed. Failure to follow these guidelines is likely to leave your environment in an indeterminate state.

Workaround: Before reconfiguring the system network settings, make sure that no provisioning or upgrade processes are running.

Bug 17475738

6.2.3 Nodes Attempt to Synchronize Time with the Wrong NTP Server

External time synchronization, based on ntpd, is left in its default factory configuration. As a result, NTP does not work when you first power on the Oracle Private Cloud Appliance, and you may find messages similar to these in the system logs:

Oct  1 11:20:33 ovcamn06r1 kernel: o2dlm: Joining domain ovca ( 0 1 ) 2 nodes
Oct  1 11:20:53 ovcamn06r1 ntpd_initres[3478]: host name not found:0.rhel.pool.ntp.org
Oct  1 11:20:58 ovcamn06r1 ntpd_initres[3478]: host name not found:1.rhel.pool.ntp.org
Oct  1 11:21:03 ovcamn06r1 ntpd_initres[3478]: host name not found:2.rhel.pool.ntp.org

Workaround: Apply the appropriate network configuration for your data center environment, as described in the section Network Setup in the Oracle Private Cloud Appliance Administrator's Guide. When the data center network configuration is applied successfully, the default values for NTP configuration are overwritten and components will synchronize their clocks with the source you entered.
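
After applying the network configuration, you can confirm that the default pool servers have been replaced. The sketch below is an illustration only: the configuration file path and the server name ntp.example.com are assumptions to be substituted with the values for your data center.

```shell
#!/bin/sh
# Hedged sketch: check whether a custom NTP server has replaced the default
# rhel.pool.ntp.org entries in an ntp.conf-style file. The file path and the
# server name passed in are placeholders for your environment.
check_ntp_conf() {
    conf="$1"; server="$2"
    if grep -q "rhel.pool.ntp.org" "$conf"; then
        echo "default pool servers still configured"
        return 1
    fi
    if grep -q "^server[[:space:]][[:space:]]*$server" "$conf"; then
        echo "custom NTP server configured"
        return 0
    fi
    echo "no NTP server entry found"
    return 1
}
```

A typical invocation would be check_ntp_conf /etc/ntp.conf ntp.example.com; to verify actual clock synchronization, ntpq -p on the node shows the reachability of each configured source.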

Bug 17548941

6.2.4 Node Manager Does Not Show Node Offline Status

The role of the Node Manager database is to track the various states a compute node goes through during provisioning. After successful provisioning the database continues to list a node as running, even if it is shut down. For nodes that are fully operational, the server status is tracked by Oracle VM Manager. However, the Oracle Private Cloud Appliance Dashboard displays status information from the Node Manager. This may lead to inconsistent information between the Dashboard and Oracle VM Manager, but it is not considered a bug.

Workaround: To verify the status of operational compute nodes, use the Oracle VM Manager user interface.

Bug 17456373

6.2.5 Compute Node State Changes Despite Active Provisioning Lock

The purpose of a lock of the type provisioning or all_provisioning is to prevent all compute nodes from starting or continuing a provisioning process. However, when you attempt to reprovision a running compute node from the Oracle Private Cloud Appliance CLI while an active lock is in place, the compute node state changes to "reprovision_only" and it is marked as "DEAD". Provisioning of the compute node continues as normal when the provisioning lock is deactivated.

Bug 22151616

6.2.6 Compute Nodes Are Available in Oracle VM Server Pool Before Provisioning Completes

Compute node provisioning can take up to several hours to complete. The nodes are added to the Oracle VM server pool early in the process, but they are not placed in maintenance mode. In theory the discovered servers are available for use in Oracle VM Manager, but you must not attempt to alter their configuration in any way before the Oracle Private Cloud Appliance Dashboard indicates that provisioning has completed.

Workaround: Wait for compute node provisioning to finish. Do not modify the compute nodes or server pool in any way in Oracle VM Manager.

Bug 22159111

6.2.7 Reprovisioning or Upgrading a Compute Node Hosting Virtual Machines Leads to Errors

Reprovisioning or upgrading a compute node that hosts virtual machines (VMs) is considered bad practice. Good practice is to migrate all VMs away from the compute node before starting a reprovisioning operation or software update. At the start of the reprovisioning, the removal of the compute node from its server pool could fail partially, due to the presence of configured VMs that are either running or powered off. When the compute node returns to normal operation after reprovisioning, it could report failures related to server pool configuration and storage layer operations. As a result, both the compute node and its remaining VMs could end up in an error state. There is no straightforward recovery procedure.

Workaround: Avoid upgrade and reprovisioning issues due to existing VM configurations by migrating all VMs away from their host first.

Bug 23563071

6.2.8 Virtual Machines Remain in Running Status when Host Compute Node Is Reprovisioned

Using the Oracle Private Cloud Appliance CLI it is possible to force the reprovisioning of a compute node even if it is hosting running virtual machines. The compute node is not placed in maintenance mode. Consequently, the active virtual machines are not shut down or migrated to another compute node. Instead these VMs remain in running status and Oracle VM Manager reports their host compute node as "N/A".

Caution

Reprovisioning a compute node that hosts virtual machines is considered bad practice. Good practice is to migrate all virtual machines away from the compute node before starting a reprovisioning operation or software update.

Workaround: In this particular condition the VMs can no longer be migrated. They must be killed and restarted. After a successful restart they return to normal operation on a different host compute node, in accordance with the start policy defined for the server pool.

Bug 22018046

6.2.9 Provisioning Is Slow in Systems with Many VMs and VLANs

As the Oracle VM environment grows and contains more and more virtual machines and many different VLANs connecting them, the number of management operations and registered events increases rapidly. In a system with this much activity the provisioning of a compute node takes significantly longer, because the provisioning tasks run through the same management node where Oracle VM Manager is active. There is no impact on functionality, but the provisioning tasks can take several hours to complete.

There is no workaround to speed up the provisioning of a compute node when the entire system is under heavy load. It is recommended to perform compute node provisioning at a time when system activity is at its lowest.

Bugs 22159038 and 22085580

6.2.10 Management Nodes Have Non-Functional bond0 Network Interface

When the driver for network interface bonding is loaded, the system automatically generates a default bond0 interface. However, this interface is not activated or used in the management nodes of an Oracle Private Cloud Appliance with the all-Ethernet network architecture.

Workaround: The bond0 interface is not configured in any usable way and can be ignored.

Bug 29559810

6.2.11 Network Performance Is Impacted by VxLAN Encapsulation

The design of the all-Ethernet network fabric in Oracle Private Cloud Appliance relies heavily on VxLAN encapsulation and decapsulation. This extra protocol layer requires additional CPU cycles and consequently reduces network performance compared to regular tagged or untagged traffic. In particular the connectivity to and from VMs can be affected. To compensate for the CPU load of VxLAN processing, the MTU (Maximum Transmission Unit) on VM networks can be increased to 9000 bytes, which is the setting across the standard appliance networks. However, the network paths should be analyzed carefully to make sure that the larger MTU setting is supported between the end points: if an intermediate network device only supports an MTU of 1500 bytes, then the fragmentation of the 9000 byte packets will result in a bigger performance penalty.

Workaround: If the required network performance cannot be obtained with a default MTU of 1500 bytes for regular VM traffic, you should consider increasing the MTU to 9000 bytes, both on the VM network and inside the VM itself. For guidance on this subject, refer to the Oracle white paper on 10GbE network performance tuning in Oracle VM 3.
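
Inside an Oracle Linux guest, the MTU change has to be made persistent as well as applied to the running interface. The sketch below shows one way to update an ifcfg-style configuration file; the interface name, file path, and MTU value are example assumptions, and the live setting can be applied with a command such as ip link set dev eth0 mtu 9000.

```shell
#!/bin/sh
# Hedged sketch: set MTU=9000 in an ifcfg-style network configuration file
# so the setting survives reboots. The path and interface name are
# assumptions; adjust them for the guest in question.
set_ifcfg_mtu() {
    cfg="$1"; mtu="$2"
    if grep -q '^MTU=' "$cfg"; then
        # Replace an existing MTU line in place
        sed -i "s/^MTU=.*/MTU=$mtu/" "$cfg"
    else
        # Append the setting if none is present
        echo "MTU=$mtu" >> "$cfg"
    fi
}
```

A typical call would be set_ifcfg_mtu /etc/sysconfig/network-scripts/ifcfg-eth0 9000, followed by a restart of the interface. Verifying the end-to-end path with a non-fragmenting ping (for example ping -M do -s 8972 towards the peer) helps confirm that no intermediate device is limited to a 1500 byte MTU.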

Bug 29664090

6.2.12 Altering Custom Network VLAN Tag Is Not Supported

When you create a custom network, it is technically possible – though not supported – to alter the VLAN tag in Oracle VM Manager. However, when you attempt to add a compute node, the system creates the network interface on the server but fails to enable the modified VLAN configuration. At this point the custom network is stuck in a failed state: neither the network nor the interfaces can be deleted, and the VLAN configuration can no longer be changed back to the original tag.

Workaround: Do not modify appliance-level networking in Oracle VM Manager. There are no documented workarounds and any recovery operation is likely to require significant downtime of the Oracle Private Cloud Appliance environment.

Bug 23250544

6.2.13 Host Network Parameter Validation Is Too Permissive

When you define a host network, it is possible to enter invalid or contradictory values for the Prefix, Netmask and Route_Destination parameters. For example, when you enter a prefix with "0" as the first octet, the system attempts to configure IP addresses on compute node Ethernet interfaces starting with 0. Also, when the netmask part of the route destination you enter is invalid, the network is still created, even though an exception occurs. When such a poorly configured network is in an invalid state, it cannot be reconfigured or deleted with standard commands.

Workaround: Double-check your CLI command parameters before pressing Enter. If an invalid network configuration is applied, use the --force option to delete the network.

Bug 25729227

6.2.14 Virtual Appliances Cannot Be Imported Over a Host Network

A host network provides connectivity between compute nodes and hosts external to the appliance. It is implemented to connect external storage to the environment. If you attempt to import a virtual appliance, also known as assemblies in previous releases of Oracle VM and Oracle Private Cloud Appliance, from a location on the host network, it is likely to fail, because Oracle VM Manager instructs the compute nodes to use the active management node as a proxy for the import operation.

Workaround: Make sure that the virtual appliance resides in a location accessible from the active management node.

Bug 25801215

6.2.15 Customizations for ZFS Storage Appliance in multipath.conf Are Not Supported

The ZFS stanza in multipath.conf is controlled by the Oracle Private Cloud Appliance software. The internal ZFS Storage Appliance is a critical component of the appliance and the multipath configuration is tailored to the internal requirements. You should never modify the ZFS parameters in multipath.conf, because it could adversely affect the appliance performance and functionality.

Even if customizations were applied for (external) ZFS storage, they are overwritten when the Oracle Private Cloud Appliance Controller Software is updated. A backup of the file is saved prior to the update. Customizations in other stanzas of multipath.conf, for storage devices from other vendors, are preserved during upgrades.

Bug 25821423

6.2.16 Customer Created LUNs Are Mapped to the Wrong Initiator Group

When adding LUNs on the Oracle Private Cloud Appliance internal ZFS Storage Appliance you must add them under the "OVM" target group. Only this default target group is supported; there can be no additional target groups. However, on the initiator side you should not use the default configuration, otherwise all LUNs are mapped to the "All Initiators" group, and accessible to all nodes in the system. Such a configuration may cause several problems within the appliance.

Additional, custom LUNs on the internal storage must instead be mapped to one or more custom initiator groups. This ensures that the LUNs are mapped to the intended initiators, and are not remapped by the appliance software to the default "All Initiators" group.

Workaround: When creating additional, custom LUNs on the internal ZFS Storage Appliance, always use the default target group, but make sure the LUNs are mapped to one or more custom initiator groups.

Bugs 22309236 and 18155778

6.2.17 Storage Head Failover Disrupts Running Virtual Machines

When a failover occurs between the storage heads of a ZFS Storage Appliance, virtual machine operation could be disrupted by temporary loss of disk access. Depending on the guest operating system, and on the configuration of the guest and Oracle VM, a VM could hang, power off or reboot. This behavior is caused by an iSCSI configuration parameter that does not allow sufficient recovery time for the storage failover to complete.

Workaround: Increase the value of node.session.timeo.replacement_timeout in the file /etc/iscsi/iscsid.conf on the compute nodes. For details, refer to the support note with Doc ID 2189806.1.
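
The parameter is a single line in iscsid.conf, so the change can be scripted. The sketch below is an illustration only: the timeout value shown is a placeholder, and the recommended value should be taken from the support note referenced above. Note that sessions that are already logged in keep their old timeout until they are re-established.

```shell
#!/bin/sh
# Hedged sketch: raise node.session.timeo.replacement_timeout in an
# iscsid.conf-style file. The value passed in is a placeholder; use the
# setting recommended in support note Doc ID 2189806.1.
set_iscsi_replacement_timeout() {
    conf="$1"; secs="$2"
    sed -i \
        "s/^node\.session\.timeo\.replacement_timeout = .*/node.session.timeo.replacement_timeout = $secs/" \
        "$conf"
}
```

A typical call would be set_iscsi_replacement_timeout /etc/iscsi/iscsid.conf 180, followed by a restart of the iSCSI service so that new sessions pick up the value.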

Bug 24439070

6.2.18 Oracle VM Manager Fails to Restart after Restoring a Backup Due to Password Mismatch

If you have changed the password for Oracle VM Manager or its related components Oracle WebLogic Server and Oracle MySQL database, and you need to restore the Oracle VM Manager from a backup that was made prior to the password change, the passwords will be out of sync. As a result of this password mismatch, Oracle VM Manager cannot connect to its database and cannot be started.

Workaround: Follow the instructions in the section Restoring a Backup After a Password Change in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 19333583

6.2.19 Changing Multiple Component Passwords Causes Authentication Failure in Oracle VM Manager

When several different passwords are set for different appliance components using the Oracle Private Cloud Appliance Dashboard, you could be locked out of Oracle VM Manager, or communication between Oracle VM Manager and other components could fail, as a result of authentication failures. The problem is caused by a partially failed password update, whereby a component has accepted the new password while another component continues to use the old password to connect.

The risk of authentication issues is considerably higher when Oracle VM Manager and its directly related components Oracle WebLogic Server and Oracle MySQL database are involved. A password change for these components requires the ovmm service to restart. If another password change occurs within a few minutes, the operation to update Oracle VM Manager accordingly could fail because the ovmm service was not active. An authentication failure will prevent the ovmm service from restarting.

Workaround: If you set different passwords for appliance components using the Oracle Private Cloud Appliance Dashboard, change them one by one with a 10 minute interval. If the ovmm service is stopped as a result of a password change, wait for it to restart before making further changes. If the ovmm service fails to restart due to authentication issues, it may be necessary to replace the file /nfs/shared_storage/wls1/servers/AdminServer/security/boot.properties with the previous version of the file (boot.properties.old).
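
The file replacement in the last step of the workaround can be done defensively, keeping the failing copy aside before rolling back. The sketch below assumes the paths quoted in the workaround; verify them on your system before running anything.

```shell
#!/bin/sh
# Hedged sketch: roll back boot.properties to the saved previous version
# (boot.properties.old) when the ovmm service cannot restart after a
# password change. The directory path is taken from the workaround text.
restore_boot_properties() {
    dir="$1"
    if [ -f "$dir/boot.properties.old" ]; then
        # Keep the failing copy aside before rolling back
        cp "$dir/boot.properties" "$dir/boot.properties.failed"
        cp "$dir/boot.properties.old" "$dir/boot.properties"
        return 0
    fi
    echo "no previous boot.properties found" >&2
    return 1
}
```

A typical call would be restore_boot_properties /nfs/shared_storage/wls1/servers/AdminServer/security, followed by a restart of the ovmm service.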

Bug 26007398

6.2.20 ILOM Password of Expansion Compute Nodes Is Not Synchronized During Provisioning

After the rack components have been configured with a custom password, the ILOM of a newly installed expansion compute node does not automatically inherit the password set by the user in the Wallet. The compute node provisions correctly, and the Wallet maintains access to its ILOM even though it still uses the factory-default password. However, it is good practice to make sure that custom passwords are correctly synchronized across all components.

Workaround: Set or update the compute node ILOM password using the Oracle Private Cloud Appliance Dashboard or CLI. This sets the new password both in the Wallet and the compute node ILOM.

Bug 26143197

6.2.21 SSH Host Key Mismatch After Management Node Failover

When logging in to the active management node using SSH, you typically use the virtual IP address shared between both management nodes. However, since they are separate physical hosts, they have a different host key. If the host key is stored in the SSH client, and a failover to the secondary management node occurs, the next attempt to create an SSH connection through the virtual IP address results in a host key verification failure.

Workaround: Do not store the host key in the SSH client. If the key has been stored, remove it from the client's file system; typically inside the user directory in .ssh/known_hosts.
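
Where OpenSSH is available, ssh-keygen -R <address> -f ~/.ssh/known_hosts is the usual way to drop the stale entry, and it also handles hashed known_hosts files. The portable sketch below only removes plain-text entries; the address is a placeholder for your appliance virtual IP.

```shell
#!/bin/sh
# Hedged sketch: strip the stored key for a given host or IP address from a
# plain-text known_hosts file. "ssh-keygen -R <address> -f <file>" does the
# same job (including hashed entries) where OpenSSH is available.
forget_host_key() {
    known_hosts="$1"; host="$2"
    # Keep every line whose leading host field does not match
    grep -v "^$host[ ,]" "$known_hosts" > "$known_hosts.tmp" || true
    mv "$known_hosts.tmp" "$known_hosts"
}
```

A typical call would be forget_host_key ~/.ssh/known_hosts 192.0.2.10, using the virtual IP address of the management nodes.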

Bug 22915408

6.2.22 External Storage Cannot Be Discovered Over Data Center Network

The default compute node configuration does not allow connectivity to additional storage resources in the data center network. Compute nodes are connected to the data center subnet to enable public connectivity for the virtual machines they host, but the compute nodes' network interfaces have no IP address in that subnet. Consequently, SAN or file server discovery will fail.

Bug 17508885

6.2.23 High Network Load with High MTU May Cause Time-Out and Kernel Panic in Compute Nodes

When network throughput is very high, certain conditions, like a large number of MTU 9000 streams, have been known to cause a kernel panic in a compute node. In that case, /var/log/messages on the affected compute node contains entries like "Task Python:xxxxx blocked for more than 120 seconds". As a result, HA virtual machines may not have been migrated in time to another compute node. Usually compute nodes return to their normal operation automatically.

Workaround: If HA virtual machines have not been live-migrated off the affected compute node, log into Oracle VM Manager and restart the virtual machines manually. If an affected compute node does not return to normal operation, restart it from Oracle VM Manager.

Bugs 20981004 and 21841578

6.2.24 User Interface Does Not Support Internet Explorer 10 and 11

Oracle Private Cloud Appliance Release 2.4.1 uses the Oracle Application Development Framework (ADF) version 11.1.1.2.0 for both the Dashboard and the Oracle VM Manager user interface. This version of ADF does not support Microsoft Internet Explorer 10 or 11.

Workaround: Use Internet Explorer 9 or a different web browser, for example Mozilla Firefox.

Bug 18791952

6.2.25 Mozilla Firefox Cannot Establish Secure Connection with User Interface

Both the Oracle Private Cloud Appliance Dashboard and the Oracle VM Manager user interface run on an architecture based on Oracle WebLogic Server, Oracle Application Development Framework (ADF) and Oracle JDK 6. The cryptographic protocols supported on this architecture are SSLv3 and TLSv1.0. Mozilla Firefox version 38.2.0 or later no longer supports SSLv3 connections with a self-signed certificate. As a result, an error message might appear when you try to open the user interface login page.

Workaround: Override the default Mozilla Firefox security protocol as follows:

  1. In the Mozilla Firefox address bar, type about:config to access the browser configuration.

  2. Acknowledge the warning about changing advanced settings by clicking I'll be careful, I promise!.

  3. In the list of advanced settings, use the Search bar to filter the entries and look for the settings to be modified.

  4. Double-click the following entries and then enter the new value to change the configuration preferences:

    • security.tls.version.fallback-limit: 1

    • security.ssl3.dhe_rsa_aes_128_sha: false

    • security.ssl3.dhe_rsa_aes_256_sha: false

  5. If necessary, also modify the configuration preference security.tls.insecure_fallback_hosts and enter the affected hosts as a comma-separated list, either as domain names or as IP addresses.

  6. Close the Mozilla Firefox advanced configuration tab. The pages affected by the secure connection failure should now load normally.

Bugs 21622475 and 21803485

6.2.26 Authentication Error Prevents Oracle VM Manager Login

In environments with a large number of virtual machines and frequent connections through the VM console of Oracle VM Manager, the browser UI login to Oracle VM Manager may fail with an "unexpected error during login". A restart of the ovmm service is required.

Workaround: From the Oracle Linux shell of the master management node, restart the ovmm service by entering the command service ovmm restart. You should now be able to log into Oracle VM Manager again.

Bug 19562053

6.2.27 Virtual Machine with High Availability Takes Five Minutes to Restart when Failover Occurs

The compute nodes in an Oracle Private Cloud Appliance are all placed in a single clustered server pool during provisioning. One of the configuration parameters of this server pool is the cluster time-out: the time a server is allowed to be unavailable before failover events are triggered. To avoid false positives, and thus unwanted failovers, the Oracle Private Cloud Appliance server pool time-out is set to 300 seconds. As a consequence, a virtual machine configured with high availability (HA VM) can be unavailable for 5 minutes when its host fails. After the cluster time-out has passed, the HA VM is automatically restarted on another compute node in the server pool.

This behavior is as designed; it is not a bug. The server pool cluster configuration causes the delay in restarting VMs after a failover has occurred.

6.2.28 Compute Node CPU Load at 100 Percent Due to Hardware Management Daemon

The Hardware Management daemon, which runs as the process named hwmgmtd, can sometimes consume a large amount of CPU capacity. This tends to get worse over time, until the CPU load eventually reaches 100 percent. As a direct result, the system becomes increasingly less responsive.

Workaround: If you find that CPU load on a compute node is high, log in to its Oracle Linux shell and use the top command to check if hwmgmtd is consuming a lot of CPU capacity. If so, restart the daemon by entering the command /sbin/service hwmgmtd restart.
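
The manual check with top can also be scripted, for example as part of routine monitoring. The sketch below is an illustration only: it reads ps-style output from stdin so the check is testable, and the 90-percent threshold is an arbitrary example value.

```shell
#!/bin/sh
# Hedged sketch: flag hwmgmtd when its CPU usage crosses a threshold.
# Reads "ps -eo comm,%cpu" style output (process name, CPU percentage)
# from stdin; exits 0 if hwmgmtd exceeds the threshold, 1 otherwise.
hwmgmtd_overloaded() {
    threshold="$1"
    awk -v t="$threshold" '
        $1 == "hwmgmtd" && $2 + 0 > t + 0 { found = 1 }
        END { exit found ? 0 : 1 }'
}
```

A typical use on a compute node would be: if ps -eo comm,%cpu | hwmgmtd_overloaded 90; then /sbin/service hwmgmtd restart; fi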

Bug 23174421

6.2.29 CLI Command configure Results in Failure

The Oracle Private Cloud Appliance CLI command configure has no targets in Release 2.4.1. If you attempt to execute the command, it returns a failure with no error message. This is expected behavior for an inactive command.

Workaround: The configure command is retained for backward compatibility. Since the command is inactive, it should not be executed, and the failure status can be ignored.

Bug 29624091

6.2.30 CLI Command update appliance Deprecated

The Oracle Private Cloud Appliance command line interface contains the update appliance command, which is used in releases prior to 2.3.4 to unpack a Controller Software image and update the appliance with a new software stack. This functionality is now part of the Upgrader tool, so the CLI command is deprecated and will be removed in the next release.

Workaround: Future updates and upgrades will be executed through the Oracle Private Cloud Appliance Upgrader.

Bug 29913246

6.2.31 Adding the Virtual Machine Role to the Storage Network Causes Cluster to Lose Heartbeat Networking

Attempting to add the Virtual Machine role to the storage network in Oracle VM Manager on an Oracle Private Cloud Appliance can cause your cluster to lose heartbeat networking, which will impact running virtual machines and their workloads. This operation is not supported on Oracle Private Cloud Appliance.

Workaround: Do not add the VM role to the storage-int network.

Bug 30936974

6.2.32 Adding Virtual Machine Role to the Management Network Causes Oracle VM Manager to Lose Contact with the Compute Nodes

Attempting to add the Virtual Machine role to the management network in Oracle VM Manager on an Oracle Private Cloud Appliance causes you to lose connectivity with your compute nodes. The compute nodes are still up, but your manager cannot communicate with them, which leaves your rack in a degraded state. This operation is not supported on Oracle Private Cloud Appliance.

Workaround: Do not add the VM role to the mgmt-int network.

Bug 30937049