3.2 Oracle Private Cloud Appliance Software

3.2.1 Do Not Reconfigure Network During Compute Node Provisioning or Upgrade
3.2.2 Nodes Attempt to Synchronize Time with the Wrong NTP Server
3.2.3 Unknown Symbol Warning during InfiniBand Driver Installation
3.2.4 Node Manager Does Not Show Node Offline Status
3.2.5 Update Functionality Not Available in Dashboard
3.2.6 Interrupting Download of Software Update Leads to Inconsistent Image Version and Leaves Image Mounted and Stored in Temporary Location
3.2.7 Compute Nodes Lose Oracle VM iSCSI LUNs During Software Update
3.2.8 Virtual Machine File Systems Become Read-Only after Storage Head Failover
3.2.9 Oracle VM Manager Tuning Settings Are Lost During Software Update
3.2.10 Oracle VM Manager Fails to Restart after Restoring a Backup Due to Password Mismatch
3.2.11 Oracle VM Java Processes Consume Large Amounts of Resources
3.2.12 External Storage Cannot Be Discovered Over Data Center Network
3.2.13 High Network Load with High MTU May Cause Time-Out and Kernel Panic in Compute Nodes
3.2.14 Oracle PCA Dashboard URL Is Not Redirected
3.2.15 User Interface Does Not Support Internet Explorer 10 and 11
3.2.16 Authentication Error Prevents Oracle VM Manager Login
3.2.17 Error Getting VM Stats in Oracle VM Agent Logs
3.2.18 Virtual Machine with High Availability Takes Five Minutes to Restart when Failover Occurs
3.2.19 The CLI Command show Accepts Non-Existent Targets As Parameters
3.2.20 Deleting a Running Task in the CLI Results in Errors
3.2.21 The CLI Command diagnose software Fails To Correctly Run Some Diagnostic Tests

This section describes software-related limitations and workarounds.

3.2.1 Do Not Reconfigure Network During Compute Node Provisioning or Upgrade

In the Oracle PCA Dashboard, the Network Setup tab becomes available when the first compute node has been provisioned successfully. However, when installing and provisioning a new system, you must wait until all nodes have completed the provisioning process before changing the network configuration. Also, when provisioning new nodes at a later time, or when upgrading the environment, do not apply a new network configuration before all operations have completed. Failure to follow these guidelines is likely to leave your environment in an indeterminate state.

Workaround: Before reconfiguring the system network settings, make sure that no provisioning or upgrade processes are running.

Bug 17475738

3.2.2 Nodes Attempt to Synchronize Time with the Wrong NTP Server

External time synchronization, based on ntpd, is left in its default configuration at the factory. As a result, NTP does not work when you first power on the Oracle PCA, and you may find messages similar to these in the system logs:

Oct  1 11:20:33 ovcamn06r1 kernel: o2dlm: Joining domain ovca ( 0 1 ) 2 nodes
Oct  1 11:20:53 ovcamn06r1 ntpd_initres[3478]: host name not found:0.rhel.pool.ntp.org
Oct  1 11:20:58 ovcamn06r1 ntpd_initres[3478]: host name not found:1.rhel.pool.ntp.org
Oct  1 11:21:03 ovcamn06r1 ntpd_initres[3478]: host name not found:2.rhel.pool.ntp.org

Workaround: Apply the appropriate network configuration for your data center environment, as described in the section Network Setup in the Oracle Private Cloud Appliance Administrator's Guide. When the data center network configuration is applied successfully, the default values for NTP configuration are overwritten and components will synchronize their clocks with the source you entered.
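To confirm whether a node is still trying to reach the factory-default servers, you can scan its system log for the ntpd_initres messages shown above. The following is a minimal sketch, not an Oracle-provided tool: the function name ntp_unresolved_servers is hypothetical, and the log file path is passed in because this section does not name the file.

```shell
# Print each NTP server name that ntpd failed to resolve, one per line,
# with duplicates removed.
# Usage: ntp_unresolved_servers <logfile>
ntp_unresolved_servers() {
    # Matches lines such as:
    #   ... ntpd_initres[3478]: host name not found:0.rhel.pool.ntp.org
    sed -n 's/.*ntpd_initres\[[0-9]*]: host name not found: *\([^ ]*\).*/\1/p' "$1" | sort -u
}
```

For example, ntp_unresolved_servers /var/log/messages (assuming that is where syslog writes on the management node) prints nothing once the data center NTP source has been applied and no further lookup failures are logged.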

Bug 17548941

3.2.3 Unknown Symbol Warning during InfiniBand Driver Installation

Towards the end of the management node install.log file, the following warnings appear:

> WARNING:
> /lib/modules/2.6.39-300.32.1.el6uek.x86_64/kernel/drivers/infiniband/ \
> hw/ipath/ib_ipath.ko needs unknown symbol ib_wq
> WARNING:
> /lib/modules/2.6.39-300.32.1.el6uek.x86_64/kernel/drivers/infiniband/ \
> hw/qib/ib_qib.ko needs unknown symbol ib_wq
> WARNING:
> /lib/modules/2.6.39-300.32.1.el6uek.x86_64/kernel/drivers/infiniband/ \
> ulp/srp/ib_srp.ko needs unknown symbol ib_wq
> *** FINISHED INSTALLING PACKAGES ***

These warnings have no adverse effects and may be disregarded.

Bug 16946511

3.2.4 Node Manager Does Not Show Node Offline Status

The role of the Node Manager database is to track the various states a compute node goes through during provisioning. After successful provisioning the database continues to list a node as running, even if it is shut down. For nodes that are fully operational, the server status is tracked by Oracle VM Manager. However, the Oracle PCA Dashboard displays status information from the Node Manager. This may lead to inconsistent information between the Dashboard and Oracle VM Manager, but it is not considered a bug.

Workaround: To verify the status of operational compute nodes, use the Oracle VM Manager user interface.

Bug 17456373

3.2.5 Update Functionality Not Available in Dashboard

The Oracle PCA Dashboard cannot be used to perform an update of the software stack.

Workaround: Use the command line tool pca-updater to update the software stack of your Oracle PCA. For details, refer to the section Oracle Private Cloud Appliance Software Update in the Oracle Private Cloud Appliance Administrator's Guide. For step-by-step instructions, refer to the section Update. You can use SSH to log in to each management node and check /etc/pca-info for log entries indicating restarted services and new software revisions.

Bugs 17476010, 17475976 and 17475845

3.2.6 Interrupting Download of Software Update Leads to Inconsistent Image Version and Leaves Image Mounted and Stored in Temporary Location

The first step of the software update process is to download an image file, which is unpacked in a particular location on the ZFS storage appliance. When the download is interrupted, the file system is not cleaned up or rolled back to a previous state. As a result, contents from different versions of the software image may end up in the source location from where the installation files are loaded. In addition, the downloaded *.iso file remains stored in /tmp and is not unmounted. If downloads are frequently started and stopped, this could cause the system to run out of free loop devices to mount the *.iso files, or even to run out of free space.

Workaround: The files left behind by previous downloads do not prevent you from running the update procedure again and restarting the download. Download a new software update image. When it completes successfully you can install the new version of the software, as described in the section Update in the Oracle Private Cloud Appliance Administrator's Guide.
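If downloads have been interrupted several times, it may be worth checking what was left behind before starting again. The sketch below only lists files and changes nothing; the function name leftover_isos is hypothetical, and the exact file names used by the updater are not documented here, so it matches any *.iso file.

```shell
# List the *.iso files stored directly under a directory (default: /tmp).
# Usage: leftover_isos [directory]
leftover_isos() {
    find "${1:-/tmp}" -maxdepth 1 -type f -name '*.iso' | sort
}
```

Running leftover_isos without arguments inspects /tmp. On Oracle Linux, losetup -a shows the loop devices currently in use and df -h /tmp shows the remaining free space, which together cover the two resources this issue can exhaust.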

Bug 18352512

3.2.7 Compute Nodes Lose Oracle VM iSCSI LUNs During Software Update

Several iSCSI LUNs, including the essential server pool file system, are mapped on each compute node. When you update the appliance software, one or more LUNs may be missing on certain compute nodes. In addition, there may be problems with the configuration of the clustered server pool, preventing the existing compute nodes from joining the pool and resuming correct operation after the software update.

Workaround: To avoid these software update issues, upgrade all previously provisioned compute nodes by following the procedure described in the section Upgrading Existing Compute Node Configuration from Release 1.0.2 in the Oracle Private Cloud Appliance Administrator's Guide.

Bugs 17922555, 18459090, 18433922 and 18397780

3.2.8 Virtual Machine File Systems Become Read-Only after Storage Head Failover

When a failover occurs between the storage heads of the Oracle PCA internal ZFS storage appliance, or an externally connected ZFS storage appliance, the file systems used by virtual machines may become read-only, preventing normal VM operation. Compute nodes may also hang or crash as a result.

Workaround: There is no documented workaround to prevent the issue. Once the storage head failover has completed, you can reboot the virtual machines to bring them back online in read-write mode.

Bugs 19324312 and 19670873

3.2.9 Oracle VM Manager Tuning Settings Are Lost During Software Update

During the Oracle PCA software update from Release 1.0.2 to Release 1.1.x, it may occur that the specific tuning settings for Oracle VM Manager are not applied correctly, and that default settings are used instead.

Workaround: Verify the Oracle VM Manager tuning settings and re-apply them if necessary. Follow the instructions in the section Verifying and Re-applying Oracle VM Manager Tuning after Software Update in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 18477228

3.2.10 Oracle VM Manager Fails to Restart after Restoring a Backup Due to Password Mismatch

If you have changed the Oracle PCA password in the Dashboard, and need to restore the Oracle VM Manager from a backup that was made prior to the password change, the passwords will be out of sync, and Oracle VM Manager cannot be started because it cannot connect to its database. In this case, you need to make sure that the actual database password is also restored in the Oracle WebLogic Server JDBC connection configuration used by Oracle VM Manager. It is important to keep the password entries in the Oracle PCA Wallet up-to-date as well, although it is not the cause of this particular bug.

Workaround: After restoring the Oracle VM Manager database, also restore the database connection settings for Oracle VM Manager. The Oracle WebLogic Server JDBC connection must be configured to use the password that was in use at the time of the database backup. Make sure that the database entry in the Oracle PCA Wallet matches this password.

Caution

After synchronizing the passwords to access the Oracle VM Manager MySQL database, restart the ovca service from the master management node command line as follows: service ovca restart.

The database restore procedure is described in the section entitled Restoring the MySQL Database for Oracle VM Manager in the Oracle VM Installation and Upgrade Guide.

The database password used by Oracle VM Manager can be restored by extracting this file from the backup: ovmm/wls/config/jdbc/OVMDS-6373-jdbc.xml. The file must be extracted to this location: /nfs/shared_storage/wls/config/jdbc/OVMDS-6373-jdbc.xml. Since the jdbc directory is symlinked from /u01/app/oracle/ovm-manager-3/machine1/base_adf_domain/config/jdbc/, the file only needs to be extracted on one of the management nodes.

For instructions to manually update the passwords stored in the Wallet, refer to the section entitled Replacing Default Passwords Manually in the Oracle Private Cloud Appliance Administrator's Guide.

Bug 19333583

3.2.11 Oracle VM Java Processes Consume Large Amounts of Resources

Particularly in environments with a large number of virtual machines, and when many virtual machine operations – such as start, stop, save, restore or migrate – occur in a short time, the Java processes of Oracle VM may consume a lot of CPU and memory capacity on the master management node. Users will notice the browser and command line interfaces becoming very slow or unresponsive. This behavior is likely caused by a memory leak in the Oracle VM CLI.

Workaround: A possible remedy is to restart the Oracle VM CLI from the Oracle Linux shell on the master management node.

# /u01/app/oracle/ovm-manager-3/ovm_cli/bin/stopCLIMain.sh
# nohup /u01/app/oracle/ovm-manager-3/ovm_cli/bin/startCLIMain.sh&
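To judge whether the Java processes are really the cause of the slowdown, you can look at their CPU and memory share before and after the restart. This is an illustrative sketch only: the function name java_usage is hypothetical, and it assumes the Java executable appears as the first word of the command line reported by ps.

```shell
# Read the output of `ps -eo pid,pcpu,pmem,args` on standard input and
# print "PID %CPU %MEM" for each Java process, highest CPU usage first.
java_usage() {
    # NR > 1 skips the ps header line; $4 is the executable path or name.
    awk 'NR > 1 && $4 ~ /(^|\/)java$/ { print $1, $2, $3 }' | LC_ALL=C sort -k2,2 -nr
}
```

On the master management node this would be invoked as ps -eo pid,pcpu,pmem,args | java_usage. The output does not distinguish the Oracle VM CLI from other Java processes, so the PID still needs to be matched against the process you intend to restart.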

Bug 18965916

3.2.12 External Storage Cannot Be Discovered Over Data Center Network

The default compute node configuration does not allow connectivity to additional storage resources in the data center network. Compute nodes are connected to the data center subnet to enable public connectivity for the virtual machines they host, but the compute nodes' physical network interfaces have no IP address in that subnet. Consequently, SAN or file server discovery will fail.

Bug 17508885

3.2.13 High Network Load with High MTU May Cause Time-Out and Kernel Panic in Compute Nodes

When network throughput is very high, certain conditions, like a large number of MTU 9000 streams, have been known to cause a kernel panic in a compute node. In that case, /var/log/messages on the affected compute node contains entries like "Task Python:xxxxx blocked for more than 120 seconds". As a result, HA virtual machines may not have been migrated in time to another compute node. Usually compute nodes return to their normal operation automatically.

Workaround: If HA virtual machines have not been live-migrated off the affected compute node, log into Oracle VM Manager and restart the virtual machines manually. If an affected compute node does not return to normal operation, restart it from Oracle VM Manager.
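To check whether a compute node has logged the blocked-task condition described above, you can search its /var/log/messages file. The sketch below is a plain text search under the assumption that the entries contain the phrase quoted in this section; the function name hung_task_entries is hypothetical.

```shell
# Print log lines reporting a task blocked for longer than N seconds.
# Returns a non-zero exit status when no such line is found.
# Usage: hung_task_entries <logfile>
hung_task_entries() {
    grep -E 'blocked for more than [0-9]+ seconds' "$1"
}
```

For example, hung_task_entries /var/log/messages on the affected compute node prints the matching entries with their timestamps, which helps correlate the event with HA virtual machines that were not migrated in time.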

Bugs 20981004 and 21119672

3.2.14 Oracle PCA Dashboard URL Is Not Redirected

Before the product name change from Oracle Virtual Compute Appliance to Oracle Private Cloud Appliance, the Oracle PCA Dashboard could be accessed at https://<manager-vip>:7002/ovca. As of Release 2.0.5, the URL ends in /dashboard instead. However, there is no redirect from /ovca to /dashboard.

Workaround: Enter the correct URL: https://<manager-vip>:7002/dashboard.

Bug 21199163

3.2.15 User Interface Does Not Support Internet Explorer 10 and 11

Oracle PCA Release 2.0.1 uses the Oracle Application Development Framework (ADF) version 11.1.1.2.0 for both the Dashboard and the Oracle VM Manager user interface. This version of ADF does not support Microsoft Internet Explorer 10 or 11.

Workaround: Use Internet Explorer 9 or a different web browser, for example, Mozilla Firefox.

Bug 18791952

3.2.16 Authentication Error Prevents Oracle VM Manager Login

In environments with a large number of virtual machines and frequent connections through the VM console of Oracle VM Manager, the browser UI login to Oracle VM Manager may fail with an "unexpected error during login". A restart of the ovmm service is required.

Workaround: From the Oracle Linux shell of the master management node, restart the ovmm service by entering the command service ovmm restart. You should now be able to log into Oracle VM Manager again.

Bug 19562053

3.2.17 Error Getting VM Stats in Oracle VM Agent Logs

During the upgrade to Oracle PCA Software Release 2.0.4 a new version of the Xen hypervisor is installed on the compute nodes. While the upgrade is in progress, entries may appear in the ovs-agent.log files on the compute nodes indicating that xen commands are not executed properly ("Error getting VM stats"). This is a benign and temporary condition resolved by the compute node reboot at the end of the upgrade process. No workaround is required.

Bug 20901778

3.2.18 Virtual Machine with High Availability Takes Five Minutes to Restart when Failover Occurs

During provisioning, all compute nodes in an Oracle PCA are placed in a single clustered server pool, which is created as part of the provisioning process. One of the configuration parameters is the cluster time-out: the time a server is allowed to be unavailable before failover events are triggered. To avoid false positives, and thus unwanted failovers, the Oracle PCA server pool time-out is set to 300 seconds. As a consequence, a virtual machine configured with high availability (HA VM) can be unavailable for 5 minutes when its host fails. After the cluster time-out has passed, the HA VM is automatically restarted on another compute node in the server pool.

This behavior is as designed; it is not a bug. The server pool cluster configuration causes the delay in restarting VMs after a failover has occurred.

3.2.19 The CLI Command show Accepts Non-Existent Targets As Parameters

The CLI command show expects a target parameter to be specified to indicate the target object for which information should be displayed. For most commands you can use tab-completion to determine what target objects are available for use as a parameter. However, if you enter a target object that does not exist, the command completes successfully but does not return any useful information. For example:

PCA> show cloud-wwpn blah

----------------------------------------
Cloud_Name           blah                 
WWPN_List                                 
----------------------------------------

Status: Success

When using the show rack-layout command on an x4-2 rack, tab-completion may indicate that the rack name is 'x3-2_base'. This is a misnomer for the rack; however, the command works as expected.
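Because show reports Status: Success even for a target that does not exist, anything that processes its output should check the returned fields rather than the status line. The following sketch does this for the show cloud-wwpn example above. It assumes the output has already been captured as text in the layout shown; how to capture CLI output is outside the scope of this section, and the function name wwpn_list_present is hypothetical.

```shell
# Read `show cloud-wwpn <name>` output on standard input.
# Exit status 0 if the WWPN_List field holds a value,
# 1 if the field is empty or absent.
wwpn_list_present() {
    awk '$1 == "WWPN_List" { found = (NF > 1) } END { exit (found ? 0 : 1) }'
}
```

Applied to the output for the non-existent target blah shown above, the function returns 1, because the WWPN_List field is empty. Note that an existing cloud with no WWPNs would produce the same result, so an empty field is a hint rather than proof that the target is invalid.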

Bug 19679777

3.2.20 Deleting a Running Task in the CLI Results in Errors

A task may take a while to complete, in which case it appears as "Running" if you display the task list in the CLI. While a task is running, it is possible to delete it using the CLI. The delete operation succeeds, but error messages appear at the CLI prompt a few minutes later. For example:

PCA> backup
Task_ID                           Status  Progress  Start_Time
-------                           ------  --------  ----------
1a553fae7ede40a7ac110fff557f2590  RUNNING        0  06-10-2015 06:22:12   
--------------- 
1 row displayed
Status: Success

PCA> delete task 1a553fae7ede40a7ac110fff557f2590 
************************************************************
WARNING !!! THIS IS A DESTRUCTIVE OPERATION.
************************************************************
Are you sure [y/N]:y
Status: Success
PCA>
PCA>
PCA> Traceback (most recent call last): 
        File "/usr/lib64/python2.6/logging/__init__.py", line 776, in emit      
          msg = self.format(record)
        File "/usr/lib64/python2.6/logging/__init__.py", line 654, in format
          return fmt.format(record)
        File "/usr/lib64/python2.6/logging/__init__.py", line 436, in format 
          record.message = record.getMessage()
        File "/usr/lib64/python2.6/logging/__init__.py", line 306, in getMessage
          msg = msg % self.args
TypeError: not all arguments converted during string formatting

Caution

It is advised not to delete running tasks. While the risk of irreparable damage is minimal, deleting a running task may have adverse effects.
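One way to follow this advice is to look up a task's status before deleting it. The sketch below parses a task table in the column layout shown above (Task_ID in the first column, Status in the second), supplied as text on standard input. The function names are hypothetical, and RUNNING is the only status value confirmed by this section.

```shell
# Print the Status column for the task whose Task_ID equals $1.
# The task table is read from standard input.
task_status() {
    awk -v id="$1" '$1 == id { print $2 }'
}

# Succeed only if the task is listed and its status is not RUNNING.
# The task table is read from standard input.
safe_to_delete() {
    local status
    status=$(task_status "$1")
    [ -n "$status" ] && [ "$status" != "RUNNING" ]
}
```

With the table from the example, safe_to_delete 1a553fae7ede40a7ac110fff557f2590 fails, signalling that the delete should be postponed until the task has finished.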

Bug 21231788

3.2.21 The CLI Command diagnose software Fails To Correctly Run Some Diagnostic Tests

The CLI command diagnose software runs several individual tests, each of which executes scripts from the management node where the command is entered. A known issue causes some of these scripts to fail because of a missing SSH key. It is therefore recommended not to run this command for diagnostic purposes until the issue has been resolved in a forthcoming release.

If required, a workaround is available, which involves exchanging the ssh keys between the live management node and itself. This can be achieved by running the following command on the management node:

[root@ovcamn05r1 ~]# ssh-copy-id root@192.168.4.3
The authenticity of host '192.168.4.3 (192.168.4.3)' can't be established.
RSA key fingerprint is 4e:33:d2:d1:2c:43:7f:f1:74:3f:42:b3:83:78:22:78.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.4.3' (RSA) to the list of known hosts.
root@192.168.4.3's password:

If you implement this workaround, you should check that you are able to ssh into the management node from itself, and that /root/.ssh/authorized_keys does not contain any additional keys that you were not expecting to be added.
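The two checks can be sketched as follows. The function name key_comments is hypothetical. It prints the last field of each key entry, which is normally the user@host comment, but shows part of the key itself when an entry has no comment.

```shell
# Print the last field (normally the user@host comment) of every key
# entry in an authorized_keys file, skipping blank and comment lines.
# Usage: key_comments <authorized_keys file>
key_comments() {
    awk '!/^[ \t]*(#|$)/ { print $NF }' "$1"
}
```

Running key_comments /root/.ssh/authorized_keys gives a quick overview of which keys are present. To test the login itself, ssh -o BatchMode=yes root@192.168.4.3 true, using the address from the example above, exits with status 0 only if key-based authentication succeeds without a password prompt.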

Bug 19667855