12.2 Troubleshooting Oracle VM Server

This section describes some problems you may encounter when using Oracle VM Server, and explains how to resolve them.

If you need to contact Oracle Support Services, you will be asked to supply the log files mentioned in this section. You may also be required to provide the exact version of each Oracle VM component. To find the version of Oracle VM Manager, click the Help menu, then About. To find the version of Oracle VM Server and Oracle VM Agent, see Control Domains Perspective in the Oracle VM Manager User's Guide.

12.2.1 Oracle VM Server Debugging Tools

If virtual machine creation fails, or you need to find the cause of another problem, check the Oracle VM Server log files and use the command-line tools. The following sections list the important directories, log files, and command-line tools to check when troubleshooting Oracle VM Server.

12.2.1.1 Oracle VM Server Directories

The important Oracle VM Server directories you should check when troubleshooting problems with Oracle VM Server are listed in Table 12.1, “Oracle VM Server directories”.

Table 12.1 Oracle VM Server directories

Directory          Purpose
/etc/xen           Contains Oracle VM Server configuration files for the Oracle VM
                   Server daemon and virtualized guests.
/etc/xen/scripts   Contains networking-related scripts.
/var/log           Contains the following files:
                     • ovs-agent.log, log file for the Oracle VM Agent.
                     • ovmwatch.log, logs virtual machine life cycle events.
                     • ovm-consoled.log, logs remote VNC console access, and all
                       communication with Oracle VM Manager.
                     • messages*, logs all Oracle VM Server messages.
/var/log/xen       Contains Oracle VM Server log files.
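When Oracle Support asks for log files, it can help to gather them into one archive. The following is a minimal sketch, not an Oracle-supplied tool: the function name collect_ovs_logs and its ROOT and output arguments are hypothetical. It collects every regular file in /var/log/xen plus the /var/log files named in this section (including rotated messages* files) and skips anything that is missing.

```shell
#!/bin/bash
# Sketch (not an Oracle-supplied tool): bundle the Oracle VM Server log files
# named in this section into one compressed tar archive for Oracle Support.
# Usage: collect_ovs_logs [ROOT] [OUTPUT.tar.gz]
#   ROOT is a hypothetical parameter for testing against a copied tree;
#   leave it as "/" on a live server. Missing files are skipped.
collect_ovs_logs() {
  local root=${1:-/}
  local out=${2:-/tmp/ovs-logs-${HOSTNAME:-host}-$(date +%Y%m%d%H%M%S).tar.gz}
  local -a rel=()
  local f restore_glob
  restore_glob=$(shopt -p nullglob || true)   # remember the caller's setting
  shopt -s nullglob                           # unmatched globs expand to nothing
  for f in "$root"/var/log/xen/* \
           "$root"/var/log/{ovs-agent.log,osc.log,ovm-consoled.log,ovmwatch.log} \
           "$root"/var/log/messages*; do
    # Keep regular files only, stored relative to ROOT inside the archive.
    [ -f "$f" ] && rel+=("${f#"$root"/}")
  done
  eval "$restore_glob"
  if [ "${#rel[@]}" -eq 0 ]; then
    echo "collect_ovs_logs: no log files found under $root" >&2
    return 1
  fi
  tar -czf "$out" -C "$root" "${rel[@]}" || return 1
  echo "$out"
}
```

Run as root on the Oracle VM Server with no arguments; the function prints the path of the archive it created under /tmp.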


12.2.1.2 Oracle VM Server Log Files

The Oracle VM Server log files you should check when troubleshooting problems with Oracle VM Server are listed in the following table:

Table 12.2 Oracle VM Server log files

Log File           Directory       Description
xend.log           /var/log/xen/   Contains a log of all the actions of the
                                   Oracle VM Server daemon. Actions can be
                                   normal or error conditions. This log
                                   contains the same information as the output
                                   of the xm log command.
xend-debug.log     /var/log/xen/   Contains more detailed logs of the actions of
                                   the Oracle VM Server daemon.
xen-hotplug.log    /var/log/xen/   Contains a log of hotplug events. Hotplug
                                   events are logged if a device or network
                                   script does not start up or become available.
qemu-dm.pid.log    /var/log/xen/   Contains a log for each hardware virtualized
                                   guest. This log is created by the qemu-dm
                                   process. Use the ps command to find the pid
                                   (process identifier) and substitute it for
                                   pid in the file name.
ovs-agent.log      /var/log/       Contains a log for Oracle VM Agent.
osc.log            /var/log/       Contains a log for Oracle VM Storage Connect
                                   plug-ins.
ovm-consoled.log   /var/log/       Contains a log for the Oracle VM virtual
                                   machine console.
ovmwatch.log       /var/log/       Contains a log for the Oracle VM watch daemon.
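Because each qemu-dm log is named after a process ID, finding the right file means looking up the PID first. A minimal sketch, assuming the process is named qemu-dm as stated above; the function name qemu_dm_logs is hypothetical. With no arguments it asks ps for the PIDs of all qemu-dm processes; with PIDs as arguments it skips the lookup.

```shell
#!/bin/bash
# Sketch: print the qemu-dm log path for each hardware virtualized guest,
# following the qemu-dm.<pid>.log naming in /var/log/xen/ from Table 12.2.
# With no arguments, PIDs come from `ps -C qemu-dm -o pid=`; pass PIDs as
# arguments to skip the lookup. Paths are printed whether or not they exist.
qemu_dm_logs() {
  local pid
  local -a pids=("$@")
  if [ "${#pids[@]}" -eq 0 ]; then
    # -C selects processes by command name; "pid=" prints PIDs without a header.
    read -r -a pids <<< "$(ps -C qemu-dm -o pid= 2>/dev/null | tr '\n' ' ')"
  fi
  for pid in "${pids[@]}"; do
    printf '/var/log/xen/qemu-dm.%s.log\n' "$pid"
  done
}
```

For example, qemu_dm_logs | xargs -r tail -n 50 shows the last lines of every current qemu-dm log.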


12.2.1.3 Oracle VM Server Command Line Tools

The following table lists command line tools you can use when troubleshooting problems with Oracle VM Server:

Note

These command line tools are included as part of the Xen environment. You should refer to the appropriate Xen documentation for more information on using them.

Table 12.3 Oracle VM Server command line tools

Command Line Tool   Description
xentop              Displays real-time information about Oracle VM Server and domains.
xm dmesg            Displays log information from the hypervisor.
xm log              Displays log information from the Oracle VM Server daemon.
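To attach the output of these tools to a support request, it can help to capture a one-off snapshot of each. A minimal sketch: the function names snapshot_xen_state and _run_to_file are hypothetical, xentop is run in batch mode (-b) for a single iteration (-i 1) so that it exits on its own, and a tool that is not installed is noted in its output file instead of stopping the run.

```shell
#!/bin/bash
# Sketch: capture one snapshot of each tool in Table 12.3 into a directory
# so the output can be attached to a support request.
# Usage: snapshot_xen_state [OUTPUT_DIR]

# Run a command, writing its stdout and stderr to the given file.
# A missing command or a non-zero exit status is recorded in the file.
_run_to_file() {
  local file=$1; shift
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$file" 2>&1 || echo "exit status: $?" >> "$file"
  else
    echo "$1: command not found" > "$file"
  fi
}

snapshot_xen_state() {
  local dir=${1:-/tmp/xen-snapshot-$(date +%Y%m%d%H%M%S)}
  mkdir -p "$dir" || return 1
  _run_to_file "$dir/xentop.txt"   xentop -b -i 1   # batch mode, one iteration
  _run_to_file "$dir/xm-dmesg.txt" xm dmesg         # hypervisor log
  _run_to_file "$dir/xm-log.txt"   xm log           # Oracle VM Server daemon log
  echo "$dir"
}
```

Run it as root on the Oracle VM Server; it prints the directory that holds the three output files.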


12.2.2 Using DHCP on Oracle VM Servers

It is recommended that you install Oracle VM Server on a computer with a static IP address. If you use DHCP to manage the IP address space in your environment, configure the DHCP server to map the MAC address of each server interface to a specific IP address. This ensures that your host always receives the same IP address. The behavior of an Oracle VM Server host is undefined in an environment where its IP address can change when a DHCP lease expires.
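To set up these DHCP reservations, the DHCP administrator needs the MAC address of each interface. A minimal sketch that reads them from sysfs, where Linux exposes each interface's MAC address in /sys/class/net/<interface>/address. The function name list_nic_macs and its optional directory argument (present only for testing) are hypothetical. On an Oracle VM Server the output may also include bridges and other virtual interfaces, so pick the physical NICs you intend to reserve addresses for.

```shell
#!/bin/bash
# Sketch: print "<interface> <MAC address>" for each network interface so the
# DHCP administrator can map MAC addresses to fixed IP assignments.
# BASE defaults to /sys/class/net; the argument exists only for testing.
list_nic_macs() {
  local base=${1:-/sys/class/net} dev name mac
  for dev in "$base"/*; do
    name=${dev##*/}
    [ "$name" = lo ] && continue          # skip the loopback interface
    [ -r "$dev/address" ] || continue     # skip entries without a MAC file
    read -r mac < "$dev/address"
    printf '%s %s\n' "$name" "$mac"
  done
}
```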

12.2.3 Cannot Use Certain Key Combinations When Connecting to Dom0 Console

Some server models and some client terminals are not fully compatible with regard to special key combinations. For instance, on some HP servers, such as the HP DL380G4 (BIOS P51) server, the Alt-F2 key combination required to toggle to the login screen does not work with all terminal clients. Some terminal clients provide alternate key mappings, so check the documentation for your terminal client to determine whether an alternative mapping is available.

If you are using the Windows PuTTY SSH client, you can press Alt + the right arrow key and Alt + the left arrow key to toggle the login screen, instead of Alt-F2.

12.2.4 Storage Array LUN Remapping on Oracle VM Servers

Storage array LUN remapping is not supported by Oracle VM Servers. An Oracle VM Server must maintain the connections to a storage array's logical drive using the same LUN IDs. If a LUN is remapped, the following error may be printed in the Oracle VM Server's messages log:

Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatically remap LUN assignments.

To work around this issue:

  • For Fibre Channel storage, reboot the Oracle VM Server. After the reboot, the server uses the new storage array LUN IDs.

  • For iSCSI storage, restart the iscsi daemon on the Oracle VM Server to delete and restore all iSCSI target connections, for example:

    # service iscsi restart

    Alternatively, on the Oracle VM Server, log out and log in again to the target for which the LUN IDs have changed, for example:

    # iscsiadm --mode node --targetname iqn.xyz:1535.uuid --portal ip_address --logout
    # iscsiadm --mode node --targetname iqn.xyz:1535.uuid --portal ip_address --login
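Before choosing one of these workarounds, you can confirm that the server actually logged the remap warning. A minimal sketch that counts occurrences of the warning text in one or more log files (default /var/log/messages); the function name check_lun_remap is hypothetical, and rotated files such as messages.1 are only searched if you pass them explicitly.

```shell
#!/bin/bash
# Sketch: report whether the LUN remap warning quoted above appears in the
# given log files (default: /var/log/messages). Returns 0 if found, 1 if not.
check_lun_remap() {
  local -a logs=("$@")
  [ "${#logs[@]}" -gt 0 ] || logs=(/var/log/messages)
  local n
  # Missing or unreadable files are ignored; grep -c prints 0 when nothing matches.
  n=$(cat -- "${logs[@]}" 2>/dev/null |
      grep -c 'LUN assignments on this target have changed' || true)
  if [ "${n:-0}" -gt 0 ]; then
    echo "Found $n LUN remap warning(s); apply the workaround for your storage type."
    return 0
  fi
  echo "No LUN remap warnings found."
  return 1
}
```

For example, check_lun_remap /var/log/messages* searches the current and rotated messages logs together.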

12.2.5 Tuning iSCSI Settings on Oracle VM Servers

In some cases, limitations or bugs in a particular iSCSI implementation may require you to tune the iSCSI initiator settings on each of your Oracle VM Servers.

Typically this is required when you experience an I/O lock on a LUN and an unexpected change in the kernel workload on the Oracle VM Server. You may also notice a dramatic increase in network traffic between the Oracle VM Server and the storage array. This particular case has been noted on some ZFS appliances running Oracle Solaris 11 and is related to the Sun iSCSI COMSTAR port provider. The problem can usually be resolved by updating package versions. If this is not an option, you can tune the iSCSI settings on each Oracle VM Server that communicates with a SUN.COMSTAR target to enable flow control.

To tune iSCSI, perform the following steps on each Oracle VM Server:

  • Open /etc/iscsi/iscsid.conf in a text editor.

  • Locate the section titled iSCSI settings.

  • Change the value of the entry node.session.iscsi.InitialR2T to yes, and the value of the entry node.session.iscsi.ImmediateData to no.

  • Save the file.

  • Restart the iSCSI daemon by issuing the following command:

    # service iscsid restart
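The file edit in these steps can also be scripted. A minimal sketch, assuming iscsid.conf already contains uncommented node.session.iscsi.InitialR2T and node.session.iscsi.ImmediateData entries, as the stock open-iscsi file normally does; it writes the values as Yes and No to match that file's capitalization. The function name tune_iscsid_conf is hypothetical, GNU sed (-i) is required, and a copy of the original is kept as iscsid.conf.bak.

```shell
#!/bin/bash
# Sketch: apply the iscsid.conf change described in the steps above.
# Assumes uncommented InitialR2T and ImmediateData entries already exist (as in
# the stock open-iscsi iscsid.conf); Yes/No matches that file's capitalization.
# Requires GNU sed (-i). A backup is written to <file>.bak.
tune_iscsid_conf() {
  local conf=${1:-/etc/iscsi/iscsid.conf}
  cp -p -- "$conf" "$conf.bak" || return 1
  # Only uncommented entries are rewritten; "#..." lines are left alone.
  sed -i \
    -e 's/^[[:space:]]*node\.session\.iscsi\.InitialR2T[[:space:]]*=.*/node.session.iscsi.InitialR2T = Yes/' \
    -e 's/^[[:space:]]*node\.session\.iscsi\.ImmediateData[[:space:]]*=.*/node.session.iscsi.ImmediateData = No/' \
    -- "$conf" || return 1
  # Succeed only if both settings are now present with the new values.
  grep -q '^node\.session\.iscsi\.InitialR2T = Yes$' "$conf" &&
    grep -q '^node\.session\.iscsi\.ImmediateData = No$' "$conf"
}
```

Usage on the server: tune_iscsid_conf && service iscsid restart.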

Note

The preferred resolution to this issue is to update your software. Manual configuration of Oracle VM Server settings is not generally advisable, as changes may be lost during future updates.

12.2.6 Troubleshooting Clustered Server Pools on Oracle VM Server for x86

In some situations, removing an Oracle VM Server from a server pool may generate an error. Typical examples include an OCFS2-based repository that is still presented to the Oracle VM Server when you attempt to remove the server from the server pool, an Oracle VM Server that has lost access to the server pool file system, or a failing heartbeat function on that Oracle VM Server. The following list describes steps you can take to handle these situations.

  • Make sure that there are no repositories presented to the server when you attempt to remove it from the server pool. If this is the cause of the problem, the error that is displayed usually indicates that there are still OCFS2 file systems present. See the Repositories Perspective section in the Oracle VM Manager Online Help for more information.

  • If a pool file system is causing the remove operation to fail, other processes might be working on the pool file system during the unmount. Try removing the Oracle VM Server at a later time.

  • In a case where you try to remove a server from a clustered server pool on a newly installed instance of Oracle VM Manager, it is possible that the file server has not been refreshed since the server pool was discovered in your environment. Try refreshing all storage and all file systems on your storage before attempting to remove the Oracle VM Server.

  • If the Oracle VM Server cannot be removed from the server pool because it has lost network connectivity with the rest of the server pool, or with the storage where the server pool file system is located, a critical event is usually generated for that server. Try acknowledging any critical events that have been generated for the Oracle VM Server. See the Events Perspective section in the Oracle VM Manager Online Help for more information. Once these events have been acknowledged, try to remove the server from the server pool again. In most cases, the removal succeeds after critical events have been acknowledged, although some warnings may be generated during the removal process. Once the server has been removed from the server pool, resolve any networking or storage access issues that the server may be experiencing.

  • If all critical events have been acknowledged but the server still has trouble accessing storage and you cannot remove it from the server pool, try rebooting the server so that it can rejoin the cluster properly before you attempt to remove it again.

  • If the server pool file system has become corrupt for some reason, or a server still contains remnants of an old stale cluster, it may be necessary to completely erase the server pool and reconstruct it from scratch. This usually involves performing a series of manual steps on each Oracle VM Server in the cluster and should be attempted with the assistance of Oracle Support.
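Before removing a server from a clustered server pool, it can help to see which OCFS2 file systems are still mounted on it. A minimal sketch that reads the Linux mount table, where field 2 is the mount point and field 3 the file system type. The function name list_ocfs2_mounts and its optional file argument (present only for testing) are hypothetical. While the server is a member of a clustered pool, expect the server pool file system itself to appear, since it is typically OCFS2; additional OCFS2 entries are likely repositories that are still presented to the server.

```shell
#!/bin/bash
# Sketch: list the mount points of all OCFS2 file systems mounted on this
# server. MOUNTS defaults to /proc/mounts; the argument exists only for testing.
list_ocfs2_mounts() {
  local mounts=${1:-/proc/mounts}
  # In /proc/mounts, field 2 is the mount point and field 3 the file system type.
  awk '$3 == "ocfs2" { print $2 }' "$mounts"
}
```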

12.2.7 Allocating Memory for Multiple InfiniBand HCAs

Out of memory errors can occur when using multiple InfiniBand host channel adapters (HCAs) with the PCIe Scalable Interface (psif) driver for paravirtualized environments on Oracle VM Server. These errors occur because the psif driver requires a minimum of 30 MB of memory for the driver itself, plus 50 MB of memory for each interface instance. The default memory allocation for dom0 does not provide enough memory to support multiple interfaces.
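The requirement above works out to 30 + 50 * n MB for n interface instances. A minimal sketch of that arithmetic; the function name psif_min_mem_mb is hypothetical. It covers the psif driver only, not the rest of dom0, so treat the result as a lower bound on the extra memory to account for when you resize dom0.

```shell
#!/bin/bash
# Sketch: minimum memory (MB) that the psif driver needs, using the figures
# above: 30 MB for the driver plus 50 MB per interface instance.
# This is the driver's requirement only, not a total dom0 memory size.
psif_min_mem_mb() {
  local n=$1
  case $n in
    ''|*[!0-9]*) echo "usage: psif_min_mem_mb <interface-count>" >&2; return 1 ;;
  esac
  echo $(( 30 + 50 * 10#$n ))   # 10# avoids octal parsing of values like 08
}
```

For example, psif_min_mem_mb 2 prints 130 (30 + 50 * 2).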

To resolve the out of memory errors, you should increase the dom0 memory allocation for Oracle VM Server. See Section 1.6, “Changing the Memory Size of the Management Domain”.

The following is an example of an out of memory error that is written to /var/log/messages:

time_stamp hostname kernel: [ 1465.466059] Out of memory: Kill process 8106 (python) score 0 or sacrifice child
time_stamp hostname kernel: [ 1465.466089] Killed process 8467 (ovs-agent) total-vm:108488kB, anon-rss:12kB, file-rss:2064kB

12.2.8 Resolving an Issue Where a NIC Configured for DHCP Fails to Get an IP Address

In some cases, when you configure an Oracle VM Server network interface to obtain its IP address via DHCP and then bring up that interface, the interface does not obtain an IP address and the following error occurs:

Determining IP information for interface_name... 
failed; no link present. Check cable?

This issue can occur when the network interface takes longer to come up than the time set by the LINKDELAY variable.

To resolve this issue, set LINKDELAY to 10 or higher (the value is in seconds) in the relevant /etc/sysconfig/network-scripts/ifcfg-eth* file, as in the following example:

DEVICE=eth3
BOOTPROTO=dhcp
HWADDR=00:00:00:00:00:00
ONBOOT=yes
LINKDELAY=10
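If you manage several interface files, the change can be scripted. A minimal sketch that sets LINKDELAY in one ifcfg file, replacing an existing entry or appending a new one; the function name set_linkdelay is hypothetical and GNU sed (-i) is assumed. It only edits the file, so you still need to bring the interface down and up again for the setting to apply.

```shell
#!/bin/bash
# Sketch: set LINKDELAY (seconds) in an ifcfg file, default 10.
# Usage: set_linkdelay /etc/sysconfig/network-scripts/ifcfg-eth3 [SECONDS]
set_linkdelay() {
  local cfg=$1 delay=${2:-10}
  [ -f "$cfg" ] || { echo "set_linkdelay: no such file: $cfg" >&2; return 1; }
  case $delay in
    ''|*[!0-9]*) echo "set_linkdelay: delay must be a whole number of seconds" >&2; return 1 ;;
  esac
  if grep -q '^LINKDELAY=' "$cfg"; then
    sed -i "s/^LINKDELAY=.*/LINKDELAY=${delay}/" "$cfg"   # replace existing value
  else
    # Add a newline first if the file does not end with one.
    [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
    printf 'LINKDELAY=%s\n' "$delay" >> "$cfg"
  fi
}
```

For example: set_linkdelay /etc/sysconfig/network-scripts/ifcfg-eth3 10 && ifdown eth3 && ifup eth3.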