Oracle® VM Server for SPARC 3.3 Release Notes

Updated: October 2015
 
 

Migration Issues

Migration of a Domain Between SPARC T7 Series Servers With Fragmented Memory Might Cause ldmd to Crash

Bug ID 21554591: During a live migration, the ldmd service on the target machine might dump core and then restart.

This problem might occur when the memory on the domain to be migrated is highly fragmented into multiple memory segments and the target machine's free memory layout is not compatible. The problem is more likely to occur if you use memory DR to remove memory from the domain prior to live migration.

The stack trace of the core dump is similar to the following:

restore_lgpg_mblk+0x398(17bbc88, 16c39c8, 80000000, 80000000, 0, 40000000)
rgrp_restore_lgpg+0x39c(0, 0, 1733948, 1711598, 0, 20000000)
mem_allocate_real+0x92c(0, 20000000, ffbff868, 13aec88, 80808080, 373cd8)
affinity_bind_resources+0x9f4(17bbc88, ffbff948, 13aec88, 3a10c000, 3a10c000, 1010101)
mem_bind_real+0x468(17bbc88, ffbff9d4, 13aec88, 3a10c000, 3a10c000, 1010101)
mem_bind_real_check+0xf4(17bbc88, 12ee338, 13aec88, 0, 376468, ff29fd80)
mig_tgt_bound_feasibility_check+0x168(164be08, ff000000, ff, 1, 0, 0)
i_tgt_do_feasibility_check+0x168(164be08, 0, 12390, 1, f960d244, ffffff)
sequence+0x4a4(0, ff000000, ff322a40, 1, f960d244, ffffff)
main+0xb54(5, ffbffc64, ffbffc7c, f960a900, 0, ff320200)
_start+0x108(0, 0, 0, 0, 0, 370b60)

When this problem occurs, the guest domain continues to run. If the ldmd service restarts successfully, no further recovery is needed.

If the ldmd service fails to restart and goes into maintenance mode due to Bug 21569507, you must perform a power cycle of the host or applicable physical domain before you can restart ldmd.

Workaround: Stop and unbind the guest domain and then perform a cold migration. Do not use memory DR to remove memory from the guest domains to be migrated.
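
For example, assuming a guest domain named ldg2 that is to be migrated to a target machine named system2 (hypothetical names used for illustration), the cold migration steps might be similar to the following:

# ldm stop-domain ldg2
# ldm unbind-domain ldg2
# ldm migrate-domain ldg2 system2
Target Password: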

Kernel Zones Block Live Migration of Guest Domains

Bug ID 21289174: On a SPARC system, a running kernel zone within an Oracle VM Server for SPARC domain blocks live migration of the guest domain. The following error message is shown:

Guest suspension failed because Kernel Zones are active.
Stop Kernel Zones and retry.

Workaround: Stop the running kernel zone in the guest domain, and then retry the live migration.
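
For example, assuming a kernel zone named kzone1 (a hypothetical name), you might shut it down from within the guest domain before retrying the migration:

# zoneadm -z kzone1 shutdown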

Cross-CPU Live Migrations Between SPARC T7 Series Servers and SPARC M7 Series Servers and Older Platforms Require at Least the Oracle VM Server for SPARC 3.2 Software on the Source Machine and the Target Machine

Bug ID 20606773: Cross-CPU live migrations between a SPARC T7 series server or a SPARC M7 series server and an older platform require that you run at least Oracle VM Server for SPARC 3.2 software on the source and target machines.

For example, live migration between a SPARC T5 system and a SPARC T7 series server requires that at least Oracle VM Server for SPARC 3.2 software is installed on the SPARC T5 system.
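
To check which Logical Domains Manager version is installed, you can run the ldm -V command on the control domain of each machine:

# ldm -V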

Domain Migration Might Fail Even Though Sufficient Memory in a Valid Layout Is Available on the Target System

Bug ID 20453206: A migration operation might fail even if sufficient memory in a valid layout is available on the target system. Memory DR operations might make it more difficult to migrate a guest domain.

Workaround: None.

Oracle Solaris 10 Guest Domains That Have Only One Virtual CPU Assigned Might Panic During a Live Migration

Bug ID 17285751: Migrating an Oracle Solaris 10 guest domain that has only one virtual CPU assigned to it might cause a panic on the guest domain in the function pg_cmt_cpu_fini().

Workaround: Assign at least two virtual CPUs to the guest domain before you perform the live migration. For example, use the ldm add-vcpu number-of-virtual-CPUs domain-name command to increase the number of virtual CPUs assigned to the guest domain.
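
For example, assuming a guest domain named ldg1 (a hypothetical name) that currently has a single virtual CPU, the following command adds a second virtual CPU:

# ldm add-vcpu 1 ldg1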

Domain Migrations From SPARC T4 Systems That Run System Firmware 8.3 to SPARC T5, SPARC M5, or SPARC M6 Systems Are Erroneously Permitted

Bug ID 17027275: Domain migrations from SPARC T4 systems that run system firmware 8.3 to SPARC T5, SPARC M5, or SPARC M6 systems are not permitted. Although the migration succeeds, a subsequent memory DR operation causes a panic.

Workaround: Update the system firmware on the SPARC T4 system to version 8.4. See the workaround for Guest Domain Panics at lgrp_lineage_add(mutex_enter: bad mutex, lp=10351178).

ldm migrate -n Should Fail When Performing a Cross-CPU Migration From SPARC T5, SPARC M5, or SPARC M6 System to UltraSPARC T2 or SPARC T3 System

Bug ID 16864417: The ldm migrate -n command does not report failure when attempting to migrate between a SPARC T5, SPARC M5, or SPARC M6 machine and an UltraSPARC T2 or SPARC T3 machine.

Workaround: None.

ldm list -o status on Target Control Domain Reports Bogus Migration Progress

Bug ID 15819714: In rare circumstances, the ldm list -o status command reports a bogus completion percentage when used to observe the status of a migration on a control domain.

This problem has no impact on the domain that is being migrated or on the ldmd daemons on the source or target control domains.

Workaround: Run the ldm list -o status command on the other control domain that is involved in the migration to observe the progress.
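
For example, if the bogus progress is reported on the source control domain, run the same command on the target control domain (the domain name ldg2 is hypothetical):

# ldm list -o status ldg2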

Guest Domain Panics When Running the cputrack Command During a Migration to a SPARC T4 System

Bug ID 15776123: If the cputrack command is run on a guest domain while that domain is migrated to a SPARC T4 system, the guest domain might panic on the target machine after it has been migrated.

Workaround: Do not run the cputrack command during the migration of a guest domain to a SPARC T4 system.

Guest Domain That Uses Cross-CPU Migration Reports Random Uptimes After the Migration Completes

Bug ID 15775055: After a domain is migrated between two machines that have different CPU frequencies, the uptime reported by the ldm list command might be incorrect. These incorrect results occur because uptime is calculated relative to the STICK frequency of the machine on which the domain runs. If the STICK frequency differs between the source and target machines, the uptime appears to be scaled incorrectly.

This issue applies only to UltraSPARC T2, UltraSPARC T2 Plus, and SPARC T3 systems.

The uptime reported and shown by the guest domain itself is correct. Also, any accounting that is performed by the Oracle Solaris OS in the guest domain is correct.

nxge Panics When Migrating a Guest Domain That Has Hybrid I/O and Virtual I/O Virtual Network Devices

Bug ID 15710957: When a heavily loaded guest domain has a hybrid I/O configuration and you attempt to migrate it, you might see an nxge panic.

Workaround: Add the following line to the /etc/system file on the primary domain and on any service domain that is part of the hybrid I/O configuration for the domain:

set vsw:vsw_hio_max_cleanup_retries = 0x200

Live Migration of a Domain That Depends on an Inactive Master Domain on the Target Machine Causes ldmd to Fault With a Segmentation Fault

Bug ID 15701865: If you attempt a live migration of a domain that depends on an inactive domain on the target machine, the ldmd daemon faults with a segmentation fault, and the domain on the target machine restarts. Although you can still perform a migration, it will not be a live migration.

Workaround: Perform one of the following actions before you attempt the live migration:

  • Remove the guest dependency from the domain to be migrated (see the example after this list).

  • Start the master domain on the target machine.
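
For the first option, if the dependency was configured by means of the master property, you might clear that property on the domain to be migrated. This is a sketch that assumes a domain named ldg1 (a hypothetical name):

# ldm set-domain master= ldg1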

DRM Fails to Restore the Default Number of Virtual CPUs for a Migrated Domain When the Policy Is Removed or Expired

Bug ID 15701853: After you perform a domain migration while a DRM policy is in effect, if the DRM policy expires or is removed from the migrated domain, DRM fails to restore the original number of virtual CPUs to the domain.

Workaround: If a domain is migrated while a DRM policy is active and the DRM policy is subsequently expired or removed, reset the number of virtual CPUs. Use the ldm set-vcpu command to set the number of virtual CPUs to the original value on the domain.
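
For example, if the domain is named ldg1 and originally had 8 virtual CPUs (hypothetical values), the following command restores that count:

# ldm set-vcpu 8 ldg1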

Migration Failure Reason Not Reported When the System MAC Address Clashes With Another MAC Address

Bug ID 15699763: A domain cannot be migrated if it contains a duplicate MAC address. Typically, when a migration fails for this reason, the failure message shows the duplicate MAC address. However, in rare circumstances, this failure message might not report the duplicate MAC address.

# ldm migrate ldg2 system2
Target Password:
Domain Migration of LDom ldg2 failed

Workaround: Ensure that the MAC addresses on the target machine are unique.
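
For example, you can compare the MAC addresses that are in use on the target machine against those of the domain to be migrated by listing the network resources on both machines:

# ldm list -o network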

Simultaneous Migration Operations in “Opposite Direction” Might Cause ldm to Hang

Bug ID 15696986: If two ldm migrate commands are issued between the same two systems simultaneously in the “opposite direction,” the two commands might hang and never complete. An opposite direction situation occurs when you simultaneously start a migration on machine A to machine B and a migration on machine B to machine A.

The hang occurs even if the migration processes are initiated as dry runs by using the -n option. When this problem occurs, all other ldm commands might hang.

Workaround: None.

Migration of a Domain That Has an Enabled Default DRM Policy Results in a Target Domain Being Assigned All Available CPUs

Bug ID 15655513: Following the migration of an active domain, CPU utilization in the migrated domain can increase dramatically for a short period of time. If a dynamic resource management (DRM) policy is in effect for the domain at the time of the migration, the Logical Domains Manager might begin to add CPUs. In particular, if the vcpu-max and attack properties were not specified when the policy was added, the default value of unlimited causes all the unbound CPUs in the target machine to be added to the migrated domain.

Recovery: No recovery is necessary. After the CPU utilization drops below the upper limit that is specified by the DRM policy, the Logical Domains Manager automatically removes the CPUs.

Explicit Console Group and Port Bindings Are Not Migrated

Bug ID 15527921: During a migration, any explicitly assigned console group and port are ignored, and a console with default properties is created for the target domain. This console is created using the target domain name as the console group and using any available port on the first virtual console concentrator (vcc) device in the control domain. If there is a conflict with the default group name, the migration fails.

Recovery: To restore the explicit console properties following a migration, unbind the target domain and manually set the desired properties using the ldm set-vcons command.
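
For example, assuming a target domain named ldg1 that originally used the console group group1 on port 5001 (hypothetical values), the steps might be similar to the following:

# ldm stop-domain ldg1
# ldm unbind-domain ldg1
# ldm set-vcons group=group1 port=5001 ldg1
# ldm bind-domain ldg1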

Migration Can Fail to Bind Memory Even If the Target Has Enough Available

Bug ID 15523120: In certain situations, a migration fails and ldmd reports that it was not possible to bind the memory needed for the source domain. This situation can occur even if the total amount of available memory on the target machine is greater than the amount of memory being used by the source domain.

This failure occurs because migrating the specific memory ranges in use by the source domain requires that compatible memory ranges are available on the target as well. When no such compatible memory range is found for any memory range in the source, the migration cannot proceed. See Migration Requirements for Memory in Oracle VM Server for SPARC 3.3 Administration Guide.

Recovery: If this condition is encountered, you might be able to migrate the domain if you modify the memory usage on the target machine. To do this, unbind any bound or active logical domain on the target.

Use the ldm list-devices -a mem command to see what memory is available and how it is used. You might also need to reduce the amount of memory that is assigned to another domain.
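
For example, on the target machine you might review the memory layout and then free a compatible memory range by unbinding a bound domain (the domain name ldg3 is hypothetical):

# ldm list-devices -a mem
# ldm stop-domain ldg3
# ldm unbind-domain ldg3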

Cannot Connect to Migrated Domain's Console Unless vntsd Is Restarted

Bug ID 15513998: Occasionally, after a domain has been migrated, it is not possible to connect to the console for that domain.

Workaround: Restart the vntsd SMF service to enable connections to the console:

# svcadm restart vntsd

Note - This command will disconnect all active console connections.

Cannot Migrate a Domain Between a System That Has EFI GPT Disk Labels and a System That Does Not Have EFI GPT Disk Labels

System firmware versions 8.4, 9.1, and XCP2230 introduced support for EFI GPT disk labels. By default, virtual disks that are installed when running at least the Oracle Solaris 11.1 OS on those systems have an EFI GPT disk label. You cannot read this disk label on older firmware versions (such as 9.0.x, 8.3, 7.x, or XCP2221). This situation precludes you from performing a live migration or a cold migration to a system that runs a system firmware version without EFI GPT support. Note that unlike the previous limitations, which affect only live migration, a cold migration also fails in this situation.

To determine whether your virtual disk has an EFI GPT disk label, run the devinfo -i command on the raw device. The following examples show whether the virtual disk has an SMI VTOC or an EFI GPT disk label.

  • SMI VTOC disk label. When your virtual disk has an SMI VTOC, you can perform a migration to firmware regardless of whether it supports EFI.

    This example indicates that the device has a VTOC label because the devinfo -i command reports device-specific information.

    # devinfo -i /dev/rdsk/c2d0s2
    /dev/rdsk/c2d0s2        0       0       73728   512     2
  • EFI GPT disk label. When your virtual disk has an EFI GPT disk label, you can perform a migration only to firmware that has EFI support.

    This example indicates that the device has an EFI GPT disk label because the devinfo -i command reports an error.

    # devinfo -i /dev/rdsk/c1d0s0
    devinfo: /dev/rdsk/c1d0s0: This operation is not supported on EFI
    labeled devices