This chapter describes how to migrate logical domains from one host machine to another with the Logical Domains 1.2 software.
Logical Domain Migration provides the ability to migrate a logical domain from one host machine to another. The host on which the migration is initiated is referred to as the source machine, and the host to which the domain is migrated is referred to as the target machine. Similarly, while a migration is in progress, the domain being migrated is referred to as the source domain, and the shell of a domain created on the target machine to receive it is referred to as the target domain.
The Logical Domains Manager on the source machine accepts the request to migrate a domain and establishes a secure network connection with the Logical Domains Manager running on the target machine. Once this connection has been established, the migration occurs. The migration itself proceeds in four phases.
Phase 1: After connecting with the Logical Domains Manager running on the target host, the source transfers information about the source machine and domain to the target host. This information is used to perform a series of checks to determine whether a migration is possible. The checks differ depending on the state of the source domain. For example, if the source domain is active, a different set of checks is performed than if the domain is bound or inactive.
Phase 2: When all checks in Phase 1 have passed, the source and target machines prepare for the migration. If the source domain is active, this includes shrinking the number of CPUs to one and suspending the domain. On the target machine, a domain is created to receive the source domain.
Phase 3: For an active domain, the next phase is to transfer all the runtime state information for the domain to the target. This information is retrieved from the hypervisor. On the target, the state information is installed in the hypervisor.
Phase 4: Handoff. After all state information is transferred, the handoff occurs when the target domain resumes execution (if the source was active) and the source domain is destroyed. From this point on, the target domain is the sole version of the domain running.
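As a rough illustration of the phases above, the following Python sketch lists what happens in each phase for a given source domain state. It models only the prose description and does not communicate with the Logical Domains Manager. The DomainState enum and migration_phases function are hypothetical names, and the sketch assumes that Phase 3 is skipped for bound and inactive domains, because such domains have no runtime state to transfer.

```python
from enum import Enum


class DomainState(Enum):
    ACTIVE = "active"
    BOUND = "bound"
    INACTIVE = "inactive"


def migration_phases(state: DomainState) -> list[str]:
    """Summarize the migration phases for a source domain in the given state.

    Hypothetical model of the prose only; it does not talk to ldmd.
    """
    phases = [f"Phase 1: transfer source information and run {state.value}-domain checks"]
    if state is DomainState.ACTIVE:
        phases += [
            "Phase 2: shrink source to one CPU, suspend it, create target domain",
            "Phase 3: copy runtime state from source hypervisor to target hypervisor",
            "Phase 4: resume target domain, destroy source domain",
        ]
    else:
        # Assumption: a bound or inactive domain has no runtime state, so Phase 3 is skipped.
        phases += [
            "Phase 2: create target domain",
            "Phase 4: destroy source domain",
        ]
    return phases


for line in migration_phases(DomainState.BOUND):
    print(line)
```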
For a migration to occur, both the source and target machines must be running compatible software:
The hypervisor on both the source and target machines must have firmware that supports domain migration.
If you see the following error, you do not have the correct version of system firmware on either the source or target machine.
System Firmware version on <downrev machine> does not support Domain Migration
Domain Migration of LDom <source domain> failed
A compatible version of the Logical Domains Manager must be running on both machines.
The migration feature was first released with the Logical Domains 1.1 software and corresponding firmware. For information about the latest firmware for your platform, see the Logical Domains 1.2 Release Notes.
Since the migration operation executes on two machines, a user must be authenticated on both the source and target host. In particular, the user must have the solaris.ldoms.write authorization on both machines.
The ldm command line interface for migration allows the user to specify an optional alternate user name for authentication on the target host. If this is not specified, the user name of the user executing the migration command is used. In both cases, the user is prompted for a password for the target machine.
For the migration of an active domain to occur with Logical Domains 1.2 software, certain requirements and restrictions apply to the source logical domain, the source machine, and the target machine. The following sections describe these requirements and restrictions for each resource type.
The migration operation is faster when the primary domains on both the source and target systems have cryptographic units assigned.
Following are the requirements and restrictions on CPUs when performing a migration.
The source and target machines must have the same processor type running at the same frequency.
The target machine must have sufficient free strands to accommodate the number of strands in use by the domain. In addition, full cores must be allocated for the migrated domain. If the source domain uses fewer strands than a full core, the extra strands are unavailable to any domain until the migrated domain is rebooted.
After a migration, CPU dynamic reconfiguration (DR) is disabled for the target domain until it has been rebooted. Once a reboot has occurred, CPU DR becomes available for that domain.
Either the source domain must have only a single strand, or the guest OS must support CPU DR, so that the domain can be shrunk to a single strand before migration. Conditions in the guest domain that would cause a CPU DR removal to fail would also cause the migration attempt to fail. For example, processes bound to CPUs within the guest domain, or processor sets configured in the source logical domain, can cause a migration operation to fail.
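The full-core rule reduces to simple arithmetic. In this hypothetical Python sketch, the number of cores allocated on the target is the domain's strand count rounded up to a whole core, and the difference is the number of strands left idle until the migrated domain is rebooted. The default of 8 strands per core is an assumption for illustration; substitute the value for your processor.

```python
import math


def cores_for_domain(domain_strands: int, strands_per_core: int = 8) -> tuple[int, int]:
    """Return (full cores allocated on the target, strands idle until reboot).

    strands_per_core=8 is an assumed value for illustration only.
    """
    if domain_strands < 1 or strands_per_core < 1:
        raise ValueError("strand counts must be positive")
    cores = math.ceil(domain_strands / strands_per_core)  # round up to whole cores
    idle = cores * strands_per_core - domain_strands
    return cores, idle


def target_has_room(free_cores: int, domain_strands: int, strands_per_core: int = 8) -> bool:
    """True if the target has enough free full cores for the domain."""
    cores, _ = cores_for_domain(domain_strands, strands_per_core)
    return free_cores >= cores


print(cores_for_domain(5))  # 5 strands occupy one 8-strand core, leaving 3 idle
```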
There must be sufficient free memory on the target machine to accommodate the migration of the source domain. In addition, the following properties must be maintained across the migration:
It must be possible to create the same number of identically sized memory blocks.
The physical addresses of the memory blocks do not need to match, but the same real addresses must be maintained across the migration.
Sufficient free memory on the target machine is not enough on its own: the layout of the available memory on the target machine must also be compatible with the memory layout of the source domain, or the migration fails.
In particular, if the memory on the target machine is fragmented into multiple small address ranges, but the source domain requires a single large address range, the migration fails. The following example illustrates this scenario. The target machine has 2 Gbytes of free memory in two memory blocks:
# ldm list-devices memory
MEMORY
    PA                   SIZE
    0x108000000          1G
    0x188000000          1G
The source domain, ldg-src, also has 2 Gbytes of memory, but it is laid out in a single memory block:
# ldm list -o memory ldg-src
NAME
ldg-src

MEMORY
    RA               PA               SIZE
    0x8000000        0x208000000      2G
Given this memory layout situation, the migration fails:
# ldm migrate-domain ldg-src dt212-239
Target Password:
Unable to bind 2G memory region at real address 0x8000000
Domain Migration of LDom ldg-src failed
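The following Python sketch shows why the example above fails. It is a simplified, hypothetical check that tries to carve each source memory block out of a single free range on the target, using a first-fit-decreasing heuristic. It ignores alignment and real-address requirements, so it is not the algorithm that the Logical Domains Manager uses.

```python
GB = 1 << 30


def layout_fits(source_blocks: list[int], free_ranges: list[int]) -> bool:
    """Heuristically check whether each source block fits in one free range.

    Sizes are in bytes. First-fit decreasing is a simplification: it ignores
    alignment and real-address constraints, and a False result does not
    prove that no placement exists.
    """
    remaining = sorted(free_ranges, reverse=True)
    for block in sorted(source_blocks, reverse=True):
        for i, free in enumerate(remaining):
            if free >= block:
                remaining[i] = free - block  # carve the block out of this range
                break
        else:
            return False  # no single free range is large enough for this block
    return True


# The scenario above: two 1-Gbyte free ranges, one 2-Gbyte source block.
print(layout_fits([2 * GB], [1 * GB, 1 * GB]))  # False
```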
Virtual devices that are backed by physical devices can be migrated. However, virtual devices that have direct access to physical devices cannot be migrated. For instance, you cannot migrate I/O domains.
All virtual I/O (VIO) services used by the source domain must be available on the target machine. In other words, the following conditions must be met:
Each logical volume used in the source logical domain must also be available on the target host and must refer to the same storage.
If the logical volume used by the source domain as a boot device exists on the target but does not refer to the same storage, the migration appears to succeed, but the domain is not usable because it cannot access its boot device. You must stop the domain, correct the configuration issue, and then restart the domain. Otherwise, the domain could be left in an inconsistent state.
For each virtual network device in the source domain, a virtual network switch must exist on the target host, with the same name as the virtual network switch the device is attached to on the source host.
For example, if vnet0 in the source domain is attached to a virtual switch service named switch-y, then a logical domain on the target host must provide a virtual switch service named switch-y.
The switches do not have to be connected to the same network for the migration to occur, though the migrated domain can experience networking problems if the switches are not connected to the same network.
MAC addresses used by the source domain that are in the automatically allocated range must be available for use on the target host.
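The virtual switch and MAC address conditions above amount to set comparisons. In this hypothetical Python sketch, the inputs are plain dictionaries and sets that you would have to gather yourself (for example, from ldm list output on each host). The function names and sample values are illustrative, and treating "available" as "not already in use on the target" is an assumption.

```python
def vnets_missing_switch(source_vnets: dict[str, str], target_switches: set[str]) -> list[str]:
    """Return source vnets whose virtual switch name does not exist on the target.

    source_vnets maps each vnet name to the name of the virtual switch
    service it is attached to on the source host.
    """
    return sorted(vnet for vnet, vsw in source_vnets.items() if vsw not in target_switches)


def conflicting_macs(source_auto_macs: set[str], target_used_macs: set[str]) -> set[str]:
    """Return automatically allocated source MAC addresses already in use on the target.

    Assumption: "available" means "not in use"; addresses are compared
    case-insensitively as colon-separated strings.
    """
    used = {mac.lower() for mac in target_used_macs}
    return {mac for mac in source_auto_macs if mac.lower() in used}


# Hypothetical inputs: vnet0 uses switch-y, which exists on the target.
print(vnets_missing_switch({"vnet0": "switch-y"}, {"switch-y"}))  # []
```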
A virtual console concentrator (vcc) service must exist on the target host and have at least one free port. Explicit console constraints are ignored during the migration. The console for the target domain is created using the target domain name as the console group and using any available port on the first vcc device in the control domain. If there is a conflict with the default group name, the migration fails.
A domain using NIU Hybrid I/O resources can be migrated. A constraint specifying NIU Hybrid I/O resources is not a hard requirement of a logical domain. If such a domain is migrated to a machine that does not have available NIU resources, the constraint is preserved, but not fulfilled.
You cannot migrate a logical domain that has bound cryptographic units if it has more than one VCPU. Attempts to migrate such a domain will fail.
Any active delayed reconfiguration operations on the source or target hosts prevent a migration from starting. Delayed reconfiguration operations are blocked while a migration is in progress.
While a migration is in progress on a machine, any operation that could result in modification of the Machine Description (MD) of the domain being migrated is blocked. This includes all operations on the domain itself, as well as operations such as bind and stop on other domains on the machine.
Because a bound or inactive domain is not executing at the time of the migration, there are fewer restrictions than when you migrate an active domain.
The migration of a bound domain requires that the target be able to satisfy the CPU, memory, and I/O constraints of the source domain; otherwise, the migration fails. The migration of an inactive domain does not have such requirements. However, the target must satisfy the domain's constraints when the domain is later bound; otherwise, the binding fails.
You can migrate a bound or inactive domain between machines running different processor types and machines that are running at different frequencies.
The Solaris OS image in the guest must support the processor type on the target machine.
For an inactive domain, there are no checks performed against the virtual input/output (VIO) constraints. So, the VIO servers do not need to exist for the migration to succeed. As with any inactive domain, the VIO servers need to exist and be available at the time the domain is bound.
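The differences between active, bound, and inactive domains can be summarized as a decision function. This Python sketch is a simplification: the TargetFacts fields are hypothetical booleans standing in for the real checks, and the requirement that the guest Solaris OS image support the target processor type is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TargetFacts:
    """Hypothetical results of individual checks against the target machine."""
    same_cpu_type_and_freq: bool
    cpu_ok: bool
    memory_ok: bool
    vio_ok: bool


def failed_checks(state: str, facts: TargetFacts) -> list[str]:
    """Return the checks that would block migrating a domain in `state`."""
    if state not in ("active", "bound", "inactive"):
        raise ValueError(f"unknown domain state: {state!r}")
    if state == "inactive":
        return []  # constraints are checked later, when the domain is bound
    failures = []
    # Only an active domain needs the same processor type and frequency.
    if state == "active" and not facts.same_cpu_type_and_freq:
        failures.append("processor type/frequency")
    if not facts.cpu_ok:
        failures.append("cpu")
    if not facts.memory_ok:
        failures.append("memory")
    if not facts.vio_ok:
        failures.append("virtual I/O")
    return failures


print(failed_checks("bound", TargetFacts(False, True, True, True)))  # []
```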
When you provide the -n option to the migrate-domain subcommand, migration checks are performed, but the source domain is not migrated. Any requirement that is not satisfied is reported as an error. This allows you to correct any configuration errors before attempting a real migration.
Because of the dynamic nature of logical domains, it is possible for a dry run to succeed and a migration to fail, and vice versa.
When a migration is in progress, the source and target domains are shown differently in the status output. The output of the ldm list command indicates the state of the migrating domain.
The sixth column in the FLAGS field shows one of the following values:
The source domain shows an s to indicate that it is the source of the migration.
The target domain shows a t to indicate that it is the target of a migration.
If an error occurs that requires user intervention, an e is shown.
The following shows that ldg-src is the source domain of the migration:
# ldm list ldg-src
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg-src          suspended  -n---s          1     1G       0.0%  2h 7m
The following shows that ldg-tgt is the target domain of the migration:
# ldm list ldg-tgt
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg-tgt          bound      -----t  5000    1     1G
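A script that monitors migrations can read the sixth FLAGS character. The following Python sketch maps it to a migration role. The function names are hypothetical, and role_from_list_row assumes that NAME, STATE, and FLAGS are the first three whitespace-separated fields of an ldm list row, as in the output above.

```python
from typing import Optional

_ROLES = {"s": "source", "t": "target", "e": "error"}


def migration_role(flags: str) -> Optional[str]:
    """Map the sixth FLAGS character to a migration role, or None."""
    if len(flags) < 6:
        raise ValueError(f"unexpected FLAGS value: {flags!r}")
    return _ROLES.get(flags[5])  # '-' or any other character yields None


def role_from_list_row(row: str) -> Optional[str]:
    """Assumes NAME, STATE, and FLAGS are the first three fields of the row."""
    return migration_role(row.split()[2])


print(role_from_list_row("ldg-src suspended -n---s 1 1G 0.0% 2h 7m"))  # source
```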
In the long form of the status output, additional information is shown about the migration. On the source, the percentage of the operation complete is displayed along with the target host and domain name. Similarly, on the target, the percentage of the operation complete is displayed along with the source host and domain name.
# ldm list -o status ldg-src
NAME
ldg-src

STATUS
    OPERATION    PROGRESS    TARGET
    migration    17%         t5440-sys-2
Once a migration starts, if the ldm command is interrupted with a KILL signal, the migration is terminated. The target domain is destroyed, and the source domain is resumed if it was active. If the controlling shell of the ldm command is lost, the migration continues in the background.
A migration operation can also be canceled externally by using the ldm cancel-operation command. This terminates the migration in progress, and the source domain resumes as the active domain. The ldm cancel-operation command should be initiated from the source system. On a given system, any migration-related command impacts the migration operation that was started from that system. A system cannot control a migration operation when it is the target system.
Once a migration has been initiated, suspending the ldm(1M) process does not pause the operation, because it is the Logical Domains Manager daemon (ldmd) on the source and target machines that are effecting the migration. The ldm process waits for a signal from the ldmd that the migration has been completed before returning.
If the network connection is lost after the source has completed sending all the runtime state information to the target, but before the target can acknowledge that the domain has been resumed, the migration operation terminates, and the source is placed in an error state. This indicates that user interaction is required to determine whether or not the migration was completed successfully. In such a situation, take the following steps.
Determine whether the target domain has resumed successfully. The target domain will be in one of two states:
If the migration completed successfully, the target domain is in the normal state.
If the migration failed, the target cleans up and destroys the target domain.
If the target domain has resumed, it is safe to destroy the source domain in the error state. If the target domain is not present, the source domain is still the master version of the domain, and it must be recovered. To do this, run the ldm cancel-operation command on the source machine. This clears the error state and restores the source domain to its original condition.
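The recovery procedure above reduces to a two-way decision. The following Python sketch is hypothetical and only returns the action to take; it does not run any commands. How you determine whether the target domain has resumed (for example, by running ldm list on the target machine) is left to you.

```python
def recovery_action(source_in_error: bool, target_resumed: bool) -> str:
    """Return the recovery step for a migration interrupted at handoff.

    Hypothetical helper: it only describes the action; it runs nothing.
    """
    if not source_in_error:
        return "no recovery needed"
    if target_resumed:
        # The target is the live copy; the source in the error state is stale.
        return "destroy the source domain"
    # The source is still the master version of the domain.
    return "run ldm cancel-operation on the source machine"


print(recovery_action(source_in_error=True, target_resumed=False))
```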
Example 8–2 shows how a domain, called ldg1, can be migrated to a machine called t5440-sys-2.
# ldm migrate-domain ldg1 t5440-sys-2
Target Password:
Example 8–3 shows that a domain can be renamed as part of the migration. In this example, ldg-src is the source domain, and it is renamed to ldg-tgt on the target machine (t5440-sys-2) as part of the migration. In addition, the user name (root) on the target machine is explicitly specified.
# ldm migrate ldg-src root@t5440-sys-2:ldg-tgt
Target Password:
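The target argument in these examples has the form [user@]host[:domain]. The following hypothetical Python sketch splits such an argument, defaulting the user name to the local user (as described for authentication earlier in this chapter) and assuming that the domain keeps its source name when no new name is given. It is not the ldm parser, and ldm may handle edge cases differently.

```python
import getpass
from typing import NamedTuple, Optional


class Target(NamedTuple):
    user: str
    host: str
    domain: str


def parse_target(spec: str, source_domain: str, default_user: Optional[str] = None) -> Target:
    """Split a [user@]host[:domain] target argument (hypothetical parser)."""
    user, sep, rest = spec.partition("@")
    if not sep:  # no user name given
        user, rest = "", spec
    host, _, domain = rest.partition(":")
    if not host:
        raise ValueError(f"missing host name in {spec!r}")
    # Defaults: the local user, and the source domain's own name.
    return Target(user or default_user or getpass.getuser(), host, domain or source_domain)


print(parse_target("root@t5440-sys-2:ldg-tgt", "ldg-src"))
```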
Example 8–4 shows a sample failure message if the target machine does not have migration support; that is, if it is running an LDoms version prior to version 1.1.
# ldm migrate ldg1 t5440-sys-2
Target Password:
Failed to establish connection with ldmd(1m) on target: t5440-sys-2
Check that the 'ldmd' service is enabled on the target machine and
that the version supports Domain Migration. Check that the 'xmpp_enabled'
and 'incoming_migration_enabled' properties of the 'ldmd' service on
the target machine are set to 'true' using svccfg(1M).
Example 8–5 shows how to obtain status on a target domain while the migration is in progress. In this example, the source machine is t5440-sys-1.
# ldm list -o status ldg-tgt
NAME
ldg-tgt

STATUS
    OPERATION    PROGRESS    SOURCE
    migration    55%         t5440-sys-1
Example 8–6 shows how to obtain parseable status on the source domain while the migration is in progress. In this example, the target machine is t5440-sys-2.
# ldm list -o status -p ldg-src
VERSION 1.3
DOMAIN|name=ldg-src|
STATUS
|op=migration|progress=42|error=no|target=t5440-sys-2
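Parseable output is intended for use in scripts. This hypothetical Python sketch collects the key=value fields from output like the example above into a dictionary. It is a simplification that assumes keys are unique across sections, which holds for this example but may not hold for every ldm list option.

```python
SAMPLE = """VERSION 1.3
DOMAIN|name=ldg-src|
STATUS
|op=migration|progress=42|error=no|target=t5440-sys-2
"""


def parse_status(output: str) -> dict[str, str]:
    """Collect key=value fields from `ldm list -o status -p` style output.

    Simplification: assumes keys are unique across sections.
    """
    fields: dict[str, str] = {}
    for line in output.splitlines():
        for token in line.split("|"):
            key, sep, value = token.partition("=")
            if sep:  # skip section names such as DOMAIN and STATUS
                fields[key.strip()] = value.strip()
    return fields


status = parse_status(SAMPLE)
print(status["op"], int(status["progress"]), status["target"])  # migration 42 t5440-sys-2
```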