CHAPTER 8
This chapter describes how to migrate logical domains from one host machine to another as of this release of LDoms 1.1 software.
Logical Domain Migration provides the ability to migrate a logical domain from one host machine to another. The host where the migration is initiated is referred to as the source machine, and the host where the domain is migrated to is referred to as the target machine. Similarly, once a migration is started, the domain to be migrated is referred to as the source domain and the shell of a domain created on the target machine is referred to as the target domain while the migration is in progress.
The Logical Domains Manager on the source machine accepts the request to migrate a domain and establishes a secure network connection with the Logical Domains Manager running on the target machine. Once this connection has been established, the migration occurs. The migration itself can be broken down into different phases.
Phase 1: After connecting with the Logical Domains Manager running on the target host, information about the source machine and domain is transferred to the target host. This information is used to perform a series of checks to determine whether a migration is possible. The checks differ depending on the state of the source domain. For example, if the source domain is active, a different set of checks is performed than if the domain is bound or inactive.
Phase 2: When all checks in Phase 1 have passed, the source and target machines prepare for the migration. In the case where the source domain is active, this includes shrinking the number of CPUs to one and suspending the domain. On the target machine, a domain is created to receive the source domain.
Phase 3: For an active domain, the next phase is to transfer all the runtime state information for the domain to the target. This information is retrieved from the hypervisor. On the target, the state information is installed in the hypervisor.
Phase 4: Handoff. After all state information is transferred, the handoff occurs when the target domain resumes execution (if the source was active) and the source domain is destroyed. From this point on, the target domain is the sole version of the domain running.
For a migration to occur, both the source and target machines must be running compatible software:
The hypervisor on both the source and target machines must support the most recent version of the LDoms 1.1 firmware.
If you see the following error, the source machine or the target machine does not have the correct version of system firmware.
System Firmware version on <downrev machine> does not support Domain Migration
Domain Migration of LDom <source domain> failed
A compatible version of the Logical Domains Manager must be running on both machines.
Since the migration operation executes on two machines, a user must be authenticated on both the source and target host. In particular, the user must have the solaris.ldoms.write authorization on both machines.
The ldm command line interface for migration allows the user to specify an optional alternate user name for authentication on the target host. If this is not specified, the user name of the user executing the migration command is used. In both cases, the user is prompted for a password for the target machine.
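Before starting a migration, it can help to confirm the authorization on each machine. The following is a minimal sketch, assuming that the Solaris auths(1) command prints a comma-separated list of authorizations and that wildcard entries such as solaris.* cover solaris.ldoms.write. The has_ldoms_write function is a hypothetical helper, not part of the LDoms software.

```shell
#!/bin/sh
# has_ldoms_write: hypothetical helper (not part of ldm).
# $1: comma-separated authorization list, in the format assumed for auths(1).
# Succeeds if the list grants solaris.ldoms.write directly or by wildcard.
has_ldoms_write() {
    # Wrap the list in commas so that every entry is delimited on both sides.
    case ",$1," in
        *,solaris.ldoms.write,*|*,'solaris.ldoms.*',*|*,'solaris.*',*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example use on a Solaris host (run once on the source and once on the target):
#   has_ldoms_write "$(auths)" || echo "missing solaris.ldoms.write" >&2
```

The check is only a convenience. The Logical Domains Manager still performs its own authentication when the migration starts.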
For the migration of an active domain to occur with LDoms 1.1 software, a set of requirements and restrictions is imposed on the source logical domain, the source machine, and the target machine. The following sections describe these requirements and restrictions for each of the resource types.
Following are the requirements and restrictions on CPUs when performing a migration.
The source and target machines must have the same processor type running at the same frequency.
The target machine must have sufficient free strands to accommodate the number of strands in use by the domain. In addition, full cores must be allocated for the migrated domain. If the number of strands in the source is less than a full core, the extra strands are unavailable to any domain until after the migrated domain is rebooted.
After a migration, CPU dynamic reconfiguration (DR) is disabled for the target domain until it has been rebooted. Once a reboot has occurred, CPU DR becomes available for that domain.
Either the source domain must have only a single strand, or the guest OS must support CPU DR, so that the domain can be shrunk to a single strand before migration. Conditions in the guest domain that would cause a CPU DR removal to fail would also cause the migration attempt to fail. For example, processes bound to CPUs within the guest domain, or processor sets configured in the source logical domain, can cause a migration operation to fail.
There must be sufficient free memory on the target machine to accommodate the migration of the source domain. In addition, following are a few properties that must be maintained across the migration:
The logical domain to be migrated must not contain any physical I/O devices. If a domain has any physical I/O devices, the migration fails.
All virtual I/O (VIO) services used by the source domain must be available on the target machine. In other words, the following conditions must exist:
Each logical volume used in the source logical domain must also be available on the target host and must refer to the same storage.
For each virtual network device in the source domain, a virtual network switch must exist on the target host, with the same name as the virtual network switch the device is attached to on the source host.
For example, if vnet0 in the source domain is attached to a virtual switch service named switch-y, then there must be a logical domain on the target host providing a virtual switch service named switch-y.
MAC addresses used by the source domain that are in the automatically allocated range must be available for use on the target host.
A virtual console concentrator (vcc) service must exist on the target host and have at least one free port. Explicit console constraints are ignored during the migration. The console for the target domain is created using the target domain name as the console group and using any available port on the first vcc device in the control domain. If there is a conflict with the default group name, the migration fails.
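The virtual switch naming condition above can be checked ahead of time by comparing two lists of names. The following is a minimal sketch. The missing_switches function is a hypothetical helper, and gathering the two lists of switch names from the source and target hosts is left to the administrator.

```shell
#!/bin/sh
# missing_switches: hypothetical helper (not part of ldm).
# $1: space-separated names of the virtual switches that the source
#     domain's virtual network devices are attached to
# $2: space-separated names of the virtual switches on the target host
# Prints each required name that the target lacks.
# Returns 1 if any name is missing, 0 otherwise.
missing_switches() {
    ms_status=0
    for ms_needed in $1; do
        ms_found=no
        for ms_have in $2; do
            if [ "$ms_needed" = "$ms_have" ]; then
                ms_found=yes
            fi
        done
        if [ "$ms_found" = no ]; then
            echo "$ms_needed"
            ms_status=1
        fi
    done
    return $ms_status
}

# Example: vnet0 is attached to switch-y on the source host.
#   missing_switches "switch-y" "primary-vsw0 switch-y"
```

A matching name is necessary but says nothing about whether the switch reaches the same network, so treat the result as a first check only.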
A domain using NIU Hybrid I/O resources can be migrated. A constraint specifying NIU Hybrid I/O resources is not a hard requirement of a logical domain. If such a domain is migrated to a machine that does not have available NIU resources, the constraint is preserved, but not fulfilled.
You cannot migrate a logical domain that has bound cryptographic units. Attempts to migrate such a domain fail.
Any active delayed reconfiguration operations on the source or target hosts prevent a migration from starting. Delayed reconfiguration operations are blocked while a migration is in progress.
While a migration is in progress on a machine, any operation which could result in the modification of the Machine Description (MD) of the domain being migrated is blocked. This includes all operations on the domain itself as well as operations such as bind, stop, and start on other domains on the machine.
Because a bound or inactive domain is not executing at the time of the migration, there are fewer restrictions than when you migrate an active domain.
You can migrate a bound or inactive domain between machines running different processor types and machines that are running at different frequencies.
The Solaris OS image in the guest must support the processor type on the target machine.
For an inactive domain, there are no checks performed against the virtual input/output (VIO) constraints. So, the VIO servers do not need to exist for the migration to succeed. As with any inactive domain, the VIO servers need to exist and be available at the time the domain is bound.
When you provide the -n option to the migrate-domain subcommand, migration checks are performed, but the source domain is not migrated. Any requirement that is not satisfied is reported as an error. This allows you to correct any configuration errors before attempting a real migration.
Note - Because of the dynamic nature of logical domains, it is possible for a dry run to succeed and a migration to fail, and vice versa.
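A dry run can be chained with the real migration so that the migration is attempted only when the checks pass. The following is a minimal sketch, assuming that ldm returns a nonzero exit status when a dry run reports an error. The migrate_with_dry_run function is a hypothetical wrapper, not part of the ldm command.

```shell
#!/bin/sh
# migrate_with_dry_run: hypothetical wrapper around ldm migrate-domain.
# $1: source domain name
# $2: target specification, for example t5440-sys-2 or
#     root@t5440-sys-2:ldg-tgt
# Assumption: ldm exits with a nonzero status when the dry run fails.
migrate_with_dry_run() {
    # -n performs the migration checks without migrating the domain.
    if ! ldm migrate-domain -n "$1" "$2"; then
        echo "dry run failed; correct the reported errors first" >&2
        return 1
    fi
    # The checks passed, so attempt the real migration.
    ldm migrate-domain "$1" "$2"
}

# Example:
#   migrate_with_dry_run ldg1 t5440-sys-2
```

As the note above points out, a successful dry run does not guarantee that the real migration succeeds, so the exit status of the second command still needs to be checked.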
When a migration is in progress, the source and target domains are displayed differently in the status output. In particular, the short version of the status output shows a new flag indicating the state of the migrating domain. The source domain shows an s to indicate that it is the source of the migration. The target domain shows a t to indicate that it is the target of a migration. If an error occurs that requires user intervention, an e is displayed.
In the long form of the status output, additional information is displayed about the migration. On the source, the percentage of the operation complete is displayed along with the target host and domain name. Similarly, on the target, the percentage of the operation complete is displayed along with the source host and domain name.
# ldm ls -o status ldg-src
NAME
ldg-src

STATUS
    OPERATION    PROGRESS    TARGET
    migration    17%         t5440-sys-2
Once a migration starts, if the ldm command is interrupted with a KILL signal, the migration is terminated. The target domain is destroyed, and the source domain is resumed if it was active. If the controlling shell of the ldm command is lost, the migration continues in the background.
A migration operation can also be canceled externally from the ldm command using the cancel-operation subcommand. This terminates the migration in progress, and the source domain resumes as the master domain.
If the network connection is lost after the source has completed sending all the runtime state information to the target, but before the target can acknowledge that the domain has been resumed, the migration operation terminates, and the source is placed in an error state. This indicates that user interaction is required to determine whether or not the migration was completed successfully. In such a situation, take the following steps.
Determine whether the target domain has resumed successfully. The target domain will be in one of two states:
If the target is resumed, it is safe to destroy the source domain in the error state. If the target is not present, the source domain is still the master version of the domain, and it must be recovered. To do this, run the cancel-operation subcommand on the source machine. This clears the error state and restores the source domain to its original condition.
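The decision described above can be summarized in a small script. The following is a minimal sketch that only prints advice and runs no ldm commands. The recovery_advice function and the state keywords resumed and absent are hypothetical, and the cancel-operation migration syntax shown in the message is an assumption that should be checked against the ldm man page.

```shell
#!/bin/sh
# recovery_advice: hypothetical helper that only prints advice.
# $1: name of the source domain that is in the error state
# $2: observed state of the target domain, given as one of two
#     keywords chosen for this sketch:
#       resumed - the target domain is running on the target host
#       absent  - the target domain does not exist on the target host
recovery_advice() {
    case "$2" in
        resumed)
            echo "target is running: destroy source domain $1"
            ;;
        absent)
            # Assumed syntax for the cancel-operation subcommand.
            echo "target is absent: run 'ldm cancel-operation migration $1' on the source machine"
            ;;
        *)
            echo "unknown target state: $2" >&2
            return 2
            ;;
    esac
}

# Example:
#   recovery_advice ldg1 absent
```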
EXAMPLE 8-2 shows how a domain, called ldg1, can be migrated to a machine called t5440-sys-2.
# ldm migrate-domain ldg1 t5440-sys-2
Target Password:
#
EXAMPLE 8-3 shows that a domain can be renamed as part of the migration. In this example, ldg-src is the source domain, and it is renamed to ldg-tgt on the target machine (t5440-sys-2) as part of the migration. In addition, the user name (root) on the target machine is explicitly specified.
# ldm migrate ldg-src root@t5440-sys-2:ldg-tgt
Target Password:
#
EXAMPLE 8-4 shows a sample failure message if the target domain does not have migration support; that is, if you are running an LDoms version prior to version 1.1.
EXAMPLE 8-5 shows how to obtain status on a target domain while the migration is in progress. In this example, the source machine is t5440-sys-1.
# ldm ls -o status ldg-tgt
NAME
ldg-tgt

STATUS
    OPERATION    PROGRESS    SOURCE
    migration    55%         t5440-sys-1
EXAMPLE 8-6 shows how to obtain parseable status on the source domain while the migration is in progress. In this example, the target machine is t5440-sys-2.
# ldm ls -o status -p ldg-src
VERSION 1.3
DOMAIN|name=ldg-src|
STATUS
|op=migration|progress=42|error=no|target=t5440-sys-2
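The parseable form is convenient for scripts that monitor a migration. The following is a minimal sketch that extracts the progress value, assuming the field layout shown in the example above. The migration_progress function is a hypothetical helper, not part of the ldm command.

```shell
#!/bin/sh
# migration_progress: hypothetical helper (not part of ldm).
# Reads the output of "ldm ls -o status -p <domain>" on standard input
# and prints the value of the progress field from the migration record.
# Prints nothing if the input contains no migration record.
migration_progress() {
    awk -F'|' '
        /op=migration/ {
            # Fields are name=value pairs separated by "|".
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^progress=/) {
                    sub(/^progress=/, "", $i)
                    print $i
                }
            }
        }'
}

# Example on the source machine:
#   ldm ls -o status -p ldg-src | migration_progress
```

Because the function reads standard input, it can be tried against saved output without a migration in progress.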
Copyright © 2008, Sun Microsystems, Inc. All rights reserved.