Logical Domains (LDoms) 1.0 Release Notes
These release notes contain the following information about this release of the Logical Domains 1.0 software:
This release of the Logical Domains 1.0 software specifically adds support for the Sun Blade T6300 Server Module.
Logical Domains (LDoms) Manager 1.0 software is supported on the following servers:
The following software is required or recommended. The minimum versions required for software and patches are listed.
You can find the patches at the SunSolve(SM) site:
The Logical Domains (LDoms) 1.0 Administration Guide and Logical Domains (LDoms) 1.0 Release Notes can be found at the following site:
The Beginners Guide to LDoms: Understanding and Deploying Logical Domains can be found at the following Sun BluePrints site:
In a logical domains environment, the virtual switch service running in a service domain can directly interact with GLDv3-compliant network adapters. Though non-GLDv3 compliant network adapters can be used in these systems, the virtual switch cannot interface with them directly. Refer to "Configuring Virtual Switch and Service Domain for NAT and Routing" in the Logical Domains (LDoms) 1.0 Administration Guide for information about how to use non-GLDv3 compliant network adapters.
Currently, the following adapters with their corresponding drivers are supported by the virtual switch:
1. Use the Solaris OS dladm(1M) command, where, for example, bge0 is the network device name.
2. Look at type: in the output:
The following graphics card can be used with LDoms software on the Sun Fire and SPARC Enterprise T2000 servers:
Following are the specifics:
Logical Domains software does not impose a memory size limitation when creating a domain. The memory size requirement is a characteristic of the guest operating system. Some Logical Domains functionality might not work if the amount of memory present is less than the recommended size. For recommended and minimum size memory requirements, refer to the installation guide for the operating system you are using. For the Solaris 10 11/06 OS, 512 MB is the recommended size of memory to install or upgrade, and 128 MB is the minimum size. The default size for a swap area is 512 MB. For the Solaris 10 11/06 OS, refer to "System Requirements and Recommendations" in the Solaris 10 11/06 Installation Guide: Planning for Installation and Upgrade.
The OpenBoot PROM has a minimum size restriction for a domain. For system firmware 6.4.x, that restriction is 12 MB. If you have a domain less than that size, the Logical Domains Manager will automatically boost the size of the domain to 12 MB. Refer to the release notes for your system firmware for information about memory size requirements.
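The automatic boost described above amounts to taking the larger of the requested size and the firmware minimum. The following is a minimal Python sketch, assuming the 12 MB minimum that applies to system firmware 6.4.x. The function and constant names are illustrative only and are not part of the Logical Domains Manager.

```python
# Hypothetical helper for illustration; not part of the Logical Domains Manager.
OPENBOOT_MIN_MB = 12  # minimum domain size for system firmware 6.4.x, per the text


def effective_domain_memory_mb(requested_mb, minimum_mb=OPENBOOT_MIN_MB):
    """Return the memory size a domain ends up with after the automatic boost."""
    if requested_mb <= 0:
        raise ValueError("requested memory must be positive")
    # A request below the OpenBoot PROM minimum is raised to that minimum;
    # larger requests are left unchanged.
    return max(requested_mb, minimum_mb)
```

Because the minimum can differ between firmware releases, the sketch takes it as a parameter rather than fixing it.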
This section details the software that is compatible with and can be used with the Logical Domains software in the control domain.
This section contains general notes and issues concerning the Logical Domains 1.0 software.
The SUNWldomu package is missing from the SUNWCreq metacluster, and both the SUNWldomu and SUNWldomr packages are missing from the SUNWCrnet metacluster (Bug ID 6484072).
If either of those metaclusters is installed on a machine to be used with Logical Domains software, you must install the packages manually.
Currently, System Firmware 6.4.x does not support the following features on Netra T2000 Servers:
The following bug IDs are filed to add this support:
This Logical Domains 1.0 software release does not support the Sun x8 Express 1/10G Ethernet Adapter (nxge driver).
Rebooting the control domain when guest domains are running or bound is not supported, because doing so could cause the control domain to hang.
If you have made any configuration changes since last saving a configuration to the SC, before you attempt to power off or power cycle a Logical Domains system, make sure you save the latest configuration that you want to keep.
1. Shut down and unbind all the non-I/O domains.
2. Shut down and unbind any active I/O domains.
3. Halt the primary domain.
Because no other domains are bound, the firmware automatically powers off the system.
1. Shut down and unbind all the non-I/O domains.
2. Shut down and unbind any active I/O domains.
3. Reboot the primary domain.
Because no other domains are bound, the firmware automatically power cycles the system before rebooting it. When the system restarts, it boots into the Logical Domains configuration last saved or explicitly set.
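The ordering in the two procedures above can be sketched as a small planning function. This Python sketch only builds the ordered list of steps; it does not run any commands. The role labels ("primary", "io", "guest") and the function name are assumptions made for illustration.

```python
# Hypothetical planning helper; it only orders the steps and runs nothing.
VALID_ROLES = {"primary", "io", "guest"}


def shutdown_plan(domains, power_cycle=False):
    """Return ordered (action, domain) steps for a clean power off or power cycle.

    `domains` maps each domain name to its role: "primary", "io", or "guest".
    """
    unknown = set(domains.values()) - VALID_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    primaries = [name for name, role in domains.items() if role == "primary"]
    if len(primaries) != 1:
        raise ValueError("exactly one primary domain is required")

    steps = []
    # Steps 1 and 2: non-I/O domains first, then any other I/O domains.
    for role in ("guest", "io"):
        for name in sorted(n for n, r in domains.items() if r == role):
            steps.append(("stop", name))
            steps.append(("unbind", name))
    # Step 3: halt the primary to power off, or reboot it to power cycle.
    steps.append(("reboot" if power_cycle else "halt", primaries[0]))
    return steps
```

Keeping the plan separate from its execution makes it possible to review the order before any domain is stopped.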
Once the virtual switch driver (vswitch) has attached, either as part of the normal Solaris OS boot sequence, or as a result of an explicit Solaris OS add_drv(1M) command, removing or updating the driver might cause networking to fail because of Bug ID 6486145.
Workaround: Once vswitch has attached, do not remove the driver using the Solaris OS rem_drv(1M) command or update the driver using the Solaris OS update_drv(1M) command.
Recovery: If you do remove the driver using the rem_drv command and then attempt to reattach it using the add_drv command, you must reboot after the add_drv command completes to ensure the networking restarts correctly. Similarly, you must also reboot after an update_drv command completes to ensure the networking does not fail.
Each thread of the UltraSPARC® T1 processor has a limited capacity for handling multiple, simultaneous, outstanding interrupts. Overall, only 256 interrupts can be outstanding for each thread allocated to a logical domain. The chief producers of interrupts are the I/O subsystem and Logical Domain Channels (LDCs), which are the main interdomain communication mechanism.
When configuring logical domains, avoid creating more interrupt producers than 256 times the number of threads assigned to the domain. In practice, this becomes an issue only on the control domain, because it has at least part (if not all) of the I/O subsystem allocated to it, and because of the potentially large number of LDCs created both for virtual I/O data communications and for Logical Domains Manager control of the other logical domains.
The following guidelines can help you prevent creation of a configuration that could overflow the interrupt capabilities of the control domain:
1. There are (256 * # threads in domain) available slots for outstanding interrupts.
2. Each I/O bridge consumes 64 of these potential interrupts.
3. Each LDC consumes two interrupts.
4. The control domain allocates 12 LDCs for various communication purposes with the Hypervisor, Fault Management Architecture (FMA), and the system controller (SC).
5. The control domain allocates one LDC to every logical domain, including itself, for control traffic.
6. Each virtual I/O service on the control domain consumes one LDC for every connected client of that service.
For example, consider a control domain with four threads that includes both I/O bridges, and 8 additional configured logical domains. Each logical domain needs at a minimum:
Applying the above guidelines yields the following statistics. The numbers on the statistics lines correlate with the numbers on the guidelines.
1. There are 256*4 = 1024 available interrupts.
2. The two I/O bridges consume 128 interrupts, leaving 896 for LDCs.
3 & 4. There are 24 interrupts consumed for standard control domain services, leaving 872.
3 & 5. There are 20 interrupts ((8 domains + 2 LDCs for loopback control channel) * 2 interrupts/LDC) consumed for control traffic to the domains, leaving 852.
3 & 6. There are 48 interrupts (3 services * 2 interrupts/service * 8 domains) consumed for virtual I/O services, leaving 804.
This configuration will not have problems with potential interrupt overload.
Now consider the case where there are 16 domains instead of 8, and the control domain is reduced to a single thread. The equation for the number of spare interrupt slots in this scenario is:
256 - 128 - 24 - 36 - 96 = -28
This configuration has the potential for interrupt overload and should be avoided.
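The arithmetic in the two examples above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function and constant names are assumptions, and the two loopback control LDCs are counted the way the worked example counts them.

```python
# Hypothetical budget calculator that mirrors the guidelines above.
INTERRUPTS_PER_THREAD = 256
INTERRUPTS_PER_IO_BRIDGE = 64
INTERRUPTS_PER_LDC = 2
STANDARD_SERVICE_LDCS = 12   # Hypervisor, FMA, and SC communication
LOOPBACK_CONTROL_LDCS = 2    # as counted in the worked example


def spare_interrupt_slots(threads, io_bridges, guest_domains, services_per_domain):
    """Return the interrupt slots left on the control domain (negative = overload)."""
    available = INTERRUPTS_PER_THREAD * threads
    bridges = INTERRUPTS_PER_IO_BRIDGE * io_bridges
    standard = STANDARD_SERVICE_LDCS * INTERRUPTS_PER_LDC
    control = (guest_domains + LOOPBACK_CONTROL_LDCS) * INTERRUPTS_PER_LDC
    virtual_io = services_per_domain * INTERRUPTS_PER_LDC * guest_domains
    return available - bridges - standard - control - virtual_io


# The two scenarios from the text: 4 threads with 8 domains, 1 thread with 16.
print(spare_interrupt_slots(4, 2, 8, 3))
print(spare_interrupt_slots(1, 2, 16, 3))
```

A negative result indicates a configuration with the potential for interrupt overload.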
The XML format produced by the -x option to the ldm list-constraints subcommand and consumed by the -i option of several ldm subcommands is undergoing changes.
If you generate scripts from the XML format that is currently produced, or edit that XML manually, you will have to update those scripts or edits when the format changes.
An ldm stop-domain command can time out before the domain completes shutting down. When this happens, an error similar to the following is returned by the Logical Domains Manager:
However, the domain could still be processing the shutdown request. Use the ldm list-domain command to verify the status of the domain. For example:
The preceding list shows the domain as active, but the s flag indicates that the domain is in the process of stopping. This should be a transitory state.
The following example shows the domain has now stopped:
Under certain circumstances, the Logical Domains (LDoms) Manager rounds up the requested memory allocation to either the next largest 8 KB or 4 MB multiple. This can be seen in the following example output of the ldm list-domain -l command, where the constraint value is smaller than the actual allocated size:
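Independently of the command output, the round-up arithmetic is ordinary ceiling rounding to a multiple. The following Python sketch is illustrative only. Which granularity applies in a given case is decided by the Logical Domains Manager and is not modeled here, so the caller supplies it; the function name is an assumption.

```python
# Hypothetical illustration of rounding a size up to a multiple.
KB = 1024
MB = 1024 * KB


def round_up(size_bytes, multiple_bytes):
    """Round size_bytes up to the next multiple of multiple_bytes."""
    if size_bytes < 0 or multiple_bytes <= 0:
        raise ValueError("size must be non-negative and multiple must be positive")
    # Integer ceiling division avoids floating-point error on large sizes.
    return -(-size_bytes // multiple_bytes) * multiple_bytes
```

A size that is already an exact multiple is returned unchanged, which is why the constraint and the allocated size differ only in some cases.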
Attempting to reset the system controller while the Logical Domains Manager is running can result in undefined behavior.
Workaround: When running the Logical Domains Manager, power off the host completely before resetting the system controller.
Currently, there are several issues related to dynamic reconfiguration (DR) of virtual CPUs if a logical domain contains one or more cryptographic (mau) units:
Currently, Fault Management Architecture (FMA) diagnosis of I/O devices in a Logical Domains environment might not work correctly. The problems are:
LDom variables for a domain can be specified using any of the following methods:
The goal is that, in all cases, variable updates made using any of these methods always persist across reboots of the domain and are always reflected in any subsequent logical domain configurations saved to the SC.
In Logical Domains 1.0 software, there are a few cases where variable updates do not persist:
When running the factory-default configuration, if you want a variable update to persist across a reboot into the same factory-default configuration, use the eeprom command. If you want it saved as part of a new logical domains configuration saved to the SC, use the appropriate Logical Domains Manager command.
The eeprom(1M) command cannot be used to reset EEPROM values to null in Logical Domains systems. The following example shows what happens if you attempt this:
The same command works correctly on non-Logical Domains systems as shown in this example:
If the Logical Domains Manager stops and then restarts during execution of any Logical Domains Manager ldm command, the program returns the following error message:
Recovery: This message usually indicates the command did not successfully complete. Verify that is the case, then reissue the command if appropriate.
This section summarizes the bugs that you may encounter when using this version of the software. The bug descriptions are in numerical order by bug ID. If a recovery procedure and a workaround are available, they are specified.
When the Fault Management Architecture (FMA) places a CPU offline, it records that information so that when the machine is rebooted, the CPU remains offline. The offline designation persists in a non-Logical Domains environment.
In a Logical Domains environment, however, this persistence is not always maintained. The Logical Domains Manager does not currently record data on fault events sent to it. This means that a CPU which has been marked as faulty, or one that was not allocated to a logical domain at the time the fault event is replayed, could subsequently be allocated to another logical domain with the result that it is put back in service.
The Solaris 10 OS virtual disk drivers (vdc and vds) currently do not support the CDIO(7I) ioctls that are needed to install from DVDs.
Refer to "Operating the Solaris OS With Logical Domains" in Chapter 5 of the Logical Domains (LDoms) 1.0 Administration Guide for specific information.
Running a snoop session on a physical interface (for example, the e1000g0 which the virtual switch has been instructed to use) before the driver has attached, and then canceling the snoop session after the driver has attached, can cause the system to panic with a recursive rw_enter panic.
This is an issue only if the virtual switch has been explicitly unloaded previously by use of the rem_drv(1M) command.
Recovery: Reboot the domain containing the virtual switch.
Workaround: Run snoop only after the virtual switch has attached.
If you send a break on the guest console using the ~# option of the virtual network terminal server daemon, vntsd(1M), and type r for reboot, the guest hangs with no response from the domain.
Recovery: Stop and start the guest domain from the control domain using the ldm stop-domain and ldm start-domain commands.
Workaround: Issue a reboot at the command line from within the guest domain.
Under heavy network loads, one CPU might show 100% utilization dealing with the network traffic.
Workaround: Attach several CPUs to the domain containing the virtual switch to ensure that the system remains responsive under a heavy load.
Messages like the following might be seen on the console every time the Logical Domains Manager starts:
Workaround: Ignore the warning.
On halting, rebooting, or net installing a guest domain, the service domain containing the virtual switch might panic with the following message.
Once you establish the connection between the virtual switch and the virtual networks in the guest domains, this bug is not an issue.
Recovery: Reboot the service domain containing the virtual switch.
Under rare circumstances, when an ldom variable, such as boot-device, is being updated from within a guest domain by using the eeprom(1M) command at the same time that the Logical Domains Manager is being used to add or remove virtual CPUs from the same domain, the guest OS can hang.
Workaround: Ensure that these two operations are not performed simultaneously.
Recovery: Stop and start the guest OS by using the ldm stop-domain and ldm start-domain commands.
The following messages might be seen on the ALOM-CMT console or in output of the ALOM-CMT showlogs command when the system controller is reset:
Workaround: Ignore these messages as they have no effect on the system.
Under rare circumstances, if a guest domain is rebooted at a time when it is experiencing high interrupt activity, the OS might hang.
Recovery: Stop and start the guest OS by using the ldm stop-domain and ldm start-domain commands.
If too many guest domains are performing I/O to a control or I/O domain, and if that domain is in the middle of panicking, the interrupt request pool of 64 entries will overflow and the system will not be able to save a crash dump. The panic message is as follows:
Following repeated reboots, a domain's user interface can become unresponsive. This happens because of failure to re-establish connection with the service domain. (The system should still respond to network activity, such as the ping(1M) command from a remote system and should also respond to a system abort sequence.)
1. Stop the unresponsive domain using the ldm stop-domain command.
2. Restart the domain using the ldm start-domain command.
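The two recovery steps above can be assembled as argument lists for later execution on the control domain. The following Python sketch only builds the command lines and does not run them. The domain name ldg1 in the usage note is hypothetical, and the force parameter corresponds to the -f option of the ldm stop-domain command.

```python
# Hypothetical helper that builds, but does not execute, the recovery commands.
def restart_domain_commands(domain, force=False):
    """Return the argument lists for stopping and then starting a domain."""
    if not domain:
        raise ValueError("a domain name is required")
    stop = ["ldm", "stop-domain"]
    if force:
        stop.append("-f")  # force the stop when a normal stop does not succeed
    stop.append(domain)
    start = ["ldm", "start-domain", domain]
    return [stop, start]
```

For example, restart_domain_commands("ldg1") yields the stop command followed by the start command. Each list could then be passed to subprocess.run with check=True on the control domain, so that a failed stop prevents the start from being attempted.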
There are some cases where the behavior of the ldm stop-domain command is confusing.
If the Solaris OS is halted on the domain (for example, by using the halt(1M) command) and the domain is at the prompt "r)eboot, o)k prompt, h)alt?", the ldm stop-domain command fails with the following error message:
Workaround: Force a stop by using the ldm stop-domain command with the -f option.
Recovery: If you restart the domain from the kmdb prompt, the stop notification is handled, and the domain does stop.
In a Logical Domains environment, there is no support currently for setting or deleting wide-area network (WAN) boot keys from within the Solaris OS using the ickey(1M) command. All ickey operations fail with the following error:
In addition, WAN boot keys that are set using OpenBoot firmware in logical domains other than the control domain are not remembered across reboots of the domain. In these domains, the keys set from the OpenBoot firmware are only valid for a single use.
The Solaris 10 OS vntsd(1M) command does not validate the listen_addr property in the vntsd command's Service Management Facility (SMF) manifest. If the listen_addr property is invalid, vntsd fails to bind the IP address and exits.
1. Update the SMF listen_addr property with the correct IP address.
2. Refresh vntsd.
3. Restart vntsd.
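Because vntsd does not validate the listen_addr property, it can help to check a candidate value before updating the property. The following Python sketch is illustrative only. The keywords tuple is an assumption about the non-numeric values that vntsd accepts; confirm it against the vntsd(1M) man page for your release.

```python
import ipaddress


# Hypothetical pre-check; the accepted keywords are an assumption, see above.
def is_valid_listen_addr(value, keywords=("localhost", "any")):
    """Return True if value is an accepted keyword or a well-formed IPv4 address."""
    if value in keywords:
        return True
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True
```

The check confirms only that the address is well formed. It cannot confirm that the address is configured on an interface in the domain, which vntsd also needs in order to bind.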
Under certain conditions, domains configured with less than one gigabyte of memory and a large number of virtual network devices can either hang or panic with the following stack trace:
Workaround: Reconfigure the logical domain with either more memory or fewer virtual network devices.
Recovery: Apply the workaround, and reboot the domain.
When a ZFS, SVM, or VxVM volume is exported as a virtual disk to another domain, then the other domain sees that virtual disk as a disk with a single slice (s0), and the disk cannot be partitioned. As a consequence, such a disk is not usable by the Solaris installer, and you cannot install Solaris on the disk.
For example, /dev/zvol/dsk/tank/zvol is a ZFS volume which is exported as a virtual disk from the primary domain to domain1 using these commands:
The domain1 domain sees only one device for that disk (for example, c0d0s0), and there is no other slice for that disk; for example, no device c0d0s1, c0d0s2, c0d0s3, and so on.
Workaround: You can create a file and export that file as a virtual disk. This example creates a file on a ZFS system:
Note - When exporting a ZFS, SVM, or VxVM volume as a virtual disk, be aware that you will have to change your configuration once this bug is fixed, and the instructions for changing the configuration will be provided.
When creating logical domains with virtual switches and virtual network devices, the Logical Domains Manager does not prevent you from creating these devices with the same given MAC address. This can become a problem if the logical domains with virtual switches and virtual networks that have conflicting MAC addresses are in a bound state simultaneously.
Workaround: Ensure that you do not bind logical domains whose vsw and vnet MAC addresses might conflict with another vsw or vnet MAC address.
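Because the Logical Domains Manager does not perform this check, the MAC addresses planned for a configuration can be compared before any domains are bound. The following Python sketch is illustrative only. The domain names, device names, and addresses in the usage note are hypothetical, and the input is a list that you assemble yourself, not output parsed from any command.

```python
from collections import defaultdict


def normalize_mac(mac):
    """Return a MAC address as six lowercase, zero-padded, colon-separated octets."""
    parts = mac.strip().lower().replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    octets = [int(part, 16) for part in parts]  # raises ValueError on bad hex
    if any(not 0 <= octet <= 0xFF for octet in octets):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{octet:02x}" for octet in octets)


def find_mac_conflicts(devices):
    """Return {mac: [(domain, device), ...]} for every MAC address used more than once.

    `devices` is an iterable of (domain, device_name, mac) tuples.
    """
    owners_by_mac = defaultdict(list)
    for domain, device, mac in devices:
        owners_by_mac[normalize_mac(mac)].append((domain, device))
    return {mac: owners for mac, owners in owners_by_mac.items() if len(owners) > 1}
```

For example, passing ("primary", "primary-vsw0", "00:14:4f:fb:9f:0d") and ("ldg1", "vnet0", "0:14:4f:FB:9F:D") reports a conflict, because both normalize to the same address. Normalizing first matters because the same address can be written with or without leading zeros.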
There is currently an option to turn the bypass mode on when adding an I/O bus. (Refer to the description of the ldm add-io command in Appendix A of the Logical Domains (LDoms) 1.0 Administration Guide or the ldm man page for more information.) If you want to change the bypass property of a bus that is already allocated, do one of the following:
Note - Attempting to do an ldm remove-io command immediately followed by an ldm add-io command, so that the bypass mode property changes, causes the Logical Domains Manager to terminate and any delayed reconfiguration in process is cancelled.
If any domains require contiguous memory mappings, be sure to bind those domains before any other domains get bound and then unbound. Otherwise, memory fragmentation could result in discontiguous memory being assigned to the subject domains.
A service domain providing access to a virtual disk whose back end is a Zettabyte File System (ZFS) file may hang when this virtual disk is used by another domain.
Workaround: To prevent this problem, configure the service domain with a sufficient amount of memory. Four gigabytes of memory is shown to be sufficient for most cases.
A misleading error message is returned from certain ldm subcommands that take two or more required arguments, if one or more of those required arguments is missing.
For example, if the add-vsw subcommand is missing the vswitch_name or ldom argument, an error message like the following is returned:
For another example, if the add-vnet command is missing the vswitch_name of the virtual switch service with which to connect:
Recovery: Refer to Appendix A of the Logical Domains (LDoms) 1.0 Administration Guide or the ldm man page for the required arguments of the ldm subcommands.
Under rare circumstances, when a domain is added or removed, without an intervening attempt to connect to the console through telnet, the virtual network terminal server daemon, vntsd(1M), fails to clean up the console state correctly. A subsequent attempt to use this vntsd TCP port for a domain will result in the connection being terminated unexpectedly.
Workaround: Restart vntsd(1M):
As part of the normal processing of canceling a delayed reconfiguration operation using the ldm remove-reconf command, the Logical Domains Manager exits. The Logical Domains Manager relies on the Service Management Facility (SMF) to restart it. This causes the following message to be returned:
Recovery: Verify that the Logical Domains Manager did exit as part of the normal operation of the cancel operation, by looking in the Logical Domains Manager daemon (ldmd) log in /var/svc/log/ldoms-ldmd:default.log for this message:
Workaround: Ignore the message.
If an ldm stop-domain -f command is used, a subsequent ldm unbind-domain command can result in a panic of the service domain running the virtual console concentrator (vcc) and the virtual network terminal server daemon (vntsd), as it fails to properly close connections as a result of the unbind operation and sends the following message:
Following is a typical stack trace:
panic[cpu3]/thread=300096f1620: recursive mutex_enter, lp=3000c4f0548 owner=300096f1620 thread=300096f1620
Recovery: Reboot the service domain.
Workaround: Before issuing the ldm stop-domain -f command, ensure that all console telnet connections to the guest have been closed.
In a service domain, disks which are managed by Veritas Dynamic Multipathing (DMP) cannot be exported as virtual disks to other domains. If a disk which is managed by Veritas DMP is added to a virtual disk server (vds) and then added as a virtual disk to a guest domain, the domain is unable to access and use that virtual disk. In such a case, the service domain reports the following errors in the /var/adm/messages file after binding the guest domain:
Recovery: If Veritas Volume Manager (VxVM) is installed on your system, disable Veritas DMP for the disks you want to use as virtual disks.
Under rare conditions, when a guest boots, it fails to establish a connection with the virtual switch (vsw). As a consequence, it is not able to send network packets to the domain containing the virtual switch and, hence, to the outside world.
For example, if the guest is using DHCP to obtain its IP address, you might see a message similar to the following on the guest console:
Communications with other guest domains are unaffected. This is only a potential problem when booting a guest domain. Once the guest domain has booted the operating system and is sending network packets, then this is no longer an issue.
Workaround: Stop and unbind the guest domain, and then re-bind and restart it. If this fails to resolve the problem, then stop, unbind the guest domain, and remove and re-add the virtual switch driver before rebinding and restarting the guest domain.
If a command is entered on the console that causes a continuous stream of characters to be output, and on another terminal the ldm add-config command is entered to store a logical domain configuration on the SC, the Logical Domains Manager might time out and the command would fail. The following message is returned by the Logical Domains Manager when this happens:
Recovery: Try the ldm add-config command again after the console output completes or is interrupted.
Workaround: The problem only occurs if you are actually connected to the console from the SC, which is accomplished using the ALOM console command. Disconnect from the console, using the documented escape sequence of #. to return to the SC console, and then the problem is no longer present.
A service domain can panic if a guest domain is using a virtual disk backed by a file, and the service domain has an I/O error while accessing the file. The service domain crashes only if the following conditions are present:
The I/O error is usually caused by a hardware problem; for example, a disk error or storage access problem.
Recovery: Fix the I/O problem in the service domain; for example, change the faulty disk.
If a virtual disk in a guest domain is not properly configured, then an attempt to boot can take about three minutes to time out. During this time-out period, the system might look hung as no progress indicator is displayed in the console.
Because of the way that the Solaris Crypto Framework handles CPU dynamic reconfiguration (DR) events that affect crypto units, CPU DR is disabled for all logical domains that have any crypto units bound.
Workaround: To use CPU DR on the control domain, all the crypto units must be removed from the control domain before saving a new configuration to the SC and while the system is running in the factory-default configuration. To perform CPU DR on all other domains, stop the domain first so the domain is in a bound state.
When a domain is configured with more than 32 virtual I/O devices or services, some of the devices fail to get configured correctly. When this problem is encountered, and the impacted device is a virtual disk server, it can manifest as a hang during boot.
This problem is not encountered if there are no more than 32 virtual I/O devices or services in a domain. The number of devices in a logical domain can be determined from the device tree using the following command:
Recovery: Stop and unbind the domain with more than 32 virtual I/O devices or services. Reconfigure the domain to contain no more than 32 virtual I/O devices or services, and then bind and start the domain.
When booting multiple logical domains in parallel, one of the logical domains may become completely unresponsive early in the boot process. Attempts to send the break command to the logical domain will have no effect, and processing in the domain will not make any forward progress. The logical domain enters this state because it is incorrectly waiting for a message from the Logical Domains Manager that will never arrive.
Recovery: You can force the Logical Domains Manager to send the unresponsive logical domain the type of message it is expecting. Use the ldm set-var command to set a Logical Domains variable for the logical domain. The actual value of the variable is not important; for example, you could use the following:
Once the logical domain processes the message generated by the ldm set-var command, the boot should continue normally.
When adding the first virtual network (vnet) device to a logical domain, the MAC address of the domain, as contained in the system banner, changes. As a further consequence, the host ID of the domain also changes.
Recovery: To perform operations like JumpStart, make sure to specify the MAC address of the interface over which the netboot will occur, and not the system MAC address. In addition, any software that is dependent on the host ID should be configured with the final, resultant host ID after the domain has been completely configured.
If the ldm stop-domain command has a valid logical domain name followed by an invalid logical domain name, the command fails with the following error message:
A subsequent command, such as ldm stop-domain, can cause the Logical Domains Manager to stop, and the command returns the following error message:
Recovery: The Logical Domains Manager restarts automatically, and you can retry the command.
When a read or write I/O operation on a virtual disk takes a long time to complete, and an ioctl is issued on the same virtual disk, it can trigger a virtual disk hang with the following console message:
This problem occurs when a disk managed by Solaris I/O Multipathing software is exported as a virtual disk and a storage or path failure forces the multipathing software to switch the disk access to use another path. Under these circumstances a pending read or write I/O operation can take a long time to complete. At the same time, if a format(1M) or prtvtoc(1M) command is issued on that virtual disk, a hang might result.
Recovery: If the access to the virtual disk hangs, reboot the domain using that virtual disk.
Workaround: Avoid doing ioctl operations on disks that have active read or write I/O operations. For example, do not use the format(1M) or prtvtoc(1M) command on a virtual disk when the disk is mounted and read/write operations are being executed on the disk.
Issuing the ldm start-domain command with the -i xml-file option causes the Logical Domains Manager to stop, resulting in the following error message:
The Logical Domains Manager does restart automatically, and the domain is created and started successfully.
If the amount of memory allocated to a logical domain does not meet the minimum size required by the OpenBoot PROM (currently set to 12 MB, but could change), the Logical Domains Manager silently increases the allocation to meet the minimum.
When the Solaris OS reboot(1M) command is issued to reboot a guest OS, the following messages can appear on the guest console:
The reboot proceeds as usual, but all arguments passed to the OpenBoot PROM boot command; that is, arguments which appear after the -- delimiter of the Solaris OS reboot(1M) command, are ignored by the boot code. The same warnings can occur even if no arguments are passed to the reboot command, because the system always attempts to store a default boot command.
Recovery: Once this occurs, there is no recovery.
Workaround: To prevent it from happening on future boots, you can do one of the following:
The virtual disk server opens the physical disk exported as a virtual disk device at the time of the bind operation. In certain cases, a recovery operation on the physical disk following a disk failure may not be possible if the guest domain is bound.
For instance, when a RAID or a mirror Solaris Volume Manager (SVM) volume is used as a virtual disk by another domain, and if there is a failure on one of the components of the SVM volume, then the recovery of the SVM volume using the metareplace command or using a hot spare does not start. The metastat command shows the volume as resynchronizing, but there is no progress in the synchronization.
Similarly, when a Fibre Channel Arbitrated Loop (FC_AL) device is used as a virtual disk, you must use the Solaris OS luxadm(1M) command with a loop initialization primitive sequence (forcelip subcommand) to reinitialize the physical disk after unbinding the guest.
Recovery: To complete the recovery or SVM resynchronization, stop and unbind the domain using the SVM volume as a virtual disk. Then resynchronize the SVM volume using the metasync command.
When a new virtual device is added to a logical domain, it can fail to establish a connection with the virtual switch device. This results in loss of network connectivity to and from the logical domain. When this error is encountered, on inspection, it will reveal that the /dev/vnetN symbolic link for the virtual network instance is missing.
If present, and not in error, the link points to a corresponding /devices entry as shown here:
Recovery: Do one of the following:
If Sun Cluster software is in use with Logical Domains software, and the cluster is shut down, the console of each logical domain in the cluster displays the following prompt:
If the ok prompt (o option) is selected, the system can panic.
Use this procedure only for the primary domain.
1. Issue the following ALOM command to reset the domain:
The OpenBoot banner is displayed on the console:
2. Issue the following ALOM command to send a break to the domain immediately after the OpenBoot banner displays.
The logical domain immediately drops to the ok prompt.
Use this procedure for all logical domains, except the primary domain.
1. Issue the following command from the control domain to disable the auto-boot? variable for the logical domain:
2. Issue the following command from the control domain to reset the logical domain:
The logical domain stops at the ok prompt.
3. Issue the following OpenBoot command to restore the value of the auto-boot? variable:
If a guest domain is running the Solaris 10 OS and using a virtual disk built from a ZFS volume provided by a service domain running Solaris Express or OpenSolaris, then the guest domain might not be able to access that virtual disk.
The same problem can occur with a guest domain running Solaris Express or OpenSolaris using a virtual disk built from a ZFS volume provided by a service domain running Solaris 10 OS.
Workaround: Be sure the guest domain and the service domain are running the same version of Solaris software (Solaris 10 OS, Solaris Express, or OpenSolaris).
Rebooting the primary domain while there are other active or bound domains, as well as rebooting an I/O domain, is not supported in this release of Logical Domains firmware. If, when in a configuration with multiple active or bound domains, you inadvertently reboot either the primary domain or any I/O domain, or if any such domain panics, the I/O devices owned by that domain are left in an undefined state. In addition, there is a possibility that the following error messages appear during the reboot:
Recovery: If this occurs (regardless of whether the above errors are seen), the only recovery is to perform a clean power off and power on of the system. See Cleanly Shutting Down and Power Cycling a Logical Domains System for this procedure.
Workaround: If the primary or other I/O domain needs to be reset, you must power cycle the system. See Cleanly Shutting Down and Power Cycling a Logical Domains System for this procedure.
On a service domain, a file or a device which has been used as a virtual disk might still appear to be in use by the virtual disk server (vds), although it is not in use and no guest domain is running and bound.
In such a case, the ldm list command shows all domains (except the service domain) as inactive; for example:
In contrast, the fuser command shows that devices are still in use by the vds driver; for example:
Recovery: Reboot the system. See Cleanly Shutting Down and Power Cycling a Logical Domains System for more information.
If you upgrade the Logical Domains firmware, and you do not enable the Logical Domains Manager daemon, ldmd, the Solaris 10 OS Fault Management daemon, fmd(1M), on the primary domain appears to hang. The Fault Management daemon attempts to communicate with the Logical Domains Manager daemon, which does not exist or is down. Because each request has a 20-minute time-out, it looks like a hang. If your machine is FMA-clean, you might not experience the problem.
When a memory page of the primary domain is diagnosed as faulty, the Logical Domains Manager retires the page. The fmd command fails to obtain the page status and does not replay the page fault.
When a memory page of a guest domain is diagnosed as faulty, the Logical Domains Manager retires the page in the logical domain. If the logical domain is stopped and restarted again, the page is no longer in a retired state.
The command fmadm faulty -a shows the page from either the primary or guest domain is faulty, but the page is not actually retired. This means the faulty page can continue to generate memory errors.
Currently, the virtual switch (vsw) does not support the use of aggregated network interfaces. If a virtual switch instance is told to use an aggregated device (aggr15 in this example), then a warning message similar to the following appears on the console during boot:
Recovery: Configure the virtual switch to use a supported GLDv3-compliant network interface, and then reboot the domain.
Exporting loopback (lofi) devices as virtual disks is not supported by Logical Domains 1.0 software. Exporting loopback devices can result in unexpected behavior including a system panic. The virtual disk server should be configured to export a disk image file directly.
If you see an error message similar to the following during boot, your serial port will be unusable:
There will be no corresponding device in either the OpenBoot device tree or the Solaris device tree, and no serial device drivers will be attached.
Recovery: Reset or power cycle the system.
The Logical Domains Manager does not persist physical I/O constraints in its constraint database. As a result, if the Logical Domains Manager restarts, then logical domains in the inactive state have lost any previously specified physical I/O constraints.
Recovery: Re-add the constraint.
On a system configured to use the Network Information Services (NIS) or NIS+ name service, if the Solaris Security Toolkit is applied with the server-secure.driver, NIS or NIS+ fails to contact external servers. A symptom of this problem is that the ypwhich(1) command, which returns the name of the NIS or NIS+ server or map master, fails with a message similar to the following:
This is true whether the Solaris Security Toolkit is applied indirectly through the ldm-install script menu options or applied directly using this command:
The recommended Solaris Security Toolkit driver to use with the Logical Domains Manager is ldm_control-secure.driver, and NIS and NIS+ work with this recommended driver.
As an alternative, NIS and NIS+ do work with the server-secure.driver by following the steps at the end of this file:
Here are those steps quoted from the end of the file:
"If you are using NIS as your name service, you will need to allow name resolution to pass through your firewall. This is not possible with only ipf.conf, since NIS is an rpc service without a fixed port. Instead, use the proxy in ipnat to redirect rpc traffic, with rules like
map eri0 0/0 -> 220.127.116.11/32 proxy port 111 rpcbu/udp
in file /etc/ipf/ipnat.conf (replace "eri0" with your network adapter instance and "220.127.116.11" with the adapter's IP address, both from "ifconfig -a")."
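The substitution described in the quoted steps can be sketched as a small helper that fills the adapter instance and IP address into the rule text. This is only an illustration: the function name ipnat_rpc_rule and the sample values bge0 and 192.0.2.10 are assumptions, not taken from the Solaris Security Toolkit file, and the helper does nothing beyond formatting the rule string quoted above.

```python
import ipaddress


def ipnat_rpc_rule(interface: str, address: str) -> str:
    """Format the ipnat.conf rpc proxy rule for one adapter.

    interface and address are the values you would read from "ifconfig -a".
    Raises ValueError if the address is not a valid IPv4 address or the
    interface name is empty or contains whitespace.
    """
    ip = ipaddress.IPv4Address(address)  # AddressValueError is a ValueError
    if not interface or any(ch.isspace() for ch in interface):
        raise ValueError(f"invalid interface name: {interface!r}")
    # Same rule shape as the quoted text, with the two values substituted.
    return f"map {interface} 0/0 -> {ip}/32 proxy port 111 rpcbu/udp"


if __name__ == "__main__":
    # Hypothetical adapter instance and documentation-range address.
    print(ipnat_rpc_rule("bge0", "192.0.2.10"))
```

The resulting line would still need to be placed in /etc/ipf/ipnat.conf by hand, as the quoted steps describe.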
If the server-secure.driver is used on a system configured to use NIS or NIS+, you might fail to log in because either there are no local user accounts or the superuser account requires a password change which cannot be made because NIS or NIS+ fails. If this occurs, you must reset your system, and you do lose your logical domains configuration.
1. Log in to the system console from the system controller, and if necessary, switch to the ALOM mode by typing:
2. Power off the system by typing the following command in ALOM mode:
3. Power on the system.
4. Switch to the console mode at the ok prompt:
5. Boot the system to single user mode:
6. Edit the file /etc/shadow, and change the first line of the shadow file that has the root entry to:
7. You can now log in to the system and do one of the following:
The ldm stop-domain -f command is disabled if the domain has any PCI-Express I/O buses bound to it. This is currently the case with all the platforms supported by Logical Domains 1.0 software. In this case, an error message of the following form is returned:
Due to this restriction, if an I/O domain is unresponsive to console or network input, and is unable to process a domain service shutdown request from the Logical Domains Manager, then there is no way to perform an isolated stop of that domain.
Recovery: Shut down all the other domains, and power cycle the server.
If the time or date on a logical domain is modified, for example using the ntpdate command, the change persists across reboots of the domain but not across a power cycle of the host.
Workaround: For time changes to persist, save the configuration with the time change to the SC and boot from that configuration.
The ldm list-constraints -x ldom command fails to include physical I/O information in its XML output.
Workaround: Currently, the only way to add a physical I/O device is by using the ldm add-io or ldm set-io commands.
When using the ldm list-constraints command with the -x option, all virtual network (vnet) or virtual switch (vsw) devices that had their MAC addresses manually specified will have those addresses incorrectly formatted in the XML output. Any subsequent attempt to create devices using this same XML syntax will fail to create the specified vnet or vsw devices.
Recovery: Verify that the MAC address in the XML file is in colon format (xx:xx:xx:xx:xx:xx) and not in hexadecimal format (0xxxxxxxxxxxxx). Edit the XML file as necessary to correct the MAC address.
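The manual correction can be illustrated with a short conversion routine. This is a minimal sketch, assuming the incorrect form is "0x" followed by up to 12 hexadecimal digits (leading zeros possibly dropped); the function name hex_mac_to_colon and the sample address are hypothetical, and the sketch does not parse or edit the XML file itself.

```python
import re


def hex_mac_to_colon(mac: str) -> str:
    """Return a MAC address in xx:xx:xx:xx:xx:xx form.

    Accepts either colon format or a 0x-prefixed hexadecimal string.
    Raises ValueError for anything else.
    """
    text = mac.strip().lower()
    if re.fullmatch(r"(?:[0-9a-f]{1,2}:){5}[0-9a-f]{1,2}", text):
        # Already colon-separated: only zero-pad each octet.
        return ":".join(part.zfill(2) for part in text.split(":"))
    if text.startswith("0x"):
        digits = text[2:]
        if digits and len(digits) <= 12 and re.fullmatch(r"[0-9a-f]+", digits):
            digits = digits.zfill(12)  # restore any dropped leading zeros
            return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
    raise ValueError(f"unrecognized MAC address format: {mac!r}")


if __name__ == "__main__":
    # Hypothetical address used only for illustration.
    print(hex_mac_to_colon("0x00144ffa1b2c"))
```

Addresses that are already in colon format pass through unchanged apart from zero-padding, so the routine can be applied to every MAC address in the file.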
The following system controller (SC) command only works when running on the factory-default system configuration:
Where script is an OpenBoot command string that is run during OpenBoot firmware initialization.
When using a system configuration created by the Logical Domains Manager, the above SC command has no effect.
The following table shows the expected behavior for the OpenBoot power-off command with Logical Domains 1.0 software.
When the system shows you the following error message, it means that either your network is not set up correctly on that system, or the Logical Domains Manager daemon (ldmd) tried to open up a socket prior to your network fully running:
Recovery: Check to ensure that your network is up and running. If so, stop and restart ldmd by using the following Solaris 10 OS Service Management Facility (SMF) commands:
When attempting to bind a new domain or reconfigure an existing bound or active domain, you may encounter a failure that manifests itself with the error:
This can occur if the machine description (MD) the Logical Domains Manager builds describing the current configuration of the system turns out to be too large for the memory space allocated. When this occurs, the Logical Domains Manager terminates rather than sending an MD to the hypervisor that cannot be instantiated. Also, the Logical Domains Manager log will contain a message of this form:
Recovery: Scale back on either the number of virtual I/O devices or the number of domains configured in the system.
Do not attempt to issue an ldm set-vnet command to modify the MAC address of the virtual network (vnet) device on an active logical domain. The command appears to succeed, but the change is not fully enacted until the domain reboots, and until then, networking over that device could fail to work.
Workaround: Stop the domain, perform an ldm set-vnet command, and start the domain.
If a disk device listed in a guest domain's configuration is being used by software other than the Logical Domains Manager (for example, if it is mounted in the service domain), the disk cannot be used by the virtual disk server (vds), but the Logical Domains Manager does not emit a warning that it is in use when the domain is bound or started.
When the guest domain tries to boot, a message similar to the following is printed on the guest's console:
Recovery: Unbind the guest domain, and unmount the disk device to make it available. Then bind the guest domain, and boot the domain.
During operations in a split-PCI configuration, if a bus is not assigned to any domain, or is assigned to a domain that is not running the Solaris OS, any error in that bus or any other bus might not get logged. Consider the following example:
In a split-PCI configuration the primary domain contains Bus B, and Bus A is not assigned to any domain. In this case, any error that occurs on Bus B might not be logged. (The situation occurs only during a short time period.) The problem resolves when the unassigned Bus A is assigned to a domain and is running the Solaris OS, but by then some error messages may be lost.
Workaround: When using a split-PCI configuration, quickly verify that all buses are assigned to domains and running the Solaris OS.
When using the force (-f) option to the ldm stop-domain command, the domain could panic with one of the following two signatures the next time the domain is started:
Recovery: The condition that causes the panic is not persistent. After the panic, the logical domain can be restarted normally.
Workaround: Whenever possible, do not use the force (-f) option to the ldm stop-domain command. If it is absolutely necessary to use the force option, unbind the logical domain before restarting it.
The ldm add-vdisk command issued while a logical domain is in the bound state does not take effect, even though the ldm command appears to succeed, and the virtual disk (vdisk) appears in a subsequent ldm list-domain command.
Recovery: To make the added virtual disk visible to the system, issue the ldm unbind-domain ldom command followed by the ldm bind-domain ldom command, where ldom is the domain to which the vdisk was added.
Workaround: Issue the ldm add-vdisk command only to a domain that is in the inactive state.
Running the OpenBoot command watch-net-all on any domain that contains a virtual network (vnet) can result in one of the following errors:
Recovery: Reset the domain before continuing. If the domain contains physical I/O devices, power-cycle the domain before continuing.
Workaround: Avoid using the command watch-net-all on domains that contain virtual networks.
During wanboot or waninstall, the time it takes to download the miniroot can increase significantly when booting from a virtual network (vnet) device. Early tests showed miniroot download to be 5 to 6 times slower on a guest domain.
The following message appears at the ok prompt if an attempt is made to boot a guest domain that contains Emulex-based fibre channel host adapters (Sun Part # 375-3397):
These adapters are not supported in a split-PCI configuration on Sun Fire T1000 servers.
A logical domain that is not the primary domain and has no virtual I/O devices might fail to respond to configuration change messages from the Logical Domains Manager. A domain in this state also might not support such services as dynamic reconfiguration (DR) on a CPU or domain shutdown requests. This situation, in most cases, is encountered only by an I/O domain, because only a domain with physical I/O devices (that is, an I/O domain) is likely to be configured with no virtual I/O devices.
A domain in a bound or running state encounters this situation if the domain is subsequently unbound and rebound, and there was an intervening Logical Domains Manager restart. The following conditions induce a Logical Domains Manager restart:
One way to tell if the domain is in this degraded state is by determining if the Logical Domains Manager has assigned logical domain channel (LDC) 0 to the domain's console. You can obtain this information by issuing the following command:
If the output shows that the console has been allocated LDC 0 (as shown in the preceding example), the situation has been triggered.
Recovery: Once in this state, the only recovery is to destroy and then recreate the domain. When recreating the domain, also be sure to apply the workaround described below to prevent any future recurrence.
Workaround: Always make sure each domain includes at least one virtual I/O client before it is first bound. If you have no need for a virtual I/O client, you can create a fictitious disk device as shown in the following procedure.
1. Use the Logical Domains Manager to create a virtual disk service on one of your logical domains.
2. On the domain hosting the virtual disk server, create a disk file to export.
3. Export the file as a virtual disk.
4. Create the fictitious virtual disk device on the domain lacking any virtual I/O client.
If the Logical Domains Manager database is deleted or otherwise lost, and the configuration includes virtual consoles that are bound to TCP ports outside the default range of 5000-5100, the Logical Domains Manager refuses to start.
Recovery: If the Logical Domains Manager is still running when the database is lost, perform any reconfiguration operation (for example, create a fictitious LDom variable) to cause a new database file to be created. If the Logical Domains Manager is not running, you must revert to the factory-default configuration and re-create the previous operating configuration to re-synchronize the Logical Domains Manager database with the configuration.
Workaround: To prevent this problem from occurring on the loss of the Logical Domains Manager database, follow both of these restrictions:
1. Do not configure any virtual console concentrator (vcc) device to allocate TCP ports outside the 5000-5100 range.
2. Do not configure more than one virtual console concentrator service per logical domain.
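The first restriction amounts to a simple bounds check on any port range you plan to give a virtual console concentrator. The sketch below assumes the range is expressed as a start and end port; the function name port_range_is_safe and the constant names are hypothetical, and the check does not query the Logical Domains Manager.

```python
# Default virtual console TCP port range named in these release notes.
DEFAULT_PORT_MIN = 5000
DEFAULT_PORT_MAX = 5100


def port_range_is_safe(start: int, end: int) -> bool:
    """Return True if start..end is a valid range inside 5000-5100."""
    return DEFAULT_PORT_MIN <= start <= end <= DEFAULT_PORT_MAX


if __name__ == "__main__":
    print(port_range_is_safe(5000, 5100))  # inside the default range
    print(port_range_is_safe(5000, 5200))  # extends past 5100
```

A planned range that fails this check would put the configuration at risk if the database is later lost.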
If you upgrade the Solaris OS image to a later version on any of the logical domains of a Logical Domains 1.0 system, problems with virtual networking might result. For example, if you execute an ldm add-config command with active domains after already booting with the active domains configured, the next time the system is restarted, those active guest domains will not have network support.
Recovery: Return to the last good logical domain configuration on the SC, unbind all domains, and rebind them. Then, use the ldm add-config command to set the new configuration.
Workaround: None. However, always check for and install the latest Logical Domains Manager patches before upgrading the Solaris OS on a domain.
When a guest domain is configured with a virtual disk that is backed by a non-existent storage device, the domain can hang either during a reconfiguration boot or while running the devfsadm(1M) command. The error occurs because the virtual disk driver fails to detach properly following an attach failure.
Workaround: Add more than one CPU to the domain.
Recovery: Unconfigure or replace the non-existent disk device with a valid disk device and reboot the domain.
In some cases, when a file is used as a virtual disk, the label of that virtual disk can be lost when rebinding a domain (ldm bind-domain command) using that file (or a copy of that file) as a virtual disk.
Workaround: To prevent this problem, run the file checksum (fcksum) script on any file that has been used as a virtual disk and for which the disk label or disk partitioning has been changed. The fcksum script follows:
Note - It is easier to cut and paste this script from an HTML file than a PDF file. Both formats of these release notes are available at the web site specified in Location of Documentation.
The fcksum script checks to see whether the label will be correctly validated during the next ldm bind-domain command, and if not, the script will change the label and its checksum so that it can be correctly validated.
Run the script right after the domain using the virtual disk is unbound (ldm unbind-domain command) for the first time.
For example, if the file-name file is used by a domain as a virtual disk, and if the Solaris system is being installed onto that virtual disk, then run the script after the first ldm unbind-domain command on that domain.
Run the fcksum script on any file that has been used as a virtual disk and for which the disk label or disk partitioning has been changed.
The script first backs up the existing label of the file in a file named label.file-name.day_time. Then one of the following occurs:
A system in which the virtual switch has been configured to use the bge network interface can trigger the watchdog time-out under heavy network load conditions. This often happens when the CPU count in guest domains running network intensive workloads is significantly larger than the number of CPUs in the service domain.
Even though watchdog time-outs do not cause a system to reset, the system does become progressively more non-responsive. A message similar to the following may also appear on the console:
If the watchdog message is seen, or you want to run network intensive loads in the guest domain, apply the following workaround. However, note that doing so may result in a slight degradation of network performance under certain loads.
Workaround: Set the following in the /etc/system file, and reboot the service domain.
Recovery: Apply the workaround, and power cycle the system.
When the primary domain is reconfigured from its factory-default settings, the domain's time might be incorrectly reset when the domain is rebooted into the new configuration. This change can cause various problems, including delayed or improper Fault Management Architecture (FMA) fault diagnosis.
Workaround: Before changing the factory configuration, explicitly set the date before the new configuration is stored on the SC. The following Solaris command, which sets the date to its current value, is sufficient to avoid this situation:
Sun Fire T1000 and T2000 servers with Intel PCI-Express network interfaces, and installed with early Solaris 10 OS releases, might have been configured to use the ipge Ethernet driver. This driver was a temporary support mechanism for these interfaces in the early releases of the Solaris 10 OS and has now been superseded by the Sun standard GLDv3-compliant e1000g driver. Additionally, in an LDoms environment the virtual switch must be configured to use the GLDv3-compliant e1000g driver.
Systems freshly installed with the Solaris 10 11/06 OS or later are automatically configured to use the e1000g driver exclusively. However, when these systems are upgraded from an earlier Solaris 10 OS release to the Solaris 10 11/06 OS, you must manually convert them to use the Sun standard e1000g network driver.
Refer to SunSolve Doc ID 102502 for more information on enabling the systems to use e1000g drivers.
If you do not replace the ipge driver with the e1000g driver and configure the virtual switch to use the e1000g device, you will lose network connectivity to guest domains. A warning message similar to the following might appear on the console of the service domain.
Recovery: Update the system to use the e1000g driver. Configure the virtual switch to use the e1000g driver.
If SunVTS is started and stopped multiple times, switching from the SC console to the host console with the console SC command can cause either of the following messages to be emitted repeatedly on the console:
Recovery: Reset the SC using the resetsc command.
This section of the release notes describes errors in the Logical Domains 1.0 documentation.
The Solaris 10 11/06 OS vntsd(1M) man page is missing information about the use of the double tilde (~~). A tilde (~) appearing as the first character of a line is an escape signal that directs the console to perform a special console command. When connected to the console using telnet from within another telnet session, use the tilde-tilde (~~) sequence to output a tilde to the domain's console.