Chapter 1
These release notes describe the changes in this release, the supported platforms, a matrix of required software and patches, and other pertinent information about this release, including bugs that affect Logical Domains 1.0.1 software.
The major changes for this release of Logical Domains 1.0.1 software are to provide support for:
Sun SPARC® Enterprise T5120 and T5220 Servers plus the Network Interface Unit (NIU)
Logical Domains (LDoms) Management Information Base (MIB) 1.0.1 software – Refer to the Logical Domains (LDoms) MIB 1.0.1 Administration Guide for more information.
Logical domains minimization – See “Minimizing Logical Domains” in the Logical Domains (LDoms) 1.0.1 Administration Guide for more information.
XML input and output enhancements for certain ldm commands and for the LDoms MIB.
Logical Domains (LDoms) Manager 1.0.1 software is supported on the following platforms:
The minimum hardware revision level of the Netra CP3060 Blade to support this LDoms 1.0.1 software release:
This hardware upgrade resulted from LDoms 1.0.1 software requirements described in Bug ID 6584875.
This section lists the required, recommended, and optional software for use with Logical Domains software.
Following is a matrix of required software for use with Logical Domains software.
Solaris Security Toolkit 4.2 software – This software can help you secure the Solaris OS in the control domain and other domains. Refer to the Solaris Security Toolkit 4.2 Administration Guide and Solaris Security Toolkit 4.2 Reference Manual for more information.
Logical Domains (LDoms) Management Information Base (MIB) 1.0.1 software – This software can help you enable third-party applications to perform remote monitoring and a few control operations. Refer to the Logical Domains (LDoms) MIB 1.0.1 Administration Guide and Release Notes for more information.
Following are the required patches for Solaris 10 11/06 OS for use with Logical Domains software:
124921-02 at a minimum, which contains updates to the Logical Domains 1.0.1 drivers and utilities. Logical Domains networking will be broken without this patch.
125043-01 at a minimum, which contains updates to the console (qcn) drivers. This patch depends on kernel update (KU) 118833-36; if that patch is not already installed on your system, you must install it as well.
Following are the required system firmware patches at a minimum for use with Logical Domains software on supported servers:
The Logical Domains (LDoms) 1.0.1 Administration Guide and Logical Domains (LDoms) 1.0.1 Release Notes can be found at:
The Beginners Guide to LDoms: Understanding and Deploying Logical Domains can be found at the Sun BluePrints site.
In a Logical Domains environment, the virtual switch service running in a service domain can interact directly with GLDv3-compliant network adapters. Although non-GLDv3-compliant network adapters can be used in these systems, the virtual switch cannot interface with them directly. Refer to “Configuring Virtual Switch and Service Domain for NAT and Routing” in the Logical Domains (LDoms) 1.0.1 Administration Guide for information about how to use non-GLDv3-compliant network adapters.
The following adapters with their corresponding drivers are supported by the virtual switch on the Sun Fire and SPARC Enterprise T2000 servers:
The following adapters with their corresponding drivers are supported by the virtual switch on the Sun SPARC Enterprise T5120 and T5220 servers:
The following cards are not supported for this LDoms 1.0.1 software release:
The following bug IDs are filed to provide the support for the currently unsupported cards: 6552598, 6563713, 6589192, and 6598882.
Logical Domains software does not impose a memory size limitation when creating a domain. The memory size requirement is a characteristic of the guest operating system. Some Logical Domains functionality might not work if the amount of memory present is less than the recommended size. For recommended and minimum memory requirements, refer to the installation guide for the operating system you are using. For the Solaris 10 11/06 OS, 512 megabytes is the recommended memory size to install or upgrade, and 128 megabytes is the minimum size. The default size for a swap area is 512 megabytes. For more information, refer to "System Requirements and Recommendations" in the Solaris 10 11/06 Installation Guide: Planning for Installation and Upgrade.
The OpenBoot PROM has a minimum size restriction for a domain. Currently, that restriction is 12 megabytes. If you create a domain smaller than that, the Logical Domains Manager automatically boosts the size of the domain to 12 megabytes. Refer to the release notes for your system firmware for information about memory size requirements.
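The sizing rules above can be summarized in a short Python sketch. The constants are the figures quoted in these notes for the Solaris 10 11/06 OS and the OpenBoot minimum; the function names and status strings are hypothetical, and the sketch ignores any rounding the Logical Domains Manager may also apply to a memory allocation.

```python
# Memory figures quoted in these release notes, in megabytes.
OBP_MINIMUM_MB = 12            # OpenBoot PROM minimum domain size
SOLARIS_MINIMUM_MB = 128       # Solaris 10 11/06 minimum to install or upgrade
SOLARIS_RECOMMENDED_MB = 512   # Solaris 10 11/06 recommended size


def effective_memory_mb(requested_mb: int) -> int:
    """Return the domain size after the 12 MB OpenBoot boost is applied."""
    if requested_mb <= 0:
        raise ValueError("requested memory must be positive")
    return max(requested_mb, OBP_MINIMUM_MB)


def solaris_install_status(requested_mb: int) -> str:
    """Classify a domain size against the Solaris 10 11/06 guidance (hypothetical labels)."""
    size = effective_memory_mb(requested_mb)
    if size < SOLARIS_MINIMUM_MB:
        return "below minimum"
    if size < SOLARIS_RECOMMENDED_MB:
        return "below recommended"
    return "meets recommendation"


if __name__ == "__main__":
    for request in (8, 256, 1024):
        print(request, effective_memory_mb(request), solaris_install_status(request))
```

A domain requested at 8 megabytes is therefore created at 12 megabytes, but it is still far too small to install the Solaris OS.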
This section details the software that is compatible with and can be used with the Logical Domains software in the control domain.
SunVTS 6.4 functionality is available in the control domain and guest domains on LDoms 1.0.1–enabled Sun SPARC Enterprise T5120 and T5220 Servers.
SunVTS 6.4 functionality is available in the control domain and guest domains on LDoms 1.0–enabled Sun Fire and SPARC Enterprise T1000 Servers and Sun Fire and SPARC Enterprise T2000 Servers.
SunVTS 6.3 functionality is available for all hardware configured in the control domain on Sun Fire and SPARC Enterprise T1000 servers and Sun Fire and SPARC Enterprise T2000 servers with LDoms 1.0 software enabled. If you attempt to run SunVTS 6.3 software in a guest domain, it exits after printing a message.
SunVTS is Sun’s Validation Test Suite, which provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and proper functioning of most hardware controllers and devices on Sun servers. For more information about SunVTS, refer to the SunVTS User’s Guide for your version of SunVTS.
Sun Management Center 3.6 Version 6 Add-On Software can be used only on the control domain with the Logical Domains Manager software enabled. Sun Management Center is an open, extensible system monitoring and management solution that uses Java and a variant of the Simple Network Management Protocol (SNMP) to provide integrated and comprehensive enterprise-wide management of Sun products and their subsystem, component, and peripheral devices. Support for hardware monitoring within the Sun Management Center environment is achieved through the use of appropriate hardware server module add-on software, which presents hardware configuration and fault reporting information to the Sun Management Center management server and console. Refer to the Sun Management Center 3.6 Version 6 Add-On Software Release Notes: For Sun Fire, SunBlade, Netra, and SunUltra Systems for more information about using Sun Management Center 3.6 Version 6 on the supported servers.
Sun Management Center 3.6 Version 7 Add-On Software adds support for the Sun SPARC Enterprise T5120 and T5220 servers and contains bug fixes for previous releases. This software can be used on the control domain with the Logical Domains Manager 1.0.1 software enabled. Refer to the Sun Management Center 3.6 Version 7 Add-On Software Release Notes: For Sun Fire, SunBlade, Netra, and SunUltra Systems for more information about using Sun Management Center 3.6 Version 7 on the supported servers.
Sun Explorer 5.7 Data Collector can be used with the Logical Domains Manager 1.0.1 software enabled on the control domain. Sun Explorer is a diagnostic data collection tool. The tool comprises shell scripts and a few binary executables. Refer to the Sun Explorer User’s Guide for more information about using the Sun Explorer Data Collector.
Solaris Cluster software can be used only on an I/O domain, because it works only with the physical hardware, not the virtualized hardware. Refer to Sun Cluster documentation for more information about the Sun Cluster software.
The following system controller software interacts with the Logical Domains 1.0.1 software:
Sun Integrated Lights Out Manager (ILOM) 2.0 firmware is the system management firmware you can use to monitor, manage, and configure Sun UltraSPARC T2-based server platforms. ILOM is preinstalled on these platforms and can be used on the control domain on LDoms-supported servers with the Logical Domains Manager 1.0.1 software enabled. Refer to the Sun Integrated Lights Out Manager 2.0 User’s Guide for features and tasks that are common to Sun rackmounted servers or blade servers that support ILOM. Other user documents present ILOM features and tasks that are specific to the server platform you are using. You can find the ILOM platform-specific information within the documentation set that accompanies your system.
Advanced Lights Out Manager (ALOM) Chip Multithreading (CMT) Version 1.3 software can be used on the control domain on UltraSPARC® T1-based servers with the Logical Domains Manager 1.0.1 software enabled. Refer to “Using LDoms With ALOM CMT” in the Logical Domains (LDoms) 1.0.1 Administration Guide. The ALOM system controller enables you to remotely manage and administer your supported CMT servers. ALOM enables you to monitor and control your server either over a network or by using a dedicated serial port for connection to a terminal or terminal server. ALOM provides a command-line interface that you can use to remotely administer geographically distributed or physically inaccessible machines. For more information about using ALOM CMT Version 1.3 software, refer to the Advanced Lights Out Management (ALOM) CMT v1.3 Guide.
Netra Data Plane Software Suite 1.1 is a complete board software package solution. The software provides an optimized rapid development and runtime environment on top of multistrand partitioning firmware for Sun CMT platforms. The Logical Domains Manager contains some ldm subcommands (add-vdpcs, rm-vdpcs, add-vdpcc, and rm-vdpcc) for use with this software. Refer to the Netra Data Plane Software Suite 1.1 documentation for more information about this software.
This section contains general notes and issues concerning the Logical Domains 1.0.1 software.
Currently, there is a limit of 8 configurations for logical domains that can be saved on the system controller using the ldm add-config command, not including the factory-default configuration.
Currently, system firmware 6.4.x and 6.5.x do not support the following features on Netra T2000 Servers:
scadm(1M) command, which administers the system controller (SC)
sun4u-compatible Platform Information and Control Library (PICL)
sun4u-compatible prtdiag(1M) command, which displays system diagnostic information
The following bug IDs are still outstanding to add this support:
If you reboot the control domain while guest domains are running, you will encounter the following bugs:
Virtual Disk Service Should Support Unformatted Disks (Bug ID 6575050)
Guests Can Lose Access to Virtual Disk Services if I/O Domain Is Rebooted (Bug ID 6575216)
Guest Domain Can Lose Connection to the Virtual Switch When the Service Domain is Rebooted (Bug ID 6581720)
Virtual Disk Server Prints File Lookup Error During Service Domain Boot (Bug ID 6591399)
If you have made any configuration changes since last saving a configuration to the SC, before you attempt to power off or power cycle a Logical Domains system, make sure you save the latest configuration that you want to keep.
There is a limit to the number of logical domain channels (LDCs) available in any logical domain. In Logical Domains 1.0.1 software, that limit is 256. In practice, this limit becomes an issue only on the control domain, because the control domain has at least part, if not all, of the I/O subsystem allocated to it, and because of the potentially large number of LDCs created both for virtual I/O data communications and for the Logical Domains Manager's control of the other logical domains.
If you try to add a service, or bind a domain, so that the number of LDCs exceeds the 256 limit on the control domain, the operation fails with an error message similar to the following:
13 additional LDCs are required on guest primary to meet this request, but only 9 LDCs are available
The following guidelines can help prevent creating a configuration that could overflow the LDC capabilities of the control domain:
The control domain allocates 12 LDCs for various communication purposes with the hypervisor, Fault Management Architecture (FMA), and the system controller (SC), independent of the number of other logical domains configured.
The control domain allocates one LDC to every logical domain, including itself, for control traffic.
Each virtual I/O service on the control domain consumes one LDC for every connected client of that service.
For example, consider a control domain and 8 additional logical domains. Each logical domain needs at a minimum:
Applying the above guidelines yields the following results (numbers in parentheses correspond to the preceding guideline number from which the value was derived):
12(1) + 9(2) + 8 x 3(3) = 45 LDCs in total
The Logical Domains Manager will accept this configuration.
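Now consider the case where there are 32 domains instead of 8, and each domain includes 3 virtual disks, 3 virtual networks, and a virtual console. In that case, the equation becomes:
12(1) + 33(2) + 32 x 7(3) = 269 LDCs in total
This total exceeds the 256-LDC limit on the control domain, so the Logical Domains Manager will not accept this configuration.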
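The three guidelines can be turned into a quick pre-check before you add services or bind domains. This Python sketch assumes that every virtual disk, virtual network, and virtual console in each guest is served by a service on the control domain, so each device counts as one client under guideline (3). The function names are hypothetical, and the sketch does not account for any other channels a particular configuration might create.

```python
from typing import Sequence

LDC_LIMIT = 256          # per-domain LDC limit in Logical Domains 1.0.1
CONTROL_BASE_LDCS = 12   # guideline (1): hypervisor, FMA, and SC channels


def control_domain_ldcs(clients_per_guest: Sequence[int]) -> int:
    """Estimate the LDCs used on the control domain.

    clients_per_guest[i] is the number of virtual I/O devices in guest i
    that connect to services on the control domain.
    """
    domains = len(clients_per_guest) + 1   # guideline (2): every domain, including the control domain
    clients = sum(clients_per_guest)       # guideline (3): one LDC per connected client
    return CONTROL_BASE_LDCS + domains + clients


def fits_ldc_limit(clients_per_guest: Sequence[int]) -> bool:
    """Return True if the estimate stays within the 256-LDC limit."""
    return control_domain_ldcs(clients_per_guest) <= LDC_LIMIT


if __name__ == "__main__":
    small = [3] * 8    # 8 guests with 3 client devices each
    large = [7] * 32   # 32 guests: 3 vdisks + 3 vnets + 1 vcons each
    for name, config in (("8 guests", small), ("32 guests", large)):
        print(name, control_domain_ldcs(config), fits_ldc_limit(config))
```

The first configuration reproduces the 45-LDC total from the 8-domain example; the second comes to 269 LDCs, which exceeds the limit.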
Under certain circumstances, the Logical Domains (LDoms) Manager rounds up the requested memory allocation to the next larger multiple of either 8 kilobytes or 4 megabytes. This can be seen in the following example output of the ldm list-domain -l command, where the constraint value is smaller than the actual allocated size:
Memory:
    Constraints: 1965 M
    raddr          paddr          size
    0x1000000      0x291000000    1968M
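The rounding is ordinary round-up-to-a-multiple arithmetic. The following Python sketch shows it for both alignments mentioned above. These notes do not say which alignment applies in which circumstance, so the caller chooses it; the helper name is hypothetical. With a 4-megabyte alignment, the 1965M constraint from the example output becomes 1968M, matching the allocated size.

```python
KB = 1024
MB = 1024 * KB

ALIGN_8K = 8 * KB
ALIGN_4M = 4 * MB


def round_up(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Ceiling division without floating point.
    return -(-size_bytes // alignment) * alignment


if __name__ == "__main__":
    requested = 1965 * MB
    print(round_up(requested, ALIGN_4M) // MB)  # prints 1968
    print(round_up(requested, ALIGN_8K) // MB)  # prints 1965 (already an 8-KB multiple)
```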
Currently, there is an issue related to dynamic reconfiguration (DR) of virtual CPUs if a logical domain contains one or more cryptographic (mau) units:
Currently, Fault Management Architecture (FMA) diagnosis of I/O devices in a Logical Domains environment might not work correctly. The problems are:
Input/output (I/O) device faults diagnosed in a non-control domain are not logged on the control domain. These faults are only visible in the logical domain that owns the I/O device.
I/O device faults diagnosed in a non-control domain are not forwarded to the system controller. As a result, these faults are not logged on the SC and there are no fault actions on the SC, such as lighting of light-emitting diodes (LEDs) or updating the dynamic field-replaceable unit identifiers (DFRUIDs).
Errors associated with a root complex that is not owned by the control domain are not diagnosed properly. These errors can cause faults to be generated against the diagnosis engine (DE) itself.
LDom variables for a domain can be specified using any of the following methods:
Modifying, in a limited fashion, from the system controller (SC) using the bootmode command; that is, only certain variables, and only when in the factory-default configuration.
The goal is that, in all cases, variable updates made using any of these methods persist across reboots of the domain and are reflected in any subsequent logical domain configurations saved to the SC.
In Logical Domains 1.0.1 software, there are a few cases where variable updates do not persist:
When running in a factory-default configuration, variable updates specified through the Solaris OS eeprom(1M) command persist across a reboot of the primary domain into the same factory-default configuration, but do not persist into a configuration saved to the SC. Conversely, in this scenario, variable updates specified using the Logical Domains Manager do not persist across reboots, but are reflected in a configuration saved to the SC.
When running the factory-default configuration, if you want a variable update to persist across a reboot into the same factory-default configuration, use the eeprom command. If you want it saved as part of a new logical domains configuration saved to the SC, use the appropriate Logical Domains Manager command.
Once domaining has been enabled (that is, the machine is running in a configuration generated by the Logical Domains Manager, not the factory-default configuration), all methods of updating a variable (OpenBoot firmware, eeprom command, ldm subcommand) persist across reboots of that domain, but not across a power cycle of the system, unless a subsequent logical domain configuration is saved to the SC. In addition, in the control domain, updates made using OpenBoot firmware persist across a power cycle of the system; that is, even without subsequently saving a new logical domain configuration to the SC.
When reverting to the factory-default configuration from a configuration generated by the Logical Domains Manager, all LDoms variables start with their default values.
The following bug IDs have been filed to resolve these issues: 6520041, 6540368, and 6540937. See also Some Commands Read Old bootmode Settings (Bug ID 6585340).
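The persistence behavior described above is easier to check as a lookup. This Python sketch encodes only the cases these notes state for Logical Domains 1.0.1 and returns None for combinations they do not cover. The method, configuration, and event labels are hypothetical names chosen for the sketch.

```python
from typing import Optional

METHODS = {"eeprom", "ldm", "openboot"}
CONFIGS = {"factory-default", "ldm-generated"}
EVENTS = {"reboot", "saved-config", "power-cycle", "revert-to-factory-default"}

# Combinations stated explicitly for the factory-default configuration.
_FACTORY_DEFAULT_RULES = {
    ("eeprom", "reboot"): True,         # survives reboot into factory-default
    ("eeprom", "saved-config"): False,  # not carried into a configuration saved to the SC
    ("ldm", "reboot"): False,           # lost on reboot
    ("ldm", "saved-config"): True,      # reflected in a configuration saved to the SC
}


def update_persists(method: str, config: str, event: str,
                    control_domain: bool = False) -> Optional[bool]:
    """Return whether an LDom variable update survives the event, or None if unstated."""
    if method not in METHODS or config not in CONFIGS or event not in EVENTS:
        raise ValueError("unknown method, configuration, or event")
    if config == "factory-default":
        return _FACTORY_DEFAULT_RULES.get((method, event))
    # Domaining enabled: a configuration generated by the Logical Domains Manager.
    if event in ("reboot", "saved-config"):
        return True
    if event == "revert-to-factory-default":
        return False  # all LDoms variables start with their default values
    # Power cycle without saving a new configuration to the SC: only
    # OpenBoot updates made in the control domain survive.
    return method == "openboot" and control_domain
```

For example, update_persists("eeprom", "ldm-generated", "power-cycle") returns False, which is the case the workaround of saving a new configuration to the SC addresses.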
If the Logical Domains Manager stops and then restarts during execution of any Logical Domains Manager ldm command, the program returns the following error message:
Receive failed: logical domain manager not responding
Recovery: This message usually indicates that the command did not complete successfully. Verify that the command did not complete, and then reissue it if appropriate.
This section summarizes the bugs that you might encounter when using this version of the software. The bug descriptions are in numerical order by bug ID. If a recovery procedure or a workaround is available, it is specified.
When the Fault Management Architecture (FMA) places a CPU offline, it records that information so that when the machine is rebooted, the CPU remains offline. The offline designation persists in a non–Logical Domains environment.
However, in a Logical Domains environment, this persistence is not always maintained for CPUs in guest domains. The Logical Domains Manager does not currently record data on fault events sent to it. This means that a CPU in a guest domain that has been marked as faulty, or one that was not allocated to a logical domain at the time the fault event is replayed, can subsequently be allocated to another logical domain with the result that it is put back online.
The Solaris 10 OS virtual disk drivers (vdc and vds) currently do not support the CDIO(7I) ioctls that are needed to install guest domains from DVDs. Therefore, it is not possible at this time to install a guest domain from a DVD. However, a guest domain can access a CD/DVD to install applications. If the CD/DVD device is added to the guest domain, and the guest is booted from another virtual disk, the CD can be mounted in the guest domain after the boot operation.
Refer to “Operating the Solaris OS With Logical Domains” in Chapter 5 of the Logical Domains (LDoms) 1.0.1 Administration Guide for specific information.
The Solaris OS virtual disk drivers (vdc and vds) currently do not support multihost disk control operations (MHI(7I) ioctls).
If a disk device listed in a guest domain’s configuration is either non-existent, already opened by another process, or otherwise unusable, the disk cannot be used by the virtual disk server (vds) but the Logical Domains Manager does not emit any warning or error when the domain is bound or started.
When the guest tries to boot, messages similar to the following are printed on the guest’s console:
WARNING: /virtual-devices@100/channel-devices@200/disk@0: Timeout connecting to virtual disk server... retrying
In addition, if a network interface specified using the net-dev= parameter does not exist or is otherwise unusable, the virtual switch is unable to communicate outside the physical machine, but the Logical Domains Manager does not emit any warning or error when the domain is bound or started.
In the case of an errant virtual disk service device or volume, perform the following steps:
Stop the domain owning the virtual disk bound to the errant device or volume.
Issue the ldm rm-vdsdev command to remove the errant virtual disk service device.
Issue the ldm add-vdsdev command to correct the physical path to the volume.
In the case of an errant net-dev= property specified for a virtual switch, perform the following steps:
If a disk device listed in a guest domain’s configuration is being used by software other than the Logical Domains Manager (for example, if it is mounted in the service domain), the disk cannot be used by the virtual disk server (vds), but the Logical Domains Manager does not emit a warning that it is in use when the domain is bound or started.
When the guest domain tries to boot, a message similar to the following is printed on the guest’s console:
WARNING: /virtual-devices@100/channel-devices@200/disk@0: Timeout connecting to virtual disk server... retrying
Recovery: Unbind the guest domain, and unmount the disk device to make it available. Then bind the guest domain, and boot the domain.
Under heavy network loads, one CPU might show 100% utilization dealing with the network traffic.
Workaround: Attach several CPUs to the domain containing the virtual switch to ensure that the system remains responsive under a heavy load.
Under rare circumstances, when an LDom variable, such as boot-device, is being updated from within a guest domain by using the eeprom(1M) command at the same time that the Logical Domains Manager is being used to add or remove virtual CPUs from the same domain, the guest OS can hang.
Workaround: Ensure that these two operations are not performed simultaneously.
Recovery: Use the ldm stop-domain and ldm start-domain commands to stop and start the guest OS.
Under rare circumstances, if a guest domain is rebooted at a time when it is experiencing high interrupt activity, the OS might hang.
Recovery: Use the ldm stop-domain and ldm start-domain commands to stop and start the guest OS.
If too many guest domains are performing I/O to a control or I/O domain, and if that domain is in the middle of panicking, the interrupt request pool of 64 entries overflows and the system cannot save a crash dump. The panic message is as follows:
intr_req pool empty
There are some cases where the behavior of the ldm stop-domain command is confusing.
If the Solaris OS is halted on the domain (for example, by using the halt(1M) command) and the domain is at the "r)eboot, o)k prompt, h)alt?" prompt, the ldm stop-domain command fails with the following error message:
LDom <domain name> stop notification failed
Workaround: Force a stop by using the ldm stop-domain command with the -f option.
# ldm stop-domain -f ldom
If the domain is at the kernel modular debugger (kmdb(1M)) prompt, the ldm stop-domain command fails with the following error message:
LDom <domain name> stop notification failed
Recovery: If you restart the domain from the kmdb prompt, the stop notification is handled, and the domain does stop.
In a Logical Domains environment, there is no support for setting or deleting wide-area network (WAN) boot keys from within the Solaris OS using the ickey(1M) command. All ickey operations fail with the following error:
ickey: setkey: ioctl: I/O error
In addition, WAN boot keys that are set using OpenBoot firmware in logical domains other than the control domain are not remembered across reboots of the domain. In these domains, the keys set from the OpenBoot firmware are only valid for a single use.
The Solaris 10 OS vntsd(1M) command does not validate the listen_addr property in the vntsd command’s Service Management Facility (SMF) manifest. If the listen_addr property is invalid, vntsd fails to bind the IP address and exits.
When a ZFS, SVM, or VxVM volume is exported as a virtual disk to another domain, the other domain sees that virtual disk as a disk with a single slice (s0), and the disk cannot be partitioned. As a consequence, the Solaris installer cannot use such a disk, and you cannot install the Solaris OS on it.
For example, /dev/zvol/dsk/tank/zvol is a ZFS volume that is exported as a virtual disk from the primary domain to domain1 using these commands:
# ldm add-vdsdev /dev/zvol/dsk/tank/zvol disk_zvol@primary-vds0
# ldm add-vdisk vdisk0 disk_zvol@primary-vds0 domain1
Domain domain1 sees only one device for that disk (for example, c0d0s0); there are no other slices for that disk, such as c0d0s1, c0d0s2, or c0d0s3.
Workaround: You can create a file and export that file as a virtual disk. This example creates a file on a ZFS file system:
# mkfile 30g /tank/test/zfile
# ldm add-vdsdev /tank/test/zfile disk_zfile@primary-vds0
# ldm add-vdisk vdisk0 disk_zfile@primary-vds0 domain1
When creating logical domains with virtual switches and virtual network devices, the Logical Domains Manager does not prevent you from creating these devices with the same given MAC address. This can become a problem if the logical domains with virtual switches and virtual networks that have conflicting MAC addresses are in a bound state simultaneously.
Workaround: Ensure that you do not bind logical domains whose vsw and vnet MAC addresses might conflict with another vsw or vnet MAC address.
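One way to apply this workaround is to collect the MAC addresses of every vsw and vnet device in the domains you plan to bind and look for duplicates before binding. The Python sketch below assumes you have already gathered (domain, device, MAC address) tuples by whatever means suits your site; the input format and function names are hypothetical. Addresses are normalized first, because the same address can be written with or without leading zeros.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_mac(mac: str) -> str:
    """Return a MAC address as six lowercase, zero-padded hex octets."""
    octets = mac.strip().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    values = [int(octet, 16) for octet in octets]
    if any(not 0 <= v <= 0xFF for v in values):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{v:02x}" for v in values)


def find_mac_conflicts(
    devices: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each MAC address used more than once to the (domain, device) pairs using it."""
    owners: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for domain, device, mac in devices:
        owners[normalize_mac(mac)].append((domain, device))
    return {mac: users for mac, users in owners.items() if len(users) > 1}


if __name__ == "__main__":
    devices = [
        ("primary", "primary-vsw0", "0:14:4f:f:20:0"),
        ("ldg1", "vnet1", "00:14:4F:0F:20:00"),  # same address, different notation
        ("ldg2", "vnet1", "08:00:20:ab:32:40"),
    ]
    # prints {'00:14:4f:0f:20:00': [('primary', 'primary-vsw0'), ('ldg1', 'vnet1')]}
    print(find_mac_conflicts(devices))
```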
Misleading error messages are returned from certain ldm subcommands that take two or more required arguments, if one or more of those required arguments is missing.
For example, if the add-vsw subcommand is missing the vswitch-name or ldom argument, you receive an error message similar to the following:
# ldm add-vsw net-dev=e1000g0 primary
Illegal name for service: net-dev=e1000g0
For another example, if the add-vnet command is missing the vswitch-name of the virtual switch service with which to connect, you receive an error message similar to the following:
# ldm add-vnet mac-addr=08:00:20:ab:32:40 vnet1 ldg1
Illegal name for VNET interface: mac-addr=08:00:20:ab:32:40
As another example, if you fail to add a logical domain name at the end of an ldm add-vcc command, you receive an error message saying that the port-range= property must be specified.
Recovery: Refer to the Logical Domains (LDoms) Manager 1.0.1 Man Page Guide or the ldm man page for the required arguments of the ldm subcommands, and retry the commands with the correct arguments.
In a service domain, disks that are managed by Veritas Dynamic Multipathing (DMP) cannot be exported as virtual disks to other domains. If a disk that is managed by Veritas DMP is added to a virtual disk server (vds) and then added as a virtual disk to a guest domain, the domain is unable to access and use that virtual disk. In such a case, the service domain reports the following errors in the /var/adm/messages file after binding the guest domain:
vd_setup_vd(): ldi_open_by_name(/dev/dsk/c4t12d0s2) = errno 16
vds_add_vd(): Failed to add vdisk ID 0
Recovery: If Veritas Volume Manager (VxVM) is installed on your system, disable Veritas DMP for the disks you want to use as virtual disks.
Due to problems with the Solaris Crypto Framework and its handling of CPU dynamic reconfiguration (DR) events that affect MAU cryptographic units, CPU DR is disabled for all logical domains that have any crypto units bound to them.
Workaround: To be able to use CPU DR on the control domain, all the crypto units must be removed from it while the system is running in the factory-default configuration, before saving a new configuration to the SC. To perform CPU DR on all other domains, stop the domain first so it is in the bound state.
When the Solaris OS reboot(1M) command is issued to reboot a guest OS, the following messages can appear on the guest console:
WARNING: promif_ldom_setprop: ds response timeout
WARNING: unable to store boot command for use on reboot
The reboot proceeds as usual, but the boot code ignores all arguments passed to the OpenBoot PROM boot command, that is, the arguments that appear after the -- delimiter of the Solaris OS reboot(1M) command. The same warnings can occur even if no arguments are passed to the reboot command, because the system always attempts to store a default boot command.
Recovery: Once this occurs, there is no recovery.
Workaround: To prevent it from happening on future boots, you can do one of the following:
The virtual disk server opens the physical disk exported as a virtual disk device at the time of the bind operation. In certain cases, a recovery operation on the physical disk following a disk failure may not be possible if the guest domain is bound.
For instance, when a RAID or a mirror Solaris Volume Manager (SVM) volume is used as a virtual disk by another domain, and if there is a failure on one of the components of the SVM volume, then the recovery of the SVM volume using the metareplace command or using a hot spare does not start. The metastat command shows the volume as resynchronizing, but there is no progress in the synchronization.
Similarly, when a Fibre Channel Arbitrated Loop (FC_AL) device is used as a virtual disk, you must use the Solaris OS luxadm(1M) command with a loop initialization primitive sequence (forcelip subcommand) to reinitialize the physical disk after unbinding the guest.
Note - Recovery mechanisms may fail in a similar manner for other devices, if the mechanism requires that the device being recovered is not actively in use.
Recovery: To complete the recovery or SVM resynchronization, stop and unbind the domain using the SVM volume as a virtual disk. Then resynchronize the SVM volume using the metasync command.
If Solaris™ Cluster software is in use with Logical Domains software, and the cluster is shut down, the console of each logical domain in the cluster displays the following prompt:
r)eboot, o)k prompt, h)alt?
If the ok prompt (o option) is selected, the system can panic.
Select halt (h option) at the prompt on the logical domain console to avoid the panic.
To force the logical domain to stop at the ok prompt, even if the OpenBoot auto-boot? variable is set to true, use one of the following two procedures.
Issue the following ALOM command to reset the domain:
sc> poweron
The OpenBoot banner is displayed on the console:
Sun Fire T200, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.26.0, 4096 MB memory available, Serial #68100096.
Ethernet address 0:14:4f:f:20:0, Host ID: 840f2000.
Issue the following ALOM command to send a break to the domain immediately after the OpenBoot banner displays:
sc> break -y
Issue the following command from the control domain to disable the auto-boot? variable for the logical domain:
# ldm set-var auto-boot?=false domain-name
Issue the following command from the control domain to reset the logical domain:
# ldm start-domain domain-name
Issue the following OpenBoot command to restore the value of the auto-boot? variable:
ok setenv auto-boot? true
If a guest domain is running the Solaris 10 OS and using a virtual disk built from a ZFS volume provided by a service domain running the Solaris Express or OpenSolaris™ programs, then the guest domain might not be able to access that virtual disk.
The same problem can occur with a guest domain running the Solaris Express or OpenSolaris programs using a virtual disk built from a ZFS volume provided by a service domain running Solaris 10 OS.
Workaround: Ensure that the guest domain and the service domain are running the same version of Solaris software (Solaris 10 OS, Solaris Express, or OpenSolaris).
When plumbing the virtual switch device, you must explicitly set the virtual switch’s MAC address to that of the underlying physical device, rather than allowing the Logical Domains Manager to automatically generate the MAC address, so that your networking will function correctly.
The MAC address of the physical device can be found using the following command; for example:
# ifconfig e1000g0
e1000g0: flags=201104843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,ROUTER,IPv4,CoS> mtu 1500 index 2
        inet 10.6.90.74 netmask fffffe00 broadcast 10.6.91.255
        ether 0:3:ba:d8:d4:6e
Then you can set the virtual switch to use that MAC address by specifying the ether value from the output as the mac-addr=<num> when executing the ldm add-vsw command.
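If you script this step, the ether value can be extracted from the ifconfig output and written in the zero-padded, two-digit form used elsewhere in these notes (for example, mac-addr=08:00:20:ab:32:40). The Python sketch below is an illustration only: it assumes the Solaris ifconfig output format shown above, the function name is hypothetical, and it builds only the mac-addr= argument, not the rest of the ldm add-vsw command line.

```python
import re

# Matches "ether" followed by six colon-separated hex octets of one or two
# digits, as in the Solaris ifconfig output above ("ether 0:3:ba:d8:d4:6e").
_ETHER_RE = re.compile(r"\bether\s+((?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2})\b")


def vsw_mac_arg(ifconfig_output: str) -> str:
    """Build a mac-addr= argument from the ether line of ifconfig output."""
    match = _ETHER_RE.search(ifconfig_output)
    if match is None:
        raise ValueError("no ether address found in ifconfig output")
    octets = match.group(1).split(":")
    # Zero-pad each octet to two lowercase hex digits.
    return "mac-addr=" + ":".join(f"{int(o, 16):02x}" for o in octets)


SAMPLE = """e1000g0: flags=201104843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,ROUTER,IPv4,CoS> mtu 1500 index 2
        inet 10.6.90.74 netmask fffffe00 broadcast 10.6.91.255
        ether 0:3:ba:d8:d4:6e
"""

if __name__ == "__main__":
    print(vsw_mac_arg(SAMPLE))  # prints mac-addr=00:03:ba:d8:d4:6e
```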
When a memory page of a guest domain is diagnosed as faulty, the Logical Domains Manager retires the page in the logical domain. If the logical domain is stopped and restarted, the page is no longer in a retired state.
The fmadm faulty -a command shows whether the page from either the control or guest domain is faulty, but the page is not actually retired. This means the faulty page can continue to generate memory errors.
Workaround: Use the following command in the control domain to restart the Fault Manager daemon, fmd(1M):
primary# svcadm restart fmd
Currently, the virtual switch (vsw) does not support aggregated network interfaces. If a virtual switch instance is configured to use an aggregated device (aggr15 in this example), a warning message similar to the following appears on the console during boot:
WARNING: mac_open aggr15 failed
Recovery: Configure the virtual switch to use a supported GLDv3-compliant network interface, and then reboot the domain.
If you reset the system controller while the host is powered on, subsequent error reports and faults are not delivered to the host.
On a system configured to use the Network Information Services (NIS) or NIS+ name service, if the Solaris Security Toolkit software is applied with the server-secure.driver, NIS or NIS+ fails to contact external servers. A symptom of this problem is that the ypwhich(1) command, which returns the name of the NIS or NIS+ server or map master, fails with a message similar to the following:
Domain atlas some.atlas.name.com not bound on nis-server-1.
This is true whether the Solaris Security Toolkit software is applied indirectly through the ldm-install script menu options or applied directly using this command:
# /opt/SUNWjass/bin/jass-execute -d server-secure.driver
The recommended Solaris Security Toolkit driver to use with the Logical Domains Manager is ldm_control-secure.driver, and NIS and NIS+ work with this recommended driver.
If you are using NIS as your name server, you cannot use the Solaris Security Toolkit profile server-secure.driver, because you may encounter Solaris OS Bug ID 6557663, IP Filter causes panic when using ipnat.conf. However, the default Solaris Security Toolkit driver, ldm_control-secure.driver, is compatible with NIS.
Log in to the system console from the system controller, and if necessary, switch to the ALOM mode by typing:
# #.
Power off the system, and then power it back on, by typing the following commands in ALOM mode:
sc> poweroff
sc> poweron
Switch to the console mode at the ok prompt:
sc> console
Boot the system to single-user mode:
ok boot -s
Edit the file /etc/shadow, and change the first line of the shadow file that has the root entry to:
You can now log in to the system and do one of the following:
The virtual networking infrastructure adds overhead to communications from a logical domain. All packets are sent through a virtual network device, which, in turn, passes the packets to the virtual switch. The virtual switch then sends the packets out through the physical device. These extra layers in the stack are the source of the lower performance.
Workarounds: Do one of the following depending on your server:
On Sun UltraSPARC T1-based servers, such as the Sun Fire T1000 and T2000 servers, assign a physical network card to the logical domain using a split-PCI configuration. For more information, refer to “Configuring Split PCI Express Bus to Use Multiple Logical Domains” in the Logical Domains (LDoms) 1.0.1 Administration Guide.
On Sun UltraSPARC T2-based servers, such as the Sun SPARC Enterprise T5120 and T5220 servers, assign a Network Interface Unit (NIU) to the logical domain.
If the time or date on a logical domain is modified, for example using the ntpdate command, the change persists across reboots of the domain but not across a power cycle of the host.
Workaround: For time changes to persist, save the configuration with the time change to the SC and boot from that configuration.
During operations in a split-PCI configuration, if a bus is not assigned to a domain, or is assigned to a domain that is not running the Solaris OS, errors on that bus or on any other bus might not be logged. Consider the following example:
In a split-PCI configuration, the primary domain contains Bus B, and Bus A is not assigned to any domain. In this case, any error that occurs on Bus B might not be logged. (The situation occurs only during a short time period.) The problem resolves when the unassigned Bus A is assigned to a domain and is running the Solaris OS, but by then some error messages might be lost.
Workaround: When using a split-PCI configuration, quickly verify that all buses are assigned to domains and are running the Solaris OS.
The intrastat(1M) command does not show the statistics corresponding to the interrupts of the virtual devices.
During boot or installation over a wide-area network (WAN), the time it takes to download the miniroot can increase significantly when using a virtual network (vnet) device. Early tests showed miniroot download to be 5 to 6 times slower than similar boots or installations over physical network devices.
This performance degradation appears only when trying to boot or install over a WAN using a virtual network device. Similar boots or installations using a physical network device work as expected, as does a traditional local area network (LAN) boot or installation from a virtual network device.
The following message appears at the ok prompt if an attempt is made to boot a guest domain that contains Emulex-based Fibre Channel host adapters (Sun Part Number 375-3397):
ok> FATAL:system is not bootable, boot command is disabled
These adapters are not supported in a split-PCI configuration on Sun Fire T1000 servers.
When a guest domain is configured with a virtual disk that is backed by a nonexistent storage device, the domain can hang either during a reconfiguration boot or while running the devfsadm(1M) command. The hang occurs because the virtual disk driver fails to detach properly following an attach failure.
Workaround: Add more than one CPU to the domain.
Recovery: Unconfigure or replace the nonexistent disk device with a valid disk device and reboot the domain.
A system in which the virtual switch has been configured to use the bge network interface can trigger the watchdog timeout under heavy network load. This often happens when the number of CPUs in guest domains running network-intensive workloads is significantly larger than the number of CPUs in the service domain.
Even though watchdog timeouts do not cause a system to reset, the system becomes progressively less responsive. A message similar to the following might also appear on the console:
APR 19 17:05:47 ERROR: Watchdog timeout ignored because user is running on a Logical Domains Configuration
If the watchdog message is displayed, or if you want to run network-intensive loads in the guest domain, apply the following workaround. Note, however, that doing so might slightly degrade network performance under certain loads.
Workaround: Set the following in the /etc/system file, and reboot the service domain.
set vsw_chain_len=20
Normally, when the verbose (-v) option is specified to the prtdiag(1M) command in the control domain, additional environmental status information is displayed. If the output of this information is interrupted by issuing a Control-C, the PICL daemon, picld(1M), can enter a state that prevents it from supplying the environmental status information to the prtdiag command from that point on, and the additional environmental data is no longer displayed.
Workaround: Restart the picld(1M) SMF service in the control domain using the following command:
# svcadm restart picl
When a Fibre Channel Arbitrated Loop (FC_AL) disk is exported as a virtual disk to another domain, some luxadm(1M) commands, such as luxadm display, can fail.
An example of the failure of the luxadm display command is:
# luxadm display /dev/rdsk/c1t44d0s2
/dev/rdsk/c1t44d0s2
Error: SCSI failure. - /dev/rdsk/c1t44d0s2.
Workaround: To successfully issue a luxadm(1M) command on a disk exported as a virtual disk to another domain, you must first stop and unbind this other domain.
Do not specify a virtual switch (vsw) interface as the network device for a virtual switch configuration. That is, do not specify a virtual switch interface as the net-dev property for the ldm add-vswitch or ldm set-vswitch commands.
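The two net-dev restrictions in this chapter (no aggregated interface, no virtual switch interface) can be checked before running ldm add-vswitch or ldm set-vswitch. The following Python sketch is hypothetical: it assumes virtual switch interfaces are named vsw<N> and aggregations aggr<N>, and because it only inspects names, it cannot confirm that a device is GLDv3-compliant.

```python
import re

# Assumed interface naming: virtual switch interfaces are vsw<N> and
# link aggregations are aggr<N> (as in the aggr15 example in these notes).
_VSW_IF = re.compile(r"vsw\d+")
_AGGR_IF = re.compile(r"aggr\d+")


def check_net_dev(net_dev: str) -> None:
    """Raise ValueError if net_dev should not back a virtual switch.

    Encodes two restrictions from these notes: no virtual switch interface
    as net-dev, and no aggregated interface (not supported by vsw).
    """
    if _VSW_IF.fullmatch(net_dev):
        raise ValueError(f"{net_dev} is a virtual switch interface; use a physical device")
    if _AGGR_IF.fullmatch(net_dev):
        raise ValueError(f"{net_dev} is an aggregated interface; vsw does not support it")


if __name__ == "__main__":
    for dev in ("e1000g0", "vsw0", "aggr15"):
        try:
            check_net_dev(dev)
            print(f"{dev}: ok")
        except ValueError as err:
            print(f"{dev}: rejected ({err})")
```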
Occasionally, while doing a network installation on a guest domain, the installation begins normally and then hangs after printing the following message on the console:
NFS server <servername> not responding still trying
The guest domain then stops sending and receiving network traffic.
Workaround: Stop and restart the guest domain, and then restart the network installation.
Occasionally during Solaris OS boot, a console message from the Domain Services (ds) module reports that reading from or writing to a logical domain channel was unsuccessful. The reason code (131) indicates that the channel has been reset. Examples of the console messages follow:
NOTICE: ds@1: ldc_read returned 131
WARNING: ds@0: send_msg: ldc_write failed (131)
Recovery: None. These console messages do not affect the normal operation of the system and can be ignored.
The prtpicl(1M) and prtdiag(1M) utilities do not work in a guest domain. Each utility produces the following error message, and neither utility displays any other information:
picl_initialize failed: Daemon not responding
In these situations, the PICL daemon, picld(1M), is in a hung state.
In a guest domain, if a virtual disk is unreachable because the service domain is down, then any I/O operations to that disk are blocked until the service domain is up and running. As a consequence, any application performing an I/O operation to an unreachable disk is blocked while the service domain is down and no I/O error is ever reported to the application.
Rarely, when rebooting the control domain of an LDoms system, the operation can hang, requiring a power cycle.
To clear the hang condition, use the powercycle command of the system controller or service processor.
Restart all guest domains that were running at the time of the hang.
Restart all applications that were running in the guest domains.
Recover databases if I/O operations that were in progress in the guest domains did not complete.
Perform any other necessary application-specific recovery operations.
After reverting to a logical domain configuration previously saved using the ldm add-config command, the Logical Domains Manager might crash with the following error message:
"0L != clientp->published_name".
Workaround: When creating virtual I/O clients and services, do not use the canonical names that the Logical Domains Manager applies when there is no match in the constraints database. These names are:
| Device | Canonical Name Format |
|---|---|
| vdisk | vdiskNN |
| vnet | vnetNN |
| vsw | ldom-name-vswNN |
| vcc | ldom-name-vccNN |
| vds | ldom-name-vdsNN |
| vdsdev | ldom-name-vdsNN-volVV |
NN and VV refer to monotonically increasing instance numbers.
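To avoid the canonical names listed above, you can test a proposed name against the formats in the table. This Python sketch is an illustration only: the function names are hypothetical, NN and VV are treated as any run of decimal digits, and ldom stands for the name of the domain that provides the service.

```python
from __future__ import annotations

import re


def canonical_patterns(ldom: str) -> dict[str, re.Pattern[str]]:
    """Regexes for the canonical name formats in the table above.

    ldom is the domain providing the service (used for vsw, vcc, vds, and
    vdsdev); NN and VV are matched as one or more decimal digits.
    """
    d = re.escape(ldom)
    return {
        "vdisk": re.compile(r"vdisk\d+"),
        "vnet": re.compile(r"vnet\d+"),
        "vsw": re.compile(rf"{d}-vsw\d+"),
        "vcc": re.compile(rf"{d}-vcc\d+"),
        "vds": re.compile(rf"{d}-vds\d+"),
        "vdsdev": re.compile(rf"{d}-vds\d+-vol\d+"),
    }


def uses_canonical_name(device: str, name: str, ldom: str) -> bool:
    """True if name matches the canonical format for device and so should be avoided."""
    return canonical_patterns(ldom)[device].fullmatch(name) is not None
```

For example, uses_canonical_name("vnet", "vnet0", "ldg1") is True, so a name such as ldg1-net0 would be the safer choice for that virtual network device.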
A physical disk that is unformatted or that does not have a valid disk label, either a Volume Table of Contents (VTOC) or an Extensible Firmware Interface (EFI) label, cannot be exported as a virtual disk to another domain.
Trying to export such a disk as a virtual disk fails when you attempt to bind the domain to which the disk is exported. A message similar to this one is issued and stored in the messages file of the service domain exporting the disk:
vd_setup_vd(): vd_read_vtoc returned errno 22 for /dev/dsk/c1t44d0s2
vds_add_vd(): Failed to add vdisk ID 1
To export a physical disk that is unformatted or that does not have a valid disk label, use the format(1M) command first in the service domain to write a valid disk label (VTOC or EFI) onto the disk to be exported.
When a service domain is rebooted, guest domains might lose access to virtual disks exported from that service domain. When this happens, the guest domain displays a message similar to the following:
NOTICE: [0] disk access failed
Recovery: To recover from this failure, stop the guest domain (ldm stop-domain) and restart it (ldm start-domain).
Workaround: On the I/O service domain, add the following lines to the /etc/system file:
set vds:vds_dev_delay = 60000000
set vds:vds_dev_retries = 10
After updating the /etc/system file, reboot the I/O service domain for the new settings to be effective.
In a guest domain, virtual disks created from a file do not have a device ID (devid). If such disks are used to store Solaris Volume Manager metadevice state database (metadb) information, Solaris Volume Manager issues a message similar to the following during system boot:
NOTICE: mddb: unable to get devid for ’vdc’, 0xf
Under certain conditions, after a service domain is rebooted while a guest domain is running, the virtual network (vnet) device on the guest fails to establish a connection with the virtual switch on the service domain. As a result, the guest domain cannot send and receive network packets.
Workarounds: Use one of the following workarounds on the domain with the virtual network:
Unplumb and replumb the vnet interface. You can do this if the domain with vnet cannot be rebooted. For example:
# ifconfig vnet0 down
# ifconfig vnet0 unplumb
# ifconfig vnet0 plumb
# ifconfig vnet0 ip netmask mask broadcast + up
Add the following lines to the /etc/system file on the domain with vnet and reboot the domain:
set vnet:vgen_hwd_interval = 5000
set vnet:vgen_max_hretries = 6
Users can change ldom variables in the control domain in one of three ways:
Using the OpenBoot firmware setenv command in the control domain
Using the Solaris OS eeprom(1M) command in the control domain
Using the bootmode command on the system controller (SC)
Changes made with the setenv and eeprom commands take effect immediately; changes made with the bootmode command are supposed to take effect on the next reset, no matter what kind of reset it is.
Changes made in any of these three ways are supposed to stay in effect until the next change, also made in any of these three ways. That is, it does not matter how the value of an ldom variable is changed; once changed, the value is supposed to stay in effect until it is changed again.
However, because some commands, such as uadmin 2 0 and reboot, read old bootmode settings, changes made using the bootmode command become effective only after a power-on reset and override any intervening change made using the setenv or eeprom commands on every reset (other than a power-on reset) that follows. That is, the changes made by the bootmode command require a power-on reset to be effective and changes made using the setenv or eeprom commands only persist until the next reset, at which point the variable reverts to the value set by the last bootmode command. This stickiness of the bootmode setting persists until the machine is power cycled. Upon power cycling, the prior bootmode setting does not take effect and any subsequent change using the setenv or eeprom command now persists over resets, at least until the next bootmode command followed by a power cycle.
See also Logical Domain Variable Persistence.
Workarounds: Use one of the following:
Restart the control domain with a power-on reset right after the bootmode command is executed, and restart again after the control domain boots to either the OpenBoot prompt or the Solaris OS. The first power-on reset makes the bootmode command effective, and the second power-on reset resolves the stickiness issue.
Reset the control domain using power-on reset by using the SC powercycle command. If the control domain is booted to the Solaris OS, then remember to shut it down before executing the SC powercycle command.
There are two similar prompts that can appear on the console as part of shutting down the system in certain ways:
You access the telnet prompt, type send brk, and receive the following monitor prompt:
c)ontinue, s)ync, r)eboot, h)alt?
You enter the halt command at the shell running on the console and receive the following monitor prompt:
r)eboot, o)k prompt, h)alt?
If you select s for sync or select o for the ok prompt, you can see the following error messages on the console:
WARNING: promif_ldom_setprop: ds response timeout
WARNING: unable to store boot command for use on reboot
Additionally, because of the failure underlying these error messages, unexpected behavior can occur on the next boot, depending on what you selected:
If you selected s for sync:
If the auto-boot? logical domain variable has the value false, the correct behavior for the s selection is to override auto-boot? for the next boot only and boot immediately into the OS. Instead, the system stops at the ok prompt after the system is reset.
If you selected o for the ok prompt:
If the auto-boot? logical domain variable has the value true, the correct behavior for the o selection is to override auto-boot? for the next boot only and stop at the ok prompt. Instead, the system immediately boots the OS.
Workaround: You cannot suppress the error messages when using the above monitor prompts. To get reset behavior that differs from the current auto-boot? setting, change the setting to the desired new behavior, reboot, and then reset auto-boot? to the previous value.
Logical domain configurations might not be saved if the system controller runs out of storage space. There is no error message when this happens.
The port number argument to the ldm set-vcons command, as well as the port range arguments to the ldm {add,set}-vcc commands, currently ignore everything from the first non-numeric character onward. For example, if the value 0.051 is passed in as the port number for a virtual console, rather than returning an error, the value is interpreted as 0, which tells the Logical Domains Manager to use automatic port allocation.
Workaround: Do not use non-numeric values in port numbers for any ldm commands.
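The following Python sketch contrasts the lenient parsing described above with a strict check you might apply before calling ldm. lenient_port only models the documented behavior (it is not the Logical Domains Manager's code, and returning 0 when there are no leading digits is an assumption), and the min-max form used for port ranges is likewise assumed for illustration.

```python
from __future__ import annotations

import re


def lenient_port(value: str) -> int:
    """Model of the documented behavior: keep the leading digits and ignore
    everything from the first non-numeric character on (0 if none; assumed)."""
    digits = re.match(r"[0-9]*", value).group(0)
    return int(digits) if digits else 0


def strict_port(value: str) -> int:
    """Accept only a plain decimal port number; reject anything else."""
    if re.fullmatch(r"[0-9]+", value) is None:
        raise ValueError(f"port {value!r} is not a plain decimal number")
    return int(value)


def strict_port_range(value: str) -> tuple[int, int]:
    """Accept only 'min-max' with decimal bounds and min <= max (format assumed)."""
    match = re.fullmatch(r"([0-9]+)-([0-9]+)", value)
    if match is None:
        raise ValueError(f"port range {value!r} is not of the form min-max")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"port range {value!r} has min greater than max")
    return low, high
```

For example, lenient_port("0.051") returns 0, the automatic-allocation value, whereas strict_port("0.051") raises ValueError before the value ever reaches ldm.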
When a service domain is rebooted while some guest domains are bound, you can see messages similar to these from the virtual disk server:
vd_setup_file(): Cannot lookup file (/export/disk_image_s10u4_b12.1) errno=2
vd_setup_vd(): Cannot use device/file (/export/disk_image_S10u4_b12.1) errno=2
These messages indicate that the specified file or device is to be exported to a guest domain but is not yet ready to be exported.
Workaround: These messages are usually harmless and should stop once the service domain has completed its boot sequence. If similar messages are printed after the service domain is fully booted, check whether the specified file or device is accessible from the service domain.
If a CPU or memory fault occurs, it is possible that the affected domain will panic and reboot. If the Fault Management Architecture (FMA) attempts to retire the faulted component while the domain is rebooting, the Logical Domains Manager is not able to communicate with the domain, and the retire operation fails. In this situation, the fmadm faulty command lists the resource as degraded.
Recovery: Once the domain has completed rebooting, force FMA to replay the fault event by restarting fmd(1M) on the control domain using this command:
primary# svcadm restart fmd
It is possible to erroneously add duplicate I/O constraints when configuring a logical domain.
The ldm set-variable command allows you to set an LDom variable to any arbitrary string. However, many LDom variables have only a small set of valid values. For example, boolean variables like auto-boot? and diag-switch? only accept the values true or false. If an LDom variable is set to a value that is not valid, the OpenBoot firmware issues a warning message during boot with a list of correct values, but without giving the name of the variable in question. For example:
Options: true More [<space>,<cr>,q,n,p,c] ?
The preceding warning is sent by OpenBoot firmware if the auto-boot? variable is set to a NULL string. The boot stops at this point, waiting for input. If you enter a space or a carriage return, the complete error message is displayed and the boot process continues:
Options: true false
As a common example, you can receive this error if you omit the = sign when using the ldm set-variable command:
# ldm set-variable auto-boot? true guest_domain
The preceding command actually results in two NULL LDoms variables:
auto-boot?=
true=
As discussed previously, auto-boot? is a boolean variable, and setting it to NULL results in an OpenBoot warning during boot. The correct form of the preceding command is:
# ldm set-variable auto-boot?=true guest_domain
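The following Python sketch models the argument handling described above so that a missing = sign can be caught before the command runs. It assumes, from the example, that the last argument is the domain name and that every other argument is read as name[=value], with a name lacking = receiving a NULL (empty) value. BOOLEAN_VARS lists only the two boolean variables these notes name; the function names are hypothetical.

```python
from __future__ import annotations

# Boolean LDom variables named in these notes; extend as needed.
BOOLEAN_VARS = {"auto-boot?", "diag-switch?"}


def parse_set_variable(args: list[str]) -> tuple[str, dict[str, str]]:
    """Model (assumed) of how 'ldm set-variable' reads its arguments: the last
    argument is the domain; each other argument is name[=value], and a name
    without '=' gets a NULL (empty) value."""
    if len(args) < 2:
        raise ValueError("expected at least one name=value pair and a domain name")
    *pairs, ldom = args
    variables: dict[str, str] = {}
    for token in pairs:
        name, _, value = token.partition("=")
        variables[name] = value
    return ldom, variables


def check_values(variables: dict[str, str]) -> list[str]:
    """Return warnings for NULL values and non-boolean values of boolean variables."""
    warnings = []
    for name, value in variables.items():
        if value == "":
            warnings.append(f"{name} would be set to a NULL string")
        elif name in BOOLEAN_VARS and value not in ("true", "false"):
            warnings.append(f"{name} accepts only true or false, not {value!r}")
    return warnings


if __name__ == "__main__":
    for argv in (["auto-boot?", "true", "guest_domain"], ["auto-boot?=true", "guest_domain"]):
        ldom, variables = parse_set_variable(argv)
        print(ldom, variables, check_values(variables) or "ok")
```

With the first argument list the check reports two NULL values (auto-boot? and true), matching the two NULL variables described above; with the corrected form it reports none.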
If the service processor is reset while the control domain is at the ok prompt, then OpenBoot firmware permanently loses its ability to store non-volatile LDom variables or security keys until the host has been reset. Guest domains are not affected by this problem. Attempts to update LDom variables or security keys result in the following warning messages:
{0} ok setenv auto-boot? false
WARNING: Unable to update LDOM Variable
{0} ok set-security-key wanboot-key 545465
WARNING: Unable to store Security key
Recovery: Reset the control domain using the reset-all OpenBoot command.
{0} ok reset-all
When a logical domain with automatic port selection for the console is bound, the Logical Domains Manager assigns a port for the console. If the Logical Domains Manager is restarted with the logical domain in the bound state, the Logical Domains Manager attempts to re-assign the same port for the logical domain’s console, and the bind can fail if another console is already using the port.
Recovery: You can manually revert to automatic port selection for the console by running the following command before attempting a re-bind:
# ldm set-vcons port= ldom
During a single delayed reconfiguration operation, do not attempt to add CPUs to a domain if any were previously removed during the same delayed reconfiguration. Either cancel the existing delayed reconfiguration first, if possible, or commit it by rebooting the target domain, and then add the CPUs.
Failure to heed this restriction can, under certain circumstances, lead to the hypervisor returning a parse error to the Logical Domains Manager, causing the Logical Domains Manager to stop. Additionally, if any virtual I/O devices were removed during the same delayed reconfiguration operation, the Logical Domains Manager incorrectly detects the need to perform a recovery operation when it restarts. That recovery creates a corrupt configuration, which leads to the hypervisor stopping and the server powering down.
When the verbose (-v) option is specified to the prtdiag(1M) command in the control domain, additional environmental status information is displayed. If the service processor (SP) is reset while the control domain is running, in some cases, the prtdiag command no longer displays the additional environmental data.
Workaround: Environmental status information can be obtained by using the service processor showenvironment command. Refer to the Integrated Lights Out Management 2.0 (ILOM 2.0) Supplement for Sun SPARC Enterprise T5120 and T5220 Servers for details.
Sometimes, issuing the following two commands within a few seconds of each other results in the Logical Domains Manager stopping and dumping core:
# ldm start-domain ldom
# ldm ls -l -p ldom
Receive failed: logical domain manager not responding
Recovery: When this occurs, the Logical Domains Manager restarts and recovers automatically. However, the condition that triggers the stop and core dump can persist for several seconds. Wait a short while, and then attempt the ldm ls -l -p command again.
If you configure more than four virtual networks (vnets) in a guest domain on the same network using the Dynamic Host Configuration Protocol (DHCP), the guest domain can eventually become unresponsive while running network traffic.
Recovery: Issue an ldm stop-domain ldom command followed by an ldm start-domain ldom command on the guest domain (ldom) in question.
The eeprom(1M) command cannot be used to reset EEPROM values to null in Logical Domains systems. The following example shows what happens if you attempt this:
primary# eeprom boot-file=
eeprom: OPROMSETOPT: Invalid argument
boot-file: invalid property.
The same command works correctly on non–Logical Domains systems as shown in this example:
# eeprom boot-file=
# eeprom boot-file
boot-file: data not available.
The following LDoms issues apply only if you have Solaris 10 11/06 OS running on your system.
Once the virtual switch driver (vswitch) has attached, either as part of the normal Solaris OS boot sequence, or as a result of an explicit Solaris OS add_drv(1M) command, removing or updating the driver can cause networking to fail.
Workaround: Once vswitch has attached, do not remove the driver using the Solaris OS rem_drv(1M) command or update the driver using the Solaris OS update_drv(1M) command.
Recovery: If you do remove the driver using the rem_drv command and then attempt to reattach it using the add_drv command, you must reboot after the add_drv command completes to ensure the networking restarts correctly. Similarly, you must also reboot after an update_drv command completes to ensure the networking does not fail.
If you are running the Solaris 10 11/06 OS and you harden drivers on a control (primary) domain that is configured with only one strand, rebooting the primary domain or restarting the Fault Manager daemon, fmd(1M), can cause fmd to dump core. fmd dumps core while it cleans up its resources, and this does not affect the Fault Management Architecture (FMA) diagnosis.
Workaround: Add a few more strands into the primary domain. For example:
# ldm add-vcpu 3 primary
The following LDoms bugs were fixed for the Solaris 10 8/07 OS:
6405380 LDoms vSwitch needs to be modified to support network interfaces
6418780 vswitch needs to be able to process updates to its MD node
6447559 vswitch should take advantage of multiple unicast address support
6474949 vSwitch panics if mac_open of the underlying network device fails
6492423 vSwitch multi-ring code hangs when queue thread not started
6492705 vsw warning messages should identify device instance number
6496374 vsw: "turnstile_block: unowned mutex" panic on a diskless-clients test bed
6523926 handshake restart can fail following reboot under certain conditions
6523891 vsw needs to update lane state correctly for RDX pkts
6556036 vswitch panics when trying to boot over vnet interface
6520626 Assertion panic in vdc following primary domain reboot
6527265 Hard hang in guest ldom on issuing the format command
6534269 vdc incorrectly allocs mem handle for synchronous DKIOCFLUSHWRITECACHE calls
6547651 fix for 6524333 badly impact performance when writing to a vdisk
6524333 Service domain panics if it fails to map pages for a disk on file
6530040 vds does not close underlying physical device or file properly
6495154 mdeg should not print a warning when the MD generation number does not change
6520018 vntsd gets confused and immediately closes newly established console connections
6528180 link state change is not handled under certain conditions in ldc
6528758 ’ds_cap_send: invalid handle’ message during LDom boot
Copyright © 2007, Sun Microsystems, Inc. All rights reserved.