CHAPTER 1
These release notes contain changes for this release, supported platforms, a matrix of required software and patches, and other pertinent information about this release, including bugs that affect Logical Domains 1.0.3 software.
The major changes for this release of Logical Domains 1.0.3 software are as follows:
Added DVD boot support. Refer to Chapter 5 in the Logical Domains (LDoms) 1.0.3 Administration Guide.
Added support for the format(1M) command and unformatted disks. Refer to Chapter 5 in the Logical Domains (LDoms) 1.0.3 Administration Guide.
Added support for the user SCSI command (USCSICMD) input/output control call (ioctl) pass-through and disk reset.
Added support to use physical trunk devices for external connectivity.
Enabled statistics reporting using the interrupt statistics command, intrstat(1M).
Improved Volume Manager support – You can now export a volume as a full disk and install on it.
Added timeout= argument for virtual disks to ldm add-vdisk command. Added set-vdisk subcommand to set timeout= or volume= arguments for a virtual disk. Refer to the ldm man page or Chapter 5 in the Logical Domains (LDoms) 1.0.3 Administration Guide.
Added options= argument to ldm add-vdsdev command to specify slice (slice), exclusive (excl), or read-only (ro) options. Added set-vdsdev subcommand to set options for a virtual disk server. Refer to the ldm man page or Chapter 5 in the Logical Domains (LDoms) 1.0.3 Administration Guide. An example using the new timeout= and options= arguments follows this list of changes.
Added support for Solaris Cluster software on a guest domain. See Software That Can Be Used With the Logical Domains Manager.
Conformed the XML format produced by the ldm ls-constraints -x command to the new version 3 (v3) specification. The LDoms Manager continues to accept the version 2 (v2) format XML files produced by previous versions of the ls-constraints -x subcommand, but it produces the new version only on output.
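For example, the new timeout= and options= arguments described above might be used as follows. This is a sketch only: the volume, disk, and domain names (vol1, primary-vds0, vdisk1, ldg1) and the device path are hypothetical, and the ldm man page remains the authoritative reference for the syntax.

# Export a backend as a read-only, single-slice virtual disk server device.
primary# ldm add-vdsdev options=ro,slice /dev/dsk/c0t1d0s2 vol1@primary-vds0
# Add a virtual disk to guest ldg1 with a 60-second connection timeout.
primary# ldm add-vdisk timeout=60 vdisk1 vol1@primary-vds0 ldg1
# Later, change the timeout on the existing virtual disk.
primary# ldm set-vdisk timeout=120 vdisk1 ldg1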
Logical Domains (LDoms) Manager 1.0.3 software is supported on the following platforms:
This section lists the required, recommended, and optional software for use with Logical Domains software.
If you want to use any of the features of LDoms 1.0.3 software, use one of the following configurations of the Solaris 10 OS on the control domain and any dependent domain:
Solaris 10 8/07 OS with Patch ID 127127-11, which contains bug fixes and LDoms 1.0.3 features
Solaris 10 11/06 OS with Patch ID 127127-11, which contains bug fixes and LDoms 1.0.3 features
Following is a matrix of required software to enable all the Logical Domains 1.0.3 features and bug fixes.
It is possible to run the Logical Domains 1.0.3 software along with previous revisions of the other software components. For example, you could have differing versions of the Solaris OS on the various domains in a machine. It is recommended to have all domains running Solaris 10 5/08 OS, but an alternate upgrade strategy could be to upgrade the control and service domains to Solaris 10 5/08 OS and to continue running the guest domains at the existing patch level.
Following is a matrix of the minimum versions of software required. The minimum software versions are platform specific and depend on the requirements of the CPU in the machine. The minimum Solaris OS version for a given CPU type applies to all domain types (control, service, I/O, and guest).
Supported Servers | Logical Domains Manager | System Firmware | Solaris OS |
---|---|---|---|
Sun UltraSPARC T2 Plus–based servers | 1.0.3 | 7.1.x | Solaris 10 8/07[1] |
Sun UltraSPARC T2–based servers | 1.0.3 | 7.0.x | Solaris 10 8/07 |
Sun UltraSPARC T1–based servers | 1.0.3 | 6.5.x | Solaris 10 11/06[2] |
Following are the minimum required system firmware patches for use with Logical Domains 1.0.3 software on supported servers:
You can find the required Solaris OS and system firmware patches at the SunSolve site:
Solaris Security Toolkit 4.2 software – This software can help you secure the Solaris OS in the control domain and other domains. Refer to the Solaris Security Toolkit 4.2 Administration Guide and Solaris Security Toolkit 4.2 Reference Manual for more information.
Logical Domains (LDoms) Management Information Base (MIB) software – This software can help you enable third party applications to perform remote monitoring and a few control operations. Refer to the Logical Domains (LDoms) MIB 1.0.1 Administration Guide and Release Notes for more information.
Libvirt for LDoms software – This software provides virtual library (libvirt) interfaces for Logical Domains (LDoms) software so that virtualization customers can have consistent interfaces. The libvirt library (version 0.3.2) included in this software interacts with the Logical Domains Manager software running on Solaris 10 Operating System (OS) to support Logical Domains virtualization technology. Refer to the Libvirt for LDoms 1.0.1 Administration Guide and Release Notes for more information.
Note - LDoms MIB software and Libvirt for LDoms software work with LDoms 1.0.1 software at a minimum. |
The Logical Domains (LDoms) 1.0.3 Administration Guide and Logical Domains (LDoms) 1.0.3 Release Notes can be found at:
The Beginner's Guide to LDoms: Understanding and Deploying Logical Domains can be found at the Sun BluePrints site:
http://www.sun.com/blueprints/0207/820-0832.html
Note - Most of the concepts in the Beginner’s Guide are valid. Some of the details and examples refer only to LDoms 1.0 software. |
The following cards are not supported for this LDoms 1.0.3 software release:
The following bug IDs are filed to provide the support for the currently unsupported cards: 6552598, 6563713, 6589192, and 6598882.
Logical Domains software does not impose a memory size limitation when creating a domain. The memory size requirement is a characteristic of the guest operating system. Some Logical Domains functionality might not work if the amount of memory present is less than the recommended size. For recommended and minimum size memory requirements, refer to the installation guide for the operating system you are using. The default size for a swap area is 512 megabytes. Refer to “System Requirements and Recommendations” in the Solaris 10 Installation Guide: Planning for Installation and Upgrade.
The OpenBoot PROM has a minimum size restriction for a domain. Currently, that restriction is 12 megabytes. If you have a domain less than that size, the Logical Domains Manager will automatically boost the size of the domain to 12 megabytes. Refer to the release notes for your system firmware for information about memory size requirements.
Booting a Large Number of Domains
As sun4v systems with greater thread counts are released, you can have more domains per system than in previous releases:
If unallocated virtual CPUs are available, assign them to the service domain to help process the virtual I/O requests. Allocate 4 to 8 virtual CPUs to the service domain when creating more than 32 domains. Because maximum domain configurations have only a single CPU in the service domain, do not put unnecessary stress on that single CPU when configuring and using the domain.
The virtual switch (vsw) services should be spread over all the network adapters available in the machine. For example, if booting 128 domains on a Sun SPARC Enterprise T5240 server, create 4 vsw services, each serving 32 virtual net (vnet) instances (see the example following this list). Do not have more than 32 vnet instances per vsw service, because having more than that tied to a single vsw could cause hard hangs in the service domain.
To run the maximum configurations, a machine needs 64 gigabytes of memory (and up to 128 gigabytes in the Sun SPARC Enterprise T5240 server, if possible) so that the guest domains contain an adequate amount of memory. The guest domains require a minimum of 512 megabytes of memory but can benefit from having more, depending on the workload running in the domain and the configuration of the domain (the number of virtual devices in the domain). Memory and swap space usage increases in a guest domain when the vsw services used by the domain provide services to many virtual networks in multiple domains. This is due to the peer-to-peer links between all the vnets connected to the vsw.
The service domain benefits from having extra memory. Four gigabytes is the recommended minimum when running more than 64 domains.
Start domains serially rather than all at once. Start domains in groups of 10 or fewer, and wait for them to boot before starting the next batch. The same advice applies to installing operating systems on domains.
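The following sketch shows how such a layout might be created. The physical adapter names (e1000g0 through e1000g3) and virtual switch service names (primary-vsw0 through primary-vsw3) are hypothetical and depend on your hardware; refer to the ldm man page for the exact subcommand syntax.

# Create one virtual switch service per physical network adapter.
primary# ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
primary# ldm add-vsw net-dev=e1000g1 primary-vsw1 primary
primary# ldm add-vsw net-dev=e1000g2 primary-vsw2 primary
primary# ldm add-vsw net-dev=e1000g3 primary-vsw3 primary
# Attach each group of up to 32 guest vnets to a different vsw service, for example:
primary# ldm add-vnet vnet0 primary-vsw0 ldg1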
There is a limit to the number of LDCs available in any logical domain. For Sun UltraSPARC T1-based platforms, that limit is 256; for all other platforms, the limit is 512. Practically speaking, this only becomes an issue on the control domain, because the control domain has at least part, if not all, of the I/O subsystem allocated to it, and because of the potentially large number of LDCs created for both virtual I/O data communications and the Logical Domains Manager control of the other logical domains.
Note - The examples in this section are what happens on Sun UltraSPARC T1-based platforms. However, the behavior is the same if you go over the limit on other supported platforms. |
If you try to add a service, or bind a domain, so that the number of LDC channels exceeds the limit on the control domain, the operation fails with an error message similar to the following:
13 additional LDCs are required on guest primary to meet this request, but only 9 LDCs are available |
The following guidelines can help prevent creating a configuration that could overflow the LDC capabilities of the control domain:
The control domain allocates 12 LDCs for various communication purposes with the hypervisor, Fault Management Architecture (FMA), and the system controller (SC), independent of the number of other logical domains configured.
The control domain allocates one LDC to every logical domain, including itself, for control traffic.
Each virtual I/O service on the control domain consumes one LDC for every connected client of that service.
For example, consider a control domain and 8 additional logical domains. Each logical domain needs, at a minimum, a virtual network, a virtual disk, and a virtual console, each of which consumes one LDC on the control domain.
Applying the above guidelines yields the following results (numbers in parentheses correspond to the preceding guideline number from which the value was derived):
12(1) + 9(2) + 8 x 3(3) = 45 LDCs in total.
Now consider the case where there are 32 domains instead of 8, and each domain includes 3 virtual disks, 3 virtual networks, and a virtual console. Now the equation becomes:
12 + 33 + 32 x 7 = 269 LDCs in total.
Depending on the number of LDCs supported by your platform, the Logical Domains Manager either accepts or rejects such configurations.
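When planning larger configurations, the same arithmetic can be scripted. The following shell sketch simply encodes the guidelines above; the guest count and devices-per-guest values are placeholder assumptions to adjust for your own configuration.

#!/bin/sh
# Estimate the number of LDCs consumed on the control domain, following the
# three guidelines above (assumed values; adjust as needed).
GUESTS=32            # number of guest domains
VDEVS_PER_GUEST=7    # for example, 3 vdisks + 3 vnets + 1 virtual console
BASE=12              # LDCs for hypervisor, FMA, and SC communication
CONTROL=$((GUESTS + 1))             # one control-traffic LDC per domain, including the control domain
VIO=$((GUESTS * VDEVS_PER_GUEST))   # one LDC per connected virtual I/O client
echo "Estimated LDCs on the control domain: $((BASE + CONTROL + VIO))"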
This section details the software that is compatible with and can be used with the Logical Domains software. Be sure to check in the software documentation or your platform documentation to find the version number of the software that is available for your version of LDoms software and your platform.
SunVTS functionality is available in the control domain and guest domains on certain LDoms software releases and certain platforms. SunVTS is Sun's Validation Test Suite, which provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and proper functioning of most hardware controllers and devices on Sun servers. For more information about SunVTS, refer to the SunVTS User's Guide for your version of SunVTS.
Sun Management Center 4.0 Version 3 Add-On Software can be used only on the control domain with the Logical Domains Manager software enabled. Sun Management Center is an open, extensible system monitoring and management solution that uses Java and a variant of the Simple Network Management Protocol (SNMP) to provide integrated and comprehensive enterprise-wide management of Sun products and their subsystem, component, and peripheral devices. Support for hardware monitoring within the Sun Management Center environment is achieved through the use of appropriate hardware server module add-on software, which presents hardware configuration and fault reporting information to the Sun Management Center management server and console. Refer to the Sun Management Center 4.0 Version 3 Add-On Software Release Notes: For Sun Fire, Sun Blade, Netra, and Sun Ultra Systems for more information about using Sun Management Center 4.0 Version 3 on the supported servers.
Sun Explorer Data Collector can be used with the Logical Domains Manager software enabled on the control domain. Sun Explorer is a diagnostic data collection tool. The tool comprises shell scripts and a few binary executables. Refer to the Sun Explorer User's Guide for more information about using the Sun Explorer Data Collector.
Solaris Cluster software can be used only on an I/O domain in Logical Domains software releases up through LDoms 1.0.2. In LDoms 1.0.3 software, Solaris Cluster software can be used in a guest domain with some restrictions. Refer to Solaris Cluster documentation for more information about any restrictions and about the Solaris Cluster software in general.
The following system controller (SC) software interacts with the Logical Domains 1.0.3 software:
Sun Integrated Lights Out Manager (ILOM) 2.0 firmware is the system management firmware you can use to monitor, manage, and configure Sun UltraSPARC T2-based server platforms. ILOM is preinstalled on these platforms and can be used on the control domain on LDoms-supported servers with the Logical Domains Manager 1.0.3 software enabled. Refer to the Sun Integrated Lights Out Manager 2.0 User’s Guide for features and tasks that are common to Sun rackmounted servers or blade servers that support ILOM. Other user documents present ILOM features and tasks that are specific to the server platform you are using. You can find the ILOM platform-specific information within the documentation set that accompanies your system.
Advanced Lights Out Manager (ALOM) Chip Multithreading (CMT) Version 1.3 software can be used on the control domain on UltraSPARC® T1-based servers with the Logical Domains Manager 1.0.1 software enabled. Refer to “Using LDoms With ALOM CMT” in the Logical Domains (LDoms) 1.0.3 Administration Guide. The ALOM system controller enables you to remotely manage and administer your supported CMT servers. ALOM enables you to monitor and control your server either over a network or by using a dedicated serial port for connection to a terminal or terminal server. ALOM provides a command-line interface that you can use to remotely administer geographically distributed or physically inaccessible machines. For more information about using ALOM CMT Version 1.3 software, refer to the Advanced Lights Out Management (ALOM) CMT v1.3 Guide.
Netra Data Plane Software Suite is a complete board software package solution. The software provides an optimized rapid development and runtime environment on top of multistrand partitioning firmware for Sun CMT platforms. The Logical Domains Manager contains some ldm subcommands (add-vdpcs, rm-vdpcs, add-vdpcc, and rm-vdpcc) for use with this software. Refer to the Netra Data Plane Software Suite documentation for more information about this software.
This section contains general notes and issues concerning the Logical Domains 1.0.3 software.
For discussions in Logical Domains documentation, the terms system controller (SC) and service processor (SP) are interchangeable.
Currently, there is a limit of 8 configurations for logical domains that can be saved on the system controller using the ldm add-config command, not including the factory-default configuration.
If you have made any configuration changes since last saving a configuration to the SC, before you attempt to power off or power cycle a Logical Domains system, make sure you save the latest configuration that you want to keep.
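For example, the current configuration can be captured under a name of your choosing; the name new-config below is hypothetical, and the ldm man page is the authoritative reference for the config subcommands.

# Save the current logical domain configuration to the system controller.
primary# ldm add-config new-config
# List the configurations stored on the SC and confirm which one is current.
primary# ldm list-config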
Under certain circumstances, the Logical Domains (LDoms) Manager rounds up the requested memory allocation to either the next largest 8-kilobyte or 4-megabyte multiple. This can be seen in the following example output of the ldm list-domain -l command, where the constraint value is smaller than the actual allocated size:
Memory:
    Constraints:  1965 M
    raddr         paddr          size
    0x1000000     0x291000000    1968M |
Currently, there is an issue related to dynamic reconfiguration (DR) of virtual CPUs if a logical domain contains one or more cryptographic (mau) units:
Currently, Fault Management Architecture (FMA) diagnosis of I/O devices in a Logical Domains environment might not work correctly. The problems are:
Input/output (I/O) device faults diagnosed in a non-control domain are not logged on the control domain. These faults are only visible in the logical domain that owns the I/O device.
I/O device faults diagnosed in a non-control domain are not forwarded to the system controller. As a result, these faults are not logged on the SC and there are no fault actions on the SC, such as lighting of light-emitting diodes (LEDs) or updating the dynamic field-replaceable unit identifiers (DFRUIDs).
Errors associated with a root complex that is not owned by the control domain are not diagnosed properly. These errors can cause faults to be generated against the diagnosis engine (DE) itself.
With domaining enabled, variable updates persist across a reboot, but not across a power cycle, unless the variable updates are either initiated from OpenBoot firmware on the control domain, or followed by saving the configuration to the SC.
In this context, it is important to note that a reboot of the control domain could initiate a power cycle of the system:
When the control domain reboots, if there are no bound guest domains, and no delayed reconfiguration in progress, the SC power cycles the system.
When the control domain reboots, if there are guest domains bound or active (or the control domain is in the middle of a delayed reconfiguration), the SC does not power cycle the system.
LDom variables for a domain can be specified using any of the following methods:
Modifying, in a limited fashion, from the system controller (SC) using the bootmode command; that is, only certain variables, and only when in the factory-default configuration.
The goal is that variable updates made using any of these methods always persist across reboots of the domain and are always reflected in any subsequent logical domain configurations saved to the SC.
In Logical Domains 1.0.3 software, there are a few cases where variable updates do not persist as expected:
With domaining enabled (the default in all cases except the UltraSPARC T1000 and T2000 systems running in factory-default configuration), all methods of updating a variable (OpenBoot firmware, eeprom command, ldm subcommand) persist across reboots of that domain, but not across a power cycle of the system, unless a subsequent logical domain configuration is saved to the SC. In addition, in the control domain, updates made using OpenBoot firmware persist across a power cycle of the system; that is, even without subsequently saving a new logical domain configuration to the SC.
When domaining is not enabled, variable updates specified through the eeprom(1M) command persist across a reboot of the primary domain into the same factory-default configuration, but do not persist into a configuration saved to the SC. Conversely, in this scenario, variable updates specified using the Logical Domains Manager do not persist across reboots, but are reflected in a configuration saved to the SC.
So, when domaining is not enabled, if you want a variable update to persist across a reboot into the same factory-default configuration, use the eeprom command. If you want it saved as part of a new logical domains configuration saved to the SC, use the appropriate Logical Domains Manager command.
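As an illustration of the distinction, the following sketch sets the boot-device variable both ways. The device path and configuration name are hypothetical examples; substitute values appropriate for your system.

# Persist across a reboot into the same factory-default configuration:
primary# eeprom boot-device=/virtual-devices@100/channel-devices@200/disk@0
# Have the change reflected in a configuration saved to the SC:
primary# ldm set-var boot-device=/virtual-devices@100/channel-devices@200/disk@0 primary
primary# ldm add-config new-config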
In all cases, when reverting to the factory-default configuration from a configuration generated by the Logical Domains Manager, all LDoms variables start with their default values.
The following bug IDs have been filed to resolve these issues: 6520041, 6540368, and 6540937.
This section summarizes the bugs that you might encounter when using this version of the software. The bug descriptions are in numerical order by bug ID. If a recovery procedure and a workaround are available, they are specified.
Format oddities and a core dump occur when using the Zettabyte File System (ZFS) Volume Emulation Driver (ZVOL) and when the Logical Domains environment has virtual disks with an extensible firmware interface (EFI) label. Selecting such disks with the format(1M) command causes a core dump.
When the Fault Management Architecture (FMA) places a CPU offline, it records that information, so that when the machine is rebooted the CPU remains offline. The offline designation persists in a non-Logical Domains environment.
However, in a Logical Domains environment, this persistence is not always maintained for CPUs in guest domains. The Logical Domains Manager does not currently record data on fault events sent to it. This means that a CPU in a guest domain that has been marked as faulty, or one that was not allocated to a logical domain at the time the fault event is replayed, can subsequently be allocated to another logical domain with the result that it is put back online.
If a disk device listed in a guest domain’s configuration is either non-existent, already opened by another process, or otherwise unusable, the disk cannot be used by the virtual disk server (vds) but the Logical Domains Manager does not emit any warning or error when the domain is bound or started.
When the guest tries to boot, messages similar to the following are printed on the guest’s console:
WARNING: /virtual-devices@100/channel-devices@200/disk@0: Timeout connecting to virtual disk server... retrying |
In addition, if a network interface specified using the net-dev= parameter does not exist or is otherwise unusable, the virtual switch is unable to communicate outside the physical machine, but the Logical Domains Manager does not emit any warning or error when the domain is bound or started.
In the case of an errant virtual disk service device or volume, perform the following steps (a combined example follows the steps):
Stop the domain owning the virtual disk bound to the errant device or volume.
Issue the ldm rm-vdsdev command to remove the errant virtual disk service device.
Issue the ldm add-vdsdev command to correct the physical path to the volume.
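Combined, the recovery might look like the following sketch; the domain name, volume name, service name, and device path are hypothetical.

# Stop the domain that owns the virtual disk bound to the errant device.
primary# ldm stop-domain ldg1
# Remove the errant virtual disk service device, then re-add it with the correct path.
primary# ldm rm-vdsdev vol1@primary-vds0
primary# ldm add-vdsdev /dev/dsk/c1t1d0s2 vol1@primary-vds0
# Restart the domain.
primary# ldm start-domain ldg1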
In the case of an errant net-dev= property specified for a virtual switch, perform the following steps:
If a disk device listed in a guest domain’s configuration is being used by software other than the Logical Domains Manager (for example, if it is mounted in the service domain), the disk cannot be used by the virtual disk server (vds), but the Logical Domains Manager does not emit a warning that it is in use when the domain is bound or started.
When the guest domain tries to boot, a message similar to the following is printed on the guest’s console:
WARNING: /virtual-devices@100/channel-devices@200/disk@0: Timeout connecting to virtual disk server... retrying |
Recovery: Unbind the guest domain, and unmount the disk device to make it available. Then bind the guest domain, and boot the domain.
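A sketch of that recovery sequence follows; the domain name and the mount point of the disk backend are hypothetical, and the stop step assumes the guest is still active.

# Stop and unbind the guest domain that cannot connect to its virtual disk.
primary# ldm stop ldg1
primary# ldm unbind ldg1
# Free the disk backend in the service domain (here, assumed to be mounted).
primary# umount /export/backend
# Bind and boot the guest domain again.
primary# ldm bind ldg1
primary# ldm start ldg1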
Under rare circumstances, when an ldom variable, such as boot-device, is being updated from within a guest domain by using the eeprom(1M) command at the same time that the Logical Domains Manager is being used to add or remove virtual CPUs from the same domain, the guest OS can hang.
Workaround: Ensure that these two operations are not performed simultaneously.
Recovery: Use the ldm stop-domain and ldm start-domain commands to stop and start the guest OS.
The iostat(1M) command does not return any meaningful information when run on a domain with virtual disks. This is because the LDoms vdisk client driver (vdc) does not measure I/O activity or save any information to kstats that the iostat command could read.
Workaround: Gather the I/O statistics on the service domain exporting the virtual disks.
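For example, run iostat in the service domain that exports the virtual disks rather than in the guest domain; the options shown are standard iostat options used for illustration.

# In the service domain: report extended device statistics every 5 seconds.
iostat -xn 5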
There are some cases where the behavior of the ldm stop-domain command is confusing.
If the Solaris OS is halted on the domain (for example, by using the halt(1M) command) and the domain is at the “r)eboot, o)k prompt, h)alt?” prompt, the ldm stop-domain command fails with the following error message:
LDom <domain name> stop notification failed |
Workaround: Force a stop by using the ldm stop-domain command with the -f option.
# ldm stop-domain -f ldom |
If the domain is at the kernel module debugger, kmdb(1M) prompt, then the ldm stop-domain command fails with the following error message:
LDom <domain name> stop notification failed |
Recovery: If you restart the domain from the kmdb prompt, the stop notification is handled, and the domain does stop.
In a Logical Domains environment, there is no support for setting or deleting wide-area network (WAN) boot keys from within the Solaris OS using the ickey(1M) command. All ickey operations fail with the following error:
ickey: setkey: ioctl: I/O error |
In addition, WAN boot keys that are set using OpenBoot firmware in logical domains other than the control domain are not remembered across reboots of the domain. In these domains, the keys set from the OpenBoot firmware are only valid for a single use.
Misleading error messages are returned from certain ldm subcommands that take two or more required arguments, if one or more of those required arguments is missing.
For example, if the add-vsw subcommand is missing the vswitch-name or ldom argument, you receive an error message similar to the following:
# ldm add-vsw net-dev=e1000g0 primary
Illegal name for service: net-dev=e1000g0 |
For another example, if the add-vnet command is missing the vswitch-name of the virtual switch service with which to connect, you receive an error message similar to the following:
# ldm add-vnet mac-addr=08:00:20:ab:32:40 vnet1 ldg1
Illegal name for VNET interface: mac-addr=08:00:20:ab:32:40 |
As another example, if you fail to add a logical domain name at the end of an ldm add-vcc command, you receive an error message saying that the port-range= property must be specified.
Recovery: Refer to the Logical Domains (LDoms) Manager 1.0.3 Man Page Guide or the ldm man page for the required arguments of the ldm subcommands, and retry the commands with the correct arguments.
This issue is summarized in Logical Domain Variable Persistence.
If Solaris™ Cluster software is in use with Logical Domains software, and the cluster is shut down, the console of each logical domain in the cluster displays the following prompt:
r)eboot, o)k prompt, h)alt? |
If the ok prompt (o option) is selected, the system can panic.
Select halt (h option) at the prompt on the logical domain console to avoid the panic.
To force the logical domain to stop at the ok prompt, even if the OpenBoot auto-boot? variable is set to true, use one of the following two procedures.
Issue the following ALOM command to reset the domain:
sc> poweron |
The OpenBoot banner is displayed on the console:
Sun Fire T200, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.26.0, 4096 MB memory available, Serial #68100096.
Ethernet address 0:14:4f:f:20:0, Host ID: 840f2000. |
Issue the following ALOM command to send a break to the domain immediately after the OpenBoot banner displays:
sc> break -y |
Issue the following command from the control domain to disable the auto-boot? variable for the logical domain:
# ldm set-var auto-boot?=false domain-name |
Issue the following command from the control domain to reset the logical domain:
# ldm start-domain domain-name |
Issue the following OpenBoot command to restore the value of the auto-boot? variable:
ok setenv auto-boot? true |
If a guest domain is running the Solaris 10 OS and using a virtual disk built from a ZFS volume provided by a service domain running the Solaris Express or OpenSolaris™ programs, then the guest domain might not be able to access that virtual disk.
The same problem can occur with a guest domain running the Solaris Express or OpenSolaris programs using a virtual disk built from a ZFS volume provided by a service domain running Solaris 10 OS.
Workaround: Ensure that the guest domain and the service domain are running the same version of Solaris software (Solaris 10 OS, Solaris Express, or OpenSolaris).
When a memory page of a guest domain is diagnosed as faulty, the Logical Domains Manager retires the page in the logical domain. If the logical domain is stopped and restarted again, the page is no longer in a retired state.
The fmadm faulty -a command shows whether the page from either the control or guest domain is faulty, but the page is not actually retired. This means the faulty page can continue to generate memory errors.
Workaround: Use the following command in the control domain to restart the Fault Manager daemon, fmd(1M):
primary# svcadm restart fmd |
If you reset the system controller while the host is powered on, subsequent error reports and faults are not delivered to the host.
On a system configured to use the Network Information Services (NIS) or NIS+ name service, if the Solaris Security Toolkit software is applied with the server-secure.driver, NIS or NIS+ fails to contact external servers. A symptom of this problem is that the ypwhich(1) command, which returns the name of the NIS or NIS+ server or map master, fails with a message similar to the following:
Domain atlas some.atlas.name.com not bound on nis-server-1.c |
The recommended Solaris Security Toolkit driver to use with the Logical Domains Manager is ldm_control-secure.driver, and NIS and NIS+ work with this recommended driver.
If you are using NIS as your name server, you cannot use the Solaris Security Toolkit profile server-secure.driver, because you may encounter Solaris OS Bug ID 6557663, IP Filter causes panic when using ipnat.conf. However, the default Solaris Security Toolkit driver, ldm_control-secure.driver, is compatible with NIS.
Log in to the system console from the system controller, and if necessary, switch to the ALOM mode by typing:
# #. |
Power off the system by typing the following command in ALOM mode:
sc> poweroff |
sc> poweron |
Switch to the console mode at the ok prompt:
sc> console |
ok boot -s |
Edit the file /etc/shadow, and change the first line of the shadow file that has the root entry to:
root::6445:::::: |
Log in to the system and do one of the following:
# /opt/SUNWjass/bin/jass-execute -ui
# /opt/SUNWjass/bin/jass-execute -a ldm_control-secure.driver |
The virtual networking infrastructure adds additional overhead to communications from a logical domain. All packets are sent through a virtual network device, which, in turn, passes the packets to the virtual switch. The virtual switch then sends the packets out through the physical device. The lower performance is seen due to the inherent overheads of the stack.
Workarounds: Do one of the following depending on your server:
On Sun UltraSPARC T1-based servers, such as the Sun Fire T1000 and T2000, and Sun UltraSPARC T2+ based servers such as the Sun SPARC Enterprise T5140 and T5240, assign a physical network card to the logical domain using a split-PCI configuration. For more information, refer to “Configuring Split PCI Express Bus to Use Multiple Logical Domains” in the Logical Domains (LDoms) 1.0.3 Administration Guide.
On Sun UltraSPARC T2–based servers, such as the Sun SPARC Enterprise T5120 and T5220 servers, assign a Network Interface Unit (NIU) to the logical domain.
If the time or date on a logical domain is modified, for example using the ntpdate command, the change persists across reboots of the domain but not across a power cycle of the host.
Workaround: For time changes to persist, save the configuration with the time change to the SC and boot from that configuration.
This issue is summarized in Logical Domain Variable Persistence and applies only to the control domain.
During operations in a split-PCI configuration, if a bus is unassigned to a domain or is assigned to a domain but not running the Solaris OS, any error in that bus or any other bus may not get logged. Consider the following example:
In a split-PCI configuration, Bus A is not assigned to any domain, and Bus B is assigned to the primary domain. In this case, any error that occurs on Bus B might not be logged. (The situation occurs only during a short time period.) The problem resolves when the unassigned Bus A is assigned to a domain and is running the Solaris OS, but by then some error messages might be lost.
Workaround: When using a split-PCI configuration, quickly verify that all buses are assigned to domains and are running the Solaris OS.
The following message appears at the ok prompt if an attempt is made to boot a guest domain that contains Emulex-based Fibre Channel host adapters (Sun Part Number 375-3397):
ok> FATAL:system is not bootable, boot command is disabled |
These adapters are not supported in a split-PCI configuration on Sun Fire T1000 servers.
If SunVTS™ is started and stopped multiple times, it is possible that switching from the SC console to the host console, using the console SC command can result in either of the following messages being repeatedly emitted on the console:
Enter #. to return to ALOM. |
Warning: Console connection forced into read-only mode |
The following Infiniband cards are not supported with LDoms 1.0.1, 1.0.2, and 1.0.3:
Workaround: If one of these unsupported configurations is used with LDoms software, all logical domains must be stopped and unbound before the primary or control domain is rebooted. Failure to do so might result in the device becoming unusable, and the system does not recognize the card.
Normally, when the verbose (-v) option is specified to the prtdiag(1M) command in the control domain, additional environmental status information is displayed. If the output of this information is interrupted by issuing a Control-C, the Platform Information and Control Library (PICL) daemon, picld(1M), can enter a state which prevents it from supplying the environmental status information to the prtdiag command from that point on, and the additional environmental data is no longer displayed.
Workaround: Restart the picld(1M) Service Management Facility (SMF) in the control domain using the following command:
# svcadm restart picld |
If a virtual disk is backed by a file, then this virtual disk cannot be labeled with an EFI label and cannot directly be added in a ZFS pool.
Workaround: The disk has to be labeled with a volume table of contents (VTOC) label using the format(1M) command. The disk can be added to a ZFS pool by creating a VTOC label with a slice covering the entire disk (for example, slice 0) and adding that slice to the ZFS pool instead of adding the entire disk. For example, use zpool create xyzpool c0d1s0 instead of zpool create xyzpool c0d1.
Occasionally during a Solaris OS boot, a console message from the Domain Services (ds) module reports that reading or writing from a logical domain channel was unsuccessful. The reason code (131) indicates that the channel has been reset. Following are examples of the console message.
NOTICE: ds@1: ldc_read returned 131
WARNING: ds@0: send_msg: ldc_write failed (131): |
These console messages do not affect the normal operation of the system and can be ignored.
If Solaris Cluster software is installed on a guest logical domain, Solaris Cluster heartbeat packets can get dropped under a heavy network load. This could cause the cluster node to panic.
Console behavior on the control domain is inconsistent when a graphics device and keyboard are specified for console use. This occurs when the OpenBoot variables input-device and output-device are set to anything other than the default value of virtual-console.
If the control domain is set this way, some console messages are sent to the graphics console and others are sent to the virtual console. This results in incomplete information on either console. In addition, when the system is halted, or a break is sent to the console, control is passed to the virtual console which requires keyboard input over the virtual console. As a result, the graphics console appears to hang.
Workaround: To avoid this problem, use only the virtual console. From the OpenBoot ok prompt, ensure that the default value of virtual-console is set for both the input-device and output-device variables.
Recovery: Once the graphics console appears hung, do the following:
Connect to the virtual console from the system processor to provide the required input.
Press Return on the virtual console keyboard once to see the output on the virtual console.
If these solutions do not work for your configuration or if you have further questions, contact Sun Services.
If you use the service processor (SP) setdate command after you configure non-default logical domains and save them to the SP, the date on non-default logical domains changes.
Workaround: Configure the SP date using the setdate command before you configure the logical domains and save them on the SP.
Recovery: If you use the SP setdate command after you save the non-default logical domain configurations on the SP, you need to boot each non-default logical domain to the Solaris OS and correct the date. Refer to the date(1) or ntpdate(1M) commands in the Solaris 10 OS Reference Manual Collection for more information about correcting the date.
If a CPU or memory fault occurs, the affected domain might panic and reboot. If FMA attempts to retire the faulted component while the domain is rebooting, the Logical Domains Manager is not able to communicate with the domain, and the retire fails. In this case, the fmadm faulty command lists the resource as degraded.
Recovery: Wait for the domain to complete rebooting and then force FMA to replay the fault event by restarting fmd on the control domain by using this command:
# svcadm restart fmd |
It is possible to erroneously add duplicate I/O constraints when configuring a logical domain.
When logical domains are being configured with no specific console port specified for any logical domain, any restart of the Logical Domains Manager (which may happen automatically as part of a delayed reconfiguration or an LDoms Manager exit) may change the console port configuration state from what the user originally entered. This may result in the following error message when attempting to bind a logical domain:
Unable to bind client vcons0 |
Workaround: Check the actual configuration state for the guest which failed to bind using the command:
# ldm ls-constraints |
The output should show that the console port constraint matches that of one of the bound guests. Use the ldm destroy command to completely remove the guest. Create the guest from scratch without any constraint set on the console, or use another console port not currently assigned to any bound guest.
If you configure more than four virtual networks (vnets) in a guest domain on the same network using the Dynamic Host Configuration Protocol (DHCP), the guest domain can eventually become unresponsive while running network traffic.
Recovery: Issue an ldm stop-domain ldom command followed by an ldm start-domain ldom command on the guest domain (ldom) in question.
If you run the Solaris 10 11/06 OS and you harden drivers on the primary domain which is configured with only one strand, rebooting the primary domain or restarting the Fault Management daemon (fmd) can result in an fmd core dump. The fmd dumps core while it cleans up its resources, and this does not affect the FMA diagnosis.
Workaround: Add a few more strands into the primary domain. For example,
# ldm add-vcpu 3 primary |
When removing CPUs from a domain that is in delayed reconfiguration mode, if all the CPUs that are bound to that domain and on the same core are removed, and the Modular Arithmetic Unit (MAU) on that core had also been bound to the same domain, then that MAU becomes orphaned. It is no longer reachable by the domain to which it is bound, nor is it made available to any other domain which has CPUs bound to the same core. In addition, there is no warning or error returned at the time the MAU is orphaned.
Workaround: Remove sufficient MAUs from the domain prior to removing CPUs, so that removing CPUs does not result in MAUs becoming unreachable.
On UltraSPARC T1–based systems, there is one MAU for every four CPU strands
On UltraSPARC T2–based systems, there is one MAU for every eight CPU strands
To find out which MAUs are bound to the domain, type:
# ldm ls -l ldom |
To remove MAUs from a domain, type:
# ldm rm-mau number ldom |
The virtual switch (vsw) occasionally displays benign operational messages as either a WARNING or NOTICE. Some of these messages are listed below and should be ignored, as they do not impact the normal operation of the virtual switch.
WARNING: vsw0: failed to program addr 0:14:4f:f8:f0:2 for port 13 into device e1000g2 : err 28
NOTICE: vsw0: switching device e1000g2 into promiscuous mode
NOTICE: vsw0: switching device e1000g2 back to programmed mode
WARNING: vsw1: device (aggr15) does not support setting multiple unicast addresses
WARNING: vsw1: Unable to setup layer2 switching |
WAN booting a logical domain using a miniroot created from a Solaris 10 8/07 OS installation DVD hangs during a boot of the miniroot.
The scadm command on a control domain running the Solaris 10 11/06 OS or later can hang following an SC reset. This occurs because the system is unable to properly reestablish a connection following an SC reset.
Recovery: Reboot the host to reestablish connection with the SC.
Workaround: Reboot the host to reestablish connection with the SC.
If a physical disk is exported as a virtual disk through the Veritas Dynamic Multipathing (DMP) framework (that is, using /dev/vx/dmp/cXdXtXs2), the physical disk is not exported correctly, and it appears as a single slice disk in the guest domain.
Workaround: The physical disk should be exported without using the Veritas DMP framework. The disk should be exported using /dev/dsk/cXdXtXs2 instead of /dev/vx/dmp/cXdXtXs2.
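For example, a sketch of exporting the disk through its standard path follows; the device path, volume name, service name, disk name, and domain name are hypothetical.

# Export the physical disk using its /dev/dsk path rather than the DMP path.
primary# ldm add-vdsdev /dev/dsk/c2t3d4s2 vol1@primary-vds0
primary# ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1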
If virtual devices are added to an active domain, and virtual devices are removed from that domain before that domain reboots, then the added devices do not function once the domain reboots.
Recovery: Remove and then add the non-functional virtual devices, making sure that all the remove requests precede all the add requests, and then reboot the domain.
Workaround: On an active domain, do not add and remove any virtual devices without an intervening reboot of the domain.
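For example, if a virtual network device and a virtual disk stopped working after such a sequence, the recovery might look like the following sketch; the device, service, volume, and domain names are hypothetical.

# Remove the non-functional devices first, then add them back, then reboot the domain.
primary# ldm rm-vnet vnet1 ldg1
primary# ldm rm-vdisk vdisk1 ldg1
primary# ldm add-vnet vnet1 primary-vsw0 ldg1
primary# ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1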
When requesting a change in the memory allocation of a domain (using any of the add-memory, set-memory, or rm-memory subcommands of ldm), there are cases in which a failure to make the requested change can result in the Logical Domains Manager terminating. When this happens, the following message is returned on the failed request:
Receive failed: logical domain manager not responding |
SMF then restarts the LDom Manager, and the system is fully functional once it is restarted.
If the hypervisor rejects an ldm panic-domain request (because the domain is already resetting for example), the error message returned by the LDoms Manager is misleading:
Invalid LDom ldg23 |
LDoms multidomain functionality does not support SNMP 1.5.4 on Sun SPARC Enterprise T5140 and Sun SPARC Enterprise T5240 systems. Only a single global domain is supported.
Simultaneous net installation of multiple guest domains fails on Sun SPARC Enterprise T5140 and Sun SPARC Enterprise T5240 systems with a common console group.
Workaround: Only net install on guest domains that each have their own console group. This failure is seen only on domains with a common console group shared among multiple net-installing domains.
After a delayed reconfiguration on a guest domain and a subsequent power cycle, the guest fails to boot with the following message:
Boot device: /virtual-devices@100/channel-devices@200/disk@0  File and args:
WARNING: /virtual-devices@100/channel-devices@200/disk@0: Timeout connecting to virtual disk server... retrying |
This happens when a configuration is saved to the SP while a delayed reconfiguration is pending.
Workarounds: Either do not save the configuration to the SP until after the delayed reconfiguration is completed and the guest is rebooted, or run the following commands on the primary domain after the guest reboots following a delayed reconfiguration:
# ldm stop ldom
# ldm unbind ldom
# ldm bind ldom
# ldm start ldom |
On a guest domain, the virtual disk driver does not support the disk control operations partition (DKIOCPARTITION) ioctl. Using this ioctl fails, while it should succeed for disks with an EFI label.
With Solaris Cluster running in an LDoms guest domain, trying to add a virtual disk with an EFI label as a quorum device fails:
# scconf -a -q globaldev=d2
scconf: Failed to add quorum device (d2) - unable to scrub the device. |
Workaround: With Solaris Cluster running in an LDoms guest domain, only virtual disks with a VTOC label can be added as quorum devices.
On a guest domain, when opening a virtual disk for writing, the virtual disk driver does not check if the virtual disk was exported as a read-only device. Thus, opening the device succeeds instead of failing with a read-only file system (EROFS) error.
Instead of failing immediately, applications using a read-only virtual disk character device (/dev/rdsk/cXdXsX) for writing fail only when a write command is issued. Applications using a read-only virtual disk block device (/dev/dsk/cXdXsX) for writing see errors only if the write is not buffered in the system cache. Therefore, such applications might see write operations complete successfully even though the write is not committed to the device.
If a link aggregation device is being used as the physical device for a virtual switch (vswitch), the vswitch might not be able to open and configure the device. As a result, the client guest domains cannot send and receive network packets to and from the physical network.
Workaround: Add the following line to the /etc/system file on the domain with the vswitch, and reboot the domain.
set vsw:vsw_mac_open_retries = 1200 |
Occasionally, a service domain panics during reboot if the virtual switch is configured to use an aggregated network device for external connectivity.
Recovery: Reconfigure the virtual switch to a physical network device using the ldm set-vsw command, and then restart the domain.
Workaround: Configure the virtual switch to use a regular physical network device instead of an aggregated network device.
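A sketch of the recovery follows, assuming a virtual switch named primary-vsw0 and a physical adapter named e1000g0 (both hypothetical).

# Point the virtual switch at a plain physical adapter instead of the aggregated device.
primary# ldm set-vsw net-dev=e1000g0 primary-vsw0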
The sysfwdownload utility takes significantly longer to run from within an LDoms environment on systems based on UltraSPARC T1 processors. This happens if you use the sysfwdownload utility while LDoms software is enabled.
Workaround: Boot to the factory-default configuration with LDoms software disabled before using the utility.
When a file or volume is exported as a virtual disk, then the service domain exporting that file or volume is acting as a storage cache for the virtual disk. In that case, data written to the virtual disk might get cached into the service domain memory instead of being immediately written to the virtual disk backend. Data are not cached if the virtual disk backend is a physical disk or slice, or if it is a volume device exported as a single-slice disk.
Workaround: If the virtual disk backend is a file or a volume device exported as a full disk, then you can prevent data from being cached into the service domain memory and have data written immediately to the virtual disk backend by adding the following line to the /etc/system file on the service domain.
set vds:vd_file_write_flags = 0 |
Note - Setting this tunable will have an impact on performance when writing to a virtual disk, but it does ensure that data are written immediately to the virtual disk backend. |
In certain cases, the prtdiag(1M) command does not list all the CPUs.
Workaround: For an accurate count of CPUs, use the psrinfo(1M) command.
If an SVM volume is built on top of a disk slice containing block 0 of the disk, then SVM prevents writing to block 0 of the volume to avoid overwriting the label of the disk.
If an SVM volume built on top of a disk slice containing block 0 of the disk is exported as a full virtual disk, then a guest domain is unable to write a disk label for that virtual disk, and this prevents the Solaris OS from being installed on such a disk.
Workaround: SVM volumes exported as a virtual disk should not be built on top of a disk slice containing block 0 of the disk.
A more generic guideline is that slices which start on the first block (block 0) of a physical disk should not be exported (either directly or indirectly) as a virtual disk. Refer to “Directly or Indirectly Exporting a Disk Slice” in the Logical Domains (LDoms) 1.0.3 Administration Guide.
Occasionally, SPARC Enterprise T2000 systems can hang on boot if the virtual switch (vsw) is configured to use Sun x8 Express 1/10G Ethernet Adapters (nxge). The occurrence of this bug might signal broken network hardware. Systems that have broken network interface cards (NICs) exhibit this behavior.
If an attempt to use dynamic reconfiguration to remove CPUs from a domain using the Logical Domains Manager CLI fails (that is, the request to unconfigure the CPUs results in an error being returned from the guest OS) a failure message is reported to the screen, but the exit code of the associated ldm subcommand is incorrectly set to 0. This can cause scripts that check the exit status of ldm subcommands in order to determine success or failure to incorrectly assume that the command succeeded when in fact it did not.
Normally, if a delayed reconfiguration is in progress for a domain, attempts to make configuration changes to any other domain are disallowed to prevent potential problems when the domain under the delayed reconfiguration has its new configuration instantiated. However, attempts to remove volumes from a virtual disk server (using the ldm rm-vdsdev command) continue to be allowed, even when another domain is in the process of a delayed reconfiguration.
Since a volume cannot be removed if there are any virtual disks bound to it, even if that binding is part of a delayed reconfiguration, this will not result in any problems when the delayed reconfiguration is instantiated. The only ramification is the unexpected success of the operation.
Under certain situations, a restart of the Logical Domains Manager (or a reboot of the control domain) results in virtual disk device (vdsdev) information becoming replicated in the constraint database. Once these replicated entries exist in the Logical Domains Manager constraint database, ldm rm-vdsdev operations work initially, but do not persist across a subsequent restart of the Logical Domains Manager (or reboot of the control domain); that is, the removed vdsdev reappears.
Recovery: It could take multiple cycles of removing the vdsdev followed by restarting the Logical Domains Manager to clear out the replicated entries in the database.
On Sun UltraSPARC T1–based servers, JumpStart installations of the Solaris 10 11/06 OS on guest domains can sometimes hang, depending on the memory configuration. Typically, this happens on guest domains with 1024 megabytes of memory (+/- 20 megabytes).
Recovery: Stop the domain. Add or remove some memory; for example, 100 megabytes. Attempt the net installation again.
Workaround: If possible, net install the guest domain with the Solaris 10 8/07 OS or later.
The following LDoms bugs were fixed for the Solaris 10 5/08 OS:
6416097 Remove bit fields in vio messages for portability
6434615 vdisk needs to support booting/installing from DVDs
6437722 vdisk should support USCSICMD ioctl
6437772 vdisk should support mhd (multihost disk control operations)
6469894 xcall timeouts should be derived from the machine description
6492023 Service domain thread pegged 100% system time
6501039 Rebooting multiple guests continuously causes a reboot thread to hang
6512526 RC1a: vntsd needs to validate the listen-to IP address
6514091 vdisk server should export volumes as full disks
6519849 vnet hot lock in vnet_m_tx affecting performance
6527622 Attempt to store boot command variable during a reboot can time out
6528156 Opening devices exclusively from vds causes several problems
6530331 vsw when plumbed and in prog mode should write its mac address into HW
6531030 fmd not replaying page retirement fault events at startup on the primary domain
6531266 Link-aggregation with Nemo on e1000g primary ldom does not work
6531557 format(1M) does not work with virtual disks
6531913 vds can lose access to vdisks built from files located on the root fs
6534456 vntsd does not recognize a listen_addr of 127.0.0.1
6536262 vds occasionally sends out-of-order responses
6539243 LDC prints warning messages on console when running newer Solaris with older SysFw (6.3.x)
6541689 vsw_process_data_dring_pkt doesn’t check the return value from allocb
6542560 Implement LDC dcmds and walkers for improved debugging
6543601 intrstat is not supported on an LDom for virtual devices
6544946 Adding nonexistent disk device to single-CPU domain causes hang
6554177 vswitch should verify net-dev property
6556778 vnet does not handle ldc_init failure properly
6557970 Data from OpenBoot PROM is double-copied in vsw driver
6559924 vgen_mdeg_cb fails to release lock properly when error occurs
6563508 prtdiag/prtpicl broken on guest domains on Solaris 10 8/07
6566086 vdc needs an I/O timeout
6571988 cnex should cache target cpuid for each channel
6572885 ldc_init does not properly compute queue length from mtu
6572891 ldc reliable mode does not process ACK packets properly
6573332 Format of MAC address set in vnet and vsw attr packet do not match
6573492 ldc_rx_hdlr always sends CTRL/NACK on seqID mismatch
6573657 vds type-conversion bug prevents raw disk accesses from working
6575050 vds should support unformatted disks
6575216 IO-DOMAIN-RESET: Guests may lose access to disk services (vds) if I/O domain is rebooted
6575608 i_ldc_send_pkt() uses seqID without grabbing Tx lock
6578761 System hangs in ds_cap_fini() and ds_cap_init()
6578918 Disk image should have a device ID
6581720 IO-DOMAIN-RESET (T2000/T5120/T5220): Guest domain may lose connection to vsw if primary is rebooted
6589682 IO-DOMAIN-RESET (T2000-AA): kern_postprom panic on tavor-pcix configuration (reboot)
6591399 vds prints file lookup error during service domain boot
6591825 ldc_read does not set qhead after processing control pkts
6593231 Domain Services logging facility must manage memory better
6593961 Transmit performance doesn’t scale with increasing number of TCP connections in guest domain
6596819 vds does not implement DKIOCFLUSHWRITECACHE for files-exported-as-vdisks
6604983 Multicast processing after a channel-reset is broken in vnet
6605716 Halting the system should not override auto-boot? on the next poweron
6607061 vdisk protocol version needs to be bumped up to v1.1
6610044 vsw should mac_register() in attach()
6616313 cnex incorrectly generates interrupt cookies
6616525 ldclist.rwlock could be acquired after it has been destroyed
6620322 The panic occurred when the system was booted on T5120
6621222 Need a tunable to export volumes as single slice disks
6622758 LDC channel statistics are missing in vsw
6627933 Panic in vsw_reclaim_dring while net installing a guest
6630945 vntsd runs out of file descriptors with very large domain counts
6634346 cnex panics if DTrace probes use values which have been destroyed by a call to remove intr
6639934 Recursive mutex_enter panics on vgen_handshake_reset when configuring 17 vnets on a service
6667939 Guest domain panics on boot after installing T127127-08
The following LDoms bugs were fixed for the LDoms 1.0.3 software release:
6515615 add-vnet allows creation of virtual network devices with the same MAC addr assigned to another LDom
6517269 CLI: the "Usage:" outputs of ldm list-services and list-constraints are not consistent
6532201 Automatically assigned MAC address cannot be manually re-assigned to the logical domain
6563513 ldm list-constraints -x silently ignores everything that is not a valid LDom
6571091 LDOM manager dies when svcs is starting ldmd_start due to resource duplication
6580000 ldm needs a set-vdisk command to update vdisk timeout
6580005 XML parser should use set-vcons if needed when creating a domain
6582402 IO bus "alias" property is missing in XML list-bindings
6589614 Calls to cons_bind_mem() should send in a resp pointer whenever possible
6591279 Re-enable support for specifying options to VDS volumes
6592851 MAU tag missing when no crypto units allocated to a domain
6594308 ldm needs a set-vdsdev command to update vds device options
6595398 FMA memory retire code does not translate RA back to PA in ldmd
6622205 LDoms should report domain memory in mega- or gigabytes instead of bytes to avoid overrun
6626770 LDOM disk and network services renamed after 1.0.1 to 1.0.2 upgrade
6627345 useradd/roleadd -A solaris.ldoms.grant user|role generates "is not a valid authorization" error
6627904 ldm ls-devices does not accept legal arguments
6628063 After file system full error, ldmd repeatedly gets fatal error on startup
6649585 Typo in cancel-reconf message
6651993 LDoms Manager aborts when using set-vcpu into a guest domain that is in the process of a delayed reconfig
6654736 add-vsw command should add a mode option for special packet processing
6657785 HV abort after multiple memory config changes under a delayed reconfig
6667621 ldmd core dump on ldm add-domain -i xml_file
6671117 ldm crash on add-config if in delayed reconfig
6675316 Usage messages for add-vdsev/set-vdsdev options need to be more user friendly
6678085 VCC service provided a fixed range of port numbers and ignored the port number provided by the user
6680451 ldm set-vcc should always trigger a delayed reconfig if the domain is active
6681878 vdisk timeout feature did not kick in when I/O domain service is in halting state
6684612 Guest domain panic on vdc <-> vds handshake
6685297 ldmd reports incorrect memory size if ldm set-mem to a memory size less than existing memory
6688287 CLI help message cleanup
6689234 Don’t disallow multiple exports of the same backing store from the same domain
Copyright © 2008, Sun Microsystems, Inc. All rights reserved.