This section summarizes the bugs that you might encounter when using this version of the software. The most recent bugs are described first. Workarounds and recovery procedures are specified, if available.
Bug ID 20619894: If the system/management/hwmgmtd package is not installed, a dynamic bus remove operation causes the rcm_daemon to print the following message on the console:
rcm_daemon[839]: rcm script ORCL,pcie_rc_rcm.pl: svcs: Pattern 'sp/management' doesn't match any instances
Workaround: You can safely ignore this message.
Bug ID 20570207: When the power management policy is set to elastic, the primary domain might hang while the Logical Domains Manager is recovering domains after detecting faulty or missing resources.
Recovery: Change the policy to disabled and then power cycle the system to restart recovery mode.
Bug ID 20432421: If you use the grow-socket or shrink-socket command to modify virtual CPUs or cores during a delayed reconfiguration, you might experience unexpected behavior. Memory that belongs to the primary domain is reassigned so that only memory in the specified socket is bound to the domain.
Workaround: Only modify virtual CPUs or cores by using the shrink-socket and grow-socket commands while not in a delayed reconfiguration.
Bug ID 20425271: While triggering a recovery after dropping into factory-default, recovery mode fails if the system boots from a different device than the one booted in the previously active configuration. This failure might occur if the active configuration uses a boot device other than the factory-default boot device.
Workaround: Perform the following steps any time you want to save a new configuration to the SP:
1. Determine the full PCI path to the boot device for the primary domain.
   Use this path for the ldm set-var command in Step 4.
2. Remove any currently set boot-device property from the primary domain.
   Performing this step is necessary only if the boot-device property has a value set. If the property does not have a value set, an attempt to remove the boot-device property results in the boot-device not found message.
   primary# ldm rm-var boot-device primary
3. Save the current configuration to the SP.
   primary# ldm add-spconfig config-name
4. Explicitly set the boot-device property for the primary domain.
   primary# ldm set-var boot-device=value primary
If you set the boot-device property after saving the configuration to the SP as described, the specified boot device is booted when recovery mode is triggered.
Recovery: If recovery mode has already failed as described, perform the following steps:
Explicitly set the boot device to the one used in the last running configuration.
primary# ldm set-var boot-device=value primary
Reboot the primary domain.
primary# reboot
The reboot enables the recovery to proceed.
Bug ID 20426593: The ldm list-rsrc-group command might show I/O resource information under the incorrect resource group when the numerical suffix of the resource group name has more than one digit.
In the following example, the ldm list-rsrc-group command incorrectly shows the PCIE bus information for /SYS/CMIOU10 under the /SYS/CMIOU1 resource group.
primary# ldm list-io
NAME                  TYPE   BUS      DOMAIN   STATUS
----                  ----   ---      ------   ------
..
/SYS/CMIOU10/PCIE2    PCIE   pci_50   primary  OCC
/SYS/CMIOU10/PCIE3    PCIE   pci_51   primary  OCC
/SYS/CMIOU10/PCIE1    PCIE   pci_53   primary  OCC
..
.
primary# ldm list-rsrc-group -l -o io /SYS/CMIOU1
NAME
/SYS/CMIOU1
IO
    DEVICE    PSEUDONYM    BOUND
    pci@305   pci_5        alt-root
    pci@306   pci_6        primary
    pci@308   pci_8        alt-root
    pci@309   pci_9        primary
    pci@332   pci_50       primary
    pci@333   pci_51       primary
    pci@335   pci_53       primary
PCIe busses pci_50, pci_51, and pci_53 are incorrectly shown under the /SYS/CMIOU1 resource group instead of the /SYS/CMIOU10 resource group.
Workaround: Run the ldm list-io -l command to obtain the correct resource group name for the PCIe bus from the I/O name. For example, the PCIe bus with the I/O name /SYS/CMIOU10/PCIE2 should belong to /SYS/CMIOU10 and not /SYS/CMIOU1.
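The name-based check in this workaround can be sketched in a few lines of Python. This is an illustrative helper only, not part of the ldm command set. The function name is hypothetical, and the sketch assumes that an I/O name consists of the resource group name followed by a single slot component, as in the example above.

```python
def resource_group_from_io_name(io_name: str) -> str:
    """Return the resource group portion of an I/O name.

    Assumption: the I/O name is "<resource-group>/<slot>", for example
    /SYS/CMIOU10/PCIE2, so the group is everything before the last slash.
    """
    group, sep, slot = io_name.rpartition("/")
    if not sep or not group or not slot:
        raise ValueError(f"unexpected I/O name: {io_name!r}")
    return group


# I/O names taken from the ldm list-io output shown in this bug description.
for name in ("/SYS/CMIOU10/PCIE2", "/SYS/CMIOU10/PCIE3", "/SYS/CMIOU10/PCIE1"):
    print(name, "->", resource_group_from_io_name(name))
```

Because the helper compares the whole path component rather than a prefix, /SYS/CMIOU10 is never confused with /SYS/CMIOU1.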
Bug ID 20321459: If a virtual disk back end is missing and cannot be validated, the Logical Domains Manager does not recover a guest domain that is assigned the back end. This is true even if multipathing is configured.
Workaround: Perform the following steps:
Temporarily disable device validation.
primary# svccfg -s ldmd setprop ldmd/device_validation integer: 0
primary# svcadm refresh ldmd
primary# svcadm restart ldmd
Recover the guest domains that are missing the back end manually.
Note that when device validation is disabled, the Logical Domains Manager adds a virtual device to a guest domain even if the back end or associated physical network device does not exist. Thus, ensure that you re-enable device validation after you have recovered the domain configuration.
primary# svccfg -s ldmd setprop ldmd/device_validation integer: -1
primary# svcadm refresh ldmd
primary# svcadm restart ldmd
Bug ID 20307560: If you create a guest domain that uses any number of virtual CPUs and any amount of memory and run the ldm bind command, the command might issue an Invalid response error. This error might occur if the primary domain has all of the resources before you create the guest domain and run the ldm bind command.
Workaround: Remove some memory from the primary domain and then run the ldm bind command.
Bug ID 20257979: One of the methods to create virtual functions from a physical function is to place the root domain that owns the physical function into delayed reconfiguration. When in the delayed reconfiguration, you can create one or more virtual functions by using the ldm create-vf command.
Normally, an ldm list-io command shows that the physical function and its child virtual functions are in a clean state. However, if the ldmd service is restarted before the root domain is rebooted, or if the delayed reconfiguration is cancelled, the physical function and its virtual functions are marked with the INV state.
The same problem occurs when virtual functions are destroyed while in delayed reconfiguration. After you destroy virtual functions, restarting the Logical Domains Manager and then running the ldm list-io command shows no physical functions for that root domain.
Workaround: Perform one of the following workarounds:
Cancel the delayed reconfiguration.
When you next run the ldm list-io command, the physical function and any of its existing virtual functions are in a valid state.
Reboot the root domain that was in delayed reconfiguration.
Note that any modifications that you performed while the root domain was in delayed reconfiguration will be present in the OS on the guest domain.
Bug ID 20187197: If power capping is enabled, sometimes the lowest power state cannot be set. The power state is lowered, but not to the lowest state. When this occurs, the highest power state might not be resumed after setting a higher power limit that warrants the highest power state.
This situation occurs when setting a new power cap limit that is close to the minimum power limit for the system or when setting a new power cap limit where the difference between the actual power (when not power capped) and the new limit would cause the lowest power state to be used.
Workaround: Perform one of the following steps:
Disable the power cap.
Set a new power cap limit that is not large or close to the minimum power limit for the system.
Bug ID 20004281: When a primary domain is power cycled, ixgbevf nodes on the I/O domain might be reported as disabled by the ipadm command, and as nonexistent by the ifconfig command.
Workaround: Re-enable the IP interfaces:
# svcadm restart network/physical:default
Bug ID 19943809: The hxge driver cannot use interfaces inside an I/O domain when the card is assigned by using the direct I/O feature.
The following warning is issued to the system log file:
WARNING: hxge0 : <== hxge_setup_mutexes: failed 0x1
Workaround: Add the following line to the /etc/system file and reboot the system:
set px:px_force_intx_support=1
Bug ID 19932842: An attempt to set an OBP variable from a guest domain might fail if you use the eeprom or the OBP command before one of the following commands is completed:
ldm add-spconfig
ldm remove-spconfig
ldm set-spconfig
ldm bind
This problem might occur when these commands take more than 15 seconds to complete.
# /usr/sbin/eeprom boot-file\=-k
promif_ldom_setprop: promif_ldom_setprop: ds response timeout
eeprom: OPROMSETOPT: Invalid argument
boot-file: invalid property
Recovery: Retry the eeprom or OBP command after the ldm operation has completed.
Workaround: Retry the eeprom or OBP command on the affected guest domain. You might be able to avoid the problem by using the ldm set-var command on the primary domain.
Bug ID 19449221: A domain can have no more than 999 virtual network devices (vnets).
Workaround: Limit the number of vnets on a domain to 999.
Bug ID 19078763: Oracle VM Server for SPARC no longer keeps track of freed MAC addresses. MAC addresses are now allocated by randomly selecting an address and then confirming that address is not used by any logical domains on the local network.
Bug ID 18083904: The firmware for Sun Storage 16 Gb Fibre Channel Universal HBA, Emulex cards does not support setting bandwidth controls. The HBA firmware ignores any value that you specify for the bw-percent property.
Workaround: None.
Bug ID 18001028: In the root domain, the Oracle Solaris device path for a Fibre Channel virtual function is incorrect.
For example, the incorrect path name is pci@380/pci@1/pci@0/pci@6/fibre-channel@0,2 while it should be pci@380/pci@1/pci@0/pci@6/SUNW,emlxs@0,2.
The ldm list-io -l output shows the correct device path for the Fibre Channel virtual functions.
Workaround: None.
Bug ID 17576087: Performing a power cycle of the system to a saved configuration might not restore the memory after the faulty memory has been replaced.
Workaround: After you replace the faulty memory, perform a power cycle of the system to the factory-default configuration. Then, perform a power cycle of the system to the configuration that you want to use.
You cannot configure a DLMP aggregation on an SR-IOV NIC virtual function or a virtual network device in a guest domain.
Bug ID 17422973: The installation of the Oracle Solaris 11.1 OS on a single-slice disk might fail with the following error on a SPARC T4 server that runs at least system firmware version 8.4.0, a SPARC T5, SPARC M5, or SPARC M6 server that runs at least system firmware version 9.1.0, and a Fujitsu M10 server that runs at least XCP version 2230:
cannot label 'c1d0': try using fdisk(1M) and then provide a specific slice
Unable to build pool from specified devices: invalid vdev configuration
Workaround: Relabel the disk with an SMI label.
Bug ID 17051532: When a PCIe device or a virtual function is removed from a guest domain, the autosave configuration is not updated. This problem might result in the device or virtual function reappearing in the guest domain after you perform an autosave recovery; namely, when autorecovery_policy=3. This problem can also cause the ldm add-spconfig -r command to fail with the Autosave configuration config-name is invalid message if you do not perform another ldm command that causes the autosave to be updated.
Workaround: Perform one of the following workarounds:
Save a new configuration after you remove the PCIe device or virtual function.
primary# ldm add-config new-config-name
Refresh the saved configuration after you remove the PCIe device or virtual function by removing and re-creating the configuration.
primary# ldm rm-config config-name
primary# ldm add-config config-name
Note that this bug prevents the ldm add-config -r config-name command from working properly.
Issue another ldm command that causes an autosave update to occur such as ldm set-vcpu, ldm bind, or ldm unbind.
Bug ID 17020950: After migrating an active domain from a SPARC T4 platform to a SPARC T5, SPARC M5, or SPARC M6 platform that was bound using firmware version 8.3, performing a memory dynamic reconfiguration might result in a guest domain panic.
Workaround: Before you perform the migration, update the SPARC T4 system with version 8.4 of the system firmware. Then, rebind the domain.
Bug ID 17020481: A guest domain is in transition state (t) after a reboot of the primary domain. This problem arises when a large number of virtual functions are configured on the system.
Workaround: Retry the OBP disk boot command several times to avoid a boot from the network.
Perform the following steps on each domain:
1. Access the console of the domain.
   primary# telnet localhost 5000
2. Set the boot-device property.
   ok> setenv boot-device disk disk disk disk disk disk disk disk disk disk net
   The number of disk entries that you specify as the value of the boot-device property depends on the number of virtual functions that are configured on the system. On smaller systems, you might be able to include fewer instances of disk in the property value.
3. Verify that the boot-device property is set correctly by using the printenv command.
   ok> printenv
4. Return to the primary domain console.
5. Repeat Steps 1-4 for each domain on the system.
6. Reboot the primary domain.
   primary# shutdown -i6 -g0 -y
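If you need to set the boot-device property on many domains, building the value by hand is error prone. The following Python sketch generates the setenv line for a chosen number of disk entries. The function name and its parameters are hypothetical. The appropriate number of entries for a particular system is not specified here and must be determined by trial, as described in the steps above.

```python
def boot_device_value(disk_entries: int, fallback: str = "net") -> str:
    """Build a boot-device value that retries disk before falling back.

    disk_entries is the number of times "disk" is listed. The workaround
    example uses 10. The final entry is the fallback device.
    """
    if disk_entries < 1:
        raise ValueError("at least one disk entry is required")
    return " ".join(["disk"] * disk_entries + [fallback])


# Reproduces the value used in the workaround example (10 disk entries).
print("setenv boot-device " + boot_device_value(10))
```

Paste the printed line at the ok> prompt of each affected domain.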
Bug ID 16713362: PCIe slots cannot currently be removed from non-primary root domains during the recovery operation. The PCIe slots remain assigned to the non-primary root domain.
Workaround: The PCIe slots must be removed manually from the non-primary root domain and assigned to the appropriate I/O domain or domains after the recovery operation has finished.
For information about how to remove PCIe slots from a non-primary root domain, see Non-primary Root Domains Overview in Oracle VM Server for SPARC 3.2 Administration Guide.
Recovering I/O domains that use PCIe slots owned by non-primary root domains depends on the I/O domain configuration:
If the I/O domain uses only PCIe slots and none of its PCIe slots are available, the I/O domain is not recovered and is left in the unbound state with the PCIe slots marked as evacuated.
If the I/O domain uses SR-IOV virtual functions and PCIe slots, the domain is recovered with the unavailable PCIe slots marked as evacuated.
Use the ldm add-io command to add the PCIe slots to an I/O domain after you have manually removed them from the non-primary root domain.
Bug ID 16617981: ldm list output does not show the evacuated property for the physical I/O devices.
Workaround: Use the -p option with any of the ldm list commands to show the evacuated property for physical I/O devices.
Bug ID 16486383: This problem can occur if you assign a PCI device or bus directly to a guest domain where the domain does not have a core assigned from the /SYS/DCU where the PCI card physically resides. Because the hypervisor resets PCI devices on behalf of guest domains, during each guest domain reboot a domain with cores on the DCU connected to the PCI device might panic. Assigning more PCI devices to non-DCU-local guests increases the possibility of panics.
Workaround: Perform one of the following workarounds:
Ensure that when you assign PCI devices to a guest domain, the card is located physically in the same DCU as the cores.
Manually assign cores for physical card placement flexibility.
As an example, for a PCI device on IOU0 (pci_0 through pci_15), choose a core between 0 and 127, and allocate it to the domain.
# ldm add-core cid=16 domain-name
View the system cores by using the following command:
# ldm ls-devices -a core
For a PCI device on IOU1 (pci_16 through pci_31), choose a core between 128 and 255. For a PCI device on IOU2 (pci_32 through pci_47), choose a core between 256 and 383. For a PCI device on IOU3 (pci_48 through pci_63), choose a core between 384 and 511.
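The bus-to-core ranges listed above follow a regular pattern: each IOU covers 16 consecutive PCIe bus pseudonyms and 128 consecutive core IDs. The following Python sketch encodes that pattern. The function and constant names are hypothetical, and the mapping is inferred only from the four IOUs listed in this workaround. Confirm that a core ID actually exists and is free by using the ldm ls-devices -a core command before you assign it.

```python
BUSES_PER_IOU = 16    # pci_0 through pci_15 belong to IOU0, and so on
CORES_PER_IOU = 128   # cores 0 through 127 belong to IOU0, and so on
IOU_COUNT = 4         # only IOU0 through IOU3 are described in the text


def core_range_for_bus(bus: str) -> range:
    """Return the core IDs that are local to the IOU of a bus such as 'pci_50'."""
    prefix, _, index = bus.partition("_")
    if prefix != "pci" or not index.isdigit():
        raise ValueError(f"unexpected bus pseudonym: {bus!r}")
    iou = int(index) // BUSES_PER_IOU
    if iou >= IOU_COUNT:
        raise ValueError(f"{bus} is outside the IOU0-IOU3 ranges described here")
    first = iou * CORES_PER_IOU
    return range(first, first + CORES_PER_IOU)


cores = core_range_for_bus("pci_50")
print(f"pci_50: choose a core between {cores[0]} and {cores[-1]}")
```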
Bug ID 16299053: After disabling a PCIe device, you might experience unexpected behavior. The subdevices that are under the disabled PCIe device revert to the non-assigned names while the PCIe device is still owned by the domain.
Workaround: If you decide to disable a PCIe slot on the ILOM, ensure that the PCIe slot is not assigned to a domain by means of the direct I/O (DIO) feature. That is, first ensure that the PCIe slot is assigned to the corresponding root domain before disabling the slot on the ILOM.
If you disable the PCIe slot on the ILOM while the PCIe slot is assigned to a domain with DIO, stop that domain and reassign the device to the root domain for the correct behavior.
Bug ID 16284767: This warning on the Oracle Solaris console means the interrupt supply was exhausted while attaching I/O device drivers:
WARNING: ddi_intr_alloc: cannot fit into interrupt pool
The hardware provides a finite number of interrupts, so Oracle Solaris limits how many each device can use. The default limit is designed to match the needs of typical system configurations; however, this limit might need adjustment for certain system configurations.
Specifically, the limit may need adjustment if the system is partitioned into multiple logical domains and if too many I/O devices are assigned to any guest domain. Oracle VM Server for SPARC divides the total interrupts into smaller sets given to guest domains. If too many I/O devices are assigned to a guest domain, its supply might be too small to give each device the default limit of interrupts. Thus, it exhausts its supply before it completely attaches all the drivers.
Some drivers provide an optional callback routine which allows Oracle Solaris to automatically adjust their interrupts. The default limit does not apply to these drivers.
Workaround: Use the ::irmpools and ::irmreqs MDB macros to determine how interrupts are used. The ::irmpools macro shows the overall supply of interrupts divided into pools. The ::irmreqs macro shows which devices are mapped to each pool. For each device, ::irmreqs shows whether the default limit is enforced by an optional callback routine, how many interrupts each driver requested, and how many interrupts the driver is given.
The macros do not show information about drivers that failed to attach. However, the information that is shown helps calculate the extent to which you can adjust the default limit. Any device that uses more than one interrupt without providing a callback routine can be forced to use fewer interrupts by adjusting the default limit. Reducing the default limit below the amount that is used by such a device results in freeing of interrupts for use by other devices.
To adjust the default limit, set the ddi_msix_alloc_limit property to a value from 1 to 8 in the /etc/system file. Then, reboot the system for the change to take effect.
To maximize performance, start by assigning larger values and decrease the values in small increments until the system boots successfully without any warnings. Use the ::irmpools and ::irmreqs macros to measure the adjustment's impact on all attached drivers.
For example, suppose the following warnings are issued while booting the Oracle Solaris OS in a guest domain:
WARNING: emlxs3: interrupt pool too full.
WARNING: ddi_intr_alloc: cannot fit into interrupt pool
The ::irmpools and ::irmreqs macros show the following information:
# echo "::irmpools" | mdb -k
ADDR             OWNER   TYPE   SIZE  REQUESTED  RESERVED
00000400016be970 px#0    MSI/X  36    36         36
# echo "00000400016be970::irmreqs" | mdb -k
ADDR             OWNER   TYPE   CALLBACK NINTRS NREQ NAVAIL
00001000143acaa8 emlxs#0 MSI-X  No       32     8    8
00001000170199f8 emlxs#1 MSI-X  No       32     8    8
000010001400ca28 emlxs#2 MSI-X  No       32     8    8
0000100016151328 igb#3   MSI-X  No       10     3    3
0000100019549d30 igb#2   MSI-X  No       10     3    3
0000040000e0f878 igb#1   MSI-X  No       10     3    3
000010001955a5c8 igb#0   MSI-X  No       10     3    3
The default limit in this example is eight interrupts per device, which is not enough interrupts to accommodate the attachment of the final emlxs3 device to the system. Assuming that all emlxs instances behave in the same way, emlxs3 probably requested 8 interrupts.
By subtracting the 12 interrupts used by all of the igb devices from the total pool size of 36 interrupts, 24 interrupts are available for the emlxs devices. Dividing the 24 interrupts by 4 suggests that 6 interrupts per device would enable all emlxs devices to attach with equal performance. So, the following adjustment is added to the /etc/system file:
set ddi_msix_alloc_limit = 6
When the system successfully boots without warnings, the ::irmpools and ::irmreqs macros show the following updated information:
# echo "::irmpools" | mdb -k
ADDR             OWNER   TYPE   SIZE  REQUESTED  RESERVED
00000400018ca868 px#0    MSI/X  36    36         36
# echo "00000400018ca868::irmreqs" | mdb -k
ADDR             OWNER   TYPE   CALLBACK NINTRS NREQ NAVAIL
0000100016143218 emlxs#0 MSI-X  No       32     8    6
0000100014269920 emlxs#1 MSI-X  No       32     8    6
000010001540be30 emlxs#2 MSI-X  No       32     8    6
00001000140cbe10 emlxs#3 MSI-X  No       32     8    6
00001000141210c0 igb#3   MSI-X  No       10     3    3
0000100017549d38 igb#2   MSI-X  No       10     3    3
0000040001ceac40 igb#1   MSI-X  No       10     3    3
000010001acc3480 igb#0   MSI-X  No       10     3    3
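The arithmetic in this example can be written as a short Python sketch. It is a worked calculation only, not a tool that ships with the software. The function name and parameters are hypothetical, and the input figures are the pool size and the igb interrupt counts from the ::irmpools and ::irmreqs output above. The result is capped at 8 because the text gives 1 to 8 as the valid range for ddi_msix_alloc_limit.

```python
def suggest_msix_limit(pool_size, other_device_usage, constrained_devices,
                       max_limit=8):
    """Estimate a ddi_msix_alloc_limit value that lets every device attach.

    pool_size            total interrupts in the pool (SIZE in ::irmpools)
    other_device_usage   interrupts already given to devices that are not
                         being reduced (NAVAIL of the igb devices here)
    constrained_devices  number of devices that must share the remainder
    """
    if constrained_devices < 1:
        raise ValueError("at least one constrained device is required")
    remaining = pool_size - sum(other_device_usage)
    limit = min(remaining // constrained_devices, max_limit)
    if limit < 1:
        raise ValueError("the pool is too small for the requested devices")
    return limit


# 36 interrupts in the pool, four igb devices using 3 each, four emlxs devices.
limit = suggest_msix_limit(36, [3, 3, 3, 3], 4)
print(f"set ddi_msix_alloc_limit = {limit}")
```

With these inputs the sketch prints the same /etc/system line that the example uses. Treat the result as a starting point and confirm it with the ::irmpools and ::irmreqs macros after rebooting.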
Bug ID 16232834: When using the ldm add-vcpu command to assign CPUs to a domain, the Oracle Solaris OS might panic with the following message:
panic[cpu16]/thread=c4012102c860: mpo_cpu_add: Cannot read MD
This panic occurs if the following conditions exist:
Additional DCUs have been assigned to a host
The host is started by using a previously saved SP configuration that does not contain all the hardware that is assigned to the host
The target domain of the ldm add-vcpu operation is the domain that panics. The domain recovers with the additional CPUs when it reboots.
Workaround: Do not use configurations that are generated with fewer hardware resources than are assigned to the host.
To avoid the problem, do not add CPUs as described in the problem description. Or, perform the following steps:
Generate a new SP configuration after the DCUs have been added.
For example, the following command creates a configuration called new-config-more-dcus:
primary# ldm add-config new-config-more-dcus
Shut down the domain.
Stop the host.
-> stop /HOST
Start the host.
-> start /HOST
Bug ID 16224353: After rebooting the primary domain, ixgbevf instances in the primary domain might not work.
Workaround: None.
Bug ID 16219069: On a primary domain that runs the Oracle Solaris 10 1/13 OS, the virtual function interfaces might not be automatically plumbed or assigned an IP address based on the /etc/hostname.vf-interface file.
This issue occurs when you boot or reboot a SPARC T3, SPARC T4 or SPARC T5 system that runs the Oracle Solaris 10 1/13 OS on the primary domain. This problem affects virtual functions that have been created both on on-board physical functions and on add-in physical functions. This issue does not occur when you boot a Logical Domains guest domain image.
Bug ID 16080855: During a reboot or shutdown of the primary domain, the primary domain might experience a kernel panic with a panic message similar to the following:
panic[cpu2]/thread=c40043b818a0: mutex_enter: bad mutex, lp=c4005fa01c88 owner=c4005f70aa80 thread=c40043b818a0
000002a1075c3630 ldc:ldc_mem_rdwr_cookie+20 (c4005fa01c80, c4004e2c2000,2a1075c37c8, 6c80000, 1, 0)
  %l0-3: 00000000001356a4 0000000000136800 0000000000000380 00000000000002ff
  %l4-7: 00000000001ad3f8 0000000000000004 00000000ffbffb9c 0000c4005fa01c88
000002a1075c3710 vldc:i_vldc_ioctl_write_cookie+a4 (c4004c400030, 380,ffbff898, 100003, 0, 70233400)
  %l0-3: 0000000006c80000 0000000000156dc8 0000000000000380 0000000000100003
  %l4-7: 00000000702337b0 000002a1075c37c8 0000000000040000 0000000000000000
000002a1075c37f0 vldc:vldc_ioctl+1a4 (3101, c4004c400030, ffbff898,c4004c400000, c4004c438030, 0)
  %l0-3: 0000000000100003 0000000000000000 000000007b340400 0000c4004c438030
  %l4-7: 0000c4004c400030 0000000000000000 0000000000000000 0000000000000000
000002a1075c38a0 genunix:fop_ioctl+d0 (c4004d327800, 0, ffbff898, 100003,c4004384f718, 2a1075c3acc)
  %l0-3: 0000000000003103 0000000000100003 000000000133ce94 0000c4002352a480
  %l4-7: 0000000000000000 0000000000000002 00000000000000c0 0000000000000000
000002a1075c3970 genunix:ioctl+16c (3, 3103, ffbff898, 3, 134d50, 0)
  %l0-3: 0000c40040e00a50 000000000000c6d3 0000000000000003 0000030000002000
  %l4-7: 0000000000000003 0000000000000004 0000000000000000 0000000000000000
Recovery: Allow the primary domain to reboot. If the primary domain is configured not to reboot after a crash, manually boot the primary domain.
Bug ID 16071170: On a SPARC M5-32 or a SPARC M6-32 system, the internal SAS controllers are exported as SR-IOV-enabled controllers even though these cards do not support SR-IOV.
The Oracle VM Server for SPARC log shows the following messages when attempting to create the physical function on these cards:
Dec 11 04:27:54 warning: Dropping pf pci@d00/pci@1/pci@0/pci@0/pci@0/pci@4/LSI,sas@0: no IOV capable driver
Dec 11 04:27:54 warning: Dropping pf pci@d80/pci@1/pci@0/pci@c/pci@0/pci@4/LSI,sas@0: no IOV capable driver
Dec 11 04:27:54 warning: Dropping pf pci@c00/pci@1/pci@0/pci@c/pci@0/pci@4/LSI,sas@0: no IOV capable driver
Dec 11 04:27:54 warning: Dropping pf pci@e00/pci@1/pci@0/pci@0/pci@0/pci@4/LSI,sas@0: no IOV capable driver
The system has four LSI SAS controller ports, each in one IOU of the SPARC M5-32 and SPARC M6-32 assembly. This error is reported for each port.
Workaround: You can ignore these messages. These messages indicate only that the LSI-SAS controller devices on the system are capable of SR-IOV but no SR-IOV support is available for this hardware.
Bug ID 16068376: On a T5-8 with approximately 128 domains, some ldm commands such as ldm list might show 0 seconds as the uptime for all domains.
Workaround: Log in to the domain and use the uptime command to determine the domain's uptime.
Bug ID 15962837: A core evacuation does not complete when a chip-level fault occurs. An evacuation that is followed by a core fault works as expected, but the chip-level fault does not complete when trying to retire an entire CMP node.
Workaround: None. Schedule a chip replacement when you diagnose a chip-level fault.
Bug ID 15942036: If you perform a memory DR operation to reduce memory below four Gbytes, the operation might hang forever. If you issue an ldm cancel-op memdr command on that domain, an incorrect message is issued:
The memory removal operation has completed. You cannot cancel this operation.
Despite the message, the memory DR operation is hung and you might not be able to perform other ldmd operations on that guest domain.
Workaround: Do not attempt to reduce memory in any domain below four Gbytes. If you are already in this state, issue the ldm stop -f command or log in to the domain and reboot it.
Bug ID 15826354: CPU dynamic reconfiguration (DR) of a very large number of CPUs causes the ldmd daemon to return a failure. Although ldmd times out, the DR operation continues in the background and eventually succeeds. Nevertheless, ldmd is no longer aligned with the resulting domain and subsequent DR operations might not be permitted.
For example:
# ldm ls
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -n-cv-  UART  7     20G     2.7%  0.4%  1h 41m
ldg0     active  -n----  5000  761   16G     75%   51%   6m
# ldm rm-vcpu 760 ldg0
Request to remove cpu(s) sent, but no valid response received
VCPU(s) will remain allocated to the domain, but might not be available to the guest OS
Resource removal failed
# ldm set-vcpu 1 ldg0
Busy executing earlier command; please try again later.
Unable to remove the requested VCPUs from domain ldg0
Resource modification failed
# ldm ls
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -n-cv-  UART  7     20G     0.9%  0.1%  1h 45m
ldg0     active  -n----  5000  761   16G     100%  0.0%  10m
Workaround: Wait a few minutes and then run the ldm set-vcpu command again:
# ldm set-vcpu 1 ldg0
# ldm ls
NAME     STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  NORM  UPTIME
primary  active  -n-cv-  UART  7     20G     0.9%  0.1%  1h 50m
ldg0     active  -n----  5000  1     16G     52%   0.0%  15m
Note that 760 exceeds the recommended maximum.
Bug ID 15825330: Oracle VM Server for SPARC appears to hang at startup on some SPARC T4-4 configurations that have only a single processor board.
Workaround: Ensure that a processor board always occupies the slots for processors 0 and 1. Restarting the system in such a configuration enables the Oracle VM Server for SPARC software to start up.
Bug ID 15821246: On a system that runs the Oracle Solaris 11.1 OS, changing the threading property value on a migrated domain from max-ipc to max-throughput can lead to a panic on the guest domain.
Workaround: Do not change the threading status for a migrated guest domain until it is rebooted.
Bug ID 15820741: On an Oracle Solaris 11.1 system that has two domains with direct I/O configurations, the control domain might hang when you reboot it.
Recovery: To recover from the reboot hang, reset the control domain by issuing the following command on the SP:
-> reset -f /HOST/domain/control
Bug ID 15812823: In low free-memory situations, not all memory blocks can be used as part of a memory DR operation due to size. However, these memory blocks are included in the amount of free memory. This situation might lead to a smaller amount of memory being added to the domain than expected. No error message is shown if this situation occurs.
Workaround: None.
Bug ID 15783851: You might encounter a problem when attempting to re-create a configuration from an XML file that incorrectly represents virtual function constraints.
This problem occurs when you use the ldm list-constraints -x command to save the configuration of a domain that has PCIe virtual functions.
If you later re-create the domain by using the ldm add-domain -i command, the original virtual functions do not exist, and a domain bind attempt fails with the following error message:
No free matching PCIe device...
Even if you create the missing virtual functions, another domain bind attempt fails with the same error message because the virtual functions are miscategorized as PCIe devices by the ldm add-domain command.
Workaround: Perform the following steps:
Save the information about the virtual functions by using the ldm list-io command.
Destroy each affected domain by using the ldm rm-dom command.
Create all the required virtual functions by using the ldm create-vf command.
Rebuild the domains by using the ldm command.
When you use the ldm add-io command to add each virtual function, it is correctly categorized as a virtual function device, so the domain can be bound.
For information about rebuilding a domain configuration that uses virtual functions, see ldm init-system Command Might Not Correctly Restore a Domain Configuration on Which Physical I/O Changes Have Been Made.
Bug ID 15783608: When you change the control domain from using physically constrained cores to using unconstrained CPU resources, you might see the following extraneous message:
Whole-core partitioning has been removed from domain primary,because dynamic reconfiguration has failed and the domain is now configured with a partial CPU core.
Workaround: You can ignore this message.
Bug ID 15783031: You might experience problems when you use the ldm init-system command to restore a domain configuration that has used direct I/O or SR-IOV operations.
A problem arises if one or more of the following operations have been performed on the configuration to be restored:
A slot has been removed from a bus that is still owned by the primary domain.
A virtual function has been created from a physical function that is owned by the primary domain.
A virtual function has been assigned to the primary domain, to other guest domains, or to both.
A root complex has been removed from the primary domain and assigned to a guest domain, and that root complex is used as the basis for further I/O virtualization operations.
In other words, you created a non-primary root domain and performed any of the previous operations.
To ensure that the system remains in a state in which none of the previous actions have taken place, see Using the ldm init-system Command to Restore Domains on Which Physical I/O Changes Have Been Made.
Bug ID 15782994: Logical Domains Manager might crash and restart when you attempt an operation that affects the configuration of many domains. You might see this issue when you attempt to change anything related to the virtual networking configuration and if many virtual network devices in the same virtual switch exist across many domains. Typically, this issue is seen with approximately 90 or more domains that have virtual network devices connected to the same virtual switch, and the inter-vnet-link property is enabled (the default behavior). Confirm the symptom by finding the following message in the ldmd log file and a core file in the /var/opt/SUNWldm directory:
Frag alloc for 'domain-name'/MD memory of size 0x80000 failed
Workaround: Avoid creating many virtual network devices connected to the same virtual switch. If you intend to do so, set the inter-vnet-link property to off on the virtual switch. Be aware that this option might negatively affect network performance between guest domains.
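To confirm the symptom, you can search saved ldmd log text for the message shown above. The following Python sketch assumes only the message format quoted in this bug description. The function name and the sample log lines are hypothetical, and the location of the log file is not assumed.

```python
import re

# Message format taken from the bug description; the quoted token is the domain name.
FRAG_ALLOC_RE = re.compile(
    r"Frag alloc for '(?P<domain>[^']+)'/MD memory of size "
    r"(?P<size>0x[0-9a-fA-F]+) failed"
)


def find_frag_alloc_failures(log_text):
    """Return a (domain, size_in_bytes) tuple for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = FRAG_ALLOC_RE.search(line)
        if match:
            hits.append((match.group("domain"), int(match.group("size"), 16)))
    return hits


# Hypothetical log excerpt used only to exercise the function.
sample_log = "\n".join([
    "some unrelated log line",
    "Frag alloc for 'ldg42'/MD memory of size 0x80000 failed",
])
failures = find_frag_alloc_failures(sample_log)
```

A non-empty result, together with a core file in the /var/opt/SUNWldm directory, matches the symptom described for this bug.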
Bug ID 15778392: The control domain requires the lowest core in the system. So, if core ID 0 is the lowest core, it cannot be shared with any other domain if you want to apply the whole-core constraint to the control domain.
For example, if the lowest core in the system is core ID 0, the control domain should look similar to the following output:
# ldm ls -o cpu primary
NAME
primary

VCPU
    VID    PID    CID    UTIL STRAND
    0      0      0      0.4%   100%
    1      1      0      0.2%   100%
    2      2      0      0.1%   100%
    3      3      0      0.2%   100%
    4      4      0      0.3%   100%
    5      5      0      0.2%   100%
    6      6      0      0.1%   100%
    7      7      0      0.1%   100%
Bug ID 15775668: A domain that has a higher-priority policy can steal virtual CPU resources from a domain with a lower-priority policy. While this “stealing” action is in progress, you might see the following warning messages in the ldmd log every 10 seconds:
warning: Unable to unconfigure CPUs out of guest domain-name
Workaround: You can ignore these misleading messages.
Bug ID 15775637: An I/O domain has a limit on the number of interrupt resources that are available per root complex.
On SPARC T3 and SPARC T4 systems, the limit is approximately 63 MSI/X vectors. Each igb virtual function uses three interrupts. The ixgbe virtual function uses two interrupts.
If you assign a large number of virtual functions to a domain, the domain runs out of system resources to support these devices. You might see messages similar to the following:
WARNING: ixgbevf32: interrupt pool too full.
WARNING: ddi_intr_alloc: cannot fit into interrupt pool
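The interrupt figures above allow a rough estimate of how many virtual functions fit under one root complex. The following Python sketch uses only the approximate limit of 63 vectors and the per-driver interrupt counts stated for this bug. It assumes that no other device under the root complex consumes vectors, which is unlikely in practice, so treat the results as upper bounds. The function names are hypothetical.

```python
# Approximate per-root-complex limit quoted for SPARC T3 and SPARC T4 systems.
VECTOR_LIMIT = 63

# Interrupts used by each virtual function, as stated in the bug description.
INTERRUPTS_PER_VF = {"igb": 3, "ixgbe": 2}


def vectors_needed(vf_counts):
    """Total vectors needed for a mapping of driver name to number of VFs."""
    return sum(INTERRUPTS_PER_VF[driver] * count
               for driver, count in vf_counts.items())


def fits_in_pool(vf_counts, limit=VECTOR_LIMIT):
    """True if the requested virtual functions stay within the vector limit."""
    return vectors_needed(vf_counts) <= limit


def max_vfs(driver, limit=VECTOR_LIMIT):
    """Upper bound on VFs of one driver, assuming nothing else uses vectors."""
    return limit // INTERRUPTS_PER_VF[driver]


mixed = {"igb": 10, "ixgbe": 20}   # hypothetical assignment
needed = vectors_needed(mixed)     # 10 * 3 + 20 * 2
```

In this hypothetical assignment, the request exceeds the approximate limit, so messages like the ones above would be expected.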
Bug ID 15771384: A domain's guest console might freeze if repeated attempts are made to connect to the console before and during the time the console is bound. For example, this might occur if you use an automated script to grab the console as a domain is being migrated onto the machine.
Workaround: To unfreeze the console, run the following commands on the domain that hosts the domain's console concentrator (usually the control domain):
primary# svcadm disable vntsd
primary# svcadm enable vntsd
Bug ID 15765858: The resources on the root complex are not restored after you destroy all the virtual functions and return the slots to the root domain.
Workaround: Set the iov option to off for the specific PCIe bus.
primary# ldm start-reconf primary
primary# ldm set-io iov=off pci_0
Bug ID 15761509: Use only the PCIe cards that support the Direct I/O (DIO) feature, which are listed in this support document.
Workaround: Use the ldm add-io command to add the card to the primary domain again.
Bug ID 15759601: If you issue an ldm stop command immediately after an ldm start command, the ldm stop command might fail with the following error:
LDom domain-name stop notification failed
Workaround: Reissue the ldm stop command.
Bug ID 15758883: The ldm init-system command fails to restore the named CPU core constraints for guest domains from a saved XML file.
Workaround: Perform the following steps:
Create an XML file for the primary domain.
# ldm ls-constraints -x primary > primary.xml
Create an XML file for the guest domain or domains.
# ldm ls-constraints -x domain-name[,domain-name][,...] > guest.xml
Power cycle the system and boot a factory default configuration.
Apply the XML configuration to the primary domain.
# ldm init-system -r -i primary.xml
Apply the XML configuration to the guest domain or domains.
# ldm init-system -f -i guest.xml
Bug ID 15750727: A system might panic when you reboot a primary domain that has a very large number of virtual functions assigned to it.
Workaround: Perform one of the following workarounds:
Decrease the number of virtual functions that are assigned to the domain to reduce the number of failed virtual functions. This change might keep the chip responsive.
Create more Interrupt Resource Management (IRM) pools for the ixgbe virtual function because only one IRM pool is created by default for all the ixgbe virtual functions on the system.
Bug ID 15748348: When the primary domain shares the lowest physical core (usually 0) with another domain, attempts to set the whole-core constraint for the primary domain fail.
Workaround: Perform the following steps:
Determine the lowest bound core that is shared by the domains.
# ldm list -o cpu
Unbind all the CPU threads of the lowest core from all domains other than the primary domain.
As a result, CPU threads of the lowest core are not shared and are free for binding to the primary domain.
Set the whole-core constraint by doing one of the following:
Bind the CPU threads to the primary domain, and set the whole-core constraint by using the ldm set-vcpu -c command.
Use the ldm set-core command to bind the CPU threads and set the whole-core constraint in a single step.
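The first two steps amount to finding the lowest bound core and the non-primary domains that hold threads on it. The following Python sketch shows that check on data you have already collected from the ldm list -o cpu output. It does not parse ldm output. The input mapping and the domain names are hypothetical.

```python
def domains_sharing_lowest_core(cores_by_domain, primary="primary"):
    """Return (lowest_core_id, [non-primary domains bound to that core]).

    cores_by_domain maps a domain name to the set of core IDs (the CID
    column) on which that domain has CPU threads bound.
    """
    all_cores = [cid for cores in cores_by_domain.values() for cid in cores]
    if not all_cores:
        return None, []
    lowest = min(all_cores)
    sharers = sorted(
        name for name, cores in cores_by_domain.items()
        if name != primary and lowest in cores
    )
    return lowest, sharers


# Hypothetical layout: ldg1 holds threads on core 0 alongside the primary domain.
layout = {"primary": {0, 1}, "ldg1": {0, 2}, "ldg2": {3}}
lowest_core, to_unbind = domains_sharing_lowest_core(layout)
```

Each domain in the returned list must release its threads on the lowest core before you set the whole-core constraint on the primary domain.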
Bug ID 15738561: The ldm list-io command might show the UNK or INV state for PCIe slots and SR-IOV virtual functions if the command runs immediately after the primary domain is booted. This problem is caused by the delay in the Logical Domains agent's reply from the Oracle Solaris OS.
This problem has been reported only on a few systems.
Workaround: The status of the PCIe slots and the virtual functions is automatically updated after the information is received from the Logical Domains agent.
The following bugs describe failures that might occur when removing a large number of CPUs from a domain.
Control domain.
Bug ID 15677358: Use a delayed reconfiguration rather than dynamic reconfiguration to remove more than 100 CPUs from the control domain (also known as the primary domain). Use the following steps:
Use the ldm start-reconf primary command to put the control domain in delayed reconfiguration mode.
Remove the desired number of CPU resources.
If you make a mistake while removing CPU resources, do not attempt another request to remove CPUs while the control domain is still in a delayed reconfiguration. If you do so, the commands will fail (see Only One CPU Configuration Operation Is Permitted to Be Performed During a Delayed Reconfiguration in Oracle VM Server for SPARC 3.2 Administration Guide). Instead, undo the delayed reconfiguration operation by using the ldm cancel-reconf command, and start over.
Reboot the control domain.
Guest domain.
Bug ID 15726205: You might see the following error message when you attempt to remove a large number of CPUs from a guest domain:
Request to remove cpu(s) sent, but no valid response received
VCPU(s) will remain allocated to the domain, but might not be available to the guest OS
Resource modification failed
Workaround: Stop the guest domain before you remove more than 100 CPUs from the domain.
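The two workarounds above can be summarized as a small decision helper. The following Python sketch only builds a list of steps as text and runs nothing. The function name is hypothetical, and the threshold of 100 CPUs comes from the two bug descriptions. Restarting a stopped guest domain afterward is left to you and is not shown.

```python
CPU_DR_LIMIT = 100  # threshold named in both workarounds


def cpu_removal_plan(domain, count, is_control_domain):
    """Return the ordered steps for removing `count` CPUs from `domain`."""
    if count <= 0:
        raise ValueError("count must be positive")
    remove = f"ldm rm-vcpu {count} {domain}"
    if count <= CPU_DR_LIMIT:
        # Small removals are not covered by these two bugs.
        return [remove]
    if is_control_domain:
        # Bug ID 15677358: use a delayed reconfiguration, then reboot.
        return [f"ldm start-reconf {domain}", remove, "reboot the control domain"]
    # Bug ID 15726205: stop the guest domain before the removal.
    return [f"ldm stop-domain {domain}", remove]


control_plan = cpu_removal_plan("primary", 128, is_control_domain=True)
guest_plan = cpu_removal_plan("ldg1", 128, is_control_domain=False)
```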
Bug ID 15721872: You cannot use Oracle Solaris hot-plug operations to hot-remove a PCIe endpoint device after that device is removed from the primary domain by using the ldm rm-io command. For information about replacing or removing a PCIe endpoint device, see Making PCIe Hardware Changes in Oracle VM Server for SPARC 3.2 Administration Guide.
Bug ID 15707426: If the system log service, svc:/system/system-log, fails to start and does not come online, the Logical Domains agent service will not come online. When the Logical Domains agent service is not online, the virtinfo, ldm add-vsw, ldm add-vdsdev, and ldm list-io commands might not behave as expected.
Workaround: Ensure that the svc:/ldoms/agents:default service is enabled and online:
# svcs -l svc:/ldoms/agents:default
If the svc:/ldoms/agents:default service is offline, verify that the service is enabled and that all dependent services are online.
Bug ID 15702475: A No response message might appear in the Oracle VM Server for SPARC log when a loaded domain's DRM policy expires after the CPU count has been substantially reduced. The ldm list output shows that more CPU resources are allocated to the domain than are shown in the psrinfo output.
Workaround: Use the ldm set-vcpu command to reset the number of CPUs on the domain to the value that is shown in the psrinfo output.
Bug ID 15701258: Running the ldm set-vcpu 1 command on a guest domain that has over 100 virtual CPUs and some cryptographic units fails to remove the virtual CPUs. The virtual CPUs are not removed because of a DR timeout failure. The cryptographic units are successfully removed.
Workaround: Use the ldm rm-vcpu command to remove all but one of the virtual CPUs from the guest domain. Do not remove more than 100 virtual CPUs at a time.
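Removing all but one virtual CPU in steps of at most 100 can be planned ahead of time. The following Python sketch computes the step sizes and formats one ldm rm-vcpu command per step. The function names and the domain name are hypothetical, and the sketch only builds command strings.

```python
def rm_vcpu_batches(current, target=1, batch_limit=100):
    """Split the removal of (current - target) virtual CPUs into steps
    that never exceed batch_limit."""
    if target < 1 or target > current:
        raise ValueError("target must be between 1 and the current count")
    remaining = current - target
    batches = []
    while remaining > 0:
        step = min(batch_limit, remaining)
        batches.append(step)
        remaining -= step
    return batches


def rm_vcpu_commands(domain, current, target=1):
    """Format one 'ldm rm-vcpu' command per batch."""
    return [f"ldm rm-vcpu {n} {domain}"
            for n in rm_vcpu_batches(current, target)]


# A hypothetical guest domain with 256 virtual CPUs, reduced to 1.
commands = rm_vcpu_commands("ldg1", 256)
```

Run the commands one at a time, and confirm that each removal completes before you issue the next one.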
Bug ID 15668881: When using the pkgadd command to install the SUNWldm.v package from a directory that is exported by means of NFS from a Sun ZFS Storage Appliance, you might see the following error message:
cp: failed to set acl entries on /var/svc/manifest/platform/sun4v/ldmd.xml
Workaround: Ignore this message.
Bug ID 15668368: A SPARC T3-1 system can be installed with dual-ported disks, which can be accessed by two different direct I/O devices. If you assign these two direct I/O devices to different domains, both domains can use the same disks, and each domain's disk activity can affect the other.
Workaround: Do not assign direct I/O devices that have access to the same set of disks to different I/O domains. To determine whether you have dual-ported disks on a SPARC T3-1 system, run the following command on the SP:
-> show /SYS/SASBP
If the output includes the following fru_description value, the corresponding system has dual-ported disks:
fru_description = BD,SAS2,16DSK,LOUISE
If dual-ported disks are present in the system, ensure that both of the following direct I/O devices are always assigned to the same domain:
pci@400/pci@1/pci@0/pci@4 /SYS/MB/SASHBA0
pci@400/pci@2/pci@0/pci@4 /SYS/MB/SASHBA1
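Both checks can be scripted against text that you have already captured. The following Python sketch looks for the fru_description value in saved output of the show /SYS/SASBP command and verifies that the two SAS HBA devices have the same owner. The function names, the sample text, and the assignment mapping are hypothetical.

```python
DUAL_PORTED_FRU = "BD,SAS2,16DSK,LOUISE"
SAS_HBAS = ("/SYS/MB/SASHBA0", "/SYS/MB/SASHBA1")


def has_dual_ported_disks(show_output):
    """True if the saved 'show /SYS/SASBP' text reports the dual-ported FRU."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "fru_description" \
                and value.strip() == DUAL_PORTED_FRU:
            return True
    return False


def hbas_share_domain(assignment):
    """True if both SAS HBA devices map to the same owner.

    assignment maps a device name to the domain that owns it. A device
    that is missing from the mapping is treated as unassigned (None).
    """
    owners = {assignment.get(device) for device in SAS_HBAS}
    return len(owners) == 1


# Hypothetical captured output and device ownership.
sample_show = "    fru_description = BD,SAS2,16DSK,LOUISE\n    fru_name = SASBP"
split = {"/SYS/MB/SASHBA0": "primary", "/SYS/MB/SASHBA1": "ldg1"}
```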
Bug ID 15667770: When multiple NIU nxge instances are plumbed on a domain, the ldm rm-mem and ldm set-mem commands, which are used to remove memory from the domain, might never complete. To determine whether the problem has occurred during a memory removal operation, monitor the progress of the operation with the ldm list -o status command. You might have encountered this problem if the progress percentage remains constant for several minutes.
Workaround: Cancel the ldm rm-mem or ldm set-mem command, and check whether a sufficient amount of memory was removed. If not, a subsequent memory removal command to remove a smaller amount of memory might complete successfully.
If the problem has occurred on the primary domain, do the following:
Start a delayed reconfiguration operation on the primary domain.
# ldm start-reconf primary
Assign the desired amount of memory to the domain.
Reboot the primary domain.
If the problem occurred on another domain, stop the domain before adjusting the amount of memory that is assigned to the domain.
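The stall condition described for this bug, a progress percentage that stays constant for several minutes, can be detected from periodic samples of the ldm list -o status output. The following Python sketch works on samples that you have already collected and does not parse ldm output. The 180-second window is an assumption that stands in for "several minutes", and the function name is hypothetical.

```python
def is_stalled(samples, window_seconds=180):
    """Return True if the most recent progress value has not changed
    for at least window_seconds.

    samples is a chronological list of (timestamp_seconds, percent) pairs.
    """
    if not samples:
        return False
    last_time, last_percent = samples[-1]
    unchanged_since = last_time
    # Walk backward until the percentage differs from the latest value.
    for timestamp, percent in reversed(samples):
        if percent != last_percent:
            break
        unchanged_since = timestamp
    return last_time - unchanged_since >= window_seconds


# Hypothetical samples taken once a minute.
stuck = [(0, 10), (60, 20), (120, 20), (180, 20), (240, 20)]
moving = [(0, 10), (60, 20), (120, 30)]
```

When the check reports a stall, cancel the memory removal command and follow the steps above.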
Bug ID 15664666: When a reset dependency is created, an ldm stop -a command might result in a domain with a reset dependency being restarted instead of only stopped.
Workaround: First, issue the ldm stop command to the master domain. Then, issue the ldm stop command to the slave domain. If the initial stop of the slave domain results in a failure, issue the ldm stop -f command to the slave domain.
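When several domains have reset dependencies, the master-before-slave order can be computed rather than worked out by hand. The following Python sketch uses the standard graphlib module (Python 3.9 or later) to order the domains and format one ldm stop command for each. The dependency mapping and the domain names are hypothetical, and the sketch only builds command strings.

```python
from graphlib import TopologicalSorter


def stop_order(masters_by_slave):
    """Order domains so that every master precedes its slaves.

    masters_by_slave maps a slave domain name to the names of its masters.
    """
    # TopologicalSorter takes node -> predecessors, and a master is a
    # predecessor of each of its slaves.
    sorter = TopologicalSorter(
        {slave: set(masters) for slave, masters in masters_by_slave.items()}
    )
    return list(sorter.static_order())


def stop_commands(masters_by_slave):
    """Format one 'ldm stop' command per domain, masters first."""
    return [f"ldm stop {name}" for name in stop_order(masters_by_slave)]


# Hypothetical dependencies: ldg2 and ldg3 both name ldg1 as their master.
deps = {"ldg2": ["ldg1"], "ldg3": ["ldg1"]}
order = stop_order(deps)
```

If stopping a slave domain fails, fall back to the ldm stop -f command for that domain, as described in the workaround.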
Bug ID 15655199: Sometimes an in-use MAC address is not detected and it is erroneously reassigned.
Workaround: Manually ensure that an in-use MAC address cannot be reassigned.
Bug ID 15654965: The ldmconfig script cannot properly create a stored domain configuration on the service processor (SP).
Workaround: Do not power cycle the system after the ldmconfig script completes and the domain reboots. Instead, perform the following manual steps:
Add the configuration to the SP.
# ldm add-spconfig new-config-name
Remove the primary-with-clients configuration from the SP.
# ldm rm-spconfig primary-with-clients
Power cycle the system.
If you do not perform these steps prior to the system being power cycled, the existence of the primary-with-clients configuration causes the domains to be inactive. In this case, you must bind each domain manually and then start them by running the ldm start -a command. After the guests have booted, repeating this sequence enables the guest domains to be booted automatically after a power cycle.
Bug ID 15631119: If you modify the maximum transmission unit (MTU) of a virtual network device on the control domain, a delayed reconfiguration operation is triggered. If you subsequently cancel the delayed reconfiguration, the MTU value for the device is not restored to the original value.
Recovery: Rerun the ldm set-vnet command to set the MTU to the original value. Resetting the MTU value puts the control domain into delayed reconfiguration mode, which you need to cancel. The resulting MTU value is now the original, correct MTU value.
# ldm set-vnet mtu=orig-value vnet1 primary
# ldm cancel-op reconf primary
Bug ID 15600969: If all the hardware cryptographic units are dynamically removed from a running domain, the cryptographic framework fails to seamlessly switch to the software cryptographic providers, and kills all the ssh connections.
Recovery: Re-establish the ssh connections after all the cryptographic units are removed from the domain.
Workaround: Set UseOpenSSLEngine=no in the /etc/ssh/sshd_config file on the server side, and run the svcadm restart ssh command.
All ssh connections will no longer use the hardware cryptographic units (and thus not benefit from the associated performance improvements), and ssh connections will not be disconnected when the cryptographic units are removed.
Bug ID 15597025: When you run the ldm ls-io -l command on a system that has a PCI Express Dual 10-Gigabit Ethernet Fiber card (X1027A-Z) installed, the output might show the following:
primary# ldm ls-io -l
...
pci@500/pci@0/pci@c PCIE5 OCC primary
network@0
network@0,1
ethernet
ethernet
The output shows four subdevices even though this Ethernet card has only two ports. This anomaly occurs because this card has four PCI functions. Two of these functions are disabled internally and appear as ethernet in the ldm ls-io -l output.
Workaround: You can ignore the ethernet entries in the ldm ls-io -l output.
Bug ID 15572184: An ldm command might be slow to respond when several domains are booting. If you issue an ldm command at this stage, the command might appear to hang. Note that the ldm command will return after performing the expected task. After the command returns, the system should respond normally to ldm commands.
Workaround: Avoid booting many domains simultaneously. However, if you must boot several domains at once, refrain from issuing further ldm commands until the system returns to normal. For instance, wait for about two minutes on Sun SPARC Enterprise T5140 and T5240 servers and for about four minutes on the Sun SPARC Enterprise T5440 server or Sun Netra T5440 server.
Bug ID 15560811: In Oracle Solaris 11, zones that are configured with an automatic network interface (anet) might fail to start in a domain that has Logical Domains virtual network devices only.
Workaround 1: Assign one or more physical network devices to the guest domain. Use PCIe bus assignment, the Direct I/O (DIO), or the SR-IOV feature to assign a physical NIC to the domain.
Workaround 2: If the zones configuration requirement is to have interzone communication only within the domain, create an etherstub device. Use the etherstub device as the “lower link” in the zones configuration so that virtual NICs are created on the etherstub device.
Workaround 3: Use exclusive link assignment to assign a Logical Domains virtual network device to a zone. Assign virtual network devices, as needed, to the domain. You might also choose to disable inter-vnet links to be able to create a large number of virtual network devices.
Bug ID 15560201: Sometimes ifconfig indicates that the device does not exist after you add a virtual network or virtual disk device to a domain. This situation might occur as the result of the /devices entry not being created.
Although this problem should not occur during normal operation, the error sometimes occurs when the instance number of a virtual network device does not match the instance number listed in the /etc/path_to_inst file.
For example:
# ifconfig vnet0 plumb
ifconfig: plumb: vnet0: no such interface
The instance number of a virtual device is shown under the DEVICE column in the ldm list output:
# ldm list -o network primary
NAME
primary

MAC
    00:14:4f:86:6a:64

VSW
    NAME         MAC               NET-DEV DEVICE   DEFAULT-VLAN-ID PVID VID MTU  MODE
    primary-vsw0 00:14:4f:f9:86:f3 nxge0   switch@0 1               1        1500

NETWORK
    NAME  SERVICE              DEVICE    MAC               MODE PVID VID MTU
    vnet1 primary-vsw0@primary network@0 00:14:4f:f8:76:6d      1        1500
The instance number (0 for both the vnet and vsw shown previously) can be compared with the instance number in the path_to_inst file to ensure that they match.
# egrep '(vnet|vsw)' /etc/path_to_inst
"/virtual-devices@100/channel-devices@200/virtual-network-switch@0" 0 "vsw"
"/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"
Workaround: In the case of mismatching instance numbers, remove the virtual network or virtual switch device. Then, add them again by explicitly specifying the instance number required by setting the id property.
You can also manually edit the /etc/path_to_inst file. See the path_to_inst(4) man page.
Caution - Changes should not be made to /etc/path_to_inst without careful consideration.
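The comparison described for this bug can be automated for the vnet and vsw entries. The following Python sketch reads text in the path_to_inst format (a quoted physical path, an instance number, and a quoted driver name) and reports entries whose instance number differs from the number after the final @ in the path. Parsing that number as hexadecimal is an assumption, although single-digit values such as those in the example above are unaffected. The function names and the third sample line are hypothetical, and the sketch only reads text.

```python
import re

# One path_to_inst entry: "physical-path" instance "driver"
ENTRY_RE = re.compile(
    r'^"(?P<path>[^"]+)"\s+(?P<inst>\d+)\s+"(?P<driver>[^"]+)"$'
)


def parse_path_to_inst(text, drivers=("vnet", "vsw")):
    """Return (path, instance, driver) tuples for the selected drivers."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match and match.group("driver") in drivers:
            entries.append((match.group("path"),
                            int(match.group("inst")),
                            match.group("driver")))
    return entries


def unit_address(path):
    """Number after the final '@' in the path (assumed hexadecimal)."""
    return int(path.rsplit("@", 1)[1], 16)


def mismatched_entries(text):
    """Entries whose instance number differs from the path's unit address."""
    return [entry for entry in parse_path_to_inst(text)
            if unit_address(entry[0]) != entry[1]]


# The first two lines come from the example above; the third is hypothetical.
sample = '''
"/virtual-devices@100/channel-devices@200/virtual-network-switch@0" 0 "vsw"
"/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"
"/virtual-devices@100/channel-devices@200/network@1" 2 "vnet"
'''
bad = mismatched_entries(sample)
```

Any entry that the check reports is a candidate for the workaround above.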
Bug ID 15555509: When Logical Domains is configured on a system and you add another XAUI network card, the card is not visible after the machine has undergone a power cycle.
Recovery: To make the newly added XAUI visible in the control domain, perform the following steps:
Set and clear a dummy variable in the control domain.
The following commands use a dummy variable called fix-xaui:
# ldm set-var fix-xaui=yes primary
# ldm rm-var fix-xaui primary
Save the modified configuration to the SP, replacing the current configuration.
The following commands use a configuration name of config1:
# ldm rm-spconfig config1
# ldm add-spconfig config1
Perform a reconfiguration reboot of the control domain.
# reboot -- -r
At this time, you can configure the newly available network or networks for use by Logical Domains.
Bug ID 15543982: You can configure a maximum of two domains with dedicated PCI-E root complexes on systems such as the Sun Fire T5240. These systems have two UltraSPARC T2 Plus CPUs and two I/O root complexes.
pci@500 and pci@400 are the two root complexes in the system. The primary domain will always contain at least one root complex. A second domain can be configured with an unassigned or unbound root complex.
The pci@400 fabric (or leaf) contains the on-board e1000g network card. The following circumstances could lead to a domain panic:
The system is configured with a primary domain that contains pci@500 and a second domain that contains pci@400
The e1000g device on the pci@400 fabric is used to boot the second domain
Avoid the following network devices if they are configured in a non-primary domain:
/pci@400/pci@0/pci@c/network@0,1
/pci@400/pci@0/pci@c/network@0
When these conditions are true, the domain will panic with a PCI-E Fatal error.
Avoid such a configuration or, if the configuration is used, do not boot from the listed devices.
Bug ID 15518409: If you do not have a network configured on your machine and have a Network Information Services (NIS) client running, the Logical Domains Manager will not start on your system.
Workaround: Disable the NIS client on your non-networked machine:
# svcadm disable nis/client
Bug ID 15511551: Sometimes, executing the uadmin 1 0 command from the command line of a Logical Domains system does not leave the system at the ok prompt after the subsequent reset. This incorrect behavior is seen only when the Logical Domains variable auto-reboot? is set to true. If auto-reboot? is set to false, the expected behavior occurs.
Workaround: Use this command instead:
uadmin 2 0
Or, always run with auto-reboot? set to false.
Bug ID 15505014: A domain shutdown or memory scrub can take over 15 minutes with a single CPU and a very large memory configuration. During a shutdown, the CPUs in a domain are used to scrub all the memory owned by the domain. The time taken to complete the scrub can be quite long if a configuration is imbalanced, for example, a single CPU domain with 512 Gbytes of memory. This prolonged scrub time extends the amount of time needed to shut down a domain.
Workaround: Ensure that large memory configurations (more than 100 Gbytes) have at least one core.
Bug ID 15469227: The scadm command on a control domain running at least the Oracle Solaris 10 5/08 OS can hang following an SC reset. The system is unable to properly re-establish a connection following an SC reset.
Recovery: Reboot the host to re-establish connection with the SC.
Bug ID 15453968: Simultaneous net installation of multiple guest domains fails on systems that have a common console group.
Workaround: Net-install only on guest domains that each have their own console group. This failure is seen only on domains with a common console group shared among multiple net-installing domains.
Bug ID 15422900: If you configure more than four virtual networks (vnets) in a guest domain on the same network using the Dynamic Host Configuration Protocol (DHCP), the guest domain can eventually become unresponsive while running network traffic.
Workaround: Set ip_ire_min_bucket_cnt and ip_ire_max_bucket_cnt to larger values, such as 32, if you have 8 interfaces.
Recovery: Issue an ldm stop-domain domain-name command followed by an ldm start-domain domain-name command on the guest domain (domain-name) in question.
Bug ID 15387338: This issue is summarized in Logical Domains Variable Persistence in Oracle VM Server for SPARC 3.2 Administration Guide and affects only the control domain.
Bug ID 15370442: The Logical Domains environment does not support setting or deleting wide-area network (WAN) boot keys from within the Oracle Solaris OS by using the ickey(1M) command. All ickey operations fail with the following error:
ickey: setkey: ioctl: I/O error
In addition, WAN boot keys that are set using OpenBoot firmware in logical domains other than the control domain are not remembered across reboots of the domain. In these domains, the keys set from the OpenBoot firmware are valid only for a single use.
Bug ID 15368170: In some cases, the behavior of the ldm stop-domain command is confusing.
# ldm stop-domain -f domain-name
If the domain is at the kernel module debugger, kmdb(1), prompt, then the ldm stop-domain command fails with the following error message:
LDom <domain-name> stop notification failed