
Oracle® VM Server for SPARC 3.5 Release Notes


Updated: July 2018
 
 

Known Issues

This section contains general issues and specific bugs concerning the Oracle VM Server for SPARC 3.5 software.

Bugs Affecting the Oracle VM Server for SPARC Software

This section summarizes the bugs that you might encounter when using this version of the software. The most recent bugs are described first. Workarounds and recovery procedures are specified, if available.

Bugs Affecting the Oracle VM Server for SPARC 3.5 Software

SPARC M8 and SPARC T8 Series Servers: Oracle Solaris Might Experience a Kernel Panic or a Fatal Error When cpu-arch=migration-class1 is Set on a Guest Domain

Bug ID 27952673: A SPARC M8 or SPARC T8 series server that runs the Oracle VM Server for SPARC 3.5, 3.5.0.1, or 3.5.0.2 software might experience a kernel panic or another fatal error condition if a guest domain has the cpu-arch property value set to migration-class1.

The panic might occur on any domain that has cpu-arch=migration-class1 on the source system when the domain is migrated to a SPARC M8 or SPARC T8 series server.

In addition, a fatal error might occur on any domain that you create with cpu-arch=migration-class1 on a SPARC M8 or SPARC T8 series server.


Note - No other SPARC servers are affected directly by this issue. However, a domain on another server that supports the migration-class1 value of the cpu-arch property (at least the SPARC T4, SPARC M5, or SPARC S7 series server) becomes vulnerable if that domain is migrated to a SPARC M8 or a SPARC T8 series server.

To determine whether a guest domain is vulnerable to this issue, run the following command on the primary domain for each guest domain:

primary# ldm list -l domain-name | grep cpu-arch

A guest domain is vulnerable to this issue if the output shows cpu-arch=migration-class1.


Cause: The problem is triggered when the Oracle Solaris kernel or any user application attempts to reference a 2-Gbyte page on a SPARC M8 or SPARC T8 series server. The cpu-arch=migration-class1 setting incorrectly permits 2-Gbyte page sizes on the underlying hardware platform, even though the SPARC M8 and SPARC T8 series servers do not support 2-Gbyte page sizes.

Symptoms: While the symptoms can vary, the Oracle Solaris kernel typically panics in the guest domain that has the cpu-arch=migration-class1 setting. The associated panic string is bad unexpected error from hypervisor call at TL 1.

Workaround: Until an update is available, change the value of the cpu-arch property on all SPARC M8 and SPARC T8 series servers as follows:

  • For a guest domain that you do not plan to live migrate or you plan to live migrate to another SPARC M8 or SPARC T8 series server, set cpu-arch=native.

  • For a guest domain that you plan to live migrate to or from an older generation SPARC server, set cpu-arch=generic.


Note - Reboot the guest domain after you set the cpu-arch property value to make the change take effect.
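
For example, the following hedged sketch applies the first option to a hypothetical guest domain named ldg1 that runs only on SPARC M8 and SPARC T8 series servers; the domain is stopped first so that the property can be changed, and the final command verifies the new value:

primary# ldm stop-domain ldg1
primary# ldm set-domain cpu-arch=native ldg1
primary# ldm start-domain ldg1
primary# ldm list -l ldg1 | grep cpu-arch
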
ldmd Crashes After a Failure to Remove Cores From a Domain

Bug ID 26435797: The ldmd daemon might dump core if a virtual CPU or CPU core removal operation fails. This failure might occur when all the CPUs in the target domain are bound or heavily loaded.

When this failure occurs, the ldm remove-core command might issue one of the following error messages:

Invalid response

Failed to receive version negotiation response from logical domain manager:
  Connection reset by peer

Workaround: To perform the removal of the virtual CPUs, unbind some of the CPUs from the target domain or lower its workload. Note that this problem does not affect virtual CPU removal operations on bound or unbound domains.

vsw-relay-mode Behavior Reverts From remote to local When the Associated Service Domain Reboots

Bug ID 26184111: The vsw-relay-mode property is set on a virtual switch to enable reflective relay mode. This mode is not retained after a service domain reboot, so the state reverts to the default value of local.

Workaround: After the service domain reboots, re-enable reflective relay mode by setting the vsw-relay-mode property on the virtual switch:

primary# ldm set-vsw vsw-relay-mode=remote primary-vsw0
Cross-CPU Migration Can Fail if Global Performance Counters are Enabled

Bug ID 26047815: In certain cross-CPU migration scenarios, a migration can fail with the following errors:

API group 0x20b v1.0 is not supported in the version of the firmware
 running on the target machine.
API group 0x214 v1.0 is not supported in the version of the firmware
 running on the target machine.

    All of the following conditions must be present to encounter this problem:

  • The domain has the cpu-arch property set to generic or migration-class1

  • The domain has a perf-counter property setting that includes the global value

  • The domain was booted on at least a SPARC M7 series server or a SPARC T7 series server

  • The target machine is a platform released prior to the SPARC M7 series server or SPARC T7 series server

This problem occurs because a domain booted on at least a SPARC M7 series server or a SPARC T7 series server with a perf-counter property setting that includes the global value will register platform-specific performance counter Hypervisor interfaces that do not exist on older platforms. As part of the migration, a check is performed to ensure that all the interfaces used by the domain are present on the target machine. When these SPARC M7 series server or SPARC T7 series server specific interfaces are detected, the migration is aborted.

Workaround: Do not set perf-counter=global if cpu-arch is not native and at least SPARC M7 series servers or SPARC T7 series servers are part of the migration pool.

Virtual SCSI HBA Subsystem Does Not Support All SCSI Enclosure Services Devices

Bug ID 25865708: An SES device that the Oracle Solaris OS sees as a secondary function is an SES device type that vhba cannot support. vhba can support only an SES device whose device type has a value of 0xd, as specified in the inq_dtype field of the INQUIRY payload.

When the vhba binary in the guest domain attempts to initialize some SCSI enclosure services (SES) devices, vhba causes scsi to issue the following warning message:

... scsi: WARNING: scsi_enumeration_failed: vhba2 probe@w50080e51bfd32004,0,d enumeration failed during tran_tgt_init

The ,d substring represents the 0xd hexadecimal digit, which is the SCSI industry standard code for an SES device. The ,d string indicates that this warning message is a result of an unsupported type of SES device.

The following mdb output shows how to verify that an SES device reports a device type of 0xd in the inq_dtype field of the INQUIRY payload:

# mdb -k
> ::vsan
vsan_t( 6400126e08c0 ) cfg-hdl(0) iport-path(/pci@300/pci@1/pci@0/pci@4/SUNW,emlxs@0,11/fp@0,0)
    vsan_iport_t( 6400125b8710 )
        vsan_tport_t( 64001bf89718 ) tport_phys(w216000c0ff8089d5)
        vsan_lun_t( 640011aa65d0 ) lun(0) vlun-id(1127b) []

> 640011aa65d0::print vsan_lun_t vl_sd |::print struct scsi_device sd_inq |::print struct scsi_inquiry inq_dtype
inq_dtype = d
ldomHbaTable Is Empty

Bug ID 24393532: The fix for bug ID 23591953 disabled both Oracle VM Server for SPARC MIB monitoring, such as listing the Oracle VM Server for SPARC MIB objects by using the snmpwalk command, and trap generation for the ldomHbaTable table. As a result, the Oracle VM Server for SPARC MIB ldomHbaTable table shows no contents.

primary# snmpwalk -v1 -c public localhost SUN-LDOM-MIB::ldomHbaTable
primary#

Workaround: Use the ldm list-hba command to view the HBA information.
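
For example, the following sketch assumes a hypothetical guest domain named ldg1 that has a virtual SCSI HBA configured:

primary# ldm list-hba ldg1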

Inaccurate Unable to Send Suspend Request Error Reported During a Successful Domain Migration

 

Bug ID 23206413: In rare circumstances, a successful domain migration reports the following error:

Unable to send suspend request to domain domain-name

This issue occurs when the Logical Domains Manager detects an error while suspending the domain but is able to recover and complete the migration. The exit status of the command is 0, which reflects the successful migration.

Workaround: Because the migration completes successfully, you can ignore the error message.

Cold Migrating a Bound Domain With Many Virtual Devices Might Fail and Leave Two Bound Copies of the Domain

 

Bug ID 23180427: When cold migrating a bound domain that has a large number of virtual devices, the operation might fail with the following message in the SMF log:

warning: Timer expired: Failed to read feasibility response type (9) from target LDoms Manager

This failure indicates that the Logical Domains Manager running on the source machine timed out while waiting for the domain to be bound on the target machine. The chances of encountering this problem increase as the number of virtual devices in the migrating domain increases.

The timing of this failure results in a bound copy of the domain on both the source machine and the target machine. Do not start both copies of this domain. This action can cause data corruption because both domains reference the same virtual disk backends.

Recovery: After verifying that the copy of the migrated domain is correct on the target machine, manually unbind the copy of the domain on the source machine and destroy it.
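
For example, the following hedged sketch removes the stale copy of a domain with the hypothetical name ldg1 from the source machine after the migrated copy has been verified on the target machine:

primary# ldm unbind-domain ldg1
primary# ldm remove-domain ldg1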

Migration Fails When the Target Machine Has Insufficient Free LDCs

 

Bug ID 23031413: When the target machine's control domain runs out of LDCs during a domain migration, the migration fails with no explanation and the following message is written to the SMF log:

warning: Failed to read feasibility response type (5) from target LDoms Manager

This error is issued when the domain being migrated fails to bind on the target machine. Note that the bind operation might fail for other reasons on the target machine, as well.

Workaround: For the migration to succeed, the number of LDCs must be reduced either in the domain being migrated or in the control domain of the target machine. You can reduce the number of LDCs by reducing the number of virtual devices being used by or being serviced by a domain. For more information about managing LDCs, see Using Logical Domain Channels in Oracle VM Server for SPARC 3.5 Administration Guide.
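
For example, removing virtual devices that are no longer needed frees their LDCs. The following sketch assumes a hypothetical domain named ldg1 with an unneeded virtual network device vnet1 and an unneeded virtual disk vdisk1:

primary# ldm remove-vnet vnet1 ldg1
primary# ldm remove-vdisk vdisk1 ldg1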

ovmtlibrary Limits Disk Image File Name to 50 Characters

 

Bug ID 23024583: The ovmtlibrary command limits the disk image file name to 50 characters. The ovmtlibrary command checks the .ovf file and compares the information in the <ovf:References> section with the actual file names of the decompressed disks.

An error is issued if the files are different or if the disk image file name is longer than 50 characters. For example:

# ovmtlibrary -c store -d "example" -q -o file:/template.ova -l /export/user1/ovmtlibrary_example
event id is 3
ERROR: The actual disk image file name(s) or the actual number of disk
image(s) is different from OVF file: template.ovf
exit code: 1

The following example XML shows a disk image file name that is longer than 50 characters:

<ovf:References>
<ovf:File ovf:compression="gzip"
ovf:href="disk_image.ldoms3.5_build_s11_u3_sru15_01_kz_42G.img.gz"
ovf:id="ldoms3" ovf:size="6687633773"/>
</ovf:References>

Workaround: Limit the length of disk image file names to fewer than 50 characters.

Virtual Network Devices Added to an Inactive Guest Domain Never Get the Default linkprop Value

 

Bug ID 22842188: For linkprop=phys-state to be supported on a virtual network device, the Logical Domains Manager must be able to validate that the virtual switch to which the virtual network device is attached has a physical NIC that backs the virtual switch.

The Oracle VM Server for SPARC netsvc agent must be running on the guest domain so that the virtual switch can be queried.

If the guest domain is not active and cannot communicate with the agent in the domain that has the virtual network device's virtual switch, the virtual network device does not have linkprop=phys-state set.

Workaround: Set linkprop=phys-state only when the domain is active.
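
For example, the following sketch sets the property on a hypothetical virtual network device named vnet0 in an active domain named ldg1:

primary# ldm set-vnet linkprop=phys-state vnet0 ldg1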

ldm set-vsw net-dev= Fails When linkprop=phys-state

 

Bug ID 22828100: If a virtual switch has attached virtual network devices that have linkprop=phys-state, the virtual switch to which they are attached must have a valid backing NIC device specified by the net-dev property. The net-dev property value must be the name of a valid network device.

If you clear the backing device by running the ldm set-vsw net-dev= command, the virtual switch still shows linkprop=phys-state even though the net-dev property value is no longer a valid NIC device.

Workaround: Remove all the virtual network devices that are attached to the virtual switch, remove the virtual switch, re-create the virtual switch with a valid net-dev backing device, and then re-create the virtual network devices, as in the following sketch.
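
The following hedged sketch assumes a hypothetical guest domain named ldg1 with one virtual network device vnet0 attached to the virtual switch primary-vsw0, and a valid backing NIC named net0; substitute your own names and repeat the virtual network device commands for each attached device:

primary# ldm remove-vnet vnet0 ldg1
primary# ldm remove-vsw primary-vsw0
primary# ldm add-vsw net-dev=net0 primary-vsw0 primary
primary# ldm add-vnet linkprop=phys-state vnet0 primary-vsw0 ldg1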

A Domain That Has Socket Constraints Cannot Be Re-Created From an XML File

 

Bug ID 21616429: The Oracle VM Server for SPARC 3.3 software introduced socket support for Fujitsu M12 servers and Fujitsu M10 servers only.

Software running on Oracle SPARC servers and Oracle VM Server for SPARC versions older than 3.3 cannot re-create a domain with socket constraints from an XML file.

Attempting to re-create a domain with socket constraints from an XML file with an older version of the Oracle VM Server for SPARC software or on an Oracle SPARC server fails with the following message:

primary# ldm add-domain -i ovm3.3_socket_ovm11.xml
socket not a known resource

If Oracle VM Server for SPARC 3.2 is running on a Fujitsu M12 server or Fujitsu M10 server and you attempt to re-create a domain with socket constraints from an XML file, the command fails with various error messages, such as the following:

primary# ldm add-domain -i ovm3.3_socket_ovm11.xml
Unknown property: vcpus

primary# ldm add-domain -i ovm3.3_socket_ovm11.xml
perf-counters property not supported, platform does not have
performance register access capability, ignoring constraint setting.

Workaround: Edit the XML file to remove any sections that reference the socket resource type.

Kernel Zones Block Live Migration of Guest Domains

 

 

Bug ID 21289174: On a SPARC server, a running kernel zone within an Oracle VM Server for SPARC domain will block live migration of the guest domain. The following error message is shown:

Guest suspension failed because Kernel Zones are active.
Stop Kernel Zones and retry.

Workaround: Stop the kernel zones that are running in the guest domain before you attempt the live migration, as in the following sketch, and then retry the migration.
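
The following hedged sketch assumes a kernel zone named kzone1 that runs inside the guest domain to be migrated; shut it down from within that guest domain before retrying the migration:

# zoneadm -z kzone1 shutdown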

After Dropping Into factory-default, Recovery Mode Fails if the System Boots From a Different Device Than the One Booted in the Previously Active Configuration

 

Bug ID 20425271: While triggering a recovery after dropping into factory-default, recovery mode fails if the system boots from a different device than the one booted in the previously active configuration. This failure might occur if the active configuration uses a boot device other than the factory-default boot device.


Note - This problem applies to UltraSPARC T2, UltraSPARC T2 Plus, SPARC T3, and SPARC T4 series servers. This problem also applies to SPARC T5, SPARC M5, and SPARC M6 series servers that run a system firmware version prior to 9.5.3.

Workaround: Perform the following steps any time you want to save a new configuration to the SP:

  1. Determine the full PCI path to the boot device for the primary domain.

    Use this path for the ldm set-var command in Step 4.

  2. Remove any currently set boot-device property from the primary domain.

    Performing this step is necessary only if the boot-device property has a value set. If the property does not have a value set, an attempt to remove the boot-device property results in the boot-device not found message.

    primary# ldm rm-var boot-device primary
  3. Save the current configuration to the SP.

    primary# ldm add-spconfig config-name
  4. Explicitly set the boot-device property for the primary domain.

    primary# ldm set-var boot-device=value primary

    If you set the boot-device property after saving the configuration to the SP as described, the specified boot device is booted when recovery mode is triggered.

Recovery: If recovery mode has already failed as described, perform the following steps:

  1. Explicitly set the boot device to the one used in the last running configuration.

    primary# ldm set-var boot-device=value primary
  2. Reboot the primary domain.

    primary# reboot

    The reboot enables the recovery to proceed.

Guest Domain eeprom Updates Are Lost if an ldm add-spconfig Operation Is Not Complete

 

Bug ID 19932842: An attempt to set an OBP variable from a guest domain might fail if you use the eeprom or the OBP command before one of the following commands is completed:

  • ldm add-spconfig

  • ldm remove-spconfig

  • ldm set-spconfig

  • ldm bind

This problem might occur when these commands take more than 15 seconds to complete.

# /usr/sbin/eeprom boot-file\=-k
promif_ldom_setprop: promif_ldom_setprop: ds response timeout
eeprom: OPROMSETOPT: Invalid argument
boot-file: invalid property

Recovery: Retry the eeprom or OBP command after the ldm operation has completed.

Workaround: Retry the eeprom or OBP command on the affected guest domain. You might be able to avoid the problem by using the ldm set-var command on the primary domain.
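
For example, the following hedged sketch sets the same OBP variable from the primary domain instead, assuming a hypothetical guest domain named ldg1:

primary# ldm set-var boot-file=-k ldg1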

Rebooting a Guest Domain With More Than 1000 Virtual Network Devices Results in a Panic

 

 

Bug ID 19449221: A domain can have no more than 999 virtual network devices (vnets).

Workaround: Limit the number of vnets on a domain to 999.

Incorrect Device Path for Fibre Channel Virtual Functions in a Root Domain

 

Bug ID 18001028: In the root domain, the Oracle Solaris device path for a Fibre Channel virtual function is incorrect.

For example, the incorrect path name is pci@380/pci@1/pci@0/pci@6/fibre-channel@0,2 while it should be pci@380/pci@1/pci@0/pci@6/SUNW,emlxs@0,2.

The ldm list-io -l output shows the correct device path for the Fibre Channel virtual functions.

Workaround: None.

Oracle Solaris 11.3 SRU 12: ssd and sd Driver Functionality Is Merged for Fibre Channel Devices on SPARC Platforms

 

Bug ID 17036795: The Oracle Solaris 11.3 SRU 12 OS has merged the ssd and sd driver functionality for Fibre Channel devices on SPARC platforms.

This change affects device node names on the physical device path. The device node names change from ssd@ to disk@. This change also affects device driver bindings from ssd to sd.


Note - Ensure that any application or client in the Oracle Solaris OS system that depends on these device node names or device driver bindings is adjusted.

This change is not enabled by default for Oracle Solaris 11.3 systems.

You must enable this change to perform live migrations of domains that use virtual HBA and Fibre Channel devices.

Before you enable this change, ensure that MPxIO is already enabled. If it is not enabled, run the stmsboot -D fp -e command to enable it.

Run the format command to determine whether MPxIO is enabled. When MPxIO is enabled, you should see vhci in the device names. Alternatively, if the mpathadm list lu output is empty, no MPxIO devices are enumerated.

Use the beadm command to create a new boot environment (BE). By using BEs, you can roll back easily to a previous boot environment if you experience unexpected problems.

Mount the BE and replace the /etc/devices/inception_points file with the /etc/devices/inception_points.vhba file. The .vhba file includes some feature flags to enable this change.

Finally, reboot after you activate the new BE.

# beadm create BE-name
# beadm mount BE-name /mnt
# cp /mnt/etc/devices/inception_points.vhba /mnt/etc/devices/inception_points
# beadm umount BE-name
# beadm activate BE-name
# reboot

After rebooting, use the prtconf -D | grep driver | grep sd command to verify the change.

If any disks use the ssd driver, there is a problem with the configuration.

You can also use the mpathadm list lu command to show multiple paths to the same disks if the virtual HBA and the Fibre Channel virtual function are both configured to see the same LUNs.

Misleading Messages Shown for InfiniBand SR-IOV Remove Operations

 

Bug ID 16979993: An attempt to use a dynamic SR-IOV remove operation on an InfiniBand device results in confusing and inappropriate error messages.

Dynamic SR-IOV remove operations are not supported for InfiniBand devices.

Workaround: Remove InfiniBand virtual functions by using a static removal procedure rather than a dynamic remove operation.

ldm migrate -n Should Fail When Performing a Cross-CPU Migration From SPARC T5, SPARC M5, or SPARC M6 Server to UltraSPARC T2 or SPARC T3 Server

 

Bug ID 16864417: The ldm migrate -n command does not report failure when attempting to migrate between a SPARC T5, SPARC M5, or SPARC M6 server and an UltraSPARC T2 or SPARC T3 server.

Workaround: None.

Resilient I/O Domain Should Support PCI Device Configuration Changes After the Root Domain Is Rebooted

 

Bug ID 16691046: If virtual functions are assigned from the root domain, an I/O domain might fail to provide resiliency in the following hotplug situations:

  • You add a root complex (PCIe bus) dynamically to the root domain, and then you create the virtual functions and assign them to the I/O domain.

  • You hot-add an SR-IOV card to the root domain that owns the root complex, and then you create the virtual functions and assign them to the I/O domain.

  • You replace or add any PCIe card to an empty slot (either through hotplug or when the root domain is down) on the root complex that is owned by the root domain. This root domain provides virtual functions from the root complex to the I/O domain.

Workaround: Perform one of the following steps:

  • If the root complex already provides virtual functions to the I/O domain and you add, remove, or replace any PCIe card on that root complex (through hotplug or when the root domain is down), you must reboot both the root domain and the I/O domain.

  • If the root complex does not have virtual functions currently assigned to the I/O domain and you add an SR-IOV card or any other PCIe card to the root complex, you must stop the root domain to add the PCIe card. After the root domain reboots, you can assign virtual functions from that root complex to the I/O domain.

  • If you want to add a new PCIe bus to the root domain and then create and assign virtual functions from that bus to the I/O domain, perform one of the following steps and then reboot the root domain:

    • Add the bus during a delayed reconfiguration

    • Add the bus dynamically

Guest Domains in Transition State After Reboot of the primary Domain

 

 

Bug ID 16659506: A guest domain is in transition state (t) after a reboot of the primary domain. This problem arises when a large number of virtual functions are configured on the system.

Workaround: To avoid this problem, set the OBP boot-device property so that the disk boot device is retried several times before a boot from the network is attempted.

    Perform the following steps on each domain:

  1. Access the console of the domain.

    primary# telnet localhost 5000
  2. Set the boot-device property.

    ok> setenv boot-device disk disk disk disk disk disk disk disk disk disk net

    The number of disk entries that you specify as the value of the boot-device property depends on the number of virtual functions that are configured on the system. On smaller systems, you might be able to include fewer instances of disk in the property value.

  3. Verify that the boot-device property is set correctly by using the printenv command.

    ok> printenv
  4. Return to the primary domain console.

  5. Repeat Steps 1-4 for each domain on the system.

  6. Reboot the primary domain.

    primary# shutdown -i6 -g0 -y
WARNING: ddi_intr_alloc: cannot fit into interrupt pool Means That Interrupt Supply Is Exhausted While Attaching I/O Device Drivers

 

Bug ID 16284767: This warning on the Oracle Solaris console means the interrupt supply was exhausted while attaching I/O device drivers:

WARNING: ddi_intr_alloc: cannot fit into interrupt pool

This limitation applies only to the supported SPARC systems prior to the SPARC M7 series servers and SPARC T7 series servers.

The hardware provides a finite number of interrupts, so Oracle Solaris limits how many interrupts each device can use. A default limit is designed to match the needs of typical system configurations. However, this limit might need adjustment for certain system configurations.

Specifically, the limit may need adjustment if the system is partitioned into multiple logical domains and if too many I/O devices are assigned to any guest domain. Oracle VM Server for SPARC divides the total interrupts into smaller sets given to guest domains. If too many I/O devices are assigned to a guest domain, its supply might be too small to give each device the default limit of interrupts. Thus, it exhausts its supply before it completely attaches all the drivers.

Some drivers provide an optional callback routine which allows Oracle Solaris to automatically adjust their interrupts. The default limit does not apply to these drivers.

Workaround: Use the ::irmpools and ::irmreqs MDB macros to determine how interrupts are used. The ::irmpools macro shows the overall supply of interrupts divided into pools. The ::irmreqs macro shows which devices are mapped to each pool. For each device, ::irmreqs shows whether the default limit is enforced by an optional callback routine, how many interrupts each driver requested, and how many interrupts the driver is given.

The macros do not show information about drivers that failed to attach. However, the information that is shown helps calculate the extent to which you can adjust the default limit. Any device that uses more than one interrupt without providing a callback routine can be forced to use fewer interrupts by adjusting the default limit. Reducing the default limit below the amount that is used by such a device results in freeing of interrupts for use by other devices.

To adjust the default limit, set the ddi_msix_alloc_limit property to a value from 1 to 8 in the /etc/system file. Then, reboot the system for the change to take effect.

To maximize performance, start by assigning larger values and decrease the values in small increments until the system boots successfully without any warnings. Use the ::irmpools and ::irmreqs macros to measure the adjustment's impact on all attached drivers.

For example, suppose the following warnings are issued while booting the Oracle Solaris OS in a guest domain:

WARNING: emlxs3: interrupt pool too full.
WARNING: ddi_intr_alloc: cannot fit into interrupt pool

The ::irmpools and ::irmreqs macros show the following information:

# echo "::irmpools" | mdb -k
ADDR             OWNER   TYPE   SIZE  REQUESTED  RESERVED
00000400016be970 px#0    MSI/X  36    36         36

# echo "00000400016be970::irmreqs" | mdb -k
ADDR             OWNER   TYPE   CALLBACK NINTRS NREQ NAVAIL
00001000143acaa8 emlxs#0 MSI-X  No       32     8    8
00001000170199f8 emlxs#1 MSI-X  No       32     8    8
000010001400ca28 emlxs#2 MSI-X  No       32     8    8
0000100016151328 igb#3   MSI-X  No       10     3    3
0000100019549d30 igb#2   MSI-X  No       10     3    3
0000040000e0f878 igb#1   MSI-X  No       10     3    3
000010001955a5c8 igb#0   MSI-X  No       10     3    3

The default limit in this example is eight interrupts per device, which is not enough interrupts to accommodate the attachment of the final emlxs3 device to the system. Assuming that all emlxs instances behave in the same way, emlxs3 probably requested 8 interrupts.

By subtracting the 12 interrupts used by all of the igb devices from the total pool size of 36 interrupts, 24 interrupts are available for the emlxs devices. Dividing the 24 interrupts by 4 suggests that 6 interrupts per device would enable all emlxs devices to attach with equal performance. So, the following adjustment is added to the /etc/system file:

set ddi_msix_alloc_limit = 6

When the system successfully boots without warnings, the ::irmpools and ::irmreqs macros show the following updated information:

# echo "::irmpools" | mdb -k
ADDR             OWNER   TYPE   SIZE  REQUESTED  RESERVED
00000400018ca868 px#0    MSI/X  36    36         36
 
# echo "00000400018ca868::irmreqs" | mdb -k
ADDR             OWNER   TYPE   CALLBACK NINTRS NREQ NAVAIL
0000100016143218 emlxs#0 MSI-X  No       32     8    6
0000100014269920 emlxs#1 MSI-X  No       32     8    6
000010001540be30 emlxs#2 MSI-X  No       32     8    6
00001000140cbe10 emlxs#3 MSI-X  No       32     8    6
00001000141210c0 igb#3   MSI-X  No       10     3    3
0000100017549d38 igb#2   MSI-X  No       10     3    3
0000040001ceac40 igb#1   MSI-X  No       10     3    3
000010001acc3480 igb#0   MSI-X  No       10     3    3
SPARC T5-8 Server: Uptime Data Shows a Value of 0 for Some ldm List Commands

 

Bug ID 16068376: On a SPARC T5-8 server with approximately 128 domains, some ldm commands such as ldm list might show 0 seconds as the uptime for all domains.

Workaround: Log in to the domain and use the uptime command to determine the domain's uptime.

ldm list -o status on Control Domain Reports Incorrect Migration Progress

 

 

Bug ID 15819714: In rare circumstances, the ldm list -o status command reports an incorrect completion percentage when used to observe the status of a migration on a control domain.

This problem has no impact on the domain that is being migrated or on the ldmd daemons on the source or target control domains.

Workaround: Run the ldm list -o status command on the other control domain that is involved in the migration to observe the progress.

ldm init-system Command Might Not Correctly Restore a Domain Configuration on Which Physical I/O Changes Have Been Made

 

Bug ID 15783031: You might experience problems when you use the ldm init-system command to restore a domain configuration that has used direct I/O or SR-IOV operations.

    A problem arises if one or more of the following operations have been performed on the configuration to be restored:

  • A slot has been removed from a bus that is still owned by the primary domain.

  • A virtual function has been created from a physical function that is owned by the primary domain.

  • A virtual function has been assigned to the primary domain, to other guest domains, or to both.

  • A root complex has been removed from the primary domain and assigned to a guest domain, and that root complex is used as the basis for further I/O virtualization operations.

    In other words, you created a non-primary root domain and performed any of the previous operations.

If you have performed any of the previous actions, perform the workaround shown in Oracle VM Server for SPARC PCIe Direct I/O and SR-IOV Features (Doc ID 1325454.1) (https://support.oracle.com/epmos/faces/SearchDocDisplay?amp;_adf.ctrl-state=10c69raljg_77&_afrLoop=506200315473090).

Guest Domain Panics When Running the cputrack Command During a Migration to a SPARC T4 Server

 

Bug ID 15776123: If the cputrack command is run on a guest domain while that domain is migrated to a SPARC T4 server, the guest domain might panic on the target machine after it has been migrated.

Workaround: Do not run the cputrack command during the migration of a guest domain to a SPARC T4 server.

Limit the Maximum Number of Virtual Functions That Can Be Assigned to a Domain

 

Bug ID 15775637: An I/O domain has a limit on the number of interrupt resources that are available per root complex.

On SPARC T3 and SPARC T4 servers, the limit is approximately 63 MSI/X vectors. Each igb virtual function uses three interrupts. The ixgbe virtual function uses two interrupts.

If you assign a large number of virtual functions to a domain, the domain runs out of system resources to support these devices. You might see messages similar to the following:

WARNING: ixgbevf32: interrupt pool too full.
WARNING: ddi_intr_alloc: cannot fit into interrupt pool
Trying to Connect to Guest Domain Console While It Is Being Bound Might Cause Input to Be Blocked

 

Bug ID 15771384: A guest domain's console might freeze if repeated attempts are made to connect to the console before and during the time the console is bound. For example, this might occur if you use an automated script to grab the console as a domain is being migrated onto the machine.

Workaround: To unfreeze the console, run the following commands on the domain that hosts the domain's console concentrator (usually the control domain):

primary# svcadm disable vntsd
primary# svcadm enable vntsd
ldm remove-io of PCIe Cards That Have PCIe-to-PCI Bridges Should Be Disallowed

 

Bug ID 15761509: Use only the PCIe cards that support the Direct I/O (DIO) feature, which are listed in this support document (https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=REFERENCE&id=1325454.1).


Note - The direct I/O feature is deprecated starting with the SPARC T7 series servers and the SPARC M7 series servers.

Workaround: If you have already used the ldm remove-io command to remove such a card, use the ldm add-io command to add the card back to the primary domain.

Live Migration of a Domain That Depends on an Inactive Master Domain on the Target Machine Causes ldmd to Fault With a Segmentation Fault

 

Bug ID 15701865: If you attempt a live migration of a domain that depends on an inactive domain on the target machine, the ldmd daemon faults with a segmentation fault and crashes. The ldmd daemon is restarted automatically, but the migration is aborted.

    Workaround: Perform one of the following actions before you attempt the live migration:

  • Remove the guest dependency from the domain to be migrated.

  • Start the master domain on the target machine.
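
The following hedged sketch illustrates both options with hypothetical domain names: ldg1 is the domain to be migrated and master1 is its master domain. On the source machine, clearing the master property is assumed here to remove the dependency:

primary# ldm set-domain master= ldg1

Alternatively, on the target machine, bind and start the master domain:

primary# ldm bind-domain master1
primary# ldm start-domain master1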

DRM and ldm list Output Shows a Different Number of Virtual CPUs Than Are Actually in the Guest Domain

 

 

Bug ID 15701853: A No response message might appear in the Oracle VM Server for SPARC log when a loaded domain's DRM policy expires after the CPU count has been substantially reduced. The ldm list output shows that more CPU resources are allocated to the domain than is shown in the psrinfo output.

Workaround: Use the ldm set-vcpu command to reset the number of CPUs on the domain to the value that is shown in the psrinfo output.
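
For example, the following sketch assumes a hypothetical domain named ldg1 in which the psrinfo command reports 16 virtual CPUs:

primary# ldm set-vcpu 16 ldg1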

Simultaneous Migration Operations in “Opposite Direction” Might Cause ldm to Hang

 

Bug ID 15696986: If two ldm migrate commands are issued between the same two systems simultaneously in the “opposite direction,” the two commands might hang and never complete. An opposite direction situation occurs when you simultaneously start a migration on machine A to machine B and a migration on machine B to machine A.

The hang occurs even if the migration processes are initiated as dry runs by using the –n option. When this problem occurs, all other ldm commands might hang.

Recovery: Restart the Logical Domains Manager on both the source machine and the target machine:

primary# svcadm restart ldmd

Workaround: None.

SPARC T3-1 Server: Issue With Disks That Are Accessible Through Multiple Direct I/O Paths

 

Bug ID 15668368: A SPARC T3-1 server can be installed with dual-ported disks, which can be accessed by two different direct I/O devices. In this case, assigning these two direct I/O devices to different domains can cause the disks to be used by both domains, and those domains can affect each other based on the actual usage of the disks.

Workaround: Do not assign direct I/O devices that have access to the same set of disks to different I/O domains. To determine whether you have dual-ported disks on a SPARC T3-1 server, run the following command on the SP:

-> show /SYS/SASBP

If the output includes the following fru_description value, the corresponding system has dual-ported disks:

fru_description = BD,SAS2,16DSK,LOUISE

If dual-ported disks are present in the system, ensure that both of the following direct I/O devices are always assigned to the same domain:

pci@400/pci@1/pci@0/pci@4  /SYS/MB/SASHBA0
pci@400/pci@2/pci@0/pci@4  /SYS/MB/SASHBA1
Using the ldm stop -a Command on Domains in a Master-Slave Relationship Leaves the Slave With the stopping Flag Set

 

Bug ID 15664666: When a reset dependency is created, an ldm stop -a command might result in a domain with a reset dependency being restarted instead of only stopped.

Workaround: First, issue the ldm stop command to the master domain. Then, issue the ldm stop command to the slave domain. If the initial stop of the slave domain results in a failure, issue the ldm stop -f command to the slave domain.
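
The following sketch shows this sequence with hypothetical domain names master1 and slave1; the forced stop is needed only if the initial stop of slave1 fails:

primary# ldm stop-domain master1
primary# ldm stop-domain slave1
primary# ldm stop-domain -f slave1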

Dynamically Removing All the Cryptographic Units From a Domain Causes SSH to Terminate

 

Bug ID 15600969: If all the hardware cryptographic units are dynamically removed from a running domain, the cryptographic framework fails to seamlessly switch to the software cryptographic providers, and kills all the ssh connections.

This issue only applies to UltraSPARC T2, UltraSPARC T2 Plus and SPARC T3 servers.

Recovery: Re-establish the ssh connections after all the cryptographic units are removed from the domain.

Workaround: Set UseOpenSSLEngine=no in the /etc/ssh/sshd_config file on the server side, and run the svcadm restart ssh command.

With this workaround, ssh connections no longer use the hardware cryptographic units (and thus do not benefit from the associated performance improvements), but ssh connections are not disconnected when the cryptographic units are removed.

The Logical Domains Manager Does Not Start if the Machine Is Not Networked and an NIS Client Is Running

 

 

Bug ID 15518409: If you do not have a network configured on your machine and have a Network Information Services (NIS) client running, the Logical Domains Manager will not start on your system.

Workaround: Disable the NIS client on your non-networked machine:

# svcadm disable nis/client
Cannot Connect to Migrated Domain's Console Unless vntsd Is Restarted

 

Bug ID 15513998: Occasionally, after a domain has been migrated, it is not possible to connect to the console for that domain.

Note that this problem occurs when the migrated domain is running an OS version older than Oracle Solaris 11.3.

Workaround: Restart the vntsd SMF service to enable connections to the console:

# svcadm restart vntsd

Note - This command will disconnect all active console connections.
Simultaneous Net Installation of Multiple Domains Fails When in a Common Console Group

 

Bug ID 15453968: Simultaneous net installation of multiple guest domains fails on systems that have a common console group.

Workaround: Perform net installations only on guest domains that each have their own console group. This failure is seen only on domains that share a common console group among multiple net-installing domains.

Behavior of the ldm stop-domain Command Can Be Confusing

 

 

Bug ID 15368170: In some cases, the behavior of the ldm stop-domain command is confusing.

# ldm stop-domain -f domain-name

If the domain is at the kernel module debugger, kmdb(1), prompt, then the ldm stop-domain command fails with the following error message:

LDom <domain-name> stop notification failed

Documentation Issues

This section contains documentation issues and errors that have been found too late to resolve for the Oracle VM Server for SPARC 3.5 release.


Note - The changes described in the following documentation errata have been made to the English version of Oracle VM Server for SPARC 3.5 Reference Manual on OTN.

These changes are not reflected in the man pages delivered with the Oracle VM Server for SPARC 3.5 software product or in the Japanese version of Oracle VM Server for SPARC 3.5 Reference Manual on OTN.


ldmd(1M): Missing Description of the ldmd/migration_adi_legacy_compat SMF Property

The ldmd(1M) man page is missing the following description of the ldmd/migration_adi_legacy_compat SMF property:

ldmd/migration_adi_legacy_compat

Specifies whether to permit a domain migration between servers that support Silicon Secured Memory (SSM) even if one of the machines does not have support for the migration of Application Data Integrity (ADI) version information that is introduced in Oracle VM Server for SPARC 3.5.

If both the source machine and the target machine are running the latest versions of the Oracle VM Server for SPARC software, you do not need to use this SMF property.



Caution  - If you intend to perform a domain migration on your servers that support SSM, it is best that they run at least the Oracle VM Server for SPARC 3.5 software. If this is not possible, take extreme caution when using the ldmd/migration_adi_legacy_compat SMF property. Improper use of this property can result in undefined application behavior if ADI is in use in the domain being migrated.


By default, the property value is false, which prevents a domain migration unless both the source machine and the target machine support SSM and run the required version of the Oracle VM Server for SPARC software. This property has no effect on servers that do not support SSM.

When the value is true, the domain migration proceeds without support for the migration of ADI version information.

So, if either the source machine or target machine runs a version of the Oracle VM Server for SPARC software that is older than 3.5, which does not support the migration of ADI version information, the migration is permitted.

    Only set the ldmd/migration_adi_legacy_compat SMF property value to true if both of the following circumstances are true:

  • You cannot upgrade both the source machine and target machine to a version of the Oracle VM Server for SPARC software that supports the migration of ADI version information

  • You know for certain that ADI versioning is not in use within the domain to be migrated

Setting this property to true permits migrations where ADI version information is not transferred to the target machine. This situation can result in undefined application behavior if ADI is in use in the domain being migrated.

The ldmd/migration_adi_legacy_compat SMF property is not recognized by Oracle VM Server for SPARC versions older than 3.5. Use of this property applies only on a source machine or a target machine that is running at least Oracle VM Server for SPARC 3.5.
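
For reference, ldmd SMF properties of this type are usually changed by using the svccfg command and then refreshing and restarting the ldmd service. The following is a hedged sketch; verify the property name against your installed ldmd service before using it:

primary# svccfg -s ldmd setprop ldmd/migration_adi_legacy_compat = true
primary# svcadm refresh ldmd
primary# svcadm restart ldmd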

ldm(1M): Updated Description of the set-domain Subcommand and the –i Option

The ldm(1M) man page includes the following updates:

  • The first paragraph now reads as follows:

    The set-domain subcommand enables you to modify properties such as boot-policy, mac-addr, hostid, failure-policy, extended-mapin-space, master, and max-cores for a domain. You cannot use this command to update resources.

  • The description of the –i option now reads as follows:

    –i file specifies the XML configuration file to use in setting the properties of the logical domain.

    Only the ldom_info nodes specified in the XML file are parsed. Resource nodes, such as vcpu, mau, and memory, are ignored.

    If the hostid property in the XML file is already in use, the ldm set-domain -i command fails with the following error:

    Hostid host-ID is already in use

    Before you re-run the ldm set-domain -i command, remove the hostid entry from the XML file.

ldm(1M) Incorrectly References the Command History Buffer

The ldm(1M) man page incorrectly refers to a command history buffer that you can view by using the ldm list-history command.

The first and second paragraphs of the Command History section have been updated with the following paragraphs:

Use the ldm list-history command to view the Oracle VM Server for SPARC command history log. This log captures ldm commands and commands that are issued through the XMPP interface. By default, the number of commands shown by the ldm list-history command is ten.

To change the number of commands output by the ldm list-history command, use the ldm set-logctl command to set the history property value. If you set history=0, the saving of command history is disabled. You can re-enable this feature by setting the history property to a non-zero value.
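
For example, the following hedged sketch, based on the description above, raises the number of saved commands and then displays the history:

primary# ldm set-logctl history=20
primary# ldm list-history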

The description of the history property in the Control Logging Operations section has been updated as follows:

history=num specifies the number of commands output by the ldm list-history command. Setting the value to 0 disables the saving of command history.

The description of the –a option in the View Logging Capabilities section has been updated as follows:

–a shows the logging capability values for all logging types and the number of commands output by the ldm list-history command.