Oracle® VM Server for SPARC 3.2 Release Notes

Updated: May 2015
 
 

SR-IOV Issues

ldm remove-io Command Reports a Timeout and Fails to Remove the Last SR-IOV Virtual Function From an I/O Domain

Bug ID 20731016: When you use the ldm remove-io command to remove the last SR-IOV virtual function from an I/O domain, the command might report a timeout and fail to remove the virtual function.

Workaround: If this problem occurs, perform the following steps:

  1. Verify that the system/management/hwmgmtd package is installed on the system.

    # pkg info system/management/hwmgmtd
  2. Disable the svc:/system/sp/management service.

    # svcadm disable -st svc:/system/sp/management
  3. Retry the ldm remove-io command.

  4. When the SR-IOV virtual function is successfully removed, enable the svc:/system/sp/management service.

    # svcadm enable svc:/system/sp/management
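The four steps above can be collected into one helper. The following is a minimal sketch, not an Oracle-supplied script: the function name remove_last_vf and its arguments are hypothetical, and stopping when the package is missing is an assumption, because the steps do not say what to do in that case. It assumes a POSIX shell running as root on the control domain.

```shell
# Hypothetical helper wrapping the documented workaround.
# Usage: remove_last_vf vf-name domain-name
remove_last_vf() {
    vf=$1        # name of the SR-IOV virtual function to remove
    domain=$2    # I/O domain that currently owns the virtual function

    # Step 1: verify that the hwmgmtd package is installed.
    # Assumption: stop here if it is missing.
    pkg info system/management/hwmgmtd >/dev/null || {
        echo "system/management/hwmgmtd is not installed" >&2
        return 1
    }

    # Step 2: temporarily disable the service and wait for it to stop.
    svcadm disable -st svc:/system/sp/management || return 1

    # Step 3: retry the removal.
    if ldm remove-io "$vf" "$domain"; then
        # Step 4: re-enable the service only after the removal succeeds.
        svcadm enable svc:/system/sp/management
    else
        echo "ldm remove-io failed; svc:/system/sp/management is still disabled" >&2
        return 1
    fi
}
```

If the retry fails, the sketch leaves the service disabled so that you can retry the removal by hand. Enable the service yourself once you are done.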

Bad Trap Panic Occurs Rarely When Rebooting an Oracle Solaris 10 Root Domain That Has SR-IOV Virtual Functions Assigned to Guest Domains

Bug ID 18323562: An Oracle Solaris 10 root domain might panic when rebooting. The problem can occur when the root domain has at least two PCIe buses and virtual functions from physical functions on different buses are assigned to guest domains. In that configuration, the root domain might panic if it receives events from guest domains on different buses in parallel. This panic occurs rarely.

panic[cpu3]/thread=2a100365c80: BAD TRAP: type=31 rp=2a1003652b0 addr=2000
mmu_fsr=0 occurred in module "pcie" due to an illegal access to a user
address

Workaround: None.

prtdiag Might Cause an Oracle Solaris 10 Root Domain to Panic After Destroying SR-IOV Virtual Functions

Bug ID 18323370: An Oracle Solaris 10 root domain might panic if you destroy virtual functions and then run the prtdiag command.

The prtdiag command might cause a panic when attempting to access virtual function device nodes that were just destroyed:

panic[cpu31]/thread=2a10140bc80: Fatal error has occured in: PCIe
fabric.(0x1)(0x43)

The prtdiag command also prints messages such as the following:

DEV_GET failed -1 Invalid argument  4.0.2 offset 0xff
/SYS/PCI-EM4      PCIE  fibre-channel-pciex10df,e200                   --
                     /pci@600/pci@1/pci@0/pci@4/fibre-channel@0,2

These messages occur because the prtdiag command attempts to access virtual function device nodes that have been destroyed. The nodes still appear in the picl tree, but not in the actual device tree.

Workaround: To avoid the panic, add the following line to the /etc/system file on the Oracle Solaris 10 root domain:

set px:pxtool_cfg_delay_usec=25000

Also refresh the picl daemon to avoid the Invalid argument messages:

# svcadm refresh picl
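If you apply the /etc/system change from a script, it helps to append the line only when it is not already present. The following is a minimal sketch under stated assumptions: ensure_line is a hypothetical helper name, the shell is a POSIX shell (not the legacy Bourne shell), and the target file ends with a newline.

```shell
# Hypothetical helper: append a line to a file unless an identical line exists.
# Usage: ensure_line file line
ensure_line() {
    file=$1
    line=$2

    # Compare each existing line exactly. The extra test also catches a
    # final line that has no trailing newline.
    while IFS= read -r existing || [ -n "$existing" ]; do
        if [ "$existing" = "$line" ]; then
            return 0    # already present, nothing to do
        fi
    done < "$file"

    # Assumption: the file ends with a newline, so a plain append is safe.
    printf '%s\n' "$line" >> "$file"
}

# Example call on the Oracle Solaris 10 root domain (run as root):
#   ensure_line /etc/system 'set px:pxtool_cfg_delay_usec=25000'
#   svcadm refresh picl
```

Settings in /etc/system are read at boot, so the new value applies after the root domain is next rebooted.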

Fibre Channel Physical Function Is Faulted by FMA and Disabled

Bug IDs 18168525 and 18156291: You must connect the Fibre Channel PCIe card to a Fibre Channel switch that supports NPIV and that is compatible with the PCIe card. If you do not use this configuration, using the format command or creating or destroying a virtual function might cause the physical function to be faulted by FMA and disabled. If this fault occurs, the message is similar to the following:

SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: event-time
PLATFORM: platform-type
SOURCE: eft, REV: 1.16
EVENT-ID: event-ID
DESC: A problem was detected for a PCIEX device.
AUTO_RESPONSE: One or more device instances may be disabled
IMPACT: Loss of services provided by the device instances associated with
this fault
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures
and policies regarding this diagnosis.

Workaround: If the card has been faulted by FMA, first check its connections to determine whether the card is directly connected to storage. Then, perform the step that matches your configuration:

  • Card is directly connected to storage – Correctly configure the Fibre Channel PCIe card by connecting it to a Fibre Channel switch that supports NPIV and is compatible with the PCIe card. Then, run the fmadm repair command to override the FMA diagnosis.

  • Card is not directly connected to storage – Replace the card.
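For the first case, after the cabling has been corrected, the repair might be sketched as follows. This is an illustration only: repair_fc_fault is a hypothetical helper name, and the sketch assumes that you pass the EVENT-ID value from the fault message as the UUID. It uses fmadm faulty -u to display the fault before fmadm repair clears it.

```shell
# Hypothetical helper: show one FMA fault, then mark it repaired.
# Usage: repair_fc_fault event-id
repair_fc_fault() {
    event_id=$1    # EVENT-ID (UUID) taken from the fault message

    if [ -z "$event_id" ]; then
        echo "usage: repair_fc_fault event-id" >&2
        return 2
    fi

    # Display the fault first so that the wrong event is not cleared.
    fmadm faulty -u "$event_id" || return 1

    # Override the FMA diagnosis for this event.
    fmadm repair "$event_id"
}
```

Run the helper only after the card is connected to an NPIV-capable switch. Otherwise, the same fault is likely to be raised again.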

Control Domain Hangs When Stopping or Starting I/O Domains

 

Bug ID 18030411: The primary domain might hang if you stop and start I/O domains frequently and in rapid succession. As a result of this behavior, the InfiniBand HCA stops responding and causes the primary domain to hang.

If you experience this problem, you might see messages on the console or in the messages file that are similar to the following:

VF3: PF has failed

Mcxnex: HW2SW_MPT command @ failed: 0000ffff

Hermon: MAD_IFC (port 01) command failed: 0000ffff

WARNING: mcxnex0: Device Error: HCR Timeout waiting for command go bit
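To check a messages file for the signatures listed above, a small filter such as the following might help. This is a sketch only: scan_hca_errors is a hypothetical helper name, the patterns are taken from the sample messages in this note, and the actual messages on your system might differ. It assumes a POSIX shell.

```shell
# Hypothetical helper: print lines that resemble the HCA failure messages.
# Usage: scan_hca_errors messages-file
scan_hca_errors() {
    while IFS= read -r line; do
        case $line in
            *"PF has failed"* | \
            *"HW2SW_MPT command"* | \
            *"MAD_IFC"*"command failed"* | \
            *"HCR Timeout"*)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$1"
}

# Example call, assuming the default Oracle Solaris messages file:
#   scan_hca_errors /var/adm/messages
```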

Workaround: To avoid this problem, do not perform unnecessary stop and start operations of the I/O domains. Instead, perform an orderly shutdown of the I/O domain.

Recovery: If the primary domain hangs for this reason, reset the system in one of the following ways:

  • Perform a reboot of the domain

    primary# ldm stop -r domain-name
  • Perform a reset in the SP

    -> reset /SYS

Warnings Appear on Console When Creating Fibre Channel Virtual Functions

Bug ID 17623156: When you create Fibre Channel virtual functions, you might see the following warnings:

WARNING: kmem_cache_destroy: 'px0_emlxs3_3_cache2'
  (3000383e030) not empty
WARNING: vmem_destroy('px0_emlxs3_3_vmem_top'):
  leaked 262144 identifiers

These messages do not affect the normal operation of the system and you can ignore them.

Workaround: None.

Fibre Channel Physical Function Configuration Changes Require Several Minutes to Complete

Bug ID 16397888: After you add or destroy virtual functions, it might take up to five minutes before you can attempt to add or destroy more virtual functions from the Fibre Channel physical function.

If you attempt to perform these operations before five minutes elapse, the operations fail with a message similar to the following:

The attempt to offline the pf /SYS/PCI-EM4/IOVFC.PF0 in domain
primary failed.
Error message from svc:/ldoms/agents in domain primary:
CMD_OFFLINE Failed. ERROR: devices or resources are busy.

Workaround: Wait five minutes before you attempt another IOV operation on the Fibre Channel physical function.

To perform all necessary configuration operations in a single command, and so avoid the wait between operations, use the ldm create-vf -n max or ldm destroy-vf -n max command.
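When a second IOV operation cannot be avoided, the wait can be scripted. The following is a minimal sketch, not part of the product: retry_iov is a hypothetical helper that reruns a command after a fixed delay until it succeeds or the attempt limit is reached. The physical function name in the example is the one from the error message above.

```shell
# Hypothetical helper: retry a command with a fixed delay between attempts.
# Usage: retry_iov delay-seconds max-attempts command [args...]
retry_iov() {
    delay=$1
    attempts=$2
    shift 2

    n=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        "$@" && return 0

        # Give up once the attempt limit has been reached.
        [ "$n" -ge "$attempts" ] && return 1

        echo "attempt $n failed; waiting ${delay}s before retrying" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Example calls (300 seconds matches the five-minute wait):
#   retry_iov 300 3 ldm create-vf -n max /SYS/PCI-EM4/IOVFC.PF0
#   retry_iov 300 3 ldm destroy-vf -n max /SYS/PCI-EM4/IOVFC.PF0
```

The helper retries on any failure, not only on the "devices or resources are busy" error, so check the command output if every attempt fails.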

Fujitsu M10 Server Has Different SR-IOV Feature Limitations

On a Fujitsu M10 server, you can assign PCIe endpoint devices and SR-IOV virtual functions from a particular PCIe bus to a maximum of 24 domains. The maximum is 15 domains for supported SPARC T-Series and SPARC M-Series platforms.

InfiniBand SR-IOV Issues


Caution  - Review this section before you deploy InfiniBand SR-IOV in your Oracle VM Server for SPARC 3.1 environment.


This section describes the known issues for the InfiniBand SR-IOV feature in the initial release of Oracle VM Server for SPARC 3.1.

  • The reboot of an Oracle Solaris 11.1.10.5.0 I/O domain that has InfiniBand virtual functions assigned to it occasionally panics the corresponding root domain. See bug ID 17336355.

  • An Oracle Solaris 10 1/13 I/O domain that has InfiniBand virtual functions assigned to it sometimes panics during reboot. The I/O domain runs the Oracle Solaris 10 1/13 OS plus the required patches. See bug IDs 17382933, 17361763, 17329218, and 17336035.

Misleading Messages Shown for InfiniBand SR-IOV Operations

Bug ID 16979993: An attempt to use dynamic SR-IOV operations on an InfiniBand device results in confusing and inappropriate error messages.

Dynamic SR-IOV is not supported for InfiniBand devices.

Workaround: Manage InfiniBand virtual functions by performing one of the following procedures: