Bug ID 20731016: When you use the ldm remove-io command to remove the last SR-IOV virtual function from an I/O domain, the command might report a timeout and fail to remove the virtual function.
Workaround: If this problem occurs, perform the following steps:
Verify that the system/management/hwmgmtd package is installed on the system.
# pkg info system/management/hwmgmtd
Disable the svc:/system/sp/management service.
# svcadm disable -st svc:/system/sp/management
Retry the ldm remove-io command.
When the SR-IOV virtual function is successfully removed, enable the svc:/system/sp/management service.
# svcadm enable svc:/system/sp/management
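The workaround steps above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not an Oracle-provided script: the run helper, the DRY_RUN variable, and the remove_last_vf function name are hypothetical, and the virtual function and domain names in the usage note are placeholders. By default, the function only prints the commands that it would run.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: remove_last_vf VF-NAME DOMAIN-NAME (hypothetical function name)
remove_last_vf() {
    vf=$1
    domain=$2
    # Step 1: confirm that the hwmgmtd package is installed.
    run pkg info system/management/hwmgmtd || return 1
    # Step 2: temporarily disable the service and wait for it to stop.
    run svcadm disable -st svc:/system/sp/management || return 1
    # Step 3: retry the removal. On failure, stop here so that the
    # removal can be retried while the service is still disabled.
    run ldm remove-io "$vf" "$domain" || return 1
    # Step 4: re-enable the service after the removal succeeds.
    run svcadm enable svc:/system/sp/management
}
```

For example, remove_last_vf /SYS/MB/NET0/IOVNET.PF0.VF0 ldg1 prints the four commands in order. Setting DRY_RUN=0 runs them instead of printing them.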
Bug ID 18323562: An Oracle Solaris 10 root domain might panic when rebooting. The panic can occur when the root domain has at least two PCIe buses and virtual functions from physical functions on different buses are assigned to guest domains. In that configuration, if events from different buses on guest domains are received in parallel, the root domain might panic. This panic occurs rarely.
panic[cpu3]/thread=2a100365c80: BAD TRAP: type=31 rp=2a1003652b0 addr=2000 mmu_fsr=0 occurred in module "pcie" due to an illegal access to a user address
Workaround: None.
Bug ID 18323370: An Oracle Solaris 10 root domain might panic if you destroy virtual functions and then run the prtdiag command.
The prtdiag command might cause a panic when attempting to access virtual function device nodes that were just destroyed:
panic[cpu31]/thread=2a10140bc80: Fatal error has occured in: PCIe fabric.(0x1)(0x43)
The prtdiag command also prints messages such as the following:
DEV_GET failed -1 Invalid argument 4.0.2 offset 0xff /SYS/PCI-EM4 PCIE fibre-channel-pciex10df,e200 -- /pci@600/pci@1/pci@0/pci@4/fibre-channel@0,2
These messages occur because the prtdiag command attempts to access virtual function device nodes that have been destroyed. The nodes still appear in the picl tree, but not in the actual device tree.
Workaround: To avoid the panic, add the following line to the /etc/system file on the Oracle Solaris 10 root domain:
set px:pxtool_cfg_delay_usec=25000
Also refresh the picl daemon to avoid the Invalid argument messages:
# svcadm refresh picl
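The /etc/system change above can be applied so that the line is added only once. The following is a minimal sketch, assuming a POSIX grep (on Oracle Solaris 10, /usr/xpg4/bin/grep rather than /usr/bin/grep). The add_px_delay function name and its file argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the px tunable to the given file
# unless an identical line is already present.
add_px_delay() {
    file=$1
    line='set px:pxtool_cfg_delay_usec=25000'
    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -qxF "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

On the root domain, the call would be add_px_delay /etc/system, run as a privileged user. Settings in /etc/system are read at boot time, so the tunable takes effect after the next reboot of the root domain.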
Bug IDs 18168525 and 18156291: You must connect the Fibre Channel PCIe card to a Fibre Channel switch that supports NPIV and that is compatible with the PCIe card. If you do not use this configuration, using the format command or creating or destroying a virtual function might cause the physical function to be faulted by FMA and disabled. If this fault occurs, the message is similar to the following:
SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: event-time
PLATFORM: platform-type
SOURCE: eft, REV: 1.16
EVENT-ID: event-ID
DESC: A problem was detected for a PCIEX device.
AUTO_RESPONSE: One or more device instances may be disabled
IMPACT: Loss of services provided by the device instances associated with this fault
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
Workaround: If the card has been faulted by FMA, first check its connections and ensure that the card is not directly connected to storage. Then, perform the step that matches your configuration:
Card is directly connected to storage – Correctly configure the Fibre Channel PCIe card by connecting it to a Fibre Channel switch that supports NPIV and is compatible with the PCIe card. Then, run the fmadm repair command to override the FMA diagnosis.
Card is not directly connected to storage – Replace the card.
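The fmadm repair command needs an identifier for the fault, such as the EVENT-ID value from the fault message. The following is a minimal sketch that extracts that value from a saved copy of the message and prints the matching repair command without running it. The event_id_from_msg and print_repair_cmd function names are hypothetical, and the sketch assumes that the message contains an "EVENT-ID: value" field as in the sample message shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: read an FMA fault message on standard input
# and print the first EVENT-ID value that it contains.
event_id_from_msg() {
    sed -n 's/.*EVENT-ID: *\([^ ,]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: print (do not run) the repair command for
# the event found on standard input. Fails if no EVENT-ID is found.
print_repair_cmd() {
    id=$(event_id_from_msg)
    [ -n "$id" ] || return 1
    echo "fmadm repair $id"
}
```

Run the printed command only after the card has been recabled to a compatible Fibre Channel switch that supports NPIV. The fmadm faulty command shows the current fault details.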
Bug ID 18030411: The primary domain might hang if you stop and start I/O domains frequently and in rapid succession. As a result of this behavior, the InfiniBand HCA stops responding and causes the primary domain to hang.
If you experience this problem, you might see messages on the console or in the messages file that are similar to the following:
VF3: PF has failed
Mcxnex: HW2SW_MPT command @ failed: 0000ffff
Hermon: MAD_IFC (port 01) command failed: 0000ffff
WARNING: mcxnex0: Device Error: HCR Timeout waiting for command go bit
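To check whether a messages file contains this error signature, a search such as the following can help. This is a minimal sketch, assuming a POSIX grep that supports the -E option. The hca_errors function name is hypothetical, and the patterns are taken from the sample messages above, so real messages might differ.

```shell
#!/bin/sh
# Hypothetical helper: print the lines in the given messages file
# that match the InfiniBand HCA failure messages shown above.
# $1: path to a messages file (on Oracle Solaris, typically
#     /var/adm/messages)
hca_errors() {
    grep -E 'PF has failed|failed: 0000ffff|HCR Timeout waiting for command go bit' "$1"
}
```

For example, hca_errors /var/adm/messages prints any matching lines and returns a nonzero exit status if none are found.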
Workaround: To avoid this problem, do not perform unnecessary stop and start operations of the I/O domains. Instead, perform an orderly shutdown of the I/O domain.
Recovery: If the primary domain hangs for this reason, reset the system in one of the following ways:
Perform a reboot of the domain
primary# ldm stop -r domain-name
Perform a reset in the SP
-> reset /SYS
Bug ID 17623156: When you create Fibre Channel virtual functions, you might see the following warnings:
WARNING: kmem_cache_destroy: 'px0_emlxs3_3_cache2' (3000383e030) not empty
WARNING: vmem_destroy('px0_emlxs3_3_vmem_top'): leaked 262144 identifiers
These messages do not affect the normal operation of the system and you can ignore them.
Workaround: None.
Bug ID 16397888: After you add or destroy virtual functions, it might take up to five minutes before you can attempt to add or destroy more virtual functions from the Fibre Channel physical function.
If you attempt to perform these operations before five minutes elapse, the operations fail with a message similar to the following:
The attempt to offline the pf /SYS/PCI-EM4/IOVFC.PF0 in domain primary failed. Error message from svc:/ldoms/agents in domain primary: CMD_OFFLINE Failed. ERROR: devices or resources are busy.
Workaround: Wait five minutes before you attempt another IOV operation on the Fibre Channel physical function.
To avoid repeated waits, create or destroy all of the virtual functions for the physical function in a single operation by using the ldm create-vf -n max or ldm destroy-vf -n max command.
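The five-minute wait can be built into a small retry wrapper. The following is a minimal sketch: the retry_iov function name and the IOV_WAIT variable are hypothetical, and the 300-second default simply mirrors the five-minute interval described above.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping
# IOV_WAIT seconds (default 300) between failed attempts.
# Usage: retry_iov ATTEMPTS COMMAND [ARG...]
retry_iov() {
    attempts=$1
    shift
    n=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up after the last allowed attempt.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "${IOV_WAIT:-300}"
    done
}
```

For example, retry_iov 3 ldm create-vf -n max /SYS/PCI-EM4/IOVFC.PF0 tries the operation up to three times, with a five-minute pause after each failure. The physical function name is taken from the sample error message and is only a placeholder.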
On a Fujitsu M10 server, you can assign PCIe endpoint devices and SR-IOV virtual functions from a particular PCIe bus to a maximum of 24 domains. The maximum is 15 domains for supported SPARC T-Series and SPARC M-Series platforms.
Caution - Review this section before you deploy InfiniBand SR-IOV in your Oracle VM Server for SPARC 3.1 environment.
This section describes the known issues for the InfiniBand SR-IOV feature in the initial release of Oracle VM Server for SPARC 3.1.
The reboot of an Oracle Solaris 11.1.10.5.0 I/O domain that has InfiniBand virtual functions assigned to it occasionally panics the corresponding root domain. See bug ID 17336355.
An Oracle Solaris 10 1/13 I/O domain that has InfiniBand virtual functions assigned to it sometimes panics during reboot. The I/O domain runs the Oracle Solaris 10 1/13 OS plus the required patches. See bug IDs 17382933, 17361763, 17329218, and 17336035.
Bug ID 16979993: An attempt to use dynamic SR-IOV operations on an InfiniBand device results in confusing and inappropriate error messages.
Dynamic SR-IOV is not supported for InfiniBand devices.
Workaround: Manage InfiniBand virtual functions by performing one of the following procedures: