In rare circumstances, when power is first applied to the system, the PCI-Express (PCIE) link leading to the onboard Ethernet device may train at less than its optimal speed. This does not cause any loss of connectivity or service, but it may reduce network bandwidth or throughput. The problem occurs only at the moment AC power is applied to the chassis. If the links show the correct speed, they remain at the correct speed for as long as AC power stays applied.
Identifying the Error Condition
To troubleshoot this problem, you must determine whether your onboard network has trained to less than its full capability, and must then retrain any links that are trained to less than their full potential. Use either of the following methods to identify whether this error condition exists on your system; both methods can be run from within Solaris in the control domain.
Option 1: FMA Fault Log
If, and only if, you have run the Power On Self Test (POST) during boot, there will be faults registered in the FMA log for degraded links.
Display a list of diagnosed faults.
# fmadm faulty
--------------------------------------------------------------------------------------
TIME            EVENT-ID                             MSG-ID          SEVERITY
--------------------------------------------------------------------------------------
Sep 14 06:21:49 33055e24-2f39-679e-9482-ec1c5f83b69b SPSUN4V-8001-0J Major

Problem Status : open
Diag Engine : fdd / 1.0
System
   Manufacturer : Oracle Corporation
   Name : SPARC T8-1
   Part_Number : 32884356+1+1
   Serial_Number : AK00271486
   Host_ID : 86bbdd30

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.io.pciex.bus-linkerr-deg
   Certainty : 100%
   Affects : location:////SYS/MB/NET2
   Status : faulted but still in service

   FRU
      Status : faulty
      Location : "/SYS/MB"
      ...
   Resource
      Location : "/SYS/MB/NET2"
Look for any faults with the "Problem class" set to "fault.io.pciex.bus-linkerr-deg" in any of these locations. The previous example shows this fault in /SYS/MB/NET2.
/SYS/MB/NET0
/SYS/MB/NET1
/SYS/MB/NET2
/SYS/MB/NET3
/SYS/MB/IOH/IOS2/RP0/PCIE_LINK
If you see fault.io.pciex.bus-linkerr-deg in any of these locations, the onboard network's PCI-E link did not train to its full potential.
Record the value listed under the EVENT-ID for each fault.io.pciex.bus-linkerr-deg fault. You will use these EVENT-IDs later to clear the faults.
In the previous example, this value is 33055e24-2f39-679e-9482-ec1c5f83b69b.
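When several faults are listed, collecting the EVENT-IDs by hand is error-prone. The following is a minimal sketch of a filter that reads fmadm faulty output and prints the EVENT-ID of each fault.io.pciex.bus-linkerr-deg fault. The function name linkerr_event_ids is hypothetical, and the sketch assumes that the EVENT-ID is the only UUID-shaped field in each fault record, as in the example above.

```shell
#!/bin/sh
# Sketch only: linkerr_event_ids is a hypothetical helper, not part of FMA.
# Reads `fmadm faulty` output on stdin and prints one EVENT-ID per line for
# faults whose Problem class is fault.io.pciex.bus-linkerr-deg.
linkerr_event_ids() {
    awk '
        # Remember the most recent UUID-shaped field (assumed to be the EVENT-ID).
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/)
                    id = $i
        }
        # When the degraded-link problem class appears, print that ID once.
        /fault\.io\.pciex\.bus-linkerr-deg/ && id != "" && id != last {
            print id
            last = id
        }
    '
}
```

In the control domain, this could be used as fmadm faulty | linkerr_event_ids. Check the printed IDs against the full fmadm faulty output before acting on them.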
Option 2: prtdiag(1M) command
You can use the prtdiag(1M) command from Solaris when logged in to the primary domain, whether or not POST was enabled during boot.
In the Solaris control domain, display the PCI-E width and speed for the onboard Ethernet device. In this case, the network ports for the onboard Ethernet device are named /SYS/MB/XGBE, /SYS/MB/NET1, /SYS/MB/NET2, and /SYS/MB/NET3.
# prtdiag
System Configuration: Oracle Corporation sun4v SPARC T8-1
Memory size: 243200
Slot … Cur Speed/Width...
/SYS/MB/XGBE PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8
    /pci@300/pci@1/network@0
/SYS/MB/NET1 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8
    /pci@300/pci@1/network@0,1
/SYS/MB/NET2 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8
    /pci@300/pci@1/network@0,2
/SYS/MB/NET3 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8
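Comparing the speed/width columns by eye is easy to get wrong on a long listing. The following is a minimal sketch of a filter that reads prtdiag output and reports any onboard network slot whose two speed/width values differ. The function name degraded_net_links is hypothetical. The sketch assumes that each slot line starts with the slot name and carries exactly two values of the form 8.0GT/x8. It does not assume which of the two columns is the current value, so it reports both.

```shell
#!/bin/sh
# Sketch only: degraded_net_links is a hypothetical helper.
# Reads `prtdiag` output on stdin and prints the onboard network slots
# whose two speed/width values are not identical.
degraded_net_links() {
    awk '
        # Only consider slot lines for the onboard network ports.
        $1 ~ /^\/SYS\/MB\/(NET[0-9]+|XGBE)$/ {
            n = 0
            # Collect fields shaped like 8.0GT/x8.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9.]+GT\/x[0-9]+$/)
                    sw[++n] = $i
            # Report the slot when the two values disagree.
            if (n == 2 && sw[1] != sw[2])
                print $1 ": " sw[1] " vs " sw[2]
        }
    '
}
```

In the control domain, this could be used as prtdiag | degraded_net_links. No output means that every matched slot shows the same value in both columns.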
Retrain Under-Optimized Links
If either of the previous methods indicates that a link is trained to a less-than-optimal setting, retrain the affected links as follows.
Halt all guests on the chassis.
Power off the host.
Remove the AC power from the chassis for a few seconds.
Repair the faults by running the fmadm repair uuid-of-fault command once for each fault, substituting the EVENT-ID you recorded for uuid-of-fault.
In the previous example, the EVENT-ID for the fault on /SYS/MB/NET2 was 33055e24-2f39-679e-9482-ec1c5f83b69b. In this case, you would clear the fault as follows:
# fmadm repair 33055e24-2f39-679e-9482-ec1c5f83b69b
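If you recorded more than one EVENT-ID, the repair command must be run once per fault. The following is a minimal sketch of a loop that does this. The function name repair_faults and the FMADM variable are hypothetical; FMADM exists only so that the loop can be dry-run with echo before it touches the fault log.

```shell
#!/bin/sh
# Sketch only: repair_faults and FMADM are hypothetical names.
# Runs `fmadm repair` once for each EVENT-ID passed as an argument and
# stops at the first failure. Set FMADM=echo to preview the commands.
repair_faults() {
    for uuid in "$@"; do
        ${FMADM:-fmadm} repair "$uuid" || return 1
    done
}
```

For example, running repair_faults 33055e24-2f39-679e-9482-ec1c5f83b69b issues the same command as shown above. Setting FMADM=echo first prints the commands instead of running them.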