SPARC T8 Series Servers Product Notes

Updated: January 2022
 
 

XGBE not training at MAX advertised speed/width (26526760)

In rare circumstances, when power is first applied to the system, the PCI-Express (PCIE) link leading to the onboard Ethernet device may train at less than its optimal speed. This does not cause any loss of connectivity or service, but it may reduce network bandwidth or throughput. The problem occurs only at the moment AC power is applied to the chassis. If the links show the correct speed, they remain at the correct speed for as long as AC power stays applied.

Identifying the Error Condition

To troubleshoot this problem, you must determine whether your onboard network has trained to less than its full capability, and must then retrain any links that are trained to less than their full potential. Use either of the following methods to identify whether this error condition exists on your system; both methods can be run from within Solaris in the control domain.

Option 1: FMA Fault Log

If, and only if, you have run the Power On Self Test (POST) during boot, there will be faults registered in the FMA log for degraded links.

  1. Display a list of diagnosed faults.

    # fmadm faulty
    --------------------------------------------------------------------------------------
    TIME            EVENT-ID                                 MSG-ID               SEVERITY
    --------------------------------------------------------------------------------------
    Sep 14 06:21:49 33055e24-2f39-679e-9482-ec1c5f83b69b SPSUN4V-8001-0J Major Problem Status:open
    Diag Engine : fdd / 1.0
    System Manufacturer : Oracle Corporation
    Name : SPARC T8-1
    Part_Number : 32884356+1+1
    Serial_Number : AK00271486
    Host_ID : 86bbdd30
    ----------------------------------------
    Suspect 1 of 1 : Problem class : fault.io.pciex.bus-linkerr-deg
    Certainty : 100%
    Affects : location:////SYS/MB/NET2
    Status : faulted but still in service
    FRU Status : faulty
    Location : "/SYS/MB"
    ...
    Resource Location : "/SYS/MB/NET2" 
  2. Look for any faults with the "Problem class" set to "fault.io.pciex.bus-linkerr-deg" in any of these locations. The previous example shows this fault in /SYS/MB/NET2.

    • /SYS/MB/NET0

    • /SYS/MB/NET1

    • /SYS/MB/NET2

    • /SYS/MB/NET3

    • /SYS/MB/IOH/IOS2/RP0/PCIE_LINK

    If you see fault.io.pciex.bus-linkerr-deg in any of these locations, the onboard network's PCI-E link did not train to its full potential.

  3. Record the value listed under the EVENT-ID for each fault.io.pciex.bus-linkerr-deg fault. You will use these EVENT-IDs later to clear the faults.

    In the previous example, this value is 33055e24-2f39-679e-9482-ec1c5f83b69b.
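On a system with several faults, picking the right EVENT-IDs out of the output by eye can be error-prone. The following is a minimal sketch, not an Oracle-supplied tool, that scans captured `fmadm faulty` text for the degraded-link problem class. It assumes that each fault record begins with a line containing its EVENT-ID and that the "Problem class" line follows within the same record, as in the example above. The function name `degraded_link_event_ids` and the `SAMPLE` text are illustrative only.

```python
import re

# Matches the 8-4-4-4-12 hexadecimal EVENT-ID format shown in the example output.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE
)
DEGRADED_CLASS = "fault.io.pciex.bus-linkerr-deg"


def degraded_link_event_ids(fmadm_output):
    """Return the EVENT-IDs of faults reported with the degraded-link class.

    Assumption: the EVENT-ID line precedes the "Problem class" line of the
    same fault record, as in the example transcript.
    """
    event_ids = []
    current_id = None
    for line in fmadm_output.splitlines():
        match = UUID_RE.search(line)
        if match:
            # Remember the most recent EVENT-ID seen so far.
            current_id = match.group(0)
        if "Problem class" in line and DEGRADED_CLASS in line:
            if current_id is not None and current_id not in event_ids:
                event_ids.append(current_id)
    return event_ids


# Abbreviated copy of the example output; a real capture has more lines.
SAMPLE = """\
Sep 14 06:21:49 33055e24-2f39-679e-9482-ec1c5f83b69b SPSUN4V-8001-0J Major Problem Status:open
Suspect 1 of 1 : Problem class : fault.io.pciex.bus-linkerr-deg
Affects : location:////SYS/MB/NET2
"""

if __name__ == "__main__":
    for event_id in degraded_link_event_ids(SAMPLE):
        print(event_id)
```

To use it, save the output of `fmadm faulty` to a file and pass the file contents to the function. Confirm the result against the "Affects" and "Resource Location" lines before repairing anything.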

Option 2: prtdiag(1M) command

You can use the prtdiag(1M) command from Solaris when logged in to the primary domain, whether or not POST was enabled.

  1. In the Solaris control domain, display the PCI-E width and speed for the onboard Ethernet device. In this case, the network ports for the onboard Ethernet device are named /SYS/MB/XGBE, /SYS/MB/NET1, /SYS/MB/NET2, and /SYS/MB/NET3.

    # prtdiag
    System Configuration: Oracle Corporation sun4v SPARC T8-1 Memory size: 243200 Slot
    …                                          Cur Speed/Width...
    /SYS/MB/XGBE PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 /pci@300/pci@1/network@0
    /SYS/MB/NET1 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 /pci@300/pci@1/network@0,1
    /SYS/MB/NET2 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 /pci@300/pci@1/network@0,2
    /SYS/MB/NET3 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 
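A link has trained below its capability when the two speed/width values on a row differ. As a rough sketch of that comparison, the following helper scans captured `prtdiag` text for the four onboard slots named above and reports any row whose two speed/width tokens are not identical. It assumes each row carries exactly two tokens of the form `8.0GT/x8`; it does not depend on which of the two is the current value. The function name `mismatched_links` is hypothetical, and the `5.0GT/x8` row in the sample is invented to illustrate a degraded link.

```python
import re

# A speed/width token such as "8.0GT/x8" (transfer rate, then lane count).
SPEED_WIDTH_RE = re.compile(r"\b\d+(?:\.\d+)?GT/x\d+\b")

# Slot names of the onboard Ethernet ports, as listed in step 1.
ONBOARD_SLOTS = {"/SYS/MB/XGBE", "/SYS/MB/NET1", "/SYS/MB/NET2", "/SYS/MB/NET3"}


def mismatched_links(prtdiag_output):
    """Return (slot, first, second) for onboard rows whose two values differ."""
    mismatches = []
    for line in prtdiag_output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ONBOARD_SLOTS:
            continue
        tokens = SPEED_WIDTH_RE.findall(line)
        # Assumption: exactly two speed/width tokens per row.
        if len(tokens) == 2 and tokens[0] != tokens[1]:
            mismatches.append((fields[0], tokens[0], tokens[1]))
    return mismatches


# The NET2 row below is a made-up degraded link, not real output.
SAMPLE = """\
/SYS/MB/XGBE PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 /pci@300/pci@1/network@0
/SYS/MB/NET1 PCIE network-pciex8086,1589 8.0GT/x8 8.0GT/x8 /pci@300/pci@1/network@0,1
/SYS/MB/NET2 PCIE network-pciex8086,1589 8.0GT/x8 5.0GT/x8 /pci@300/pci@1/network@0,2
"""

if __name__ == "__main__":
    for slot, first, second in mismatched_links(SAMPLE):
        print(f"{slot}: {first} vs {second}")
```

An empty result for real output suggests that all onboard links trained at full speed and width, as in the example above where every row shows 8.0GT/x8 twice.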

Retrain Under-Optimized Links

If either of the previous methods indicates that a link is trained to a less-than-optimal setting, retrain the affected links as follows.

  1. Halt all guests on the chassis.

  2. Power off the host.

  3. Remove the AC power from the chassis for a few seconds.

  4. Repair each fault with the fmadm repair uuid-of-fault command, substituting the EVENT-ID you recorded for uuid-of-fault.

    In the previous example, the EVENT-ID for /SYS/MB/NET2 was 33055e24-2f39-679e-9482-ec1c5f83b69b. In this case, you would clear the fault as follows:

    # fmadm repair 33055e24-2f39-679e-9482-ec1c5f83b69b
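When several faults were recorded, one repair command is needed per EVENT-ID. The sketch below only builds the command strings for review; it does not run `fmadm`. It rejects anything that does not match the 8-4-4-4-12 hexadecimal format seen in the example, which helps catch copy-and-paste mistakes. The function name `repair_commands` is hypothetical.

```python
import re

# EVENT-ID format as it appears in the example output (8-4-4-4-12 hex digits).
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}", re.IGNORECASE)


def repair_commands(event_ids):
    """Return one 'fmadm repair <EVENT-ID>' string per validated EVENT-ID."""
    commands = []
    for event_id in event_ids:
        candidate = event_id.strip()
        if not UUID_RE.fullmatch(candidate):
            raise ValueError(f"not a valid EVENT-ID: {event_id!r}")
        commands.append(f"fmadm repair {candidate}")
    return commands


if __name__ == "__main__":
    for command in repair_commands(["33055e24-2f39-679e-9482-ec1c5f83b69b"]):
        print(command)
```

Review the printed commands, then run them as root in the control domain.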