InfiniBand Fabric Problems
The following table lists problems that might occur with the InfiniBand fabric and
the corrective steps to take to resolve each one.
|
|
Performance of the InfiniBand fabric seems diminished. |
1. Determine if there are errors or problems with the InfiniBand fabric.
See:
2. Locate the affected nodes by the GUID provided in the output of the ibdiagnet command. See Locate a Switch Chip or Connector From the GUID.
3. If the problem is at a cable connection, swap the suspect cable with a known good cable or reconnect the cable to a known good remote port, and repeat Step 1. See Servicing the InfiniBand Cables.
4. If the problem remains at the cable connection, disable and re-enable the respective port, and repeat Step 1. See Disable a Port and Enable a Port.
Temporary solution:
Permanent solution:
|
An InfiniBand Link LED is blinking. |
1. Disconnect and properly reconnect both ends of the respective InfiniBand cable.
See Switch Service, servicing an InfiniBand cable.
2. If the LED is still blinking, use the ibdiagnet command to determine whether the errors are significant. See Determine Which Links Are Experiencing Significant Errors.
3. Determine which connectors map to the affected link by deconstructing the node’s GUID and port. See Locate a Switch Chip or Connector From the GUID.
4. If some of the links are running at 1x or SDR, follow the corrective steps for that situation elsewhere in this table.
5. Disable and re-enable the respective ports. See Disable a Port and Enable a Port.
6. If the errors are still significant, swap the cable with a known good one or reconnect the cable to a known good remote port, and repeat from Step 2.
7. Replace whichever component the previous step shows to be faulty: the cable if swapping it cleared the errors, or the remote port if moving the cable to a known good port cleared them. See Servicing the InfiniBand Cables. For the remote port, see that device's documentation for replacement procedures.
|
Some InfiniBand links are running at 1x or SDR. |
|
There are errors on some InfiniBand links. |
1. Clear the error counters.
See Clear Error Counters.
2. Start a fabric stress test.
3. Identify the suspect links using the ibdiagnet command. See Determine Which Links Are Experiencing Significant Errors. Look for text like the following:
    -W- lid=0x0006 guid=0x0021283a8816c0a0 dev=48438 Port=34
        Performance Monitor counter : Value
        link_recovery_error_counter : 0x1
        symbol_error_counter : 0x25 (Increase by 3 during ibdiagnet)
4. For links that are experiencing recovery errors or substantial symbol errors, see the other situations in this table to help identify the cause and rectify the problem.
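When ibdiagnet reports many ports, it can help to filter the warnings with a script. The following is a minimal sketch, not part of the switch software. It assumes the warning text has the layout quoted above, with each counter on its own line. The function names and the symbol-error threshold of 10 are hypothetical; choose a threshold that matches what you consider substantial for your fabric.

```python
import re

# Sample text copied from the ibdiagnet warning quoted in the steps above.
SAMPLE = """\
-W- lid=0x0006 guid=0x0021283a8816c0a0 dev=48438 Port=34
    Performance Monitor counter : Value
    link_recovery_error_counter : 0x1
    symbol_error_counter : 0x25 (Increase by 3 during ibdiagnet)
"""

# Matches the "-W- lid=... guid=... dev=... Port=..." line that opens a warning.
HEADER = re.compile(
    r"-W-\s+lid=(?P<lid>0x[0-9a-fA-F]+)\s+guid=(?P<guid>0x[0-9a-fA-F]+)"
    r"\s+dev=(?P<dev>\d+)\s+Port=(?P<port>\d+)"
)
# Matches "<name>_counter : 0x<hex>" lines; the column-title line does not match.
COUNTER = re.compile(r"^\s*(?P<name>\w+_counter)\s*:\s*(?P<value>0x[0-9a-fA-F]+)")


def parse_port_warnings(text):
    """Return one dict per warning block, with counter values converted to int."""
    ports = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = {
                "lid": header["lid"],
                "guid": header["guid"],
                "port": int(header["port"]),
                "counters": {},
            }
            ports.append(current)
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            current["counters"][counter["name"]] = int(counter["value"], 16)
    return ports


def is_suspect(port, symbol_threshold=10):
    """Flag any recovery error, or symbol errors at or above the
    hypothetical threshold (an assumption, not a documented limit)."""
    counters = port["counters"]
    return (
        counters.get("link_recovery_error_counter", 0) > 0
        or counters.get("symbol_error_counter", 0) >= symbol_threshold
    )


if __name__ == "__main__":
    for port in parse_port_warnings(SAMPLE):
        if is_suspect(port):
            print(port["guid"], "port", port["port"], port["counters"])
```

The GUID and port number that the script prints are the values to use when locating the switch chip or connector for a suspect link.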
|
Output of InfiniBand commands provides only the GUID and port, not the switch chip or CXP connector. |
|
|