You can use the ibdiagnet command to inject test packets into the fabric and determine which links are experiencing symbol errors and recovery errors.
On the management controller, type:
# ibdiagnet -c 100 -P all=1
In this instance, the ibdiagnet command injects 100 test packets into each link, and the -P all=1 option returns all counters that increment during the test.
In the output of the ibdiagnet command, search for the symbol_error_counter string.
That line contains the symbol error count in hexadecimal. The preceding lines identify the node and port with the errors. Symbol errors are minor errors, and if there are relatively few during the diagnostic, they can be monitored.
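Because the count is reported in hexadecimal, converting it to decimal can make it easier to compare between runs. The following Python sketch extracts the value from one line of output. The function name parse_counter and the layout of the sample line are assumptions made for illustration. The exact ibdiagnet output format can differ between versions, so adjust the pattern to match what your switch prints.

```python
import re

# Matches a hexadecimal value such as 0x1f. Only the counter names come from
# the procedure; the layout of the rest of the line is an assumption.
_HEX_VALUE = re.compile(r"0x([0-9a-fA-F]+)")


def parse_counter(line, counter="symbol_error_counter"):
    """Return the counter value as an int, or None if the line does not match."""
    if counter not in line:
        return None
    # Search only the text after the counter name so that hex values
    # appearing earlier on the line (a LID, for example) are skipped.
    tail = line.split(counter, 1)[1]
    match = _HEX_VALUE.search(tail)
    return int(match.group(1), 16) if match else None


# Hypothetical sample line, not captured from a real switch.
sample = "symbol_error_counter : 0x1f"
print(parse_counter(sample))  # 31
```

Pass counter="link_error_recovery_counter" to read the recovery error count the same way.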
Note - According to the InfiniBand specification, which calls for a bit error rate (BER) of 10E-12, the maximum allowable symbol error rate is 120 errors per hour.
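To compare an observed count against this limit, scale it to an hourly rate. The sketch below shows the arithmetic. It assumes that you measure the elapsed time of the observation yourself; the function names are hypothetical. A rate extrapolated from a very short test is only a rough indication, so treat the result as a prompt for further monitoring rather than a verdict.

```python
MAX_SYMBOL_ERRORS_PER_HOUR = 120  # limit quoted in the note above


def symbol_errors_per_hour(error_count, elapsed_seconds):
    """Scale an observed symbol error count to an hourly rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return error_count * 3600 / elapsed_seconds


def within_limit(error_count, elapsed_seconds):
    """Return True if the hourly rate does not exceed the quoted limit."""
    rate = symbol_errors_per_hour(error_count, elapsed_seconds)
    return rate <= MAX_SYMBOL_ERRORS_PER_HOUR


# Illustrative numbers only: 3 errors seen over a 60 second window.
print(symbol_errors_per_hour(3, 60))  # 180.0
print(within_limit(3, 60))            # False
```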
Also in the output of the ibdiagnet command, search for the link_error_recovery_counter string.
That line contains the recovery error count in hexadecimal. The preceding lines identify the node and port with the errors. Recovery errors are major errors, and the affected links must be investigated to find the cause of the rapid symbol error propagation.
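Both searches can be combined into one pass that keeps the preceding lines as context, in the same way that grep -B does. The following Python sketch is one possible approach, not part of the ibdiagnet tooling. The function name, the number of context lines, and the sample text are assumptions; how many preceding lines are needed to identify the node and port depends on the actual output.

```python
from collections import deque

# Severity guidance taken from the procedure: symbol errors are minor,
# recovery errors are major.
SEVERITY = {
    "symbol_error_counter": "minor: monitor",
    "link_error_recovery_counter": "major: investigate the link",
}


def find_error_counters(lines, context=3):
    """Return (counter, severity, preceding_lines, line) for each match."""
    recent = deque(maxlen=context)  # rolling window of preceding lines
    findings = []
    for raw in lines:
        line = raw.rstrip("\n")
        for counter, severity in SEVERITY.items():
            if counter in line:
                findings.append((counter, severity, list(recent), line))
                break
        recent.append(line)
    return findings


# Hypothetical text standing in for real ibdiagnet output.
sample = [
    "node A, port 1 (hypothetical)",
    "symbol_error_counter : 0x2",
    "node B, port 7 (hypothetical)",
    "link_error_recovery_counter : 0x1",
]
for counter, severity, before, line in find_error_counters(sample, context=1):
    print(severity, "|", before[-1], "|", line)
```

To scan saved output instead of the sample list, pass an open file object as the lines argument.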
Note - Additionally, the ibdiagnet.log file contains the log of the testing.
Switch Reference, ibdiagnet command