You can use the ibdiagnet command to inject test packets into the fabric and determine which links are experiencing symbol errors and recovery errors.
On the Linux InfiniBand host, type:
# ibdiagnet -c 100 -P all=1
In this instance of the ibdiagnet command, 100 test packets are injected into each link and the -P all=1 option returns all counters that increment during the test.
In the output of the ibdiagnet command, search for the symbol_error_counter string.
That line contains the symbol error count in hexadecimal. The preceding lines identify the node and port with the errors. Symbol errors are minor errors. If relatively few occur during the diagnostic, you can continue to monitor them.
Note - Based on the 10E-12 BER given in the InfiniBand specification, the maximum allowable symbol error rate is 120 errors per hour.
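The comparison against the 120 errors-per-hour limit can be sketched in shell as follows. This is a minimal illustration and not part of ibdiagnet: the function names hex_to_dec, errors_per_hour, and check_symbol_rate are hypothetical, and the observation window in seconds is a value you measure and supply yourself.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from ibdiagnet.

# Convert a hexadecimal counter value (for example 0x1f) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Scale an error count seen over a window of seconds to errors per hour.
# $1: counter value in hexadecimal, $2: window length in seconds (must be > 0).
# Integer arithmetic is used, so the result is rounded down.
errors_per_hour() {
    count=$(hex_to_dec "$1")
    seconds=$2
    echo $(( count * 3600 / seconds ))
}

# Compare the scaled rate with the 120 errors-per-hour limit from the note.
check_symbol_rate() {
    rate=$(errors_per_hour "$1" "$2")
    if [ "$rate" -gt 120 ]; then
        echo "exceeds limit: $rate errors/hour"
    else
        echo "within limit: $rate errors/hour"
    fi
}
```

For example, check_symbol_rate 0x3 60 treats three symbol errors in one minute as 180 errors per hour and reports that the limit is exceeded.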
Also in the output of the ibdiagnet command, search for the link_error_recovery_counter string.
That line contains the recovery error count in hexadecimal. The preceding lines identify the node and port with the errors. Recovery errors are major errors, and you must investigate the affected links for the cause of the rapid symbol error propagation.
Note - The ibdiagnet.log file also contains a log of the test.
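Both searches can be combined into one pass over saved output, as in the following sketch. It assumes that you redirected the ibdiagnet output to a file of your choosing and that your grep supports the -B option (GNU grep does). The function name show_error_counters and the default of two context lines are assumptions; adjust the context count to however many preceding lines identify the node and port in your output.

```shell
#!/bin/sh
# Hypothetical helper; not part of ibdiagnet.

# Print every line that names one of the two counters, together with the
# lines immediately before it, which identify the node and port.
# $1: file holding saved ibdiagnet output
# $2: number of preceding lines to show (defaults to 2, an assumption)
show_error_counters() {
    grep -B "${2:-2}" -E 'symbol_error_counter|link_error_recovery_counter' "$1"
}

# Possible use, with a file name chosen only for illustration:
#   ibdiagnet -c 100 -P all=1 > /tmp/ibdiagnet.out
#   show_error_counters /tmp/ibdiagnet.out
```

grep separates non-adjacent groups of matches with a line containing two hyphens, which makes it easier to tell one node and port from the next.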
Switch Reference, ibdiagnet command