Sun Datacenter InfiniBand Switch 72 User’s Guide

Determine Which Links Are Experiencing Significant Errors

You can use the ibdiagnet command to inject test packets into the fabric and determine which links are experiencing symbol errors and recovery errors.

  1. On the management controller, type:

    # ibdiagnet -c 100 -P all=1

    In this instance of the ibdiagnet command, the -c 100 option injects 100 test packets into each link, and the -P all=1 option returns all counters that increment during the test.

  2. In the output of the ibdiagnet command, search for the symbol_error_counter string.

    That line contains the symbol error count in hexadecimal. The preceding lines identify the node and port with the errors. Symbol errors are minor errors, and if there are relatively few during the diagnostic, they can be monitored.


    Note - The InfiniBand specification calls for a bit error rate (BER) of 10E-12. At that rate, the maximum allowable symbol error rate is 120 errors per hour.


  3. Also in the output of the ibdiagnet command, search for the link_error_recovery_counter string.

    That line contains the recovery error count in hexadecimal. The preceding lines identify the node and port with the errors. Recovery errors are major errors, and the affected links must be investigated for the cause of the rapid symbol error propagation.


    Note - The ibdiagnet.log file also contains a log of the test.
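
The search described in steps 2 and 3 can be scripted once the ibdiagnet output has been saved as text. The following Python code is a minimal sketch, not part of the switch software. It assumes that each counter line contains the counter name followed by a value written with a 0x prefix. The sample text, the node and port line in it, and the function names are hypothetical, so adjust the pattern to match the output that your version of ibdiagnet produces.

```python
import re

# Counter names taken from steps 2 and 3 of the procedure.
COUNTER_LINE = re.compile(
    r"(symbol_error_counter|link_error_recovery_counter).*?(0x[0-9a-fA-F]+)"
)

# Maximum allowable symbol error rate quoted in the note in step 2.
MAX_SYMBOL_ERRORS_PER_HOUR = 120

# Hypothetical sample text. Real ibdiagnet output is formatted differently.
SAMPLE = """\
(hypothetical) node=example-switch port=7
symbol_error_counter = 0x1f
link_error_recovery_counter = 0x0
"""


def find_error_counters(text, context=3):
    """Return (counter, decimal value, preceding lines) for nonzero counters."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = COUNTER_LINE.search(line)
        if match is None:
            continue
        value = int(match.group(2), 16)  # counts are reported in hexadecimal
        if value:
            # The preceding lines identify the node and port with the errors.
            hits.append((match.group(1), value, lines[max(0, i - context):i]))
    return hits


def exceeds_symbol_rate(count, hours):
    """True if count errors observed over hours exceeds 120 errors per hour."""
    return count / hours > MAX_SYMBOL_ERRORS_PER_HOUR


if __name__ == "__main__":
    for name, value, preceding in find_error_counters(SAMPLE):
        print(name, value, preceding)
```

Only nonzero counters are returned, because a zero count needs no follow-up. Any nonzero link_error_recovery_counter value should be investigated, while a symbol_error_counter value can be compared against the hourly limit with exceeds_symbol_rate if you know how long the counters have been accumulating.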


Related Information