Monitoring and Troubleshooting IB Devices

In the Oracle Solaris 11 release, new commands and utilities allow you to manage the IB fabric more effectively. These commands are included in the system/io/infiniband/open-fabrics package, and their man pages are installed automatically with that package. For example:

% man rping
Reformatting page.  Please Wait... done

librdmacm                                                RPING(1)

     rping - RDMA CM connection and RDMA ping-pong test.

     rping -s [-v] [-V] [-d] [-P] [-a address] [-p port]
               [-C message_count] [-S message_size]
     rping -c [-v] [-V] [-d] -a address [-p port]
               [-C message_count] [-S message_size]
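
Following the synopsis in this man page excerpt, a server and client pair could be started along these lines. This is only a sketch: the address, port, and message count are placeholder values, and the rping_cmd helper is a hypothetical convenience that prints the command line for each role rather than running it.

```shell
#!/bin/sh
# Placeholder values (assumptions): use the server's own IB address and a free port.
SERVER_ADDR="${SERVER_ADDR:-192.168.1.10}"
PORT="${PORT:-9999}"
COUNT="${COUNT:-10}"

# Print the rping command line for a role, using only options from the synopsis.
rping_cmd() {
    case "$1" in
        server) echo "rping -s -v -a $SERVER_ADDR -p $PORT -C $COUNT" ;;
        client) echo "rping -c -v -a $SERVER_ADDR -p $PORT -C $COUNT" ;;
        *) echo "usage: rping_cmd server|client" >&2; return 1 ;;
    esac
}

rping_cmd server   # run the printed command on the server node first
rping_cmd client   # then run the printed command on the client node
```

Start the server side before the client so that the client has a listener to connect to.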

The following new commands and utilities provide the ability to list and query IB devices, diagnose and troubleshoot IB fabric issues, and measure IB performance.

Table 9-1 General IB Monitoring Commands

Monitors InfiniBand asynchronous events
ibv_devices or ibv_devinfo
Lists InfiniBand devices or device information
ibv_rc_pingpong, ibv_srq_pingpong, or ibv_ud_pingpong
Tests node-to-node connectivity by using an RC connection, SRQs, or a UD connection
Tests RDMA CM multicast setup and simple data transfer
Tests RDMA CM connection and attempts RDMA ping-pong
Tests RDMA CM connection and attempts simple ping-pong
Tests RDMA CM datagram setup and attempts simple ping-pong
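
As a first check on a host, you might confirm that the monitoring commands are installed and then list the local IB devices. The following sketch assumes a POSIX shell; the check_tools helper is a hypothetical name introduced here, not part of the open-fabrics package.

```shell
#!/bin/sh
# Report whether each named tool is on PATH, one line per tool.
# The return status is the number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Query the devices only when both commands are available.
if check_tools ibv_devices ibv_devinfo; then
    ibv_devices    # list the IB devices on this host
    ibv_devinfo    # print the attributes of each device and its ports
fi
```

If either command is reported as missing, verify that the open-fabrics package is installed before you continue.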

Table 9-2 General IB Performance Testing Commands

rdma_bw or rdma_lat
Tests RDMA write transactions for streaming bandwidth or latency
ib_read_bw or ib_read_lat
Tests RDMA read transactions for bandwidth or latency
ib_send_bw or ib_send_lat
Tests RDMA send transactions for bandwidth or latency
ib_write_bw or ib_write_bw_postlist
Tests RDMA write transactions for bandwidth with one I/O request at a time, or for post-list bandwidth with a list of I/O requests
Tests RDMA write transactions for latency
Tests accuracy of system clock
Measures socket and RDMA performance
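
The bandwidth and latency tests are typically run as a pair: a listener on one node and a client that names that node. The sketch below assumes this server and client model for ib_read_bw and ib_read_lat. The host name is a placeholder, and the run helper and the RUN variable are hypothetical names introduced here so that the commands are printed by default and executed only when RUN=1 is set.

```shell
#!/bin/sh
# Placeholder host name (assumption): replace with the node running the listener.
SERVER="${SERVER:-ib-server-1}"

# Print the command line by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# On the server node, start the listener with no host argument, for example:
#   ib_read_bw
# On the client node, name the server to start the measurement:
run ib_read_bw "$SERVER"
run ib_read_lat "$SERVER"
```

Each test needs its own listener, so start the matching command on the server node before each client run.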

Table 9-3 RDS Monitoring and Testing Tools

Displays RDS kernel module information
Determines if remote node over RDS is reachable
Sends message between processes over RDS sockets

Table 9-4 Fabric Diagnostic Tools

Performs diagnostic check of the entire fabric
Queries InfiniBand address or addresses
Discovers remote InfiniBand topology
Validates connectivity between IB nodes
Queries physical port state and link speed of an IB port
Displays InfiniBand switch forwarding tables
ibstat or ibsysstat
Queries the status of an InfiniBand device or devices, or the status of a system at an IB address
Traces an IB path
perfquery or saquery
Queries IB port counters or IB subnet administration attributes
Queries IB SMInfo attribute
smpquery or smpdump
Queries or dumps IB subnet management attributes
ibcheckerrors or ibcheckerrs
Validates IB port (or node) or IB subnet and reports errors
ibchecknet, ibchecknode, or ibcheckport
Validates IB subnet, node, or port and reports errors
ibcheckportstate, ibcheckportwidth, ibcheckstate, or ibcheckwidth
Validates IB ports that are link up but not active, ports for 1x (2.0 Gbps) link width, ports in IB subnet that are link up but not active, or 1x links in IB subnet
ibclearcounters or ibclearerrors
Clears port counters or error counters in IB subnet
ibdatacounters or ibdatacounts
Queries for data counters in IB subnet or IB port data counters
Annotates and compares IB topology
Displays IB host nodes in topology
iblinkinfo
Displays link information for all links in the fabric
Displays IB nodes in topology
Displays either the CA specified or the list of CAs from the ibnetdiscover output
Displays either the router specified or a list of routers from the ibnetdiscover output
Displays either the switch specified or a list of switches from the ibnetdiscover output
Queries and reports non-zero IB port counters
Displays IB router nodes in topology
Queries basic status of IB devices
Displays IB switch nodes in topology
Polls the counters on the specified switch or port and reports rate-of-change information
Sets or displays node description string for IB Host Channel Adapters (HCAs)
Dumps PSL file based on opensm output file that is used for credit loop checking
Dumps SLVL file based on opensm output file that is used for credit loop checking
An extended TCL shell for IB management inband services
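
When you investigate a fabric problem, it can help to capture the output of several diagnostic tools in one pass. The following sketch runs ibstat, ibnetdiscover, and ibcheckerrors without arguments and saves each tool's output to a file. The log directory and the capture helper are hypothetical names introduced here, and some of these tools might require administrative privileges on your system.

```shell
#!/bin/sh
# Directory for the captured output (assumed default; override with LOGDIR).
LOGDIR="${LOGDIR:-/tmp/ib-diag}"
mkdir -p "$LOGDIR"

# Run a tool with no arguments if it is on PATH, save its output to
# $LOGDIR/<tool>.out, and report its exit status.
capture() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" >"$LOGDIR/$tool.out" 2>&1
        echo "$tool: exit status $?"
    else
        echo "$tool: not installed, skipped"
    fi
}

for t in ibstat ibnetdiscover ibcheckerrors; do
    capture "$t"
done
```

Comparing the saved files from before and after a change, such as a cable replacement, is one way to see whether the fabric state actually changed.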