Monitoring and Troubleshooting IB Devices
In the Oracle Solaris 11 release, new commands and utilities enable you to manage the IB fabric
more effectively. These commands are included in the
system/io/infiniband/open-fabrics package. The man pages for the commands are
installed automatically when the open-fabrics package is installed.
For example:
% man rping
Reformatting page. Please Wait... done
librdmacm RPING(1)
NAME
rping - RDMA CM connection and RDMA ping-pong test.
SYNOPSIS
rping -s [-v] [-V] [-d] [-P] [-a address] [-p port]
[-C message_count] [-S message_size]
rping -c [-v] [-V] [-d] -a address [-p port]
[-C message_count] [-S message_size]
.
.
.
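As a rough illustration of the two synopsis forms above, the following sketch wraps a server-side and a client-side rping invocation. It uses only options listed in the SYNOPSIS. The function names, the DRY_RUN switch, and any address or message count passed in are assumptions made for illustration and are not part of the open-fabrics package.

```shell
#!/bin/sh
# Sketch only: wraps the two rping forms from the SYNOPSIS.
# DRY_RUN, run, rping_server, and rping_client are hypothetical helper names.

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Server side: -s selects the server form, -C sets the message count.
rping_server() {
    run rping -s -v -C "$1"
}

# Client side: -c selects the client form, -a names the server address.
rping_client() {
    run rping -c -v -a "$1" -C "$2"
}
```

With DRY_RUN=1, calling rping_server 10 prints the command line rping -s -v -C 10 without running it. Setting DRY_RUN=0 on a host where the package is installed would run the command instead.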
The following new commands and utilities enable you to list and query IB
devices, diagnose and troubleshoot IB fabric issues, and measure IB performance.
Table 3-1 General IB Monitoring Commands
Command | Description
ibv_asyncwatch | Monitors InfiniBand asynchronous events
ibv_devices or ibv_devinfo | Lists InfiniBand devices or device information
ibv_rc_pingpong, ibv_srq_pingpong, or ibv_ud_pingpong | Tests node to node connectivity by using RC connection, SRQs, or UD connection
mckey | Tests RDMA CM multicast setup and simple data transfer
rping | Tests RDMA CM connection and attempts RDMA ping-pong
ucmatose | Tests RDMA CM connection and attempts simple ping-pong
udaddy | Tests RDMA CM datagram setup and attempts simple ping-pong
Table 3-2 General IB Performance Testing Commands
Command | Description
rdma_bw or rdma_lat | Tests RDMA write transactions for streaming bandwidth or latency
ib_read_bw or ib_read_lat | Tests RDMA read transactions for bandwidth or latency
ib_send_bw or ib_send_lat | Tests RDMA send transactions for bandwidth or latency
ib_write_bw or ib_write_bw_postlist | Tests RDMA write transactions for bandwidth with one I/O request at a time, or for post list bandwidth with a list of I/O requests
ib_write_lat | Tests RDMA write transactions for latency
ib_clock_test | Tests accuracy of system clock
qperf | Measures socket and RDMA performance
Table 3-3 RDS Monitoring and Testing Tools
Command | Description
rds-info | Displays RDS kernel module information
rds-ping | Determines whether a remote node is reachable over RDS
rds-stress | Sends messages between processes over RDS sockets
Note that RDSv3 does not support unconfiguring HCAs. If the system has an RDSv3 driver
installed at the time of a dynamic reconfiguration (DR) operation, unconfiguring the HCA
fails with an error message such as the following example for ib::rdsv3,0.
# cfgadm -c unconfigure ib::rdsv3,0
This operation will suspend activity on the IB device
Continue (yes/no)? yes
cfgadm: Hardware specific failure: unconfigure operation
failed ap_id: /devices/ib:fabric::rdsv3,0
# cfgadm -c unconfigure PCI-EM0
cfgadm: Component system is busy, try again: unconfigure failed
Workaround:
Remove the RDSv3 driver and reboot the system before performing the HCA DR
operation.
# rem_drv rdsv3
Device busy
Cannot unload module: rdsv3
Will be unloaded upon reboot.
# init 6
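Because the unconfigure operation fails while RDSv3 is present, it can help to check for the driver before starting the DR operation. The sketch below scans module listing text for a module named rdsv3. It assumes that the listing comes from the Solaris modinfo command and that the module name appears as its own whitespace-separated field. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: detect the rdsv3 module in modinfo-style text read from stdin.
# Assumption: the module name appears as its own whitespace-separated field.
rdsv3_loaded() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "rdsv3") found = 1 }
         END { exit(found ? 0 : 1) }'
}

# Report whether rdsv3 was seen; returns 1 if it was, so callers can stop early.
pre_dr_check() {
    if rdsv3_loaded; then
        echo "rdsv3 is loaded: remove the driver and reboot before HCA DR"
        return 1
    fi
    echo "rdsv3 not found in module list"
}
```

On a Solaris system, this would typically be fed from the module list, for example by piping the output of modinfo into pre_dr_check.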
Table 3-4 Fabric Diagnostic Tools
Command | Description
ibdiagnet | Performs diagnostic check of the entire fabric
ibaddr | Queries InfiniBand address or addresses
ibnetdiscover | Discovers remote InfiniBand topology
ibping | Validates connectivity between IB nodes
ibportstate | Queries physical port state and link speed of an IB port
ibroute | Displays InfiniBand switch forwarding tables
ibstat or ibsysstat | Queries the status of an InfiniBand device or devices, or the status of a system on an IB address
ibtracert | Traces an IB path
perfquery or saquery | Queries IB port counters or IB subnet administration attributes
sminfo | Queries IB SMInfo attribute
smpquery or smpdump | Queries or dumps IB subnet management attributes
ibcheckerrors or ibcheckerrs | Validates IB subnet or IB port (or node) and reports errors
ibchecknet, ibchecknode, or ibcheckport | Validates IB subnet, node, or port and reports errors
ibcheckportstate, ibcheckportwidth, ibcheckstate, or ibcheckwidth | Validates an IB port that is link up but not active, a port for 1x (2.0 Gbps) link width, ports in the IB subnet that are link up but not active, or 1x links in the IB subnet
ibclearcounters or ibclearerrors | Clears port counters or error counters in IB subnet
ibdatacounters or ibdatacounts | Queries for data counters in IB subnet or IB port data counters
ibdiscover.pl | Annotates and compares IB topology
ibhosts | Displays IB host nodes in topology
iblinkinfo.pl or iblinkinfo | Displays link information for all links in the fabric
ibnodes | Displays IB nodes in topology
ibprintca.pl | Displays either the CA specified or the list of CAs from the ibnetdiscover output
ibprintrt.pl | Displays either the router specified or a list of routers from the ibnetdiscover output
ibprintswitch.pl | Displays either the switch specified or a list of switches from the ibnetdiscover output
ibqueryerrors.pl | Queries and reports non-zero IB port counters
ibrouters | Displays IB router nodes in topology
ibstatus | Queries basic status of IB devices
ibswitches | Displays IB switch nodes in topology
ibswportwatch.pl | Polls the counters on the specified switch or port and reports rate of change information
set_nodedesc.sh | Sets or displays node description string for IB Host Channel Adapters (HCAs)
dump2psl.pl | Dumps PSL file based on opensm output file that is used for credit loop checking
dump2slvl.pl | Dumps SLVL file based on opensm output file that is used for credit loop checking
ibis | An extended TCL shell for IB management inband services
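Before working through the tools in the tables above, it can be useful to confirm which of them are actually available on the PATH. The following sketch reports any named command that the shell cannot find. The function name missing_tools is hypothetical, and the tool names passed to it are simply examples taken from the tables.

```shell
#!/bin/sh
# Sketch only: print each named command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, calling missing_tools ibstat ibhosts perfquery prints nothing when all three commands are installed, and prints the name of each one that is missing otherwise.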